EPSY440 - Evaluation
Chapter Fourteen Notes (Nitko, 2001)
Don't assume all students will automatically know how to do their best.
Most educational assessments strive for maximum performance versus typical performance:
maximum performance - procedures and conditions are set for the student to attain the best score he/she can.
typical performance - procedures and conditions are set so the student will perform under ordinary or typical conditions.
In order for students to perform their best, they should have the following information regarding the assessment:
In relation to the above bits of information students need in order to perform their best:
Students also need the following minimum test-taking skills:
Certain clues in items will alert test-wise students to the correct answer (e.g., obvious associations between words in stem and words in an alternative, specific determiners eliminating answers, longer and more qualified answers being the key, grammatical clues in the stem, and overlapping alternatives).
test-wiseness - the ability to use test-taking strategies, clues from poorly constructed items, and past experience to improve one's score beyond what it would normally be.
Although you should construct or use top-quality assessments, you should also make sure all students have the skills below to put them on an even par with more test-wise students:
Although the popular belief is that changing answers leads to more mistakes, research shows that when students change answers after thoughtful consideration, 2 out of 3 changed answers will be correct.
Test anxiety is common in those students who are motivated to do well.
Task-directed thoughts are held by students who perceive evaluation as a challenge, where their thoughts and actions are focused on completing the task at hand to reduce any associated tension.
Task-irrelevant thoughts are commonly held by students who perceive assessments as threats. These students are self-preoccupied, and their thoughts center on what would happen if they failed, their own helplessness, and a desire to escape from the tension-producing situation as soon as possible (cognitive interference).
Factors that may be under the instructor's control include:
The appearance and layout of an assessment are vital for the validity of the interpretations.
You should type or write the assessment items neatly, and not give them orally.
If there are a lot of objective items, an answer sheet should be provided and items should be placed in a booklet.
Arranging items by format also reduces how often students have to change their "mindset."
It can be beneficial to arrange content areas according to how they were taught in class.
Within each content area, items should be arranged from easy to difficult.
Test directions should contain, at a minimum, information such as the number and format of items, the time allotted, where and how answers should be indicated, what the penalty is for guessing, and general strategies the students should follow.
Tests should be copied with high quality and should be kept in secure locations.
Before administering a test, every keyed response should be verified.
Correction-for-guessing formulas can be applied to scores on multiple-choice and true-false tests (see formulas and variations on p. 318).
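As a rough illustration, the most common correction for guessing subtracts a fraction of the wrong answers from the number right: corrected score = R - W/(n - 1), where R is the number right, W the number wrong (omitted items are not counted), and n the number of alternatives per item. The sketch below assumes this standard form; it does not reproduce the variations on p. 318, and the function name is hypothetical.

```python
def corrected_score(num_right, num_wrong, num_alternatives):
    """Standard correction for guessing: R - W / (n - 1).

    Omitted items count as neither right nor wrong.
    """
    if num_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return num_right - num_wrong / (num_alternatives - 1)


# 40 four-option items: 30 right, 6 wrong, 4 omitted
print(corrected_score(30, 6, 4))   # 28.0
# True-false items have two alternatives, so the score reduces to R - W
print(corrected_score(30, 10, 2))  # 20.0
```

Under this formula a student who omits an item loses nothing, while a wrong answer costs 1/(n - 1) of a point, which is why the directions should state the guessing penalty.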
If standardized tests are used, the test manual must be followed or else the interpretations of the results are invalid.
Item Analysis
item analysis - the process of collecting, summarizing, and using information from students' results to make decisions about each assessment task/item.
Uses of item analysis for classroom tests include:
Steps for doing an item analysis are listed in Figure 14.8 and will be reviewed in class, as will the item analysis computations.
item-difficulty index - the fraction of the total group answering an item correctly (overall item difficulty for a complete test should average .50, with necessary adjustments for guessing).
item-discrimination index - the difference between the fraction of the upper group answering the item correctly and the fraction of the lower group answering the item correctly.
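The two definitions above can be sketched in a few lines. This is a minimal illustration, not the procedure from Figure 14.8: it assumes each response is already scored 1 (correct) or 0 (incorrect) and that students have already been split into upper and lower groups by total test score. The function names and the sample data are hypothetical.

```python
def difficulty_index(responses):
    """Fraction of the group answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def discrimination_index(upper, lower):
    """Fraction correct in the upper group minus fraction correct in the lower group."""
    return difficulty_index(upper) - difficulty_index(lower)


# Hypothetical scores on one item for ten students, split by total test score.
upper_group = [1, 1, 1, 1, 0]  # five highest-scoring students
lower_group = [1, 0, 0, 1, 0]  # five lowest-scoring students

p = difficulty_index(upper_group + lower_group)
d = discrimination_index(upper_group, lower_group)
print(p)            # 0.6
print(round(d, 2))  # 0.4
```

Here 6 of 10 students answered correctly (p = .60), and the upper group outperformed the lower group by .80 - .40 = .40. If the lower group had done better than the upper group, the index would come out negative.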
Item analysis for constructed-response and performance exams is similar to that of objective exams (see the formulas on p. 323).
Item difficulty can be used to:
Item discrimination is important when you wish to rank order students (i.e., show that some are outperforming others).
More weight should be given to the discrimination index than to the difficulty index because the discrimination index can change the rank ordering from the way you intended, but the difficulty index does not.
Generally speaking, you want discrimination indexes to be positive, because a negative index would indicate that the lower scoring students had a greater percentage answering the item correctly than the upper scoring students.
Improving Multiple-Choice Item Quality
Every distractor should have at least one student from the lower-scoring group selecting it, and more students overall from the lower group should select the distractors than students from the upper group.
Note that not every lower-scoring student will lack knowledge of the correct answer for a particular item, and not every higher-scoring student will have the correct answer indicated.
If no one from the lower group selects a distractor, it should be revised or eliminated.
Higher-scoring students will sometimes select an answer that was not keyed but was plausible, which would call for revision of the alternative (ambiguous alternative).
When lower-scoring students are divided between two alternatives, this is more likely to mean the students are selecting a common misconception, which is what you would expect from less knowledgeable students.
You need to always consider the characteristics of the group of students taking the exam and not just rely totally on the statistics to make revisions.
If a large number of upper-scoring students select a wrong answer, it could be that it was keyed incorrectly (miskeyed item).
If each alternative is selected by an equal number of upper-scoring students, that would very likely be an indication of blind guessing due to lack of knowledge or a poorly written item (always look at patterns of the upper group and not the lower group to determine guessing).
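The checks described above can be roughed out as a simple tally of which alternatives each group chose. This is only a sketch under stated assumptions: the function name and sample data are hypothetical, and the miskey rule used here (more upper-group students choose a distractor than choose the key) is one possible reading of "a large number." The flags are prompts for review and do not replace judgment about the group of students tested.

```python
from collections import Counter


def review_alternatives(upper_choices, lower_choices, key, alternatives):
    """Return a dict mapping each distractor to a list of review flags."""
    upper = Counter(upper_choices)  # missing alternatives count as 0
    lower = Counter(lower_choices)
    flags = {}
    for alt in alternatives:
        if alt == key:
            continue  # only distractors are reviewed
        notes = []
        if lower[alt] == 0:
            notes.append("no lower-group student chose it: revise or eliminate")
        if upper[alt] > lower[alt]:
            notes.append("chosen more by upper group: check for ambiguity")
        if upper[alt] > upper[key]:  # assumed threshold for a possible miskey
            notes.append("upper group prefers it to the key: check for miskeying")
        flags[alt] = notes
    return flags


# Hypothetical responses to one four-option item keyed "B".
upper_choices = ["B", "B", "B", "C", "B"]
lower_choices = ["A", "B", "C", "A", "C"]
flags = review_alternatives(upper_choices, lower_choices, "B", "ABCD")
for alt, notes in flags.items():
    print(alt, notes)
```

In this example only alternative D is flagged, because no lower-group student selected it; A and C draw more lower-group than upper-group students, as intended.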
Guessing adds to the standard error of measurement and decreases the reliability and validity of the interpretations of an assessment.
Items that perform well can be added to an item bank or pool for future use.
Even if items have difficulty and discrimination indices that are less than ideal, you should still select items that will cover the important areas of content outlined by the test blueprint.
Finally, for tests that measure only one ability, the difficulty index for items should be between .16 and .84.
For tests that measure more than one ability, the difficulty index for each item should be between .40 and .60.
If tests are used for rank ordering (as most classroom tests are), then the discrimination index for each item should be above .30.
If tests are used for absolute (i.e., mastery) achievement, then item discrimination indices should be above .00.
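The difficulty ranges given above can be applied as a quick screen when selecting items. The sketch below is only an illustration: the function name and item data are hypothetical, and it assumes "between" includes the endpoints.

```python
def difficulty_in_range(p, single_ability=True):
    """Check a difficulty index against the ranges in the notes.

    One ability: .16 to .84. More than one ability: .40 to .60.
    Endpoints are treated as acceptable (an assumption).
    """
    low, high = (0.16, 0.84) if single_ability else (0.40, 0.60)
    return low <= p <= high


# Hypothetical difficulty indices for three items.
item_difficulties = {"item1": 0.92, "item2": 0.55, "item3": 0.20}
for name, p in item_difficulties.items():
    print(name, difficulty_in_range(p), difficulty_in_range(p, single_ability=False))
```

An item that falls outside the range is a candidate for review, not automatic removal, since items may still be needed to cover the content outlined in the test blueprint.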
This Webpage designed and updated (11/17/01) by Ron Dugan, University at Albany, State University of New York.