EPSY440 - Evaluation


Chapter Fifteen Notes (Nitko, 2001)


Continuous assessment - the daily process of gathering information about a student's progress in achieving the curriculum's learning targets.

· Includes both formative and summative assessment.

Grading - refers to the process of using symbols, such as letters, to indicate various types of student progress.

As indicated in previous chapters, grades must be based on reliable and valid instruments.

Information for assigning grades should come from several places, including:

· Curriculum materials
· Quizzes and tests
· Performance tasks
· Student products
· Portfolios
· Teaching teams

Many teachers dislike grading because:

1. educational achievement is difficult to evaluate properly,
2. different opinions and philosophies exist for utilizing different methods, and
3. judgments are difficult and often unpleasant to make.

Information included in report cards can include:

· the content and/or objectives met
· subject performance comparisons
· performance relative to others
· social behavior

Among the many uses for grades are:

1. Reaffirming what is already known.
2. Documenting progress and course completion.
3. Using extrinsic rewards/punishments.
4. Obtaining social or teacher attention.
5. Requesting new educational placement.
6. Judging a teacher's competence or fairness.
7. Indicating school problems.
8. Supporting vocational/career guidance.
9. Limiting or excluding students from extracurricular activities.
10. Promotion or retention.
11. Granting diplomas.
12. Determining prerequisite knowledge.
13. Selecting for postsecondary education.
14. Deciding if an individual has basic skills for a job.

Teachers and parents often have different interpretations of what is in a report card.

Grades communicate not just achievement, but the teacher's and school's values.

Evaluations of achievement need to be separate from evaluations of noncognitive student characteristics.

Grades are usually criticized for the following reasons:

1. They are essentially meaningless.

a. Diversity among institutions/teachers, a lack of definite grading policies, carelessness in grading, and the use of grades for punishment, are reasons for improving grading practices.
b. Summative grades were never meant to substitute for the complex details of assessing achievement.

2. They are educationally unimportant.

a. Grades may be symbols but they are not unimportant.
b. Tangible outcomes can be observed or assessed and it is valuable to do so.
c. Both the teacher's evaluation and the student's self-evaluation should be used.
d. Grades predict future achievement and some extra-class accomplishments.
e. Valid grades reflect what students were taught.

3. They are unnecessary.

a. Although not needed for all evaluations, grades cannot be eliminated.
b. Grades are needed for counseling, guidance, and accountability.
c. They serve summative and record-keeping purposes for large amounts of information.

4. They are harmful.

a. Teachers and parents use grades for punishment, and parents overstress the importance of grades, but the extent to which this is done is not known.

A genuine concern among teachers is the effect low grades can have on student motivation.

Common methods for grading, and their advantages (a) and disadvantages (b) are:

1. Letter grades (A, B, C, etc.)

a. Easy to use and interpret, and provide a concise summary.
b. Meaning varies between persons/schools, they do not describe strengths/weaknesses, and elementary kids can be "demotivated" by them.

2. Number/Percentage grades (99, 98, 97, etc.)

a. Same as letter grades, plus more continuous and can be used in conjunction with letters.
b. Same as letter grades, plus meaning not immediately apparent unless explanations are provided.

3. Two-category grades (pass-fail, satisfactory-unsatisfactory, etc.)

a. Less harmful to younger students, encourages older students to take classes normally threatening to GPA.
b. Less reliable than continuous systems and doesn't communicate enough information for others to judge.

4. Checklists and Rating Scales (checks or numerical ratings of degree)

a. More detailed and can be combined with letter grades or group-referenced data.
b. Can be too detailed for parents and cumbersome for staff.

5. Narrative reports (usually accompanies types of codes given above)

a. Allows description, relates performance to standards, targets, etc., and opens dialogue between parents, teachers, and students.
b. Time-consuming, requires excellent writing skills, may need translation, parents may not want to read them or may be overwhelmed, and usually modified to include checklists.

6. Pupil-teacher conferences (no codes but involves discussion of above)

a. Offers personal opportunities for discussion and can be an integrated, ongoing process.
b. Teacher must be skilled in offering positive & negative feedback, time-consuming, can be threatening, and doesn't offer summation for the institution.

7. Parent-teacher conferences (same as above)

a. Allows clarification and discussion of concerns, samples of work can be shown, can lead to improved home-school relations.
b. Time-consuming, requires preparation, can cause anxiety, not adequate for reporting large amounts of info, can be inconvenient.

8. Letter to parents (same as above)

a. A useful supplement for other reporting methods.
b. Short letters are inadequate, and effective letters require exceptional writing skills.

When evaluating your school's reporting methods, consider the following:

1. Some methods are used more frequently than others.

a. Letter grades used more frequently with upper grades and conferences don't occur until later years.

2. Schools often use multiple methods on the same report card.

a. Letter grades for achievement, rating scales for attitude and behavior.

3. Conflicts can arise between methods.

a. School administrators need concise summaries, parents and teachers may want more explanation.

Multiple marking system - using more than one method and multiple symbols.

Permanent record card - official record of a student's school performance.

· Reporting elementary grades is controversial, yet some may get upset when letter grades suddenly replace two-category grading (satisfactory-unsatisfactory).

Grades need to be placed in one of three frameworks to make them interpretable to everyone:

1. Norm-referencing (relative standards)
2. Criterion-referencing (absolute standards)
3. Self-referencing (growth standards)


Grades are meaningless unless:

· You adopt a conceptual framework in which to assign them, and
· Use the framework in a consistent way with your teaching.


Grades need to be consistent with:

· The reasons why you assign them, and
· Your school district's philosophy and "grading culture."


Teaching approaches can be classified into two categories:

1. Focus on defined learning targets - students should attain high achievement while meeting high standards and attaining worthwhile learning targets.
2. Focus on outperforming one's peers - students should focus on high achievement by trying to outdo one's peers.


Formative Evaluation Purposes for Grading

Feedback on progress from a starting point - communicate to students the progress they have made from the point at which they began.

Feedback on nearness to achieving standards - communicate how close students are to achieving standards and learning targets.

Feedback on how one stands in relation to classmates - communicate how well a student is performing relative to his/her peers at various points during the year.


Summative Evaluation Purposes for Grading

Level of achievement in relation to standards - evaluates how well a student has achieved specific learning targets or high standards (without reference to where they began or how far they need to go).

Standing relative to peers - evaluates where a student ranks in achievement relative to his/her peers.


Three Conceptual Frameworks for Grading

Criterion-referenced - also referred to as absolute standards or task-referenced grading, this method assigns grades by comparing student's performance to defined standards, learning targets, or knowledge to be gained (allows all "A"s or "B"s, or conversely, all "D"s or "F"s).

· Learning targets and standards need to be realistic.
· Absolute standards are set using norm-referenced information (what is appropriate for this level of class?).
· Meaningful when you have a well-defined domain of performance.
· Arguments for or against focus on importance of knowing achievement independent of others.

Norm-referenced - also called grading with relative standards or group-referenced grading, this method assigns grades based on how well a student performed compared to other students.

· Arguments for or against this method focus on competition (necessary or not).
· Need to define the reference group to which you are comparing a student.
· Does not convey what a student can actually do in relation to the curriculum.
· Requires good grades for top performers regardless of actual achievement level (and vice-versa for lower performers).
· Also need to be grounded in standards appropriate for the grade level.
· Don't "waffle" and grade "on the curve" because performance is low overall (you need to discover why).

Self-referenced - also known as growth-based or change-based grading, this method assigns grades based on a student's performance in relation to what he/she is capable of.

· Arguments in favor focus on reducing competition and using grades to motivate.
· Arguments against focus on unreliability of teacher judgments, parents' need to know performance relative to others, a focus on lower-performing students, and a possible conversion to effort grading.
· Biased against higher performers (less gain possible resulting in lower grades).


One grading system cannot serve all teaching philosophies or all purposes for wanting to assign grades.
(Table 15.5 on p. 353 shows which grading frameworks are consistent with which teaching approaches)

Report cards may contain all three types of grading (criterion-, norm-, and self-referenced).

 

(The rest of these notes are focused on summative grading methods)

Grading is linked to the assessment plan developed at the beginning of the marking period which outlined what activities would be assessed, and how they would be weighted (think of our class syllabus that described grading).

Assessment variables - the complete set of characteristics on which you gather information (e.g., attitudes, skills, motivation, etc.).

· Not all variables need to be recorded and reported.

Reporting variables - the set of assessment variables your district expects you to report on.

· Usually includes achievement, social behavior, skills attained, and effort.

Grading variables - an even smaller subset of variables upon which you base your assignment of course achievement grades.

· Includes performance tasks, tests, and quizzes.

The text recommends not confusing achievement grading with other classroom behavior grading (e.g., handing in assignments late, missing class, poor penmanship, etc.).

· Keep in mind that achievement at higher grade levels involves the integration of many skills, for which assessment on the various skills is a legitimate practice, especially when they are related to real-world success (e.g., writing and time management skills).

Include in your summative grading assessments you deem valuable (including formative assessment tasks such as homework and quizzes) and exclude those deemed not as valuable to a summative grade.

You need to plan ahead and make all your marking scales across assessments equivalent (such that simple sums of points will be meaningful, as in using all percentages):

· You cannot have separate scales of 1-50 points and 1-100 percentage points and then have the total sum equal the grade.
· You can have separate scales of 1-50 points and 1-100 points, or 1-50 percentage points and 1-100 percentage points, where the sum is meaningful.

o Proper example:

25/50 points on a percentage scale equals 50%
50/100 points on a percentage scale equals 50%
50% plus 50%, divided by 2, equals final grade of 50%, which is meaningful.

o Improper example:

25/50 points on a point scale equals 25 points
50/100 points on a percentage scale equals 50%
25 points plus 50%, divided by 2, equals a meaningless final grade of 37.5
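
The two examples above can be checked with a short sketch. The helper name to_percent and the variable names are illustrative assumptions, not part of the text; the scores are the ones used in the examples.

```python
def to_percent(earned, possible):
    """Convert a raw score to percentage points so scales are comparable."""
    return 100.0 * earned / possible

# Proper: both assessments are put on the percentage scale before averaging.
scores = [(25, 50), (50, 100)]
percents = [to_percent(earned, possible) for earned, possible in scores]
proper_final = sum(percents) / len(percents)

# Improper: a raw point total (25) is averaged with a percentage (50%).
improper_final = (25 + to_percent(50, 100)) / 2

print(proper_final)    # 50.0
print(improper_final)  # 37.5, a number with no clear meaning
```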

More points or gradations on a scale allow for finer discrimination and increased reliability.

Even a scale based on percentage points can be coarse (vs. fine-grained) if there are only, say, five items: the only possible scores are 0, 20, 40, 60, 80, & 100 percentage points, which is equivalent to having a scale of 0-5.

The weight of grading any assignment should be related to its importance to achieving the overall learning targets.

Consider these six factors when weighting the grading of any assessment:

1. Components that assess more learning targets and content should be weighted more heavily.
2. Components on which you spent the most time should receive more weight.
3. Components that require students to integrate and apply learning get more weight.
4. Overlapping components should receive less weight than those assessing individual components.
5. If certain components are necessary but biased toward some students, weight them less than others.
6. Less reliable and objective components should be weighted less.

Setting boundaries for grades (i.e., what constitutes an "A", a "B", etc.) is determined by the grading method you are using and the school policy, and should be equivalent across assessments.

Borderline cases - students whose marks are on or near the boundaries.

· Lowering borderline grades is as valid as raising them when evidence justifies it.
· Remember true scores vary from observed scores.
· Raising or lowering borderline grades based on other achievement information is more valid than doing it based on effort.

Failing grade (F) - since this carries much emotion due to its negative consequences, it should be based on a student consistently performing below expected standards.

· Failing work (work of poor quality) and failing to try (unmotivated) are 2 ways of assessing failure; giving an "F" for failure to try is not valid because it does not signal level of achievement.
· Make sure lateness or failure to submit assignments is separate from level of achievement.
· Assigning an "incomplete" is more appropriate when work is not handed in.

Zeros can severely impact the average grade, so a better policy would be to grade only those assignments that have been handed in. You could also substitute the mean for the missing grade.

· Other factors need to be taken into consideration (equal-value and multiple assignments, reason for failure to turn in assignments, etc.).
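
A rough sketch of how these three policies compare for one student. The scores are made up, a missing assignment is marked as None, and "the mean" is assumed here to be the student's own mean on submitted work (the notes do not say whether the student or class mean is intended).

```python
from statistics import mean

def average_with_zero(scores):
    """Count each missing assignment (None) as a zero."""
    return mean([0 if s is None else s for s in scores])

def average_submitted_only(scores):
    """Grade only the assignments that were handed in."""
    return mean([s for s in scores if s is not None])

def average_with_mean_substituted(scores):
    """Replace each missing score with the mean of the submitted scores."""
    fill = average_submitted_only(scores)
    return mean([fill if s is None else s for s in scores])

# Hypothetical student: three submitted assignments and one missing.
scores = [80, 90, None, 70]

print(average_with_zero(scores))              # 60
print(average_submitted_only(scores))         # 80
print(average_with_mean_substituted(scores))  # 80
```

Under this assumption the last two policies always give the same average; they would differ only if the class mean were substituted instead.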


Techniques for setting grade boundaries:

Grading on the curve - ranking the students' grades from highest to lowest, using preset percentages from the normal curve, or using the standard deviation from a normal distribution.

Percentages

Top 20% get "A"s
Next 30% get "B"s
Next 30% get "C"s
Next 15% get "D"s
Lowest 5% get "F"s

Normal curve

Top 14% get "A"s
Next 34% get "B"s
Next 34% get "C"s
Next 14% get "D"s
Bottom 4% get "F"s
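
As a minimal sketch of grading by preset percentages, the function below ranks students and hands out letters from the top down. The function name, the class of 20 students, and their scores are hypothetical. Tied scores are not treated specially here (ties fall in input order), which a real policy would need to address.

```python
from collections import Counter

def grade_on_curve(scores, cutoffs):
    """Assign letters by rank.

    scores:  dict mapping student -> score
    cutoffs: list of (letter, percent_of_class), ordered from top to bottom
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0.0
    for letter, pct in cutoffs:
        cumulative += pct
        end = round(n * cumulative / 100)  # rank position where this letter stops
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    return grades

# The "Percentages" scheme from the notes.
PERCENTAGES = [("A", 20), ("B", 30), ("C", 30), ("D", 15), ("F", 5)]

# Hypothetical class of 20 students with distinct scores 100, 99, ..., 81.
scores = {f"student{i}": 100 - i for i in range(20)}
grades = grade_on_curve(scores, PERCENTAGES)
counts = Counter(grades.values())
print(counts["A"], counts["B"], counts["C"], counts["D"], counts["F"])  # 4 6 6 3 1
```

Note that the letters depend only on rank: the same counts would result if every score were 30 points lower.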


Standard deviation

Complicated system based on using the mean GPA of the class (ability level), the standard deviation of performance on an assessment, and the class mean of the assessment. Then, lower boundaries are set for each grade level using the standard deviation (see Figure 15.5 on p. 360).

Grading a composite of scores:

· Consistent with the norm-referenced method (normal curve) outlined above.
· Usually based on assessments from several different assignments (quizzes, tests, etc.)
· The component that contributes the most to the final rankings should be weighted most heavily.

o Simply summing the marks then ranking will be inconsistent because the standard deviation from separate assessments will vary, changing the rankings between assessments and altering the final rankings such that the intended primary assessment has little effect on the final rankings (see p. 361 in textbook).
o Standardizing scores first, then multiplying them by the weights intended, and then summing the scores can correct this.

§ Composite scores equal the sum of the standardized scores times their weights.
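
The effect described above can be illustrated with a small sketch. The three students, their scores, and the 3:1 weighting are made up for illustration, and the population standard deviation (pstdev) is an assumed choice; the text may use a different formula. An assessment on which every student scores the same has a standard deviation of zero and cannot be standardized this way.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    m = mean(scores)
    sd = pstdev(scores)
    return [(s - m) / sd for s in scores]

def composite(components, weights):
    """Sum of each student's standardized scores times the intended weights."""
    z_components = [standardize(c) for c in components]
    n_students = len(components[0])
    return [
        sum(w * z[i] for w, z in zip(weights, z_components))
        for i in range(n_students)
    ]

# Hypothetical data for students 0, 1, 2 (same order in both lists).
exam = [90, 88, 86]   # intended primary assessment, small spread
quiz = [10, 50, 90]   # minor assessment, large spread

raw_sums = [e + q for e, q in zip(exam, quiz)]
composite_scores = composite([exam, quiz], weights=[3, 1])

print(raw_sums)          # [100, 138, 176]: the quiz dominates the ranking
print(composite_scores)  # student 0 now ranks first, following the exam
```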

Criterion-referenced letter grades:

Fixed-percent method - scores on separate assessments are first converted to percent correct, then percentages are translated into grades, using the same letter-grade boundaries for each assessment.

Total points method - each assessment is given a maximum point value, and grades are assigned based on the total number of points.

Quality level method (a.k.a. rubric or content-based method) - describe the quality level of performance students must demonstrate for each grade (what qualifies as an "A", a "B", etc.).
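
A minimal sketch of the fixed-percent method described above. The 90/80/70/60 boundaries are a hypothetical example only; actual boundaries come from your grading method and school policy.

```python
# Hypothetical lower boundaries in percent correct, highest first.
BOUNDARIES = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def letter_grade(earned, possible, boundaries=BOUNDARIES):
    """Convert a score to percent correct, then to a letter grade.

    The same boundaries are applied to every assessment.
    """
    percent = 100.0 * earned / possible
    for lower, letter in boundaries:
        if percent >= lower:
            return letter
    return "F"

print(letter_grade(45, 50))    # A (90%)
print(letter_grade(72, 100))   # C (72%)
print(letter_grade(29, 50))    # F (58%)
```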


Grading a single test or performance:

Fixed percent method - uses percentages as the basis for assigning grades; the percentage cutoffs themselves are really arbitrary.

· Domain must be defined and properly sampled for this method to be adequate.
· Teacher must have a good idea of student ability levels and difficulty levels of tests.
· The same fixed percentages are used for every assessment.

Total points method - decide in advance what types of performance will contribute to the final grade, then assign total point maximums to each type of performance. Percentages are then assigned to the total point levels.


Grading a composite of several scores:

Fixed percentage method - have a percentage score for each student for each component. Then multiply each component percentage by its corresponding weight, add the products together, and divide by the sum of the weights.

· If you don't use weights, each component counts equally toward the composite (simple addition of percentages divided by total number of assessments).
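
The calculation above can be sketched as follows. The function name, component percentages, and weights are illustrative assumptions.

```python
def weighted_composite(percents, weights):
    """Multiply each percentage by its weight, sum, and divide by total weight."""
    return sum(p * w for p, w in zip(percents, weights)) / sum(weights)

# Hypothetical student: quizzes 80%, final exam 90%, project 70%.
percents = [80, 90, 70]

weighted = weighted_composite(percents, [1, 2, 1])    # exam counts double
unweighted = weighted_composite(percents, [1, 1, 1])  # every component counts equally

print(weighted)    # 82.5
print(unweighted)  # 80.0
```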

Total points method - same as that described above (make sure the maximum points assigned to each component are proportional to the weight you want it to carry).

Many computer programs (gradebook programs) are available to assist in the calculation of grades and for storing and tracking grades (e.g., simple software programs that come with standard office suites, such as Excel).


This Webpage designed and updated (11/25/01) by Ron Dugan, University at Albany, State University of New York.