Center on English Learning & Achievement

Bringing the Science Assessment Standards into the Classroom

Audrey Champagne, Susan Sherwood, and Ozlem Cezikturk

This essay was prepared for the Teacher Materials Project of Horizon Research, Inc.

Here’s a pop quiz. No fair looking at your neighbor’s answer…

1. What is the purpose of assessment in science education? Choose one:

A. To monitor student progress

B. To plan teaching activities

C. To formulate education policy

D. Depends upon whom you ask

E. All of the above (and more)

The correct (and sometimes maddening) answer is "E. All of the above." Because assessment serves many purposes in our efforts to improve science achievement, assessment is a central focus of the standards-based reform movement. This new emphasis on assessment has been frustrating and at times overwhelming to classroom teachers whose primary goal is to develop science literacy, not to test it. The purpose of this essay is to suggest how the Assessment Standards contained in the National Science Education Standards (NRC, 1996) can help science teachers meet this goal.

To begin, we should all have a common definition of the term assessment. Time for another pop quiz (you know the rules):

2. Which of the following is most consistent with the definition of assessment contained in the National Science Education Standards? Circle one:

A. Multiple-choice quizzes focused on factual information

B. Essay exams used to measure students’ conceptual understanding

C. Performance tasks used to measure students’ abilities to do inquiry

D. Tests used for grading

E. Data collection with a purpose

And the answer is "E." Next question: What kinds of data are collected and for what purposes? Government agencies at the federal and state levels spend millions of dollars collecting student achievement data, teacher quality data, and per-pupil expenditure data. These data are used primarily for the purpose of making policy. Some states use individual student achievement data to determine which students will receive high school diplomas. Local districts collect student achievement data and use them to identify the teachers and schools that are doing the best job of meeting standards. Teachers regularly collect student achievement data for the purpose of grading.

Teachers also collect data to plan and guide instruction and to provide feedback to students on their progress. Monitoring students’ responses (verbal and non-verbal) to teaching/learning activities and responding to them on a minute-to-minute basis is almost automatic. Teachers, therefore, are often unaware that they are collecting data and using them to decide how to proceed with a lesson. Intuitively, teachers understand that assessment is a powerful tool for improving student learning. The National Science Education Standards contain important information about how to harness the power of this tool.

Next quiz:

3. How many National Science Education Assessment Standards are there?

Seems like a thousand

Whose standards are we talking about?

The total number of National Science Education Standards (NSES) set by the National Research Council (NRC) is twenty-eight. (The NRC NSES are not to be confused with the AAAS Benchmarks for Science Literacy; AAAS, 1993.) Of the twenty-eight, five are Assessment Standards. The NRC Standards address all facets of science education, including teaching, professional development, content, assessment, programs, and agencies and organizations. With that framework in mind, let’s focus on how the five Assessment Standards can be used to improve student achievement.

Standard A: Assessments must be consistent with the decisions they are designed to inform (NRC, 1996, p. 78). This standard suggests that as teachers prepare for assessment, they need to answer three basic questions:

What general purpose will the data I collect serve? Is the purpose to plan my teaching, report to parents, provide feedback to students…?

What specific decision will I make with the data I collect? Will I use the data to decide how to improve the way I teach inquiry? Will I use the data to decide whether or not to fail a student? Will I use the data to decide if I should re-teach weight-weight problems tomorrow?

What data do I need to make the decisions? If my decision is related to the teaching of inquiry, my data should measure how students’ ability to inquire improved as a result of the teaching method I used. If my decision is about failing a student, my data should include, among other data, a broad range of information about the student’s achievement, the effort the student has put forth, and mitigating circumstances in the student’s personal life. If my decision is about whether or not to re-teach weight-weight problems tomorrow, my data might include the questions students asked at the end of class and their performance on the practice problems assigned for homework.

Standard B: "Achievement and opportunity to learn science must be assessed" (NRC, 1996, p. 79). There are two considerations here: opportunity to learn (OTL) and achievement. When addressing OTL, the NRC Standards tell us that "Student achievement can be interpreted only in light of the quality of the programs they have experienced" (p. 78). Furthermore, "Because student achievement is in part dependent upon opportunity to learn, opportunity to learn and achievement must be assessed equally" (p. 82). Fair’s fair. Without adequate resources, time, and teaching, students cannot be held accountable for achievement. When examining student achievement, ask yourself: Did I give my students adequate opportunity to learn? As a classroom teacher, you need to assess not only student achievement but also the OTL your students have had to develop the understanding and abilities you expect them to achieve.

The second consideration is that assessment should focus on highly valued content (inquiry; understanding facts, concepts, theories, and principles; scientific reasoning and decision making; and scientific communication), not just on content that is easily assessed. Basic knowledge is more easily assessed than understanding or the abilities of inquiry. Even so, all must be addressed. If, for instance, only basic knowledge is assessed, both teachers and students will likely dismiss the importance of understanding and of inquiry abilities. Assessing inquiry abilities can be a time-consuming ordeal (Champagne, Kouba, & Hurley, in press), but do we really want students to get the message that inquiry is of little significance? Moreover, if most of our assessment consists of multiple-choice exams on scientific factoids, we are telling students that if they can pass a test on that information, they "know" science (NRC, 1996, p. 82). The assessments you give communicate to your students the content you value most. Consequently, you need to ensure that the content you assess is the valued content defined by your state and the National Science Education Standards. Additionally, that content must be assessed using appropriate measures.

Standard C: Decisions made and actions taken depend on the quality of the data used. The strategy for collecting data should match what you are trying to measure. For instance, it doesn’t make sense to assess students’ ability to conduct scientific inquiry with a conventional paper-and-pencil test. However, a multiple-choice test could be very useful to a teacher measuring students’ ability to obtain information about the chemical elements from the periodic table.

Assessments should be realistic (some use the term "authentic"), that is, similar to the activities engaged in by scientists or scientifically literate adults. Even when conducting assessments, you should be preparing your students to use their knowledge and abilities in situations outside of school. For example, you might design a test of students’ ability to perform purity tests on water in the context of providing information to a town board faced with an environmental decision. Done properly, assessment is a learning tool for students as well as a measuring tool for teachers.

A prime directive of this standard is that serious decisions require precise and accurate data. An informal "seat-of-the-pants" classroom assessment is fine if you are monitoring student feedback to decide how to proceed with a lesson. However, if data collection serves a purpose with far-reaching consequences, such as student retention, you must be confident that the data you have collected are reliable and valid.

Reliable assessments produce consistent results. If inquiry skills are the content being measured, for example, the same results should be obtained whether the inquiry task is set in the context of the physical or the life sciences. An assessment is valid when it measures what it claims to measure. For instance, students’ ability to generate a testable hypothesis cannot be measured using short-answer items that only require students to identify which hypotheses are testable.

When making a high-stakes decision such as who passes or fails a course, you have the responsibility to collect high quality data. The assessment must therefore be rigorously constructed.

Standard D: Assessments must be fair (NRC, 1996, p. 85). To say it another way, assessments should be equally unfair to all students. Tasks need to be set in a variety of contexts that do not favor the experience of any one group; that is, they should be set in contexts accessible to males and females, city and rural dwellers, and any other groups you can identify whose experiences are unique. For instance, if physics teachers make the effort to include in their test banks problems about rotational motion set in the contexts of the kitchen, the farm, and the garage, all students will have equal chances to encounter items set in contexts with which they are familiar. Conversely, all students will have an equal chance to encounter items set in contexts with which they are not familiar. The purpose of this standard goes beyond fairness; it is to ensure assessment validity. Results should reflect students’ understanding of science content only, not gender, ethnicity, or other extraneous factors (NRC, 1996, p. 85).

Standard E: Inferences made about student achievement and opportunity to learn must be sound (NRC, 1996, p. 86). Personal beliefs and experiences often come into play when we reason from data to conclusions, even when we strive for objectivity. To encourage others to consider the soundness of our conclusions, we must identify assumptions and be explicit about each step in the reasoning process from data to conclusion.

For example, if your students do very poorly on a test you have given, you might draw several different conclusions: the students are poorly motivated, the test was too hard, the content was too difficult, or you did not give them ample opportunity to learn. Which, if any, is reasonable, and why? Any conclusion you draw without stating the underlying assumptions and the evidence that supports it is meaningless. Whether discussing individual or class achievement, teachers should be vigilant about providing support for their inferences and conclusions to parents, administrators, and the community. In this way, teachers present themselves as well-prepared professionals.

Last quiz, we promise. Since this question has several complex answers, we’ll use a constructed-response format.

4. What’s an appropriate next step for you to take after reading this essay? (Write your response.)

Possible answers include some or all of the following (the more the better). Using the Assessment Standards as guidelines, teachers can examine the details of their programs to see whether adequate time and materials are available for the level of student achievement that is expected. They can evaluate their assessment procedures to determine whether all levels of student learning (basic content, inquiry skills, understanding) are included. Assessments to which students are exposed (including those required by the district and state) should be examined for reliability and validity. All of these steps require time and collaboration to be most effective, so share this information with your colleagues and begin analyzing opportunity to learn and assessment in your classroom, school, district, and state. You need to look beyond the classroom as you think about assessment; even after you close the classroom door, district requirements and state mandates vis-à-vis assessment will influence what you do. Realizing the potential of assessment for improving your students’ opportunity to learn will take a while. (You might want to order a pizza; you’ll need sustenance. But hold the anchovies.)

References

American Association for the Advancement of Science (1993). Benchmarks for science literacy. New York: Oxford University Press.

Champagne, A. B., Kouba, V. L., & Hurley, M. (in press). Assessing inquiry. In J. Minstrell & E. van Zee (Eds.), Inquiring into inquiry. Washington, DC: American Association for the Advancement of Science.

National Research Council (1996). National science education standards. Washington, DC: National Academy Press.
