

Assessment of Literacy Development in Early Childhood

Peter H. Johnston and Rebecca Rogers

* Chapter 26 in Handbook of Early Literacy Research, edited by Susan B. Neuman and David K. Dickinson.

Most literacy assessment occurs in the school years because, at least in most Western countries, literacy learning is considered the responsibility of the school, though when school literacy instruction actually begins varies considerably across countries, roughly between ages five and eight (Clay, 1993; Elley, 1992), and the nature of initial instruction varies similarly. In the United States, literacy-related assessment has occurred in the early years of schooling since the 1930s because of beliefs about the relationship between learning and development (Durkin, 1966). Gesell's (1925) maturational view held that children's minds must reach a certain level of maturity before they would be able to learn to read. In that context, Morphett and Washburne published a study in 1931 claiming that children must reach a mental age of 6.5 (as judged by an intelligence test), before which literacy instruction would be not merely wasteful but downright damaging. In their suburban Chicago school system it was common practice to put a chart on the wall highlighting when each child would be chronologically old enough to have an appropriate "mental age" to learn to read. In spite of arguments and data to the contrary, starting with Gates (1937), this "readiness" testing continues in one form or another, with children being screened out of school (not the intention of early proponents). Similarly, the Gesell continues to be one of the most popular tests for school readiness (Gnezda & Bolig, 1988), and prescreening for kindergarten is mandated in 19 states, though in some, such as Louisiana, the tests cannot be used to withhold services from children, such as proceeding to first grade (Shepard, Taylor, & Kagan, 1996).

The persistence of these practices is due, in part, to beliefs about development, to which we shall return presently, and in part to the downward creep of accountability and meritocratic gatekeeping. It is also due, in part, to a general increase in testing that is consistent with cultural beliefs about the value of such practices (Johnston, 1993a). Consequently, as Shepard (1994, p. 212) points out, "In the past decade testing of 4-, 5-, and 6-year-olds has been excessive and inappropriate. … [encouraging] the teaching of decontextualized skills." On the basis of interviews with state department officials, Shepard and her colleagues (Shepard, Taylor, & Kagan, 1996) drew the following conclusions about early assessment practices in the United States. First, states are trying to reduce the use of readiness testing and retention from recent high levels. Second, state accountability testing has been virtually eliminated in grades K-3. Third, there is some exploration of alternative forms of assessment that are more representative of, and relevant to, instruction, although teachers appear to be ill-prepared to make instructional use of new forms of assessment. Fourth, there is less misuse of screening assessments for instructional purposes (for which they are ill-designed). Fifth, preschool children are tested primarily because of mandates attached to funding streams for special needs. Sixth, there is a strong demand for parent education about the meaning and implications of test scores. Seventh, there is considerable fragmentation among programs for young children and the assessments attached to these programs.

Although these assertions are made about early assessment practices in general rather than about literacy assessment specifically, they are applicable to early literacy assessment because, of all the assessments done in the early years of school, literacy is the largest focus. Nonetheless, in our view, the situation is not as optimistic as these state department officials portray it. In the United States accountability pressure continues to rise (Jennings, 1998), increasing the incentive for readiness testing and retention, and even though state departments might believe they are discouraging it, local testing goes on unabated, particularly in urban districts. We would also argue that not only are teachers ill-prepared to make instructional use of new forms of assessment, but the school psychologists to whom children are referred for mandated assessments are commonly even less prepared. Additionally, the exploration of alternative forms of assessment is given such low status that progress and recognition are slow. Indeed, assessments introduced in the eighties remain some of the most productive available for young children (e.g., Barrs, Ellis, Hester, & Thomas, 1989; Clay, 1985), although these still see only modest use (Falk & Darling-Hammond, 1993), with the possible exception of Marie Clay's work, which has exercised substantial influence on the assessment instruments developed by schools and some states (e.g., the New York State Early Literacy Profile). Nonetheless, where these practices are employed, they generally take a back seat when placement decisions are made or programs are evaluated.

Because most literacy assessment begins after children enter school, these assessments receive more attention in this paper than preschool literacy assessment. There is, nonetheless, reason for increasing concern about preschool literacy assessment, and not only in the United States. In England, five-year-olds are facing a standardized test, mostly involving literacy, which will be used for accountability and grouping purposes (Fletcher, 1998). In Japan, five-year-olds cram for months, even years, for the tests to get into prestigious private elementary schools (Tolbert, 1999). Within the United States there is concern for several reasons. First, screening children out of school with literacy-related readiness tests continues to occur. Second, financial incentives to classify and side-track children expected to experience difficulty becoming literate, available under PL 94-142, have been extended downward to younger children through PL 99-457. Third, recent research on literacy has been interpreted as supporting a view of literacy as simply decoding (Fletcher & Lyon, 1998; Grossen, 1996). This view has encouraged assessment of assumed prerequisites, particularly phonemic awareness, to move into the preschool years. Fourth, there is a justifiable interest in early intervention as a means of preventing long-term problems from developing (Clay, 1987; Vellutino, Scanlon, & Sipay, 1997). Fifth, there is an increasingly popular sense that literacy can (and should) be acquired very early – according to the presidential front-runner at the time of writing this chapter, by the age of three. Sixth, electronic media increasingly move testing into children's homes, in part, it seems, as a way to establish a market for electronic self-instruction systems.

An example (perhaps symptom) of these issues is provided by the arrival of a new test, "Reading Edge™," produced by some leading figures in educational psychology and reading. The test is intended to "identify children 'at risk' for reading problems in grades K-2," on the claim that "up to 40 percent of the more than 50 million school-aged children in the U.S. experience substantial difficulties learning to read or becoming proficient readers." The test provides information on normative performance and "risk for developing reading difficulties" in "essential skill areas" including "phonological awareness, phonological memory, letter identification, phonemic decoding, phonemic blending, and sequencing and patterning skills." These sub-tests are connected to a "family of training programs [that] enables children with language and reading problems to make gains, on average, of 1 to 2 years in language skills in only 4 to 8 weeks of training." Understanding the source of these developments requires understanding the lingering rhetoric of traditional assessment in early childhood education, which we will explore later. But first we describe our stance on assessment and literate development.

A Way of Thinking about Early Literacy Assessment

The term assessment is used in this paper to refer to the broad repertoire of behaviors involved in noticing, documenting, recording, and interpreting children's behaviors and performances. Testing is a subset of assessment in which performances are controlled and elicited under standardized conditions. Assessment practices are a kind of literate activity involving the representation of children's behavior, often in print, and a range of value-laden social interactions around that representation process. Assessment is part of a larger discourse about children, literacy, and learning. In other words, literacy, learning, and assessment are fundamentally discursive practices, involving ways of knowing, believing, valuing, relating, behaving, and representing (Egan-Robertson, 1998; Gee, 1996). Two principles follow from this. First, assessment is fundamentally interpretive, influenced by values, beliefs, and language; second, assessments have consequences for children's literate development. For example, Rueda and Mercer (1985) reported that the designation of either "learning disabled" or "language delayed" was a function of whether there was a psychologist or a speech therapist on a placement committee. The designation used will have ramifications for the literacy instruction the student receives, and thus for the literacy he or she acquires, because it will change the nature and focus of the interactions that take place around text. These differences in interactions occur not just across different teachers, but also with the same teacher across children designated as more or less able. For example, when a teacher asked her first graders what someone needs to be a writer, a less able student said "eraser, pencil." When asked "What kind of writer are you?" he said, "A printer." Other students said things like "poet" or "scary writer." These are self-assessments – statements of identity that have implications for future participation in literate engagements. As others in this volume have pointed out, becoming literate is not simply learning to read and write in the narrow sense of converting speech to print and back again. In becoming literate, children acquire beliefs, values, and relationships that are part of their developing identities (Gee, 1996), and assessment-related discourse is a central agent in this development.

These principles have implications for assessment validity. Any assessment practice will affect the discursive environment and thus alter the constructs teachers use to represent children's learning and to organize their teaching behavior (Johnston, 1992; Moss, 1998). For example, two teachers can notice a child's spelling of candy as knde, and one call it an "invented spelling" and the other a "spelling error," or even a sign of a disability. This, in turn, changes the discursive environment for students, whose understandings and identities are also changed. Indeed, Mehan and others have shown the role of language in the process of children's construction as "handicapped" or "disabled," which often occurs even before a child begins school (Clay, 1987; McDermott & Varenne, 1995; McDermott, 1993; Mehan, 1993). In other words, the consequences of assessment practices are an inseparable aspect of their validity (Messick, 1994; Moss, 1998). As a further example, the impact on a child's life of being retained, including the ways children incorporate this knowledge into their daily lives and identities, is part of the validity of the assessment that produced the retention. This impact can be substantial, for, as Shepard and Smith (1989) point out, retention ranks up near a death in the family in terms of psychological trauma. Pressures that follow from testing can also lead not only schools but parents to exclude children from kindergarten, the net effect being to escalate curricula in the early years of school.

As with any interpretive practice, understandings are influenced by the circumstances in which they are made, including the assessment environment. For example, when asked for a description of the literacy development of a child they know well, teachers who work in high-stakes testing environments not only say less than teachers in less oppressive circumstances, but they are less likely to report on children's interests or tastes in literature, or to mention specific books. Furthermore, the language they choose for describing students' literacy development is more distancing (Johnston, Guice, Weiss, & Afflerbach, 1993). Assessment practices are not merely objective, non-reactive applications of scientific tools. Rather, they are always social interactions, always interpretive, and always have consequences: a teacher may change her instruction, a child might be placed in a different classroom, or a child, a teacher, or a parent may change her understanding of herself and of what it means to be literate. Whatever tool might be involved in assessment – a test, an observation checklist, a record of oral reading – it must be interpreted by a person within an interpretive community. Two people from different interpretive communities choose different things to assess and different means to assess them, frame the data they collect differently, and use different words to describe what they observe. Even if a test yields a number, the test has been structured by people with particular views of literacy, and the number must be interpreted in order to have any meaning (Brozo & Brozo, 1994; Johnston, 1992; Tierney, 1998). Parents, teachers, and students will often make different meanings from children's literate behaviors, and it is common enough for two teachers to give very different descriptions of the same child's literacy development.

The purposes, forms, and interpretations involved in assessment practices are cultural artifacts. They reflect, and insist on, certain social values, beliefs, practices, and relationships. For example, not only do different cultures expect children to learn to read and write at different ages, but some include critical literacy from the start, whereas others defer it until high school or do not address it at all. Some assessment practices focus entirely on what children have learned about letters, sounds, and words, while others emphasize what they have learned about what literacy does and how it works (Johnston, 1997). Some assessment practices focus on normative performance, and only on what can be accomplished independently, whereas others address what children can accomplish collaboratively, with normative performance being of limited relevance (Barrs et al., 1989). These assessment practices reflect social beliefs and values; however, they also enforce them by insisting that certain behaviors be valued and that children and their literacy be represented in certain ways. They anchor institutional discourses.

Literacy Learning and Development

At the heart of different approaches to early literacy assessment are different views of literacy, learning, and development. For example, there are two fundamental assumptions underlying readiness testing. The first is the roughly Piagetian view that development precedes learning, such that learning cannot take place unless the appropriate prior cognitive development has occurred. The second assumption that enables readiness testing is that there is a fixed, non-accommodating instructional program – sometimes called "formal instruction." For example, it is currently popular to view "phonemic awareness" essentially as a readiness feature, since strong claims are made that such awareness is a necessary precursor to literate development – anything prior to such development is viewed as prereading (Grossen, 1996). However, major criticisms have been leveled at both the research and the interpretations that lead to such assessment practices (Allington & Woodside-Jiron, 1999; Dressman, 1999). Indeed, even strong advocates of the importance of phonemic awareness have argued to keep it in perspective. For example, in addressing the assessment of decoding processes, Juel (1991) points out that she assumes "that the person doing the assessment already has determined that the child has basic knowledge about the function and basic conventions of print (if the child does not have such basic knowledge, then worrying about decoding seems premature)" (p. 144). Besides, such efforts rarely encourage examining children's writing as a way of learning about their understanding of literate concepts. Dressman (1999) has criticized the misuse of research in the area of phonological awareness for early assessment and instruction, pointing out that the norms of particular tests do not accurately reflect the range of phonological systems of the children taking the tests. He also challenges the connection often made between the poor performance of non-mainstream children on phonemic and phonological awareness tests and their reading achievement, as does Elley (1997).

The contrasting view of literacy assessment and development, in which readiness tests do not make sense, would be the Vygotskian position that learning leads development. In this view, children are socialized into a set of social practices, beliefs, and values through guided, socially meaningful participation. In Rogoff's (1995) terms, we apprentice young children into literacy and they appropriate, or make their own, these social practices through a series of transformations in use. So although we learn such things as the conventions of text – the organization of print, books, computer screens, and so forth – we also learn to value and to act around print in particular ways. We learn what constitutes authority, whether we should ask questions of books and authors, under which circumstances, what kind of questions, who is in charge of producing and consuming texts, how able we are, what it means to be able, the significance of ability, and who we are as literate beings. Literacy and assessment are not just "done"; they are part of a process in which children are "becoming" (Rogoff, 1995) or "being" (Gee, 1999; Lankshear, 1997) literate.

These contrasting views influence the timing and function of early literacy assessment through their implications for intervention. Marie Clay (1991; 1993), for example, has argued strongly that if long-term failure is to be prevented, it must be addressed before failure experience accumulates and children develop too many confusions about the organization of print systems. Holdaway (1979) agrees, observing, "What we need is a preventive system which locates children experiencing difficulty very early before accumulating failure disorders their natural learning" (p. 167). But the question of how early to intervene continues to be problematic and centrally involves assessment. If children arrive at school with little literate experience, Clay argues, it would be premature to conclude that they are experiencing difficulty learning. Most children develop a range of literate knowledge, if not before they come to school, then in the first year at school, when immersed in a literate learning community. She proposes letting a child participate in a productive literate environment for a year before assessing with an eye toward intervention. This approach assumes, among other things, that literacy is acquired as part of participation in a literate culture. It also assumes that instruction is a productive part of assessment. In other words, part of assessment involves observing how a child responds to a particular instructional environment. More direct and detailed examination of the child-instructional-context relationship is a rare assessment practice, exceptions being found in the work of Lyons (Lyons, 1991; Lyons, Pinnell, & DeFord, 1993) as part of Reading Recovery, and in arguments made by Vellutino and his colleagues (Vellutino, Scanlon, & Sipay, 1997) following Clay (1987).

Purposes of Early Literacy Assessment

Early intervention for preventing (or responding to) difficulty is one common purpose of literacy assessment. But there are two ways to approach examining the purposes of early literacy assessment – current practice and recommended practice. Starting with the latter, consider the position of the National Association for the Education of Young Children (NAEYC) (1991) on the appropriate purposes of early assessment. The first (and foremost) legitimate purpose of early childhood assessment is "to plan instruction for individuals and groups and for communicating with parents" (p. 32) (we would argue that this is really two functions). The second function is "to identify children who may be in need of specialized services or intervention." This function deserves some caution, since it is commonly transformed into "to classify children in order to access revenue streams," or "to identify children whose removal from the accountability testing pool would make the institution appear more successful," resulting in identification of as many children as possible. Indeed, retention, special education, and transition classrooms can claim up to two thirds of the students by the middle of elementary school (Allington & McGill-Franzen, 1992). The function might better be phrased "to identify children's strengths and appropriate, and sometimes specialized, instructional supports." Placing the emphasis on the identification of instructional supports rather than the identification of children makes this purpose simply another example of instructional planning. A step further might be "to identify what the child knows and can accomplish independently and with particular kinds of support."

The third function legitimized by NAEYC is "to evaluate how well the program is meeting its goals." This function requires that goals are in fact agreed upon, and that assessment practices can address those goals. Current practice relies on standardized, norm-referenced tests. But the use of these instruments with young children has been heavily criticized, as we shall see, and program evaluation does not require every student to take a standardized test, or to do the same thing under the same circumstances (Clay, 1993; Johnston, 1992). Various kinds of performance sampling would be more appropriate for examining program effectiveness. But perhaps this function, too, should be reworded, as "to examine avenues for improving instruction." Merely determining whether or not a program is meeting its goals would not provide information on how to improve. Changing the language also returns this function to an instructional one. A restatement of these NAEYC assessment functions, then, might be:

  1. to optimize student learning

  2. to engage parents (and students and other community members) in a productive conversation about learning. This function is, of course, an extension of function one, which requires students to be engaged in such a conversation about their own learning.

  3. to develop instructional programs and institutional supports for instruction.

That optimizing student learning is the primary goal of assessment is a principle asserted by position papers from NAEYC (1991), the International Reading Association and National Council of Teachers of English (1994), and the National Forum on Assessment (Phye, 1997). Although the purposes of assessment described above make sense to a wide range of constituents, they are not, alas, what happens in practice. Current early literacy assessment functions include: screening children for school readiness, identifying handicapping conditions, retaining or promoting children, grouping children by ability, holding teachers accountable for children's learning, holding schools accountable for funding expenditures (e.g., Goals 2000 or Title I), and providing specific interventions for specific children. Most of these assessment practices have been roundly criticized in the literature, both for the technical inadequacies of the available tests and for the serious consequences of the practices themselves (Bredekamp, 1986; Shepard & Smith, 1986; Stallman & Pearson, 1991). For example, on the matter of technical inadequacy, Shepard and Smith (1986, p. 80) point out that "none of the available tests is accurate enough to screen children into special programs without a 50 percent error rate."

Each professional group that offers standards on assessment argues that the primary purpose of early literacy assessment is to optimize student learning. To accomplish this goal, there are contextual as well as cognitive issues to be addressed, such as (Gregory, 1997; Johnston, 1997): the funds of knowledge children bring to school, and the advantage to which they might be turned for literacy development; the languages the child brings and his or her ability to transfer from one to the other; the permeability of classroom interactions around print to the language(s) and patterns of interaction in the child's home; the child's understanding of literate activity; the ways and circumstances under which the child inquires into language; what he or she can know and do under which circumstances; what the child partially understands and can almost do, or can do with support; the logic of the child's errors; and how the child recruits social resources to his or her own learning. This information is substantially dependent on detailed individual observation and interaction, particularly in the classroom, making the teacher the central agent of assessment. This responsibility of the teacher is also a principle that is strongly asserted by a number of authors (Brozo & Brozo, 1994; Hodges, 1997; Johnston, 1997) and in position statements by NAEYC (1991), IRA/NCTE (1994), and the National Forum on Assessment (Phye, 1997). The principle implies that improving assessment entails investing in the development of teachers' knowledge of children and their literate development, more than investing in testing devices. However, this principle is in conflict with current practice and assessment privilege as encoded, for example, in the "formal" versus "informal" assessment contrast.

“Formal” and “Informal” Assessment

Among literacy researchers, there is considerable support for the necessity of "informal" assessment, and very little support for "formal" measures of early literacy (Clay, 1993; Johnston, 1997; Stallman & Pearson, 1991; Teale, 1991). The formal assessments are afflicted with numerous shortcomings, some of which accrue from the consequences of their use. There is little argument to be made for predictive tests, such as readiness and intelligence tests, which use group performance to predict individual performance in the future – a risky business at best and self-fulfilling at worst. Such tests also do not contribute instructionally useful information. Indeed, there is very little that norm-referenced tests, especially group tests, have to offer for the assessment of young children. As Clay (1993, pp. 1-2) points out, they are "indirect ways of informing teachers about learning. By comparison with the observation of learners at work, test scores are mere approximations or estimates, at times misrepresenting individual progress in learning, and at times presenting results stripped of the very information that is required for designing or evaluating sound instruction." By way of contrast, the Concepts About Print test (Clay, 1993), a standardized assessment of how a child understands beginning literacy conventions, correlates as well with later success in reading as does an individual intelligence test at five years of age, and substantially better at six years of age (Clay, 1998). Unlike readiness or intelligence tests, however, it provides clear information about what a child knows and needs to know about print conventions – information that a teacher can act upon. In addition, it is conducted in a manner that is, while standardized, more like the normal literacy activities in which students engage.

The following additional critiques have been leveled at formal assessment practices – that is, standardized, group-administered, norm-referenced tests (Hiebert & Calfee, 1992; Johnston, 1992; Juel, 1991; Paris, Turner, & Lawton, 1991; Pearson & Stallman, 1991; Teale, 1988). First, group testing of young children with little experience of the behavior required in tests leads to much slippage in performance and interpretation. Young children are easily distracted, particularly in long, unengaging, abstract activities, and group instructions are easily misunderstood, with misunderstandings that are hard to detect (Bredekamp, 1986). Second, these tests are not very good at telling us what to do with particular children. For example, item sampling means that the teacher learns only whether or not the child knows some letters of the alphabet, not which ones. Third, these tests are least sensitive with children who are having difficulty, because test makers select fewer items from the extremes of the distribution (Juel, 1991). Fourth, the tests oversample particular aspects. For example, the largest portion of items (48 percent according to Stallman and Pearson, 1991) is devoted to symbol-sound knowledge, and the tests have trouble dealing with writing. Fifth, they provide indirect indicators. For example, only five percent of items require production rather than recognition (Pearson & Stallman, 1991). Sixth, they do not capitalize on the significance of the systematic nature of children's errors. Finally, they are not capable of dealing with linguistic and cultural diversity.

The science on which formal testing is based was borrowed from the physical sciences and is not well suited for documenting and analyzing complex sociolinguistic development (Johnston, 1992; Keller, 1985; Knoblauch & Brannon, 1988; Taylor, 1990). Indeed, the underlying technological premises of "decomposition, decontextualization, and objectivity" (Pearson & Stallman, 1991, p. 41) are not merely inappropriate, they are destructive, leading to a simplistic linear view of literacy and to deficit-driven instruction that promotes student passivity. Glazer and Searfoss (1988) refer to this worldview as the "testing mentality," or, in Coles' (1987) terms, the "Learning Mystique." As problematic as these assumptions are, they are deeply embedded in the psyche of the dominant culture of the United States (Johnston, 1993b). Report cards based on different assumptions often produce confusion and rejection.

The alternative to "formal" assessment practices is commonly referred to as "informal" assessment. Breaking down the privilege enjoyed by the unmarked "formal" will require replacing the distinction with terms like documentary, descriptive, or informing, contrasted with traditional. Unlike the traditional "objective" measures, documentary assessment explicitly depends on the human expert (Johnston, 1987) – a sensitive observer (Clay, 1993), or kidwatcher (Goodman, 1978). Although young children are less able to articulate their knowledge of literacy, the more authentic the circumstance, the more literacy is a common part of community talk, and the more sensitive the observer/interviewer, the more children are able to help us understand their development (Tammivaara & Enright, 1986). For example, Scrivens (1998) provides case studies of three- and four-year-old children who reveal, through practice, interview, and imaginative play with peers, a great deal about their literate development. Indeed, children's play is a particularly good place in which to observe their literate development, especially the literate practices into which they have been socialized (Roskos & Neuman, 1993; Teale, 1991; Whitmore & Goodman, 1995), and their understanding of the functions of literacy.

A further contrast between traditional and documentary assessment lies in the relationship between assessor and student. In traditional assessment, we pretend that there is really no interaction, or at least that it is the same for everybody. In documentary assessment the interaction is an important part of the assessment, because the point of inquiry is not simply what the child knows and can do independently, but what the child knows and can do, partially knows and can almost do, and can accomplish with some particular social support (Feuerstein, 1979; Rogoff, 1995; Vygotsky, 1962). However, actual analysis of the interactions involved with children as part of assessing a child's literacy is rare (but see Roskos & Neuman, 1993, and Lyons, 1991). Intervention, or changing instruction, as an assessment technique is also not common, although Clay (1987) has argued that before a child is classified as reading disabled, one should at least systematically examine the possibility that careful instruction, focused by ongoing sensitive observation and record-keeping, will bring the child into the normal range of performance (see also Vellutino, Scanlon, & Sipay, 1997). Such assessment strategies foreground even more the teacher's role as the primary agent of assessment and, perhaps, interactions between the teacher and student as a primary "object" of assessment.

Teacher as Primary Agent of Assessment

When it comes to young children's literacy learning, there is substantial consensus that the teacher is the primary assessment agent (Clay, 1998; Johnston, 1992; IRA/NCTE, 1994), for at least six reasons. First, the connection between assessment and instruction is most direct. Second, individual teacher observation is particularly important with young children and in areas of complex learning (Clay, 1993). Third, young children are often less articulate about their learning than are older children and thus require sensitive observation and interaction in order to access their literate understandings. Fourth, data are most meaningfully collected in the course of teachers' daily interactions with students, and most meaningfully interpreted within an ongoing history. Fifth, since the zone of proximal development is fundamentally social – the area in which something can be accomplished only with assistance – it requires a trusting relationship for students to extend themselves into areas of potential failure. It also means that the teacher needs to have a working knowledge of appropriate kinds of support, and a sense of how to analyze the interactions.

The sixth, and most significant, reason that teachers are the primary agent of assessment is that their assessments have an impact on their children's learning, on their instructional relationship, and on children's assessments of themselves as literate individuals (Freppon, 1994; Johnston, 1997; Johnston, Woodside-Jiron, & Day, 1998; Ruiz & Enguidanos, 1997). For example, it is common in many places for children in kindergarten and first grade to be classified as learning disabled with respect to reading and writing. This process, which has a substantial effect on a child's literacy career (McGill-Franzen & Allington, 1993), begins with the teacher, whose assessment triggers a referral. Some teachers refer many more students to special education than do others. High-referral teachers, when asked for a description of a child's literacy development, give very brief and undetailed descriptions compared with those of low-referral teachers (Broikou, 1992). Although both high- and low-referral teachers notice who is having trouble, those who have a more detailed knowledge of their students appear to have a greater sense of agency with respect to the students' continued development.

The most important advances in early literacy assessment, then, involve educating teachers in observing, documenting, analyzing, and responding to children's literate behaviors (Athey, 1990; Johnston, 1993). As teachers gain control over language that allows for more, rather than less, complete and contextualized representations of children's literate engagements, they are also better able to help parents notice and make sense of children's literate development. Furthermore, as teachers become more able to help children talk about the literate activity in which they are engaged, they will also find it easier to document children's development and understand their confusions. This involves the important assessment skill of listening, a skill that is often in short supply and reduced further by testing pressure. Many teachers have difficulty listening to young students for lack of time and because they feel obliged to ask known-answer questions to check comprehension. This is exacerbated with students who struggle with language or whose language is not the language of the classroom. Comprehension questions and retellings are primary assessment interactions (Brown & Cambourne, 1989; Morrow, 1990; Sulzby, 1985; Moss, 1997). These interaction patterns, and their implications for relationships of authority between readers and text, become part of the literacies children acquire (Johnston et al., 1998). As an alternative, the questions that children ask about books, print, and writing can be even more revealing. Arranging for children to ask such questions, and listening to them, can be productive both for the information gained and for the literate practices acquired (Commeyras, 1995).

Qualities of Assessment Practices in Early Literacy

Traditionally, two primary dimensions of quality have been applied to assessments: validity and reliability. However, these criteria have undergone considerable transformation. The consequences of the use and interpretation of assessments have come to be integral to judgments of validity (Messick, 1995; Moss, 1995).

Thus the dictum "first, do no harm" has become central to the construct, meaning that unless the assessment practice improves the quality of the child's literacy learning, it should not occur. Applying this criterion to traditional assessment practices renders them seriously problematic (Stallman & Pearson, 1991; Watson & Henson, 1991; Johnston, 1992; Taylor, 1990). Applying this framework to documentary forms of assessment, however, is not as simple, because whether or not the assessment is productive depends a great deal on what the teacher knows (and the institution allows) about literacy development and the organization of literate environments, particularly ones in which children can experiment with the social uses of print, talking as they go. It also depends on the constructs embedded in the discourse used to represent the children's literate development. Teachers can bring a discourse of disability or a discourse of assets to their documentation, consequently documenting different behaviors using different constructs, with different consequences for children's learning. Given this, the validity of the assessment cannot be considered without also examining the goals of the assessment practice. That is, of central importance is an alignment between pedagogical and assessment intentions and practices.

Reliability, also a mainstay of traditional assessment, has been similarly reworked in terms of generalizability – across time, observers, and so forth (Shavelson & Webb, 1991).

In documentary assessment it is appropriate to ask: What does this performance or interpretation represent? Or, over which dimensions, or contexts, does this performance or interpretation generalize, for which purpose? Reliability in this sense is usually associated with generalizing a judgment over different judges, over time, or over different circumstances. Certainly one way of increasing agreement, or stability, is to increase the number of assessments over time. This is an important strength of documentary assessment, in which assessments are part of the instructional context and occur more frequently than more traditional measures (Gipps, 1994).
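The gain from repeated assessment can be given a rough quantitative sense. As an illustration – ours, drawn from classical test theory rather than from the sources cited here – the Spearman-Brown relation says that if a single observation has reliability r, then a composite of k comparable, independent observations of the same performance has reliability

\[ r_k = \frac{k\,r}{1 + (k - 1)\,r}, \]

so four observations, each with a modest reliability of .50, yield a composite reliability of 4(.50)/(1 + 3(.50)) = .80. The assumption of comparable, independent observations is, of course, exactly what a contextualized view of literacy complicates, as the next paragraph argues; even so, the relation suggests why the frequent sampling built into documentary assessment can support more stable judgments than a single test administration.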

With reliability, the idea is to eliminate variability in interpretation, to get at the "true" representation, as if there were a single stable trait being measured. But since literacy is now seen as more contextualized than it is portrayed in traditional tests, some variability in performance is expected across contexts, and such variability is viewed not simply as an indicator of assessment error, but as a legitimate feature of performance. Furthermore, if assessment is viewed as a vehicle for learning, some reasonable level of disagreement is less problematic, since it provides space for dialogue – and hence learning – about literacy and a student's literate development. Further, when collecting evidence of children's literacy learning in various contexts, there are more likely to be surprises in the data collected. These surprises, rather than being "error," are places for further investigation into the instructional context. Whether or not an assessment will measure the same thing twice is less important than whether the assessment can productively focus our instruction.

Documentary assessment provides some advantages in the context of these newer views of assessment quality. For example, ongoing classroom assessments are more likely to directly affect the quality of instruction in positive ways. Similarly, because the "items" (instances of performance) in documentary assessment are more direct examples of actual reading and writing, students' work will be represented in a manner consistent with their daily performance. Because teachers and students are in control of day-to-day assessments, these practices allow them more agency in co-constructing supportive learning contexts. Over time and across contexts, these samples are more likely to reveal patterns of student learning. In the long run, the validity and generalizability of the particular assessment instrument seem to be less important than the trustworthiness of the interpretations and patterns that emerge across contexts, and the instructional responses to these interpretations.

Although teachers are likely to use documentary language in talking with other teachers or with parents, there continue to be tensions in how they represent children's learning outside of the immediate context of the child. That is, when accountability data, report card grades, or information for student placement decisions need to be provided, teachers rely primarily on information garnered from standardized measures of achievement (Hodges, 1997). A necessary effort, then, in establishing the authority and visibility of documentary forms of assessment in early childhood literacy is to open discussions around these points of tension. Arguments against documentary assessment protest the potentially unsystematic and biased nature of the practices. But there is no way to avoid bias in assessment, even with standardized tests (Johnston, 1992).

Tensions also continue to exist between what Valencia, Lipson, and Wixson (1994) have referred to as "internal" and "external" assessment purposes. External assessment practices are those connected to accountability purposes, while internal assessments inform ongoing pedagogical decisions. These tensions revolve around what counts as evidence; how the information is reported, used, and valued; and which people or groups of people get to make these decisions. The cultural disposition to regard objective, decontextualized, and deficit-driven information with more credibility and authority results in continued emphasis on normative tests, particularly for high-stakes purposes. Nonetheless, a number of attempts have been made to increase the level of credibility of teachers' ongoing "informal" assessments (Barrs et al., 1989; Wolf, 1993). Furthermore, teachers' careful observation and description of children's literacy development, even that done "on the fly," is not random or unsystematic. There are also many examples of systematic assessment instruments presented elsewhere (e.g., Clay, 1993; Juel, 1991; Barrs et al., 1989; Genishi & Dyson, 1985; Johnston, 1997). However, an example of such an assessment device might clarify important issues.

In Practice – An example

Viewing assessment as part of a discourse around children's literacy development, the question becomes: what are the implications of particular kinds of assessment practices for children's literate development? Or, more positively, how can we arrange for a most productive discourse involving children's literate learning and instruction? An example might help. Consider the Primary Language Record (PLR), developed by Barrs and her colleagues (Barrs et al., 1989) for early literacy assessment in multilingual inner-city London. The assessment is standardized in that it has requirements for what data need to be collected in which ways, how the data should be analyzed, who should be involved, and how consistency and trustworthiness should be maintained. It is not, however, primarily norm-referenced, and neither is it a test. The principles on which the procedure is based include (p. 8): the involvement of parents, children (including those with special needs), and all of the teachers who teach the child; the importance of recording children's progress across the curriculum, in all major language modes, and in the other community languages they know as well as English; and the importance of a clear framework for evaluating progress in language. The procedure is intended to be used at different points during the year, so that it informs instruction and is not simply summative paperwork. It is also cumulative over the course of the child's development. The record form begins with a space for a record of the discussion with parents around work samples and observations, and children's literate practices at home. The record is signed by both teacher and parent. Instructions are provided for maximizing parent participation and engagement, and focus on the child's development, including ways to minimize power issues. Next is a record of a discussion or interview with the child. Teachers are provided with useful instructions on interviewing children and helping them reflect productively on their own work.

The PLR focuses on what the child can do, in order to "provide a positive basis for further work by parents and other teachers" (p. 11). The record also emphasizes the specificity of data and source – partly a matter of building trustworthiness, but also a matter of enabling challenges to assumptions. A record is also made of any instructional procedure that has been associated with progress for the child, and of any potentially useful practices, given the other information available. Similarly, the manual points out that "progress or lack of progress should always be seen in relation to the adequacy of the context" (p. 18), thus moving the focus of assessment to the learning system, not simply the child. The manual also devotes considerable space to helping teachers understand the framework of reading/writing development on which it is based, and what is necessary to set the stage for productive performance. This overtly recognizes the values and assumptions underlying the assessment.

The child's development as a reader is placed on two scales – dependence to independence and inexperience to experience – with each point on these five-point scales clearly defined, and with potentially problematic terms further defined in the manual. These latter terms include dimensions such as "pleasure and involvement in story and reading, alone and with others," and "the range, quantity and variety of reading in all areas of the curriculum" (p. 28). Similarly, writing development is divided into compositional and transcriptional aspects. But beyond the child's command of written language conventions, also of interest are the child's ability to sustain engagement with literate processes and to engage in collaborative literate practices, the use the child makes of experience, the strategies used, and the range of genres entertained.

The complexity of the record honors the complexity of literate learning – certainly a matter of construct validity – but it also places teacher knowledge at the center of assessment. This would be a worrisome psychometric property, particularly with respect to the traditional issue of reliability, which tends to drop when authentic tasks are involved or when a range of people need to agree on complex observations. However, to reduce the process to merely increasing reliability would be to trivialize it. First, inter-rater reliability is a matter of getting people onto the same page, or rather into the same discourse. The PLR process works toward this through moderating sessions in which members of the immediate assessment community (including parents) work through cases together, examining differences in judgment and their implications. This is not merely training judges, as is done for state tests; these negotiations involve more complex and wide-ranging data. Furthermore, because there are differences among participants and among the cases examined, the process extends the understanding of the individuals participating in it. In fact, the differences that produce "unreliability" are the very places where discussions produce learning for all parties and expansion of the discourse. This moderation process is being used and developed further in other places as a way of "looking together at student work" (Blythe, Allen, & Powell, 1999). Finally, there is a place in the procedure, and on the forms, for comments from parent and teacher to assist the child's next teacher.

Aligned with the purposes of assessment, this procedure has several important features. First, it emphasizes the child's assets, which redirects conversations about student learning and instructional decisions. Second, it builds in conversations with parents, students, and teachers about the student's work, making the construction and representation of learning a social project in which issues of power are minimized and actively addressed. Third, it uses ongoing classroom processes and performances to document children's learning. In other words, it is particularly well adapted to accomplishing the major assessment functions of optimizing learning, engaging parents, students, teachers, and others in productive conversations about learning, and shaping instructional programs and institutional supports.

Concluding Comments

Despite shifts in understandings of early childhood literacy and assessment, we continue to see traditional, norm-referenced testing and its associated language and values dominate early literacy assessment. There continue to be incentives to side-track children earlier, and access to institutional resources is often governed by tests which are firmly anchored in cultural assumptions. Although documentary forms of assessment are used in localized contexts, they have not gained authority. This is partly because they are not amenable to the simple numerical comparisons implicit in an accountability framework, partly because the teacher's personal involvement in the assessments makes them appear "unobjective," and partly because of cultural assumptions about teaching, learning, and literacy.

Recognizing literacy and assessment as discursive practices means reorienting the conversation to examine the roles language plays in the construction and maintenance of traditional testing of literacy learning. Early literacy assessments, as cultural artifacts of early childhood institutions and elementary schools, represent the continued valuing of measuring individuals' "worth" or "potential." Even in situations where teachers or specialists do not believe in the measurement tool, they continue to use it to talk with parents and other teachers about particular students (Brozo & Brozo, 1994). These paradoxes are also sustained by beliefs about the psychological nature of literacy development. Consistent with the competitive individualism and technological thinking endemic to the culture of the United States (Argyris & Schon, 1974; Bellah, Madsen, Sullivan, Swidler, & Tipton, 1985), we keep being drawn back to a technical, cognitive view of literacy in which skills can be taught in sequence and in isolation. This view is reflected in, and reinforced by, current reading tests. Rather than ask about the circumstances in which children are noticing and theorizing about language at all levels, we become caught up in asking whether they have all the short vowels and their sounds memorized.

Literacy assessment in early childhood is essentially formative in nature. It must function to improve instruction and to communicate progress. There is no reason for high-stakes literacy testing in early childhood. There is no meritocratic function to elevate assessment to high stakes, and since accountability testing has unfortunate side effects (Johnston, 1998; Smith, 1991; Smith & Rottenberg, 1991), it, too, should be avoided. Using literacy assessments to classify children as "unready," "disabled," or "language delayed" is another example of counterproductive high-stakes practice. Early interventions should be based on detailed documentation across contexts rather than on decontextualized normative testing.

The teacher, as the primary assessment agent, is of central importance in improving early literacy assessment practices. Improving assessment, particularly the validity of assessment, requires improving the authenticity and quality of the data sources, the productiveness of interpretations, and the consequences of assessment actions – ultimately, whether the assessment practice improves the quality of student learning.  To accomplish this requires that teachers understand more about literacy development and its context – what to notice about children’s literate activity and the social interactions in which it is embedded and of which the teacher is a part.

There is extraordinary variation in the languages and cultures that children bring to literacy learning and in the literacies into which they are apprenticed prior to coming to school. We know, for example, that children often enter school with different expectations for ways of interacting with teachers or others in authority (Au & Jordan, 1981; Heath, 1983; Volk, 1997). Differences between the discursive environments of home and school cultures are often encountered, if not recognized, in assessment practices (Moore, 1996). Whether on the Internet or in school, and whether traditional or documentary practices are used, children's lives are transformed, for better or worse, in the process. Standardized, norm-referenced testing does not accommodate these differences well, in part because allowing for diversity requires some adjustments in interaction, which violates the standardization and the norming. Furthermore, such tests easily lead to a discourse of deficits (Taylor, 1990; Poplin, 1984, 1988; Coles, 1987). Preferable for assessment, both before and during the early years of school, is a community of teachers who are sensitive observers and listeners and who are able to document a child's development without resorting to a discourse of disability and deficit. Because the teacher is the primary assessment agent in early childhood (as in the rest of schooling), we must look to theories of early literacy assessment that foster teacher development, and to contexts that make it possible for teachers and students to feel, and assert, the authority of documentary assessments.

References

Allington, R. L., & Woodside-Jiron, H. (1999). The politics of literacy teaching: How "research" shaped educational policy. Educational Researcher, 28(8), 4-13.

Argyris, C., & Schon, D. (1974). Theory in practice: Increasing professional effectiveness. San Francisco: Jossey-Bass.

Athey, I. (1990). The construct of emergent literacy: Putting it all together. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 176-183). Englewood Cliffs, NJ: Prentice Hall.

Au, K., & Jordan, C. (1981). Teaching reading to Hawaiian children: Finding a culturally appropriate solution. In H. Trueba, G. Guthrie, & K. Au (Eds.), Culture and the bilingual classroom: Studies in classroom ethnography. Rowley, MA: Newbury House.

Barrs, M., Ellis, S., Hester, H., & Thomas, A. (1989). The primary language record: Handbook for teachers. London: Inner London Education Authority/Centre for Language in Primary Education.

Bellah, R., Madsen, R., Sullivan, W., Swidler, A., & Tipton, S. (1985). Habits of the heart: Individualism and commitment in American life. New York: Harper and Row.

Blythe, T., Allen, D., & Powell, B. S. (1999). Looking together at student work: A companion guide to Assessing Student Learning. New York: Teachers College Press.

Bredekamp, S. (1997). NAEYC issues revised position statement on developmentally appropriate practices in early childhood programs. Young Children, 52(2), 34-40.

Broikou, K. (1992). Understanding primary grade classroom teachers’ special education referral practices. Unpublished doctoral dissertation, State University of New York at Albany.

Brown, H., & Cambourne, B. (1989). Read and retell: A strategy for the whole-language/natural learning classroom. Portsmouth, NH: Heinemann.

Brozo, W. G. (1990). Learning how at-risk readers learn best: A case for interactive assessment. Journal of Reading, 33, 522-527.

Brozo, W. G., & Brozo, C. C. (1994). Literacy assessment in standardized and zero-failure contexts. Reading and Writing Quarterly: Overcoming Learning Difficulties, 10, 189-200.

Clay, M. (1998). By different paths to common outcomes. York, ME: Stenhouse.

Clay, M. (1993). An observation survey of early literacy achievement. Auckland, NZ: Heinemann.

Clay, M. (1991). Becoming literate: The construction of inner control. Auckland, NZ: Heinemann.

Clay, M. (1987). Learning to be learning disabled. New Zealand Journal of Educational Studies, 22 (1), 35-38.

Clay, M. M. (1985). The early detection of reading difficulties: A diagnostic survey with recovery procedures (3rd ed.). Auckland: Heinemann.

Coles, G. (1987). The learning mystique. New York: Ballantine.

Commeyras, M. (1995). What can we learn from students' questions? Theory into Practice, 34 (2), 101-105.

Dressman, M. (1999). On the use and misuse of research evidence: Decoding two states’ reading initiatives. Reading Research Quarterly, 34 (3), 258-285.

Durkin, D. (1966). Children who read early. New York: Teachers College Press.

Egan-Robertson, A. (1998). Learning about culture, language, and power: Understanding relationships among personhood, literacy practices, and intertextuality. Journal of Literacy Research, 30(4), 449-487.

Elley, W. B. (1992). How in the world do students read?: The I.E.A. study of reading literacy. Hamburg: The International Association for the Evaluation of Educational Achievement.

Falk, B. (1998). Testing the way children learn: Principles for valid literacy assessments. Language Arts, 76 (1), 57-66.

Falk, B., & Darling-Hammond, L. (1993). The Primary Language Record at P.S. 261: How assessment transforms teaching and learning. New York: National Center for Restructuring Education, Schools and Teaching (NCREST).

Feuerstein, R. (1979). The dynamic assessment of retarded performers: The learning potential assessment device, theory, instrument and techniques. Washington, DC: Georgetown University Press.

Fletcher, J., & Lyon, G. R. (1998). Reading: A research-based approach. In W. Evers (Ed.), What's gone wrong in America's classrooms? Stanford, CA: Hoover Institution Press.

García, G. E., & Pearson, P. D. (1991). The role of assessment in a diverse society. In E. H. Hiebert (Ed.), Literacy for a diverse society: Perspectives, practices and policies (pp. 253-278). New York: Teachers College Press.

Gates, A. (1937). The necessary mental age for beginning reading. Elementary School Journal, 37, 497-508.

Gee, J.P. (1996). Social linguistics and literacies: Ideology in discourses (2nd ed.). London: Taylor & Francis.

Gee, J.P. (1999). Reading and the new literacy studies: Reframing the National Academy of Sciences Report on Reading. Journal of Literacy Research, 31 (3), 355-374.

Genishi, C. & Dyson, A. (1984). Language assessment in the early years. Norwood, NJ: Ablex.

Gessell, A. (1925). The mental growth of the preschool child. New York: Macmillan.

Gipps, C. (1994). Beyond testing: Towards a theory of educational assessment. London: Falmer Press.

Glazer, S. M., & Searfoss, L. W. (1988). Reexamining reading diagnosis. In S. Glazer, L. Searfoss, & L. Gentile (Eds.), Reexamining reading diagnosis: New trends and procedures (pp. 1-11). Newark, DE: International Reading Association.

Gnezda, M. T., & Bolig, R. (1988). A national survey of public school testing of prekindergarten and kindergarten children. Paper presented at the National Forum on the Future of Children and Families and the National Association of State Boards of Education.

Goodman, Y. (1978). Kidwatching: Observing children in the classroom. In A. Jagger & M. T. Smith-Burke (Eds.), Observing the language learner (pp. 9-18). Newark, DE: International Reading Association.

Gregory, E. (1997). Introduction. In E. Gregory (Ed.), One child, many worlds: Early learning in multicultural communities (pp. 1-8). New York: Teachers College Press.

Grossen, B. (1996). Making research serve the profession. American Educator, 20, 7-8, 22-27.

Heath, S.B. (1983). Ways with words: Language, life and work in communities and classrooms. Cambridge: Cambridge University Press.

Hiebert, E.H. & Calfee, R.C. (1992). Advancing academic literacy through teachers' assessments. Educational Leadership, 50-54.

Hodges, C. (1997). How valid and useful are alternative assessments for decision making in primary grade classrooms? Reading Research and Instruction, 36(2), 157-173.

Holdaway, D. (1979). The foundations of literacy. Gosford, Australia: Ashton Scholastic.

IRA/NCTE Joint Task Force on Assessment. (1994). Standards for the assessment of reading and writing. Newark, DE: International Reading Association and the National Council of Teachers of English.

Jennings, J.F. (1998). What national standards and tests? Politics and the quest for better schools. Thousand Oaks: SAGE Publications.

Johnston, P. H. (1998). The consequences of the use of standardized tests. In S. Murphy (Ed.), Fragile evidence: A critique of reading assessment (pp. 89-101). Mahwah, NJ: Lawrence Erlbaum.

Johnston, P. H., Woodside-Jiron, H., & Day, J. P. (1998, December). Teaching and learning literate epistemologies. Paper presented at the National Reading Conference, Austin, TX.

Johnston, P. H. (1997). Knowing literacy: Constructive literacy assessment. York, ME: Stenhouse.

Johnston, P., Guice, S., Baker, K., Malone, J., & Michelson, N. (1995). Assessment of teaching and learning in "literature based" classrooms. Teaching and Teacher Education, 11(4), 359-371.

Johnston, P. (1993a). Assessment as social practice. In D. Leu & C. Kinzer (Eds.), Forty-second yearbook of the National Reading Conference. Chicago, IL: National Reading Conference.

Johnston, P., Weiss, P., & Afflerbach, P. (1993).  Teachers' evaluation of teaching and learning of literacy. Educational Assessment, 1(2), 91-117.

Johnston, P. H. (1992). Constructive evaluation of literate activity. New York: Longman.

Juel, C. (1991). The role of decoding in early literacy instruction and assessment. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 135-154). Englewood Cliffs, NJ: Prentice Hall.

Keller, E. F. (1985). Reflections on gender and science. New Haven, CT: Yale University Press.

Knoblauch, C., & Brannon, L. (1988). Knowing our knowledge: A phenomenological basis for teacher research. In L. Z. Smith (Ed.), Audits of meaning: A festschrift in honor of Ann E. Berthoff (pp. 17-28). Portsmouth, NH: Boynton/Cook-Heinemann.

Lankshear, C. (with J. Gee, M. Knobel, & C. Searle) (1997). Changing literacies. Philadelphia: Open University Press.

Lyons, C. (1991). Helping a learning disabled child enter the literate world. In D. DeFord, C. Lyons, & G. S. Pinnell (Eds.), Bridges to literacy: Learning from Reading Recovery. Portsmouth, NH: Heinemann.

Lyons, C. A., Pinnell, G. S., & DeFord, D. E. (1993). Partners in learning: Teachers and children in Reading Recovery. New York: Teachers College Press.

McDermott, R., & Varenne, H. (1995). Culture as disability. Anthropology and Education Quarterly, 26, 324-348.

McDermott, R. P. (1993). The acquisition of a child by a learning disability. In S. Chaiklin & J. Lave (Eds.), Understanding practice: Perspectives on activity and context (pp. 269-305). Cambridge: Cambridge University Press.

McGill-Franzen, A., & Allington, R. L. (1993). Flunk’em or get them classified: The contamination of primary grade accountability data. Educational Researcher, 22, 19-22.

Mehan, H. (1993). Beneath the skin and between the ears: A case study in the politics of representation. In S. Chaiklin & J. Lave (Eds.), Understanding practice: Perspectives on activity and context (pp. 241-268). Cambridge: Cambridge University Press.

Messick, S. (1995). Validity of psychological assessment: Validity of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23, 13-24.

Moore, A. (1996). Assessing young readers: Questions of culture and ability. Language Arts, 73(September), 306-316.

Morphett, M., & Washburn, C. (1932). When should children begin to read? Elementary School Journal, 31, 496-503.

Morrow, L. M. (1990). Assessing children’s understanding of story through their construction and reconstruction of narrative. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 110-134). Englewood Cliffs, NJ: Prentice Hall.

Moss, P. A. (1998). The role of consequences in validity theory. Educational Measurement: Issues and Practice, 17(2), 6-12.

Moss, P. A. (1995). Themes and variations in validity theory. Educational Measurement: Issues and Practice, 14(2), 5-13.

Moss, P. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62, 229-258.

Moss, B. (1997). A qualitative assessment of first graders’ retelling of expository text. Reading Research and Instruction, 37(1), 1-13.

National Association for the Education of Young Children (1998). Learning to read and write: Developmentally appropriate practices for young children. A joint position statement of the IRA and NAEYC. Young Children, 30-46.

National Association for the Education of Young Children (1991). Guidelines for appropriate curriculum content and assessment in programs serving children ages 3 through 8. A position statement of the National Association for the Education of Young Children and the National Association of Early Childhood Specialists in State Departments of Education. Young Children, 21-38.

Paris, S. G., Turner, J. C., & Lawton, T. A. (1991). A developmental perspective on standardized achievement testing. Educational Researcher, 20(5), 12-20.

Phye, G. D. (1997). Epilogue: Classroom assessment -- Looking forward. In G. D. Phye (Ed.), Handbook of classroom assessment: Learning, adjustment, and achievement (pp. 531-534). New York: Academic Press.

Poplin, M. (1988). The reductionist fallacy in learning disabilities: Replicating the past by reducing the present. Journal of Learning Disabilities, 21(7), 389-400.

Rogoff, B. (1995). Observing sociocultural activity on three planes: Participatory appropriation, guided participation, apprenticeship. In J. Wertsch, P. del Río, & A. Alvarez (Eds.), Sociocultural studies of mind (pp. 139-164). Cambridge: Cambridge University Press.

Roskos, K., & Neuman, S. B. (1993). Descriptive observations of adults' facilitation of literacy in young children's play. Early Childhood Research Quarterly, 8, 77-97.

Rueda, R. & Mercer, J. (1985). Predictive analysis of decision-making practices with limited English proficient handicapped students. Presented at the Third Annual Symposium: Exceptional Hispanic Children and Youth. Monograph series 6 (1), 1-29. Denver, CO.

Scrivens, G. (1998). Nursery children as emerging readers and writers. In R. Campbell (Ed.), Facilitating preschool literacy (pp. 169-191). Newark, DE: International Reading Association.

Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. London: Sage.

Shepard, L.A., Taylor, G.A. & Kagan, S.L. (1996). Trends in early childhood assessment policies and practices. Washington, DC: OERI.

Shepard, L. A. (1994). The challenges of assessing young children appropriately. Phi Delta Kappan, 76(3), 206-212.

Shepard, L. A., & Smith, M. L. (1989). Flunking grades: Research and policies on retention. New York: Falmer.

Shepard, L.A. & Smith, M.L. (1988). Escalating academic demand in kindergarten: Counterproductive policies. The Elementary School Journal, 89 (2), 135-145.

Shepard, L.A. & Smith, M.L. (1986). Synthesis of research on school readiness and kindergarten retention. Educational Leadership, 44, 78-86.

Smith, M. L. (1991). Put to the test: The effects of external testing on teachers. Educational Researcher, 20(5), 8-11.

Smith, M. L., & Rottenberg, C. (1991). Unintended consequences of external testing in elementary schools. Educational Measurement: Issues and Practice, 10(4), 7-11.

Stallman, A. C., & Pearson, P. D. (1991). Formal measures of early literacy. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 7-44). Englewood Cliffs, NJ: Prentice Hall.

Sulzby, E. (1985). Children’s emergent reading of favorite storybooks: A developmental study. Reading Research Quarterly, 20, 458-481.

Tammivaara, J., & Enright, S. (1986). On eliciting information: Dialogues with child informants. Anthropology and Education Quarterly, 17, 218-238.

Taylor, D. (1990). Teaching without testing. English Education, 22 (1), 4-74.

Teale, W. (1991). The promise and the challenge of informal assessment in early literacy. In L. M. Morrow & J. K. Smith (Eds.), Assessment for instruction in early literacy (pp. 45-61). Englewood Cliffs, NJ: Prentice Hall.

Teale, W. (1988). Developmentally appropriate assessment of reading and writing in the early childhood classroom. Elementary School Journal, 89, 173-183.

Tierney, R. J. (1998). Literacy assessment reform: Shifting beliefs, principled possibilities, and emerging practices. The Reading Teacher, 51(5), 374-390.

Tolbert, K. (1999, November 25). Tokyo's tots face important, early test: After 2 years of cramming, kindergartners take private school entrance exams. Washington Post, p. G01.

Vellutino, F. R., Scanlon, D. M., & Sipay, E. (1997). Toward distinguishing between cognitive and experiential deficits as primary sources of difficulty in learning to read: The importance of early intervention in diagnosing specific reading disability. In B. Blachman (Ed.), Foundations of reading acquisition and dyslexia: Implications for early intervention (pp. 347-380). Mahwah, NJ: Lawrence Erlbaum.

Volk, D. (1997). Continuities and discontinuities: Teaching and learning in the home and school of a Puerto Rican five-year-old. In E. Gregory (Ed.), One child, many worlds: Early learning in multicultural communities. New York: Teachers College Press.

Vygotsky, L. (1962). Thought and language (E. Hanfmann & G. Vakar, Eds. & Trans.). Cambridge, MA: MIT Press.

Whitmore, K., & Goodman, Y. (1995). Inside the whole language classroom. School Administrator, 49(5), 20-26.

Wixson, K. K., Valencia, S. W., & Lipson, M. Y. (1994). Issues in literacy assessment: Facing the realities of internal and external assessment. Journal of Reading Behavior, 26(3), 315-337.

Wolf, K.P. (1993). From informal to informed assessment: Recognizing the role of the classroom teacher. Journal of Reading, 36 (7), 518-523.
