52 Mondays: Teachers and Students Both Ante Up in High Stakes Testing Gamble

An education isn't how much you have committed to memory, or even how much you know. It's being able to differentiate between what you know and what you don't. ~Anatole France 

I recently visited a blog to read an expert's view on the topic I wanted to launch today. (I anticipate returning to the topic more than a handful of times under various guises in order to do it justice.) I confess I didn't read the expert's post; an advert on the page captured my attention, and I focused on its message instead. The company eLumen examines student outcomes (i.e., assessment), and the advert, as well as the corporate webpage I subsequently visited, contained the following slogan: "Solving the problem that created the need for assessment in the first place." I had struggled with direction for the first essay on assessment and high stakes testing, and these words provided me with a map, compass, and torch faster than you can say, "Jiminy Cricket."

It may seem as if January closed with me lambasting the PK-12 system and its teachers for not better preparing our public school children for college. While I do believe the system is broken, I have to believe that the reformations needed are possible, and I don't lay the blame at the feet of our nation's teachers. Further, I believe most do the best they can within the confines of their own educations, experiences, and the enormous pressures of a system that no longer recognizes itself. This, then, marks the beginning of a discussion on a topic so enmeshed with good intentions wedded to poor definitions that, without that discussion, we cannot hope to do more than fail at our endeavor.

In short, our educational system has cast itself adrift and now wanders oceans afar from its mission.

To wit, in the spring of 2011, I received a letter from the school district attended by Daughter No. 2 (still attended by Daughter No. 2 and Daughter No. 3 and the district of which both my sister and I are products). At the time, Daughter No. 2 was in the fifth grade. The letter explained to parents the end-of-grade (EOG) testing policy for the district. A scale score of three or four on the reading, math, and (for fifth grade only) science tests would be considered "proficient," while a score of one or two would not. In addition, students scoring a two would automatically retest, while students scoring a one would only retest on a parent's request. I don't know what other parents thought, but my take-away from this letter was that students scoring a two might be within striking distance of a three (particularly given test-retest effects). That is, they were considered deserving of a retest, but "one" students were not. On further investigation, this policy revealed itself to be state-level, not local, policy. As with policy-setting in this country in general, local agencies (the LEAs) can adopt more stringent criteria for retesting than the state policy (i.e., they can require all ones and twos to retest), but they cannot adopt more liberal retest policies. That is, they must adhere to the state policy but can add on to it.

My problems with high stakes testing, including EOGs, EOCs, and the like, number in the dozens. As someone who helps collegiate programs with their own educational assessments, I don't have a problem with assessment on its own merits, but I have many issues with PK-12 assessment in its current iteration (and with many prior ones, and I'm sure I'll have problems with many more before we get it right). Thus, the initial caveat that this blog will see more posts on the topic of high stakes testing in our public schools; I simply cannot cover the range of issues in one essay. Today, I'll focus on mission drift and the most basic tenet of assessment… of all research when we get right down to it.

A little known truth of programs – be they educational or otherwise – is that the method by which they will ultimately be assessed drives (or should drive) their initial design. That is, if my program seeks to deliver mail from Point A to Point B in the shortest amount of time, I wouldn't design such a program so that its eventual assessment determines program efficacy based on the volume of mail delivered from Point A to Point B. That would make no sense. Elapsed time in delivery would be the assessment. Such is the case (or should be the case) with both educational programs and their assessments. The program-aspecific model looks a bit like the following:

One of my (many) issues with high stakes testing in America's public schools is that the testing itself morphed from policy into an assessment afterthought. Thus, educational assessment, no matter how eloquently defined in any district's public documents, doesn't follow the above model. That is, we were well on the way to educating the nation's children for some couple of centuries before we thought about an EOG or EOC. To put it another way, state systems of accountability don't seem to know what, precisely, the EOG or EOC should measure, and a basic rule of any instrument is that it can't be considered valid if it isn't measuring what it's intended to measure. I know, I know. Everyone reading knows exactly what an EOG or EOC measures. We all receive letters from our districts telling us every year what they measure. Here's what the official report from the North Carolina Department of Public Instruction (DPI) has to say on the matter. "The North Carolina End-of-Course Tests (EOC) were developed …to provide accurate measurement of individual student knowledge and skills specified in the North Carolina Standard Course of Study…" In point of fact, North Carolina General Statutes allow for children to not be promoted based on EOG/EOC scores below the proficient level. Perhaps I'm making a false claim. Perhaps at least the state of North Carolina does know what it measures with these assessments.

I think not so much. See, tricky me, I truncated the document's statement above. What it really says is the following:

The North Carolina End-of-Course Tests (EOC) were developed for two purposes: 

  • To provide accurate measurement of individual student knowledge and skills specified in the North Carolina Standard Course of Study and

  • To provide accurate measurement of the knowledge and skills attained by groups of students for school, school system, and state accountability.

A very different statement, that. DPI can't have it both ways. They can't use EOG/EOC scores to test both individual progress and program efficacy. To return to my mail delivery example, that would be akin to my using an elapsed-time assessment to determine both how quickly mail delivery occurred from Point A to Point B and how well the Mail Delivery System functioned as a management system of its employees to ensure that rapid delivery. The assessment works for one but not the other. With regard to high stakes testing in PK-12 (or even higher ed), the program determines the assessment, and you can't piggyback two outcomes onto a single assessment measure. To do so is to invite the sort of trouble we recently witnessed in the Chicago school system with thousands of teachers striking against unfair labor practices that were – at the end of the day – unfair.

On this single issue, I have two problems. Let's address students first, which in many ways also addresses teachers. A student – let's call her Paula – is in the third grade. At the end of the third grade, Paula will take an EOG for both reading and math. In North Carolina at least, she'll receive a scaled score of one, two, three, or four. If she scores a three or four, she'll most likely be promoted to the fourth grade. However, her EOG scores will also be factored into her final grades in both language arts and math to the tune of a minimum of 25% of that final grade. Given that these tests generally contain between 30 and 40 items at the third grade level, this means we're willing to concede the sum total of what must be known about both language arts and math at the end of the third grade can be boiled down to this handful of items each. That idiocy aside, we're also willing to hinge a grade promotion or retention on the outcome of this test. We don't call them high stakes for nothing. Thirty to forty items can, in no way, be considered comprehensive. Nor can this method be considered fair (for either Paula or Paula's teacher). If Paula has a good testing day, she can conceivably raise a borderline grade, a grade where she might well have been retained for a year or have been referred for intensive remediation that would have helped her significantly to the point where she looks like a normal "C" student. I believe we're all used to the converse argument, equally valid, that a solid "C" student finds herself in sudden jeopardy of retention through a bad day testing or, worse, a bad case of an assessment that doesn't fit the needs of the classroom. How can any of us believe 30-40 items to be truly comprehensive? As a final note on Paula, let's consider her year of study in the third grade. She hasn't been (or should not have been) preparing for an EOG in language arts and math.
She should have been learning how to identify grammatical and spelling errors in sentences. How to read for comprehension. How to write informational and creative essays of short lengths. She should have been learning the rules of multiplication and division. How to calculate perimeter and area. How to plot some simple points on a line. In doing these things, Paula's teacher would have been assessing her progress on a routine basis. These are the homework assignments, quizzes, and occasional tests about which all third graders groan. These are the marks of her comprehension, her burgeoning knowledge. To suggest otherwise, to suggest a 30-40 item EOG takes the place of that and determines her progression is offensive, not just to Paula but to her teacher, who knows her better than these few items ever will. To illustrate, I have yet to read a district report, state summary, or peer reviewed article that provides a look at the relationship between teachers' assessments of student achievement throughout a course and EOG/EOC assessment. Until I see that study, I will continue to maintain that students' assessment, particularly assessment related to progression, is best left in the gradebook.
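For readers who like to see the arithmetic, here is a rough sketch of how a 25% weight plays out. Only the 25% figure comes from the policy described above; the function name, the 0-100 grade scale, Paula's coursework average of 72, and the two EOG percentages are all hypothetical values chosen purely for illustration, not anything published by a district or by DPI.

```python
def final_grade(coursework_avg, eog_pct, eog_weight=0.25):
    """Blend a year's coursework average with a single EOG result.

    All inputs are on an assumed 0-100 scale; eog_weight is the share
    of the final grade given to the EOG (25% minimum per the policy).
    """
    return (1 - eog_weight) * coursework_avg + eog_weight * eog_pct


# Hypothetical Paula: a 72 average across a full year of classroom work.
good_day = final_grade(72, 90)  # assumed strong test sitting
bad_day = final_grade(72, 50)   # assumed poor test sitting

print(good_day)  # 76.5
print(bad_day)   # 66.5
```

Under these assumed numbers, the same year of classroom work lands ten points apart depending on a single test sitting, which is the swing the paragraph above worries about.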

EOGs and EOCs, at their core, are intended to be assessments of student learning outcomes, broad measures of an educational program's efficacy. I don't have a professional objection to such assessment for such a purpose. My objection lies in the evolution of the nature of these assessments into a high stakes venture for both students and teachers. A good many of the nation's public high schools are regionally accredited by the appropriate agency in their region (e.g., the Southern Association of Colleges and Schools accredits most public high schools in North Carolina). Maintaining accreditation can be less a matter of choice than one of policy in some districts, but that's a conversation for a different essay. I raise the issue of accreditation because all accrediting bodies set standards for the assessment of student learning outcomes. Although schools are given wide latitude in how they develop and measure these, two principles underlie all efforts at this critical endeavor. First, student learning outcomes marry the classroom learning with the overall program objectives. That is, what does a curriculum hope to achieve, and how does the classroom support that achievement? The individual support(s) are the student learning outcomes. Second, student learning outcomes are never to be tied to a student's grades, progress, or graduation potential. Why? Because the measurement of student learning outcomes equals the measurement of the program's efficacy, how well the program is doing its job… how well the curriculum is performing, not how well the student performs. This is a subtle but oh so crucial distinction. Colleges and universities have found themselves on monitoring or probationary status with their accrediting bodies for linking student learning outcomes to classroom performance. My suspicion is that high schools will not be far behind. Do I believe this means we should have no EOGs or EOCs? Not exactly.
I do believe we should have far fewer of them (but not for this reason), but I also believe that where we have them, the results must not comprise any portion of a student's grade in the relevant class and must not comprise any element of the decision to promote or retain the student. When a state or a district imposes these assessments as a measure of programmatic accountability, then I can get behind them (albeit fewer of them) when the results are disentangled from grades and promotion/retention.

And also from individual teacher accountability.

Rahm Emanuel, for all that I thought he brought an unnecessary maelstrom upon his city, made a statement with which I could agree. I loosely paraphrase here, but he alluded to the fact that public school teaching is among the only professions with no set system of review or accountability. Thus, his desire to use standardized test scores from the city's students to reward "good teachers," a plan that backfired when teachers went on strike and held out on a number of issues, including this one, until the mayor met union reps at the table to bargain in teachers' favor. I do agree that public school teachers need a system of accountability that works, that is based on more than a district's policy of peer evaluation, principal evaluation (whenever that occurs, be it once, twice, three times/year), students' test scores, and general good will (i.e., if I don't screw up, I won't get fired). The problem with using high stakes test scores for teacher accountability… hell, whom am I kidding? The problems are too numerous to name, but I'll begin with the following:

  • EOG/EOC scores are, at best, proxy measures for teacher efficacy. These scores reflect a range of student characteristics (e.g., days in attendance, socioeconomic status, parents' education level, motivation) and not teacher characteristics. As well, students with these characteristics are not randomly distributed across teachers' classrooms. There are "good" classrooms (e.g., AP and honors classes) and "less good" classrooms (e.g., not AP and honors classes) where student characteristics such as those listed above congregate in greater or lesser measure. This stacks the deck for and against individual teachers.

  • As with piggybacking student performance assessment and program assessment on the same measure, which you really can't do, you can't piggyback student performance assessment and teacher assessment on the same measure. Now, of course, we're talking about using the same measure to assess all three. This is why I say we have mission drift. We can't even pinpoint why we're using EOGs and EOCs, because any methodologist will tell you it can't be for all three; it can only be for one of these. Which is it going to be?

  • The only way to assess teacher efficacy is directly. Look at the program assessment model above. Define what the outcomes are going to be for teacher efficacy. Implement that program. (Hint: It's not student outcomes. It's delivery of content.) Develop, before implementing the program, the assessments you'll use. (Hint: One of the measures will be teacher absenteeism, just as it is for students and their outcomes.) Measure. Analyze. Reflect. Refine. You'll be glad you did, because you'll have a much clearer picture of who's teaching well and who isn't than you will from EOGs and EOCs.

The reason EOGs and EOCs don't work in their current iteration, will never work in their current iteration (one reason), is the very fact that they are high stakes. They're high stakes for everyone, including students, teachers, and the systems that employ and instruct them. Remove the anguish, remove the punitive nature of the test, and the test itself will become a useful tool rather than an albatross. At the moment, we use these tests to assess individual student progress, program efficacy (for schools, districts, and states), and individual teacher efficacy. The only appropriate use for such a test is the second of these, which is why I took exception when I read eLumen's slogan, "Solving the problem that created the need for assessment in the first place." …that created the need for assessment in the first place. The need for assessment always exists, particularly in the high-dollar business of taxpayer-funded public instruction. I would never suggest that teachers, students, and state school systems should not subject themselves – willingly – to rigorous and transparent assessment. They should. I do believe, however, we have given up all hope of reinvigorating American public instruction with the nature of high stakes testing as it exists today.

52 Mondays is my 2013 project here on wrighterly. You can read about it at wrighterly.com as well. Each Monday, I'll post a different essay on some topic related to PK-20 education in America. The purpose is to raise the level of dialog on these issues if only among the modest audience I currently enjoy.


Rachel Green said...

I remain very glad I was educated in England in the seventies.

Stephanie Wright said...

You should! I wonder what this growing generation will praise in its education?


All material on this website ©2009-present by Stephanie M. Wright. All rights reserved. Contact for more information.