What We Lose in Winning the Test Score Race
Dr. Olaf Jorgenson
Posted May 17, 2012
To achieve perpetually better test results each year as mandated by the No Child Left Behind Act, teachers in successful schools like Leroy Anderson Elementary in San José, California, will “try anything” to raise scores, as the school’s principal stated in her interview with "The San José Mercury News." In schools across my state for the past decade, the same single-minded determination to improve outcomes feeds a relentless focus on teaching to the test that, to the dismay of many teachers, builds low-level skills such as memorization and recall at the expense of higher-order aptitudes—and at a tremendous cost to our community and future.
None of this is news to us. We’ve read the studies demonstrating that success on standardized tests rests heavily on such independent variables as a child’s socioeconomic status and the education level of his or her parents — factors that have nothing to do with the quality of schools or teachers. As the ever-provocative Alfie Kohn often tells conference audiences, the single best predictor of success on a standardized test is a child’s ZIP code. Children of affluent, educated parents are the best test takers. No amount of school improvement will impact the root societal causes of the “achievement gap,” although our public schools invest incomprehensible amounts of time, tax revenue, and effort to make a difference.
Most educators and many parents know that authentic learning involves much more than perpetually improving test results. To argue that one test score can represent a child’s learning is rather like saying a doctor can determine a patient’s overall health using only a tongue depressor.
Sit, Get, Spit, Forget
What don’t standardized test scores measure? We know that they don’t measure a child’s creative ability. They don’t require children to research, explain, debate, elaborate, present, rebut, or improvise. They don’t demand public-speaking skills. They don’t reflect decades of research demonstrating that children come to school with an array of individual learning styles and perhaps nine or more different types of “intelligence,” only one or two of which educators can measure with a paper-and-pencil test.
Further, mounting evidence indicates that certain qualities schools don’t test, such as perseverance, resilience, and determination, play a role in high academic achievement. Multiple-guess standardized testing can’t reflect character traits that encourage success in school and in later life.
How ironic that these standardized tests, which offer only one right answer to every problem, can’t capture the innovative, pioneering thought purportedly so valued by business and industry, particularly where I live in Silicon Valley. Memorizing and regurgitating facts for a multiple-guess exam—“sit-get-spit-forget”—certainly doesn’t prepare students for creative or entrepreneurial leadership. Nor will it serve a generation of digital learners for whom sorting, verifying, and applying information will transcend the need to acquire or memorize it. Standardized testing will be obsolete for the children now subjected to mastering it.
Whatever your opinion about the efficacy of and rationale for standardized testing, NCLB remains the law of the land. Waivers recently granted to a dozen states, with more pending, exchange annual student growth targets for the requirement that teacher evaluations be tied to student test results: different stripes, but the same animal. In both the NCLB and waiver states, the call for improved student achievement as the foundation for school reform remains fixed in mainstream culture and media. As long as public officials are compelled to measure schools, teachers, and children by the outcomes of annual multiple-guess exams, schools, teachers, and children will continue to face comparisons and labeling according to the test scores they generate.
The Numbers Game
The testing phenomenon in America eclipses NCLB; it is a mindset more than a mandate. Standardized test scores are useful because they’re measurable. We all want schools to be accountable, and test scores make accountability easy to quantify. In California, the Academic Performance Index (API) tracks the progress of individual schools against annual growth targets. Although the API does afford passing attention to attendance and graduation rates in ranking schools, and in 2011 began considering dropout rates, API comparisons of schools center heavily on standardized test outcomes.
Realtors selling houses in “high API neighborhoods” relish the scores. Politicians decrying the state of California’s public schools brandish stagnant or low API scores as “proof” of the decay of our system. Even improvements in test scores inspire critics to rage about the costs of public education and the anemic return on investment for paltry gains.
For parents and children across our country, America’s fixation on test scores obscures what teachers must omit from their lessons today—namely exercises in critical thinking, creative analysis, and unconventional problem solving.
Alternatives to the NCLB status quo exist, of course, but they’re not as expedient or convenient as simply comparing test score data from school to school, year to year. In 2005, then-Secretary of Education Margaret Spellings allowed two states, Tennessee and North Carolina, to participate in a U.S. Department of Education pilot program to test “growth” models as an alternative to the “status” model imposed by NCLB.
Growth assessment models vary, but they all focus on the improvement students realize rather than on performance against a fixed (often arbitrary) target, acknowledging that not all children start the school year at the same academic level and thus won’t all hit the same target. This differs substantially from the NCLB status model, which demands that a certain percentage of children in a school, regardless of ability level and across all subgroups, achieve proficiency; that this percentage increase annually; and that by 2014, all children in America, across every school setting, subgroup, and ability level, be “proficient.”
North Carolina implemented a variation on the growth model, a sort of hybrid program that enables underperforming students to make progress on a value-added basis for up to four years, and then holds them accountable to a fixed proficiency target readily measured under the NCLB status quo. The Tennessee Department of Education developed the so-called “value added” growth model considered for the U.S. Department of Education pilot, but Tennessee subsequently deemed it incompatible with NCLB and submitted a separate growth model for pilot implementation.
In the value-added assessment model, schools determine how much students have progressed in the tested subjects over the course of one school year; a gain of at least one year’s growth is satisfactory for the child and the teacher. Less would be problematic, and more growth calls for commendation. It’s a very straightforward premise.
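As a hypothetical illustration only (the student data, function names, and the one-grade-level growth threshold are assumptions for the sketch, not any state’s official formula), the premise above can be expressed in a few lines:

```python
# Illustrative sketch of the value-added premise: compare each student's
# gain over one school year with the growth expected for one grade level.
# All names and figures here are hypothetical, not an official formula.

EXPECTED_ANNUAL_GROWTH = 1.0  # one grade level of growth per year

def value_added(fall_level, spring_level):
    """Growth beyond (or short of) one expected year of progress."""
    return (spring_level - fall_level) - EXPECTED_ANNUAL_GROWTH

def rating(gain):
    if gain > 0:
        return "commendable"    # more than a year's growth
    if gain == 0:
        return "satisfactory"   # exactly a year's growth
    return "problematic"        # less than a year's growth

# Hypothetical students: (grade level in fall, grade level in spring)
students = {
    "Student A": (3.0, 4.2),
    "Student B": (2.5, 3.5),
    "Student C": (4.0, 4.6),
}

for name, (fall, spring) in students.items():
    print(name, rating(value_added(fall, spring)))
```

Note that the comparison is per student: the school’s starting demographics drop out of the calculation, which is precisely what distinguishes this premise from a fixed proficiency target.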
Using the value-added model, administrators can compare schools according to the size of the gains their teachers consistently generate—and ostensibly, not by an absolute ranking of aggregate student test scores. The U.S. Department of Education’s Race to the Top program requires the use of value-added models to assess student growth, leading to an increase in state adoption of these assessment systems.
In California, a value-added system for K-12 assessment would scramble the tidy API rankings of “better and worse” schools. We might expect growth scores to correspond inversely to the demographics of the school communities; indeed, an affluent suburban school serving middle class and privileged students who start every year at grade level and prepared to learn might well show smaller value-added gains than a school that’s effectively serving a more at-risk population. Which is the “better” school in this scenario?
Clearly a straight value-added model would prove more complex than the current testing paradigm. It would require more work, more time, and more funding to administer, process, evaluate, explain, and apply the results of value-added tests.
The value-added approach also threatens standardized test score comparisons and adherents of the status quo in other key respects. Let’s face it: NCLB and the now-prolific emphasis on test outcomes have spawned a lucrative cottage industry that thrives on the anxieties generated by multiple-guess high-stakes tests. From pricey test-prep services, programs, and materials to online and conventional publishers of testing booklets and study guides to for-profit organizations and charter schools that contract to take over “failing” schools—an entire network of opportunistic and entrepreneurial capitalists depends on the NCLB-inspired status quo, and many an outcomes-based livelihood could be in jeopardy with a shift to a value-added model.
What’s more, politicians and media pundits intent on using test scores to decry the deplorable state of public education would lose their single most relied-upon weapon. My hunch is that in many at-risk communities, a surprising number of the children who remain in a school for a reasonable amount of time with consistent attention from a qualified teacher make substantial growth gains that don’t receive any attention in the published all-school results. And despite what we hear in political speeches and television news broadcasts, we all know many qualified, dedicated teachers who serve at-risk youth. How many “failing schools” would shine in the light of a value-added assessment program?
Any complete, candid appraisal of the value-added model must pose the question: Would teachers’ unions welcome a test that measures whether a teacher’s impact is more or less than we should expect, relative to a child’s growth in a year of schooling? An ineffective teacher can’t easily hide behind assessments that disaggregate the results, focusing that light on each individual student before and after the teacher’s annual work is complete.
All these thorns on the value-added model might help explain why Tennessee deemed it incompatible with NCLB, why North Carolina blended its growth model with the uncompromising “all children proficient” target espoused by the law, and why some states haven’t seriously considered implementing a value-added assessment program. In terms of providing more meaningful data that can better serve teachers, children, families, and schools, the value-added model offers a bold departure from NCLB as we know it.
A Healthy Catalyst?
At its core, the campaign to bring more accountability and competition into America’s public schools is a noble one, to the extent that we seek to better prepare children to be productive and responsible citizens, capable of contributing to society while finding fulfillment in their chosen paths. Accountability and competition can serve as healthy catalysts in reforming public education over time if our assessment system helps us to:
- Intervene with children who aren’t making progress, individually as well as across at-risk subgroups;
- Identify and address underperforming (and high-achieving) educators; and
- Genuinely improve (rather than narrow) teaching and learning.
Pointedly, however, our test score obsession in California and elsewhere does little to improve teaching and learning in our schools. Ultimately, great schools are measured not by the accomplishments of their students, but by the extraordinary lives led by their graduates. With all that standardized tests subtract from the learning process, in our determined march toward high test scores, we fail to prepare today’s students to lead the extraordinary lives they deserve.
Published in the May/June 2012 Principal magazine.