The Learning Scientists

GUEST POST: What Causes Test-score Inflation? Comparing Two Theories

By Richard P. Phelps

Richard Phelps (@RichardPPhelps) has probably written or edited more scholarly works on standardized testing than anyone in the world; he also founded and manages the Nonpartisan Education Review.

Ask most anyone inside U.S. education—in the “education establishment”—about “teaching to the test”, and they will tell you that externally imposed high-stakes tests induce teaching to the test. Moreover, teaching to the test—replacing instruction on subject matter with training in subject-matter-independent test-taking skills—successfully increases test scores.

In the article, Teaching to the test: A very large red herring, I provide counterevidence and counterargument. Thousands of externally imposed high-stakes tests show no evidence of test-score inflation. Conversely, low- and no-stakes tests notoriously lead to test-score inflation when test security (or, “the integrity of test materials”) is lax. The necessary and sufficient condition for test-score inflation is lax security, not high stakes.

This debate is important because the public policy solutions in each case are very different. Here I provide a brief summary of the arguments provided in the much longer piece published in the Nonpartisan Education Review.

If high-stakes --> teaching to the test --> test score inflation is TRUE

1) As the theory’s primary advocate writes (1),

“Scores on high-stakes tests—tests that have serious consequences for students or teachers—often become severely inflated. That is, gains in scores on these tests are often far larger than true gains in students’ learning. Worse, this inflation is highly variable and unpredictable, so one cannot tell which school’s scores are inflated and which are legitimate.”

2) Subject-matter-independent training in test taking works to increase test scores (as some test prep companies claim).

3) Test scores are, at best, only partly related to subject matter mastery, because they are also highly correlated with subject-matter-free test-taking skills.

4) And, corollary beliefs include:

i) The cause of educator cheating in test administrations is high stakes; without high stakes, educators do not cheat.

ii) No- or low-stakes tests, by contrast, are not susceptible to test-score inflation because there are no incentives to manipulate scores.

Given the above, responsible public policy should…

a) … in the interest of improving test scores, encourage teachers to teach to high-stakes tests. They should reduce the amount of time devoted to subject matter mastery (that is, to regular instruction and learning) and, instead, devote more time to taking practice tests, coaching students on test-taking strategies, familiarizing their students with standardized test formats, etc.

b) … encourage the use of test prep services. Moreover, in the interest of fairness, these services should be subsidized, at least for poorer students.

c) … as score trends for high-stakes tests are unreliable and those for no- or low-stakes tests are reliable, no- or low-stakes tests may be used validly as shadow tests to audit the reliability of high-stakes tests’ score trends.

d) … test security (or, the integrity of test materials) is not an issue with no- or low-stakes tests, so they can be validly administered without security controls.

e) … or, eliminate the use of high-stakes tests entirely. Given that they provide neither valid nor reliable information, there is no excuse for using them. Currently, high-stakes tests are used for certification and licensure in most professions and trades.

If, instead, lax test security --> test score inflation is TRUE

1) Test scores and test score trends should not be trusted in the absence of test security controls, no matter what the stakes.

2) High-stakes test scores and score trends are typically valid and reliable when administered with tight security. Moreover, they are more likely to be valid and reliable than those of low- and no-stakes tests, because high-stakes tests are more likely to be administered with tight security.

3) Educators are normal human beings, and respond to a variety of incentives, just like the rest of us. By cheating on no- or low-stakes tests, educators might then publicize and take credit for the ostensible student learning increases. Note, however, that no “stakes” are involved; rather, self-aggrandizement is the motive.

4) And, as corollaries:

i) Teaching to the test not only fails to improve learning; because it takes time away from subject matter instruction, it reduces learning.

ii) Money spent on test preparation services is money wasted if the service consists primarily of test-taking strategies, format familiarity, and practice test taking.

Given the above, responsible public policy should…

a) … take test security (or, the “integrity of test materials”) far more seriously than it has, and apply it to many no- or low-stakes tests as well.

b) … encourage teachers to devote only a modicum of time to familiarizing their students with standardized test-taking formats and strategies. They should not sacrifice instruction in subject-matter mastery.

c) … eliminate the fallacious research practice that considers no-stakes tests to be always valid and reliable and thus trustworthy to use in “auditing” high-stakes tests.

But which of these two positions is best supported by data? For more, please read the longer piece.


References:

(1) Koretz, D.M. (2008). Measuring up: What educational testing really tells us. Harvard University Press.