In earlier posts, I have advocated banning high-stakes testing as a means of making significant decisions about student performance (achievement in a course, passing a course via end-of-year tests, being promoted, and graduating from high school).
I suggested this because the research evidence does not support continuing the practice in American schools.
The research reported here sheds light on high-stakes tests and shows why they should not be used to make decisions about students’ achievement or teachers’ performance, or to impose sanctions on or offer rewards to schools.
High-Stakes Tests: No Effect on Student Achievement
The Board on Testing and Assessment of the National Research Council issued a report entitled Incentives and Test-Based Accountability in Education.
The report concludes that test-based incentives (high-stakes testing) have not produced positive effects on student achievement. It says that school-level incentives such as those of the No Child Left Behind Act produce some of the largest effects among the programs studied, but only in elementary mathematics, and the improvements were minuscule. Exit exams, which are used in 25 states and are typically given in each of the major content areas at the end of the year, have actually decreased graduation rates.
What do tests measure?
We rely on tests to inform us about academic learning, but we fail to consider not only what tests don’t measure, but also the limitations on what they do measure.
We get ourselves in real trouble when we think that a score on an NCLB test or a CRCT-type test is actually a good measure of student academic learning. We get ourselves in further trouble when we believe that the score represents what students know. The hole gets deeper when we think that changes in student test scores (positive or negative) are caused by the performance of teachers.
The authors of the National Research Council report on Incentives and Test-Based Accountability in Education had this to say about tests:
The tests that are typically used to measure performance in education fall short of providing a complete measure of desired educational outcomes in many ways. This is important because the use of incentives for performance on tests is likely to reduce emphasis on the outcomes that are not measured by the test.
Collateral Effects
The first collateral effect is teaching to the test. The curriculum narrows as we teach to the test, which often causes us to stray from interesting activities. We give less time to project-based work and hands-on collaborative activities.
Using projects and hands-on activities takes away time needed to drill students on the content of the test or, in the case of elementary schools, time to teach math and reading/language arts.
One of the constraints of test-based incentives is that many goals of teaching, such as curiosity, persistence, and the ability to solve problems or to collaborate, are not measured by bubble tests. Yet these might be as important as the content that is tested.
But as the Board on Testing and Assessment report reveals, the tests that we use do not do a great job of measuring performance even in the tested areas, such as science, mathematics, English, or social studies. Because the tests in these areas are based on the outline of content in each subject’s content standards, there simply is not enough time to test students on every content standard.
Constructing a Test Is Not So Simple
Take science as an example. The science standards are organized into seven major areas:
- Science as Inquiry
- Physical Science
- Life Science
- Earth and Space Science
- Science and Technology
- Science in Personal and Social Perspectives
- History and Nature of Science
In these seven areas there are 64 content standards just for grades K-8.
If you then look at the details of any one of the 64 content standards in the Science Standards, you find at least three fundamental concepts and principles that underlie it.
So at the very least, we have 192 concepts (64 standards times 3 concepts each) to measure on a test.
What is a test maker to do?
Look at this example.
If you were to develop a test for Grade 5, you would need to develop a domain chart that included about 96 concepts. If you wrote one test item for each concept, then the test would be 96 items long.
But that’s too long a test, so the number of items must be reduced to, say, 30 or 40, meaning that not all of the content standards are measured.
And what is worse, we are only using one test item to “measure” performance on each standard. Wouldn’t it be more valid if we used two or more test items to “measure” each standard? If we do, then we end up testing fewer standards.
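To make the arithmetic concrete, here is a minimal sketch in Python. The numbers are the ones quoted above (64 standards, at least three concepts each, about 96 concepts for Grade 5, tests of 30 or 40 items). The function name `coverage` and the assumption that every tested concept gets the same number of items are illustrative assumptions, not taken from the report or from how any actual test is built.

```python
# Illustrative arithmetic only; the figures come from the text above.
CONTENT_STANDARDS_K8 = 64    # content standards for grades K-8
CONCEPTS_PER_STANDARD = 3    # "at least three" concepts per standard
GRADE5_CONCEPTS = 96         # approximate size of the Grade 5 domain chart


def coverage(test_items: int, items_per_concept: int, total_concepts: int) -> float:
    """Fraction of concepts a test can touch (hypothetical helper).

    Assumes every tested concept gets exactly `items_per_concept` items.
    """
    concepts_tested = min(test_items // items_per_concept, total_concepts)
    return concepts_tested / total_concepts


# Lower bound on the number of K-8 concepts: 64 standards x 3 concepts each.
total_k8_concepts = CONTENT_STANDARDS_K8 * CONCEPTS_PER_STANDARD
print(f"K-8 concepts (lower bound): {total_k8_concepts}")

# Share of Grade 5 concepts covered by a 30- or 40-item test,
# with one or two items per concept.
for test_items in (30, 40):
    for items_per_concept in (1, 2):
        share = coverage(test_items, items_per_concept, GRADE5_CONCEPTS)
        print(
            f"{test_items} items, {items_per_concept} per concept: "
            f"{share:.0%} of Grade 5 concepts"
        )
```

Under these assumptions, even a 40-item test with one item per concept touches well under half of the Grade 5 concepts, and using two items per concept cuts that share in half again.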
So high-stakes tests fall short in measuring the standards in most content areas, yet we continue to use them to make decisions about student, teacher and school performance.
As the National Research Council report suggested:
…tests also fall short in measuring performance in the tested subjects and grades in important ways. Some aspects of performance in many tested subjects are difficult or even impossible to assess with current tests. As a result, tests can measure only a subset of the content of a tested subject.
We can define what a test measures, but in the current era of high-stakes testing, the tests that are being used to measure performance in any subject (math, science, English) do not represent the full scope of the curriculum, and have been shown to be ineffective in increasing student achievement. End-of-year tests, such as those given in Georgia, are high-stakes tests, and should not be used to determine if a student should graduate. The evidence is that end-of-year tests actually result in decreasing graduation rates.
Suggestions
The authors of the Incentives and Test-Based Accountability in Education report recommend that, since we do not yet know how to use test-based incentives to produce consistently positive effects, policy makers should support and examine alternative evaluation models. Furthermore, policy makers should draw on basic research and choose from a number of options. They go on to say that:
We call on researchers, policy makers, and educators to examine the evidence in detail and not to reduce it to a simple thumbs-up or thumbs-down verdict. The school reform effort will move forward to the extent that everyone, from policy makers to parents, learns from a thorough and balanced analysis of each success and each failure.
We wish that policy makers would use the report to put a moratorium on using high-stakes tests to make decisions about students, teachers and schools.
Do you think this will happen? Comment and tell us what you think.