Education reform efforts are guided by an attempt to quantify the term “value.” This is done with something called “value-added models” (VAMs). VAMs use student performance data (in Pennsylvania’s case, from the PSSA test) and attempt to show how much “value” a teacher has brought to a student. Education reformers who are infatuated with VAMs claim that the model “controls for” outside influences (poverty, race, etc.) through complicated statistical processes; however, a recent paper from the American Statistical Association cautions us about the use of VAMs. The entire paper can be found here, but I will share some highlights with you.
Some of the cautions the paper presents are significant. At the most basic level, VAMs rely on results from state-mandated standardized tests. Interestingly, the paper claims that the tests currently given to students do not meet the high standards required for validity and reliability. Specifically, the report states:
· “The measure of student achievement is typically a score on a standardized test, and VAMs are only as good as the data fed into them. Ideally, tests should fully measure student achievement with respect to the curriculum objectives and content standards adopted by the state, in both breadth and depth. In practice, no test meets this stringent standard, and it needs to be recognized that, at best, most VAMs predict only performance on the test and not necessarily long-range learning outcomes.”
The paper thus states that the tests measure only how well students perform on the test itself, not their long-range learning outcomes.
Additionally, very little of the variance in student test performance is attributable to teachers. Remember, VAMs were created to “show” how much a student has “grown” during a school year, thus attributing “value” to a teacher. Again, from the paper:
· “Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences.”
It is worth noting that professional statisticians are warning us that the majority of the variation in student test scores is out of the teacher’s control. Currently, Pennsylvania teachers will have 50% of their evaluation based on VAM scores. Maybe a better percentage would be 1-14%, since that aligns with statistical reality.
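To get a feel for what a 1%-14% teacher share of variance means, here is a minimal simulation. The specific numbers (200 teachers, 25 students per class, the variances of the teacher and background effects) are my own illustrative assumptions, not figures from the ASA paper; they are chosen so that the teacher effect contributes roughly 10% of total score variance, the middle of the range the paper cites.

```python
import random

random.seed(1)

# Hypothetical numbers, chosen only for illustration:
# teacher effect variance = 1, background variance = 9,
# so teachers should account for about 10% of total variance.
N_TEACHERS = 200
STUDENTS_PER_CLASS = 25

scores, teacher_parts = [], []
for _ in range(N_TEACHERS):
    teacher = random.gauss(0, 1.0)  # this teacher's "true" effect
    for _ in range(STUDENTS_PER_CLASS):
        background = random.gauss(0, 3.0)  # family, poverty, prior learning, etc.
        teacher_parts.append(teacher)
        scores.append(teacher + background)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

share = variance(teacher_parts) / variance(scores)
print(f"share of score variance from teachers: {share:.0%}")
```

Even in this deliberately clean setup, the bulk of the spread in individual scores comes from the background term, which is exactly the pattern the paper describes.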
Finally, VAM scores may not tell us anything worthwhile because of the wide range in which a teacher’s score could plausibly fall. The report states:
· “The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling.”
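The instability the statisticians describe is easy to reproduce. The sketch below (again with my own illustrative numbers, not the ASA’s) gives each teacher a fixed “true” effect, then estimates it twice from two independent years of noisy class averages and compares the resulting rankings. If the rankings were reliable, the year-to-year rank correlation would be near 1 and few teachers would move far.

```python
import random

random.seed(2)

# Hypothetical setup: 100 teachers, each observed through one
# class of 25 students per year; student-level noise dwarfs the
# teacher effect, as in the variance figures the paper cites.
N_TEACHERS = 100
CLASS_SIZE = 25

true_effect = [random.gauss(0, 1.0) for _ in range(N_TEACHERS)]

def observed_year(effects):
    # One year's VAM-style estimate: the class mean of
    # (teacher effect + per-student background noise).
    return [e + sum(random.gauss(0, 3.0) for _ in range(CLASS_SIZE)) / CLASS_SIZE
            for e in effects]

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

year1 = ranks(observed_year(true_effect))
year2 = ranks(observed_year(true_effect))

# Spearman rank correlation between the two years (no ties here)
n = N_TEACHERS
d2 = sum((a - b) ** 2 for a, b in zip(year1, year2))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(f"year-to-year rank correlation: {rho:.2f}")

moved = sum(abs(a - b) > n // 4 for a, b in zip(year1, year2))
print(f"teachers whose rank moved by more than a quartile: {moved}")
```

The teachers’ true effects never change between the two years; every rank swap is pure measurement noise, which is the point the paper is making about large standard errors.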
The paper does state that there are good uses for VAMs, a point I am skeptical of. Specifically, the paper points out that they are useful for analyzing the effect of a system on student learning. This requires that the scores not be focused on specific teachers or classrooms; rather, a broader view of the school or school district may lead to a more useful analysis. The bottom line is that VAMs may be used as a small part of a larger conversation about student learning and teacher effectiveness.