## More Problems With Value-Added Measurements for Teachers

I finally got around to reading (and in places skimming) the Mathematica reports on value-added measurement (VAM) for schools and individual teachers in DCPS.

At first blush, it’s pretty impressive mathematical and statistical work. It looks like they were very careful to take care of lots of possible problems, and they have lots of nice Greek letters and very learned and complicated mathematical formulas, with tables giving the values of many of the variables in their model. They even use large words like heteroscedasticity to scare off those not really adept at professional statistics (which would include even me). See pages 12-20 for examples of this “mathematics of intimidation,” as John Ewing of MfA and the AMS has described it. Here is one such learned equation:
BUT:
.
However clever and complex a model might be, it needs to do a good job of explaining and describing reality, or it’s just another failed hypothesis that needs to be rejected (like the theories of the four humours or the Aether). One needs to actually compare the model’s predictions with the real world and see how well they hold up.
.
Which is precisely what these authors do NOT do, even though they claim that “for teachers with the lowest possible IMPACT score in math — the bottom 3.6 percent of DCPS teachers — one can say with at least 99.9 percent confidence that these teachers were below average in 2010.” (p. 5)
.
Among other things, such a model would need to be consistent over time, i.e., reliable. Every indication I have seen, including from other cities that the authors themselves cite (NYC; see p. 2 of the 2010 report), is that individual value-added scores for a given teacher jump around randomly: from year to year for a teacher working at the exact same school, grade level, and subject; between grade levels for a teacher teaching two grades in the same school; and between subjects for a teacher teaching two subjects during the same year. Those correlations appear to be in the range of 0.2 to 0.3, which means one set of scores accounts for only about 4 to 9 percent of the variance in the other. That is frankly not enough to judge who is worth receiving large cash bonuses or a pink slip.
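To get a feel for how weak a correlation of 0.2 to 0.3 is, here is a minimal simulation sketch. It is not anything from the Mathematica reports, and it uses no real DCPS data: it simply assumes (my assumption) that a teacher's scores in two years behave like a pair of bell-curve numbers with a chosen correlation `r`, and then asks how many teachers who land in the bottom fifth in year one land there again in year two. The function names are made up for illustration.

```python
import math
import random


def shared_variance(r):
    """Fraction of variance in one score accounted for by the other (r squared)."""
    return r * r


def bottom_quintile_persistence(r, n_teachers=20000, seed=0):
    """Simulate two years of scores with correlation r (ASSUMED bivariate normal,
    purely synthetic) and return the fraction of year-1 bottom-quintile teachers
    who are also in the bottom quintile in year 2."""
    rng = random.Random(seed)
    year1 = [rng.gauss(0, 1) for _ in range(n_teachers)]
    noise = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Standard construction: year2 has unit variance and correlation r with year1.
    year2 = [r * x + math.sqrt(1 - r * r) * e for x, e in zip(year1, noise)]

    cutoff = n_teachers // 5  # size of the bottom fifth
    bottom1 = set(sorted(range(n_teachers), key=year1.__getitem__)[:cutoff])
    bottom2 = set(sorted(range(n_teachers), key=year2.__getitem__)[:cutoff])
    return len(bottom1 & bottom2) / cutoff


if __name__ == "__main__":
    for r in (0.0, 0.2, 0.3, 1.0):
        print(r, round(shared_variance(r), 2), round(bottom_quintile_persistence(r), 2))
```

With `r = 0` (pure chance) about one in five bottom-quintile teachers repeats, and with `r = 1` all of them do. Running it for 0.2 and 0.3 shows how much closer those values sit to the pure-chance end than to the perfectly reliable end, under the simplifying assumptions above.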
.
Unless something obvious escaped me, the authors do not appear to mention any study of how teachers’ IVA scores vary over time or from class to class, even though they had every student’s DC-CAS scores from 2007 through the present (see footnote, page 7).
.
In neither report do they acknowledge the possibility of cheating by adults (or students).
.
They do acknowledge on page 2 that a 2008 study found low correlations between proficiency gains and value-added estimates for individual schools in DCPS from 2005 to 2007. They attempt to explain that low correlation by “changes in the compositions of students from one year to the next,” an explanation I doubt. I suspect it’s that neither one is a very good measure.
.
They also don’t mention anything about correlations between value-added scores and classroom-observation scores. From the one year of data that I received, this correlation is also very low. It is possible that this correlation is tighter today than it used to be, but I would be willing to wager tickets to a professional DC basketball, hockey, or soccer game that it’s not over 0.4.
.
The authors acknowledge that “[t]he DC CAS is not specifically designed for users to compare gains across grades.” Which means they probably shouldn’t be doing it. It’s also the case that many, many people feel the DC-CAS does not do a very good job of measuring much of anything useful except the socio-economic status of the student’s parents.
.
In any case, the mathematical model they have made may be wonderful, but real data so far suggests that it does not predict anything useful about teaching and learning.
Published on January 21, 2014 at 11:08 am