One of the things that experimental scientists really should do is to try to replicate each other’s results to see if they are correct or not. I have begun doing that with the value-added scores awarded to teachers in New York City, and I find that I generally agree with the results obtained by Gary Rubenstein.

What I did was look at the value-added scores, in percentiles, that were “awarded” to thousands of New York City public school teachers in school years 05-06, 06-07, and 07-08. I found that there is essentially no correlation between the scores of the exact same teacher from year to year. The r-squared values are on the order of 0.08 to 0.09 – about as close to random as you can ever get in real life.
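For anyone who wants to try the same kind of check without Excel, here is a minimal sketch of the calculation in Python. The function names and the eight pairs of scores are made up for illustration; they are not the NYC data. For a straight-line fit with an intercept, the r-squared that a linear trendline reports should be the square of the Pearson correlation computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """Square of Pearson's r: the r-squared of a straight-line fit."""
    return pearson_r(xs, ys) ** 2

# Hypothetical percentile scores for the same eight teachers in two
# consecutive years (illustrative only, NOT the real NYC data).
year1 = [10, 35, 50, 72, 90, 64, 28, 81]
year2 = [55, 20, 61, 40, 47, 88, 33, 15]

print("r   =", round(pearson_r(year1, year2), 4))
print("r^2 =", round(r_squared(year1, year2), 4))
```

With the real data, `year1` and `year2` would be the two columns of percentiles for teachers who have a score in both years, paired up teacher by teacher.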

Here are my two graphs for the night:

I actually had Excel draw the line of regression, but it’s a joke: an r-squared correlation coefficient of 0.0877 means, as I said, that there is extremely little correlation between what any teacher got in school year 05-06 and what they got in SY 06-07. In the same school. With very similar kids. Teaching the same subject.

And, a similar graph comparing teachers’ scores for school year 06-07 with their scores for 07-08:

So, one year, a teacher might be around the 90th percentile. The next year, she might be around the 10th percentile. Or the other way around. Did the teacher suddenly get stupendously better (or worse)? I doubt it. By the time they are adults, most people are pretty consistent. But not according to this graph. In fact, if somebody is in the 90th to 100th percentile in school year 2006-07, then the probability that they remain in the same 90th-to-100th-percentile bracket the following year is roughly 1 in 4. If they are in the 0th to 10th percentile in 2006-07, the chances that they remain in the same bracket the following year are about 7%!!
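The bracket numbers above come from counting how many teachers who start in a given ten-point bracket land in the same bracket the next year. A rough sketch of that count in Python follows; the helper names and the sample pairs are hypothetical, chosen only to show the mechanics, and are not the NYC data. For reference, if next year's percentile were completely unrelated to this year's and spread evenly from 0 to 100, the chance of staying in any one bracket would be 10%.

```python
def decile(percentile):
    """Map a percentile in 0..100 to a bracket index 0..9.

    A score of exactly 100 is placed in the top bracket (index 9).
    """
    return min(int(percentile // 10), 9)

def stay_rate(pairs, bracket):
    """Fraction of teachers starting in `bracket` who stay in it next year.

    `pairs` is a list of (this_year, next_year) percentiles, one per teacher.
    Returns None if nobody starts in that bracket.
    """
    starters = [(a, b) for a, b in pairs if decile(a) == bracket]
    if not starters:
        return None
    stayed = sum(1 for _, b in starters if decile(b) == bracket)
    return stayed / len(starters)

# Hypothetical (this_year, next_year) percentiles, illustrative only.
pairs = [(95, 92), (91, 40), (97, 12), (93, 66),
         (5, 8), (3, 55), (9, 71), (50, 52)]

print("top bracket stay rate:   ", stay_rate(pairs, 9))
print("bottom bracket stay rate:", stay_rate(pairs, 0))
```

Running the same function over every bracket from 0 to 9 gives the full picture of how often teachers stay put from one year to the next.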

What this shows is that using value-added scores to determine if someone should keep their job or get a bonus or a demotion is absolutely insane.


I became an emergency sub in a wild class in an inner-city school. I had to scream, duck, jump, run, crawl, hide, intimidate, be screamed at…every moment was literally spent preserving my safety. I was asked to “leave.” However, the students’ test scores improved. Go figure.

It would seem that classroom management skills are not related to increased learning.

Another “amazing” finding of VAM

[...] should also see his earlier posts, “Gary Rubenstein is right, no correlation on value-added scores in New York city,” and “Gary Rubenstein demonstrates that the NYC ‘value-added’ measurements [...]

Technically, the “correlation coefficient” is “R,” while R-squared is the “coefficient of determination.” So it is unclear to me whether the numbers quoted are one or the other. It doesn’t really change the interpretation, but it should be clarified.

OK, you may have me on a technicality. I’m referring to r-squared throughout, whatever you may call it.

R-squared, the coefficient of determination, is mathematically the proportion of the variance in one variable that is “explained” by the other variable. Thus a low R-squared between one year’s value added and the next year’s is simply stating that knowing one provides almost no advantage in knowing the other. Which is to say that knowing a teacher’s value added provides almost no information toward predicting, or “explaining”, the teacher’s value added in subsequent years. Conversely, if there were a high R-squared, it would mean that knowing a teacher’s value added in one year was highly predictive of value added in other years, which is the default assumption of the VAM advocates, but is contradicted by the empirical evidence of the R-squared.
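To put numbers on the R versus R-squared distinction, here is the standard definition for a straight-line fit with an intercept. The symbols are my own labels, not the post's: $y_i$ is a teacher's second-year score, $\hat{y}_i$ is the score the trend line predicts from the first year, $\bar{y}$ is the mean second-year score, and $r$ is the ordinary correlation coefficient.

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r^2$$

Taking the 0.0877 quoted in the post as an R-squared:

$$|r| = \sqrt{0.0877} \approx 0.296, \qquad 1 - R^2 = 1 - 0.0877 = 0.9123$$

So the correlation coefficient itself would be about 0.3, and roughly 91% of the variance in the second year's scores is left unexplained by the first year's.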

[...] explains nothing about what’s going to happen next year. Or look here, or here, or here. Or here or here or here (by Gary Rubenstein) Would you trust a medical test of some sort that [...]

[...] a huge margin of error in this model’s results. (Other analyses have also suggested this lack of year-to-year correlation and the wide margin of error in individual teacher evaluations based on these value-added metrics: [...]

[…] there was no correlation between teachers’ 2005-06 scores and their 2007-08 scores. There was no correlation between a teacher’s value-add score from year to year. It was, in fact, close to random. A […]
