(Yes, I said SEVEN percent (7%). Not seventy percent (70%).)
That’s about how useful a VAM measure is, going by the only city-wide historical data that we have been able to look at so far — that from NYC.
Utter nonsense.
I am glad that Ms. Wysocki spoke up and that Mr. Turque wrote it up.
There are many excellent reasons to reject VAM as a means of making decisions that affect either teachers’ or students’ lives. One of those reasons is the story of Ms. Wysocki – and she’s not the only one.
I have recently done some simple scatterplots using Excel on the now-publicly-available New York City value-added database. I won’t bore you with the details, but I was able to compare value-added scores for school years 0506, 0607, and 0708 — comparing the exact same teachers in the exact same schools teaching the exact same subjects at the exact same grade level.
In any case, if you look at a number of my recent columns, you will see scatter plots where I paired the value-added scores for each pair of years. And what I discovered was that the LACK of correlation was simply overwhelming.
From one year to the next, the variation was phenomenal. It was almost as bad as someone rolling dice or throwing darts. Not quite as random as that, but close.
The r-squared values, the share of the variation in the next year’s value-added score that statisticians say is “explained” by the previous year’s score, were between 0.05 and 0.08. Yup, five to eight percent. For something like VAM to be effective and useful, in my opinion, it should have an r or r^2 value in the 0.80s or 0.90s (meaning 80% to 90% or more, if you prefer). I’m not even picky about which one. (Tricky fact here: when you deal with decimals strictly between 0 and 1, r^2 is SMALLER than r. But you knew that, right?)
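To see what those r^2 figures look like on the r scale, here is a quick bit of arithmetic in Python. It uses only the 0.05 and 0.08 values quoted above and nothing else from the NYC data. The helper name `r_from_r_squared` is just something made up for this illustration.

```python
import math

def r_from_r_squared(r2):
    """Return the magnitude of r for a given r^2 (the sign of r is lost)."""
    return math.sqrt(r2)

# r^2 values quoted in the text for year-to-year value-added scores
for r2 in (0.05, 0.08):
    print(f"r^2 = {r2:.2f}  ->  |r| = {r_from_r_squared(r2):.2f}")
```

So even on the more flattering r scale, the year-to-year correlation comes out at roughly 0.22 to 0.28, nowhere near the 0.80s or 0.90s.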
You heard me right.
It’s like I’m saying that for VAM to be useful, its r^2 value – its ability to predict anything – should be somewhere in the range of 70% to 99%. However, as a whole, the predictive value of VAM is less than TEN PERCENT. Think about it. What kind of grade do you give someone (like Jason Kamras) whose system is earning, essentially, scores of 6% to 9%?
That means, they fail. UTTERLY. It’s not even close.
And it’s not Ms. Wysocki who is failing. It’s that pompous ass, Jason Kamras, and his idol Erik Hanushek.
Seriously, would you trust a medical or forensic test of anything if it only gave you the right answer less than ten percent of the time?
Please, see the charts for yourself.
BTW: Obviously other stuff changed in NYC (kids’ names, curriculum might change, etc) but this is about as close as you can get to a controlled experiment in education, holding most teacher stuff constant.
If you read the propaganda from Kamras and Rhee and their followers and funders, you would think that these scores would be very strongly correlated. And I’m talking about VALUE ADDED SCORES, not raw scores. Supposedly a teacher who is really strong is going to have a high value-added score every year, with only a very few exceptions, right? And if you continue drinking the VAM Kool-aid, you would also believe that teachers with low VAM scores would do a crappy job year after year after year as they wait in their cushy teachers’ chairs to retire with a cast-gold pension without lifting a finger to do anything in class.
Facts are, however, stubborn things. And having data in hand allows us to see whether the educational DEformers’ claims are correct. Are VAM scores for teachers as consistent as, say, students’ IQ and standardized test and SAT scores and so on? (I bet those correlate pretty strongly in any one student: middling scores on any one of those tests will probably accompany middling scores on the others; kids who get high SAT scores in HS probably did quite well on their state’s NCLB tests, if they took them; kids who get put through an IQ test and end up with low scores generally get low scores on the others as well.) I put that in layman’s terms; statisticians have various ways of manipulating the numbers to come up with formulas that they find very meaningful, and that are, but that are unfortunately often a bit much for the public to digest.
However, a good scatter plot can equal many, many words, equations, or individual numbers for standard deviations, r-squared, linear or quadratic correlations, and the like.
Suppose you plot one score for a single teacher against the same type of score for the same teacher the VERY NEXT YEAR or the year after that, teaching the same subject in the same school building and at the same grade level, so that most other things are pretty stable from one year to the next, usually. If you get a blob that looks like a cigar sloping up to the right at a 45-degree angle, then you have a strong positive correlation. Such a graph would imply that high-flyers are consistently good and scumbag lazy teachers are do-nothing idiots every year. If you have an elongated blimp that points down and to the right, then you have a strong negative correlation, which means that if they do well on the quantity measured on the X-axis, then they do POORLY on the quantity measured on the Y-axis, whatever those quantities may be. The skinnier the blimpy blob, the stronger the correlation.
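To put some numbers on those shapes, here is a small Python sketch that invents pairs of scores with a chosen correlation and then measures it. Everything in it is made up for illustration: the function names, the sample size of 5000, and the three target correlations. None of it comes from the NYC data.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_pairs(target_r, n, rng):
    """Made-up 'year 1' and 'year 2' scores whose correlation is near target_r."""
    year1 = [rng.gauss(0, 1) for _ in range(n)]
    noise_scale = math.sqrt(1 - target_r ** 2)
    year2 = [target_r * x + noise_scale * rng.gauss(0, 1) for x in year1]
    return year1, year2

rng = random.Random(0)  # fixed seed so the run is repeatable
shapes = [("skinny cigar", 0.9), ("round blob", 0.25), ("downward blimp", -0.9)]
for label, target in shapes:
    y1, y2 = simulate_pairs(target, 5000, rng)
    r = pearson_r(y1, y2)
    print(f"{label:15s} r = {r:+.2f}   r^2 = {r * r:.2f}")
```

The "round blob" row is set near the r of about 0.25 that an r^2 in the 0.05 to 0.08 range implies; the two 0.9 rows are what a skinny cigar or blimp would look like in numbers.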
If you look at the graphs I made, which took me very, very little effort (Excel did all the hard work), you will see that this isn’t at all what we have. We have the next closest thing to absolute randomness: a big round blob with almost no direction at all.
Don’t believe me? Go look at the data yourself. Fire up a copy of Excel or any other spreadsheet. Put the entire column for 0506’s overall value-added score, expressed however you like (percentiles or whatever the NYC computed VA value is) and do the same thing for either 0607 or for 0708. (That may take some searching.) Ask your spreadsheet to plot them as a scatter plot. See if you don’t get a nearly formless blob. Ask Excel to calculate a linear regression. See what r^2 coefficient you get.
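If you would rather script it than click through a spreadsheet, the same calculation can be sketched in a few lines of Python. This is only an illustration, not the method behind my charts: the function name is invented, and the two short lists are hypothetical numbers standing in for the real columns, which you would have to load from the NYC files yourself. It fits a least-squares line and reports r^2 as one minus the ratio of residual to total variation, the usual definition for a straight-line fit.

```python
def linear_fit_r_squared(xs, ys):
    """Least-squares line y = slope * x + intercept, plus its r^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation around the fitted line vs. total variation in ys
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical percentile scores for eight teachers, NOT real NYC data.
scores_0506 = [12, 35, 48, 51, 60, 72, 85, 93]
scores_0607 = [55, 20, 81, 33, 47, 90, 15, 62]

slope, intercept, r2 = linear_fit_r_squared(scores_0506, scores_0607)
print(f"slope = {slope:.3f}, intercept = {intercept:.1f}, r^2 = {r2:.3f}")
```

Swap in the real 0506 and 0607 (or 0708) columns, matched teacher by teacher, and compare the r^2 it prints with what your spreadsheet reports.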
Don’t take my word for it. Have your computer do the hard work. See if I’m telling the truth or lying (like Kamras, Rhee, Bloomberg, and all the rest of them).
So this, in part, explains the insanity of the DCPS ‘value-added’ algorithm, and how a promising young teacher like Ms. Wysocki got fired.
=======================
Yup, you read that right.
If your printout says you are really at the 45th percentile (not so great; more than half the teachers’ computed scores are higher than yours), they openly admit that your real score might be anywhere from, say, the 23rd percentile (really wretched) all the way up to the 80th percentile (really great).
I’m not making this up.
See the data for yourself if you don’t believe me.
Has your wife actually looked at the year-to-year correlations of value-added scores in the New York City public school system? Entire spreadsheets are on line and easy to manipulate. Has she seen how low the correlation coefficients are from one year to the next for the exact same teachers teaching the exact same grade level, the identical subjects, in the exact same school buildings? I find it astonishing that anybody could be so stupid as to base ANYTHING of substance on numbers that are so close to numerology or throwing darts blindfolded at a chart on the wall to decide who gets fired and who gets a bonus.
“The DC School Board protected these teachers for decades as a politically connected jobs program. No one who’s a “veteran” really earned their way into DCPS with talent or skill. They just had a connected uncle.”
“I’m sure DCPS is hiring – teachers are constantly quitting in frustration or retiring, and the DC-CAS is administered next month. Surely you can pull out a 4.0 on your value-added measures in both English and math at the 5th grade level? You have a full five weeks to get it done!”
Thanks for bringing this to light. It’s absolutely ridiculous, and the Union is about to agree to let this type of data be used for firing decisions. A bunch of fellow NY teachers and I have just come out with a petition to demand better and try to stop the use of these value-added measures to publicly shame and unjustly fire teachers. Take a look and help get the word out – http://www.change.org/petitions/stop-the-public-shaming-and-unjust-firing-of-teachers
With the data plots I made and Gary Rubenstein made, the case can be made quite persuasively that this should not be any part of any evaluation.