Some Thoughts on the Wysocki Case

Since I’m so special and my comments are so wonderful (psych!), I thought I would offer a “revised and extended” version of my comments on Bill Turque’s excellent exposé today in the Washington Post.
========================
Look at the scatter plots of the NYC value-added data. They are essentially random clouds, meaning that a teacher’s VAM score one year explains essentially nothing about what’s going to happen the next year. Or look here, or here, or here. Or here, or here, or here (by Gary Rubenstein).
Would you trust a medical test of some sort that is only correct seven percent of the time?
(Yes, I said SEVEN percent (7%). Not seventy percent (70%).)
That’s about how useful a VAM measure is, going by the only city-wide historical data that we have been able to look at so far — that from NYC.
(The LA Times seems to want at least $500 to let me look at such a spreadsheet for the LAUSD teachers, and from what Bill Turque writes, DCPS has lots of data as well and won’t release it, because they are ashamed of how unreliable and useless it is. So I don’t have any city except NYC.)
Utter nonsense.
I am glad that Ms. Wysocki spoke up and that Mr. Turque wrote it up.
This is good journalism.
(By the way, if you look at one of Gary R’s posts (his part 3), you will see that he found that if you track the NYCPS STUDENTS’ scores from year to year, they lie on a nice, messy, cigar-shaped blob that goes up and to the right. The KIPP schools don’t change that. The charter schools don’t change that. Nor do the regular public schools. Meaning that the students who score well one year score pretty well the next year. Big surprise, huh? Teachers’ influence? Well, we still don’t know how to measure it accurately. Maybe in a few decades we will figure something out. But not now. We need to throw the educational DEformers out of office so that they can’t hurt anybody else.)

=========================

There are many excellent reasons to reject VAM as a means of making decisions that affect either teachers’ or students’ lives. One of those reasons is the story of Ms. Wysocki – and she’s not the only one.

I have recently made some simple scatter plots in Excel using the now-publicly-available New York City value-added database. I won’t bore you with the details, but I was able to compare value-added scores for the school years 2005-06, 2006-07, and 2007-08, comparing the exact same teachers in the exact same schools teaching the exact same subjects at the exact same grade level.

In any case, if you look at a number of my recent columns, you will see scatter plots where I paired the value-added scores for each pair of years. And what I discovered was that the LACK of correlation was simply overwhelming.

From one year to the next, the variation was phenomenal. It was almost as bad as someone rolling dice or throwing darts. Not quite as random as that, but close.

The r-squared values (the share of the variation in one year’s value-added scores that statisticians say is “explained” by the previous year’s scores) were between 0.05 and 0.08. Yup, five to eight percent. For something like VAM to be effective and useful, in my opinion, it should have an r or r^2 value in the 0.80s or 0.90s (meaning 80% to 90% or more, if you prefer). I’m not even picky about whether it’s r or r^2. (Tricky fact here: when you deal with decimals between 0 and 1, r^2 is SMALLER than r. But you knew that, right?)
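For anyone who wants to check the r versus r^2 arithmetic, here is a minimal Python sketch (Python is just my choice for illustration; I did the actual work in Excel). It takes the 0.05 to 0.08 range of r^2 values and converts it back to r, assuming the correlation is positive, and it also shows why squaring a number between 0 and 1 makes it smaller. The function name r_from_r_squared is my own invention.

```python
import math


def r_from_r_squared(r_squared: float) -> float:
    """Return the positive correlation coefficient r whose square is r_squared."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    return math.sqrt(r_squared)


# The year-to-year r^2 range found in the NYC value-added scores.
for r_squared in (0.05, 0.08):
    print(f"r^2 = {r_squared:.2f} corresponds to r = {r_from_r_squared(r_squared):.3f}")

# Squaring a number between 0 and 1 always makes it smaller.
for r in (0.3, 0.9):
    print(f"r = {r:.1f} gives r^2 = {r * r:.2f}")
```

It prints r = 0.224 and r = 0.283 for the two r^2 values, and r^2 = 0.09 and 0.81 for r = 0.3 and 0.9. So even the best year-to-year pairing corresponds to a correlation of only about 0.28, while a correlation of 0.9 would still give an r^2 of 0.81.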

You heard me right.

It’s like I’m saying that for VAM to be useful, its r^2 value (its ability to predict anything) should be somewhere in the range of 70% to 99%. However, taken as a whole, the predictive value of VAM is less than TEN PERCENT. Think about it. What kind of grade do you give someone (like Jason Kamras) who, with this system, is earning, essentially, scores of 5% to 8%?

That means, they fail. UTTERLY. It’s not even close.

And it’s not Ms. Wysocki who is failing. It’s that pompous ass, Jason Kamras, and his idol Eric Hanushek.

Seriously, would you trust a medical or forensic test of anything if it gave you the right answer less than ten percent of the time?

Please, see the charts for yourself.

BTW: Obviously other stuff changed in NYC (the kids themselves, maybe the curriculum, etc.), but this is about as close as you can get to a controlled experiment in education, holding most teacher-related factors constant.

If you read the propaganda from Kamras and Rhee and their followers and funders, you would think that these scores would be very strongly correlated. And I’m talking about VALUE-ADDED SCORES, not raw scores. Supposedly a teacher who is really strong is going to have a high value-added score every year, with only a very few exceptions, right? And if you keep drinking the VAM Kool-Aid, you would also believe that teachers with low VAM scores do a crappy job year after year after year as they wait in their cushy teachers’ chairs to retire with a cast-gold pension, without lifting a finger to do anything in class.

Facts, however, are stubborn things. And having data in hand allows us to see whether the educational DEformers’ claims are correct. Are VAM scores for teachers as consistent as, say, students’ IQ scores, standardized test scores, SAT scores and so on? (I bet those correlate pretty strongly for any one student. Middling scores on any one of those tests will probably accompany middling scores on the others; kids who get high SAT scores in high school probably did quite well on their state’s NCLB tests, if they took them; and kids who get put through an IQ test and end up with low scores generally get low scores on the others as well.) I put that in layman’s terms. Statisticians have various ways of manipulating the numbers to come up with formulas that they find very meaningful, and that are meaningful, but that are often a bit much for the public to digest, unfortunately.

However, a good scatter plot can equal many, many words, equations, or individual numbers for standard deviations, r-squared, linear or quadratic correlations, and the like.

Suppose you plot one score for each teacher against the same type of score for the same teacher the VERY NEXT YEAR (or the year after that), teaching the same subject in the same school building at the same grade level, so that most other things stay pretty stable. If you get a blob shaped like a cigar that slopes up to the right at about a 45-degree angle, you have a strong positive correlation. Such a graph implies that high-flyers are consistently good and that scumbag lazy teachers are do-nothing idiots every year. If you get an elongated blimp that points down and to the right, then you have a strong negative correlation, which means that if teachers do well on the quantity measured on the x-axis, they do POORLY on the quantity measured on the y-axis, whatever those quantities may be. The skinnier the blimpy blob, the stronger the correlation.

If you look at the graphs I made, which took me very, very little effort (Excel did all the hard work), you will see that this isn’t at all what we have. We have the next closest thing to absolute randomness: a big round blob with almost no direction at all.

Don’t believe me? Go look at the data yourself. Fire up a copy of Excel or any other spreadsheet. Take the entire column of 2005-06 overall value-added scores, expressed however you like (percentiles or whatever value-added number NYC computed), and line it up against the same column for either 2006-07 or 2007-08, making sure each row is the same teacher in both years. (That may take some searching.) Ask your spreadsheet to plot them as a scatter plot. See if you don’t get a nearly formless blob. Ask Excel to calculate a linear regression. See what r^2 coefficient you get.

Don’t take my word for it. Have your computer do the hard work. See if I’m telling the truth or lying (like Kamras, Rhee, Bloomberg, and all the rest of them).

So this, in part, explains the insanity of the DCPS ‘value-added’ algorithm, and how a promising young teacher like Ms. Wysocki got fired.

=======================

What I found when looking at the value-added scores in NYC for 2005-08 was that a teacher in the top quintile among his or her fellow teachers (i.e., a “high flyer” by VAM) had less than one chance in four of being anywhere in the top quintile the next year or two years later, even if teaching the exact same subject in the same school at the same grade level, and about a 20% chance of being in the bottom half in the second year. In other words, there was next to no correlation in scores from one year to the next.
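Here is a rough Python sketch of that top-quintile calculation, assuming two lists of scores in which position i is the same teacher in both years. The function name and the simulated data are mine, for illustration only; they are not the NYC figures. One useful benchmark: if the two years were completely unrelated, about one top-quintile teacher in five would land in the top quintile again purely by chance, so “less than one chance in four” is not far above pure luck.

```python
import math
import random
import statistics


def top_quintile_persistence(year1: list[float], year2: list[float]) -> float:
    """Share of year-1 top-quintile teachers who are also in the top quintile in year 2.

    year1[i] and year2[i] must be scores for the same teacher.
    """
    if len(year1) != len(year2):
        raise ValueError("scores must be paired teacher by teacher")
    # quantiles(..., n=5) returns the four quintile cut points; the last is the 80th percentile.
    cut1 = statistics.quantiles(year1, n=5)[-1]
    cut2 = statistics.quantiles(year2, n=5)[-1]
    top_in_year1 = [i for i, score in enumerate(year1) if score >= cut1]
    stayed_on_top = sum(1 for i in top_in_year1 if year2[i] >= cut2)
    return stayed_on_top / len(top_in_year1)


# Made-up data: 5,000 simulated teachers with an assumed year-to-year correlation rho.
rng = random.Random(7)
for rho in (0.0, 0.25, 0.9):
    year1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    year2 = [rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for x in year1]
    print(f"rho = {rho:.2f}: {top_quintile_persistence(year1, year2):.0%} stay in the top quintile")
```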
Plus, I found that the “confidence interval” was extremely wide. For a typical teacher (one at the median), the uncertainty in their value-added score was FIFTY-SEVEN PERCENTILE RANKS.
Yup, you read that right.
If your printout says you are really at the 45th percentile (not so great; more than half the teachers’ computed scores are higher than yours), they openly admit that your real score might be anywhere from, say, the 23rd percentile (really wretched) all the way up to the 80th percentile (really great).
I’m not making this up.
See the data for yourself if you don’t believe me.
======================
A question for Bill Turque:
Did you ever ask Kaya Henderson whether she had any innocent explanation for why Wayne Ryan suddenly disappeared off the edge of the DC radar, “resigning” to “pursue other interests” soon after the report came out? Ryan was a literal poster boy for Michelle Rhee and value-added and all that, and he was all-but-indicted in the USA Today investigative series as a massive and serial eraser and changer of test answers in order to inflate test scores (and, not coincidentally, his bank account and official status).
======================
pav11,
Has your wife actually looked at the year-to-year correlations of value-added scores in the New York City public school system? Entire spreadsheets are online and easy to manipulate. Has she seen how low the correlation coefficients are from one year to the next for the exact same teachers teaching the exact same grade level, the identical subjects, in the exact same school buildings? I find it astonishing that anybody could be so stupid as to base ANYTHING of substance on numbers that are so close to numerology, or to throwing darts blindfolded at a chart on the wall, to decide who gets fired and who gets a bonus.
==============
An exchange with a certain know-it-all:
FormerMCPSStudent wrote:
“DCPS is the 51% quality education in the US. The “veteran” teachers have proven themselves worthless.  
“The DC School Board protected these teachers for decades as a politically connected jobs program. No one who’s a “veteran” really earned their way into DCPS with talent or skill. They just had a connected uncle.”
——————-
Someone else wrote back,
“And what does this have to do with the article? Looks like you went to MCPS, not DCPS, so where are your facts coming from?”
 ——————
FormerMCPSStudent wrote back,
“Having graduated from MCPS schools, I’m educated well enough to tell the difference between quality and crap.”
—————-
I then commented,
“Well, FormerMCPSStudent, perhaps you should try teaching in DCPS and showing the rest of us lazy, lame-brained veteran teachers how easy it is to produce miracles?

“I’m sure DCPS is hiring, since teachers are constantly quitting in frustration or retiring, and the DC-CAS is administered next month. Surely you can pull out a 4.0 on your value-added measures in both English and math at the 5th-grade level? You have a full five weeks to get it done!”
———————–
Published on March 7, 2012 at 3:35 pm

2 Comments

  1. Thanks for bringing this to light. It’s absolutely ridiculous, and the Union is about to agree to let this type of data be used for firing decisions. A bunch of fellow NY teachers and I have just come out with a petition to demand better and try to stop the use of these value-added measures to publicly shame and unjustly fire teachers. Take a look and help get the word out – http://www.change.org/petitions/stop-the-public-shaming-and-unjust-firing-of-teachers

    • With the data plots I made and Gary Rubenstein made, the case can be made quite persuasively that this should not be any part of any evaluation.