People are Not Cattle!

This apparently did not occur to William Sanders.

He thought that statistical methods that are useful with farm animals could also be used to measure effectiveness of teachers.

I grew up on a farm, and as both a kid and a young man I had considerable experience handling cows, chickens, and sheep.

I also taught math and some science to kids for over 30 years.


Caring for farm animals and teaching young people are not the same thing.


As the saying goes: “Teaching isn’t rocket science. It’s much harder.”

I am quite sure that with careful measurements of different types of feed, medications, pasturage, and bedding, it is quite possible to figure out which mix of those elements might help or hinder the production of milk and cream from dairy cows. That’s because dairy or meat cattle (or chickens, or sheep, or pigs) are pretty simple creatures: all a farmer wants is for them to produce lots of high-quality milk, meat, wool, or eggs for the least cost to the farmer, and without getting in trouble.

William Sanders was well-known for his statistical work with dairy cows. His step into hubris and nuttiness was to translate this sort of mathematics to little humans. From Wikipedia:

“The model has prompted numerous federal lawsuits charging that the evaluation system, which is now tied to teacher pay and tenure in Tennessee, doesn’t take into account student-level variables such as growing up in poverty. In 2014, the American Statistical Association called its validity into question, and other critics have said TVAAS should not be the sole tool used to judge teachers.”

But there are several problems with this.

  • We don’t have an easily-defined and nationally-agreed-upon goal for education that we can actually measure. If you don’t believe this, try asking a random set of people what they think the primary goal of education should be, and listen to all the different ideas!
  • It’s certainly not just ‘higher test scores’ — the math whizzes who brought us “collateralization of debt-swap obligations in leveraged financings” surely had exceedingly high math test scores, but I submit that their character education (as in, ‘not defrauding the public’) was lacking. In their selfishness and hubris, they have succeeded in nearly bankrupting the world economy while buying themselves multiple mansions and yachts, yet causing misery to billions living in slums around the world and millions here in the US who lost their homes and are now sleeping in their cars.
  • Is our goal also to ‘educate’ our future generations for the lowest cost? Given the prices for the best private schools and private tutors, it is clear that the wealthy believe that THEIR children should be afforded excellent educations that include very small classes, sports, drama, music, free play and exploration, foreign languages, writing, literature, a deep understanding and competency in mathematics & all of the sciences, as well as a solid grounding in the social sciences (including history, civics, and character education). Those parents realize that a good education is expensive, so they ‘throw money at the problem’. Unfortunately, the wealthy don’t want to do the same for the children of the poor.
  • Reducing the goals of education to just a student’s scores on secretive tests in just two subjects, and claiming that it’s possible to tease out the effectiveness of ANY teacher, even those who teach neither English/Language Arts nor Math, is madness.
  • Why? Study after study (not by Sanders, of course) has shown that the actual influence of any given teacher accounts for only 1% to 14% of the variation in test scores. By far the greatest influence is the student’s own family background, not the ability of a single teacher to raise test scores in April. (An effect which I have shown is chimerical — the effect one year is most likely completely different the next year!)
  • By comparison, a cow’s life is pretty simple. They eat whatever they are given (be that straw, shredded newspaper, cotton seeds, chicken poop mixed with sawdust, or even the dregs from squeezing out orange juice — no, I’m not making that up). Cows also poop, drink, pee, chew their cud, and sometimes they try to bully each other. If it’s a dairy cow, it gets milked twice a day, every day, at set times. If it’s a steer, it mostly sits around and eats (and poops and pees) until it’s time to send it off to the slaughterhouse. That’s pretty much it.
  • Gary Rubinstein and I have dissected the value-added scores for New York City public school teachers that were computed and released by the New York Times. We both found that for any given teacher who taught the same subject matter and grade level in the very same school over the period of the NYT data, there was almost NO CORRELATION between their scores for one year to the next.
  • We also showed that for teachers who were given scores in both math and reading (say, elementary teachers), there was almost no correlation between their scores in math and in reading.
  • Furthermore, with teachers who were given scores in a single subject (say, math) but at different grade levels (say, 6th and 7th grade math), you guessed it: extremely low correlation.
  • In other words, it seemed to act like a very, very expensive and complicated random-number generator.
  • People have much, much more complicated inputs, and much more complicated outputs. Someone should have written on William Sanders’ tombstone the phrase “People are not cattle.”

Interesting fact: Jason Kamras was considered to be the architect of Value-Added measurement for teachers in Washington, DC, implemented under the notorious and now-disgraced Michelle Rhee. However, when he left DC to become head of Richmond VA public schools, he did not bring it with him.


Ten Years of Educational Reform in DC – Results: Total MathCounts Collapse for the Public AND Charter Schools

Having just finished helping to judge the first three rounds of the DC State-Level MathCounts competition, I have some sad news. NOT A SINGLE FULL TEAM FROM ANY DC PUBLIC OR CHARTER SCHOOL PARTICIPATED. Two individual students from Hardy MS were the only participants from any DC public or charter school.

I was in the judging room where all the answer sheets were handed in; some engineers and mathematicians and I had volunteered to come in and score the answers.*

In past years, for example, when I was a math teacher and MathCounts coach at Alice Deal JHS/MS, the public schools often dominated the competitions. It wasn’t just my own teams, though — many students from other public schools, and later on, from DC’s charter schools, participated. (Many years, my team beat all of the others. Sometimes we didn’t, but we were always quite competitive, and I have a lot of trophies.)

While a few public or charter schools did field full or partial teams on the previous “chapter” level of competition last month, this time, at the “state” level I am sad to report that there were none at all. (Including Deal. =-{ )

That’s what ten years of Education ‘Reform’ has brought to DC public and charter schools.

Such excellence! What a bunch of rot.

In addition to the facts that

  • one-third of last year’s DCPS senior class had so many unexcused class absences that they shouldn’t have graduated at all;
  • officials simply lied about massive attendance and truancy problems;
  • officials are finally beginning to investigate massive enrollment frauds at desirable DC public schools;
  • DCPS hid enormous amounts of cheating by ADULTS on the SAT-9 NCLB test after Rhee twisted each principal’s arm to produce higher scores or else;
  • the punishment of pretty much any student misbehavior in class has been forbidden;
  • large numbers of actual suspensions were in fact hidden;
  • there is a massive turnover of teachers and school administrators – a revolving door as enormous percentages of teachers break down and quit mid-year (in both public and charter schools);
  • there is fraudulent manipulation of waiting lists;
  • these frauds are probably also true at some or all charter schools, but nobody is investigating them at all because they don’t have to share data and the ‘state’ agency hides what it does get;
  • DC still has the largest black-white standardized test-score gap in the nation;
  • DC is still attempting to implement a developmentally-inappropriate “common core” curriculum funded by Bill Gates and written by a handful of know-it-alls who had never taught;
  • Rhee and Henderson fired or forced out massive numbers of African-American teachers, often lying about the reasons;
  • they implemented a now-many-times-discredited “value-added method” of determining the supposed worth of teachers and administrators, and used that to terminate many of them;
  • they also closed dozens of public schools in poor, black neighborhoods.

Yes, fourth-grade NAEP national math and reading scores have continued to rise – but they were rising at just about the exact same rate from 2000 through 2007, that is to say, BEFORE mayoral control of schools and the appointment of that mistress of lies, fraud, and false accusations: Michelle Rhee.

So what I saw today at the DC ‘state’-wide competition is just one example of how to destroy public education.

When will we go back to having an elected school board, and begin building a rational, integrated, high-quality public educational system in DC?


* Fortunately, we didn’t have to produce the answers ourselves! Those questions are really HARD! We adults, all mathematically quite proficient, had fun trying to solve a few of them when we had some down time — and marveled at the idea of sixth, seventh, or eighth graders solving them at all! (If you are curious, you can see previous years’ MathCounts questions here.)

Texas Decision Slams Value Added Measurements

And it does so for many of the reasons that I have been advocating. I am going to quote the entirety of Diane Ravitch’s column on this:

Audrey Amrein-Beardsley of Arizona State University is one of the nation’s most prominent scholars of teacher evaluation. She is especially critical of VAM (value-added measurement); she has studied TVAAS, EVAAS, and other similar metrics and found them deeply flawed. She has testified frequently in court cases as an expert witness.

In this post, she analyzes the court decision that blocks the use of VAM to evaluate teachers in Houston. The misuse of VAM was especially egregious in Houston, which terminated 221 teachers in one year, based on their VAM scores.

This is a very important article. Amrein-Beardsley and Jesse Rothstein of the University of California testified on behalf of the teachers; Tom Kane (who led the Gates’ Measures of Effective Teaching (MET) Study) and John Friedman (of the notorious Chetty-Friedman-Rockoff study) testified on behalf of the district.

Amrein-Beardsley writes:

Of primary issue will be the following (as taken from Judge Smith’s Summary Judgment released yesterday): “Plaintiffs [will continue to] challenge the use of EVAAS under various aspects of the Fourteenth Amendment, including: (1) procedural due process, due to lack of sufficient information to meaningfully challenge terminations based on low EVAAS scores,” and given “due process is designed to foster government decision-making that is both fair and accurate.”

Related, and of most importance, as also taken directly from Judge Smith’s Summary, he wrote:

HISD’s value-added appraisal system poses a realistic threat to deprive plaintiffs of constitutionally protected property interests in employment.

HISD does not itself calculate the EVAAS score for any of its teachers. Instead, that task is delegated to its third party vendor, SAS. The scores are generated by complex algorithms, employing “sophisticated software and many layers of calculations.” SAS treats these algorithms and software as trade secrets, refusing to divulge them to either HISD or the teachers themselves. HISD has admitted that it does not itself verify or audit the EVAAS scores received from SAS, nor does it engage any contractor to do so. HISD further concedes that any effort by teachers to replicate their own scores, with the limited information available to them, will necessarily fail. This has been confirmed by plaintiffs’ expert, who was unable to replicate the scores despite being given far greater access to the underlying computer codes than is available to an individual teacher [emphasis added, as also related to a prior post about how SAS claimed that plaintiffs violated SAS’s protective order (protecting its trade secrets), that the court overruled, see here].

The EVAAS score might be erroneously calculated for any number of reasons, ranging from data-entry mistakes to glitches in the computer code itself. Algorithms are human creations, and subject to error like any other human endeavor. HISD has acknowledged that mistakes can occur in calculating a teacher’s EVAAS score; moreover, even when a mistake is found in a particular teacher’s score, it will not be promptly corrected. As HISD candidly explained in response to a frequently asked question, “Why can’t my value-added analysis be recalculated?”:

Once completed, any re-analysis can only occur at the system level. What this means is that if we change information for one teacher, we would have to re- run the analysis for the entire district, which has two effects: one, this would be very costly for the district, as the analysis itself would have to be paid for again; and two, this re-analysis has the potential to change all other teachers’ reports.

The remarkable thing about this passage is not simply that cost considerations trump accuracy in teacher evaluations, troubling as that might be. Of greater concern is the house-of-cards fragility of the EVAAS system, where the wrong score of a single teacher could alter the scores of every other teacher in the district. This interconnectivity means that the accuracy of one score hinges upon the accuracy of all. Thus, without access to data supporting all teacher scores, any teacher facing discharge for a low value-added score will necessarily be unable to verify that her own score is error-free.

HISD’s own discovery responses and witnesses concede that an HISD teacher is unable to verify or replicate his EVAAS score based on the limited information provided by HISD.

According to the unrebutted testimony of plaintiffs’ expert, without access to SAS’s proprietary information – the value-added equations, computer source codes, decision rules, and assumptions – EVAAS scores will remain a mysterious “black box,” impervious to challenge.

While conceding that a teacher’s EVAAS score cannot be independently verified, HISD argues that the Constitution does not require the ability to replicate EVAAS scores “down to the last decimal point.” But EVAAS scores are calculated to the second decimal place, so an error as small as one hundredth of a point could spell the difference between a positive or negative EVAAS effectiveness rating, with serious consequences for the affected teacher.

Hence, “When a public agency adopts a policy of making high stakes employment decisions based on secret algorithms incompatible with minimum due process, the proper remedy is to overturn the policy.”

Judge in NY State Throws Out ‘Value-Added Model’ Ratings

I am pleased that in an important, precedent-setting case, a judge in New York State has ruled that using Value-Added measurements to judge the effectiveness of teachers is ‘arbitrary’ and ‘capricious’.

The case involved teacher Sheri Lederman, and was argued by her husband.

“New York Supreme Court Judge Roger McDonough said in his decision that he could not rule beyond the individual case of fourth-grade teacher Sheri G. Lederman because regulations around the evaluation system have been changed, but he said she had proved that the controversial method that King developed and administered in New York had provided her with an unfair evaluation. It is thought to be the first time a judge has made such a decision in a teacher evaluation case.”

In case you were unaware of it, VAM is a statistical black box used to predict how a hypothetical student is supposed to score on a Big Standardized Test one year based on the scores of every other student that year and in previous years. Any deviation (up or down) of that score is attributed to the teacher.
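To make that prediction-and-deviation mechanism concrete, here is a deliberately oversimplified sketch. The real EVAAS/VAM equations are proprietary and far more layered than this, so the function, variable names, and numbers below are hypothetical stand-ins, not the actual model:

```python
# A toy sketch of the value-added idea -- NOT the proprietary EVAAS/SAS
# algorithm, whose equations are trade secrets. All names and numbers
# here are hypothetical, for illustration only.
import numpy as np

def toy_value_added(district_prior, district_current,
                    class_prior, class_current):
    """Predict this year's score from last year's via a district-wide
    least-squares fit, then return one class's mean residual -- the
    'deviation' that VAM-style models attribute to the teacher."""
    slope, intercept = np.polyfit(np.asarray(district_prior, float),
                                  np.asarray(district_current, float), 1)
    predicted = slope * np.asarray(class_prior, float) + intercept
    residuals = np.asarray(class_current, float) - predicted
    return float(residuals.mean())  # > 0: beat prediction; < 0: fell short
```

Even in this toy version, notice how much depends on the fit itself: change the district-wide data and every teacher’s “effect” changes with it, which is exactly the house-of-cards fragility the Houston ruling described.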

Gary Rubinstein and I have looked into how stable those VAM scores are in New York City, where we had actual scores to work with (leaked by the NYTimes and other newspapers). We found that they were inconsistent and unstable in the extreme! When you graph one year’s score against the next year’s, there is essentially no correlation at all, meaning that a teacher who is assigned the exact same grade level, in the same school, with very similar students, can score high one year, low the next, and middling the third, or any combination of those. Very, very few teachers got scores that were consistent from year to year. Even teachers who taught two or more grade levels of the same subject (say, 7th and 8th grade math) had no consistency from one subject to the next. See my blog (not all on NY City) here, here, here, here, here, here, here, here, here, here, and here. See Gary R’s six-part series on his blog here, here, here, here, here, and here. As well as a less technical explanation here.

Mercedes Schneider has done similar research on teachers’ VAM scores in Louisiana and came up with the same sorts of results that Rubinstein and I did.

Which led all three of us to conclude that the entire VAM machinery was invalid.

And which is why the case of Ms. Lederman is so important. Similar cases have been filed in numerous states, but this is apparently the first one where a judgement has been reached.

(Also read this and this.)

Gary Rubinstein is Right: No Correlation on Value-Added Scores in NYC

One of the things that experimental scientists really should do is try to replicate each other’s results to see if they are correct or not. I have begun doing that with the value-added scores awarded to teachers in New York City, and I find that I generally agree with the results obtained by Gary Rubinstein.

What I did was look at the value-added scores, in percentiles, that were “awarded” to thousands of New York City public school teachers in school years 05-06, 06-07, and 07-08. I found that there is essentially no correlation between the scores of the exact same teacher from year to year. The r-squared coefficients are on the order of 0.08 to 0.09 – about as close to random as you can ever get in real life.
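The stability check described here boils down to squaring the Pearson correlation between the same teachers’ paired scores in two consecutive years. A minimal sketch, with made-up numbers standing in for the real NYC data:

```python
# Sketch of the year-to-year stability check: square the Pearson
# correlation between the same teachers' percentile scores in two
# consecutive years. (The real NYC data gave r-squared of ~0.08-0.09.)
import numpy as np

def r_squared(scores_year1, scores_year2):
    """Squared Pearson correlation between paired teacher scores."""
    r = np.corrcoef(scores_year1, scores_year2)[0, 1]
    return float(r ** 2)

# Independent random percentiles -- a literal random-number generator --
# give an r-squared near zero, which is what the NYC scores resembled.
rng = np.random.default_rng(0)
noise_r2 = r_squared(rng.uniform(0, 100, 5000), rng.uniform(0, 100, 5000))
```

If the scores measured a stable trait of each teacher, the paired years would give an r-squared well above the noise floor; 0.08 to 0.09 barely clears it.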

Here are my two graphs for the night:

I actually had Excel draw the line of regression, but it’s a joke: an r-squared correlation coefficient of 0.0877 means, as I said, that there is extremely little correlation between what any teacher got in school year 05-06 and what they got in SY 06-07. In the same school. With very similar kids. Teaching the same subject.

And, a similar graph comparing teachers’ scores for school year 06-07 with their scores for 07-08:

So, one year, a teacher might be around the 90th percentile. The next year, she might be around the 10th percentile. Or the other way around. Did the teacher suddenly get stupendously better (or worse)? I doubt it. By the time they are adults, most people are pretty consistent. But not according to this graph. In fact, if somebody is in the 90th to 100th percentile in school year 2006-07, then the probability that they would remain in the same 90th-to-100th-percentile bracket is roughly 1 in 4. If they are in the 0th to 10th percentile in 2006-07, the chances that they would remain in the same bracket the following year are about 7%!!
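That bracket arithmetic is easy to sanity-check: if the percentile scores were pure noise, independent from one year to the next, a teacher would land in the same 10-point bracket the following year exactly 10% of the time. A small sketch, using made-up percentiles:

```python
# Sketch: fraction of teachers who land in the same 10-point percentile
# bracket two years running. With independent random percentiles the
# expected rate is exactly 10% -- in the same ballpark as the retention
# rates the real NYC scores showed.
import numpy as np

def same_bracket_rate(year1, year2, width=10):
    brackets1 = np.asarray(year1) // width  # e.g. percentile 0-9 -> bracket 0
    brackets2 = np.asarray(year2) // width
    return float(np.mean(brackets1 == brackets2))

# Pure-noise baseline: independent uniform percentiles.
rng = np.random.default_rng(1)
noise_rate = same_bracket_rate(rng.integers(0, 100, 100_000),
                               rng.integers(0, 100, 100_000))
```

A 7% retention rate for the bottom decile is not just close to the pure-noise baseline; it is below it, which is what makes the random-number-generator comparison apt.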

What this shows is that using value-added scores to determine if someone should keep their job or get a bonus or a demotion is absolutely insane.

Published on March 3, 2012 at 11:28 pm. Comments (12)

A Study on Whether the Los Angeles Value-Added Measurements are Correct

Here is the link to the article, which is pretty wonky:
