Texas Decision Slams Value Added Measurements

And it does so for many of the reasons that I have been advocating. I am going to quote the entirety of Diane Ravitch’s column on this:

Audrey Amrein-Beardsley of Arizona State University is one of the nation’s most prominent scholars of teacher evaluation. She is especially critical of VAM (value-added measurement); she has studied TVAAS, EVAAS, and other similar metrics and found them deeply flawed. She has testified frequently in court cases as an expert witness.

In this post, she analyzes the court decision that blocks the use of VAM to evaluate teachers in Houston. The misuse of VAM was especially egregious in Houston, which terminated 221 teachers in one year, based on their VAM scores.

This is a very important article. Amrein-Beardsley and Jesse Rothstein of the University of California, Berkeley, testified on behalf of the teachers; Tom Kane (who led the Gates Foundation's Measures of Effective Teaching (MET) study) and John Friedman (of the notorious Chetty-Friedman-Rockoff study) testified on behalf of the district.

Amrein-Beardsley writes:

Of primary issue will be the following (as taken from Judge Smith’s Summary Judgment released yesterday): “Plaintiffs [will continue to] challenge the use of EVAAS under various aspects of the Fourteenth Amendment, including: (1) procedural due process, due to lack of sufficient information to meaningfully challenge terminations based on low EVAAS scores,” and given “due process is designed to foster government decision-making that is both fair and accurate.”

Related, and of most importance, as also taken directly from Judge Smith’s Summary, he wrote:

HISD’s value-added appraisal system poses a realistic threat to deprive plaintiffs of constitutionally protected property interests in employment.

HISD does not itself calculate the EVAAS score for any of its teachers. Instead, that task is delegated to its third party vendor, SAS. The scores are generated by complex algorithms, employing “sophisticated software and many layers of calculations.” SAS treats these algorithms and software as trade secrets, refusing to divulge them to either HISD or the teachers themselves. HISD has admitted that it does not itself verify or audit the EVAAS scores received from SAS, nor does it engage any contractor to do so. HISD further concedes that any effort by teachers to replicate their own scores, with the limited information available to them, will necessarily fail. This has been confirmed by plaintiffs’ expert, who was unable to replicate the scores despite being given far greater access to the underlying computer codes than is available to an individual teacher [emphasis added, as also related to a prior post about how SAS claimed that plaintiffs violated SAS’s protective order (protecting its trade secrets), that the court overruled, see here].

The EVAAS score might be erroneously calculated for any number of reasons, ranging from data-entry mistakes to glitches in the computer code itself. Algorithms are human creations, and subject to error like any other human endeavor. HISD has acknowledged that mistakes can occur in calculating a teacher’s EVAAS score; moreover, even when a mistake is found in a particular teacher’s score, it will not be promptly corrected. As HISD candidly explained in response to a frequently asked question, “Why can’t my value-added analysis be recalculated?”:

Once completed, any re-analysis can only occur at the system level. What this means is that if we change information for one teacher, we would have to re-run the analysis for the entire district, which has two effects: one, this would be very costly for the district, as the analysis itself would have to be paid for again; and two, this re-analysis has the potential to change all other teachers’ reports.

The remarkable thing about this passage is not simply that cost considerations trump accuracy in teacher evaluations, troubling as that might be. Of greater concern is the house-of-cards fragility of the EVAAS system, where the wrong score of a single teacher could alter the scores of every other teacher in the district. This interconnectivity means that the accuracy of one score hinges upon the accuracy of all. Thus, without access to data supporting all teacher scores, any teacher facing discharge for a low value-added score will necessarily be unable to verify that her own score is error-free.
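That "house of cards" interconnectivity can be made concrete with a toy sketch. This is not SAS's actual, proprietary EVAAS model (which is precisely what no one outside SAS can inspect); it only illustrates the general point that when teacher effects are estimated jointly, here simply as shrunken deviations from a shared district mean, correcting one teacher's data moves the baseline and, with it, every other teacher's score:

```python
# Toy model (NOT SAS's actual EVAAS algorithm, which is a trade secret):
# each teacher is scored relative to a district-wide mean, with shrinkage.

def value_added_scores(teacher_means, shrinkage=0.8):
    """Score each teacher as a shrunken deviation from the district mean."""
    grand_mean = sum(teacher_means.values()) / len(teacher_means)
    return {t: round(shrinkage * (m - grand_mean), 2)
            for t, m in teacher_means.items()}

data = {"A": 52.0, "B": 48.0, "C": 50.0}
before = value_added_scores(data)

data["A"] = 56.0                     # a data-entry error in A's record is fixed
after = value_added_scores(data)

# Teachers B and C submitted no new data, yet their scores changed too,
# because the shared baseline (the district mean) moved.
changed = [t for t in ("B", "C") if before[t] != after[t]]
```

In any model of this family, no single score can be verified in isolation: checking one teacher's score requires the data behind everyone else's.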

HISD’s own discovery responses and witnesses concede that an HISD teacher is unable to verify or replicate his EVAAS score based on the limited information provided by HISD.

According to the unrebutted testimony of plaintiffs’ expert, without access to SAS’s proprietary information – the value-added equations, computer source codes, decision rules, and assumptions – EVAAS scores will remain a mysterious “black box,” impervious to challenge.

While conceding that a teacher’s EVAAS score cannot be independently verified, HISD argues that the Constitution does not require the ability to replicate EVAAS scores “down to the last decimal point.” But EVAAS scores are calculated to the second decimal place, so an error as small as one hundredth of a point could spell the difference between a positive or negative EVAAS effectiveness rating, with serious consequences for the affected teacher.
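The judge's point about decimal precision is easy to see in miniature. In the sketch below the cutoff value is hypothetical (the actual HISD rating thresholds are not given here); it only shows that when scores are reported to two decimal places, an error of a single hundredth of a point can flip the rating:

```python
# Hypothetical threshold, for illustration only: scores at or above the
# cutoff earn a "positive" effectiveness rating, scores below it "negative".
def effectiveness_rating(score, cutoff=0.00):
    return "positive" if score >= cutoff else "negative"

at_cutoff = effectiveness_rating(0.00)    # on the line: positive
hundredth_low = effectiveness_rating(-0.01)  # 0.01 lower: negative
```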

Hence, “When a public agency adopts a policy of making high stakes employment decisions based on secret algorithms incompatible with minimum due process, the proper remedy is to overturn the policy.”

It’s not so much that we have bad teachers (even though they do exist): it’s an incoherent educational system that is at fault

There is a very interesting article in The Atlantic by E. D. Hirsch on the problems facing American education. Among other things, he finds (as I do) that Value-Added Measurements are utterly unreliable and, indeed, preposterous. But most of all, he finds that the American educational system is extremely poorly run because its principal ideas lack any coherence at all.

Here are a couple of paragraphs:

The “quality” of a teacher doesn’t exist in a vacuum. Within the average American primary school, it is all but impossible for a superb teacher to be as effective as a merely average teacher is in the content-cumulative Japanese elementary school. For one thing, the American teacher has to deal with big discrepancies in student academic preparation while the Japanese teacher does not. In a system with a specific and coherent curriculum, the work of each teacher builds on the work of teachers who came before. The three Cs—cooperation, coherence, and cumulativeness—yield a bigger boost than the most brilliant efforts of teachers working individually against the odds within a system that lacks those qualities. A more coherent system makes teachers better individually and hugely better collectively.

American teachers (along with their students) are, in short, the tragic victims of inadequate theories. They are being blamed for the intellectual inadequacies behind the system in which they find themselves. The real problem is not teacher quality but idea quality. The difficulty lies not with the inherent abilities of teachers but with the theories that have watered down their training and created an intellectually chaotic school environment. The complaint that teachers do not know their subject matter would change almost overnight with a more specific curriculum with less evasion about what the subject matter of that curriculum ought to be. Then teachers could prepare themselves more effectively, and teacher training could ensure that teacher candidates have mastered the content they will be responsible for teaching.


“You will differentiate instruction for every student in exactly the same way, or else”


One of the many reasons I rejoice every day that I was able to retire!


Read what classroom observations have devolved to:


Scooped by Gary Rubenstein

If you are very observant, take a look at a graph posted by Gary Rubenstein on his blog on 2/12/12, and then look at a graph I posted on 3/9/12, nearly a (short) month later.

Both show the lack of correlation between teachers’ scores on the exceedingly complex Teaching and Learning Framework classroom observation rubric on the one hand, and their scores on the Individual Value-Added measurement scheme in math, reading, or both, depending on the subject(s) and grade levels they taught.
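The eyeball test both graphs invite can also be quantified. The paired scores below are hypothetical (not the actual DCPS data), but they sketch the check: pair each teacher's observation score with their value-added score and compute Pearson's r, which lands near zero when the two measures barely track each other:

```python
# Hypothetical data, for illustration only: observation-rubric scores
# paired with value-added scores for the same (imaginary) teachers.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

tlf_obs = [2.1, 3.5, 2.8, 3.9, 2.4, 3.1, 3.7, 2.6]  # observation scores
iva     = [3.0, 2.2, 3.6, 2.9, 2.5, 3.8, 2.1, 3.3]  # value-added scores
r = pearson_r(tlf_obs, iva)  # weak for this sample: |r| well under 0.5
```

If the two measures really captured the same underlying "teacher quality," r would sit close to 1; scatter plots like Gary's and mine correspond to an r near zero.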

Gary’s graph is, of course, populated by lots of bright red triangles; mine has little blue squares. His grid is missing vertical lines, so mine is clearly better. (Joke!) But look even more carefully – you can see that the individual triangles and squares are in identical places.

This shows that Excel, when given the same data, will produce much the same graph.

It’s really easy to do, by the way. You should try it. Here is the original data table.


Published in: on March 11, 2012 at 2:48 pm  Comments (3)  