Value-Added Measurements Are Less Accurate Than Flipping a Coin

From a recent comment on my blog:

‘You’re probably familiar with “Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” by Stuart Yeh, which looked at a broad spectrum of studies and found a similar result regarding the (un)reliability of VAM: namely, that it was no better than flipping a coin.

‘“The intertemporal reliability of value-added teacher rankings was investigated by Aaronson et al. (2007), Ballou (2005), Koedel and Betts (2007), and McCaffrey et al. (2009). In each study, VAM was used to rank teacher performance from high to low. In each study, a majority of teachers who ranked in the lowest quartile or lowest quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2). Furthermore, a majority of teachers who ranked in the highest quartile or quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2).

….

‘“In the case of value-added rankings, it is inappropriate to infer that a teacher should be hired or fired based on the rankings from any given year. Since this inference would be inappropriate, the results of value-added teacher rankings are not valid for the purpose of high-stakes decisions regarding hiring and firing. In short, VAM lacks validity for the purpose of high-stakes decisions regarding individual teachers.

‘While some researchers suggest averaging two or more years of rankings, averaging may introduce significant bias, raising the issue of validity once again (McCaffrey et al., 2009). Furthermore, it would not be uncommon for data to be missing in a way that would prevent averaging. For large numbers of teachers, it would be impractical (Newton et al., 2010).

‘Regardless, when two years of rankings are used for tenure decisions, intertemporal reliability remains low: In reading, data from North Carolina indicate that 68% of teachers ranked in the bottom quintile shift out of that quintile after tenure (indicated by a weighted average of all post-tenure observations), and 54% of teachers ranked in the top quintile shift out of that quintile post-tenure (Goldhaber & Hansen, 2008). When three years of rankings are used, reliability is even worse: 74% of teachers ranked in the bottom quintile shift out of that quintile post-tenure, and 56% of teachers ranked in the top quintile shift out of that quintile post-tenure (Goldhaber & Hansen, 2008). In math, reliability is somewhat better, but over half of all teachers in the bottom and top quintiles shift out of those quintiles post-tenure (Goldhaber & Hansen, 2008).

‘“These results were confirmed by a second value-added analysis, also using data from North Carolina, which found that more than half of all teachers who ranked in the bottom quintile shifted out of that quintile the following year, regardless of whether one, two, three, four or five years of data were used to predict future performance, regardless of the subject area (math or reading), and regardless of whether a simple or complex Bayes estimator was used to improve predictive accuracy.”

//end quote’
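
To get a feel for how quintile "shift-out" percentages like the ones quoted above relate to year-to-year consistency, here is a minimal simulation sketch. It is not taken from Yeh's paper or from any of the studies he cites. It assumes a toy model in which each teacher's yearly score is a stable effect plus independent noise, and the function name `shift_out_rate` and its `reliability` parameter (the year-to-year correlation of scores) are my own hypothetical choices. A reliability of 0 corresponds to rankings that are pure chance from one year to the next.

```python
import random


def shift_out_rate(reliability, n_teachers=20000, quantiles=5, seed=0):
    """Fraction of year-1 bottom-quantile teachers who leave it in year 2.

    Toy model (an assumption, not from the cited studies): each yearly
    score is a stable teacher effect plus independent noise, scaled so
    the correlation between the two years equals `reliability`.
    """
    rng = random.Random(seed)
    signal_sd = reliability ** 0.5
    noise_sd = (1.0 - reliability) ** 0.5

    year1, year2 = [], []
    for _ in range(n_teachers):
        effect = rng.gauss(0.0, 1.0) * signal_sd
        year1.append(effect + rng.gauss(0.0, 1.0) * noise_sd)
        year2.append(effect + rng.gauss(0.0, 1.0) * noise_sd)

    # Indices of the lowest-scoring teachers in each year.
    cutoff = n_teachers // quantiles
    order1 = sorted(range(n_teachers), key=year1.__getitem__)
    order2 = sorted(range(n_teachers), key=year2.__getitem__)
    bottom1 = set(order1[:cutoff])
    bottom2 = set(order2[:cutoff])

    stayed = len(bottom1 & bottom2)
    return 1.0 - stayed / cutoff


if __name__ == "__main__":
    for r in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        rate = shift_out_rate(r)
        print(f"reliability {r:.1f}: {rate:.0%} leave the bottom quintile")
```

Running the script prints the simulated shift-out rate for several assumed reliabilities, which can be set beside the percentages in the quoted passage. The real studies use actual test-score data and more elaborate models, so this is only a rough way to build intuition.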

Published on January 10, 2016 at 12:40 pm
