From a recent comment on my blog:
‘You’re probably familiar with “Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” by Stuart Yeh that looked at a broad spectrum of studies that found a similar result regarding the (un)reliability of VAM. Namely, that it was no better than flipping a coin.
‘“The intertemporal reliability of value-added teacher rankings was investigated by Aaronson et al. (2007), Ballou (2005), Koedel and Betts (2007), and McCaffrey et al. (2009). In each study, VAM was used to rank teacher performance from high to low. In each study, a majority of teachers who ranked in the lowest quartile or lowest quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2). Furthermore, a majority of teachers who ranked in the highest quartile or quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2).
‘“In the case of value-added rankings, it is inappropriate to infer that a teacher should be hired or fired based on the rankings from any given year. Since this inference would be inappropriate, the results of value-added teacher rankings are not valid for the purpose of high-stakes decisions regarding hiring and firing. In short, VAM lacks validity for the purpose of high-stakes decisions regarding individual teachers.
‘While some researchers suggest averaging two or more years of rankings, averaging may introduce significant bias, raising the issue of validity once again (McCaffrey et al., 2009). Furthermore, it would not be uncommon for data to be missing in a way that would prevent averaging. For large numbers of teachers, it would be impractical (Newton et al., 2010).
‘Regardless, when two years of rankings are used for tenure decisions, intertemporal reliability remains low: In reading, data from North Carolina indicate that 68% of teachers ranked in the bottom quintile shift out of that quintile after tenure (indicated by a weighted average of all post-tenure observations), and 54% of teachers ranked in the top quintile shift out of that quintile post-tenure (Goldhaber & Hansen, 2008). When three years of rankings are used, reliability is even worse: 74% of teachers ranked in the bottom quintile shift out of that quintile post-tenure, and 56% of teachers ranked in the top quintile shift out of that quintile post-tenure (Goldhaber & Hansen, 2008). In math, reliability is somewhat better, but over half of all teachers in the bottom and top quintiles shift out of those quintiles post-tenure (Goldhaber & Hansen, 2008).
‘“These results were confirmed by a second value-added analysis, also using data from North Carolina, which found that more than half of all teachers who ranked in the bottom quintile shifted out of that quintile the following year, regardless of whether one, two, three, four or five years of data were used to predict future performance, regardless of the subject area (math or reading), and regardless of whether a simple or complex Bayes estimator was used to improve predictive accuracy”
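To make the statistic quoted above concrete, here is a minimal Python sketch of how a "shifted out of the bottom quintile" rate can be computed from two years of scores. It is not the method of any of the cited studies: the function name `shift_out_rate`, the simulated scores, and the noise level `noise_sd` are all assumptions made up for illustration, and the number it prints says nothing about real teachers.

```python
import numpy as np


def shift_out_rate(year1, year2, n_groups=5, group=0):
    """Fraction of people in a given year-1 quantile group (0 = lowest)
    who fall into a different quantile group in year 2."""
    year1 = np.asarray(year1, dtype=float)
    year2 = np.asarray(year2, dtype=float)
    # Interior cut points, e.g. 0.2, 0.4, 0.6, 0.8 for quintiles.
    cuts = np.linspace(0, 1, n_groups + 1)[1:-1]
    # Assign each score to a group 0..n_groups-1 within its own year.
    g1 = np.searchsorted(np.quantile(year1, cuts), year1, side="right")
    g2 = np.searchsorted(np.quantile(year2, cuts), year2, side="right")
    in_group = g1 == group
    return float(np.mean(g2[in_group] != group))


# Hypothetical data: a stable "true" effect plus independent yearly noise.
rng = np.random.default_rng(0)
n = 10_000
noise_sd = 1.5  # assumption: how noisy a single year's estimate is
true_effect = rng.normal(size=n)
y1 = true_effect + rng.normal(scale=noise_sd, size=n)
y2 = true_effect + rng.normal(scale=noise_sd, size=n)

print("bottom-quintile shift-out rate:", shift_out_rate(y1, y2))
print("top-quintile shift-out rate:", shift_out_rate(y1, y2, group=4))
```

Raising or lowering `noise_sd` shows how the shift-out rate depends on how noisy each year's estimate is relative to the underlying differences between people, which is the kind of instability the quoted studies measure with real data.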