Results from an actual randomized study at the USAFA

A very interesting study done at the US Air Force Academy, and published here, contradicts most of what Hanushek, Kamras, Rhee, et al. have said about ‘value-added’ measures of teaching.

Some of its conclusions:

“…our results indicate that professors who excel at promoting contemporaneous student achievement [that is, who do well at what Rhee and Kamras would call ‘value-added scores’], on average, harm the subsequent performance of their students in more advanced classes.

“Academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous value-added but positively correlated with follow-on course value-added. Hence, students of less experienced instructors who do not possess a doctorate perform significantly better in the contemporaneous course but perform worse in the follow-on related curriculum.

“Student evaluations are positively correlated with contemporaneous professor value-added and negatively correlated with follow-on student achievement. That is, students appear to reward higher grades in the introductory course but punish professors who increase deep learning (introductory course professor value-added in follow-on courses). Since many U.S. colleges and universities use student evaluations as a measurement of teaching quality for academic promotion and tenure decisions, this latter finding draws into question the value and accuracy of this practice.

“Similar to elementary and secondary school teachers, who often have advance knowledge of assessment content in high-stakes testing systems, all professors teaching a given course at USAFA have an advance copy of the exam before it is given. Hence, educators in both settings must choose how much time to allocate to tasks that have great value for raising current scores but may have little value for lasting knowledge.

“Using our various measures of quality to rank-order professors leads to profoundly different results.”

“the correlation between introductory calculus professor value-added in the introductory and follow-on courses is negative, r=-0.68. Students appear to reward contemporaneous course value-added, r=+0.36, but punish deep learning, r=-0.31.”

(In other words, one method of measuring professor quality will rank them in one way, but if you use a different method of measuring professor quality, you get a ranking that is profoundly different.)
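
To make that concrete, here is a minimal Python sketch. The five professors and their scores are entirely made up for illustration (they are not data from the study), and the helper names `pearson_r` and `rank_order` are my own. It only shows how two negatively correlated measures can sort the same people in nearly opposite orders.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rank_order(scores):
    """Return professor names sorted from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# HYPOTHETICAL value-added scores, invented for this example only.
contemporaneous = {"A": 0.9, "B": 0.4, "C": 0.1, "D": -0.3, "E": -0.8}
follow_on = {"A": -0.5, "B": -0.6, "C": 0.2, "D": 0.1, "E": 0.7}

names = list(contemporaneous)
r = pearson_r(
    [contemporaneous[n] for n in names],
    [follow_on[n] for n in names],
)
rank_now = rank_order(contemporaneous)   # ranking by same-course results
rank_later = rank_order(follow_on)       # ranking by follow-on results

print(f"r = {r:.2f}")
print("ranked by contemporaneous value-added:", rank_now)
print("ranked by follow-on value-added:     ", rank_later)
```

With these invented numbers the correlation comes out negative, and the professor who tops one list lands near the bottom of the other.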

In this study, students and instructors were randomly assigned to all of the courses, and students had to take follow-up courses in, say, calculus and chemistry, regardless of how much they liked the subject or how well they had done in the previous course. Thus, once they were at the Air Force Academy, there was no self-selection by professors or by students. This eliminated an element that probably confounds prior studies.



4 Comments

  1. Mr. Brandenburg,

    Thank you for this post. I am a university student conducting research on teacher evaluation systems (specifically IMPACT) for my thesis, and your blog has been a great resource.

    I would actually love to interview you for my thesis if possible. Please let me know if you are interested, and I will provide you with more details on my research project. This would be a huge boost to my work, and I’d love to hear more about your perspective!

    Thanks in advance.


  2. The authors don’t seem to be anti-‘value-added’. They are just noting that it has to be done carefully. In fact, they draw the following conclusions from their findings:

    These findings support recent research by Barlevy and Neal (2009), who propose an incentive pay scheme that links teacher compensation to the ranks of their students within appropriately defined comparison sets and requires that new assessments consisting of entirely new questions be given at each testing date. The use of new questions eliminates incentives for teachers to coach students concerning the answers to specific questions on previous assessments.


  3. […] as there is evidence from proper trials, it seems that the numerical scores awarded by students do not reflect how well they have learned from their teachers. In other words, the numbers going in are unreliable, especially since with low return rates the […]


  4. […] posted about student evaluations last week and I took the opportunity to bring up the  famous Air Force Academy study. (the paper; an accessible blog post about […]

