Revised HS PARCC ‘pass’ rates in English and Math in DC public and charter schools

My original graphs of the ‘pass’ rates for all DC publicly-funded high schools were incomplete, because I was using only data from OSSE (the Office of the State Superintendent of Education). A reader showed me where the DC charter school board (DC PCSB) posted their PARCC statistics, and that gave me the pass rates for a couple of additional schools (Maya Angelou and BASIS, IIRC). So here are the revised graphs, which you can click on to enlarge:

2015 Math PARCC ‘pass’ rates, both public and charter schools in DC

2015 ‘pass’ rates, public and charter high school math, PARCC, DC, 2015

Note how many fewer students passed the PARCC math test than the reading test in DC. I haven’t yet seen any of the actual questions on either of the tests. But if these were tests that I had written and was using as a teacher with my students, I would likely conclude that the one with the much-lower scores was simply a much harder test, and I would probably do one of the following:

(A) “scale” the scores so that more students would pass, or else

(B) throw out the test results and try teaching with a different approach altogether, or else

(C) throw out the test and make one that at least a majority of students could pass if they’ve been paying attention.

{At my last school, if I had failed 80 to 90% of my students, I would have gotten an unsatisfactory evaluation and probably been fired.}

Of course, this being the era when multi-billionaires who hate the very idea of public schools are in charge of said public schools, none of A, B, or C will happen. In fact, my understanding is that the ‘cut’ scores for each of the grading categories (meets expectations and so on) were set AFTER the students took the test, not in advance. So it was very much a politico-social decision that the vast majority of students were SUPPOSED to fail the math test.

Let me note strongly that by far the most effective way to have really good test scores for your school is to let in ONLY students who already get strong test scores. That’s how private schools like Phillips Exeter, Andover, Riverdale, Sidwell Friends, the Chicago Lab School, and Lakeside do it, and that’s how Banneker, School Without Walls, Washington Latin, and BASIS do it. (Partial disclosure: I and some of my immediate family either went to, or worked at, some of those schools.) Teachers who are successful at those elite schools have a MUCH easier time teaching those students than do those who try to teach at schools with large numbers of at-risk students, like Washington Metropolitan, Ballou, Cardozo, Maya Angelou, or Options public or charter schools. Idealistic teachers from elite schools who do transfer to tough inner-city public schools generally crash and burn, and I would predict that one of the easiest ways to lose your teaching job these days is to volunteer to teach at any one of the latter five schools.

Where DC is #1 on the NAEP

Of all the states and territories tested on the 2015 NAEP, there is one place where DC is Number ONE!

Unfortunately, it’s not a good #1.

We have, by far, the largest gaps between the percentages of white students and of black students who are deemed ‘proficient’ or better – on every single test (4th grade and 8th grade, reading and math).

DC also has the largest gaps between the percentages of white and Hispanic students – on every single test.

Our DC gaps are at least double the national gaps. And that’s not good. In fact, the gaps are anywhere from double to two-and-a-half times as large as the national gaps or the median gaps across all states, as you can see here:

Graph: black-white and white-Hispanic proficiency gaps, DC compared with the nation and the median of all states

Kaya Henderson and Michelle Rhee really have some tremendous accomplishments, don’t they?


These scores, by the way, are for a carefully-selected sample of ALL students in Washington, DC – public, charters, private, and parochial. Rhee and Henderson and the various DC mayors have been in total control of all public and charter schools since 2007, with a school board that has exactly zero power and a teachers’ union that has lost almost any power to do anything meaningful to support teachers. And we have a teaching and supervisory force that is either brand-new (hired by Rhee or Henderson or by the heads of the many charter schools) or has passed all of the extremely difficult evaluations not once, but many times.

Trends on the NAEP give a clue as to why Arne Duncan quit

Seeing the rather large drop on the NAEP scores for students across the nation – results released at midnight last night – gives me the idea that Arne Duncan (secretary of education for the past 7 years) quit rather than face the blame for his failed policies. After all, he (and the rest of the billionaire deformer class) have been promising that if you open tons of unregulated charter schools, use numerology to fire many of the remaining veteran teachers, and make education into little more than test prep for all students of color or those who come from poor families, then the test results will improve.

Well, they didn’t improve.

I will let you see for yourself how the percentages of students deemed ‘proficient’ in 4th grade and 8th grade on the NAEP at the national level generally dropped. I include DC (where I come from) and five other states – two that are high-performing (NH and Massachusetts) and three that are low-performing (CA, AL, and NM).

The one bright spot for District residents is that DC is no longer last in the nation in every category! DC students now have slightly higher percentages proficient in certain categories than two other impoverished states – New Mexico and Alabama – as you can see in the graphs below. (The graph for the District of Columbia is the light blue one at the bottom.)

On the other hand, the increases in percentages of students ‘proficient’ in DC since 2008, the first year after mayoral control was imposed and the elected school board was neutralized, are nothing but a continuation of previous trends.

As usual, if you want to take a closer look, click on the graphs.

% Proficient in 4th Grade Math: DC, Nation, MA, CA, NH, NM, AL through 2015

Percentage ‘Proficient’ or Above on 4th grade NAEP reading through 2015, DC, Nation, AL, CA, MA, NH, NM

8th grade math NAEP

8th grade reading

Surprising Comparison of Charter and Regular Public School ‘Pass’ Rates on the HS PARCC

I was actually rather surprised to see that significantly larger percentages of regular DC public school students ‘passed’ the PARCC in both math and reading than did DC charter school students.

If you don’t believe me, look for yourself at the OSSE press release.

What it says is that in the DC charter schools, 23% of the students ‘passed’ (got a 4 or a 5) on the English portion, whereas in the regular DC public schools, 27% ‘passed’.

And in math, they claim that only 7% of the charter school students ‘passed’, but 12% of the regular DC public school students passed.

Are you surprised, too?

A Few PARCC Scores Have Been Released for DC Public Schools

If you would like to see how District of Columbia public high school students did on the PARCC, you can look here at a press release from DCPS administration. This test was on ELA (reading) and Geometry. The scores for grades 3-8 have not yet been released.

The disparities in ‘pass’ rates between the DCPS magnet schools (Banneker and Walls) and every other DC public high school are amazing, particularly in geometry. Notice that several schools had not a single student ‘pass’. This year’s test gives students scores from 1 to 5; only a score of 4 or 5 is considered ‘college and career ready’ — although no studies have actually been done to determine whether that statement is actually true. Banneker and Walls have the lowest rates of students labeled ‘at risk’.

Here are two graphs which I cut-and-pasted from the press release. Click on them to enlarge them.


HS-PARCC geometry

Given what I’ve seen of the convoluted questions asked on released sample PARCC questions, it is no wonder that ‘pass’ rates dropped a lot this year, compared with previous years. The DC-CAS wasn’t a very good test, but PARCC is terrible.

Please keep in mind that public education in the District of Columbia has been under the control of DEformers like Michelle Rhee, Kaya Henderson, and the Gates and Broad foundations, for over 8 years now. The students taking this test last spring have been under their rule since they were rising third graders. Every single teacher in DCPS was either hired by Rhee or by Henderson or else passed numerous strict evaluations with flying colors, year after year, and has been teaching just as they were directed to – or else.

And this is the best that the DEformers can do?

Mental Math, Traditional Math, and Best Methods

James Tanton is one of the most insightful teachers of teachers of math I’ve come across in a long time.

In this video and this essay, he explains how some parents feel mystified by some of the newer style of math problems that children are bringing to their parents for help. Clearly, some of the problems (like the 32-12 one which fills up an entire page in a child’s workbook) are being done in a tortured, time-wasting method. However, if you take a more difficult problem, such as 103 – 87, you can do this completely in your head by adding small increments (MENTALLY) to the 87 until you get to 103.

This is a reasonable thing to do, much like people count out (or used to count out) change into a customer’s hand.

In this case, from 87 to 90 (which is a much ‘nicer’ number than 87) we need to go up by 3.

Then from 90 to 100 (which is getting closer to our goal), we go up by 10. So we’ve gone up by 13 all together.

Then from 100 to 103 is three more, so we’ve gone up by 16, which means that 103 – 87 is 16.

I would be crazy to write all those steps out! – which is unfortunately what the poor child was asked to do.

But doing it quickly in your head makes a lot of sense.
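For readers who like to see the idea written out in general, here is a minimal sketch in Python of the same count-up-to-nicer-numbers strategy. The function name and the choice of round-number ‘landmarks’ are my own illustration, not anything from Tanton’s video:

```python
def count_up_subtract(smaller, larger):
    """Compute larger - smaller by counting up from the smaller number,
    pausing at 'nicer' round numbers along the way (87 -> 90 -> 100 -> 103)."""
    total = 0
    current = smaller
    # First hop up to the next multiple of 10 (87 -> 90 is a hop of 3).
    hop = (10 - current % 10) % 10
    total += hop
    current += hop
    # Then jump by tens while we can (90 -> 100 adds 10 more).
    while current + 10 <= larger:
        total += 10
        current += 10
    # Finally, the last small hop (100 -> 103 adds 3 more).
    total += larger - current
    return total

print(count_up_subtract(87, 103))  # prints 16, i.e. 103 - 87
```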

As Tanton says, let’s get rid of laborious, tortured, time-wasting algorithms and examples, but let’s keep the thinking and understanding.

Click on the picture for a larger view; it’s a slightly-modified still from his video.

james tanton on thinking


See Jersey Jazzman use the Gaussian Distribution to Show that Arne Duncan and Mike Petrilli are full of it

Excellent lesson from Jersey Jazzman showing that the old tests produce pretty much the same distribution of scores as the new tests.

old and new tests

He has superimposed the green scores from 2008 on top of the 2014 scores for New York state in 8th-grade reading, and they have almost exactly the same distribution. Furthermore, a scatter plot shows much the same thing: there is a nearly perfect correlation between the old scores and the new scores, school by school.

old and new tests again

Read his article, which is clear and concise. I don’t have time to go into this in depth.
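If you wanted to repeat his comparison with your own state’s published school-level scores, a rough sketch might look like the following. The file name and column names are hypothetical placeholders, not the actual New York data files, and standardizing each year’s scores is simply one reasonable way to put two differently-scaled tests on the same axis:

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, zscore

# Hypothetical file: one row per school, with that school's mean score
# on the old (2008) test and the new (2014) test.
df = pd.read_csv("school_scores.csv")  # columns: school, score_2008, score_2014

# The two tests use different scales, so standardize each year's scores
# before overlaying the distributions (roughly what the superimposed graph shows).
plt.hist(zscore(df["score_2008"]), bins=30, alpha=0.5, label="2008 (old test)")
plt.hist(zscore(df["score_2014"]), bins=30, alpha=0.5, label="2014 (new test)")
plt.xlabel("Standardized mean 8th-grade reading score")
plt.ylabel("Number of schools")
plt.legend()
plt.show()

# School-by-school correlation: a near-perfect r means the new test
# ranks schools almost exactly the way the old test did.
r, _ = pearsonr(df["score_2008"], df["score_2014"])
print(f"Correlation between 2008 and 2014 school means: r = {r:.2f}")
```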

A 3-minute news segment on the NY lawsuit against Value-Added Modeling

An even-handed video from Al-Jazeera interviewing some of the people involved in the lawsuit against Value-Added Model (VAM) evaluations of teachers in New York state.

You may have heard of the lawsuit – it was filed by elementary teacher Sherri Lederman and her lawyer, who is also her husband.

Parents, students, and administrators had nothing but glowing praise for teacher Lederman. In fact, Sherri’s principal is quoted as saying,

“any computer program claiming Lederman ‘ineffective’ is fundamentally flawed.”

Lederman herself states,

“The model doesn’t work. It’s misrepresenting an entire profession.”

Statistician Aaron Pallas of Columbia University states,

“In Sherri’s case, she went from a 14 out of 20, which was fully in the effective range, to 1 out of 20 [the very next year], ineffective, and we look at that and say, ‘How can this be? Can a teacher’s performance really have changed that dramatically from one year to the next?’

“And if the numbers are really jumping around like that, can we really trust that they are telling us something important about a teacher’s career?”

Professor Pallas could perhaps have used one of my graphs as a visual aid to help show just how much those scores jump around from year to year, as you see here. This one shows how raw value-added scores for teachers in New York City in school year 2005-2006 correlated with the scores of those very same teachers, teaching the exact same subjects to the same grade levels in the very same schools, one year later. Gary Rubinstein has similar graphs. You can look here, here, here, or here if you want to see some more from me on this topic.

The plot that follows is a classic case of ‘nearly no correlation whatsoever’ that we now teach to kids in middle school.

In other words, yes, teachers’ scores do indeed jump around like crazy from year to year. If you were above average on VAM one year – that is, anywhere to the right of the Y-axis – it is quite likely that you will end up below the X-axis (and hence below average) the next year. Or not.

I am glad somebody is finally taking this to court, because this sort of mathematics of intimidation has got to stop.

NYC raw value-added scores, SY 2005-06 versus SY 2006-07
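To see why a scatterplot like the one above has essentially no structure, here is a toy simulation of my own (not the actual NYC data). If the stable part of a teacher’s ‘effect’ is small compared with the year-to-year noise in the estimate – the 1-to-3 ratio below is purely an assumption for illustration – then the correlation between one year’s score and the next collapses toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000

# Each simulated teacher gets a small stable "true effect" plus large
# year-specific noise; the 1:3 ratio is an assumption, not an estimate.
true_effect = rng.normal(0.0, 1.0, n_teachers)
score_year1 = true_effect + rng.normal(0.0, 3.0, n_teachers)
score_year2 = true_effect + rng.normal(0.0, 3.0, n_teachers)

r = np.corrcoef(score_year1, score_year2)[0, 1]
print(f"Year-to-year correlation of the estimates: r = {r:.2f}")
# With noise this large, r comes out around 0.1 -- the scores "jump around"
# even though each teacher's underlying effect never changed at all.
```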

Comment by Duane Swacker on Stuart Yeh’s Study, at Diane Ravitch’s Blog

I hope Duane Swacker will not mind my reposting one of his long comments under the recent blog post by Diane Ravitch about Professor Stuart Yeh’s study on the lack of validity of Value-Added Metrics.


To understand the COMPLETE INSANITY that is VAM & SLO/SGP read and understand Noel Wilson’s never refuted nor rebutted destruction of educational standards and standardized testing (of which VAM & SLO/SGP are the bastard stepchildren) in “Educational Standards and the Problem of Error” found at:

Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.

1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.

2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).

3. Wilson identifies four “frames of reference” each with distinct assumptions (epistemological basis) about the assessment process from which the “assessor” views the interactions of the teaching and learning process: the Judge (think college professor who “knows” the students capabilities and grades them accordingly), the General Frame-think standardized testing that claims to have a “scientific” basis, the Specific Frame-think of learning by objective like computer based learning, getting a correct answer before moving on to the next screen, and the Responsive Frame-think of an apprenticeship in a trade or a medical residency program where the learner interacts with the “teacher” with constant feedback. Each category has its own sources of error and more error in the process is caused when the assessor confuses and conflates the categories.

4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”

In other words, all the logical errors involved in the process render any conclusions invalid.

5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.

6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.

7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”

In other words, it attempts to measure “’something’ and we can specify some of the ‘errors’ in that ‘something’ but still don’t know [precisely] what the ‘something’ is.” The whole process harms many students, as the social rewards for some are not available to others who “don’t make the grade (sic)”. Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?
My answer is NO!!!!!

One final note with Wilson channeling Foucault and his concept of subjectivization:

“So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”

In other words students “internalize” what those “marks” (grades/test scores) mean, and since the vast majority of the students have not developed the mental skills to counteract what the “authorities” say, they accept as “natural and normal” that “story/description” of them. Although paradoxical in a sense, the “I’m an “A” student” is almost as harmful as “I’m an ‘F’ student” in hindering students becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.

Important Article Shows that ‘Value-Added’ Measurements are Neither Valid nor Reliable

As you probably know, a handful of agricultural researchers and economists have come up with extremely complicated “Value-Added” Measurement (VAM) systems that purport to be able to grade teachers’ output exactly.

These economists (Hanushek, Chetty and a few others) claim that their formulas are magically mathematically able to single out the contribution of every single teacher to the future test scores and total lifetime earnings of their students 5 to 50 years into the future. I’m not kidding.

Of course, those same economists claim that the teacher is the single most important variable affecting their students’ school and life trajectories – not family background or income, nor peer pressure, nor even whole-school variables. (Many other studies have shown that the effect of any individual teacher, or of all teachers, is pretty small – from 1% to 14% of the entire variation – which corresponds to what I found during my 30 years of teaching: i.e., not nearly as much of an impact as I would have liked [or feared], one way or another.)

Diane Ravitch has brought to my attention an important study by Stuart Yeh at UMinn that (once again) refutes those claims, which are being used right now, in state after state and county after county, to randomly fire large numbers of teachers who have tried to devote their lives to helping students.

According to the study, here are a few of the problems with VAM:

1. As I have shown repeatedly using the New York City value-added scores that were printed in the NYTimes and NYPost, teachers’ VAM scores vary tremendously over time. (More on that below; note that if you went strictly by VAM scores, 80% of ALL teachers should be fired after their first year of teaching.) Plus, RAND researchers found much the same thing in North Carolina. Also see this. And this.

2. Students are not assigned randomly to teachers (I can vouch for that!) or to schools, and there are always a fair number of students for whom no prior or future data is available, because they move to other schools or states, or drop out, or whatever; and those students with missing data are NOT randomly distributed, which pretty much makes the whole VAM setup an exercise in futility.

3. The tests themselves often don’t measure what they are purported to measure. (Complaints about the quality of test items are legion…)

Here is an extensive quote from the article. It’s a section that Ravitch didn’t excerpt, so I will, with a few sentences highlighted by me, since it concurs with what I have repeatedly claimed on my blog:

A largely ignored problem is that true teacher performance, contrary to the main assumption underlying current VAM models, varies over time (Goldhaber & Hansen, 2012). These models assume that each teacher exhibits an underlying trend in performance that can be detected given a sufficient amount of data. The question of stability is not a question about whether average teacher performance rises, declines, or remains flat over time.

The issue that concerns critics of VAM is whether individual teacher performance fluctuates over time in a way that invalidates inferences that an individual teacher is “low-” or “high-” performing.

This distinction is crucial because VAM is increasingly being applied such that individual teachers who are identified as low-performing are to be terminated. From the perspective of individual teachers, it is inappropriate and invalid to fire a teacher whose performance is low this year but high the next year, and it is inappropriate to retain a teacher whose performance is high this year but low next year.

Even if average teacher performance remains stable over time, individual teacher performance may fluctuate wildly from year to year.  (my emphasis – gfb)

While previous studies examined the intertemporal stability of value-added teacher rankings over one-year periods and found that reliability is inadequate for high-stakes decisions, researchers tended to assume that this instability was primarily a function of measurement error and sought ways to reduce this error (Aaronson, Barrow, & Sander, 2007; Ballou, 2005; Koedel & Betts, 2007; McCaffrey, Sass, Lockwood, & Mihaly, 2009).

However, this hypothesis was rejected by Goldhaber and Hansen (2012), who investigated the stability of teacher performance in North Carolina using data spanning 10 years and found that much of a teacher’s true performance varies over time due to unobservable factors such as effort, motivation, and class chemistry that are not easily captured through VAM. This invalidates the assumption of stable teacher performance that is embedded in Hanushek’s (2009b) and Gordon et al.’s (2006) VAM-based policy proposals, as well as VAM models specified by McCaffrey et al. (2009) and Staiger and Rockoff (2010) (see Goldhaber & Hansen, 2012, p. 15).

The implication is that standard estimates of impact when using VAM to identify and replace low-performing teachers are significantly inflated (see Goldhaber & Hansen, 2012, p. 31).

As you also probably know, the four main ‘tools’ of the billionaire-led educational DEform movement are:

* firing lots of teachers

* breaking their unions

* closing public schools and turning education over to the private sector

* changing education into tests to prepare for tests that get the kids ready for tests that are preparation for the real tests

They’ve been doing this for almost a decade now under No Child Left Untested and Race to the Trough, and none of these ‘reforms’ has been shown to make any actual improvement in the overall education of our youth.

