Monday, January 21, 2008

New Hampshire Recount: Statistical oddity

The New Hampshire primary vote recount is generating some interesting and unexpected results. The county that has been recounted so far shows wide differences between the vote and recount only in three precincts, in which there clearly were either malfunctions or manipulations of the voting machines. Overall, the recount does not affect the results on the Democratic side.

However, the distribution of differences between vote and recount (the errors) by precinct size is extremely interesting. As expected, the errors are smallest for the smallest precincts. If there are fewer votes to count, counting errors will of course be smaller. The spike in errors, however, is clearly in the mid-range of precinct sizes. The largest precincts, where we would expect to see the largest errors, actually have smaller errors.

In fact, it looks like there is an "expected" distribution of errors near the bottom of the graph, that slowly increases from small to large precincts. But in the region between 1000 and 2500 total votes (x axis), there seems to be a "hill" of unexpected results above the expected line (and the three outliers further above). This is a very unexpected pattern, and I am really not sure how to explain it right now.
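For anyone who wants to check the "hill" without eyeballing the graph, here is a minimal sketch in Python of one way to do it: group the precincts into bins of total votes and average the absolute vote-versus-recount difference in each bin. The function name, the tuple layout and the bin width are all assumptions made for illustration, not the format of the official recount files.

```python
from collections import defaultdict


def mean_abs_error_by_size(precincts, bin_width=500):
    """Average the absolute vote-vs-recount difference per precinct-size bin.

    precincts: iterable of (total_votes, original_count, recount) tuples
               for one candidate (a hypothetical layout).
    Returns {bin_start: mean absolute difference}, ordered by bin_start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for total_votes, original, recount in precincts:
        # A precinct with 1,234 total votes lands in the bin starting at 1,000
        # when bin_width is 500.
        bin_start = (total_votes // bin_width) * bin_width
        sums[bin_start] += abs(original - recount)
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

If the pattern in the graph is real, the bins between 1000 and 2500 total votes should show larger averages than the bins on either side of them.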

Wednesday, January 16, 2008

The differences between the machine and hand counted votes in New Hampshire were real and significant

While I continue to work on the possible socio-economic differences between similar sized precincts that use different voting methods, I can confirm (thanks to a friend who ran the tests using R) that the differences between the machine counted and hand counted vote totals in New Hampshire's primary are real and significant. When regressed against precinct vote total and vote counting method, the percent differences between Obama and Clinton in the 2008 New Hampshire Primary are statistically significant for counting method.

The P value for vote total (a good proxy for precinct size) is 0.88 (not significant), while the value for voting method is 2.4 E-6 (very significant). In other words, if there were no relationship between the Obama-Clinton percent difference in votes and the vote counting method, a result this strong would turn up by chance only about 0.00024% of the time (a probability of 0.0000024). So yes, the difference is real. Now there remains the question of whether it can be explained by differences between similar sized precincts that use different vote counting methods. I haven't seen anything so far that points in that direction, but I am still looking.
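The regression itself was run in R by my friend and I am not reproducing it here. As a rough cross-check that needs nothing beyond the Python standard library, a permutation test asks a simpler question: if the hand/machine labels were shuffled at random among the reporting units, how often would the gap in the average Obama-Clinton percent difference be at least as large as the one observed? This is a different technique from the regression, and unlike the regression it does not control for vote total. The function and argument names below are hypothetical.

```python
import random


def permutation_p_value(hand, machine, n_shuffles=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    hand, machine: lists of Obama-minus-Clinton percent differences, one
                   value per reporting unit, split by counting method.
    Returns the estimated probability of a gap at least as large as the
    observed one when the method labels are assigned at random.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(hand) - mean(machine))
    pooled = list(hand) + list(machine)
    n_hand = len(hand)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_hand]) - mean(pooled[n_hand:]))
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

A small value here would point the same way as the regression, but only the regression separates the effect of counting method from the effect of precinct size.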

Monday, January 14, 2008

Are socio-economic differences between hand counted and machine counted precincts responsible for the New Hampshire discrepancies? A preliminary look


This guy is coming up with some results that look very much like mine (see previous posts). Note that in his analysis, Obama, Clinton and Edwards are the only ones to be affected by the machine vs hand count. That sort of flies in the face of expectations. If there are really fundamental differences in voting behaviour between similar sized reporting units, they should affect all candidates. Typically, the other candidates combined have fewer than 100 votes in even the large reporting units. Because of the much smaller sample, we should expect more variance in their vote, not less. Chance variation should have a larger effect on them than on Clinton, Obama, or Edwards.

I've started doing a little exploratory stuff on the socio-economic factors. I won't have time for anything systematic for a while. But for now, it looks like for a group of 10 similar sized reporting units around the smallest machine counted unit (in terms of vote total), Clinton won the machine counted one and 3 hand counted ones, while Obama won the remaining 6 hand counted ones.

Here are the variables I have so far for the units. The machine counted unit (won by Clinton) is first, and the nine hand counted ones follow, each marked with its winner:

PopDens; Per Capita Income; Median Household Income; (Winner)
26,9; 16944; 38654; (Clinton, Machine)
55,7; 28503; 60433; (Clinton)
25,9; 23263; 44659; (Obama)
19,9; 19617; 48125; (Obama)
20,1; 19973; 35556; (Obama)
35,4; 17089; 36000; (Clinton)
130,1; 19675; 46150; (Obama)
10,1; 17998; 28523; (Clinton)
34,3; 17169; 38125; (Obama)
32,2; 23112; 55000; (Obama)

The machine counted Clinton unit is only remarkable for having the lowest Per Capita Income (not by much), but it has a relatively healthy Household Income. Both Obama and Clinton won in high and low Population Density areas, and in high and low Income areas. Clinton won a very low Density/low Income area as well as the highest Income area (with the second highest Density). If these rough numbers are any indication, finding clear demographic differences between the machine and hand counted units of similar size will not be easy.
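To make the comparison a bit less impressionistic, here is a minimal sketch in Python that summarizes each variable by winner (minimum, median, maximum). The ten rows are copied from the table above, with the decimal commas in the density column read as decimal points, which is an assumption about the notation. The names `UNITS` and `summarize` are made up for this example.

```python
from statistics import median

# (population density, per capita income, median household income, winner)
# Values copied from the table above; decimal commas read as decimal points.
UNITS = [
    (26.9, 16944, 38654, "Clinton"),  # the machine counted unit
    (55.7, 28503, 60433, "Clinton"),
    (25.9, 23263, 44659, "Obama"),
    (19.9, 19617, 48125, "Obama"),
    (20.1, 19973, 35556, "Obama"),
    (35.4, 17089, 36000, "Clinton"),
    (130.1, 19675, 46150, "Obama"),
    (10.1, 17998, 28523, "Clinton"),
    (34.3, 17169, 38125, "Obama"),
    (32.2, 23112, 55000, "Obama"),
]


def summarize(units, column):
    """Return {winner: (min, median, max)} for one numeric column index."""
    by_winner = {}
    for row in units:
        by_winner.setdefault(row[3], []).append(row[column])
    return {w: (min(v), median(v), max(v)) for w, v in by_winner.items()}


for name, col in [("PopDens", 0), ("Per Capita Income", 1),
                  ("Median Household Income", 2)]:
    print(name, summarize(UNITS, col))
```

On these ten rows the Obama range for both income variables sits entirely inside the Clinton range, and the density ranges overlap as well, which is another way of saying that nothing obvious separates the two groups.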

Saturday, January 12, 2008

The results are in

Using the results for every county, it becomes clear that Obama was robbed. First, the numbers, then the graph. If we consider each reporting town or precinct as a contest between Obama and Clinton, Obama won 62% of the contests when the votes were hand-counted, and only 35% of the contests when the votes were machine-counted.

Larger towns or reporting units tend to machine count their votes. The smallest ones tend to hand count. There is a middle range of reporting unit size in which both methods are used. If we use only the reporting units larger than the smallest machine-counter and smaller than the largest hand-counter (in other words, the mid-range in which both methods are used), we find that Obama won 63% of contests if votes were hand-counted, and only 39% if votes were machine-counted.
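The percentages above come from a very simple tally, and it is easy to redo. Here is a minimal sketch in Python, assuming each reporting unit is a dict with the hypothetical keys `total`, `method`, `obama` and `clinton`. The mid-range is taken exactly as described: strictly larger than the smallest machine-counter and strictly smaller than the largest hand-counter. Skipping ties is my own assumption.

```python
def overlap_range(units):
    """Return (smallest machine-counted total, largest hand-counted total)."""
    smallest_machine = min(u["total"] for u in units if u["method"] == "machine")
    largest_hand = max(u["total"] for u in units if u["method"] == "hand")
    return smallest_machine, largest_hand


def obama_win_share(units, method, lo=None, hi=None):
    """Share of Obama-vs-Clinton contests won by Obama for one counting method.

    Only units with lo < total < hi are used when bounds are given.
    Tied units are skipped. Returns None if no contest qualifies.
    """
    wins = contests = 0
    for u in units:
        if u["method"] != method:
            continue
        if lo is not None and u["total"] <= lo:
            continue
        if hi is not None and u["total"] >= hi:
            continue
        if u["obama"] == u["clinton"]:
            continue
        contests += 1
        if u["obama"] > u["clinton"]:
            wins += 1
    return wins / contests if contests else None
```

Calling `obama_win_share(units, "hand")` and `obama_win_share(units, "machine")` gives the all-units comparison. Passing `lo, hi = overlap_range(units)` as the bounds restricts it to the mid-range in which both methods are used.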

In the above graph, the data points below the mid-line show contests won by Obama, while the data points above the line show contests won by Clinton. The blue circles are hand-counted totals, and the red crosses are machine-counted totals. As the numbers above suggest, there is more blue below the line, and more red above, especially in the mid-range of vote totals.

This analysis eliminates the size of the reporting unit as a factor in the vote totals difference. The main factor seems to be the method of counting votes. Hand counting favours Obama, machine counting favours Clinton. It is quite clear. It isn't that people in different sorts of places vote differently. It is that people in different places use different methods of vote counting.

I have a few more ideas to tease out the method they used in greater detail. I'll keep you posted.

Friday, January 11, 2008

Quick New Hampshire machine vote fraud update

I see that the vote totals for the missing counties have now been posted on

While I was waiting for them, I added some linear regression lines to the original graphs and came up with some very interesting results.

The first thing to notice is that for Obama, the hand counted totals regression is right on the expected line. This means that he scored exactly as the polls predicted when votes were hand-counted. The machine-counted totals regression has a very different slope, because he got many fewer votes than expected in the small and medium towns, but scored according to expectations in the larger ones. This is pretty unimpeachable if it holds up when I add the missing data.
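For readers who want to reproduce the regression lines, here is a minimal sketch in Python of an ordinary least squares fit of actual votes against expected votes, done separately for each counting method. I drew the lines in my graphing tool, so this is an illustration of the idea rather than the exact procedure I used, and the row layout is an assumption. A fit that is "right on the expected line" should come out with a slope close to 1 and an intercept close to 0.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def lines_by_method(rows):
    """Fit one line per counting method.

    rows: iterable of (expected_votes, actual_votes, method) tuples for a
          single candidate (a hypothetical layout).
    Returns {method: (slope, intercept)}.
    """
    groups = {}
    for expected, actual, method in rows:
        xs, ys = groups.setdefault(method, ([], []))
        xs.append(expected)
        ys.append(actual)
    return {m: fit_line(xs, ys) for m, (xs, ys) in groups.items()}
```

A slope that differs between the two methods corresponds to the Obama pattern described above, while a similar slope with a higher intercept corresponds to a line that is simply shifted up.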

Not surprisingly, both regressions for Clinton are above the line, and the machine-counted totals regression runs almost exactly parallel to the expected line, shifted upward. She beat the polls consistently, independent of voting method. The straight shifting up of the machine-counted totals is odd, but not terribly worrisome.

Again, the pattern for Edwards is a mirror image of Obama's. The slopes of the regressions differ per voting method, and the machine-counted totals for small towns seem elevated.

Now, I'll go plug in the new data.

Thursday, January 10, 2008

New Hampshire 2008 Primary fraud: Subtle

There are still four counties that have not posted detailed results for the Democratic candidates. But going on currently available data, it looks like there is a difference between hand counted (blue circles) and machine counted (red crosses) vote totals for Obama, in small and medium towns.

In the above graph, the diagonal represents Obama's expected vote totals, on the basis of the infamous January 6th CNN poll. For the bottom portion of the line, the small and medium sized towns, the hand counted vote totals are distributed around the line, above and below. That's normal. He's above expectations in some places, and below in others. For the same portion of the line, his machine counted vote totals are systematically below the line. This is not normal. It means that his hand counted votes and his machine counted votes are not behaving the same way for same-sized locations.
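The comparison against the diagonal boils down to simple arithmetic: a candidate's expected total in a town is his poll share times the total votes cast there, and the residual is the actual total minus the expected one. Here is a minimal sketch in Python under that assumption. `poll_share` is a parameter to be filled in from the poll, and the function names and tuple layout are hypothetical.

```python
def residuals_vs_poll(units, poll_share):
    """Actual minus expected votes for one candidate, grouped by method.

    units: iterable of (total_votes, candidate_votes, method) tuples.
    poll_share: the candidate's share in the poll, as a fraction (0 to 1).
    Returns {method: [residual, ...]} in input order.
    """
    out = {}
    for total_votes, candidate_votes, method in units:
        expected = poll_share * total_votes
        out.setdefault(method, []).append(candidate_votes - expected)
    return out


def share_below_line(residuals):
    """Fraction of units in which the candidate fell short of the poll."""
    return sum(1 for r in residuals if r < 0) / len(residuals)
```

If the residuals scatter around the line, `share_below_line` should be somewhere near one half. A value close to 1 for one counting method and not the other is the "systematically below the line" pattern.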

In larger towns, at the top part of the line, the distribution of machine counted vote around the expected line becomes fairly normal again. Some vote totals are above expectations, some are below. This is important. It means that a test of statistical significance is not likely to detect the deviation from normality in the smaller locations. The larger locations, with their higher vote counts and normal distribution will overwhelm the smaller locations. In other words, most tests of significance will be blind to the fraud.

Ok, so Obama lost some machine counted votes. But a fraud in which the combined vote totals don't match the overall number of votes cast is not a very subtle fraud. Where are the missing votes? Giving them to the winner is equally unsophisticated. In fact, it is clear that Clinton really did beat the polls. There is no real difference between her hand counted and her machine counted votes.

If we look at Edwards' vote totals, however, we see a mirror image of the Obama pattern. In small and mid-sized towns, his machine counted vote totals are systematically above the expected line, while they abruptly go back to it for the larger locations.

Giving the votes to the third place guy who has no hope of winning is clever. Subtle. In fact, that is exactly what I would do. I'm devious that way. Here's another devious thing I would do. I would create a big stinky red herring to cover the subtle scent of my operation. For example, I might start loudly claiming that tens of thousands of votes were taken away from Ron Paul. I would make the story easily debunkable, and I would make sure that it immediately associates any talk of fraud in New Hampshire with the fringe, kooky element.

A subtle fraud, involving just enough votes to generate the desired outcome, a pattern that flies below the statistical significance radar, and a big stinky red herring. A job well done.