Introduction

Chi-square has been described by the statistician Michael Crawley as something taught to geographers at school and misunderstood thereafter! It’s a mischievous comment, and a shame if true, because despite its off-putting calculations there is nothing particularly complicated about chi-square.

It’s just a way of asking if two ‘things’ are related to one another or not, and assessing the statistical evidence for it.

This tutorial:

Which types of place voted for Brexit?

The data

There are 326 local authorities (LAs) in England and these can be grouped by how urban or rural they are using the Department for Environment, Food and Rural Affairs’ Rural Urban Classification.

The results are shown in Table 1 where the types of area are sorted from the most urban (which are the 75 LAs described as A: Urban with Major Conurbation) to the most rural (the 50 described as F: Mainly Rural).

Table 1. Types of local authority in England
Area type Number of LAs
A: Urban with Major Conurbation 75
B: Urban with Minor Conurbation 9
C: Urban with City and Town 97
D: Urban with Significant Rural 54
E: Largely Rural 41
F: Mainly Rural 50
SUM 326

Table 2 provides some additional information. For each LA it is known from the EU Referendum results whether the majority of those who voted were in favour of leaving the EU or staying. If we describe LAs where the Leavers were in a majority as pro-Brexit places, and LAs where Remainers were in the majority as anti-Brexit, then the number of pro- and anti-Brexit places can be counted for each area type.

The counts are shown in Table 2. For example, of the 75 areas described as A: Urban with Major Conurbation, 36 had a majority in favour of Brexit and 39 were against.

Table 2 is sometimes known as a contingency table.

Table 2. The observed number of LAs in each group that had a majority voting for or against Brexit
Area type pro-Brexit anti-Brexit SUM
A: Urban with Major Conurbation 36 39 75
B: Urban with Minor Conurbation 9 0 9
C: Urban with City and Town 81 16 97
D: Urban with Significant Rural 43 11 54
E: Largely Rural 35 6 41
F: Mainly Rural 42 8 50
SUM 246 80 326

The chi-square test

We can use the counts in Table 2 to answer a question, which is

  • Is there a relationship between the type of local authority and whether its voters formed a majority in favour of or against Brexit?

What we have are two variables, both of which are categorical. The first variable is the LA type. The second is whether the LA had a majority for Remain or for Leave. What we want to know is whether those two variables are related in some way.

We assess the relationship by comparing the numbers shown in Table 2 (the observed values) with what those numbers would look like if there was no relationship (the expected values). Essentially, all the chi-square test does is offer a way of summarising those differences.

The greater the differences, the greater the chi-square value and the more we can say the outcome is not what would be expected if there was no relationship.

Comparing what is observed with what could be expected

To repeat, the basis for the chi-square test is to compare the observed counts in Table 2 (what actually happened) with what would be expected under a hypothesis of no relationship.

So how do we calculate the expected values? Well, we can see from Table 2 that 246 of the 326 LAs had a majority in favour of Brexit. As a percentage that is 75.5. Meanwhile, 80 of the same 326 had a majority against Brexit, which is 24.5 per cent. That is the information we need to calculate the expected values.

Imagine people living in the A: Urban with Major Conurbation areas were, on average, no more or less likely to vote for Brexit than those in the B: Urban with Minor Conurbation areas, who were no more or less likely than those in C: Urban with City and Town areas, and so forth across all the area types. If that were the case, then we should expect 75.5 per cent of LAs in each of the area types to be pro-Brexit. For the A: Urban with Major Conurbation group, the expected number is then 75.5 per cent of the 75 LAs that are in the group, which gives 56.6. For the B: Urban with Minor Conurbation group the expected number is 75.5 per cent of the 9, which is 6.8; and for the C: Urban with City and Town group the expected number is 75.5 per cent of 97, which is 73.2. The same logic can be applied to each of the other area types in turn.

Similarly, we should expect 24.5 per cent of LAs in each area type to be anti-Brexit. For the A: Urban with Major Conurbation group, the expected number is then 24.5 per cent of the 75 LAs that are in that group, which gives 18.4. For the B: Urban with Minor Conurbation group it is 24.5 per cent of the 9, which is 2.2, and so forth.

Table 3 applies the thinking to each of the area types and displays the results. In each case, the expected number is either 75.5 per cent (pro-Brexit) or 24.5 per cent (anti-Brexit) of the total number of LAs in each group.

Table 3. The expected number of LAs in each group to have a majority voting for or against Brexit
Area type pro-Brexit anti-Brexit SUM
A: Urban with Major Conurbation 56.6 18.4 75
B: Urban with Minor Conurbation 6.8 2.2 9
C: Urban with City and Town 73.2 23.8 97
D: Urban with Significant Rural 40.8 13.2 54
E: Largely Rural 31.0 10.0 41
F: Mainly Rural 37.8 12.2 50
SUM 246.0 80.0 326
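The expected values can also be reproduced with a few lines of code. The sketch below is a minimal Python illustration, not part of the original tutorial; the names `observed` and `expected_counts` are our own. It uses the standard shortcut that each expected value equals the row total multiplied by the column total, divided by the overall total (for example, 75 × 246 / 326 = 56.6), which is the same as taking 75.5 or 24.5 per cent of each row total. Because it does not round the percentages first, a few results differ from Table 3 in the first decimal place (for D: Urban with Significant Rural it gives 40.7 and 13.3 rather than 40.8 and 13.2).

```python
# Observed counts from Table 2: (pro-Brexit, anti-Brexit) for each area type
observed = {
    "A: Urban with Major Conurbation": (36, 39),
    "B: Urban with Minor Conurbation": (9, 0),
    "C: Urban with City and Town": (81, 16),
    "D: Urban with Significant Rural": (43, 11),
    "E: Largely Rural": (35, 6),
    "F: Mainly Rural": (42, 8),
}


def expected_counts(table):
    """Return the counts expected if there were no relationship.

    Each expected value is (row total x column total) / overall total.
    """
    rows = list(table.values())
    col_totals = [sum(row[j] for row in rows) for j in range(len(rows[0]))]
    grand_total = sum(col_totals)
    return {
        name: tuple(sum(row) * col / grand_total for col in col_totals)
        for name, row in table.items()
    }


expected = expected_counts(observed)
for name, (pro, anti) in expected.items():
    print(f"{name:<32} {pro:5.1f} {anti:5.1f}")
```

Running it prints one row per area type, in the same layout as Table 3.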

Calculating the chi-square value

Recall that the basis of the chi-square test is to subtract the expected values from the observed values and summarise the differences. It has some extra stages (shown in the formula below) but it is the differences that are key.

The logic of all this becomes clearer if we turn it on its head and consider a situation of there being no differences between the observed and the expected values. Such a circumstance would mean that the percentage of pro-Brexit LAs is 75.5 for every area type, and the percentage of anti-Brexit LAs is 24.5. If that were true, then knowing the area type of each LA would provide no useful knowledge for predicting the referendum outcome because there is no variation between those types. If there’s no variation then there’s no relationship between the type of local authority and whether its voters formed a majority in favour of or against Brexit.

There clearly are differences between the observed values shown in Table 2 and the expected values in Table 3. But it would be surprising if those values were exactly the same. The statistical question is whether those differences really matter - are they substantial enough to suggest that there is some sort of relationship between the area type and the referendum result? Is there sufficient evidence to say that the relationship holds?

To determine this, each difference between an observed and expected value is calculated, squared, and divided by the expected value. Those values are then added together. We don’t really need to know why they are squared and divided before being summed, other than to say that squaring stops the negative differences (which arise when the observed value is less than the expected one) from cancelling out the positive ones, dividing by the expected value puts each difference in proportion to the count that was expected, and the end result is a number that can be tested for how unusual it is.

Written as a formula:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

where \(\chi\) is the Greek letter chi, O is an observed value, E is the corresponding expected value, and \(\sum\) means to calculate the sum over every cell of the table (all six rows, in both the pro- and anti-Brexit columns).

Applied to the referendum data, the calculation can be taken in stages. First, the values of \((O - E)\) are shown in Table 4.

Table 4. The differences between the observed and expected values
Area type pro-Brexit anti-Brexit
A: Urban with Major Conurbation -20.6 20.6
B: Urban with Minor Conurbation 2.2 -2.2
C: Urban with City and Town 7.8 -7.8
D: Urban with Significant Rural 2.2 -2.2
E: Largely Rural 4.0 -4.0
F: Mainly Rural 4.2 -4.2

Second, the answers in Table 4 are squared (i.e. multiplied by themselves) and then divided by the expected values shown in Table 3. This gives the values of \(\frac{(O - E)^2}{E}\), which are shown in Table 5.

Table 5. The differences between the observed and expected values, squared and divided by the expected values
Area type pro-Brexit anti-Brexit
A: Urban with Major Conurbation 7.50 23.06
B: Urban with Minor Conurbation 0.71 2.20
C: Urban with City and Town 0.83 2.56
D: Urban with Significant Rural 0.12 0.37
E: Largely Rural 0.52 1.60
F: Mainly Rural 0.47 1.45

Finally, the sum of the values in Table 5 is calculated, which gives \(\sum \frac{(O - E)^2}{E}\) and the chi-square value, \(\chi^2\) = 41.4.
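The three stages above can be sketched in Python as follows. This is an illustrative sketch only, assuming the observed counts are entered from Table 2 in the order A to F; the variable names are our own. The expected values are calculated without rounding, so the result is about 41.5 rather than 41.4: the small difference comes from the rounding in Tables 3 to 5, not from a different method.

```python
# Observed counts (Table 2), rows A to F, columns (pro-Brexit, anti-Brexit)
observed = [(36, 39), (9, 0), (81, 16), (43, 11), (35, 6), (42, 8)]

# Expected counts if there were no relationship: row total x column total / n
col_totals = [sum(col) for col in zip(*observed)]
n = sum(col_totals)
expected = [[sum(row) * c / n for c in col_totals] for row in observed]

# Stage 1 (Table 4): O - E for every cell
differences = [[o - e for o, e in zip(obs_row, exp_row)]
               for obs_row, exp_row in zip(observed, expected)]

# Stage 2 (Table 5): (O - E)^2 / E for every cell
contributions = [[d ** 2 / e for d, e in zip(diff_row, exp_row)]
                 for diff_row, exp_row in zip(differences, expected)]

# Stage 3: add up the contributions across all rows and both columns
chi_square = sum(sum(row) for row in contributions)
print(f"chi-square = {chi_square:.2f}")  # prints chi-square = 41.51
```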

So we have an answer, so what?

You may be unimpressed to learn that the chi-square value is 41.4. I’m not surprised; so am I. The problem with the chi-square value is that it isn’t very useful, not in itself. To say it is equal to some value is fairly meaningless: it has no immediate interpretation.

Looking at statistical significance and the p-value

One way to proceed is to note that the chi-square value is a test statistic. Test statistics are used to assess statistical significance.

You can undertake this assessment by using Excel - open a blank worksheet and enter into one of the cells the following,

= CHIDIST(41.4,5)

where 41.4 is the chi-square value and 5 is a number known as the degrees of freedom (df). You will obtain what is known as a p-value, which in this case is tiny, p = 0.00000008 or thereabouts.
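For readers working outside Excel, the sketch below computes the same kind of p-value as `CHIDIST` using only Python’s standard library. The helper name `chi2_sf` is our own; it implements the standard closed-form expression for the upper tail of the chi-square distribution, which holds when the degrees of freedom are a whole number, as they are here. In practice a statistics library would normally be used instead, for example `scipy.stats.chi2.sf(41.4, 5)` in SciPy.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square distribution with a
    whole number of degrees of freedom (the quantity Excel's CHIDIST returns)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for j in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total


p_value = chi2_sf(41.4, 5)  # chi-square value 41.4 with 5 degrees of freedom
print(f"p = {p_value:.8f}")  # prints p = 0.00000008
```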

What are the degrees of freedom?

The degrees of freedom is a statistical concept. Here it is the number of values in Table 2 that could (in principle) be changed whilst holding its row and column sums constant. For the six area types shown, any five of the observed values in either the pro- or the anti-Brexit column could be changed, but all the remaining values are then determined by the need for everything to add up to the row and column totals.

[If you don’t believe me, try changing five of the values in one column of Table 2 and note how the rest need updating to keep the row and column SUMs the same.]
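The point can also be checked with a short sketch. The hypothetical helper `complete_table` below takes five freely chosen pro-Brexit counts, one for each of the first five area types, and fills in the rest of the table from the fixed totals in Tables 1 and 2. The alternative counts in the second call are made up purely for illustration, and the sketch does not check that the values it fills in are non-negative or within the row totals.

```python
# Fixed totals: LAs per area type (Table 1) and pro-Brexit LAs overall (Table 2)
row_totals = [75, 9, 97, 54, 41, 50]
pro_total = 246


def complete_table(first_five_pro):
    """Given freely chosen pro-Brexit counts for area types A to E, return the
    only (pro, anti) table consistent with the fixed row and column totals."""
    last_pro = pro_total - sum(first_five_pro)        # forced by the column total
    pro = list(first_five_pro) + [last_pro]
    anti = [n - p for n, p in zip(row_totals, pro)]  # forced by each row total
    return list(zip(pro, anti))


# The observed five values recreate Table 2 exactly...
print(complete_table([36, 9, 81, 43, 35]))
# ...and changing them (hypothetical values) forces every other cell to change
print(complete_table([40, 5, 80, 40, 35]))
```

Five values go in, twelve come out: that is what having 5 degrees of freedom means for this table.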

More simply, for the chi-square test,

\[ df = (\text{number of columns} - 1) \times (\text{number of rows} - 1) \] which, for our example, is \(df = (2 - 1) \times (6 - 1) = 5\)

What is the p-value?

The p-value is the measure used to judge statistical significance. If it is less than 0.05, the relationship is said to be statistically significant at the 95% confidence level; if it is less than 0.01, it is said to be significant at the 99% confidence level. The 95% and 99% thresholds are the ones most commonly used in statistical work.

For our data the p-value is much less than 0.01, so the relationship is statistically significant at the 99 per cent threshold and above. What this means is that there are statistically significant differences between the area types in terms of whether they had a majority for or against Brexit.

Another way of reaching the same conclusion is to look up, in a statistical table, the critical value that the chi-square value must exceed to be considered statistically significant at a given level of confidence with 5 degrees of freedom.

For 95 per cent confidence the critical value is 11.1, and for 99 per cent confidence it is 15.1. Recall that for our data \(\chi^2\) = 41.4, which is greater than both of those critical values and therefore statistically significant at either level of confidence.

If you don’t have a statistical table, these values can also be obtained in Excel. For the 95 per cent confidence threshold, click on an empty cell and type

= CHIINV(0.05,5) [where 5 is the degrees of freedom]

and for the 99 per cent threshold use

= CHIINV(0.01,5)

You can use p-values or critical values to determine statistical significance; it amounts to the same thing.
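The equivalence of the two routes can be illustrated in Python. In this minimal sketch, `chi2_sf` is our own standard-library helper for the upper-tail probability (the quantity `CHIDIST` returns), and `chi2_critical` (also our own name) plays the role of `CHIINV` by searching, with simple bisection, for the value whose upper-tail probability equals the chosen significance level. SciPy users would more usually call `scipy.stats.chi2.isf(0.05, 5)` instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a whole number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for j in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi2_critical(alpha, df, tol=1e-10):
    """Value a chi-square statistic must exceed to be significant at level
    alpha, found by bisection on the upper-tail probability."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail is small enough
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


chi_square, df = 41.4, 5
for alpha in (0.05, 0.01):
    critical = chi2_critical(alpha, df)
    by_critical_value = chi_square > critical
    by_p_value = chi2_sf(chi_square, df) < alpha
    # Both routes reach the same verdict because the upper-tail
    # probability falls as the chi-square value rises.
    print(f"alpha = {alpha}: critical value = {critical:.1f}, "
          f"significant by critical value: {by_critical_value}, by p-value: {by_p_value}")
```

It reports critical values of 11.1 and 15.1, matching the statistical table, and both routes judge the result significant at either level.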

Doing it all in Excel

If you wish, you can undertake the entire process in Excel: open a blank worksheet and, under columns A and B, rows 1 to 6, enter the observed values from Table 2 (do not include their sums). Next to them, in columns C and D, enter the expected values from Table 3 (again excluding their sums). Then click on cell E1 and enter the following,

= CHITEST(A1:B6,C1:D6)

and in E2,

= CHIINV(E1,5)

The value in E2 is the chi-square value and the value in E1 is the p-value.
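An equivalent of the whole Excel process can be sketched in Python. The function `chi_square_test` below is our own illustration, not a library API: it works out the expected values from the observed counts alone, so only Table 2 needs to be entered, and it takes the degrees of freedom from the shape of the table. The helper `chi2_sf` uses the standard closed-form upper-tail probability for whole-number degrees of freedom. With SciPy installed, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts in a single call.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (as Excel's CHIDIST)."""
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for j in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi_square_test(observed):
    """Return (chi-square, degrees of freedom, p-value) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Sum of (O - E)^2 / E, with E = row total x column total / n
    chi_square = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi_square, df, chi2_sf(chi_square, df)


table_2 = [(36, 39), (9, 0), (81, 16), (43, 11), (35, 6), (42, 8)]
chi_square, df, p = chi_square_test(table_2)
print(f"chi-square = {chi_square:.1f}, df = {df}, p = {p:.1e}")
```

Because no rounding takes place along the way, it reports a chi-square value of about 41.5 rather than 41.4, with 5 degrees of freedom and a p-value well below 0.01.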

So the result is statistically significant, so what?

Sometimes the p-value is described as the probability that the result (in this example, the differences between the area types) arose due to chance. This is an intuitive way of understanding it and as such quite helpful but it’s not really correct.

The p is indeed short for probability, but the idea that the result could have arisen due to chance assumes that what we have is a random sample from a much larger and, consequently, unmeasurable ‘population’. Think of pebbles on a beach: it would be impossible to measure the size of every pebble, so instead we take a sample of them to see how, for example, they vary in size with distance from the waterline. If there were really no differences between the area types (or the pebbles), then the p-value is the probability of observing differences at least as large as those in our data simply as a consequence of bad luck, of the places we happened to include in the analysis.

The problem with this logic is that the data we are using are not a random sample. They are the actual Brexit results for all LAs in England.1

Furthermore, the p-value doesn’t actually tell us what we want to know, which is how much of an association there is between the types of LA and whether they had a majority in favour of leaving or staying in the EU. All the p-value says is that the relationship is ‘statistically significant’, not how great it is. To measure the strength of the association - the correlation - between the area types and the voting outcomes, we can apply the formula,

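\[ r = \sqrt{\frac{\chi^2}{n}} \] where n is the total number of LAs. For our data this is the square root of 41.4 divided by 326, which gives r = 0.36. This measure is known as the phi coefficient; because one of the two variables has only two categories, it is the same as Cramér’s V.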

As a rule of thumb, a correlation of 0.10 is considered to be small, a correlation of 0.30 is medium, and a correlation from 0.50 to 1 is large. The effect of area type on the voting outcome is not large but it is of a medium size and therefore of interest.
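The calculation and the rule of thumb can be sketched together. In this illustration `describe` is a hypothetical helper that applies the thresholds given in the text, reading each one as the start of a range (so anything from 0.30 up to 0.50 counts as medium); the label for values below 0.10 is our own addition, since the rule of thumb does not name that range.

```python
import math

chi_square = 41.4  # chi-square value from the worked example
n = 326            # total number of local authorities in the table

# Strength of association: r = sqrt(chi-square / n)
r = math.sqrt(chi_square / n)


def describe(r):
    """Label r using the rule of thumb in the text: 0.10 small, 0.30 medium,
    0.50 and above large. The label for values under 0.10 is our own."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "below small"


print(f"r = {r:.2f} ({describe(r)})")  # prints r = 0.36 (medium)
```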

Alternative approaches (keeping it simple)

Although the chi-square test can be useful, there’s little sense in using it just for the sake of doing so when simpler approaches can be more effective in project work, for example.

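Recall that the question we are interested in here is,

  • Is there a relationship between the type of local authority and whether its voters formed a majority in favour of or against Brexit?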

We don’t need a statistical test to answer that. Instead we could just convert Table 2 into percentages and then plot the results.

Table 6. The percentage of LAs with a majority of voters in favour and against Brexit in each area type
Area type pro-Brexit anti-Brexit
A: Urban with Major Conurbation 48.0 52.0
B: Urban with Minor Conurbation 100.0 0.0
C: Urban with City and Town 83.5 16.5
D: Urban with Significant Rural 79.6 20.4
E: Largely Rural 85.4 14.6
F: Mainly Rural 84.0 16.0
OVERALL 75.5 24.5
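Table 6 is equally quick to produce in code. The sketch below, an illustration rather than the original workflow, divides each count in Table 2 by its row total and adds an overall row calculated from the column totals.

```python
# Observed counts from Table 2: (pro-Brexit, anti-Brexit)
observed = {
    "A: Urban with Major Conurbation": (36, 39),
    "B: Urban with Minor Conurbation": (9, 0),
    "C: Urban with City and Town": (81, 16),
    "D: Urban with Significant Rural": (43, 11),
    "E: Largely Rural": (35, 6),
    "F: Mainly Rural": (42, 8),
}

# Row percentages: each count as a share of the LAs in that area type
percentages = {
    name: (100 * pro / (pro + anti), 100 * anti / (pro + anti))
    for name, (pro, anti) in observed.items()
}

# Overall percentages from the column totals
total_pro = sum(pro for pro, _ in observed.values())
total = sum(pro + anti for pro, anti in observed.values())
percentages["OVERALL"] = (100 * total_pro / total, 100 * (total - total_pro) / total)

for name, (pro, anti) in percentages.items():
    print(f"{name:<32} {pro:5.1f} {anti:5.1f}")
```

Plotting these percentages as a bar chart, with one pair of bars per area type, gives a chart like Figure 1.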

Figure 1. Percentage of LAs per area type that voted Leave and Remain.

It’s not necessary to look at Figure 1 for long to appreciate that the Type A group, the urban major conurbations, is different from the rest, being the only one where more of the local authorities were pro-Remain than pro-Leave.2

Not only are the percentages and the chart easier to produce and easier to interpret, but they also show us that in this example the conclusion drawn from the chi-square test was, in fact, misleading.

In fact, for most of the area types the differences in whether they had a majority for or against Brexit are not that great. What is different is Area type A: compared with the rest, it is an outlier, an unusual case.

Conclusion

The chi-square test has a long history in geography and is especially useful in fieldwork, where it can be applied to see if there is any relationship between the various attributes of sampled data. As a method of analysis, it characterises the various steps required for statistical testing, which are:

Despite the algebra and statistical terminology that come with it, the idea behind chi-square is quite simple: compare the numbers we have with what the numbers would be if there wasn’t any relationship. Greater differences give greater confidence that the ‘no relationship’ option can be rejected.

Nevertheless, there are simpler approaches and they are worth considering. Sometimes methods like calculating percentages or plotting data graphically can be more effective for understanding what the data are revealing.

Acknowledgement

This work was produced for the Royal Geographical Society (with IBG) as part of the Data skills in geography project, funded by the Nuffield Foundation. Copyright under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence, April 2017.


  1. The use of p-values for testing statistical significance is increasingly controversial. For a flavour of the debate, have a look at this article

  2. Much of the reason is a London effect: most of London’s boroughs voted to Remain.