MT5762 Lecture 20

C. Donovan

Tables of counts

Most things examined thus far have fitted into the general framework of linear models

There is a group of tests conducted on tables of counts (contingency tables) which don't fit so well into this framework; we cover them now (you need to be au fait with these)

However, you could use Generalised Linear Models for similar analyses (as seen in MT5761)
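As a hedged sketch of that GLM route, assuming the standard Poisson log-linear formulation (the names `voters` and `fit` are illustrative), the voter table used later in this lecture can be fitted with main effects only. Under independence the fitted values equal (row total × column total)/table total, so the squared Pearson residuals sum to the same statistic `chisq.test()` reports, while the residual deviance gives the closely related likelihood-ratio statistic.

```r
# Voter counts (Agresti 2007) in long format: one row per cell of the 2 x 3 table
voters <- data.frame(
  count  = c(762, 327, 468, 484, 239, 477),
  gender = factor(rep(c("F", "M"), each = 3)),
  party  = factor(rep(c("Democrat", "Independent", "Republican"), times = 2))
)

# Poisson log-linear model with no gender:party interaction,
# which corresponds to gender and party being independent
fit <- glm(count ~ gender + party, family = poisson, data = voters)

# Residual deviance = likelihood-ratio statistic G^2, on 2 residual df
deviance(fit)
df.residual(fit)

# Squared Pearson residuals, (y - mu)/sqrt(mu), sum to Pearson's X^2
pearson_x2 <- sum(residuals(fit, type = "pearson")^2)
pearson_x2
```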

Tables of counts

We look at three intimately related tests

  • Chi-squared goodness-of-fit tests
  • Chi-squared tests for homogeneity
  • Chi-squared tests for independence

Tables of counts

The testing approach is somewhat familiar

  • We compare what we expect under some null hypothesis with what we actually observed
  • The discrepancy provides some measure of evidence against \( H_0 \)
  • If the discrepancy is so large that \( H_0 \) seems implausible, we reject it
  • We assess this using a test statistic, the probability distribution of that statistic assuming \( H_0 \) is true, and the resulting \( p \)-value

Simple motivating example

Consider the following data tabulating support for Democratic, Republican or Independent candidates. In total there are 2757 individuals in the sample (example from Agresti 2007).

        Democrat  Independent  Republican
Female       762          327         468
Male         484          239         477

Simple motivating example

Immediate questions arise:

  • Is there “more” female support for Democrats than Republicans?
  • Indeed, there are more female Democrats than female Republicans in our sample, but there were more Democrats overall.
  • Even adjusting for this, I know another sample would give different figures. So is the difference explicable by sampling variability?

        Democrat  Independent  Republican
Female       762          327         468
Male         484          239         477

Simple motivating example

To answer this, we need to address:

  • What is \( H_0 \)?
  • What is expected under this?
  • What test statistic arises?
  • What distribution applies to this under \( H_0 \)?

We'll return to this example later

Drug use in Scotland

Following our themes of drugs and/or disaster

Case study: the Scottish Schools Adolescent Lifestyle and Substance Use Survey (SALSUS) was established by the Scottish Executive to provide a broad-based approach to monitoring substance use among young people in Scotland.

Some findings (2002):

  • Cannabis was the most commonly reported drug used in the last month: 21% of 15-year-olds. Few reported any other drug.
  • 21% of 15-year-olds who had used drugs in the month before the survey reported that they would like to stop using drugs now.
  • Over two thirds (69%) of the pupils who used drugs most days, and over half (53%) of those who used drugs at least once a week, reported that they 'would not like to give up'.
  • Nearly half (47%) of 15-year-old regular drug users reported spending 10 pounds or more each week on drugs.
  • (Various other somewhat grim findings…)

The Chi-square goodness of fit test

  • Chi-square tests on contingency tables look at the distributions of counts over the cells
  • So, we might ask: does a particular row or column distribution differ significantly from some other distribution?

For example, does this SALSUS sample represent Scotland ethnically?

The Chi-square goodness of fit test

We will assess the similarity of the SALSUS sample's ethnic make-up to that of the census population.

Ethnicity        Census (%)  SALSUS (%)  SALSUS count
White                 97.26       95.23         21249
Bangladeshi            0.10        0.10            23
Indian                 0.30        0.30            68
Pakistani              1.00        0.90           204
Mixed                  0.50        0.90           204
Chinese                0.40        0.50           113
Blk African            0.10        0.20            45
Blk Caribbean          0.02        0.10            23
Blk Other              0.02        0.50            45
Other                  0.30        1.20           272
Base (total)                                    22246

What do we know?

  • We have quantitative data (counts) which are categorised using one factor: ethnicity. We have a one-dimensional table.
  • The census figures look quite similar to the SALSUS sample
  • Some possible differences in the white and black categories.
  • Are they explicable by sampling variability?

What do we want to test?

The usual strategy:

  • we take the difference between our theory and data and express it in a standardised way e.g. in units of variability.
  • This gives a test statistic, whose distribution we theoretically know under the Null hypothesis.
  • How likely is the test statistic (or bigger ones) from our sample given this distribution (i.e. if \( H_0 \) is true)?
  • If it is 'unlikely', then the data do not appear to support \( H_0 \).

What do we want to test?

As for all tests, we have a null (\( H_0 \)) and alternative hypothesis (\( H_1 \)). In this case:

  • \( H_0 \): the data come from the census population
  • \( H_1 \): the data do not come from the census population

What do we expect to see if the null hypothesis is true?

We obtain expected counts directly using our null hypothesis:

Expected count = total \( \times \) specified cell probability

eg. For the 'White' group: 97.26% of 22246 =

\[ 22246 \times \frac{97.26}{100}=22246 \times 0.9726=21636.46 \]

eg. For the 'Bangladeshi' group 0.10% of 22246:

\[ 22246 \times \frac{0.10}{100}=22246 \times 0.001=22.246 \]
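The same arithmetic in R, as a minimal sketch (the names `n_total` and `census_pct` are illustrative):

```r
n_total <- 22246                                   # SALSUS sample size
census_pct <- c(White = 97.26, Bangladeshi = 0.10) # census percentages

# Expected count = total x specified cell probability
expected <- n_total * census_pct / 100
expected  # White about 21636.46, Bangladeshi 22.246
```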

Is our data consistent with the null hypothesis?

The observed counts are obtained from our sample eg. 21249, 23,…,272. We calculate a measure of difference between the observed and expected counts using the Chi-square test statistic:

Chi-square test statistic

\[ \begin{align*} x^2_0 =& \sum_{\textrm{all cells}} \frac{\textrm{(observed count- expected count)}^2}{\textrm{expected count}}\\ =&\sum_{\textrm{all cells}} \frac{\textrm{(O - E)}^2}{\textrm{E}} \end{align*} \]
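A direct transcription of the formula, as a hypothetical helper `pearson_stat()` (not a base R function), applied to the SALSUS counts and census proportions used later in this lecture:

```r
# Hypothetical helper: Pearson's statistic, sum over cells of (O - E)^2 / E
pearson_stat <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

salus  <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100
expected <- sum(salus) * census   # total x cell probability

x2_0 <- pearson_stat(salus, expected)
x2_0  # about 1193.89
```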

Is our data consistent with the null hypothesis?

Note that a key component of this statistic is a simple (squared) distance between what is predicted by our theory and what we observed in our data, (O-E)

  • The Chi-square contributions are calculated for all ethnic groups in the table and these are added together to give the Chi-square test statistic.
  • As with the \( t \)-test and the \( F \)-test, the larger the test statistic the stronger the evidence against \( H_0 \)

Is our data consistent with the null hypothesis?


  • eg. The White group contributes 6.94 to the overall test statistic:

\[ \frac{(21249-21636.45)^2}{21636.45}=6.938 \]

  • In this case, the Chi-square contributions from the 10 ethnic groups add to 1193.89.

Is our data consistent with the null hypothesis?

  • Is this test statistic (1193.9) a peculiar value under \( H_0 \)?
  • Which values of \( x^2_0 \) are likely/unlikely if \( H_0 \) is true?
  • Under \( H_0 \), the test statistic approximately follows a \( \chi^2 \) distribution with df = number of categories - 1.
  • We have 10 ethnic groups and so \( df=10-1=9 \).
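One way to see why \( \chi^2_9 \) is the reference distribution is to simulate it. The sketch below makes \( H_0 \) true by drawing SALSUS-sized multinomial samples from the census proportions (the seed and the 5000 replicates are arbitrary choices) and checks that the simulated statistics behave like a \( \chi^2 \) with 9 df:

```r
set.seed(5762)  # arbitrary seed, for reproducibility only

census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100
n <- 22246
expected <- n * census

# rmultinom() returns one simulated table per column (10 rows x 5000 columns)
sims <- rmultinom(5000, size = n, prob = census)

# Pearson statistic per simulated sample; 'expected' (length 10)
# recycles down each column, matching row i with category i
x2_sim <- colSums((sims - expected)^2 / expected)

mean(x2_sim)                          # close to df = 9
mean(x2_sim > qchisq(0.95, df = 9))   # close to 0.05
```

Two cells have expected counts of only 4.45, so the approximation is not perfect; expect figures close to, not exactly, 9 and 0.05.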

Is our data consistent with the null hypothesis?

  x <- seq(0, 30, length = 100)

  plot(x, dchisq(x, 9), type = 'l', lwd = 2)

[Figure: density of the \( \chi^2 \) distribution with 9 df, plotted for x from 0 to 30]

Finding the \(p\)-value

Looking at that distribution, a test statistic of 1193.9 lies far out in the right tail, so the \( p \)-value is effectively zero:

 pchisq(1193.9, 9, lower.tail = F)
[1] 2.515343e-251

Finding the \(p\)-value

  • In our example, we got a very large Chi-square test statistic (1193.9).
  • The chance of getting a test statistic this large, or larger, when the data come from the census is effectively zero
  • We have very strong evidence against \( H_0 \); this is a highly significant result.
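The idea of "this large, or larger, when the data come from the census" can also be checked without the \( \chi^2 \) approximation: `chisq.test()` has a `simulate.p.value` option that draws samples under \( H_0 \). A sketch with an arbitrary `B = 500`:

```r
salus  <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100

set.seed(1)  # arbitrary seed
# Monte Carlo p-value: proportion of simulated statistics >= the observed one,
# computed as (1 + count) / (B + 1)
mc_test <- chisq.test(salus, p = census, simulate.p.value = TRUE, B = 500)
mc_test$p.value
```

No simulated statistic comes anywhere near 1193.9, so the Monte Carlo \( p \)-value is its smallest possible value, \( 1/(B+1) \); that only tells us \( p < 0.002 \) here, whereas the \( \chi^2 \) approximation resolves values as small as \( 10^{-251} \).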

What can we conclude?

What is underlying this result? Why do we have such a huge test statistic?

Of course you don't do these things by hand…

Doing this in R

ethnicGroup <- c("White", "Bangladeshi", "Indian", "Pakistani", "Mixed",
                 "Chinese", "Blk African", "Blk Caribbean", "Blk Other", "Other")
salus <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3)/100

# specify our table counts (salus) and the hypothesised dist (p = census)
# note a discrete distribution, hence probabilities
salus_test <- chisq.test(salus, p = census)

salusDF <- data.frame(ethnicGroup, salus, census)

head(salusDF)
  ethnicGroup salus census
1       White 21249 0.9726
2 Bangladeshi    23 0.0010
3      Indian    68 0.0030
4   Pakistani   204 0.0100
5       Mixed   204 0.0050
6     Chinese   113 0.0040

Doing this in R

The test object salus_test has various useful things:

names(salus_test)
[1] "statistic" "parameter" "p.value"   "method"    "data.name" "observed" 
[7] "expected"  "residuals" "stdres"   
  • The test statistic (statistic = 1193.894752)
  • The test \( p \)-value (p.value = \( 2.5219128 \times 10^{-251} \))
  • The observed and expected count vectors (observed and expected)
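The `residuals` component is also useful: Pearson residuals are \( (O-E)/\sqrt{E} \), so their squares are the per-cell contributions. A short check, assuming the same `salus` and `census` vectors as above (wrapped in `suppressWarnings()` because two expected counts fall below 5, which makes R warn about the approximation):

```r
salus  <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100

salus_test <- suppressWarnings(chisq.test(salus, p = census))

# Pearson residuals (O - E)/sqrt(E); squared, they sum to the test statistic
res <- salus_test$residuals
sum(res^2)
unname(salus_test$statistic)
round(res, 2)
```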

Calculating \( (O-E)^2/E \):

chiContrib <- data.frame(ethnicGroup, observed = salus_test$observed, 
  expected = salus_test$expected, chisq_contrib = (salus_test$observed-salus_test$expected)^2/salus_test$expected)

knitr::kable(chiContrib, digits = 2)
ethnicGroup observed expected chisq_contrib
White 21249 21636.46 6.94
Bangladeshi 23 22.25 0.03
Indian 68 66.74 0.02
Pakistani 204 222.46 1.53
Mixed 204 111.23 77.37
Chinese 113 88.98 6.48
Blk African 45 22.25 23.27
Blk Caribbean 23 4.45 77.35
Blk Other 45 4.45 369.59
Other 272 66.74 631.31

What can we conclude?

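  • The Black Other and Other groups were particularly over-represented compared to the census (contributions of 369.59 and 631.31), followed by the Mixed and Black Caribbean groups (about 77 each).
  • The White group was under-represented compared to the census (21249 observed vs 21636.46 expected); its contribution (6.94) is modest only because its expected count is so large.

So perhaps this is a biased sample, or, more likely, the demographics are changing.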

The Chi-square test for homogeneity

Do attitudes differ with drug use?

  • Here we seek to test if attitudes towards those who use or sell drugs differ with drug use status.
  • Specifically, do those in the population that use drugs feel differently about other drug users and drug suppliers than those that don't use drugs?
  • The students were asked their opinion on the following 5 statements and their responses were classified according to their reported drug use (table follows).

Percentages: 15-year-old pupils' attitudes to those involved with drugs, by drug use status: Scotland 2002.

Statement                                            Agree (%)  Disagree (%)  Don't know (%)     N
Used drugs in last month
  People my age who take drugs need help and advice         21            64              15  2235
  All people who sell drugs should be punished              26            59              15  2242
  People who take drugs are stupid                          21            67              12  2234
  All people who take drugs should be punished               6            85               9  2242
  People who take heroin are junkies                        59            30              11  2234
Never used drugs
  People my age who take drugs need help and advice         76            10              14  6466
  All people who sell drugs should be punished              70            15              15  6461
  People who take drugs are stupid                          65            22              13  6463
  All people who take drugs should be punished              29            47              23  6459
  People who take heroin are junkies                        51            20              29  6458

What do we know?

  • Most students who used drugs in the last month disagreed with statements 1–4 and agreed with statement 5.
  • Conversely, the majority of students who reported they have never used drugs agreed with statements 1–3.
  • For statements 4 and 5 the two groups' most common responses matched: both most often disagreed with statement 4 and agreed with statement 5.

We are going to look at one particular statement 'People who take drugs are stupid'

Comparison across usage groups

'People who take drugs are stupid':

Drug use status             Agree  Disagree  Don't know  Total
Used drugs in last month      469      1497         268   2234
Never used drugs             4201      1422         840   6463
Totals                       4670      2919        1108   8697

15-year-old pupils' attitudes to those involved with drugs, by drug use status (counts): Scotland 2002

The question

  • The majority (67%) of those who had used drugs in the last month did not agree that people who take drugs are stupid.
  • Conversely, the majority (65%) of those students who have never used drugs agreed that people who take drugs are stupid.
  • Are any differences in opinion between these two groups real, or are these differences due to sampling variability?
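The percentages quoted above are row proportions of the table, which R computes with `prop.table()`. A sketch rebuilding the table as `drugsTable`, the same construction used later in this lecture:

```r
drugsTable <- as.table(rbind(c(469, 1497, 268), c(4201, 1422, 840)))
dimnames(drugsTable) <- list(group = c("Used recently", "Never used"),
                             response = c("Agree", "Disagree", "Don't know"))

# margin = 1: divide each cell by its row total (each drug-use group's size)
row_props <- prop.table(drugsTable, margin = 1)
round(100 * row_props)  # Used recently: 21 67 12; Never used: 65 22 13
```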

What do we want to test?

We want to test if attitudes in the population towards those who take drugs differ with drug use.

  • The null hypothesis (\( H_0 \)): the samples have the same underlying distribution, i.e. opinions are the same for those that have used drugs in the last month and those that have never used drugs.

  • The alternative hypothesis (\( H_1 \)): the samples do not have the same underlying distribution, i.e. opinions differ between those that have used drugs in the last month and those that have never used drugs.

Measuring consistency of data to \(H_0\)

We obtain the expected counts, under the null hypothesis, using:

\[ E = \frac{\textrm{Row total} \times \textrm{Column total}}{\textrm{Grand Total}} \]

eg. 'Used drugs in last month' and 'Agree': \[ \frac{2234 \times 4670}{8697}=1199.58 \]

Is our data consistent with the null hypothesis? Our observed counts are obtained from our sample, e.g. 469, 1497, …, 840.
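The expected-count formula applied to every cell at once, as a sketch: `outer()` forms all products of row and column totals, and the result should match the `expected` component that `chisq.test()` returns.

```r
drugsTable <- as.table(rbind(c(469, 1497, 268), c(4201, 1422, 840)))
dimnames(drugsTable) <- list(group = c("Used recently", "Never used"),
                             response = c("Agree", "Disagree", "Don't know"))

n <- sum(drugsTable)  # grand total, 8697
# E_ij = (row total i) x (column total j) / grand total
expected <- outer(rowSums(drugsTable), colSums(drugsTable)) / n
round(expected, 2)
```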

Measuring consistency of data to \(H_0\)

Easier in R

drugsTable <- as.table(rbind(c(469, 1497, 268), c(4201, 1422, 840)))
dimnames(drugsTable) <- list(group = c("Used recently", "Never used"),
                    response = c("Agree","Disagree", "Don't know"))

# note: a two-way table (not a vector plus p), so expected counts come from the margins
drugTest <- chisq.test(drugsTable)

drugTest

    Pearson's Chi-squared test

data:  drugsTable
X-squared = 1602, df = 2, p-value < 2.2e-16

Measuring consistency of data to \(H_0\)

Expected values

  knitr::kable(drugTest$expected)
Agree Disagree Don't know
Used recently 1199.584 749.8041 284.6122
Never used 3470.416 2169.1959 823.3878

Observed values

  knitr::kable(drugTest$observed)
Agree Disagree Don't know
Used recently 469 1497 268
Never used 4201 1422 840

Measuring consistency of data to \(H_0\)

Their contributions to the test statistic

  contributionTable <- (drugTest$observed - drugTest$expected)^2/drugTest$expected

  knitr::kable(contributionTable)
Agree Disagree Don't know
Used recently 444.9482 744.5969 0.9696143
Never used 153.8008 257.3773 0.3351568
  sum(contributionTable)
[1] 1602.028

Measuring consistency of data to \(H_0\)

Discrepancy between observed and expected counts, as before:

\( \chi^2 \) test statistic

\[ x^2_0=\sum_{\textrm{all cells}} \frac{\textrm{(O - E)}^2}{\textrm{E}} \]

  • For example 'Used drugs in last month' and 'Agree' contributes 444.95 to the overall test statistic of 1602.03:

\[ \frac{[469-1199.58]^2}{1199.58}=444.95 \]

Measuring consistency of data to \(H_0\)

  • All cell contributions are added together to give the Chi-square test statistic, \( x_0^2 \).
  • The chi-square contributions from the 6 cells in the table add to 1602.03

\[ x^2_0 = 444.95+ 744.60+ 0.97+ 153.80+ 257.38+ 0.34 \approx 1602.03 \]

  • Is this likely to occur just by chance if \( H_0 \) were true? We quantify the 'likelihood' of this result under the null hypothesis

Finding the \(p\)-value

  • How probable is our data result if \( H_0 \) were true?
  • The \( p \)-value comes from a chi-square distribution with df = (number of rows - 1) \( \times \) (number of columns - 1)

\[ df=(3-1) \times (2-1) = 2 \times 1 = 2 \]
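The degrees of freedom can be read off the table dimensions; as a quick sketch, this should agree with the `parameter` component of the test object:

```r
drugsTable <- as.table(rbind(c(469, 1497, 268), c(4201, 1422, 840)))

# (rows - 1) x (columns - 1)
df_test <- (nrow(drugsTable) - 1) * (ncol(drugsTable) - 1)
df_test
unname(chisq.test(drugsTable)$parameter)
```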

Finding the \(p\)-value

  • Under the null hypothesis with \( df=2 \), we expect Chi-square values below about 6 (precisely 5.99) 95% of the time.
  qchisq(0.95, 2)
[1] 5.991465
  • In our example, we got a mammoth Chi-square value of 1602.03.

Finding the \(p\)-value


\[ Pr(\chi^2 \geq x^2_0) ~~\textrm{where} ~~ \chi^2 \sim \textrm{Chi-square}(df) \]

pchisq(1602, 2, lower.tail = F)
[1] 0
drugTest$p.value
[1] 0
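Both calls print 0 because the true tail probability underflows double precision: for 2 df the upper tail is exactly \( e^{-x/2} \), here about \( 10^{-348} \). As a sketch, `pchisq()`'s `log.p` argument recovers the order of magnitude; the `p-value < 2.2e-16` in the printed test output is only a display floor set by `.Machine$double.eps`.

```r
p_direct <- pchisq(1602, df = 2, lower.tail = FALSE)
p_direct  # 0: the true value is below the smallest representable double

# Work on the log scale instead, then convert to a base-10 exponent
log10_p <- pchisq(1602, df = 2, lower.tail = FALSE, log.p = TRUE) / log(10)
log10_p   # about -347.9, so p is roughly 10^-348

# Check: for 2 df the log upper tail is exactly -x/2 = -801
-1602 / 2 / log(10)
```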

What can we conclude?

  • We have very strong evidence against \( H_0 \), i.e. very strong evidence that opinions are not the same for those that have used drugs in the last month and those that have never used drugs.

Chi-square test for independence

Let's return to voters and gender

Voters and gender

  • Is there a relationship between gender and voting intention?
  • Are gender and voting intention independent of one another?

So we seek to test \( H_0 \): gender and voting intention are independent

        Democrat  Independent  Republican
Female       762          327         468
Male         484          239         477

Voters and gender

So how do we calculate expected counts assuming \( H_0 \) is true?

  • In this case, the probability of being in a particular cell is the probability of being in that row multiplied by the probability of being in that column
  • \( P(A~and~B) = P(A)\times P(B) \), as we saw waaay back when treating independent events
  • This rationale results in the same calculation for expected counts as before: (row total \( \times \) col total)/table total
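Writing \( R_i \) and \( C_j \) for the row and column totals and \( n \) for the table total, and estimating each marginal probability by its observed proportion, the expected count for cell \( (i, j) \) under independence works out as follows (e.g. Female and Democrat: \( 1557 \times 1246 / 2757 = 703.67 \), matching `voterTest$expected`):

\[ \begin{align*} E_{ij} =& n \times P(\textrm{row } i) \times P(\textrm{column } j)\\ =& n \times \frac{R_i}{n} \times \frac{C_j}{n} = \frac{R_i \times C_j}{n} \end{align*} \]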

Voters and gender

voterTable <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(voterTable) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
voterTest <- chisq.test(voterTable)

knitr::kable(voterTable)
Democrat Independent Republican
F 762 327 468
M 484 239 477
voterTest

    Pearson's Chi-squared test

data:  voterTable
X-squared = 30.07, df = 2, p-value = 2.954e-07

Voters and gender

Observed and expected values

  knitr::kable(voterTest$expected)
Democrat Independent Republican
F 703.6714 319.6453 533.6834
M 542.3286 246.3547 411.3166
  knitr::kable(voterTest$observed)
Democrat Independent Republican
F 762 327 468
M 484 239 477

Their contributions to the test statistic

  contributionTable <- (voterTest$observed - voterTest$expected)^2/voterTest$expected

  knitr::kable(contributionTable)
Democrat Independent Republican
F 4.834967 0.1692254 8.084012
M 6.273369 0.2195700 10.489006
  sum(contributionTable)
[1] 30.07015
  pchisq(sum(contributionTable), 2, lower.tail = F)
[1] 2.953589e-07

Voters and gender

  • So we certainly reject \( H_0 \) as unlikely to be generating our data
  • The result is due to the gender split differing between Democrats and Republicans by more than sampling noise can explain
  • Looking at the standardised residuals, there are fewer males and more females among Democrats than implied by \( H_0 \)
  • The converse is true for Republicans
  knitr::kable(voterTest$stdres)
Democrat Independent Republican
F 4.502053 0.6994517 -5.315945
M -4.502053 -0.6994517 5.315945
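As a sketch of what `stdres` contains: for a two-way table R divides \( O-E \) by \( \sqrt{E(1-r_i/n)(1-c_j/n)} \), where \( r_i/n \) and \( c_j/n \) are the row and column proportions. Rebuilding it by hand (the names `rp`, `cp` and `stdres_hand` are illustrative):

```r
voterTable <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(voterTable) <- list(gender = c("F", "M"),
                             party = c("Democrat", "Independent", "Republican"))
voterTest <- chisq.test(voterTable)

n  <- sum(voterTable)
rp <- rowSums(voterTable) / n  # row proportions
cp <- colSums(voterTable) / n  # column proportions
E  <- voterTest$expected

# Standardised residual: (O - E) / sqrt(E * (1 - row prop) * (1 - col prop))
stdres_hand <- (voterTable - E) / sqrt(E * outer(1 - rp, 1 - cp))
round(stdres_hand, 2)
```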

The validity of Chi-square tests

As ever, there are assumptions

The assumptions

  • Chi-square tests are only valid when the data are collected as a random sample or as a number of random samples.
  • Chi-square tests are large sample tests that require the total count for the table to be sufficiently large.
  • There isn't complete agreement about how large is large enough
  • The following rules help ensure we don't use Chi-square tests for samples which are too small:
    • each expected count should be greater than 1
    • 80% of the expected counts should be at least 5
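A hypothetical helper `check_expected_counts()` (not part of base R) encoding the two rules of thumb above, applied to the SALSUS expected counts:

```r
# Hypothetical helper: TRUE if every expected count exceeds 1 and
# at least 80% of expected counts are 5 or more
check_expected_counts <- function(expected) {
  all_above_1 <- all(expected > 1)
  # 80% rule in integer arithmetic, so exactly 80% is not lost to rounding
  enough_at_least_5 <- 5 * sum(expected >= 5) >= 4 * length(expected)
  all_above_1 && enough_at_least_5
}

census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100
salus_expected <- 22246 * census
check_expected_counts(salus_expected)  # TRUE: 8 of 10 cells have E >= 5
```

The SALSUS test passes, but only just, since the Black Caribbean and Black Other cells have expected counts of 4.45. `chisq.test()` itself is stricter: it warns "Chi-squared approximation may be incorrect" whenever any expected count is below 5.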

Recap and look-forwards

We've covered:

  • Chi-square tests on tables of counts/contingency tables
  • So we have tests for relationships between two categorical variables

Upcoming:

  • No more lectures
  • Pub