MT5762 Lecture 20

C. Donovan

Tables of counts

Most things examined thus far have fitted into the general framework of linear models

There is a group of tests conducted on tables of counts/contingency tables which don't fit so well in this framework; we cover them now (you need to be au fait with these)

However, you could use Generalised Linear Models for similar analyses (as seen in MT5761)

Tables of counts

We look at three closely related tests:

Chi-squared goodness-of-fit tests
Chi-squared tests for homogeneity
Chi-squared tests for independence

Tables of counts

The testing approach is somewhat familiar

We compare what we expect under some Null hypothesis to what we actually observed
The discrepancy provides some measure of evidence against $ H_0 $
If the discrepancy is so large that $ H_0 $ seems implausible, we reject it
We assess this using a test statistic, a probability distribution for it assuming $ H_0 $ is true, and the subsequent $ p $-value

Simple motivating example

Consider the following data tabulating support for Democratic, Republican or Independent candidates. In total there are 2757 individuals in the sample (example from Agresti 2007).

	Democrat	Independent	Republican
Female	762	327	468
Male	484	239	477

Simple motivating example

Immediate questions arise:

Is there “more” female support for Democrats than Republicans?
Indeed there are more Female Democrats in our sample, but there were generally more Democrats.

Even adjusting for this, I know another sample would give different figures. So is the difference explicable by sampling variability?

	Democrat	Independent	Republican
Female	762	327	468
Male	484	239	477
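One way to adjust for the different numbers of women (1557) and men (1200) in the sample is to compare within-gender percentages rather than raw counts. The following is a minimal sketch; the object names `voters` and `rowShares` are illustrative and not part of the original slides.

```r
# Voter counts from the table above (rows: gender, columns: party)
voters <- rbind(Female = c(Democrat = 762, Independent = 327, Republican = 468),
                Male   = c(Democrat = 484, Independent = 239, Republican = 477))

# Share of each gender supporting each party (each row sums to 1)
rowShares <- prop.table(voters, margin = 1)

# As percentages: about 48.9% of women and 40.3% of men support Democrats
round(100 * rowShares, 1)
```

The percentages differ between the rows, but this alone does not tell us whether the difference exceeds what sampling variability would produce.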

Simple motivating example

To answer this, we need to address:

What is $ H_0 $?
What is expected under this?
What test statistic arises?
What distribution applies to this under $ H_0 $?

We'll return to this example later

Drug use in Scotland

Following our themes of drugs and/or disaster

Case study: The Scottish Schools Adolescent Lifestyle and Substance Use Survey (SALSUS) has been established by the Scottish Executive to provide a broad based approach to the monitoring of substance use among young people in Scotland.

Some findings (2002):

Cannabis was the most commonly reported drug used in the last month: 21% of 15 year olds. Few reported any other drug.
21% of 15 year olds who had used drugs in the month before the survey reported that they would like to stop using drugs now.
Over two thirds (69%) of the pupils who used drugs most days, and over half (53%) of those who used drugs at least once a week reported that they 'would not like to give up'.
Nearly half (47%) of 15 year old regular drug users reported spending 10 pounds or more each week on drugs.
(Various other somewhat grim findings…)

The Chi-square goodness of fit test

Chi-square tests on contingency tables look at the distributions of counts over the cells
So, we might ask: does a particular row or column distribution differ significantly from some other distribution?

For example, does this SALSUS sample represent Scotland ethnically?

The Chi-square goodness of fit test

We will assess the similarity of SALSUS sample ethnicity to the census population.

Ethnicity	Census	SALSUS	SALSUS counts
White	97.26	95.23	21249
Bangladeshi	0.1	0.1	23
Indian	0.3	0.3	68
Pakistani	1.0	0.9	204
Mixed	0.5	0.9	204
Chinese	0.4	0.5	113
Blk African	0.1	0.2	45
Blk Caribbean	0.02	0.1	23
Blk Other	0.02	0.5	45
Other	0.3	1.2	272
Base			22246

What do we know?

We have quantitative data (counts) which are categorised using one factor: ethnicity. We have a one-dimensional table.
The census figures look quite similar to the SALSUS sample
Some possible differences in the white and black categories.
Are they explicable by sampling variability?

What do we want to test?

The usual strategy:

we take the difference between our theory and data and express it in a standardised way e.g. in units of variability.
This gives a test statistic, whose distribution we theoretically know under the Null hypothesis.
How likely is the test statistic (or bigger ones) from our sample given this distribution (i.e. if $ H_0 $ is true)?
If it is 'unlikely' then the data does not appear to support the $ H_0 $.

What do we want to test?

As for all tests, we have a null ($ H_0 $) and alternative hypothesis ($ H_1 $). In this case:

$ H_0 $: the data come from the census population
$ H_1 $: the data do not come from the census population

What do we expect to see if the null hypothesis is true?

We obtain expected counts directly using our null hypothesis:

Expected count = total $ \times $ specified cell probability

eg. For the 'White' group: 97.26% of 22246 =

\[ 22246 \times \frac{97.26}{100}=22246 \times 0.9726=21636.46 \]

eg. For the 'Bangladeshi' group 0.10% of 22246:

\[ 22246 \times \frac{0.10}{100}=22246 \times 0.001=22.246 \]
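The same calculation can be applied to every ethnic group at once. This is a small sketch using the census percentages from the table; the names `n`, `censusProp` and `expected` are chosen here for illustration.

```r
# Total sample size (the 'Base' row of the table)
n <- 22246

# Census percentages converted to probabilities
censusProp <- c(White = 97.26, Bangladeshi = 0.1, Indian = 0.3, Pakistani = 1,
                Mixed = 0.5, Chinese = 0.4, `Blk African` = 0.1,
                `Blk Caribbean` = 0.02, `Blk Other` = 0.02, Other = 0.3) / 100

# Expected count under H0 = total x hypothesised cell probability
expected <- n * censusProp
round(expected, 2)
```

Because the census proportions sum to 1, the expected counts sum to the sample total.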

Is our data consistent with the null hypothesis?

The observed counts are obtained from our sample eg. 21249, 23,…,272. We calculate a measure of difference between the observed and expected counts using the Chi-square test statistic:

Chi-square test statistic

\[ \begin{align*} x^2_0 =& \sum_{\textrm{all cells}} \frac{\textrm{(observed count- expected count)}^2}{\textrm{expected count}}\\ =&\sum_{\textrm{all cells}} \frac{\textrm{(O - E)}^2}{\textrm{E}} \end{align*} \]

Is our data consistent with the null hypothesis?

Note a key component to this term is a simple (squared) distance between what is predicted by our theory, and what we observed in our data (O-E)

The Chi-square contributions are calculated for all ethnic groups in the table and these are added together to give the Chi-square test statistic.
As with the $ t $-test and the $ F $-test, the larger the test statistic the stronger the evidence against $ H_0 $

Is our data consistent with the null hypothesis?

eg. The White group contributes 6.94 to the overall test statistic:

\[ \frac{(21249-21636.45)^2}{21636.45}=6.938 \]

In this case, the Chi-square contributions from the 10 ethnic groups add to 1193.89.
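As a check on the arithmetic, the contributions and their sum can be computed directly from the observed counts and census proportions. This is a sketch only; the variable names are illustrative.

```r
# Observed SALSUS counts and census proportions, in the table's row order
observed   <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
censusProp <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100

# Expected counts under H0
expected <- sum(observed) * censusProp

# Per-cell contributions (O - E)^2 / E, and their sum (the test statistic)
contrib <- (observed - expected)^2 / expected
x2 <- sum(contrib)

round(contrib, 2)
x2
```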

Is our data consistent with the null hypothesis?

Is this test statistic (1193.9) a peculiar value under $ H_0 $?
What values of $ x^2_0 $ are likely/unlikely if $ H_0 $ is true?
Theoretically, under $ H_0 $ the test statistic follows a $ \chi^2 $ distribution with df = number of categories - 1.
We have 10 ethnic groups and so $ df=10-1=9 $.
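To get a feel for which values are 'unlikely' with 9 degrees of freedom, we can look up the value that is exceeded only 5% of the time under $ H_0 $. A brief sketch, assuming the conventional 5% level (the slides do not fix a level here):

```r
# 95th percentile of the chi-squared distribution with 9 degrees of freedom
crit <- qchisq(0.95, df = 9)
crit

# Our observed statistic is far beyond this value
1193.9 > crit
```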

Is our data consistent with the null hypothesis?

  x <- seq(0, 30, length = 100)

  plot(x, dchisq(x, 9), type = 'l', lwd = 2)

[Plot: density of the chi-squared distribution with 9 degrees of freedom, for x from 0 to 30]

Finding the $p$-value

Looking at that distribution, you can see that the probability of a value as large as 1193.9 is effectively zero:

 pchisq(1193.9, 9, lower.tail = FALSE)

[1] 2.515343e-251

Finding the $p$-value

In our example, we got a very large Chi-square test statistic (1193.9).
The chance of getting a test statistic this large, or larger, when the data come from the census is effectively zero
We have very strong evidence against $ H_0 $; this is a highly significant result.

What can we conclude?

What is underlying this result? Why do we have such a huge test statistic?

Of course you don't do these things by hand…

Doing this in R

ethnicGroup <- c("White", "Bangladeshi", "Indian", "Pakistani", "Mixed",
                 "Chinese", "Blk African", "Blk Caribbean", "Blk Other", "Other")
salus <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
census <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3)/100

# specify our table counts (salus) and the hypothesised dist (p = census)
# note a discrete distribution, hence probabilities
salus_test <- chisq.test(salus, p = census)

salusDF <- data.frame(ethnicGroup, salus, census)

head(salusDF)

  ethnicGroup salus census
1       White 21249 0.9726
2 Bangladeshi    23 0.0010
3      Indian    68 0.0030
4   Pakistani   204 0.0100
5       Mixed   204 0.0050
6     Chinese   113 0.0040

Doing this in R

The test object salus_test has various useful things:

names(salus_test)

[1] "statistic" "parameter" "p.value"   "method"    "data.name" "observed" 
[7] "expected"  "residuals" "stdres"

The test statistic (statistic = 1193.894752)
The test $ p $-value (p.value = 2.5219128 × 10^-251)
The observed and expected count vectors (observed and expected)

Calculating $ (O-E)^2/E $:

chiContrib <- data.frame(ethnicGroup, observed = salus_test$observed, 
  expected = salus_test$expected, chisq_contrib = (salus_test$observed-salus_test$expected)^2/salus_test$expected)

knitr::kable(chiContrib, digits = 2)

ethnicGroup	observed	expected	chisq_contrib
White	21249	21636.46	6.94
Bangladeshi	23	22.25	0.03
Indian	68	66.74	0.02
Pakistani	204	222.46	1.53
Mixed	204	111.23	77.37
Chinese	113	88.98	6.48
Blk African	45	22.25	23.27
Blk Caribbean	23	4.45	77.35
Blk Other	45	4.45	369.59
Other	272	66.74	631.31

What can we conclude?

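The Black Other and Other groups were particularly over-represented compared to the census; the Mixed and Black Caribbean groups also contribute heavily.
The White group was under-represented compared to the census, although its contribution to the statistic (6.94) is comparatively small.

So this may be a biased sample or, perhaps more likely, the demographics are changing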
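The squared contributions lose the direction of each discrepancy. One way to recover it is through the Pearson residuals $ (O-E)/\sqrt{E} $, whose sign shows over- or under-representation and whose squares are the chi-square contributions. The sketch below recomputes them from the raw figures; the object names are illustrative, and the values should correspond to the `residuals` component of the test object.

```r
ethnicGroup <- c("White", "Bangladeshi", "Indian", "Pakistani", "Mixed",
                 "Chinese", "Blk African", "Blk Caribbean", "Blk Other", "Other")
observed   <- c(21249, 23, 68, 204, 204, 113, 45, 23, 45, 272)
censusProp <- c(97.26, 0.1, 0.3, 1, 0.5, 0.4, 0.1, 0.02, 0.02, 0.3) / 100
expected   <- sum(observed) * censusProp

# Pearson residuals: positive = over-represented, negative = under-represented
pearsonResid <- setNames((observed - expected) / sqrt(expected), ethnicGroup)

# Sorted from most under-represented to most over-represented
round(sort(pearsonResid), 2)
```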

The Chi-square test for homogeneity

Do attitudes differ with drug use?

Here we seek to test if attitudes towards those who use or sell drugs differ with drug use status.
Specifically, do those in the population that use drugs feel differently about other drug users and drug suppliers than those that don't use drugs?
The students were asked their opinion on the following 5 statements and their responses were classified according to their reported drug use (table follows).

Percentages: 15-year-old pupils' attitudes to those involved with drugs, by drug use status: Scotland 2002.

Statement	Agree (%)	Disagree (%)	Don't know (%)	$ N $
Used drugs in last month
People my age who take drugs need help and advice	21	64	15	2235
All people who sell drugs should be punished	26	59	15	2242
People who take drugs are stupid	21	67	12	2234
All people who take drugs should be punished	6	85	9	2242
People who take heroin are junkies	59	30	11	2234
Never used drugs
People my age who take drugs need help and advice	76	10	14	6466
All people who sell drugs should be punished	70	15	15	6461
People who take drugs are stupid	65	22	13	6463
All people who take drugs should be punished	29	47	23	6459
People who take heroin are junkies	51	20	29	6458

What do we know?

Most students who used drugs in the last month disagreed with statements 1–4 and agreed with statement 5.
Conversely, the majority of students who reported they have never used drugs agreed with statements 1–3.
In both groups, the most common response was 'disagree' for statement 4 and 'agree' for statement 5.

We are going to look at one particular statement 'People who take drugs are stupid'

Comparison across usage groups

'People who take drugs are stupid':

Drug use status	Agree	Disagree	Don't know	Total
Used drugs in last month	469	1497	268	2234
Never used drugs	4201	1422	840	6463
Totals	4670	2919	1108	8697

15-year-old pupils' attitudes to those involved with drugs, by drug use status: Scotland 2002

The question

The majority (67%) of those who had used drugs in the last month did not agree that people who take drugs are stupid.
Conversely, the majority (65%) of those students who have never used drugs agreed that people who take drugs are stupid.
Are any differences in opinion between these two groups real, or are these differences due to sampling variability?
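The quoted percentages are row percentages of the table of counts. A short sketch that reproduces them (the object names `attitudes` and `rowPct` are illustrative):

```r
# Counts for 'People who take drugs are stupid', by drug use status
attitudes <- rbind(
  "Used drugs in last month" = c(Agree = 469,  Disagree = 1497, "Don't know" = 268),
  "Never used drugs"         = c(Agree = 4201, Disagree = 1422, "Don't know" = 840)
)

# Percentage of each group giving each response (rows sum to 100)
rowPct <- round(100 * prop.table(attitudes, margin = 1))
rowPct
```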

What do we want to test?

We want to test if attitudes in the population towards those who take drugs differ with drug use.

The null hypothesis ($ H_0 $): The samples have the same underlying distribution i.e. opinions are the same for those that have used drugs in the last month and those that have never used drugs.
The alternative hypothesis ($ H_1 $): The samples do not have the same underlying distribution i.e. opinions are not the same for those that have used drugs in the last month and those that have never used drugs.

Measuring consistency of data to $H_0$

We obtain the expected counts, under the null hypothesis, using:

\[ E = \frac{\textrm{Row total} \times \textrm{Column total}}{\textrm{Grand Total}} \]

eg. 'Used drugs in last month' and 'Agree': \[ \left[\frac{2234 \times 4670}{8697}\right]=1199.58 \]

Is our data consistent with the null hypothesis? Our Observed counts are obtained from our sample. eg. 469,1497,…,840.
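The row total times column total rule can be applied to all six cells in one step using an outer product of the margins. This is a sketch with illustrative object names.

```r
# Observed counts: rows are drug use status, columns are responses
observed <- rbind(
  "Used recently" = c(Agree = 469,  Disagree = 1497, "Don't know" = 268),
  "Never used"    = c(Agree = 4201, Disagree = 1422, "Don't know" = 840)
)

# E = (row total x column total) / grand total, for every cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)
```

The expected table has the same row and column totals as the observed table; only the spread of counts within the table changes.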

Measuring consistency of data to $H_0$

Easier in R

drugsTable <- as.table(rbind(c(469, 1497, 268), c(4201, 1422, 840)))
dimnames(drugsTable) <- list(group = c("Used recently", "Never used"),
                    response = c("Agree","Disagree", "Don't know"))

# see what I did here
drugTest <- chisq.test(drugsTable)

drugTest


    Pearson's Chi-squared test

data:  drugsTable
X-squared = 1602, df = 2, p-value < 2.2e-16

Measuring consistency of data to $H_0$

Expected values

  knitr::kable(drugTest$expected)

	Agree	Disagree	Don't know
Used recently	1199.584	749.8041	284.6122
Never used	3470.416	2169.1959	823.3878

Observed values

  knitr::kable(drugTest$observed)

	Agree	Disagree	Don't know
Used recently	469	1497	268
Never used	4201	1422	840

Measuring consistency of data to $H_0$

Their contributions to the test statistic

  contributionTable <- (drugTest$observed - drugTest$expected)^2/drugTest$expected

  knitr::kable(contributionTable)

	Agree	Disagree	Don't know
Used recently	444.9482	744.5969	0.9696143
Never used	153.8008	257.3773	0.3351568

  sum(contributionTable)

[1] 1602.028

Measuring consistency of data to $H_0$

Discrepancy between observed and expected counts as before:

$ \chi^2 $ test statistic

\[ x^2_0=\sum_{\textrm{all cells}} \frac{\textrm{(O - E)}^2}{\textrm{E}} \]

For example 'Used drugs in last month' and 'Agree' contributes 444.95 to the overall test statistic of 1602.03:

\[ \frac{[469-1199.58]^2}{1199.58}=444.95 \]

Measuring consistency of data to $H_0$

All cell contributions are added together to give the Chi-square test statistic, $ x_0^2 $.
The chi-square contributions from the 6 cells in the table add to 1602.03

\[ x^2_0 = 444.95+ 744.60+ 0.97+ 153.80+ 257.38+ 0.34 \approx 1602.03 \]

Is this likely to occur just by chance if $ H_0 $ were true? We quantify the 'likelihood' of this result under the Null hypothesis

Finding the $p$-value

How probable is our data result if $ H_0 $ were true?
$ p $-value from a chi-square distribution with df = (number of columns - 1) $ \times $ (number of rows - 1)

\[ df=(3-1) \times (2-1) = 2 \times 1 = 2 \]

Finding the $p$-value

Under the null hypothesis we expect to get Chi-square values 6 or less about 95% of the time when $ df=2 $.

  qchisq(0.95, 2)

[1] 5.991465

In our example, we got a mammoth Chi-square value of 1602.03.

Finding the $p$-value

In our example, we got a mammoth Chi-square value of 1602.03.

\[ Pr(\chi^2 \geq x^2_0) ~~\textrm{where} ~~ \chi^2 \sim \textrm{Chi-square}(df) \]

pchisq(1602, 2, lower.tail = FALSE)

[1] 0

drugTest$p.value

[1] 0
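The reported 0 is numerical underflow rather than a true zero: the tail probability is smaller than the smallest number R can represent as a double. One way to see its magnitude is to ask `pchisq` for the logarithm of the probability via its `log.p` argument. This is a sketch, with `logP` as an illustrative name.

```r
# Natural log of the upper-tail probability, avoiding underflow
logP <- pchisq(1602, df = 2, lower.tail = FALSE, log.p = TRUE)
logP

# Converted to a power of ten
logP / log(10)
```

With $ df=2 $ the upper-tail probability is exactly $ e^{-x/2} $, so the log is $ -801 $, corresponding to a $ p $-value of roughly $ 10^{-348} $.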

What can we conclude?

We have very strong evidence against $ H_0 $. ie. very strong evidence that the opinions are not the same for those that have used drugs in the last month and those that have never used drugs.

Chi-square test for independence

Let's return to voters and gender

Voters and gender

Is there a relationship between gender and voting intention?
Are gender and voting intention independent of one another?

So we seek to test $ H_0 $: gender and voting intention are independent

	Democrat	Independent	Republican
Female	762	327	468
Male	484	239	477

Voters and gender

So how do we calculate expected counts assuming $ H_0 $ is true?

In this case, the probability of being in a particular cell is the probability of being in the particular row multiplied by the probability of being in the particular column
$ P(A~and~B) = P(A)\times P(B) $ - as we saw way back when treating independent events
This rationale results in the same calculation for expected counts as before: (row total $ \times $ col total)/table total
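The equivalence of the two routes to the expected counts can be checked numerically: estimate the row and column probabilities from the margins, multiply them, and scale by the table total. A minimal sketch with illustrative object names:

```r
voters <- rbind(Female = c(Democrat = 762, Independent = 327, Republican = 468),
                Male   = c(Democrat = 484, Independent = 239, Republican = 477))
n <- sum(voters)

# Estimated marginal probabilities of each row and each column
pRow <- rowSums(voters) / n
pCol <- colSums(voters) / n

# Under independence: P(row and col) = P(row) x P(col), so E = n x P(row) x P(col)
expectedIndep <- n * outer(pRow, pCol)

# The (row total x column total) / table total formula
expectedFormula <- outer(rowSums(voters), colSums(voters)) / n

all.equal(expectedIndep, expectedFormula)
round(expectedIndep, 2)
```

Algebraically, $ n \times \frac{R}{n} \times \frac{C}{n} = \frac{R \times C}{n} $, where $ R $ and $ C $ are the row and column totals, which is why the homogeneity and independence tests share the same arithmetic.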

Voters and gender

voterTable <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(voterTable) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
voterTest <- chisq.test(voterTable)

knitr::kable(voterTable)

	Democrat	Independent	Republican
F	762	327	468
M	484	239	477

voterTest


    Pearson's Chi-squared test

data:  voterTable
X-squared = 30.07, df = 2, p-value = 2.954e-07

Voters and gender

Observed and expected values

  knitr::kable(voterTest$expected)

	Democrat	Independent	Republican
F	703.6714	319.6453	533.6834
M	542.3286	246.3547	411.3166

  knitr::kable(voterTest$observed)

	Democrat	Independent	Republican
F	762	327	468
M	484	239	477

Their contributions to the test statistic

  contributionTable <- (voterTest$observed - voterTest$expected)^2/voterTest$expected

  knitr::kable(contributionTable)

	Democrat	Independent	Republican
F	4.834967	0.1692254	8.084012
M	6.273369	0.2195700	10.489006

  sum(contributionTable)

[1] 30.07015

  pchisq(sum(contributionTable), 2, lower.tail = FALSE)

[1] 2.953589e-07

Voters and gender

So we certainly reject $ H_0 $ as unlikely to be generating our data
The result is due to gender distributions within Democrats and Republicans that differ by more than sampling noise would explain
Looking at the residuals, there are fewer males/more females within Democrats than implied by $ H_0 $
The converse is true for Republicans

  knitr::kable(voterTest$stdres)

	Democrat	Independent	Republican
F	4.502053	0.6994517	-5.315945
M	-4.502053	-0.6994517	5.315945

The validity of Chi-square tests

As ever, there are assumptions

The assumptions

Chi-square tests are only valid when the data are collected as a random sample or as a number of random samples.
Chi-square tests are large sample tests that require the total count for the table to be sufficiently large.
There isn't complete agreement about how large is large enough
The following rules help ensure we don't use Chi-square tests for samples which are too small:
- each expected count should be greater than 1
- at least 80% of the expected counts should be at least 5
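These rules of thumb can be checked from the expected counts that `chisq.test` returns. The helper `checkExpected` below is hypothetical (it is not a base R function), and the voter table is used purely as an example.

```r
# Hypothetical helper: applies the two rules of thumb to a set of expected counts
checkExpected <- function(expected) {
  list(allAboveOne     = all(expected > 1),      # rule 1: every expected count > 1
       propAtLeastFive = mean(expected >= 5))    # rule 2: this should be at least 0.8
}

voterTable <- rbind(c(762, 327, 468), c(484, 239, 477))
voterExpected <- chisq.test(voterTable)$expected

rulesCheck <- checkExpected(voterExpected)
rulesCheck
```

For the voter table every expected count is in the hundreds, so both rules are comfortably met.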

Recap and look-forwards

We've covered:

Chi-square tests on tables of counts/contingency tables
So we have tests for relationships between two categorical variables

Upcoming:

No more lectures
Pub
