If you were fortunate to learn statistics using Andy Field’s excellent Discovering Statistics Using R/SPSS/SAS series, then you are already well aware that almost every test in classical (parametric) statistics education is now understood to be part of the general linear model or GLM. It turns out that focussing on the deeper GLM lurking everywhere gives us a convenient way to extend all these classical tests to the repeated measures case. For \(t\)-tests and ANOVA, the equivalent model is indeed just a linear mixed-effects model, so I’ll leave that alone for now, but you this follows from all the pieces:

  • the relationship between ANOVA and the \(t\)-test
  • the relationship betweenthe \(t\)-test and simple linear regression
  • repeated-measures ANOVA is just a special case of the mixed model with certain assumptions about the random effects (sphericity and only one variance component).

For the other models and tests, it makes sense to start with the fixed-effects (i.e. not mixed) models until we’re used to dealing with them. The mixed-effects models are a relatively straightforward extension, so we’ll just leave them alone for now. A lot of the frequenstist examples here were inspired by Bayesian examples from Rasmus Bååth and his Bayesian First Aid package. It’s good work, but I would prefer estimation to testing, so I have mixed feelings about displaying the models as tests.

Beyond the “estimating is better than testing” perspective and the difficulties of finding repeated-measures equivalents for a lot of these tests, there is anohter advantage: the explicit GLM-based modeling strategy allows for continuous predictors, while many classical tests do not.

For a lot of the examples today, we’ll be using the survey data from the MASS package.

library(MASS)
library(car)
Loading required package: carData
# use ANOVA-style contrasts
options(contrasts=c("contr.Sum", "contr.poly"))
library(tidyverse)
── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.5
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ dplyr::select() masks MASS::select()
✖ purrr::some()   masks car::some()
?survey
survey R Documentation

Student Survey Data

Description

This data frame contains the responses of 237 Statistics I students at the University of Adelaide to a number of questions.

Usage

survey

Format

The components of the data frame are:

Sex

The sex of the student. (Factor with levels “Male” and “Female”.)

Wr.Hnd

span (distance from tip of thumb to tip of little finger of spread hand) of writing hand, in centimetres.

NW.Hnd

span of non-writing hand.

W.Hnd

writing hand of student. (Factor, with levels “Left” and “Right”.)

Fold

“Fold your arms! Which is on top” (Factor, with levels “R on L”, “L on R”, “Neither”.)

Pulse

pulse rate of student (beats per minute).

Clap

‘Clap your hands! Which hand is on top?’ (Factor, with levels “Right”, “Left”, “Neither”.)

Exer

how often the student exercises. (Factor, with levels “Freq” (frequently), “Some”, “None”.)

Smoke

how much the student smokes. (Factor, levels “Heavy”, “Regul” (regularly), “Occas” (occasionally), “Never”.)

Height

height of the student in centimetres.

M.I

whether the student expressed height in imperial (feet/inches) or metric (centimetres/metres) units. (Factor, levels “Metric”, “Imperial”.)

Age

age of the student in years.

References

Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. Third Edition. Springer.

For everything but the \(\chi^2\)-Test / Poisson model, there are similar (non repeated-measures) tests implemented in a Bayesian framework as part of Rasmus Bååth’s BayesianFirstAid package. His blog is generally worth a look, and if you’re joining us in a few weeks for the Bayes course, a must read. Similar thoughts in a more comprehensive format yet still very readable if you have the time can be found in McElreath’s book Statistical Rethinking. McElreath also gets more into the deeper relationship between the various distributions and thus tests – beyond all belonging to the exponential family, many of the distributions can be thought of as limiting cases of one another (e.g. the normal distribution is like a binomial distribution with an infinite number of trials). Some of these are hinted in the R documentation for the classical tests.

\(\chi^2\)-Test : Poisson Model

Classical Test: \(\chi^2\)-Test of Independence

smoke_exercise <- table(Smoke=survey$Smoke, Exer=survey$Exer)
smoke_exercise.chisq <- chisq.test(smoke_exercise, correct=FALSE)
## Warning in chisq.test(smoke_exercise, correct = FALSE): Chi-squared
## approximation may be incorrect
print(smoke_exercise.chisq)
## 
##  Pearson's Chi-squared test
## 
## data:  smoke_exercise
## X-squared = 5.4885, df = 6, p-value = 0.4828

We are unable to reject the null hypothesis. But that doesn’t tell us much. The chisq.test actually gives a lot more information than most people realize.

The observed data:

smoke_exercise.chisq$observed
Smoke/Exer Freq None Some
Heavy 7 1 3
Never 87 18 84
Occas 12 3 4
Regul 9 1 7

The expected results based on the null model:

smoke_exercise.chisq$expected
Freq None Some
Heavy 5.360169 1.072034 4.567797
Never 92.097458 18.419491 78.483051
Occas 9.258475 1.851695 7.889831
Regul 8.283898 1.656780 7.059322

And the Pearson residuals ( (observed - expected) / sqrt(expected) )

smoke_exercise.chisq$residuals
Smoke/Exer Freq None Some
Heavy 0.7082877 -0.0695717 -0.7335612
Never -0.5311654 -0.0977427 0.6227461
Occas 0.9009954 0.8438642 -1.3848312
Regul 0.2488040 -0.5102551 -0.0223272

Looking at the differences and the standardized residuals (which are somewhat analogous to \(z\)-scores), it’s no surprise that we can’t reject the null hypothesis.

Explicit GLM Test: Poisson

But I’ve always found the \(\chi^2\)-test confusing for anything but the simplest designs – a significant result suggest that there’s a difference somewhere, but where? I find it much easier to reason in terms of which things actually make a difference and what that difference looks like. We can do that by using the GLM explicitly.

The Poisson distribution is often used for modelling count data. There are some issues with Poisson models for data with too many zeros (so-called Zero-Inflated Poissons or ZIP) or for data where the variance is too (overdispersion), but we’ll leave those issues alone for today.

The basic Poisson model for our data is easy to compute, but first we need to get our data in the right format:

dat <- as.data.frame(smoke_exercise)
print(dat)
##    Smoke Exer Freq
## 1  Heavy Freq    7
## 2  Never Freq   87
## 3  Occas Freq   12
## 4  Regul Freq    9
## 5  Heavy None    1
## 6  Never None   18
## 7  Occas None    3
## 8  Regul None    1
## 9  Heavy Some    3
## 10 Never Some   84
## 11 Occas Some    4
## 12 Regul Some    7

Note that the the Freq column refers to the number of “matches” for a particular combination of smoker type Smoke and excercise regularity Exer and is not to be confused with the entry Freq in the Exer column, which refers to frequent exercise.

The model itself is for predicting how often we observe a particular combination of Smoke and Exer, so they form our predictors and Freq forms our prediction. We use the “over-parameterized” version of the model without an intercept so that each category is explicit and no category is “hidden” in the intercept.

smoke_exercise.glm <- glm(Freq ~ 0 + Smoke * Exer, data=dat, family=poisson())
summary(smoke_exercise.glm)
## 
## Call:
## glm(formula = Freq ~ 0 + Smoke * Exer, family = poisson(), data = dat)
## 
## Deviance Residuals: 
##  [1]  0  0  0  0  0  0  0  0  0  0  0  0
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## SmokeHeavy                   1.01484    0.40499   2.506 0.012217 *  
## SmokeNever                   3.92903    0.09366  41.949  < 2e-16 ***
## SmokeOccas                   1.65660    0.27217   6.087 1.15e-09 ***
## SmokeRegul                   1.38104    0.37327   3.700 0.000216 ***
## Exer[S.Freq]                 0.77811    0.17721   4.391 1.13e-05 ***
## Exer[S.None]                -0.99813    0.27186  -3.672 0.000241 ***
## Smoke[S.Heavy]:Exer[S.Freq]  0.15296    0.37044   0.413 0.679662    
## Smoke[S.Never]:Exer[S.Freq] -0.24123    0.19418  -1.242 0.214124    
## Smoke[S.Occas]:Exer[S.Freq]  0.05020    0.28693   0.175 0.861128    
## Smoke[S.Heavy]:Exer[S.None] -0.01671    0.56796  -0.029 0.976534    
## Smoke[S.Never]:Exer[S.None] -0.04053    0.29589  -0.137 0.891060    
## Smoke[S.Occas]:Exer[S.None]  0.44014    0.40804   1.079 0.280735    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance:  1.3554e+03  on 12  degrees of freedom
## Residual deviance: -1.2212e-14  on  0  degrees of freedom
## AIC: 70.569
## 
## Number of Fisher Scoring iterations: 3

We also compute a null model. The relevant null here is “no interaction” (it is after all the \(\chi^2\) test for independence, i.e. not-interacting).

smoke_exercise.null <- glm(Freq ~ 0 + Smoke + Exer, data=dat, family=poisson())
summary(smoke_exercise.null)
## 
## Call:
## glm(formula = Freq ~ 0 + Smoke + Exer, family = poisson(), data = dat)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.53148  -0.53993  -0.04637   0.63077   0.86126  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## SmokeHeavy    1.08919    0.30595   3.560 0.000371 ***
## SmokeNever    3.93305    0.08936  44.015  < 2e-16 ***
## SmokeOccas    1.63574    0.23521   6.954 3.54e-12 ***
## SmokeRegul    1.52451    0.24803   6.147 7.92e-10 ***
## Exer[S.Freq]  0.58980    0.09914   5.949 2.70e-09 ***
## Exer[S.None] -1.01964    0.14637  -6.966 3.25e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1355.4456  on 12  degrees of freedom
## Residual deviance:    5.8015  on  6  degrees of freedom
## AIC: 64.37
## 
## Number of Fisher Scoring iterations: 4

The \(\chi^2\) for independence is then equal to the likelihood test for whether the interaction model fits the data better than the null model:

anova(smoke_exercise.null, smoke_exercise.glm, test="LRT")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
6 5.801467 NA NA NA
0 0.000000 6 5.801467 0.4457935

The \(p\)-values here aren’t identical to ones above, but they’re very close. (The deviance differs slightly.) And it’s no coincendence that the \(\chi^2\) statistic occurs in both the likelihood-ratio test and the test of (marginal) independence!

The car package provides a convenience function for computing this type of model.

Anova(smoke_exercise.glm, test.statistic="LR", type = 2)
LR Chisq Df Pr(>Chisq)
Smoke 643.052690 4 0.0000000
Exer 73.837138 2 0.0000000
Smoke:Exer 5.801467 6 0.4457935

We can also predict from the null model:

dat$h0 <- predict(smoke_exercise.null, dat, type="response")
xtabs(h0 ~ Smoke + Exer, dat)
Smoke/Exer Freq None Some
Heavy 5.360169 1.072034 4.567797
Never 92.097458 18.419491 78.483051
Occas 9.258475 1.851695 7.889831
Regul 8.283898 1.656780 7.059322

And from the full model:

dat$h1 <- predict(smoke_exercise.glm, dat, type="response")
xtabs(h1 ~ Smoke + Exer, dat)
Smoke/Exer Freq None Some
Heavy 7 1 3
Never 87 18 84
Occas 12 3 4
Regul 9 1 7

This model nails our original data nearly perfectly. Now the reason why this isn’t significantly better becomes apparent when we look at the standard errors:

dat$h1.se <- predict(smoke_exercise.glm, dat, type="response", se.fit = TRUE)$se.fit
xtabs(h1.se ~ Smoke + Exer, dat)
Smoke/Exer Freq None Some
Heavy 2.645751 0.9999952 1.732051
Never 9.327379 4.2426407 9.165151
Occas 3.464102 1.7320507 2.000000
Regul 3.000000 0.9999952 2.645751

By the way, we can see that our null model is indeed similar to the classical test’s null model:

xtabs(h0 ~ Smoke + Exer, dat) - smoke_exercise.chisq$expected
Smoke/Exer Freq None Some
Heavy 0 0 0
Never 0 0 0
Occas 0 0 0
Regul 0 0 0
That’s a diff erence of one tenth of one billionth of an occurrence at its greatest.

And the same holds for the standard errors:

dat$h0.se <- predict(smoke_exercise.glm, dat, type="response", se.fit = TRUE)$se.fit
xtabs(h0.se ~ Smoke + Exer, dat) - smoke_exercise.chisq$stdres
Smoke/Exer Freq None Some
Heavy 1.632685 1.0749956 2.714517
Never 10.989644 4.4731867 7.340270
Occas 2.151865 0.8057224 3.888598
Regul 2.639293 1.5575498 2.676061

And of course, we can visualize all of this:

ggplot(dat, 
       aes(x=Smoke,color=Exer,y=h1,ymin=h1-2*h1.se,ymax=h1+2*h1.se)) + 
    geom_point(aes(y=Freq, shape="Data")) + 
    geom_point(aes(y=h1, shape="Model")) + 
    geom_errorbar() +
    geom_line(aes(group=Exer)) + 
    labs(x="Smoking habits", y="Number of Respondents", color="Exercise habits",
         shape="Point estimate",
         title="Poisson model investigating the independence of smoking and exercise habits",
         subtitle="Error bars represent 95% Wald prediction intervals") + 
    theme_light()

Binomial Test: Logistic Regression

Classical Test: Exact Binomial Test

The binomial test is often used for comparing the number of “succeses” (hits, ‘yes’ answers, etc.) to “failures”, where the null hypothesis that the probability of success is 0.5 (by default; you can change this manually). In R, it can be computed as number of success and total number:

binom.test(50, 100)
## 
##  Exact binomial test
## 
## data:  50 and 100
## number of successes = 50, number of trials = 100, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3983211 0.6016789
## sample estimates:
## probability of success 
##                    0.5

Or as number of success vs. number of failures:

binom.test(c(50,50))
## 
##  Exact binomial test
## 
## data:  c(50, 50)
## number of successes = 50, number of trials = 100, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3983211 0.6016789
## sample estimates:
## probability of success 
##                    0.5

Note the very subtle difference in the input!

For a more complex example, we can examine handedness.

summary(survey$W.Hnd)
##  Left Right  NA's 
##    18   218     1

to be more interesting, let’s set our null hypothesis to “probability of being right-handed is 99%” – i.e. that lefties are exceptionally rare. The classical test gives us:

hand.binom <- binom.test(c(218,18),p=.99)
hand.binom
## 
##  Exact binomial test
## 
## data:  c(218, 18)
## number of successes = 218, number of trials = 236, p-value =
## 5.234e-11
## alternative hypothesis: true probability of success is not equal to 0.99
## 95 percent confidence interval:
##  0.8821360 0.9541722
## sample estimates:
## probability of success 
##              0.9237288

Explicit GLM: Logit (or Probit)

For the exact equivalent to the classical binomial test, we only care what the probability of the Bernoulli trial is and whether it differs from our null hypothesis. This corresponds to an intercept only binomial model. We’ll use the logit link function, but this method can be easily adapted for probit regression.

hand.glm <- glm(W.Hnd ~ 1, data=survey, family=binomial(link="logit"))
summary(hand.glm)
## 
## Call:
## glm(formula = W.Hnd ~ 1, family = binomial(link = "logit"), data = survey)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2687   0.3983   0.3983   0.3983   0.3983  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.4941     0.2452   10.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 127.24  on 235  degrees of freedom
## Residual deviance: 127.24  on 235  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 129.24
## 
## Number of Fisher Scoring iterations: 5

We can convert logistic coefficients to probabilities with the plogis function:

plogis(coef(hand.glm))
## (Intercept) 
##   0.9237288

(If we just want odds-ratios, then simple exponentiation would suffice.)

This matched our estimate via the classical test. But how do we test against our null hypothesis of 99% probability? First we convert probability to log-odds:

logit.null <- qlogis(.99)
logit.null
## [1] 4.59512

Next, we use Wald confidence intevals to see if that value is included. confint()

confint(hand.glm)
## Waiting for profiling to be done...
##    2.5 %   97.5 % 
## 2.043428 3.010507

It’s not! We can actually see that the previous p-value is the point where the null would be included (confidence intervals can after alll beformed by inverting tests):

confint(hand.glm,level=1-hand.binom$p.value)
## Waiting for profiling to be done...
##      0 %    100 % 
## 1.162023 4.600165

(Some of the decimal digits are off here because it’s a finite sample.)

Proportion Test: (Multinomial) Logistic or Poisson Model

Classical Test: Test for Equality of Proportions

A generalization of the binomial test is the test of proportions. For the simplest case, it’s simply a different way of expressing the binomial test:

prop.test(218,n=236,p=.99,correct=FALSE)
## Warning in prop.test(218, n = 236, p = 0.99, correct = FALSE): Chi-squared
## approximation may be incorrect
## 
##  1-sample proportions test without continuity correction
## 
## data:  218 out of 236, null probability 0.99
## X-squared = 104.7, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.99
## 95 percent confidence interval:
##  0.8826712 0.9512130
## sample estimates:
##         p 
## 0.9237288

(We should probably use Yates’ correction here, but our sample size isn’t that small and the various small sample size corrections hide the asymptotic equivance of the tests. Nonetheless, the \(p\)-values differ here because proportion test is an asymptotic/approximate test, while the binomial test is an exact test. )

With prop.test, we can also express various proportions within multiple categories and compare them. For example, we can see if the proportion of lefties differs between the sexes:

hand.sex <- xtabs(~Sex + W.Hnd,data=survey)
hand.sex
Sex/W.Hnd Left Right
Female 7 110
Male 10 108
# swap the order so that "righties" are in the "success" column on the left
prop.test(hand.sex[, c("Right","Left")])
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  hand.sex[, c("Right", "Left")]
## X-squared = 0.23563, df = 1, p-value = 0.6274
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.04971453  0.09954794
## sample estimates:
##    prop 1    prop 2 
## 0.9401709 0.9152542

We can also express this as group-wise number of success and totals:

prop.test(c(110,108),n=c(117,118))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(110, 108) out of c(117, 118)
## X-squared = 0.23563, df = 1, p-value = 0.6274
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.04971453  0.09954794
## sample estimates:
##    prop 1    prop 2 
## 0.9401709 0.9152542

So now, we have by-sex estimates of the proportions as well as an estimate of the difference. (The point estimate of the difference is implicit in the sample estimates and the 95% confidence interval of the difference.) Note that prop.test doesn’t allow you to specify what the proportion is for multiple groups – prop.test only tests for the equality of proportions between groups.

prop.test will even allow you to have an arbitrary number of categories as rows in your table. For example, we could consider the relationship of handedness to smoking:

hand.smoke <- xtabs(~Smoke + W.Hnd,data=survey)
print(hand.smoke)
##        W.Hnd
## Smoke   Left Right
##   Heavy    1    10
##   Never   13   175
##   Occas    3    16
##   Regul    1    16
prop.test(hand.smoke)
## Warning in prop.test(hand.smoke): Chi-squared approximation may be
## incorrect
## 
##  4-sample test for equality of proportions without continuity
##  correction
## 
## data:  hand.smoke
## X-squared = 2.0307, df = 3, p-value = 0.5661
## alternative hypothesis: two.sided
## sample estimates:
##     prop 1     prop 2     prop 3     prop 4 
## 0.09090909 0.06914894 0.15789474 0.05882353

However, for more than two-groups, prop.test doesn’t provide an estimate of the difference because there are multiple estimated differences.

Explicit GLM: Logit

Now, all this success and failure language suggest that we can express these models as binomial GLMs, and we can. For sex, we have:

hand.sex.binom <-  glm(W.Hnd ~ 0 + Sex, data=na.omit(survey), family=binomial(link="logit"),
                       contrasts = list(Sex="contr.Sum"))
hand.sex.binom.null <-  glm(W.Hnd ~ 1, data=na.omit(survey), family=binomial(link="logit"))
print(summary(hand.sex.binom))
## 
## Call:
## glm(formula = W.Hnd ~ 0 + Sex, family = binomial(link = "logit"), 
##     data = na.omit(survey), contrasts = list(Sex = "contr.Sum"))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3754   0.3503   0.3503   0.4172   0.4172  
## 
## Coefficients:
##           Estimate Std. Error z value Pr(>|z|)    
## SexFemale   2.7600     0.4611   5.985 2.16e-09 ***
## SexMale     2.3979     0.3948   6.074 1.25e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 232.897  on 168  degrees of freedom
## Residual deviance:  86.099  on 166  degrees of freedom
## AIC: 90.099
## 
## Number of Fisher Scoring iterations: 5
print(plogis(coef(hand.sex.binom)))
## SexFemale   SexMale 
## 0.9404762 0.9166667
# non-overlap of83% CIs corresponds to 95% CI of diff not crossing zero:
# https://rpubs.com/bbolker/overlapCI
print(plogis(confint(hand.sex.binom,level=0.83)))
## Waiting for profiling to be done...
##               8.5 %    91.5 %
## SexFemale 0.8984303 0.9694946
## SexMale   0.8691164 0.9518829
anova(hand.sex.binom.null, hand.sex.binom, test="Chisq")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
167 86.45906 NA NA NA
166 86.09853 1 0.3605357 0.5482089
The \(p\)-valu e here doesn’ t exa ctly match t he proportion test, but that’s okay. The point estimates for the group-wise proportions are quite similar, and this model has the advantage of being generative. Besides, prop.test delivers two different answers depending on whether or not we enable Yates’ correction and our \(p\)-value falls right between them:
print(prop.test(hand.sex[ , c(2,1)], correct=FALSE))
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  hand.sex[, c(2, 1)]
## X-squared = 0.54351, df = 1, p-value = 0.461
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.04120374  0.09103715
## sample estimates:
##    prop 1    prop 2 
## 0.9401709 0.9152542
print(prop.test(hand.sex[ , c(2,1)], correct=TRUE))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  hand.sex[, c(2, 1)]
## X-squared = 0.23563, df = 1, p-value = 0.6274
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.04971453  0.09954794
## sample estimates:
##    prop 1    prop 2 
## 0.9401709 0.9152542

This suggests that our test is doing something right – it’s not quite as conservative as Yates correction, but because it’s an explicit GLM, it’s generative and so we can plot and make predictions and all that.

Explicit GLM: Poisson

Now, some of you might have noticed that the multi-condition proportion test can be viewed as having “occurrences [out of total events]” instead of “successes” and “failures”

If you were paying attention earlier, this should make you think of the Poisson model. Let’s try to reformulate this as a Poisson model. First, let’s convert our data from single Bernoulli trials to observed occurences in a given time unit (here: everything).

hand.sex.poisson <- glm(Freq ~ 1 + Sex * W.Hnd, data=as.data.frame(hand.sex), family=poisson(),
                        contrasts = list(Sex="contr.Sum"))
hand.sex.poisson.null <- glm(Freq ~ 1 + Sex + W.Hnd, data=as.data.frame(hand.sex), family=poisson(),
                        contrasts = list(Sex="contr.Sum"))
print(summary(hand.sex.poisson))
## 
## Call:
## glm(formula = Freq ~ 1 + Sex * W.Hnd, family = poisson(), data = as.data.frame(hand.sex), 
##     contrasts = list(Sex = "contr.Sum"))
## 
## Deviance Residuals: 
## [1]  0  0  0  0
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  3.40778    0.12777  26.671   <2e-16 ***
## Sex[S.Female]               -0.08458    0.12777  -0.662    0.508    
## W.Hnd[S.Left]               -1.28353    0.12777 -10.046   <2e-16 ***
## Sex[S.Female]:W.Hnd[S.Left] -0.09376    0.12777  -0.734    0.463    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance:  2.0429e+02  on 3  degrees of freedom
## Residual deviance: -1.1102e-15  on 0  degrees of freedom
## AIC: 29.026
## 
## Number of Fisher Scoring iterations: 3
anova(hand.sex.poisson.null, hand.sex.poisson, test="Chisq")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 0.5462879 NA NA NA
0 0.0000000 1 0.5462879 0.4598384

The \(p\)-values for the likelihood-ratio and for the interaction term lie on either side of the \(p\)-value of uncorrected test of proportions. The LRTs are known to be more conservative than the Wald (coefficient) tests, so this is not surprising within the Poisson model and suggests moreover that the Poisson model largely agrees with the uncorrected test for equality of proportions.

Classical Test: Poisson Test

Now, you’ve seen that the proportion test is effectively just a generalization of the binomial test. And you’ve seen that we can model the proportion test with a Poisson model. So the following facts from R’s documentation for poisson.test should not surprise you:

The one-sample case is effectively the binomial test with a very large n. The two sample case is converted to a binomial test by conditioning on the total event count, and the rate ratio is directly related to the odds in that binomial distribution.

Let’s take this bit by bit:

The one-sample case is effectively the binomial test with a very large n.

Hmmm.

print(summary(survey$W.Hnd))
##  Left Right  NA's 
##    18   218     1
print(binom.test(218,n=236))
## 
##  Exact binomial test
## 
## data:  218 and 236
## number of successes = 218, number of trials = 236, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.8821360 0.9541722
## sample estimates:
## probability of success 
##              0.9237288
print(prop.test(218,n=236))
## 
##  1-sample proportions test with continuity correction
## 
## data:  218 out of 236, null probability 0.5
## X-squared = 167.8, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.8801771 0.9528813
## sample estimates:
##         p 
## 0.9237288
print(poisson.test(218,236))
## 
##  Exact Poisson test
## 
## data:  218 time base: 236
## number of events = 218, time base = 236, p-value = 0.2545
## alternative hypothesis: true event rate is not equal to 1
## 95 percent confidence interval:
##  0.8051693 1.0548307
## sample estimates:
## event rate 
##  0.9237288

Oh wow, those do all line up. The Poisson terminology of “rate” per unit “time” sounds weird, but we can also think of “events” being the same as “sucesses” and “rate” being the same as “probability” and then “unit time” being the same as “observation”.

The two sample case is converted to a binomial test by conditioning on the total event count, and the rate ratio is directly related to the odds in that binomial distribution.

hand.sex <- xtabs(~Sex + W.Hnd,data=survey)
print(hand.sex)
##         W.Hnd
## Sex      Left Right
##   Female    7   110
##   Male     10   108
# reverse the column order so that right-handedness is "sucess"
hand.sex <- hand.sex[, c("Right","Left")]
binom.female <- binom.test(hand.sex["Female",])
binom.male <- binom.test(hand.sex["Male",])
print(sprintf("Ratio of conditional binomial probabilities: %f", binom.female$estimate / binom.male$estimate))
## [1] "Ratio of conditional binomial probabilities: 1.027224"
total_righties <- sum(hand.sex[,"Right"])
total_females <- sum(hand.sex["Female",])
total_males <- sum(hand.sex["Male",])
total_all <- sum(hand.sex)

binom.test(hand.sex[,"Right"], total_righties, total_females / total_all)
## 
##  Exact binomial test
## 
## data:  hand.sex[, "Right"]
## number of successes = 110, number of trials = 218, p-value =
## 0.8923
## alternative hypothesis: true probability of success is not equal to 0.4978723
## 95 percent confidence interval:
##  0.4362578 0.5727901
## sample estimates:
## probability of success 
##              0.5045872
prop <- prop.test(hand.sex)
print(prop)
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  hand.sex
## X-squared = 0.23563, df = 1, p-value = 0.6274
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.04971453  0.09954794
## sample estimates:
##    prop 1    prop 2 
## 0.9401709 0.9152542
print(sprintf("Ratio of proportions: %f", prop$estimate[1]/prop$estimate[2]))
## [1] "Ratio of proportions: 1.027224"
print(poisson.test(hand.sex[,"Right"],c(total_females, total_males)))
## 
##  Comparison of Poisson rates
## 
## data:  hand.sex[, "Right"] time base: c(total_females, total_males)
## count1.Female = 110, expected count1 = 108.54, p-value = 0.8923
## alternative hypothesis: true rate ratio is not equal to 1
## 95 percent confidence interval:
##  0.7804746 1.3522292
## sample estimates:
## rate ratio.Female 
##          1.027224

Huh. The \(p\)-values from the binomial test and Poisson test line up perfectly, and ratio of proportions identical across all tests.

The Icy-Chill of Statistics: Modelling the Titanic data

(live interactive demonstration)

If there’s time: Which (G)LMM modelling stragegy works best for RT data?

cf. Lo and Andrews (2015, Frontiers in Psychology doi:10.3389/fpysg.2015.01171)