Tests for Proportions

One Sample Tests for Proportion

Motivating Example: Marriage Equality

In the national debate on same-sex marriage, it is commonly stated that half of all Americans favor same-sex marriage. In 2014, Pew Research conducted a poll of millennials* and found that 67% answered yes when asked, “Do you support same-sex marriage?” The poll surveyed a random sample of 75 millennials, 50 of whom were in support.

Does this poll provide convincing evidence that the support from millennials for same-sex marriage is higher than that of the larger general population of Americans?

*Note: As The Economist put it in 2018, “generations are squishy concepts,” but the 1981 to 1996 birth cohort is a “widely accepted” definition of millennials.

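# EXAMPLE: Marriage Equality
x_obs <- 50   # number of sampled millennials who support same-sex marriage
n <- 75       # sample size
p0 <- 0.5     # null proportion (half of all Americans)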

Null

\(H_0: p=0.5\)

Millennials support same-sex marriage at the same rate as the larger general population.

Alternative

\(H_A: p>0.5\)

Millennials support same-sex marriage at a higher rate than the larger general population.

OPTION 1: EXACT BINOMIAL TEST

Uses the exact probabilities given by the binomial distribution under \(H_0\).
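Under \(H_0\), the number of supporters \(X\) in a sample of 75 follows a Binomial\((n = 75,\ p = 0.5)\) distribution. The exact one-sided p-value is therefore the upper-tail probability of observing 50 or more supporters:

\[
P(X \ge 50) = \sum_{k=50}^{75} \binom{75}{k} (0.5)^{k} (1 - 0.5)^{75-k} = \sum_{k=50}^{75} \binom{75}{k} (0.5)^{75}
\]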

## EXACT BINOMIAL
# CODE FOR R FUNCTION
binom.test(x_obs, n, p0, alternative = "greater")
## 
##  Exact binomial test
## 
## data:  x_obs and n
## number of successes = 50, number of trials = 75, p-value =
## 0.002614
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.5665713 1.0000000
## sample estimates:
## probability of success 
##              0.6666667
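# PLOT
# support of the null distribution: every possible count 0, 1, ..., n
x <- seq(0, n, by = 1)
y <- dbinom(x, n, p0)   # P(X = x) under H0

# flag the counts "as or more extreme" than the observed 50
binomDF <- data.frame(x, y, pval = x >= x_obs)

# install.packages("tidyverse")   # run once if tidyverse is not installed
library(tidyverse)
ggplot(binomDF, aes(x, y, fill = pval)) +
  geom_col() +
  theme_bw()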
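As a check on the binom.test() output above, the exact p-value can also be computed directly as the upper-tail probability \(P(X \ge 50)\). This sketch sums dbinom() over the counts 50 through 75, gets the same tail from pbinom(), and compares both with the value binom.test() returns. The variable names p_exact_sum, p_exact_cdf and p_binom_test are illustrative, not from the original notes.

```r
# Sketch: reproduce the exact binomial p-value by hand.
x_obs <- 50   # observed supporters
n <- 75       # sample size
p0 <- 0.5     # null proportion

# P(X >= 50): add up the null probabilities of every count at least as extreme
p_exact_sum <- sum(dbinom(x_obs:n, n, p0))

# the same tail from the CDF: P(X >= 50) = P(X > 49)
p_exact_cdf <- pbinom(x_obs - 1, n, p0, lower.tail = FALSE)

# the p-value reported by binom.test()
p_binom_test <- binom.test(x_obs, n, p0, alternative = "greater")$p.value

c(sum = p_exact_sum, cdf = p_exact_cdf, binom_test = p_binom_test)
```

All three values should agree up to floating-point rounding, which makes clear that binom.test() is simply reporting the shaded bars in the plot.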

OPTION 2: BINOMIAL SIMULATION

Uses the rbinom() function to simulate draws from the null binomial distribution, so no theoretical probability calculations are needed.

## SIMULATED BINOMIAL
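nsim <- 10000   # number of simulated samples
# each draw is the number of supporters in a sample of n when p = p0
# (no seed is set, so your numbers will differ slightly from those shown)
nullDist <- data.frame(x = rbinom(nsim, n, p0))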
head(nullDist)
##    x
## 1 35
## 2 35
## 3 41
## 4 39
## 5 43
## 6 35
head(nullDist>=x_obs)
##          x
## [1,] FALSE
## [2,] FALSE
## [3,] FALSE
## [4,] FALSE
## [5,] FALSE
## [6,] FALSE
mean(nullDist>=x_obs)   # simulated p-value: share of simulated counts >= x_obs
## [1] 0.0028
# PLOT THE SIMULATED NULL DISTR
# binwidth = 1 gives one bar per possible count
ggplot(nullDist, aes(x)) +
  geom_histogram(binwidth = 1) +
  theme_bw()

# "AS OR MORE EXTREME"
pvalSim<-nullDist%>%
  mutate(pval=x>=x_obs)

# PLOT THE SIMULATED P-VALUE
ggplot(pvalSim, aes(x, fill = pval)) +
  geom_histogram(binwidth = 1) +
  theme_bw()
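Because the simulated p-value is itself a proportion of nsim random draws, it carries sampling error of its own. This minimal sketch compares a fresh simulated p-value with the exact binomial tail and estimates its Monte Carlo standard error as \(\sqrt{p(1-p)/n_{sim}}\). The seed 2014 is an arbitrary choice made here for reproducibility; it is not part of the original analysis.

```r
# Sketch: how close should a simulated p-value be to the exact one?
set.seed(2014)   # arbitrary seed (our choice) so the run is reproducible
x_obs <- 50
n <- 75
p0 <- 0.5
nsim <- 10000

sims <- rbinom(nsim, n, p0)        # simulated counts under H0
p_sim <- mean(sims >= x_obs)       # simulated p-value
p_exact <- pbinom(x_obs - 1, n, p0, lower.tail = FALSE)   # exact P(X >= 50)

# a proportion of nsim independent draws has standard error
# of about sqrt(p * (1 - p) / nsim)
mc_se <- sqrt(p_exact * (1 - p_exact) / nsim)

c(simulated = p_sim, exact = p_exact, mc_se = mc_se)
```

With nsim = 10000 the standard error is about 0.0005, so the simulated 0.0028 reported earlier sits well within one standard error of the exact 0.0026. Quadrupling nsim would roughly halve that error.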

OPTION 3: NORMAL APPROXIMATION (Z-test for proportion)

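Uses the standard normal distribution to approximate the null distribution of the sample proportion. This is reasonable when the sample is large enough; a common rule of thumb (the success-failure condition) is \(np_0 \ge 10\) and \(n(1-p_0) \ge 10\).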
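The z statistic counts how many null standard errors \(\hat{p}\) lies above \(p_0\). The standard error uses \(p_0\) rather than \(\hat{p}\) because it is computed assuming \(H_0\) is true. For this poll:

\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{50/75 - 0.5}{\sqrt{0.5(1-0.5)/75}} \approx \frac{0.1667}{0.05774} \approx 2.887,
\qquad
\text{p-value} = P(Z \ge 2.887) \approx 0.0019
\]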

## NORMAL DISTR
## LARGE SAMPLE APPROX
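# Check the success-failure condition: both expected counts should be at least 10
n*p0
## [1] 37.5
n*(1-p0)
## [1] 37.5
# standard error of p_hat, computed assuming H0 is true
SE0<-sqrt(p0*(1-p0)/n)
SE0
## [1] 0.05773503
# observed sample proportion
p_hat<-x_obs/n
p_hat
## [1] 0.6666667
# z statistic: distance of p_hat above p0, in null standard errors
test_stat<-(p_hat-p0)/SE0
test_stat
## [1] 2.886751
# one-sided (upper-tail) p-value
pnorm(test_stat, lower.tail = FALSE)
## [1] 0.001946209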
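Base R also offers prop.test() for this test. It reports the chi-squared statistic, which for a single proportion equals \(z^2\). This sketch sets correct = FALSE to switch off the continuity correction so the result should match the hand calculation above.

```r
# Sketch: cross-check the hand-computed z-test with base R's prop.test().
x_obs <- 50
n <- 75
p0 <- 0.5

# correct = FALSE turns off the continuity correction
res <- prop.test(x_obs, n, p = p0, alternative = "greater", correct = FALSE)

# z statistic computed by hand, as in the code above
z <- (x_obs / n - p0) / sqrt(p0 * (1 - p0) / n)

c(z_squared = z^2, X_squared = unname(res$statistic))
c(by_hand = pnorm(z, lower.tail = FALSE), prop_test = res$p.value)
```

Leaving the default correct = TRUE applies a continuity correction, which shrinks the statistic and so gives a somewhat larger (more conservative) p-value than the uncorrected 0.0019.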

Two Sample Test for Proportions (aka Two Sample Z-test for Proportions)

See the worksheet from class! We’ll be collecting data in class!
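The class data are not reproduced here. As a hedged sketch, assuming the usual pooled two-sample z-test of \(H_0: p_1 = p_2\) against \(H_A: p_1 > p_2\), the hypothetical helper two_prop_z() below takes the counts x1 out of n1 and x2 out of n2 and returns the z statistic and upper-tail p-value. It mirrors the one-sample calculation: the standard error is computed under \(H_0\), here using the pooled proportion.

```r
# Hypothetical helper (not from the worksheet): pooled two-sample z-test
# for H0: p1 = p2 versus HA: p1 > p2.
two_prop_z <- function(x1, n1, x2, n2) {
  p1_hat <- x1 / n1
  p2_hat <- x2 / n2
  # pooled estimate of the common proportion, assuming H0 is true
  p_pool <- (x1 + x2) / (n1 + n2)
  # standard error of p1_hat - p2_hat under H0
  se0 <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1_hat - p2_hat) / se0
  list(z = z, p_value = pnorm(z, lower.tail = FALSE))
}
```

Plug in the class counts once they are collected. The result should agree with prop.test(c(x1, x2), c(n1, n2), alternative = "greater", correct = FALSE). For a two-sided alternative, double the tail area with 2 * pnorm(abs(z), lower.tail = FALSE). A common rule of thumb is to trust the normal approximation only when each group has at least 10 expected successes and 10 expected failures under the pooled proportion.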