Data Analysis in Social Science

Testing differences in means and proportions

Two-sample hypothesis tests

Example 1 - for means

Import the BES2015 dataset. Use either read.csv() or load(), depending on which file you are importing (a .csv file or an .Rda file).

load("C:/Users/poppr/Dropbox (CEMAP)/POL1041_SOC1041/lab9/BES2015.Rda")

Information about the data: http://www.britishelectionstudy.com/data-object/post-election-wave-6-of-the-2014-2017-british-election-study-internet-panel/

Imagine that we want to know whether the attitude towards the duty to vote (dutyToVote2) differs between men and women (gender).

H0: There is no difference between male and female assessments of the duty to vote in the population.
H0: mu(men) - mu(women) = 0

H1: There is a difference between male and female assessments of the duty to vote in the population.
H1: mu(men) - mu(women) != 0

Use p-values to test these hypotheses

Step 1: specify alpha

alpha = 0.05

Step 2: calculate the test statistic and look up the resulting p-value

# t = ((x_bar1 - x_bar2) - 0) / sqrt(s1^2/n1 + s2^2/n2)
# t =

We need the degrees of freedom

# df = (s1^2/n1 + s2^2/n2)^2 / ((1/(n1-1))*(s1^2/n1)^2 + (1/(n2-1))*(s2^2/n2)^2)
# df =

Lastly, we look up the p-value
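The three calculations above (test statistic, degrees of freedom, p-value) can be sketched as a small helper. This is only an illustration: welch_by_hand() is a hypothetical name, and the data are simulated so that the block runs without the BES2015 file. The result is compared with t.test() as a check.

```r
# Welch two-sample t-test computed step by step.
# x1 and x2 are numeric vectors holding the outcome for each group.
welch_by_hand <- function(x1, x2) {
  x1 <- x1[!is.na(x1)]               # drop missing values
  x2 <- x2[!is.na(x2)]
  n1 <- length(x1)
  n2 <- length(x2)
  v1 <- var(x1) / n1                 # s1^2 / n1
  v2 <- var(x2) / n2                 # s2^2 / n2
  t_stat <- ((mean(x1) - mean(x2)) - 0) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

# Simulated data for illustration only (not the BES2015 values)
set.seed(1)
g1 <- rnorm(200, mean = 4.3, sd = 1)
g2 <- rnorm(250, mean = 4.4, sd = 1.2)

by_hand <- welch_by_hand(g1, g2)
builtin <- t.test(g1, g2)            # Welch test is the default

by_hand
c(builtin$statistic, builtin$parameter, p.value = builtin$p.value)
```

With the real data, the two vectors would be the dutyToVote2 values for each gender group, for example BES2015$dutyToVote2[BES2015$gender == 1] and BES2015$dutyToVote2[BES2015$gender == 2], assuming gender is coded 1 and 2 as the group labels in the t.test() output suggest.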

Step 3: Draw your conclusion based on the p-value

Is the p-value less than alpha? If so, reject H0.

Alternatively, use t.test()

t.test(BES2015$dutyToVote2 ~ BES2015$gender)

## 
##  Welch Two Sample t-test
## 
## data:  BES2015$dutyToVote2 by BES2015$gender
## t = -9.8707, df = 18336, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.13465320 -0.09003539
## sample estimates:
## mean in group 1 mean in group 2 
##        4.311263        4.423608

Example 2 - for means

Imagine now that we want to test if men’s belief that voting is a civic duty is weaker than women’s belief in the population (dutyToVote2, gender).

H0: mu(men) - mu(women) >= 0
H1: mu(men) - mu(women) < 0

Use p-values to test these hypotheses

Step 1: specify alpha

alpha = 0.05

Step 2: calculate the test statistic and look up the resulting p-value

# t = ((x_bar1 - x_bar2) - 0) / sqrt(s1^2/n1 + s2^2/n2)
# t =

We need the degrees of freedom

# df = (s1^2/n1 + s2^2/n2)^2 / ((1/(n1-1))*(s1^2/n1)^2 + (1/(n2-1))*(s2^2/n2)^2)
# df =

Lastly, we look up the p-value
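For a one-sided alternative the p-value is the area in one tail only. The sketch below takes t and df from the Welch output reported in Example 1 (the statistic is the same here) and assumes that group 1 in that output corresponds to men, which the output itself does not confirm.

```r
# Values reported by t.test() for dutyToVote2 by gender
t_stat <- -9.8707
df     <- 18336
alpha  <- 0.05

# H1: mu(group 1) - mu(group 2) < 0, so the p-value is the lower-tail area
p_less <- pt(t_stat, df)

# Two-sided p-value for comparison: both tails
p_two_sided <- 2 * pt(-abs(t_stat), df)

p_less
p_two_sided
p_less < alpha   # TRUE means H0 is rejected at the 5% level
```

Because the observed t is negative, the one-sided p-value is half the two-sided one. Had t been positive, the lower-tail area would be above 0.5 and H0 would not be rejected.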

Step 3: Draw your conclusion based on the p-value

Is the p-value less than alpha? If so, reject H0.

Alternatively, use t.test()

t.test(BES2015$dutyToVote2 ~ BES2015$gender, alternative="less")

## 
##  Welch Two Sample t-test
## 
## data:  BES2015$dutyToVote2 by BES2015$gender
## t = -9.8707, df = 18336, p-value < 2.2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##         -Inf -0.09362237
## sample estimates:
## mean in group 1 mean in group 2 
##        4.311263        4.423608

Example 3 - for means

Imagine that you want to know if there is a difference between younger and older voters regarding the level of trust towards MPs.

H0: mu(old) - mu(young) = 0
H1: mu(old) - mu(young) != 0

Generate a new variable called young that equals 1 if Age is less than or equal to 34 and 0 otherwise.

BES2015$young<-ifelse(BES2015$Age<=34,1,0)
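To see how this recoding treats the cut-off and missing values, here is a minimal sketch on a made-up age vector (the ages are hypothetical, chosen only to hit the boundary cases).

```r
# Hypothetical ages: below, at, and above the cut-off, plus a missing value
age <- c(18, 34, 35, 70, NA)

young <- ifelse(age <= 34, 1, 0)
young
# [1]  1  1  0  0 NA

# Frequency table including missing values, useful as a check after recoding
table(young, useNA = "ifany")
```

A respondent aged exactly 34 is coded as young, and a missing age stays missing rather than being coded 0.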

Use p-values to test these hypotheses

Step 1: specify alpha

alpha = 0.05

Step 2: calculate the test statistic and look up the resulting p-value

# t = ((x_bar1 - x_bar2) - 0) / sqrt(s1^2/n1 + s2^2/n2)
# t =

We need the degrees of freedom

# df = (s1^2/n1 + s2^2/n2)^2 / ((1/(n1-1))*(s1^2/n1)^2 + (1/(n2-1))*(s2^2/n2)^2)
# df =

Lastly, we look up the p-value

Step 3: Draw your conclusion based on the p-value

Is the p-value less than alpha? If so, reject H0.

Alternatively, use t.test()

t.test(BES2015$trustMPs~BES2015$young)

## 
##  Welch Two Sample t-test
## 
## data:  BES2015$trustMPs by BES2015$young
## t = -0.39597, df = 4700.9, p-value = 0.6921
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.06728258  0.04467073
## sample estimates:
## mean in group 0 mean in group 1 
##        3.638149        3.649455
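The confidence interval and the p-value in this output tell the same story. As a rough sketch, the interval can be rebuilt from the printed numbers alone. The values below are copied from the output and are rounded, so the result only matches to about four decimal places. The variable names are illustrative.

```r
# Values copied from the t.test() output for trustMPs by young
mean_old   <- 3.638149   # mean in group 0
mean_young <- 3.649455   # mean in group 1
t_stat     <- -0.39597
df         <- 4700.9

d_hat <- mean_old - mean_young        # estimated difference in means
se    <- d_hat / t_stat               # standard error implied by t = d_hat / se

# 95% confidence interval: estimate +/- critical value * standard error
ci <- d_hat + c(-1, 1) * qt(0.975, df) * se
ci

# Two-sided p-value from the same t and df
p_two_sided <- 2 * pt(-abs(t_stat), df)
p_two_sided
```

The interval contains 0 and the p-value is above 0.05, so both lead to the same decision: H0 is not rejected at the 5% level.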

Example 4 - for proportions

Time magazine reported the results of a telephone poll of 800 adults on cigarette taxes. The respondents were asked whether the tax on cigarettes should be raised to pay for health care reform. The results of the survey were:

non-smokers: 605 respondents, 351 said ‘yes’, p1 = 0.58
smokers: 195 respondents, 41 said ‘yes’, p2 = 0.21

H0: p1 - p2 = 0, or p1 = p2
H1: p1 - p2 != 0, or p1 != p2

Use p-values to test these hypotheses

Step 1: specify alpha

alpha = 0.05

We need a pooled sample proportion

# p_hat = (41+351)/(195+605)
# p_hat = 392/800
# p_hat = 0.49

Then we calculate the pooled standard error under the null hypothesis

# se = sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))
# se = sqrt(0.49 * 0.51 * (1/195 + 1/605))
# se = sqrt(0.2499 * (0.005128205 + 0.001652893))
# se = sqrt(0.2499 * 0.006781098)
# se = 0.04116547

Step 2: calculate the test statistic and look up the resulting p-value

# z = (p1 - p2) / se
# z = (0.58 - 0.21) / se
# z = 0.37 / 0.04116547

The test statistic is 8.988116.

Lastly, we look up the p-value

# two-sided p-value, because H1 is p1 != p2
2 * (1 - pnorm(8.988116))

## [1] 0
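The hand calculation can also be run directly from the counts. This sketch uses unrounded proportions, so z comes out slightly different from the 8.988116 obtained with p1 and p2 rounded to two decimals. It computes the p-value as 2 * pnorm(-abs(z)), which avoids the rounding to exactly 0 seen above. The variable names are illustrative.

```r
# Counts from the poll
yes <- c(nonsmokers = 351, smokers = 41)
n   <- c(nonsmokers = 605, smokers = 195)

p     <- yes / n                               # sample proportions
p_hat <- sum(yes) / sum(n)                     # pooled proportion, 392/800
se    <- sqrt(p_hat * (1 - p_hat) * sum(1 / n))  # pooled standard error under H0
z     <- unname((p[1] - p[2]) / se)            # test statistic

p_value <- 2 * pnorm(-abs(z))                  # two-sided p-value
c(z = z, p.value = p_value)

# Cross-check: z^2 equals the X-squared statistic from prop.test()
# when the continuity correction is switched off
chk <- prop.test(x = yes, n = n, correct = FALSE)
all.equal(z^2, unname(chk$statistic))
```

Note that prop.test() expects the number of 'yes' answers in x and the group sizes in n, in the same group order.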

Step 3: Draw your conclusion based on the p-value

Is the p-value less than alpha? If so, reject H0.

Alternatively, use prop.test()

prop.test(x = c(351, 41), n = c(605, 195), correct = FALSE)

## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  c(351, 41) out of c(605, 195)
## X-squared = 80.746, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.3004992 0.4393185
## sample estimates:
##    prop 1    prop 2 
## 0.5801653 0.2102564

Data Analysis in Social Science - lab 9

Raluca Popp

20 March 2018
