Data Analysis in Social Science

Null hypothesis significance testing

Example 1

Suppose a coin toss turns up 12 heads out of 20 trials. At the 0.05 significance level, can we reject the null hypothesis that the coin is fair?

Calculate the sample proportion

12/20

## [1] 0.6

STEPS:

1 State the null and alternative hypotheses: H0: p = 0.5, H1: p != 0.5

2 Choose the level of significance: 0.05

3 Find the critical value

qnorm(c(0.025, 0.975))   # z test for a proportion, so use the standard normal (not t) distribution

## [1] -1.959964  1.959964

4 Calculate the test statistic

pbar <- 12/20   # sample proportion
p0   <- 0.5     # hypothesized value under H0
n    <- 20      # sample size
z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
z

## [1] 0.8944272

5 Draw your conclusion: because z = 0.89 is not in the rejection region defined by the critical values, we fail to reject the null hypothesis.
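The critical-value decision can be cross-checked with a p-value. Below is a minimal sketch that uses the same normal approximation as the formula above. The two-sided p-value is twice the tail area beyond |z|, and z squared is the X-squared statistic that prop.test() reports with correct = FALSE.

```r
pbar <- 12 / 20   # sample proportion
p0   <- 0.5       # hypothesized proportion under H0
n    <- 20        # sample size

z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # z statistic
p_value <- 2 * pnorm(-abs(z))                # area in both tails beyond |z|

round(z^2, 4)       # 0.8, the X-squared value from prop.test()
round(p_value, 4)   # 0.3711, the p-value from prop.test()
```

Because 0.3711 > 0.05, the p-value route reaches the same conclusion as the critical-value route.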

OR

prop.test(12, 20, p=0.5, correct=FALSE)

## 
##  1-sample proportions test without continuity correction
## 
## data:  12 out of 20, null probability 0.5
## X-squared = 0.8, df = 1, p-value = 0.3711
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.3865815 0.7811935
## sample estimates:
##   p 
## 0.6

The X-squared statistic (0.8) is the square of the z statistic (0.894 squared is 0.8), and its p-value (0.3711) is greater than 0.05, so we fail to reject H0 at the 0.05 significance level.

Example 2

David claims that the weather forecasts produced by the local radio are no better than those achieved by tossing a fair coin and predicting rain if a head is obtained or no rain if a tail is obtained. He records the weather for 30 randomly selected days. The local radio forecast is correct on 21 of these days.

Calculate the sample proportion

21/30

## [1] 0.7

The sample proportion is pbar = 0.7.

STEPS:

1 State the null and alternative hypotheses: H0: p = 0.5, H1: p > 0.5

2 Choose the level of significance: 0.05

3 Find the critical value

qnorm(0.95)   # one-sided (upper-tail) z test for a proportion

## [1] 1.644854

4 Calculate the test statistic

pbar <- 21/30   # sample proportion
p0   <- 0.5     # hypothesized value under H0
n    <- 30      # sample size
z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
z

## [1] 2.19089

5 Draw your conclusion: z = 2.19 is in the rejection region, so we reject the null hypothesis at the 0.05 significance level.
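For a one-sided alternative (H1: p > 0.5), only the upper tail counts. Below is a minimal sketch of the p-value route under the same normal approximation. It should agree with the p-value that prop.test(21, 30, correct = FALSE, alternative = "greater") reports.

```r
pbar <- 21 / 30   # sample proportion
p0   <- 0.5       # hypothesized proportion under H0
n    <- 30        # sample size

z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # z statistic
p_value <- pnorm(z, lower.tail = FALSE)      # upper-tail area for H1: p > 0.5

round(p_value, 5)   # 0.01423
p_value < 0.05      # TRUE, so reject H0
```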

OR

prop.test(21, 30, correct = FALSE, alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  21 out of 30, null probability 0.5
## X-squared = 4.8, df = 1, p-value = 0.01423
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.5506175 1.0000000
## sample estimates:
##   p 
## 0.7

The p-value (0.01423) is below 0.05, so we reject H0 in favour of H1. Equivalently, the value stated by H0 (0.5) lies outside the one-sided 95 percent confidence interval (0.5506 to 1), which leads to the same conclusion.

Example 3

An e-commerce research company claims that 60% or more of graduate students have bought merchandise online. A consumer group is suspicious of the claim and thinks that the proportion is lower than 60%. A random sample of 80 graduate students shows that only 22 students have ever done so. Is there enough evidence to show that the true proportion is lower than 60%?

Calculate the sample proportion

22/80

## [1] 0.275

The sample proportion is pbar = 0.275.

STEPS:

1 State the null and alternative hypotheses: H0: p >= 0.6, H1: p < 0.6

2 Choose the level of significance: 0.05

3 Find the critical value

qnorm(0.05)   # one-sided (lower-tail) z test for a proportion

## [1] -1.644854

4 Calculate the test statistic

pbar <- 22/80   # sample proportion
p0   <- 0.6     # hypothesized value under H0
n    <- 80      # sample size
z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
z

## [1] -5.933661

5 Draw your conclusion: we reject the null hypothesis because the test statistic falls in the rejection region (z is below the lower critical value).
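With H1: p < 0.6, the p-value is the lower-tail area below z. Below is a minimal sketch under the same normal approximation. It should match the p-value from prop.test(22, 80, p = 0.6, correct = FALSE, alternative = "less").

```r
pbar <- 22 / 80   # sample proportion
p0   <- 0.6       # boundary value of H0: p >= 0.6
n    <- 80        # sample size

z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # z statistic (about -5.93)
p_value <- pnorm(z)                          # lower-tail area for H1: p < 0.6

signif(p_value, 4)   # about 1.481e-09, far below 0.05
```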

OR

prop.test(22, 80, p=0.6, correct=FALSE, alternative = "less")

## 
##  1-sample proportions test without continuity correction
## 
## data:  22 out of 80, null probability 0.6
## X-squared = 35.208, df = 1, p-value = 1.481e-09
## alternative hypothesis: true p is less than 0.6
## 95 percent confidence interval:
##  0.0000000 0.3634549
## sample estimates:
##     p 
## 0.275
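Steps 3 to 5 are the same in all three proportion examples, so they can be bundled into one function. This is a minimal sketch, not part of the original lab. `prop_z_decision` is a hypothetical helper name. It applies the critical-value rule under the normal approximation without continuity correction, matching `correct = FALSE`.

```r
# Hypothetical helper: one-sample z test for a proportion (critical-value approach)
prop_z_decision <- function(x, n, p0 = 0.5, alpha = 0.05,
                            alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  pbar <- x / n                                # sample proportion
  z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
  reject <- switch(alternative,
    two.sided = abs(z) > qnorm(1 - alpha / 2), # both tails
    greater   = z > qnorm(1 - alpha),          # upper tail only
    less      = z < qnorm(alpha)               # lower tail only
  )
  list(z = z, reject = reject)
}

prop_z_decision(12, 20)$reject                                  # Example 1: FALSE
prop_z_decision(21, 30, alternative = "greater")$reject         # Example 2: TRUE
prop_z_decision(22, 80, p0 = 0.6, alternative = "less")$reject  # Example 3: TRUE
```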

Example 4

Assume we take a random sample of 1,000 students and measure how many hours of sleep they get on average over the course of a week. We find a mean of 8 hours and a standard deviation of 2 hours. Assume further that the University administration believes that the typical Exeter student does not sleep 6 hours per night. However, they want to put this hypothesis to the test.

First, we generate the data:

set.seed(5678)
sleep <- rnorm(1000, mean = 8, sd = 2)

STEPS:

1 State the null and alternative hypotheses: H0: mu = 6, H1: mu != 6

2 Choose the level of significance: 0.05

3 Find the critical value

qt(c(0.025, 0.975), df=999)

## [1] -1.962341  1.962341

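4 Calculate the test statistic

xbar <- 8      # sample mean
mu0  <- 6      # hypothesized mean under H0
s    <- 2      # sample standard deviation
n    <- 1000   # sample size
t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
t_stat

## [1] 31.62278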

5 Draw your conclusion

Because t = 31.62 is far beyond the upper critical value (1.962), it falls in the rejection region, so we reject H0 in favour of H1.
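Because the example states only summary statistics (mean 8, standard deviation 2, n = 1000), the test can also be run directly from them. Below is a minimal sketch that assumes exactly those values. It adds the two-sided p-value and the 95% confidence interval. The interval excludes 6 exactly when H0 is rejected at the 0.05 level.

```r
xbar <- 8      # sample mean (hours)
mu0  <- 6      # hypothesized mean under H0
s    <- 2      # sample standard deviation
n    <- 1000   # sample size

se      <- s / sqrt(n)                               # standard error of the mean
t_stat  <- (xbar - mu0) / se                         # one-sample t statistic
dof     <- n - 1                                     # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = dof)            # two-sided p-value
ci      <- xbar + c(-1, 1) * qt(0.975, df = dof) * se  # 95% confidence interval

t_stat                          # about 31.62
ci                              # roughly 7.876 to 8.124
mu0 >= ci[1] && mu0 <= ci[2]    # FALSE: 6 lies outside the interval
```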

OR

t.test(sleep, mu = 6)

## 
##  One Sample t-test
## 
## data:  sleep
## t = 31.546, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 6
## 95 percent confidence interval:
##  7.883581 8.133466
## sample estimates:
## mean of x 
##  8.008523
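The t.test() result can be reproduced by hand from the simulated vector. Below is a minimal sketch that regenerates the data with the same seed and checks that the one-sample formula gives the same statistic as t.test(). The statistic (about 31.55) differs slightly from the 31.62 computed from the rounded summary values because the simulated sample's mean and standard deviation are not exactly 8 and 2.

```r
set.seed(5678)                            # same seed as above
sleep <- rnorm(1000, mean = 8, sd = 2)    # simulated hours of sleep

n <- length(sleep)
t_manual <- (mean(sleep) - 6) / (sd(sleep) / sqrt(n))   # (xbar - mu) / (s / sqrt(n))

tt <- t.test(sleep, mu = 6)
all.equal(t_manual, unname(tt$statistic))   # TRUE
```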

Example 5

The administrator of a hospital states that on weekends the average wait time for emergency room visits is 10 minutes. Based on discussions Sam had with friends who complained about how long they waited to be seen in the ER over a weekend, Sam disputes the administrator’s claim. Over the course of a few weekends he records the wait time for 40 randomly selected patients. The average wait time for these 40 patients is 11 minutes with a standard deviation of 3 minutes. Is there enough evidence to support the hypothesis that the average ER wait time exceeds 10 minutes?

First, generate the data:

wait <- rnorm(40, 11, 3)   # simulated wait times; no seed is set, so values differ between runs

STEPS:

1 State the null and alternative hypotheses: H0: mu = 10, H1: mu > 10

2 Choose the level of significance: 0.05

3 Find the critical value

qt(0.95, df = 39)   # one-sided (upper-tail) t test

## [1] 1.684875

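4 Calculate the test statistic

xbar <- 11   # sample mean
mu0  <- 10   # hypothesized mean under H0
s    <- 3    # sample standard deviation
n    <- 40   # sample size
t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
t_stat

## [1] 2.108185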

5 Draw your conclusion: t = 2.11 falls in the rejection region, so we reject H0 at the significance level alpha = 0.05.
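The same decision can be reached with a one-sided p-value computed from the stated summary statistics. Below is a minimal sketch that assumes exactly mean 11, standard deviation 3 and n = 40. Only the upper tail counts because H1 is mu > 10.

```r
xbar <- 11   # sample mean (minutes)
mu0  <- 10   # hypothesized mean under H0
s    <- 3    # sample standard deviation
n    <- 40   # sample size

t_stat  <- (xbar - mu0) / (s / sqrt(n))                   # about 2.108
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)     # upper-tail area

p_value < 0.05   # TRUE, so reject H0
```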

OR

t.test(wait, mu=10, alternative = "greater")

## 
##  One Sample t-test
## 
## data:  wait
## t = 1.9568, df = 39, p-value = 0.02878
## alternative hypothesis: true mean is greater than 10
## 95 percent confidence interval:
##  10.11724      Inf
## sample estimates:
## mean of x 
##  10.84362

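The p-value (0.02878) is below 0.05, so we reject H0 in favour of H1. The t statistic from the simulated sample (1.9568) differs from the hand calculation because the simulated mean and standard deviation are not exactly 11 and 3.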
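For a one-sided test, the interval reported by t.test() has only a lower bound: the sample mean minus qt(0.95, df) standard errors. H0 is rejected at the 0.05 level exactly when 10 lies below that bound. Below is a minimal sketch. The `set.seed(1)` call is an arbitrary assumption added for reproducibility (the lab sets no seed), so its numbers differ from the output shown above.

```r
set.seed(1)                   # assumed seed, not from the lab
wait <- rnorm(40, 11, 3)      # simulated ER wait times (minutes)

n <- length(wait)
lower <- mean(wait) - qt(0.95, df = n - 1) * sd(wait) / sqrt(n)   # one-sided 95% bound

tt <- t.test(wait, mu = 10, alternative = "greater")
all.equal(lower, tt$conf.int[1])        # TRUE: same lower bound
(10 < lower) == (tt$p.value < 0.05)     # TRUE: interval and p-value agree
```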

Data Analysis in Social Science - lab 8

Raluca Popp

13 March 2018
