For these illustrations, we will use data from the book by Chihara and Hesterberg, “Mathematical Statistics with Resampling and R”. The data are available at https://sites.google.com/site/chiharahesterberg/data2 or https://www2.math.binghamton.edu/p/people/kargin/math448/start
(The original data came from https://www.bgs.ac.uk/geological-data/datasets/, but they are no longer available there.)
Here is the example:
A study recommends that the maximum concentration of arsenic in irrigation water be 100 μg/l to prevent a buildup of the chemical that might harm future crop production. How does the water in Bangladesh measure up to this recommendation? Let \(\mu\) denote the true mean level of arsenic in Bangladesh wells. We wish to test \(H_0: \mu = 100\) versus \(H_A: \mu > 100\).
Bangladesh <- read.csv("./data/bangladesh.csv")
head(Bangladesh)
## Arsenic Chlorine Cobalt
## 1 2400 6.2 0.42
## 2 6 116.0 0.45
## 3 904 14.8 0.63
## 4 321 35.9 0.68
## 5 1280 18.9 0.58
## 6 151 7.8 0.35
summary(Bangladesh)
## Arsenic Chlorine Cobalt
## Min. : 0.5 Min. : 1.00 Min. :0.0500
## 1st Qu.: 6.0 1st Qu.: 5.00 1st Qu.:0.2825
## Median : 22.0 Median : 14.20 Median :0.4100
## Mean : 125.3 Mean : 78.08 Mean :0.5038
## 3rd Qu.: 109.0 3rd Qu.: 55.50 3rd Qu.:0.6300
## Max. :2400.0 Max. :1550.00 Max. :3.1800
## NA's :2 NA's :1
result <- t.test(Bangladesh$Arsenic, mu = 100, alternative = "greater", conf.level = 0.95)
result
##
## One Sample t-test
##
## data: Bangladesh$Arsenic
## t = 1.3988, df = 270, p-value = 0.08151
## alternative hypothesis: true mean is greater than 100
## 95 percent confidence interval:
## 95.44438 Inf
## sample estimates:
## mean of x
## 125.3199
Note that the p-value is about 0.08, so we cannot reject the null hypothesis at the \(5\%\) level. The same conclusion follows from the fact that the one-sided \(95\%\) confidence interval, which starts at about 95.44, contains the value \(100\).
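The p-value and the confidence bound can be reproduced from the printed summary alone. The following is a minimal sketch that assumes only the rounded values shown in the output above (t = 1.3988, df = 270, sample mean 125.3199); the variable names are illustrative.

```r
# Values copied (rounded) from the t.test() output above
t_stat <- 1.3988     # observed t statistic
df     <- 270        # degrees of freedom, n - 1
xbar   <- 125.3199   # sample mean of Arsenic
mu0    <- 100        # hypothesized mean under H0

# One-sided p-value: P(T >= t_stat) for a t distribution with df degrees of freedom
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# Standard error recovered from t = (xbar - mu0) / se
se <- (xbar - mu0) / t_stat

# Lower bound of the one-sided 95% confidence interval
lower <- xbar - qt(0.95, df = df) * se

c(p_value = p_value, lower = lower)
```

Up to rounding, both numbers should agree with the p-value and the lower confidence bound printed by t.test().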
The Centers for Disease Control and Prevention (CDC) maintains a database on all babies born in a given year (http://wonder.cdc.gov/natality-current.html). One data set that we will investigate consists of a random sample of 1009 babies born in North Carolina during 2004.
Is there a real difference in the mean weights of North Carolina babies born to nonsmoking and smoking mothers in 2004?
data <- read.csv("./data/NCbirths2004.csv")
head(data)
## ID MothersAge Tobacco Alcohol Gender Weight Gestation Smoker
## 1 1 30-34 No No Male 3827 40 No
## 2 2 30-34 No No Male 3629 38 No
## 3 3 35-39 No No Female 3062 37 No
## 4 4 20-24 No No Female 3430 39 No
## 5 5 25-29 No No Male 3827 38 No
## 6 6 35-39 No No Female 3119 39 No
summary(data)
## ID MothersAge Tobacco Alcohol
## Min. : 1 Length:1009 Length:1009 Length:1009
## 1st Qu.: 253 Class :character Class :character Class :character
## Median : 505 Mode :character Mode :character Mode :character
## Mean : 505
## 3rd Qu.: 757
## Max. :1009
## Gender Weight Gestation Smoker
## Length:1009 Min. :1928 Min. :37.00 Length:1009
## Class :character 1st Qu.:3119 1st Qu.:38.00 Class :character
## Mode :character Median :3430 Median :39.00 Mode :character
## Mean :3448 Mean :39.11
## 3rd Qu.:3771 3rd Qu.:40.00
## Max. :5131 Max. :42.00
babies <- as.data.frame(data, stringsAsFactors = TRUE) # data is already a data frame, so its character columns are NOT converted to factors (see the summary below)
head(babies)
## ID MothersAge Tobacco Alcohol Gender Weight Gestation Smoker
## 1 1 30-34 No No Male 3827 40 No
## 2 2 30-34 No No Male 3629 38 No
## 3 3 35-39 No No Female 3062 37 No
## 4 4 20-24 No No Female 3430 39 No
## 5 5 25-29 No No Male 3827 38 No
## 6 6 35-39 No No Female 3119 39 No
summary(babies)
## ID MothersAge Tobacco Alcohol
## Min. : 1 Length:1009 Length:1009 Length:1009
## 1st Qu.: 253 Class :character Class :character Class :character
## Median : 505 Mode :character Mode :character Mode :character
## Mean : 505
## 3rd Qu.: 757
## Max. :1009
## Gender Weight Gestation Smoker
## Length:1009 Min. :1928 Min. :37.00 Length:1009
## Class :character 1st Qu.:3119 1st Qu.:38.00 Class :character
## Mode :character Median :3430 Median :39.00 Mode :character
## Mean :3448 Mean :39.11
## 3rd Qu.:3771 3rd Qu.:40.00
## Max. :5131 Max. :42.00
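The summary above still lists the text columns as class character: calling as.data.frame() on an object that is already a data frame does not convert them. This should not affect the tests that follow, because the formula interface treats the grouping variable as a factor in any case. If factors are wanted, one option is sketched here on a small made-up data frame (the name `toy` and its values are purely illustrative and are not taken from NCbirths2004.csv).

```r
# Made-up stand-in for the real data (illustrative values only)
toy <- data.frame(
  Tobacco = c("No", "Yes", "No", "No"),
  Weight  = c(3827, 3100, 3062, 3430),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; leave other columns unchanged
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

str(toy)
```

For a CSV file, passing stringsAsFactors = TRUE to read.csv() achieves the same result at read time.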
t.test(Weight ~ Tobacco, data = babies, alternative = "greater", var.equal = FALSE) # Welch test: we do not assume equal variances (var.equal = FALSE is the default)
##
## Welch Two Sample t-test
##
## data: Weight by Tobacco
## t = 4.1411, df = 134.01, p-value = 3.04e-05
## alternative hypothesis: true difference in means between group No and group Yes is greater than 0
## 95 percent confidence interval:
## 129.009 Inf
## sample estimates:
## mean in group No mean in group Yes
## 3471.912 3256.910
The p-value is very small (about \(3 \times 10^{-5}\)), so we can reject the null hypothesis at \(\alpha = 0.01\) and conclude that babies born to nonsmoking mothers are, on average, heavier than babies born to smoking mothers.
Incidentally, we can test whether the variances are the same.
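To make explicit what the Welch test computes, here is a minimal sketch that evaluates the statistic, the Welch-Satterthwaite degrees of freedom and the one-sided p-value by hand, then compares them with t.test(). The two samples are simulated, so the vectors `x` and `y` and their parameters are assumptions chosen for illustration, not the North Carolina data.

```r
# Simulated (hypothetical) weights, NOT the NCbirths2004 data
set.seed(1)
x <- rnorm(200, mean = 3470, sd = 480)  # stand-in for the nonsmoking group
y <- rnorm(40,  mean = 3260, sd = 520)  # stand-in for the smoking group

# Squared standard errors of the two sample means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic for H0: mean(x) - mean(y) = 0
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for the alternative mean(x) > mean(y)
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# Reference computation with the built-in function
ref <- t.test(x, y, alternative = "greater", var.equal = FALSE)
c(manual = p_value, builtin = ref$p.value)
```

The degrees of freedom are generally not an integer, which is why the output above reports df = 134.01.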
var.test(Weight ~ Tobacco, data = babies)
##
## F test to compare two variances
##
## data: Weight by Tobacco
## F = 0.84538, num df = 897, denom df = 110, p-value = 0.2158
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.628280 1.102546
## sample estimates:
## ratio of variances
## 0.8453818
The p-value (about 0.22) indicates that we cannot reject the hypothesis that the variances are the same.
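The F test output can also be checked from the printed numbers. The sketch below assumes only the rounded statistic and degrees of freedom shown above (F = 0.84538 with 897 and 110 degrees of freedom); the variable names are illustrative.

```r
# Values copied (rounded) from the var.test() output above
f_stat <- 0.84538   # ratio of sample variances (group No over group Yes)
df1 <- 897          # numerator degrees of freedom
df2 <- 110          # denominator degrees of freedom

# Two-sided p-value: twice the smaller of the two tail probabilities
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# 95% confidence interval for the true ratio of variances
ci <- c(f_stat / qf(0.975, df1, df2), f_stat / qf(0.025, df1, df2))

p_value
ci
```

Up to rounding, these should match the p-value and the interval reported by var.test(). The interval contains 1, which is consistent with not rejecting equality of variances.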
Do men and women differ in their beliefs about an afterlife? In the 2002 General Social Survey (see the case study in Section 1.7 of the Chihara-Hesterberg book), participants were asked this question. Of the 684 women who responded, 550 said yes (80.41%), compared to 425 of the 563 men (75.49%).
prop.test(c(550, 425), c(684, 563), correct = TRUE) #this is done with continuity correction (default behavior)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(550, 425) out of c(684, 563)
## X-squared = 4.101, df = 1, p-value = 0.04286
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.001251812 0.097166228
## sample estimates:
## prop 1 prop 2
## 0.8040936 0.7548845
According to the p-value (about 0.043), we can reject the null hypothesis of equal proportions at the \(5\%\) significance level but not at the \(1\%\) level. There is some evidence that men and women differ in their beliefs, but it is not very strong.
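For reference, the test statistic with continuity correction can be computed directly from the counts. This is a minimal sketch of the usual Yates-corrected chi-squared statistic for a 2 x 2 table, compared against prop.test(); the intermediate names (`observed`, `expected`, `yates`) are illustrative.

```r
# Counts from the survey: "yes" answers and group sizes (women, men)
x <- c(550, 425)
n <- c(684, 563)

# Pooled proportion of "yes" under H0: equal proportions
p_pooled <- sum(x) / sum(n)

# Observed and expected 2 x 2 tables (columns: yes, no)
observed <- cbind(x, n - x)
expected <- cbind(n * p_pooled, n * (1 - p_pooled))

# Yates continuity correction, capped so it never exceeds |observed - expected|
yates <- min(0.5, abs(observed - expected))

# Corrected chi-squared statistic and its p-value on 1 degree of freedom
chi_sq  <- sum((abs(observed - expected) - yates)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Reference computation with the built-in function
ref <- prop.test(x, n, correct = TRUE)
c(manual = chi_sq, builtin = unname(ref$statistic))
```

Setting correct = FALSE in prop.test() corresponds to dropping the `yates` term, which gives a slightly larger statistic and a slightly smaller p-value.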