Title: Statistical Inference Course Project_Part2 | Author: Anna Huynh | Date: 11/25/2020
knitr::opts_chunk$set(echo = TRUE)
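The course project has two parts. Part 1 investigates the exponential distribution in R and compares it with the Central Limit Theorem (CLT). This document is Part 2, which analyzes the ToothGrowth data set, using t-tests to compare tooth length across dose levels and supplement types.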
library(datasets)
data(ToothGrowth)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
# Plot initial data
library(ggplot2) # provides qplot(), geom_smooth() and geom_point()
qplot(len, dose, data = ToothGrowth, color = supp, facets = . ~ supp) +
  geom_smooth(method = "lm") +
  geom_point(size = 3, alpha = 1/2)
## `geom_smooth()` using formula 'y ~ x'
# subset data per dose type
sub0 <- subset(ToothGrowth, dose == 0.5, select= c("len")) # Half of dose
sub1 <- subset(ToothGrowth, dose == 1, select= c("len")) # One dose
sub2 <- subset(ToothGrowth, dose == 2, select= c("len")) # Two doses
# One-tailed independent t-test with unequal variance
t.test(sub1, sub0, alternative = "greater", paired = FALSE, var.equal = FALSE,
conf.level = 0.95) # Half of dose vs. One dose
##
## Welch Two Sample t-test
##
## data: sub1 and sub0
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 6.753323 Inf
## sample estimates:
## mean of x mean of y
## 19.735 10.605
t.test(sub2, sub0, alternative = "greater", paired = FALSE, var.equal = FALSE,
conf.level = 0.95) # Half of dose vs. Two doses
##
## Welch Two Sample t-test
##
## data: sub2 and sub0
## t = 11.799, df = 36.883, p-value = 2.199e-14
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 13.27926 Inf
## sample estimates:
## mean of x mean of y
## 26.100 10.605
t.test(sub2, sub1, alternative = "greater", paired = FALSE, var.equal = FALSE,
conf.level = 0.95) # One dose vs. Two doses
##
## Welch Two Sample t-test
##
## data: sub2 and sub1
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 4.17387 Inf
## sample estimates:
## mean of x mean of y
## 26.100 19.735
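Observation: In all three comparisons the p-value is far below 0.05 and the lower bound of the one-sided 95 percent confidence interval is above 0, so we reject the null hypothesis in favor of a greater mean tooth length at the higher dose.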
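To make the comparison between the test statistic and its reference distribution concrete, the Welch statistic and its degrees of freedom can be recomputed by hand for the first comparison (dose 1 vs. dose 0.5). This is a minimal sketch; the names `x`, `y`, `se2_x`, `se2_y`, `t_stat`, `df_welch`, `p_value` and `ref` are illustrative and do not appear in the original analysis.

```r
library(datasets)

# Tooth lengths at dose 1 and dose 0.5 (20 animals each)
x <- ToothGrowth$len[ToothGrowth$dose == 1]
y <- ToothGrowth$len[ToothGrowth$dose == 0.5]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# One-sided p-value for the alternative "mean(x) is greater than mean(y)"
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# Built-in test for reference
ref <- t.test(x, y, alternative = "greater", var.equal = FALSE)
c(t = t_stat, df = df_welch, p = p_value)
```

The hand-computed values should agree with the `t`, `df` and `p-value` fields that `t.test()` prints for the same two groups.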
# Upper-tail probability, in percent, of a t statistic at least as large as
# each value reported by the tests above, using the pooled df 20 + 20 - 2 = 38.
100*pt(q=6.4766, df=(20 + 20 - 2), lower.tail=FALSE)
## [1] 6.332435e-06
100*pt(q=11.799, df=(20 + 20 - 2), lower.tail=FALSE)
## [1] 1.418943e-12
100*pt(q=4.9005, df=(20 + 20 - 2), lower.tail=FALSE)
## [1] 0.0009053701
Observation: If the null hypothesis were true, a test statistic this large would occur with probability far below 1% in each of the three comparisons. We therefore reject the null hypothesis.
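The tail probabilities can also be taken directly from the fitted test objects, which avoids retyping rounded statistics and uses the Welch degrees of freedom instead of the pooled value of 38. The sketch below assumes the same three one-sided comparisons; `len_by_dose`, `tail_pct` and `pct` are hypothetical names introduced only for illustration.

```r
library(datasets)

# Tooth lengths split by dose; list names are "0.5", "1" and "2"
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Hypothetical helper: one-sided Welch test that `high` exceeds `low`,
# returning the upper-tail probability in percent
tail_pct <- function(high, low) {
  fit <- t.test(len_by_dose[[high]], len_by_dose[[low]],
                alternative = "greater", var.equal = FALSE)
  100 * pt(unname(fit$statistic), df = unname(fit$parameter),
           lower.tail = FALSE)
}

pct <- c(
  "1 vs 0.5" = tail_pct("1", "0.5"),
  "2 vs 0.5" = tail_pct("2", "0.5"),
  "2 vs 1"   = tail_pct("2", "1")
)
pct
```

Each entry equals 100 times the one-sided p-value of the corresponding Welch test, so the values differ slightly from those computed with 38 degrees of freedom.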
OJ <- subset(ToothGrowth, supp == "OJ")
VC <- subset(ToothGrowth, supp == "VC")
# Rows 1-30 of ToothGrowth are VC and rows 31-60 are OJ, so select by label
group_OJ <- OJ$len
group_VC <- VC$len
diff <- group_VC - group_OJ # VC minus OJ
t.test(diff)
##
## One Sample t-test
##
## data: diff
## t = -3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -5.991341 -1.408659
## sample estimates:
## mean of x
## -3.7
t.test(group_OJ, group_VC, paired = TRUE) # OJ minus VC
##
## Paired t-test
##
## data: group_OJ and group_VC
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.408659 5.991341
## sample estimates:
## mean of the differences
## 3.7
t.test(len ~ I(relevel(supp, 2)), paired = TRUE, data = ToothGrowth)
##
## Paired t-test
##
## data: len by I(relevel(supp, 2))
## t = -3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.991341 -1.408659
## sample estimates:
## mean of the differences
## -3.7
Observation: The p-value (0.00255) is below 0.05, and the 95 percent confidence interval, (1.408659, 5.991341) or (-5.991341, -1.408659) depending on the order of subtraction, does not contain the hypothesized mean difference of 0. We therefore reject the null hypothesis that the two supplement types give the same mean tooth length.
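The ToothGrowth help page describes 60 guinea pigs, each given a single dose by a single delivery method, so matching rows by position is an analyst's choice and not a property of the data. As a hedged cross-check, the sketch below runs the same comparison as an unpaired Welch test; `oj`, `vc` and `welch` are illustrative names, and treating the two groups as independent is the assumption being made.

```r
library(datasets)

# Select each supplement group by label, not by row position
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Unpaired Welch test: treats the OJ and VC animals as independent groups
welch <- t.test(oj, vc, alternative = "two.sided",
                paired = FALSE, var.equal = FALSE)
welch
```

The estimated difference in means is the same as in the paired analysis, but the standard error and degrees of freedom are computed differently, so the confidence interval and p-value should be read from this output before drawing a conclusion under the independence assumption.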
# Use two doses (dose == 2)
group2_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2]
group2_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2]
t.test(group2_VC, group2_OJ, alternative = "two.sided", paired = FALSE,
var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group2_VC and group2_OJ
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.63807 3.79807
## sample estimates:
## mean of x mean of y
## 26.14 26.06
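Observation: The p-value (0.9639) is well above 0.05, the test statistic (0.046136) is close to 0, and the 95 percent confidence interval contains 0. We therefore fail to reject the null hypothesis that VC and OJ give the same mean tooth length at dose 2.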
# Use one dose (dose == 1)
group3_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1]
group3_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1]
t.test(group3_VC, group3_OJ, alternative = "two.sided", paired = FALSE,
var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group3_VC and group3_OJ
## t = -4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9.057852 -2.802148
## sample estimates:
## mean of x mean of y
## 16.77 22.70
Observation: The p-value (0.001038) is below 0.05 and the 95 percent confidence interval does not contain the hypothesized difference of 0, so we reject the null hypothesis. At dose 1, the mean tooth length is higher with OJ (22.70) than with VC (16.77).
# Use half a dose (dose == 0.5)
group4_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5]
group4_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5]
t.test(group4_VC, group4_OJ, alternative = "two.sided", paired = FALSE,
var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: group4_VC and group4_OJ
## t = -3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.780943 -1.719057
## sample estimates:
## mean of x mean of y
## 7.98 13.23
Observation: The p-value (0.006359) is below 0.05 and the 95 percent confidence interval does not contain the hypothesized difference of 0, so we reject the null hypothesis. At dose 0.5, the mean tooth length is higher with OJ (13.23) than with VC (7.98).
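The three per-dose comparisons follow the same pattern, so they can be collected into one table for side-by-side reading. This is a sketch under the same settings as above (two-sided Welch test, VC as the first group); `by_dose` and its column names are hypothetical and introduced only for illustration.

```r
library(datasets)

# For each dose, run the two-sided Welch test of VC vs. OJ and keep the
# group means, the 95 percent confidence interval and the p-value
by_dose <- do.call(rbind, lapply(sort(unique(ToothGrowth$dose)), function(d) {
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  fit <- t.test(vc, oj, alternative = "two.sided", var.equal = FALSE)
  data.frame(dose = d,
             mean_vc = mean(vc),
             mean_oj = mean(oj),
             ci_low = fit$conf.int[1],
             ci_high = fit$conf.int[2],
             p_value = fit$p.value)
}))
by_dose
```

Each row reproduces one of the tests above: the confidence interval excludes 0 at doses 0.5 and 1 and includes 0 at dose 2.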