Title: Statistical Inference Course Project_Part2 | Author: Anna Huynh | Date: 11/25/2020


knitr::opts_chunk$set(echo = TRUE)

Overview

This project consists of two parts: Part 1 investigates the exponential distribution in R and compares it with the Central Limit Theorem (CLT); Part 2, covered here, is a basic inferential data analysis of the ToothGrowth data.

Part 2: Basic Inferential Data Analysis Instructions

1. Load the ToothGrowth data

library(datasets)
data(ToothGrowth)

2. Provide a basic summary of the data.

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
# Plot tooth length (response) against dose, one panel per supplement
library(ggplot2)
ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
        geom_point(size = 3, alpha = 1/2) +
        geom_smooth(method = "lm", formula = y ~ x) +
        facet_grid(. ~ supp)

[Figure: scatter plots of len and dose for each supplement (OJ, VC), with linear fits]
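The overall summary above pools all six supplement and dose combinations. As a minimal sketch using only base R, the mean, standard deviation and size of each group can be tabulated with `tapply()` and `table()`; the names `group_means`, `group_sds` and `group_n` are illustrative and not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Illustrative sketch: group_means, group_sds and group_n are hypothetical names.
# Mean tooth length for each supplement (rows) and dose in mg/day (columns)
group_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
# Standard deviation of tooth length in each supplement x dose group
group_sds <- with(ToothGrowth, tapply(len, list(supp, dose), sd))
# Number of guinea pigs in each supplement x dose group
group_n <- with(ToothGrowth, table(supp, dose))

round(group_means, 2)
round(group_sds, 2)
group_n
```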

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

3.1. Hypothesis Test to compare tooth growth by dose

# Tooth length for each dose level (mg/day)
sub0 <- ToothGrowth$len[ToothGrowth$dose == 0.5] # 0.5 mg/day
sub1 <- ToothGrowth$len[ToothGrowth$dose == 1]   # 1 mg/day
sub2 <- ToothGrowth$len[ToothGrowth$dose == 2]   # 2 mg/day

# One-sided Welch two-sample t-tests (independent samples, unequal variances),
# H1: the higher dose gives a larger mean tooth length
t.test(sub1, sub0, alternative = "greater", paired = FALSE, var.equal = FALSE, 
       conf.level = 0.95) # 1 mg/day vs. 0.5 mg/day
## 
##  Welch Two Sample t-test
## 
## data:  sub1 and sub0
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  6.753323      Inf
## sample estimates:
## mean of x mean of y 
##    19.735    10.605
t.test(sub2, sub0, alternative = "greater", paired = FALSE, var.equal = FALSE, 
       conf.level = 0.95) # 2 mg/day vs. 0.5 mg/day
## 
##  Welch Two Sample t-test
## 
## data:  sub2 and sub0
## t = 11.799, df = 36.883, p-value = 2.199e-14
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  13.27926      Inf
## sample estimates:
## mean of x mean of y 
##    26.100    10.605
t.test(sub2, sub1, alternative = "greater", paired = FALSE, var.equal = FALSE, 
       conf.level = 0.95) # 2 mg/day vs. 1 mg/day
## 
##  Welch Two Sample t-test
## 
## data:  sub2 and sub1
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  4.17387     Inf
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

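Observation: The p-values (6.342e-08, 2.199e-14 and 9.532e-06) are all far below 0.05, and none of the one-sided 95 percent confidence intervals contains 0, so we reject the null hypothesis of equal means in each comparison: mean tooth length increases with dose.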
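The Welch statistic reported by `t.test()` can be reproduced by hand, which shows what the tests above compute. The sketch below, for the 1 mg/day vs. 0.5 mg/day comparison, computes the t statistic from the unpooled standard error, the Welch-Satterthwaite degrees of freedom and the one-sided p-value, then checks them against `t.test()`; the variable names are illustrative.

```r
library(datasets)
data(ToothGrowth)

# Illustrative sketch: x, y, vx, vy, t_welch, df_welch, p_welch are hypothetical names.
x <- ToothGrowth$len[ToothGrowth$dose == 1]    # 1 mg/day
y <- ToothGrowth$len[ToothGrowth$dose == 0.5]  # 0.5 mg/day

# Squared standard error of each sample mean (variances are not pooled)
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the unpooled standard error
t_welch <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for H1: mean(x) > mean(y)
p_welch <- pt(t_welch, df = df_welch, lower.tail = FALSE)

fit <- t.test(x, y, alternative = "greater", var.equal = FALSE)
c(t = t_welch, df = df_welch, p = p_welch)
```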

3.2. What if the null hypothesis is true even though the p-value is small (a rare event)?

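# Upper-tail probability (in percent, hence the factor 100) of a t statistic at
# least as large as each one observed above, if H0 were true. df = 20 + 20 - 2 = 38
# is the equal-variance value; the Welch tests above used df of about 37 to 38.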
100*pt(q=6.4766, df=(20 + 20 - 2), lower.tail=FALSE)
## [1] 6.332435e-06
100*pt(q=11.799, df=(20 + 20 - 2), lower.tail=FALSE)
## [1] 1.418943e-12
100*pt(q=4.9005, df=(20 + 20 - 2), lower.tail=FALSE)
## [1] 0.0009053701

Observation: If the null hypothesis were true, a test statistic this large would occur with a probability well below 1% in every case (at most about 0.0009%), so we reject the null hypothesis.
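The calculation above retypes each t statistic and uses the equal-variance df of 38. As a minimal sketch, the same tail probabilities can be taken directly from the stored test objects, using each test's own Welch df so nothing is retyped; `len_by_dose`, `fits` and `tail_pct` are illustrative names.

```r
library(datasets)
data(ToothGrowth)

# Illustrative sketch: len_by_dose, fits and tail_pct are hypothetical names.
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)  # names "0.5", "1", "2"

# The three one-sided Welch tests from Section 3.1
fits <- list(
  "1 vs 0.5" = t.test(len_by_dose[["1"]], len_by_dose[["0.5"]], alternative = "greater"),
  "2 vs 0.5" = t.test(len_by_dose[["2"]], len_by_dose[["0.5"]], alternative = "greater"),
  "2 vs 1"   = t.test(len_by_dose[["2"]], len_by_dose[["1"]],   alternative = "greater")
)

# Upper-tail probability of each t statistic under H0 with the Welch df,
# expressed in percent as in the calculation above
tail_pct <- sapply(fits, function(f)
  100 * pt(unname(f$statistic), df = unname(f$parameter), lower.tail = FALSE))
tail_pct
```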

3.3. Confidence Interval to compare tooth growth by supplement (supp)

# Tooth length by supplement. ToothGrowth stores the 30 VC rows first (rows 1-30)
# and the 30 OJ rows last (rows 31-60), so select by supp, not by row position.
group_OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
group_VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
diff <- group_OJ - group_VC # OJ minus VC, matched by dose level and row order

t.test(diff)
## 
##  One Sample t-test
## 
## data:  diff
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of x 
##       3.7
t.test(group_OJ, group_VC, paired = TRUE)
## 
##  Paired t-test
## 
## data:  group_OJ and group_VC
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of the differences 
##                     3.7
t.test(len ~ I(relevel(supp, 2)), paired = TRUE, data = ToothGrowth)
## 
##  Paired t-test
## 
## data:  len by I(relevel(supp, 2))
## t = -3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.991341 -1.408659
## sample estimates:
## mean of the differences 
##                    -3.7

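Observation: All three tests give p = 0.00255 (< 0.05). The 95 percent confidence interval is (1.408659, 5.991341) or (-5.991341, -1.408659), depending on the direction in which the difference is taken; either way it does not contain the hypothesized mean difference of 0, so we reject the null hypothesis. Mean tooth length is 3.7 higher with OJ than with VC. Note that the pairing is by row position within each dose level: each guinea pig received only one supplement, so the pairs are not repeated measurements on the same animal.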
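Because the 30 OJ and 30 VC animals are distinct, an unpaired comparison is a natural cross-check. A minimal sketch using the same Welch test as Sections 3.1 and 3.4, applied to all doses pooled; comparing its p-value with the paired result shows how much the conclusion depends on the row-wise pairing. The names `fit_supp`, `diff_means` and `ci` are illustrative.

```r
library(datasets)
data(ToothGrowth)

# Illustrative sketch: fit_supp, diff_means and ci are hypothetical names.
# Welch two-sample t-test of tooth length by supplement, treating OJ and VC
# as independent groups (no row-wise pairing); difference is OJ minus VC
fit_supp <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
fit_supp

# Difference in group means (OJ minus VC) and its 95% confidence interval
diff_means <- unname(fit_supp$estimate[1] - fit_supp$estimate[2])
ci <- fit_supp$conf.int
```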

3.4. Compare supplements within each dose level (2, 1 and 0.5 mg/day).

# Dose = 2 mg/day
group2_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2]
group2_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2]

t.test(group2_VC, group2_OJ, alternative = "two.sided", paired = FALSE, 
       var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group2_VC and group2_OJ
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.63807  3.79807
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

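Observation: At 2 mg/day the p-value (0.9639) is well above 0.05, the t statistic (0.046136) is close to 0, and the 95 percent confidence interval (-3.63807, 3.79807) contains 0, so we fail to reject the null hypothesis: there is no evidence of a difference between OJ and VC at this dose.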

# Dose = 1 mg/day
group3_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1]
group3_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1]

t.test(group3_VC, group3_OJ, alternative = "two.sided", paired = FALSE, 
       var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group3_VC and group3_OJ
## t = -4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.057852 -2.802148
## sample estimates:
## mean of x mean of y 
##     16.77     22.70

Observation: At 1 mg/day the p-value (0.001038) is below 0.05 and the 95 percent confidence interval for VC minus OJ (-9.057852, -2.802148) does not contain 0, so we reject the null hypothesis: OJ gives longer teeth than VC at this dose (22.70 vs. 16.77).

# Dose = 0.5 mg/day
group4_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5]
group4_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5]

t.test(group4_VC, group4_OJ, alternative = "two.sided", paired = FALSE, 
       var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group4_VC and group4_OJ
## t = -3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.780943 -1.719057
## sample estimates:
## mean of x mean of y 
##      7.98     13.23

Observation: At 0.5 mg/day the p-value (0.006359) is below 0.05 and the 95 percent confidence interval for VC minus OJ (-8.780943, -1.719057) does not contain 0, so we reject the null hypothesis: OJ gives longer teeth than VC at this dose (13.23 vs. 7.98).
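The three per-dose comparisons above repeat the same code with a different dose. A minimal sketch that runs them in one loop and collects the results in a single data frame, with the same test settings as above (VC first, two-sided Welch test); `doses`, `results` and the column names are illustrative choices.

```r
library(datasets)
data(ToothGrowth)

# Illustrative sketch: doses, results and the column names are hypothetical.
# Welch two-sample t-test of VC versus OJ within each dose level (mg/day)
doses <- sort(unique(ToothGrowth$dose))
results <- do.call(rbind, lapply(doses, function(d) {
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  fit <- t.test(vc, oj, alternative = "two.sided", var.equal = FALSE)
  data.frame(dose    = d,
             mean_VC = mean(vc),
             mean_OJ = mean(oj),
             t       = unname(fit$statistic),
             p_value = fit$p.value,
             ci_low  = fit$conf.int[1],   # CI for mean(VC) - mean(OJ)
             ci_high = fit$conf.int[2])
}))
results
```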

4. Conclusions:

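4.1. Compare tooth growth by dose (0.5, 1 or 2 mg/day)

Mean tooth length increases with dose: all three one-sided Welch tests in Section 3.1 reject the null hypothesis of equal means, with p-values below 1e-5.

4.2. Compare tooth growth by supplement (OJ or VC)

Pooled over doses, mean tooth length is 3.7 higher with OJ than with VC (p = 0.00255, with observations paired by row position). Within dose levels, OJ gives significantly longer teeth at 0.5 and 1 mg/day, but there is no detectable difference at 2 mg/day (p = 0.9639).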