Title: Statistical Inference Course Project_Part2 | Author: Anna Huynh | Date: 11/25/2020

knitr::opts_chunk$set(echo = TRUE)

Overview

This project investigates the exponential distribution in R and compares it with the Central Limit Theorem (CLT). It consists of two parts:

Part 1: A simulation exercise.
Part 2: Basic inferential data analysis.

Part 2: Basic Inferential Data Analysis Instructions

1. Load the ToothGrowth data

library(datasets)
data(ToothGrowth)

2. Provide a basic summary of the data.

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
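The overall summary pools all six supplement/dose groups. A per-group summary makes the later comparisons easier to follow. The sketch below uses base R's `aggregate()`; choosing mean and standard deviation as the per-group summaries is our assumption, not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Mean and standard deviation of tooth length for each supp/dose combination
group_stats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(group_stats)

# Number of observations in each supp/dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

Each of the six cells holds 10 observations, so every dose level has 20 and every supplement has 30.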

# Plot initial data: tooth length against dose, one panel per supplement
library(ggplot2)
ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
        geom_point(size = 3, alpha = 1/2) +
        geom_smooth(method = "lm", formula = y ~ x) +
        facet_grid(. ~ supp)

[Figure: scatter plot of tooth length and dose for each supplement (OJ, VC), with linear fits]

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

3.1. Hypothesis Test to compare tooth growth by dose

Null Hypothesis: The true difference in means between a higher and a lower dose is equal to 0.
Alternative Hypothesis: The true difference in means (higher dose minus lower dose) is greater than 0.
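For reference, the Welch test run by `t.test(..., var.equal = FALSE)` uses the standard statistic and approximate (Welch-Satterthwaite) degrees of freedom below, written with "high" and "low" denoting the two dose groups being compared:

$$
H_0:\ \mu_{\text{high}} - \mu_{\text{low}} = 0 \qquad H_1:\ \mu_{\text{high}} - \mu_{\text{low}} > 0
$$

$$
t = \frac{\bar{x}_{\text{high}} - \bar{x}_{\text{low}}}{\sqrt{s_{\text{high}}^2/n_{\text{high}} + s_{\text{low}}^2/n_{\text{low}}}},
\qquad
\nu \approx \frac{\left(s_{\text{high}}^2/n_{\text{high}} + s_{\text{low}}^2/n_{\text{low}}\right)^2}
{\dfrac{\left(s_{\text{high}}^2/n_{\text{high}}\right)^2}{n_{\text{high}}-1} + \dfrac{\left(s_{\text{low}}^2/n_{\text{low}}\right)^2}{n_{\text{low}}-1}}
$$

With $n = 20$ per dose group, $\nu$ always lies between 19 and 38, which is consistent with the df of about 37 to 38 reported in the outputs.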

# Subset tooth length by dose level (mg/day)
sub0 <- subset(ToothGrowth, dose == 0.5, select = c("len")) # 0.5 mg/day
sub1 <- subset(ToothGrowth, dose == 1, select = c("len"))   # 1 mg/day
sub2 <- subset(ToothGrowth, dose == 2, select = c("len"))   # 2 mg/day

# One-tailed independent t-test with unequal variance
t.test(sub1, sub0, alternative = "greater", paired = FALSE, var.equal = FALSE, 
       conf.level = 0.95) # 1 mg/day vs. 0.5 mg/day

## 
##  Welch Two Sample t-test
## 
## data:  sub1 and sub0
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  6.753323      Inf
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

t.test(sub2, sub0, alternative = "greater", paired = FALSE, var.equal = FALSE, 
       conf.level = 0.95) # 2 mg/day vs. 0.5 mg/day

## 
##  Welch Two Sample t-test
## 
## data:  sub2 and sub0
## t = 11.799, df = 36.883, p-value = 2.199e-14
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  13.27926      Inf
## sample estimates:
## mean of x mean of y 
##    26.100    10.605

t.test(sub2, sub1, alternative = "greater", paired = FALSE, var.equal = FALSE, 
       conf.level = 0.95) # 2 mg/day vs. 1 mg/day

## 
##  Welch Two Sample t-test
## 
## data:  sub2 and sub1
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  4.17387     Inf
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

Observation: All three one-sided tests give p-values far below 0.05 (6.342e-08, 2.199e-14 and 9.532e-06), so we reject the null hypothesis in each case: mean tooth length increases with each step up in dose.
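Because three tests are run on the same data, one option is to adjust the p-values for multiple comparisons. The sketch below is our addition, not part of the original analysis: it runs all three one-sided Welch comparisons in one call with base R's `pairwise.t.test()` and applies the Holm adjustment.

```r
library(datasets)
data(ToothGrowth)

# All pairwise one-sided Welch tests between dose levels in a single call.
# pool.sd = FALSE uses separate variances per group (Welch), and
# alternative = "greater" tests whether the higher dose (row) has the larger
# mean than the lower dose (column). P-values are Holm-adjusted.
dose_tests <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                              pool.sd = FALSE, alternative = "greater",
                              p.adjust.method = "holm")
print(dose_tests)
```

Holm's method multiplies the smallest raw p-value by 3, the next by 2 and the largest by 1 (keeping them monotone), so with raw p-values this small the conclusions above do not change.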

3.2. What if the null hypothesis is true even though the p-value is small (a rare event)?

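# Probability (in percent) of a t statistic at least as large as each one
# observed above, if the null hypothesis were true. df = 20 + 20 - 2 = 38 is
# the pooled-variance df, close to the Welch df (about 37-38) from t.test().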
100*pt(q=6.4766, df=(20 + 20 - 2), lower.tail=FALSE)

## [1] 6.332435e-06

100*pt(q=11.799, df=(20 + 20 - 2), lower.tail=FALSE)

## [1] 1.418943e-12

100*pt(q=4.9005, df=(20 + 20 - 2), lower.tail=FALSE)

## [1] 0.0009053701

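Observation: If the null hypothesis were true, a t statistic at least this large would occur with probability well under 1% in every case (the values above are already in percent: about 6.3e-06%, 1.4e-12% and 0.0009%). Such outcomes would be very rare under the null hypothesis, so we reject it.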
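The `100*pt(...)` calls above hard-code rounded t values and use the pooled df of 38 rather than the Welch df that `t.test()` reports. A sketch that takes both the statistic and the df from the test object itself, so the manual tail probability matches the reported p-value exactly (the variable names here are our own):

```r
library(datasets)
data(ToothGrowth)

len_half <- ToothGrowth$len[ToothGrowth$dose == 0.5]
len_one  <- ToothGrowth$len[ToothGrowth$dose == 1]

# One-sided Welch test, keeping the full result object
res <- t.test(len_one, len_half, alternative = "greater", var.equal = FALSE)

# Upper-tail probability from the test's own t statistic and Welch df
p_manual <- pt(q = res$statistic, df = res$parameter, lower.tail = FALSE)

# Same statistic with the pooled df (20 + 20 - 2 = 38) used in section 3.2
p_pooled_df <- pt(q = res$statistic, df = 38, lower.tail = FALSE)

print(c(welch_df = unname(p_manual), pooled_df = unname(p_pooled_df),
        t_test = res$p.value))
```

With 20 observations per group the two df choices give tail probabilities of the same tiny order, so the conclusion does not depend on which df is used.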

3.3. Confidence Interval to compare tooth growth by supplement (supp)

Null Hypothesis: The true difference in means between OJ and VC is equal to 0.
Alternative Hypothesis: The true difference in means (OJ minus VC) is not equal to 0.

# Split tooth length by supplement. Rows 1-30 of ToothGrowth are VC and
# rows 31-60 are OJ, so select by the supp column rather than by position.
group_VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
group_OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
diff <- group_VC - group_OJ # VC minus OJ, matched by row order

t.test(diff)

## 
##  One Sample t-test
## 
## data:  diff
## t = -3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -5.991341 -1.408659
## sample estimates:
## mean of x 
##      -3.7

t.test(group_OJ, group_VC, paired = TRUE) # OJ minus VC

## 
##  Paired t-test
## 
## data:  group_OJ and group_VC
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of the differences 
##                     3.7

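# Formula interface with VC as the first level (VC minus OJ). Recent R
# versions reject paired = TRUE in the formula interface; the vector call
# above runs the same test there.
t.test(len ~ I(relevel(supp, 2)), paired = TRUE, data = ToothGrowth)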

## 
##  Paired t-test
## 
## data:  len by I(relevel(supp, 2))
## t = -3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.991341 -1.408659
## sample estimates:
## mean of the differences 
##                    -3.7

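Observation: The p-value (0.00255) is below 0.05, and the 95 percent confidence interval, (1.408659, 5.991341) for OJ minus VC or equivalently (-5.991341, -1.408659) for VC minus OJ, does not contain 0, so we reject the null hypothesis: OJ is associated with greater tooth growth than VC. These tests pair observations by row order; since ToothGrowth records 60 different guinea pigs, that pairing is an assumption rather than a feature of the study design.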
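Because each guinea pig received only one supplement, an unpaired comparison may be more appropriate than the paired tests above. A minimal sketch, assuming the OJ and VC groups are independent and allowing unequal variances (Welch):

```r
library(datasets)
data(ToothGrowth)

# Unpaired Welch test: treats the OJ and VC animals as independent groups.
# With the formula interface the difference is OJ minus VC (factor level order).
supp_test <- t.test(len ~ supp, data = ToothGrowth,
                    alternative = "two.sided", var.equal = FALSE)
print(supp_test)
```

On the standard ToothGrowth data this unpaired test gives p of about 0.06 and a 95 percent interval that includes 0, so whether the overall OJ vs. VC difference counts as significant at the 0.05 level depends on whether the row-order pairing is accepted. The per-dose comparisons in section 3.4 do not rely on pairing.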

3.4. Compare supplements within each dose level (2, 1 and 0.5 mg/day).

# Dose = 2 mg/day
group2_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2]
group2_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2]

t.test(group2_VC, group2_OJ, alternative = "two.sided", paired = FALSE, 
       var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group2_VC and group2_OJ
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.63807  3.79807
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

Observation: The p-value (0.9639) is well above 0.05, the test statistic (0.046136) is close to 0, and the 95 percent confidence interval (-3.63807, 3.79807) contains 0, so we fail to reject the null hypothesis at 2 mg/day.

# Dose = 1 mg/day
group3_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1]
group3_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1]

t.test(group3_VC, group3_OJ, alternative = "two.sided", paired = FALSE, 
       var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group3_VC and group3_OJ
## t = -4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.057852 -2.802148
## sample estimates:
## mean of x mean of y 
##     16.77     22.70

Observation: The p-value (0.001038) is below 0.05 and the 95 percent confidence interval (-9.057852, -2.802148) does not contain 0, so we reject the null hypothesis: at 1 mg/day, OJ gives greater mean tooth length than VC (22.70 vs. 16.77).

# Dose = 0.5 mg/day
group4_OJ = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5]
group4_VC = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5]

t.test(group4_VC, group4_OJ, alternative = "two.sided", paired = FALSE, 
       var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group4_VC and group4_OJ
## t = -3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.780943 -1.719057
## sample estimates:
## mean of x mean of y 
##      7.98     13.23

Observation: The p-value (0.006359) is below 0.05 and the 95 percent confidence interval (-8.780943, -1.719057) does not contain 0, so we reject the null hypothesis: at 0.5 mg/day, OJ gives greater mean tooth length than VC (13.23 vs. 7.98).
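The three per-dose tests above repeat the same code with a different dose value. A compact sketch that loops over the dose levels and collects the difference, confidence interval and p-value into one table (the column names are our own choice):

```r
library(datasets)
data(ToothGrowth)

# Welch two-sided test of VC vs. OJ within each dose level. VC is listed
# first, as in the tests above, so a negative difference means OJ > VC.
doses <- sort(unique(ToothGrowth$dose))
by_dose <- t(sapply(doses, function(d) {
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  res <- t.test(vc, oj, alternative = "two.sided", var.equal = FALSE)
  c(dose = d,
    diff = unname(res$estimate[1] - res$estimate[2]),
    lower = res$conf.int[1],
    upper = res$conf.int[2],
    p_value = res$p.value)
}))
print(by_dose)
```

If a correction for the three comparisons is wanted, `p.adjust(by_dose[, "p_value"], method = "holm")` can be applied to the last column.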

4. Conclusions:

4.1. Compare tooth growth by dose (0.5, 1 or 2 mg/day)

Dose had a significant effect on tooth growth: mean tooth length increased with dose, from 10.605 at 0.5 mg/day to 19.735 at 1 mg/day and 26.100 at 2 mg/day.

4.2. Compare tooth growth by supplement (OJ or VC)

At 2 mg/day, VC and OJ had very similar effects on tooth growth (p = 0.9639).
At 1 and 0.5 mg/day, however, OJ produced significantly greater tooth growth than VC (p = 0.001038 and p = 0.006359).
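The finding that OJ helps more at low doses but not at 2 mg/day amounts to a supplement-by-dose interaction. One way to check this in a single model is a two-way ANOVA. This is a sketch under assumptions the Welch tests above did not need: normally distributed errors with equal variance across the six groups.

```r
library(datasets)
data(ToothGrowth)

# Treat dose as a categorical factor with three levels (0.5, 1, 2 mg/day)
tg <- transform(ToothGrowth, dose = factor(dose))

# Two-way ANOVA with interaction: does the supplement effect depend on dose?
fit <- aov(len ~ supp * dose, data = tg)
summary(fit)
```

A small p-value on the `supp:dose` row would be consistent with the dose-dependent supplement effect described above.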