Overview

The project consists of two parts: (1) a simulation exercise and (2) a basic inferential data analysis.

Part 2, presented here, performs the basic inferential data analysis on the ToothGrowth dataset.

Load dataset and explore

data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
dim(ToothGrowth)
## [1] 60  3
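
As an optional extra exploration step, the sketch below inspects the column types and cross-tabulates supplement by dose to confirm that the design is balanced. The variable name design is introduced here for illustration and is not part of the original analysis.

```r
data(ToothGrowth)

# Column types: len is numeric, supp is a factor, dose is numeric
str(ToothGrowth)

# Number of observations for every supplement/dose combination
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

A balanced design (the same count in every cell) matters later, because the per-dose t tests then compare groups of equal size.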

Summary

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
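
The overall summary pools all groups together. A minimal sketch of a per-group summary, assuming the mean tooth length per supplement and dose is the quantity of interest, could look like this (group_means is a name chosen here for illustration):

```r
data(ToothGrowth)

# Mean tooth length for each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

These group means are the same values that appear as sample estimates in the per-dose t tests.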

Plot dataset

par(mfrow = c(1, 3))  # three panels side by side
# plot len ~ supp
boxplot(len ~ supp, data = ToothGrowth, col = "green",
        ylab = "Tooth length", xlab = "Supplement")

# plot len ~ dose
boxplot(len ~ dose, data = ToothGrowth, col = "blue",
        ylab = "Tooth length", xlab = "Dose")

# plot len ~ supp * dose
boxplot(len ~ supp * dose, data = ToothGrowth, col = c("green", "blue"),
        ylab = "Tooth length", xlab = "Supplement and Dose")

Hypothesis and testing

# Hypothesis 1: Supplement does not affect tooth length.
test_supp <- t.test(len ~ supp, data = ToothGrowth)
print(test_supp)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
# p-value > 0.05, and 0 is included in the confidence interval,
# therefore DO NOT reject the hypothesis.
# Conclusion 1: With all doses pooled, no significant effect of supplement on tooth length is detected.

# Hypothesis 2: Supplement does not affect tooth length at a dose of 0.5.
test_supp_dose0.5 <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == 0.5, ])
print(test_supp_dose0.5)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
# p-value < 0.05, and 0 is not included in the confidence interval,
# therefore REJECT the hypothesis.
# Conclusion 2: At a dose of 0.5, supplement affects tooth length (OJ mean is higher than VC mean).

# Hypothesis 3: Supplement does not affect tooth length at a dose of 1.
test_supp_dose1 <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == 1, ])
print(test_supp_dose1)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
# p-value < 0.05, and 0 is not included in the confidence interval,
# therefore REJECT the hypothesis.
# Conclusion 3: At a dose of 1, supplement affects tooth length (OJ mean is higher than VC mean).

# Hypothesis 4: Supplement does not affect tooth length at a dose of 2.
test_supp_dose2 <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == 2, ])
print(test_supp_dose2)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
# p-value > 0.05, and 0 is included in the confidence interval,
# therefore DO NOT reject the hypothesis.
# Conclusion 4: At a dose of 2, no significant effect of supplement on tooth length is detected.
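
The three per-dose tests repeat the same call with a different subset, so they can be written as a loop. The sketch below does that and also applies a Holm correction for running three tests on the same data. The correction is an optional addition and not part of the original analysis, and the names doses, p_values and p_adjusted are chosen here for illustration.

```r
data(ToothGrowth)

# Run the Welch t test of len by supp separately for each dose level
doses <- sort(unique(ToothGrowth$dose))
p_values <- sapply(doses, function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])$p.value
})
names(p_values) <- doses

# Optional: adjust for multiple comparisons (Holm method, an assumption here)
p_adjusted <- p.adjust(p_values, method = "holm")

print(round(cbind(raw = p_values, holm = p_adjusted), 4))
```

With the p-values reported above (0.006359, 0.001038 and 0.9639), the Holm-adjusted values are roughly 0.0127, 0.0031 and 0.9639, so the per-dose decisions at the 0.05 level stay the same.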

Conclusions

Conclusion 1: With all doses pooled, the choice of supplement (OJ or VC) shows no significant effect on tooth length (p = 0.06063).
Conclusions 2-3: At the lower doses (0.5 and 1), tooth length differs significantly between the two supplements, with OJ giving the higher mean in both cases.
Conclusion 4: At the highest dose (2), there is no significant difference in tooth length between the two supplements (p = 0.9639).

Assumptions

Assumption 1: The dataset is representative of the population.
Assumption 2: A significance level of 0.05 is used.
Assumption 3: The two supplement groups are treated as unpaired (independent) samples with unequal variances, so Welch's t test is used.
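
To make Assumption 3 concrete, the sketch below spells out the unpaired, unequal-variance settings explicitly and runs the pooled-variance version next to it for comparison. It uses the two-vector form of t.test rather than the formula form used in the analysis, and the names oj, vc, welch and pooled are chosen here for illustration.

```r
data(ToothGrowth)

# Tooth lengths for each supplement, all doses pooled
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Unpaired, unequal variances (Welch): the setting assumed in this report
welch <- t.test(oj, vc, paired = FALSE, var.equal = FALSE, conf.level = 0.95)

# Unpaired, equal variances (pooled): shown only for comparison
pooled <- t.test(oj, vc, paired = FALSE, var.equal = TRUE, conf.level = 0.95)

print(c(welch = welch$p.value, pooled = pooled$p.value))
```

The Welch result reproduces the first test in the report. Comparing the two p-values gives a rough sense of how much the conclusion depends on the variance assumption.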