Overview

The second part of the project illustrates basic inferential data analysis using the ToothGrowth data in the R datasets package. The analysis comprises basic exploratory data analysis, hypothesis tests on the data, and the conclusions together with the assumptions they require.

Load data

library(ggplot2)  # provides ggplot() and geom_boxplot() used in the plots below
data <- datasets::ToothGrowth

Basic summary of the data

str(data)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(data)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(data)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
table(data$supp, data$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

Basic Exploratory Data Analysis

g1 <- ggplot(data, aes(x = supp, y = len, fill = supp))
g1 <- g1 + geom_boxplot()
g1

The plot above suggests that tooth length differs between the OJ and VC supplements, with tooth length under OJ appearing longer than under VC. A hypothesis test will be conducted to examine this difference.

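# dose is numeric, so convert it to a factor for both the x axis and the fill
g2 <- ggplot(data, aes(x = as.factor(dose), y = len, fill = as.factor(dose)))
g2 <- g2 + geom_boxplot() + labs(x = "dose", fill = "dose")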
g2

The plot above suggests that tooth length differs across the doses 0.5, 1, and 2. Tooth length appears longest under dose 2, followed by dose 1, and shortest under dose 0.5. Three hypothesis tests will be conducted to examine these pairwise differences.
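To put numbers alongside the boxplots, the group means can be tabulated directly. The following is a minimal sketch using base R's `aggregate`; the name `cell_means` is introduced here for illustration and is not part of the original analysis.

```r
# Sketch: mean tooth length for every supplement/dose combination.
data <- datasets::ToothGrowth

# One row per (supp, dose) cell; len holds the cell mean.
cell_means <- aggregate(len ~ supp + dose, data = data, FUN = mean)
cell_means
```

Because every cell contains 10 animals, averaging the two cell means at a given dose reproduces the per-dose means reported by the t-tests further down.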

Hypothesis Tests to compare tooth growth by supp and dose

Hypothesis Test 1: Comparing tooth growth by supp OJ (orange juice) and VC (vitamin C)

  • \(H_0\): \(\mu_{OJ}\) = \(\mu_{VC}\)
  • \(H_a\): \(\mu_{OJ}\) \(\neq\) \(\mu_{VC}\) (two-sided, matching the default alternative of the t.test call below)
  • \(\alpha\) level: 0.05
hypoTest1 <- t.test(len ~ supp, data = data)
hypoTest1
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

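Since the p-value, 0.0606345, is greater than the \(\alpha\) level, we fail to reject the null hypothesis. Consistent with this, the 95 percent confidence interval for the difference in means, (-0.171, 7.571), contains 0.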
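The call above relies on the default of `t.test`, which is a two-sided alternative. As a hedged sketch of how a directional alternative could be requested instead, the `alternative` argument can be set explicitly. The object names `two_sided` and `one_sided` are illustrative only. With the formula interface the difference is taken as first factor level minus second, here OJ minus VC.

```r
# Sketch only: contrast the default two-sided test with a one-sided one.
data <- datasets::ToothGrowth

# Default behaviour: alternative = "two.sided".
two_sided <- t.test(len ~ supp, data = data)

# Directional alternative: mean(OJ) - mean(VC) > 0.
# "OJ" is the first level of supp, so "greater" means OJ > VC.
one_sided <- t.test(len ~ supp, data = data, alternative = "greater")

# Components of the returned "htest" object can be read directly.
two_sided$p.value
one_sided$p.value
two_sided$conf.int
```

Because the t statistic is positive, the one-sided p-value is half of the two-sided one. The two tests can therefore lead to different decisions at the same \(\alpha\) level, so the alternative should be fixed before looking at the data.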

Hypothesis Test 2: Comparing tooth growth by dose 0.5 and 2

  • \(H_0\): \(\mu_{2}\) = \(\mu_{0.5}\)
  • \(H_a\): \(\mu_{2}\) \(\neq\) \(\mu_{0.5}\) (two-sided, matching the default alternative of the t.test call below)
  • \(\alpha\) level: 0.05
dose05_2 <- subset(data, data$dose %in% c(0.5, 2))
hypoTest2 <- t.test(len ~ dose, data = dose05_2)
hypoTest2
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

Since the p-value, \(4.397525 \times 10^{-14}\), is smaller than the \(\alpha\) level, we reject the null hypothesis.

Hypothesis Test 3: Comparing tooth growth by dose 1 and 2

  • \(H_0\): \(\mu_{2}\) = \(\mu_{1}\)
  • \(H_a\): \(\mu_{2}\) \(\neq\) \(\mu_{1}\) (two-sided, matching the default alternative of the t.test call below)
  • \(\alpha\) level: 0.05
dose1_2 <- subset(data, data$dose %in% c(1, 2))
hypoTest3 <- t.test(len ~ dose, data = dose1_2)
hypoTest3
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

Since the p-value, \(1.9064295 \times 10^{-5}\), is smaller than the \(\alpha\) level, we reject the null hypothesis.

Hypothesis Test 4: Comparing tooth growth by dose 0.5 and 1

  • \(H_0\): \(\mu_{1}\) = \(\mu_{0.5}\)
  • \(H_a\): \(\mu_{1}\) \(\neq\) \(\mu_{0.5}\) (two-sided, matching the default alternative of the t.test call below)
  • \(\alpha\) level: 0.05
dose05_1 <- subset(data, data$dose %in% c(0.5, 1))
hypoTest4 <- t.test(len ~ dose, data = dose05_1)
hypoTest4
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

Since the p-value, \(1.2683007 \times 10^{-7}\), is smaller than the \(\alpha\) level, we reject the null hypothesis.
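Three pairwise dose comparisons are run on the same data, so one may want to adjust the p-values for multiple testing. The following is a minimal sketch, not part of the original analysis, that recomputes the three Welch tests and applies base R's `p.adjust`. The Holm method is an assumption made here for illustration; `"bonferroni"` is another option. The names `dose_pairs`, `raw_p`, and `adj_p` are hypothetical.

```r
# Sketch: Holm-adjusted p-values for the three pairwise dose comparisons.
data <- datasets::ToothGrowth

# The same three pairs tested above.
dose_pairs <- list(c(0.5, 2), c(1, 2), c(0.5, 1))

# Welch two-sample t-test p-value for each pair of doses.
raw_p <- sapply(dose_pairs, function(pair) {
  pair_data <- data[data$dose %in% pair, ]
  t.test(len ~ dose, data = pair_data)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Holm adjustment controls the family-wise error rate.
adj_p <- p.adjust(raw_p, method = "holm")
adj_p
```

The raw p-values are small enough that all three adjusted values remain well below the 0.05 level, so the decisions above are unchanged.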

Conclusions & Assumptions needed for conclusions

Conclusions

When comparing the OJ and VC supplements, the difference in tooth length is not statistically significant: the p-value is greater than the 5% \(\alpha\) level, so we fail to reject the null hypothesis. This does not show that the supplement has no effect on tooth length, only that these data do not provide sufficient evidence of a difference at that level.

When comparing the doses 0.5, 1, and 2, the differences in tooth length are statistically significant: all three p-values are smaller than the 5% \(\alpha\) level, so we reject the null hypothesis in each comparison. The sample means increase with dose (10.605, 19.735, and 26.100), which indicates that a higher dose is associated with longer teeth.

Assumptions needed for conclusions

  • the guinea pigs in the sample are representative of the population of guinea pigs
  • dosage and supplements were assigned randomly
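Beyond the design assumptions listed above, t-tests on groups of this size also rely on tooth length being roughly normally distributed within each group. One informal check, offered here as an assumption-laden sketch rather than as part of the original analysis, is a Shapiro-Wilk test per dose group using base R's `shapiro.test`. The name `normality_p` is hypothetical.

```r
# Sketch: Shapiro-Wilk normality check of len within each dose group.
data <- datasets::ToothGrowth

# One p-value per dose; small values would suggest a departure from normality.
normality_p <- tapply(data$len, data$dose,
                      function(x) shapiro.test(x)$p.value)
normality_p
```

With only 20 observations per dose group the test has limited power, so a normal Q-Q plot (`qqnorm`) is a useful visual complement.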