Summary

This report uses the ToothGrowth dataset in R to perform a basic inferential analysis in four parts:

1. Load the ToothGrowth data and perform some basic exploratory data analyses.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
4. Conclusions and assumptions.

Inferential Analysis

1. Load the ToothGrowth dataset and perform some basic exploratory data analysis

Load the ToothGrowth dataset.

library(datasets)
library(ggplot2)
data(ToothGrowth)

Display the dataset’s structure.

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Display a summary of the dataset.

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

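The dataset has 60 observations, one per guinea pig. The variable len is the length of odontoblasts (the cells responsible for tooth growth). Each animal received one of three dose levels of vitamin C (0.5, 1 or 2 mg/day, the variable dose) by one of two delivery methods (the variable supp, which is either orange juice, coded OJ, or ascorbic acid, coded VC). Each of the six combinations of supp and dose therefore contains 10 animals.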
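
The design described above can be checked directly with a cross-tabulation. The following is a minimal sketch; the names tg and cell_counts are introduced here only for illustration, and tg holds a fresh copy of the dataset so that the ToothGrowth object used in the rest of the report is left untouched.

```r
# Work on a fresh copy of the dataset (illustrative name, not from the report)
tg <- datasets::ToothGrowth

# Count the observations in each supplement-by-dose cell
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)
```

Every cell of the resulting table should contain 10 observations, for 60 in total.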

Display the unique values for the variable len.

unique(ToothGrowth$len)
##  [1]  4.2 11.5  7.3  5.8  6.4 10.0 11.2  5.2  7.0 16.5 15.2 17.3 22.5 13.6
## [15] 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 17.6
## [29]  9.7  8.2  9.4 19.7 20.0 25.2 25.8 21.2 27.3 22.4 24.5 24.8 30.9 29.4
## [43] 23.0

Display the unique values for the variable dose.

unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0

Display the unique values for the variable supp.

unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC

2. Provide a basic summary of the data

Convert the numeric variable dose into a factor for plotting purposes.

ToothGrowth$dose<-as.factor(ToothGrowth$dose)

Plot tooth length (len) vs. supplement delivery method (supp) by the dose amount (dose).

ggplot(data=ToothGrowth, aes(x=supp, y=len)) +
  geom_boxplot(aes(fill=supp)) + xlab("Supplement Delivery") +
  ylab("Tooth Length") + facet_grid(~ dose) +
  ggtitle("Tooth Length vs. Delivery Method by Dose Amount") + 
  theme(plot.title = element_text(lineheight=.5, hjust=0.5, face="bold"))

Plot the tooth length (len) vs. the dose amount (dose) by the supplement delivery method (supp).

ggplot(data=ToothGrowth, aes(x=dose, y=len)) +
  geom_boxplot(aes(fill=dose)) + xlab("Dose Amount of Vitamin C") + 
  ylab("Tooth Length") + facet_grid(~ supp) + 
  ggtitle("Tooth Length vs. Dose Amount by Delivery Method") + 
  theme(plot.title = element_text(lineheight=.5, hjust=0.5, face="bold"))
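
The boxplots can be complemented by a numeric summary of each group. The sketch below is one possible way to tabulate the mean, standard deviation and sample size of len for every combination of supp and dose; the names tg and cell_summary are assumptions made for this example.

```r
# Fresh copy of the dataset (illustrative name)
tg <- datasets::ToothGrowth

# Mean, standard deviation and sample size of len in each supp-by-dose cell
cell_summary <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
print(cell_summary)
```

The result has one row per combination; the len column is a small matrix whose columns are named mean, sd and n.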

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose

Use a t-test to compare tooth growth by supplement.

t.test(len ~ supp, data=ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The p-value of this test is 0.06063, which is greater than 0.05, and the 95% confidence interval contains zero. The null hypothesis of equal means therefore cannot be rejected at the 5% level: with all doses pooled, the supplement type (orange juice vs. ascorbic acid) shows no statistically significant effect on tooth growth.
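
The test above pools all three dose levels. A possible follow-up, sketched here as an assumption rather than as part of the original analysis, is to repeat the same Welch t-test separately within each dose level to see whether the pooled comparison hides a dose-specific difference. The names tg, by_dose and supp_by_dose are hypothetical.

```r
# Fresh copy of the dataset with dose still numeric (illustrative name)
tg <- datasets::ToothGrowth

# Run the Welch t-test of len by supp separately for each dose level
by_dose <- lapply(split(tg, tg$dose), function(d) t.test(len ~ supp, data = d))

# Collect the p-value and the 95% confidence interval for each dose level
supp_by_dose <- t(sapply(by_dose, function(res) {
  c(p_value  = res$p.value,
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2])
}))
print(supp_by_dose)
```

Each row of supp_by_dose corresponds to one dose level and can be read in the same way as the pooled test: an interval that excludes zero indicates a significant difference between OJ and VC at that dose.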

Use t-tests to compare tooth growth across dose levels. Each pair of dose values is analyzed separately.

3.1. Analyze dose = 0.5 vs. dose = 1.0

# Keep only the observations for the two dose levels being compared
tg.subset <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len ~ dose, data=tg.subset)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

The p-value of this test is 1.268e-07, which is far smaller than 0.05, and the 95% confidence interval does not contain zero. The dose difference between 0.5 and 1.0 therefore has a statistically significant effect on tooth growth: the mean length rises from 10.605 to 19.735.

3.2. Analyze dose = 1.0 vs. dose = 2.0

# Keep only the observations for the two dose levels being compared
tg.subset <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, data=tg.subset)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

The p-value of this test is 1.906e-05, which is far smaller than 0.05, and the 95% confidence interval does not contain zero. The dose difference between 1.0 and 2.0 therefore has a statistically significant effect on tooth growth: the mean length rises from 19.735 to 26.100.

3.3. Analyze dose = 0.5 vs. dose = 2.0

# Keep only the observations for the two dose levels being compared
tg.subset <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len ~ dose, data=tg.subset)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

The p-value of this test is 4.398e-14, which is far smaller than 0.05, and the 95% confidence interval does not contain zero. The dose difference between 0.5 and 2.0 therefore has a statistically significant effect on tooth growth: the mean length rises from 10.605 to 26.100.
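
Because three pairwise tests are run on the same data, it can be worth checking that the conclusions survive a correction for multiple comparisons. The sketch below repeats the three Welch t-tests in a loop and applies a Bonferroni adjustment with p.adjust; the correction is an addition made here for illustration, not part of the original analysis, and the names tg, dose_pairs, p_raw and p_adj are hypothetical.

```r
# Fresh copy of the dataset with dose still numeric (illustrative name)
tg <- datasets::ToothGrowth

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2), one pair per column
dose_pairs <- combn(sort(unique(tg$dose)), 2)

# Welch t-test p-value for each pair of dose levels
p_raw <- apply(dose_pairs, 2, function(pair) {
  t.test(len ~ dose, data = tg[tg$dose %in% pair, ])$p.value
})
names(p_raw) <- apply(dose_pairs, 2, paste, collapse = " vs. ")

# Bonferroni adjustment for the three comparisons
p_adj <- p.adjust(p_raw, method = "bonferroni")
print(cbind(raw = p_raw, bonferroni = p_adj))
```

Since the largest raw p-value reported above is 1.906e-05, multiplying by three still leaves every comparison well below 0.05.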

4. Conclusions and assumptions

The analysis assumed that
1. the sample is representative of the population,
2. the guinea pigs were randomly assigned to the dose levels and supplement types, so that the compared groups are independent, and
3. the sample means are approximately normally distributed, as the Central Limit Theorem suggests for sufficiently large samples.
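
With only 10 animals per supplement-by-dose cell, the normality assumption can be probed informally. The following is a rough sketch, assuming a Shapiro-Wilk test per cell is an acceptable diagnostic; it also prints the per-dose variances, which the Welch t-test used in this report does not require to be equal. The names tg, normality_p and dose_var are hypothetical.

```r
# Fresh copy of the dataset (illustrative name)
tg <- datasets::ToothGrowth

# Shapiro-Wilk p-value for len within each supplement-by-dose cell
# (a small p-value would suggest a departure from normality)
normality_p <- tapply(tg$len, list(supp = tg$supp, dose = tg$dose),
                      function(x) shapiro.test(x)$p.value)
print(round(normality_p, 3))

# Sample variance of len at each dose level
dose_var <- tapply(tg$len, tg$dose, var)
print(dose_var)
```

These diagnostics only flag gross violations; they do not prove that the assumptions hold.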

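Based on the t-test analysis, it is concluded that
1. tooth length increases with the dose of vitamin C (all three pairwise comparisons between dose levels are significant at the 5% level), and
2. the supplement delivery method has no statistically significant effect at the 5% level when all doses are pooled.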
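
The dose-response relationship in the first conclusion can also be expressed as a single correlation coefficient, which the pairwise t-tests do not provide directly. The sketch below treats dose as a numeric variable and assumes a roughly linear relationship; the names tg and dose_cor are introduced here for illustration.

```r
# Fresh copy of the dataset with dose still numeric (illustrative name)
tg <- datasets::ToothGrowth

# Pearson correlation between tooth length and dose, with its 95% interval
dose_cor <- cor.test(tg$len, tg$dose)
print(dose_cor$estimate)
print(dose_cor$conf.int)
```

A positive estimate whose confidence interval excludes zero is consistent with the increase in mean tooth length seen across the three dose levels.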