In this assignment, we perform basic exploratory data analysis on the ToothGrowth dataset in R. The dataset measures the effect of vitamin C on tooth growth in guinea pigs. We also conduct hypothesis tests to determine whether supplement type and dosage have a statistically significant effect on tooth growth.
The dataset records the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
The data is stored in a data frame with 60 observations on 3 variables:
| Column Name | Class | Description |
|---|---|---|
| len | numeric | Tooth length |
| supp | factor | Supplement type (VC or OJ) |
| dose | numeric | Dose in milligrams/day (0.5, 1 or 2) |
Information about the dataset was obtained from the link below:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html
# Load the built-in dataset and treat dose as a categorical variable
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
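library(dplyr)    # %>%, group_by(), summarise(), n()
library(moments)  # skewness(), kurtosis()

# Per-group summary; moments::kurtosis() is not excess kurtosis, so subtract 3
dfToothGrowth <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarise(Mean = mean(len), Num = n(), Var = var(len), StdDev = sd(len),
            Skew = skewness(len), ExcKurt = kurtosis(len) - 3)
dfToothGrowth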
## Source: local data frame [6 x 8]
## Groups: supp [?]
##
## supp dose Mean Num Var StdDev Skew ExcKurt
## (fctr) (fctr) (dbl) (int) (dbl) (dbl) (dbl) (dbl)
## 1 OJ 0.5 13.23 10 19.889000 4.459709 0.5131269 -0.9872929
## 2 OJ 1 22.70 10 15.295556 3.910953 -0.7970101 -0.1357745
## 3 OJ 2 26.06 10 7.049333 2.655058 0.4316050 -0.6367104
## 4 VC 0.5 7.98 10 7.544000 2.746634 0.1558587 -1.5270309
## 5 VC 1 16.77 10 6.326778 2.515309 1.0839861 0.7978228
## 6 VC 2 26.14 10 23.018222 4.797731 0.1880107 -0.8173491
As expected, there are 60 observations in total. Each record corresponds to a different guinea pig, so the records are independent. supp (supplement type) and dose (dosage) have 2 and 3 factor levels respectively, which tallies with the description of the dataset. There are also 10 records (guinea pigs) in each combination of supplement type and dosage.
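The plotting code for the boxplot is not shown in this report. A minimal sketch of how such a plot could be drawn with ggplot2 is given below; the aesthetics, the axis labels and the object names `tg` and `p` are assumptions rather than the original plotting code.

```r
library(ggplot2)

# Work on a local copy so the sketch does not depend on earlier steps
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# One box per supplement type within each dose level
p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  labs(x = "Dose (mg/day)", y = "Tooth length", fill = "Supplement type")
print(p)
```

Grouping by dose on the x-axis and filling by supplement type puts the two comparisons tested later (supplement within dose, and dose within supplement) side by side in one figure.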
The boxplot showed the following:
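For the hypothesis tests below, we seek to confirm, with statistical confidence, the patterns seen in the boxplot.
Specifically, we test (I) whether mean tooth length differs between orange juice and ascorbic acid at each dose level, (II) whether it differs between the 0.5 and 1 mg doses for each supplement type, and (III) whether it differs between the 1 and 2 mg doses for each supplement type.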
# One subset per dose level, used to compare the two supplement types
TG0 <- subset(ToothGrowth, dose %in% c(0.5))
TG1 <- subset(ToothGrowth, dose %in% c(1))
TG2 <- subset(ToothGrowth, dose %in% c(2))
# One subset per supplement type and pair of adjacent doses, used to compare doses
OJ01 <- subset(ToothGrowth, dose %in% c(0.5, 1) & supp == 'OJ')
OJ12 <- subset(ToothGrowth, dose %in% c(1, 2) & supp == 'OJ')
VC01 <- subset(ToothGrowth, dose %in% c(0.5, 1) & supp == 'VC')
VC12 <- subset(ToothGrowth, dose %in% c(1, 2) & supp == 'VC')
The data are split into two-sample subsets for hypothesis testing.
\(\mu_1\) = population mean tooth length for guinea pigs given orange juice
\(\mu_2\) = population mean tooth length for guinea pigs given ascorbic acid
\(H_o\): \(\mu_1\) - \(\mu_2\) = 0
\(H_a\): \(\mu_1\) - \(\mu_2\) \(\neq\) 0
\(\alpha\) = 0.05
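# Welch two-sample t-test; unpaired is the default, and newer R versions reject 'paired' in the formula interface
t.test(len ~ supp, var.equal=FALSE, data=TG0)$p.value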
## [1] 0.006358607
Since the p-value of 0.006 is less than \(\alpha\) = 0.05, we reject the null hypothesis.
At the 5% level of significance, the data provide sufficient evidence that mean tooth length differs between orange juice and ascorbic acid at the 0.5 mg/day dose.
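To make explicit what `t.test` computes with `var.equal=FALSE`, the sketch below reproduces the Welch t statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value by hand for the 0.5 mg/day subset. The variable names (`tg0`, `oj`, `vc`, `se2_oj`, `se2_vc`, `p_manual`, `p_builtin`) are illustrative and not part of the original analysis.

```r
# Rebuild the 0.5 mg/day subset from the built-in data (dose is numeric here)
tg0 <- subset(datasets::ToothGrowth, dose == 0.5)
oj <- tg0$len[tg0$supp == "OJ"]
vc <- tg0$len[tg0$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)
df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_stat), df)

# The built-in Welch test should give the same p-value
p_builtin <- t.test(len ~ supp, data = tg0, var.equal = FALSE)$p.value
c(manual = p_manual, builtin = p_builtin)
```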
t.test(len ~ supp, var.equal=FALSE, data=TG1)$p.value
## [1] 0.001038376
Since the p-value of 0.001 is less than \(\alpha\) = 0.05, we reject the null hypothesis.
At the 5% level of significance, the data provide sufficient evidence that mean tooth length differs between orange juice and ascorbic acid at the 1 mg/day dose.
t.test(len ~ supp, var.equal=FALSE, data=TG2)$p.value
## [1] 0.9638516
Since the p-value of 0.964 is greater than \(\alpha\) = 0.05, we cannot reject the null hypothesis.
At the 5% level of significance, the data do not provide sufficient evidence that mean tooth length differs between orange juice and ascorbic acid at the 2 mg/day dose.
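The three tests above differ only in the dose level, so they can also be run in one pass. The following is a sketch of that idea, not the code used for this report; the names `tg` and `p_by_dose` are assumptions.

```r
tg <- datasets::ToothGrowth

# Split the data by dose and run the same Welch test of len ~ supp in each part
p_by_dose <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)$p.value
})

# Named vector of p-values, one per dose level
round(p_by_dose, 4)

# Dose levels at which the null hypothesis is rejected at alpha = 0.05
names(p_by_dose)[p_by_dose < 0.05]
```

Running the tests from one function keeps the test settings identical across dose levels and makes it harder to mix up subsets.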
\(\mu_1\) = population mean tooth length at the 0.5 mg/day dose
\(\mu_2\) = population mean tooth length at the 1 mg/day dose
\(H_o\): \(\mu_1\) - \(\mu_2\) = 0
\(H_a\): \(\mu_1\) - \(\mu_2\) \(\neq\) 0
\(\alpha\) = 0.05
t.test(len ~ dose, var.equal=FALSE, data=OJ01)$p.value
## [1] 8.784919e-05
Since the p-value of about 8.78e-05 is less than \(\alpha\) = 0.05, we reject the null hypothesis.
At the 5% level of significance, the data provide sufficient evidence that mean tooth length differs between the 0.5 and 1 mg/day doses for orange juice.
t.test(len ~ dose, var.equal=FALSE, data=VC01)$p.value
## [1] 6.811018e-07
Since the p-value of about 6.81e-07 is less than \(\alpha\) = 0.05, we reject the null hypothesis.
At the 5% level of significance, the data provide sufficient evidence that mean tooth length differs between the 0.5 and 1 mg/day doses for ascorbic acid.
\(\mu_1\) = population mean tooth length at the 1 mg/day dose
\(\mu_2\) = population mean tooth length at the 2 mg/day dose
\(H_o\): \(\mu_1\) - \(\mu_2\) = 0
\(H_a\): \(\mu_1\) - \(\mu_2\) \(\neq\) 0
\(\alpha\) = 0.05
t.test(len ~ dose, var.equal=FALSE, data=OJ12)$p.value
## [1] 0.03919514
Since the p-value of 0.039 is less than \(\alpha\) = 0.05, we reject the null hypothesis.
At the 5% level of significance, the data provide sufficient evidence that mean tooth length differs between the 1 and 2 mg/day doses for orange juice.
t.test(len ~ dose, var.equal=FALSE, data=VC12)$p.value
## [1] 9.155603e-05
Since the p-value of about 9.16e-05 is less than \(\alpha\) = 0.05, we reject the null hypothesis.
At the 5% level of significance, the data provide sufficient evidence that mean tooth length differs between the 1 and 2 mg/day doses for ascorbic acid.
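A two-sided test only shows that two means differ; the direction of the difference comes from the estimates. As a sketch (the helper `dose_effect` and the object `effects` are hypothetical and not part of the original analysis), the difference in means and its 95% confidence interval can be computed for each dose comparison with the same Welch test:

```r
tg <- datasets::ToothGrowth

# Hypothetical helper: Welch test between two doses for one supplement type,
# returning the difference in means (higher dose minus lower dose) and its 95% CI
dose_effect <- function(supplement, low, high) {
  len_low  <- tg$len[tg$supp == supplement & tg$dose == low]
  len_high <- tg$len[tg$supp == supplement & tg$dose == high]
  res <- t.test(len_high, len_low, var.equal = FALSE)
  c(diff = mean(len_high) - mean(len_low),
    lower = res$conf.int[1],
    upper = res$conf.int[2],
    p.value = res$p.value)
}

# One row per comparison made in hypothesis tests II and III
effects <- rbind(
  OJ_0.5_vs_1 = dose_effect("OJ", 0.5, 1),
  VC_0.5_vs_1 = dose_effect("VC", 0.5, 1),
  OJ_1_vs_2   = dose_effect("OJ", 1, 2),
  VC_1_vs_2   = dose_effect("VC", 1, 2)
)
round(effects, 4)
```

A positive `diff` with an interval that excludes zero indicates that the higher dose is associated with greater mean tooth length.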
In hypothesis test I, there is sufficient evidence that, at the lower doses of 0.5 and 1 mg/day, mean tooth length differs between the two supplement types; the group means show orange juice producing the greater length. At 2 mg/day, the evidence is not sufficient to conclude that the supplement types differ. Mean tooth length is about 26 for both supplement types at this dose, which may indicate a limit to tooth growth. Collecting data at doses of 1.5, 2.5 and 3 mg/day would allow a more complete analysis.
In hypothesis tests II and III, there is sufficient evidence that, within the tested range of 0.5 to 2 mg/day, mean tooth length differs between adjacent dose levels for each supplement type; the group means show length increasing with dose. As with test I, more data would have to be collected and analysed to draw conclusions beyond 2 mg/day.
In short, both supplement types of vitamin C (orange juice and ascorbic acid) have a positive effect on tooth growth in guinea pigs.
Libraries required for this assignment project: ggplot2, moments, dplyr