This report analyses the ToothGrowth data in the R datasets package. It examines how tooth growth responds to the dose level of vitamin C and to the delivery method, orange juice or ascorbic acid (a form of vitamin C, coded as VC).
The ToothGrowth dataset contains data on the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C, coded as VC).
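# Load the packages used later in the report, then the dataset
library(dplyr)    # filter(), select(), %>%
library(ggplot2)  # ggplot()
data(ToothGrowth)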
The data frame has 60 observations on 3 variables:
| Variable | Data type | Description |
|---|---|---|
| len | numeric | Tooth length |
| supp | factor | Supplement type (VC or OJ) |
| dose | numeric | Dose in milligrams/day (0.5, 1, or 2 mg/day) |
#looking at the data
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
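A quick cross-tabulation can confirm the design described above, in which each animal falls into exactly one supplement and dose combination. This is a minimal sketch using base R only; the name `design` is introduced here purely for illustration.

```r
# Count the animals in each supplement x dose cell (base R only)
data(ToothGrowth)
design <- with(ToothGrowth, table(supp, dose))
print(design)

# A balanced design has the same count in every cell
all(design == design[1])
```

A balanced layout matters for the later comparisons, because each t-test then compares groups of equal size.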
#Summary of the ToothGrowth dataset
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
#Orange juice delivery data summary
summary(filter(ToothGrowth, supp == "OJ"))
## len supp dose
## Min. : 8.20 OJ:30 Min. :0.500
## 1st Qu.:15.53 VC: 0 1st Qu.:0.500
## Median :22.70 Median :1.000
## Mean :20.66 Mean :1.167
## 3rd Qu.:25.73 3rd Qu.:2.000
## Max. :30.90 Max. :2.000
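The summary above covers only the orange juice group. As a complement, the mean tooth length can be tabulated for every supplement and dose combination at once. The following is a sketch using base R's `aggregate()`; `group_means` is a name chosen here for illustration and does not appear elsewhere in the report.

```r
# Mean tooth length for each supplement x dose group (base R only)
data(ToothGrowth)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each row of `group_means` corresponds to one of the six treatment groups, which makes it easy to compare the two delivery methods at a fixed dose.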
#A look at how the guinea pigs respond to the different dose levels of the two delivery methods
ggplot(ToothGrowth,aes(y=len,x=dose))+
geom_point(aes(color=supp))+
geom_smooth(aes(color=supp), method="lm")+
labs(title = "Vitamin C Effect on Tooth Growth in Guinea Pigs for the different delivery Methods",
x="Dose in milligrams/Day",
y= "Tooth Length",
caption="Figure 1") +
scale_color_discrete(name="Delivery Method",
breaks=c("OJ", "VC"),
labels=c("Orange Juice", "Ascorbic Acid"))
ggplot(ToothGrowth, aes(x=factor(dose),y=len,fill=factor(dose))) +
geom_boxplot() + facet_grid(.~supp) +
labs(x="Dosage (Milligram)",
y="Length of Teeth",
title = "Boxplot of Tooth Growth Data",
caption="Figure 2")+
scale_fill_discrete(name="Dose mg/day")
Figure 1 above shows that
1 - The pigs that received their supplement via orange juice tended to have longer teeth
2 - The pigs that received a higher dose of the supplement had longer teeth
Figure 2 above shows that
3 - Increasing the dose of ascorbic acid appears to have a larger marginal effect on tooth length than increasing the dose of orange juice
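The observation about marginal effects can be checked numerically by fitting a separate straight line of `len` on `dose` for each delivery method, which mirrors the `geom_smooth(method = "lm")` lines in Figure 1. This is a sketch only; `fit_oj`, `fit_vc`, and `slopes` are illustrative names, and a linear fit is an assumption rather than a claim that the dose response is truly linear.

```r
# Fit len ~ dose separately for each delivery method (base R only)
data(ToothGrowth)
fit_oj <- lm(len ~ dose, data = subset(ToothGrowth, supp == "OJ"))
fit_vc <- lm(len ~ dose, data = subset(ToothGrowth, supp == "VC"))

# The dose coefficient is the estimated change in length per extra mg/day
slopes <- c(OJ = unname(coef(fit_oj)["dose"]),
            VC = unname(coef(fit_vc)["dose"]))
print(slopes)
```

A larger slope for VC than for OJ is what a steeper ascorbic acid line in Figure 1 would correspond to.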
I will perform a t-test comparing the two delivery methods within each dose level
d0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, ]
d1.0 <- ToothGrowth[ToothGrowth$dose == 1.0, ]
d2.0 <- ToothGrowth[ToothGrowth$dose == 2.0, ]
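# Welch two-sample t-test of len by supp at dose 0.5 (unpaired is the default)
test0.5 <- t.test(len ~ supp, var.equal = FALSE, data = d0.5)
p0.5 <- test0.5$p.value
ci0.5 <- test0.5$conf.int  # full component name rather than the partial match $conf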
## [1] "P Value for Dose = 0.5: 0.00635861"
## [1] "Confidence Interval for Dose = 0.5: 1.71906"
## [2] "Confidence Interval for Dose = 0.5: 8.78094"
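# Welch two-sample t-test of len by supp at dose 1.0 (unpaired is the default)
test1.0 <- t.test(len ~ supp, var.equal = FALSE, data = d1.0)
p1.0 <- test1.0$p.value
ci1.0 <- test1.0$conf.int  # full component name rather than the partial match $conf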
## [1] "P Value for Dose = 1.0: 0.00103838"
## [1] "Confidence Interval for Dose = 1.0: 2.80215"
## [2] "Confidence Interval for Dose = 1.0: 9.05785"
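# Welch two-sample t-test of len by supp at dose 2.0 (unpaired is the default)
test2.0 <- t.test(len ~ supp, var.equal = FALSE, data = d2.0)
p2.0 <- test2.0$p.value
ci2.0 <- test2.0$conf.int  # full component name rather than the partial match $conf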
## [1] "P Value for Dose = 2.0: 0.963852"
## [1] "Confidence Interval for Dose = 2.0: -3.79807"
## [2] "Confidence Interval for Dose = 2.0: 3.63807"
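The three per-dose tests above follow the same pattern, so they can also be run in one pass and collected into a single table. The following is a minimal sketch in base R; `doses` and `results` are names introduced here for illustration, and the `paired` argument is left out because an unpaired test is the default.

```r
# Run the Welch t-test of len by supp once per dose level and collect the results
data(ToothGrowth)
doses <- sort(unique(ToothGrowth$dose))
results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ],
               var.equal = FALSE)
  data.frame(dose = d,
             p_value = tt$p.value,
             ci_lower = tt$conf.int[1],
             ci_upper = tt$conf.int[2])
}))
print(results, digits = 6)
```

Each row of `results` holds the p-value and the 95% confidence interval for the OJ minus VC difference in mean length at one dose, so the three comparisons can be read side by side.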
Next, I will perform a t-test comparing the two delivery methods (supp) across all dose levels combined
#Extract the len and supp vectors from the ToothGrowth data frame
len <- ToothGrowth$len
supp <- ToothGrowth$supp
#Welch two-sample t-test of len by supp (unpaired is the default)
t.test(len ~ supp)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
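To make the output above less of a black box, the Welch statistic, its degrees of freedom, the p-value, and the confidence interval can be recomputed by hand from the group means and variances. This is a sketch of the standard Welch formulas in base R; all variable names (`oj`, `vc`, `t_stat`, `df_welch`, and so on) are introduced here for illustration.

```r
# Recompute the Welch two-sample t-test of len by supp by hand
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error contributed by each group
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Test statistic and Welch-Satterthwaite degrees of freedom
diff_means <- mean(oj) - mean(vc)
t_stat <- diff_means / se
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df = df_welch) * se

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the `t.test()` output above. The interval contains zero, which is consistent with a p-value above 0.05.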