We are going to study tooth growth in Guinea Pigs under a treatment based on Vitamin C. Each animal receives one of three doses of Vitamin C from one of two sources: Orange Juice or Ascorbic Acid. We are going to use the following tools: plots, confidence intervals and hypothesis tests. The conclusions are summarized at the end of the report.
We are going to take a look at the data to see how to focus the research.
data <- ToothGrowth
str(data)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(data)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
table(data$dose, data$supp)
##
##       OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10
At first glance, we can see three variables: the length of the tooth, a continuous numeric value; the type of supplement, a factor with two levels (relabeled below with more descriptive names); and the dose, which takes three distinct values (converted below to a factor). We can focus the research in two different ways: relating the length of the tooth to the dose, or relating the length of the tooth to the type of supplement. We are going to plot the data both ways to choose the better approach.
levels(data$supp) <- c("Orange Juice", "Ascorbic Acid")
data$dose <- as.factor(data$dose)
library(ggplot2)  # required for ggplot(), facet_grid(), geom_boxplot() and labs()
g <- ggplot(data, aes(x = factor(dose), y = len))
g <- g + facet_grid(.~supp)
g <- g + geom_boxplot(aes(fill = supp))
g <- g + labs(title = "Tooth Length by Dose")
g <- g + labs(x = "Dose", y = "Length")
g
In general, we can see that tooth length increases with the dose for both supplements.
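To put numbers behind the visual impression, here is a minimal sketch that computes the mean tooth length for each supplement and dose. It reads the built-in `ToothGrowth` dataset directly, so it uses the original `OJ`/`VC` labels; the name `means` is our own choice.

```r
# Mean tooth length for every supplement/dose combination
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
means
```

Reading the `len` column within each supplement from the lowest to the highest dose shows whether the means rise as the boxplots suggest.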
g <- ggplot(data, aes(x = supp, y = len))
g <- g + facet_grid(.~dose)
g <- g + geom_boxplot(aes(fill = dose))
g <- g + labs(title = "Tooth Length by Supplement")
g <- g + labs(x = "Supplement", y = "Length")
g
We can also see that, in general, tooth growth is greater with Orange Juice than with Ascorbic Acid at the smaller doses, while the results are very similar at the highest dose. The variances are also quite heterogeneous, and we do not have enough information to treat the data as paired, since the observations do not identify individual animals.
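The heterogeneity of the variances can be checked numerically. The sketch below, again working on the built-in `ToothGrowth` data, tabulates the sample variance of each group and the ratio between the two supplements at each dose; `vars` and `ratio` are names introduced here for illustration.

```r
# Sample variance of tooth length: rows are doses, columns are supplements
vars <- with(ToothGrowth, tapply(len, list(dose, supp), var))
vars

# Ratio of variances (VC over OJ) at each dose; values far from 1 indicate
# unequal variances, which motivates using var.equal = FALSE in t.test()
ratio <- vars[, "VC"] / vars[, "OJ"]
ratio
```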
We are going to carry out several hypothesis tests, one for each dose, comparing the two treatments and observing the differences between them. The null hypothesis is that the difference between the mean tooth lengths of the two treatments is zero. The alternative hypothesis is that the difference is not zero, that is, one of the treatments is more effective than the other at the given dose.
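As a compact alternative view of the same comparison, the sketch below runs one Welch two-sample t-test per dose in a loop. It assumes the built-in `ToothGrowth` data with its original `OJ`/`VC` labels and uses the formula interface of `t.test`, whose defaults are an unpaired test with unequal variances; `doses` and `results` are our own names.

```r
# One Welch t-test (OJ minus VC) per dose level
doses <- sort(unique(ToothGrowth$dose))
results <- lapply(doses, function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
})
names(results) <- doses

# p-value and 95% confidence interval of the difference for each dose
sapply(results, function(r) {
  c(p.value = r$p.value, lower = r$conf.int[1], upper = r$conf.int[2])
})
```

Each column of the resulting matrix corresponds to one dose, so the three tests can be compared side by side.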
1.- Dose 0.5. Orange Juice - Ascorbic Acid
# Split the 0.5 mg/day observations by supplement
data.aa.0.5 <- subset(data, dose == "0.5" & supp == "Ascorbic Acid")
data.oj.0.5 <- subset(data, dose == "0.5" & supp == "Orange Juice")
# Welch two-sample t-test (unpaired, unequal variances): Orange Juice minus Ascorbic Acid
data.0.5 <- t.test(data.oj.0.5$len, data.aa.0.5$len, paired = FALSE, var.equal = FALSE)
data.0.5$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
data.0.5$p.value
## [1] 0.006358607
Since the p-value is lower than 0.05 (5%), we reject the null hypothesis: the effects of the two treatments differ. Moreover, the 95% confidence interval for the difference (Orange Juice minus Ascorbic Acid) lies entirely above zero, so we can conclude that Orange Juice makes teeth grow more than Ascorbic Acid at a dose of 0.5 mg/day.
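To make explicit what `t.test` computes in this case, the following sketch reproduces the Welch statistic, its degrees of freedom (Welch-Satterthwaite approximation), the two-sided p-value and the 95% confidence interval by hand for the 0.5 mg/day dose. It works on the built-in `ToothGrowth` data; all variable names are introduced here for illustration.

```r
# Tooth lengths at 0.5 mg/day for each supplement
oj <- with(ToothGrowth, len[supp == "OJ" & dose == 0.5])
vc <- with(ToothGrowth, len[supp == "VC" & dose == 0.5])

# Squared standard error of each group mean
se2.oj <- var(oj) / length(oj)
se2.vc <- var(vc) / length(vc)
se.diff <- sqrt(se2.oj + se2.vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t.stat <- (mean(oj) - mean(vc)) / se.diff
df <- (se2.oj + se2.vc)^2 /
  (se2.oj^2 / (length(oj) - 1) + se2.vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p.value <- 2 * pt(-abs(t.stat), df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se.diff

c(t = t.stat, df = df, p.value = p.value)
ci
```

The values of `p.value` and `ci` should agree with the output of `t.test` shown above.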
2.- Dose 1. Orange Juice - Ascorbic Acid
# Split the 1 mg/day observations by supplement
data.aa.1 <- subset(data, dose == "1" & supp == "Ascorbic Acid")
data.oj.1 <- subset(data, dose == "1" & supp == "Orange Juice")
# Welch two-sample t-test (unpaired, unequal variances): Orange Juice minus Ascorbic Acid
data.1 <- t.test(data.oj.1$len, data.aa.1$len, paired = FALSE, var.equal = FALSE)
data.1$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
data.1$p.value
## [1] 0.001038376
Since the p-value is lower than 0.05 (5%), we reject the null hypothesis: the effects of the two treatments differ. Again, the 95% confidence interval for the difference lies entirely above zero, so we can conclude that Orange Juice makes teeth grow more than Ascorbic Acid at a dose of 1 mg/day.
3.- Dose 2. Orange Juice - Ascorbic Acid
# Split the 2 mg/day observations by supplement
data.aa.2 <- subset(data, dose == "2" & supp == "Ascorbic Acid")
data.oj.2 <- subset(data, dose == "2" & supp == "Orange Juice")
# Welch two-sample t-test (unpaired, unequal variances): Orange Juice minus Ascorbic Acid
data.2 <- t.test(data.oj.2$len, data.aa.2$len, paired = FALSE, var.equal = FALSE)
data.2$conf.int
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
data.2$p.value
## [1] 0.9638516
Here things change. Since the p-value is much higher than 0.05 (5%) and the confidence interval contains zero, we cannot reject the null hypothesis: at a dose of 2 mg/day we find no evidence of a difference between the two treatments. I only want to add that I suspect that at doses greater than 2 mg/day, the effect of Ascorbic Acid may be greater than that of Orange Juice, given that its variance at 2 mg/day is much higher, as can be seen in the graphs. This is only a conjecture, since the data contain no doses above 2 mg/day.
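Because three tests are run on the same dataset, one optional check, which is not part of the analysis above, is to adjust the p-values for multiple comparisons. The sketch below applies Holm's method via `p.adjust` to the three p-values reported in this report; the choice of Holm's method and the names `p.raw` and `p.holm` are assumptions made for illustration.

```r
# p-values reported above for doses 0.5, 1 and 2 mg/day
p.raw <- c("0.5" = 0.006358607, "1" = 0.001038376, "2" = 0.9638516)

# Holm adjustment for the three comparisons
p.holm <- p.adjust(p.raw, method = "holm")
p.holm

# Which doses still show a difference at the 5% level after adjustment
p.holm < 0.05
```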
The conclusions have been described throughout the report, but in summary they are:
- For low doses (0.5 and 1 mg/day), Orange Juice is more effective than Ascorbic Acid.
- For medium doses (2 mg/day), the two supplements behave similarly in terms of mean values.
- For high doses (> 2 mg/day), I “suspect” that Ascorbic Acid will be more effective than Orange Juice, judging by the existing difference between their variances.