The analysis below examines tooth growth in 60 guinea pigs split into two supplement groups (n = 30 each), one given vitamin C as ascorbic acid and the other given orange juice. Within each supplement group, ten animals received each dosage of 0.5, 1, or 2 mg/day. This investigation provides a basic summary of the data and then tests for differences between the supplement groups.
library(datasets)
data("ToothGrowth")
growthdata <- ToothGrowth
The code below renames the columns of the dataset so they are not abbreviated. In the Supplement variable, OJ and VC are replaced with the clearer terms Orange Juice and Ascorbic Acid.
# data processing and cleaning
# Making variable names and observations more clear
names(growthdata) <- c("Length", "Supplement", "Dose")
growthdata$Supplement <- gsub("OJ", "Orange Juice", growthdata$Supplement)
growthdata$Supplement <- gsub("VC", "Ascorbic Acid", growthdata$Supplement)
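An alternative to the two gsub() calls, sketched here on the assumption that keeping Supplement as a factor is useful (gsub() returns a character vector), is to relabel the factor levels directly. The name growthdata_alt is hypothetical and is used only so that this sketch does not overwrite the data frame used in the rest of the analysis.

```r
library(datasets)

# Work on a separate copy (hypothetical name) so the analysis data is untouched
growthdata_alt <- ToothGrowth
names(growthdata_alt) <- c("Length", "Supplement", "Dose")

# Relabel the existing factor levels; the column remains a factor
growthdata_alt$Supplement <- factor(growthdata_alt$Supplement,
                                    levels = c("OJ", "VC"),
                                    labels = c("Orange Juice", "Ascorbic Acid"))

# Counts per supplement and dose
table(growthdata_alt$Supplement, growthdata_alt$Dose)
```

With this approach the level order follows the levels argument, so Orange Juice would be listed before Ascorbic Acid in tables and plot legends, unlike the alphabetical order seen in the output of this report.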
The table below shows the structure of the data. There are 60 guinea pigs in total, with 10 in each combination of supplement and dose. The animals in one supplement group are not paired with animals in the other.
# exploratory data analysis
# showing size of groups and observations
table(growthdata[, c('Supplement', 'Dose')])
##                Dose
## Supplement      0.5  1  2
##   Ascorbic Acid  10 10 10
##   Orange Juice   10 10 10
# finding mean for each
dose_sup_means <- aggregate(Length ~ Dose + Supplement, data = growthdata , mean)
dose_sup_means
##   Dose    Supplement Length
## 1  0.5 Ascorbic Acid   7.98
## 2  1.0 Ascorbic Acid  16.77
## 3  2.0 Ascorbic Acid  26.14
## 4  0.5  Orange Juice  13.23
## 5  1.0  Orange Juice  22.70
## 6  2.0  Orange Juice  26.06
# getting subset of orange juice and ascorbic acid
OJ_Data <- subset(growthdata, Supplement == "Orange Juice")
Ascorbic_Data <- subset(growthdata, Supplement == "Ascorbic Acid")
# finding the summary statistics of tooth length for each supplement group
summary(OJ_Data$Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    8.20   15.52   22.70   20.66   25.72   30.90
summary(Ascorbic_Data$Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.20   11.20   16.50   16.96   23.10   33.90
# plotting graphs of means
library(ggplot2)
# stat = "identity" uses the supplied means as bar heights instead of counting rows
# position = "dodge" places the bars side by side instead of stacking them
g <- ggplot(data = dose_sup_means, aes(x = Dose, y = Length, fill = Supplement))
g <- g + geom_bar(stat = "identity", position = "dodge")
g + ggtitle("Average Guinea Pig Growth") + xlab("Dose (mg/day)")
The graph and summary statistics above show that, in general, tooth length increases with dosage for both Ascorbic Acid and Orange Juice. The mean for Orange Juice is higher for the dataset as a whole and for each dose subgroup except 2.0 mg/day, where the two means are nearly equal (26.06 vs. 26.14).
The plots below check the normality of each supplement group as a whole and of each supplement-by-dose subgroup.
# making a histogram of each supplement group
par(mfrow = c(1,2))
hist(OJ_Data$Length, main = "Orange Juice Group", xlab = "Length")
hist(Ascorbic_Data$Length, main = "Ascorbic Acid (VC) Group", xlab = "Length")
Tooth lengths in the Ascorbic Acid group appear closer to a normal distribution than those in the Orange Juice group.
# checking normality
# making box plots of each sample of 10 guinea pigs in the corresponding dose and supplement group
library(ggplot2)
b <- ggplot(data = growthdata, aes(x = Dose,y = Length, group = Dose))
b + geom_boxplot() + facet_grid(.~ Supplement) + ggtitle("Guinea Pig Tooth Growth by Supplement") +
xlab("Dose (mg/day)")
# making histograms of each sample of 10 guinea pigs in the corresponding dose and supplement group
h <- ggplot(data = growthdata, aes(x = Length))
h + geom_histogram(binwidth = 2, color = "black", fill = "blue") + facet_grid(Supplement ~ Dose) +
ggtitle("Guinea Pig Tooth Growth by Dose & Supplement")
The box-and-whisker plots above show that, in general, the Orange Juice samples have distributions with larger values, except at the highest dose of 2 mg/day. The histograms show that all the dose-by-supplement subgroups appear to be approximately normally distributed. The groups with a dosage of 0.5 mg/day could be slightly skewed left. The 10 guinea pigs in the Orange Juice 1 mg/day group appear to be slightly skewed right as well.
The hypothesis tests below are two-sample t tests. Since our exploration of the data suggested that guinea pigs that consumed orange juice had longer teeth, we chose "orange juice mean greater than ascorbic acid mean" as the alternative hypothesis. The one exception is the 2.0 mg/day samples, where our test checks whether there is any difference in the means. We were able to conduct the t tests because our samples are approximately normally distributed and our groups of guinea pigs are independent. We do not know whether the populations have equal variances, so we assumed they do not (Welch's t test, the default in t.test).
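As a numerical complement to the visual checks above, the following is a minimal sketch that runs a Shapiro-Wilk test on each of the six subgroups. It is an addition for illustration, not part of the original analysis, and it works on the original ToothGrowth columns (len, supp, dose) so that it runs on its own. The names sw_pvalues and shapiro_p are hypothetical.

```r
library(datasets)

# Shapiro-Wilk p-value for each of the six supplement-by-dose subgroups
# (uses the original column names len, supp, dose from ToothGrowth)
sw_pvalues <- aggregate(len ~ dose + supp, data = ToothGrowth,
                        FUN = function(x) shapiro.test(x)$p.value)

# aggregate() stores the result under the response name; rename for clarity
names(sw_pvalues)[3] <- "shapiro_p"
sw_pvalues
```

A small p-value would suggest a departure from normality in that subgroup. With only 10 animals per subgroup the test has limited power, so it is best read alongside the plots rather than in place of them.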
t.test(OJ_Data$Length, Ascorbic_Data$Length, alternative = "greater", paired = FALSE)
##
## Welch Two Sample t-test
##
## data: OJ_Data$Length and Ascorbic_Data$Length
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y
##  20.66333  16.96333
The lower bound of the one-sided 95% confidence interval for the difference in means (0.47) is above zero and the p-value (0.030) is less than .05. Therefore we can reject the null hypothesis and conclude that average tooth length is greater for guinea pigs given orange juice than for those given ascorbic acid.
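To make the quantities in the output above concrete, here is a minimal sketch that computes the Welch statistic, the Welch-Satterthwaite degrees of freedom, the one-sided p-value, and the lower confidence bound by hand. It uses the original ToothGrowth columns so that it runs on its own. The variable names (oj, vc, v_oj, v_vc, t_stat, df_welch, p_value, ci_lower) are hypothetical.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group contribution to the squared standard error: s^2 / n
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative "orange juice mean is greater"
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval
ci_lower <- (mean(oj) - mean(vc)) - qt(0.95, df_welch) * sqrt(v_oj + v_vc)

c(t = t_stat, df = df_welch, p = p_value, ci_lower = ci_lower)
```

If the sketch is correct, these values should agree with the t, df, p-value, and lower confidence bound that t.test reports above.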
t.test(subset(OJ_Data$Length, OJ_Data$Dose == 0.5), subset(Ascorbic_Data$Length, Ascorbic_Data$Dose == 0.5),
alternative = "greater", paired = FALSE)
##
## Welch Two Sample t-test
##
## data: subset(OJ_Data$Length, OJ_Data$Dose == 0.5) and subset(Ascorbic_Data$Length, Ascorbic_Data$Dose == 0.5)
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.34604     Inf
## sample estimates:
## mean of x mean of y
##     13.23      7.98
t.test(subset(OJ_Data$Length, OJ_Data$Dose == 1), subset(Ascorbic_Data$Length, Ascorbic_Data$Dose == 1),
alternative = "greater", paired = FALSE)
##
## Welch Two Sample t-test
##
## data: subset(OJ_Data$Length, OJ_Data$Dose == 1) and subset(Ascorbic_Data$Length, Ascorbic_Data$Dose == 1)
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  3.356158      Inf
## sample estimates:
## mean of x mean of y
##     22.70     16.77
t.test(subset(OJ_Data$Length, OJ_Data$Dose == 2), subset(Ascorbic_Data$Length, Ascorbic_Data$Dose == 2),
alternative = "two.sided", paired = FALSE)
##
## Welch Two Sample t-test
##
## data: subset(OJ_Data$Length, OJ_Data$Dose == 2) and subset(Ascorbic_Data$Length, Ascorbic_Data$Dose == 2)
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y
##     26.06     26.14
For both the 0.5 and 1 mg/day groups, the lower bound of the one-sided confidence interval for the difference in means is above zero and the p-value is less than .05. Therefore we can reject the null hypothesis and conclude that average tooth length is greater for guinea pigs given orange juice than for those given ascorbic acid at these doses.
Since the means were much closer in the 2.0 mg/day groups, we used a two-sided alternative hypothesis that the means differ. The t test gives a confidence interval that includes 0 (-3.80 to 3.64) and a p-value (0.96) well above .05. Therefore we fail to reject the null hypothesis and cannot conclude that tooth length differs between the supplements when the guinea pigs receive 2 mg/day.
The results of our analysis show that guinea pigs receiving orange juice have, on average, longer teeth than those receiving ascorbic acid (vitamin C). The exception to this rule is the highest dosage (2 mg/day), where we found no significant difference between the groups.
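The three per-dose tests can also be collected into one table. The following is a minimal sketch using the formula interface of t.test on the original ToothGrowth columns. The helper test_at_dose and the names doses, alternatives, results, and per_dose are hypothetical. Because the supp factor has levels OJ then VC, the estimated difference is mean(OJ) minus mean(VC), which matches the direction of the tests above.

```r
library(datasets)

# Welch test at a single dose; supp levels are OJ then VC,
# so the tested difference is mean(OJ) - mean(VC)
test_at_dose <- function(d, alternative) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d),
         alternative = alternative)
}

doses <- c(0.5, 1, 2)
# Same alternatives as in the tests above: one-sided except at 2 mg/day
alternatives <- c("greater", "greater", "two.sided")

results <- Map(test_at_dose, doses, alternatives)

# One row per dose with the test statistic, degrees of freedom, and p-value
per_dose <- data.frame(
  Dose = doses,
  Alternative = alternatives,
  t = sapply(results, function(r) unname(r$statistic)),
  df = sapply(results, function(r) unname(r$parameter)),
  p_value = sapply(results, function(r) r$p.value)
)
per_dose
```

The rows of per_dose should correspond to the three outputs shown above, which makes it easier to compare the doses side by side.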