The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded OJ) or ascorbic acid (coded VC).
library(ggplot2)
library(datasets)
data("ToothGrowth")
The first few rows of the dataset, followed by its structure:
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Basic analysis of the data
# convert dose to a factor so that each dose level gets its own box
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ggplot(aes(x = supp, y = len), data = ToothGrowth) + geom_boxplot(aes(fill = supp)) +
labs(title = "Length vs Supplement", x = "Supplement", y = "Length")
The boxplot shows a greater median for orange juice. Most of the values lie between 12 and 26. The plot suggests no strong relationship between length and supplement.
ggplot(aes(x = dose, y = len), data = ToothGrowth) + geom_boxplot(aes(fill = dose)) +
labs(title = "Length vs Dose", x = "Dose", y = "Length")
This graph shows a strong relationship between dose and length: length increases with dose.
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
Here is the mean length for each combination of supplement and dose:
aggregate(ToothGrowth$len, list(supp=ToothGrowth$supp, dose=ToothGrowth$dose),mean)
## supp dose x
## 1 OJ 0.5 13.23
## 2 VC 0.5 7.98
## 3 OJ 1 22.70
## 4 VC 1 16.77
## 5 OJ 2 26.06
## 6 VC 2 26.14
# equivalent call using the formula interface:
# aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
At doses 0.5 and 1, the mean is clearly greater with the orange juice supplement. At dose 2 the two means are almost identical (26.06 for OJ against 26.14 for VC). In addition, the mean response increases with dose for both supplements, as the dose boxplot already showed.
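The way the supplement gap changes with dose can also be seen in a single plot. The following is a minimal sketch, not part of the original analysis, that draws one box per supplement within each dose. It works on a local copy named `tg` (a name chosen here for illustration) so that the `ToothGrowth` object modified above is left untouched.

```r
library(ggplot2)

# Local copy of the built-in dataset; dose is converted to a factor
# so that ggplot2 treats the three dose levels as discrete groups.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# One box per supplement within each dose level.
p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  labs(title = "Length vs Dose by Supplement",
       x = "Dose", y = "Length", fill = "Supplement")
print(p)
```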
In this part we run some t-tests.
First, a two-sample t-test compares the length of odontoblasts under the orange juice and ascorbic acid conditions. The two groups are treated as independent samples, so the test is not paired.
tapply(ToothGrowth$len, ToothGrowth$supp, var)
## OJ VC
## 43.63344 68.32723
The two sample variances differ noticeably (43.6 against 68.3), so I assume that the variances are unequal.
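Comparing the two variances by eye is informal. One possible formal check, offered here only as a sketch and not something the original analysis ran, is the F test in `var.test()` from the stats package. This test is sensitive to departures from normality, so its result should be read with caution; the Welch test used in this report is a safe choice either way.

```r
# Local copy of the built-in dataset.
tg <- datasets::ToothGrowth

# F test of the null hypothesis that the ratio of the OJ variance
# to the VC variance is 1.
vt <- var.test(len ~ supp, data = tg)
print(vt)

# The estimated ratio is var(OJ) / var(VC).
print(unname(vt$estimate))
```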
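# Welch two-sample test; unpaired is the default, and recent R versions
# reject the paired argument in the formula interface, so it is omitted
t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)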
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
The t-statistic is 1.9 with 55.3 degrees of freedom and the p-value is 0.061, which is above 0.05. The 95% confidence interval also contains 0. Thus we fail to reject the null hypothesis that the two group means are equal: at the 5% level, there is no significant evidence of an effect of the supplement on the response when all doses are pooled.
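To make the numbers in this output less of a black box, the Welch statistic and its degrees of freedom can be recomputed by hand. The sketch below uses the standard formulas, $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ and the Welch-Satterthwaite approximation for the degrees of freedom. Variable names such as `se2_oj` are chosen here for illustration.

```r
# Local copy of the built-in dataset, split by supplement.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each group mean: s^2 / n.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t-statistic.
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Should agree with the t.test output: t = 1.9153, df = 55.309, p = 0.06063.
print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
```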
tapply(ToothGrowth$len, ToothGrowth$dose, var)
## 0.5 1 2
## 20.24787 19.49608 14.24421
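Here the variances are fairly close across the three doses (20.2, 19.5 and 14.2). I nevertheless keep the unequal-variance (Welch) test for consistency with the previous comparison; it remains valid when the variances happen to be equal.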
library(dplyr)
dose0.5 <- filter(ToothGrowth, dose == 0.5)
dose1 <- filter(ToothGrowth, dose == 1)
dose2 <- filter(ToothGrowth, dose == 2)
t.test(dose1$len,dose0.5$len,var.equal = FALSE,paired = FALSE)
##
## Welch Two Sample t-test
##
## data: dose1$len and dose0.5$len
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.276219 11.983781
## sample estimates:
## mean of x mean of y
## 19.735 10.605
The t-statistic is 6.5 with 38 degrees of freedom and the p-value is far below 0.05. The 95% confidence interval is also strictly above 0. Thus we reject the null hypothesis that the two means are equal. The response increases between the doses 0.5 mg and 1 mg.
t.test(dose2$len,dose0.5$len,var.equal = FALSE,paired = FALSE)
##
## Welch Two Sample t-test
##
## data: dose2$len and dose0.5$len
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.83383 18.15617
## sample estimates:
## mean of x mean of y
## 26.100 10.605
The t-statistic is 11.8 with 36.9 degrees of freedom and the p-value is far below 0.05. The 95% confidence interval is also strictly above 0. Thus we reject the null hypothesis that the two means are equal. The response increases between the doses 0.5 mg and 2 mg.
t.test(dose2$len,dose1$len,var.equal = FALSE,paired = FALSE)
##
## Welch Two Sample t-test
##
## data: dose2$len and dose1$len
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.733519 8.996481
## sample estimates:
## mean of x mean of y
## 26.100 19.735
The t-statistic is 4.9 with 37.1 degrees of freedom and the p-value is far below 0.05. The 95% confidence interval is also strictly above 0. Thus we reject the null hypothesis that the two means are equal. The response increases between the doses 1 mg and 2 mg.
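Running three tests on the same data raises the chance of at least one false positive. The three p-values reported above are small enough that multiplying each by 3 (the Bonferroni correction) still leaves them far below 0.05. As a sketch of how this could be done in one call, and not something the original analysis did, `pairwise.t.test()` from the stats package runs all pairwise comparisons and adjusts the p-values. Setting `pool.sd = FALSE` keeps the separate-variance (Welch) tests used in this report.

```r
# Local copy of the built-in dataset with dose as a factor.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# All pairwise Welch t-tests between dose levels,
# with Bonferroni-adjusted p-values.
adj <- pairwise.t.test(tg$len, tg$dose,
                       p.adjust.method = "bonferroni",
                       pool.sd = FALSE)
print(adj)
```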
With all doses pooled, there is no statistically significant effect of the supplement on the response (length) at the 5% level: the t-test failed to reject the hypothesis that the means of the OJ and VC groups are equal. This is an absence of evidence for a difference, not proof that the supplements are equivalent.
There is an effect of the dose level on the response (length). Each of the three t-tests rejected the null hypothesis that the two means being compared are equal, and the response increases from 0.5 mg to 1 mg and from 1 mg to 2 mg.
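The table of group means suggested that the gap between supplements depends on the dose, and pooling all doses can hide this. A possible follow-up, sketched here as an assumption about how one might proceed rather than as part of the original analysis, is to repeat the Welch test of supplement within each dose level. The helper names `by_dose` and `res` are chosen for illustration, and testing three subgroups calls for the same multiple-comparison caution as before.

```r
# Local copy of the built-in dataset.
tg <- datasets::ToothGrowth

# Run a Welch t-test of len ~ supp separately within each dose level.
by_dose <- lapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)
})

# Collect p-values and 95% confidence intervals into one table.
res <- data.frame(
  dose     = names(by_dose),
  p_value  = sapply(by_dose, function(tt) tt$p.value),
  ci_lower = sapply(by_dose, function(tt) tt$conf.int[1]),
  ci_upper = sapply(by_dose, function(tt) tt$conf.int[2])
)
print(res)
```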
Assumptions
I made these assumptions:
1. The data in each group come from a normal distribution.
2. The variances of the groups are unequal.
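The normality assumption was not checked in the analysis above. One way it could be examined, shown here only as a sketch, is the Shapiro-Wilk test from the stats package applied to each group used in the t-tests. With 20 to 30 observations per group the test has limited power, so a large p-value does not confirm normality; a normal Q-Q plot (`qqnorm()` with `qqline()`) is a useful complement.

```r
# Local copy of the built-in dataset.
tg <- datasets::ToothGrowth

# Shapiro-Wilk p-value for each dose group and each supplement group.
sw_dose <- tapply(tg$len, tg$dose, function(x) shapiro.test(x)$p.value)
sw_supp <- tapply(tg$len, tg$supp, function(x) shapiro.test(x)$p.value)

print(round(sw_dose, 3))
print(round(sw_supp, 3))
```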