In this project we explore and analyze the ToothGrowth data set from the R datasets package. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
Let's load the data and view its first 3 rows.
data("ToothGrowth")
head(ToothGrowth, 3)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
From the data preview there are three variables:
1. len - Numeric tooth length
2. supp - Supplement/delivery method (VC or OJ)
3. dose - Numeric dose in milligrams/day
dim(ToothGrowth) # Number of rows and columns, respectively
## [1] 60 3
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
From the summary we can derive some information:
The shortest tooth length is 4.20 and the longest is 33.90, while the mean is 18.81.
There are two supplements, OJ and VC, each with 30 observations.
The minimum dose is 0.5 mg/day while the maximum is 2 mg/day.
Next, following the numerical summary, let's plot some histograms to view the general distribution of tooth length.
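The summary shows 30 observations per supplement, but not how they are spread over the three doses. A minimal sketch of how that could be checked, using only base R (the name `cell_counts` is introduced here for illustration and is not part of the original analysis):

```r
data("ToothGrowth")

# Column types and factor levels of the data frame
str(ToothGrowth)

# Number of observations in each supplement-by-dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

Equal counts in every cell would indicate a balanced design, which matters later when the two supplement groups are compared.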
par(mfrow = c(1, 3))
hist(ToothGrowth$len, main = "Distribution For Both OJ and VC",
col = "red", xlab = "Tooth Length")
hist(subset(ToothGrowth, supp == "VC")$len, main = "Distribution For VC",
col = "blue", xlab = "Tooth Length")
hist(subset(ToothGrowth, supp == "OJ")$len, main = "Distribution For OJ",
col = "green", xlab = "Tooth Length")
From the plots we can see that:
Tooth length across both supplements is concentrated around the mean of 18.81.
Further, the histograms of the data split by supplement suggest that tooth lengths for the OJ supplement are slightly higher than those for the VC supplement.
Next, let's draw some boxplots to see if there is any pattern we can derive.
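A visual reading of histograms can be checked against simple group statistics. The sketch below computes the mean and median tooth length per supplement with `tapply`. The names `supp_means` and `supp_medians` are illustrative only.

```r
data("ToothGrowth")

# Mean and median tooth length for each supplement (OJ, VC)
supp_means   <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
supp_medians <- tapply(ToothGrowth$len, ToothGrowth$supp, median)

print(round(supp_means, 2))
print(round(supp_medians, 2))
```

Whichever supplement has the larger mean and median is the one whose histogram sits further to the right.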
par(mfrow = c(1, 2))
boxplot(len ~ dose, data = ToothGrowth, main = "Boxplot by Dose",
col = "orange", ylab = "Tooth Length", xlab = "Dose Amount")
boxplot(len ~ supp, data = ToothGrowth, main = "Boxplot by Supplement",
col = "red", ylab = "Tooth Length", xlab = "Supplement Method")
From the boxplots we can see a clear difference in tooth length across the dose a guinea pig takes per day: as the dose increases, the tooth length also increases. For the supplements, the tooth lengths overlap, with OJ slightly higher than VC, which is in line with what we saw in the histograms above. Because this overlap makes the difference hard to judge by eye, we take the next step of hypothesis testing.
Here we start by setting the null (H_0) and alternative (H_A) hypotheses:
H_0 - Mean tooth length for OJ = Mean tooth length for VC
H_A - Mean tooth length for OJ ≠ Mean tooth length for VC
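The pattern read off the boxplots can also be put into numbers. The following sketch computes the mean tooth length per dose and per supplement-dose combination. `dose_means` and `cell_means` are illustrative names, not part of the original analysis.

```r
data("ToothGrowth")

# Mean tooth length at each dose level (0.5, 1, 2 mg/day)
dose_means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
print(round(dose_means, 2))

# Mean tooth length for every supplement-by-dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

If the means rise from one dose level to the next, that supports the increasing trend seen in the dose boxplot.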
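Next, our exploratory analysis showed that different doses lead to clearly different tooth growth, so the comparison between supplements should account for dose. In ToothGrowth the rows for each supplement are ordered by the same dose levels, so a paired test matches each OJ observation with a VC observation at the same dose. Note that this pairing is by dose only; the measurements come from different animals.
There are many ways to run this test; here I create two groups of data, VC and OJ, as shown below.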
VC <- subset(ToothGrowth, supp == "VC")$len # VC tooth length
OJ <- subset(ToothGrowth, supp == "OJ")$len # OJ tooth length
t.test(OJ, VC, paired = TRUE) # hypothesis test
##
## Paired t-test
##
## data: OJ and VC
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.408659 5.991341
## sample estimates:
## mean of the differences
## 3.7
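The statistic in the output above can be reproduced by hand, which makes the decision rule explicit. For paired data the statistic is the mean of the differences divided by its standard error, t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. The sketch below is an illustration under that formula. The names `doses_aligned`, `t_stat`, `deg_free`, `p_value`, `alpha` and `reject_h0` are introduced here and do not appear in the original analysis.

```r
data("ToothGrowth")

vc <- subset(ToothGrowth, supp == "VC")
oj <- subset(ToothGrowth, supp == "OJ")

# Pairing by row position is only meaningful if the doses line up row by row
doses_aligned <- all(vc$dose == oj$dose)

# Paired differences (OJ minus VC) and the t statistic built from them
d        <- oj$len - vc$len
n        <- length(d)
t_stat   <- mean(d) / (sd(d) / sqrt(n))
deg_free <- n - 1

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), deg_free)

# Decision rule: reject H_0 when the p-value is below the significance level
alpha     <- 0.05
reject_h0 <- p_value < alpha

print(c(t = t_stat, df = deg_free, p = p_value))
print(reject_h0)
```

A p-value below `alpha` leads to rejecting H_0; a p-value at or above it means H_0 is not rejected.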
From the test we obtain a p-value of 0.00255, which is smaller than the 0.05 significance level, so we reject H_0.
This means that the difference in tooth length between the two supplements is statistically significant: the mean paired difference (OJ minus VC) is 3.7, with a 95 percent confidence interval of 1.41 to 5.99 that excludes 0.
From the analysis we can conclude that the larger the dose, the larger the tooth growth; this can be seen from the boxplot by dose.
From the hypothesis test we can conclude that the two supplements do not produce the same tooth growth: OJ (orange juice) is associated with higher growth than VC (ascorbic acid, a form of vitamin C), which agrees with the slightly higher OJ values in the boxplot.