Introduction

In this project we explore and analyze the ToothGrowth data from the R datasets package. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Load Data

Let's load the data and view the first 3 rows.

data("ToothGrowth")

head(ToothGrowth, 3)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5

From the data preview there are three variables:
1. len - Numeric tooth length
2. supp - Supplement/delivery method (VC or OJ)
3. dose - Numeric dose in milligrams/day
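
The variable types listed above can be checked directly. The following is a minimal sketch in base R; it assumes the built-in copy of ToothGrowth, in which supp is stored as a factor and dose as a plain numeric column. The names supp_levels and dose_levels are introduced here only for illustration.

```r
data("ToothGrowth")

# Compact overview of the column types
str(ToothGrowth)

# supp is a factor with two levels; dose is numeric with three distinct values
supp_levels <- levels(ToothGrowth$supp)
dose_levels <- sort(unique(ToothGrowth$dose))

supp_levels
dose_levels
```

Because dose is numeric rather than a factor, functions such as summary() treat it as a continuous variable even though it takes only three values.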

Basic Exploratory Data Analysis and Summary

dim(ToothGrowth) # Output the number of rows and columns, respectively
## [1] 60  3
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

From the summary we can derive the following:

len

The shortest tooth length is 4.20 and the longest is 33.90, with a mean of 18.81.

supp

There are two supplements, OJ and VC, each with 30 observations.

dose

The minimum dose is 0.5 mg/day and the maximum is 2 mg/day.
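
The summary gives counts per supplement but not per supplement and dose together. A small cross-tabulation, sketched here with base R's table(), shows how the 60 animals are spread over the six supplement/dose combinations. The name design is introduced only for this example.

```r
data("ToothGrowth")

# Number of animals for each combination of supplement and dose
design <- table(ToothGrowth$supp, ToothGrowth$dose, dnn = c("supp", "dose"))
design
```

Each of the six cells should contain 10 animals, which means the design is balanced.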

Histogram

Next, following the numerical summary, let's plot some histograms to view the general distribution of tooth length.

par(mfrow = c(1, 3))

hist(ToothGrowth$len, main = "Distribution For Both OJ and VC",
     col = "red", xlab = "Tooth Length")

hist(subset(ToothGrowth, supp == "VC")$len, main = "Distribution For VC",
     col = "blue", xlab = "Tooth Length")

hist(subset(ToothGrowth, supp == "OJ")$len, main = "Distribution For OJ",
     col = "green", xlab = "Tooth Length")

From the plots we can see that:
Across both supplements, tooth length is concentrated around the mean of 18.81.
In the histograms split by supplement, tooth lengths for the OJ supplement appear slightly higher than those for the VC supplement.
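
Reading group differences off histograms is imprecise, so it can help to compute the group centers directly. The following is a minimal sketch using tapply(); the names mean_by_supp and median_by_supp are illustrative and not part of the original analysis.

```r
data("ToothGrowth")

# Mean and median tooth length for each supplement
mean_by_supp   <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
median_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, median)

round(rbind(mean = mean_by_supp, median = median_by_supp), 2)
```

The difference between the two means (OJ minus VC) is the same quantity that the t-test later in this report estimates.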

Simple boxplot by supplement method and Dose Amount

Next, let's plot some boxplots to see if there is any pattern we can derive.

par(mfrow = c(1, 2))

boxplot(len ~ dose, data = ToothGrowth, main = "Boxplot by Dose",
        col = "orange", ylab = "Tooth Length", xlab = "Dose Amount")

boxplot(len ~ supp, data = ToothGrowth, main = "Boxplot by Supplement",
        col = "red", ylab = "Tooth Length", xlab = "Supplement Method")

From the boxplots we can see a clear difference in tooth length across the doses a guinea pig receives per day: as the dose increases, tooth length also increases. For the supplements, the tooth length distributions overlap, with OJ slightly higher than VC. Because of this overlap, the boxplot alone cannot tell us whether the difference between supplements is real, which takes us to the next step: hypothesis testing.
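
The trend seen in the boxplots can also be put into numbers. The sketch below computes the mean tooth length per dose, and per supplement within each dose, using tapply() and aggregate(). The names mean_by_dose and mean_by_cell are assumptions made for this example.

```r
data("ToothGrowth")

# Mean tooth length at each dose level
mean_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
round(mean_by_dose, 2)

# Mean tooth length for each supplement within each dose level
mean_by_cell <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
mean_by_cell
```

Looking at the supplements within each dose is useful because the gap between OJ and VC need not be the same at every dose level.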

Hypothesis Testing

Here we start by setting the null (H_0) and alternative (H_A) hypotheses:
H_0 - Mean tooth length for OJ = Mean tooth length for VC
H_A - Mean tooth length for OJ != Mean tooth length for VC
Next, our exploratory analysis suggests that dose has a large effect on tooth growth, so the comparison of supplements should account for dose. Here this is done with a paired test: the VC and OJ observations are stored in the same dose order, so each OJ value is paired with a VC value at the same dose. Note that the 60 animals are distinct, so this pairing matches on dose only, not on the individual animal.
There are many ways to run this test; here I create two groups of data, VC and OJ, as shown below.

VC <- subset(ToothGrowth, supp == "VC")$len # VC tooth length

OJ <- subset(ToothGrowth, supp == "OJ")$len # OJ tooth length

t.test(OJ, VC, paired = TRUE) # hypothesis test
## 
##  Paired t-test
## 
## data:  OJ and VC
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of the differences 
##                     3.7

From the test we obtain a p-value of 0.00255, which is smaller than the 0.05 significance level, so we reject H_0.
This means that the difference in tooth length between the two supplements is statistically significant under this paired test: the estimated mean difference (OJ minus VC) is 3.7, with a 95 percent confidence interval of 1.41 to 5.99.
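
Because the 60 guinea pigs are distinct animals, an unpaired comparison is a reasonable cross-check on the paired result. The sketch below is not part of the original analysis: it reruns the paired test, runs an unpaired Welch t-test on the same two groups, and then runs a Welch test within each dose level so that dose is held fixed. The names paired_res, welch_res and p_by_dose are illustrative, and the unpaired results may lead to a different conclusion than the paired test.

```r
data("ToothGrowth")

OJ <- subset(ToothGrowth, supp == "OJ")$len # OJ tooth length
VC <- subset(ToothGrowth, supp == "VC")$len # VC tooth length

# Paired test as in the text (pairs are formed by row order)
paired_res <- t.test(OJ, VC, paired = TRUE)

# Unpaired Welch test: treats the two groups of animals as independent
welch_res <- t.test(OJ, VC, paired = FALSE, var.equal = FALSE)

# Welch test of OJ vs VC within each dose level
doses <- sort(unique(ToothGrowth$dose))
p_by_dose <- sapply(doses, function(d) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d))$p.value
})
names(p_by_dose) <- doses

# Compare the p-values of the two overall tests, then the per-dose tests
c(paired = paired_res$p.value, welch = welch_res$p.value)
p_by_dose
```

If the paired and unpaired tests disagree, the per-dose p-values help show at which dose levels the supplements actually differ.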

Conclusion

From the analysis we can conclude that the higher the dose, the larger the tooth growth; this can be seen from the boxplot by dose.
From the hypothesis test (p-value 0.00255, below the 0.05 significance level) we can conclude that the two supplements do not give the same tooth growth: OJ (orange juice) is associated with higher growth than VC (ascorbic acid), which agrees with the slightly higher OJ values in the boxplot. This conclusion rests on the paired test used above.