The main aim of this project is to perform an exploratory data analysis of the ToothGrowth data in the R datasets package and to use hypothesis tests to gain a deeper insight into the data.

Loading the dataset

library(ggplot2)   # plotting
library(datasets)  # provides the ToothGrowth data frame
summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

The summary shows that the data set has 60 observations of 3 variables: the tooth length (len), the method used to deliver the supplement (supp: OJ = orange juice, VC = ascorbic acid) and the dose level (dose: 0.5, 1 or 2 mg/day).
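The summary suggests that the experiment is a balanced 2 x 3 design. A minimal base R sketch to confirm this cross-tabulates supplement against dose with `table()`. It assumes only the built-in ToothGrowth data frame. Each of the six cells should contain 10 guinea pigs.

```r
# ToothGrowth ships with R's datasets package
data(ToothGrowth, package = "datasets")

# Count observations for every supplement x dose combination
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Because the design is balanced, averaging the six cell means across rows or columns gives the same result as averaging the raw data by supplement or by dose.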

Exploratory Data Analysis

# Treat dose as a factor so that each dosage level gets its own fill colour
Dosage_Levels <- factor(ToothGrowth$dose)
g <- ggplot(data = ToothGrowth, aes(dose, len)) +
     geom_boxplot(aes(group = dose, fill = Dosage_Levels)) +
     ggtitle("Tooth Length for Different Dosage Levels") +
     xlab("Dosage Level (mg/day)") +
     ylab("Tooth Length")


The plot shows that the median tooth length differs between the dosage levels and that tooth length increases as the dosage level increases.
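Box plots display medians, so the statement about means is worth checking numerically. A short sketch, assuming only the built-in ToothGrowth data, computes both the mean and the median tooth length per dose with `tapply()` and tests whether each increases from one dose level to the next.

```r
data(ToothGrowth, package = "datasets")

# Mean and median tooth length at each dose level (0.5, 1 and 2 mg/day)
dose_means   <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
dose_medians <- tapply(ToothGrowth$len, ToothGrowth$dose, median)
print(rbind(mean = dose_means, median = dose_medians))

# TRUE if the summary increases strictly from one dose level to the next
all(diff(dose_means) > 0)
all(diff(dose_medians) > 0)
```

The means printed here are the same group means that appear in the t-test output in the Hypothesis Testing section.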

# Relabel the supplement codes: OJ = orange juice, VC = ascorbic acid
Supplement_Type <- factor(ToothGrowth$supp, labels = c("Orange Juice", "Ascorbic Acid"))
g <- ggplot(data = ToothGrowth, aes(Supplement_Type, len)) +
     geom_boxplot(aes(fill = Supplement_Type)) +
     ggtitle("Tooth Length for Different Supplement Types") +
     xlab("Supplement Type") +
     ylab("Tooth Length")

g

The plot suggests that tooth length is generally greater when the vitamin C is delivered as orange juice rather than as ascorbic acid (a form of vitamin C, coded VC), although the ranges of the two groups overlap.
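To put a number on the difference seen in the plot, a minimal sketch (assuming only the built-in ToothGrowth data) summarises tooth length by delivery method and computes the difference in sample means, orange juice minus ascorbic acid.

```r
data(ToothGrowth, package = "datasets")

# Mean and median tooth length for each delivery method (OJ = orange juice, VC = ascorbic acid)
supp_means   <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
supp_medians <- tapply(ToothGrowth$len, ToothGrowth$supp, median)
print(rbind(mean = supp_means, median = supp_medians))

# Difference in sample means, OJ minus VC
diff_oj_vc <- as.numeric(supp_means["OJ"] - supp_means["VC"])
diff_oj_vc
```

Whether a gap of this size is larger than chance variation is what the t-test on supplement type addresses.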

# Store factor versions of dose and supp as new columns; the t-tests below group by them
ToothGrowth$Dosage_Levels <- factor(ToothGrowth$dose)
ToothGrowth$Supplement_Type <- factor(ToothGrowth$supp, labels = c("Orange Juice", "Ascorbic Acid"))
g <- ggplot(data = ToothGrowth, aes(Supplement_Type, len)) +
     geom_boxplot(aes(fill = Supplement_Type)) +
     facet_grid(~ dose) +   # one panel per dosage level
     ggtitle("Tooth Length for Different Supplement Types across Dosage Levels") +
     xlab("Supplement Type (panels: Dosage Level in mg/day)") +
     ylab("Tooth Length")

g
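As a numeric companion to the faceted plot, the mean tooth length for each supplement and dose combination can be tabulated with `tapply()` over both factors. This is a sketch assuming only the built-in ToothGrowth data; note that the boxes in the plot show medians, whereas this table shows means.

```r
data(ToothGrowth, package = "datasets")

# Mean tooth length for every supplement x dose cell (rows: supp, columns: dose)
cell_means <- with(ToothGrowth, tapply(len, list(supp = supp, dose = dose), mean))
print(round(cell_means, 2))

# Difference OJ minus VC within each dose level
round(cell_means["OJ", ] - cell_means["VC", ], 2)
```

Because every cell holds 10 observations, the row means of this table equal the overall supplement means and the column means equal the overall dose means.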

Hypothesis Testing

I have added two new columns to the data set, Dosage_Levels and Supplement_Type, by converting dose and supp to factors; the tests below group the tooth lengths by these columns.

Performing t tests for Dosage Levels

Comparing Dosage Levels 0.5 and 1

data1 <- subset(ToothGrowth, Dosage_Levels %in% c(0.5,1))
t.test(len ~ Dosage_Levels, paired = FALSE, var.equal = FALSE, data = data1)

## 
##  Welch Two Sample t-test
## 
## data:  len by Dosage_Levels
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

The sample mean tooth length is 10.605 at dosage level 0.5 and 19.735 at dosage level 1. The p-value (1.268e-07) is far below 0.05 and the 95% confidence interval for the difference in means (-11.98 to -6.28) does not contain 0, so we reject the null hypothesis that the two means are equal: dosage level 1 gives a longer mean tooth length than dosage level 0.5.
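The Welch test used here does not assume equal variances: it divides the difference in means by the combined standard error and uses the Welch-Satterthwaite approximation for the degrees of freedom. A hand-computed sketch for dose 0.5 versus dose 1, assuming only base R and the built-in ToothGrowth data, reproduces the statistic, degrees of freedom, p-value and confidence interval reported by `t.test()`.

```r
data(ToothGrowth, package = "datasets")

# Tooth lengths for the two dose groups being compared
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1]

# Squared standard error of each group mean (variances estimated separately)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for mean(x) - mean(y)
p_welch  <- 2 * pt(-abs(t_welch), df_welch)
ci_welch <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_x + se2_y)

print(c(t = t_welch, df = df_welch, p = p_welch))
print(ci_welch)
```

The confidence interval is built from the same standard error as the t statistic, which is why "the interval excludes 0" and "the p-value is below 0.05" lead to the same decision.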

Comparing Dosage Levels 1 and 2

data2 <- subset(ToothGrowth, Dosage_Levels %in% c(1,2))
t.test(len ~ Dosage_Levels, paired = FALSE, var.equal = FALSE, data = data2)

## 
##  Welch Two Sample t-test
## 
## data:  len by Dosage_Levels
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

The sample mean tooth length is 19.735 at dosage level 1 and 26.100 at dosage level 2. The p-value (1.906e-05) is far below 0.05 and the 95% confidence interval for the difference in means (-9.00 to -3.73) does not contain 0, so we reject the null hypothesis that the two means are equal: dosage level 2 gives a longer mean tooth length than dosage level 1.

Comparing Dosage Levels 0.5 and 2

data3 <- subset(ToothGrowth, Dosage_Levels %in% c(0.5,2))
t.test(len ~ Dosage_Levels, paired = FALSE, var.equal = FALSE, data = data3)

## 
##  Welch Two Sample t-test
## 
## data:  len by Dosage_Levels
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

The sample mean tooth length is 10.605 at dosage level 0.5 and 26.100 at dosage level 2. The p-value (4.398e-14) is far below 0.05 and the 95% confidence interval for the difference in means (-18.16 to -12.83) does not contain 0, so we reject the null hypothesis that the two means are equal: dosage level 2 gives a longer mean tooth length than dosage level 0.5.
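The three dose comparisons repeat the same call with a different pair of levels. A compact alternative, sketched here under the assumption that only base R and the built-in ToothGrowth data are available, loops over every pair with `combn()` and collects the Welch results in one data frame. The p-values are the same unadjusted values as in the individual tests.

```r
data(ToothGrowth, package = "datasets")

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2)

# Run one Welch t-test per pair and keep the key numbers
pairwise_results <- do.call(rbind, lapply(seq_len(ncol(dose_pairs)), function(i) {
  dose_a <- dose_pairs[1, i]
  dose_b <- dose_pairs[2, i]
  res <- t.test(ToothGrowth$len[ToothGrowth$dose == dose_a],
                ToothGrowth$len[ToothGrowth$dose == dose_b],
                var.equal = FALSE)
  data.frame(dose_a = dose_a, dose_b = dose_b,
             t = unname(res$statistic), df = unname(res$parameter),
             p_value = res$p.value,
             ci_lower = res$conf.int[1], ci_upper = res$conf.int[2])
}))
print(pairwise_results)
```

Each row should match the corresponding t-test output above; the negative intervals reflect that the lower dose is always listed first.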

Performing t tests for Supplement Types

t.test(len ~ Supplement_Type, paired = FALSE, var.equal = FALSE, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by Supplement_Type
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
##  mean in group Orange Juice mean in group Ascorbic Acid 
##                    20.66333                    16.96333

The sample mean tooth length is higher for orange juice (20.66) than for ascorbic acid (16.96), but the p-value (0.06063) is above 0.05 and the 95% confidence interval for the difference in means (-0.17 to 7.57) contains 0. At the 5% significance level we therefore cannot reject the null hypothesis that the two delivery methods give the same mean tooth length.
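The reject/do-not-reject decision can be read directly from the object that `t.test()` returns. A minimal sketch, assuming only the built-in ToothGrowth data (where supp is coded OJ/VC), extracts the confidence interval and p-value and applies both versions of the 5% rule, which always agree for a two-sided test.

```r
data(ToothGrowth, package = "datasets")

# Welch two-sample t-test of tooth length by delivery method (OJ vs VC)
res <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

# The 95% CI contains 0 exactly when the two-sided p-value is at least 0.05
ci_contains_zero <- res$conf.int[1] <= 0 && res$conf.int[2] >= 0
reject_null      <- res$p.value < 0.05

print(res$conf.int)
cat("CI contains 0:", ci_contains_zero, "| reject H0 at 5%:", reject_null, "\n")
```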

Conclusions based on the Analysis

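As the dosage level increases, the mean tooth length increases.
The mean tooth length is largest at dosage level 2.
Delivering the supplement as orange juice gave a higher sample mean tooth length than ascorbic acid (20.66 vs 16.96), but this difference is not statistically significant at the 5% level (p = 0.06063).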

Assumptions used for the Conclusion

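The variances of the different groups are not assumed to be equal (Welch's t-test is used).
The groups are independent and not paired: each guinea pig received a single combination of supplement type and dosage level.
The data were collected from a random sample of guinea pigs.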
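One rough way to look at the unequal-variance assumption is to compare the within-group standard deviations and to run the same comparison with and without pooling the variances. The sketch below assumes only base R and the built-in ToothGrowth data, and uses dose 0.5 versus dose 1 as the example pair.

```r
data(ToothGrowth, package = "datasets")

# Sample standard deviation of tooth length within each dose level
dose_sds <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)
print(round(dose_sds, 2))

# Welch (var.equal = FALSE) versus pooled-variance (var.equal = TRUE) test
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1]
welch  <- t.test(x, y, var.equal = FALSE)
pooled <- t.test(x, y, var.equal = TRUE)

# The pooled test uses n1 + n2 - 2 = 38 df; Welch's df can never exceed that
print(c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter)))
print(c(welch_p = welch$p.value, pooled_p = pooled$p.value))
```

When the two approaches give nearly the same degrees of freedom and p-value, the choice between them has little effect on the conclusion; Welch's test is the safer default when the variances are not known to be equal.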

Analyzing the Tooth Growth Data

Srividya

24 June 2016