Overview

The goal of this project (part 2 of the course project for the Statistical Inference course on Coursera) is to analyze the ToothGrowth dataset included in the R datasets package, exploring the relationship between tooth growth, supplement type, and dose [mg/day] in 60 guinea pigs.

We use two-sample t-tests, assuming that the test statistic follows a Student’s t-distribution under the null hypothesis.

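We concluded that there is no significant difference in tooth length between the supplement types OJ and VC (p-value 0.0606), and that there is a significant difference in tooth length between every pair of doses: 0.5 vs. 1.0, 1.0 vs. 2.0, and 0.5 vs. 2.0 [mg/day].

Therefore, the tooth growth of guinea pigs is affected by the dose.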

Dataset

We work with the ToothGrowth dataset included in the R datasets package. The data contain 3 variables describing the effect of vitamin C on tooth growth in 60 guinea pigs: tooth length “len”, supplement type “supp”, and dose in mg/day “dose”.

Summarizing Data

We show the size of the dataset (rows, columns).

## [1] 60  3

We show the names of the variables.

## [1] "len"  "supp" "dose"

We show a subset of the dataset.

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

We summarize the dataset.

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

Exploring Data

The higher the dose, the greater the tooth growth in guinea pigs, for both supplement types.

Tooth growth in guinea pigs is greater for the OJ supplement than for the VC supplement when the dose is 0.5 [mg/day] or 1.0 [mg/day], and it is about the same for both supplement types when the dose is 2.0 [mg/day].
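The visual comparison can be cross-checked numerically. The following is a minimal sketch, not part of the original analysis, that uses base R's `aggregate` to compute the mean tooth length for each combination of supplement type and dose. The variable name `group_means` is introduced here for illustration only.

```r
# Load the ToothGrowth data from the datasets package
data("ToothGrowth")

# Mean tooth length for every supplement type and dose combination
# (group_means is an illustrative name, not used in the original code)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# One row per (supp, dose) pair, with the group mean in the len column
print(group_means)
```

Comparing the OJ and VC rows within each dose level gives a numeric counterpart to the statements above.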

Inferential Data Analysis

We run a t-test comparing len between the two supp groups (OJ and VC).

We print the relevant results of the t-test.

## [1] "p value: 0.0606"
## [1] "95 % confidence interval: [-0.171, 7.571]"
## [1] "Estimate mean in group OJ: 20.6633"
## [1] "Estimate mean in group VC: 16.9633"

Since the p-value is 0.06, which is greater than 0.05, we fail to reject the null hypothesis of equal means; consistently, the 95 % confidence interval contains 0. Therefore, we find no significant difference in tooth length between the two supplement types, VC and OJ.
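As a cross-check on the result above, the Welch two-sample t-test that R's `t.test` runs by default (`var.equal = FALSE`) can be reproduced by hand. The following is a minimal sketch rather than part of the original analysis; the names `oj`, `vc`, `se`, `t_manual`, `df_manual`, `p_manual`, and `ci_manual` are introduced here for illustration.

```r
data("ToothGrowth")

# Split tooth length by supplement type
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Standard error of the difference in means (variances not pooled)
se <- sqrt(var(oj) / length(oj) + var(vc) / length(vc))

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual <- (mean(oj) - mean(vc)) / se
df_manual <- se^4 / ((var(oj) / length(oj))^2 / (length(oj) - 1) +
                     (var(vc) / length(vc))^2 / (length(vc) - 1))

# Two-sided p-value and 95 % confidence interval for the difference in means
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)
ci_manual <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_manual) * se

# Compare the hand-computed p-value with the built-in test
t1 <- t.test(len ~ supp, data = ToothGrowth)
print(all.equal(p_manual, unname(t1$p.value)))
```

The confidence interval is built around the difference OJ minus VC, which is why an interval containing 0 corresponds to failing to reject the null hypothesis.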

We run a t-test comparing len between the dose groups 0.5 and 1.0 [mg/day].

We print the relevant results of the t-test.

## [1] "p value: 0"
## [1] "95 % confidence interval: [-11.9838, -6.2762]"
## [1] "Estimate mean in group 0.5: 10.605"
## [1] "Estimate mean in group 1: 19.735"

Since the p-value rounds to 0 at four decimal places (that is, p < 0.0001), which is less than 0.05, we reject the null hypothesis in favor of the alternative. Therefore, there is a significant difference in tooth length between the dose groups 0.5 [mg/day] and 1.0 [mg/day].

We run a t-test comparing len between the dose groups 1.0 and 2.0 [mg/day].

We print the relevant results of the t-test.

## [1] "p value: 0"
## [1] "95 % confidence interval: [-8.9965, -3.7335]"
## [1] "Estimate mean in group 1: 19.735"
## [1] "Estimate mean in group 2: 26.1"

Since the p-value rounds to 0 at four decimal places (that is, p < 0.0001), which is less than 0.05, we reject the null hypothesis in favor of the alternative. Therefore, there is a significant difference in tooth length between the dose groups 1.0 [mg/day] and 2.0 [mg/day].

We run a t-test comparing len between the dose groups 0.5 and 2.0 [mg/day].

We print the relevant results of the t-test.

## [1] "p value: 0"
## [1] "95 % confidence interval: [-18.1562, -12.8338]"
## [1] "Estimate mean in group 0.5: 10.605"
## [1] "Estimate mean in group 2: 26.1"

Since the p-value rounds to 0 at four decimal places (that is, p < 0.0001), which is less than 0.05, we reject the null hypothesis in favor of the alternative. Therefore, there is a significant difference in tooth length between the dose groups 0.5 [mg/day] and 2.0 [mg/day].
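The p-values for the dose comparisons print as 0 only because `round(..., digits = 4)` discards anything below 0.00005. The following is a minimal sketch of one alternative, not the original reporting code, that uses base R's `signif` and `format.pval` so that very small p-values stay informative. The names `tg_sub`, `t_res`, `p_rounded`, `p_signif`, and `p_label` are illustrative.

```r
data("ToothGrowth")

# Compare the 0.5 and 2.0 mg/day dose groups
tg_sub <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t_res <- t.test(len ~ dose, data = tg_sub)

# Rounding to four decimals hides the magnitude of a tiny p-value
p_rounded <- round(t_res$p.value, digits = 4)

# Keep three significant digits instead of four decimal places
p_signif <- signif(t_res$p.value, digits = 3)

# Or report the p-value relative to a threshold (eps) as a string
p_label <- format.pval(t_res$p.value, digits = 3, eps = 1e-4)

print(p_rounded); print(p_signif); print(p_label)
```

Either form avoids reporting a p-value of exactly 0, which a t-test cannot produce.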

Conclusions

We studied the relationship between tooth growth and supplement type, and between tooth growth and dose [mg/day], in 60 guinea pigs from the ToothGrowth dataset.

We used two-sample t-tests, assuming that the test statistic follows a Student’s t-distribution under the null hypothesis.

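We concluded that there is no significant difference in tooth length between the supplement types OJ and VC (p-value 0.0606), and that there is a significant difference in tooth length between every pair of doses: 0.5 vs. 1.0, 1.0 vs. 2.0, and 0.5 vs. 2.0 [mg/day].

Therefore, the tooth growth of guinea pigs is affected by the dose.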

Appendix

We show all the R code used to do the analysis.

###############################################################################
# Author: Sergio Contador
# Date: March 2017
# Title: Statistical Inference Course Project from Coursera, part 2
###############################################################################


# Load Libraries Required
library(ggplot2)

# Load ToothGrowth data
data("ToothGrowth")

# # Display which values we have in the data
# levels(as.factor(as.character(ToothGrowth$len)))
# levels(as.factor(as.character(ToothGrowth$supp)))
# levels(as.factor(as.character(ToothGrowth$dose)))

# Convert to a factor
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ToothGrowth$supp <- as.factor(ToothGrowth$supp)

# Display a summary of the data
dim(ToothGrowth)
names(ToothGrowth)
head(ToothGrowth)
summary(ToothGrowth)


# Plot tooth length ('len') vs. the dose amount ('dose'), broken out by supplement delivery method ('supp')
g <- ggplot(aes(x = dose, y = len), data = ToothGrowth)
g + geom_boxplot(aes(fill = dose)) + xlab("Dose Amount") + ylab("Tooth Length") +
        facet_grid(~ supp) + ggtitle("Tooth Length Along Dose Amount") 

# Plot tooth length ('len') vs. supplement delivery method ('supp') broken out by the dose amount ('dose')
g <- ggplot(aes(x = supp, y = len), data = ToothGrowth)
g + geom_boxplot(aes(fill=supp)) + xlab("Supplement Type") +
        ylab("Tooth Length") + facet_grid(~ dose) + ggtitle("Tooth Length Along Supplement Type") 


# run t-test
t1 <- t.test(len ~ supp, data = ToothGrowth)

# show relevant results
p1 <- paste("p value:", round(t1$p.value, digits = 4))
p2 <- paste("95 % confidence interval: ", "[", round(t1$conf.int[1], digits = 4),
      ", ", round(t1$conf.int[2], digits = 4), "]", sep = "")
p3 <- paste("Estimate ", names(t1$estimate[1]), ": ", 
      round(t1$estimate[1], digits = 4), sep = "")
p4 <- paste("Estimate ", names(t1$estimate[2]), ": ", 
      round(t1$estimate[2], digits = 4), sep = "")

print(p1); print(p2); print(p3); print(p4)


# run t-test using dose with values 0.5 and 1.0
ToothGrowth2 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 1.0))
t2 <- t.test(len ~ dose, data = ToothGrowth2)

# show relevant results
p1 <- paste("p value:", round(t2$p.value, digits = 4))
p2 <- paste("95 % confidence interval: ", "[", round(t2$conf.int[1], digits = 4),
            ", ", round(t2$conf.int[2], digits = 4), "]", sep = "")
p3 <- paste("Estimate ", names(t2$estimate[1]), ": ", 
            round(t2$estimate[1], digits = 4), sep = "")
p4 <- paste("Estimate ", names(t2$estimate[2]), ": ", 
            round(t2$estimate[2], digits = 4), sep = "")

print(p1); print(p2); print(p3); print(p4)


# run t-test using dose with values 1.0 and 2.0
ToothGrowth2 <- subset(ToothGrowth, ToothGrowth$dose %in% c(1.0, 2.0))
t3 <- t.test(len ~ dose, data = ToothGrowth2)

# show relevant results
p1 <- paste("p value:", round(t3$p.value, digits = 4))
p2 <- paste("95 % confidence interval: ", "[", round(t3$conf.int[1], digits = 4),
            ", ", round(t3$conf.int[2], digits = 4), "]", sep = "")
p3 <- paste("Estimate ", names(t3$estimate[1]), ": ", 
            round(t3$estimate[1], digits = 4), sep = "")
p4 <- paste("Estimate ", names(t3$estimate[2]), ": ", 
            round(t3$estimate[2], digits = 4), sep = "")

print(p1); print(p2); print(p3); print(p4)


# run t-test using dose with values 0.5 and 2.0
ToothGrowth2 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 2.0))
t4 <- t.test(len ~ dose, data = ToothGrowth2)

# show relevant results
p1 <- paste("p value:", round(t4$p.value, digits = 4))
p2 <- paste("95 % confidence interval: ", "[", round(t4$conf.int[1], digits = 4),
            ", ", round(t4$conf.int[2], digits = 4), "]", sep = "")
p3 <- paste("Estimate ", names(t4$estimate[1]), ": ", 
            round(t4$estimate[1], digits = 4), sep = "")
p4 <- paste("Estimate ", names(t4$estimate[2]), ": ", 
            round(t4$estimate[2], digits = 4), sep = "")

print(p1); print(p2); print(p3); print(p4)
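
The four reporting blocks above repeat the same formatting steps. The following is a minimal sketch of how they could be factored into a single helper; the function name `report_ttest` and its `digits` argument are hypothetical and do not appear in the original code.

```r
# Hypothetical helper: prints the same four summary lines for any t-test result
report_ttest <- function(tt, digits = 4) {
  lines <- c(
    # p-value rounded as in the original report
    paste("p value:", round(tt$p.value, digits = digits)),
    # lower and upper bounds of the confidence interval
    paste0("95 % confidence interval: [", round(tt$conf.int[1], digits = digits),
           ", ", round(tt$conf.int[2], digits = digits), "]"),
    # one line per group mean (vectorized over both estimates)
    paste0("Estimate ", names(tt$estimate), ": ",
           round(tt$estimate, digits = digits))
  )
  for (line in lines) print(line)
  invisible(lines)
}

# Usage: the supplement comparison from the analysis
data("ToothGrowth")
out <- report_ttest(t.test(len ~ supp, data = ToothGrowth))
```

With such a helper, each dose comparison reduces to one `subset` call followed by one `report_ttest` call.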