This document was written to complete the Statistical Inference Course Assignment, part of Coursera’s Data Science Certification by Johns Hopkins University. There are two parts to this project:
As requested, each PDF report should be at most 3 pages long, with up to 3 additional pages of supporting material in an appendix if needed.
Perform exploratory analysis on the ToothGrowth R dataset - The Effect of Vitamin C on Tooth Growth in Guinea Pigs. The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
Here is the summary:
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The summary shows that the 60 observations are split evenly between the two delivery methods, 30 for OJ and 30 for VC. Tooth length ranges from 4.20 to 33.90 with a mean of 18.81, and the dose ranges from 0.5 to 2 mg/day.
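As a quick check of how the animals are distributed over the treatments, the two factors can be cross-tabulated. This is a minimal sketch using base R only; the name `design` is introduced here for illustration and is not part of the original analysis.

```r
# Load the built-in dataset
data("ToothGrowth")

# Cross-tabulate delivery method (rows) by dose (columns)
# to count the animals in each treatment cell
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Each of the six supplement and dose combinations should show 10 animals, which agrees with the `count` column of the grouped summary.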
Using the mean length (lenght_mean), the standard deviation of length (lenght_sd), and the number of observations (count), we may summarize the data in three ways: by supplement and dose, by supplement alone, and by dose alone:
## # A tibble: 6 × 5
## # Groups: supp [2]
## supp dose lenght_mean lenght_sd count
## <fct> <dbl> <dbl> <dbl> <int>
## 1 OJ 0.5 13.2 4.46 10
## 2 OJ 1 22.7 3.91 10
## 3 OJ 2 26.1 2.66 10
## 4 VC 0.5 7.98 2.75 10
## 5 VC 1 16.8 2.52 10
## 6 VC 2 26.1 4.80 10
## # A tibble: 2 × 4
## supp lenght_mean lenght_sd count
## * <fct> <dbl> <dbl> <int>
## 1 OJ 20.7 6.61 30
## 2 VC 17.0 8.27 30
## # A tibble: 3 × 4
## dose lenght_mean lenght_sd count
## * <dbl> <dbl> <dbl> <int>
## 1 0.5 10.6 4.50 20
## 2 1 19.7 4.42 20
## 3 2 26.1 3.77 20
At first glance, OJ appears to be a more effective delivery method than VC: its mean length is higher overall (20.7 vs. 17.0) and at the 0.5 and 1 mg/day doses. Mean length also rises with dose, which suggests that vitamin C is associated with tooth growth.
Since a picture is worth a thousand words, let’s explore the data with a scatter plot of tooth length against dose, coloured by supplement:
The higher the dose, the longer the teeth. The plot shows that tooth lengths are similar for both supplements at 2 mg/day, and it suggests that OJ has a bigger effect on tooth growth than VC at the lower doses.
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
NO, there is not. The 95% confidence interval (-0.171 to 7.571) includes 0 and the p-value is 0.06, which is greater than 0.05, so with all doses pooled the difference between supplements is not significant at the 5% level.
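To make the numbers in the test output less of a black box, the Welch statistic, its Welch-Satterthwaite degrees of freedom, the p-value, and the confidence interval can be recomputed by hand. The following is only a sketch under the same assumptions as the test above (independent groups, unequal variances); the names `oj`, `vc`, `se2_oj`, `se2_vc`, `t_stat`, `df_welch`, `p_value`, and `ci` are introduced here for illustration.

```r
data("ToothGrowth")

# Tooth lengths for each delivery method
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error contributed by each group: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the report above (t = 1.9153, df = 55.309, p-value = 0.06063), which shows that the interval includes zero only because its lower bound falls just below it.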
| DOSE | Confidence Interval | P-Value | Evidence |
|---|---|---|---|
| 0.5 | 1.7191 to 8.7809 | 0.00636 | YES - There is - The confidence interval does not include zero |
| 1.0 | 2.8021 to 9.0579 | 0.00104 | YES - There is - The confidence interval does not include zero |
| 2.0 | -3.7981 to 3.6381 | 0.96385 | NO - There is not - The confidence interval includes zero |
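The per-dose rows above can also be produced in one pass instead of one call per dose. This is a hedged sketch rather than the code used for the report; `dose_levels`, `by_dose`, and the column names are hypothetical names chosen for illustration.

```r
data("ToothGrowth")

# The three dose levels present in the data: 0.5, 1 and 2 mg/day
dose_levels <- sort(unique(ToothGrowth$dose))

# Run one Welch t-test (OJ vs. VC) per dose and collect the results
by_dose <- do.call(rbind, lapply(dose_levels, function(d) {
  tt <- t.test(len ~ supp, var.equal = FALSE,
               data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose = d,
    ci_low = tt$conf.int[1],
    ci_high = tt$conf.int[2],
    p_value = tt$p.value,
    # TRUE when the 95% interval lies entirely on one side of zero
    excludes_zero = tt$conf.int[1] > 0 || tt$conf.int[2] < 0
  )
}))

print(by_dose)
```

Collecting the results in a data frame keeps the interval, the p-value, and the decision for each dose in one place, so the table does not have to be assembled from separately named variables.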
| DOSES | Supplement | Confidence Interval | P-Value | Evidence |
|---|---|---|---|---|
| 0.5 / 1.0 | OJ | -13.416 to -5.524 | 8.7849191 × 10^-5 | YES - There is - The confidence interval does not include zero |
| 1.0 / 2.0 | OJ | -6.531 to -0.189 | 0.0391951 | YES - There is - The confidence interval does not include zero |
| 0.5 / 1.0 | VC | -11.266 to -6.314 | 6.8110177 × 10^-7 | YES - There is - The confidence interval does not include zero |
| 1.0 / 2.0 | VC | -13.054 to -5.686 | 9.1556031 × 10^-5 | YES - There is - The confidence interval does not include zero |
Based on the results so far, it is fair to conclude the following:
The requested assumptions were already stated in the project description.
# We will need ggplot to draw
library(ggplot2)
# Handy to manipulate vars.
library(dplyr)
# Load ToothGrowth data
data("ToothGrowth")
# Summarize each variable of the dataset
summary(ToothGrowth)
summarise_all <- ToothGrowth %>%
group_by(supp,dose) %>%
summarize(lenght_mean=mean(len), lenght_sd=sd(len), count = n())
print(summarise_all)
summarise_suplement <- ToothGrowth %>%
group_by(supp) %>%
summarize(lenght_mean=mean(len), lenght_sd=sd(len), count = n())
print(summarise_suplement)
summarise_dose <- ToothGrowth %>%
group_by(dose) %>%
summarize(lenght_mean=mean(len), lenght_sd=sd(len), count = n())
print(summarise_dose)
# Calculate len mean for every dose and supp
len_avg <- aggregate(len~.,data=ToothGrowth,mean)
# Now plot the tooth length (len) relative to dosage and supplement
g <- ggplot(data = ToothGrowth,aes(x=dose,y=len))
# size and alpha are fixed values, so they are set outside aes()
g <- g + geom_point(aes(colour=supp),size=1,alpha=0.6)
g <- g + geom_line(data=len_avg,aes(group=supp,colour=supp))
g <- g + labs(title="Tooth Length relative to Dosage and Supplement")
print(g)
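# Welch t-test of OJ vs. VC with all dosage levels pooled.
# 'paired' is not passed: the test is unpaired by default, and
# recent R versions reject 'paired' in the formula interface.
t.test(len ~ supp, var.equal=FALSE, data=ToothGrowth)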
yes_lbl <- "**YES - There is** - The confidence interval does not include zero"
no_lbl <- "**NO - There is not** - The confidence interval includes zero"
# Welch t-tests of OJ vs. VC at each dosage level (unpaired by default)
vcoj_05 <- t.test(len ~ supp, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose==0.5, ])
vcoj_10 <- t.test(len ~ supp, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose==1.0, ])
vcoj_20 <- t.test(len ~ supp, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose==2.0, ])
vcoj_05_itvl <- paste(round(vcoj_05$conf.int[1],4), " to ", round(vcoj_05$conf.int[2],4))
vcoj_05_pvl <- round(vcoj_05$p.value,5)
vcoj_10_itvl <- paste(round(vcoj_10$conf.int[1],4), " to ", round(vcoj_10$conf.int[2],4))
vcoj_10_pvl <- round(vcoj_10$p.value,5)
vcoj_20_itvl <- paste(round(vcoj_20$conf.int[1],4), " to ", round(vcoj_20$conf.int[2],4))
vcoj_20_pvl <- round(vcoj_20$p.value,5)
# Welch t-tests between adjacent dosage levels within each supplement.
# The suffix names the dose that is left out:
# *_20 compares 0.5 vs. 1.0 (dose < 2), *_05 compares 1.0 vs. 2.0 (dose > 0.5).
oj_20 <- t.test(len ~ dose, var.equal=FALSE, data=filter(ToothGrowth, dose < 2, supp=="OJ"))
oj_05 <- t.test(len ~ dose, var.equal=FALSE, data=filter(ToothGrowth, dose > 0.5, supp=="OJ"))
vc_20 <- t.test(len ~ dose, var.equal=FALSE, data=filter(ToothGrowth, dose < 2, supp=="VC"))
vc_05 <- t.test(len ~ dose, var.equal=FALSE, data=filter(ToothGrowth, dose > 0.5, supp=="VC"))
oj_20_itvl <- paste(round(oj_20$conf.int[1],3), " to ", round(oj_20$conf.int[2],3))
oj_05_itvl <- paste(round(oj_05$conf.int[1],3), " to ", round(oj_05$conf.int[2],3))
vc_20_itvl <- paste(round(vc_20$conf.int[1],3), " to ", round(vc_20$conf.int[2],3))
vc_05_itvl <- paste(round(vc_05$conf.int[1],3), " to ", round(vc_05$conf.int[2],3))