This report presents a short analysis of the ToothGrowth dataset in RStudio. The goal of the analysis is to understand how the different supplements affect tooth growth. First, it examines how much of a difference orange juice (OJ) versus vitamin C (VC) makes to tooth length. Then it checks whether the dose makes a difference, both within the same supplement and compared against the other supplement.
To start the analysis, the first step is to load the data and take a quick look at it:
#1. Load the ToothGrowth data and perform some basic exploratory data analyses
data<-ToothGrowth
head(data)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
lapply(data, class)
## $len
## [1] "numeric"
##
## $supp
## [1] "factor"
##
## $dose
## [1] "numeric"
This quick look shows that the data has 3 columns: len and dose are numeric, and supp is a factor with the levels OJ and VC. Although dose is numeric, it only takes the values 0.5, 1 and 2, so it behaves like a grouping variable. To keep exploring the data, a density histogram of len, with a different color for each supplement, gives an idea of how the data is distributed.
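library(ggplot2)
# Overlapping density histograms of tooth length, one color per supplement
h1 <- ggplot(data, aes(len, fill = supp)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.5, position = "identity", binwidth = 1) +
  labs(title = "ToothGrowth Histogram by Supplement", x = "Length")
h1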
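Before reading too much into the shape of the histogram, it helps to confirm how many observations sit in each supplement/dose combination. The following is a minimal base-R sketch; it works on a fresh copy of the data named `tg` (an illustrative name, not from the original analysis) so that it does not depend on earlier objects.

```r
# Fresh copy of the built-in dataset so the block is self-contained
tg <- datasets::ToothGrowth

# Cross-tabulate the number of observations per supplement and dose
cell_counts <- with(tg, table(supp, dose))
print(cell_counts)
```

If every cell holds the same number of observations, the design is balanced, and the group means computed later are all based on equally sized samples.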
Next, a summary of the data and the mean length grouped by supplement and dose give a quick first assessment of how tooth length depends on each variable.
#2. Provide a basic summary of the data.
summary(data)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
mns<- aggregate(list(Means=data$len), by= list(Supplement=data$supp,Dose=data$dose), FUN= "mean")
mns
## Supplement Dose Means
## 1 OJ 0.5 13.23
## 2 VC 0.5 7.98
## 3 OJ 1.0 22.70
## 4 VC 1.0 16.77
## 5 OJ 2.0 26.06
## 6 VC 2.0 26.14
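The means alone say nothing about how spread out each group is, which matters for the t-tests that follow. A minimal sketch, assuming base R's `tapply` and an illustrative data copy named `tg`, lays the same means out as a supplement by dose matrix and adds the standard deviation of each cell:

```r
tg <- datasets::ToothGrowth

# Mean tooth length for every supp x dose cell (rows: OJ/VC, columns: 0.5/1/2)
cell_means <- with(tg, tapply(len, list(supp, dose), mean))

# Standard deviation for the same cells, to judge within-group spread
cell_sds <- with(tg, tapply(len, list(supp, dose), sd))

round(cell_means, 2)
round(cell_sds, 2)
```

The matrix layout makes it easier to compare OJ and VC at a fixed dose by reading down a column.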
With this summary in hand, the following hypotheses can be tested:
Hypothesis: there is a difference in tooth length depending on the supplement taken.
#Comparison by supp
t1 <- t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = data)
suppsummary <- data.frame(
  "p-value" = t1$p.value,
  "CI lower" = t1$conf.int[1],
  "CI Upper" = t1$conf.int[2])
t1
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
suppsummary
## p.value CI.lower CI.Upper
## 1 0.06063451 -0.1710156 7.571016
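Since the 95% confidence interval for the difference in means (OJ minus VC), from -0.17 to 7.57, contains 0 (equivalently, p = 0.061 > 0.05), the null hypothesis of no difference cannot be rejected: the data do not show a statistically significant difference in tooth length between the two supplements.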
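To see where the reported statistic, degrees of freedom and interval come from, the Welch test can be reproduced by hand. The statistic is t = (mean_OJ - mean_VC) / sqrt(s_OJ^2 / n_OJ + s_VC^2 / n_VC), and the degrees of freedom follow the Welch-Satterthwaite approximation. This is a minimal sketch for checking the numbers, not a replacement for `t.test`; the variable names are illustrative.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each group mean (separate variances, no pooling)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic for mean(OJ) - mean(VC)
t_welch <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_welch <- 2 * pt(-abs(t_welch), df_welch)
ci_welch <- (mean(oj) - mean(vc)) +
  c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_oj + se2_vc)

c(t = t_welch, df = df_welch, p = p_welch)
ci_welch
```

Because the interval is centered on the observed difference and its half-width uses the same t quantile as the test, the interval contains 0 exactly when the two-sided p-value is at least 0.05.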
Hypothesis: the dosage of the supplement impacts the length of the teeth.
#Comparison by dose
dose05 <- subset(data, dose == 0.5, select = c(len, supp))
dose1 <- subset(data, dose == 1, select = c(len, supp))
dose2 <- subset(data, dose == 2, select = c(len, supp))
# 0.5 vs 1
d05vsd1 <- t.test(dose05$len, dose1$len, paired = FALSE)
# 0.5 vs 2
d05vsd2 <- t.test(dose05$len, dose2$len, paired = FALSE)
# 2 vs 1
d2vsd1 <- t.test(dose2$len, dose1$len, paired = FALSE)
# put the 3 comparisons in one data frame
dosesummary <- data.frame(
  "p-value" = c(d05vsd1$p.value, d05vsd2$p.value, d2vsd1$p.value),
  "CI lower" = c(d05vsd1$conf.int[1], d05vsd2$conf.int[1], d2vsd1$conf.int[1]),
  "CI Upper" = c(d05vsd1$conf.int[2], d05vsd2$conf.int[2], d2vsd1$conf.int[2]),
  row.names = c(".5 vs 1", ".5 vs 2", "2 vs 1")
)
dosesummary
## p.value CI.lower CI.Upper
## .5 vs 1 1.268301e-07 -11.983781 -6.276219
## .5 vs 2 4.397525e-14 -18.156167 -12.833833
## 2 vs 1 1.906430e-05 3.733519 8.996481
Since none of the three confidence intervals contains 0 (all p-values are below 0.0001), tooth length differs significantly between every pair of doses. The signs of the intervals also show that length increases with the dose: 0.5 < 1 < 2.
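The three dose comparisons above are each judged at the 5% level on their own. A minimal sketch of a multiple-comparison check uses base R's `pairwise.t.test` with separate variances (Welch tests, as in the code above) and a Bonferroni adjustment; the object name `pw` is illustrative. As in the original subsets, both supplements are pooled within each dose.

```r
tg <- datasets::ToothGrowth

# Welch t-tests for every pair of dose levels; each p-value is
# multiplied by the number of comparisons (Bonferroni), capped at 1
pw <- pairwise.t.test(tg$len, tg$dose,
                      pool.sd = FALSE,
                      p.adjust.method = "bonferroni")
pw
```

With p-values as small as those in the table above, multiplying by 3 leaves every comparison far below 0.05, so the conclusion about dose does not depend on how many tests were run.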
For the conclusions, the following assumptions were made:
The data sample analyzed is a meaningful representation of the overall population, and the observations are independent (each of the 60 guinea pigs received only one supplement/dose combination, which is why unpaired tests were used),
When 0 is inside a 95% confidence interval, the data do not provide significant evidence that the variable influences tooth length,
If 0 is outside the confidence interval, it can be said that the variable does have an impact on tooth length.
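The decision rule in the last two assumptions can be written as a small function. The helper `ci_excludes_zero` below is hypothetical, written only for this illustration; it works on any two-sample `htest` object returned by `t.test`. The comments on the two calls follow from the confidence intervals already printed in this report.

```r
tg <- datasets::ToothGrowth

# Hypothetical helper: TRUE when the test's confidence interval excludes 0,
# i.e. the difference is significant at the 1 - conf.level level
ci_excludes_zero <- function(tt) {
  ci <- tt$conf.int
  ci[1] > 0 || ci[2] < 0
}

supp_test <- t.test(len ~ supp, data = tg)
dose_test <- t.test(tg$len[tg$dose == 0.5], tg$len[tg$dose == 1])

ci_excludes_zero(supp_test)  # FALSE: the supplement CI contains 0
ci_excludes_zero(dose_test)  # TRUE: the 0.5 vs 1 CI lies below 0

# For a two-sided test at 95%, the same decision follows from p < 0.05
c(supp = supp_test$p.value < 0.05, dose = dose_test$p.value < 0.05)
```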
After exploring the data and running tests on the different supplements and doses, it is safe to conclude that the choice between VC and OJ has a smaller impact on tooth length than the dose of the supplement. At dose 2, choosing VC or OJ does not make a meaningful difference: the group means are almost identical (26.06 for OJ and 26.14 for VC).
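The statement about dose 2 rests on the group means alone. A minimal sketch of how it could be backed by a test runs a separate Welch t-test of OJ against VC within each dose level; the names `tg` and `p_by_dose` are illustrative, and with only 10 animals per group these tests have limited power.

```r
tg <- datasets::ToothGrowth

# One Welch t-test of len by supp inside each dose level;
# the result is a named vector of p-values ("0.5", "1", "2")
p_by_dose <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d)$p.value
})
round(p_by_dose, 4)
```

Splitting by dose keeps the comparison between supplements at a fixed dose, which is exactly the situation the dose-2 statement describes.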