This report presents a short analysis of the ToothGrowth dataset in RStudio. The goal of the analysis is to understand how the different supplements affect tooth growth. The first step is to measure how large a difference orange juice (OJ) versus vitamin C (VC) makes in tooth growth. The second is to check whether the dose makes a difference, both within the same supplement and compared against the other supplement.

Load the data

To start the analysis, the first step is to load the data and take a quick look at it:

#1. Load the ToothGrowth data and perform some basic exploratory data analyses
data <- ToothGrowth
head(data)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
lapply(data, class)
## $len
## [1] "numeric"
## 
## $supp
## [1] "factor"
## 
## $dose
## [1] "numeric"

Basic summary of the data

This quick look shows that the data has 3 columns: “len” and “dose” are numeric, and “supp” is a factor. To explore further, a density histogram colored by supplement gives an idea of how tooth length is distributed.

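library(ggplot2)  # provides ggplot(), geom_histogram() and labs()
# after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
h1 <- ggplot(data, aes(len, fill = supp)) +
  geom_histogram(alpha = 0.5, aes(y = after_stat(density)), position = "identity", binwidth = 1) +
  labs(title = "ToothGrowth Histogram by Supplement", x = "Length")
h1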

Next, a summary of the data, along with the mean tooth length grouped by supplement and dose, allows a quick assessment of how length depends on each variable.

#2. Provide a basic summary of the data.
summary(data)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
mns <- aggregate(list(Means = data$len), by = list(Supplement = data$supp, Dose = data$dose), FUN = mean)
mns
##   Supplement Dose Means
## 1         OJ  0.5 13.23
## 2         VC  0.5  7.98
## 3         OJ  1.0 22.70
## 4         VC  1.0 16.77
## 5         OJ  2.0 26.06
## 6         VC  2.0 26.14
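
Group means alone say nothing about spread. The following is a minimal sketch, not part of the original analysis, that reuses the same `aggregate()` pattern with `sd` in place of `mean` to get the standard deviation of each supplement and dose combination. The name `sds` is introduced here for illustration only.

```r
# Sketch (not from the original report): standard deviation of tooth
# length for each supplement/dose combination.
data <- ToothGrowth  # built-in dataset from the datasets package

# `sds` is a hypothetical name chosen for this example
sds <- aggregate(list(SD = data$len),
                 by = list(Supplement = data$supp, Dose = data$dose),
                 FUN = sd)
sds
```

A larger value in the `SD` column means that group's tooth lengths are more spread out, which is one way to check statements about variability against numbers rather than plots.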

Based on this summary, four plots were created. They provide the following information:

  1. This plot shows how the two supplements affect tooth length. OJ has a higher mean, and its distribution is narrower than that of VC.

  2. This plot is the same as the first one but splits the data by dose. It shows that the supplements differ noticeably in tooth length at the lower doses (0.5 and 1) but not at dose 2. OJ generally gives better results at doses 0.5 and 1, but at dose 2 both supplements perform the same on average. At dose 2, VC has higher variability, which suggests that OJ gives a more predictable result.

  3. This bar graph directly compares the mean of each supplement at each dose. It has been added to support the data presented in the previous plot.

  4. To conclude the initial exploratory analysis, these plots show the densities by supplement and by dose. As shown in plots #1 and #2, OJ and dose 1 have smaller variability.
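
The code that produced these plots is not shown in the report. The following is a minimal sketch of how plots 1 to 3 could be drawn with ggplot2. The choice of geoms, the object names `p1`, `p2` and `p3`, and the titles are assumptions made for illustration, not the author's original code.

```r
library(ggplot2)

data <- ToothGrowth
# Mean length per supplement and dose, as in the summary above
mns <- aggregate(list(Means = data$len),
                 by = list(Supplement = data$supp, Dose = data$dose),
                 FUN = mean)

# Plot 1 (assumed form): box plot of length by supplement
p1 <- ggplot(data, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot() +
  labs(title = "Tooth Length by Supplement", x = "Supplement", y = "Length")

# Plot 2 (assumed form): the same box plot, one panel per dose
p2 <- p1 +
  facet_wrap(~ dose) +
  labs(title = "Tooth Length by Supplement and Dose")

# Plot 3 (assumed form): bar graph of group means, supplements side by side
p3 <- ggplot(mns, aes(x = factor(Dose), y = Means, fill = Supplement)) +
  geom_col(position = "dodge") +
  labs(title = "Mean Tooth Length by Dose", x = "Dose", y = "Mean length")

p1
p2
p3
```

Box plots are used here because they show the median and the spread together, which matches the comparisons of mean and variability described in the list.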

Confidence Intervals

Compare tooth growth by supplement:

Hypothesis: there is a difference in tooth length depending on the supplement taken.

#Comparison by supp
# Welch two-sample t-test; unpaired is the default, so `paired` is omitted
t1 <- t.test(len ~ supp, var.equal = FALSE, data = data)
suppsummary <- data.frame(
  "p-value" = t1$p.value,
  "CI lower" = t1$conf.int[1],
  "CI Upper" = t1$conf.int[2])
t1
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
suppsummary
##      p.value   CI.lower CI.Upper
## 1 0.06063451 -0.1710156 7.571016

Since 0 lies inside the 95% confidence interval (p-value = 0.061), the test does not show a significant difference in tooth length between the two supplements at the 5% level when all doses are pooled.
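
The comparison above pools all three doses. One possible follow-up, which is not part of the original analysis, is to repeat the same Welch test within each dose level. The sketch below does this with `sapply()`. The name `supp_by_dose` and the column labels are introduced here for illustration.

```r
data <- ToothGrowth

# Sketch (not from the original report): Welch t-test of OJ vs VC within
# each dose level. `supp_by_dose` is a hypothetical name.
supp_by_dose <- t(sapply(c(0.5, 1, 2), function(d) {
  # Unpaired is the default for t.test(), so `paired` is not passed
  tt <- t.test(len ~ supp, data = data[data$dose == d, ], var.equal = FALSE)
  c(dose = d,
    p.value = tt$p.value,
    CI.lower = tt$conf.int[1],
    CI.upper = tt$conf.int[2])
}))

supp_by_dose  # one row per dose
```

A row whose interval contains 0 indicates that, at that dose, the test does not separate the two supplements. Rows whose interval excludes 0 indicate a dose at which the supplements differ.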

Compare tooth growth by dose:

Hypothesis: the dosage of the supplement impacts the length of the teeth.

#Comparison by dose
dose05 <- subset(data, dose == 0.5, select = c(len, supp))
dose1 <- subset(data, dose == 1, select = c(len, supp))
dose2 <- subset(data, dose == 2, select = c(len, supp))

# Each test below is an unpaired Welch t-test (var.equal defaults to FALSE)
#.5 vs 1
d05vsd1 <- t.test(dose05$len, dose1$len, paired = FALSE)
#.5 vs 2
d05vsd2 <- t.test(dose05$len, dose2$len, paired = FALSE)
# 2 vs 1
d2vsd1 <- t.test(dose2$len, dose1$len, paired = FALSE)

# putting the 3 comparisons in one dataframe:
dosesummary <- data.frame(
  "p-value" = c(d05vsd1$p.value, d05vsd2$p.value, d2vsd1$p.value),
  "CI lower" = c(d05vsd1$conf.int[1], d05vsd2$conf.int[1], d2vsd1$conf.int[1]),
  "CI Upper" = c(d05vsd1$conf.int[2], d05vsd2$conf.int[2], d2vsd1$conf.int[2]),
  row.names = c(".5 vs 1 ", ".5 vs 2 ", " 2 vs 1 ")
)

dosesummary
##               p.value   CI.lower   CI.Upper
## .5 vs 1  1.268301e-07 -11.983781  -6.276219
## .5 vs 2  4.397525e-14 -18.156167 -12.833833
##  2 vs 1  1.906430e-05   3.733519   8.996481

Since 0 is not in any of the three confidence intervals, each pair of doses differs significantly in mean tooth length: the higher the dose, the longer the teeth.
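
Because three pairwise tests are run on the same data, one optional check, which is not part of the original analysis, is to adjust the p-values for multiple comparisons. The sketch below applies a Bonferroni correction with `p.adjust()` to the p-values reported in the table above. The vector names `p_raw` and `p_adj` are introduced here for illustration.

```r
# p-values copied from the dosesummary output above
p_raw <- c(".5 vs 1" = 1.268301e-07,
           ".5 vs 2" = 4.397525e-14,
           "2 vs 1"  = 1.906430e-05)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj

# TRUE where the adjusted p-value is still below 0.05
p_adj < 0.05
```

Bonferroni is the most conservative of the common corrections, so a comparison that stays below 0.05 here would also pass the less strict methods offered by `p.adjust()`.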

State your conclusions and the assumptions needed for your conclusions.

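For the conclusions, the following assumptions were considered: the groups being compared are independent (the tests are unpaired), and their variances are not assumed to be equal (Welch's t-test).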

After exploring the data and running tests on the different supplements and doses, it is reasonable to conclude that the choice between VC and OJ has less impact on tooth length than the dose of the supplement. At a dose of 2, it makes no meaningful difference whether VC or OJ is used.