Overview

In the second part of the project we will analyze the ToothGrowth data in the R datasets package.

1. Load the ToothGrowth data and perform some basic exploratory data analyses

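library(dplyr)    # provides %>%, group_by() and summarise() used below
library(ggplot2)  # provides ggplot() and the geoms used below
data(ToothGrowth)
df <- ToothGrowth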
str(df)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(df)

Basic summary

summary(df)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Graphing the length distribution

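df %>% ggplot(aes(x = len)) +
  # histogram on the density scale so the curves below are comparable
  geom_histogram(aes(y = after_stat(density)), binwidth = 3, color = "black", fill = "gray") +
  # vertical line at the sample mean
  geom_vline(xintercept = mean(df$len), color = "red", linewidth = 1) +
  # normal curve with the sample mean and standard deviation (blue)
  stat_function(fun = dnorm, args = list(mean = mean(df$len), sd = sd(df$len)), color = "blue", linewidth = 1) +
  stat_density(geom = "line", color = "red", linewidth = 1)  # kernel density estimate (red)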

Let’s use the Shapiro-Wilk test of normality

stats::shapiro.test(df$len)
## 
##  Shapiro-Wilk normality test
## 
## data:  df$len
## W = 0.96743, p-value = 0.1091

Because the p-value (0.1091) is greater than 0.05, we fail to reject the null hypothesis of normality: the distribution of the data is not significantly different from a normal distribution, so we can reasonably assume normality.
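
The test above pools all 60 observations, while the t-tests in section 3 compare groups of 10. As a minimal sketch of a complementary check (not part of the original analysis), the same Shapiro-Wilk test can be run inside each supplement/dose group. The name `p_by_group` is only illustrative, and with 10 observations per group the test has little power, so large p-values are weak evidence at best.

```r
data(ToothGrowth)

# Shapiro-Wilk p-value for len within each supp x dose cell;
# tapply() returns a 2 x 3 matrix (rows: supp, columns: dose)
p_by_group <- with(
  ToothGrowth,
  tapply(len, list(supp, dose), function(x) shapiro.test(x)$p.value)
)

round(p_by_group, 3)
```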

2. Basic summary of the data

df %>% group_by(supp) %>% 
  summarise(Mean = mean(len))
df %>% group_by(dose) %>% 
  summarise(Mean = mean(len))
df %>% group_by(supp, dose) %>% 
  summarise(Mean = mean(len))
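
The group means alone do not show how variable each group is or how many observations it contains, and both matter for the t-tests that follow. A possible extension, sketched here in base R so it runs without extra packages, reports the mean, standard deviation and sample size per supplement/dose cell. The name `cell_stats` and the column names are illustrative choices, not part of the original report.

```r
data(ToothGrowth)

# Mean, standard deviation and sample size of len for each supp/dose cell
cell_stats <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)

# aggregate() stores the three statistics in one matrix column; flatten it
cell_stats <- do.call(data.frame, cell_stats)
names(cell_stats) <- c("supp", "dose", "mean", "sd", "n")

cell_stats
```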

3. Using confidence intervals to compare tooth growth by supplement and dose

df %>% ggplot(aes(x = factor(supp), y = len)) + geom_boxplot(aes(fill = factor(dose)))

T-test for dose 0.5 mg:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == .5, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

T-test for dose 1 mg:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 1, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

T-test for dose 2 mg:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 2, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
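
The three tests above differ only in the dose used to subset the data, so they can also be collected in one table, which makes the confidence intervals easier to compare side by side. This is a sketch of one way to do it; the names `doses` and `results` and the column names are illustrative.

```r
data(ToothGrowth)

doses <- sort(unique(ToothGrowth$dose))  # 0.5, 1, 2

# Run the same Welch t-test (OJ vs VC) within each dose and keep
# the 95% confidence interval and the p-value
results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose     = d,
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2],
    p_value  = tt$p.value
  )
}))

results
```

An interval that excludes zero corresponds to a difference in means that is significant at the 5% level.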

4. State your conclusions and the assumptions needed for your conclusions.

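The p-values for the smaller dosages (0.5 and 1) are below 0.01 and their 95% confidence intervals exclude zero, whereas for the highest dosage (2) the p-value is 0.9639 and the interval contains zero. From that and from the graphs we can conclude that:
The type of supplement is relevant at smaller dosages, with orange juice (OJ) having a greater effect on tooth length than vitamin C (VC). At the highest dose there is no significant difference between the two types of supplementation.
These conclusions assume that the two supplement groups at each dose are independent samples, that tooth length is approximately normally distributed, and that the group variances need not be equal (the Welch t-test used above does not require equal variances).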
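
Because three tests are run on the same data set, one hedged robustness check (not part of the original analysis) is to adjust the reported p-values for multiple comparisons. The sketch below applies a Bonferroni correction with `p.adjust()` to the three p-values printed in section 3; the vector names are illustrative.

```r
# p-values copied from the three Welch t-tests, named by dose
p_raw <- c("0.5" = 0.006359, "1" = 0.001038, "2" = 0.9639)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

round(p_bonf, 4)
```

Under this adjustment the p-values for doses 0.5 and 1 remain below 0.05, so the conclusion about the smaller dosages does not change.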