Overview

In the second part of the project we will analyze the ToothGrowth data in the R datasets package.

1. Load the ToothGrowth data and perform some basic exploratory data analyses

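library(dplyr)    # provides %>%, group_by() and summarise() used below
library(ggplot2)  # provides ggplot() and the geoms used below
data(ToothGrowth)
df <- ToothGrowth
str(df)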

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

head(df)

Basic summary

summary(df)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
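
The summary shows 30 observations per supplement. A quick cross-tabulation, sketched here in base R, can confirm how those observations are spread across the three doses. The name `design` is only illustrative and is not part of the original analysis.

```r
# Cross-tabulate supplement by dose to count the observations in each cell
data(ToothGrowth)
design <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(design)
```

Each of the six supplement-by-dose cells holds 10 observations, so every per-dose comparison later in the report is between two groups of 10.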

Graphing the length distribution

df %>% ggplot(aes(x = len)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 3, color = "black", fill = "gray") +
  geom_vline(xintercept = mean(df$len), color = "red", linewidth = 1) +  # sample mean
  stat_function(fun = dnorm, args = list(mean = mean(df$len), sd = sd(df$len)), color = "blue", linewidth = 1) +  # fitted normal curve
  stat_density(geom = "line", color = "red", linewidth = 1)  # kernel density estimate

Let’s use the Shapiro-Wilk test of normality

stats::shapiro.test(df$len)

## 
##  Shapiro-Wilk normality test
## 
## data:  df$len
## W = 0.96743, p-value = 0.1091

Because the p-value (0.1091) is greater than 0.05, we fail to reject the null hypothesis of normality: the distribution of tooth length is not significantly different from a normal distribution, so we can reasonably assume normality.
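
The test above pools all 60 observations, while the t-tests later in the report are run within dose subgroups. As a rough complementary check, the same test could be applied to each supplement-by-dose group. This is a minimal base-R sketch, not part of the original analysis; `group_normality` and `shapiro_p` are illustrative names.

```r
data(ToothGrowth)

# Shapiro-Wilk p-value for each supplement-by-dose group
group_normality <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) shapiro.test(x)$p.value
)
names(group_normality)[3] <- "shapiro_p"
print(group_normality)
```

With so few observations per group the test has little power, so these p-values are best read as a sanity check rather than as proof of normality.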

2. Basic summary of the data

df %>% group_by(supp) %>% 
  summarise(Mean = mean(len))

df %>% group_by(dose) %>% 
  summarise(Mean = mean(len))

df %>% group_by(supp, dose) %>% 
  summarise(Mean = mean(len))
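
The summaries above report group means only. To judge how much the groups overlap, it can help to see the spread as well. The following base-R sketch (an assumption about what a fuller summary might look like, with `group_stats` as an illustrative name) adds the standard deviation for each supplement-by-dose group.

```r
data(ToothGrowth)

# Mean and standard deviation of tooth length per supplement-by-dose group
group_stats <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# aggregate() returns the two statistics as a matrix column; flatten it
# into ordinary columns named len.mean and len.sd
group_stats <- do.call(data.frame, group_stats)
print(group_stats)
```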

3. Using confidence intervals to compare tooth growth by supplement and dose

df %>% ggplot(aes(x = factor(supp), y = len)) + geom_boxplot(aes(fill = factor(dose)))

T-test for dose 0.5 mg:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == .5, ])

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

T-test for dose 1 mg:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 1, ])

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

T-test for dose 2 mg:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 2, ])

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
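
The three tests above differ only in the dose used to subset the data, so they can be collected into one table for side-by-side reading. This is a minimal sketch in base R; the names `results` and `ci_excludes_zero` are illustrative and not part of the original analysis.

```r
data(ToothGrowth)

doses <- sort(unique(ToothGrowth$dose))

# Run the Welch two-sample t-test (OJ vs VC) within each dose and keep
# the 95% confidence interval and p-value
results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose     = d,
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2],
    p_value  = tt$p.value
  )
}))

# Flag the doses whose interval for the difference in means excludes zero
results$ci_excludes_zero <- results$ci_lower > 0 | results$ci_upper < 0
print(results)
```

The flag is TRUE for doses 0.5 and 1 and FALSE for dose 2, matching the intervals printed above.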

4. State your conclusions and the assumptions needed for your conclusions.

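The p-values for the two smaller doses (0.5 and 1) are below 0.01, and their 95% confidence intervals for the difference in means (OJ minus VC) lie entirely above zero. For the highest dose (2), the p-value is 0.9639 and the confidence interval contains zero. From these results and from the graphs we can conclude that:
The type of supplement matters at the smaller doses, where orange juice (OJ) is associated with greater tooth length than vitamin C (VC). At the highest dose there is no significant difference between the two supplements.
These conclusions assume that tooth length is approximately normally distributed (consistent with the Shapiro-Wilk test above) and that the two supplement groups at each dose are independent samples. The Welch t-test used here does not assume equal variances.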
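
Because three tests were run on the same data set, a reader may want to see how the conclusions hold up under a multiple-comparison correction. The sketch below is an optional sensitivity check, not something the original analysis performed. It applies a Bonferroni adjustment to the three p-values exactly as printed in the t-test output; `p_raw` and `p_adjusted` are illustrative names.

```r
# p-values copied from the three Welch t-test outputs (doses 0.5, 1 and 2)
p_raw <- c(dose_0.5 = 0.006359, dose_1 = 0.001038, dose_2 = 0.9639)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_adjusted <- p.adjust(p_raw, method = "bonferroni")
print(round(p_adjusted, 4))
```

The adjusted values are roughly 0.019, 0.003 and 1, so at the 0.05 level the two smaller doses still show a significant difference and the highest dose still does not.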

Peer-graded Assignment: Statistical Inference Course Project - Part 2

Caio Hofmann Francisco Alves

28/05/2020
