Analysing the ToothGrowth data set

Introduction

In this project I will analyse the ToothGrowth data from the R datasets package.

The analysis is divided into four parts:

Load the ToothGrowth data and perform some basic exploratory data analyses
Provide a basic summary of the data.
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
State your conclusions and the assumptions needed for your conclusions.

1. Load data and provide exploratory data analysis

# First load the data
data(ToothGrowth)

# See how it is structured
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

# See first 5 rows
head(ToothGrowth, 5)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5

2. Basic summary of the data

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
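The summary shows 30 observations per supplement. A cross-tabulation is a quick way to confirm that the design is also balanced across doses. This is a minimal sketch; it reads a fresh copy of the data through `datasets::ToothGrowth` into `tg` (a name chosen here for illustration) so the original `ToothGrowth` object is not modified.

```r
# Fresh, unmodified copy of the data set (dose is still numeric here)
tg <- datasets::ToothGrowth

# Number of guinea pigs in each supplement/dose combination
counts <- table(supp = tg$supp, dose = tg$dose)
print(counts)
```

Each of the six cells contains 10 observations, which is the layout the group-wise t-tests later on rely on.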

2.1 See how tooth length varies with dose and type of supplement

First, load the libraries:

library(dplyr)

library(ggplot2)

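# Convert the dose column from numeric to a factor so it is treated as a category
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Mean tooth length for each supplement/dose combination
T.growth <- ToothGrowth %>% 
                group_by(supp, dose) %>% 
                    summarise(len = mean(len))
T.growth

# Box plots of tooth length by dose, one panel per supplement
# (only fill is mapped to dose, so the median lines stay visible)
ggplot(ToothGrowth, aes(dose, len, fill = dose)) + 
  geom_boxplot() + facet_grid(. ~ supp) + ggtitle("Tooth growth length related to dose") +
  xlab("Dose (mg/day)") + ylab("Tooth length")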
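`T.growth` stores only the group means. Adding the standard deviation and the group size makes it easier to judge whether the spread differs between groups, which is why the t-tests below use `var.equal = FALSE`. A sketch in base R with `aggregate()`, assuming a fresh copy of the data in the illustrative variable `tg`:

```r
tg <- datasets::ToothGrowth

# Mean, standard deviation and size of each supplement/dose group.
# FUN returns three values, so aggregate() stores them in a matrix column `len`.
group_stats <- aggregate(len ~ supp + dose, data = tg,
                         FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(group_stats)
```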

2.2 Before the t-test analysis, let's prepare the data

# Create 2 groups of data depending on the type of supplement
group_oj <- ToothGrowth$len[ToothGrowth$supp == 'OJ']
group_vc <- ToothGrowth$len[ToothGrowth$supp == 'VC']

# Separate data depending only on dose
group_dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
group_dose_1 <- ToothGrowth$len[ToothGrowth$dose == 1]
group_dose_2 <- ToothGrowth$len[ToothGrowth$dose == 2]

# Separate data depending on dose and type of supplement
group_oj_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == 'OJ']
group_oj_1 <- ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == 'OJ']
group_oj_2 <- ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == 'OJ']
group_vc_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == 'VC']
group_vc_1 <- ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == 'VC']
group_vc_2 <- ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == 'VC']
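The vectors above can also be built in a few lines with base R's `split()`, which returns a named list with one vector per group. A minimal alternative sketch; the names `tg`, `by_supp`, `by_dose` and `by_group` are illustrative:

```r
tg <- datasets::ToothGrowth

# One vector per supplement: names "OJ" and "VC"
by_supp <- split(tg$len, tg$supp)

# One vector per dose: names "0.5", "1" and "2"
by_dose <- split(tg$len, tg$dose)

# One vector per supplement/dose combination: names such as "OJ.0.5" or "VC.2"
by_group <- split(tg$len, list(tg$supp, tg$dose))
names(by_group)

# by_group[["OJ.0.5"]] holds the same values as group_oj_0.5
```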

3. Hypothesis testing

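General conditions (assumptions):

The data are assumed to be approximately normally distributed within each group
The sample is assumed to be randomly selected, and each guinea pig appears only once, so the groups are independent and unpaired tests are used
Only 60 guinea pigs were sampled, 10 for each supplement/dose combination
Variances are assumed to be unequal between groups (Welch t-test, var.equal = FALSE)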
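The normality assumption can be checked informally for each group. The sketch below applies the Shapiro-Wilk test (`shapiro.test()` from the stats package) to each of the six supplement/dose groups. This goes slightly beyond the class techniques, and with only 10 observations per group the test has little power, so a large p-value does not prove normality. `tg`, `groups` and `shapiro_p` are illustrative names.

```r
tg <- datasets::ToothGrowth

# Shapiro-Wilk p-value for each supplement/dose group (10 observations each)
groups <- split(tg$len, list(tg$supp, tg$dose))
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
round(shapiro_p, 3)
```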

3.1 Compare both supplements regardless of dose

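We test whether there is a difference between the two supplements, that is, whether one of them produces longer teeth than the other. The alternative hypothesis is that the mean length with OJ is greater than with VC.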

t.test(group_oj, group_vc, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group_oj and group_vc
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

Because the p-value (0.03032) is less than alpha = 0.05, we reject the null hypothesis that there is no difference in mean tooth length between the two supplements. The data therefore suggest that OJ results in greater tooth growth than VC.
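To make the mechanics of the Welch test concrete, the statistic, degrees of freedom, one-sided p-value and lower confidence bound can be computed by hand. This is a sketch using the standard Welch formulas, which are what `t.test()` applies when `var.equal = FALSE`; the results should agree with the printed output (t = 1.9153, df = 55.309, p-value = 0.03032). Variable names are illustrative.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each sample mean: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for H0: mu_OJ - mu_VC = 0
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value for H1: mu_OJ > mu_VC, and the matching 95% lower bound
p_value <- pt(t_stat, welch_df, lower.tail = FALSE)
lower_bound <- (mean(oj) - mean(vc)) - qt(0.95, welch_df) * se_diff

round(c(t = t_stat, df = welch_df, p = p_value, lower = lower_bound), 4)
```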

3.2 Compare doses regardless of supplement

Compare dose 0.5 against dose 1

t.test(group_dose_0.5, group_dose_1, paired = FALSE, alternative = "less", var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group_dose_0.5 and group_dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -6.753323
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

Because the p-value is smaller than alpha, we reject the null hypothesis that both doses have the same effect in favour of the alternative: dose 1 has a greater effect on tooth length than dose 0.5.

Compare dose 1 against dose 2

t.test(group_dose_1, group_dose_2, paired = FALSE, alternative = "less", var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group_dose_1 and group_dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -4.17387
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

Same result as in the previous case: the p-value is smaller than alpha (0.05), so we reject the null hypothesis and conclude that dose 2 has a greater effect than dose 1.
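Both dose comparisons follow the same pattern, so they can also be run in a loop. A sketch using the illustrative names `by_dose`, `dose_pairs` and `dose_p`; each pair is written as (lower dose, higher dose) so that `alternative = "less"` matches the two tests above.

```r
tg <- datasets::ToothGrowth
by_dose <- split(tg$len, tg$dose)

# Each pair is (lower dose, higher dose); H1: the lower dose gives shorter teeth
dose_pairs <- list(c("0.5", "1"), c("1", "2"))
dose_p <- sapply(dose_pairs, function(pair) {
  t.test(by_dose[[pair[1]]], by_dose[[pair[2]]],
         alternative = "less", var.equal = FALSE)$p.value
})
names(dose_p) <- sapply(dose_pairs, paste, collapse = " vs ")
dose_p
```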

3.3 Now let's compare both supplements at each dose

Compare OJ and VC for dose 0.5

t.test(group_oj_0.5, group_vc_0.5, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group_oj_0.5 and group_vc_0.5
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.34604     Inf
## sample estimates:
## mean of x mean of y 
##     13.23      7.98

Here the null hypothesis is rejected because of the small p-value, in favour of the alternative hypothesis: supplement OJ at dose 0.5 has a greater effect than supplement VC at the same dose.

Compare OJ and VC for dose 1

t.test(group_oj_1, group_vc_1, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group_oj_1 and group_vc_1
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  3.356158      Inf
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

This is similar to the dose 0.5 case: the p-value is small, so the null hypothesis is rejected in favour of the alternative hypothesis that supplement OJ at dose 1 has a greater effect than supplement VC at the same dose.

Compare OJ and VC for dose 2

t.test(group_oj_2, group_vc_2, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  group_oj_2 and group_vc_2
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -3.1335     Inf
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

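In this test the p-value (0.5181) is greater than alpha, so we cannot reject the null hypothesis: there is no evidence that OJ has a greater effect than VC at dose 2. The sample means are almost identical (26.06 vs 26.14), which is consistent with both supplements having a similar effect at this dose, although failing to reject the null hypothesis does not prove that the effects are equal.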
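The three per-dose comparisons can likewise be collected into one table. A minimal sketch (`supp_by_dose` is an illustrative name); the t statistics and p-values should match the three outputs above.

```r
tg <- datasets::ToothGrowth

# For each dose, test H1: mean length with OJ > mean length with VC
supp_by_dose <- sapply(c("0.5", "1", "2"), function(d) {
  at_dose <- tg$dose == as.numeric(d)
  res <- t.test(tg$len[at_dose & tg$supp == "OJ"],
                tg$len[at_dose & tg$supp == "VC"],
                alternative = "greater", var.equal = FALSE)
  c(t = unname(res$statistic), p = res$p.value, lower = res$conf.int[1])
})

# One row per dose; columns are t, p and the lower 95% bound
round(t(supp_by_dose), 4)
```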

4. Conclusion

Considering only doses, regardless of supplement, dose 0.5 has a smaller effect than dose 1, and dose 1 has a smaller effect than dose 2. Consequently, of the doses tested, dose 2 would be the best choice for increasing the tooth length of guinea pigs.
Considering each dose separately, supplement OJ produces greater tooth growth than VC at dose 0.5 and at dose 1. At dose 2, we cannot reject the null hypothesis that both supplements have the same effect.
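These conclusions rest on six separate tests at alpha = 0.05 with no correction for multiple comparisons. The sketch below recomputes the six p-values and adjusts them with `p.adjust()` (Bonferroni and Holm methods, both in the stats package). The helper `welch_p` and the row names are hypothetical, introduced here for illustration.

```r
tg <- datasets::ToothGrowth
len <- tg$len
supp <- tg$supp
dose <- tg$dose

# Hypothetical helper: p-value of a one-sided Welch two-sample t-test
welch_p <- function(x, y, alt) {
  t.test(x, y, alternative = alt, var.equal = FALSE)$p.value
}

# The six tests of section 3, in the order they were run
raw_p <- c(
  supp_all      = welch_p(len[supp == "OJ"], len[supp == "VC"], "greater"),
  dose_0.5_vs_1 = welch_p(len[dose == 0.5], len[dose == 1], "less"),
  dose_1_vs_2   = welch_p(len[dose == 1], len[dose == 2], "less"),
  supp_dose_0.5 = welch_p(len[supp == "OJ" & dose == 0.5], len[supp == "VC" & dose == 0.5], "greater"),
  supp_dose_1   = welch_p(len[supp == "OJ" & dose == 1], len[supp == "VC" & dose == 1], "greater"),
  supp_dose_2   = welch_p(len[supp == "OJ" & dose == 2], len[supp == "VC" & dose == 2], "greater")
)

# Raw and adjusted p-values side by side
adjusted <- cbind(raw        = raw_p,
                  bonferroni = p.adjust(raw_p, method = "bonferroni"),
                  holm       = p.adjust(raw_p, method = "holm"))
round(adjusted, 5)
```

Under Holm's method the overall OJ vs VC p-value is the fifth smallest of six and is multiplied by 2, giving about 0.061, so that comparison would no longer be significant at 0.05. The dose and per-dose supplement conclusions are unchanged.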

To make the hypothesis tests more reliable, the sample size should be increased: a larger sample would give more precise estimates of the variances and more power to detect other relationships.
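How much larger the sample should be can be estimated with `power.t.test()` from the stats package. The sketch below is purely illustrative: the target difference of 2 length units and the use of the dose-2 standard deviation as a planning value are assumptions, not results of this analysis. Note also that `power.t.test()` assumes equal variances, unlike the Welch tests above.

```r
tg <- datasets::ToothGrowth

# Assumption: the smallest difference worth detecting is 2 length units
target_delta <- 2
# Assumption: the standard deviation at dose 2 (both supplements pooled)
# is a reasonable planning value for the spread within a group
planning_sd <- sd(tg$len[tg$dose == 2])

# Per-group sample size for a one-sided two-sample t-test with 80% power
plan <- power.t.test(delta = target_delta, sd = planning_sd, sig.level = 0.05,
                     power = 0.80, type = "two.sample", alternative = "one.sided")
ceiling(plan$n)
```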

Statistical Inference Course Project. Part 2

Maximiliano

12/7/2020