Analysing the ToothGrowth data set

Introduction

In this project I analyze the ToothGrowth data from the R datasets package.

The analysis is divided into four parts:

  • Load the ToothGrowth data and perform some basic exploratory data analyses
  • Provide a basic summary of the data.
  • Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there's other approaches worth considering)
  • State your conclusions and the assumptions needed for your conclusions.

1. Load data and provide exploratory data analysis

# First load the data
data(ToothGrowth)

# See how it is structured
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# See first 5 rows
head(ToothGrowth, 5)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5

2. Basic summary of the data

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
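
The summary above does not show how the 60 observations are spread across the supplement/dose combinations. A minimal sketch of that check follows; the names `tg` and `cell_counts` are my own and are not part of the original analysis. It works on a local copy so the `ToothGrowth` object used in the rest of the report is left untouched.

```r
# Local copy of the data set (hypothetical name), leaving ToothGrowth itself unchanged
tg <- datasets::ToothGrowth

# Number of observations for each supplement/dose combination
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)
```

A balanced table here means every t-test below compares groups of equal size.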

2.1 See how the data is distributed depending on dose and type of supplement

First load libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
# convert dose column from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

T.growth <- ToothGrowth %>% 
                group_by(supp, dose) %>% 
                    summarise(len = mean(len))
# Boxplots of tooth length by dose, one panel per supplement
ggplot(ToothGrowth, aes(dose, len, fill = dose)) + 
  geom_boxplot(size = 1, aes(colour = dose)) + facet_grid(.~supp) + ggtitle("Tooth growth length by dose") +
  xlab("Dose (mg/day)") + ylab("Tooth length")
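
The boxplots can be complemented with the mean length of each group as a small table. The sketch below uses base R's `aggregate()` instead of dplyr; `tg` and `group_means` are names I introduce for illustration only.

```r
# Local copy of the data set (hypothetical name)
tg <- datasets::ToothGrowth

# Mean tooth length for each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(group_means)
```

These are the same group means that the t-tests in section 3.3 report as "mean of x" and "mean of y".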

2.2 Before the t.test analysis, let's prepare the data

# Create 2 groups of data depending on the type of supplement
group_oj <- ToothGrowth$len[ToothGrowth$supp == 'OJ']
group_vc <- ToothGrowth$len[ToothGrowth$supp == 'VC']

# Separate the data by dose only
group_dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
group_dose_1 <- ToothGrowth$len[ToothGrowth$dose == 1]
group_dose_2 <- ToothGrowth$len[ToothGrowth$dose == 2]

# Separate data depending on dose and type of supplement
group_oj_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == 'OJ']
group_oj_1 <- ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == 'OJ']
group_oj_2 <- ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == 'OJ']
group_vc_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == 'VC']
group_vc_1 <- ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == 'VC']
group_vc_2 <- ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == 'VC']
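
As an alternative to creating one variable per group by hand, the same six supplement/dose groups could be built in a single call. This is only a sketch: `tg` and `groups` are hypothetical names, and the rest of the report keeps using the `group_*` vectors defined above.

```r
# Local copy of the data set (hypothetical name)
tg <- datasets::ToothGrowth

# split() returns a named list with one numeric vector per supplement/dose cell;
# the names are built as "<supp>.<dose>", for example "OJ.0.5"
groups <- split(tg$len, list(tg$supp, tg$dose))

names(groups)

# Individual groups are then accessed by name
groups[["OJ.0.5"]]
```

Keeping the groups in a list makes it easier to loop over them than with separate variables.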

3. Hypothesis testing

General conditions:

  • Data is considered normally distributed
  • Sample is considered randomly selected
  • Only 60 guinea pigs were sampled
  • Variances are considered unequal

3.1 Compare both supplements independently from their doses

We test whether there is a difference between the supplements, that is, whether one produces greater tooth growth than the other.

t.test(group_oj, group_vc, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group_oj and group_vc
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

Because the p-value (0.030) is less than alpha = 0.05, we reject the null hypothesis that the mean tooth length is the same for both supplements. The data therefore suggest that OJ results in greater tooth growth than VC when all doses are pooled.
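
To make clear what the Welch test computes, the statistic, degrees of freedom, one-sided p-value and lower confidence bound can be worked out by hand. This is a sketch with variable names of my own (`tg`, `oj`, `vc`, `se2_*`, `t_stat`, `df_welch`, `p_value`, `ci_lower`); it should reproduce the values printed by `t.test()` above.

```r
# Local copy of the data set (hypothetical name)
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error contributed by each group: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative mean(OJ) > mean(VC)
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)

# Lower bound of the one-sided 95 percent confidence interval
ci_lower <- (mean(oj) - mean(vc)) - qt(0.95, df_welch) * sqrt(se2_oj + se2_vc)

c(t = t_stat, df = df_welch, p = p_value, lower = ci_lower)
```

Because the variances are not pooled, the degrees of freedom come out as a non-integer value below n1 + n2 - 2 = 58.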

3.2 Compare only doses independently from the supplement

Compare dose 0.5 against dose 1

t.test(group_dose_0.5, group_dose_1, paired = FALSE, alternative = "less", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group_dose_0.5 and group_dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -6.753323
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

Because the p-value is smaller than alpha = 0.05, we reject the null hypothesis that both doses have the same effect in favour of the alternative: dose 1 has a greater effect on tooth length than dose 0.5.

Compare dose 1 against dose 2

t.test(group_dose_1, group_dose_2, paired = FALSE, alternative = "less", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group_dose_1 and group_dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -4.17387
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

Same result as in the previous case: the p-value is smaller than alpha (0.05), so we reject the null hypothesis and conclude that dose 2 has a greater effect than dose 1.

3.3 Now let's compare both supplements at each dose

Compare OJ and VC for dose 0.5

t.test(group_oj_0.5, group_vc_0.5, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group_oj_0.5 and group_vc_0.5
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.34604     Inf
## sample estimates:
## mean of x mean of y 
##     13.23      7.98

Here the small p-value leads us to reject the null hypothesis in favour of the alternative: at dose 0.5, supplement OJ has a greater effect than supplement VC.

Compare OJ and VC for dose 1

t.test(group_oj_1, group_vc_1, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group_oj_1 and group_vc_1
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  3.356158      Inf
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

This case is similar to the one with dose 0.5. The p-value is small, so we reject the null hypothesis in favour of the alternative: at dose 1, supplement OJ has a greater effect than supplement VC.

Compare OJ and VC for dose 2

t.test(group_oj_2, group_vc_2, paired = FALSE, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  group_oj_2 and group_vc_2
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -3.1335     Inf
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

In this test the p-value (0.518) is greater than alpha, so we cannot reject the null hypothesis. The data give no evidence that OJ has a greater effect than VC at dose 2.
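
The three per-dose comparisons above repeat the same call with different vectors. A compact sketch of the same tests, using the formula interface of `t.test()` on each dose subset, is shown below; `tg`, `by_dose` and `p_values` are hypothetical names. It relies on `supp` having the levels OJ and VC in that order, so `alternative = "greater"` tests OJ against VC as before.

```r
# Local copy of the data set (hypothetical name)
tg <- datasets::ToothGrowth

# One Welch test per dose level; within each subset, len is compared across supp
by_dose <- lapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d, alternative = "greater", var.equal = FALSE)
})

# Collect the one-sided p-values in a vector named by dose level
p_values <- sapply(by_dose, function(res) res$p.value)
print(p_values)
```

The resulting p-values should match the three outputs shown in this section.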

4. Conclusion

  • Considering only dose and not supplement, dose 0.5 has a smaller effect than dose 1, and dose 1 has a smaller effect than dose 2. Of the doses tested, dose 2 is therefore the best choice for increasing the length of guinea pig teeth.
  • Considering the supplements at each dose, OJ is better than VC at dose 0.5 and at dose 1. At dose 2, we cannot reject the hypothesis that both supplements have the same effect.

To strengthen the hypothesis tests, the sample size should be increased, which would give more precise variance estimates and might reveal other relationships.