Part 2: Analyze the ToothGrowth data in the R datasets package

Overview: This section analyses the ToothGrowth data set using only the statistical methods covered in class.

Questions 1 + 2: Load the ToothGrowth data and perform basic exploratory data analyses, and a basic summary of the data

Load the data and packages, then plot a basic chart to visualise the data.

library(ggplot2)
library(datasets)
colnames(ToothGrowth)
## [1] "len"  "supp" "dose"
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
# Basic plot
qplot(dose, len, data = ToothGrowth, 
      col = supp, 
      main = "Tooth growth of guinea pigs by supplement type and dosage (mg)", 
      xlab = "Dosage (mg)", 
      ylab = "Tooth length")

Use box plots to better see the differences between the different supplements.

qplot(supp, len, data = ToothGrowth, 
      facets = ~dose, 
      main = "Tooth growth of guinea pigs by supplement type and dosage (mg)", 
      xlab = "Supplement type", 
      ylab = "Tooth length") + 
        geom_boxplot(aes(fill = supp))

OJ generally performs better than VC. Increasing the dosage (from 0.5 to 1 to 2 mg) increased tooth length for both supplement types.
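To put numbers on what the box plots suggest, the mean tooth length per group can be tabulated. This is a minimal sketch using base R's `aggregate`; the name `group_means` is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Mean tooth length for each supplement/dose combination.
# group_means is an illustrative name, not part of the original analysis.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each row of the result is one supplement/dose cell, so the six means can be read side by side with the box plots.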

Question 3: Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose

Comparing tooth length by supplement and dose

Hypothesis 1

Null hypothesis: there is no difference in tooth growth given OJ or VC. Alternative hypothesis: tooth growth is greater when using OJ than VC.

VC.length <- ToothGrowth$len[ToothGrowth$supp == "VC"]
OJ.length <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

Welch's t-test: use the two vectors defined above to perform a two-sample t-test that does not assume equal variances.

t.test(OJ.length, VC.length, 
       alternative = "greater", # alternative: the OJ mean is *greater* than the VC mean
       paired = FALSE, # the data points are not paired with each other
       var.equal = FALSE, # do not assume equal variances (Welch test)
       conf.level = 0.95) # 95% confidence level; this is also the default if nothing else is specified
## 
##  Welch Two Sample t-test
## 
## data:  OJ.length and VC.length
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

The p-value for this comparison is about 0.03, below the 5% significance level, so we reject the null hypothesis. In other words: if the null hypothesis (no difference in growth between OJ and VC) were true, the chance of observing a difference at least as large as the one in the data would be about 3%, which is too low to be plausible. We therefore reject the null hypothesis in favour of the alternative: OJ has a greater impact on tooth growth than VC.
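To make clear where the reported t statistic, degrees of freedom, and p-value come from, the Welch test can be reproduced by hand. This is a sketch under the same settings as the call above (unpaired, unequal variances, one-sided); the names `v_x`, `v_y`, `t_stat`, `df_welch`, and `p_value` are illustrative and not part of the original analysis.

```r
library(datasets)

OJ.length <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC.length <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Estimated variance of each sample mean (s^2 / n); variances are not pooled
v_x <- var(OJ.length) / length(OJ.length)
v_y <- var(VC.length) / length(VC.length)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(OJ.length) - mean(VC.length)) / sqrt(v_x + v_y)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v_x + v_y)^2 /
  (v_x^2 / (length(OJ.length) - 1) + v_y^2 / (length(VC.length) - 1))

# One-sided p-value for the alternative "OJ mean is greater than VC mean"
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The three values should agree with the `t`, `df`, and `p-value` lines in the `t.test` output.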

Hypothesis 2

For this case, the null hypothesis is that there is no difference in tooth growth between doses. The alternative hypothesis is that tooth growth is greater at the higher dose.

dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == "0.5"]
dose_1   <- ToothGrowth$len[ToothGrowth$dose == "1"]
dose_2   <- ToothGrowth$len[ToothGrowth$dose == "2"]

Step 1: perform a t-test between dose_0.5 and dose_1

t.test(dose_0.5, dose_1, 
       alternative = "less", # alternative: dose_0.5 has a smaller mean than dose_1
       paired = FALSE, # the data points are not paired 
       var.equal = FALSE, # do not assume equal variances (Welch test)
       conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dose_0.5 and dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -6.753323
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

The p-value is very small (6.342e-08), so we reject the null hypothesis for the comparison of dose_0.5 and dose_1: the mean tooth length at 0.5 mg is lower than at 1 mg.

Step 2: perform a t-test between dose_1 and dose_2

t.test(dose_1, dose_2, 
       alternative = "less", # alternative: dose_1 has a smaller mean than dose_2
       paired = FALSE, 
       var.equal = FALSE, 
       conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -4.17387
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

Here too, the p-value is very small (9.532e-06), so we reject the null hypothesis. Taken together, the two tests indicate that the higher the dosage, the more the teeth grow.
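The two dose comparisons above follow the same pattern, so they can be collected into one small table. The sketch below assumes the same test settings as the calls above; `len_by_dose`, `compare_doses`, and `dose_results` are hypothetical names introduced only for this illustration.

```r
library(datasets)

# Split tooth length by dose; as.character() makes this work whether dose
# is still numeric or has already been converted to a factor.
len_by_dose <- split(ToothGrowth$len, as.character(ToothGrowth$dose))

# compare_doses is a hypothetical helper: one-sided Welch test of the
# alternative that the lower dose has a smaller mean than the higher dose.
compare_doses <- function(low, high) {
  res <- t.test(len_by_dose[[low]], len_by_dose[[high]],
                alternative = "less", paired = FALSE,
                var.equal = FALSE, conf.level = 0.95)
  data.frame(low = low, high = high,
             mean_diff = mean(len_by_dose[[low]]) - mean(len_by_dose[[high]]),
             upper_bound = res$conf.int[2],  # upper end of the one-sided 95% interval
             p_value = res$p.value)
}

dose_results <- rbind(compare_doses("0.5", "1"),
                      compare_doses("1", "2"))
print(dose_results)
```

A negative `upper_bound` in both rows means the whole one-sided interval for the difference lies below zero, which matches the conclusion drawn from the p-values.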

Question 4: State your conclusions and the assumptions needed for your conclusions

  • Increasing the dosage from 0.5 to 1 mg and from 1 to 2 mg increases tooth length; both one-sided tests reject the null hypothesis at the 5% significance level.
  • Giving the supplement OJ (orange juice) increases tooth length more than giving VC (vitamin C); the one-sided test rejects the null hypothesis at the 5% significance level (p = 0.03).

Assumptions made are that this sample is representative of the population in question, that the assignment to categories was random, and that the distribution of the means is normal.