Basic Inferential Data Analysis

by Sandy Sng
16 May 2018

1) Load the ToothGrowth data and perform some basic exploratory data analyses.

library(datasets)
data(ToothGrowth)
?ToothGrowth
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

2) Provide a basic summary of the data.

Description: The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

Format: A data frame with 60 observations on 3 variables.

[,1]  len   numeric  Tooth length
[,2]  supp  factor   Supplement type (VC or OJ)
[,3]  dose  numeric  Dose in milligrams/day

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
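The overall summary does not show how tooth length varies across the six supplement/dose groups. As a minimal sketch (the name group_means is illustrative and not part of the original analysis), base R's aggregate() can tabulate the mean length per group:

```r
library(datasets)
data(ToothGrowth)

# Mean tooth length for each supplement/dose combination (6 groups of 10 animals).
# `group_means` is an illustrative name, not part of the original report.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each row of the result is one supplement/dose combination, which makes it easier to see where the two supplements differ before running any tests.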

3) Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
qplot(ToothGrowth$len, ToothGrowth$dose, color = ToothGrowth$supp, geom = c("point", "smooth"))
## `geom_smooth()` using method = 'loess'

Note that the grey band around each smoothed curve indicates its 95% confidence interval.
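qplot() is deprecated in recent ggplot2 releases, so the same data can also be drawn with ggplot() directly. The sketch below is an alternative view rather than a reproduction of the plot above: it treats dose as a category on the horizontal axis and draws one box per supplement. The names plot_data, dose_f and p are illustrative assumptions.

```r
library(datasets)
library(ggplot2)

# Treat dose as a categorical variable so each dose level gets its own boxes.
# `plot_data`, `dose_f` and `p` are illustrative names, not from the original report.
plot_data <- transform(ToothGrowth, dose_f = factor(dose))

p <- ggplot(plot_data, aes(x = dose_f, y = len, fill = supp)) +
  geom_boxplot() +
  labs(x = "Dose (mg/day)", y = "Tooth length", fill = "Supplement")
print(p)
```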

Using t.test, we check whether mean tooth length differs between treatments (by supplement and by dose). For each comparison, the null hypothesis is that the two group means are equal. We reject the null hypothesis if the p-value is below 0.05, and fail to reject it otherwise.

# Comparing tooth growth by supplement (supp)
t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"], ToothGrowth$len[ToothGrowth$supp == "VC"])
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

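Since the p-value (0.06063) is greater than 0.05, and the 95% confidence interval (-0.171 to 7.571) contains 0, we fail to reject the null hypothesis. The data do not show a statistically significant difference in mean tooth length between the two supplement types (OJ and VC).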
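To make the Welch test less of a black box, its statistic, degrees of freedom, p-value and confidence interval can be recomputed by hand. This is a sketch using the standard Welch formulas; the variable names (oj, vc, se2_oj and so on) are illustrative.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / se_diff
df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se_diff

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

These values should agree with the t.test output above (t = 1.9153, df = 55.309, p-value = 0.06063).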

# Comparing tooth growth by dosage (dose) -- for levels 0.5-1, 1-2, 0.5-2
t.test(ToothGrowth$len[ToothGrowth$dose == 1], ToothGrowth$len[ToothGrowth$dose == 0.5], var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 1] and ToothGrowth$len[ToothGrowth$dose == 0.5]
## t = 6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276252 11.983748
## sample estimates:
## mean of x mean of y 
##    19.735    10.605
t.test(ToothGrowth$len[ToothGrowth$dose == 2], ToothGrowth$len[ToothGrowth$dose == 1], var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 2] and ToothGrowth$len[ToothGrowth$dose == 1]
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.735613 8.994387
## sample estimates:
## mean of x mean of y 
##    26.100    19.735
t.test(ToothGrowth$len[ToothGrowth$dose == 2], ToothGrowth$len[ToothGrowth$dose == 0.5], var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 2] and ToothGrowth$len[ToothGrowth$dose == 0.5]
## t = 11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83648 18.15352
## sample estimates:
## mean of x mean of y 
##    26.100    10.605

All three t-tests give p-values below 0.05:

  • 0.5 mg/day vs 1 mg/day of vitamin C: p-value = 1.266e-07
  • 1 mg/day vs 2 mg/day of vitamin C: p-value = 1.811e-05
  • 0.5 mg/day vs 2 mg/day of vitamin C: p-value = 2.838e-14

In each case we reject the null hypothesis of equal means, and none of the 95% confidence intervals contains 0. Mean tooth length increases with dose in all three comparisons (i.e. higher dosages give more growth). Because these tests pool both supplement types, they show the dose effect for the data as a whole rather than separately for OJ and VC.
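The dose comparisons above combine both supplement types. One way to check whether the dose effect holds for each supplement is to repeat the 0.5 vs 2 mg/day test within each supplement group. This is a sketch using the same t.test call with var.equal = TRUE; the name p_by_supp is illustrative.

```r
library(datasets)

# p-value of the 2 mg/day vs 0.5 mg/day comparison, computed separately
# for each supplement type. `p_by_supp` is an illustrative name.
p_by_supp <- sapply(levels(ToothGrowth$supp), function(s) {
  sub <- ToothGrowth[ToothGrowth$supp == s, ]
  t.test(sub$len[sub$dose == 2],
         sub$len[sub$dose == 0.5],
         var.equal = TRUE)$p.value
})
print(p_by_supp)
```

Each subgroup has only 10 animals per dose, so these tests have less power than the pooled ones.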

4) State your conclusions and the assumptions needed for your conclusions.

Conclusion:

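  • Based on the plot above, tooth length increases as the dosage increases, for both supplement methods
  • Based on the t-tests, higher dosages give significantly more growth (p-value < 0.05 for all three dose comparisons, with both supplement methods pooled)
  • The difference between supplement methods is not statistically significant at the 5% level (p-value = 0.06063)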

Assumptions:

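  • The observations are independent and the groups are unpaired (each guinea pig received a single supplement and dose)
  • Variances are equal within each pair of dose groups compared (var.equal = TRUE); the supplement comparison uses the Welch test, which does not assume equal variances
  • The sample is representative of the population
  • The sample means are approximately normally distributed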
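The equal-variance assumption can be examined informally. The sketch below prints the sample variance of tooth length at each dose, then reruns one dose comparison with and without var.equal = TRUE to see whether the conclusion depends on it. The names dose_var, pooled and welch are illustrative.

```r
library(datasets)

# Sample variance of tooth length within each dose level
dose_var <- tapply(ToothGrowth$len, ToothGrowth$dose, var)
print(dose_var)

# Same comparison (1 mg/day vs 0.5 mg/day) under both variance assumptions
x <- ToothGrowth$len[ToothGrowth$dose == 1]
y <- ToothGrowth$len[ToothGrowth$dose == 0.5]
pooled <- t.test(x, y, var.equal = TRUE)  # equal variances, as in the report
welch  <- t.test(x, y)                    # Welch test, no equal-variance assumption

print(c(pooled = pooled$p.value, welch = welch$p.value))
```

If the two p-values lead to the same decision, the conclusion for that comparison does not hinge on the equal-variance assumption.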