Statistical Inference - Basic Inferential Data Analysis

Load packages

library(ggplot2)
library(datasets)
library(gridExtra)
data(ToothGrowth)
attach(ToothGrowth)

Description of the data set

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The dataset has 60 observations and 3 variables (len, supp, and dose). The first six rows give an overview of what the data look like:

head(ToothGrowth)

Exploratory Data Analysis

ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~dose) +
  labs(title = "Tooth Growth by Supplement and Dosage", x = "Supplement type", y = "Tooth Length")

According to the chart, tooth length increases with dosage. OJ is also associated with greater tooth growth than VC, except at the highest dosage (2.0 mg), where the two supplements look similar.
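The pattern in the chart can also be read from the group means. The lines below are a minimal sketch using base R only; the name group_means is ours and is not part of the original analysis.

```r
# Mean tooth length for each supplement/dose combination
data(ToothGrowth)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each of the six rows is one supplement/dose cell, so the OJ and VC means can be compared dose by dose.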

Hypothesis test

We assume that the observations are independent and that tooth length is approximately normally distributed within each group, and we set alpha to 5%. Each hypothesis is tested with the t.test function, which also reports the 95% confidence interval for the difference in means.
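The normality assumption is stated here but not checked. One possible check, which is our addition and not part of the original analysis, is a Shapiro-Wilk test in each supplement/dose cell. With only 10 observations per cell the test has little power, so it should be read as a rough screen rather than as proof of normality. The name normality_p is hypothetical.

```r
# Shapiro-Wilk p-value for each supplement/dose cell (2 x 3 matrix)
data(ToothGrowth)
normality_p <- tapply(
  ToothGrowth$len,
  list(supp = ToothGrowth$supp, dose = ToothGrowth$dose),
  function(x) shapiro.test(x)$p.value
)
print(round(normality_p, 3))
```

Small p-values would flag cells where the normal assumption is doubtful.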

1. Inference test with supplement delivery variable (supp)

State Hypothesis

Null hypothesis: mean lenOJ = mean lenVC
Alternative hypothesis: mean lenOJ > mean lenVC

Prepare the data set

OJ = subset(ToothGrowth, supp %in% c("OJ"))
VC = subset(ToothGrowth, supp %in% c("VC"))

t.test(len ~ supp, data = ToothGrowth, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The p-value (0.03032) is less than 5%, and the one-sided 95% CI (0.47, Inf) excludes 0, so we reject the null hypothesis: mean tooth length is greater with OJ than with VC.
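To show where the reported t statistic, degrees of freedom, p-value, and confidence bound come from, the Welch test for OJ versus VC can be reproduced by hand. This is a sketch for illustration; all variable names (oj, vc, se2_oj, and so on) are ours.

```r
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error contributed by each group
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC)
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value and lower 95% confidence bound (alternative = "greater")
p_one_sided <- pt(t_stat, df_welch, lower.tail = FALSE)
ci_lower <- mean_diff - qt(0.95, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_one_sided, lower = ci_lower))
```

These values should agree with the t.test output above, since no equal-variance assumption is made in either calculation.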

2. Inference test with supplement dosage variable (dose)

State Hypothesis

Null hypothesis: Mean tooth length is the same at the two dose levels being compared
Alternative hypothesis: Mean tooth length differs between the two dose levels being compared

Prepare the dose for analysis

dose05_10 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
dose05_20 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
dose10_20 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))

For “dose05_10”

t.test(len ~ dose, paired = F, var.equal = F, data = dose05_10)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

The p-value is less than 5% and the 95% CI (-11.98, -6.28) excludes 0, so we reject the null hypothesis.

For “dose05_20”

t.test(len ~ dose, paired = F, var.equal = F, data = dose05_20)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

The p-value is less than 5% and the 95% CI (-18.16, -12.83) excludes 0, so we reject the null hypothesis.

For “dose10_20”

t.test(len ~ dose, paired = F, var.equal =F, data = dose10_20)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

The p-value is less than 5% and the 95% CI (-9.00, -3.73) excludes 0, so we reject the null hypothesis.

In conclusion, mean tooth length differs significantly between every pair of dose levels, and it increases with dosage.
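Because three pairwise tests are run on the same data, one might also adjust the p-values for multiple comparisons. The sketch below repeats the three Welch tests in a loop and applies a Bonferroni correction with p.adjust. The correction is our addition, not part of the original analysis, and the names dose_pairs, raw_p, and adj_p are hypothetical.

```r
data(ToothGrowth)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Two-sided Welch t-test p-value for each pair
raw_p <- sapply(dose_pairs, function(pair) {
  two_doses <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ dose, data = two_doses)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Bonferroni adjustment: each p-value is multiplied by 3 (capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(adj_p)
```

Bonferroni is the most conservative common choice; if the adjusted p-values stay below 5%, the conclusion about dosage does not depend on the number of comparisons.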

3. Inference test with supplement delivery within dose levels

Prepare the data

dose05 <- subset(ToothGrowth, dose %in% c(0.5))
dose10 <- subset(ToothGrowth, dose %in% c(1.0))
dose20 <- subset(ToothGrowth, dose %in% c(2.0))

For “dose05”

t.test(len ~ supp, paired = F, var.equal = F, data = dose05)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

The p-value is less than 5% and the 95% CI (1.72, 8.78) excludes 0, so we reject the null hypothesis: at 0.5 mg, mean tooth length differs between OJ and VC.

For “dose10”

t.test(len ~ supp, paired = F, var.equal = F, data = dose10)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

The p-value is less than 5% and the 95% CI (2.80, 9.06) excludes 0, so we reject the null hypothesis: at 1.0 mg, mean tooth length differs between OJ and VC.

For “dose20”

t.test(len ~ supp, paired = F, var.equal = F, data = dose20)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

The p-value (0.9639) is greater than 5% and the 95% CI (-3.80, 3.64) contains 0, so we fail to reject the null hypothesis: at a dosage of 2.0 mg there is no evidence that the supplement delivery method affects tooth length.
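The three within-dose comparisons can also be run in one pass, which puts the confidence intervals and p-values side by side. This is a sketch using split and sapply; the names by_dose and supp_by_dose are ours.

```r
data(ToothGrowth)

# One data frame per dose level, named "0.5", "1", and "2"
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Two-sided Welch t-test of OJ vs VC within each dose level
supp_by_dose <- t(sapply(by_dose, function(d) {
  res <- t.test(len ~ supp, data = d)
  c(ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2],
    p_value  = res$p.value)
}))
print(round(supp_by_dose, 4))
```

Each row is one dose level, so it is easy to see at which dosages the interval for the OJ minus VC difference excludes 0.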

Conclusion

Increasing the supplement dosage is associated with greater tooth length.
Delivery method matters at the lower dosages: OJ is associated with greater tooth length than VC at 0.5 mg and 1.0 mg. At a dosage of 2.0 mg there is no significant difference between the two delivery methods.

Statistical Inference - Basic Inferential Data Analysis - Johns Hopkins

Hoang Lam (Nancy Lam)

December 21, 2018