Statistical Inference Project Part 2

Overview:

We will analyze the ToothGrowth data in the R datasets package. As part of the analysis, we will

Perform basic exploratory analyses
Summarize the data
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

Loading the data and extracting its basic properties.

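library(datasets)
data("ToothGrowth")

# Basic properties of the data set, used in the description below
varlen <- length(ToothGrowth)        # number of columns (variables) of the data frame
colnames <- colnames(ToothGrowth)    # variable names
obsv <- length(ToothGrowth$supp)     # number of observations (rows)
var1 <- colnames(ToothGrowth)[1]
var2 <- colnames(ToothGrowth)[2]
var3 <- colnames(ToothGrowth)[3]

# Distinct supplement types and dose levels
unisupp <- unique(ToothGrowth$supp)
unidose <- unique(ToothGrowth$dose)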
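A quick way to check the layout of the experiment is to cross-tabulate the two grouping variables. This is a minimal base-R sketch; the variable name `design` is illustrative and not part of the original analysis.

```r
# Cross-tabulate delivery method against dose level
# (`design` is an illustrative name, not from the original analysis)
data("ToothGrowth", package = "datasets")
design <- with(ToothGrowth, table(supp, dose))
print(design)
```

Every supp and dose combination should contain 10 observations, i.e. a balanced 2 by 3 design with 60 observations in total.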

Basic exploratory analysis and data summary.

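The dataset describes the effect of vitamin C on tooth growth in guinea pigs. According to the current R documentation for ToothGrowth, each of the 60 animals received one of three doses of vitamin C by one of two delivery methods.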

The data frame has 60 observations of 3 variables: len, supp and dose.

The first variable, len, is the length of the odontoblasts (the cells responsible for tooth growth) and is numeric.

The second variable, supp, is the supplement delivery method: a factor with two levels, VC (ascorbic acid) and OJ (orange juice).

The third variable, dose, is the vitamin C dose in mg/day. It is numeric but takes only 3 values: 0.5, 1 and 2.

The summary statistics of each variable are:

knitr::kable(summary(ToothGrowth))

len             supp     dose
Min.   : 4.20   OJ:30    Min.   :0.500
1st Qu.:13.07   VC:30    1st Qu.:0.500
Median :19.25            Median :1.000
Mean   :18.81            Mean   :1.167
3rd Qu.:25.27            3rd Qu.:2.000
Max.   :33.90            Max.   :2.000
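The overall summary hides how the groups differ. As a sketch using only base R, the mean tooth length can be tabulated for every supp and dose combination; `cell_means` is an illustrative name.

```r
# Mean tooth length for every supp/dose combination
# (rows: supp, columns: dose; `cell_means` is an illustrative name)
data("ToothGrowth", package = "datasets")
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(round(cell_means, 2))

# Every cell holds 10 observations, so averaging the cell means
# reproduces the marginal means by supplement and by dose
print(rowMeans(cell_means))
print(colMeans(cell_means))
```

The marginal means printed here should match the sample means reported by the t-tests further down.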

We will explore the relationship between dose and tooth length for each supplement type.

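library(ggplot2)
# Boxplots of tooth length per dose level, one panel per supplement type (qplot() is deprecated)
ggplot(ToothGrowth, aes(factor(dose), len, fill = factor(dose))) + geom_boxplot() + geom_point() +
  facet_wrap(~supp) + labs(title = "Tooth growth by Dosage", x = "Dosage Levels", y = "Tooth length")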

From the plot “Tooth growth by Dosage”, tooth length increases with dose for both supplement types, i.e. tooth length is positively correlated with dose.
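The visual impression of a positive relationship can be quantified with a correlation coefficient. This is a sketch; note that Pearson's r only measures linear association, and dose takes just three values here. The names `overall_r` and `by_supp_r` are illustrative.

```r
# Pearson correlation between dose (as a number) and tooth length,
# overall and within each delivery method (variable names are illustrative)
data("ToothGrowth", package = "datasets")
overall_r <- with(ToothGrowth, cor(dose, len))
by_supp_r <- sapply(split(ToothGrowth, ToothGrowth$supp),
                    function(d) cor(d$dose, d$len))
print(overall_r)
print(by_supp_r)
```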

In the next plot, we explore the relationship between supplement delivery type and tooth length.

library(ggplot2)
ggplot(ToothGrowth, aes(supp, len, fill = supp)) + geom_boxplot() + geom_point() +
  labs(title = "Tooth growth by Supplement Delivery", x = "Supplement Delivery Type", y = "Tooth length")

From the plot “Tooth growth by Supplement Delivery”, tooth length is typically higher with OJ than with VC, which suggests that OJ may be the more effective delivery method. The t-test below checks whether this difference is statistically significant.
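The boxplot comparison can be backed by the corresponding numbers. A minimal sketch computing the median and mean tooth length per delivery method (the variable names are illustrative):

```r
# Median and mean tooth length per delivery method
# (variable names are illustrative, not from the original analysis)
data("ToothGrowth", package = "datasets")
med_by_supp  <- with(ToothGrowth, tapply(len, supp, median))
mean_by_supp <- with(ToothGrowth, tapply(len, supp, mean))
print(med_by_supp)
print(mean_by_supp)
```

A higher median or mean for OJ is only descriptive; whether the gap is larger than sampling noise is what the hypothesis test has to decide.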

Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose.

Confidence interval: Tooth growth by supplement delivery type.

The VC group occupies the first 30 rows and the OJ group the last 30 rows, so we separate the two groups. The groups are independent (not paired), and we do not assume equal variances, so we use Welch's two-sample t-test.
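To make the Welch test less of a black box, the sketch below computes its statistic by hand: the difference in means divided by the unpooled standard error, with the Welch-Satterthwaite degrees of freedom. The helper variable names are illustrative; the result should agree with `t.test()`.

```r
# Manual Welch two-sample t-test (variable names are illustrative)
data("ToothGrowth", package = "datasets")
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Squared standard error of each group mean: s^2 / n
se2_vc <- var(vc) / length(vc)
se2_oj <- var(oj) / length(oj)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(vc) - mean(oj)) / sqrt(se2_vc + se2_oj)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_vc + se2_oj)^2 /
  (se2_vc^2 / (length(vc) - 1) + se2_oj^2 / (length(oj) - 1))

# Two-sided p-value from the t distribution
p_welch <- 2 * pt(-abs(t_stat), df_welch)
print(c(t = t_stat, df = df_welch, p = p_welch))
```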

vcgrp <- ToothGrowth$len[ToothGrowth$supp == "VC"]  # rows 1-30
ojgrp <- ToothGrowth$len[ToothGrowth$supp == "OJ"]  # rows 31-60

# Run the Welch test once and extract the confidence interval and p-value
ttsupp <- t.test(vcgrp, ojgrp, paired = FALSE, var.equal = FALSE)
conflevel <- ttsupp$conf.int
pval <- ttsupp$p.value
ttsupp

## 
##  Welch Two Sample t-test
## 
## data:  vcgrp and ojgrp
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5710156  0.1710156
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333

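The p-value of 0.0606 is above the 0.05 significance level, and the 95% confidence interval for the difference in means (VC minus OJ), [-7.57, 0.17], contains 0. We therefore cannot reject the null hypothesis that both delivery methods give the same mean tooth length.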
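The link between the interval and the test decision can be made explicit. This sketch rebuilds the 95% Welch interval as estimate plus or minus a t quantile times the standard error, then checks whether it contains 0; for a two-sided test at the 5% level, rejecting the null hypothesis is equivalent to the interval excluding 0. Variable names are illustrative.

```r
# 95% Welch confidence interval for mean(VC) - mean(OJ), built by hand
# (variable names are illustrative)
data("ToothGrowth", package = "datasets")
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

se2_vc <- var(vc) / length(vc)
se2_oj <- var(oj) / length(oj)
se <- sqrt(se2_vc + se2_oj)
df_welch <- (se2_vc + se2_oj)^2 /
  (se2_vc^2 / (length(vc) - 1) + se2_oj^2 / (length(oj) - 1))

# Estimate +/- t quantile * standard error
ci <- (mean(vc) - mean(oj)) + c(-1, 1) * qt(0.975, df_welch) * se
print(ci)

# H0 is rejected at the 5% level only if the interval excludes 0
contains_zero <- ci[1] < 0 && ci[2] > 0
print(contains_zero)
```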

Confidence intervals: Tooth growth by dose level.

We run Welch t-tests between each pair of the three dose levels (0.5, 1 and 2 mg/day), comparing two dose levels per test.
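The three pairwise comparisons can also be run in one call with `stats::pairwise.t.test()`. In this sketch, `pool.sd = FALSE` makes each comparison an ordinary two-sample `t.test()` (Welch by default), and `p.adjust.method = "none"` reports the raw p-values, so they should equal those of the individual tests that follow.

```r
# All pairwise Welch t-tests between dose levels in a single call
# (`pw` is an illustrative name)
data("ToothGrowth", package = "datasets")
pw <- with(ToothGrowth,
           pairwise.t.test(len, factor(dose),
                           pool.sd = FALSE,          # separate variances per pair
                           p.adjust.method = "none"))  # raw, unadjusted p-values
print(pw$p.value)
```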

dosecomp1 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
dosecomp2 <- ToothGrowth$len[ToothGrowth$dose == 1.0]
dosecomp3 <- ToothGrowth$len[ToothGrowth$dose == 2.0]

Welch t-test for dosecomp1 and dosecomp2 (doses 0.5 and 1.0)

# Run the Welch test once and reuse the result
tt1 <- t.test(dosecomp1, dosecomp2, paired = FALSE, var.equal = FALSE)
conflevel1 <- tt1$conf.int
pval1 <- tt1$p.value
tt1

## 
##  Welch Two Sample t-test
## 
## data:  dosecomp1 and dosecomp2
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

Welch t-test for dosecomp2 and dosecomp3 (doses 1.0 and 2.0)

# Run the Welch test once and reuse the result
tt2 <- t.test(dosecomp2, dosecomp3, paired = FALSE, var.equal = FALSE)
conflevel2 <- tt2$conf.int
pval2 <- tt2$p.value
tt2

## 
##  Welch Two Sample t-test
## 
## data:  dosecomp2 and dosecomp3
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

Welch t-test for dosecomp3 and dosecomp1 (doses 2.0 and 0.5)

# Run the Welch test once and reuse the result
tt3 <- t.test(dosecomp3, dosecomp1, paired = FALSE, var.equal = FALSE)
conflevel3 <- tt3$conf.int
pval3 <- tt3$p.value
tt3

## 
##  Welch Two Sample t-test
## 
## data:  dosecomp3 and dosecomp1
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83383 18.15617
## sample estimates:
## mean of x mean of y 
##    26.100    10.605

From the Welch t-tests between the three pairs of dose levels:

for doses 0.5 and 1.0, the 95% confidence interval is [-11.98, -6.28] and the p-value is \(1.27\times 10^{-7}\)

for doses 1.0 and 2.0, the 95% confidence interval is [-9.00, -3.73] and the p-value is \(1.91\times 10^{-5}\)

for doses 2.0 and 0.5, the 95% confidence interval is [12.83, 18.16] and the p-value is \(4.40\times 10^{-14}\)

None of the intervals contains 0 and all p-values are far below 0.05, so the null hypothesis of equal mean tooth length can be rejected for every pair of dose levels.
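Because three tests are run on the same data, a more conservative reading adjusts for multiple comparisons. The sketch below applies a Bonferroni correction with `stats::p.adjust()`; this correction is our choice for illustration and is not part of the original analysis. The raw p-values are copied from the test output above.

```r
# Raw p-values of the three dose comparisons, copied from the output above
raw_p <- c("0.5 vs 1" = 1.2683007e-7,
           "1 vs 2"   = 1.9064295e-5,
           "0.5 vs 2" = 4.397525e-14)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(adj_p)

# Are all comparisons still significant at the 5% level?
print(all(adj_p < 0.05))
```

Even after the correction, every adjusted p-value stays well below 0.05, so the conclusion about dose does not depend on the multiple-testing issue.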

Assumptions:

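Each guinea pig received a single dose level by a single delivery method, so the groups are independent and the tests are unpaired.
The guinea pigs are independent of each other and drawn from the same population.
The 60 guinea pigs (10 per supplement and dose combination) are a random sample from that population.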

Conclusions:

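Tooth length increases with dose: the differences between all three pairs of dose levels are statistically significant.
Teeth were longer on average with OJ than with VC (mean 20.66 vs 16.96), but the difference is not statistically significant at the 5% level (p = 0.061), so these data alone do not show that OJ is the more effective delivery method.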

Appendix: You can find the online RPubs version of this document at

RPubs Repository