Investigation of the ToothGrowth Data from the R datasets Package

I start with a numerical summary of each variable. To show the data visually, I also provide a plot.

library(ggplot2)
library(datasets)
data(ToothGrowth)
summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

supp = ToothGrowth$supp;
dose = ToothGrowth$dose;

## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
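The plotting call is not shown alongside the messages above. As a minimal sketch of one way to draw a comparable figure, assuming ggplot2 3.3.0 or later is installed: the mapping used here (dose on the x axis, len on the y axis, colour by supp) and the use of a line through the group means instead of a loess smoother are my assumptions, not necessarily the original call.

```r
library(ggplot2)
data(ToothGrowth)

# Assumed mapping: dose on x, tooth length on y, one colour per supplement.
p <- ggplot(ToothGrowth, aes(x = dose, y = len, colour = supp)) +
  geom_point(alpha = 0.6) +                  # raw observations
  stat_summary(fun = mean, geom = "line") +  # line through the mean length at each dose
  labs(x = "Dose", y = "Tooth length", colour = "Supplement")

print(p)
```

A line through the means is used because dose takes only three distinct values, which leaves very little for a smoother to work with.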

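Hypothesis: OJ is more effective than VC at promoting tooth growth.

In terms of the test, H0: the mean length with OJ equals the mean length with VC, and Ha: the mean length with OJ is not equal to the mean length with VC (the two-sided alternative that t.test uses by default).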

Here is the t-test between OJ and VC:

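# Split the data frame by supplement once; the list elements are named "OJ" and "VC"
by_supp <- split(ToothGrowth, ToothGrowth$supp)
oj <- by_supp$OJ
vc <- by_supp$VC
# Paired test: row i of oj is matched with row i of vc
t.test(oj$len, vc$len, paired = TRUE)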

## 
##  Paired t-test
## 
## data:  oj$len and vc$len
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of the differences 
##                     3.7

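Conclusion: The p-value (0.00255) is well below 0.05, so the t-test rejects the null hypothesis that the mean lengths for OJ and VC are equal. The mean difference of 3.7 is positive, which favors OJ.

Also, the 95 percent confidence interval (1.41 to 5.99) lies entirely above zero, which is consistent with rejecting the null hypothesis. I chose a paired test so that each OJ observation is compared with a VC observation at the same dose, which keeps dose from confounding the comparison between OJ and VC. This relies on both groups being listed in the same dose order.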
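The test above is two-sided and paired, while the hypothesis is directional. As a hedged sketch of two variations, not a replacement for the analysis above: the first passes alternative = "greater" to test specifically whether OJ gives longer length, and the second drops the pairing, which is the appropriate form if the 60 measurements are regarded as coming from independent animals. The object names by_supp, one_sided and unpaired are illustrative.

```r
data(ToothGrowth)

# Tooth lengths split by supplement; list elements are named "OJ" and "VC"
by_supp <- split(ToothGrowth$len, ToothGrowth$supp)

# One-sided paired test: Ha is that the mean difference (OJ - VC) is greater than 0
one_sided <- t.test(by_supp$OJ, by_supp$VC, paired = TRUE, alternative = "greater")

# Unpaired Welch test: treats the two groups as independent samples
unpaired <- t.test(by_supp$OJ, by_supp$VC)

print(one_sided)
print(unpaired)
```

If the two tests disagree, the choice between them depends on whether the pairing is justified by the design of the experiment, so it is worth reporting both.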

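Here are the t-tests between the different doses.

(I do not write out H0 and Ha for each test. In every case H0 is that the two dose groups have equal mean length, and Ha is that the means differ.)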

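# Split tooth length by dose; the groups come out in increasing dose order
split_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)
half <- split_by_dose[[1]]  # dose 0.5
one <- split_by_dose[[2]]   # dose 1
two <- split_by_dose[[3]]   # dose 2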
t.test(half,one)

## 
##  Welch Two Sample t-test
## 
## data:  half and one
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

t.test(one,two)

## 
##  Welch Two Sample t-test
## 
## data:  one and two
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

t.test(half,two)

## 
##  Welch Two Sample t-test
## 
## data:  half and two
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

Conclusion: A higher dose is associated with longer tooth length. In all three t-tests above the p-value is far below 0.05 and the 95 percent confidence interval excludes zero, which indicates that the differences between doses are unlikely to be due to chance.
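Because three tests are run on the same data, it can be worth checking that the conclusion survives a multiple-comparison adjustment. A minimal sketch using pairwise.t.test from the stats package: pool.sd = FALSE makes each comparison a separate Welch test as above, and the Bonferroni method is my choice of adjustment, an assumption and not part of the original analysis.

```r
data(ToothGrowth)

# All pairwise dose comparisons in one call.
# pool.sd = FALSE: each pair gets its own Welch t-test, as in the tests above.
# p.adjust.method = "bonferroni": multiplies each p-value by the number of tests (3).
dose_tests <- pairwise.t.test(
  ToothGrowth$len,
  ToothGrowth$dose,
  pool.sd = FALSE,
  p.adjust.method = "bonferroni"
)

# Lower-triangular matrix of adjusted p-values; unused cells are NA
print(dose_tests$p.value)
```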