Overview

This document analyses the ToothGrowth data set from the R package ‘datasets’.

The ToothGrowth data record tooth growth (the length of odontoblasts) in 60 Guinea Pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC), giving 10 animals per combination of dose and delivery method.

We compare the effect of the two supplements on tooth growth at each dose level.

Because this involves multiple comparisons, we also adjust the ‘p-values’ to limit false positives (Type-1 errors).

Summary

Loading the data

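##Loading required libraries (library() stops with an error if a package is missing)
library(datasets)
library(ggplot2)
library(dplyr)
library(magrittr)
library(knitr)

##Working on a copy so that the original ToothGrowth data stay unchanged
tgdata<-ToothGrowth

##Converting 'dose' to a factor, as there are only three dose levels
tgdata$dose<-as.factor(ToothGrowth$dose)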

Summary

Here we can see the basic summary of the data.

summary(tgdata)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
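The summary shows the marginal counts only. As a quick check of the study design, the sketch below cross-tabulates supplement against dose on the built-in `ToothGrowth` data; the name `cell_counts` is introduced here for illustration and is not part of the original analysis.

```r
# Cross-tabulate supplement type by dose level using the built-in data.
# Each cell is the number of observations for that supp/dose combination.
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

Every cell should hold the same number of observations, which is what allows the supplement comparison to be repeated on equal terms at each dose level.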

Plot

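Let’s also visualise the data. The points show individual observations and the lines connect the mean length at each dose for each supplement.

##stat_summary() draws the per-dose mean for each supplement (the fun argument needs ggplot2 3.3.0 or later)
ggplot(tgdata,aes(x=dose,y=len,group=supp,color=supp))+
stat_summary(fun=mean,geom="line")+geom_point()+
labs(x="Dose",y="Length",color="Supplement",title="Tooth Growth By Dose and Supplement")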

Assumptions

The number of observations per ‘Supplement’ type is small (30), and per dose level it is smaller still (10 per supplement). Hence we use the t-distribution for our hypothesis tests and confidence intervals.
We treat the two supplement groups at each dose level as independent (unpaired) samples and do not assume equal variances, so we use the unequal-variance (Welch) t-test.
The level of significance is 5%, and the ‘Benjamini Hochberg’ method is applied afterwards to adjust the p-values for multiple comparisons.

Inferential Analysis

Here we compare the effect of supplement type (supp) on length (len) at each dose level (dose).

Two supplement types are available.
Each supplement type was given at all three dose levels.
Hence, at each dose level, we can compare the effect of the two supplement types on tooth growth.

Sub-setting of data

In this section we create subsets of the data by dose level so that an individual ‘t.test’ can be run on each.

For each dose level (‘0.5 mg’, ‘1 mg’ and ‘2 mg’ in turn) we subset the data and then split it into two groups, one per supplement type.

#Sub-setting 0.5 mg dose level data
halfmg<-tgdata[tgdata$dose==0.5,]
halfmg_summ<-data.frame(halfmg %>% group_by(supp) %>% summarise(mean=mean(len),sd=sd(len)))
vc0.5mg<-halfmg[halfmg$supp=='VC',]
oj0.5mg<-halfmg[halfmg$supp=='OJ',]

#Sub-setting 1 mg dose level data
onemg<-tgdata[tgdata$dose==1,]
onemg_summ<-data.frame(onemg %>% group_by(supp) %>% summarise(mean=mean(len),sd=sd(len)))
vc1mg<-onemg[onemg$supp=='VC',]
oj1mg<-onemg[onemg$supp=='OJ',]

#Sub-setting 2 mg dose level data
twomg<-tgdata[tgdata$dose==2,]
twomg_summ<-data.frame(twomg %>% group_by(supp) %>% summarise(mean=mean(len),sd=sd(len)))
vc2mg<-twomg[twomg$supp=='VC',]
oj2mg<-twomg[twomg$supp=='OJ',]

Summary Of Subsets of Data

Let’s see the basic summary of these subsets of data that we have extracted.

Mean and Standard Deviation for Dose Level ‘0.5 mg’

supp	mean	sd
OJ	13.23	4.459708
VC	7.98	2.746634

Mean and Standard Deviation for Dose Level ‘1 mg’

supp	mean	sd
OJ	22.70	3.910953
VC	16.77	2.515309

Mean and Standard Deviation for Dose Level ‘2 mg’

supp	mean	sd
OJ	26.06	2.655058
VC	26.14	4.797731

Hypothesis Testing and P-Values Calculation

The hypotheses that we test for each of the three subsets of data are as follows.

Let \(\mu_{oj}\) be the mean of the population with supplement ‘OJ’ and \(\mu_{vc}\) be the mean of the population with supplement ‘VC’.

Hence our null and alternative hypotheses are

\(H_0 : \mu_{oj}\leq\mu_{vc}\) i.e. supplement ‘OJ’ has less than or equal effect on length compared with ‘VC’

\(H_a : \mu_{oj}>\mu_{vc}\) i.e. supplement ‘OJ’ has more effect on length than ‘VC’

Running a t-test for these hypotheses on all three subsets and storing the results

t0.5test<-t.test(oj0.5mg$len,vc0.5mg$len,paired = FALSE,var.equal = FALSE,alternative = "greater")
t1test<-t.test(oj1mg$len,vc1mg$len,paired = FALSE,var.equal = FALSE,alternative = "greater")
t2test<-t.test(oj2mg$len,vc2mg$len,paired = FALSE,var.equal = FALSE,alternative = "greater")
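The three calls above repeat the same test on each subset. As an alternative sketch, assuming the formula interface of `t.test` and the built-in `ToothGrowth` data, the same one-sided Welch tests can be run in a loop over dose levels. The names `by_dose`, `tests_by_dose` and `p_by_dose` are hypothetical helpers, not part of the original analysis.

```r
# Split the built-in data into one data frame per dose level.
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Run a one-sided Welch t-test in each subset.
# With the formula interface the groups follow the factor level order of supp
# (OJ first, then VC), so alternative = "greater" tests mean(OJ) > mean(VC).
tests_by_dose <- lapply(by_dose, function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE, alternative = "greater")
})

# Collect the p-values into a named vector, one entry per dose level.
p_by_dose <- sapply(tests_by_dose, function(tt) tt$p.value)
print(round(p_by_dose, 7))
```

This keeps the test specification in one place, so a change to the alternative or the variance assumption applies to every dose level at once.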

Comparing With 97.5 Quantile

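We also compare the t statistic from each ‘t.test’ with the ‘97.5%’ quantile of the t-distribution for that test's degrees of freedom. For a one-sided test this is a stricter cut-off than the 5% level requires (it corresponds to a one-sided level of 2.5%), so a statistic that exceeds it is also significant at 5%.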

t0.5comp97.5<-t0.5test$statistic>qt(0.975,t0.5test$parameter)
t1comp97.5<-t1test$statistic>qt(0.975,t1test$parameter)
t2comp97.5<-t2test$statistic>qt(0.975,t2test$parameter)
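For a one-sided test at level alpha, comparing the t statistic with the (1 - alpha) quantile gives the same decision as comparing the p-value with alpha. The sketch below illustrates this on the 0.5 mg subset of the built-in `ToothGrowth` data with alpha = 0.05; the variable names are illustrative assumptions, not part of the original analysis.

```r
# One-sided Welch t-test for the 0.5 mg dose level (OJ vs VC).
half_mg <- subset(ToothGrowth, dose == 0.5)
tt <- t.test(len ~ supp, data = half_mg, alternative = "greater")

alpha <- 0.05

# Decision via the critical value: reject H0 when t exceeds the (1 - alpha) quantile.
reject_by_quantile <- unname(tt$statistic > qt(1 - alpha, tt$parameter))

# Decision via the p-value: reject H0 when p is below alpha.
reject_by_pvalue <- tt$p.value < alpha

print(c(quantile = reject_by_quantile, p.value = reject_by_pvalue))
```

The two decisions always agree because the one-sided p-value is the upper-tail probability of the observed statistic under the same t-distribution.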

Adjusting P-Values Using Benjamini Hochberg Method

Since we are doing multiple comparisons, the probability of at least one Type-1 error increases. Hence we adjust the ‘P-Values’ obtained from all three tests using the ‘Benjamini Hochberg’ method, which controls the false discovery rate.

padjusted<-p.adjust(c(t0.5test$p.value,t1test$p.value,t2test$p.value),method="BH")
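To make the adjustment less of a black box, here is a minimal sketch of the Benjamini Hochberg step-up calculation done by hand and checked against `p.adjust`. The input values are the three unadjusted p-values as reported (rounded) in this document; `bh_manual` and the other names are hypothetical.

```r
# Unadjusted one-sided p-values for dose 0.5, 1 and 2 mg (rounded, as reported).
p <- c(0.0031793, 0.0005192, 0.5180742)
m <- length(p)

# Order from largest to smallest p-value; the largest has rank m, the smallest rank 1.
ord <- order(p, decreasing = TRUE)
ranks <- m:1

# Scale each p-value by m / rank, then take the running minimum from the top
# so that adjusted values stay monotone, and cap them at 1.
adj_sorted <- pmin(1, cummin(p[ord] * m / ranks))

# Put the adjusted values back in the original order.
bh_manual <- numeric(m)
bh_manual[ord] <- adj_sorted

print(bh_manual)
print(p.adjust(p, method = "BH"))
```

With only three tests the adjustment is mild: the smallest p-value is multiplied by 3, the middle one by 3/2, and the largest is left unchanged.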

Combining Results Of All the Statistics

Now we combine the results of all the t-tests that we have run across dose levels.

dose<-c(0.5,1,2)
p.values<-c(t0.5test$p.value,t1test$p.value,t2test$p.value)
tComp97.5<-c(t0.5comp97.5,t1comp97.5,t2comp97.5)
results<-data.frame(dose,p.values,tComp97.5,padjusted)
names(results)<-c("Dose (mg)","P-Values (H0)","tcalc>97.5","P-Adjusted")
kable(results,format = "markdown")

Dose (mg)	P-Values (H0)	tcalc>97.5	P-Adjusted
0.5	0.0031793	TRUE	0.0047690
1.0	0.0005192	TRUE	0.0015576
2.0	0.5180742	FALSE	0.5180742

Evaluating 2 mg Test Further

For the 2 mg test the p-value was high (0.518), so we could not reject the null hypothesis that OJ has less than or equal effect compared with VC. A one-sided test that fails to reject does not tell us whether the two means actually differ, so it is a better idea to test the equality of the means directly with a two-sided test.

For this new test our hypothesis will be

\(H_0 : \mu_{oj}=\mu_{vc}\) i.e. supplement ‘OJ’ has equal effect on length as ‘VC’

\(H_a : \mu_{oj}\neq\mu_{vc}\) i.e. supplement ‘OJ’ and ‘VC’ have different effects on length.

Running a t.test for these hypotheses, followed by the quantile comparison

t2testextended<-t.test(oj2mg$len,vc2mg$len,paired = FALSE,var.equal = FALSE)
t2testextended

## 
##  Welch Two Sample t-test
## 
## data:  oj2mg$len and vc2mg$len
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

##Two-sided test: compare the absolute value of the statistic with the 97.5% quantile
abs(t2testextended$statistic)>qt(0.975,t2testextended$parameter)

##     t 
## FALSE

Here the p-value (0.9639) is very high and the 95% confidence interval for the difference in means (-3.80 to 3.64) contains zero, so we fail to reject the null hypothesis: the data show no evidence that the effects of supplements ‘OJ’ and ‘VC’ differ at the ‘2 mg’ dose level.
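The Welch statistic, its degrees of freedom and the confidence interval can be reproduced by hand, which shows where the numbers in the output come from. The sketch below does this for the 2 mg subset of the built-in `ToothGrowth` data and compares the result with `t.test`; all variable names are illustrative assumptions.

```r
# Lengths for the two supplement groups at the 2 mg dose level.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2]

# Squared standard error of each group mean: s^2 / n.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means over its standard error.
t_manual <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means.
p_manual <- 2 * pt(-abs(t_manual), df_manual)
ci_manual <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_manual) * se_diff

# Reference result from t.test (Welch is the default, var.equal = FALSE).
reference <- t.test(oj, vc)
print(c(t = t_manual, df = df_manual, p = p_manual))
print(ci_manual)
```

The hand calculation makes it clear why the interval is wide: with 10 animals per group, the standard error of the difference is large relative to the observed difference in means.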

Interpretation of the results

From the results we can conclude the following about our data.

For dose levels 0.5 mg and 1 mg, supplement OJ contributes more to tooth growth than supplement VC. The p-values and adjusted p-values are well below 0.05, and the t statistic is also greater than the 97.5% quantile.
For dose level 2 mg, there was no evidence that OJ contributes more than VC, and the follow-up two-sided test found no significant difference between the two supplements' effects on length.

Final Interpretation To the Question

Do Delivery methods and/or Dosage affect growth in guinea pigs?

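The delivery method matters at the lower doses: at 0.5 mg and 1 mg, orange juice (OJ) produces significantly more tooth growth than ascorbic acid (VC), but at 2 mg the two have approximately equal effects. The group means also rise with dose for both supplements, although differences between dose levels were not formally tested here.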

Analysis Of Tooth Growth Data By Supplement And Dose

Dhawal Kapil

February 27, 2016
