Tooth Growth statistical inference assignment

Loading libraries and data set

suppressMessages(library(plyr))
## Warning: package 'plyr' was built under R version 3.1.3
suppressMessages(library(dplyr))
## Warning: package 'dplyr' was built under R version 3.1.3
suppressMessages(library(ggplot2))
## Warning: package 'ggplot2' was built under R version 3.1.3
data(ToothGrowth)

Basic summary of data

str(ToothGrowth)  # str() prints the structure and returns NULL, so there is nothing to assign
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary_2<-summary(ToothGrowth)
summary_3<-ddply(ToothGrowth, .(supp, dose), summarise, count = length(len)) 

The following observations can be made:

  1. The data frame has 60 rows and 3 columns.
  2. Column 1, len, is numeric; its quartiles are fairly evenly spread, which suggests there are few outliers.
  3. Column 2, supp, is a factor with 2 levels (OJ and VC), each with 30 observations.
  4. Column 3, dose, takes only 3 values (0.5, 1 and 2), each with 20 observations.
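
The counts behind observations 3 and 4 can be checked directly with a cross-tabulation. The snippet below is a minimal sketch in base R; the name `counts` is introduced here for illustration and is not part of the original analysis.

```r
data(ToothGrowth)

# Cross-tabulate supplement type against dose to see how many
# observations fall into each supp x dose cell.
counts <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(counts)

# Row and column totals: observations per supplement and per dose.
print(margin.table(counts, 1))
print(margin.table(counts, 2))

# Quartiles and mean of tooth length, the basis for observation 2.
print(summary(ToothGrowth$len))
```

If every cell holds the same number of observations, the design is balanced, which makes the group comparisons later in the report easier to interpret.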

Further analysis can be done by plotting.

plot1 <- ggplot(ToothGrowth, aes(x=factor(dose),y=len,fill=factor(dose)))
plot1 + geom_boxplot() + facet_grid(.~supp) + scale_x_discrete("Dosage (mg)") + 
    stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend=FALSE) +
    scale_y_continuous("Teeth Growth") +  
    ggtitle("Effect of Dosage and Supplement Type") +  
    theme(legend.position=c(1,0), legend.justification=c(1,0))

plot2 <- ggplot(ToothGrowth, aes(x=factor(supp),y=len,fill=factor(dose)))
plot2 + geom_boxplot() + facet_grid(.~dose) + scale_x_discrete("Supplement Type") + 
    stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend=FALSE) +
    scale_y_continuous("Teeth Growth") +  
    ggtitle("Effect of Dosage and Supplement Type") +  
    theme(legend.position=c(1,0), legend.justification=c(1,0))

aggregate(ToothGrowth$len, list(supp = ToothGrowth$supp, dose = ToothGrowth$dose), FUN = function(x) c(mean = mean(x), median = median(x)))
##   supp dose x.mean x.median
## 1   OJ  0.5  13.23    12.25
## 2   VC  0.5   7.98     7.15
## 3   OJ  1.0  22.70    23.45
## 4   VC  1.0  16.77    16.50
## 5   OJ  2.0  26.06    25.95
## 6   VC  2.0  26.14    25.95

From the plots and the table above it can be inferred that:

  1. A higher dose is associated with greater tooth growth.
  2. At the lower doses (0.5 and 1), OJ is more effective than VC.
  3. For VC, tooth growth increases with every increase in dose.
  4. At dose = 2, tooth growth is roughly equal for OJ and VC.
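
Points 2 and 4 can be made concrete by subtracting the VC mean from the OJ mean at each dose. This is a small sketch in base R; `cell_means` and `oj_minus_vc` are names chosen here for illustration.

```r
data(ToothGrowth)

# Mean tooth length for each supp x dose cell (rows: supp, columns: dose).
cell_means <- with(ToothGrowth,
                   tapply(len, list(supp = supp, dose = dose), mean))

# Difference OJ minus VC at each dose; a value near zero means the two
# supplements performed about the same at that dose.
oj_minus_vc <- cell_means["OJ", ] - cell_means["VC", ]
print(round(oj_minus_vc, 2))
```

Using the means in the table above, the differences are 5.25 at dose 0.5, 5.93 at dose 1 and -0.08 at dose 2.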

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose

Hypothesis for the dose amount

Null hypothesis: there is no difference in tooth growth between dose amounts.

Alternative hypothesis: tooth growth increases as the dose amount increases.

dose_half = ToothGrowth$len[ToothGrowth$dose == 0.5]
dose_one = ToothGrowth$len[ToothGrowth$dose == 1]
dose_two = ToothGrowth$len[ToothGrowth$dose == 2]

One-tailed independent t-test with unequal variance.

t.test(dose_half, dose_one, alternative = "less", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dose_half and dose_one
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -6.753323
## sample estimates:
## mean of x mean of y 
##    10.605    19.735
t.test(dose_one, dose_two, alternative = "less", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dose_one and dose_two
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -4.17387
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

The p-values for both tests are below the 0.05 threshold, so we reject the null hypothesis in each case. The data therefore support the alternative hypothesis that tooth growth increases with the dose amount.
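
To show where the numbers in the Welch output come from, the first comparison (dose 0.5 against dose 1) can be reproduced by hand. The sketch below assumes the standard Welch formulas: an unpooled standard error and Welch-Satterthwaite degrees of freedom. The variable names are illustrative only.

```r
data(ToothGrowth)
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1]

# Variance of each sample mean; the two are not pooled.
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat   <- (mean(x) - mean(y)) / se
welch_df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for the alternative "mean of x is less than mean of y".
p_value <- pt(t_stat, welch_df)

# Upper bound of the one-sided 95% confidence interval for the difference.
upper <- (mean(x) - mean(y)) + qt(0.95, welch_df) * se

print(c(t = t_stat, df = welch_df, p = p_value, upper = upper))
```

These values should agree with the t, df, p-value and upper confidence bound printed by the first t.test call above.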

Hypothesis for the supplement type

Null hypothesis: there is no difference in tooth growth between supplement types.

Alternative hypothesis: OJ produces more tooth growth than VC (the mean for OJ is greater than the mean for VC).

One-tailed independent t-test with unequal variance.

supp_OJ = ToothGrowth[ToothGrowth$supp == 'OJ',1]
supp_VC = ToothGrowth[ToothGrowth$supp == 'VC',1]
t.test(supp_OJ, supp_VC, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  supp_OJ and supp_VC
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

The p-value (0.03032) is below the 0.05 threshold, so we reject the null hypothesis. The data therefore support the alternative hypothesis that OJ produces more tooth growth than VC.
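
The test above pools all three doses, while the exploratory table suggested that the gap between OJ and VC depends on the dose. One way to check this is to repeat the same one-sided Welch test within each dose level. This is a hedged sketch rather than part of the original analysis; `p_by_dose` is a name introduced here, and with three tests a multiple-comparison correction such as `p.adjust` may be worth considering.

```r
data(ToothGrowth)

# Split the data by dose and run the same one-sided Welch test
# (alternative: OJ mean greater than VC mean) within each subset.
p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  t.test(d$len[d$supp == "OJ"], d$len[d$supp == "VC"],
         alternative = "greater", paired = FALSE, var.equal = FALSE)$p.value
})

# One p-value per dose level, named by dose.
print(p_by_dose)

# Bonferroni-adjusted p-values for the three comparisons.
print(p.adjust(p_by_dose, method = "bonferroni"))
```

A dose whose p-value stays above 0.05 is one where this sample gives no evidence that OJ outperforms VC.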

State your conclusions and the assumptions needed for your conclusions.

At the 95% confidence level we can state that:

  1. Tooth growth increases as the dose amount increases
  2. OJ is more effective than VC and