Statistical Inference Part 2

Introduction

The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1 and 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC).

Loading and Exploratory Data Analysis

library(ggplot2)

library(datasets)
data("ToothGrowth")

The first few rows of the dataset and its structure:

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
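A quick way to confirm the balanced design is to cross-tabulate supplement against dose. Each of the six supplement/dose combinations should contain 10 animals. This is a minimal sketch using base R's `xtabs()` on the built-in dataset; the name `design` is illustrative only.

```r
# Count the guinea pigs in each supplement x dose combination
design <- xtabs(~ supp + dose, data = ToothGrowth)
design

# Total number of observations across all six cells
sum(design)
```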

Basic Analysis of the Data

# convert dose to a factor so it is treated as a categorical variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

Plot of length by supplement

ggplot(aes(x = supp, y = len), data = ToothGrowth) + geom_boxplot(aes(fill = supp)) +
  labs(title = "Length vs Supplement", x = "Supplement", y = "Length")

The plot shows a greater median length for orange juice. Most of the values lie between about 12 and 26. On its own, this boxplot does not suggest a strong relationship between length and supplement.

Plot of length by dose

ggplot(aes(x = dose, y = len), data = ToothGrowth) + geom_boxplot(aes(fill = dose)) +
  labs(title = "Length vs Dose", x = "Dose (mg/day)", y = "Length")

This plot shows a strong relationship between dose and length: the median length increases with each dose level.

Basic Summary of the Data

summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

Here is the mean length for each combination of supplement and dose:

aggregate(ToothGrowth$len, list(supp=ToothGrowth$supp, dose=ToothGrowth$dose),mean)

##   supp dose     x
## 1   OJ  0.5 13.23
## 2   VC  0.5  7.98
## 3   OJ    1 22.70
## 4   VC    1 16.77
## 5   OJ    2 26.06
## 6   VC    2 26.14

# equivalent formula interface: aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

At doses 0.5 mg and 1 mg, the mean length is greater with the orange juice supplement. At 2 mg the two means are almost identical (26.06 for OJ and 26.14 for VC). The mean response also increases with dose, as the boxplot of length by dose showed.
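The OJ minus VC gap at each dose can be read off directly by arranging the cell means as a 2 x 3 table. This is a sketch using base R's `tapply()`; the names `cell_means` and `oj_minus_vc` are illustrative. Based on the means printed above, the differences should come out as 5.25, 5.93 and -0.08.

```r
# 2 x 3 matrix of mean length: rows are supplements, columns are doses
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
cell_means

# Difference OJ - VC at each dose (positive means OJ is higher)
oj_minus_vc <- cell_means["OJ", ] - cell_means["VC", ]
round(oj_minus_vc, 2)
```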

Comparison of Tooth Growth by Dose and Supplement

In this part we run several two-sample t-tests.

Let's conduct a two-sample t-test to compare the length of odontoblasts under the orange juice and ascorbic acid conditions. The study is not paired: each guinea pig received a single combination of supplement and dose, so the two groups are independent.

tapply(ToothGrowth$len, ToothGrowth$supp, var)

##       OJ       VC 
## 43.63344 68.32723

The sample variances differ noticeably (about 43.6 for OJ versus 68.3 for VC), so I do not assume equal variances and use Welch's t-test.
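The variance comparison above is made by eye. A formal alternative is the F test for the ratio of two variances, available in base R as `var.test()`. This is a sketch, not part of the original analysis; note that the F test itself assumes normal data and is sensitive to departures from it. Whatever it returns, Welch's test does not require equal variances, so it remains a safe choice.

```r
# F test of H0: var(OJ) / var(VC) = 1 (assumes normal data in both groups)
vt <- var.test(len ~ supp, data = ToothGrowth)
vt

# Ratio of the sample variances, OJ over VC (factor level order)
unname(vt$estimate)
```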

t.test(data = ToothGrowth, len ~ supp, paired = FALSE, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The t-statistic is 1.92 with about 55.3 degrees of freedom, and the p-value (0.061) is greater than 0.05. The 95% confidence interval (-0.17, 7.57) also contains 0. We therefore fail to reject the null hypothesis that the two means are equal: at the 5% level, the data do not show a significant effect of the supplement on length when all doses are pooled together.
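To show where the Welch numbers come from, the sketch below recomputes the t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval by hand. The variable names are illustrative; the results should match the `t.test()` output above.

```r
# Lengths for each supplement group (independent samples)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means over the unpooled standard error
diff_means <- mean(oj) - mean(vc)
t_stat <- diff_means / se_diff

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff

c(t = t_stat, df = df_welch, p = p_value)
ci
```

The only difference from Student's t-test is that each group keeps its own variance in the standard error, and the degrees of freedom are approximated rather than fixed at n1 + n2 - 2.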

Dose levels

tapply(ToothGrowth$len, ToothGrowth$dose, var)

##      0.5        1        2 
## 20.24787 19.49608 14.24421

The variances of the three dose groups are of similar magnitude (roughly 14 to 20), but to stay on the safe side I again do not assume equal variances and use Welch's t-test.
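For more than two groups, Bartlett's test (`bartlett.test()` in base R) checks the null hypothesis that all group variances are equal. This is a sketch added for illustration, not part of the original analysis; like the F test, it assumes normally distributed data.

```r
# Bartlett's test of H0: the three dose groups have equal variances
bt <- bartlett.test(len ~ dose, data = ToothGrowth)
bt
```

With three groups the test statistic is compared against a chi-squared distribution with 3 - 1 = 2 degrees of freedom.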

library(dplyr)

dose0.5 <- filter(ToothGrowth, dose == 0.5)
dose1 <- filter(ToothGrowth, dose == 1)
dose2 <- filter(ToothGrowth, dose == 2)
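The same subsets can be built without dplyr. As a sketch, base R's `split()` returns a named list with one vector of lengths per dose level; the name `len_by_dose` is illustrative. The Welch test below is the same comparison as the first dose test that follows (1 mg against 0.5 mg), since `t.test()` uses `var.equal = FALSE` by default.

```r
# Named list of length vectors, one element per dose level ("0.5", "1", "2")
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)
sapply(len_by_dose, mean)

# Welch two-sample t-test: 1 mg versus 0.5 mg
tt_dose <- t.test(len_by_dose[["1"]], len_by_dose[["0.5"]])
tt_dose
```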

Two-sample t-test between doses 0.5 mg and 1 mg

t.test(dose1$len,dose0.5$len,var.equal = FALSE,paired = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  dose1$len and dose0.5$len
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276219 11.983781
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

The t-statistic is 6.48 with about 38 degrees of freedom, and the p-value (1.3e-07) is well below 0.05. The 95% confidence interval (6.28, 11.98) lies entirely above 0. We therefore reject the null hypothesis that the two means are equal: mean length increases from 0.5 mg to 1 mg.

Two-sample t-test between doses 0.5 mg and 2 mg

t.test(dose2$len,dose0.5$len,var.equal = FALSE,paired = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  dose2$len and dose0.5$len
## t = 11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83383 18.15617
## sample estimates:
## mean of x mean of y 
##    26.100    10.605

The t-statistic is 11.80 with about 36.9 degrees of freedom, and the p-value (4.4e-14) is well below 0.05. The 95% confidence interval (12.83, 18.16) lies entirely above 0. We therefore reject the null hypothesis that the two means are equal: mean length increases from 0.5 mg to 2 mg.

Two-sample t-test between doses 1 mg and 2 mg

t.test(dose2$len,dose1$len,var.equal = FALSE,paired = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  dose2$len and dose1$len
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.733519 8.996481
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

The t-statistic is 4.90 with about 37.1 degrees of freedom, and the p-value (1.9e-05) is well below 0.05. The 95% confidence interval (3.73, 9.00) lies entirely above 0. We therefore reject the null hypothesis that the two means are equal: mean length increases from 1 mg to 2 mg.
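The three dose comparisons can also be run in a single call with base R's `pairwise.t.test()`. With `pool.sd = FALSE` it runs a separate `t.test()` for each pair (Welch by default), so the unadjusted p-values should match the three tests above. This sketch also shows a Bonferroni correction for running three tests; the object names are illustrative.

```r
# All pairwise Welch t-tests between dose levels, unadjusted p-values
pw_raw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                          pool.sd = FALSE, p.adjust.method = "none")
pw_raw

# Same comparisons with a Bonferroni correction for the three tests
pw_bonf <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                           pool.sd = FALSE, p.adjust.method = "bonferroni")
pw_bonf
```

The result is a lower-triangular matrix of p-values indexed by dose level, which is easier to scan than three separate outputs.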

Conclusions

Conclusion 1

The data show no statistically significant effect of the supplement on response (length) at the 5% level. The Welch t-test (p = 0.061) did not allow us to reject the null hypothesis that the means of the OJ and VC groups are equal. Note that this comparison pools all three doses.

Conclusion 2

Dose level has a significant effect on response (length). Each of the three pairwise t-tests rejected the null hypothesis that the two means are equal, and all three p-values are below 0.05 / 3, so the conclusion also holds after a Bonferroni correction for the three comparisons. Mean length increases from 0.5 mg to 1 mg to 2 mg.

Assumptions

I made these assumptions:

1. The data in each group come from an approximately normal distribution.

2. The variances are not assumed to be equal, so Welch's t-test is used throughout.
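Assumption 1 can be checked informally with the Shapiro-Wilk test (`shapiro.test()` in base R) on the groups that were actually compared: the two supplement groups (n = 30 each) and the three dose groups (n = 20 each). This is an illustrative sketch; the helper `sw` is hypothetical. Small p-values would suggest departure from normality, but with these sample sizes the test has limited power, so a Q-Q plot is a useful complement.

```r
# Hypothetical helper: Shapiro-Wilk p-value for one numeric vector
sw <- function(x) shapiro.test(x)$p.value

# Groups used in the supplement test
p_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, sw)
round(p_supp, 3)

# Groups used in the dose tests
p_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, sw)
round(p_dose, 3)
```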