Statistical Inference Course Project

Introduction

This report applies some of the techniques taught in the Statistical Inference class. First, the exponential distribution is explored through simulation and analyzed with respect to the Central Limit Theorem.

The second part of this report presents the ToothGrowth data. Some statistical methods are applied to compare the response as a function of the variables.

Part 2 - ToothGrowth data

This section will explore the ToothGrowth data. This data set presents the response in the length of odontoblasts (cells responsible for tooth growth) in guinea pigs to levels of vitamin C and delivery methods.

Let’s start with a brief description of the data provided.

data("ToothGrowth")
head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

dim(ToothGrowth)

## [1] 60  3

As shown above, it is a data frame with 60 observations on 3 variables, described below:

[,1] - len - Numeric - Tooth length

[,2] - supp - Factor - Supplement type [VC (Ascorbic Acid) or OJ (Orange Juice)]

[,3] - dose - Numeric - Dose in milligrams/day

This data frame is already in tidy format (one observation per row and one variable per column), so no transformation is necessary.
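As a quick structural check before any testing, the sketch below counts the observations in each combination of supp and dose and looks for missing values. It uses base R only; the object name `cell_counts` is introduced here for illustration and is not part of the original analysis.

```r
data("ToothGrowth")

# Cross-tabulate delivery method against dose to see how the
# 60 observations are spread over the six groups.
cell_counts <- with(ToothGrowth, table(supp, dose))
print(cell_counts)

# Count missing values per column.
print(colSums(is.na(ToothGrowth)))
```

If every cell holds the same number of animals, the design is balanced, which makes the group means directly comparable.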

Let’s evaluate whether the delivery method, variable supp, has an influence on the response. First, let’s summarise the data by the supp variable.

library(dplyr)
ToothGrowth %>%
  group_by(supp) %>%
    summarise(len = mean(len))

## # A tibble: 2 × 2
##     supp      len
##   <fctr>    <dbl>
## 1     OJ 20.66333
## 2     VC 16.96333

Let’s make a hypothesis test:

Consider \(H_0: \mu_{VC} = \mu_{OJ}\) versus \(H_a: \mu_{VC} \neq \mu_{OJ}\)

The code below performs a t hypothesis test.

with(ToothGrowth, t.test(len ~ supp))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

As we can see, the p-value is 0.06 > 0.05, so at the 5% significance level we cannot reject the null hypothesis that both methods have the same mean. The same conclusion follows from the 95% confidence interval, which ranges from -0.17 to 7.57: since this interval contains 0, we fail to reject the null hypothesis.
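To make the numbers in the test output less of a black box, the sketch below recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval by hand. It assumes the default settings of `t.test` (unequal variances, two-sided alternative); the variable names are illustrative.

```r
data("ToothGrowth")

# Split tooth length by delivery method.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean, and the standard
# error of the difference between the two means.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat <- (mean(oj) - mean(vc)) / se_diff
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The printed values should agree with the `t.test` output shown above up to rounding.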

At the 5% significance level, there is not enough evidence that the delivery method influences the observed length.

Now, we will analyze the variable dose. Again, let’s summarise by the analyzed variable.

library(dplyr)
ToothGrowth %>%
  group_by(dose) %>%
    summarise(len = mean(len))

## # A tibble: 3 × 2
##    dose    len
##   <dbl>  <dbl>
## 1   0.5 10.605
## 2   1.0 19.735
## 3   2.0 26.100
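Group means alone say nothing about spread or group size, both of which matter for the t-tests that follow. The sketch below is a base R variant of the summary (no dplyr needed) that adds the standard deviation and the number of observations per dose; `dose_summary` is an illustrative name.

```r
data("ToothGrowth")

# Mean, standard deviation and group size of len for each dose level.
dose_mean <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
dose_sd <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)
dose_n <- tapply(ToothGrowth$len, ToothGrowth$dose, length)

# Collect the three summaries in one data frame, one row per dose.
dose_summary <- data.frame(
  dose = as.numeric(names(dose_mean)),
  mean = as.vector(dose_mean),
  sd = as.vector(dose_sd),
  n = as.vector(dose_n)
)
print(dose_summary)
```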

From the table above, length appears to be influenced by dose. Let’s run some hypothesis tests to check that.

Three subsets of the data will be created, since each t-test compares two groups at a time.

# Each subset keeps the two dose levels compared in one test (base R only).
subset1 <- subset(ToothGrowth, dose %in% c(0.5, 1))
subset2 <- subset(ToothGrowth, dose %in% c(0.5, 2))
subset3 <- subset(ToothGrowth, dose %in% c(1, 2))
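The three subsets can also be built programmatically, which scales better if a data set has more levels. The sketch below is one possible way to do that, assuming the same Welch t-test is wanted for every pair of dose levels; `dose_pairs` and `pair_p` are illustrative names.

```r
data("ToothGrowth")

# All unordered pairs of the observed dose levels (one pair per column).
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2)

# Run a Welch t-test on the rows belonging to each pair and keep the p-value.
pair_p <- apply(dose_pairs, 2, function(pair) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ dose, data = pair_data)$p.value
})

# Label each p-value with the pair of doses it compares.
names(pair_p) <- apply(dose_pairs, 2, paste, collapse = " vs ")
print(pair_p)
```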

Now, let’s run t-tests similar to the one used for the supp variable.

Test 1 - Consider \(H_0: \mu_{DOSE=0.5} = \mu_{DOSE=1.0}\) versus \(H_a: \mu_{DOSE=0.5} \neq \mu_{DOSE=1.0}\)

Test 2 - Consider \(H_0: \mu_{DOSE=0.5} = \mu_{DOSE=2.0}\) versus \(H_a: \mu_{DOSE=0.5} \neq \mu_{DOSE=2.0}\)

Test 3 - Consider \(H_0: \mu_{DOSE=1.0} = \mu_{DOSE=2.0}\) versus \(H_a: \mu_{DOSE=1.0} \neq \mu_{DOSE=2.0}\)

The code below performs the three t hypothesis tests described above.

with(subset1, t.test(len ~ dose))

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 0.0000001268
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

with(subset2, t.test(len ~ dose))

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 0.00000000000004398
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

with(subset3, t.test(len ~ dose))

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 0.00001906
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

As we can see, the three tests have extremely low p-values, so we can reject the null hypothesis in each case. The same conclusion follows from the confidence intervals: none of the three intervals contains 0.

The dose level has an influence on the observed length.
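Because three tests are run on the same data, the chance of at least one false positive is higher than the 5% level of a single test. The sketch below is an optional check that is not part of the original analysis: it applies the Bonferroni and Holm corrections to the three p-values reported in the t-test output above, using `p.adjust` from base R.

```r
# p-values copied from the three Welch t-tests reported above.
p_raw <- c("0.5 vs 1" = 0.0000001268,
           "0.5 vs 2" = 0.00000000000004398,
           "1 vs 2"   = 0.00001906)

# Bonferroni multiplies each p-value by the number of tests (capped at 1);
# Holm is a step-down alternative that is never more conservative.
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")
p_holm <- p.adjust(p_raw, method = "holm")

print(cbind(raw = p_raw, bonferroni = p_bonferroni, holm = p_holm))
```

Even after adjustment, all three p-values remain far below 0.05, so the conclusion about dose is unchanged.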

Statistical Inference Course Project - Part 2

Andre Morato

April 8, 2017
