This report applies some of the techniques taught in the Inferential Statistics class. First, the exponential distribution is explored through simulation and analyzed with respect to the Central Limit Theorem.
In the second part of this report, the ToothGrowth data is presented. Some statistical methods are applied to compare the response as a function of the variables.
This section will explore the ToothGrowth data. This data set presents the response in the length of odontoblasts (cells responsible for tooth growth) in guinea pigs to levels of vitamin C and delivery methods.
Let’s start with a brief description of the data provided.
data("ToothGrowth")
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
dim(ToothGrowth)
## [1] 60 3
As presented above, it is a data frame with 60 observations of 3 variables, described below:
[,1] - len - Numeric - Tooth length
[,2] - supp - Factor - Supplement type: VC (ascorbic acid) or OJ (orange juice)
[,3] - dose - Numeric - Dose in milligrams/day
This data frame is already in tidy format (one observation per row and one variable per column), so no transformation is necessary.
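The claim that no transformation is needed can be checked with a few base R calls. The following is a minimal sketch, not part of the original analysis; the object names `col_classes`, `missing_per_column` and `design_table` are illustrative.

```r
# Quick structural checks on the built-in ToothGrowth data (datasets package).
data("ToothGrowth")

# Class of each column: one atomic type per variable is expected.
col_classes <- sapply(ToothGrowth, class)
print(col_classes)

# Number of missing values per column; zeros mean no cleaning is required.
missing_per_column <- colSums(is.na(ToothGrowth))
print(missing_per_column)

# Cross-tabulation of the two explanatory variables, showing the design.
design_table <- with(ToothGrowth, table(supp, dose))
print(design_table)
```

The cross-tabulation also shows how many animals fall in each combination of supplement and dose, which is useful context for the tests that follow.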
Let’s evaluate whether the delivery method, variable supp, influences the response. First, let’s summarise the data by the supp variable.
library(dplyr)
ToothGrowth %>%
group_by(supp) %>%
summarise(len = mean(len))
## # A tibble: 2 × 2
## supp len
## <fctr> <dbl>
## 1 OJ 20.66333
## 2 VC 16.96333
Let’s make a hypothesis test:
Consider \(H_0: \mu_{VC} = \mu_{OJ}\) versus \(H_a: \mu_{VC} \neq \mu_{OJ}\)
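For reference, R's `t.test` uses Welch's unequal-variance test by default (`var.equal = FALSE`). Assuming the usual notation of sample means \(\bar{x}\), sample variances \(s^2\) and group sizes \(n\) (symbols introduced here, not in the original text), the statistic and its approximate degrees of freedom are:

\[
t = \frac{\bar{x}_{OJ} - \bar{x}_{VC}}{\sqrt{\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}\right)^2}{\dfrac{\left(s_{OJ}^2/n_{OJ}\right)^2}{n_{OJ}-1} + \dfrac{\left(s_{VC}^2/n_{VC}\right)^2}{n_{VC}-1}}
\]

The difference is written as OJ minus VC because R orders the groups by factor level, and OJ comes first.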
The code below runs the two-sample t-test.
with(ToothGrowth, t.test(len ~ supp))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
As we can see, the p-value is 0.06 > 0.05, so at the 5% significance level we cannot reject the null hypothesis that both delivery methods have the same mean. The same conclusion follows from the 95% confidence interval, which ranges from -0.17 to 7.57: since this interval contains 0, we fail to reject the null hypothesis.
There is not enough evidence to conclude that the delivery method influences the observed length.
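To make the test less of a black box, the Welch statistic, degrees of freedom and p-value can be computed by hand and compared with `t.test`. This is only a sketch under the assumption that the default Welch test is the one intended; the variable names (`oj`, `vc`, `t_stat`, `df_welch`, `p_value`) are illustrative.

```r
data("ToothGrowth")

# Split tooth length by supplement type.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic (OJ minus VC, matching the factor level order).
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Built-in test, kept for comparison with the manual values.
builtin <- t.test(len ~ supp, data = ToothGrowth)
print(c(t = t_stat, df = df_welch, p = p_value))
```

The printed values should agree with the t, df and p-value reported in the `t.test` output for the supp variable.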
Now we analyze the variable dose. Again, let’s summarise the data by the analyzed variable.
library(dplyr)
ToothGrowth %>%
group_by(dose) %>%
summarise(len = mean(len))
## # A tibble: 3 × 2
## dose len
## <dbl> <dbl>
## 1 0.5 10.605
## 2 1.0 19.735
## 3 2.0 26.100
From the table above, length appears to increase with dose. Let’s run some hypothesis tests to confirm that.
Three subsets of the data are created, since a t-test compares only two groups at a time.
library(dplyr)
# One subset per pair of dose levels
subset1 <- subset(ToothGrowth, dose %in% c(0.5, 1))
subset2 <- subset(ToothGrowth, dose %in% c(0.5, 2))
subset3 <- subset(ToothGrowth, dose %in% c(1, 2))
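As a side note, the formula interface of `t.test` also accepts a `subset` argument, so the same comparison can be written without intermediate data frames. The sketch below illustrates this for the first pair of doses only; `direct_test` and `via_subset` are illustrative names.

```r
data("ToothGrowth")

# Restrict the data to two dose levels inside the t.test call itself.
direct_test <- t.test(len ~ dose, data = ToothGrowth,
                      subset = dose %in% c(0.5, 1))

# Equivalent test on an explicitly created subset, for comparison.
via_subset <- t.test(len ~ dose,
                     data = subset(ToothGrowth, dose %in% c(0.5, 1)))

print(direct_test$p.value)
```

Both forms run the same Welch test; creating the subsets explicitly, as done here, keeps each step visible.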
Now, let’s make the t-test similar to the one evaluated for supp variable.
Test 1 - Consider \(H_0: \mu_{DOSE=0.5} = \mu_{DOSE=1.0}\) versus \(H_a: \mu_{DOSE=0.5} \neq \mu_{DOSE=1.0}\)
Test 2 - Consider \(H_0: \mu_{DOSE=0.5} = \mu_{DOSE=2.0}\) versus \(H_a: \mu_{DOSE=0.5} \neq \mu_{DOSE=2.0}\)
Test 3 - Consider \(H_0: \mu_{DOSE=1.0} = \mu_{DOSE=2.0}\) versus \(H_a: \mu_{DOSE=1.0} \neq \mu_{DOSE=2.0}\)
The code below runs the three t-tests described above.
with(subset1, t.test(len ~ dose))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 0.0000001268
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
with(subset2, t.test(len ~ dose))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 0.00000000000004398
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
with(subset3, t.test(len ~ dose))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 0.00001906
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
As we can see, all three tests have extremely low p-values, so we can reject the null hypothesis in each case. The same conclusion follows from the confidence intervals: none of the three intervals contains 0.
The dose level influences the observed length.
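Because three pairwise tests are run on the same data, one possible refinement is to adjust the p-values for multiple comparisons. The sketch below uses `pairwise.t.test` from the stats package with a Bonferroni correction. Choosing Bonferroni is an assumption made for illustration, not part of the original analysis, and `adjusted` and `max_adjusted_p` are illustrative names.

```r
data("ToothGrowth")

# Welch t-tests for every pair of dose levels, with Bonferroni adjustment.
# pool.sd = FALSE makes each comparison use only its own two groups.
adjusted <- pairwise.t.test(
  ToothGrowth$len,
  ToothGrowth$dose,
  p.adjust.method = "bonferroni",
  pool.sd = FALSE
)
print(adjusted)

# Largest adjusted p-value across the three comparisons.
max_adjusted_p <- max(adjusted$p.value, na.rm = TRUE)
print(max_adjusted_p)
```

With three comparisons, Bonferroni multiplies each p-value by 3 (capped at 1). Given how small the unadjusted p-values are, the conclusion about dose should be unchanged.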