Summary

This is the project for the statistical inference class. We will use simulation to explore inference and do some simple inferential data analysis on the ToothGrowth dataset. The project consists of two parts:

  1. Simulation exercises.
  2. Basic inferential data analysis.

Simulation Exercise

To simulate the distribution of the mean of exponential variates, 40 values were drawn from an exponential distribution with rate \(\lambda\)=0.2, and their mean and standard deviation were computed. This was repeated 1000 times, and the density of the resulting means was visualized together with a normal Q-Q plot of the z-transformed sample [Fig.: 1]. The function used for this computation is shown in the following code chunk.

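# Draw `nosim` means, each computed from `n` exponential variates with rate `lambda`.
simulateExpMean <- function(n, nosim, lambda) {
    mu <- numeric(nosim)
    for (i in seq_len(nosim)) {
        mu[i] <- mean(rexp(n, lambda))
    }
    mu
}
# `simulations` is not defined in this excerpt; it is assumed to hold the simulated means.
# shapiro.test() accepts at most 5000 values, and sample() cannot draw more values
# than are available without replacement, so the sample size is capped.
x <- simulations$nosim_1000
sTest <- shapiro.test(sample(x, size = min(5000, length(x))))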

A Shapiro-Wilk test rejected the null hypothesis that the sample was drawn from a normal distribution [p = \(4.9345 \times 10^{-15}\)]; the mean of 40 exponential variates is still slightly right-skewed, and the test is sensitive to such deviations at this sample size. The sample mean \(\hat\mu\)=4.985 and sample variance \(\hat\sigma^2\)=0.6361 are nevertheless close to the theoretical parameters of the distribution of the mean, \(\mu\)=\(\lambda^{-1}\)=5 and \(\sigma^2\)=\(\frac{1}{n}\cdot\lambda^{-2}\)=0.625, with \(\lambda\)=0.2 and n=40.
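As a quick check of the theoretical values, the following sketch simulates the means with `replicate()` and compares their mean and variance with \(\lambda^{-1}\) and \(\frac{1}{n}\cdot\lambda^{-2}\). It is not part of the original analysis; the seed and the names `means`, `theo_mean`, and `theo_var` are assumptions chosen for illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 40; nosim <- 1000; lambda <- 0.2

# Each replicate is the mean of n exponential variates with rate lambda.
means <- replicate(nosim, mean(rexp(n, lambda)))

theo_mean <- 1 / lambda          # expected value of the sample mean
theo_var  <- 1 / (n * lambda^2)  # variance of the sample mean

c(sim_mean = mean(means), theo_mean = theo_mean)
c(sim_var  = var(means),  theo_var  = theo_var)
```

With a different seed the simulated values change slightly, but they should stay close to the theoretical ones.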

Computation of the 95% confidence interval for the mean.

mu_hat[2] + c(-1,1) * 1.96 * sqrt(var_hat[2])/sqrt(n)
## [1] 4.738 5.232

The resulting 95% confidence interval, [4.738, 5.232], contains the theoretical mean of 5.
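Coverage is usually estimated as the fraction of simulated samples whose interval contains the true mean. The sketch below illustrates this under stated assumptions: it uses the standard deviation of each individual sample and the normal quantile 1.96, and the names `covers` and `coverage` are hypothetical. It is not the computation that produced the interval above.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 40; nosim <- 1000; lambda <- 0.2
true_mean <- 1 / lambda

# For each simulated sample, build a 95% interval and check whether it
# contains the true mean.
covers <- replicate(nosim, {
    x <- rexp(n, lambda)
    ci <- mean(x) + c(-1, 1) * 1.96 * sd(x) / sqrt(n)
    ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covers)         # proportion of intervals covering 1/lambda
coverage
```

Because the exponential distribution is skewed, the estimated coverage for n=40 may fall somewhat below the nominal 95%.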

Basic inferential data analysis

library(dplyr)  # provides filter() as used below
data(ToothGrowth)

We will use the ToothGrowth dataset and provide a basic summary of the data.

vitaminC <- filter(ToothGrowth, as.character(supp) == "VC")
orangeJuice <- filter(ToothGrowth, as.character(supp) == "OJ")
summary(vitaminC$len)
summary(orangeJuice$len)
table(ToothGrowth$supp, ToothGrowth$dose)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.2    11.2    16.5    17.0    23.1    33.9
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.2    15.5    22.7    20.7    25.7    30.9
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

We can see that the mean tooth length differs between the two conditions, in which the guinea pigs received either orange juice (OJ) or vitamin C (VC), and that the data are balanced across supplements and doses. Whether this difference in means [Fig.: 2] is significant was explored using a paired t-test.

t_test <- t.test(vitaminC$len, orangeJuice$len, paired=TRUE)

With a p-value of 0.0025, we reject the null hypothesis and conclude that there is a significant difference in mean tooth length between the orange juice and vitamin C conditions. The 95% confidence interval for the mean difference (VC minus OJ) is [-5.991341, -1.408659]. It was still unclear, however, whether the dosage influenced tooth growth in either condition. We therefore ran two paired t-tests, one for dosages less than or equal to 0.5 and one for dosages greater than 0.5.

# Separate names keep the first result from being overwritten by the second.
t_test_low <- t.test(filter(vitaminC, dose <= .5)$len, 
                     filter(orangeJuice, dose <= .5)$len, 
                     paired=TRUE)
t_test_high <- t.test(filter(vitaminC, dose > .5)$len, 
                      filter(orangeJuice, dose > .5)$len, 
                      paired=TRUE)

For dosages of at most 0.5, the difference in tooth growth between the two conditions was significant [p = 0.01547], with a 95% confidence interval of [-9.236542, -1.263458]. For dosages above 0.5, the difference was not significant at the 5% level [p = 0.05482]; the 95% confidence interval of [-5.91682146, 0.06682146] includes zero.
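The animals in the two supplement groups are different individuals, so an unpaired comparison is a natural alternative to the paired tests above. The following sketch is a variant under that assumption, not the analysis reported here: it runs Welch's two-sample t-test (the default of `t.test()`) separately for each of the three dose levels. The name `p_by_dose` is hypothetical.

```r
data(ToothGrowth)

# Welch two-sample t-test (unpaired, unequal variances) of len by supp,
# computed separately for each dose level.
p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
    t.test(len ~ supp, data = d)$p.value
})

p_by_dose                        # one p-value per dose level: 0.5, 1, 2
```

Splitting by each dose level instead of at the 0.5 threshold avoids pooling the 1 and 2 dosages, which may behave differently.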

Conclusion

It could be shown that tooth growth differed significantly between the two conditions overall. At a dosage of 0.5 mg, orange juice led to more tooth growth than pure vitamin C at the same dosage; for dosages above 0.5 mg, the difference was not significant at the 5% level.

Appendix

Fig 1 - Parameter Estimation via Simulation


Fig 2 - Influence of Vitamin C on Tooth Growth
