Overview

This report is the final project for the Statistical Inference course. Part 1 studies the Central Limit Theorem by simulating means of exponential random variables and plotting their distribution. Part 2 analyzes the ToothGrowth data. The rate (lambda) passed to the rexp function throughout Part 1 is 0.2.

Part 1

Simulations

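# Simulate 1000 means, each taken over 40 exponential draws with rate 0.2
mns = NULL
for (i in 1 : 1000) mns = c(mns, mean(rexp(40, rate = 0.2)))
hist(mns, probability = TRUE)
dmean <- mean(mns)
# Mark the mean of the simulated means with a dashed vertical line
abline(v = dmean, col = "blue", lwd = 2, lty = 2)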

Compute the mean value marked on the plot

dmean<-mean(mns)
dmean
## [1] 4.999425

Estimate the mean with a 95% confidence interval from a t-test

t.test(mns)
## 
##  One Sample t-test
## 
## data:  mns
## t = 200.64, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  4.95053 5.04832
## sample estimates:
## mean of x 
##  4.999425
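
The interval reported by t.test can be reproduced by hand as mean ± t quantile × standard error. The following is a minimal sketch, not the original analysis: sim_means is a hypothetical stand-in for mns, and the seed is an assumption made only so the sketch is reproducible.

```r
# Hypothetical stand-in for the simulated sample means (mns in the text)
set.seed(42)  # assumed seed, not part of the original analysis
sim_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

n  <- length(sim_means)
se <- sd(sim_means) / sqrt(n)  # standard error of the mean

# 95% interval: mean +/- t quantile (n - 1 degrees of freedom) * standard error
ci_manual <- mean(sim_means) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Interval reported by t.test, stripped of its attributes for comparison
ci_ttest <- as.numeric(t.test(sim_means)$conf.int)

all.equal(ci_manual, ci_ttest)
```

Because the interval is built from the standard error, it becomes narrower as the number of simulated means grows.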

Comparison between the theoretical mean and the sample mean

The theoretical mean is 1/lambda = 1/0.2 = 5, and the mean of the simulated means marked on the graph (4.999425) is very close to it, so the result makes sense. In addition, the 95% confidence interval from the t-test, (4.95053, 5.04832), contains the theoretical value of 5.

Comparison between the theoretical variance and the sample variance

dvar<-var(mns)
dvar
## [1] 0.6208456
theo_var<-((1/0.2)^2)/40
theo_var
## [1] 0.625

The theoretical variance of the mean of 40 exponentials is theo_var = (1/lambda)^2/40 = 0.625, which is close to the sample variance dvar = 0.6208456 of the simulated means, so the simulation agrees with theory in terms of variance.
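
The formula (1/lambda)^2/n says the variance of the sample mean shrinks in proportion to 1/n. The sketch below checks this for a few sample sizes. The sizes 10 and 160 and the seed are assumptions added for illustration; the analysis above uses only n = 40.

```r
set.seed(123)  # assumed seed, only for reproducibility of this sketch
lambda <- 0.2
sizes  <- c(10, 40, 160)  # hypothetical sample sizes; the text uses only n = 40

# Theoretical variance of the mean of n exponentials: (1/lambda)^2 / n
theo_var <- (1 / lambda)^2 / sizes

# Simulated variance of 1000 sample means for each sample size
sim_var <- sapply(sizes, function(n) {
  var(replicate(1000, mean(rexp(n, rate = lambda))))
})

data.frame(n = sizes, theoretical = theo_var, simulated = sim_var)
```

The simulated column should track the theoretical one, with the remaining gap due to simulation noise.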

Distribution

set.seed(1)
# Draw 1000 exponential values with rate 0.2
x <- rexp(1000, rate = 0.2)
# Use probability = TRUE to plot a density histogram
hist(x, probability = TRUE)

The distribution of 1000 individual exponential draws is clearly not symmetric: it is strongly right-skewed, with most of its mass near zero. In contrast, the distribution of the means of 40 exponentials discussed above is centered around 5 and is approximately normal.
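
One way to make the normal fit visible is to overlay the normal density predicted by the Central Limit Theorem on the histogram of sample means. This is a sketch under assumptions: sim_means is a hypothetical re-simulation of the means, and the seed and the number of histogram breaks are illustrative choices.

```r
set.seed(1)  # assumed seed for this sketch
lambda <- 0.2
n      <- 40

# Hypothetical re-simulation of 1000 means of 40 exponentials
sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

hist(sim_means, probability = TRUE, breaks = 30,
     main = "Means of 40 exponentials", xlab = "Sample mean")

# Normal density implied by the CLT: mean 1/lambda, sd (1/lambda)/sqrt(n)
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, col = "red", lwd = 2)
```

If the approximation holds, the red curve should follow the outline of the histogram bars closely.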

Part 2

data("ToothGrowth")

Summary of the data

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Take a look at the data

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Compare the mean tooth length by supplement type (OJ vs VC)

tapply(ToothGrowth$len,ToothGrowth$supp, mean)
##       OJ       VC 
## 20.66333 16.96333

Data visualization

library(ggplot2)
ggplot(ToothGrowth, aes(factor(dose), len, fill = factor(dose))) +
      geom_boxplot() +
      facet_grid(.~supp, labeller = as_labeller(
            c("OJ" = "Orange juice", 
              "VC" = "Ascorbic Acid"))) +
      labs(title = "Tooth growth by dosage and by supp",
           x = "Dose in mg/day", 
           y = "Tooth Length") +
      scale_fill_discrete(name = "Dosage of\nvitamin C\nin mg/day")

Hypothesis testing

1) First, get the subset of the data (len and supp) at a fixed dose of 0.5

d05 <- subset(ToothGrowth, dose == 0.5)
d05
##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
## 31 15.2   OJ  0.5
## 32 21.5   OJ  0.5
## 33 17.6   OJ  0.5
## 34  9.7   OJ  0.5
## 35 14.5   OJ  0.5
## 36 10.0   OJ  0.5
## 37  8.2   OJ  0.5
## 38  9.4   OJ  0.5
## 39 16.5   OJ  0.5
## 40  9.7   OJ  0.5

2) Take a look at the distribution of the data

ggplot(d05,aes(supp,len))+geom_boxplot()

3) Assume the variances of the two groups are equal, and analyze the effect of OJ vs VC on tooth growth when the dose is 0.5.

res <- t.test(len ~ supp, data = d05, var.equal = TRUE, alternative = "less", conf.level = 0.95)
res
## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 18, p-value = 0.9973
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 8.122114
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

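Conclusion: The test above uses alternative = "less", so its alternative hypothesis is that the mean length for OJ minus the mean length for VC is less than 0. The p-value of 0.9973 means there is no evidence for that alternative, but a large p-value alone does not show that OJ is better. Because the one-sided p-values for the two directions sum to 1, the p-value for the opposite alternative (OJ mean greater than VC mean) is 1 - 0.9973 = 0.0027, which is below 0.05. We can therefore conclude that the average tooth length with OJ is significantly greater than with VC when the dose is fixed at 0.5, assuming the variances of the two groups are equal.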
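
The direction of a one-sided test matters. With the formula len ~ supp, the difference is taken as OJ minus VC, following the group order in the output. The sketch below runs the test in both directions on the dose 0.5 subset; the names res_less and res_greater are introduced here for illustration.

```r
data("ToothGrowth")
d05 <- subset(ToothGrowth, dose == 0.5)

# Alternative: mean(OJ) - mean(VC) < 0
res_less <- t.test(len ~ supp, data = d05, var.equal = TRUE,
                   alternative = "less")

# Alternative: mean(OJ) - mean(VC) > 0
res_greater <- t.test(len ~ supp, data = d05, var.equal = TRUE,
                      alternative = "greater")

# The two one-sided p-values are complementary and sum to 1
res_less$p.value + res_greater$p.value
res_greater$p.value
```

Choosing the alternative that matches the research question (here, whether OJ gives longer teeth than VC) lets the p-value be read directly against the 0.05 threshold.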

The same analysis can be applied to the groups with doses of 1 and 2, following the same procedure. This report demonstrates the hypothesis testing method for one dose only.
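
Repeating the procedure for every dose could be sketched as below. It keeps the equal-variance assumption used for dose 0.5 and tests whether the OJ mean is greater than the VC mean. The "greater" alternative, the names doses and p_values, and the decision to print only p-values are assumptions for illustration, and no results are claimed here.

```r
data("ToothGrowth")
doses <- c(0.5, 1, 2)

# One-sided two-sample t-test (OJ mean greater than VC mean) for each dose
p_values <- sapply(doses, function(d) {
  sub <- subset(ToothGrowth, dose == d)
  t.test(len ~ supp, data = sub, var.equal = TRUE,
         alternative = "greater")$p.value
})
names(p_values) <- paste("dose", doses)

p_values
```

Each p-value is then compared with 0.05 in the same way as for the 0.5 dose. Since three tests are run, a multiple-comparison adjustment such as p.adjust(p_values) may also be worth considering.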