Statistical Inference Course Project BY A.ALEID


OVERVIEW

The first part simulates the exponential distribution with rate lambda = 0.2. One thousand simulations are run, each averaging 40 exponentials, and the sample mean and variance of those averages are compared with their theoretical values.

In the second part, the ToothGrowth data set is explored, and tooth growth is compared statistically across doses and supplement types.


PART 1


1. SIMULATION

One thousand simulations are carried out, each taking the mean of 40 draws from an exponential distribution with lambda = 0.2.

The differences between the sample mean and variance of these averages and their theoretical counterparts are then calculated.
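
The theoretical values used in the comparison follow from the exponential distribution, whose mean and standard deviation both equal 1/lambda. The mean of n independent draws therefore has expected value 1/lambda and variance (1/lambda)^2/n. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```r
lambda <- 0.2   # rate of the exponential distribution
n <- 40         # number of exponentials averaged in each simulation

theo_mean <- 1 / lambda           # expected value of the sample mean
theo_var  <- (1 / lambda)^2 / n   # variance of the sample mean
theo_sd   <- sqrt(theo_var)       # standard error of the sample mean

c(mean = theo_mean, var = theo_var, sd = theo_sd)
```

These are the reference values (5 for the mean, 0.625 for the variance) that the simulated averages are compared against.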


2. MEANS

mns = NULL
for (i in 1 : 1000) mns = c(mns, mean(rexp(40, .2)))  # 1000 means of 40 exponentials
hist(mns, main = "Histogram of Means")  # display histogram
a = (1/0.2) - mean(mns)  # difference between theoretical and sample mean
b = ((1/0.2^2)/40) - var(mns)  # difference between theoretical and sample variance of the mean
abline(v = mean(mns), col = "red", lwd = 2)  # sample mean
abline(v = 5, col = "blue", lwd = 2)  # theoretical mean, 1/0.2
legend(x = "topright", c("Sample Mean", "Theoretical Mean"), col = c("red", "blue"), lwd = c(2, 2))

The difference between the theoretical mean (1/0.2 = 5) and the sample mean is 0.0070208.
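
The reported differences change from run to run because no random seed is set. A reproducible variant might look like the sketch below, which draws all values at once into a matrix instead of growing a vector in a loop. The seed value and the names `draws`, `sim_means`, `mean_diff` and `var_diff` are assumptions made for illustration, so the numbers it produces will not match the ones reported here.

```r
set.seed(2024)  # arbitrary seed (assumption); the seed of the original run is unknown
lambda <- 0.2
n <- 40
nsim <- 1000

# One row per simulation, one column per exponential draw
draws <- matrix(rexp(nsim * n, rate = lambda), nrow = nsim, ncol = n)
sim_means <- rowMeans(draws)

# Differences between theoretical and simulated values
mean_diff <- 1 / lambda - mean(sim_means)
var_diff  <- (1 / lambda)^2 / n - var(sim_means)

c(mean_diff = mean_diff, var_diff = var_diff)
```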


3. VARIANCE

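vrs = NULL
for (i in 1 : 1000) vrs = c(vrs, var(rexp(40, .2)))  # 1000 variances of 40 exponentials
hist(vrs, main = "Histogram of Variances")  # display histogram of variances
abline(v = mean(vrs), col = "red", lwd = 2)  # average of the simulated sample variances
abline(v = 1/0.2^2, col = "blue", lwd = 2)  # theoretical variance of one exponential, 25
legend(x = "topright", c("Mean of Sample Variances", "Theoretical Variance"), col = c("red", "blue"), lwd = 2)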

The difference between the theoretical variance of the sample mean ((1/0.2)^2/40 = 0.625) and the variance of the simulated means is 0.0017617.


4. DISTRIBUTION

hist(mns, breaks=18, prob=TRUE, xlab="Mean of exponentials", ylab="Density", col="blue", main = "Distribution")

# normal curve fitted to the simulated means (dotted red)
curve(dnorm(x, mean=mean(mns), sd=sd(mns)), col="red", lwd=2, lty="dotted", add=TRUE)
# theoretical normal curve: mean 5, sd (1/0.2)/sqrt(40), about 0.79 (black)
curve(dnorm(x, mean=5, sd=(1/0.2)/sqrt(40)), col="black", lwd=2, add=TRUE)
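
Another way to check how close the simulated means are to a normal distribution is a normal Q-Q plot. The sketch below is not part of the original analysis; it regenerates the means so that it runs on its own, and the seed is an arbitrary assumption.

```r
set.seed(1)  # arbitrary seed (assumption), only for reproducibility
mns <- replicate(1000, mean(rexp(40, rate = 0.2)))

# Points close to the reference line indicate approximate normality
qqnorm(mns, main = "Normal Q-Q Plot of Sample Means")
qqline(mns, col = "red", lwd = 2)
```

Some departure from the line in the upper tail would be expected, since averages of 40 exponentials are still slightly right-skewed.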



PART 2

1. EXPLORATORY DATA ANALYSIS AND SUMMARY

data("ToothGrowth")
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
with(ToothGrowth, plot(supp, len))

2. T.TEST FOR SUPPLEMENTS

vc1<-ToothGrowth[ToothGrowth$supp=="VC",]
mean(vc1$len)
## [1] 16.96333
sd(vc1$len)
## [1] 8.266029
OJ1<-ToothGrowth[ToothGrowth$supp=="OJ",]
mean(OJ1$len)
## [1] 20.66333
sd(OJ1$len)
## [1] 6.605561
t.test(vc1$len, OJ1$len, paired = FALSE, var.equal = TRUE)$conf
## [1] -7.5670064  0.1670064
## attr(,"conf.level")
## [1] 0.95
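
The test above assumes equal variances in the two supplement groups, although the sample standard deviations differ (8.27 for VC and 6.61 for OJ). As a hedged cross-check, the sketch below runs Welch's t-test, which drops that assumption and is the default of `t.test`, using the formula interface. The name `res` is illustrative. With the formula interface the difference is taken as OJ minus VC, so the sign of the interval is flipped relative to the call above.

```r
data("ToothGrowth")

# Welch two-sample t-test of tooth length by supplement type
# (var.equal = FALSE is the default)
res <- t.test(len ~ supp, data = ToothGrowth)

res$conf.int   # 95% confidence interval for mean(OJ) - mean(VC)
res$p.value    # two-sided p-value
```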

3. T.TEST FOR DOSES

ds05<-ToothGrowth[ToothGrowth$dose==.5,]
mean(ds05$len)
## [1] 10.605
sd(ds05$len)
## [1] 4.499763
ds1<-ToothGrowth[ToothGrowth$dose==1,]
mean(ds1$len)
## [1] 19.735
sd(ds1$len)
## [1] 4.415436
ds2<-ToothGrowth[ToothGrowth$dose==2,]
mean(ds2$len)
## [1] 26.1
sd(ds2$len)
## [1] 3.77415
t.test(ds05$len, ds1$len, paired = FALSE, var.equal = TRUE)$conf
## [1] -11.983748  -6.276252
## attr(,"conf.level")
## [1] 0.95
t.test(ds05$len, ds2$len, paired = FALSE, var.equal = TRUE)$conf
## [1] -18.15352 -12.83648
## attr(,"conf.level")
## [1] 0.95
t.test(ds2$len, ds1$len, paired = FALSE, var.equal = TRUE)$conf
## [1] 3.735613 8.994387
## attr(,"conf.level")
## [1] 0.95
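
Because three pairwise comparisons are made on the same data, the chance of at least one false positive is higher than for a single test. One possible safeguard, not used in the original analysis, is to adjust the p-values for multiple comparisons. The sketch below uses `pairwise.t.test` with a Bonferroni correction; the name `pw` is illustrative.

```r
data("ToothGrowth")

# Pairwise t-tests between dose groups with Bonferroni-adjusted p-values;
# the standard deviation is pooled across groups by default
pw <- pairwise.t.test(ToothGrowth$len, factor(ToothGrowth$dose),
                      p.adjust.method = "bonferroni")

pw$p.value  # matrix of adjusted p-values, one entry per pair of doses
```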

4. CONCLUSION

The above results show the following:

  • At the 95% confidence level, the null hypothesis cannot be rejected for the supplements: the confidence interval for the difference in mean length (VC minus OJ) contains zero, so no significant difference in tooth growth between OJ and VC is detected.

  • At the 95% confidence level, the null hypothesis is rejected for all three pairwise comparisons of doses (0.5, 1.0 and 2.0): none of the confidence intervals contains zero, so tooth growth differs significantly between every pair of doses.
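
The decision rule behind these conclusions is that a difference is significant at the 5% level when the 95% confidence interval does not contain zero. The sketch below applies that rule to the four intervals reported above; the list `ci` and its labels are illustrative names, and the numbers are copied from the test output.

```r
# 95% confidence intervals copied from the t-test output above
ci <- list(
  "VC vs OJ"   = c(-7.5670064, 0.1670064),
  "0.5 vs 1.0" = c(-11.983748, -6.276252),
  "0.5 vs 2.0" = c(-18.15352, -12.83648),
  "2.0 vs 1.0" = c(3.735613, 8.994387)
)

# TRUE when the whole interval lies on one side of zero
excludes_zero <- sapply(ci, function(x) x[1] > 0 || x[2] < 0)
excludes_zero
```

Only the supplement comparison yields FALSE, which matches the first conclusion; the three dose comparisons yield TRUE.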