Part 1 - Exponential Distribution Simulations and True Values

This project investigates the exponential distribution in R and compares it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. Lambda = 0.2 is used for all of the simulations. The distribution of averages of 40 exponentials (over 1,000 simulations) is analyzed as follows:

The sample mean will be calculated and compared to the theoretical mean of the distribution.
The sample variability will be shown (via variance) and compared to the theoretical variance of the distribution.
The difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials will be shown, with the objective of showing that the distribution of averages is approximately normal.

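lambda <- 0.2    # rate parameter of the exponential distribution
n <- 40          # number of exponentials per average
num_sim <- 1000  # number of simulations
set.seed(001)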

1000 simulations - Data Generation

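# Each replicate() call draws its own num_sim samples of size n, so the
# Mean, Var, and SD in a given row come from three independent sets of draws.
data_sim0 <- c(seq_len(num_sim),
               replicate(num_sim, mean(rexp(n, lambda))),
               replicate(num_sim, var(rexp(n, lambda))),
               replicate(num_sim, sd(rexp(n, lambda))))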

data_sim <- matrix(data_sim0,ncol=4)
colnames(data_sim) <- c("Obs","Mean", "Var", "SD")

head(data_sim, 15)

##       Obs     Mean       Var       SD
##  [1,]   1 4.860372  14.87657 3.819551
##  [2,]   2 5.961285  36.75818 7.252338
##  [3,]   3 4.279204  27.10864 4.243233
##  [4,]   4 4.702298  26.44750 4.501053
##  [5,]   5 5.196446 102.22795 4.366088
##  [6,]   6 4.397114  19.87817 3.849005
##  [7,]   7 5.995896  24.12763 5.594448
##  [8,]   8 5.072570  20.50609 5.149621
##  [9,]   9 4.027864  12.71475 4.036134
## [10,]  10 4.163866  26.81836 3.923682
## [11,]  11 4.874493  19.34167 4.044320
## [12,]  12 4.539306  15.52259 6.124232
## [13,]  13 5.115051  19.25615 5.376306
## [14,]  14 6.351128  22.22989 5.797816
## [15,]  15 5.009125  16.56606 4.139898

tail(data_sim, 15)

##          Obs     Mean      Var       SD
## [986,]   986 4.145751 28.86888 4.095807
## [987,]   987 5.476217 37.02790 3.674008
## [988,]   988 4.806067 21.97471 4.388054
## [989,]   989 4.243286 43.42819 4.049938
## [990,]   990 4.826787 13.21191 4.613409
## [991,]   991 5.918929 28.38579 5.248376
## [992,]   992 5.142787 10.50490 4.313459
## [993,]   993 5.308126 17.50120 5.280325
## [994,]   994 5.035313 28.71859 5.426077
## [995,]   995 6.286201 18.58398 6.136355
## [996,]   996 5.075948 33.15492 4.657401
## [997,]   997 4.589360 21.87433 4.553191
## [998,]   998 5.236807 23.14276 4.114057
## [999,]   999 4.585593 14.73064 5.183856
## [1000,] 1000 5.365658 15.91492 4.908612
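
In the generation step above, each replicate() call draws its own samples, so the Mean, Var, and SD in one row describe three different samples (the SD column is not the square root of the Var column). A minimal sketch of an alternative follows, assuming it is preferable for all three statistics to describe the same 40 draws. The names samples and data_sim_alt are hypothetical, and the values it produces will differ from the tables above.

```r
lambda <- 0.2
n <- 40
num_sim <- 1000
set.seed(1)

# One row per simulation, one column per exponential draw
samples <- matrix(rexp(num_sim * n, lambda), nrow = num_sim, ncol = n)

# Mean, variance, and SD are all computed from the same row of draws
data_sim_alt <- cbind(Obs  = seq_len(num_sim),
                      Mean = rowMeans(samples),
                      Var  = apply(samples, 1, var),
                      SD   = apply(samples, 1, sd))

# Within each row, SD is now the square root of Var
all.equal(data_sim_alt[, "SD"], sqrt(data_sim_alt[, "Var"]))
```

Drawing all values once also keeps the three summary columns directly comparable row by row.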

Means of simulations

sample_mean <- mean(data_sim[,2])
sample_mean

## [1] 4.990025

Theoretical Mean of the Exponential Distribution

rexp_mean <- c(1/lambda)
rexp_mean

## [1] 5

Comparison of Sample Mean with Theoretical Mean of rexp

samp_and_theo_means <- c( sample_mean , 1/lambda ) 
samp_and_theo_means

## [1] 4.990025 5.000000

plots: Distribution of the Mean of 1000 simulations

hist(data_sim[,2], 
     main='Histogram - Means of 1000 Repetitions - R Rexp(40, Lambda) Function', 
     xlab='Sample Mean Values - Lambda=0.2', 
     border="blue", 
     col="blue", 
     xlim=c(2,8), 
     las=1, 
     breaks=100, 
     prob = TRUE)
lines(density(data_sim[,2]),lwd=6)

abline(v=mean(data_sim[,2]),col="green", lwd=10)

abline(v=1/lambda,col="red", lwd=4)

Comparison of the Sample Variance with Theoretical Variance of rexp

samp_and_theo_variances <- c( mean(data_sim[,3]), (1/lambda)^2) 
samp_and_theo_variances

## [1] 25.57262 25.00000
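
The comparison above averages the 1000 sample variances and sets them against the population variance (1/lambda)^2. A complementary check, sketched here under the assumption that the spread of the sample means themselves is also of interest, uses the standard result that the mean of n independent observations has variance (1/lambda)^2 / n. The names sim_means and theo_var_mean are hypothetical.

```r
lambda <- 0.2
n <- 40
num_sim <- 1000
set.seed(1)

# 1000 averages of 40 exponentials
sim_means <- replicate(num_sim, mean(rexp(n, lambda)))

# Theoretical variance of the mean of n exponentials: sigma^2 / n = 25 / 40
theo_var_mean <- (1 / lambda)^2 / n

# Side-by-side comparison of the simulated and theoretical values
c(sample = var(sim_means), theoretical = theo_var_mean)
```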

plots: Distribution of the Variance of 1000 simulations

hist(data_sim[,3], 
     main='Histogram - Variance of 1000 Repetitions - R Rexp(40, Lambda) Function', 
     xlab='Sample Variance Values - Lambda=0.2', 
     border="blue", 
     col="blue", 
     xlim=c(0,60), 
     las=1, 
     breaks=400, 
     prob = TRUE
     )
lines(density(data_sim[,3]),lwd=6)
abline(v=mean(data_sim[,3]),col="red", lwd=10)
abline(v=(1/lambda)^2,col="green", lwd=4)

Conclusion

Across the 1000 simulations of rexp(n, lambda), both the sample means and the sample variances are centered near their theoretical values: 4.990 against a theoretical mean of 5, and 25.573 against a theoretical variance of 25. As the histograms above show, the distribution of the sample means is approximately normal, which is consistent with the Central Limit Theorem. The distribution of the sample variances is also centered near its theoretical value, although it has a longer right tail.
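
The difference between a large collection of raw exponentials and a large collection of averages of 40 exponentials can also be checked numerically. The sketch below is one possible way to do so, assuming sample skewness as a rough measure of departure from normality (a normal distribution has skewness 0, an exponential distribution has skewness 2). The helper skewness() and the variable names are hypothetical and not part of the original analysis.

```r
lambda <- 0.2
n <- 40
num_sim <- 1000
set.seed(1)

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

raw_exp   <- rexp(num_sim, lambda)                      # 1000 single exponentials
avg_of_40 <- replicate(num_sim, mean(rexp(n, lambda)))  # 1000 averages of 40

# Skewness closer to 0 suggests a more symmetric, bell-shaped distribution
c(raw = skewness(raw_exp), averages = skewness(avg_of_40))

# Share of averages within 1.96 standard errors of 1/lambda
# (about 0.95 would be expected under a normal approximation)
se <- (1 / lambda) / sqrt(n)
coverage <- mean(abs(avg_of_40 - 1 / lambda) < 1.96 * se)
coverage
```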

Part 2: Basic Inferential Data Analysis

The following section analyzes the ToothGrowth data (R datasets package). The analysis is carried out as follows:

Loading the ToothGrowth data and performing basic exploratory data analyses.
A basic summary of the data.
The use of confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose, restricted to the techniques covered in class, followed by the conclusions and the assumptions they require.

Dataset: The effect of Vitamin C on Tooth Growth in Guinea Pigs

Description: The response (tooth length) is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

ToothGrowth format: A data frame with 60 observations on 3 variables.
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ)
[,3] dose numeric Dose in milligrams/day

Source: C. I. Bliss (1952) The Statistics of Bioassay. Academic Press. (Referenced from the R datasets documentation.)
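
The design and format described above can be checked directly from the data. A small sketch, where design is a hypothetical variable name:

```r
library(datasets)
data("ToothGrowth")

# Number of animals in each supplement-by-dose cell
design <- table(ToothGrowth$supp, ToothGrowth$dose)
design

# Column names and types of the data frame
str(ToothGrowth)
```

A balanced table here means each supplement and dose combination contributes the same number of animals to the comparisons that follow.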

library(datasets)
data("ToothGrowth")
summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

require(graphics)
coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,xlab = "Tooth Length Response on Dose by Supplement Type")

# Mean tooth length (len) for each dose and supplement combination
Table1 <- aggregate(ToothGrowth$len, list( ToothGrowth$dose , ToothGrowth$supp), FUN=mean)
colnames(Table1) <- c("Dose","Supp", "Len(Mean)")
Table1

##   Dose Supp Len(Mean)
## 1  0.5   OJ     13.23
## 2  1.0   OJ     22.70
## 3  2.0   OJ     26.06
## 4  0.5   VC      7.98
## 5  1.0   VC     16.77
## 6  2.0   VC     26.14

Assuming that the experimental study controlled for other factors related to tooth growth in guinea pigs, an initial analysis of the data suggests that the dose level of vitamin C may have an effect on tooth length. However, a combined effect of dose level and type of supplement may be present.

The type of supplement (VC or OJ) seems to have an effect on tooth growth at the 0.5 mg dose level, where the data suggest a difference in tooth length between the two types of supplement (mean length of OJ = 13.23 vs VC = 7.98). At the high dose level (2.0 mg), there seems to be no difference in tooth length between the types of supplement (OJ = 26.06 vs VC = 26.14).

To carry out the analysis by dose level, the data will be subset in three different ways: a subset with doses 0.5 and 1.0, another with doses 0.5 and 2.0, and a third with doses 1.0 and 2.0.

Dose05_Dose10 <- subset(ToothGrowth, dose== 0.5 |dose == 1.0 )
Dose05_Dose20 <- subset(ToothGrowth, dose== 0.5 |dose == 2.0 )
Dose10_Dose20 <- subset(ToothGrowth, dose== 1.0 |dose == 2.0 )
 
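# Two-sided test (the default alternative), matching the output below.
# Unpaired is the default, so paired is not passed.
t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)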

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

# Pooled-variance two-sample t-test, dose 0.5 vs 1.0 (unpaired by default)
t.test(len ~ dose, var.equal = TRUE, data = Dose05_Dose10)

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983748  -6.276252
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

# Pooled-variance two-sample t-test, dose 0.5 vs 2.0 (unpaired by default)
t.test(len ~ dose, var.equal = TRUE, data = Dose05_Dose20)

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15352 -12.83648
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

# Pooled-variance two-sample t-test, dose 1.0 vs 2.0 (unpaired by default)
t.test(len ~ dose, var.equal = TRUE, data = Dose10_Dose20)

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.994387 -3.735613
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100
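
The exploratory table suggested that the supplement effect may depend on the dose. One way to examine this, sketched here with the same pooled-variance two-sample t-test used above, is to compare the supplements within each dose level. The names p_by_dose and d are hypothetical, and this is an illustration rather than part of the original analysis.

```r
library(datasets)
data("ToothGrowth")

# p-value of the OJ vs VC comparison within each dose level
p_by_dose <- sapply(c(0.5, 1.0, 2.0), function(d) {
  res <- t.test(len ~ supp, var.equal = TRUE,
                data = subset(ToothGrowth, dose == d))
  res$p.value
})
names(p_by_dose) <- c("0.5", "1", "2")
p_by_dose
```

Each of these tests uses only 10 animals per group, so the results are best read alongside the pooled comparisons above.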

Conclusion

The data in the study indicate that, overall, the type of supplement (VC vs OJ) does not have a statistically significant effect on tooth length at the 5% level (t-test p-value = 0.06039). The 95% confidence interval includes zero (-0.1670064, 7.5670064).

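On the other hand, the data suggest that the dose level of vitamin C has an effect on tooth growth. The differences in tooth length are significant among the three dose levels, and the three confidence intervals calculated for the pairs of levels (0.5 vs 1.0, 0.5 vs 2.0, and 1.0 vs 2.0) all exclude zero. We can therefore reject the null hypothesis of equal means in favor of the alternative hypothesis that tooth length differs among dose levels. These conclusions assume that the groups are independent with equal variances (the unpaired, pooled-variance t-test used above) and that tooth length is approximately normally distributed within each group.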

===

Rpubs MauVas http://rpubs.com/MauVas.

Howtos: [knitr documentation & markdown](http://shiny.rstudio.com/articles/rmarkdown.html)

Statistical Inference Course Project

Mauricio Vasquez

August 13, 2016
