This project investigates the exponential distribution in R and compares it with the predictions of the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. Lambda = 0.2 is used for all of the simulations. The distribution of averages of 40 exponentials (with 1,000 simulations) is analyzed as follows:
The sample mean will be calculated and compared to the theoretical mean of the distribution.
The sample variability (measured by the variance) will be shown and compared to the theoretical variance of the distribution.
The distribution of averages will be shown to be approximately normal, focusing on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
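The theoretical quantities referred to in this plan can be written down before any simulation. The following is a minimal sketch, assuming independent draws; the variable names (lambda_ex, n_ex and so on) are illustrative and are not part of the report's own code.

```r
# Theoretical quantities for the average of n independent exponential(lambda) draws.
# All names here are illustrative and independent of the report's variables.
lambda_ex <- 0.2
n_ex <- 40

pop_mean <- 1 / lambda_ex            # mean of a single exponential draw
pop_sd   <- 1 / lambda_ex            # standard deviation of a single draw
pop_var  <- pop_sd^2                 # variance of a single draw

se_mean     <- pop_sd / sqrt(n_ex)   # standard deviation of the average of n draws
var_of_mean <- pop_var / n_ex        # variance of the average of n draws

c(pop_mean = pop_mean, pop_var = pop_var,
  se_mean = se_mean, var_of_mean = var_of_mean)
```

With lambda = 0.2 and n = 40 this gives a population mean of 5, a population variance of 25, and a variance of 0.625 for the average of 40 draws.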
lambda = 0.2
n = 40
num_sim = 1000
set.seed(001)
1000 simulations - Data Generation
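# Each replicate() call draws its own samples, so the Mean, Var and SD in a given row come from different draws
data_sim0 <- c(seq_len(num_sim), replicate(num_sim, mean(rexp(n, lambda))), replicate(num_sim, var(rexp(n, lambda))), replicate(num_sim, sd(rexp(n, lambda))))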
data_sim <- matrix(data_sim0,ncol=4)
colnames(data_sim) <- c("Obs","Mean", "Var", "SD")
head(data_sim, 15)
## Obs Mean Var SD
## [1,] 1 4.860372 14.87657 3.819551
## [2,] 2 5.961285 36.75818 7.252338
## [3,] 3 4.279204 27.10864 4.243233
## [4,] 4 4.702298 26.44750 4.501053
## [5,] 5 5.196446 102.22795 4.366088
## [6,] 6 4.397114 19.87817 3.849005
## [7,] 7 5.995896 24.12763 5.594448
## [8,] 8 5.072570 20.50609 5.149621
## [9,] 9 4.027864 12.71475 4.036134
## [10,] 10 4.163866 26.81836 3.923682
## [11,] 11 4.874493 19.34167 4.044320
## [12,] 12 4.539306 15.52259 6.124232
## [13,] 13 5.115051 19.25615 5.376306
## [14,] 14 6.351128 22.22989 5.797816
## [15,] 15 5.009125 16.56606 4.139898
tail(data_sim, 15)
## Obs Mean Var SD
## [986,] 986 4.145751 28.86888 4.095807
## [987,] 987 5.476217 37.02790 3.674008
## [988,] 988 4.806067 21.97471 4.388054
## [989,] 989 4.243286 43.42819 4.049938
## [990,] 990 4.826787 13.21191 4.613409
## [991,] 991 5.918929 28.38579 5.248376
## [992,] 992 5.142787 10.50490 4.313459
## [993,] 993 5.308126 17.50120 5.280325
## [994,] 994 5.035313 28.71859 5.426077
## [995,] 995 6.286201 18.58398 6.136355
## [996,] 996 5.075948 33.15492 4.657401
## [997,] 997 4.589360 21.87433 4.553191
## [998,] 998 5.236807 23.14276 4.114057
## [999,] 999 4.585593 14.73064 5.183856
## [1000,] 1000 5.365658 15.91492 4.908612
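In the table above, the Mean, Var and SD columns are each computed from a separate set of random draws. If the three statistics should describe the same simulated sample, one possible variant is sketched below. It is an illustration, not the code used for the results in this report; the names draws and stats_same_sample and the seed are assumptions made for the example.

```r
set.seed(1)                       # arbitrary seed, used only for reproducibility
lambda_ex  <- 0.2
n_ex       <- 40
num_sim_ex <- 1000

# Each row of `draws` is one simulated sample of 40 exponentials
draws <- matrix(rexp(num_sim_ex * n_ex, lambda_ex), nrow = num_sim_ex)

# Mean, variance and standard deviation computed row by row on the same sample
stats_same_sample <- data.frame(
  Obs  = seq_len(num_sim_ex),
  Mean = rowMeans(draws),
  Var  = apply(draws, 1, var),
  SD   = apply(draws, 1, sd)
)
head(stats_same_sample, 3)
```

With this layout SD^2 equals Var in every row, which is not the case when each column comes from its own draws.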
Means of simulations
sample_mean <- mean(data_sim[,2])
sample_mean
## [1] 4.990025
Theoretical Mean (Exponential Distribution)
rexp_mean <- c(1/lambda)
rexp_mean
## [1] 5
Comparison of Sample Mean with Theoretical Mean of rexp
samp_and_theo_means <- c( sample_mean , 1/lambda )
samp_and_theo_means
## [1] 4.990025 5.000000
plots: Distribution of the Mean of 1000 simulations
hist(data_sim[,2],
main='Histogram - Means of 1000 Repetitions - R Rexp(40, Lambda) Function',
xlab='Sample Mean Values - Lambda=0.2',
border="blue",
col="blue",
xlim=c(2,8),
las=1,
breaks=100,
prob = TRUE)
lines(density(data_sim[,2]),lwd=6)
abline(v=mean(data_sim[,2]),col="green", lwd=10)
abline(v=1/lambda,col="red", lwd=4)
Comparison of the Sample Variance with Theoretical Variance of rexp
samp_and_theo_variances <- c( mean(data_sim[,3]), (1/lambda)^2)
samp_and_theo_variances
## [1] 25.57262 25.00000
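The comparison above uses the average of the 1000 sample variances, which estimates the variance of a single exponential draw, (1/lambda)^2 = 25. A related check looks at the variance of the 1000 sample means themselves, which theory puts at (1/lambda)^2 / n. The sketch below runs its own small simulation so that it is self-contained; the seed and the variable names are assumptions made for the example, not part of the report's code.

```r
set.seed(1)                       # arbitrary seed, used only for reproducibility
lambda_ex  <- 0.2
n_ex       <- 40
num_sim_ex <- 1000

# 1000 sample means, each computed from 40 exponential draws
means_ex <- replicate(num_sim_ex, mean(rexp(n_ex, lambda_ex)))

var_of_means_sim  <- var(means_ex)               # empirical variance of the means
var_of_means_theo <- (1 / lambda_ex)^2 / n_ex    # sigma^2 / n = 25 / 40 = 0.625

c(simulated = var_of_means_sim, theoretical = var_of_means_theo)
```

The simulated value should land close to 0.625, with some run-to-run variation depending on the seed.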
plots: Distribution of the Variance of 1000 simulations
hist(data_sim[,3],
main='Histogram - Variance of 1000 Repetitions - R Rexp(40, Lambda) Function',
xlab='Sample Variance Values - Lambda=0.2',
border="blue",
col="blue",
xlim=c(0,60),
las=1,
breaks=400,
prob = TRUE
)
lines(density(data_sim[,3]),lwd=6)
abline(v=mean(data_sim[,3]),col="red", lwd=10)
abline(v=(1/lambda)^2,col="green", lwd=4)
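Across the 1000 simulations generated with rexp(n, lambda), the sample means are centered at the theoretical mean (4.99 vs 5) and the sample variances are centered close to the theoretical variance (25.57 vs 25). As the charts above show, the distribution of the sample means is approximately normal, which is consistent with the Central Limit Theorem. The distribution of the sample variances is also centered on its theoretical value, although sample variances of exponential data tend to be right-skewed, so the normal approximation is rougher for them.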
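The difference between a large collection of random exponentials and a large collection of averages of 40 exponentials can be made explicit with a small side-by-side comparison. The sketch below is one possible way to do it, not the report's own analysis: the seed, the variable names and the helper skewness() are assumptions introduced for the example. The sample skewness is used as a simple numeric summary: it is 0 for a symmetric distribution such as the normal and 2 in theory for the exponential.

```r
set.seed(1)                       # arbitrary seed, used only for reproducibility
lambda_ex <- 0.2

raw_draws <- rexp(1000, lambda_ex)                        # 1000 individual exponentials
avg_draws <- replicate(1000, mean(rexp(40, lambda_ex)))   # 1000 averages of 40 exponentials

# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

c(raw = skewness(raw_draws), averages = skewness(avg_draws))

# Side-by-side histograms of the two collections
op <- par(mfrow = c(1, 2))
hist(raw_draws, breaks = 40, main = "1000 exponentials", xlab = "Value")
hist(avg_draws, breaks = 40, main = "1000 averages of 40", xlab = "Sample mean")
par(op)
```

The raw draws should show the strongly right-skewed exponential shape, while the averages should look much closer to a bell curve centered near 1/lambda = 5.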
The following section presents the analysis of the “ToothGrowth” data (R datasets package). It is carried out as follows:
Loading the ToothGrowth data and performing basic exploratory data analyses.
A basic summary of the data.
The use of confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only the techniques from class are used, even if there are other approaches worth considering.) Conclusions are stated together with the assumptions they require.
Dataset: The effect of Vitamin C on Tooth Growth in Guinea Pigs
Description: The response (tooth length) is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
ToothGrowth format: A data frame with 60 observations on 3 variables.
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ)
[,3] dose numeric Dose in milligrams/day
Source: C. I. Bliss (1952). The Statistics of Bioassay. Academic Press. (Referenced from the R datasets documentation.)
library(datasets)
data("ToothGrowth")
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
require(graphics)
coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth, xlab = "Tooth Length Response on Dose by Supplement Type")
Table1 <- aggregate(ToothGrowth$len, list( ToothGrowth$dose , ToothGrowth$supp), FUN=mean)
colnames(Table1) <- c("Dose","Supp", "Len(Mean)")
Table1
## Dose Supp Len(Mean)
## 1 0.5 OJ 13.23
## 2 1.0 OJ 22.70
## 3 2.0 OJ 26.06
## 4 0.5 VC 7.98
## 5 1.0 VC 16.77
## 6 2.0 VC 26.14
Assuming that the experimental study controlled for other factors related to tooth growth in guinea pigs, an initial analysis of the data suggests that the dose level of vitamin C may have an effect on tooth length. However, a combined effect of dose level and supplement type may also be present.
The type of supplement (VC or OJ) seems to have an effect on tooth growth at the 0.5 mg dose: the mean tooth length is 13.23 for OJ versus 7.98 for VC. At the highest dose (2.0 mg), there appears to be no difference in tooth length between the two supplement types (26.06 for OJ versus 26.14 for VC).
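The per-dose impression described above can be checked with a two-sample t-test of OJ against VC within each dose level. The sketch below is one way to do this and is not part of the original analysis; the names dose_levels and p_by_dose are introduced for the example, and the pooled-variance setting (var.equal = TRUE) is an assumption chosen to match the other tests in this report.

```r
library(datasets)

# p-value of an OJ vs VC two-sample t-test within each dose level.
# var.equal = TRUE assumes equal variances in the two supplement groups.
dose_levels <- sort(unique(ToothGrowth$dose))
p_by_dose <- sapply(dose_levels, function(d) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d), var.equal = TRUE)$p.value
})
names(p_by_dose) <- dose_levels
p_by_dose
```

A small p-value at a given dose would support a supplement effect at that dose, while a large one would be consistent with no detectable difference. With only 10 animals per group, these tests have limited power.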
To carry out the analysis by dose level, the data are subset in three ways, one for each pair of doses: 0.5 and 1.0, 0.5 and 2.0, and 1.0 and 2.0.
Dose05_Dose10 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
Dose05_Dose20 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
Dose10_Dose20 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
# Two-sided pooled-variance t-test of tooth length by supplement type (unpaired is the default)
t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)
##
## Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
t.test(len ~ dose, var.equal = TRUE, data = Dose05_Dose10)
##
## Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983748 -6.276252
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len ~ dose, var.equal = TRUE, data = Dose05_Dose20)
##
## Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15352 -12.83648
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
t.test(len ~ dose, var.equal = TRUE, data = Dose10_Dose20)
##
## Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.994387 -3.735613
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
Overall, the data do not show a statistically significant effect of supplement type (VC vs OJ) on tooth length at the 5% level (two-sided t-test, p-value = 0.06039). The 95% confidence interval for the difference in means includes zero (-0.1670064, 7.5670064).
On the other hand, the data suggest that the dose level of vitamin C has an effect on tooth growth. The differences in tooth length are significant for all three pairs of dose levels (0.5 vs 1.0, 0.5 vs 2.0 and 1.0 vs 2.0), and all three 95% confidence intervals exclude zero. We therefore reject the null hypothesis of equal mean tooth length across dose levels. These conclusions assume independent groups, approximately normally distributed tooth lengths within each group, and equal variances across groups (the tests above use the pooled-variance option).
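The confidence intervals quoted above come from the pooled-variance t interval. As a worked illustration, the sketch below recomputes the 0.5 vs 1.0 interval by hand from the formula; the variable names (x, y, sp2, se, ci_manual) are introduced for the example, and the calculation relies on the same equal-variance assumption as the tests in this report.

```r
library(datasets)

x <- ToothGrowth$len[ToothGrowth$dose == 0.5]   # tooth lengths at dose 0.5
y <- ToothGrowth$len[ToothGrowth$dose == 1.0]   # tooth lengths at dose 1.0
nx <- length(x)
ny <- length(y)

# Pooled variance, valid under the equal-variance assumption
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / nx + 1 / ny))

# 95% interval for mean(x) - mean(y), using nx + ny - 2 degrees of freedom
ci_manual <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, nx + ny - 2) * se
ci_manual
```

The result should agree with the interval reported by t.test for the 0.5 vs 1.0 comparison (about -11.98 to -6.28).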
===
Rpubs MauVas http://rpubs.com/MauVas.
Howtos: [knitr documentation & markdown](http://shiny.rstudio.com/articles/rmarkdown.html)