Structure: Data Preparation, Section 1, Section 2, Section 3
The goal of this project is to investigate the exponential distribution in R and compare the distribution of its sample averages with the predictions of the Central Limit Theorem (CLT).
The exponential distribution is simulated in R with rexp(n, lambda), where lambda is the rate parameter; both the mean and the standard deviation of this distribution equal 1/lambda.
We investigate the distribution of averages of 40 exponentials (lambda = 0.2) over 1,000 simulations.
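As a quick sanity check on the fact that mean and standard deviation both equal 1/lambda, the sketch below draws a large sample from rexp() and compares its moments with 1/lambda = 5. The seed 1 and the name `draws` are arbitrary choices for this illustration only. They are not part of the original analysis, which sets its own seed next.

```r
# Sketch: an exponential with rate lambda has mean 1/lambda and SD 1/lambda.
set.seed(1)                          # arbitrary seed, used only for this check
lambda <- 0.2
draws  <- rexp(1e5, rate = lambda)   # hypothetical name for the large sample

# Empirical mean and SD should both be close to 1/lambda = 5
c(mean = mean(draws), sd = sd(draws), theory = 1 / lambda)
```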
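set.seed(12345)          # make the simulation reproducible
lambda <- 0.2            # rate parameter
mu     <- 1/lambda       # theoretical mean of the exponential (avoids masking base::mean)
sdev   <- 1/lambda       # theoretical standard deviation of the exponential
# 40 x 1000 matrix: each column holds one simulated sample of 40 exponentials
data <- matrix(NA_real_, nrow = 40, ncol = 1000)
for (i in 1:1000) {
  data[, i] <- rexp(40, rate = lambda)
}
sample_means <- colMeans(data)   # 1,000 averages of 40 exponentials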
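The loop above can also be written as a single vectorised rexp() call. The sketch below assumes the same seed and dimensions. Because matrix() fills column by column and rexp() consumes the random stream sequentially, column i should receive exactly the 40 draws the loop assigns to data[, i]. The names `data_vec` and `sample_means_vec` are hypothetical and used only here.

```r
# Sketch: vectorised equivalent of the 40 x 1000 simulation loop.
set.seed(12345)
lambda <- 0.2
# matrix() fills column-major, so draws 1..40 form column 1, 41..80 column 2, etc.
data_vec <- matrix(rexp(40 * 1000, rate = lambda), nrow = 40, ncol = 1000)
sample_means_vec <- colMeans(data_vec)   # one average per simulated sample
```

Avoiding the explicit loop and the preallocated NA matrix keeps the simulation to one line without changing the random numbers used.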
library(pander)
panderOptions("digits", 2)
pander(summary(sample_means))
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 2.7 | 4.5 | 4.9 | 5 | 5.5 | 8.3 |
# theoretical mean
theoretical <- 1/lambda
theoretical
## [1] 5
# distribution mean
distr_mean <- mean(sample_means)
distr_mean
## [1] 4.971972
The mean of the 1,000 sample means is 4.971972, whereas the theoretical mean is 1/lambda = 5. The sampling distribution plot shows that the distribution is centered close to the theoretical mean.
The standard deviation of the sample means is 0.772, compared with the theoretical value of (1/lambda)/sqrt(40) ≈ 0.791. The variance of the sample means is 0.595, compared with the theoretical value of 0.625.
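The theoretical values follow from the mean and variance of a single exponential (1/lambda and 1/lambda²) and the fact that averaging n independent draws divides the variance by n. With lambda = 0.2 and n = 40:

```latex
\operatorname{E}[\bar X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\operatorname{Var}(\bar X) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \qquad
\operatorname{SD}(\bar X) = \frac{1}{\lambda \sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791
```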
The following table compares the variability of the sample means with the theoretical values.
| Statistic | Sample | Theoretical |
|---|---|---|
| Standard deviation | 0.772 | 0.791 |
| Variance | 0.595 | 0.625 |
According to the Central Limit Theorem (CLT), the distribution of averages of exponentials is approximately normal, as the graphs show.
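A numeric complement to the graphs is to check what share of the averages falls within 1 and 1.96 theoretical standard deviations of 5. Under approximate normality these shares should be near 68% and 95%. This sketch regenerates the averages with the document's seed 12345 in vectorised form. The names `means`, `mu`, `tsd`, `within_1sd` and `within_196` are illustrative and not from the original analysis.

```r
# Sketch: empirical coverage of the averages vs. the normal approximation.
set.seed(12345)
lambda <- 0.2
n      <- 40
means  <- colMeans(matrix(rexp(n * 1000, rate = lambda), nrow = n))

mu  <- 1 / lambda              # theoretical mean of the averages
tsd <- (1 / lambda) / sqrt(n)  # theoretical SD of the averages

within_1sd <- mean(abs(means - mu) <= tsd)          # share within 1 SD
within_196 <- mean(abs(means - mu) <= 1.96 * tsd)   # share within 1.96 SD

c(within_1sd = within_1sd, within_1.96sd = within_196,
  normal_1sd = pnorm(1) - pnorm(-1), normal_1.96sd = pnorm(1.96) - pnorm(-1.96))
```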
library(ggplot2)
# Histogram of the 1,000 sample means; red line = sample mean, orange line = theoretical mean
ggplot(data.frame(means = sample_means), aes(x = means)) +
  geom_histogram(binwidth = 1/6, colour = "black", fill = "steelblue") +
  labs(title = "Sampling Distribution", x = "Sample means", y = "Count") +
  geom_rug(colour = "steelblue", alpha = 0.3) +
  geom_vline(xintercept = distr_mean, colour = "red", size = 1.1) +
  geom_vline(xintercept = theoretical, colour = "orange2", size = 1.1)
SSD  <- sd(sample_means)   # sample standard deviation of the means
SVAR <- SSD^2              # sample variance of the means
n    <- 40                 # sample size: each of the 1,000 averages is taken over 40 exponentials
TSD  <- sdev/sqrt(n)       # theoretical standard deviation of the means
TVAR <- TSD^2              # theoretical variance of the means
# rows = statistic, columns = sample vs. theoretical
x <- rbind(c(SSD, TSD), c(SVAR, TVAR))
rownames(x) <- c("Standard deviation", "Variance"); colnames(x) <- c("Sample", "Theoretical")
library(pander)
panderOptions("digits", 3)
pander(x)
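A less error-prone way to build this comparison table is to set the row and column names when the matrix is created, so each value lands under the label that describes it. The sketch below regenerates the averages with seed 12345 so it runs on its own. The names `means` and `comparison` are hypothetical.

```r
# Sketch: comparison table with dimnames fixed at construction time.
set.seed(12345)
lambda <- 0.2
n      <- 40
means  <- colMeans(matrix(rexp(n * 1000, rate = lambda), nrow = n))

SSD  <- sd(means);  SVAR <- var(means)        # sample SD and variance of the averages
TSD  <- (1 / lambda) / sqrt(n); TVAR <- TSD^2 # theoretical SD and variance

# matrix() fills column-major: column 1 = Sample (SD, Var), column 2 = Theoretical
comparison <- matrix(c(SSD, SVAR, TSD, TVAR), nrow = 2,
                     dimnames = list(c("Standard deviation", "Variance"),
                                     c("Sample", "Theoretical")))
round(comparison, 3)
```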
Plots 1&2:
library(ggplot2)
library(gridExtra)
# Histogram of all 40,000 simulated exponential values
plot1 <- ggplot(data.frame(values = as.vector(data)), aes(x = values)) +
  geom_histogram(colour = "black", fill = "steelblue") +
  labs(title = "Exponential Distribution", x = "Values", y = "Count") +
  geom_rug(colour = "steelblue", alpha = 0.2) +
  geom_vline(xintercept = distr_mean, colour = "red", size = 1.1)
df <- data.frame(Means=sample_means)
plot2 <- ggplot(data = df, aes(x = Means)) +
geom_histogram(aes(y=..density..), fill = "whitesmoke", binwidth = 1/6, color = "royalblue", alpha = 1/2) +
geom_vline(aes(xintercept=distr_mean, colour="Sample mean"), size = 1.25,linetype="dotdash") +
geom_vline(aes(xintercept=theoretical,colour = "Theoretical mean"), size = 1.25, linetype="dashed") +
geom_density(aes(color = "Means distribution"), size = 2.25, show.legend = FALSE) +
stat_function(fun = dnorm, args = list(mean = theoretical, sd = TSD), aes(color = "Normal distribution"), size = 2) +
theme(legend.justification=c(1,0), legend.position=c(1.15,0.65)) +
labs(title = "Means Distribution", x = "Exponential means") +
scale_x_continuous(limits = c(1, 10), breaks=1:10) +
scale_color_discrete(name ="Compared Parameters") +
geom_rug(col = "royalblue", alpha = 0.2)
grid.arrange(plot1, plot2,nrow=1)
Plot 3:
qqnorm(sample_means,main="Quantile-Quantile plot",xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
qqline(sample_means,col=4)
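The straightness of the Q-Q plot can be summarised by the correlation between sample and theoretical normal quantiles. qqnorm() returns both sets when called with plot.it = FALSE, and a value close to 1 indicates a near-linear plot. This sketch regenerates the averages with seed 12345. The names `means`, `qq` and `qq_cor` are illustrative.

```r
# Sketch: correlation of the Q-Q points as a one-number linearity summary.
set.seed(12345)
means <- colMeans(matrix(rexp(40 * 1000, rate = 0.2), nrow = 40))
qq <- qqnorm(means, plot.it = FALSE)   # qq$x = theoretical quantiles, qq$y = sample quantiles
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

A slight upward bend in the upper tail is still expected, because the exponential is right-skewed and 40 draws only partly remove that skew.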
##### The current system configuration:
sessionInfo()
## R version 3.1.3 (2015-03-09)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.3 (Yosemite)
##
## locale:
## [1] ru_RU.UTF-8/ru_RU.UTF-8/ru_RU.UTF-8/C/ru_RU.UTF-8/ru_RU.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gridExtra_0.9.1 ggplot2_1.0.1 pander_0.5.2
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.2-6 digest_0.6.8 evaluate_0.7 formatR_1.2
## [5] gtable_0.1.2 htmltools_0.2.6 knitr_1.10.5 labeling_0.3
## [9] magrittr_1.5 MASS_7.3-40 munsell_0.4.2 plyr_1.8.2
## [13] proto_0.3-10 Rcpp_0.11.6 reshape2_1.4.1 rmarkdown_0.5.1
## [17] scales_0.2.4 stringi_0.4-1 stringr_1.0.0 tools_3.1.3
## [21] yaml_2.1.13