and compare it to the Central Limit Theorem. For this analysis, lambda will be set to 0.2 for all of the simulations.
Objective: Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.
library(knitr)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE)
lambda <- 0.2
simData <- matrix(rexp(1000*40, lambda), nrow = 1000, ncol = 40)
distMean <- apply(simData, 1, mean)
hist(distMean, breaks = 50, main = "Distribution of 1000 averages of 40 random exponentials", xlab = "Value of the means", ylab = "Frequency of the means", col = "green")
abline(v = 1/lambda, lty = 2, lwd = 8, col = "black")
legend("topright", lty = 2, lwd = 6, col = "black", legend = "theoretical mean")
The distribution of the 1000 sample means is approximately normal and centered near the theoretical mean, 1/lambda = 5 (the dashed line).
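As a numeric check on the histogram, the center of the simulated averages can be compared with the theoretical mean. The sketch below reruns the simulation on its own; the seed value and the names `n`, `nSim`, `theoMean`, and `sampleMean` are assumptions introduced for illustration, so the numbers will not match the figure above exactly.

```r
# Minimal sketch: compare the simulated center with the theoretical mean.
set.seed(2024)                  # arbitrary seed, assumed here for reproducibility
lambda <- 0.2
n <- 40                         # exponentials per average
nSim <- 1000                    # number of simulated averages

simData <- matrix(rexp(nSim * n, lambda), nrow = nSim, ncol = n)
distMean <- rowMeans(simData)   # equivalent to apply(simData, 1, mean)

theoMean <- 1 / lambda          # theoretical mean of an exponential: 5
sampleMean <- mean(distMean)    # center of the simulated averages

c(theoretical = theoMean, simulated = sampleMean)
```

The two values should agree to within a few hundredths, since the standard error of the mean of 1000 averages is small.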
distVar <- apply(simData, 1, var)
hist(distVar, breaks = 50, main = "Distribution of 1000 variances of 40 random exponentials", xlab = "Value of the variances", ylab = "Frequency of the variances", col = "orange")
abline(v = (1/lambda)^2, lty = 2, lwd = 8, col = "blue")
legend("topright", lty = 2, lwd = 6, col = "blue", legend = "theoretical variance")
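The distribution of the sample variances is centered near the theoretical variance, (1/lambda)^2 = 25 (the dashed line). It is roughly bell-shaped, though typically with a longer right tail than a normal distribution.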
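Two different variances are in play here: the variance of a single exponential, (1/lambda)^2 = 25, and the variance of an average of 40 of them, (1/lambda)^2 / 40 = 0.625. The sketch below computes both from a fresh simulation; the seed and the names `theoVar` and `theoVarMean` are assumptions made for this example.

```r
# Minimal sketch: theoretical vs. simulated variances.
set.seed(2024)                      # arbitrary seed, assumed for reproducibility
lambda <- 0.2
n <- 40
nSim <- 1000

simData <- matrix(rexp(nSim * n, lambda), nrow = nSim, ncol = n)
distMean <- rowMeans(simData)       # 1000 averages of 40 exponentials
distVar <- apply(simData, 1, var)   # 1000 sample variances

theoVar <- (1 / lambda)^2           # variance of one exponential: 25
theoVarMean <- theoVar / n          # variance of the average of 40: 0.625

rbind(
  single_exponential = c(theoretical = theoVar, simulated = mean(distVar)),
  average_of_40 = c(theoretical = theoVarMean, simulated = var(distMean))
)
```

The first row relates to the variance histogram above; the second row describes how tightly the averages cluster around 5.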
par(mfrow = c(3, 1))
hist(simData, breaks = 50, main = "Distribution of exponentials with lambda equal to 0.2", xlab = "Exponentials", col = "light pink")
hist(distMean, breaks = 50, main = "Distribution of 1000 averages of 40 random exponentials", xlab = "Value of the means", ylab = "Frequency of means", col = "light green")
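simNorm <- rnorm(1000, mean = mean(distMean), sd = sd(distMean))
hist(simNorm, breaks = 50, main = "Normal distribution with the same mean and standard deviation as the averages", xlab = "Simulated normal values", col = "lightblue")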
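The raw exponentials are strongly right-skewed, whereas the 1000 averages are roughly symmetric and bell-shaped, as the Central Limit Theorem predicts.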
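One rough way to quantify how normal the averages look is to standardize them with the theoretical mean and standard error, then count how many fall within the central 95% range of a standard normal. This is only a sketch under assumed settings (the seed and the names `z` and `coverage` are not from the original analysis), not a formal normality test.

```r
# Minimal sketch: share of standardized averages inside the normal 95% range.
set.seed(2024)                  # arbitrary seed, assumed for reproducibility
lambda <- 0.2
n <- 40
nSim <- 1000

distMean <- rowMeans(matrix(rexp(nSim * n, lambda), nrow = nSim, ncol = n))

# Standardize with the theoretical mean (1/lambda) and standard error (1/lambda)/sqrt(n).
z <- (distMean - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Under the normal approximation, about 95% of z values lie within +/- 1.96.
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```

A value close to 0.95 is consistent with the Central Limit Theorem approximation for averages of 40 exponentials.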
comparing guinea pig tooth growth by supplement and dose. First, do exploratory data analysis on the data set. Then compare the groups using confidence intervals to draw conclusions about tooth growth.
library(datasets)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stats)
data(ToothGrowth)
library(ggplot2)
tg <- ToothGrowth  # work on a copy under a name that does not shadow base::t()
levels(tg$supp) <- c("Orange Juice", "Ascorbic Acid")  # relabel OJ and VC
ggplot(tg, aes(x = factor(dose), y = len)) +
  facet_grid(. ~ supp) +
  geom_boxplot(aes(fill = supp), show.legend = FALSE) +
  labs(title = "Guinea pig tooth length by dose for each supplement type",
       x = "Dose (mg/day)",
       y = "Tooth length")
The box plots show that tooth length increases with dose for both supplements.
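The trend in the box plots can also be read from a table of group means. The sketch below uses base R's `aggregate()`; the name `groupMeans` is introduced here for illustration.

```r
# Minimal sketch: mean tooth length for each supplement and dose.
library(datasets)

groupMeans <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Sort by supplement, then dose, so each supplement's trend reads top to bottom.
groupMeans <- groupMeans[order(groupMeans$supp, groupMeans$dose), ]
groupMeans
```

Within each supplement, the mean length should rise as the dose goes from 0.5 to 2 mg/day.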
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
summary(ToothGrowth[ToothGrowth$supp == "OJ", ])
##       len        supp         dose
##  Min.   : 8.20   OJ:30   Min.   :0.500
##  1st Qu.:15.53   VC: 0   1st Qu.:0.500
##  Median :22.70           Median :1.000
##  Mean   :20.66           Mean   :1.167
##  3rd Qu.:25.73           3rd Qu.:2.000
##  Max.   :30.90           Max.   :2.000
t.test(ToothGrowth$len, conf.level = 0.95)$conf.int
## [1] 16.83731 20.78936
## attr(,"conf.level")
## [1] 0.95
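The interval above is a one-sample t interval for the overall mean tooth length: the sample mean plus or minus a t quantile times the standard error. As a sketch, it can be rebuilt by hand; the names `se`, `tCrit`, and `manualCI` are introduced here for illustration.

```r
# Minimal sketch: rebuild the 95% t interval for the overall mean by hand.
library(datasets)

len <- ToothGrowth$len
n <- length(len)                    # 60 observations
se <- sd(len) / sqrt(n)             # standard error of the mean
tCrit <- qt(0.975, df = n - 1)      # two-sided 95% critical value

manualCI <- mean(len) + c(-1, 1) * tCrit * se
manualCI
```

The result should match the bounds reported by `t.test()` above.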
Next, calculate the mean tooth length under each supplement:
summary(ToothGrowth[ToothGrowth$supp == "OJ", ]$len)[4]
## Mean
## 20.66333
summary(ToothGrowth[ToothGrowth$supp == "VC", ]$len)[4]
## Mean
## 16.96333
Both means fall inside the 95% confidence interval for the overall mean, [16.84, 20.79]: OJ at 20.66 and VC at 16.96.
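Checking each group mean against the overall interval does not compare the supplements directly. One common alternative, sketched here as an assumption rather than as the method used above, is a Welch two-sample t interval for the difference in mean length between OJ and VC. The name `welch` is introduced for illustration.

```r
# Minimal sketch: Welch two-sample t interval for OJ minus VC, doses pooled.
library(datasets)

# var.equal = FALSE requests the Welch test (unequal variances), R's default.
welch <- t.test(len ~ supp, data = ToothGrowth,
                var.equal = FALSE, conf.level = 0.95)

welch$estimate    # mean length in each supplement group
welch$conf.int    # 95% interval for the difference (OJ minus VC)
```

If that interval contains zero, the data do not show a difference between the supplements at the 5% level when all doses are pooled. Repeating the test within each dose level would give a finer comparison.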