Darragh Hanley November 2014
The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s. You should:
1. Show where the distribution is centered and compare it to the theoretical center of the distribution.
2. Show how variable it is and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.
Problem 1.1 and 1.2:
Simulate:
set.seed(100)
# Set lambda = 0.2 for all of the simulations
lambda=.2
# In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s.
n=40
# Note that you will need to do a thousand or so simulated averages of 40 exponentials.
sim=1000
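# Preallocate a one-column data frame (column V1) with one row per simulated average
means = as.data.frame(matrix(ncol=1, nrow=sim))
for(i in 1:sim){
  # Each row holds the mean of n = 40 exponential(lambda) draws
  means[i,1] <- mean(rexp(n, lambda))
}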
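The loop above can also be written without row-by-row assignment. The following is a minimal sketch rather than part of the original analysis: the names `draws` and `sim_means` are introduced here for illustration. It draws all 40 x 1000 values in one call and averages each column. With the same seed it should consume the random stream in the same order as the loop, but that equivalence is an assumption worth checking before relying on it.

```r
# Sketch: vectorised version of the simulation (same parameters as above).
set.seed(100)
lambda <- 0.2   # rate parameter
n <- 40         # exponentials per average
sim <- 1000     # number of simulated averages

# Each column of the n x sim matrix is one sample of 40 exponentials;
# colMeans() returns one average per column.
draws <- matrix(rexp(n * sim, lambda), nrow = n, ncol = sim)
sim_means <- colMeans(draws)

length(sim_means)
mean(sim_means)
```

Drawing everything at once avoids growing or indexing a data frame inside a loop, which matters little at this size but scales better for larger simulations.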
Summary statistics for Problems 1.1 and 1.2:
mean(means[,1])
## [1] 4.999702
sd(means[,1])
## [1] 0.8020251
var(means[,1])
## [1] 0.6432442
# Theoretical variance (sigma^2/n) and standard deviation (sigma/sqrt(n)) of the sample mean
(1/lambda)^2/(n)
## [1] 0.625
(1/lambda)/sqrt(n)
## [1] 0.7905694
The distribution is centered at 4.999702, while the theoretical center of the distribution is 5 (1/lambda).
The variance of the distribution is 0.6432442, while the theoretical variance is 0.625. The standard deviation of the distribution is 0.8020251, while the theoretical standard deviation is 0.7905694.
Problem 1.3:
The plot below is a histogram of the 1000 simulated means.
library(ggplot2)
# V1 is the default name of the single column created by as.data.frame(matrix(...))
ggplot(means, aes(x = V1)) + geom_histogram(aes(fill = ..count..), binwidth=.1) + xlab("Simulated Mean") + ylab("Frequency") + ggtitle("Means of simulations of rexp(n, lambda)") + theme_bw()
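The histogram is roughly bell-shaped and centered near 5, which is what the central limit theorem predicts for the mean of 40 exponentials: approximately normal with mean 1/lambda = 5 and standard deviation 0.79.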
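A histogram alone makes approximate normality hard to judge by eye. The sketch below is an addition to the original report and uses base graphics rather than ggplot2. It overlays the normal density implied by the theoretical mean and standard deviation, then draws a normal Q-Q plot. The names `sim_means`, `mu` and `sigma` are introduced here for illustration.

```r
# Sketch: compare the simulated means with the normal distribution the CLT predicts.
set.seed(100)
lambda <- 0.2
n <- 40
sim <- 1000
sim_means <- replicate(sim, mean(rexp(n, lambda)))

mu <- 1 / lambda                 # theoretical mean of the sample mean
sigma <- (1 / lambda) / sqrt(n)  # theoretical standard deviation of the sample mean

# Density-scaled histogram with the N(mu, sigma^2) curve drawn over it
hist(sim_means, breaks = 40, freq = FALSE,
     main = "Simulated means vs. normal density", xlab = "Simulated mean")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)

# Normal Q-Q plot: points close to the reference line indicate approximate normality
qqnorm(sim_means)
qqline(sim_means)
```

Because the exponential distribution is right-skewed, some departure from the line in the upper tail would not be surprising with only 40 observations per average.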
Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.
1. Load the ToothGrowth data and perform some basic exploratory data analyses
library(datasets)
data(ToothGrowth)
dim(ToothGrowth)
## [1] 60 3
lapply(ToothGrowth,class)
## $len
## [1] "numeric"
##
## $supp
## [1] "factor"
##
## $dose
## [1] "numeric"
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
ggplot(ToothGrowth, aes(supp, len)) + geom_boxplot(aes(fill=factor(dose))) + xlab("Supplement type (VC or OJ)") + ylab("Tooth Length") + ggtitle("Effect of Vitamin C on Tooth Growth of Guinea Pigs") + scale_fill_discrete(name="Dose in \nmilligrams")
2. Provide a basic summary of the data.
Description : The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
Format : A data frame with 60 observations on 3 variables.
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ)
[,3] dose numeric Dose in milligrams
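The description above can be complemented with numeric summaries. The following is a minimal sketch, not part of the original report, that prints the overall summary and then the mean, standard deviation and count of len for each supplement and dose combination. The name `group_summary` is introduced here for illustration.

```r
library(datasets)
data(ToothGrowth)

# Overall five-number summaries and factor counts
summary(ToothGrowth)

# Mean, standard deviation and sample size of len for each supp x dose cell
group_summary <- aggregate(
  len ~ supp + dose, data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
group_summary
```

Each of the six cells should contain 10 animals, matching the description of the design.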
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
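# 95% confidence interval for the mean length in each supplement x dose group.
# Note: qnorm() gives a normal-approximation interval; with only 10 animals
# per group a t quantile would give a slightly wider interval.
for(i in levels(ToothGrowth$supp)){
  for(j in unique(ToothGrowth$dose)){
    x <- ToothGrowth$len[ToothGrowth$supp==i&ToothGrowth$dose==j]
    print(paste0("For dosage type ", i, " and for dosage quantity ", j, " the 95% confidence interval of the means is : "))
    print(mean(x) + c(-1,1) * qnorm(0.975) * sd(x)/sqrt(length(x)))
  }
}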
## [1] "For dosage type OJ and for dosage quantity 0.5 the 95% confidence interval of the means is : "
## [1] 10.46589 15.99411
## [1] "For dosage type OJ and for dosage quantity 1 the 95% confidence interval of the means is : "
## [1] 20.27601 25.12399
## [1] "For dosage type OJ and for dosage quantity 2 the 95% confidence interval of the means is : "
## [1] 24.41441 27.70559
## [1] "For dosage type VC and for dosage quantity 0.5 the 95% confidence interval of the means is : "
## [1] 6.27765 9.68235
## [1] "For dosage type VC and for dosage quantity 1 the 95% confidence interval of the means is : "
## [1] 15.21102 18.32898
## [1] "For dosage type VC and for dosage quantity 2 the 95% confidence interval of the means is : "
## [1] 23.16639 29.11361
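The intervals above use the normal quantile qnorm(0.975). With 10 observations per group, a t quantile with 9 degrees of freedom is the more usual choice and gives somewhat wider intervals. The sketch below is an alternative to the original calculation, not a replacement for it: it recomputes the per-group intervals with qt() and adds a Welch two-sample t-test between supplements at the 0.5 mg dose. The names `ci`, `low` and `welch` are introduced here for illustration, and using t.test() assumes it counts among the techniques covered in class.

```r
library(datasets)
data(ToothGrowth)

# t-based 95% confidence interval for the mean length in each supp x dose cell
for (s in levels(ToothGrowth$supp)) {
  for (d in sort(unique(ToothGrowth$dose))) {
    x <- ToothGrowth$len[ToothGrowth$supp == s & ToothGrowth$dose == d]
    ci <- mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
    cat(sprintf("%s, dose %.1f: [%.2f, %.2f]\n", s, d, ci[1], ci[2]))
  }
}

# Welch two-sample t-test comparing the two supplements at the 0.5 mg dose
low <- subset(ToothGrowth, dose == 0.5)
welch <- t.test(len ~ supp, data = low, var.equal = FALSE)
welch$conf.int
welch$p.value
```

A confidence interval for the difference that excludes zero supports a difference between delivery methods at that dose. The same comparison can be repeated for the 1 mg and 2 mg subsets.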
4. State your conclusions and the assumptions needed for your conclusions.
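Based on the boxplot and the confidence intervals, tooth length in guinea pigs appears to be influenced by both the dose of vitamin C and the delivery method. For both supplements the intervals shift upward as the dose increases. At 0.5 mg and 1 mg the OJ intervals (10.47 to 15.99 and 20.28 to 25.12) lie entirely above the VC intervals (6.28 to 9.68 and 15.21 to 18.33), whereas at 2 mg the two intervals overlap (24.41 to 27.71 for OJ, 23.17 to 29.11 for VC), so no difference between delivery methods is evident at the highest dose. These conclusions assume that the guinea pigs were sampled at random and are representative of the population, and that the normal-quantile intervals are an adequate approximation with 10 animals per group.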