This Final Course Project Consists of Two Parts:
Part 1 of this project investigates the exponential distribution in R and compares it with the Central Limit Theorem. The exponential distribution is simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is \(\frac{1}{\lambda}\) and the standard deviation is also \(\frac{1}{\lambda}\). The rate is set to \(\lambda = 0.2\) for all of the simulations. The distribution of averages of 40 exponentials is investigated through simulation, with accompanying explanatory text.
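As a worked check of the values the simulation is expected to approach, the standard results for the average of \(n\) independent draws can be written out. Here \(\bar{X}\) denotes the average of \(n = 40\) exponentials, a symbol introduced only for this illustration:

\[
E[\bar{X}] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\mathrm{Var}(\bar{X}) = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625, \qquad
\mathrm{SD}(\bar{X}) = \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79
\]

Note that the variance of the average divides by \(n\), while the standard deviation divides by \(\sqrt{n}\).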
First, the seed is set for reproducibility and the parameters are set for the sample size, Lambda, and the number of simulations. Second, a NULL variable called “Exponentials” is created and filled within a for loop, which performs the simulation and builds a large collection of averages of 40 exponentials. Each average is computed from rexp(n, Lambda), as indicated in the overview.
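set.seed(12345)        # fix the seed so the results are reproducible
n <- 40                # number of exponentials in each average
Lambda <- 0.2          # rate parameter
Simulations <- 1000    # number of averages to simulate
Exponentials <- NULL   # collects one average per simulation
for (i in 1:Simulations) {
  # draw n exponentials and append their average to the collection
  Exponentials <- c(Exponentials, mean(rexp(n, Lambda)))
}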
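The loop above can also be sketched in vectorized form. This is an illustrative alternative, not the method used in this report; the names `draws_alt` and `Exponentials_alt` are hypothetical. Because the draws are consumed in the same order, the result should match the loop for the same seed, which can be confirmed with all.equal().

```r
# Vectorized sketch of the same simulation (object names are hypothetical).
set.seed(12345)
n <- 40
Lambda <- 0.2
Simulations <- 1000

# Draw every value at once; byrow = TRUE puts 40 consecutive draws in each row,
# so row i plays the role of iteration i of a loop.
draws_alt <- matrix(rexp(n * Simulations, Lambda),
                    nrow = Simulations, byrow = TRUE)

# One average per row gives the collection of 1000 averages.
Exponentials_alt <- rowMeans(draws_alt)

length(Exponentials_alt)
mean(Exponentials_alt)
```

Drawing everything in one call avoids growing a vector inside a loop, which matters little at 1000 simulations but scales better for larger runs.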
The theoretical and sample values of the mean and variance, together with the absolute value of their differences, are shown in the table below.
TheoMean <- 1/Lambda; SampleMean <- mean(Exponentials)
MeanDiff <- abs(SampleMean - TheoMean)
CollMeans <- c(TheoMean, SampleMean, MeanDiff)
# variance of the average of n exponentials is (1/Lambda)^2 / n;
# (1/Lambda)/sqrt(n) would be its standard deviation, not its variance
TheoVar <- (1/Lambda)^2/n; SampleVar <- var(Exponentials)
VarDiff <- abs(SampleVar - TheoVar)
CollVars <- c(TheoVar, SampleVar, VarDiff)
ExpoTable <- data.frame(CollMeans, CollVars)
names(ExpoTable) <- c("Mean", "Variance")
row.names(ExpoTable) <- c("Theoretical", "Sample", "Absolute Difference")
ExpoTable
There is very little difference between the sample and theoretical values. The sample mean differs from the theoretical mean of 5 by only 0.028028, and the sample variance differs from the theoretical variance of \(\frac{(1/\lambda)^2}{n} = 0.625\) by roughly 0.03.
The purpose of this final section of “Part 1: A Simulation Exercise” is to show that the distribution of a large collection of averages of 40 exponentials is approximately normal. The focus is on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
set.seed(12345)
Random <- rexp(Simulations, Lambda)   # 1000 individual exponential draws
par(mfrow = c(1, 2))
hist(Random, breaks = n, freq = FALSE,
     main = "Distribution of \n Random Exponentials",
     xlab = "Values")
lines(density(Random), col = 3)
legend("topright", lty = 1, col = 3, legend = "Density Line")
hist(Exponentials, breaks = n, freq = FALSE,
     main = "Distribution of \n 40 Exponential Means",
     xlab = "Means")
lines(density(Exponentials), col = 3)
legend("topright", lty = 1, col = 3, legend = "Density Line")
The two distributions are quite different. The random exponentials follow a right-skewed exponential distribution, while the averages of 40 exponentials are approximately normally distributed and centered near the theoretical mean of 5.
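The claim of approximate normality can also be checked numerically. The following is a minimal sketch under the same seed and parameters; the names `means_chk`, `mu`, `se`, and `coverage_chk` are hypothetical and are not part of the analysis above. If the averages are roughly normal, about 95% of them should fall within 1.96 standard errors of the theoretical mean.

```r
# Numeric check of approximate normality (object names are hypothetical).
set.seed(12345)
n <- 40
Lambda <- 0.2
Simulations <- 1000

# Simulate the averages of 40 exponentials again so the sketch is self-contained.
means_chk <- replicate(Simulations, mean(rexp(n, Lambda)))

mu <- 1 / Lambda               # theoretical mean of the averages
se <- (1 / Lambda) / sqrt(n)   # theoretical standard deviation of the averages

# Share of simulated averages within 1.96 standard errors of the theoretical mean.
coverage_chk <- mean(abs(means_chk - mu) <= 1.96 * se)
coverage_chk

# Compare empirical 2.5% and 97.5% quantiles with those of the matching normal.
quantile(means_chk, c(0.025, 0.975))
qnorm(c(0.025, 0.975), mean = mu, sd = se)
```

Some residual right skew is expected, because an average of only 40 exponentials is not exactly normal.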
The provided dataset records the “Effect of Vitamin C on Tooth Growth in Guinea Pigs.” The following key to the dataset's variables is provided by the website of the Department of Mathematics of ETH Zurich:[1]
[,1] len = Tooth length
[,2] supp = Supplement type (VC or OJ)
[,3] dose = Dose in milligrams/day
library(datasets)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
boxplot(len ~ supp * dose, data = ToothGrowth,
        col = c("orange", "purple"),
        main = "Tooth Growth", xlab = "Supplement and Dose", ylab = "Tooth Length")
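As a numeric companion to the boxplot, the mean tooth length in each supplement and dose cell can be tabulated. This is a small sketch using aggregate() from base R; the name `group_means` is hypothetical.

```r
library(datasets)

# Mean tooth length for every combination of supplement type and dose.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

The table has one row per box in the plot, which makes the gaps between OJ and VC at each dose easier to read off than from the graphic alone.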
After exploratory data analysis (EDA), the next step is to use hypothesis testing to compare tooth growth by supp and dose. Within each dose level, the null hypothesis \(H_0\) states that the supplement type has no effect on mean tooth length, \(H_0: \mu_{OJ} - \mu_{VC} = 0\). The alternative \(H_1\) states that the supplement type does have an effect, \(H_1: \mu_{OJ} - \mu_{VC} \neq 0\).
tg <- ToothGrowth
# split the data by dose level
d.5 <- subset(tg, dose == 0.5); d1 <- subset(tg, dose == 1); d2 <- subset(tg, dose == 2)
# Welch two-sample t-test of tooth length by supplement within each dose
HT1 <- t.test(len ~ supp, data = d.5); HT2 <- t.test(len ~ supp, data = d1)
HT3 <- t.test(len ~ supp, data = d2)
PVALUES <- c(HT1$p.value, HT2$p.value, HT3$p.value)
Reject_NullHypothesis <- PVALUES < 0.05   # 5% significance level
HypothesisTable <- data.frame(PVALUES, Reject_NullHypothesis)
row.names(HypothesisTable) <- c("0.5 mg/day", "1 mg/day", "2 mg/day")
HypothesisTable
t.test(tg$len~tg$supp)
##
## Welch Two Sample t-test
##
## data: tg$len by tg$supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Per the above hypothesis table, the null hypothesis is rejected for 2 of the 3 dose levels because the p-value is below 0.05. At 0.5 mg and 1 mg per day, OJ was associated with significantly greater tooth length than VC. At 2 mg per day there was no significant difference between the two supplements. Pooled across all doses, the Welch test above gives a p-value of 0.06063 and a 95% confidence interval of (-0.171, 7.571) that contains 0, so the overall difference between OJ and VC is not significant at the 5% level.
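The per-dose comparison can be made concrete by extracting the 95% confidence interval from each test. The sketch below is illustrative only; the name `ci_by_dose` is hypothetical. An interval that excludes 0 corresponds to rejecting the null hypothesis at the 5% level.

```r
library(datasets)

# For each dose level, run a Welch t-test of tooth length by supplement and keep
# the 95% confidence interval for the difference in means (OJ minus VC).
ci_by_dose <- t(sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  ht <- t.test(len ~ supp, data = d)
  c(lower = ht$conf.int[1], upper = ht$conf.int[2], p.value = ht$p.value)
}))

# Rows are dose levels; columns are the interval bounds and the p-value.
ci_by_dose
```

No adjustment for multiple comparisons is applied here, which matches the unadjusted tests in the table above.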
[1] McNeil, D. R. (1977). Interactive Data Analysis. New York: Wiley.
Crampton, E. W. (1947). The growth of the odontoblast of the incisor teeth as a criterion of vitamin C intake of the guinea pig. The Journal of Nutrition, 33(5), 491–504. doi: 10.1093/jn/33.5.491.