In this project we investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The mean of an exponential distribution is 1/lambda and its standard deviation is also 1/lambda. We set lambda = 0.2 for all simulations and study the distribution of averages of 40 exponentials over 1,000 simulations.
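The theoretical values used later in the report follow from standard properties of the sample mean. As a short worked derivation, writing $\bar{X}_n$ for the average of $n = 40$ independent exponentials (this notation is introduced here for illustration only):

```latex
\begin{aligned}
E[\bar{X}_n] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\mathrm{Var}(\bar{X}_n) &= \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625, \\
\bar{X}_n &\overset{\text{approx.}}{\sim} N\!\left(\frac{1}{\lambda},\ \frac{1}{\lambda^2 n}\right) \quad \text{for large } n .
\end{aligned}
```

The last line is the Central Limit Theorem approximation that the simulation is meant to check.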
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
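# Number of exponentials per average and the rate parameter
n <- 40
lambda <- 0.2
# 1000 simulations, one per row, each holding n exponential draws
simdata <- matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n)
# Average of each simulated sample
mean_dist <- apply(simdata, 1, mean)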
hist(mean_dist,breaks = 50, main = "The distribution of 1000 averages of 40 random exponentials", xlab = "Means", ylab = "Frequency" )
abline(v= 1/lambda, lty = 1, lwd = 3, col = "blue")
legend("topright", lty = 1, lwd = 3, col = "blue", legend = "Theoretical Mean")
sample_mean<- mean(mean_dist)
sample_mean
## [1] 5.033899
theoretical_mean<- 1/lambda
theoretical_mean
## [1] 5
sample_var<- var(mean_dist)
sample_var
## [1] 0.6569091
theoretical_var<- (1/lambda)^2/n
theoretical_var
## [1] 0.625
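The comparison above can also be collected into a single table. The following is a minimal sketch that re-creates the simulation; the seed value and the names `comparison` and `rel_diff` are assumptions introduced here, and because the original run did not set a seed, the numbers will differ slightly from those printed above.

```r
# Re-create the simulation with a fixed seed (the seed is an assumption;
# the original run did not set one).
set.seed(2024)
n <- 40
lambda <- 0.2
mean_dist <- rowMeans(matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n))

# Sample versus theoretical mean and variance of the averages
comparison <- data.frame(
  quantity    = c("mean", "variance"),
  sample      = c(mean(mean_dist), var(mean_dist)),
  theoretical = c(1 / lambda, (1 / lambda)^2 / n)
)
# Relative difference between the simulated and theoretical values
comparison$rel_diff <-
  (comparison$sample - comparison$theoretical) / comparison$theoretical
print(comparison)
```

Small relative differences in both rows are consistent with the sample mean and variance being close to their theoretical values.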
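x <- seq(min(mean_dist), max(mean_dist), length.out = 100)
# Normal density predicted by the CLT: mean 1/lambda, sd (1/lambda)/sqrt(n)
y <- dnorm(x, mean = theoretical_mean, sd = (1 / lambda) / sqrt(n))
# prob = TRUE puts the histogram on the density scale, so label the axis accordingly
hist(mean_dist, breaks = n, prob = TRUE, xlab = "Means", ylab = "Density", main = "Density of Means")
lines(x, y, lty = 5, col = "red")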
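Besides overlaying the normal curve on the histogram, a normal Q-Q plot is another common way to judge how close the averages are to normal. The sketch below is illustrative rather than part of the original analysis; the seed and the names `z`, `qq` and `qq_cor` are assumptions.

```r
# Re-create the simulated averages (the seed is an assumption)
set.seed(2024)
n <- 40
lambda <- 0.2
mean_dist <- rowMeans(matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n))

# Standardise with the theoretical mean and standard error
z <- (mean_dist - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Points close to the reference line y = x suggest approximate normality
qq <- qqnorm(z, main = "Normal Q-Q plot of standardised means")
abline(0, 1, col = "red", lty = 2)

# Correlation between theoretical and sample quantiles as a rough numeric summary
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

A correlation close to 1 agrees with the visual impression from the histogram, although some right skew may remain visible in the upper tail with only 40 exponentials per average.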
library(stats)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data("ToothGrowth")
dim(ToothGrowth)
## [1] 60 3
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The ToothGrowth data set has 60 observations of 3 variables:
1. len: tooth length (numeric).
2. supp: supplement type, a factor with two levels, OJ (orange juice) and VC (ascorbic acid, a form of vitamin C).
3. dose: dose of the supplement in milligrams per day (numeric), taking the values 0.5, 1.0 and 2.0.
# qplot() is deprecated in recent ggplot2 releases, so the plot is built with ggplot()
ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~dose) +
  labs(title = "Tooth growth by supplement type and dosage", x = "Supplement", y = "Tooth Length")
We will run two-sample t-tests on the data. To do this we first split the data into three groups by dose level (0.5, 1.0 and 2.0), so that the supplements OJ and VC can be compared at each dose.
dose_0.5 <- filter(ToothGrowth, dose == 0.5)
dose_1.0 <- filter(ToothGrowth, dose == 1.0)
dose_2.0 <- filter(ToothGrowth, dose == 2.0)
Now we test whether OJ and VC at the same dose differ significantly in mean tooth length.
t.test(len~supp, paired = FALSE, data = dose_0.5)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
As the p-value (0.006) is lower than 0.05, we reject the null hypothesis in favor of the alternative hypothesis: there is a statistically significant difference in mean tooth length between OJ and VC at a dose of 0.5.
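To make the reported statistic less of a black box, the Welch t statistic, its degrees of freedom and the p-value for the 0.5 dose can be recomputed by hand. This is a sketch for illustration; names such as `oj`, `vc`, `t_manual`, `df_manual` and `p_manual` are introduced here and do not appear in the original analysis.

```r
data("ToothGrowth")
d <- subset(ToothGrowth, dose == 0.5)
oj <- d$len[d$supp == "OJ"]
vc <- d$len[d$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Reference result from t.test(), which uses the Welch test by default
fit <- t.test(len ~ supp, data = d)
c(t = t_manual, df = df_manual, p = p_manual)
```

The hand-computed values should agree with the `t`, `df` and `p-value` entries in the `t.test()` output above.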
Now let's run the same test for a dose of 1.
t.test(len~supp, paired = FALSE, data = dose_1.0)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
In this case the p-value (0.001) is also lower than 0.05, so we again reject the null hypothesis in favor of the alternative hypothesis.
Now we run the same test for dosage level 2.
t.test(len~supp, paired = FALSE, data = dose_2.0)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
In this case the p-value (0.964) is higher than 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean tooth length between OJ and VC at a dose of 2.
We assume that tooth length is approximately normally distributed in the population as a whole and within each supplement and dose group, and that the two supplement groups are independent (the tests are unpaired). Conclusion: at doses of 0.5 and 1.0 the p-value is lower than 0.05, so the null hypothesis is rejected in favor of the alternative hypothesis, with OJ giving the larger mean tooth length in both cases. At a dose of 2.0 the p-value is higher than 0.05, so we fail to reject the null hypothesis.
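The three tests can also be summarised in one table, which makes the conclusion easier to check at a glance. The sketch below is one possible way to do this in base R; the names `doses`, `results` and `reject_h0` are assumptions introduced for illustration.

```r
data("ToothGrowth")
doses <- sort(unique(ToothGrowth$dose))

# Run the Welch two-sample t-test (OJ versus VC) separately at each dose
results <- do.call(rbind, lapply(doses, function(d) {
  fit <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d))
  data.frame(
    dose      = d,
    mean_diff = unname(fit$estimate[1] - fit$estimate[2]),  # OJ minus VC
    ci_lower  = fit$conf.int[1],
    ci_upper  = fit$conf.int[2],
    p_value   = fit$p.value
  )
}))

# Flag the doses at which the null hypothesis is rejected at the 5% level
results$reject_h0 <- results$p_value < 0.05
print(results)
```

Note that this runs three tests without any adjustment for multiple comparisons, matching the analysis above.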