In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem (CLT). The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. We set lambda = 0.2 for all simulations and investigate the distribution of averages of 40 exponentials, using 1000 simulated averages.
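As a quick sanity check of the stated mean and standard deviation, here is a minimal sketch that draws one large sample with rexp and compares its moments with 1/lambda. The seed and the sample size of one million are arbitrary choices for illustration, not part of the original analysis.

```r
# Sanity check: mean and SD of Exp(rate = lambda) should both be close to 1/lambda
set.seed(1)                      # arbitrary seed, chosen for illustration only
lambda <- 0.2
x <- rexp(1e6, rate = lambda)    # one million draws (arbitrary large sample size)
c(sample_mean = mean(x), sample_sd = sd(x), theoretical = 1 / lambda)
```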
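set.seed(262018)
n <- 40                 # number of exponentials per average
lambda <- 0.2           # rate parameter
mns <- numeric(1000)    # preallocate storage for 1000 sample means
for (i in 1:1000) mns[i] <- mean(rexp(n, rate = lambda))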
hist(mns, main = "Distribution of 1000 Means of 40 Exponentials",
     xlab = "Sample mean", col = "blue")
abline(v = mean(mns), col = "red", lwd = 4)   # red line marks the mean of the simulated means
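The loop above computes one mean per iteration. A vectorized sketch of the same simulation draws all 40 × 1000 values at once and averages each row with rowMeans; the names nsim, draws and mns_vec are introduced here for illustration. Whether it reproduces the loop's exact values for a given seed depends on the order in which draws are consumed, so treat it as an equivalent formulation rather than a drop-in replacement for the printed results.

```r
set.seed(262018)
n <- 40          # exponentials per average
lambda <- 0.2    # rate parameter
nsim <- 1000     # number of simulated averages (illustrative name)
# Each row holds one sample of n exponentials; byrow = TRUE fills rows with consecutive draws
draws <- matrix(rexp(n * nsim, rate = lambda), nrow = nsim, byrow = TRUE)
mns_vec <- rowMeans(draws)   # one sample mean per row
summary(mns_vec)
```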
sample_mean<-mean(mns)
paste0("sample mean is ", sample_mean)
## [1] "sample mean is 5.05353749083723"
t_mean<-1/lambda
paste0("theoretical mean is ", t_mean)
## [1] "theoretical mean is 5"
sample_var<-var(mns)
t_var<-(1/lambda)^2/n
paste0("Sample variance is ",sample_var)
## [1] "Sample variance is 0.628141741217932"
paste0("Theoretical variance is ", t_var)
## [1] "Theoretical variance is 0.625"
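The theoretical value used for t_var is the variance of a mean of n independent draws, each with variance (1/lambda)^2. With lambda = 0.2 and n = 40 the arithmetic works out as:

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{(1/0.2)^2}{40} = \frac{25}{40} = 0.625,
\qquad \operatorname{SE}(\bar{X}) = \sqrt{0.625} \approx 0.79
```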
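m <- mean(mns)
s <- sd(mns)
hist(mns, xlab = "Sample mean", prob = TRUE,
     main = "Normal curves over histogram of sample means")
x <- seq(min(mns), max(mns), length.out = 2000)
# Sample curve: normal density with the simulated mean and standard deviation
lines(x, dnorm(x, mean = m, sd = s), col = "blue")
# Theoretical curve: the CLT predicts N(1/lambda, (1/lambda)^2 / n)
t_sd <- (1 / lambda) / sqrt(n)
lines(x, dnorm(x, mean = 1 / lambda, sd = t_sd), col = "red")
legend("topright", legend = c("Sample", "Theoretical"),
       col = c("blue", "red"), lwd = 1)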
The sample mean (5.05) and sample variance (0.628) are very close to the theoretical mean (5) and variance (0.625). The histogram of sample means also closely follows the theoretical normal curve, as the Central Limit Theorem predicts.
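The claim that the sample distribution closely follows the theoretical one can be checked more directly. A minimal sketch, assuming the same n, lambda and seed as above (the means are regenerated with replicate so the block runs on its own): it computes the share of simulated means within 1.96 theoretical standard errors of 1/lambda, which should be roughly 0.95 if the means are approximately normal, and draws a normal Q-Q plot. The 1.96 cut-off and the Q-Q plot are additions, not part of the original project.

```r
set.seed(262018)
n <- 40
lambda <- 0.2
mns <- replicate(1000, mean(rexp(n, rate = lambda)))
t_se <- (1 / lambda) / sqrt(n)   # theoretical standard error of the mean
# Share of simulated means within 1.96 standard errors of 1/lambda (about 0.95 under normality)
coverage <- mean(abs(mns - 1 / lambda) <= qnorm(0.975) * t_se)
coverage
# Normal Q-Q plot: points close to the line indicate approximate normality
qqnorm(mns)
qqline(mns, col = "red")
```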
In the second part of the project, we analyze the ToothGrowth data in the R datasets package. The tasks are:

- Load the ToothGrowth data and perform some basic exploratory data analyses.
- Provide a basic summary of the data.
- Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
- State your conclusions and the assumptions needed for your conclusions.
library(ggplot2)
data("ToothGrowth")
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
dim(ToothGrowth)
## [1] 60 3
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
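The summary shows 30 observations per supplement type. Cross-tabulating supp against dose with base R's table shows how the 60 rows split across the six treatment groups, which is useful context for the two-sample tests that follow. The name counts is illustrative.

```r
data("ToothGrowth")
# Number of observations in each supplement-by-dose group
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
counts
```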
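# Box plots of tooth length at each dose, one panel per supplement type
p <- ggplot(ToothGrowth, aes(factor(dose), len, fill = supp))
p <- p + geom_boxplot()
p <- p + facet_grid(. ~ supp)
p <- p + xlab("Dose (mg/day)") + ylab("Tooth length") + ggtitle("Tooth growth by supplement type and dose")
print(p)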
Tooth length clearly increases with dose for both supplement types. At doses of 0.5 and 1 mg/day, OJ also appears more effective than VC.
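Comparing plotted distributions by eye can be imprecise, so a table of group means makes the dose and supplement comparison explicit. A minimal sketch using base R's aggregate with a formula; the name group_means is illustrative.

```r
data("ToothGrowth")
# Mean tooth length for each supplement type at each dose
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```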
t.test(len~supp,data=ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
With all doses pooled, the p-value is 0.061, which is greater than 0.05, and the 95% confidence interval (-0.17, 7.57) contains 0. We therefore cannot reject the null hypothesis that mean tooth length is the same for both supplement types. This does not prove that supplement type has no effect; the evidence is simply not strong enough at the 5% level.
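To make the t.test output less of a black box, the sketch below recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95% confidence interval for the pooled supp comparison directly from their formulas. The variable names are illustrative; the results should agree with the t.test output above (t = 1.9153, df = 55.309).

```r
data("ToothGrowth")
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]
# Squared standard errors of each group mean (variances not assumed equal)
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)
# Welch-Satterthwaite approximation to the degrees of freedom
df_w <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
t_stat <- (mean(x) - mean(y)) / se
p_value <- 2 * pt(-abs(t_stat), df_w)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_w) * se
c(t = t_stat, df = df_w, p = p_value, lower = ci[1], upper = ci[2])
```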
toothSubset<-subset(ToothGrowth, ToothGrowth$dose %in% c(.5,1))
t.test(len~supp,data=toothSubset)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.0503, df = 36.553, p-value = 0.004239
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.875234 9.304766
## sample estimates:
## mean in group OJ mean in group VC
## 17.965 12.375
t.test(len~dose,data=toothSubset)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
toothSubset<-subset(ToothGrowth, ToothGrowth$dose %in% c(1,2))
t.test(len~dose,data=toothSubset)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
toothSubset<-subset(ToothGrowth, ToothGrowth$dose %in% c(.5,2))
t.test(len~dose,data=toothSubset)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
For all three dose comparisons, the p-value is far below 0.05 and none of the 95% confidence intervals contains 0, so mean tooth length differs significantly between every pair of doses and increases with dose.
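The three dose comparisons above repeat the same subset-and-test pattern. A compact sketch that runs all pairwise Welch tests in one pass with combn and apply; the names dose_pairs and res are illustrative, and, like the original analysis, it applies no multiple-comparison adjustment.

```r
data("ToothGrowth")
dose_pairs <- combn(c(0.5, 1, 2), 2)   # each column is one pair of doses
res <- t(apply(dose_pairs, 2, function(d) {
  sub <- subset(ToothGrowth, dose %in% d)
  tt <- t.test(len ~ dose, data = sub)   # Welch test of mean(lower dose) - mean(higher dose)
  c(dose_a = d[1], dose_b = d[2], p_value = tt$p.value,
    lower = tt$conf.int[1], upper = tt$conf.int[2])
}))
res
```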
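In summary, when all doses are pooled there is no significant difference in tooth length between the two supplement types (p = 0.061). Restricted to doses of 0.5 and 1 mg/day, however, OJ produces significantly longer teeth than VC (p = 0.004). Dose has a strong effect on tooth growth, as shown by the very small p-values (all below 0.05) in every dose comparison. These conclusions assume that the guinea pigs were randomly assigned to independent groups and that the group means are approximately normally distributed; the Welch tests used here do not assume equal variances between groups.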