Statistical Inference Peer-Graded Assignment

Part 1: Simulation Exercise

Overview In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

1.1 Simulations

n <- 40
lambda <- 0.2
simulations <- 1000 # Do 1000 simulations
 
sampleMeans = NULL # Sample means
for(i in 1:simulations) {
  sampleMeans <- c(sampleMeans, mean(rexp(n, lambda)))
}
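
The loop above grows sampleMeans one element at a time. A vectorized sketch of the same simulation is shown below. The seed value and the names draws and sampleMeansVec are assumptions added for illustration, so the numbers it produces will not match the output reported in this document.

```r
# Vectorized alternative to the loop (a sketch, not the original method).
set.seed(1234)  # assumed seed, chosen arbitrarily for reproducibility

n <- 40
lambda <- 0.2
simulations <- 1000

# Draw all n * simulations values at once and arrange them
# so that each row holds the 40 draws of one simulation.
draws <- matrix(rexp(n * simulations, rate = lambda),
                nrow = simulations, ncol = n)

# Each row mean is the average of 40 exponentials.
sampleMeansVec <- rowMeans(draws)

length(sampleMeansVec)
```

Pre-allocating or vectorizing avoids repeatedly copying the result vector, which matters little at 1000 simulations but becomes noticeable for much larger runs.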

1.2 Sample Mean vs. Theoretical Mean

This is the sample mean.

mean1 = mean(sampleMeans) # Mean of sample means
mean1

## [1] 5.003032

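This is the theoretical mean.

mean2 = 1/lambda # Theoretical mean
mean2

## [1] 5

The sample mean (5.003) is very close to the theoretical mean (5). In the histogram, the blue line marks the sample mean and the red line marks the theoretical mean.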

hist(sampleMeans,
     main = "Distribution of Sample Means",
     xlab = "Sample Mean",
     nclass = 50,
     col = "lightblue")
abline(v = mean1, col = "blue", lwd = 2) 
abline(v = mean2, col = "red", lwd = 2)

1.3 Sample Variance vs. Theoretical Variance

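This is the sample variance, that is, the variance of the 1000 simulated sample means.

var1 = var(sampleMeans); var1

## [1] 0.6699872

This is the theoretical variance of the mean of 40 exponentials, (1/lambda)^2 / n.

var2 = (1/lambda)^2 / n; var2

## [1] 0.625

The sample variance (0.670) is a little larger than the theoretical variance (0.625).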
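
The variance of a mean of n independent exponentials is (1/lambda)^2 / n, so it should shrink as n grows. A minimal sketch that checks this by simulation at a few sample sizes follows. The seed, the sizes 10 and 160, and the names sizes, simVar, and theoVar are assumptions added for illustration; only n = 40 is used in this report.

```r
set.seed(2024)  # assumed seed, chosen arbitrarily

lambda <- 0.2
sigma <- 1 / lambda          # standard deviation of a single draw
sizes <- c(10, 40, 160)      # illustrative sample sizes

# For each n, simulate 1000 sample means and take their variance.
simVar <- sapply(sizes, function(n) {
  var(replicate(1000, mean(rexp(n, lambda))))
})

# Theoretical variance of the sample mean for each n.
theoVar <- sigma^2 / sizes

data.frame(n = sizes, simulated = simVar, theoretical = theoVar)
```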

1.4 Distribution

The blue curve is a normal density whose mean equals the sample mean and whose standard deviation equals the sample standard deviation. Judged against this curve and the normal Q-Q plot, the distribution of the sample means is approximately normal.

par(mfrow = c(1,2))
hist(sampleMeans,
     probability = TRUE,
     main = "Distribution of Sample Means",
     xlab = "Sample Mean",
     nclass = 50,
     col = "lightblue")
abline(v = mean1, col = "blue", lwd = 2) 
curve(dnorm(x, mean1, sd = sqrt(var1)),
      add=TRUE, col="blue", lwd=2) # Add normal distribution line
qqnorm(sampleMeans, col="lightblue"); qqline(sampleMeans, col="blue", lwd=2)
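
As a numeric complement to the plots, one can count how many simulated means fall within 1.96 theoretical standard errors of the theoretical mean; under a normal approximation that share should be close to 95%. The sketch below reruns the simulation with an assumed seed, so its draws differ from those used for the figures. The names means, mu, se, and coverage are illustrative.

```r
set.seed(42)  # assumed seed, chosen arbitrarily

lambda <- 0.2
n <- 40
means <- replicate(1000, mean(rexp(n, lambda)))

mu <- 1 / lambda                 # theoretical mean
se <- (1 / lambda) / sqrt(n)     # theoretical standard error of the mean

# Share of simulated means within 1.96 standard errors of mu.
coverage <- mean(abs(means - mu) <= 1.96 * se)
coverage
```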

Part 2: Basic Inferential Data Analysis

2.1 Load the ToothGrowth data and perform some basic exploratory data analyses

Overview Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

Dataset Description The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

head(ToothGrowth)

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

par(mfrow = c(1,2))
hist(ToothGrowth$len, nclass = 50)
hist(ToothGrowth$dose, nclass = 50)

2.2 Basic summary of the data

Dose and tooth length are strongly positively correlated, both overall (r = 0.80) and within each supplement (r = 0.90 for VC, r = 0.75 for OJ).

ToothGrowth_VC <- subset(ToothGrowth, supp == "VC")
ToothGrowth_OJ <- subset(ToothGrowth, supp == "OJ")
cor(ToothGrowth$len, ToothGrowth$dose)

## [1] 0.8026913

cor(ToothGrowth_VC$len, ToothGrowth_VC$dose)

## [1] 0.8989722

cor(ToothGrowth_OJ$len, ToothGrowth_OJ$dose)

## [1] 0.7500585

layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
# Dose is on the x-axis and tooth length on the y-axis
plot(ToothGrowth$dose, ToothGrowth$len,
     main = "Length vs. Dose, Overall", xlab = "Dose (mg/day)", ylab = "Tooth length")
abline(lm(ToothGrowth$len ~ ToothGrowth$dose), col="red")

plot(ToothGrowth_VC$dose, ToothGrowth_VC$len,
     main = "Length vs. Dose, VC", xlab = "Dose (mg/day)", ylab = "Tooth length")
abline(lm(ToothGrowth_VC$len ~ ToothGrowth_VC$dose), col="red")
plot(ToothGrowth_OJ$dose, ToothGrowth_OJ$len,
     main = "Length vs. Dose, OJ", xlab = "Dose (mg/day)", ylab = "Tooth length")
abline(lm(ToothGrowth_OJ$len ~ ToothGrowth_OJ$dose), col="red")
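
The correlations summarize the trend in a single number each. A table of mean tooth length for every supplement and dose combination shows the same pattern cell by cell. The following is a minimal sketch using aggregate from base R; the name groupMeans is illustrative.

```r
data(ToothGrowth)

# Mean tooth length for each of the six supp x dose groups.
groupMeans <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
groupMeans
```

Each row of groupMeans corresponds to one group of 10 animals, and the values match the group means reported by the t-tests in section 2.3.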

2.3 Compare Tooth Growth by supp and dose

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

H0: Mean tooth length is the same for orange juice and ascorbic acid. Not rejected at the 5% level (p = 0.061; the 95% confidence interval includes 0).

t.test(len ~ supp, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

H0: At dose = 0.5, mean tooth length is the same for orange juice and ascorbic acid. Rejected at the 5% level (p = 0.006); the mean is higher with orange juice.

t.test(len ~ supp, data = subset(ToothGrowth, dose == 0.5))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

H0: At dose = 1, mean tooth length is the same for orange juice and ascorbic acid. Rejected at the 5% level (p = 0.001); the mean is higher with orange juice.

t.test(len ~ supp, data = subset(ToothGrowth, dose == 1))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

H0: At dose = 2, mean tooth length is the same for orange juice and ascorbic acid. Not rejected at the 5% level (p = 0.964; the 95% confidence interval includes 0).

t.test(len ~ supp, data = subset(ToothGrowth, dose == 2))

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
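
The tests above compare the two supplements. Comparing tooth growth by dose can be handled the same way, with a Welch t-test for each pair of dose levels. The sketch below pools both supplements within each dose; the names dosePairs and pValues are illustrative, and no output is reproduced here.

```r
data(ToothGrowth)

# The three pairs of dose levels to compare.
dosePairs <- list(c(0.5, 1), c(1, 2), c(0.5, 2))

# Welch two-sample t-test of len between the two doses in each pair.
pValues <- sapply(dosePairs, function(pair) {
  t.test(len ~ dose, data = subset(ToothGrowth, dose %in% pair))$p.value
})
names(pValues) <- sapply(dosePairs, paste, collapse = " vs ")

pValues
```

A p-value below 0.05 for a pair indicates that mean tooth length differs between those two doses at the 5% level.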

2.4 Conclusions and the Assumptions

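Vitamin C dose and tooth length are positively related: higher doses are associated with more tooth growth (r = 0.80 overall).
Pooled over all doses, the difference between orange juice and ascorbic acid is not significant at the 5% level (p = 0.061).
However, at the lower doses (0.5 and 1 mg/day), orange juice is associated with significantly more tooth growth; at the highest dose (2 mg/day), orange juice and ascorbic acid have similar effects on tooth growth.
These conclusions assume that the groups are independent (unpaired), that animals were assigned to treatments at random, and that tooth lengths within each group are approximately normally distributed. The Welch t-test used here does not assume equal variances.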