This project consists of two parts: a simulation exercise and a basic inferential data analysis.
https://www.coursera.org/learn/statistical-inference/peer/3k8j5/statistical-inference-course-project
In the first part, the distribution of means of exponential samples is investigated and compared with what the Central Limit Theorem predicts.
Set the seed so that the random experiments are reproducible
set.seed(500)
Set simulation parameters (lambda, sample size and number of replications)
lambda <- 0.2
n <- 40
rep <- 1000
Simulate the exponential draws as a matrix with one sample of n draws per row
sim_matrix <- matrix(rexp(rep * n, lambda), nrow = rep, ncol = n)
Create the vector of means, compare theoretical and sample mean and variance
means <- rowMeans(sim_matrix)
sam_mean <- mean(means)
theo_mean <- 1/lambda
error_mean <- (theo_mean - sam_mean)/theo_mean
sam_var <- var(means)
theo_var <- (1/lambda)^2/n
error_var <- (theo_var - sam_var)/theo_var
message(sprintf("Sample mean: %s; theoretical mean: %s; relative error: %s", round(sam_mean, 3), theo_mean, round(error_mean, 3)))
## Sample mean: 5.011; theoretical mean: 5; relative error: -0.002
message(sprintf("Sample variance: %s; theoretical variance: %s; relative error: %s", round(sam_var, 3), theo_var, round(error_var, 3)))
## Sample variance: 0.6; theoretical variance: 0.625; relative error: 0.039
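The theoretical values used above follow from standard properties of the exponential distribution. As a short worked derivation, assuming the n = 40 draws in each sample are independent with rate lambda = 0.2 (the notation \bar{X} for the sample mean is introduced here for illustration):

```latex
\begin{align*}
E[X_i] &= \frac{1}{\lambda}, &
\operatorname{Var}(X_i) &= \frac{1}{\lambda^2}, \\
E[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(\bar{X}) &= \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \\
& & \operatorname{sd}(\bar{X}) &= \sqrt{0.625} \approx 0.791.
\end{align*}
```

These are the values stored in theo_mean and theo_var.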
Plot the histogram of the simulated means and overlay two normal density curves: one using the theoretical mean and variance (red) and one using the sample mean and variance (blue)
hist(means, xlab = "Mean of 40 exponential draws", ylab = "Density", main = "Distribution of 1000 means of 40 exponential draws", breaks = n, prob = TRUE)
curve(dnorm(x, theo_mean, sqrt(theo_var)), col="red", add=TRUE)
curve(dnorm(x, sam_mean, sqrt(sam_var)), col="blue", add=TRUE)
legend(x = "topright",
       legend = c("Theoretical mean normal distribution", "Sample mean normal distribution"),
       lty = c(1, 1),
       col = c("red", "blue"),
       cex = 0.8)
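The visual comparison can be complemented with a numeric check of approximate normality. The following is a minimal sketch, not part of the original analysis: it reuses the seed and parameters from the simulation, standardizes the means with the theoretical mean and standard error, and compares a few empirical quantiles with standard normal quantiles. The names n_sims, sim_means, z, quantile_check and coverage are introduced here for illustration.

```r
# Assumption: same seed and parameters as the simulation above.
set.seed(500)
lambda <- 0.2
n <- 40
n_sims <- 1000  # hypothetical name for the number of replications

# One sample of n exponential draws per row, then the mean of each row
sim_means <- rowMeans(matrix(rexp(n_sims * n, lambda), nrow = n_sims, ncol = n))

# Standardize with the theoretical mean (1/lambda) and standard error (1/lambda)/sqrt(n)
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Compare empirical quantiles of z with standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
quantile_check <- data.frame(
  prob = probs,
  empirical = unname(quantile(z, probs)),
  normal = qnorm(probs)
)
print(quantile_check)

# Share of standardized means within +/- 1.96; close to 0.95 under normality
coverage <- mean(abs(z) < qnorm(0.975))
print(coverage)
```

If the Central Limit Theorem approximation is good, the two quantile columns should be close and the coverage should be near 0.95. A graphical version of the same check is qqnorm(z) followed by qqline(z).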
The second part analyzes the ToothGrowth data set. It contains 60 observations, one per guinea pig, recording odontoblast (tooth) length, the vitamin C delivery method (orange juice, coded OJ, or ascorbic acid, coded VC), and the dose level (0.5, 1, or 2 mg/day).
This part tests whether the delivery method or the dose level has an effect on tooth length.
Let’s start by loading the ToothGrowth data and exploring its structure
data(ToothGrowth)
data <- ToothGrowth
str(data)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
table(data$dose)
##
## 0.5 1 2
## 20 20 20
table(data$supp)
##
## OJ VC
## 30 30
table(data$len)
##
## 4.2 5.2 5.8 6.4 7 7.3 8.2 9.4 9.7 10 11.2 11.5 13.6 14.5 15.2 15.5
## 1 1 1 1 1 1 1 1 2 2 2 1 1 3 2 1
## 16.5 17.3 17.6 18.5 18.8 19.7 20 21.2 21.5 22.4 22.5 23 23.3 23.6 24.5 24.8
## 3 2 1 1 1 1 1 1 2 1 1 1 2 2 1 1
## 25.2 25.5 25.8 26.4 26.7 27.3 29.4 29.5 30.9 32.5 33.9
## 1 2 1 4 1 2 1 1 1 1 1
Provide a basic summary of the data
summary(data)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
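The overall summary does not show how tooth length varies across groups. As a possible complement (a sketch, not part of the original analysis), the mean and standard deviation of len can be tabulated for each combination of supp and dose. The names cell_means, cell_sds and cell_summary are introduced here for illustration.

```r
# Mean and standard deviation of tooth length for each supp x dose cell
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_sds <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine both tables; the suffixes give the columns len_mean and len_sd
cell_summary <- merge(cell_means, cell_sds,
                      by = c("supp", "dose"),
                      suffixes = c("_mean", "_sd"))
print(cell_summary)
```

The result has one row per cell (2 delivery methods x 3 doses), which gives a first impression of the group differences tested below.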
Perform a Welch two-sample t-test of whether mean tooth length differs between the two vitamin C delivery methods
t.test(len~supp, data)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
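The p-value (0.06063) is above the conventional 0.05 significance level and the 95% confidence interval (-0.171, 7.571) contains 0, so we fail to reject the null hypothesis of equal means. At the 5% level, the data do not provide sufficient evidence that the vitamin C delivery method affects tooth length; this is not the same as showing that it has no effect.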
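To make the reported numbers less of a black box, the Welch statistic, degrees of freedom, p-value and confidence interval can be reproduced by hand. This is a minimal sketch using the Welch-Satterthwaite approximation for the degrees of freedom; the variable names are introduced here for illustration.

```r
# Tooth lengths for each delivery method
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group variance of the sample mean
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference in means (unequal variances)
se <- sqrt(v_oj + v_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / se
df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The values should agree with the t.test output above, up to rounding.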
Perform three pairwise t-tests (0.5 vs 1, 0.5 vs 2, 1 vs 2) to test whether tooth length is affected by the dose level
t.test(len ~ dose, data[data$dose == 0.5 | data$dose == 1, ])
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means between group 0.5 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len ~ dose, data[data$dose == 0.5 | data$dose == 2, ])
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means between group 0.5 and group 2 is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
t.test(len ~ dose, data[data$dose == 1 | data$dose == 2, ])
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
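The p-values in all three tests are far below 0.05 (the largest is 1.906e-05) and none of the confidence intervals contains 0, so we reject the null hypothesis of equal means in each comparison. Mean tooth length increases with the vitamin C dose (10.605, 19.735 and 26.100 for doses 0.5, 1 and 2).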
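Because three tests are run on the same data, one may want to adjust the p-values for multiple comparisons. The sketch below is an optional extension, not part of the original analysis: it recomputes the three pairwise Welch p-values and applies a Bonferroni correction with p.adjust. The names doses, dose_pairs, p_raw and p_bonf are introduced here for illustration.

```r
# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
doses <- sort(unique(ToothGrowth$dose))
dose_pairs <- combn(doses, 2)

# Welch t-test p-value for each pair of doses
p_raw <- apply(dose_pairs, 2, function(pair) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ dose, data = pair_data)$p.value
})
names(p_raw) <- apply(dose_pairs, 2, paste, collapse = " vs ")

# Bonferroni adjustment: each p-value is multiplied by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(rbind(raw = p_raw, bonferroni = p_bonf))
```

With p-values this small, the adjusted values stay far below 0.05, so the conclusion does not depend on the correction.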