In this project, we will work with the sampling distribution and the Central Limit Theorem (CLT).
The Central Limit Theorem is perhaps the most powerful tool that we will learn in this course. It will send you back to the beginning of the class, requiring you to recall and use the distributions and probability principles we studied then. Now, at the end of the class, it also sets the stage for many things to come. Make sure to have fun, and good luck!
Given sample data of the form: \[ X= \{x_1, x_2, x_3, \dots, x_n\},\] consider the following statistic: \[ \hat{\theta}(X) = \frac{\sum_{i=1}^n (x_i - \overline{x})^2}{n}.\] We will soon see that this statistic can be an “estimator” for the population variance \(\sigma^2\). For now, write a function “theta_hat” that calculates the value of the statistic given sample data “samp”.
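theta_hat <- function(samp){
  # Average squared deviation from the sample mean (divisor n, not n - 1)
  mean((samp - mean(samp))^2)
}
# Quick check on a small vector: mean is 3, squared deviations sum to 10, so this returns 2
theta_hat(c(1, 2, 3, 4, 5))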
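As a quick sanity check, the statistic defined above should differ from R's built-in `var` (which divides by \(n - 1\)) only by the factor \((n - 1)/n\). The sketch below assumes a small made-up vector `x` and a helper named `theta_hat_check`; both names are hypothetical and chosen so they do not clash with your own function.

```r
# Hypothetical check data; any numeric vector works
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

# Statistic from the definition: average squared deviation from the sample mean
theta_hat_check <- function(samp) {
  mean((samp - mean(samp))^2)
}

# var() divides by n - 1, so rescaling by (n - 1) / n should reproduce the statistic
from_definition <- theta_hat_check(x)
from_var <- var(x) * (n - 1) / n

all.equal(from_definition, from_var)
```

If the two values agree, the function is computing the divisor-\(n\) version of the variance, as intended.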
Use the replicate and hist functions to approximate the sampling distribution of \(\hat{\theta}\) when working with random samples coming from \(N(\mu = 5, \sigma = 1.5)\) of sizes \(n = 2, 3, 5, 10, 50, 500\).
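B <- 10000
sizes <- c(2, 3, 4, 5, 10, 50, 500)
# Named mu and sigma so the built-in mean() and sd() are not shadowed
mu <- 5
sigma <- 1.5
for(n in sizes){
  # B replicates of theta_hat, each computed from a fresh normal sample of size n
  thetas <- replicate(B, {
    samp <- rnorm(n, mu, sigma)
    theta_hat(samp)
  })
  hist(thetas, breaks = 50,
       main = paste("Sampling distribution of theta_hat, n =", n), axes = TRUE)
}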
For each of these cases of sample sizes, calculate the empirical expected value.
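B <- 10000
sizes <- c(2, 3, 4, 5, 10, 50, 500)
# Named mu and sigma so the built-in mean() and sd() are not shadowed
mu <- 5
sigma <- 1.5
for(n in sizes){
  thetas <- replicate(B, {
    samp <- rnorm(n, mu, sigma)
    theta_hat(samp)
  })
  # The average of the B replicates estimates the expected value of theta_hat
  print(paste("Sample size:", n, "Empirical exp value of theta_hat ", round(mean(thetas), 5)))
}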
## [1] "Sample size: 2 Empirical exp value of theta_hat 5.02529"
## [1] "Sample size: 3 Empirical exp value of theta_hat 5.01247"
## [1] "Sample size: 4 Empirical exp value of theta_hat 4.99605"
## [1] "Sample size: 5 Empirical exp value of theta_hat 4.99664"
## [1] "Sample size: 10 Empirical exp value of theta_hat 5.00222"
## [1] "Sample size: 50 Empirical exp value of theta_hat 4.99356"
## [1] "Sample size: 500 Empirical exp value of theta_hat 5.00076"
As the sample size increases, what is the relation between the empirical mean of \(\hat{\theta}\) and \(\sigma^2\)?
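For reference, a standard result for this statistic is \(E[\hat{\theta}] = \frac{n-1}{n}\sigma^2\), so \(\hat{\theta}\) underestimates \(\sigma^2\) on average, with a gap that shrinks as \(n\) grows. The sketch below tabulates this theoretical value for the sample sizes used in the code above, assuming \(\sigma = 1.5\). The names `sigma`, `expected_theta`, and `bias_table` are introduced here only for illustration.

```r
sizes <- c(2, 3, 4, 5, 10, 50, 500)
sigma <- 1.5  # population standard deviation, taken from the problem statement

# Theoretical expectation of theta_hat: (n - 1) / n * sigma^2
expected_theta <- (sizes - 1) / sizes * sigma^2

bias_table <- data.frame(
  n = sizes,
  expected_theta = expected_theta,
  bias = expected_theta - sigma^2  # negative: theta_hat underestimates sigma^2
)
print(bias_table)
```

Comparing this table with the empirical means from a simulation is one way to judge whether the simulation is behaving as the theory predicts.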
Suppose we are working with a population that has the exponential distribution with \(\lambda = 2\).
Use the replicate function to get the histograms for the sampling distribution of the sample mean when working with sample sizes \(n = 1, 2, 3, 4, 15, 500\). Be sure to have appropriate titles for your histograms.
B <- 10000
sizes <- c(1, 2, 3, 4, 15, 500)
lambda <- 2
for(n in sizes){
  # B sample means, each computed from an exponential sample of size n
  xbars <- replicate(B, {
    samp <- rexp(n, rate = lambda)
    mean(samp)
  })
  hist(xbars, breaks = 50,
       main = paste("Sample mean of Exp(rate = 2), n =", n), axes = TRUE)
}
What do you notice?
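For small \(n\) the histograms are right-skewed, like the exponential distribution itself. As \(n\) grows they become symmetric and bell-shaped, centered near the population mean \(1/\lambda = 0.5\).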
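One way to check the CLT here is to compare simulated sample means with the values the theorem predicts: for an exponential population with rate \(\lambda\), the sample mean has expectation \(1/\lambda\) and standard deviation \(1/(\lambda\sqrt{n})\). The sketch below does this for \(n = 15\). The seed and the names `lambda`, `xbars`, `theory_mean`, and `theory_sd` are choices made for this illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
B <- 10000
n <- 15
lambda <- 2

# B sample means from exponential samples of size n
xbars <- replicate(B, mean(rexp(n, rate = lambda)))

# Values predicted by theory for the sample mean
theory_mean <- 1 / lambda
theory_sd <- 1 / (lambda * sqrt(n))

print(c(simulated_mean = mean(xbars), theory_mean = theory_mean))
print(c(simulated_sd = sd(xbars), theory_sd = theory_sd))

# Overlay the approximating normal density on a density-scaled histogram
hist(xbars, breaks = 50, freq = FALSE,
     main = "Sample mean of Exp(rate = 2), n = 15")
curve(dnorm(x, mean = theory_mean, sd = theory_sd), add = TRUE)
```

Rerunning with a smaller `n` should show the histogram pulling away from the normal curve on the right, which is the skewness of the underlying exponential showing through.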
Suppose we are working with the discrete uniform random variable taking values \(\{1, 2, 3, 4, 5, 6\}\).
Define a function “disc_samp” that takes input “n” and returns a random sample of size “n” from this distribution.
disc_samp <- function(n){
  # n independent draws, each of the values 1..6 equally likely
  sample(1:6, size = n, replace = TRUE)
}
disc_samp(100)
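A sampler like this can be sanity-checked by confirming that every draw lies in \(\{1, \dots, 6\}\) and that each value appears about one sixth of the time. The sketch below assumes a stand-in named `disc_samp_check` (a hypothetical name, to avoid clashing with your own function) and an arbitrary seed. It also computes the population mean and variance, which are useful later when judging the histograms of the sample mean.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Stand-in sampler: n draws, each of 1..6 equally likely
disc_samp_check <- function(n) {
  sample(1:6, size = n, replace = TRUE)
}

draws <- disc_samp_check(6000)

# Relative frequencies should each be close to 1/6
freqs <- table(draws) / length(draws)
print(round(freqs, 3))

# Population mean and variance of the discrete uniform on 1..6
pop_mean <- mean(1:6)                  # 3.5
pop_var <- mean((1:6 - pop_mean)^2)    # 35/12
print(c(pop_mean = pop_mean, pop_var = pop_var))
```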
Use the “disc_samp” function and the replicate function to get the histograms for the sampling distribution of the sample mean when working with sample sizes \(n = 1, 2, 3, 4, 15, 500\). Be sure to have appropriate titles for your histograms.
B <- 10000
sizes <- c(1, 2, 3, 4, 15, 500)
for(n in sizes){
  # B sample means, each computed from a discrete uniform sample of size n
  xbars <- replicate(B, {
    samp <- disc_samp(n)
    mean(samp)
  })
  hist(xbars, breaks = 50,
       main = paste("Sample mean of discrete uniform on 1..6, n =", n), axes = TRUE)
}
What do you notice?
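For very small \(n\), the sampling distribution of the sample mean can be computed exactly rather than simulated, which gives something concrete to compare the histograms against. The sketch below enumerates all 36 equally likely pairs for \(n = 2\); the names `pair_means` and `exact_probs` are introduced here for illustration.

```r
# All 36 equally likely ordered pairs of two draws from 1..6, reduced to their means
pair_means <- as.vector(outer(1:6, 1:6, "+")) / 2

# Exact probability of each possible value of the sample mean
exact_probs <- table(pair_means) / length(pair_means)
print(exact_probs)

barplot(exact_probs,
        main = "Exact distribution of the sample mean, n = 2",
        xlab = "sample mean", ylab = "probability")
```

The flat shape at \(n = 1\) already becomes a triangle peaked at 3.5 for \(n = 2\), an early sign of the bell shape that emerges for larger \(n\).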