1. Consider the population \(\{3, 6, 7, 9, 11, 14\}\). For samples of size 3 without replacement, find (and plot) the sampling distribution of the minimum. What is the mean of the sampling distribution?
# Your code here
pop <- c(3,6,7,9,11,14)
#Min <- apply(combn(pop, 3), 2, min)
mean(pop)
[1] 8.333333
md <- replicate(1000, mean(sample(pop, 3, replace = FALSE)))
mean(md)
[1] 8.346
hist(md)

  1. Let \(X_1, X_2, \ldots, X_n\) be a random sample from some distribution and suppose \(Y = T(X_1, X_2, \ldots, X_n)\) is a statistic. Suppose the sampling distribution of \(Y\) has pdf \(f(y) = (3/8)y^2\) for \(0 \leq y \leq 2.\) Find \(P(0 \leq Y \leq 1/5)\).
# Your code here
func <- function(x){3 * x ^ 2 / 8}
val <- integrate(func, 0, 1/5)$value

\(P(0 \leq Y \leq 1/5) =\) 0.001

  1. A friend claim that she has drawn a random sample of size 30, from the exponential distribution with \(\lambda = 1/10\). The mean of her sample is 12.

    1. What is the expected value of a sample mean?
    2. Run a simulation by drawing 10,000 random samples, each of size 30, from \(\text{Exp}(\lambda = 1/10)\) and then compute the mean. What proportion of the sample means are as large as or larger than 12?
    3. Is a mean of 12 unusual for a sample of size 30 from \(\text{Exp}(\lambda = 1/10)\)?
library(tidyverse)
set.seed(123)

# part a
sims <- 10000
md <- numeric(sims)
for (i in 1:sims) {
  md[i] <- mean(rexp(30, rate = 1/10))
}
mean(md)
[1] 9.971208
hist(md)

# part b
mean(md >= 12)
[1] 0.1341
  1. The expected value of the sample mean is 9.971208.
  2. The proportion of sample means that are larger than or equal to 12 is 0.1341.
  3. Due to the proportion being greater than 0.05, a mean of 12 is not normal.
  1. Let \(X_1, X_2, \ldots , X_{10} \overset{i.i.d}\sim N(20, 8)\) and \(Y_1, Y_2, \ldots, Y_{15} \overset{i.i.d}\sim N(16, 7)\). Let \(W = \bar{X} + \bar{Y}.\)

    1. Give the exact sampling distribution of \(W\)
    2. Simulate the sampling distribution in R and plot your results. Check that the simulated mean and standard error are close to the theoretical mean and standard error.
    3. Use your simulation to find \(P(W < 40).\) Calculate an exact answer and compare.
# part b
set.seed(13)
N <- 10 ^4-1
X <- numeric(N)
Y <- numeric(N)

for(i in 1:N) {
  X[i] <- mean(rnorm(10, 20, 8))
  Y[i] <- mean(rnorm(15, 16,7))
}
W <- X+Y
mean(W)
[1] 35.9955
sd(W)
[1] 3.055326
  1. The exact sample distribution of \(W\) = \(\hat{X}\) + \(\hat{Y}\) ~ \({N (35.995498, 3.0553256)}\)
# part c.
sims<- 10^4 
X <- numeric(sims)
Y <- numeric(sims)
for(i in 1:sims){
  X[i] <- rnorm(10, 20, 8)
  Y[i] <- rnorm(15, 16, 7)
  
}
W <- X + Y
mean(W < 40)
[1] 0.6516
pnorm(39, 36, 3.4157)
[1] 0.8101088
# part c.
  1. Let \(X_1, X_2, \ldots , X_{9} \overset{i.i.d}\sim N(7, 3)\) and \(Y_1, Y_2, \ldots, Y_{12} \overset{i.i.d}\sim N(10, 5)\). Let \(W = \bar{X} - \bar{Y}.\)

    1. GIve the exact sampling distribution of \(W\).
    2. Simulate the sampling distribution of \(W\) in R and plot your results using ggplot2. Check that the simulated mean and the standard error are close to the theoretical mean and the standard error.
    3. Use your simulation to find \(P(W < -1.5)\). Calculate an exact answer and compare.
# part a.

The exact sampling distribution of \(W\) is \(\bar{X} - \bar{Y}\) ~N(-3, -2)

# part b.
sims <- 10^4-1
W <- numeric(sims)
  for(i in 1:sims){
    W[i] <- mean(rnorm(9, 7, 3) + rnorm(12, 10, 5))
  }
mean <- mean(W)
standardDeviation <- sd(W)
mean
[1] 16.9857
standardDeviation
[1] 1.766579

The simulated mean comes out to 17.02687 and the standard deviation is 1.765292.

#plot using ggplot 
ggplot(data.frame(W=W), aes(x = W)) + 
  geom_histogram()

# part c.
# simulated answer
simulated <- (sum(W < -1.5) + 1) / (sims+1)
simulated
[1] 1e-04
# Exact answer
exact <- pnorm(-1.5, 17, sqrt(3^2 + 5^2))
exact
[1] 0.0007550805

After calculating the simulated and exact answer where P(W < -1.5) we found out that the simulated probablyility is 10^-4 and the exact probablity is .0007.

  1. Let \(X_1, X_2, \ldots , X_n\) be a random sample from \(N(0, 1)\). Let \(W = X_1^2 + X_2^2 + \cdots + X_N^2.\) Describe the sampling distribution of \(W\) by running a simulation, using \(n = 2.\) What is the mean and variance of the sampling distribution of \(W\)? Repeat using \(n = 4, n = 5.\) What observations or conjectures do you have for general \(n\)?
# Your code here
sims <- 10^4
WN2 <- numeric(sims)

for (i in 1:sims) {
  WN2[i] <- sum(rnorm(2)^2)
}
mean(WN2)
[1] 1.978066
var(WN2)
[1] 3.909587
WN4 <- numeric(sims)
for(i in 1:sims){
  WN4[i] <- sum(rnorm(4)^2)
}

mean(WN4)
[1] 3.989153
var(WN4)
[1] 8.101381
WN5 <- numeric(sims)

for(i in 1:sims){
  WN5[i] <- sum(rnorm(5)^2)
}

mean(WN5)
[1] 4.992883
var(WN5)
[1] 9.762109

After examining the means and variances for the different values of , we can conclude that the mean of W is pretty close to n, while the variance is close to 2*n.

  1. Let \(X\) be a uniform random variable on the interval \([40, 60]\) and \(Y\) a uniform random variable on \([45, 80].\) Assume that \(X\) and \(Y\) are independent.

    1. Compute the expected value and variance of \(X + Y.\)
    2. Simulate the sampling distribution of \(X + Y.\) Desribe the graph of the distribution of \(X + Y\). Compute the mean and variance of the sampling distribution and compare this to the theoretical mean and variance.
    3. Suppose the time (in minutes) Jack takes to complete his statistics homework is \(\text{Unif}[40, 60]\) and the time JIll takes is \(\text{Unif}[40, 60].\) Assume they work independently. One day they announce that their total time to finish an assignment was less than 90 min. How likely is this?
# Your code here
xfx <- function(x){x/20}
EX <- integrate(xfx, 40, 60)$value
yfy <- function(x){x/35}
EY <- integrate(yfy, 45, 80)$value
EX + EY
[1] 112.5
vx <- function(x){
  (x - EX)^2 / 20
}
VX <- integrate(vx, 40, 60)$value
vy <- function(x){
  (x - EY)^2 / 35
}
VY <- integrate(vy, 45, 80)$value
VX + VY
[1] 135.4167
# b.
sims <- 10^4 
result_means <- numeric(sims)
result_var <- numeric(sims)
for(i in 1:sims) {
  X <- runif(sims, 40, 60)
  Y <- runif(sims, 45, 80)
  result_means[i] <- mean(X) + mean(Y)
  result_var[i] <- var(X) + var(Y)
}
mean(result_means)
[1] 112.4991
mean(result_var)
[1] 135.4207

\(P(X + Y < 90) =\) 135.4207087

# part c
# P(W < 90)
sims <- 10^4 
result <- numeric(sims)
for(i in 1:sims) {
  X <- runif(sims, 40, 60)
  Y <- runif(sims, 40, 60)
  result[i] <- X + Y
}
mean(result < 90)
[1] 0.122