Worked with Brogan Pietrocini, Kaden Buckley

Problem 1

Assume a Poisson(\(\mu\)) model for the number of home runs hit (in total by both teams) in an MLB game. Let \(X_1, \ldots, X_n\) be a random sample of home run counts for \(n\) games.

Suppose we want to estimate \(\theta = \mu e^{-\mu}\), the probability that any single game has exactly 1 HR (for Poisson(\(\mu\)), \(P(X = 1) = e^{-\mu}\,\mu^1/1! = \mu e^{-\mu}\)). Consider two estimators of \(\theta\):

  1. Compute the value of \(\hat{\theta}\) based on the sample (3, 0, 1, 4, 0). Write a clearly worded sentence reporting in context this estimate of \(\theta\).

    \(\bar{X} = (3+0+1+4+0)/5 = 1.6\)

    \(\hat{\theta} = \bar{X}e^{-\bar{X}} = 1.6\,e^{-1.6} \approx 0.323\)

    Based on the Poisson model with \(\mu\) estimated by the sample mean of 1.6 HR per game, we estimate that the probability that any single MLB game has exactly 1 HR is about 0.323.
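
    The arithmetic above can be checked with a short R sketch. The variable names `hr`, `x_bar`, and `theta_hat` are illustrative choices, not part of the assignment.

    ```r
    hr <- c(3, 0, 1, 4, 0)            # observed home run counts
    x_bar <- mean(hr)                  # sample mean (1.6)
    theta_hat <- x_bar * exp(-x_bar)   # plug-in estimate of theta = mu * exp(-mu)
    round(c(x_bar = x_bar, theta_hat = theta_hat), 3)   # theta_hat is about 0.323
    ```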

  2. Compute the value of \(\hat{p}\) based on the sample (3, 0, 1, 4, 0). Write a clearly worded sentence reporting in context your estimate of \(\theta\).

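    \(\hat{p} = 1/5 = 0.2\)

    We estimate that the probability that any single game has exactly 1 HR is 0.2, because 1 of the 5 games in the sample had exactly 1 HR. This estimate is a sample proportion and does not rely on the Poisson model.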
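
    A minimal R sketch of this calculation, assuming \(\hat{p}\) is the sample proportion of games with exactly 1 HR (the estimator definitions are not reproduced in this write-up, so that reading is an assumption):

    ```r
    hr <- c(3, 0, 1, 4, 0)   # observed home run counts
    p_hat <- mean(hr == 1)   # proportion of games with exactly 1 HR
    p_hat
    ```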
  3. Which of these two estimators is the MLE of \(\theta\) in this situation? Explain, without doing any calculations.

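    For a Poisson distribution, the MLE of \(\mu\) is the sample mean \(\bar{X}\). By the invariance property of MLEs, the MLE of \(\theta = \mu e^{-\mu}\) is the same function of \(\bar{X}\), so \(\hat{\theta} = \bar{X}e^{-\bar{X}}\) is the MLE of \(\theta\), and \(\hat{p}\) is not. For this sample, the MLE of \(\mu\) is 1.6 and the MLE of \(\theta\) is 0.323.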
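
    For reference, a sketch of the standard argument that the sample mean is the MLE of \(\mu\) in the Poisson model. The log-likelihood and its derivative are

    \[ \ell(\mu) = \sum_{i=1}^{n}\left(X_i \log\mu - \mu - \log X_i!\right), \qquad \ell'(\mu) = \frac{\sum_{i=1}^{n} X_i}{\mu} - n = 0 \;\Rightarrow\; \hat{\mu} = \bar{X}. \]

    Plugging \(\hat{\mu} = \bar{X}\) into \(\theta = \mu e^{-\mu}\) gives \(\bar{X}e^{-\bar{X}}\) as the MLE of \(\theta\).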

  4. It can be shown that \(\hat{p}\) is an unbiased estimator of \(\theta\). Explain in words what this means.

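    Unbiased means that \(E(\hat{p}) = \theta\) for every possible value of \(\theta\). Over many random samples of \(n\) games, the values of \(\hat{p}\) average out to the true probability that a game has exactly 1 HR, so \(\hat{p}\) does not tend to systematically overestimate or underestimate the parameter, regardless of its true value.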
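
    One way to see unbiasedness in action is a small simulation. This is only a sketch: the value \(\mu = 2.3\), the seed, and the names `p_hats` and `theta` are arbitrary illustrative choices, and \(\hat{p}\) is assumed to be the sample proportion of games with exactly 1 HR.

    ```r
    set.seed(1)                # arbitrary seed, for reproducibility only
    mu <- 2.3                  # assumed true mean, for illustration
    n <- 5
    theta <- mu * exp(-mu)     # true P(X = 1), equal to dpois(1, mu)
    # 10000 simulated values of p hat, one per simulated sample of n games
    p_hats <- replicate(10000, mean(rpois(n, mu) == 1))
    c(mean_p_hat = mean(p_hats), theta = theta)   # the two values should be close
    ```

    Individual values of `p_hats` can only be 0, 0.2, ..., 1, so no single estimate equals \(\theta\). Unbiasedness is a statement about their long-run average.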

  5. Is \(\hat{\theta}\) an unbiased estimator of \(\theta\)? Explain. (You don’t have to derive anything; just apply a general principle.)

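    \(\hat{\theta}\) is an unbiased estimator of \(\theta\) only if \(E(\hat{\theta}) = \theta\) for all possible values of \(\theta\). Here \(\hat{\theta} = \bar{X}e^{-\bar{X}}\) is a nonlinear function of \(\bar{X}\), and a nonlinear function of an unbiased estimator is generally not unbiased, so \(\hat{\theta}\) is a biased estimator of \(\theta\) even though \(\bar{X}\) is unbiased for \(\mu\). As an MLE, \(\hat{\theta}\) is asymptotically unbiased (“nearly unbiased” when the sample size is large, since \(E(\hat{\theta}) \approx \theta\)), but with a small sample the bias can be noticeable.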
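
    The principle can be made concrete with a rough second-order Taylor sketch (an approximation only, not an exact result). Write \(g(x) = xe^{-x}\), so that \(\bar{X}e^{-\bar{X}} = g(\bar{X})\) and \(g''(x) = (x-2)e^{-x}\). Using \(\text{Var}(\bar{X}) = \mu/n\),

    \[ E\big(g(\bar{X})\big) \approx g(\mu) + \tfrac{1}{2}\,g''(\mu)\,\text{Var}(\bar{X}) = \mu e^{-\mu} + \frac{(\mu-2)\,\mu e^{-\mu}}{2n}. \]

    In this approximation the extra term vanishes only at \(\mu = 2\) (or \(\mu = 0\)), so the expected value generally differs from \(\mu e^{-\mu}\), and the term shrinks as \(n\) grows.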

  6. Suppose \(\mu = 2.3\) and \(n=5\). Explain in full detail how you would use simulation to approximate the bias of \(\hat{\theta}\) in this case.

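    Simulate a sample of \(n = 5\) values from a Poisson(2.3) distribution (for example, with a spinner constructed from the Poisson(2.3) probabilities), compute \(\bar{X}\) for the sample, and then compute \(\hat{\theta} = \bar{X}e^{-\bar{X}}\). Repeat this many times (say 10,000) to get many simulated values of \(\hat{\theta}\). Average the simulated values to approximate \(E(\hat{\theta})\), then subtract the true value \(\theta = 2.3e^{-2.3} \approx 0.231\). The difference approximates the bias of \(\hat{\theta}\).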

  7. Coding required. Conduct the simulation from the previous part and approximate bias of \(\hat{\theta}\) when \(\mu = 2.3\) and \(n = 5\).

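    mu <- 2.3
    n <- 5
    theta <- mu * exp(-mu)  # true value of theta = P(X = 1)
    num_simulations <- 10000
    theta_hat_values <- numeric(num_simulations)
    
    for (i in 1:num_simulations) {
      hr_sim <- rpois(n, mu)  # one simulated sample of n games
      x_bar <- mean(hr_sim)
      theta_hat_values[i] <- x_bar * exp(-x_bar)  # theta hat for this sample
    }
    
    # bias = (long-run average of theta hat) - (true theta)
    approx_bias <- mean(theta_hat_values) - theta
    approx_bias

    The approximate bias of \(\hat{\theta}\) from my simulation was 0.00522, a small positive value, so \(\hat{\theta}\) tends to slightly overestimate \(\theta\) when \(\mu = 2.3\) and \(n = 5\).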
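
    Because the bias is approximated by simulation, its value changes from run to run. The sketch below attaches a Monte Carlo standard error to the approximation, which helps in judging whether a small simulated bias is distinguishable from zero. The seed and the names `theta_hats` and `mc_se` are illustrative choices.

    ```r
    set.seed(2)                # arbitrary seed, for reproducibility only
    mu <- 2.3
    n <- 5
    num_simulations <- 10000
    # simulate theta hat = X bar * exp(-X bar) for each simulated sample
    theta_hats <- replicate(num_simulations, {
      x_bar <- mean(rpois(n, mu))
      x_bar * exp(-x_bar)
    })
    approx_bias <- mean(theta_hats) - mu * exp(-mu)
    # standard error of the simulated average; margin of error is about 2 * mc_se
    mc_se <- sd(theta_hats) / sqrt(num_simulations)
    c(approx_bias = approx_bias, mc_se = mc_se)
    ```

    Increasing `num_simulations` shrinks `mc_se` but does not change the bias itself, which depends only on \(\mu\) and \(n\).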

  8. Explain in full detail how you would use simulation to approximate the bias function of \(\hat{\theta}\) when \(n=5\).

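    Use the same setup as the last part, but repeat the whole simulation for each value of \(\mu\) in a grid (for example, 0.1, 0.2, ..., 5) instead of only \(\mu = 2.3\). For each \(\mu\), simulate many samples of size 5 from Poisson(\(\mu\)), compute \(\hat{\theta}\) for each sample, average the \(\hat{\theta}\) values, and subtract the true \(\theta = \mu e^{-\mu}\) to approximate the bias at that \(\mu\). Then plot the approximate bias against \(\mu\).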

  9. Coding required. Conduct the simulation from the previous part and plot the approximate bias function when \(n=5\). For what values of \(\mu\) does \(\hat{\theta}\) tend to overestimate \(\theta\)? Underestimate? For what values of \(\mu\) is the bias the worst?

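    n <- 5
    num_simulations <- 10000
    mus <- seq(0.1, 5, by = 0.1)  # grid of mu values
    biases <- numeric(length(mus))
    
    for (j in seq_along(mus)) {
      mu <- mus[j]
      # each column is one simulated sample of n games
      samples <- matrix(rpois(n * num_simulations, mu), nrow = n)
      x_bars <- colMeans(samples)
      theta_hats <- x_bars * exp(-x_bars)
      biases[j] <- mean(theta_hats) - mu * exp(-mu)
    }
    
    plot(mus, biases, type = "l", xlab = "mu", ylab = "Approximate bias of theta hat")
    abline(h = 0, lty = 2)  # above the line: overestimates theta; below: underestimates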
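
    As a cross-check on a simulated bias curve (not part of the assigned simulation), the bias of \(\bar{X}e^{-\bar{X}}\) has a closed form. This is a sketch that uses \(S = n\bar{X} \sim\) Poisson(\(n\mu\)) and the identity \(E(S\,t^{S}) = n\mu\,t\,e^{n\mu(t-1)}\) with \(t = e^{-1/n}\):

    \[ E\big(\bar{X}e^{-\bar{X}}\big) = \mu\,e^{-1/n}\exp\!\big\{n\mu\,(e^{-1/n}-1)\big\}, \qquad \text{Bias}(\mu) = E\big(\bar{X}e^{-\bar{X}}\big) - \mu e^{-\mu}. \]

    For \(n = 5\), this expression is negative (underestimation) for \(\mu\) below about 2.1 and positive (overestimation) above that. Its magnitude is largest, about \(-0.044\), near \(\mu \approx 0.6\), and at \(\mu = 2.3\) it is about 0.0036.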

Problem 2

Continuing Problem 1.

  1. It can be shown that \(\text{Var}(\hat{p}) = \frac{\theta(1-\theta)}{n}\). Compute \(\text{Var}(\hat{p})\) when \(\mu = 2.3\) and \(n=5\). Then write a clearly worded sentence interpreting this value.

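    \(\text{Var}(\hat{p}) = \frac{2.3e^{-2.3}\left(1 - 2.3e^{-2.3}\right)}{5} \approx 0.035\)

    When \(\mu = 2.3\) and \(n = 5\), the variance of \(\hat{p}\) is 0.035 (standard deviation about 0.19). Over many samples of 5 games, the proportion of games with exactly 1 HR varies from sample to sample by roughly 0.19 around the true probability \(\theta \approx 0.231\).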
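
    The calculation above can be reproduced with a few lines of R. The names `var_p_hat` and `sd_p_hat` are illustrative.

    ```r
    mu <- 2.3
    n <- 5
    theta <- mu * exp(-mu)                 # P(X = 1) under Poisson(mu)
    var_p_hat <- theta * (1 - theta) / n   # variance formula given in the problem
    sd_p_hat <- sqrt(var_p_hat)            # same information on the probability scale
    round(c(theta = theta, var_p_hat = var_p_hat, sd_p_hat = sd_p_hat), 3)
    ```

    The standard deviation is often easier to interpret than the variance because it is in the same units as the estimate.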

  2. Suppose \(\mu = 2.3\) and \(n=5\). Explain in full detail how you would use simulation to approximate the variance of \(\hat{\theta}\).

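    Use the same setup as in the bias simulation: simulate a sample of 5 values from a Poisson(2.3) distribution, compute \(\hat{\theta} = \bar{X}e^{-\bar{X}}\), and repeat many times. Then compute the variance of the simulated \(\hat{\theta}\) values. The formula \(\theta(1-\theta)/n\) applies to \(\hat{p}\), not to \(\hat{\theta}\), so it is not used here.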

  3. Coding required. Conduct the simulation from the previous part and approximate the variance of \(\hat{\theta}\) when \(\mu = 2.3\) and \(n=5\). Then write a clearly worded sentence interpreting this value.

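    mu <- 2.3
    n <- 5
    num_simulations <- 10000
    theta_hat_values <- numeric(num_simulations)
    
    for (i in 1:num_simulations) {
      hr_sim <- rpois(n, mu)  # one simulated sample of n games
      x_bar <- mean(hr_sim)
      theta_hat_values[i] <- x_bar * exp(-x_bar)  # theta hat, not just X bar
    }
    
    variance_of_theta_hat <- var(theta_hat_values)
    variance_of_theta_hat

    The printed value approximates \(\text{Var}(\hat{\theta})\) when \(\mu = 2.3\) and \(n = 5\). It measures how much the estimates \(\hat{\theta}\) vary from sample to sample over many samples of 5 games. Its square root is the standard deviation of \(\hat{\theta}\), the typical distance between a single estimate and the long-run average of the estimates.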

  4. Which estimator has smaller variance when \(\mu = 2.3\) (and \(n=5\))? Answer, but then explain why this information alone is not really useful.

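    The estimator with the smaller variance is \(\hat{\theta}\): when \(\mu = 2.3\) and \(n = 5\), \(\text{Var}(\hat{\theta})\) is roughly 0.006, compared with \(\text{Var}(\hat{p}) \approx 0.035\). (The comparison must use the variance of \(\hat{\theta} = \bar{X}e^{-\bar{X}}\), not the variance of \(\bar{X}\), which is \(\mu/n = 0.46\).) This information alone isn’t very useful because we don’t know the real value of \(\mu\), and the comparison could differ at other values of \(\mu\). Variance also says nothing about bias.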
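
    A side-by-side simulation makes the comparison at \(\mu = 2.3\) concrete. This is a sketch under the assumption that \(\hat{p}\) is the sample proportion of games with exactly 1 HR. The seed and the names `sims` and `vars` are illustrative.

    ```r
    set.seed(3)                # arbitrary seed, for reproducibility only
    mu <- 2.3
    n <- 5
    # each column holds both estimators computed from the same simulated sample
    sims <- replicate(10000, {
      hr <- rpois(n, mu)
      c(theta_hat = mean(hr) * exp(-mean(hr)), p_hat = mean(hr == 1))
    })
    vars <- apply(sims, 1, var)   # simulated variance of each estimator
    vars
    ```

    Computing both estimators from the same samples keeps the comparison fair, since both see exactly the same simulated games.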

  5. Explain in full detail how you would use simulation to approximate the variance function of \(\hat{\theta}\) (if \(n=5\)).

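    Use the same setup as the last part, but repeat the simulation for each value of \(\mu\) in a grid (for example, 0.1, 0.2, ..., 5) instead of only \(\mu = 2.3\). For each \(\mu\), simulate many samples of size 5, compute \(\hat{\theta}\) for each, and take the variance of the simulated \(\hat{\theta}\) values. Then plot the approximate variance against \(\mu\).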

  6. Coding required. Conduct the simulation from the previous part and plot the approximate variance function. Compare to the variance function of \(\hat{p}\). Based on variability alone, which estimator is preferred?

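    n <- 5
    num_simulations <- 10000
    mus <- seq(0.1, 5, by = 0.1)  # grid of mu values
    var_theta_hat <- numeric(length(mus))
    
    for (j in seq_along(mus)) {
      # each column is one simulated sample of n games
      samples <- matrix(rpois(n * num_simulations, mus[j]), nrow = n)
      x_bars <- colMeans(samples)
      var_theta_hat[j] <- var(x_bars * exp(-x_bars))
    }
    
    thetas <- mus * exp(-mus)
    var_p_hat <- thetas * (1 - thetas) / n  # known variance function of p hat
    
    plot(mus, var_p_hat, type = "l", lty = 2, ylim = c(0, max(var_p_hat, var_theta_hat)),
         xlab = "mu", ylab = "Variance")
    lines(mus, var_theta_hat)  # solid: theta hat; dashed: p hat

    Based on variability alone, the preferred estimator is \(\hat{\theta}\): its approximate variance function lies below the variance function of \(\hat{p}\) over the range of \(\mu\) values plotted.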