Introduction to Statistical Theory Assignment

Author
Affiliation

Vusumuzi Mabasa

Witwatersrand University

Keywords

Maximum Likelihood Estimator, Hypothesis Testing, Moment Generating Functions, Bias

Question 1

  1. The posterior distribution of λ

The likelihood function:

\[ L(X | \lambda) = \frac{\lambda^X e^{-\lambda}}{X!} \]

The prior distribution for the parameter \(\lambda\) is assumed to follow a Chi-Square distribution with 4 degrees of freedom. Its probability density function (PDF) is given by:

\[ p( \lambda) = \frac{\lambda^{\frac{\nu}{2} - 1} e^{-\frac{\lambda}{2}}}{2^{\nu/2} \Gamma(\nu / 2)} \]

with \(\nu = 4\)

\[ p(\lambda) = \frac{\lambda^{1} e^{-\lambda / 2}}{4} \]

Using Bayes’ Theorem:

\[ p(\lambda \mid X) \propto L(X \mid \lambda) \cdot p(\lambda). \]

Substituting the likelihood and prior:

\[ p(\lambda | x) \propto \left(\frac{\lambda e^{-\lambda / 2}}{4}\right) \cdot \left(\frac{\lambda^x e^{-\lambda}}{x!}\right). \]

Simplify: \[ p(\lambda | x) \propto \lambda^{x + 1} e^{-\frac{3\lambda}{2}}. \]

This corresponds to a Gamma distribution:

where \(\alpha - 1 = x + 1\), so \(\alpha = x + 2\), and \(\beta = \frac{3}{2}\).

\[ \lambda | x \sim \text{Gamma}(\alpha = x + 2, \beta = \frac{3}{2}). \]

  1. Computing the posterior mean, mode, and 95% credible interval in R

Given \(x = 4\), the posterior parameters are: - Shape: \(\alpha = 4 + 2 = 6\), - Rate: \(\beta = \frac{3}{2}\).

Posterior Mean:

\[ \mathbb{E}[\lambda] = \frac{\alpha}{\beta} = \frac{6}{3/2} = 4 \]

Posterior Mode:

\[ \text{Mode} = \frac{\alpha - 1}{\beta} = \frac{6 - 1}{3/2} = \frac{5}{1.5} = 3.33 \]

The 95% credible interval using R

qgamma(c(0.025, 0.975), shape = 6, rate = 3/2)
[1] 1.467930 7.778888
  1. Interpretations

Given the prior belief and the observed data \(x = 4\), there is a 95% posterior probability that \(\lambda\) lies within the interval (1.47, 7.78).
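As a quick numerical sanity check (my own sketch, not required by the question), the posterior mean, mode, and credible interval can be approximated by sampling from the Gamma(6, 3/2) posterior:

set.seed(100)  # assumed seed for reproducibility
draws <- rgamma(1e6, shape = 6, rate = 3/2)

mean(draws)                       # should be close to alpha / beta = 4
dens <- density(draws)
dens$x[which.max(dens$y)]         # approximate mode, close to (alpha - 1) / beta = 3.33
quantile(draws, c(0.025, 0.975))  # should be close to the qgamma() interval above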

Question 2

  1. Mean Heights of Males in a Population

Given \(Y \sim N(\mu, \sigma^2)\), with a prior \(\mu \sim N(\mu_0, \sigma_0^2)\), the posterior distribution of \(\mu \mid Y\) is derived as follows:

Prior Distribution

\[ P(\mu) = \frac{1}{\sqrt{2\pi \sigma_0^2}} \exp\left(-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right) \]

Likelihood Function

\[ L(\mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) \]

Simplifying the joint likelihood further (keeping only factors that depend on \(\mu\)): \[ L(\mu) \propto \exp\left(-\frac{n}{2\sigma^2} (\bar{x} - \mu)^2\right), \]

where \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\).

Posterior Distribution

Using Bayes’ theorem: \[ P(\mu \mid x) \propto L(\mu) \cdot P(\mu) \]

Substitute the prior and simplified likelihood: \[ P(\mu \mid x) \propto \exp\left(-\frac{1}{2\sigma_0^2} (\mu - \mu_0)^2\right) \cdot \exp\left(-\frac{n}{2\sigma^2} (\bar{x} - \mu)^2\right) \]

Combine the exponents: \[ P(\mu \mid x) \propto \exp\left(-\frac{1}{2} \left[ \frac{1}{\sigma_0^2} (\mu - \mu_0)^2 + \frac{n}{\sigma^2} (\bar{x} - \mu)^2 \right]\right) \]

Completing the Square

Expand and combine terms: \[ \frac{1}{\sigma_0^2} (\mu^2 - 2\mu\mu_0 + \mu_0^2) + \frac{n}{\sigma^2} (\bar{x}^2 - 2\mu\bar{x} + \mu^2) \]

Group terms: \[ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right) \mu^2 - 2\mu \left( \frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2} \right) + C, \] where C collects constants independent of \(\mu\).

Posterior Mean and Variance

The posterior mean and variance are: \[ \mu_{\text{post}} = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}, \]

\[ \sigma_{\text{post}}^2 = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}. \]
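These two formulas translate directly into a small helper function. The sketch below is illustrative (the function and argument names are my own, not prescribed by the assignment); working with precisions (reciprocal variances) keeps the update compact:

# Normal-normal conjugate update for mu with known data variance
# (function and argument names are illustrative, not from the assignment)
normal_posterior <- function(mu0, sigma0_sq, sigma_sq, n, xbar) {
  post_precision <- 1 / sigma0_sq + n / sigma_sq   # posterior precision
  post_var  <- 1 / post_precision
  post_mean <- post_var * (mu0 / sigma0_sq + n * xbar / sigma_sq)
  c(mean = post_mean, variance = post_var)
}

# Hypothetical usage with made-up values
normal_posterior(mu0 = 0, sigma0_sq = 1, sigma_sq = 4, n = 10, xbar = 0.5)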

  1. Manual computation

Compute \(P(Y_{\text{new}} > 180)\)

Given:

  1. Sample size: \(n = 80\)

  2. Sample mean: \(\bar{Y} = 182.137\)

  3. Known variance: \(\sigma^2 = 225\)

  4. Prior mean: \(\mu_0 = 187\)

  5. Prior variance: \(\sigma_0^2 = 100\)

The posterior mean is:

\[ \mu_{\text{post}} = \frac{\frac{187}{100} + \frac{80 \cdot 182.137}{225}}{\frac{1}{100} + \frac{80}{225}} = 182.137 \]

The posterior variance is:

\[ \sigma_{\text{post}}^2 = \frac{1}{\frac{1}{100} + \frac{80}{225}} = 2.736 \]

Predictive variance: the variance of a new observation combines the data variance \(\sigma^2\) with the posterior variance of \(\mu\):

\[ \text{Var}(Y_{\text{new}}) = \sigma^2 + \sigma_{\text{post}}^2 = 225 + 2.736 = 227.736. \]

The posterior predictive distribution of \(Y_{\text{new}}\) is given by:

\[ Y_{\text{new}} \sim \mathcal{N}(182.137, 227.736). \]

Probability Calculation:

\[ P(Y_{\text{new}} > 180) = 1 - P(Y_{\text{new}} \leq 180). \]

Standardization: Standardizing the value:

\[ Z = \frac{180 - 182.137}{\sqrt{227.736}} \approx -0.1416. \]

From the standard normal table, \(\Phi(-0.1416) \approx 0.4443\). The probability for the new observation is therefore:

\[ P(Y_{\text{new}} > 180) = 1 - 0.4443 = 0.5557. \]

  1. Computation in R
# Parameters
mu_post <- 182.137
sigma_post <- sqrt(227.736)
y_value <- 180

# Probability Calculation
p_greater_180 <- 1 - pnorm(y_value, mean = mu_post, sd = sigma_post)
print(p_greater_180)
[1] 0.5563053

The normal-normal conjugate distribution

library(tidyverse)
# Define parameters
prior_mean <- 187
prior_var <- 100
prior_sd <- sqrt(prior_var)

sample_mean <- 182
known_var <- 225
n <- 80
likelihood_var <- known_var / n
likelihood_sd <- sqrt(likelihood_var)

posterior_mean <- 182.137
posterior_var <- 2.736   # posterior variance of mu (not the predictive variance)
posterior_sd <- sqrt(posterior_var)


x <- seq(170, 200, length.out = 500)


prior_density <- dnorm(x, mean = prior_mean, sd = prior_sd)
likelihood_density <- dnorm(x, mean = sample_mean, sd = likelihood_sd)
posterior_density <- dnorm(x, mean = posterior_mean, sd = posterior_sd)


df <- data.frame(
  x = rep(x, 3),
  density = c(prior_density, likelihood_density, posterior_density),
  Distribution = rep(c("Prior (N(187, 100))", "Likelihood (N(182, 2.81))", "Posterior (N(182.137, 2.736))"), each = length(x))
)


ggplot(df, aes(x = x, y = density, color = Distribution, linetype = Distribution)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("Prior (N(187, 100))" = "blue", 
                                "Likelihood (N(182, 2.81))" = "red", 
                                "Posterior (N(182.137, 2.736))" = "green")) +
  labs(x = "Parameter (μ)", 
       y = "Density", 
       title = "Normal Normal Conjugate") +
  theme_gray() +
  theme(legend.position = "bottom", 
        legend.title = element_blank(), 
        legend.background = element_rect(fill = "white", color = "black"))

Question 3

  1. Deriving the Likelihood Ratio

The Likelihood function

\[ L(\mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi \cdot 64}} \exp\left(-\frac{(X_i - \mu)^2}{2 \cdot 64}\right) = \left(\frac{1}{\sqrt{2\pi \cdot 64}}\right)^n \exp\left(-\frac{1}{128}\sum_{i=1}^n (X_i - \mu)^2\right). \]

The likelihood function under \(H_0: \mu = 80\) is: \[ L(80) = \left(\frac{1}{\sqrt{2\pi \cdot 64}}\right)^n \exp\left(-\frac{1}{128} \sum_{i=1}^n (X_i - 80)^2\right). \]

The likelihood function under \(H_1: \mu = 76\) is:

\[ L(76) = \left(\frac{1}{\sqrt{2\pi \cdot 64}}\right)^n \exp\left(-\frac{1}{128} \sum_{i=1}^n (X_i - 76)^2\right). \]

Likelihood Ratio

The likelihood ratio is:

\[ \frac{L(80)}{L(76)} = \frac{(128\pi)^{-n/2} \exp\left(-\frac{1}{128} \sum_{i=1}^n (X_i - 80)^2 \right)}{(128\pi)^{-n/2} \exp\left(-\frac{1}{128} \sum_{i=1}^n (X_i - 76)^2 \right)}, \]

\[ \frac{L(80)}{L(76)} = \frac{\exp\left(-\frac{1}{128}\sum_{i=1}^n (X_i - 80)^2\right)}{\exp\left(-\frac{1}{128}\sum_{i=1}^n (X_i - 76)^2\right)} = \exp\left(-\frac{1}{128} \left[\sum_{i=1}^n (X_i - 80)^2 - \sum_{i=1}^n (X_i - 76)^2 \right]\right). \]

Simplify the exponent:

\[ (X_i - 80)^2 - (X_i - 76)^2 = (X_i^2 - 160X_i + 6400) - (X_i^2 - 152X_i + 5776) = -160X_i + 152X_i + 6400 - 5776 = -8X_i + 624. \]

\[ \sum_{i=1}^n \left[ (X_i - 80)^2 - (X_i - 76)^2 \right] = \sum_{i=1}^n (-8 X_i + 624) = -8 \sum_{i=1}^n X_i + 624n = -8n \bar{X} + 624n, \]

where

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i. \]

Thus:

\[ \Lambda = \exp\left(-\frac{1}{128}(-8n \bar{X} + 624n)\right) = \exp\left(\frac{8n \bar{X} - 624n}{128}\right) = \exp\left(n \frac{\bar{X} - 78}{16}\right). \]

The critical region is defined by:

\[ \Lambda \leq k : \exp\left(\frac{n}{16} (\bar{X} - 78)\right) \leq k. \]

Taking the natural logarithm:

\[ \frac{n}{16} (\bar{X} - 78) \leq \log(k). \]

Simplifying:

\[ \bar{X} - 78 \leq \frac{16}{n} \log(k) \implies \bar{X} \leq 78 + \frac{16}{n} \log(k). \]
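The ratio depends on the data only through \(\bar{X}\), which is what makes the reduction to a cutoff on the sample mean possible. A short illustrative check (the simulated data and values below are my own, not part of the question):

# Lambda = L(80)/L(76) is a monotone function of the sample mean, so
# thresholding Lambda is equivalent to thresholding X-bar.
set.seed(42)
n <- 44
x <- rnorm(n, mean = 78, sd = 8)        # hypothetical sample

lambda_ratio <- exp(n * (mean(x) - 78) / 16)

# Direct evaluation of the two likelihoods gives the same value
loglik <- function(mu) sum(dnorm(x, mean = mu, sd = 8, log = TRUE))
exp(loglik(80) - loglik(76))            # matches lambda_ratio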


  1. Determining the values of n and c such that the type I error α ≈ 0.05 and the type II error β ≈ 0.05.

\[ \alpha = P\left(\bar{X} \leq c \mid \mu = 80\right). \]

Since

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(80, \frac{64}{n}\right), \]

\[ Z = \frac{\bar{X} - 80}{\sqrt{64/n}} = \frac{\bar{X} - 80}{8/\sqrt{n}} \sim N(0, 1). \]

The probability:

\[ P\left(\bar{X} \leq c \mid \mu = 80\right) = P\left(\frac{\bar{X} - 80}{8/\sqrt{n}} \leq \frac{c - 80}{8/\sqrt{n}}\right) = \Phi\left(\frac{c - 80}{8/\sqrt{n}}\right) \leq 0.05. \]

Solving for \(c:\)

\[ \frac{c - 80}{8/\sqrt{n}} \leq \Phi^{-1}(0.05) = -1.645 \implies c - 80 \leq -1.645 \cdot \frac{8}{\sqrt{n}} \implies c \leq 80 - \frac{13.16}{\sqrt{n}}. \]

\[ \beta = P\left(\bar{X} > c \mid \mu = 76\right) = 1 - P\left(\bar{X} \leq c \mid \mu = 76\right). \]

Under \(H_1\),

\[ \bar{X} \sim N\left(76, \frac{64}{n}\right). \]

Thus:

\[ P\left(\bar{X} \leq c \mid \mu = 76\right) = P\left(\frac{\bar{X} - 76}{8/\sqrt{n}} \leq \frac{c - 76}{8/\sqrt{n}}\right) = \Phi\left(\frac{c - 76}{8/\sqrt{n}}\right). \]

Substituting into \(\beta\):

\[ \beta = 1 - \Phi\left(\frac{c - 76}{8/\sqrt{n}}\right) \leq 0.05 \implies \Phi\left(\frac{c - 76}{8/\sqrt{n}}\right) \geq 0.95. \]

Solving for \(c\):

\[ \frac{c - 76}{8/\sqrt{n}} \geq \Phi^{-1}(0.95) = 1.645 \implies c - 76 \geq 1.645 \cdot \frac{8}{\sqrt{n}} \implies c \geq 76 + \frac{13.16}{\sqrt{n}}. \]

Combining the inequalities:

\[ 80 - \frac{13.16}{\sqrt{n}} \geq c \geq 76 + \frac{13.16}{\sqrt{n}}. \]

For equality (to minimize \(n\))

\[ 80 - \frac{13.16}{\sqrt{n}} = 76 + \frac{13.16}{\sqrt{n}}. \]

Simplify:

\[ 80 - 76 = \frac{13.16}{\sqrt{n}} + \frac{13.16}{\sqrt{n}} \implies 4 = \frac{26.32}{\sqrt{n}} \implies \sqrt{n} = \frac{26.32}{4} = 6.58 \implies n = (6.58)^2 \approx 43.3. \]

Round up:

\[ n = 44. \]

Then:

\[ c = 80 - \frac{13.16}{\sqrt{44}} = 80 - \frac{13.16}{6.633} \approx 80 - 1.984 \approx 78.016. \]

Verify:

\[ c = 76 + \frac{13.16}{\sqrt{44}} \approx 76 + 1.984 \approx 78.016. \]

Check \(\alpha\):

\[ \frac{c - 80}{8/\sqrt{44}} = \frac{78.016 - 80}{8/\sqrt{44}} = \frac{-1.984}{1.206} \approx -1.645 \implies \alpha = \Phi(-1.645) = 0.05. \]

Check \(\beta\):

\[ \frac{c - 76}{8/\sqrt{44}} = \frac{78.016 - 76}{1.206} \approx 1.645 \implies \beta = 1 - \Phi(1.645) = 0.05. \]

Thus:

\[ n = 44, \quad c \approx 78.016. \]

  1. R computation of part (b)
# Parameters
mu0 <- 80
mu1 <- 76
sigma <- 8
alpha <- 0.05
beta <- 0.05

# Critical z-values
z_alpha <- qnorm(alpha, lower.tail = TRUE)  # One-sided test
z_beta <- qnorm(1 - beta)                   # For power = 1 - beta

# Difference in means
delta <- abs(mu0 - mu1)

# Calculate required n
numerator <- sigma * (z_beta + abs(z_alpha))
n_required <- (numerator / delta)^2

# Round up to nearest integer
n_required_rounded <- ceiling(n_required)

# Compute critical value c
se <- sigma / sqrt(n_required_rounded)
critical_value <- mu0 + z_alpha * se

# Output results
cat("Required sample size n:", n_required_rounded, "\n")
Required sample size n: 44 
cat("Critical value c:", round(critical_value, 2), "\n")
Critical value c: 78.02 
# Create x-axis values for the plot
x_vals <- seq(70, 85, length.out = 1000)

# Densities under H0 and H1
df <- data.frame(
  x = x_vals,
  H0 = dnorm(x_vals, mean = mu0, sd = se),
  H1 = dnorm(x_vals, mean = mu1, sd = se)
)

# Plot
ggplot(df, aes(x = x)) +
  geom_line(aes(y = H0), color = "blue", linewidth = 1.2) +
  geom_line(aes(y = H1), color = "red", linewidth = 1.2) +
  
  # Type I error region
  geom_area(data = subset(df, x <= critical_value), aes(y = H0), fill = "blue", alpha = 0.3) +
  
  # Type II error region
  geom_area(data = subset(df, x > critical_value), aes(y = H1), fill = "red", alpha = 0.3) +
  
  geom_vline(xintercept = critical_value, linetype = "dashed", color = "black") +
  annotate("text", x = critical_value, y = 0.05, label = paste0("C-value ≈ ", round(critical_value, 3)), angle = 90, vjust = -0.5) +
  
  labs(title = "Sampling Distributions Under H0 and H1",
       x = expression(bar(X)),
       y = "Density") +
  theme_minimal() +
  theme(legend.position = "bottom", 
        legend.title = element_blank(), 
        legend.background = element_rect(fill = "white", color = "black"),  
        legend.margin = margin(6, 6, 6, 6)) 

Question 4

\[ \psi(t_1, t_2) = \ln M(t_1, t_2), \quad M(t_1, t_2) = E\left[\exp(t_1 X + t_2 Y)\right]. \]

First Derivatives

  1. \(\frac{\partial \psi}{\partial t_1}\):

\[ \frac{\partial \psi}{\partial t_1} = \frac{1}{M(t_1, t_2)} \cdot \frac{\partial M(t_1, t_2)}{\partial t_1}, \]

where

\[ \frac{\partial M(t_1, t_2)}{\partial t_1} = E\left[X \cdot \exp(t_1 X + t_2 Y)\right]. \]

At \((t_1, t_2) = (0, 0)\), \(M(0, 0) = E[1] = 1\), so:

\[ \frac{\partial \psi(0, 0)}{\partial t_1} = E[X]. \]

  1. \(\frac{\partial \psi}{\partial t_2}\):

\[ \frac{\partial \psi}{\partial t_2} = \frac{1}{M(t_1, t_2)} \cdot \frac{\partial M(t_1, t_2)}{\partial t_2}, \]

where

\[ \frac{\partial M(t_1, t_2)}{\partial t_2} = E\left[Y \cdot \exp(t_1 X + t_2 Y)\right]. \]

At \((t_1, t_2) = (0, 0)\):

\[ \frac{\partial \psi(0, 0)}{\partial t_2} = E[Y]. \]

Second Derivatives

  1. \(\frac{\partial^2 \psi}{\partial t_1^2}\):

\[ \frac{\partial^2 \psi}{\partial t_1^2} = \frac{\frac{\partial^2 M(t_1, t_2)}{\partial t_1^2} \cdot M(t_1, t_2) - \left(\frac{\partial M(t_1, t_2)}{\partial t_1}\right)^2}{M(t_1, t_2)^2}. \]

At \((t_1, t_2) = (0, 0)\), \(M(0, 0) = 1\), so the expression reduces to \(E[X^2] - (E[X])^2\):

\[ \frac{\partial^2 \psi(0, 0)}{\partial t_1^2} = \text{Var}(X). \]

  1. \(\frac{\partial^2 \psi}{\partial t_2^2}\):

\[ \frac{\partial^2 \psi}{\partial t_2^2} = \frac{\frac{\partial^2 M(t_1, t_2)}{\partial t_2^2} \cdot M(t_1, t_2) - \left(\frac{\partial M(t_1, t_2)}{\partial t_2}\right)^2}{M(t_1, t_2)^2}. \]

At \((t_1, t_2) = (0, 0)\):

\[ \frac{\partial^2 \psi(0, 0)}{\partial t_2^2} = \text{Var}(Y). \]

  1. Mixed Derivative \(\frac{\partial^2 \psi}{\partial t_1 \partial t_2}\):

\[ \frac{\partial^2 \psi}{\partial t_1 \partial t_2} = \frac{\frac{\partial^2 M(t_1, t_2)}{\partial t_1 \partial t_2} \cdot M(t_1, t_2) - \frac{\partial M(t_1, t_2)}{\partial t_1} \cdot \frac{\partial M(t_1, t_2)}{\partial t_2}}{M(t_1, t_2)^2}. \]

At \((t_1, t_2) = (0, 0)\):

\[ \frac{\partial^2 \psi(0, 0)}{\partial t_1 \partial t_2} = \text{Cov}(X, Y). \]
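As an optional numerical illustration (my own sketch, not part of the question), the cumulant generating function can be approximated by Monte Carlo and differentiated by finite differences at \((0, 0)\); the mixed partial should be close to \(\text{Cov}(X, Y)\):

# Approximate psi(t1, t2) = log E[exp(t1 X + t2 Y)] by Monte Carlo and
# take finite-difference derivatives at (0, 0).
set.seed(123)
N <- 1e6
x <- rnorm(N)
y <- 0.5 * x + rnorm(N)                  # construction gives Cov(X, Y) = 0.5

psi <- function(t1, t2) log(mean(exp(t1 * x + t2 * y)))

h <- 0.01
(psi(h, 0) - psi(-h, 0)) / (2 * h)       # approx E[X] = 0
(psi(0, h) - psi(0, -h)) / (2 * h)       # approx E[Y] = 0
(psi(h, h) - psi(h, -h) - psi(-h, h) + psi(-h, -h)) / (4 * h^2)  # approx Cov(X, Y)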

Question 5

Step 1: Write the Likelihood Function

\[ L(\lambda) = \frac{e^{-\lambda} \cdot \lambda^x}{x!}. \]

Step 2: Compute the Log-Likelihood

\[ \ln\left(L(\lambda)\right) = \ln\left(\frac{e^{-\lambda} \cdot \lambda^x}{x!}\right) = -\lambda + x \ln(\lambda) - \ln(x!). \]

Step 3: Find the First Derivative

Differentiate \(\ln(L(\lambda))\) with respect to \(\lambda\):

\[ \frac{d\ln(L)}{d\lambda} = \frac{d}{d\lambda} [ -\lambda + x \ln(\lambda) ] = -1 + x \cdot \frac{1}{\lambda} = -1 + \frac{x}{\lambda}. \]

Step 4: Set the Derivative to Zero

To find the maximum:

\[ -1 + \frac{x}{\lambda} = 0 \implies \frac{x}{\lambda} = 1 \implies \lambda = x. \]

So, the MLE of \(\lambda\), based on one observation \(X\), is:

\[ \hat{\lambda} = X. \]

Step 5: Verify It’s a Maximum

Take the second derivative to check if this is a maximum:

\[ \frac{d^2\ln(L)}{d\lambda^2} = \frac{d}{d\lambda} \left[ -1 + \frac{x}{\lambda} \right] = \frac{d}{d\lambda} \left[ \frac{x}{\lambda} \right] = x \cdot \left(-\frac{1}{\lambda^2}\right) = -\frac{x}{\lambda^2}. \]

Since \(x\) and \(\lambda\) are positive,

\[ -\frac{x}{\lambda^2} < 0, \]

which means the second derivative is negative, confirming that this is indeed a maximum.

The expected value of \(\hat{\lambda}\) is:

\[ E[\hat{\lambda}] = E[X]. \]

For a Poisson random variable:

\[ E[X] = \lambda = 50, \]

so:

\[ E[\hat{\lambda}] = 50. \]

This matches the true \(\lambda\), demonstrating that the estimator is unbiased.
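A quick simulation sketch (my own illustration, with \(\lambda = 50\) as in the question) shows the unbiasedness numerically:

set.seed(2024)
lambda_true <- 50
x <- rpois(1e5, lambda = lambda_true)  # 100,000 single-observation samples
lambda_hat <- x                        # the MLE for each sample is the observation itself
mean(lambda_hat)                       # should be close to 50, illustrating unbiasedness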

Question 6

Part A

  1. 10000 random sample simulation
set.seed(100)  # For reproducibility

# Parameters
mu <- 21
sigma <- sqrt(9)  # Standard deviation = sqrt(variance)
n <- 30
num_simulations <- 10000

# Simulate samples: each row is one sample of size n
samples <- matrix(rnorm(n * num_simulations, mean = mu, sd = sigma), 
                  nrow = num_simulations, ncol = n)
  1. Compute the sample mean, denoted \(\bar{X}\), and the biased estimator, denoted \(0.9\bar{X}\).
# Sample means (X̄)
X_bar <- rowMeans(samples)

# Biased estimator: μ̂_biased = 0.9 * X̄
biased_estimates <- 0.9 * X_bar
  1. Sampling distribution plots in R
# Create a dataframe for plotting
df <- data.frame(
  Estimate = c(X_bar, biased_estimates),
  Type = rep(c("Unbiased (X̄)", "Biased (0.9 X̄)"), each = num_simulations)
)

# Plot density curves
ggplot(df, aes(x = Estimate, fill = Type, color = Type)) +
  geom_density(alpha = 0.4, linewidth = 1) +
  geom_vline(xintercept = mu, linetype = "dashed", color = "black") +
  annotate("text", x = mu, y = 0.15, label = "True Mean = 21", vjust = -1.2, size = 4) +
  labs(title = "Sampling Distributions of Estimators",
       x = "Estimate Value", y = "Density") +
  theme_minimal() +
  theme(legend.position = "top")

  1. Calculating and reporting the bias and variance of each estimator
# True mean
mu <- 21

# Unbiased estimator
mean_unbiased <- mean(X_bar)
bias_unbiased <- mean_unbiased - mu
var_unbiased <- var(X_bar)

# Biased estimator
mean_biased <- mean(biased_estimates)
bias_biased <- mean_biased - mu
var_biased <- var(biased_estimates)

# Print results
cat("Unbiased Estimator (X̄):\n")
Unbiased Estimator (X̄):
cat("  Mean:", round(mean_unbiased, 3), "\n")
  Mean: 21.008 
cat("  Bias:", round(bias_unbiased, 3), "\n")
  Bias: 0.008 
cat("  Variance:", round(var_unbiased, 3), "\n\n")
  Variance: 0.302 
cat("Biased Estimator (0.9 X̄):\n")
Biased Estimator (0.9 X̄):
cat("  Mean:", round(mean_biased, 3), "\n")
  Mean: 18.907 
cat("  Bias:", round(bias_biased, 3), "\n")
  Bias: -2.093 
cat("  Variance:", round(var_biased, 3), "\n")
  Variance: 0.245 
  1. Short questions
  • The sample mean \(\bar{X}\) appears unbiased. Its average estimate across the 10,000 simulations is approximately equal to the true population mean (\(\mu = 21\)), and its computed bias is close to zero.

  • The biased estimator \(0.9\bar{X}\) has lower variance. This makes intuitive sense: multiplying by a constant less than 1 shrinks the variability of the estimator (by the square of that constant).

  • I’d prefer \(\bar{X}\). While the biased estimator has lower variance, its bias (about -2.1) means it systematically underestimates the true mean, which can lead to incorrect conclusions. \(\bar{X}\) is unbiased, so it is more reliable for estimating the true mean, even if its variance is slightly higher.

  • An estimator is unbiased if, on average, it equals the true value it’s estimating. For example, if we’re estimating a mean of 21, an unbiased estimator averages to 21 over many samples—it doesn’t systematically over- or under-estimate.

  • An estimator is efficient if it has the smallest possible variance among unbiased estimators. It means the estimates are tightly clustered around the true value, with less variability than other unbiased estimators. Efficiency is about precision: an efficient estimator gives more consistent estimates with less spread.

Part B

set.seed(100)  # For reproducibility

# Parameters
mu1 <- 21
sigma1 <- sqrt(9)  # Standard deviation = sqrt(variance)
n1 <- 100
num_simulations1 <- 10000

# Simulate samples: each row is one sample of size n
samples1 <- matrix(rnorm(n1 * num_simulations1, mean = mu1, sd = sigma1), 
                   nrow = num_simulations1, ncol = n1)
X_bar1 <- rowMeans(samples1)

# Biased estimator: μ̂_biased = 0.9 * X̄
biased_estimates1 <- 0.9 * X_bar1
# Create a dataframe for plotting
df1 <- data.frame(
  Estimate1 = c(X_bar1, biased_estimates1),
  Type = rep(c("Unbiased (X̄)", "Biased (0.9 X̄)"), each = num_simulations1)
)

# Plot density curves
ggplot(df1, aes(x = Estimate1, fill = Type, color = Type)) +
  geom_density(alpha = 0.4, linewidth = 1) +
  geom_vline(xintercept = mu1, linetype = "dashed", color = "black") +
  annotate("text", x = mu1, y = 0.15, label = "True Mean = 21", vjust = -1.2, size = 4) +
  labs(title = "Sampling Distributions of Estimators with n= 100",
       x = "Estimate Value", y = "Density") +
  theme_minimal() +
  theme(legend.position = "top")

mu1 <- 21

# Unbiased estimator
mean_unbiased <- mean(X_bar1)
bias_unbiased <- mean_unbiased - mu1
var_unbiased <- var(X_bar1)

# Biased estimator
mean_biased <- mean(biased_estimates1)
bias_biased <- mean_biased - mu1
var_biased <- var(biased_estimates1)

# Print results
cat("Unbiased Estimator (X̄):\n")
Unbiased Estimator (X̄):
cat("  Mean:", round(mean_unbiased, 3), "\n")
  Mean: 21.004 
cat("  Bias:", round(bias_unbiased, 3), "\n")
  Bias: 0.004 
cat("  Variance:", round(var_unbiased, 3), "\n\n")
  Variance: 0.09 
cat("Biased Estimator (0.9 X̄):\n")
Biased Estimator (0.9 X̄):
cat("  Mean:", round(mean_biased, 3), "\n")
  Mean: 18.904 
cat("  Bias:", round(bias_biased, 3), "\n")
  Bias: -2.096 
cat("  Variance:", round(var_biased, 3), "\n")
  Variance: 0.073 
  • The sampling distributions of the two estimators, \(\bar{X}\) (unbiased) and \(0.9\bar{X}\) (biased), were analyzed for sample sizes n = 30 and n = 100. Both distributions are approximately normal, with \(\bar{X}\) centered at the true mean of 21 and the biased estimator centered near 18.9, reflecting its bias of roughly -2.1. The spread of the biased estimator is consistently narrower than that of \(\bar{X}\) due to its lower variance (0.245 vs. 0.302 for n = 30, and 0.073 vs. 0.090 for n = 100), as seen in the graphs, where the curve for \(0.9\bar{X}\) is tighter than the curve for \(\bar{X}\).

  • As the sample size increases from 30 to 100, the variability of both estimators decreases, with variances dropping roughly in proportion to \(1/n\) (e.g., Var(\(\bar{X}\)) falls from 0.302 to 0.090), making the distributions visibly narrower in the n = 100 graph.

  • A good estimator benefits from low variance because it gives more consistent and reliable estimates; a small bias can sometimes be an acceptable trade-off for a large reduction in variance. In this case, however, the biased estimator’s large bias makes \(\bar{X}\) preferable despite its higher variance.

Question 7

Part A

Independent Random Variables

Since \(G\) and \(H\) are independent,

\[ E(GH) = E(G)\,E(H). \]

The unbiased estimators for E(G) and E(H) are their respective sample means:

\[ \hat{E}(G) = \frac{1}{m} \sum_{i=1}^m G_i, \quad \hat{E}(H) = \frac{1}{n} \sum_{j=1}^n H_j. \]

Thus, the Uniformly Minimum Variance Unbiased Estimator (UMVUE) for \(E(GH)\) is given by:

\[ \hat{E}(GH) = \hat{E}(G) \cdot \hat{E}(H) = \left(\frac{1}{m} \sum_{i=1}^m G_i \right) \cdot \left( \frac{1}{n} \sum_{j=1}^n H_j \right). \]

Finding the UMVUE of \(\text{Var}(G + H)\): using the property of variances for independent variables, we know:

\[ \text{Var}(G + H) = \text{Var}(G) + \text{Var}(H). \]

The unbiased estimators for \(\text{Var}(G)\) and \(\text{Var}(H)\) are derived from their sample variances:

\[ \hat{\text{Var}}(G) = \frac{1}{m-1} \sum_{i=1}^m \left( G_i - \bar{G} \right)^2, \quad \text{where} \; \bar{G} = \frac{1}{m} \sum_{i=1}^m G_i, \]

\[ \hat{\text{Var}}(H) = \frac{1}{n-1} \sum_{j=1}^n \left( H_j - \bar{H} \right)^2, \quad \text{where} \; \bar{H} = \frac{1}{n} \sum_{j=1}^n H_j. \]

Thus, the UMVUE for \(\text{Var}(G + H)\) is:

\[ \hat{\text{Var}}(G + H) = \hat{\text{Var}}(G) + \hat{\text{Var}}(H). \]
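A simulation sketch under assumed distributions (the choices \(G \sim N(2, 4)\) and \(H \sim N(3, 9)\) are mine, purely for illustration) can be used to check that both estimators are unbiased:

set.seed(7)
m <- 25; n <- 40
reps <- 20000

# Product of sample means, estimating E(G)E(H)
prod_est <- replicate(reps, mean(rnorm(m, 2, 2)) * mean(rnorm(n, 3, 3)))
mean(prod_est)   # should be close to E(G)E(H) = 2 * 3 = 6

# Sum of sample variances, estimating Var(G) + Var(H)
var_est <- replicate(reps, var(rnorm(m, 2, 2)) + var(rnorm(n, 3, 3)))
mean(var_est)    # should be close to Var(G) + Var(H) = 4 + 9 = 13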

Part B

Proof that \(E[(X - \mu)^2] = \sigma^2\)

Definition of Variance:

The variance of a random variable X is defined as:

\[ \text{Var}(X) = E[(X - E[X])^2]. \]

Substitute the expectation:

For a normal random variable \(X \sim N(\mu, \sigma^2)\), the mean is \(E[X] = \mu\). Substituting this:

\[ \text{Var}(X) = E[(X - \mu)^2]. \]

Variance Property:

For a normal distribution, the variance is given as:

\[ \text{Var}(X) = \sigma^2. \]

Conclusion:

From the definition of variance and its property for normal distributions, we can conclude:

\[ E[(X - \mu)^2] = \sigma^2. \]
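A quick Monte Carlo check (the values \(\mu = 5\) and \(\sigma^2 = 4\) are my own illustrative choices) confirms the identity numerically:

set.seed(11)
x <- rnorm(1e6, mean = 5, sd = 2)
mean((x - 5)^2)   # should be close to sigma^2 = 4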