Introduction to Statistical Theory for Health Researchers

Author
Affiliation

Bongani Ncube(3002164)

University Of the Witwatersrand (School of Public Health)

Published

May 5, 2025

Keywords

Conjugate Priors, Bayesian Statistics, Maximum likelihood, UMVUE, Powerful test

Question 1: Poisson Distribution with Chi-Square Prior

a) What is the posterior distribution of λ, after observing a single value of x?

Step 1: Likelihood Function

If \(X \sim \text{Poisson}(\lambda)\), then the likelihood of observing \(x\) is:

\[ P(X = x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \]

Step 2: Prior Distribution

Given \(\lambda \sim \chi^2_4\), we can express this as a Gamma distribution with shape \(\alpha = 2\) and scale \(\theta = 2\) (equivalently, rate \(\beta = 1/2\)):

\[ \lambda \sim \text{Gamma}(\alpha = 2, \theta = 2) \]

Recall that a Chi-square distribution with \(k\) degrees of freedom is equivalent to:

\[ \chi^2_k \equiv \text{Gamma}\left(\frac{k}{2}, 2\right) \]

So the prior density is:

\[ \pi(\lambda) = \frac{1}{\Gamma(2) \cdot 2^2} \lambda^{2-1} e^{-\lambda/2} = \frac{1}{4} \lambda e^{-\lambda/2} \] since \(\Gamma(2)=1\)

Step 3: Posterior Distribution

Using Bayes’ theorem, the unnormalized posterior is:

\[ \pi(\lambda \mid x) \propto P(x \mid \lambda) \cdot \pi(\lambda) = \left( \frac{\lambda^x e^{-\lambda}}{x!} \right) \cdot \left( \frac{1}{4} \lambda e^{-\lambda/2} \right) \]

Simplifying:

\[\begin{aligned} \pi(\lambda \mid x) &\propto\lambda^{x + 1} e^{- \frac{3}{2} \lambda}\\ &\propto\lambda^{x + 2-1} e^{- \frac{3}{2} \lambda}\\ \end{aligned} \]

This is the kernel of a Gamma distribution:

\[ \lambda \mid x \sim \text{Gamma}(x + 2, \text{rate} = \frac{3}{2}) \]

Final Answer

The posterior distribution of \(\lambda\), after observing \(x\), is:

\[ \boxed{\lambda \mid x \sim \text{Gamma}(x + 2, \text{rate} = \frac{3}{2})} \]

where \[\alpha^* = x+2 , \quad \beta^*=\frac{3}{2}\] (for the observed value \(x = 4\) used in part (b), \(\alpha^* = 6\)).
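As a quick numerical sketch (not required by the question), the analytic Gamma\((x+2, \text{rate}=3/2)\) posterior can be compared against a grid approximation obtained by normalizing likelihood times prior; the value x = 4 below is the observation used in part (b).

# Sketch: grid check of the conjugate result for x = 4
x <- 4
lambda_grid <- seq(0.001, 30, by = 0.001)
unnorm <- dpois(x, lambda_grid) * dchisq(lambda_grid, df = 4)  # likelihood x prior
grid_post <- unnorm / (sum(unnorm) * 0.001)                    # normalize on the grid
analytic <- dgamma(lambda_grid, shape = x + 2, rate = 3/2)
max(abs(grid_post - analytic))  # small: the grid and analytic posteriors agree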

b) Given an observed value x = 4, compute the posterior mean, mode, and 95% credible interval, both manually and using any software of your choice.

\[\text{mean} = E(\lambda \mid x)=\frac{\alpha^*}{\beta^*}=\frac{4+2}{1.5}=4\] \[\text{mode} = \frac{\alpha^* -1}{\beta^*}=\frac{6-1}{1.5}=3.333\]

95% credible interval

We want to find the values \((a,b)\) such that \(P^{*}(a\leq \lambda \leq b) =0.95\). For the equal-tailed interval, the endpoints satisfy:

\[P^{*}(\lambda \leq a)=0.025 , \quad P^{*}(\lambda \leq b)=0.975\] These quantiles are computed in the R code below, where a = ci_lower and b = ci_upper.

Solution in R

Posterior Distribution of λ

  1. Prior: \(\lambda \sim \chi^2(4)\) which is \(Gamma(shape=2, rate=0.5)\)
  2. Likelihood: \(X|\lambda \sim Poisson(\lambda)\)
  3. Posterior: \(Gamma(shape = 2 + x, rate = 0.5 + 1)\)
# (b) Calculations for observed x=4
x <- 4
post_shape <- 2 + x
post_rate <- 0.5 + 1  # = 1.5

# Posterior mean
post_mean <- post_shape / post_rate

# Posterior mode
post_mode <- (post_shape - 1) / post_rate

# 95% Credible Interval
ci_lower <- qgamma(0.025, shape=post_shape, rate=post_rate)
ci_upper <- qgamma(0.975, shape=post_shape, rate=post_rate)

# Print results
cat("(a) Posterior distribution: Gamma(shape =", post_shape, ", rate =", post_rate, ")\n")
(a) Posterior distribution: Gamma(shape = 6 , rate = 1.5 )
cat("(b) For x = 4:\n")
(b) For x = 4:
cat("  Posterior mean:", post_mean, "\n")
  Posterior mean: 4 
cat("  Posterior mode:", post_mode, "\n")
  Posterior mode: 3.333333 
cat("  95% Credible Interval: (", ci_lower, ",", ci_upper, ")\n")
  95% Credible Interval: ( 1.46793 , 7.778888 )
# (c) Visualization of prior vs posterior
library(ggplot2)

# Create sequence of λ values
lambda_vals <- seq(0, 15, length.out=500)

# Prior density (Chi-square with 4 df = Gamma(2, 0.5))
prior_dens <- dgamma(lambda_vals, shape=2, rate=0.5)

# Posterior density
post_dens <- dgamma(lambda_vals, shape=post_shape, rate=post_rate)

# Create data frame for plotting
df <- data.frame(
  lambda = rep(lambda_vals, 2),
  density = c(prior_dens, post_dens),
  distribution = rep(c("Prior (χ²4)", "Posterior"), each=length(lambda_vals))
)

# Plot
ggplot(df, aes(x=lambda, y=density, color=distribution)) +
  geom_line(linewidth=1.2) +
  geom_vline(xintercept=post_mean, linetype="dashed", color="red") +
  geom_vline(xintercept=post_mode, linetype="dotted", color="blue") +
  annotate("rect", xmin=ci_lower, xmax=ci_upper, ymin=0, ymax=0.2, 
           alpha=0.2, fill="green") +
  labs(title="Prior and Posterior Distributions of λ",
       subtitle="For observed x=4 with Poisson likelihood and χ²₄ prior",
       x="λ", y="Density") +
  theme_minimal() +
  scale_color_manual(values=c("Prior (χ²4)"="black", "Posterior"="red")) +
  theme(legend.position="bottom")

Interpretation

95% Credible Interval (Bayesian):

  • Given the observed data and the prior, there is a 95% probability that the true value of \(\lambda\) lies within this interval.

95% Confidence Interval (Frequentist):

  • If we were to repeat the experiment many times, 95% of the intervals constructed this way would contain the true \(\lambda\). It does not assign a probability to \(\lambda\) itself — \(\lambda\) is fixed, and the interval is random.

Question 2: Mean Heights of Males in a Population

Derive the Normal-Normal Conjugate posterior distribution

If the data come from a normal distribution with known variance \(\sigma^2\) but unknown mean \(\mu\), and the prior on \(\mu\) is a normal distribution with self-elicited mean \(\mu_0\) and self-elicited variance \(\sigma_0^2\), then the posterior density for the mean, after seeing a sample of size \(n\) with sample mean \(\bar{y}\), is also normal. In mathematical notation, we have:

\[\begin{aligned} y|\mu &\sim N(\mu,\sigma^2) \\ \mu &\sim N(\mu_0, \sigma_0^2) \end{aligned}\]

the likelihood expression is given by:

\[ \begin{aligned} P(y|\mu) &=L(y|\mu)\\ &= \prod^n_{i=1}\frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-(y_i-\mu)^2}{2\sigma^2}}\\ &=\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^ne^{\frac{-\sum(y_i-\mu)^2}{2\sigma^2}} \end{aligned} \]

Deriving the normal-normal posterior

\[ \begin{aligned} \pi^*(\mu|y) &\propto P(y|\mu)\pi(\mu) \\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^ne^{\frac{-\sum(y_i-\mu)^2}{2\sigma^2}}\cdot\left(\frac{1}{\sigma_0\sqrt{2\pi}}\right)e^{\frac{-(\mu-\mu_0)^2}{2\sigma_0^2}}\\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n\left(\frac{1}{\sigma_0\sqrt{2\pi}}\right)e^{\frac{-\sum(y_i-\mu)^2}{2\sigma^2}+\frac{-(\mu-\mu_0)^2}{2\sigma_0^2}}\\ &= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n\left(\frac{1}{\sigma_0\sqrt{2\pi}}\right)e^{\frac{-\sum(y_i^2-2y_i\mu+\mu^2)}{2\sigma^2}+\frac{-(\mu^2-2\mu\mu_0+\mu_0^2)}{2\sigma_0^2}}\\ &\propto e^{\frac{-\sum(y_i^2-2y_i\mu+\mu^2)}{2\sigma^2}+\frac{-(\mu^2-2\mu\mu_0+\mu_0^2)}{2\sigma_0^2}} \end{aligned} \]

  • Looking at the term in the exponent, we see that it can be simplified by putting both pieces over the common denominator \(2\sigma^2\sigma_0^2\):
\[\begin{aligned} \frac{-\sum(y_i^2-2y_i\mu+\mu^2)}{2\sigma^2}+\frac{-(\mu^2-2 \mu\mu_0+\mu_0^2)}{2\sigma_0^2}&=\frac{-\sigma_0^2\sum y_i^2+2\mu\sigma_0^2\sum y_i-n\mu^2\sigma^2_0-\sigma^2 \mu^2+2\mu_0\sigma^2 \mu-\mu_0^2\sigma^2}{2\sigma^2\sigma_0^2} \end{aligned}\]

Dropping \(-\sigma_0^2\sum y_i^2\) and \(-\mu_0^2\sigma^2\), which do not involve \(\mu\) and only contribute a multiplicative constant to the posterior, and writing \(\sum y_i = n\bar{y}\), the \(\mu\)-dependent part is:

\[\begin{aligned} &\phantom{=}\ \frac{-n\mu^2\sigma_0^2-\sigma^2\mu^2+2\mu\sigma_0^2 n\bar{y}+2\mu_0\sigma^2 \mu}{2\sigma^2\sigma_0^2}\\ &=\frac{-\mu^2(n\sigma_0^2+\sigma^2)+2\mu(\sigma_0^2 n\bar{y}+\mu_0\sigma^2)}{2\sigma^2\sigma_0^2}\\ &=\frac{-\mu^2(n\sigma_0^2+\sigma^2)}{2\sigma^2\sigma_0^2}+\frac{2\mu(\sigma_0^2 n\bar{y}+\mu_0\sigma^2)}{2\sigma^2\sigma_0^2}\\ &=-\frac{\mu^2}{2}\left(\frac{n}{\sigma^2}+\frac{1}{\sigma^2_0}\right)+\mu\left(\frac{n\bar{y}}{\sigma^2}+\frac{\mu_0}{\sigma^2_0}\right)\\ &=-\frac{a\mu^2}{2}+b\mu, \qquad\text{where } a=\frac{n}{\sigma^2}+\frac{1}{\sigma^2_0},\quad b=\frac{n\bar{y}}{\sigma^2}+\frac{\mu_0}{\sigma^2_0}\\ &=\frac{-a}{2}\left[\mu^2-\frac{2b}{a}\mu\right]\\ &=\frac{-a}{2}\left[\mu^2-\frac{2b}{a}\mu+\frac{b^2}{a^2}-\frac{b^2}{a^2}\right]\\ &=\frac{-a}{2}\left(\mu-\frac{b}{a}\right)^2+\frac{b^2}{2a}\\ &\propto \frac{-a}{2}\left(\mu-\frac{b}{a}\right)^2\\ &= \frac{-1}{2}\,\frac{\left(\mu-\frac{b}{a}\right)^2}{\frac{1}{a}} \end{aligned}\]

The last expression is the kernel of a normal distribution with

\[\mu^* =\frac{b}{a}=\frac{\frac{n\bar{y}}{\sigma^2}+\frac{\mu_0}{\sigma^2_0}}{\frac{n}{\sigma^2}+\frac{1}{\sigma^2_0}}\]

\[\sigma^{2*} = \frac{1}{a}=\frac{1}{\frac{n}{\sigma^2}+\frac{1}{\sigma^2_0}}\]
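As a quick numerical sketch (with illustrative values of \(\mu_0\), \(\sigma_0\), \(\sigma\), and \(n\) that are not from the question), the quantities \(a\) and \(b\) above give the posterior mean \(b/a\) and variance \(1/a\), which can be checked against a grid approximation of the posterior:

# Sketch: numerical check of the normal-normal update (illustrative values assumed)
set.seed(1)
mu0 <- 0; sigma0 <- 2          # assumed prior mean and sd
sigma <- 3; n <- 25            # assumed known sd and sample size
y <- rnorm(n, mean = 1, sd = sigma)
ybar <- mean(y)

a <- n / sigma^2 + 1 / sigma0^2
b <- n * ybar / sigma^2 + mu0 / sigma0^2
c(post_mean = b / a, post_var = 1 / a)

# Grid approximation of the posterior for comparison
mu_grid <- seq(-5, 5, by = 0.001)
unnorm  <- sapply(mu_grid, function(m) prod(dnorm(y, m, sigma))) * dnorm(mu_grid, mu0, sigma0)
c(grid_mean = sum(mu_grid * unnorm) / sum(unnorm))  # agrees with b / a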

Hence for the normal-normal conjugate families, assume the prior on the unknown mean follows a normal distribution, i.e. \(\mu \sim N(\mu_0, \sigma_0^2)\). We also assume that the data \(y_1,y_2,\cdots,y_n\) are independent and come from a normal with variance \(\sigma^2\).

Then the posterior distribution of \(\mu\) is also normal, with mean as a weighted average of the prior mean and the sample mean. We have

\[\mu|y_1,y_2,\cdots,y_n \sim N(\mu^*, \sigma^{2*}),\]

where

\[\mu^* = \frac{\mu_0\sigma^2 + n\bar{y}\sigma^2_0}{\sigma^2 + n\sigma_0^2} \text{ and } \sigma^{2*} = \frac{\sigma^2\sigma_0^2}{\sigma^2 + n\sigma_0^2}.\]

b) Given parameters:

\[\begin{align*} n &= 80,\quad \bar{y} = 182\,\text{cm} \\ \sigma &= 15\,\text{cm},\quad \mu_0 = 187\,\text{cm},\quad \sigma_0 = 10\,\text{cm} \end{align*}\]

Posterior calculations: \[\begin{align*} \mu_n &= \frac{10^2 \times 80 \times 182 + 15^2 \times 187}{80 \times 10^2 + 15^2} \\ &= \frac{1,456,000 + 42,075}{8,000 + 225} \\ &\approx 182.14\,\text{cm} \\[10pt] \sigma_n^2 &= \left(\frac{80}{15^2} + \frac{1}{10^2}\right)^{-1} \\ &= \left(\frac{80}{225} + \frac{1}{100}\right)^{-1} \\ &\approx (0.3556 + 0.01)^{-1} \approx 2.74\,\text{cm}^2 \\ \sigma_n &\approx 1.65\,\text{cm} \end{align*}\]

Posterior probability that the mean height exceeds 180 cm:

\[ P(\mu > 180 \mid y) = 1 - \Phi\left(\frac{180 - \mu_n}{\sigma_n}\right) \approx 1 - \Phi\left(\frac{-2.14}{1.65}\right) \approx 0.9018 \]

c) Use any software of your choice to solve part (b) and include appropriate graphs

# Given values
mu0 <- 187
sigma0 <- 10
ybar <- 182
sigma <- 15
n <- 80

# Posterior parameters
(sigma_n2 <- 1 / (1 / sigma0^2 + n / sigma^2))
[1] 2.735562
(mu_n <- (sigma0^2 * ybar + (sigma^2 / n) * mu0) / (sigma0^2 + sigma^2 / n))
[1] 182.1368
sigma_n <- sqrt(sigma_n2)

# Posterior probability that the mean height exceeds 180 cm
p_gt_180 <- 1 - pnorm(180, mean = mu_n, sd = sigma_n)
p_gt_180
[1] 0.9018078
# Plot posterior
library(ggplot2)
x <- seq(mu_n - 4*sigma_n, mu_n + 4*sigma_n, length.out = 1000)
df <- data.frame(x = x, density = dnorm(x, mean = mu_n, sd = sigma_n))

ggplot(df, aes(x = x, y = density)) +
  geom_line(color = "blue") +
  geom_vline(xintercept = 180, color = "red", linetype = "dashed") +
  geom_area(data = subset(df, x > 180), aes(y = density), fill = "blue", alpha = 0.3) +
  labs(title = "Posterior Distribution of μ", x = "Height (cm)", y = "Density") +
  theme_minimal()

Question 3: Hypothesis Testing with Normal Distribution

Let \(X_1, X_2, \ldots, X_n\) be a random sample from the normal distribution \(\mathcal{N}(\mu, 64)\). We want to test the simple null hypothesis:

\[ H_0: \mu = 80 \quad \text{vs} \quad H_1: \mu = 76 \]

using the likelihood ratio:

\[ \Lambda(x) = \frac{L(80)}{L(76)} \]

(a) Derivation of the Likelihood Ratio

Since the variance is known (\(\sigma^2 = 64\)), the likelihood function under a specific value of \(\mu\) is:

\[\begin{align} L(X;\mu) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \cdot 64}} \exp\left( -\frac{(x_i - \mu)^2}{2 \cdot 64} \right)\\ &= \left( \frac{1}{\sqrt{2\pi \cdot 64}} \right)^n \exp\left( -\frac{1}{128} \sum_{i=1}^n (x_i - \mu)^2 \right) \end{align} \]

The likelihood ratio is:

\[ \begin{aligned} \Lambda(x) &= \frac{L(X; \mu_0)}{L(X; \mu_1)} = \frac{L(80)}{L(76)} \\ &= \exp\left(-\frac{1}{2\sigma^2} \left(\sum_{i=1}^n (X_i - \mu_0)^2 - \sum_{i=1}^n (X_i - \mu_1)^2\right)\right)\\ &= \exp\left( \frac{1}{128} \left[ \sum_{i=1}^{n} (X_i - 76)^2 - \sum_{i=1}^{n} (X_i - 80)^2 \right] \right) \end{aligned} \]

Also note that:

\[\Lambda(x) = \exp\left(\frac{1}{2\sigma^2} \left(-2(\mu_1 - \mu_0)\sum_{i=1}^n X_i + n(\mu_1^2 - \mu_0^2)\right)\right)\]

Now simplify the difference inside the exponent:

\[ (x_i - 76)^2 - (x_i - 80)^2 = 8x_i - 624 \]

So the sum becomes:

\[\begin{aligned} \sum_{i=1}^n \left[ (x_i - 76)^2 - (x_i - 80)^2\right]&=8\sum_{i=1}^n X_i - 624n\\ &=8n\,\frac{\sum_{i=1}^n X_i}{n} - 624n\\ &= 8n \bar{X} - 624n \end{aligned} \]

Thus, the likelihood ratio simplifies to:

\[\begin{aligned} \Lambda(x) &= \exp(\frac{8n \bar{X} - 624n}{128})\\ &=\exp\left( \frac{n}{16} (\bar{X} - 78) \right) \end{aligned} \]
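As an illustrative check (a sketch using simulated data, not data from the question), the simplified form can be compared against the likelihood ratio computed directly from normal densities:

# Sketch: verify the simplified likelihood ratio on simulated data
set.seed(42)
n <- 25
x <- rnorm(n, mean = 78, sd = 8)          # sigma^2 = 64
lr_direct     <- prod(dnorm(x, 80, 8)) / prod(dnorm(x, 76, 8))
lr_simplified <- exp(n / 16 * (mean(x) - 78))
c(direct = lr_direct, simplified = lr_simplified)   # the two agree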

(b) Likelihood Ratio Condition

We want to find the values in the sample space for which \(\Lambda(x) \leq k\), for some constant \(k > 0\). That is,

\[ \exp\left( \frac{n}{16} (\bar{X} - 78) \right) \leq k \]

Taking natural logarithms on both sides:

\[ \frac{n}{16} (\bar{X} - 78) \leq \ln k \quad \Rightarrow \quad \bar{X} \leq \frac{16}{n} \ln k + 78 \]

Conclusion

The critical region for the test is:

\[ \bar{X} \leq \frac{16}{n} \ln k + 78 \]

where \(k\) is chosen to satisfy the desired significance level \(\alpha\).

Let \(c = 16\ln k + 78n\); multiplying the inequality above through by \(n\) gives the critical region in terms of the sample total: \[\boxed{\sum_{i=1}^n X_i \leq c}\]

  1. Determine n and c for \(\alpha \approx 0.05\), \(\beta \approx 0.05\)

Under \(H_0\): \(\sum X_i \sim N(80n,\ 64n)\). Under \(H_1\): \(\sum X_i \sim N(76n,\ 64n)\).

For \(\alpha = 0.05\): \[P\left(\sum X_i \leq c\mid\mu=80\right) = 0.05\] \[\frac{c - 80n}{8\sqrt{n}} = -1.645\]

For \(\beta = 0.05\): \[P\left(\sum X_i > c\mid\mu=76\right) = 0.05\] \[\frac{c - 76n}{8\sqrt{n}} = 1.645\]

Solving the system: \[c = 80n - 1.645\times8\sqrt{n}\] \[c = 76n + 1.645\times8\sqrt{n}\]

Equating both expressions: \[80n - 13.16\sqrt{n} = 76n + 13.16\sqrt{n}\] \[4n = 26.32\sqrt{n}\] \[\sqrt{n} = \frac{26.32}{4} = 6.58\] \[n = (6.58)^2 \approx 43.3\]

Rounding up to \(n = 44\):

\[c = 76(44) + 1.645\times 8\times\sqrt{44}\approx 3431.29\]
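As a quick sketch, the attained error rates for the rounded solution can be checked in R; because \(n\) is rounded up from 43.3, the attained \(\alpha\) comes out slightly below 0.05:

# Sketch: check the attained error rates for the rounded solution n = 44
n <- 44
c_val <- 76 * n + 1.645 * 8 * sqrt(n)                           # ~ 3431.29
alpha <- pnorm(c_val, mean = 80 * n, sd = 8 * sqrt(n))          # P(reject | mu = 80)
beta  <- 1 - pnorm(c_val, mean = 76 * n, sd = 8 * sqrt(n))      # P(accept H0 | mu = 76)
c(n = n, c = c_val, alpha = alpha, beta = beta)                 # both close to 0.05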

Question 4: Moment Generating Function

Let \[M(t_1, t_2) = \mathbb{E}[e^{t_1 X + t_2 Y}] \tag{1}\] be the joint moment generating function (MGF) of two random variables \(X\) and \(Y\). If \(X\) and \(Y\) are independent, we can show that:

\[\begin{aligned} M(t_1, t_2) &= \mathbb{E}[e^{t_1 X + t_2 Y}]\\ &= \mathbb{E}[e^{t_1 X}e^{t_2 Y}]\\ &= \mathbb{E}[e^{t_1 X}]\,\mathbb{E}[e^{t_2 Y}] \quad \text{(by independence)} \\ &= M(t_1,0)\,M(0,t_2) \end{aligned}\]

we also know that \[M(t_1,0) =\int^{\infty}_{-\infty}e^{t_1x}f(x)dx\]

\[M(0,t_2) =\int^{\infty}_{-\infty}e^{t_2y}f(y)dy\]

Now

\[\begin{aligned} M(t_1, t_2) &= M(t_1,0)\,M(0,t_2)\\ &=\int^{\infty}_{-\infty}e^{t_1x}f(x)dx \int^{\infty}_{-\infty}e^{t_2y}f(y)dy\\ &= \int^{\infty}_{-\infty}\int^{\infty}_{-\infty}e^{t_1x+t_2y}f(x)f(y)dxdy\\ &= \int^{\infty}_{-\infty}\int^{\infty}_{-\infty}e^{t_1x+t_2y}f(x,y)dxdy \end{aligned}\]

where the last equality uses \(f(x)f(y) = f(x,y)\), which holds because \(X\) and \(Y\) are independent.

Define the cumulant generating function:

\[ \psi(t_1, t_2) = \log M(t_1, t_2) \]

We will show that the partial derivatives of \(\psi\) at \((t_1, t_2) = (0, 0)\) yield the mean, variance, and covariance of \(X\) and \(Y\).

First Derivatives (Means)

\[\begin{aligned} M(t_1, t_2) &= \int^{\infty}_{-\infty}\int^{\infty}_{-\infty}e^{t_1x+t_2y}f(x,y)\,dx\,dy\\ \text{then}\quad \frac{\partial}{\partial t_1}M(t_1, t_2) &= \int^{\infty}_{-\infty}\int^{\infty}_{-\infty}\frac{\partial}{\partial t_1} e^{t_1x+t_2y}f(x,y)\,dx\,dy\\ &=\int^{\infty}_{-\infty}\int^{\infty}_{-\infty}x\, e^{t_1x+t_2y}f(x,y)\,dx\,dy\\ &= \mathbb{E}[X e^{t_1 X + t_2 Y}] \end{aligned}\]

By inspection, repeating the differentiation gives:

\[\frac{\partial^r}{\partial t_1^r}M(t_1, t_2) = \mathbb{E}[X^r e^{t_1 X + t_2 Y}] \tag{2}\]

\[\frac{\partial^r}{\partial t_2^r}M(t_1, t_2) = \mathbb{E}[Y^r e^{t_1 X + t_2 Y}] \tag{3}\]

\[\frac{\partial^2}{\partial t_1\partial t_2}M(t_1, t_2) = \mathbb{E}[XY e^{t_1 X + t_2 Y}] \tag{4}\] hence

\[ \frac{\partial}{\partial t_1} \psi(t_1, t_2) = \frac{1}{M(t_1, t_2)} \cdot \frac{\partial }{\partial t_1}M(t_1, t_2) = \frac{\mathbb{E}[X e^{t_1 X + t_2 Y}]}{\mathbb{E}[e^{t_1 X + t_2 Y}]} \] \[ \frac{\partial}{\partial t_1} \psi(0, 0) = \frac{\mathbb{E}[X e^{0 (X) + 0(Y)}]}{\mathbb{E}[e^{0(X) + 0(Y)}]}=\frac{\mathbb{E} [X]}{1} =\mathbb{E} [X] \] In the same manner:

\[ \frac{\partial}{\partial t_2} \psi(t_1, t_2) = \frac{1}{M(t_1, t_2)} \cdot \frac{\partial }{\partial t_2}M(t_1, t_2) = \frac{\mathbb{E}[Y e^{t_1 X + t_2 Y}]}{\mathbb{E}[e^{t_1 X + t_2 Y}]} \]

\[ \frac{\partial}{\partial t_2} \psi(0, 0) = \frac{\mathbb{E}[Y e^{0 (X) + 0(Y)}]}{\mathbb{E}[e^{0(X) + 0(Y)}]}=\frac{\mathbb{E} [Y]}{1} =\mathbb{E} [Y] \]

So, the first derivatives at \((0, 0)\) are:

\[ \left. \frac{\partial \psi}{\partial t_1} \right|_{(0, 0)} = \mathbb{E}[X], \quad \left. \frac{\partial \psi}{\partial t_2} \right|_{(0, 0)} = \mathbb{E}[Y] \]

Second Derivatives (Variances)

Using Quotient Rule we can find the second derivatives as below

\[ \frac{\partial^2}{\partial t_1^2}\psi(t_1, t_2) = \frac{M(t_1,t_2) \cdot \frac{\partial^2 M(t_1,t_2)}{\partial t_1^2} - \left( \frac{\partial M(t_1,t_2)}{\partial t_1} \right)^2}{M(t_1,t_2)^2} \]

At \((0, 0)\), this simplifies to:

\[ \frac{\partial^2}{\partial t_1^2}\psi(0, 0) = \frac{M(0,0) \cdot \frac{\partial^2 M(0,0)}{\partial t_1^2} - \left( \frac{\partial M(0,0)}{\partial t_1} \right)^2}{M(0,0)^2} \] \[ \frac{\partial^2}{\partial t_1^2}\psi(0, 0) = \frac{1 \cdot \mathbb{E}[X^2] - [\mathbb{E}[X]]^2}{1^2}=\mathbb{E}[X^2] - [\mathbb{E}[X]]^2=Var(X) \]

\[ \left. \frac{\partial^2 \psi}{\partial t_1^2} \right|_{(0, 0)} = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \mathrm{Var}(X) \]

Similarly:

\[ \left. \frac{\partial^2 \psi}{\partial t_2^2} \right|_{(0, 0)} = \mathrm{Var}(Y) \]

Covariances

since

\[\begin{aligned} \frac{\partial^2}{\partial t_1 \partial t_2}\psi(t_1, t_2) &= \frac{M(t_1,t_2) \cdot \frac{\partial^2 M(t_1,t_2)}{\partial t_1 \partial t_2} - \frac{\partial M(t_1,t_2)}{\partial t_1} \frac{\partial M(t_1,t_2)}{\partial t_2}}{M(t_1,t_2)^2}\\ \frac{\partial^2}{\partial t_1 \partial t_2}\psi(0, 0) &= \frac{M(0,0) \cdot \frac{\partial^2 M(0,0)}{\partial t_1 \partial t_2} - \frac{\partial M(0,0)}{\partial t_1} \frac{\partial M(0,0)}{\partial t_2}}{M(0,0)^2}\\ \left. \frac{\partial^2 \psi}{\partial t_1 \partial t_2} \right|_{(0, 0)} &= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \mathrm{Cov}(X, Y) \end{aligned}\]

Since for the mixed partial derivative:

\[\frac{\partial^2}{\partial t_1 \partial t_2}M(t_1, t_2) =\mathbb{E}[XY e^{t_1 X + t_2 Y}]\] \[\frac{\partial^2}{\partial t_1 \partial t_2}M(0, 0) =\mathbb{E}[XY]\]

In short

\[\begin{aligned} \left. \frac{\partial \psi}{\partial t_1} \right|_{(0, 0)} &= \mathbb{E}[X] \\ \left. \frac{\partial \psi}{\partial t_2} \right|_{(0, 0)} &= \mathbb{E}[Y] \\ \left. \frac{\partial^2 \psi}{\partial t_1^2} \right|_{(0, 0)} &= \mathrm{Var}(X) \\ \left. \frac{\partial^2 \psi}{\partial t_2^2} \right|_{(0, 0)} &= \mathrm{Var}(Y) \\ \left. \frac{\partial^2 \psi}{\partial t_1 \partial t_2} \right|_{(0, 0)} &= \mathrm{Cov}(X, Y) \end{aligned}\]
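As a numerical illustration (a sketch using an assumed bivariate normal example, for which the cumulant generating function has the closed form \(\psi(t_1,t_2)=t_1\mu_X + t_2\mu_Y + \tfrac{1}{2}(t_1^2\sigma_X^2 + 2t_1 t_2\sigma_{XY} + t_2^2\sigma_Y^2)\)), finite differences of \(\psi\) at the origin recover the means, variances, and covariance:

# Sketch: finite-difference check of the cumulant generating function identities
# for an assumed bivariate normal with means (1, 2), variances (4, 9), covariance 1.5
muX <- 1; muY <- 2; s2X <- 4; s2Y <- 9; sXY <- 1.5
psi <- function(t1, t2) t1 * muX + t2 * muY + 0.5 * (t1^2 * s2X + 2 * t1 * t2 * sXY + t2^2 * s2Y)

h <- 1e-4
c(
  mean_X = (psi(h, 0) - psi(-h, 0)) / (2 * h),                        # ~ 1
  mean_Y = (psi(0, h) - psi(0, -h)) / (2 * h),                        # ~ 2
  var_X  = (psi(h, 0) - 2 * psi(0, 0) + psi(-h, 0)) / h^2,            # ~ 4
  var_Y  = (psi(0, h) - 2 * psi(0, 0) + psi(0, -h)) / h^2,            # ~ 9
  cov_XY = (psi(h, h) - psi(h, -h) - psi(-h, h) + psi(-h, -h)) / (4 * h^2)  # ~ 1.5
)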

Question 5: Maximum Likelihood Estimation

\[P(X=x) = \frac{e^{-\lambda} \lambda^x}{x!}=\frac{e^{-50} 50^x}{x!}\]

Finding the maximum likelihood estimate

\[\mathcal L(\lambda;x)=\frac{e^{-\lambda} \lambda^x}{x!}\]

Finding the log likelihood

\[ \begin{aligned} \log\mathcal L(\lambda;x) &= \log\left(\frac{e^{-\lambda} \lambda^x}{x!}\right)\\ &= \log(e^{-\lambda} \lambda^x)-\log(x!)\\ &= -\lambda+ x\log\lambda-\log(x!) \end{aligned} \]

Differentiating with respect to \(\lambda\):

\[ \frac{d}{d\lambda}\log\mathcal L(\lambda;x) = -1 + \frac{x}{\lambda} \]

Maximize by setting the derivative equal to zero and solving for \(\lambda\):

\[ \begin{aligned} \frac{d}{d\lambda}\log\mathcal L(\lambda;x) &= 0\\ -1 + \frac{x}{\lambda} &= 0\\ \frac{x}{\lambda} &= 1\\ \hat{\lambda} &= x \end{aligned} \]

so the maximum likelihood estimate of \(\lambda\) equals the observed count \(x\); for the stated value of 50, \(\hat{\lambda} = 50\).
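As a quick numerical sketch (assuming a single observed count of x = 50, consistent with the value above), the log-likelihood can be maximized directly to confirm that the MLE equals the observation:

# Sketch: numerical maximization of the Poisson log-likelihood for x = 50
x <- 50
loglik <- function(lambda) dpois(x, lambda, log = TRUE)
optimize(loglik, interval = c(1, 200), maximum = TRUE)$maximum   # ~ 50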

Question 6: Unbiasedness and Bias

# Set seed for reproducibility
set.seed(123)

# Parameters
mu <- 21
sigma <- 3  # since variance = 9
n1 <- 30
n2 <- 100
simulations <- 10000

# Function to simulate and analyze
simulate_estimates <- function(n) {
  samples <- matrix(rnorm(simulations * n, mean = mu, sd = sigma), nrow = simulations)
  xbar <- rowMeans(samples)
  xbar_biased <- 0.9 * xbar
  
  data.frame(xbar = xbar, xbar_biased = xbar_biased)
}

# Simulate for both sample sizes
results_n30 <- simulate_estimates(n1)
results_n100 <- simulate_estimates(n2)

# Summary statistics function
summarize_results <- function(data) {
  list(
    mean_xbar = mean(data$xbar),
    bias_xbar = mean(data$xbar) - mu,
    var_xbar = var(data$xbar),
    
    mean_biased = mean(data$xbar_biased),
    bias_biased = mean(data$xbar_biased) - mu,
    var_biased = var(data$xbar_biased)
  )
}

# Get summaries
summary_n30 <- summarize_results(results_n30)
summary_n100 <- summarize_results(results_n100)

print(summary_n30)
$mean_xbar
[1] 21.0056

$bias_xbar
[1] 0.005595708

$var_xbar
[1] 0.3047891

$mean_biased
[1] 18.90504

$bias_biased
[1] -2.094964

$var_biased
[1] 0.2468792
print(summary_n100)
$mean_xbar
[1] 20.99441

$bias_xbar
[1] -0.005587189

$var_xbar
[1] 0.0889089

$mean_biased
[1] 18.89497

$bias_biased
[1] -2.105028

$var_biased
[1] 0.07201621
# Plot histograms
library(ggplot2)
library(tidyr)

results_n30_long <- pivot_longer(results_n30, cols = everything(), names_to = "Estimator", values_to = "Value")
results_n100_long <- pivot_longer(results_n100, cols = everything(), names_to = "Estimator", values_to = "Value")

# Plot for n = 30
ggplot(results_n30_long, aes(x = Value, fill = Estimator)) +
  geom_density(alpha = 0.5) +
  geom_vline(xintercept = mu, color = "black", linetype = "dashed") +
  labs(title = "Sampling Distributions for n = 30", x = "Estimator Value", y = "Density") +
  theme_minimal()

# Plot for n = 100
ggplot(results_n100_long, aes(x = Value, fill = Estimator)) +
  geom_density(alpha = 0.5) +
  geom_vline(xintercept = mu, color = "black", linetype = "dashed") +
  labs(title = "Sampling Distributions for n = 100", x = "Estimator Value", y = "Density") +
  theme_minimal()

Comments

Which estimator appears unbiased for n=30?

  • The sample mean (X̄) is unbiased (bias ≈ 0)
  • The 0.9X̄ estimator is biased (bias ≈ -2.1)

Which has lower variance for n=30 and for n=100?

The biased estimator \(0.9\bar{X}\) has the lower variance (0.247 vs 0.305 for n = 30; 0.072 vs 0.089 for n = 100)

Which estimator would you prefer?

For precise estimation: Biased estimator (lower variance)

For accurate estimation: Sample mean (unbiased)

Trade-off depends on application (MSE = Variance + Bias^2)
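As a small follow-up sketch (reusing the simulation objects defined above), the empirical MSE of each estimator can be compared directly; for these parameter values the large squared bias of \(0.9\bar{X}\) outweighs its variance advantage:

# Sketch: compare MSE = variance + bias^2 for both estimators (n = 30 and n = 100)
mse <- function(est, truth = mu) mean((est - truth)^2)
c(
  mse_xbar_n30    = mse(results_n30$xbar),
  mse_biased_n30  = mse(results_n30$xbar_biased),
  mse_xbar_n100   = mse(results_n100$xbar),
  mse_biased_n100 = mse(results_n100$xbar_biased)
)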

Effects of larger sample size (n=100):

  • Both sampling distributions become narrower (lower variance)
  • Each distribution becomes more concentrated around its own mean
  • The variance of each estimator decreases proportionally to 1/n
  • The bias of 0.9X̄ (≈ -0.1μ = -2.1) does not shrink as n grows, so only the variance improves; the estimator remains biased

Question 7: Estimation and Expectation

Part (A)

Let \((G_1, G_2, \dots, G_m)\) and \((H_1, H_2, \dots, H_n)\) be independent random variables from two continuous distribution functions. Find the Uniformly Minimum Variance Unbiased Estimators (UMVUE) of:

(i) \(E(GH)\)

Since \(G\) and \(H\) are independent: \[E(GH) = E(G)E(H)\]

The sample means are unbiased estimators: \[\bar{G} = \frac{1}{m}\sum_{i=1}^m G_i, \quad \bar{H} = \frac{1}{n}\sum_{j=1}^n H_j\]

thus

\[E(\bar{G}) = \mu_G, \quad E(\bar{H}) = \mu_H\]

Proof

\[\begin{aligned} E(\bar{X}) &= E\left(\frac{\sum^n_{i=1} X_i}{n}\right)\\ &= \frac{1}{n}\sum^n_{i=1}E( X_i)\\ &= \frac{n\mu}{n}\\ &= \mu \end{aligned}\]

Because \(\bar{G}\) and \(\bar{H}\) are independent: \[E(\bar{G}\bar{H}) = E(\bar{G})E(\bar{H}) = E(G)E(H)\]

Thus, the UMVUE for \(E(GH)\) is: \[\color{dodgerblue}{\boxed{\bar{G}\bar{H} = \left(\frac{1}{m}\sum_{i=1}^m G_i\right)\left(\frac{1}{n}\sum_{j=1}^n H_j\right)}}\]

(ii) \(\text{Var}(G + H)\)

Since \(G\) and \(H\) are independent: \[\text{Var}(G + H) = \text{Var}(G) + \text{Var}(H)\]

The sample variances are unbiased estimators: \[S_G^2 = \frac{1}{m-1}\sum_{i=1}^m (G_i - \bar{G})^2, \quad S_H^2 = \frac{1}{n-1}\sum_{j=1}^n (H_j - \bar{H})^2\] thus

\[E(S_G^2) = \sigma^2_G, \quad E(S_H^2) = \sigma^2_H\]

Proof

\[E(S^2) = \sigma^2\]

\[\begin{aligned} E\bigg[\frac{1}{n-1} \sum_{i=1}^n (X_i - \overline{X})^2 \bigg] &= \frac{\sum_{i=1}^n E(X_i^2) + E \bigg[ - 2 \overline{X}\left( {\color{tomato}{n\overline{X}}} \right) + n \overline{X}^2 \bigg]}{n-1} \\ &= \frac{\sum_{i=1}^n E(X_i^2) + E \bigg[ -n \overline{X}^2\bigg]}{n-1}\\ &= \frac{\sum_{i=1}^n E(X_i^2) - n E \big[ \overline{X}^2\big]}{n-1} .\\ &= \frac{\sum_{i=1}^n {\color{dodgerblue}{ E \big[ X_i^2 \big]}} - n {\color{tomato}{E \big[ \overline{X}^2 \big]}}}{n-1} \\ &= \frac{\sum_{i=1}^n \bigg( {\color{dodgerblue}{ \mbox{Var} \big[ X_i \big] + \left( E \big[ X_i \big] \right)^2 }} \bigg) - n \left( {\color{tomato}{\mbox{Var} \big[ \overline{X} \big] + \left( E \big[ \overline{X} \big]\right)^2}} \right)}{n-1} \\ &= \frac{\sum_{i=1}^n {\color{dodgerblue}{ \left( \sigma^2 + \mu^2 \right)}} - n \left( \mbox{Var} \big[ \overline{X} \big] + \left( E \big[ \overline{X} \big]\right)^2 \right)}{n-1} \\ &= \frac{\sum_{i=1}^n\left( \sigma^2 + \mu^2 \right) - n \left( {\color{tomato}{\frac{\sigma^2}{n}}} + \left( {\color{tomato}{ \mu }}\right)^2 \right)}{n-1} \\ &= \frac{{\color{dodgerblue}{n(\sigma^2 + \mu^2)}} - \sigma^2 - n\mu^2}{n-1} \\ &= \frac{(n-1) \sigma^2}{n-1}.\\ &= \sigma^2 \end{aligned}\]

Thus, the UMVUE for \(\text{Var}(G + H)\) is: \[\color{dodgerblue}{\boxed{S_G^2 + S_H^2 = \frac{1}{m-1}\sum_{i=1}^m (G_i - \bar{G})^2 + \frac{1}{n-1}\sum_{j=1}^n (H_j - \bar{H})^2}}\]
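As an illustrative simulation sketch (with assumed distributions: G exponential with mean 2 and H uniform on (0, 4), which are not from the question), the two estimators can be checked for unbiasedness by averaging over many replications:

# Sketch: simulation check that G_bar * H_bar and S_G^2 + S_H^2 are unbiased
set.seed(2025)
m <- 12; n <- 20; reps <- 20000
est_prod <- est_var <- numeric(reps)
for (r in seq_len(reps)) {
  G <- rexp(m, rate = 0.5)      # E(G) = 2,  Var(G) = 4
  H <- runif(n, 0, 4)           # E(H) = 2,  Var(H) = 16/12
  est_prod[r] <- mean(G) * mean(H)
  est_var[r]  <- var(G) + var(H)
}
c(mean_prod = mean(est_prod), target_prod = 2 * 2,
  mean_var  = mean(est_var),  target_var  = 4 + 16/12)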

Part (B)

Suppose \(X \sim N(\mu, \sigma^2)\). Prove that: \[E[(X - \mu)^2] = \sigma^2\]

By definition of variance: \[\text{Var}(X) = E[(X - \mu)^2]\]

For \(X \sim N(\mu, \sigma^2)\): \[\text{Var}(X) = \sigma^2\]

Therefore: \[\color{dodgerblue}{\boxed{E[(X - \mu)^2] = \sigma^2}}\]
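A one-line simulation sketch (with assumed values \(\mu = 5\), \(\sigma = 2\), chosen only for illustration) confirms the identity numerically:

# Sketch: E[(X - mu)^2] = sigma^2, checked by simulation for mu = 5, sigma = 2
set.seed(7)
x <- rnorm(1e6, mean = 5, sd = 2)
mean((x - 5)^2)   # ~ 4 = sigma^2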