PS_8_1

Question 9

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.5.2

Warning: package 'ggplot2' was built under R version 4.5.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(purrrfect)


Attaching package: 'purrrfect'

The following objects are masked from 'package:base':

    replicate, tabulate

library(knitr)

(summary_table <- parameters(
  ~n, ~beta,
  c(3, 11, 51), c(5)
) %>% 
  add_trials(10000) %>% 
  mutate(
    y_sample = pmap(list(n, beta), .f = \(n, b) rexp(n, rate = 1/b)),
    theta_hat1 = map_dbl(y_sample, mean),
    theta_hat2 = map_dbl(y_sample, \(x) mean(x) * log(2)),
    theta_hat3 = map_dbl(y_sample, median)
  ) %>% 
  pivot_longer(
    cols = starts_with("theta"), 
    names_to = 'estimator', 
    values_to = 'estimate'
  ) %>% 
  summarize(
    bias = mean(estimate - (5 * log(2))),
    variance = var(estimate),
    .by = c(n, estimator)
  ) %>% 
  mutate(mse = bias^2 + variance) %>% 
  pivot_longer(
    cols = c(bias, variance, mse),
    names_to = "metric",
    values_to = "value"
  ) %>% 
  mutate(
    metric = factor(metric, levels = c("bias", "variance", "mse"), labels = c("B", "Var", "MSE")),
    estimator = factor(estimator, levels = c("theta_hat1", "theta_hat2", "theta_hat3"), labels = c("(theta1)", "(theta2)", "(theta3)"))
  ) %>% 
  arrange(metric, estimator) %>% 
  unite("Statistic", metric, estimator, sep = "") %>% 
  pivot_wider(
    names_from = n,
    values_from = value,
    names_prefix = "n = "
  ))

# A tibble: 9 × 4
  Statistic   `n = 3` `n = 11` `n = 51`
  <chr>         <dbl>    <dbl>    <dbl>
1 B(theta1)    1.56    1.52     1.54   
2 B(theta2)    0.0173 -0.00909  0.00481
3 B(theta3)    0.713   0.190    0.0610 
4 Var(theta1)  8.46    2.27     0.480  
5 Var(theta2)  4.07    1.09     0.231  
6 Var(theta3)  9.06    2.38     0.486  
7 MSE(theta1) 10.9     4.59     2.86   
8 MSE(theta2)  4.07    1.09     0.231  
9 MSE(theta3)  9.57    2.42     0.490

kable(summary_table, digits = 3)

Statistic	n = 3	n = 11	n = 51
B(theta1)	1.559	1.521	1.541
B(theta2)	0.017	-0.009	0.005
B(theta3)	0.713	0.190	0.061
Var(theta1)	8.463	2.275	0.480
Var(theta2)	4.066	1.093	0.231
Var(theta3)	9.060	2.383	0.486
MSE(theta1)	10.894	4.588	2.855
MSE(theta2)	4.066	1.093	0.231
MSE(theta3)	9.568	2.419	0.490

From this we can see that our 2nd estimators bias is basically 0, as we found analytically. We can also see that our first estimator doesn’t change with larger values of n, while the bias in theta3 goes down as n increases. For variance as expected as n increases variance decreases for all. Theta2 does best again, about 2x better. The MSE tells us something similar, we can see that theta 2 is definitely our best estimator, and is unbiased.