PS 8.3

Show via simulation study that for a sample of size n drawn i.i.d. from a BETA(θ,3) population, that θ ̂= 2 (∑_i▒〖Y_i /(1-Y_i)〗)/n is consistent for θ by simulating 5000 realizations of θ ̂ over a grid of θ∈{0.2,0.5,0.8} and n∈{50,100,…,1000} and finding the simulated P(|θ ̂-θ|<ϵ) for ϵ=0.1. For each θ, plot this probability as a function of n and verify that this probability appears to be converging to 1, and comment on how the true value of θ impacts the rate of convergence.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(purrrfect)


Attaching package: 'purrrfect'

The following objects are masked from 'package:base':

    replicate, tabulate

set.seed(1234)

(sim_many_runs <-
  parameters(
    ~theta, ~n, ~epsilon,
    c(0.2, 0.5, 0.8),
    seq(50, 1000, by = 50),
    0.1
  ) %>%
  add_trials(5000) %>%
  mutate(
    Ysample = pmap(
      list(n, theta),
      \(n, th) rbeta(n, th, 3)
    ),
    theta_hat = map_dbl(
      Ysample,
      \(y) 2 * mean(y / (1 - y))
    ),
    within_epsilon = abs(theta_hat - theta) < epsilon
  )
)

# A tibble: 300,000 × 7
   theta     n epsilon .trial Ysample    theta_hat within_epsilon
   <dbl> <dbl>   <dbl>  <dbl> <list>         <dbl> <lgl>         
 1   0.2    50     0.1      1 <dbl [50]>    0.312  FALSE         
 2   0.2    50     0.1      2 <dbl [50]>    0.195  TRUE          
 3   0.2    50     0.1      3 <dbl [50]>    0.117  TRUE          
 4   0.2    50     0.1      4 <dbl [50]>    0.0758 FALSE         
 5   0.2    50     0.1      5 <dbl [50]>    0.189  TRUE          
 6   0.2    50     0.1      6 <dbl [50]>    0.195  TRUE          
 7   0.2    50     0.1      7 <dbl [50]>    0.324  FALSE         
 8   0.2    50     0.1      8 <dbl [50]>    0.166  TRUE          
 9   0.2    50     0.1      9 <dbl [50]>    0.231  TRUE          
10   0.2    50     0.1     10 <dbl [50]>    0.103  TRUE          
# ℹ 299,990 more rows

(prop_within_epsilon <-
  sim_many_runs %>%
  summarize(
    prob_within_eps = mean(within_epsilon),
    .by = c(theta, n)
  )
)

# A tibble: 60 × 3
   theta     n prob_within_eps
   <dbl> <dbl>           <dbl>
 1   0.2    50           0.813
 2   0.2   100           0.922
 3   0.2   150           0.960
 4   0.2   200           0.970
 5   0.2   250           0.983
 6   0.2   300           0.986
 7   0.2   350           0.990
 8   0.2   400           0.991
 9   0.2   450           0.995
10   0.2   500           0.994
# ℹ 50 more rows

ggplot(prop_within_epsilon,
       aes(x = n, y = prob_within_eps, color = factor(theta))) +
  geom_line(linewidth = 1.1) +
  labs(
    x = "Sample size n",
    y = expression(P(abs(hat(theta) - theta) < epsilon)),
    color = expression(theta)
  ) +
  theme_classic(base_size = 16)

From the plot we can see that lower values of theta converge to 1 quicker than the larger values of theta.