Practice set 4.7

Submission instructions

I want you to start familiarizing yourself with R Quarto, a medium for compiling text, code, and output in one reproducible document. For this problem, use this .qmd file as a starting point for writing your simulation studies.

When finished with your solutions:

  • Click the Publish button
  • Select RPubs
  • Follow the prompts for creating an account on RPubs and publishing your solution

Submit BOTH your .qmd file and a link to your published html.

Question 1

Recall the following problem. Let \(X\) represents the number of cars driving past a parking ramp in any given hour, and suppose that \(X\sim POI(\lambda)\). Given \(X=x\) cars driving past a ramp, let \(Y\) represent the number that decide to park in the ramp, with \(Y|X=x\sim BIN(x,p)\). We’ve shown analytically that unconditionally, \(Y\sim POI(\lambda p)\). Verify this result via a simulation study over a grid of \(\lambda \in \{10, 20, 30\}\) and \(p\in \{0.2, 0.4, 0.6\}\) using the framework presented in your notes, with 10,000 simulated outcomes per \((\lambda, p)\) combination. Create a faceted plot of the overlaid analytic and empirial CDFs as well as the p-p plot.

I’ll get you started:

library(tidyverse)
library(purrrfect)

(simstudy <- parameters(~lambda, seq(0.25,1,by=0.25))
  %>% add_trials(10000)
  %>% mutate(Y = map_dbl(lambda, \(l) rexp(1, rate = l)))
  %>% mutate(X = floor(Y))
  %>% mutate(Fhat_X = cume_dist(X), .by = lambda)
  %>% mutate(F_X = pgeom(X, prob = 1-exp(-lambda)))
) %>% head()
# A tibble: 6 × 6
  lambda .trial      Y     X Fhat_X   F_X
   <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>
1   0.25      1 10.4      10  0.936 0.936
2   0.25      2  7.07      7  0.863 0.865
3   0.25      3  4.70      4  0.709 0.713
4   0.25      4  1.83      1  0.393 0.393
5   0.25      5  2.05      2  0.532 0.528
6   0.25      6  0.430     0  0.217 0.221
(ggplot(aes(x=X), data=simstudy)
 + geom_step(aes(y =Fhat_X, color = 'Simulated CDF'))
 + geom_step(aes(y = F_X, color = 'Analytic CDF'))
 + labs(y=expression(P(X <= x)), x= 'x', color='')
 + theme_classic(base_size = 12)
 + facet_wrap(~lambda, labeller = label_both))

Question 2

In this problem we will study the relationship between \(\alpha\) and \(Var(Y)\) for fixed \(\mu\) for the beta distribution.

Recall that if \(Y\sim BETA(\alpha,\beta)\) and with \(\mu \equiv E(Y)\), \(\beta = \alpha \cdot \frac{1-\mu}{\mu}\).

Use this fact to simulate 10,000 realizations of \(Y\sim BETA(\alpha,\beta)\) for each combination of \(\alpha \in \{2,4,8,16\}\) and \(\mu\in\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}\).

A)

Create a faceted plot of the empirical densities overlaid with the analytic densities for each \((\alpha, \mu)\) combination.

library(tidyverse)
library(purrrfect)


(betasim <- parameters(~ alpha, ~ mu,
                       c(2, 4, 8, 16),
                       c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7))
  %>% add_trials(10000)
  %>% mutate(beta = alpha*(1-mu)/mu)
  %>% mutate(Y = pmap_dbl(list(alpha, beta), \(a, b) rbeta(1, shape1 = a, shape2 = b)))
  %>% mutate(f_Y = dbeta(Y, alpha, beta))
)
# A tibble: 280,000 × 6
   alpha    mu .trial  beta      Y   f_Y
   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
 1     2   0.1      1    18 0.0401 6.84 
 2     2   0.1      2    18 0.0311 6.22 
 3     2   0.1      3    18 0.0533 7.18 
 4     2   0.1      4    18 0.0414 6.90 
 5     2   0.1      5    18 0.236  0.825
 6     2   0.1      6    18 0.0437 6.99 
 7     2   0.1      7    18 0.106  5.39 
 8     2   0.1      8    18 0.155  3.04 
 9     2   0.1      9    18 0.0824 6.53 
10     2   0.1     10    18 0.150  3.24 
# ℹ 279,990 more rows
(ggplot(aes(x = Y), data = betasim)
 + geom_histogram(aes(y = after_stat(density)), binwidth = 0.02, center = 0.01)
 + geom_line(aes(y = f_Y), linewidth = 1)
 + labs(y = "Beta density", x = "y")
 + theme_classic(base_size = 12)
 + facet_grid(alpha ~ mu, labeller = label_both)
)

B)

Use summarize() to find the simulated variance of \(Y\) for each \((\alpha,\mu)\) combination.

(beta_var <- betasim %>% summarize(varY = var(Y), .by = c(alpha, mu)))
# A tibble: 28 × 3
   alpha    mu    varY
   <dbl> <dbl>   <dbl>
 1     2   0.1 0.00429
 2     2   0.2 0.0145 
 3     2   0.3 0.0282 
 4     2   0.4 0.0393 
 5     2   0.5 0.0493 
 6     2   0.6 0.0555 
 7     2   0.7 0.0547 
 8     4   0.1 0.00222
 9     4   0.2 0.00757
10     4   0.3 0.0144 
# ℹ 18 more rows

C)

Create a plot of the simulated variances as a function of \(\alpha\). Use lines as your geometry, and color-code each line by \(\mu\).

(ggplot(beta_var, aes(x = alpha, y = varY, group = mu, color = factor(mu)))
 + geom_line(linewidth = 1.1)
 + geom_point(size = 2)
 + scale_color_manual(values = c(
   "#F4A7B9",
   "#F6C272",
   "#C3D676",
   "#9ACD97",
   "#F59BAF",
   "#FFB570",  
   "#A5E3B9"
 ))
 + labs(x = expression(alpha),
        y = "Simulated Var(Y)",
        color = expression(mu))
 + theme_classic(base_size = 12)
 + theme(
   legend.position = "right",
   panel.grid = element_blank(),
 )
)

D)

Comment on how the combination of \(\alpha\) and \(\mu\) impact variance.

As α increases, the variance of Y gets smaller the distribution becomes tighter and more concentrated around its mean. When α is small, the shape is wider and more variable. For any fixed α, variance changes with μ: it’s lowest near 0 or 1 and reaches its highest point around the middle (roughly μ ≈ 0.5). Together, α controls how stable the distribution is, while μ controls where that stability sits along the 0–1 range.