Practice set 4.7

Question 1

Recall the following problem. Let \(X\) represents the number of cars driving past a parking ramp in any given hour, and suppose that \(X\sim POI(\lambda)\). Given \(X=x\) cars driving past a ramp, let \(Y\) represent the number that decide to park in the ramp, with \(Y|X=x\sim BIN(x,p)\). We’ve shown analytically that unconditionally, \(Y\sim POI(\lambda p)\). Verify this result via a simulation study over a grid of \(\lambda \in \{10, 20, 30\}\) and \(p\in \{0.2, 0.4, 0.6\}\) using the framework presented in your notes, with 10,000 simulated outcomes per \((\lambda, p)\) combination. Create a faceted plot of the overlaid analytic and empirial CDFs as well as the p-p plot.

I’ll get you started:

library(tidyverse)
library(purrrfect)

N <- 10000
(simstudy <- parameters(~ lambda, ~ p,
                        c(10, 20, 30), c(0.2, 0.4, 0.6))
  %>% add_trials(N)
  %>% mutate(X = map_int(lambda, \(l) rpois(1, l)))
  %>% mutate(Y = pmap_int(list(X,p), \(var1, var2) rbinom(1, size = var1, prob = var2)))
  %>% mutate(Fhat_Y = cume_dist(Y), .by = c(lambda, p))
  %>% mutate(F_Y = ppois(Y, lambda = lambda*p))
) %>% head
# A tibble: 6 × 7
  lambda     p .trial     X     Y Fhat_Y   F_Y
   <dbl> <dbl>  <dbl> <int> <int>  <dbl> <dbl>
1     10   0.2      1     6     1  0.404 0.406
2     10   0.2      2     9     1  0.404 0.406
3     10   0.2      3    14     5  0.982 0.983
4     10   0.2      4     9     2  0.674 0.677
5     10   0.2      5    14     1  0.404 0.406
6     10   0.2      6     4     1  0.404 0.406
(ggplot(aes(x = Y), data = simstudy)
  +geom_step(aes(y = Fhat_Y, color = 'Simulated CDF'))
  +geom_step(aes(y = F_Y, color = 'Analytic CDF'))
  +labs(y = expression(P(Y<= y)), x = 'y', color = '')
  +theme_classic(base_size = 12)
  +facet_grid(lambda~p, labeller = label_both, scales = 'free_x')
  )

(ggplot(aes(x = Fhat_Y, y = F_Y), data = simstudy)
  +geom_point()
  +labs(y = expression(P(Y<= y)), x = expression(hat(P)(Y <=y)))
  +geom_abline(intercept = 0, slope = 1)
  +theme_classic()
  +facet_grid(lambda~p, labeller = label_both)
  )

Question 2

In this problem we will study the relationship between \(\alpha\) and \(Var(Y)\) for fixed \(\mu\) for the beta distribution.

Recall that if \(Y\sim BETA(\alpha,\beta)\) and with \(\mu \equiv E(Y)\), \(\beta = \alpha \cdot \frac{1-\mu}{\mu}\).

Use this fact to simulate 10,000 realizations of \(Y\sim BETA(\alpha,\beta)\) for each combination of \(\alpha \in \{2,4,8,16\}\) and \(\mu\in\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}\).

library(tidyverse)
library(purrrfect)

N <- 10000
(simstudy1 <- parameters(~alpha, ~mu,
                         c(2, 4, 8,16), seq(0.1, 0.7, by = 0.1))
  %>% add_trials(N)
  %>% mutate(B = alpha * ((1 - mu)/mu))
  %>% mutate(Y = pmap_dbl(list(alpha, B), \(var1, var2) rbeta(1, var1, var2)))
  %>% mutate(Fhat_Y = cume_dist(Y), .by = c(alpha, mu))
  %>% mutate(F_Y = pbeta(Y, alpha, B))
  %>% mutate(f_y = dbeta(Y, alpha, B))
) %>% head 
# A tibble: 6 × 8
  alpha    mu .trial     B      Y Fhat_Y   F_Y   f_y
  <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
1     2   0.1      1    18 0.0822  0.463 0.471 6.54 
2     2   0.1      2    18 0.257   0.974 0.973 0.558
3     2   0.1      3    18 0.0431  0.197 0.197 6.97 
4     2   0.1      4    18 0.0746  0.418 0.420 6.83 
5     2   0.1      5    18 0.0581  0.303 0.304 7.18 
6     2   0.1      6    18 0.0623  0.332 0.333 7.14 

A)

Create a faceted plot of the empirical densities overlaid with the analytic densities for each \((\alpha, \mu)\) combination.

(ggplot(aes(x = Y), data = simstudy1)
  + geom_histogram(aes(y = after_stat(density)),
                  fill = 'goldenrod',
                  binwidth = 0.02, center = 0.01)
  + geom_line(aes(y = f_y), col = 'cornflowerblue', linewidth = 1)
  + labs(y = 'Beta density', x = 'y')
  + theme_classic()
  + facet_grid(alpha~mu, labeller = label_both, scales = 'free_x')
)

B)

Use summarize() to find the simulated variance of \(Y\) for each \((\alpha,\mu)\) combination.

(updated_simstudy <- simstudy1
 %>% group_by(alpha, mu)
 %>% summarize(sim_Var = var(Y))
 )
`summarise()` has grouped output by 'alpha'. You can override using the
`.groups` argument.
# A tibble: 28 × 3
# Groups:   alpha [4]
   alpha    mu sim_Var
   <dbl> <dbl>   <dbl>
 1     2   0.1 0.00439
 2     2   0.2 0.0148 
 3     2   0.3 0.0272 
 4     2   0.4 0.0402 
 5     2   0.5 0.0509 
 6     2   0.6 0.0555 
 7     2   0.7 0.0550 
 8     4   0.1 0.00218
 9     4   0.2 0.00752
10     4   0.3 0.0147 
# ℹ 18 more rows

C)

Create a plot of the simulated variances as a function of \(\alpha\). Use lines as your geometry, and color-code each line by \(\mu\).

(ggplot(aes(x = alpha, y = sim_Var, color = factor(mu), group = mu), data = updated_simstudy)
 + geom_line()
 + theme_classic()
 + labs(x = expression(alpha), y = expression(Var(Y)), color = '')
 + scale_color_brewer(
   palette = "Dark2",
   labels = function(x) paste0("μ =", x))
)

D)

Comment on how the combination of \(\alpha\) and \(\mu\) impact variance.

We see that for each \(\mu\), the variance decreases as \(\alpha\) increases. When we hold \(\alpha\) constant, we see that the variance increases as we increase \(\mu\). We also see that the vertical gaps between the different \(\mu\) levels shrinks as we increase \(\alpha\).