Practice set 4.7

Submission instructions

I want you to start familiarizing yourself with R Quarto, a medium for compiling text, code, and output in one reproducible document. For this problem, use this .qmd file as a starting point for writing your simulation studies.

When finished with your solutions:

Click the Publish button
Select RPubs
Follow the prompts for creating an account on RPubs and publishing your solution

Submit BOTH your .qmd file and a link to your published html.

Question 1

Recall the following problem. Let \(X\) represents the number of cars driving past a parking ramp in any given hour, and suppose that \(X\sim POI(\lambda)\). Given \(X=x\) cars driving past a ramp, let \(Y\) represent the number that decide to park in the ramp, with \(Y|X=x\sim BIN(x,p)\). We’ve shown analytically that unconditionally, \(Y\sim POI(\lambda p)\). Verify this result via a simulation study over a grid of \(\lambda \in \{10, 20, 30\}\) and \(p\in \{0.2, 0.4, 0.6\}\) using the framework presented in your notes, with 10,000 simulated outcomes per \((\lambda, p)\) combination. Create a faceted plot of the overlaid analytic and empirial CDFs as well as the p-p plot.

I’ll get you started:

library(tidyverse)
library(purrrfect)
N <- 10000
set.seed(807)
(pois_binom <- parameters(~ lambda, ~ p,
                             c(10, 20, 30), c(0.2, 0.4, 0.6)
                             )
  %>% add_trials(N)
  %>% mutate(X = map_int(lambda, \(l) rpois(1, lambda=l)))
  %>% mutate(Y = pmap_int(list(X,p), \(var1,var2) rbinom(1, size = var1, prob = var2))) 
  %>% mutate(Fhat_Y = cume_dist(Y), .by = c(lambda, p)) 
  %>% mutate(F_Y = ppois(Y, lambda = lambda*p))
) %>% head

# A tibble: 6 × 7
  lambda     p .trial     X     Y Fhat_Y   F_Y
   <dbl> <dbl>  <dbl> <int> <int>  <dbl> <dbl>
1     10   0.2      1     4     1  0.406 0.406
2     10   0.2      2    10     4  0.948 0.947
3     10   0.2      3    10     2  0.681 0.677
4     10   0.2      4    13     3  0.857 0.857
5     10   0.2      5     9     0  0.135 0.135
6     10   0.2      6    12     3  0.857 0.857

Now for the plot:

(ggplot(aes(x=Y),data=pois_binom)
  + geom_step(aes(y =Fhat_Y, color = 'Simulated CDF'))
  + geom_step(aes(y = F_Y, color = 'Analytic CDF'))
  + labs(y=expression(P(Y <= y)), x= 'y', color='', 
         title = 'CDF plots')
  + theme_classic(base_size = 12)
  + facet_grid(lambda~p, labeller = label_both)
)

(ggplot(aes(x=Fhat_Y, y = F_Y),data=pois_binom)
  + geom_point(shape ='.')
  + labs(y=expression(P(Y<= y)), 
         x= expression(hat(P)(Y <= y)),
         title = 'p-p plot')
  + geom_abline(intercept=0, slope=1,color='red')
  + theme_classic(base_size = 10)
  + facet_grid(lambda~p, labeller = label_both)
)

Question 2

In this problem we will study the relationship between \(\alpha\) and \(Var(Y)\) for fixed means.

Recall that if \(Y\sim BETA(\alpha,\beta)\) and with \(\mu \equiv E(Y)\), \(\beta = \alpha \cdot \frac{1-\mu}{\mu}\).

Use this fact to simulate 10,000 realizations of \(Y\sim BETA(\alpha,\beta)\) for each combination of \(\alpha \in \{2,4,8,16\}\) and \(\mu\in\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}\).

set.seed(807)
(beta_sim <- parameters(~ alpha, ~ mu,
                             c(2, 4, 8, 16), seq(0.1, 0.7, by = 0.1)
                             )
  %>% add_trials(N)
  %>% mutate(beta = alpha*(1-mu)/mu)
  %>% mutate(Y = pmap_dbl(list(alpha,beta), \(a,b) rbeta(1, shape1=a, shape2=b)))
  %>% mutate(f_Y = dbeta(Y, shape1=alpha, shape2=beta))
) %>% head

# A tibble: 6 × 6
  alpha    mu .trial  beta      Y   f_Y
  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
1     2   0.1      1    18 0.0147 3.90 
2     2   0.1      2    18 0.117  4.81 
3     2   0.1      3    18 0.0707 6.95 
4     2   0.1      4    18 0.0926 6.07 
5     2   0.1      5    18 0.258  0.549
6     2   0.1      6    18 0.160  2.82

A)

Create a faceted plot of the empirical densities overlaid with the analytic densities.

(ggplot(aes(x=Y),data=beta_sim)
  + geom_histogram(aes(y =after_stat(density)), 
                   fill = 'goldenrod',
                   binwidth = 0.05, center = 0.025)
  + geom_line(aes(y = f_Y), col = 'cornflowerblue', linewidth=1)
  + labs(y='Beta density', x= 'y')
  + theme_classic(base_size = 12)
  #+ ylim(c(0,20))
  + facet_grid(alpha~mu, labeller = label_both, scales = 'free_y')
)

B)

Use summarize() to find the simulated variance of \(Y\) for each \((\alpha,\mu)\) combination.

simmed_vars <- beta_sim %>% 
  summarize(var_y = var(Y), .by = c(alpha,mu)
            )
simmed_vars

# A tibble: 28 × 3
   alpha    mu   var_y
   <dbl> <dbl>   <dbl>
 1     2   0.1 0.00423
 2     2   0.2 0.0146 
 3     2   0.3 0.0270 
 4     2   0.4 0.0406 
 5     2   0.5 0.0497 
 6     2   0.6 0.0556 
 7     2   0.7 0.0544 
 8     4   0.1 0.00210
 9     4   0.2 0.00775
10     4   0.3 0.0147 
# ℹ 18 more rows

C)

Create a plot of the simulated variances as a function of \(\alpha\). Use lines as your geometry, and color-code each line by \(\mu\).

ggplot(data = simmed_vars) + 
  geom_line(aes(x = alpha, y = var_y, col  = mu, group = mu), linewidth=1) + 
  labs(y = 'Simulated variances', x = expression(alpha), color = expression(mu)) + 
  scale_color_gradient(low='cornflowerblue', high = 'goldenrod')+  
  theme_classic(base_size = 14)

D)

Comment on how the combination of \(\alpha\) and \(\mu\) impact variance.

For a fixed \(\mu\), the variance decreases as \(\alpha\) increases. For a fixed \(\alpha\), variance and mean are positively associated - as \(\mu\) increases so does the variance.