Practice set 4.7

Submission instructions

I want you to start familiarizing yourself with R Quarto, a medium for compiling text, code, and output in one reproducible document. For this problem, use this .qmd file as a starting point for writing your simulation studies.

When finished with your solutions:

Click the Publish button
Select RPubs
Follow the prompts for creating an account on RPubs and publishing your solution

Submit BOTH your .qmd file and a link to your published html.

Question 1

Recall the following problem. Let \(X\) represents the number of cars driving past a parking ramp in any given hour, and suppose that \(X\sim POI(\lambda)\). Given \(X=x\) cars driving past a ramp, let \(Y\) represent the number that decide to park in the ramp, with \(Y|X=x\sim BIN(x,p)\). We’ve shown analytically that unconditionally, \(Y\sim POI(\lambda p)\). Verify this result via a simulation study over a grid of \(\lambda \in \{10, 20, 30\}\) and \(p\in \{0.2, 0.4, 0.6\}\) using the framework presented in your notes, with 10,000 simulated outcomes per \((\lambda, p)\) combination. Create a faceted plot of the overlaid analytic and empirial CDFs as well as the p-p plot.

I’ll get you started:

library(tidyverse)
library(purrrfect)
N <- 10000
set.seed(227)
(thinned_binom <- parameters(~ lambda, ~ p,
c(10, 20, 30), c(0.2, 0.4, 0.6)
)
%>% add_trials(N)
%>% mutate(X = map_int(lambda, \(a) rpois(1, lambda)))
%>% mutate(Y = pmap_int(list(X,p), \(var1,var2) rbinom(1, size = var1, prob = var2)))
%>% mutate(Fhat_Y = cume_dist(Y), .by = c(lambda, p))
%>% mutate(F_Y = ppois(Y,lambda*p))
)

# A tibble: 90,000 × 7
   lambda     p .trial     X     Y Fhat_Y   F_Y
    <dbl> <dbl>  <dbl> <int> <int>  <dbl> <dbl>
 1     10   0.2      1    11     2  0.682 0.677
 2     10   0.2      2     7     0  0.138 0.135
 3     10   0.2      3     8     1  0.408 0.406
 4     10   0.2      4     7     4  0.949 0.947
 5     10   0.2      5     9     1  0.408 0.406
 6     10   0.2      6     6     1  0.408 0.406
 7     10   0.2      7    14     4  0.949 0.947
 8     10   0.2      8     9     2  0.682 0.677
 9     10   0.2      9     7     1  0.408 0.406
10     10   0.2     10    13     2  0.682 0.677
# ℹ 89,990 more rows

(ggplot(aes(x=Y),data=thinned_binom)
+ geom_step(aes(y =Fhat_Y, color = 'Simulated CDF'))
+ geom_step(aes(y = F_Y, color = 'Analytic CDF'))
+ labs(y=expression(P(Y <= y)), x= 'y', color='')
+ theme_classic(base_size = 12)
+ facet_grid(lambda~p, labeller = label_both)
)

Question 2

In this problem we will study the relationship between \(\alpha\) and \(Var(Y)\) for fixed \(\mu\) for the beta distribution.

Recall that if \(Y\sim BETA(\alpha,\beta)\) and with \(\mu \equiv E(Y)\), \(\beta = \alpha \cdot \frac{1-\mu}{\mu}\).

Use this fact to simulate 10,000 realizations of \(Y\sim BETA(\alpha,\beta)\) for each combination of \(\alpha \in \{2,4,8,16\}\) and \(\mu\in\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}\).

A)

Create a faceted plot of the empirical densities overlaid with the analytic densities for each \((\alpha, \mu)\) combination.

library(tidyverse)
library(purrrfect)

set.seed(227)
N <- 10000
beta_func <- (parameters(~ alpha, ~ mu,
                       c(2, 4, 8, 16),
                       seq(0.1, 0.7, by = 0.1)) 
%>% add_trials(N) 
%>% mutate(beta = alpha * (1 - mu) / mu)
%>% mutate (Y = rbeta(n(), shape1 = alpha, shape2 = beta))
%>% mutate(Fhat_Y = cume_dist(Y), .by = c(alpha, mu))
%>% mutate(F_Y = pbeta(Y,alpha, beta))
)
(ggplot(aes(x=Y),data=beta_func)
+ geom_step(aes(y =Fhat_Y, color = 'Simulated CDF'))
+ geom_step(aes(y = F_Y, color = 'Analytic CDF'))
+ labs(y=expression(P(Y <= y)), x= 'y', color='')
+ theme_classic(base_size = 12)
+ facet_grid(alpha~mu, labeller = label_both)
)

B)

Use summarize() to find the simulated variance of \(Y\) for each \((\alpha,\mu)\) combination.

beta_func_summary <- beta_func %>%
  group_by(alpha, mu) %>%
  summarise(
    var_Y = var(Y),     
    .groups = "drop"
  )
beta_func_summary

# A tibble: 28 × 3
   alpha    mu   var_Y
   <dbl> <dbl>   <dbl>
 1     2   0.1 0.00424
 2     2   0.2 0.0142 
 3     2   0.3 0.0278 
 4     2   0.4 0.0405 
 5     2   0.5 0.0505 
 6     2   0.6 0.0562 
 7     2   0.7 0.0536 
 8     4   0.1 0.00223
 9     4   0.2 0.00778
10     4   0.3 0.0148 
# ℹ 18 more rows

C)

Create a plot of the simulated variances as a function of \(\alpha\). Use lines as your geometry, and color-code each line by \(\mu\).

library(ggplot2)

ggplot(beta_func_summary, aes(x = alpha, y = var_Y, color = factor(mu))) +
  geom_line(size = 1) +      
  geom_point(size = 2) +      
  scale_color_viridis_d(option = "plasma", name = expression(mu)) + 
  labs(
    title = "Simulated Variance of Beta(α, β) as Function of α",
    x = expression(alpha),
    y = expression(paste("Simulated Variance ", Var(Y)))
  ) +
  theme_minimal(base_size = 14)

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

D)

Comment on how the combination of \(\alpha\) and \(\mu\) impact variance.

You can see that the combination of alpha and mu impacts the variance by as our mu decreases we have a lower variance and we can see that our variance also shrinks as our alpha increases showing that the combination of alpha and mu can shrink the variance.