I want you to start familiarizing yourself with Quarto (used here with R), a format for combining text, code, and output in a single reproducible document. For this problem, use this .qmd file as the starting point for your simulation studies.
When finished with your solutions:
1. Click the Publish button.
2. Select RPubs.
3. Follow the prompts to create an RPubs account and publish your solution.
4. Submit BOTH your .qmd file and a link to your published HTML.
Question 1
Recall the following problem. Let \(X\) represent the number of cars driving past a parking ramp in any given hour, and suppose that \(X\sim POI(\lambda)\). Given \(X=x\) cars driving past the ramp, let \(Y\) represent the number that decide to park in the ramp, with \(Y|X=x\sim BIN(x,p)\). We’ve shown analytically that, unconditionally, \(Y\sim POI(\lambda p)\). Verify this result via a simulation study over a grid of \(\lambda \in \{10, 20, 30\}\) and \(p\in \{0.2, 0.4, 0.6\}\) using the framework presented in your notes, with 10,000 simulated outcomes per \((\lambda, p)\) combination. Create a faceted plot of the overlaid analytic and empirical CDFs, as well as the P-P plot.
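A minimal sketch of one way to build `simstudy`, the data frame the plots below draw on. It assumes the `dplyr` and `tidyr` packages (and R ≥ 4.1 for the native pipe `|>`), an arbitrary seed, and a long layout with one row per draw. The columns `Y`, `Fhat_Y`, and `F_Y` match the names used in the plotting code; the helper columns `sim` and `X` and the overall layout are assumptions and need not match the framework in your notes.

```r
library(dplyr)
library(tidyr)

set.seed(2024)   # arbitrary seed, for reproducibility only
n_sims <- 10000  # simulated outcomes per (lambda, p) combination

simstudy <- expand_grid(lambda = c(10, 20, 30),
                        p      = c(0.2, 0.4, 0.6),
                        sim    = 1:n_sims) |>
  mutate(
    X = rpois(n(), lambda),              # cars passing the ramp: X ~ POI(lambda)
    Y = rbinom(n(), size = X, prob = p)  # cars that park: Y | X = x ~ BIN(x, p)
  ) |>
  group_by(lambda, p) |>
  mutate(
    Fhat_Y = ecdf(Y)(Y),           # empirical CDF of Y within each (lambda, p) cell
    F_Y    = ppois(Y, lambda * p)  # analytic CDF if Y ~ POI(lambda * p)
  ) |>
  ungroup()
```

Each draw is generated in two stages, exactly as in the problem statement, so close agreement between `Fhat_Y` and `F_Y` is evidence for the analytic result rather than a consequence of sampling from \(POI(\lambda p)\) directly.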
```r
library(ggplot2)

ggplot(aes(x = Y), data = simstudy) +
  geom_step(aes(y = Fhat_Y, color = "Simulated CDF")) +
  geom_step(aes(y = F_Y, color = "Analytic CDF")) +
  facet_grid(lambda ~ p, labeller = label_both)
```
Empirical vs analytic CDFs across λ and p.
```r
ggplot(aes(x = Fhat_Y, y = F_Y), data = simstudy) +
  geom_point(alpha = 0.6) +
  geom_abline(intercept = 0, slope = 1, color = "gray50", linetype = "dashed") +
  labs(y = expression(P(Y <= y)),
       x = expression(hat(P)(Y <= y))) +
  theme_classic(base_size = 12) +
  facet_grid(lambda ~ p, labeller = label_both)
```
P–P plot comparing empirical and analytic CDFs across λ and p.
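The P-P plot can be complemented by a single number per \((\lambda, p)\) cell. A minimal base-R sketch, using a hypothetical helper `max_cdf_gap()` (not from the notes), computes the largest absolute difference between the empirical CDF and the \(POI(\lambda p)\) CDF, i.e. the Kolmogorov-Smirnov distance (the statistic only; the usual KS p-values assume a continuous CDF). The seed and the choice \(\lambda = 20,\ p = 0.4\) are arbitrary.

```r
# Hypothetical helper: sup over y of |Fhat(y) - F(y)|, with F the POI(mean) CDF.
# Both CDFs are step functions that jump only at integers, so checking the
# integers 0..max(y) gives the supremum over all real y.
max_cdf_gap <- function(y, mean) {
  grid <- 0:max(y)
  max(abs(ecdf(y)(grid) - ppois(grid, mean)))
}

set.seed(7)  # arbitrary seed
lambda <- 20; p <- 0.4; n_sims <- 10000
y <- rbinom(n_sims, size = rpois(n_sims, lambda), prob = p)  # two-stage draws
max_cdf_gap(y, lambda * p)  # small values indicate close agreement
```

Applying the same helper within each \((\lambda, p)\) group gives a \(3 \times 3\) summary that mirrors the faceted plots.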
Question 2
In this problem, we will study the relationship between \(\alpha\) and \(Var(Y)\) for fixed \(\mu\) when \(Y\) follows a beta distribution.
Recall that if \(Y\sim BETA(\alpha,\beta)\) with \(\mu \equiv E(Y)\), then \(\beta = \alpha \cdot \frac{1-\mu}{\mu}\).
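For reference, this identity follows from the beta mean, and substituting it into the beta variance gives a closed form in \(\alpha\) and \(\mu\) alone. This is standard algebra rather than something stated in the problem; for \(\alpha = 2,\ \mu = 0.1\) the final expression gives \(0.009/2.1 \approx 0.00429\), the theoretical variance printed in the tibble below.

```latex
\begin{aligned}
\mu = E(Y) = \frac{\alpha}{\alpha+\beta}
  &\;\Longrightarrow\; \beta = \alpha \cdot \frac{1-\mu}{\mu}
  \;\Longrightarrow\; \alpha+\beta = \frac{\alpha}{\mu}, \\
Var(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
  &= \frac{\mu(1-\mu)}{\alpha+\beta+1}
   = \frac{\mu(1-\mu)}{\alpha/\mu + 1}
   = \frac{\mu^2(1-\mu)}{\alpha+\mu}.
\end{aligned}
```

For fixed \(\mu\), the variance therefore decreases in \(\alpha\), roughly like \(1/\alpha\) once \(\alpha\) is large relative to \(\mu\).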
Use this fact to simulate 10,000 realizations of \(Y\sim BETA(\alpha,\beta)\) for each combination of \(\alpha \in \{2,4,8,16\}\) and \(\mu\in\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}\).
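A minimal sketch of this simulation, assuming `dplyr` and `tidyr` and a long layout with one row per draw. The object name `beta_sims`, the column `sim`, and the seed are illustrative choices, not part of the assignment.

```r
library(dplyr)
library(tidyr)

set.seed(2024)   # arbitrary seed
n_sims <- 10000  # realizations per (alpha, mu) combination

beta_sims <- expand_grid(alpha = c(2, 4, 8, 16),
                         mu    = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7),
                         sim   = 1:n_sims) |>
  mutate(
    beta = alpha * (1 - mu) / mu,                  # chosen so that E(Y) = mu
    Y    = rbeta(n(), shape1 = alpha, shape2 = beta)
  )
```

The result has \(4 \times 7 \times 10{,}000 = 280{,}000\) rows; the per-combination variances still have to be summarised from it.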
```
# A tibble: 280,000 × 5
   alpha    mu simulated_var theoretical_var .groups
   <dbl> <dbl>         <dbl>           <dbl> <chr>  
 1     2   0.1       0.00424         0.00429 drop   
 2     2   0.1       0.00424         0.00429 drop   
 3     2   0.1       0.00424         0.00429 drop   
 4     2   0.1       0.00424         0.00429 drop   
 5     2   0.1       0.00424         0.00429 drop   
 6     2   0.1       0.00424         0.00429 drop   
 7     2   0.1       0.00424         0.00429 drop   
 8     2   0.1       0.00424         0.00429 drop   
 9     2   0.1       0.00424         0.00429 drop   
10     2   0.1       0.00424         0.00429 drop   
# ℹ 279,990 more rows
```
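The printed tibble has 280,000 rows (one per draw) and a `.groups` column containing "drop", which suggests the variances were attached with `mutate()` rather than collapsed with `summarise()`: `.groups` is an argument of `summarise()`, so passing it to `mutate()` simply creates a column. A minimal sketch of a collapsed summary with one row per \((\alpha, \mu)\) combination, assuming `dplyr` and `tidyr`; it re-simulates so that it runs on its own, and the seed is arbitrary.

```r
library(dplyr)
library(tidyr)

set.seed(2024)   # arbitrary seed
n_sims <- 10000

sim_var <- expand_grid(alpha = c(2, 4, 8, 16),
                       mu    = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7),
                       sim   = 1:n_sims) |>
  mutate(Y = rbeta(n(), shape1 = alpha, shape2 = alpha * (1 - mu) / mu)) |>
  group_by(alpha, mu) |>
  summarise(simulated_var = var(Y), .groups = "drop") |>  # one row per (alpha, mu)
  # mu^2 (1 - mu) / (alpha + mu) equals mu (1 - mu) / (alpha + beta + 1)
  # when beta = alpha (1 - mu) / mu
  mutate(theoretical_var = mu^2 * (1 - mu) / (alpha + mu))

sim_var  # a 28 x 4 tibble
```

With 28 rows, the line plot draws each point once instead of 10,000 times.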
C)
Create a plot of the simulated variances as a function of \(\alpha\). Use lines as your geometry, and color-code each line by \(\mu\).
```r
ggplot(sim_var, aes(x = alpha, y = simulated_var, color = as.factor(mu), group = mu)) +
  geom_line(linewidth = 1) +
  labs(x = expression(alpha),
       y = "Simulated Variance",
       color = expression(mu)) +
  theme_classic(base_size = 12)
```
D)
Comment on how the combination of \(\alpha\) and \(\mu\) impacts the variance.
For small \(\alpha\), the variance depends strongly on \(\mu\): at \(\alpha = 2\) the curves are widely separated, with the variance generally increasing in \(\mu\) over this range. As \(\alpha\) increases, the variance decreases for every \(\mu\), and by \(\alpha = 16\) the curves lie much closer together in absolute terms. The combination of \(\alpha\) and \(\mu\) therefore controls the variance through the concentration of the beta distribution: for fixed \(\mu\), \(\alpha + \beta = \alpha/\mu\), so a larger \(\alpha\) concentrates the distribution more tightly around \(\mu\). Small values of \(\alpha\) amplify the influence of \(\mu\) on the variance, whereas large values of \(\alpha\) shrink every variance toward zero, leaving only small absolute differences across \(\mu\).
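To put numbers on this comparison, the theoretical variance \(\mu^2(1-\mu)/(\alpha+\mu)\), which follows from \(Var(Y) = \mu(1-\mu)/(\alpha+\beta+1)\) with \(\alpha+\beta = \alpha/\mu\), can be tabulated over the same grid. A minimal base-R sketch; the names `v` and `spread_tbl` are illustrative. For each \(\alpha\) it reports the absolute spread (largest minus smallest variance across \(\mu\)) and the ratio of largest to smallest.

```r
alpha <- c(2, 4, 8, 16)
mu    <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)

# Theoretical Var(Y): rows index alpha, columns index mu
v <- outer(alpha, mu, function(a, m) m^2 * (1 - m) / (a + m))

spread_tbl <- data.frame(
  alpha  = alpha,
  spread = apply(v, 1, function(r) max(r) - min(r)),  # absolute gap across mu
  ratio  = apply(v, 1, function(r) max(r) / min(r))   # relative gap across mu
)
spread_tbl
```

By this calculation the absolute spread falls from about 0.051 at \(\alpha = 2\) to about 0.008 at \(\alpha = 16\), while the ratio stays between roughly 13 and 16, so the curves converge in absolute terms but not in relative terms.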