5.7 - Simulating joint random variables

Motivating example

  • Suppose \(X \sim EXP(\lambda = 0.5)\), where \(\lambda\) is the rate, so \(E(X) = 1/\lambda = 2\)
  • \(Y|X=x \sim N(\sqrt{x}, 0.2^2)\)
  • Simulate the following:
    • Joint distribution of \((X,Y)\)
    • \(E(Y|X)\)
    • \(P(X\le 2, Y \le 1)\)
    • \(E(Y)\), \(Var(Y)\)
    • \(P(Y\le 1 | X \le 5)\)
    • \(Cov(X,Y)\) and \(Cor(X,Y)\)
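
The marginal distribution of \(X\) and the conditional distribution of \(Y|X=x\) together determine the joint density through the factorization \(f_{X,Y}(x,y) = f_X(x)\, f_{Y|X}(y|x)\). Written out, under the assumption that \(\lambda = 0.5\) is a rate (consistent with the rate argument passed to rexp() in the simulation), a sketch of that density is:

```latex
f_{X,Y}(x,y)
  = \underbrace{0.5\, e^{-0.5x}}_{f_X(x)}
    \cdot
    \underbrace{\frac{1}{0.2\sqrt{2\pi}}
      \exp\!\left(-\frac{(y-\sqrt{x})^2}{2\,(0.2)^2}\right)}_{f_{Y|X}(y|x)},
  \qquad x > 0,\ y \in \mathbb{R}
```

Drawing \(x\) from the first factor and then \(y\) from the second yields a draw from the joint distribution, which is the order the simulation follows.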

Simulating

library(tidyverse)

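N <- 10000  # number of simulated (x, y) pairs
# Draw X first, then draw each Y from its conditional distribution given X.
# rnorm() is vectorized over `mean`, so no per-row map is needed.
simstudy <- data.frame(x = rexp(N, rate = 0.5)) %>%
  mutate(y = rnorm(n(), mean = sqrt(x), sd = 0.2))
head(simstudy)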
           x         y
1 0.04370311 0.2977216
2 0.69911426 0.8605832
3 1.45362794 1.1177599
4 2.34028434 1.7301351
5 1.69402447 1.2406840
6 2.01676946 1.5679178
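
Before using the simulated pairs, it can help to confirm that each stage behaves as specified. The following is a minimal base-R sketch, not part of the original analysis: the seed value 5700 and the name sim_check are arbitrary choices made here for illustration, so its numbers will not match the output shown above.

```r
# Sketch: the same two-stage simulation with a fixed (arbitrary) seed.
set.seed(5700)
N <- 10000
x <- rexp(N, rate = 0.5)                  # stage 1: X ~ Exp(rate = 0.5)
y <- rnorm(N, mean = sqrt(x), sd = 0.2)   # stage 2: Y | X = x ~ N(sqrt(x), 0.2^2)
sim_check <- data.frame(x = x, y = y)     # hypothetical name, for illustration

# Marginal check on X: theory gives E(X) = 1/0.5 = 2 and Var(X) = 1/0.5^2 = 4
c(mean_x = mean(sim_check$x), var_x = var(sim_check$x))

# Conditional check: the residual y - sqrt(x) should look like N(0, 0.2^2)
resid <- sim_check$y - sqrt(sim_check$x)
c(mean_resid = mean(resid), sd_resid = sd(resid))
```

If the residual standard deviation is far from 0.2 or the mean of x is far from 2, the most likely culprit is confusing the rate and scale parameterizations of the exponential.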

Visualizing joint distribution

  • A scatterplot of the simulated pairs visualizes the joint distribution
  • Use small points and transparency (set alpha < 1) so that regions of high density stand out
ggplot(data = simstudy,aes(x = x, y = y)) + 
  geom_point(size = .2, alpha = .5)  + 
  theme_classic()

Joint distribution of X and Y
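
When transparency alone leaves dense regions saturated, binning the plane and coloring by count is one alternative way to show the joint density. The sketch below regenerates its own sample so that it runs independently; the seed, the name sim_plot, and the choice of 60 bins are assumptions made for illustration.

```r
library(ggplot2)

# Sketch: regenerate a sample so the block runs on its own (seed is arbitrary).
set.seed(5700)
x <- rexp(10000, rate = 0.5)
sim_plot <- data.frame(x = x, y = rnorm(10000, mean = sqrt(x), sd = 0.2))

# Count the points falling in each cell of a 60 x 60 grid and map count to fill
p_binned <- ggplot(sim_plot, aes(x = x, y = y)) +
  geom_bin_2d(bins = 60) +
  labs(fill = "count") +
  theme_classic()
p_binned
```

The scatterplot shows individual draws, while the binned version trades that detail for a clearer picture of where the mass concentrates.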

Simulated and analytic \(E(Y|X)\)

  • A loess smooth of the scatterplot estimates the conditional mean from the simulated data; add it with geom_smooth().
  • We can use geom_function() to overlay the analytic conditional mean, \(E(Y|X=x) = \sqrt{x}\).
# linewidth (ggplot2 >= 3.4.0) sets line thickness; size is deprecated for lines
ggplot(data = simstudy, aes(x = x, y = y)) + 
  geom_point(size = .2, alpha = .5) + 
  geom_smooth(method = 'loess', se = FALSE, 
              aes(color = 'Simulated E(Y|X)'),
              linewidth = .7) + 
  geom_function(fun = \(x) sqrt(x), 
                aes(color = 'Analytic E(Y|X)'),
                linewidth = .7) + 
  labs(color = '') + 
  theme_classic()

Joint distribution of X and Y with conditional means superimposed
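
The smooth gives a visual estimate of \(E(Y|X)\); a numeric version can be sketched by binning x and averaging y within each bin. This is an illustrative alternative, not the method used above: the bin width of 0.5, the cutoff at 6, the seed, and the names sim_bins and cond_means are all assumptions made here.

```r
library(dplyr)

# Sketch: regenerate a sample so the block runs on its own (seed is arbitrary).
set.seed(5700)
x <- rexp(10000, rate = 0.5)
sim_bins <- data.frame(x = x, y = rnorm(10000, mean = sqrt(x), sd = 0.2))

cond_means <- sim_bins %>%
  filter(x <= 6) %>%                                    # keep the well-populated range
  mutate(bin = cut(x, breaks = seq(0, 6, by = 0.5))) %>%
  group_by(bin) %>%
  summarize(n = n(),
            sim_mean = mean(y),          # simulated E(Y | X in bin)
            analytic = mean(sqrt(x)))    # sqrt(x) averaged over the same x values
cond_means
```

Within each bin the two columns should differ only by the average of the normal noise, which shrinks as the bin count n grows.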

Whole-sim aggregations

  • We can approximate the following with a summarize() step:
    • \(P(X\le 2, Y \le 1)\)
    • \(E(Y)\)
    • \(Var(Y)\)
(simstudy
    %>% summarize('P(X<=2, Y<=1)' = mean(x<=2 & y<=1),
                  'E(Y)' = mean(y),
                  'Var(Y)'= var(y)
                  )
 )
  P(X<=2, Y<=1)     E(Y)   Var(Y)
1        0.3978 1.255784 0.471138
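
As a check on the simulated \(E(Y)\) and \(Var(Y)\), the exact values follow from the laws of total expectation and total variance. The derivation below is a sketch that assumes \(\lambda = 0.5\) is the rate and uses the exponential moment \(E(X^{1/2}) = \Gamma(3/2)/\sqrt{\lambda}\):

```latex
\begin{aligned}
E(Y)   &= E\big(E(Y|X)\big) = E(\sqrt{X})
        = \frac{\Gamma(3/2)}{\sqrt{0.5}} = \sqrt{\pi/2} \approx 1.2533 \\[4pt]
Var(Y) &= E\big(Var(Y|X)\big) + Var\big(E(Y|X)\big)
        = 0.2^2 + \Big(E(X) - \big[E(\sqrt{X})\big]^2\Big) \\
       &= 0.04 + 2 - \pi/2 \approx 0.4692
\end{aligned}
```

Both agree with the simulated values of 1.2558 and 0.4711 to within the error expected from 10,000 draws.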

Filtered-sim aggregations

  • Conditional summaries require filtering to the condition, then summarizing: filter() %>% summarize().
  • For example, to approximate \(P(Y\le 1 | X \le 5)\), keep only the rows with \(x \le 5\) and take the proportion with \(y \le 1\):
(simstudy
    %>% filter(x<=5)
    %>% summarize('P(Y<=1 | X<=5)' = mean(y<=1)
                  )
 )
  P(Y<=1 | X<=5)
1      0.4348396
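
The filtered estimate can be compared with the exact conditional probability, \(P(Y\le 1 | X\le 5) = \int_0^5 f_X(x)\, P(Y \le 1 | X = x)\, dx \,/\, P(X \le 5)\), evaluated numerically. The sketch below uses base R's integrate(); the names joint_integrand, p_joint, p_x, and p_cond are hypothetical and introduced only for illustration.

```r
# Sketch: exact P(Y <= 1 | X <= 5) by one-dimensional numerical integration.
# Integrand: density of X at x, times P(Y <= 1 | X = x).
joint_integrand <- function(x) {
  dexp(x, rate = 0.5) * pnorm(1, mean = sqrt(x), sd = 0.2)
}

p_joint <- integrate(joint_integrand, lower = 0, upper = 5)$value  # P(Y <= 1, X <= 5)
p_x     <- pexp(5, rate = 0.5)                                     # P(X <= 5)
p_cond  <- p_joint / p_x
p_cond
```

The result should land near 0.43, close to the simulated proportion; the remaining gap is Monte Carlo error, which shrinks as N grows.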

Covariance and correlation

  • To find the covariance and correlation, apply the cov() and cor() functions to the whole data frame.
  • These return the covariance and correlation matrices
    • Diagonals: the marginal variances (covariance matrix), or 1, each variable's correlation with itself (correlation matrix)
    • Off-diagonals: the pairwise covariances or correlations
simstudy %>% 
  cov
         x        y
x 4.169904 1.281244
y 1.281244 0.471138
simstudy %>%
  cor
          x         y
x 1.0000000 0.9141021
y 0.9141021 1.0000000
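
The simulated covariance and correlation can also be checked analytically. Because the noise in \(Y\) is independent of \(X\), only the conditional mean \(\sqrt{X}\) contributes to the covariance. The sketch below assumes \(\lambda = 0.5\) is the rate and uses \(E(X^{3/2}) = \Gamma(5/2)/\lambda^{3/2} = 3\sqrt{\pi/2}\), together with \(Var(Y) = 0.04 + 2 - \pi/2\) from the law of total variance:

```latex
\begin{aligned}
Cov(X,Y) &= Cov\big(X, E(Y|X)\big) = E(X^{3/2}) - E(X)\,E(\sqrt{X}) \\
         &= 3\sqrt{\pi/2} - 2\sqrt{\pi/2} = \sqrt{\pi/2} \approx 1.2533 \\[4pt]
Cor(X,Y) &= \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}}
          = \frac{\sqrt{\pi/2}}{2\sqrt{2.04 - \pi/2}} \approx 0.915
\end{aligned}
```

These are consistent with the simulated off-diagonal entries, 1.2812 and 0.9141.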