5.1 - bivariate discrete distributions

Motivation

Consider the data set below, examples of which you have hopefully seen many times:

   age smoker married gestation  bwt
1   30      N married        38 3540
2   30      N married        39 3090
3   23      Y  single        34 2522
4   32      N married        38 3690
5   26      N married        41 3062
6   31      N married        40 3856
7   36      N married        41 4451
8   27      N  single        42 3685
9   22      Y  single        38 3118
10  34      N married        38 3152

Univariate relationships for birth weight data.

  • This is an example of multivariate data
    • \(Y_1\): Normal?
    • \(Y_2\), \(Y_3\): Bernoulli?
    • \(Y_4\), \(Y_5\): Inverted gamma?
  • In this example, each observation is a length\(-p\) vector, with \(p=5\)

Multivariate relationships

  • The “one-by-one” distributions are marginal distributions
  • In statistics of course, we usually care not just about univariate distributions, but about multivariate distributions
Bivariate relationships for birth weight data.
  • These multivariate distributions are often called joint distributions

Multivariate discrete

  • Let \(\{Y_1,...,Y_p\}\in \mathbb{R}^p\) be a \(p-\)vector of discrete random variables.

  • Then the joint probability mass function for \(\{Y_1,...,Y_p\}\) is:

\[p(y_1,y_2,...,y_p) = P(Y_1 = y_1, Y_2 = y_2,...,Y_p = y_p), y_1 \in \mathbb{R}, ..., y_p \in \mathbb{R}\]

Bivariate discrete

  • To study mathematical properties we will for the most part stay simple, and let \(p=2\)
  • We will use \(X \equiv Y_1\) and \(Y\equiv Y_2\), for simplicity of notation.

Joint pmf properties (bivariate case):

  1. \(p(x,y) \geq 0\) for any \(x,y \in \mathbb{R}\).
  2. \(\sum_{x}\sum_{y}p(x,y) = 1\) (sum is over all \(x,y\) values such that \(p(x,y) > 0\).)

Joint CDF

  • The joint cumulative distribution function (CDF) is defined as a straightforward extension of the univariate case.
  • For any random variables \(X\) and \(Y\), the joint bivariate distribution function \(F\) is:

\[F(x,y) = P(X\leq x, Y \leq y); x,y \in \mathbb{R}\]

Properties:

  1. \(\lim_{x\rightarrow -\infty}F(x,y) =\lim_{y\rightarrow -\infty}F(x,y) = 0\)
  2. \(\lim_{x\rightarrow \infty}\lim_{y\rightarrow \infty}F(x,y) =1\)
  3. Bivariate non-decreasing property (cumulative probability over a rectangular region is \(\geq 0\)): If \(x < x^*\) and \(y < y^*\), then \(F(x^*, y^*) - F(x^*,y)-F(x,y^*)+F(x,y)= P(x < X \leq x^*, y < Y \leq y^*) \geq 0\)

Visual of Property 3

Bivariate non-decreasing property: If \(x < x^*\) and \(y < y^*\), then \(F(x^*, y^*) - F(x^*,y)-F(x,y^*)+F(x,y)= P(x < X \leq x^*, y < Y \leq y^*) \geq 0\)

image decomposing cumulative probability over a rectangular region

Marginal

If \(X\) and \(Y\) have joint pmf \(p(x,y)\), then the marginal (aka unconditional) pmfs are given by:

\[p_X(x) = P(X=x) = \sum_{all\ y} p(x,y)\]

\[p_Y(y) = P(Y=y) = \sum_{all\ x} p(x,y)\]

  • It’s the subscript on these functions that’s important!
  • \(p_X(y)\) makes sense!
    • Just “the marginal probability that \(X\) takes on specific value \(y\)
    • Plug value \(y\) into function \(p_X\).

Conditional

  • If \(X\) and \(Y\) have joint pmf \(p(x,y)\) and marginal pmfs \(p_X\) and \(p_Y\), respectively, then the conditional discrete probability function is:

\[p_{X|Y}(x|y) = P(X=x|Y=y) = \frac{p(x,y)}{p_Y(y)} =\frac{P(X=x, Y=y)}{P(Y=y)},\] we analogously define \(p_{Y|X}(y|x) = p(x,y)/p_X(x)\).

  • Note from here we can find a joint by combining conditional and marginal:

\[p(x,y) = p_{X|Y}(x|y) p_Y(y)\] \[p(x,y) = p_{Y|X}(y|x) p_X(x)\]

Example

Suppose \(X\) and \(Y\) have joint pmf given by:

\[p(x,y) = \left\{\begin{array} {ll} \frac{1}{30}(x+y) & x=0,1,2; y=0,1,2,3 \\ 0 & otherwise\\ \end{array}\right.\]

3D bar graph representing bivariate discrete distribution

y x = 0 x = 1 x = 2
0 0/30 1/30 2/30
1 1/30 2/30 3/30
2 2/30 3/30 4/30
3 3/30 4/30 5/30

Verify that this is a valid joint pmf

3D bar graph representing bivariate discrete distribution

\[\sum_{x=0}^2 \sum_{y=0}^3\frac{1}{30}(x+y)\] \[= \sum_{x=0}^2 \left(\frac{x+0}{30} + \frac{x+1}{30} + \frac{x+2}{30} + \frac{x+3}{30}\right)\]

\[=\left(\frac{0+0}{30} + \frac{0+1}{30} + \frac{0+2}{30} + \frac{0+3}{30}\right)\] \[+\left(\frac{1+0}{30} + \frac{1+1}{30} + \frac{1+2}{30} + \frac{1+3}{30}\right)\] \[+\left(\frac{2+0}{30} + \frac{2+1}{30} + \frac{2+2}{30} + \frac{2+3}{30}\right)\]

Joint CDF

  • \(F(1,1) = \frac{0+0}{30} + \frac{1+0}{30} + \frac{0+1}{30} + \frac{1+1}{30}\)

  • \(F(-1,4) = 0\)

3D bar graph representing bivariate discrete distribution with bars color coded by whether or not they are included in the cumulative probability

  • \(F(4,4) = 1\)

3D bar graph representing bivariate discrete distribution with bars color coded by whether or not they are included in the cumulative probability

Marginal of \(X\)

\[p_X(0) = \sum_{y=0}^3 p(0,y) =\left(\frac{0+0}{30} + \frac{0+1}{30} + \frac{0+2}{30} + \frac{0+3}{30}\right) =\frac{6}{30}\] \[p_X(1) = \sum_{y=0}^3 p(1,y) =\left(\frac{1+0}{30} + \frac{1+1}{30} + \frac{1+2}{30} + \frac{1+3}{30}\right) =\frac{10}{30}\] \[p_X(2) = \sum_{y=0}^3 p(2,y) =\left(\frac{2+0}{30} + \frac{2+1}{30} + \frac{2+2}{30} + \frac{2+3}{30}\right) =\frac{14}{30}\]

\(x\) 0 1 2
\(P(X=x)\) 6/30 10/30 14/30

Conditional \(p_{Y|X}\)

  • \(p_X(x)\):
\(x\) 0 1 2
\(P(X=x)\) 6/30 10/30 14/30
  • \(p(x,y)\):
y x = 0 x = 1 x = 2
0 0/30 1/30 2/30
1 1/30 2/30 3/30
2 2/30 3/30 4/30
3 3/30 4/30 5/30
  • \(p_{Y|0} = P(Y=y|X=0)\):
y \(P(Y=y|X=0)\)
0 0/30/(6/30) = 0/6
1 1/30/(6/30) = 1/6
2 2/30/(6/30) = 2/6
3 3/30/(6/30) = 3/6
  • \(p_{Y|1} = P(Y=y|X=1)\):
y \(P(Y=y|X=1)\)
0 1/30/(10/30) = 1/10
1 2/30/(10/30) = 2/10
2 3/30/(10/30) = 3/10
3 4/30/(10/30) = 4/10

Marginal CDFs from joint CDF

If \(F(x,y)\) is a joint CDF, then:

\[F_X(x) = P(X\leq x) =\lim_{y\rightarrow \infty} F(x,y)\] \[F_Y(y) = P(Y\leq y)= \lim_{x\rightarrow \infty} F(x,y)\]

Should make sense: \(P(X\leq x) = P(X \leq x, Y<\infty)\)

Marginal CDFs for example

y x = 0 x = 1 x = 2
0 0/30 1/30 2/30
1 1/30 2/30 3/30
2 2/30 3/30 4/30
3 3/30 4/30 5/30

\[F_X(0) = \frac{0+1+2+3}{30} = \frac{6}{30}\]

\[F_X(1) = \frac{6}{30}+\frac{1+2+3+4}{30} = \frac{16}{30}\]

\[F_X(2) = \frac{16}{30}+\frac{2+3+4+5}{30} = 1\]

\[F_X(x)= \begin{cases} 0& x < 0 \\ \frac{6}{30} & 0 \leq x < 1\\ \frac{16}{30} & 1 \leq x < 2\\ 1 & x \geq 2 \end{cases}\]

\[F_Y(y)= \begin{cases} 0& y < 0 \\ \frac{3}{30} & 0 \leq y < 1\\ \frac{9}{30} & 1 \leq y < 2\\ \frac{18}{30} & 2 \leq y < 3\\ 1 & y \geq 3 \end{cases}\]

Dart example

  • A bag has 6 boxes.
    • Three boxes have 3 darts,
    • Two of them have 4 darts,
    • One box has 5 darts.
  • A player picks a box at random, then shoot all the darts in the box at a target.
  • Suppose the player can hit the target 60% of the time.
  • Let \(X\) be the number of darts in the box he picks, and \(Y\) the number of times the player hits the target.
  • Find the joint pmf and conditional \(p_{X|Y}\).

Joint pmf

Note we have a marginal and a conditional:

\[p_X(x) = \frac{6-x}{6}, x = 3, 4, 5\]

\[Y|X=x \sim BIN(x, 0.6)\]

Getting R to work for us:

library(purrrfect)
library(tidyverse)

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
)
# A tibble: 18 × 2
       x     y
   <dbl> <int>
 1     3     0
 2     3     1
 3     3     2
 4     3     3
 5     3     4
 6     3     5
 7     4     0
 8     4     1
 9     4     2
10     4     3
11     4     4
12     4     5
13     5     0
14     5     1
15     5     2
16     5     3
17     5     4
18     5     5

Joint pmf

Note we have a marginal and a conditional:

\[p_X(x) = \frac{6-x}{6}, x = 3, 4, 5\]

\[Y|X=x \sim BIN(x, 0.6)\]

Getting R to work for us:

library(purrrfect)
library(tidyverse)

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
)
# A tibble: 18 × 5
       x     y   p_X p_Y_given_X    p_XY
   <dbl> <int> <dbl>       <dbl>   <dbl>
 1     3     0 0.5        0.064  0.032  
 2     3     1 0.5        0.288  0.144  
 3     3     2 0.5        0.432  0.216  
 4     3     3 0.5        0.216  0.108  
 5     3     4 0.5        0      0      
 6     3     5 0.5        0      0      
 7     4     0 0.333      0.0256 0.00853
 8     4     1 0.333      0.154  0.0512 
 9     4     2 0.333      0.346  0.115  
10     4     3 0.333      0.346  0.115  
11     4     4 0.333      0.130  0.0432 
12     4     5 0.333      0      0      
13     5     0 0.167      0.0102 0.00171
14     5     1 0.167      0.0768 0.0128 
15     5     2 0.167      0.230  0.0384 
16     5     3 0.167      0.346  0.0576 
17     5     4 0.167      0.259  0.0432 
18     5     5 0.167      0.0778 0.0130 

Joint pmf

Note we have a marginal and a conditional:

\[p_X(x) = \frac{6-x}{6}, x = 3, 4, 5\]

\[Y|X=x \sim BIN(x, 0.6)\]

Getting R to work for us:

library(purrrfect)
library(tidyverse)

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
  %>% pivot_wider(names_from = y, 
                  values_from = p_XY, 
                  names_prefix = 'y=',
                  id_cols = x)
)
# A tibble: 3 × 7
      x   `y=0`  `y=1`  `y=2`  `y=3`  `y=4`  `y=5`
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     3 0.032   0.144  0.216  0.108  0      0     
2     4 0.00853 0.0512 0.115  0.115  0.0432 0     
3     5 0.00171 0.0128 0.0384 0.0576 0.0432 0.0130

Marginal \(p_Y\) and conditional \(p_{X|Y}\)

\[p_Y(y) = \sum_{x=3}^5 p(x,y)\]

\[p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\]

Conditional must sum to 1 for fixed \(y\):

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
  %>% mutate(p_Y = sum(p_XY), .by = y)
  %>% mutate(p_X_given_Y = p_XY/p_Y)
)
# A tibble: 18 × 7
       x     y   p_X p_Y_given_X    p_XY    p_Y p_X_given_Y
   <dbl> <int> <dbl>       <dbl>   <dbl>  <dbl>       <dbl>
 1     3     0 0.5        0.064  0.032   0.0422      0.758 
 2     3     1 0.5        0.288  0.144   0.208       0.692 
 3     3     2 0.5        0.432  0.216   0.370       0.584 
 4     3     3 0.5        0.216  0.108   0.281       0.385 
 5     3     4 0.5        0      0       0.0864      0     
 6     3     5 0.5        0      0       0.0130      0     
 7     4     0 0.333      0.0256 0.00853 0.0422      0.202 
 8     4     1 0.333      0.154  0.0512  0.208       0.246 
 9     4     2 0.333      0.346  0.115   0.370       0.312 
10     4     3 0.333      0.346  0.115   0.281       0.410 
11     4     4 0.333      0.130  0.0432  0.0864      0.500 
12     4     5 0.333      0      0       0.0130      0     
13     5     0 0.167      0.0102 0.00171 0.0422      0.0404
14     5     1 0.167      0.0768 0.0128  0.208       0.0615
15     5     2 0.167      0.230  0.0384  0.370       0.104 
16     5     3 0.167      0.346  0.0576  0.281       0.205 
17     5     4 0.167      0.259  0.0432  0.0864      0.500 
18     5     5 0.167      0.0778 0.0130  0.0130      1     

Marginal \(p_Y\) and conditional \(p_{X|Y}\)

\[p_Y(y) = \sum_{x=3}^5 p(x,y)\]

\[p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\]

Conditional must sum to 1 for fixed \(y\):

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
  %>% mutate(p_Y = sum(p_XY), .by = y)
  %>% mutate(p_X_given_Y = p_XY/p_Y)
  %>% pivot_wider(names_from = x, 
                  values_from = p_X_given_Y, 
                  names_prefix = 'x=',
                  id_cols = y)
)
# A tibble: 6 × 4
      y `x=3` `x=4`  `x=5`
  <int> <dbl> <dbl>  <dbl>
1     0 0.758 0.202 0.0404
2     1 0.692 0.246 0.0615
3     2 0.584 0.312 0.104 
4     3 0.385 0.410 0.205 
5     4 0     0.500 0.500 
6     5 0     0     1     

Simulating

  • Simulate bivariate random variables often involves first simulating one from the marginal, then the other from the conditional.
  • Here, simulate \(X\), then simulate \(Y|X=x\).
N <- 10000
bag <- c(3,3,3,4,4,5)
one_draw <- \() sample(bag, 1)
(dart_sim <- replicate_int(N, one_draw(), .as = X)
  %>% mutate(Y = map_int(X, \(x) rbinom(1, size = x, prob = 0.6)))
  ) %>% head
# A tibble: 6 × 3
  .trial     X     Y
   <dbl> <int> <int>
1      1     3     2
2      2     3     3
3      3     3     2
4      4     5     3
5      5     3     1
6      6     3     2
(dart_sim 
    %>% summarize(cnt = n(), 
                  .by = c(X,Y)) 
    %>% mutate(joint_prob = cnt/N,
               prob_X_given_Y = cnt/sum(cnt), .by  = Y)
)
# A tibble: 15 × 5
       X     Y   cnt joint_prob prob_X_given_Y
   <int> <int> <int>      <dbl>          <dbl>
 1     3     2  2068     0.207          0.576 
 2     3     3  1092     0.109          0.386 
 3     5     3   563     0.0563         0.199 
 4     3     1  1445     0.144          0.685 
 5     5     2   376     0.0376         0.105 
 6     4     3  1176     0.118          0.415 
 7     4     1   537     0.0537         0.255 
 8     4     2  1145     0.114          0.319 
 9     5     1   128     0.0128         0.0607
10     4     0    86     0.0086         0.199 
11     3     0   332     0.0332         0.769 
12     5     5   145     0.0145         1     
13     4     4   451     0.0451         0.505 
14     5     4   442     0.0442         0.495 
15     5     0    14     0.0014         0.0324

Plot of the simulated joint distribution

ggplot(data = dart_sim) + 
  geom_jitter(aes(x = X, y = Y), width=.1, height=.3, size = .4, alpha = .8) + 
  scale_x_continuous(breaks = 3:5) + 
  scale_y_continuous(breaks = 0:5) + 
  theme_classic(base_size = 20)
plot of simulated joint discrete distribution for dart-throwing example