5.1 - bivariate discrete distributions

Motivation

Consider the data set below, examples of which you have hopefully seen many times:

   age smoker married gestation  bwt
1   30      N married        38 3540
2   30      N married        39 3090
3   23      Y  single        34 2522
4   32      N married        38 3690
5   26      N married        41 3062
6   31      N married        40 3856
7   36      N married        41 4451
8   27      N  single        42 3685
9   22      Y  single        38 3118
10  34      N married        38 3152

Univariate relationships for birth weight data.

This is an example of multivariate data
- \(Y_1\): Normal?
- \(Y_2\), \(Y_3\): Bernoulli?
- \(Y_4\), \(Y_5\): Inverted gamma?
In this example, each observation is a length\(-p\) vector, with \(p=5\)

Multivariate relationships

The “one-by-one” distributions are marginal distributions
In statistics of course, we usually care not just about univariate distributions, but about multivariate distributions

Bivariate relationships for birth weight data.

These multivariate distributions are often called joint distributions

Multivariate discrete

Let \(\{Y_1,...,Y_p\}\in \mathbb{R}^p\) be a \(p-\)vector of discrete random variables.
Then the joint probability mass function for \(\{Y_1,...,Y_p\}\) is:

\[p(y_1,y_2,...,y_p) = P(Y_1 = y_1, Y_2 = y_2,...,Y_p = y_p), y_1 \in \mathbb{R}, ..., y_p \in \mathbb{R}\]

Bivariate discrete

To study mathematical properties we will for the most part stay simple, and let \(p=2\)
We will use \(X \equiv Y_1\) and \(Y\equiv Y_2\), for simplicity of notation.

Joint pmf properties (bivariate case):

\(p(x,y) \geq 0\) for any \(x,y \in \mathbb{R}\).
\(\sum_{x}\sum_{y}p(x,y) = 1\) (sum is over all \(x,y\) values such that \(p(x,y) > 0\).)

Joint CDF

The joint cumulative distribution function (CDF) is defined as a straightforward extension of the univariate case.
For any random variables \(X\) and \(Y\), the joint bivariate distribution function \(F\) is:

\[F(x,y) = P(X\leq x, Y \leq y); x,y \in \mathbb{R}\]

Properties:

\(\lim_{x\rightarrow -\infty}F(x,y) =\lim_{y\rightarrow -\infty}F(x,y) = 0\)
\(\lim_{x\rightarrow \infty}\lim_{y\rightarrow \infty}F(x,y) =1\)
Bivariate non-decreasing property (cumulative probability over a rectangular region is \(\geq 0\)): If \(x < x^*\) and \(y < y^*\), then \(F(x^*, y^*) - F(x^*,y)-F(x,y^*)+F(x,y)= P(x < X \leq x^*, y < Y \leq y^*) \geq 0\)

Visual of Property 3

Bivariate non-decreasing property: If \(x < x^*\) and \(y < y^*\), then \(F(x^*, y^*) - F(x^*,y)-F(x,y^*)+F(x,y)= P(x < X \leq x^*, y < Y \leq y^*) \geq 0\)

image decomposing cumulative probability over a rectangular region

Marginal

If \(X\) and \(Y\) have joint pmf \(p(x,y)\), then the marginal (aka unconditional) pmfs are given by:

\[p_X(x) = P(X=x) = \sum_{all\ y} p(x,y)\]

\[p_Y(y) = P(Y=y) = \sum_{all\ x} p(x,y)\]

It’s the subscript on these functions that’s important!
\(p_X(y)\) makes sense!
- Just “the marginal probability that \(X\) takes on specific value \(y\)”
- Plug value \(y\) into function \(p_X\).

Conditional

If \(X\) and \(Y\) have joint pmf \(p(x,y)\) and marginal pmfs \(p_X\) and \(p_Y\), respectively, then the conditional discrete probability function is:

\[p_{X|Y}(x|y) = P(X=x|Y=y) = \frac{p(x,y)}{p_Y(y)} =\frac{P(X=x, Y=y)}{P(Y=y)},\] we analogously define \(p_{Y|X}(y|x) = p(x,y)/p_X(x)\).

Note from here we can find a joint by combining conditional and marginal:

\[p(x,y) = p_{X|Y}(x|y) p_Y(y)\] \[p(x,y) = p_{Y|X}(y|x) p_X(x)\]

Example

Suppose \(X\) and \(Y\) have joint pmf given by:

\[p(x,y) = \left\{\begin{array} {ll} \frac{1}{30}(x+y) & x=0,1,2; y=0,1,2,3 \\ 0 & otherwise\\ \end{array}\right.\]

3D bar graph representing bivariate discrete distribution

y	x = 0	x = 1	x = 2
0	0/30	1/30	2/30
1	1/30	2/30	3/30
2	2/30	3/30	4/30
3	3/30	4/30	5/30

Verify that this is a valid joint pmf

\[\sum_{x=0}^2 \sum_{y=0}^3\frac{1}{30}(x+y)\] \[= \sum_{x=0}^2 \left(\frac{x+0}{30} + \frac{x+1}{30} + \frac{x+2}{30} + \frac{x+3}{30}\right)\]

\[=\left(\frac{0+0}{30} + \frac{0+1}{30} + \frac{0+2}{30} + \frac{0+3}{30}\right)\] \[+\left(\frac{1+0}{30} + \frac{1+1}{30} + \frac{1+2}{30} + \frac{1+3}{30}\right)\] \[+\left(\frac{2+0}{30} + \frac{2+1}{30} + \frac{2+2}{30} + \frac{2+3}{30}\right)\]

Joint CDF

\(F(1,1) = \frac{0+0}{30} + \frac{1+0}{30} + \frac{0+1}{30} + \frac{1+1}{30}\)

\(F(-1,4) = 0\)

3D bar graph representing bivariate discrete distribution with bars color coded by whether or not they are included in the cumulative probability

\(F(4,4) = 1\)

Marginal of \(X\)

\[p_X(0) = \sum_{y=0}^3 p(0,y) =\left(\frac{0+0}{30} + \frac{0+1}{30} + \frac{0+2}{30} + \frac{0+3}{30}\right) =\frac{6}{30}\] \[p_X(1) = \sum_{y=0}^3 p(1,y) =\left(\frac{1+0}{30} + \frac{1+1}{30} + \frac{1+2}{30} + \frac{1+3}{30}\right) =\frac{10}{30}\] \[p_X(2) = \sum_{y=0}^3 p(2,y) =\left(\frac{2+0}{30} + \frac{2+1}{30} + \frac{2+2}{30} + \frac{2+3}{30}\right) =\frac{14}{30}\]

\(x\)	0	1	2
\(P(X=x)\)	6/30	10/30	14/30

Conditional \(p_{Y|X}\)

\(p_X(x)\):

\(x\)	0	1	2
\(P(X=x)\)	6/30	10/30	14/30

\(p(x,y)\):

y	x = 0	x = 1	x = 2
0	0/30	1/30	2/30
1	1/30	2/30	3/30
2	2/30	3/30	4/30
3	3/30	4/30	5/30

\(p_{Y|0} = P(Y=y|X=0)\):

y	\(P(Y=y\|X=0)\)
0	0/30/(6/30) = 0/6
1	1/30/(6/30) = 1/6
2	2/30/(6/30) = 2/6
3	3/30/(6/30) = 3/6

\(p_{Y|1} = P(Y=y|X=1)\):

y	\(P(Y=y\|X=1)\)
0	1/30/(10/30) = 1/10
1	2/30/(10/30) = 2/10
2	3/30/(10/30) = 3/10
3	4/30/(10/30) = 4/10

Marginal CDFs from joint CDF

If \(F(x,y)\) is a joint CDF, then:

\[F_X(x) = P(X\leq x) =\lim_{y\rightarrow \infty} F(x,y)\] \[F_Y(y) = P(Y\leq y)= \lim_{x\rightarrow \infty} F(x,y)\]

Should make sense: \(P(X\leq x) = P(X \leq x, Y<\infty)\)

Marginal CDFs for example

y	x = 0	x = 1	x = 2
0	0/30	1/30	2/30
1	1/30	2/30	3/30
2	2/30	3/30	4/30
3	3/30	4/30	5/30

\[F_X(0) = \frac{0+1+2+3}{30} = \frac{6}{30}\]

\[F_X(1) = \frac{6}{30}+\frac{1+2+3+4}{30} = \frac{16}{30}\]

\[F_X(2) = \frac{16}{30}+\frac{2+3+4+5}{30} = 1\]

\[F_X(x)= \begin{cases} 0& x < 0 \\ \frac{6}{30} & 0 \leq x < 1\\ \frac{16}{30} & 1 \leq x < 2\\ 1 & x \geq 2 \end{cases}\]

\[F_Y(y)= \begin{cases} 0& y < 0 \\ \frac{3}{30} & 0 \leq y < 1\\ \frac{9}{30} & 1 \leq y < 2\\ \frac{18}{30} & 2 \leq y < 3\\ 1 & y \geq 3 \end{cases}\]

Dart example

A bag has 6 boxes.
- Three boxes have 3 darts,
- Two of them have 4 darts,
- One box has 5 darts.
A player picks a box at random, then shoot all the darts in the box at a target.
Suppose the player can hit the target 60% of the time.
Let \(X\) be the number of darts in the box he picks, and \(Y\) the number of times the player hits the target.
Find the joint pmf and conditional \(p_{X|Y}\).

Joint pmf

Note we have a marginal and a conditional:

\[p_X(x) = \frac{6-x}{6}, x = 3, 4, 5\]

\[Y|X=x \sim BIN(x, 0.6)\]

Getting R to work for us:

library(purrrfect)
library(tidyverse)

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
)

# A tibble: 18 × 2
       x     y
   <dbl> <int>
 1     3     0
 2     3     1
 3     3     2
 4     3     3
 5     3     4
 6     3     5
 7     4     0
 8     4     1
 9     4     2
10     4     3
11     4     4
12     4     5
13     5     0
14     5     1
15     5     2
16     5     3
17     5     4
18     5     5

Joint pmf

Note we have a marginal and a conditional:

\[p_X(x) = \frac{6-x}{6}, x = 3, 4, 5\]

\[Y|X=x \sim BIN(x, 0.6)\]

Getting R to work for us:

library(purrrfect)
library(tidyverse)

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
)

# A tibble: 18 × 5
       x     y   p_X p_Y_given_X    p_XY
   <dbl> <int> <dbl>       <dbl>   <dbl>
 1     3     0 0.5        0.064  0.032  
 2     3     1 0.5        0.288  0.144  
 3     3     2 0.5        0.432  0.216  
 4     3     3 0.5        0.216  0.108  
 5     3     4 0.5        0      0      
 6     3     5 0.5        0      0      
 7     4     0 0.333      0.0256 0.00853
 8     4     1 0.333      0.154  0.0512 
 9     4     2 0.333      0.346  0.115  
10     4     3 0.333      0.346  0.115  
11     4     4 0.333      0.130  0.0432 
12     4     5 0.333      0      0      
13     5     0 0.167      0.0102 0.00171
14     5     1 0.167      0.0768 0.0128 
15     5     2 0.167      0.230  0.0384 
16     5     3 0.167      0.346  0.0576 
17     5     4 0.167      0.259  0.0432 
18     5     5 0.167      0.0778 0.0130

Joint pmf

Note we have a marginal and a conditional:

\[p_X(x) = \frac{6-x}{6}, x = 3, 4, 5\]

\[Y|X=x \sim BIN(x, 0.6)\]

Getting R to work for us:

library(purrrfect)
library(tidyverse)

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
  %>% pivot_wider(names_from = y, 
                  values_from = p_XY, 
                  names_prefix = 'y=',
                  id_cols = x)
)

# A tibble: 3 × 7
      x   `y=0`  `y=1`  `y=2`  `y=3`  `y=4`  `y=5`
  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     3 0.032   0.144  0.216  0.108  0      0     
2     4 0.00853 0.0512 0.115  0.115  0.0432 0     
3     5 0.00171 0.0128 0.0384 0.0576 0.0432 0.0130

Marginal \(p_Y\) and conditional \(p_{X|Y}\)

\[p_Y(y) = \sum_{x=3}^5 p(x,y)\]

\[p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\]

Conditional must sum to 1 for fixed \(y\):

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
  %>% mutate(p_Y = sum(p_XY), .by = y)
  %>% mutate(p_X_given_Y = p_XY/p_Y)
)

# A tibble: 18 × 7
       x     y   p_X p_Y_given_X    p_XY    p_Y p_X_given_Y
   <dbl> <int> <dbl>       <dbl>   <dbl>  <dbl>       <dbl>
 1     3     0 0.5        0.064  0.032   0.0422      0.758 
 2     3     1 0.5        0.288  0.144   0.208       0.692 
 3     3     2 0.5        0.432  0.216   0.370       0.584 
 4     3     3 0.5        0.216  0.108   0.281       0.385 
 5     3     4 0.5        0      0       0.0864      0     
 6     3     5 0.5        0      0       0.0130      0     
 7     4     0 0.333      0.0256 0.00853 0.0422      0.202 
 8     4     1 0.333      0.154  0.0512  0.208       0.246 
 9     4     2 0.333      0.346  0.115   0.370       0.312 
10     4     3 0.333      0.346  0.115   0.281       0.410 
11     4     4 0.333      0.130  0.0432  0.0864      0.500 
12     4     5 0.333      0      0       0.0130      0     
13     5     0 0.167      0.0102 0.00171 0.0422      0.0404
14     5     1 0.167      0.0768 0.0128  0.208       0.0615
15     5     2 0.167      0.230  0.0384  0.370       0.104 
16     5     3 0.167      0.346  0.0576  0.281       0.205 
17     5     4 0.167      0.259  0.0432  0.0864      0.500 
18     5     5 0.167      0.0778 0.0130  0.0130      1

Marginal \(p_Y\) and conditional \(p_{X|Y}\)

\[p_Y(y) = \sum_{x=3}^5 p(x,y)\]

\[p_{X|Y}(x|y) = \frac{p(x,y)}{p_Y(y)}\]

Conditional must sum to 1 for fixed \(y\):

(parameters(~x, ~y, 
           c(3,4,5), 0:5
           ) 
  %>% mutate(p_X = (6-x)/6, 
             p_Y_given_X = dbinom(y, x, 0.6),
             p_XY = p_X*p_Y_given_X
             )
  %>% mutate(p_Y = sum(p_XY), .by = y)
  %>% mutate(p_X_given_Y = p_XY/p_Y)
  %>% pivot_wider(names_from = x, 
                  values_from = p_X_given_Y, 
                  names_prefix = 'x=',
                  id_cols = y)
)

# A tibble: 6 × 4
      y `x=3` `x=4`  `x=5`
  <int> <dbl> <dbl>  <dbl>
1     0 0.758 0.202 0.0404
2     1 0.692 0.246 0.0615
3     2 0.584 0.312 0.104 
4     3 0.385 0.410 0.205 
5     4 0     0.500 0.500 
6     5 0     0     1

Simulating

Simulate bivariate random variables often involves first simulating one from the marginal, then the other from the conditional.
Here, simulate \(X\), then simulate \(Y|X=x\).

N <- 10000
bag <- c(3,3,3,4,4,5)
one_draw <- \() sample(bag, 1)
(dart_sim <- replicate_int(N, one_draw(), .as = X)
  %>% mutate(Y = map_int(X, \(x) rbinom(1, size = x, prob = 0.6)))
  ) %>% head

# A tibble: 6 × 3
  .trial     X     Y
   <dbl> <int> <int>
1      1     3     2
2      2     3     3
3      3     3     2
4      4     5     3
5      5     3     1
6      6     3     2

(dart_sim 
    %>% summarize(cnt = n(), 
                  .by = c(X,Y)) 
    %>% mutate(joint_prob = cnt/N,
               prob_X_given_Y = cnt/sum(cnt), .by  = Y)
)

# A tibble: 15 × 5
       X     Y   cnt joint_prob prob_X_given_Y
   <int> <int> <int>      <dbl>          <dbl>
 1     3     2  2068     0.207          0.576 
 2     3     3  1092     0.109          0.386 
 3     5     3   563     0.0563         0.199 
 4     3     1  1445     0.144          0.685 
 5     5     2   376     0.0376         0.105 
 6     4     3  1176     0.118          0.415 
 7     4     1   537     0.0537         0.255 
 8     4     2  1145     0.114          0.319 
 9     5     1   128     0.0128         0.0607
10     4     0    86     0.0086         0.199 
11     3     0   332     0.0332         0.769 
12     5     5   145     0.0145         1     
13     4     4   451     0.0451         0.505 
14     5     4   442     0.0442         0.495 
15     5     0    14     0.0014         0.0324

Plot of the simulated joint distribution

ggplot(data = dart_sim) + 
  geom_jitter(aes(x = X, y = Y), width=.1, height=.3, size = .4, alpha = .8) + 
  scale_x_continuous(breaks = 3:5) + 
  scale_y_continuous(breaks = 0:5) + 
  theme_classic(base_size = 20)

plot of simulated joint discrete distribution for dart-throwing example