6.1 - CDF method

Where we’re headed

  • The next several lectures focus on methods for finding distributions of transformations, or functions, of random variables.
  • The methods are:
    • The CDF method
    • The pdf method (offshoot of CDF method)
    • The MGF method
    • The Jacobian method for \(2\rightarrow 2\) transformations

CDF method overview

  • Let \(Y\) be a continuous random variable with pdf \(f_Y(y)\).
  • Goal: find the distribution of \(U = g(Y)\).
  • Approach:
    • Identify support of \(U\)
    • Write \(F_U(u) = P(U \le u) = P(g(Y)\le u)\)
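    • Rewrite the event \(\{g(Y)\le u\}\) as an event about \(Y\), then evaluate its probability using \(F_Y\) (or by integrating \(f_Y\)) to get \(F_U(u)\)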
    • Differentiate \(F_U(u)\) to find \(f_U(u)\)
  • Can apply CDF method for both \(1 \rightarrow 1\) and \(2 \rightarrow 1\) transformations.
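
The approach can be sanity-checked numerically. The sketch below is only an illustration: the helper name `cdf_method_check`, its arguments, and the toy case \(Y \sim UNIF(0,1)\), \(U = 3Y\) are assumptions, not part of the lecture code. It compares a candidate \(F_U\) with the empirical CDF of simulated values of \(U\).

```r
# Hypothetical helper (not from the lecture): compare a candidate CDF for
# U = g(Y) with the empirical CDF of simulated values of U.
cdf_method_check <- function(r_y, g, F_U, n = 1e5) {
  u <- g(r_y(n))                                   # simulate U = g(Y)
  grid <- quantile(u, probs = seq(0.05, 0.95, by = 0.05))
  max(abs(ecdf(u)(grid) - F_U(grid)))              # largest gap on the grid
}

# Assumed toy case: Y ~ UNIF(0,1) and U = 3Y, so
# F_U(u) = P(3Y <= u) = P(Y <= u/3) = u/3 for 0 <= u <= 3.
set.seed(1)
gap <- cdf_method_check(r_y = function(n) runif(n),
                        g   = function(y) 3 * y,
                        F_U = function(u) u / 3)
gap  # a value near 0 suggests the candidate CDF is plausible
```

A small gap does not prove the derivation, but a large one usually points to an error in the support or in the inequality manipulation.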

Example: linear transformation of a beta

Let \(Y \sim BETA(1,2)\), thus:

\[f_Y(y) = \left\{\begin{array} {ll} 2(1-y) & 0\leq y \leq 1 \\ 0 & otherwise\\ \end{array}\right.\]

Note, for \(y\in (0,1)\):

\[P(Y\le y)= \int_0^y 2(1-t)\, dt = 2y-y^2 \Rightarrow F_Y(y) = \begin{cases} 0 & y < 0 \\ 2y-y^2 & 0 \le y \le 1 \\ 1 & y > 1 \end{cases}\]

Let \(U = 2Y-1\). Find distribution of \(U\).

Support and CDF of \(U\)

  • Support: \(-1\le U \le 1\)
  • CDF, for \(-1 \le u \le 1\):

\[F_U(u) = P(U\le u) = P(2Y-1 \le u) = P\left(Y\le \frac{u+1}{2}\right)\]

\[ = F_Y\left(\frac{u+1}{2}\right) = 2\cdot \left(\frac{u+1}{2}\right)-\left(\frac{u+1}{2}\right)^2=\frac{3+2u-u^2}{4}\]

Full CDF and pdf of \(U\)

\[F_U(u) = \begin{cases} 0 & u < -1 \\ \frac{3+2u-u^2}{4} & -1 \le u \le 1 \\ 1 & u > 1 \end{cases}\]

\[f_U(u) = \frac{d}{du}F_U (u) = \begin{cases} \frac{1-u}{2} & -1\le u \le 1 \\ 0 & otherwise \end{cases}\]
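
Before simulating, the algebra can be checked directly. This is a minimal sketch, assuming base R only; the grid `u_grid` is an arbitrary choice. It confirms that the derived \(F_U(u)\) agrees with \(F_Y\left(\frac{u+1}{2}\right)\) computed by `pbeta`, and that \(f_U\) integrates to 1 over the support.

```r
# Analytic CDF and pdf of U = 2Y - 1 on the support [-1, 1].
F_U <- function(u) (3 + 2 * u - u^2) / 4
f_U <- function(u) (1 - u) / 2

u_grid <- seq(-1, 1, by = 0.25)   # arbitrary grid inside the support

# F_U(u) should equal F_Y((u + 1) / 2), where F_Y is the BETA(1,2) CDF.
cdf_gap <- max(abs(F_U(u_grid) - pbeta((u_grid + 1) / 2, 1, 2)))

# The pdf should integrate to 1 over [-1, 1].
total_mass <- integrate(f_U, lower = -1, upper = 1)$value
```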

Verifying via simulation study

library(tidyverse)
library(patchwork) # To add plots side-by-side!

sim_df <- data.frame(Y = rbeta(10000, 1, 2)) %>% 
          mutate(U = 2*Y-1)

ggplot(data = sim_df) + 
    geom_histogram(aes(x = Y,y=after_stat(density)), 
                   fill = 'goldenrod',
                   color ='black',
                   center = 0.02, binwidth = 0.04)+
    geom_function(fun = \(y) dbeta(y, 1, 2), 
                  linewidth = 1,
                  xlim = c(0,1)) + 
    xlim(c(-1,1)) + ylim(c(0,2.5))  + 
    labs(x = 'y', y = expression(f[Y](y))) + 
    theme_classic(base_size = 14)  +  # patchwork: this '+' places the next plot beside this one


ggplot(data = sim_df) + 
    geom_histogram(aes(x = U,y=after_stat(density)), 
                   fill = 'cornflowerblue',
                   color ='black',
                   center = -0.98,  binwidth = 0.04)+
    geom_function(fun = \(u) (1-u)/2,
                  linewidth = 1) +
    xlim(c(-1,1)) + ylim(c(0,2.5))  + 
    labs(x = 'u', y = expression(f[U](u))) + 
    theme_classic(base_size = 14)
Simulated densities of Y and U with analytic densities superimposed

Example: squaring a uniform

Let \(Y \sim UNIF(-1,1)\), thus:

\[\small f_Y(y) = \begin{cases} \frac{1}{2} & -1< y < 1 \\ 0 & otherwise \end{cases}\]

\[\small F_Y(y) = \begin{cases} 0 & y \le -1 \\ \frac{y+1}{2} & -1 < y < 1 \\ 1 & y \ge 1 \end{cases}\]

Let \(U = Y^2\). Find distribution of \(U\).

  • Support: \(0 < U < 1\)
  • CDF, for \(0 < u < 1\):

\[\small F_U(u) = P(U\le u) = P(Y^2 \le u) = P\left(-\sqrt{u}\le Y \le \sqrt{u}\right)\]

\[\small = F_Y(\sqrt{u}) - F_Y(-\sqrt{u})\]

\[\small = \frac{\sqrt{u}+1}{2} - \frac{-\sqrt{u}+1}{2} = \sqrt{u}\]

Full CDF and pdf of \(U\)

\[F_U(u) = \begin{cases} 0 & u \le 0 \\ \sqrt{u} & 0 < u < 1 \\ 1 & u \geq 1 \end{cases}\]

\[f_U(u) = \frac{d}{du}F_U (u) = \begin{cases} \frac{1}{2\sqrt{u}} & 0 < u < 1 \\ 0 & otherwise \end{cases}\]
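
A quick numerical check of \(F_U(u) = \sqrt{u}\), sketched in base R. The seed, sample size, and grid are assumptions chosen only for illustration.

```r
# Simulate U = Y^2 with Y ~ UNIF(-1, 1) and compare the empirical CDF
# of U with the derived CDF sqrt(u) at a few points in (0, 1).
set.seed(2)
y <- runif(1e5, min = -1, max = 1)
u <- y^2

u_grid <- c(0.04, 0.25, 0.49, 0.81)
emp <- ecdf(u)(u_grid)   # empirical P(U <= u)
ana <- sqrt(u_grid)      # analytic F_U(u) = sqrt(u): 0.2, 0.5, 0.7, 0.9

sq_gap <- max(abs(emp - ana))
```

Note that the pdf \(\frac{1}{2\sqrt{u}}\) is unbounded near 0, so comparing CDFs is more stable than comparing histogram heights in the first bin.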

Verifying via simulation study

sim_df <- data.frame(Y = runif(10000, -1, 1)) %>% 
          mutate(U = Y^2)

ggplot(data = sim_df) + 
    geom_histogram(aes(x = Y,y=after_stat(density)), 
                   fill = 'goldenrod',
                   color ='black',
                   center = -0.98, binwidth = 0.04)+
    geom_function(fun = \(y) dunif(y, -1, 1), 
                  linewidth = 1) + 
    xlim(c(-1,1)) + ylim(c(0,2.5))  + 
    labs(x = 'y', y = expression(f[Y](y))) + 
    theme_classic(base_size = 14)  +  # patchwork: this '+' places the next plot beside this one


ggplot(data = sim_df) + 
    geom_histogram(aes(x = U,y=after_stat(density)), 
                   fill = 'cornflowerblue',
                   color ='black',
                   center = 0.02,  binwidth = 0.04)+
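    # pmax() keeps sqrt() away from negative u (the x-axis starts at -1), avoiding NaN warnings
    geom_function(fun = \(u) ifelse( u > 0, 1/(2*sqrt(pmax(u, 0))), 0),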
                  linewidth = 1) +
    xlim(c(-1,1)) + ylim(c(0,2.5))  + 
    labs(x = 'u', y = expression(f[U](u))) + 
    theme_classic(base_size = 14)
Simulated densities of Y and U with analytic densities superimposed

Adding uniforms: a \(2\rightarrow 1\) example

  • Suppose \(X\), \(Y\) are independent \(UNIF(0,1)\)

\[\Rightarrow f_{X,Y}(x,y) = \begin{cases} 1 & 0 < x < 1, 0 < y < 1 \\ 0 & otherwise \end{cases}\]

  • Let \(U = X + Y\). Find the distribution of \(U\).
  • Support: \(0 \le U \le 2\)
  • CDF:

\[F_U(u) = P(U \le u) = P(X+Y \le u) = P(Y \le u - X)\]

Regions of integration

\[\small 0 \le u \le 1:\]

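Region of integration for \(u \in (0,1)\): the triangle in the unit square below the line \(y = u - x\), with legs of length \(u\).

\[\tiny F_U(u) = P(X+Y \le u) = \int_0^u \int_0^{u-x} 1\, dy\, dx = \frac{u^2}{2}\]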

\[ \small 1 \le u \le 2:\]

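Region of integration for \(u \in (1,2)\): the unit square minus the upper-right triangle above the line \(y = u - x\), which has legs of length \(2-u\).

\[\tiny F_U(u) = P(X+Y \le u) = 1-P(X+Y > u) = 1-\int_{u-1}^1 \int_{u-x}^1 1\, dy\, dx = 1-\frac{(2-u)^2}{2}\]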

Full CDF and pdf

\[F_U(u) = \begin{cases} 0 & u < 0 \\ \frac{u^2}{2} & 0 \le u \le 1 \\ 1-\frac{(2-u)^2}{2} & 1 < u \le 2 \\ 1 & u > 2 \end{cases}\]

\[f_U(u) = \frac{d}{du} F_U(u) = \begin{cases} u & 0 \le u \le 1 \\ 2-u & 1 < u \le 2 \\ 0 & otherwise \end{cases}\]
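
The piecewise CDF and pdf can be checked against each other numerically. This is a sketch in base R, with an arbitrary grid: integrating \(f_U\) from 0 to \(u\) should reproduce \(F_U(u)\) on both pieces.

```r
# Triangular pdf and CDF of U = X + Y for independent UNIF(0,1) X and Y.
f_U <- function(u) {
  ifelse(u >= 0 & u <= 1, u,
         ifelse(u > 1 & u <= 2, 2 - u, 0))
}
F_U <- function(u) {
  ifelse(u < 0, 0,
         ifelse(u <= 1, u^2 / 2,
                ifelse(u <= 2, 1 - (2 - u)^2 / 2, 1)))
}

u_grid <- c(0.5, 1, 1.5, 2)   # arbitrary points covering both pieces

# Numerically integrate the pdf from 0 up to each grid point.
num_cdf <- sapply(u_grid, function(u) integrate(f_U, lower = 0, upper = u)$value)

tri_gap <- max(abs(num_cdf - F_U(u_grid)))
```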

Verifying via simulation study

sim_df <- data.frame(Y = runif(10000, 0, 1),
                     X = runif(10000, 0,1)) %>% 
          mutate(U = X+Y)

ggplot(data = sim_df) + 
    geom_point(aes(x = X, y = Y), alpha = 0.5)+
    theme_classic() +  # patchwork: this '+' places the next plot beside this one


ggplot(data = sim_df) + 
    geom_histogram(aes(x = U,y=after_stat(density)), 
                   fill = 'cornflowerblue',
                   color ='black',
                   center = 0.02,  binwidth = 0.04)+
    geom_function(fun = \(u) case_when( 0 < u & u <= 1 ~ u,
                                        1 < u & u < 2 ~ 2-u,
                                        .default = 0),
                  linewidth = 1) +
    xlim(c(0,2)) + ylim(c(0,2.5))  + 
    labs(x = 'u', y = expression(f[U](u))) + 
    theme_classic()
Scatterplot of simulated (X, Y) pairs and simulated density of U with analytic density superimposed

Another \(2\rightarrow 1\) uniform example

  • Suppose \(X\), \(Y\) are independent \(UNIF(0,1)\)

\[\Rightarrow f_{X,Y}(x,y) = \begin{cases} 1 & 0 < x < 1, 0 < y < 1 \\ 0 & otherwise \end{cases}\]

  • Let \(U = -\ln(XY)\). Find the distribution of \(U\).
  • Support: \(U > 0\)
  • CDF:

\[F_U(u) = P(U \le u) = P(-\ln(XY) \le u) = P(\ln(XY) \ge -u)\]

\[= P\left(XY \ge e^{-u}\right) = P\left(Y \ge \frac{e^{-u}}{X}\right)\]

Region of integration

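Region of integration: the part of the unit square on or above the curve \(y = e^{-u}/x\), which requires \(e^{-u} \le x \le 1\).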

\[\small F_U(u) = P\left(Y \ge \frac{e^{-u}}{X}\right) =\int_{e^{-u}}^1 \int_{e^{-u}/x}^1 1\, dy\, dx\]

\[ \small = \int_{e^{-u}}^1(1-e^{-u}/x)\,dx = (x - \ln(x)e^{-u})\big|_{e^{-u}}^1 = 1-e^{-u}-ue^{-u}\]

Full CDF and pdf

\[F_U(u) = \begin{cases} 0 & u \le 0 \\ 1-e^{-u}-ue^{-u} & u > 0\\ \end{cases}\]

\[f_U(u) = \frac{d}{du} F_U(u) = \begin{cases} ue^{-u} & u> 0\\ 0 & otherwise \end{cases}\]

\(\Rightarrow U \sim GAM(2,1)\), because the \(GAM(2,1)\) normalizing constant is \(1^2/\Gamma(2) = 1\), so its pdf is exactly \(ue^{-u}\) for \(u > 0\).
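
The match with the gamma distribution can be confirmed against R's built-in functions. This is a minimal sketch with an arbitrary grid; with rate 1 the shape-rate and shape-scale parameterizations coincide.

```r
# Derived CDF and pdf of U = -ln(XY) for independent UNIF(0,1) X and Y.
F_U <- function(u) 1 - exp(-u) - u * exp(-u)
f_U <- function(u) u * exp(-u)

u_grid <- c(0.1, 0.5, 1, 2, 5, 10)   # arbitrary points with u > 0

# Both gaps should be at floating-point level if U ~ GAM(2, 1).
gam_cdf_gap <- max(abs(F_U(u_grid) - pgamma(u_grid, shape = 2, rate = 1)))
gam_pdf_gap <- max(abs(f_U(u_grid) - dgamma(u_grid, shape = 2, rate = 1)))
```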

Verifying via simulation study

sim_df <- data.frame(Y = runif(10000, 0, 1),
                     X = runif(10000, 0,1)) %>% 
          mutate(U = -log(X*Y))

ggplot(data = sim_df) + 
    geom_point(aes(x = X, y = Y), alpha = 0.5)+
    theme_classic() +  # patchwork: this '+' places the next plot beside this one


ggplot(data = sim_df) + 
    geom_histogram(aes(x = U,y=after_stat(density)), 
                   fill = 'cornflowerblue',
                   center = 0.05,  binwidth = 0.1)+
    geom_function(fun = \(u) dgamma(u, shape = 2, rate = 1),
                  linewidth = 1) +
    xlim(c(0,10)) + ylim(c(0,.5))  + 
    labs(x = 'u', y = expression(f[U](u))) + 
    theme_classic()
Scatterplot of simulated (X, Y) pairs and simulated density of U with analytic density superimposed