It can be useful to show the marginal distributions in the margins of a chart, or the marginal probabilities in the margins of a joint probability table…

head(penguins_dt)
##    species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##     <fctr>    <fctr>          <num>         <num>             <int>       <int>
## 1:  Adelie Torgersen           39.1          18.7               181        3750
## 2:  Adelie Torgersen           39.5          17.4               186        3800
## 3:  Adelie Torgersen           40.3          18.0               195        3250
## 4:  Adelie Torgersen             NA            NA                NA          NA
## 5:  Adelie Torgersen           36.7          19.3               193        3450
## 6:  Adelie Torgersen           39.3          20.6               190        3650
##       sex  year
##    <fctr> <int>
## 1:   male  2007
## 2: female  2007
## 3: female  2007
## 4:   <NA>  2007
## 5: female  2007
## 6:   male  2007

There are two distinct clusters in the data corresponding to two classes of penguin…

library(ggplot2)
library(ggExtra)

# Create a joint density plot of bill length against flipper length
# (rows with missing measurements are dropped by ggplot with a warning)
p <- ggplot(penguins_dt, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(color = "lightblue", alpha = 0.85) +
  geom_density_2d(color = "steelblue", alpha = 0.75) +
  theme_minimal()

# Add marginal densities
ggMarginal(p, type = "density", color = "steelblue", fill = "lightblue")

[Figure: marginal probabilities in a joint probability table]

Discrete Case, Joint Probability Mass Function

A joint probability mass function \(f_{XY}(x_i,y_j)\) describes the probability distribution of a pair of discrete random variables taking values \(\{x_i,y_j\}\), where \(f_{XY}(x_i,y_j)\geq 0\) and \(\sum_{i=1}^N\sum_{j=1}^M f_{XY}(x_i,y_j)=1\). The joint pmf can be visualised as a joint probability table in which the probability of an event \(E\) corresponds to a subset of cells in the table: \[P(E) = \sum_E f_{XY}(x_i,y_j)\] Similarly, the joint cumulative distribution function is given by: \[F_{XY}(x,y)=\sum_{x_i\leq x}\sum_{y_j\leq y}\:f_{XY}(x_i,y_j)\]
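As a concrete illustration (a minimal sketch, assuming the penguins_dt table loaded above), an empirical joint probability table for two discrete variables can be built by cross-tabulating and normalising; the marginal probabilities then appear as the row and column sums:

# Empirical joint probability table for species x island;
# addmargins() appends the marginal probabilities as row/column sums
joint_tab <- prop.table(table(penguins_dt$species, penguins_dt$island))
round(addmargins(joint_tab), 3)

Each inner cell estimates \(f_{XY}(x_i,y_j)\), and the Sum row and column hold the marginals.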

Continuous Case, Joint Probability Density Function

A joint probability density function \(f_{XY}(x,y)\) describes the probability distribution of a pair of continuous random variables \((X,Y)\), where \(f_{XY}(x,y)\geq 0\) and \(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f_{XY}(x,y)\:dx\:dy=1\). The density can be visualised on a 2-d plot in which an event corresponds to a subset of the area on the canvas:

The probability of an event \(E\) is the integral of the density over that region: \[\qquad\qquad P(E) = \iint_E f_{XY}(x,y)\:dx\:dy\qquad\qquad\] The joint cumulative distribution is: \[F_{XY}(x,y)=\int_{-\infty}^y\int_{-\infty}^x\:f_{XY}(u,v)\:du\:dv\] and the density is recovered by differentiating: \[f_{XY}(x,y)=\frac{\partial^2}{\partial x\,\partial y}F_{XY}(x,y)\]
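As a quick numerical check (a sketch, not part of the original analysis), \(P(E)\) over a rectangle can be approximated by summing the density over a fine grid; here \(X\) and \(Y\) are assumed to be independent standard normals so the answer is known in closed form:

# Approximate P(0 <= X <= 1, 0 <= Y <= 1) for independent standard normals
# by a midpoint Riemann sum over a grid, and compare to the exact value
h <- 0.001
g <- seq(h / 2, 1 - h / 2, by = h)      # grid midpoints
f_xy <- outer(dnorm(g), dnorm(g))       # f_XY(x, y) = f_X(x) f_Y(y)
sum(f_xy) * h^2                         # ~ 0.1165
(pnorm(1) - pnorm(0))^2                 # exact value, ~ 0.1165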

Note on Independence

For independent events \(A\) and \(B\): \[P(A\cap B)=P(A)P(B)\] Random variables \(X\) and \(Y\) are independent if any event defined by \(X\) is independent of any event defined by \(Y\) (they can be accurately represented on two independent sampling spaces).

Jointly-distributed random variables \(X\) and \(Y\) are independent if their joint cdf is the product of the marginal cdfs (equivalently, if their joint pdf is the product of the marginal pdfs): \[F_{XY}(x,y)=F_X(x)F_Y(y)\] \[f_{XY}(x,y)=f_X(x)f_Y(y)\] For independent random variables, each “slice” through the joint PDF along one variable is just a scaled copy of the marginal PDF of the other variable.
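A quick empirical sanity check (a sketch with simulated data, not from the original): for independently generated \(X\) and \(Y\), the joint CDF at a point should match the product of the two marginal CDFs:

# Simulate independent X and Y and compare the empirical joint CDF
# F_XY(0.5, 1) with the product of the empirical marginal CDFs
set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)
mean(x <= 0.5 & y <= 1)            # empirical F_XY(0.5, 1)
mean(x <= 0.5) * mean(y <= 1)      # F_X(0.5) * F_Y(1): should be very close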

Covariance & Correlation

Given random variables \(X\) and \(Y\) with means of \(\mu_X\) and \(\mu_Y\) respectively, the covariance is defined as:
\[\boxed{\text{Cov}(X,Y)\equiv E\left[(X-\mu_X)(Y-\mu_Y)\right]}=E\left[XY\right]-\mu_X\mu_Y\] \[\begin{align} \text{Cov}(X,Y)&=E\left[(X-E[X])(Y-E[Y])\right]\\ &=E\left[XY-XE[Y]-E[X]Y+E[X]E[Y]\right]\\ &=E\left[XY\right]-E[X]E[Y]-E[X]E[Y]+E[X]E[Y]\\ &=E\left[XY\right]-E[X]E[Y] \end{align}\]
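The identity can be checked numerically (a sketch with simulated correlated data; the particular values are illustrative, not from the original):

# Check Cov(X, Y) = E[XY] - E[X]E[Y] on simulated data
set.seed(42)
x <- rnorm(1e5)
y <- 0.5 * x + rnorm(1e5)          # correlated with x by construction
mean(x * y) - mean(x) * mean(y)    # sample version of E[XY] - E[X]E[Y]
cov(x, y)                          # built-in estimate: nearly identical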

Correlation is given by: \[\rho=\boxed{\text{Cor}(X,Y)=\frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}}\]
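For example (a minimal sketch using the penguins_dt columns shown above), correlation is just covariance rescaled by the two standard deviations:

# Covariance and correlation between flipper length and body mass,
# dropping the rows with missing measurements
ok   <- complete.cases(penguins_dt$flipper_length_mm, penguins_dt$body_mass_g)
flip <- penguins_dt$flipper_length_mm[ok]
mass <- penguins_dt$body_mass_g[ok]
cov(flip, mass)
cor(flip, mass)
cov(flip, mass) / (sd(flip) * sd(mass))   # matches cor(flip, mass)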

If \(X\) and \(Y\) are discrete random variables with joint pmf \(p(x_i,y_j)\) then the covariance is:
\[\text{Cov}(X,Y)=\left(\sum_{i=1}^n\sum_{j=1}^m p(x_i,y_j)\,x_iy_j\right)-\mu_X\mu_Y=E[XY]-\mu_X\mu_Y\]

\[\begin{align} \text{Cov}(X,Y)&=\sum_{i}\sum_{j} p(x_i,y_j)(x_i-\mu_X)(y_j-\mu_Y)\\ &=\sum_{i}\sum_{j} p(x_i,y_j)(x_iy_j-x_i\mu_Y-\mu_Xy_j+\mu_X\mu_Y)\\ &=\sum_{i}\sum_{j} p(x_i,y_j)x_iy_j-\mu_Y\sum_{i}\sum_{j} p(x_i,y_j)x_i-\mu_X\sum_{i}\sum_{j} p(x_i,y_j)y_j+\mu_X\mu_Y\sum_{i}\sum_{j} p(x_i,y_j)\\ &=E[XY]-\mu_Y\mu_X-\mu_X\mu_Y+\mu_X\mu_Y\\ &=E[XY]-\mu_X\mu_Y \end{align}\]
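A toy example (a hypothetical joint pmf, not from the original data) makes the sum concrete:

# Covariance from a small joint probability table p(x_i, y_j)
x <- c(0, 1, 2)
y <- c(0, 1)
p <- matrix(c(0.10, 0.20,
              0.25, 0.15,
              0.05, 0.25), nrow = 3, byrow = TRUE)   # rows index x, columns index y
stopifnot(isTRUE(all.equal(sum(p), 1)))
mu_x <- sum(x * rowSums(p))                           # marginal mean of X
mu_y <- sum(y * colSums(p))                           # marginal mean of Y
e_xy <- sum(outer(x, y) * p)                          # E[XY]
e_xy - mu_x * mu_y                                    # Cov(X, Y)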

If \(X\) and \(Y\) are continuous random variables with joint pdf \(f_{XY}(x,y)\) then the covariance is:
\[\text{Cov}(X,Y)=\left(\int_{-\infty}^\infty\int_{-\infty}^\infty f_{XY}(x,y)\,xy\:dx\:dy\right)-\mu_X\mu_Y=E[XY]-\mu_X\mu_Y\] \[\begin{align} \text{Cov}(X,Y)&=\int_{-\infty}^\infty\int_{-\infty}^\infty f_{XY}(x,y)(x-\mu_X)(y-\mu_Y)\:dx\:dy\\ &=\int\!\!\int f_{XY}(x,y)(xy-x\mu_Y-\mu_Xy+\mu_X\mu_Y)\:dx\:dy\\ &=\int\!\!\int f_{XY}(x,y)\,xy\:dx\:dy-\mu_Y\int\!\!\int f_{XY}(x,y)\,x\:dx\:dy-\mu_X\int\!\!\int f_{XY}(x,y)\,y\:dx\:dy+\mu_X\mu_Y\int\!\!\int f_{XY}(x,y)\:dx\:dy\\ &=E[XY]-\mu_Y\mu_X-\mu_X\mu_Y+\mu_X\mu_Y\\ &=E[XY]-\mu_X\mu_Y \end{align}\]
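A numerical illustration (a sketch using the textbook density \(f_{XY}(x,y)=x+y\) on the unit square, not from the original): the double integrals above can be approximated on a grid:

# Covariance for the density f(x, y) = x + y on [0, 1]^2 via a grid sum
h <- 0.001
g <- seq(h / 2, 1 - h / 2, by = h)
f    <- outer(g, g, function(x, y) x + y)           # density values on the grid
e_x  <- sum(outer(g, g, function(x, y) x) * f) * h^2
e_y  <- sum(outer(g, g, function(x, y) y) * f) * h^2
e_xy <- sum(outer(g, g, function(x, y) x * y) * f) * h^2
e_xy - e_x * e_y                                    # ~ -1/144, a small negative covariance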
Some useful properties of covariance follow.
  1. Linearity: \(\text{Cov}(aX+b,cY+d)=ac\,\text{Cov}(X,Y)\). This property is used when analyzing relationships between linearly transformed random variables, such as when standardizing or scaling variables (e.g., in regression analysis). \[\begin{align} \text{Cov}(aX+b,cY+d)&=E\left[(aX+b-E[aX+b])(cY+d-E[cY+d])\right]\\ &=E\left[(aX+b-aE[X]-b)(cY+d-cE[Y]-d)\right]\\ &=E\left[a(X-E[X])\,c(Y-E[Y])\right]\\ &=ac\,E\left[(X-E[X])(Y-E[Y])\right]\\ &=ac\,\text{Cov}(X,Y) \end{align}\]
  2. Additivity: \(\text{Cov}(X_1+X_2,Y)=\text{Cov}(X_1,Y)+\text{Cov}(X_2,Y)\). This is used in calculating the covariance of sums of random variables, particularly in multivariate analysis and statistical modeling involving multiple predictors or outcomes. Using \(\text{Cov}(X,Y)=E[XY]-E[X]E[Y]\) from above: \[\begin{align} \text{Cov}(X_1+X_2,Y)&=E\left[(X_1+X_2)Y\right]-E[X_1+X_2]E[Y]\\ &=E[X_1Y]+E[X_2Y]-E[X_1]E[Y]-E[X_2]E[Y]\\ &=\text{Cov}(X_1,Y)+\text{Cov}(X_2,Y) \end{align}\]
  3. \(\text{Cov}(X,X)=\text{Var}(X)\) \[ \begin{align} \text{Cov}(X,X)&=E\left[(X-\mu_X)(X-\mu_X)\right]\\ &=E\left[(X-\mu_X)^2\right]\\ &=\text{Var}(X) \end{align} \]
  4. \(\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)+2\,\text{Cov}(X,Y)\). This property is widely used in statistics when dealing with sums of random variables, such as in portfolio variance in finance, reliability engineering, and analysis of errors. I remember it turned up in the Kriging derivation. \[\begin{align} \text{Var}(X+Y)&=E\left[(X+Y)^2\right]-E[X+Y]^2\\ &=E\left[X^2+2XY+Y^2\right]-E[X+Y]^2\\ &=E[X^2]+2E[XY]+E[Y^2]-E[X+Y]^2\\ &=E[X^2]+2E[XY]+E[Y^2]-\left(E[X]+E[Y]\right)^2\\ &=E[X^2]+2E[XY]+E[Y^2]-E[X]^2-2E[X]E[Y]-E[Y]^2\\ &=(E[X^2]-E[X]^2)+2(E[XY]-E[X]E[Y])+(E[Y^2]-E[Y]^2)\\ &=\text{Var}(X)+2\,\text{Cov}(X,Y)+\text{Var}(Y) \end{align}\]
  5. \(\text{Cov}(X,Y)=0\) is necessary but not sufficient for \(X\) and \(Y\) to be independent. If \(X\) and \(Y\) are independent then the joint pdf factorises, \(f_{XY}(x,y)=f_X(x)f_Y(y)\), so \[\begin{align} \text{Cov}(X,Y)&=\int_{-\infty}^\infty\int_{-\infty}^\infty f_{XY}(x,y)(x-\mu_X)(y-\mu_Y)\:dx\:dy\\ &=\int_{-\infty}^\infty f_X(x)(x-\mu_X)\:dx \cdot \int_{-\infty}^\infty f_Y(y)(y-\mu_Y)\:dy\\ &=E\left[X-\mu_X\right]E\left[Y-\mu_Y\right]\\ &=(E[X]-\mu_X)(E[Y]-\mu_Y)=(0)\cdot(0)\\ &=0 \end{align}\] Combining this with the previous property: if \(X\) and \(Y\) are independent then \(\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)\). The converse fails, however: for example, if \(X\) is symmetric about zero and \(Y=X^2\), then \(\text{Cov}(X,Y)=E[X^3]=0\) even though \(Y\) is completely determined by \(X\); see the simulation sketch after this list.
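A simulated illustration of that last point (a sketch, not from the original): \(Y=X^2\) is a deterministic function of \(X\), yet the sample covariance is close to zero:

# Zero covariance does not imply independence: Y = X^2 is a function of X
set.seed(7)
x <- rnorm(1e5)
y <- x^2
cov(x, y)   # close to 0
cor(x, y)   # close to 0, despite Y being completely determined by X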

Covariance and Correlation for the Bivariate Normal Distribution

The bivariate normal distribution has density: \[f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{\frac{-1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2}+\frac{(y-\mu_Y)^2}{\sigma_Y^2}-\frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}\right]\right\}\] with covariance matrix \(\begin{bmatrix}\sigma_X^2&\sigma_{XY}\\\sigma_{YX}&\sigma_Y^2\end{bmatrix}\), where \(\sigma_{XY}=\sigma_{YX}=\rho\,\sigma_X\sigma_Y\).

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
## 
##     select
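MASS::mvrnorm can simulate from this distribution; here is a minimal sketch (the parameter values are illustrative, not from the original) checking that the sample covariance and correlation recover \(\sigma_{XY}\) and \(\rho\):

# Simulate from a bivariate normal with sd_X = 1, sd_Y = 2 and rho = 0.6
Sigma <- matrix(c(1.0, 1.2,
                  1.2, 4.0), nrow = 2)    # sigma_XY = rho * sd_X * sd_Y = 1.2
xy <- mvrnorm(n = 1e5, mu = c(0, 0), Sigma = Sigma)
cov(xy[, 1], xy[, 2])   # ~ 1.2
cor(xy[, 1], xy[, 2])   # ~ 0.6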