3/25/25

Definition

  • The normal distribution is the classic bell curve
    • Models a single continuous variable
    • Most values cluster around the mean
  • The multivariate normal extends this idea
    • Models multiple continuous variables
    • Captures how variables relate to each other
  • Defined by:
    • A mean vector \(\mu\): center for each variable
    • A covariance matrix \(\Sigma\): spread and correlation
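
As a rough illustration of how a mean vector and covariance matrix pin down the distribution, the sketch below draws samples using only base R. The values of `mu`, `Sigma`, and `n` are made up for the example and are not from the slides. It relies on the fact that multiplying independent standard normal draws by a Cholesky factor of \(\Sigma\) gives draws with covariance \(\Sigma\).

```r
# Hypothetical parameters for a bivariate example (not from the slides)
set.seed(1)
mu    <- c(0, 0)                            # mean vector
Sigma <- matrix(c(1, 0.6,
                  0.6, 1), nrow = 2)        # covariance matrix (correlation 0.6)

n <- 5000
Z <- matrix(rnorm(n * 2), ncol = 2)         # independent standard normal draws
R <- chol(Sigma)                            # upper-triangular R with t(R) %*% R == Sigma
X <- sweep(Z %*% R, 2, mu, "+")             # rows are draws from N(mu, Sigma)

data <- data.frame(X1 = X[, 1], X2 = X[, 2])
colMeans(data)   # should be close to mu
cov(data)        # should be close to Sigma
```

A data frame shaped like `data`, with columns `X1` and `X2`, is the kind of input the ggplot calls later in the deck expect.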

Mathematical Definition

\[ f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu)\right) \]

  • \(\mu\): Mean vector
  • \(\Sigma\): Covariance matrix
    • Describes spread and relationships between variables
  • \(|\Sigma|\): Determinant of the covariance matrix
  • \(d\): Number of dimensions
  • \((\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu)\): Squared Mahalanobis distance
    • Measures how far \(\mathbf{x}\) is from the mean, scaled by the spread and correlation in \(\Sigma\)
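
To make the formula concrete, here is a minimal sketch that evaluates the density term by term in base R. The helper name `dmvn_manual` and the parameter values are hypothetical. `mahalanobis()` from the stats package returns the squared distance, which matches the quadratic form in the exponent.

```r
# Density of N(mu, Sigma) at a point x, written directly from the formula
dmvn_manual <- function(x, mu, Sigma) {
  d  <- length(mu)
  m2 <- mahalanobis(x, center = mu, cov = Sigma)   # squared Mahalanobis distance
  exp(-0.5 * m2) / ((2 * pi)^(d / 2) * sqrt(det(Sigma)))
}

mu    <- c(0, 0)                                   # hypothetical values
Sigma <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)

x <- c(1, -1)
# The same quadratic form computed by hand, for comparison
m2_by_hand <- drop(t(x - mu) %*% solve(Sigma) %*% (x - mu))
all.equal(m2_by_hand, mahalanobis(x, mu, Sigma))

dmvn_manual(mu, mu, Sigma)   # density at the mean, the highest point of the bell
```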

3D Surface Plot

plot_ly(x = ~x, y = ~y, z = ~z, type = "surface") %>%
  layout(title = "Multivariate Normal Density")
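
The surface call above assumes that `x`, `y`, and `z` already exist. One possible way to build them for a bivariate normal is sketched below; the grid range, grid size, and parameter values are assumptions chosen for illustration. The matrix is filled so that rows follow `y` and columns follow `x`, which is the layout plotly surfaces are understood to expect.

```r
# Hypothetical bivariate parameters (not from the slides)
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)

x <- seq(-3, 3, length.out = 60)              # grid along the first variable
y <- seq(-3, 3, length.out = 60)              # grid along the second variable
grid <- as.matrix(expand.grid(x = x, y = y))  # all (x, y) pairs, x varying fastest

# Density at every grid point, straight from the formula (d = 2)
dens <- exp(-0.5 * mahalanobis(grid, mu, Sigma)) / (2 * pi * sqrt(det(Sigma)))

# Fill by row so that z[i, j] pairs with y[i] and x[j]
z <- matrix(dens, nrow = length(y), ncol = length(x), byrow = TRUE)
```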

3D Surface Plot contd.

2D Density Heatmap

ggplot(data, aes(x = X1, y = X2)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", color = "darkgray") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Bivariate Normal Density", x = "X1", y = "X2")

2D Density Heatmap contd.

Scatter Plot

ggplot(data, aes(x = X1, y = X2)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Correlation", x = "X1", y = "X2")

Scatter Plot contd.

Properties

Let \(X \sim \mathcal{N}_d(\mu, \Sigma)\), then:

  • Any weighted combination of variables is still normally distributed
\[ a^T X \sim \mathcal{N}(a^T \mu,\ a^T \Sigma a) \]
  • Each individual variable on its own follows a normal distribution
\[ X_i \sim \mathcal{N}(\mu_i,\ \Sigma_{ii}) \]
  • The distribution of one variable, given another, is also normal
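
The properties can be worked through numerically for a small case. In the sketch below, `mu`, `Sigma`, `a`, and `x2` are made-up values. The conditional mean and variance use the standard bivariate result, which the slides do not state explicitly, so treat that part as an assumption layered on top of the deck.

```r
# Hypothetical bivariate parameters (not from the slides)
mu    <- c(1, 2)
Sigma <- matrix(c(4, 1.2, 1.2, 1), nrow = 2)

# Linear combination a^T X: mean a^T mu, variance a^T Sigma a
a <- c(0.5, -1)
lin_mean <- sum(a * mu)
lin_var  <- drop(t(a) %*% Sigma %*% a)

# Marginal of X_1: mean mu_1, variance Sigma_11
marg_mean <- mu[1]
marg_var  <- Sigma[1, 1]

# Conditional of X_1 given X_2 = x2 (standard bivariate result)
x2 <- 3
cond_mean <- mu[1] + Sigma[1, 2] / Sigma[2, 2] * (x2 - mu[2])
cond_var  <- Sigma[1, 1] - Sigma[1, 2]^2 / Sigma[2, 2]
```

Note that the conditional variance is smaller than the marginal variance whenever the two variables are correlated: knowing one variable narrows the plausible range of the other.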