DAT 301 HW 3

April 15, 2023

Bivariate Gaussians

A bivariate Gaussian distribution is a probability distribution that describes the joint distribution of two continuous random variables. It is also known as a 2D normal distribution, because it is a multivariate normal distribution with two dimensions.

The bivariate Gaussian distribution is characterized by two parameters: the mean vector and the covariance matrix. The mean vector is a two-dimensional vector that specifies the expected values of the two random variables. The covariance matrix is a symmetric 2 \(\times\) 2 matrix whose diagonal entries are the variances of the two random variables and whose off-diagonal entries are the covariance between them.
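As a minimal sketch of this parameterization, the covariance matrix can be assembled from the standard deviations and the correlation coefficient, using \(\operatorname{cov}(X,Y)=\rho\,\sigma_x\sigma_y\). The helper name `make_sigma` is hypothetical and introduced here only for illustration.

```r
# Hypothetical helper (not part of the original homework code):
# build the 2x2 covariance matrix from standard deviations and correlation.
make_sigma <- function(sigma_x, sigma_y, rho) {
  stopifnot(sigma_x > 0, sigma_y > 0, abs(rho) < 1)
  off_diag <- rho * sigma_x * sigma_y          # cov(X, Y)
  matrix(c(sigma_x^2, off_diag,
           off_diag,  sigma_y^2), nrow = 2, byrow = TRUE)
}

mu    <- c(0, 0)                    # mean vector (E[X], E[Y])
Sigma <- make_sigma(1, 1, -0.8)     # unit variances, rho = -0.8
print(Sigma)
```

With unit standard deviations the off-diagonal entries equal the correlation itself, which is why the sampling code later in this document can write the matrix directly as `c(1, -0.8, -0.8, 1)`.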

The probability density function (PDF) of a bivariate Gaussian distribution is given by:

\(f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_x)^2}{\sigma_x^2}-2\rho\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}+\frac{(y-\mu_y)^2}{\sigma_y^2}\right)\right]\)

where \(\mu_x\) and \(\mu_y\) are the means of \(X\) and \(Y\), \(\sigma_x\) and \(\sigma_y\) are their standard deviations, and \(\rho\) is their correlation coefficient, with \(|\rho|<1\). This density is known as the bivariate normal (Gaussian) PDF.
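The formula can be transcribed into R directly. The sketch below is illustrative only: `bvn_pdf` is a hypothetical helper name, and the grid limits and step size are assumptions chosen so that a simple Riemann sum approximates the total probability.

```r
# Hypothetical helper implementing the bivariate Gaussian density formula.
bvn_pdf <- function(x, y, mu_x = 0, mu_y = 0,
                    sigma_x = 1, sigma_y = 1, rho = 0) {
  zx <- (x - mu_x) / sigma_x                     # standardized x
  zy <- (y - mu_y) / sigma_y                     # standardized y
  q  <- (zx^2 - 2 * rho * zx * zy + zy^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho^2))
}

# Height of the density at the mean: 1 / (2 * pi * sqrt(1 - rho^2))
peak <- bvn_pdf(0, 0, rho = -0.8)

# Riemann-sum check that the density integrates to approximately 1
h     <- 0.05
grid  <- seq(-6, 6, by = h)
vals  <- outer(grid, grid, bvn_pdf, rho = -0.8)
total <- sum(vals) * h^2
print(c(peak = peak, total = total))
```

A total close to 1 is a quick sanity check that the normalizing constant and the \(1-\rho^2\) factor in the exponent have both been entered correctly.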

The bivariate Gaussian distribution is widely used in statistics, machine learning, and signal processing, among other fields, to model the joint distribution of two variables that are correlated. It is also a fundamental building block for more complex multivariate Gaussian distributions.

Covariance and Correlation relationship

Expectation value of \(X\):

\(\operatorname{E}[X] = \int_{-\infty}^{\infty} x f_X(x) dx\)

Standard deviation:

\(\sigma_X = \sqrt{\operatorname{E}[(X - \operatorname{E}[X])^2]}\)

Covariance:

\(\operatorname{cov}(X,Y) = \operatorname{E}[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])]\)

Correlation:

\(\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}\)

In these equations, \(\operatorname{cov}(X,Y)\) is the covariance between the random variables \(X\) and \(Y\), \(\sigma_X\) and \(\sigma_Y\) are their standard deviations, and \(\rho_{X,Y}\) is their correlation coefficient.
\(\operatorname{E}[X]\) is the expected value (or mean) of \(X\), calculated as the integral of \(x\) multiplied by the marginal probability density function \(f_X(x)\) over the range of \(X\).
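The relationship between covariance and correlation can be checked on simulated data. The sketch below uses the sample analogues of the population quantities (R's `cov`, `sd`, and `cor`, which use the \(n-1\) denominator). The construction of correlated normals from two independent standard normals is an assumption made here to keep the example in base R; it is not the sampling method used elsewhere in this document.

```r
set.seed(1)                       # reproducibility
n   <- 10000
rho <- -0.8

# Assumed construction: X = Z1, Y = rho * Z1 + sqrt(1 - rho^2) * Z2
z1 <- rnorm(n)
z2 <- rnorm(n)
x  <- z1
y  <- rho * z1 + sqrt(1 - rho^2) * z2

cov_xy  <- cov(x, y)                    # sample covariance
rho_hat <- cov_xy / (sd(x) * sd(y))     # correlation from its definition

# The definition agrees with R's built-in cor()
print(all.equal(rho_hat, cor(x, y)))
print(rho_hat)
```

Because both variables have unit variance here, the sample covariance and the sample correlation are nearly equal, and both should be close to the target value of \(\rho\).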

R code to get a 3D plot of the PDF of a Bivariate Gaussian with \(\rho=-0.8\)

The following code is used to generate the PDF plot of a Bivariate Gaussian with \(\mu_x=\mu_y=0\), \(\sigma_x=\sigma_y=1,\) and correlation \(\rho_1=-0.8\).

library(mvtnorm)
library(ggplot2)
library(plotly)
library(tidyr)
x = seq(-4,4,by=0.05)
y = seq(-4,4,by=0.05)
rho1 = -0.8
## creating meshgrid of (x,y) points
X = matrix(rep(x,length(y)),nrow=length(y), byrow=TRUE)
Y = matrix(rep(y,length(x)),ncol=length(x), byrow=FALSE)
## creating a surface Z=f(X,Y); the exponent is divided by 2*(1-rho^2), as in the PDF formula
Z1 = exp(-(X^2-2*rho1*X*Y+Y^2)/(2*(1-rho1^2)))/(2*pi*sqrt(1-rho1^2))
fig1 <- plot_ly(x = X, y = Y, z = Z1, showscale=F, width=800, 
                height=550) %>% add_surface(contours = list(
                  z = list(show=TRUE, usecolormap=TRUE,
                           highlightcolor="#ff0000",
                           project=list(z=TRUE))))%>%
  layout(title = "Bivariate Gaussian Distribution PDF")

3D Plot of the PDF of a Bivariate Gaussian with \(\rho=-0.8\)

The plot shows a bell-shaped bivariate distribution, with elliptical contours elongated along the anti-diagonal, which is the line \(Y=-X\).

3D Plot of the PDF of a Bivariate Gaussian with \(\rho=0\)

The plot shows a bell-shaped bivariate distribution, with circular contours centered at \((0,0)\).

3D Plot of the PDF of a Bivariate Gaussian with \(\rho=0.8\)

The plot shows a bell-shaped bivariate distribution, with elliptical contours elongated along the main diagonal, which is the line \(Y=X\).
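The orientation of the elliptical contours can be read off the covariance matrix: the major axis points along the eigenvector with the largest eigenvalue. The following is a minimal sketch for the unit-variance case; `major_axis` is a hypothetical helper name. The case \(\rho=0\) is omitted because both eigenvalues are then equal and the contours are circles with no preferred direction.

```r
# Hypothetical helper: slope of the major axis and the axis-length ratio
# of the contour ellipses for unit variances and correlation rho (rho != 0).
major_axis <- function(rho) {
  Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  e <- eigen(Sigma, symmetric = TRUE)    # eigenvalues in decreasing order
  v <- e$vectors[, 1]                    # direction of the major axis
  c(slope = v[2] / v[1],
    axis_ratio = sqrt(e$values[1] / e$values[2]))
}

neg <- major_axis(-0.8)   # slope close to -1: elongated along Y = -X
pos <- major_axis(0.8)    # slope close to +1: elongated along Y = X
print(rbind(neg, pos))
```

For \(|\rho|=0.8\) the eigenvalues are 1.8 and 0.2, so the contours are about three times as long as they are wide.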

R Code to get the Scatterplot of Y vs X of a Bivariate Gaussian with \(\rho=-0.8\)

The following code is used to generate the scatter plot of a Bivariate Gaussian with \(\mu_x=\mu_y=0\), \(\sigma_x=\sigma_y=1,\) and correlation \(\rho_1=-0.8\).

set.seed(123) # Set the seed for reproducibility
mu <- c(0, 0)
sigma1 <- matrix(c(1, -0.8, -0.8, 1), nrow=2)
data1 <- rmvnorm(n=1000, mean=mu, sigma=sigma1)
colnames(data1) <- c("X","Y")
data1 <- as.data.frame(x=data1)
plot1 = ggplot(data1) + aes(x=X,y=Y) + geom_point() +
  geom_smooth(method="lm", se=F) +
  ggtitle("Scatterplot of Y vs X") +
  theme(plot.title = element_text(hjust = 0.5))

Scatterplot of Y vs X of a Bivariate Gaussian with \(\rho=-0.8\)

When the correlation coefficient \(\rho\) between two variables \(X\) and \(Y\) is negative, the scatter plot of \(Y\) versus \(X\) shows a downward-sloping pattern: as \(X\) increases, \(Y\) tends to decrease.
The points cluster around a line that slopes downward from left to right. The more tightly the points cluster around this line, the stronger the negative correlation between \(X\) and \(Y\).

Scatterplot of Y vs X of a Bivariate Gaussian with \(\rho=0\)

When the correlation coefficient \(\rho\) between two variables \(X\) and \(Y\) is zero, the scatter plot of \(Y\) versus \(X\) shows no linear relationship between the variables.
With equal variances, the points form a roughly circular cloud that is densest near the center \((0,0)\) and thins out in every direction, with no discernible slope or trend.

Scatterplot of Y vs X of a Bivariate Gaussian with \(\rho=0.8\)

When the correlation coefficient \(\rho\) between two variables \(X\) and \(Y\) is positive, the scatter plot of \(Y\) versus \(X\) shows an upward-sloping pattern: as \(X\) increases, \(Y\) tends to increase.
The points cluster around a line that slopes upward from left to right. The more tightly the points cluster around this line, the stronger the positive correlation between \(X\) and \(Y\).
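The fitted line in each scatterplot is a least-squares regression line, and with \(\sigma_x=\sigma_y=1\) its population slope equals \(\rho\). The sketch below checks this for the three correlations discussed above. It is illustrative only: `fit_slope` is a hypothetical helper, and the samples are generated in base R from two independent standard normals, which is an assumed construction rather than the `rmvnorm` call used earlier.

```r
set.seed(123)                     # reproducibility
n <- 5000

# Hypothetical helper: simulate n points with correlation rho and
# return the least-squares slope of Y on X.
fit_slope <- function(rho) {
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  x  <- z1
  y  <- rho * z1 + sqrt(1 - rho^2) * z2
  unname(coef(lm(y ~ x))[2])
}

rhos   <- c(-0.8, 0, 0.8)
slopes <- sapply(rhos, fit_slope)
print(data.frame(rho = rhos, fitted_slope = slopes))
```

The fitted slopes should land close to \(-0.8\), \(0\), and \(0.8\), matching the downward, flat, and upward trends described for the three scatterplots.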

Histogram of a Random variable X of a Bivariate Gaussian with \(\rho=-0.8\)

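The histogram of \(X\) is approximately symmetric and bell-shaped, centered near 0. This is expected because the marginal distribution of \(X\) in a bivariate Gaussian is itself Gaussian, here \(N(0,1)\), regardless of \(\rho\). Any slight asymmetry between the tails is due to sampling variability.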

Histogram of a Random variable Y of a Bivariate Gaussian with \(\rho=-0.8\)

The histogram of \(Y\) is also approximately symmetric and bell-shaped, centered near 0, because the marginal distribution of \(Y\) is \(N(0,1)\). Any slight asymmetry between the tails is due to sampling variability.
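The marginals of a bivariate Gaussian are themselves Gaussian, so any skew seen in a histogram of sampled values reflects sampling variability rather than the underlying distribution. A rough numerical check is to compute the sample skewness of each coordinate, which should be near zero. In this sketch `skew` is a hypothetical helper using a simple moment-based estimate, and the sampling construction is an assumption made to keep the example in base R.

```r
set.seed(123)                     # reproducibility
n   <- 10000
rho <- -0.8

# Assumed construction of correlated standard normals
z1 <- rnorm(n)
z2 <- rnorm(n)
x  <- z1
y  <- rho * z1 + sqrt(1 - rho^2) * z2

# Hypothetical helper: moment-based sample skewness
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

summary_stats <- rbind(
  X = c(mean = mean(x), sd = sd(x), skewness = skew(x)),
  Y = c(mean = mean(y), sd = sd(y), skewness = skew(y))
)
print(round(summary_stats, 3))
```

Means near 0, standard deviations near 1, and skewness near 0 for both coordinates are consistent with \(N(0,1)\) marginals.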