This project will help us visualize various functions attacted to the different distributions learnt in class.

1 Binomial distribution.

1.1 Probability Mass Function

We will plot the probability mass function for the binomial distribution Bin(n, p). Note: 1) The values for this random variable are 0, 1, 2, …, n. 2) The density plot will have a bar of height P(X=k), at the point ‘k’ on the x-axis. 3) In the plot include a vertical line at the expected value of Bin(n,p).

Write a function plot_binom, that takes input values: n and p, and returns the density plot of Bin(n,p).

 plot_binom <- function(n,p){
  x <- 0:n
   bin <-dbinom(x,n,p)
   mu_x <- n*p
   plot(x, bin, type = "h", lwd = 5,
    main = paste("Density =",n,",p= ",p),
    xlab = "x", ylab = "P(X=x)")
  abline(v = mu_x, col ='red', lwd = 4)
 }

Fix n = 40. Compute plots of the pmf for the following values of p: 0.05, 0.1, 0.4, 0.6, 0.9, 0.95. Have all the plots on the same frame.

n <- 40
values <- c(0.05, 0.1, 0.4, 0.6, 0.9, 0.95)
par(mfrow = c(3, 2))
for(p in values){
  plot_binom(n, p)
}

What do you notice about the shape of the plots? Consider skewness and symmetry. YOUR ANSWER: as the probability increases, the graph tends to skew right, and the lower the probability the more left the graph will skew. Regarding the expected value, all of the graphs appear symmertic about their respective expected values.

1.2 Cumulative Distribution Functions

Write a function to plot the cumulative distribution function of the Binomial random variable, call it ‘plot_binom_cdf’.

plot_binom_cdf <- function(n, p){
  x <- 0:n
  bin <-pbinom(x,n,p)
  plot(bin,type="s",lwd=2,
       main=paste("Density =",n,",p = ",p),
       xlab= "x",ylab= "P(X=x)")
  abline(h=0.8,col="red")
}

Fix n = 40. Compute plots of the of the graphs cdf for the following values of p: 0.05, 0.1, 0.4, 0.6, 0.9, 0.95. Have all graphs on the same plot (with different colors). Draw a horizontal line at \(y=0.8\).

n <- 40
values <- c(0.05, 0.1, 0.4, 0.6, 0.9, 0.95)
par(mfrow = c(3, 2))
for(p in values){
  plot_binom_cdf(n, p)
}

Interpret the values of x where the line y=0.8 intersects the graphs of the different cdf. YOUR ANSWER: the values of x where the line y=0.8 intersects the different CDF graphs represents how many values will be within range for a given probability value. Take for example p=0.1, the y=.8 value corresponds to about x=5, which means of the 40 values, 80% of the values will be 5.

2 Poisson Distribution.

2.1 Probability Mass Functions

We will plot the mass function for the Poison distribution Pois(mu). Note: 1) The values for this random variable are: 0, 1, 2, 3, …. 2) The density plot will have a bar of height P(X=k), at the point ‘k’ on the x-axis. 3) Since most of the densities will be concentrated at lower values of k, we will fix a large enough value of n, say n = 100, when drawing the density plots. 3) In the plot include a vertical line at the expected value of Pois(mu).

Write a function plot_pois, that takes input values: mu, and returns the density plot of Pois(mu).

n <- 100 
plot_pois_pmf <- function(mu){
  x <- 0:n
  pois <- dpois(x, lambda = mu)
  mu_X <- mu
  plot(x, pois, type = "h", lwd = 3,
       main = paste("Poisson density: mu = ", mu),
       xlab = "x", ylab = "P(X=x)")
  abline(v = mu_X, col = 'red', lwd = 3)
}

For the following values of mu compute the plots for the pmf of the Poisson distribution: mu: 0.5, 1, 5, 8, 20, 50. Have all plots on the same frame.

values <- c(0.5,1,5,8,20,50)
par(mfrow = c(3,2))
for(mu in values){
  plot_pois_pmf(mu)
}

2.2 Cumulative Distribution Functions

Write a function ‘plot_pois_cdf’ that takes input mu and returns the plot of the cdf of the Pois(mu)

n<-100
plot_pois_cdf <- function(mu) {
  x<-0:n
  pois=ppois(x,mu)
  plot(x,pois,type="s",lwd=2,
       main=paste("Poisson Distribution Function mu = ", mu),
       xlab="number of events x",
       ylab="P(X=x)")
}

For the following values of mu compute the plots for the pmf of the Poisson distribution: mu: 0.5, 1, 5, 8, 20, 50. Have all plots on the same frame.

values <- c(0.5,1,5,8,20,50)
par(mfrow = c(3,2))
for(mu in values){
  plot_pois_cdf(mu)
}

3 Comparing Poison with Binomial

We say two random variables X and Y are identically distributed if they have the same cumulative distribution functions, that is \[ F_X(u) = F_Y(u) \quad \quad \forall x\in \mathbb{R}\] In class we noted the Poisson approximation of the Binomial distribution. Let’s visualize it now:

Plot the graphs of the cdf of \(\text{Binom}(n=10, p=0.5)\) and \(\text{Pois}(\mu= 5)\) on the same plot.

par(mfrow = c(2,1))
plot_binom_cdf(10,.5)
plot_pois_cdf(5)

Do the graphs overlap? Why/Why not? YOUR ANSWER: the graphs do overlap, we observe they follow the same trends.

Now plot the graphs of the cdf of \(\text{Binom}(n= 1000, p= 0.005)\) and \(\text{Pois}(\mu= 5)\).

par(mfrow = c(2,1))
plot_binom_cdf(1000,.0005)
plot_pois_cdf(5)

Do the graphs overlap? Why/Why not? YOUR ANSWER: Yes, for the same reasons as n=10,p=0.5 and mu=5, while it is harder to see because the range of x values is 1000 for the binomial cdf.

4 Normal Distibution

In this section we will explore the normal distribution.

4.1 Fixed mean, varying standard deviation

4.1.1 Plotting Densities

Set \(\mu = 5\). For values of \(\sigma\) given by \(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2\), plot the densities of \(N(\mu, \sigma)\) in the same plot. It might help if (1) you have the densities of \(N(\mu = 5, \sigma = 0.2)\) and \(N(\mu = 5, \sigma = 2)\) to be blue in color and the rest to be red. (2) choose appropriate limits for the x-axis (use x_lim parameter in the plot funtion) and y-axis (use y_lim).

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

x <- seq(-6,6,length.out = 1000)
x <- 5+x


y_0.2 <- dnorm(x,mean = 5, sd = 0.2)
y_0.4 <- dnorm(x,mean = 5, sd = 0.4)
y_0.8 <- dnorm(x,mean = 5, sd = 0.8)
y_1 <- dnorm(x,mean = 5, sd = 1)
y_1.3 <- dnorm(x,mean = 5, sd = 1.3)
y_1.8 <- dnorm(x,mean = 5, sd = 1.8)
y_2 <- dnorm(x,mean = 5, sd = 2)

plot_data_pdf <- data.frame(x,y_0.2, y_0.4, y_0.8,y_1, y_1.3, y_1.8, y_2)

plot_data_pdf_new <- pivot_longer(
  data = plot_data_pdf,
  cols = c(y_0.2, y_0.4, y_0.8,y_1, y_1.3, y_1.8, y_2),
  names_to = "std_devs"
)

ggplot(data = plot_data_pdf_new,
       aes(x=x,y=value,group = std_devs,
           color = std_devs))+
  geom_line()+
  geom_hline(yintercept = 0, linetype="dashed") +
  geom_vline(xintercept = 5, linetype="dashed") +
  labs(title = "Normal Density Plots: Varying Std Devs",
       subtitle = "",
       x = "x-axis",
       y = "density")

What do you notice about the plot? Comment about how the width changes. The plot gets wider as the std_devs increase as expected since the data is distributed over a wider range of values.

4.1.2 Plotting Cummulative Distribution Functions

Set \(\mu = 5\). For values of \(\sigma\) given by \(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2\), plot the cummulative distribution function of \(N(\mu, \sigma)\) in the same plot. It might help if (1) you have the cdf of \(N(\mu = 5, \sigma = 0.2)\) and \(N(\mu = 5, \sigma = 2)\) to be blue in color and the rest to be red. (2) choose appropriate limits for the x-axis (use x_lim parameter in the plot funtion) and y-axis (use y_lim).

  z_0.2 <- pnorm(x,mean = 5, sd = 0.2)
z_0.4 <- pnorm(x,mean = 5, sd = 0.4)
z_0.8 <- pnorm(x,mean = 5, sd = 0.8)
z_1 <- pnorm(x,mean = 5, sd = 1)
z_1.3 <- pnorm(x,mean = 5, sd = 1.3)
z_1.8 <- pnorm(x,mean = 5, sd = 1.8)
z_2 <- pnorm(x,mean = 5, sd = 2)

plot_data_pdf <- data.frame(x,z_0.2, z_0.4, z_0.8,z_1, z_1.3, z_1.8, z_2)

plot_data_cdf_new <- pivot_longer(
  data = plot_data_pdf,
  cols = c(z_0.2, z_0.4, z_0.8,z_1, z_1.3, z_1.8, z_2),
  names_to = "std_devs"
)

ggplot(data = plot_data_cdf_new,
       aes(x=x,y=value,group = std_devs,
           color = std_devs))+
  geom_line()+
  geom_hline(yintercept = 0, linetype="dashed") +
  geom_vline(xintercept = 5, linetype="dashed") +
  labs(title = "Cumulative Dist Plots: Varying Std Devs",
       subtitle = "mu",
       x = "x-axis",
       y = "density")

What information does the point of intersection of two cdfs give us? - The point of intersection tells us where the mu is and all of them intersect at that point due to having the name mu/mean.

4.2 Varying mean, fixed standard deviation

4.2.1 Plotting Densities

Set \(\sigma = 0.4\). For values of \(\mu\) given by \(-1, -0.5, 0, 0.4, 0.9, 2.5, 4\) plot the densities of \(N(\mu, \sigma)\) in the same plot. You might need to choose appropriate limits for the x-axis.

library(tidyverse)

sigma <- 0.4
mu <- c(-1, -0.5, 0, 0.4, 0.9, 2.5, 4)

x <- seq(min(mu)-3*sigma, max(mu)+3*sigma, length.out = 1000)
plot_data_pdf <- data.frame(x)

for (i in seq_along(mu)) {
  plot_data_pdf[paste0("y_", mu[i])] <- dnorm(x, mean = mu[i], sd = sigma)
}

plot_data_pdf_new <- pivot_longer(
  data = plot_data_pdf,
  cols = starts_with("y_"),
  names_to = "mu",
  values_to = "density"
) 

ggplot(data = plot_data_pdf_new,
       aes(x = x, y = density, color = mu)) +
  geom_line() +
  geom_vline(xintercept = mu, linetype = "dashed") +
  labs(title = "Normal Density Plots: Varying std_dev",
       subtitle = expression(paste(sigma, " = 0.4")),
       x = "x",
       y = "density",
       color = "std_dev")

4.2.2 Plotting Cummulative Distribution Functions

Set \(\sigma = 0.4\). For values of \(\mu\) given by \(-1, -0.5, 0, 0.4, 0.9, 2.5, 4\) plot the cumulative distribution functions of \(N(\mu, \sigma)\) in the same plot. You might need to choose appropriate limits for the x-axis.

library(tidyverse)

sigma <- 0.4
mu <- c(-1, -0.5, 0, 0.4, 0.9, 2.5, 4)

x <- seq(min(mu)-3*sigma, max(mu)+3*sigma, length.out = 1000)
plot_data_pdf <- data.frame(x)

for (i in seq_along(mu)) {
  plot_data_pdf[paste0("y_", mu[i])] <- pnorm(x, mean = mu[i], sd = sigma)
}

plot_data_pdf_new <- pivot_longer(
  data = plot_data_pdf,
  cols = starts_with("y_"),
  names_to = "mu",
  values_to = "density"
)

ggplot(data = plot_data_pdf_new,
       aes(x = x, y = density, color = mu)) +
  geom_line() +
  geom_vline(xintercept = mu, linetype = "dashed") +
  labs(title = "Cumulative Dist Plots: Varying Std",
       subtitle = expression(paste(sigma, " = 0.4")),
       x = "x",
       y = "density",
       color = "std_dev")

5 Exponential Distribution

5.1 Probablity Density Function

For values of \(\lambda\) in \((0.2, 0.5, 1, 2, 8, 10)\) plot the graphs of the densities of \(\text{Exp}(\lambda)\) in the same plot.

lambda_values <- c(0.2, 0.5, 1, 2, 8, 10)
x <- seq(0, 5, length = 100)
par(mfrow = c(1, 2))
plot(NULL, xlim = c(0, 5), ylim = c(0, 2), xlab = "x", ylab = "f(x)", main = "PDF of Exp(lambda)")
for (i in 1:length(lambda_values)) {
  lines(x, dexp(x, rate = lambda_values[i]), col = i)
}
legend("topright", legend = lambda_values, title = "lambda", col = 1:length(lambda_values), lty = 1)

5.2 Cummulative Distribution Function

For values of \(\lambda\) in \((0.2, 0.5, 1, 2, 8, 10)\) plot the graphs of the cumulative distribution function of \(\text{Exp}(\lambda)\) in the same plot. Draw a horizontal line at \(y=0.8\).

plot(NULL, xlim = c(0, 5), ylim = c(0, 1), xlab = "x", ylab = "F(x)", main = "CDF of Exp(lambda)")
for (i in 1:length(lambda_values)) {
  lines(x, pexp(x, rate = lambda_values[i]), col = i)
}
abline(h = 0.8, col = "red", lty = 2)
legend("bottomright", legend = lambda_values, title = "lambda", col = 1:length(lambda_values), lty = 1)

Interpret the values of \(x\) where the line \(y=0.8\) intersects with the graphs of the cdfs. YOUR ANSWER: #The values of x where the line y = 0.8 intersects with the CDFs represent where the probability of observing a value less than or equal to x is 0.8. These values depend on what lambda is, and note that whenever lambda is high, the values become smaller.

6 Gamma Distibution

We will plot the Gamma distribution for different shapes and scales. You might need to adjust the limits of x and y axes appropriately.

6.1 Different Shapes

For values of \(\alpha \in (0.3,0.7, 1, 1.5, 2, 2.5, 10)\), plot the Plot the densities of \(\text{Gamma}(\alpha, \beta = 1)\) in a single plot.

alpha_values <- c(0.3, 0.7, 1, 1.5, 2, 2.5, 10)
beta_values <- c(0.2, 0.6, 1, 1.5, 2)
plot_gamma <- function(alpha, beta_values) {
  if (alpha < 1) {
    x_limits <- c(0, 5)
  } else {
    x_limits <- c(alpha - 3 * sqrt(alpha), alpha + 3 * sqrt(alpha))
  }
  plot(
    NULL,
    xlim = x_limits,
    ylim = c(0, 1.5),
    xlab = "x",
    ylab = "Density"
  )
  legend_values <- character()
  for (beta in beta_values) {
    curve(
      dgamma(x, shape = alpha, rate = beta),
      add = TRUE,
      col = rainbow(length(beta_values))[which(beta_values == beta)],
      lty = 1
    )
    legend_values <- c(legend_values, paste0("Beta = ", beta))
  }
  legend(
    "topright",
    legend = legend_values,
    col = rainbow(length(beta_values)),
    lty = 1
  )
}
par(mfrow = c(2, 4))
for (i in 1:length(alpha_values)) {
  plot_gamma(alpha_values[i], 1)
  title(paste0("Alpha = ", alpha_values[i]))
}

For each of the shapes, identify a feature that distinguishes one shape from the other. This to consider a: does it have a peak, concavity, presence of inflection points, ect. YOUR RESPONSE: # We can see that regarding the Gamma distribution, the alpha values, say 0.7 that are smaller tend to display as more skewed, whereas larger alpha values, say 10, tend to appear more symmetrical.

6.2 \(\alpha =1\) , varying scales

Set \(\alpha = 1\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

plot_gamma(1, beta_values)
title("Alpha = 1, Varying Scales")

6.3 \(\alpha =0.6\) , varying scales

Set \(\alpha = 0.6\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

plot_gamma(0.6, beta_values)
title("Alpha = 0.6, Varying Scales")

6.4 \(\alpha = 2\) , varying scales

Set \(\alpha = 2\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

plot_gamma(2, beta_values)
title("Alpha = 2, Varying Scales")

6.5 \(\alpha = 5\) , varying scales

Set \(\alpha = 5\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

plot_gamma(5, beta_values)
title("Alpha = 5, Varying Scales")

Distributions: Discrete and Continuous

Cory Powers, Aash Lyer, Muzzamil Khan