This project will help us visualize various functions attached to the different distributions learned in class.

1 Binomial distribution.

1.1 Probability Mass Function

We will plot the probability mass function for the binomial distribution Bin(n, p). Note: 1) The values for this random variable are 0, 1, 2, …, n. 2) The density plot will have a bar of height P(X=k), at the point ‘k’ on the x-axis. 3) In the plot include a vertical line at the expected value of Bin(n,p).

Write a function plot_binom, that takes input values: n and p, and returns the density plot of Bin(n,p).

plot_binom_pmf <- function(n, p) {
  x <- 0:n
  binomvals <- dbinom(x, size=n, prob=p)
  mu_X <- n * p
  plot(x, binomvals, type="h", lwd=5,
       main="Binomial Density",
       xlab="x", ylab="P(X=x)")
  abline(v=mu_X, col='red', lwd=4)
}

Fix n = 40. Compute plots of the pmf for the following values of p: 0.05, 0.1, 0.4, 0.6, 0.9, 0.95. Have all the plots on the same frame.

n <- 40
p_values <- c(0.05, 0.1, 0.4, 0.6, 0.9, 0.95)
for(p in p_values)
  plot_binom_pmf(n,p)

What do you notice about the shape of the plots? Consider skewness and symmetry. YOUR ANSWER: As the probability increases the graph skews right. As the probability decreases the graph skews left. When the probability is in the middle the graph is symmetrical.

1.2 Cumulative Distribution Functions

Write a function to plot the cumulative distribution function of the Binomial random variable, call it ‘plot_binom_cdf’.

plot_binom_cdf <- function(n, p) {
  x <- 0:n
  bin <- pbinom(x, n, p)
  
  # Set up the plot
  plot(x, bin, type = "s", lwd = 2,
       main = paste("Binomial CDF: n =", n, ", p =", p),
       xlab = "x", ylab = "P(X ≤ x)", ylim = c(0, 1))
  
  # Add a horizontal line at y = 0.8
  abline(h = 0.8, col = "red", lty = 2)
}

Fix n = 40. Compute plots of the of the graphs cdf for the following values of p: 0.05, 0.1, 0.4, 0.6, 0.9, 0.95. Have all graphs on the same plot (with different colors). Draw a horizontal line at \(y=0.8\).

x <- seq(-10, 20, length.out = 1000)
#plot(x, pbinom(x, size = 15, prob = 0.8))

par(mfrow = c(2,1))
plot(x, pbinom(x, size = 15, prob = 0.99),
     main = "CDF of Binomial with p = 0.99",
     xlab = "x_values",
     ylab = "cdf of Binnomial")
plot(x, pbinom(x, size = 15, prob = 0.6))

plot(x, pbinom(x, size = 15, prob = 0.4))
plot(x, pbinom(x, size = 15, prob = 0.05))

Interpret the values of \(x\) where the line \(y=0.8\) intersects the graphs of the different cdf. YOUR ANSWER: When the graph intersects with y-0.8 is the number of values that are within range for a given probability value.

2 Poisson Distribution.

2.1 Probability Mass Functions

We will plot the mass function for the Poison distribution Pois(mu). Note: 1) The values for this random variable are: 0, 1, 2, 3, …. 2) The density plot will have a bar of height P(X=k), at the point ‘k’ on the x-axis. 3) Since most of the densities will be concentrated at lower values of k, we will fix a large enough value of n, say n = 100, when drawing the density plots. 3) In the plot include a vertical line at the expected value of Pois(mu).

The following function is a first jab at writing a function to plot the pmf of the Poisson Distribution

n <- 100 
plot_pois_pmf <- function(mu){
  x <- 0:n
  pois <- dpois(x, mu)
  mu_X <- mu
  plot(x, pois, type = "h", lwd = 3,
       main = paste("Poisson density: mu = ", mu),
       xlab = "x", ylab = "P(X=x)")
  abline(v = mu_X, col = 'red', lwd = 3)
}

plot_pois_pmf(5)

For the following values of mu compute the plots for the pmf of the Poisson distribution: mu: 0.5, 1, 5, 8, 20, 50. Have all plots on the same frame.

mu <- 40
mu_values <- c(0.5, 1, 5, 8, 20 , 50)
for(mu in mu_values)
  plot_pois_pmf(mu)

2.2 Cumulative Distribution Functions

Write a function ‘plot_pois_cdf’ that takes input \(\mu\) and returns the plot of the cdf of the \(\text{Pois}(\mu)\).

n<-100
plot_pois_cdf <- function(mu) {
  x<-0:n
  pois=ppois(x,mu)
  plot(x,pois,type="s",lwd=2,
       main=paste("Poisson Distribution Function mu = ", mu),
       xlab="number of events x",
       ylab="P(X=x)")
}

For the following values of mu compute the plots for the pmf of the Poisson distribution: mu: 0.5, 1, 5, 8, 20, 50. Have all plots on the same frame.

mu <- 40
mu_values <- c(0.5, 1, 5, 8, 20 , 50)
for(mu in mu_values)
  plot_pois_cdf(mu)

3 Comparing Poison with Binomial

We say two random variables \(X\) and \(Y\) are identically distributed if they have the same cumulative distribution functions, that is \[ F_X(u) = F_Y(u) \quad \quad \forall x\in \mathbb{R}\] We can use the Poisson distribution to approximate the Binomial distribution. Let’s visualize it now:

Plot the graphs of the cdf of \(\text{Binom}(n=10, p=0.5)\) and \(\text{Pois}(\mu= 5)\) on the same plot.

par(mfrow = c(2,1))
plot_binom_cdf(10,0.5)
plot_pois_cdf(5)

Do the graphs overlap? Why/Why not? YOUR ANSWER: Yes the graphs overlap because n is large with a parameter of 10 in this instance and p is small with a parameter of 0.5. The expected value is 5 which is equal to the parameter for the Poisson distribution which is 5 in this instance.

Now plot the graphs of the cdf of \(\text{Binom}(n= 1000, p= 0.005)\) and \(\text{Pois}(\mu= 5)\).

par(mfrow = c(2,1))
plot_binom_cdf(1000,0.005)
plot_pois_cdf(5)

Do the graphs overlap? Why/Why not? YOUR ANSWER: Yes the graphs overlap because n is large and p is small for the binomial. The expected value is 5 which is equal to the parameter for the Poisson distribution

4 Normal Distibution

In this section we will explore the normal distribution.

4.1 Fixed mean, varying standard deviation

4.1.1 Plotting Densities

Set \(\mu = 5\). For values of \(\sigma\) given by \(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2\), plot the densities of \(N(\mu, \sigma)\) in the same plot. It might help if (1) you have the densities of \(N(\mu = 5, \sigma = 0.2)\) and \(N(\mu = 5, \sigma = 2)\) to be blue in color and the rest to be red. (2) choose appropriate limits for the x-axis (use x_lim parameter in the plot funtion) and y-axis (use y_lim).

Method 1: Using ‘plot’ function from R-base

mu <- 5
sds <- c(0.4, 0.8, 1, 1.3, 1.8, 2)

x <- seq(-6, 6, length.out = 1000)
x <- 5+x
y0.2 <- dnorm(x, mean = 5, sd=0.2)
y2 <- dnorm(x, mean = 5, sd=2)

plot(x, y0.2, 
     type = 'l',
     main= "Plot of Normal Density with mean 5", col = 'blue')
abline(v=5, h=0)
for(std in sds){
  y_temp <- dnorm(x, mean = 5, sd=std)
  lines(x, y_temp,
        type = 'l',
        col = 'red')
}

Method 2: Using ggplot

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

x <- seq(-6, 6, length.out = 1000)
x <- 5+x

y_0.2 <- dnorm(x, mean= 5, sd = 0.2)
y_0.4 <- dnorm(x, mean= 5, sd = 0.4)
y_0.8 <- dnorm(x, mean= 5, sd = 0.8)
y_1 <- dnorm(x, mean= 5, sd = 1)
y_1.3 <- dnorm(x, mean= 5, sd = 1.3)
y_1.8 <- dnorm(x, mean = 5, sd = 1.8)
y_2 <- dnorm(x, mean= 5, sd = 2)

# Create a data frame with the calculated densities
plot_data_pdf <- data.frame(x, y_0.2, y_0.4, y_1.8, y_1, y_1.3, y_1.8, y_2)

# Reshape the data frame to long format
plot_data_pdf_new <- pivot_longer(plot_data_pdf, 
                                  cols = c(y_0.2, y_0.4, y_1.8, y_1, y_1.3, y_1.8, y_2), 
                                  names_to = "std_devs", 
                                  values_to = "value")

ggplot(data = plot_data_pdf_new, 
       aes(x = x, 
           y = value, 
           group = std_devs, 
           color = std_devs)) +
  geom_line() +
  labs(title = "Normal Density Plots: Varying Std Devs", 
       x = "x-axis", 
       y = "density")

What do you notice about the plot? Comment about how the width changes. As the standard deviation increases the width of the graphs increases. This means that the values of the x-axis is more spread out from the mean.

4.1.2 Plotting Cummulative Distribution Functions

Set \(\mu = 5\). For values of \(\sigma\) given by \(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2\), plot the cummulative distribution function of \(N(\mu, \sigma)\) in the same plot. It might help if (1) you have the cdf of \(N(\mu = 5, \sigma = 0.2)\) and \(N(\mu = 5, \sigma = 2)\) to be blue in color and the rest to be red. (2) choose appropriate limits for the x-axis (use x_lim parameter in the plot funtion) and y-axis (use y_lim).

z_0.2 <- pnorm(x, mean= 5, sd = 0.2)
z_0.4 <- pnorm(x, mean= 5, sd = 0.4)
z_0.8 <- pnorm(x, mean= 5, sd = 0.8)
z_1 <- pnorm(x, mean= 5, sd = 1)
z_1.3 <- pnorm(x, mean= 5, sd = 1.3)
z_1.8 <- pnorm(x, mean = 5, sd = 1.8)
z_2 <- pnorm(x, mean= 5, sd = 2)

plot_data_cdf <- data.frame(x,z_0.2,z_0.4,z_0.8,z_1,z_1.3, z_1.8, z_2)

plot_data_cdf_new <- pivot_longer(plot_data_cdf,
                                  cols = c(z_0.2,z_0.4,z_0.8,z_1,z_1.3, z_1.8, z_2),
                                  names_to = "std_devs"
                                  )

ggplot(data = plot_data_cdf_new,
  aes(x = x, 
      y = value, 
      group = std_devs,
      color = std_devs))+
  geom_line()+
  labs(title = "Normal Density Plots: Varying Std Devs",
       x = "x-axis", 
       y = "density")

What information does the point of intersection of two cdfs give us? represents where the mu is. mu = mean ## Varying mean, fixed standard deviation ### Plotting Densities Set \(\sigma = 0.4\). For values of \(\mu\) given by \(-1, -0.5, 0, 0.4, 0.9, 2.5, 4\) plot the densities of \(N(\mu, \sigma)\) in the same plot. You might need to choose appropriate limits for the x-axis.

sd_min1 <- dnorm(x, mean = -1, sd = 0.4)
sd_min0.5 <- dnorm(x, mean = -0.5, sd = 0.4)
sd_0 <- dnorm(x, mean = 0, sd = 0.4)
sd_0.4 <- dnorm(x, mean = 0.4, sd = 0.4)
sd_0.9 <- dnorm(x, mean = 0.9, sd = 0.4)
sd_2.5 <- dnorm(x, mean = 2.5, sd = 0.4)
sd_4 <- dnorm(x, mean = 4, sd = 0.4)

plot_data_cdf_mean <- data.frame(x,sd_min1, sd_min0.5, sd_0, sd_0.4, sd_0.9, sd_2.5, sd_4)
plot_data_cdf_mean_new <- pivot_longer(plot_data_cdf_mean, cols = c(sd_min1, sd_min0.5, sd_0, sd_0.4, sd_0.9, sd_2.5, sd_4), names_to = "mean")

ggplot(data = plot_data_cdf_mean_new,
  aes(x = x, 
      y = value, 
      group = mean,
      color = mean))+
  geom_line()+
  labs(title = "Normal Density Plots: Varying Std Devs",
       x = "x-axis", 
       y = "density")

4.1.3 Plotting Cummulative Distribution Functions

Set \(\sigma = 0.4\). For values of \(\mu\) given by \(-1, -0.5, 0, 0.4, 0.9, 2.5, 4\) plot the cumulative distribution functions of \(N(\mu, \sigma)\) in the same plot. You might need to choose appropriate limits for the x-axis.

cdf_min1 <- pnorm(x, mean = -1, sd = 0.4)
cdf_min0.5 <- pnorm(x, mean = -0.5, sd = 0.4)
cdf_0 <- pnorm(x, mean = 0, sd = 0.4)
cdf_0.4 <- pnorm(x, mean = 0.4, sd = 0.4)
cdf_0.9 <- pnorm(x, mean = 0.9, sd = 0.4)
cdf_2.5 <- pnorm(x, mean = 2.5, sd = 0.4)
cdf_4 <- pnorm(x, mean = 4, sd = 0.4)

plot_data_cdf_mean <- data.frame(x,cdf_min1, cdf_min0.5, cdf_0, cdf_0.4, cdf_0.9, cdf_2.5, cdf_4)
plot_data_cdf_mean_new <- pivot_longer(plot_data_cdf_mean, cols = c(cdf_min1, cdf_min0.5, cdf_0, cdf_0.4, cdf_0.9, cdf_2.5, cdf_4), names_to = "mean")

ggplot(data = plot_data_cdf_mean_new,
  aes(x = x, 
      y = value, 
      group = mean,
      color = mean))+
  geom_line()+
  labs(title = "Normal Density Plots: Varying Std Devs",
       x = "x-axis", 
       y = "density")

5 Exponential Distribution

5.1 Probablity Density Function

For values of \(\lambda\) in \((0.2, 0.5, 1, 2, 8, 10)\) plot the graphs of the densities of \(\text{Exp}(\lambda)\) in the same plot.

library(tidyverse)

## define variables, you can use either <- or =
x <- seq(0, 10, by = 0.1)
y0.2 <- dexp(x, rate = 0.2)
y0.5 <- dexp(x, rate = 0.5) ## lambda = 0.5
y = dexp(x, rate = 1) ## lambda = 1
y2 = dexp(x, rate = 2) ## lambda = 2
y8 = dexp(x, rate = 8) ## lambda = 8
y10 = dexp(x, rate = 10) ## lambda = 10
## Setting the dataframe
my_data <- data.frame(x,y0.2, y0.5,y,y2, y8, y10) 
my_new_data <- pivot_longer(my_data, cols = c(y,y2,y0.5,y0.2,y8,y10), names_to = "lambdas")

my_new_data$lambdas[my_new_data$lambdas == "y0.2"]

##   [1] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [11] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [21] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [31] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [41] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [51] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [61] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [71] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [81] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [91] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
## [101] "y0.2"

my_new_data$lambdas[my_new_data$lambdas == "y0.5"]

##   [1] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [11] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [21] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [31] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [41] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [51] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [61] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [71] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [81] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [91] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
## [101] "y0.5"

my_new_data$lambdas[my_new_data$lambdas == "y"]

##   [1] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [19] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [37] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [55] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [73] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [91] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"

my_new_data$lambdas[my_new_data$lambdas == "y2"]

##   [1] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [16] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [31] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [46] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [61] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [76] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [91] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"

my_new_data$lambdas[my_new_data$lambdas == "y8"]

##   [1] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [16] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [31] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [46] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [61] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [76] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [91] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"

my_new_data$lambdas[my_new_data$lambdas == "y10"]

##   [1] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [13] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [25] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [37] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [49] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [61] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [73] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [85] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [97] "y10" "y10" "y10" "y10" "y10"

## make a canvas
g <- ggplot(data = my_new_data)
## add line, you can add multiple stuff to a canvas by using +
g + geom_line(mapping = aes(x = x, y = value, group = lambdas, color = 'blue'))

5.2 Cummulative Distribution Function

For values of \(\lambda\) in \((0.2, 0.5, 1, 2, 8, 10)\) plot the graphs of the cumulative distribution function of \(\text{Exp}(\lambda)\) in the same plot. Draw a horizontal line at \(y=0.8\).

library(tidyverse)
x <- seq(0,10,by = 0.1)
y0.2 <- pexp(x, rate = 0.2)
y0.5 <- pexp(x, rate = 0.5)
y = pexp(x, rate = 1)
y2 <- pexp(x, rate = 2)
y8 <- pexp(x, rate = 8)
y10 <- pexp(x, rate = 10)

my_data <- data.frame(x, y0.2, y0.5, y, y2, y8, y10)
my_new_data <- pivot_longer(my_data, cols = c(y0.2, y0.5, y, y2, y8, y10), names_to = "lambdas")
my_new_data$lambdas[my_new_data$lambdas == "y0.2"]

##   [1] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [11] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [21] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [31] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [41] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [51] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [61] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [71] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [81] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
##  [91] "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2" "y0.2"
## [101] "y0.2"

my_new_data$lambdas[my_new_data$lambdas == "y0.5"]

##   [1] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [11] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [21] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [31] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [41] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [51] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [61] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [71] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [81] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
##  [91] "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5" "y0.5"
## [101] "y0.5"

my_new_data$lambdas[my_new_data$lambdas == "y"]

##   [1] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [19] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [37] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [55] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [73] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"
##  [91] "y" "y" "y" "y" "y" "y" "y" "y" "y" "y" "y"

my_new_data$lambdas[my_new_data$lambdas == "y2"]

##   [1] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [16] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [31] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [46] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [61] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [76] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"
##  [91] "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2" "y2"

my_new_data$lambdas[my_new_data$lambdas == "y8"]

##   [1] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [16] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [31] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [46] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [61] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [76] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"
##  [91] "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8" "y8"

my_new_data$lambdas[my_new_data$lambdas == "y10"]

##   [1] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [13] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [25] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [37] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [49] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [61] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [73] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [85] "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10" "y10"
##  [97] "y10" "y10" "y10" "y10" "y10"

g <- ggplot(data = my_new_data)
## add line, you can add multiple stuff to a canvas by using +
g + geom_line(mapping = aes(x = x, y = value,color = 'blue'))

Interpret the values of \(x\) where the line \(y=0.8\) intersects with the graphs of the cdfs. YOUR ANSWER: where y=0.8 intersects the graph is the probability of getting a value of x or less

6 Gamma Distibution

We will plot the Gamma distribution for different shapes and scales. You might need to adjust the limits of x and y axes appropriately.

6.1 Different Shapes

For values of \(\alpha \in (0.3,0.7, 1, 1.5, 2, 2.5, 10)\), plot the Plot the densities of \(\text{Gamma}(\alpha, \beta = 1)\) in a single plot.

x <- seq(0, 10, length.out = 1000)

y_0.3 <- dgamma(x, shape = 0.3, scale = 1)
y_0.7 <- dgamma(x, shape = 0.7, scale = 1)
y_1 <- dgamma(x, shape = 1, scale = 1)
y_1.5 <- dgamma(x, shape = 1.5, scale = 1)
y_2 <- dgamma(x, shape = 2, scale = 1)
y_2.5 <- dgamma(x, shape = 2.5, scale = 1)
y_10 <- dgamma(x, shape = 10, scale = 1)

plot_data_pdf <- data.frame(x,y_0.3,y_0.7,y_1,y_1.5,y_2, y_2.5)

plot_data_pdf_new <- pivot_longer(plot_data_pdf,
                                  cols = c(y_0.3,y_0.7,y_1,y_1.5,y_2, y_2.5),
                                  names_to = "shapes"
                                  )

ggplot(data = plot_data_pdf_new,
  aes(x = x, 
      y = value, 
      group = shapes,
      color = shapes))+
  geom_line()+
  labs(title = "Gamma Density Plots: Varying Shapes",
       x = "x-axis", 
       y = "density") + ylim(0,4)

For each of the shapes, identify a feature that distinguishes one shape from the other. This to consider a: does it have a peak, concavity, presence of inflection points, ect. YOUR RESPONSE: Smaller alpha values tend to be skewed while alpha values that are larger tend to look more symmetrical

6.2 \(\alpha =1\) , varying scales

Set \(\alpha = 1\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

alpha_0.2 <- dgamma(x, shape = 0.3, scale = 1)
alpha_0.6 <- dgamma(x, shape = 0.7, scale = 1)
alpha_1 <- dgamma(x, shape = 1, scale = 1)
alpha_1.5 <- dgamma(x, shape = 1.5, scale = 1)
alpha_2 <- dgamma(x, shape = 2, scale = 1)

plot_data_pdf_alpha <- data.frame(x,alpha_0.2, alpha_0.6, alpha_1, alpha_1.5, alpha_2)
plot_data_pdf_alpha_new <- pivot_longer(plot_data_pdf_alpha, cols = c(alpha_0.2, alpha_0.6, alpha_1, alpha_1.5, alpha_2), names_to = "scales")

ggplot(data = plot_data_pdf_alpha_new,
  aes(x = x, 
      y = value, 
      group = scales,
      color = scales))+
  geom_line()+
  labs(title = "Gamma Density Plots: Varying Scales, alpha = 1",
       x = "x-axis", 
       y = "density")

6.3 \(\alpha =0.6\) , varying scales

Set \(\alpha = 0.6\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

a0.6_0.2 <- dgamma(x, shape = 0.3, scale = 1)
a0.6_0.6 <- dgamma(x, shape = 0.7, scale = 1)
a0.6_1 <- dgamma(x, shape = 1, scale = 1)
a0.6_1.5 <- dgamma(x, shape = 1.5, scale = 1)
a0.6_2 <- dgamma(x, shape = 2, scale = 1)

plot_data_pdf_a0.6 <- data.frame(x,a0.6_0.2, a0.6_0.6, a0.6_1, a0.6_1.5, a0.6_2)
plot_data_pdf_a0.6_new <- pivot_longer(plot_data_pdf_a0.6, cols = c(a0.6_0.2, a0.6_0.2, a0.6_0.6, a0.6_1, a0.6_1.5, a0.6_2), names_to = "scales_0.6")

ggplot(data = plot_data_pdf_a0.6_new,
  aes(x = x, 
      y = value, 
      group = scales_0.6,
      color = scales_0.6))+
  geom_line()+
  labs(title = "Gamma Density Plots: Varying Scales, alpha = 0.6",
       x = "x-axis", 
       y = "density")

6.4 \(\alpha = 2\) , varying scales

Set \(\alpha = 2\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

a2_0.2 <- dgamma(x, shape = 0.3, scale = 1)
a2_0.6 <- dgamma(x, shape = 0.7, scale = 1)
a2_1 <- dgamma(x, shape = 1, scale = 1)
a2_1.5 <- dgamma(x, shape = 1.5, scale = 1)
a2_2 <- dgamma(x, shape = 2, scale = 1)

plot_data_pdf_a2 <- data.frame(x,a2_0.2, a2_0.6, a2_1, a2_1.5, a2_2)
plot_data_pdf_a2_new <- pivot_longer(plot_data_pdf_a2, cols = c(a2_0.2, a2_0.2, a2_0.6, a2_1, a2_1.5, a2_2), names_to = "scales_2")

ggplot(data = plot_data_pdf_a2_new,
  aes(x = x, 
      y = value, 
      group = scales_2,
      color = scales_2))+
  geom_line()+
  labs(title = "Gamma Density Plots: Varying Scales, alpha = 2",
       x = "x-axis", 
       y = "density")

6.5 \(\alpha = 5\) , varying scales

Set \(\alpha = 5\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.

a5_0.2 <- dgamma(x, shape = 0.3, scale = 1)
a5_0.6 <- dgamma(x, shape = 0.7, scale = 1)
a5_1 <- dgamma(x, shape = 1, scale = 1)
a5_1.5 <- dgamma(x, shape = 1.5, scale = 1)
a5_2 <- dgamma(x, shape = 2, scale = 1)

plot_data_pdf_a5 <- data.frame(x,a5_0.2, a5_0.6, a5_1, a5_1.5, a5_2)
plot_data_pdf_a5_new <- pivot_longer(plot_data_pdf_a5, cols = c(a5_0.2, a5_0.2, a5_0.6, a5_1, a5_1.5, a5_2), names_to = "scales_5")

ggplot(data = plot_data_pdf_a5_new,
  aes(x = x, 
      y = value, 
      group = scales_5,
      color = scales_5))+
  geom_line()+
  labs(title = "Gamma Density Plots: Varying Scales, alpha = 5",
       x = "x-axis", 
       y = "density")

Distributions: Discrete and Continuous

Rachael Thyparambil 120654862