This project will help us visualize various functions attached to the different distributions learned in class.
We will plot the probability mass function for the binomial distribution Bin(n, p). Note: 1) The values for this random variable are 0, 1, 2, …, n. 2) The density plot will have a bar of height P(X=k), at the point ‘k’ on the x-axis. 3) In the plot include a vertical line at the expected value of Bin(n,p).
Write a function plot_binom_pmf that takes input values n and p and draws the pmf plot of Bin(n, p).
plot_binom_pmf <- function(n=2, p=0.5){
x <- 0:n
binomvals <- dbinom(x, size = n, prob = p)
plot(x, binomvals, type = "h", lwd = 5,
main = paste0("Binomial pmf: n = ", n, ", p = ", p),
xlab = "x", ylab = "P(X=x)")
mu_X = n*p
abline(v = mu_X, col = 'red', lwd = 4)
}
Fix n = 40. Compute plots of the pmf for the following values of p: 0.05, 0.1, 0.4, 0.6, 0.9, 0.95. Have all the plots on the same frame.
# Symmetric reference case (not in the requested list of p values)
plot_binom_pmf(40, 0.5)
# Arrange the six requested pmf plots in one 2 x 3 frame
old_par <- par(mfrow = c(2, 3))
for (p in c(0.05, 0.1, 0.4, 0.6, 0.9, 0.95)) {
  plot_binom_pmf(40, p)
}
par(old_par)
What do you notice about the shape of the plots? Consider skewness and symmetry. YOUR ANSWER: As p increases, the skew reverses direction: for small p the pmf is right-skewed and peaks near 0, for large p it is left-skewed and peaks near n, and the closer p is to 0.5 the more symmetric the plot is.
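The skewness pattern described in this answer can be checked numerically. The following is a minimal sketch, assuming the standard skewness formula for Bin(n, p), namely (1 - 2p) / sqrt(np(1 - p)); the variable names are illustrative and not part of the assignment.

```r
# Skewness of Bin(n, p): (1 - 2p) / sqrt(n * p * (1 - p))
n <- 40
p <- c(0.05, 0.1, 0.4, 0.5, 0.6, 0.9, 0.95)
binom_skew <- (1 - 2 * p) / sqrt(n * p * (1 - p))
names(binom_skew) <- paste0("p=", p)

# Positive values mean right-skewed, negative values mean left-skewed
round(binom_skew, 3)
```

The sign flips at p = 0.5, where the skewness is exactly zero, and the magnitude grows as p moves toward 0 or 1.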
Write a function to plot the cumulative distribution function of the Binomial random variable, call it ‘plot_binom_cdf’.
plot_binom_cdf <- function(n=2, p=0.5){
  # Cover the whole support 0..n, with a small margin on each side
  x <- seq(-1, n + 1, length.out = 1000)
  binomvals <- pbinom(q = x, size = n, prob = p)
  plot(x, binomvals, type = "l",
       main = paste("CDF of Binomial with p = ", p),
       xlab = "x",
       ylab = "P(X <= x)",
       ylim = c(0, 1))
  abline(h = 0.8, col = 'red', lwd = 4)
}
plot_binom_cdf(40, 0.05)
plot_binom_cdf(40, 0.1)
plot_binom_cdf(40, 0.4)
plot_binom_cdf(40, 0.6)
plot_binom_cdf(40, 0.9)
plot_binom_cdf(40, 0.95)
Fix n = 40. Plot the cdfs for the following
values of p: 0.05, 0.1, 0.4, 0.6, 0.9, 0.95. Have all graphs on the same
plot (with different colors). Draw a horizontal line at \(y=0.8\).
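n <- 40
probs <- c(0.05, 0.1, 0.4, 0.6, 0.9, 0.95)
cols <- c("black", "red", "blue", "darkgreen", "purple", "orange")
x <- seq(-1, n + 1, length.out = 1000)
par(mfrow = c(1, 1))
# Empty frame first, then one cdf per value of p in its own color
plot(x, pbinom(x, size = n, prob = probs[1]), type = "n",
     main = "CDF of Binomial(n = 40, p)",
     xlab = "x", ylab = "P(X <= x)", ylim = c(0, 1))
for (i in seq_along(probs)) {
  lines(x, pbinom(x, size = n, prob = probs[i]), col = cols[i], lwd = 2)
}
# Horizontal reference line at y = 0.8
abline(h = 0.8, lty = 2)
legend("bottomright", legend = paste("p =", probs), col = cols, lwd = 2)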
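Interpret the values of \(x\) where the line \(y=0.8\) intersects the graphs of the different cdf. YOUR ANSWER: Each intersection marks the 80th percentile of that binomial distribution: the smallest \(x\) at which the cdf \(P(X \le x)\) reaches 0.8. Larger values of p move this point to the right.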
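The intersection points can also be read off numerically rather than from the graph. A minimal sketch, assuming base R's qbinom (the quantile function of the binomial); the names probs and q80 are illustrative only.

```r
n <- 40
probs <- c(0.05, 0.1, 0.4, 0.6, 0.9, 0.95)

# Smallest integer x with P(X <= x) >= 0.8, one value per p
q80 <- qbinom(0.8, size = n, prob = probs)

# The cdf is at least 0.8 at the quantile and below 0.8 one step earlier
data.frame(p = probs, q80 = q80,
           cdf_at_q80 = pbinom(q80, size = n, prob = probs),
           cdf_before = pbinom(q80 - 1, size = n, prob = probs))
```

Because the binomial cdf is a step function, it usually jumps over 0.8 rather than hitting it exactly, which is why the quantile is defined as the first x where the cdf is at least 0.8.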
We will plot the mass function for the Poisson distribution Pois(mu). Note: 1) The values for this random variable are: 0, 1, 2, 3, …. 2) The density plot will have a bar of height P(X=k), at the point ‘k’ on the x-axis. 3) Since most of the densities will be concentrated at lower values of k, we will fix a large enough value of n, say n = 100, when drawing the density plots. 4) In the plot include a vertical line at the expected value of Pois(mu).
The following function is a first stab at writing a function to plot the pmf of the Poisson distribution.
n <- 100
plot_pois_pmf <- function(mu){
x <- 0:n
pois <- dpois(x, lambda = mu)
plot(x, pois, type = "h", lwd = 5,
main = paste("Poisson density: mu = ", mu),
xlab = "x", ylab = "P(X=x)")
abline(v = mu, col = 'red', lwd = 4)
}
plot_pois_pmf(5)
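Whether n = 100 is a large enough cutoff can be checked by computing how much probability lies beyond it. A minimal sketch, assuming the values of mu used later in this section; tail_mass is an illustrative name.

```r
mus <- c(0.5, 1, 5, 8, 20, 50)
n <- 100

# P(X > n) for each mu: the probability mass the truncated plot leaves out
tail_mass <- ppois(n, lambda = mus, lower.tail = FALSE)
data.frame(mu = mus, tail_mass = tail_mass)
```

If the omitted mass is negligible for every mu, truncating the x-axis at 100 does not hide any visible part of the pmf.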
For the following values of mu compute the plots for the pmf of the Poisson distribution: mu: 0.5, 1, 5, 8, 20, 50. Have all plots on the same frame.
# Arrange the six pmf plots in one 2 x 3 frame
old_par <- par(mfrow = c(2, 3))
for (mu in c(0.5, 1, 5, 8, 20, 50)) {
  plot_pois_pmf(mu)
}
par(old_par)
Write a function ‘plot_pois_cdf’ that takes input \(\mu\) and returns the plot of the cdf of the \(\text{Pois}(\mu)\).
plot_pois_cdf <- function(mu){
  # Plot out to a little past twice the mean
  lim <- 3 + mu * 2
  x <- 0:lim
  pois <- ppois(x, lambda = mu)
  plot(x, pois, type = "s", lwd = 5,
       main = paste("Poisson cdf: mu = ", mu),
       xlab = "x", ylab = "P(X <= x)")
  abline(v = mu, col = 'red', lwd = 4)
}
plot_pois_cdf(0.5)
plot_pois_cdf(1)
plot_pois_cdf(5)
plot_pois_cdf(8)
plot_pois_cdf(20)
plot_pois_cdf(50)
For the following values of mu compute the plots for the cdf of the
Poisson distribution: mu: 0.5, 1, 5, 8, 20, 50. Have all plots on the
same frame.
We say two random variables \(X\) and \(Y\) are identically distributed if they have the same cumulative distribution functions, that is \[ F_X(u) = F_Y(u) \quad \quad \forall u\in \mathbb{R}\] We can use the Poisson distribution to approximate the Binomial distribution. Let’s visualize it now:
Plot the graphs of the cdf of \(\text{Binom}(n=10, p=0.5)\) and \(\text{Pois}(\mu= 5)\) on the same plot.
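x <- 0:10
binom_cdf <- pbinom(x, size = 10, prob = 0.5)
pois_cdf <- ppois(x, lambda = 5)
# Both cdfs are step functions, so draw them with type = 's'
plot(x, binom_cdf, type = 's', ylim = c(0, 1),
     xlab = "x", ylab = "P(X <= x)",
     main = "CDF: Binom(10, 0.5) vs Pois(5)")
lines(x, pois_cdf, type = 's', col = 'red')
legend("bottomright", legend = c("Binom(10, 0.5)", "Pois(5)"),
       col = c("black", "red"), lty = 1)
par(mfrow=c(1,1))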
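Do the graphs overlap? Why/Why not? YOUR ANSWER: Not closely. Both distributions have mean 5, but the Poisson approximation to the binomial is only accurate when n is large and p is small. Here n = 10 and p = 0.5, so the variances differ (np(1-p) = 2.5 for the binomial versus 5 for the Poisson), and the binomial cdf reaches 1 at x = 10 while the Poisson cdf does not.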
Now plot the graphs of the cdf of \(\text{Binom}(n= 1000, p= 0.005)\) and \(\text{Pois}(\mu= 5)\).
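x <- 0:100
binom_cdf <- pbinom(x, size = 1000, prob = 0.005)
pois_cdf <- ppois(x, lambda = 5)
# Binomial cdf as points, Poisson cdf as a red step line on top
plot(x, binom_cdf, xlab = "x", ylab = "P(X <= x)",
     main = "CDF: Binom(1000, 0.005) vs Pois(5)")
lines(x, pois_cdf, type = 's', col = 'red')
par(mfrow=c(1,1))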
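Do the graphs overlap? Why/Why not? YOUR ANSWER: Yes, they nearly coincide. Here n = 1000 is large and p = 0.005 is small with np = 5, so the binomial variance np(1-p) = 4.975 is almost the same as the Poisson variance 5, and Pois(5) approximates Binom(1000, 0.005) well.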
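The visual comparison of the two cases can be backed by a number: the largest vertical gap between the binomial and Poisson cdfs. A minimal sketch, where max_gap_small_n and max_gap_large_n are illustrative names and the grid 0:100 is an assumption chosen to cover the bulk of both distributions.

```r
x <- 0:100

# Largest absolute difference between the two cdfs on the grid
max_gap_small_n <- max(abs(pbinom(x, size = 10, prob = 0.5) -
                           ppois(x, lambda = 5)))
max_gap_large_n <- max(abs(pbinom(x, size = 1000, prob = 0.005) -
                           ppois(x, lambda = 5)))

c(small_n = max_gap_small_n, large_n = max_gap_large_n)
```

A gap near zero means the two random variables are close to identically distributed in the sense defined earlier, \(F_X(u) = F_Y(u)\) for all \(u\).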
In this section we will explore the normal distribution.
Set \(\mu = 5\). For values of \(\sigma\) given by \(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2\), plot the densities of \(N(\mu, \sigma)\) in the same plot. It might help if (1) you have the densities of \(N(\mu = 5, \sigma = 0.2)\) and \(N(\mu = 5, \sigma = 2)\) to be blue in color and the rest to be red. (2) choose appropriate limits for the x-axis (use the xlim parameter in the plot function) and y-axis (use ylim).
Method 1: Using ‘plot’ function from R-base
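mu <- 5
sds <- c(0.4, 0.8, 1, 1.3, 1.8)   # intermediate sigmas, drawn in red
x <- mu + seq(-6, 6, length.out = 1000)
# Narrowest density (sigma = 0.2) first: it sets the y-axis range
plot(x, dnorm(x, mean = mu, sd = 0.2),
     type = 'l', col = 'blue',
     xlab = "x", ylab = "density",
     main = "Plot of Normal Density with mean 5")
abline(v = mu, h = 0)
for (std in sds) {
  y_temp <- dnorm(x, mean = mu, sd = std)
  lines(x, y_temp,
        type = 'l',
        col = 'red')
}
# Widest density (sigma = 2), also in blue
lines(x, dnorm(x, mean = mu, sd = 2), col = 'blue')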
Method 2: Using ggplot
library(tidyverse)
x <- seq(-6, 6, length.out = 1000)
x <- 5+x
y_0.2 <- dnorm(x, mean= 5, sd = 0.2)
y_0.4 <- dnorm(x, mean= 5, sd = 0.4)
y_0.8 <- dnorm(x, mean= 5, sd = 0.8)
y_1 <- dnorm(x, mean= 5, sd = 1)
y_1.3 <- dnorm(x, mean= 5, sd = 1.3)
y_1.8 <- dnorm(x, mean = 5, sd = 1.8)
y_2 <- dnorm(x, mean= 5, sd = 2)
plot_data_pdf <- data.frame(x,y_0.2,y_0.4,y_0.8,y_1,y_1.3, y_1.8, y_2)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(y_0.2,y_0.4,y_0.8,y_1,y_1.3, y_1.8, y_2),
names_to = "std_devs"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = std_devs,
color = std_devs))+
geom_line()+
labs(title = "Normal Density Plots: Varying Std Devs",
subtitle = "Jonathan Fernandes",
x = "x-axis",
y = "density")
What do you notice about the plot? Comment about how the width changes. The lower the standard deviation, the taller and narrower the density. All curves are centered at the mean 5 and each has total area 1, so a smaller standard deviation concentrates the probability near the mean, which forces a higher peak.
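The trade-off between height and width can be tabulated directly. A minimal sketch, assuming the standard fact that the normal density peaks at 1 / (sigma * sqrt(2 * pi)); peak_height, closed_form and width_95 are illustrative names.

```r
mu <- 5
sds <- c(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2)

# Height of each density at its mean, computed two ways
peak_height <- dnorm(mu, mean = mu, sd = sds)
closed_form <- 1 / (sds * sqrt(2 * pi))

# Width of the central 95% interval for each sigma
width_95 <- qnorm(0.975, mean = mu, sd = sds) - qnorm(0.025, mean = mu, sd = sds)

data.frame(sigma = sds, peak_height, closed_form, width_95)
```

The peak height falls in proportion to 1 / sigma while the interval width grows in proportion to sigma, which is the pattern visible in the plot.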
Set \(\mu = 5\). For values of \(\sigma\) given by \(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2\), plot the cumulative distribution function of \(N(\mu, \sigma)\) in the same plot. It might help if (1) you have the cdf of \(N(\mu = 5, \sigma = 0.2)\) and \(N(\mu = 5, \sigma = 2)\) to be blue in color and the rest to be red. (2) choose appropriate limits for the x-axis (use the xlim parameter in the plot function) and y-axis (use ylim).
z_0.2 <- pnorm(x, mean= 5, sd = 0.2)
z_0.4 <- pnorm(x, mean= 5, sd = 0.4)
z_0.8 <- pnorm(x, mean= 5, sd = 0.8)
z_1 <- pnorm(x, mean= 5, sd = 1)
z_1.3 <- pnorm(x, mean= 5, sd = 1.3)
z_1.8 <- pnorm(x, mean = 5, sd = 1.8)
z_2 <- pnorm(x, mean= 5, sd = 2)
plot_data_cdf <- data.frame(x,z_0.2,z_0.4,z_0.8,z_1,z_1.3, z_1.8, z_2)
plot_data_cdf_new <- pivot_longer(plot_data_cdf,
cols = c(z_0.2,z_0.4,z_0.8,z_1,z_1.3, z_1.8, z_2),
names_to = "std_devs"
)
ggplot(data = plot_data_cdf_new,
aes(x = x,
y = value,
group = std_devs,
color = std_devs))+
geom_line()+
labs(title = "Normal CDF Plots: Varying Std Devs",
x = "x-axis",
y = "cumulative probability")
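What information does the point of intersection of two cdfs give us? At an intersection point \(x\), the two random variables have the same probability of being at most \(x\), that is \(F_X(x) = F_Y(x)\). Here all the cdfs cross at \(x = 5\), where each equals 0.5, because the mean of a normal distribution is also its median.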
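The common crossing point can be confirmed by evaluating each cdf at the mean. A minimal sketch using base R's pnorm; cdf_at_mean and cdf_right_of_mean are illustrative names, and x = 6 is an arbitrary point to the right of the mean.

```r
mu <- 5
sds <- c(0.2, 0.4, 0.8, 1, 1.3, 1.8, 2)

# Every N(5, sigma) cdf passes through the same point (5, 0.5)
cdf_at_mean <- pnorm(mu, mean = mu, sd = sds)
cdf_at_mean

# To the right of the mean, a smaller sigma gives a larger cdf value
cdf_right_of_mean <- pnorm(6, mean = mu, sd = sds)
cdf_right_of_mean
```

Away from x = 5 the curves separate, so the mean is the only point where all seven cdfs agree.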
Set \(\sigma = 0.4\). For values of \(\mu\) given by \(-1, -0.5, 0, 0.4, 0.9, 2.5, 4\) plot the densities of \(N(\mu, \sigma)\) in the same plot. You might need to choose appropriate limits for the x-axis.
sigma <- 0.4
mus <- c(-0.5, 0, 0.4, 0.9, 2.5, 4)   # remaining means, drawn in red
x <- seq(-6, 6, length.out = 1000)
# First curve: mean -1
y <- dnorm(x, mean = -1, sd = sigma)
plot(x, y,
     type = 'l',
     xlab = "x", ylab = "density",
     main = "Plot of Normal Densities with sd 0.4")
abline(h = 0)
for (m in mus) {
  y_temp <- dnorm(x, mean = m, sd = sigma)
  lines(x, y_temp,
        type = 'l',
        col = 'red')
}
Set \(\sigma = 0.4\). For values of \(\mu\) given by \(-1, -0.5, 0, 0.4, 0.9, 2.5, 4\) plot the cumulative distribution functions of \(N(\mu, \sigma)\) in the same plot. You might need to choose appropriate limits for the x-axis.
x <- seq(-6, 6, length.out = 1000)
z_neg1 <-pnorm(x, mean = -1, sd = 0.4)
z_neg0.5 <-pnorm(x, mean = -0.5, sd = 0.4)
z_0 <-pnorm(x, mean = 0, sd = 0.4)
z_0.4 <-pnorm(x, mean = 0.4, sd = 0.4)
z_0.9 <-pnorm(x, mean = 0.9, sd = 0.4)
z_2.5 <-pnorm(x, mean = 2.5, sd = 0.4)
z_4 <-pnorm(x, mean = 4, sd = 0.4)
plot_data_cdf <- data.frame(x,z_neg1,z_neg0.5,z_0,z_0.4,z_0.9, z_2.5, z_4)
plot_data_cdf_new <- pivot_longer(plot_data_cdf,
cols = c(z_neg1,z_neg0.5,z_0,z_0.4,z_0.9, z_2.5, z_4),
names_to = "means"
)
ggplot(data = plot_data_cdf_new,
aes(x = x,
y = value,
group = means,
color = means))+
geom_line()+
labs(title = "Normal CDF Plots: Varying Means",
x = "x-axis",
y = "cumulative probability")
For values of \(\lambda\) in \((0.2, 0.5, 1, 2, 8, 10)\) plot the graphs of the densities of \(\text{Exp}(\lambda)\) in the same plot.
x <- seq(0, 3, length.out = 1000)
z_0.2 <-dexp(x, rate = 0.2)
z_0.5 <-dexp(x, rate = 0.5)
z_1 <-dexp(x, rate = 1)
z_2 <-dexp(x, rate = 2)
z_8 <-dexp(x, rate = 8)
z_10 <-dexp(x, rate = 10)
plot_data_pdf <- data.frame(x,z_0.2,z_0.5,z_1,z_2,z_8, z_10)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(z_0.2,z_0.5,z_1,z_2,z_8, z_10),
names_to = "lambdas"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = lambdas,
color = lambdas))+
geom_line()+
labs(title = "Exponential Probability Density Plots: Varying Rates",
x = "x-axis",
y = "Density")
For values of \(\lambda\) in \((0.2, 0.5, 1, 2, 8, 10)\) plot the graphs of the cumulative distribution function of \(\text{Exp}(\lambda)\) in the same plot. Draw a horizontal line at \(y=0.8\).
x <- seq(0, 10, length.out = 1000)   # wide enough for the rate 0.2 cdf to reach 0.8
z_0.2 <-pexp(x, rate = 0.2)
z_0.5 <-pexp(x, rate = 0.5)
z_1 <-pexp(x, rate = 1)
z_2 <-pexp(x, rate = 2)
z_8 <-pexp(x, rate = 8)
z_10 <-pexp(x, rate = 10)
plot_data_cdf <- data.frame(x,z_0.2,z_0.5,z_1,z_2,z_8, z_10)
plot_data_cdf_new <- pivot_longer(plot_data_cdf,
cols = c(z_0.2,z_0.5,z_1,z_2,z_8, z_10),
names_to = "lambdas"
)
ggplot(data = plot_data_cdf_new,
aes(x = x,
y = value,
group = lambdas,
color = lambdas))+
geom_line()+
geom_hline(yintercept = 0.8, color = "red")+
labs(title = "Exponential Cumulative Plots: Varying Rates",
x = "x-axis",
y = "Probability")
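Interpret the values of \(x\) where the line \(y=0.8\) intersects with the graphs of the cdfs. YOUR ANSWER: Each intersection is the 80th percentile of that exponential distribution, that is, the value \(x\) with \(P(X \le x) = 0.8\). The larger the rate \(\lambda\), the smaller this \(x\), so the cdf reaches 0.8 sooner.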
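For the exponential distribution the intersection points have a closed form: solving \(1 - e^{-\lambda x} = 0.8\) gives \(x = \ln(5)/\lambda\). A minimal sketch comparing this with base R's qexp; x80 and closed_form are illustrative names.

```r
rates <- c(0.2, 0.5, 1, 2, 8, 10)

# 80th percentile of Exp(lambda) from the quantile function
x80 <- qexp(0.8, rate = rates)

# Same value from solving 1 - exp(-lambda * x) = 0.8 by hand
closed_form <- log(5) / rates

data.frame(lambda = rates, x80, closed_form)
```

For \(\lambda = 0.2\) the crossing is at roughly x = 8.05, so the x-axis has to extend past 8 for that intersection to appear on the plot.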
We will plot the Gamma distribution for different shapes and scales. You might need to adjust the limits of x and y axes appropriately.
For values of \(\alpha \in (0.3,0.7, 1, 1.5, 2, 2.5, 10)\), plot the densities of \(\text{Gamma}(\alpha, \beta = 1)\) in a single plot.
x <- seq(0, 3, length.out = 1000)
y_0.3 <- dgamma(x, shape = 0.3, scale = 1)
y_0.7 <- dgamma(x, shape = 0.7, scale = 1)
y_1 <- dgamma(x, shape = 1, scale = 1)
y_1.5 <- dgamma(x, shape = 1.5, scale = 1)
y_2 <- dgamma(x, shape = 2, scale = 1)
y_2.5 <- dgamma(x, shape = 2.5, scale = 1)
y_10 <- dgamma(x, shape = 10, scale = 1)
plot_data_pdf <- data.frame(x,y_0.3,y_0.7,y_1,y_1.5,y_2, y_2.5, y_10)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(y_0.3,y_0.7,y_1,y_1.5,y_2, y_2.5, y_10),
names_to = "shapes"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = shapes,
color = shapes))+
geom_line()+
labs(title = "Gamma Density Plots: Varying Shapes",
x = "x-axis",
y = "density")+ylim(0, 5)
par(mfrow=c(1,1))
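For each of the shapes, identify a feature that distinguishes one shape from the other. Things to consider: does it have a peak, concavity, presence of inflection points, etc. YOUR RESPONSE: The curves differ in concavity and in where they start. For \(\alpha < 1\) the density is unbounded near 0 and decreases throughout, like a reversed J, and the smaller the \(\alpha\) the steeper the drop. For \(\alpha = 1\) it is the exponential density, starting at 1 and decreasing. For \(\alpha > 1\) the density starts at 0, rises to a single peak at \(x = \alpha - 1\), and then decays, so it has at least one inflection point.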
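The location of the peak for shapes greater than 1 can be checked numerically. A minimal sketch, assuming the scale parameterization used in the code above (scale = 1) and using base R's optimize to locate the maximum; the search interval c(0, 30) and the names alphas and numeric_mode are assumptions made for illustration.

```r
alphas <- c(1.5, 2, 2.5, 10)   # shapes greater than 1, scale beta = 1

# Numerically locate the maximum of each Gamma(alpha, 1) density
numeric_mode <- sapply(alphas, function(a) {
  optimize(function(x) dgamma(x, shape = a, scale = 1),
           interval = c(0, 30), maximum = TRUE)$maximum
})

# Compare with the closed form (alpha - 1) * beta
data.frame(alpha = alphas, numeric_mode, closed_form = alphas - 1)
```

With \(\alpha = 10\) the peak sits near x = 9, so an x-axis that stops at 3 shows only the rising left tail of that curve.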
Set \(\alpha = 1\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.
x <- seq(0, 10, length.out = 1000)
y_0.2 <- dgamma(x, shape = 1, scale = 0.2)
y_0.6 <- dgamma(x, shape = 1, scale = 0.6)
y_1 <- dgamma(x, shape = 1, scale = 1)
y_1.5 <- dgamma(x, shape = 1, scale = 1.5)
y_2 <- dgamma(x, shape = 1, scale = 2)
plot_data_pdf <- data.frame(x,y_0.2,y_0.6,y_1,y_1.5,y_2)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(y_0.2,y_0.6,y_1,y_1.5,y_2),
names_to = "scales"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = scales,
color = scales))+
geom_line()+
labs(title = "Gamma Density Plots: Varying Scales with Shape of 1",
x = "x-axis",
y = "density")
par(mfrow=c(1,1))
Set \(\alpha = 0.6\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.
x <- seq(0, 1, length.out = 1000)
y_0.2 <- dgamma(x, shape = 0.6, scale = 0.2)
y_0.6 <- dgamma(x, shape = 0.6, scale = 0.6)
y_1 <- dgamma(x, shape = 0.6, scale = 1)
y_1.5 <- dgamma(x, shape = 0.6, scale = 1.5)
y_2 <- dgamma(x, shape = 0.6, scale = 2)
plot_data_pdf <- data.frame(x,y_0.2,y_0.6,y_1,y_1.5,y_2)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(y_0.2,y_0.6,y_1,y_1.5,y_2),
names_to = "scales"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = scales,
color = scales))+
geom_line()+
labs(title = "Gamma Density Plots: Varying Scales with Shape of 0.6",
x = "x-axis",
y = "density")
par(mfrow=c(1,1))
Set \(\alpha = 2\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.
x <- seq(0, 10, length.out = 1000)
y_0.2 <- dgamma(x, shape = 2, scale = 0.2)
y_0.6 <- dgamma(x, shape = 2, scale = 0.6)
y_1 <- dgamma(x, shape = 2, scale = 1)
y_1.5 <- dgamma(x, shape = 2, scale = 1.5)
y_2 <- dgamma(x, shape = 2, scale = 2)
plot_data_pdf <- data.frame(x,y_0.2,y_0.6,y_1,y_1.5,y_2)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(y_0.2,y_0.6,y_1,y_1.5,y_2),
names_to = "scales"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = scales,
color = scales))+
geom_line()+
labs(title = "Gamma Density Plots: Varying Scales with Shape of 2",
x = "x-axis",
y = "density")
par(mfrow=c(1,1))
Set \(\alpha = 5\), vary \(\beta\) over \(0.2, 0.6, 1, 1.5, 2\). Plot the densities of \(\text{Gamma}(\alpha, \beta)\) in a single plot.
x <- seq(0, 10, length.out = 1000)
y_0.2 <- dgamma(x, shape = 5, scale = 0.2)
y_0.6 <- dgamma(x, shape = 5, scale = 0.6)
y_1 <- dgamma(x, shape = 5, scale = 1)
y_1.5 <- dgamma(x, shape = 5, scale = 1.5)
y_2 <- dgamma(x, shape = 5, scale = 2)
plot_data_pdf <- data.frame(x,y_0.2,y_0.6,y_1,y_1.5,y_2)
plot_data_pdf_new <- pivot_longer(plot_data_pdf,
cols = c(y_0.2,y_0.6,y_1,y_1.5,y_2),
names_to = "scales"
)
ggplot(data = plot_data_pdf_new,
aes(x = x,
y = value,
group = scales,
color = scales))+
geom_line()+
labs(title = "Gamma Density Plots: Varying Scales with Shape of 5",
x = "x-axis",
y = "density")
par(mfrow=c(1,1))