1. Compare the normal distribution to Student’s t-distribution as the degrees of freedom increase.

# Evaluate each density on a grid of x-values from -4 to 4
x <- seq(-4, 4, length.out = 100)
normal_density <- dnorm(x)
t_density_df2 <- dt(x, df = 2)
t_density_df5 <- dt(x, df = 5)
t_density_df15 <- dt(x, df = 15)
t_density_df30 <- dt(x, df = 30)
t_density_df120 <- dt(x, df = 120)

# Draw the standard normal curve first, then overlay the t densities
plot(x, normal_density, type = "l", col = "blue",
     main = "Normal vs. t-Distributions",
     ylab = "Density", xlab = "x-values", lwd = 2)
# Give every curve its own line type so the curves can be told apart
lines(x, t_density_df2, col = "red", lty = 2, lwd = 2)
lines(x, t_density_df5, col = "orange", lty = 3, lwd = 2)
lines(x, t_density_df15, col = "green", lty = 4, lwd = 2)
lines(x, t_density_df30, col = "purple", lty = 5, lwd = 2)
lines(x, t_density_df120, col = "goldenrod", lty = 6, lwd = 2)
# One lty entry per curve, so the legend matches the plotted line types
legend("topright",
       legend = c("Normal", "t (df=2)", "t (df=5)", "t (df=15)", "t (df=30)", "t (df=120)"),
       col = c("blue", "red", "orange", "green", "purple", "goldenrod"),
       lty = 1:6, lwd = 2, cex = 0.8)
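The plot shows the t-curves closing in on the normal curve as df grows. The same convergence can be checked numerically. This is a minimal sketch that uses the two-sided 5% critical value as the comparison point; the names `dfs`, `t_crit`, `z_crit` and `t_tail` are illustrative and not part of the original exercise.

```r
# Degrees of freedom used in the plot above
dfs <- c(2, 5, 15, 30, 120)

# Two-sided 5% critical values: t quantiles vs. the standard normal quantile
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

# Probability of landing beyond +/-1.96 under each t-distribution
# (exactly 0.05 under the standard normal)
t_tail <- 2 * pt(-z_crit, df = dfs)

comparison <- data.frame(df = dfs, t_crit = t_crit, tail_prob = t_tail)
print(comparison)
```

The critical value drops from about 4.30 at df = 2 to about 1.98 at df = 120, approaching the normal value of roughly 1.96. The heavier tails of the low-df curves show up as tail probabilities well above 0.05.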

2. Normal Distribution vs. Z-Score Distribution

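set.seed(123)  # Set seed for reproducibility
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
# scale() subtracts the sample mean and divides by the sample SD; it returns a matrix
z_scores <- as.vector(scale(data_values))
# Plot the raw values and their z-scores side by side
par(mfrow = c(1, 2))
hist(data_values, main = "Original data", xlab = "Value")
hist(z_scores, main = "Z-scores", xlab = "z")
par(mfrow = c(1, 1))  # Restore the default single-panel layout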

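The histogram of the z-scores has the same shape as the histogram of the original data. Only the horizontal axis changes: the z-scores are centered at 0 with a standard deviation of 1, while the original values are centered near 108 with a standard deviation near 7.2. This happens because the z-score transformation, z = (x − x̄) / s (where x̄ is the sample mean and s the sample standard deviation), is linear. It shifts and rescales the values but does not change the shape of their distribution. The z-scores look normal only because the original values were drawn from a normal distribution; standardizing does not make non-normal data normal, so z-scores computed from skewed data would remain skewed.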
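The claim that standardizing preserves shape can be checked directly. This minimal sketch confirms three things: `scale()` matches the hand formula, the z-scores have mean 0 and SD 1, and skewness is unchanged by standardizing. The exponential sample and the `skewness()` helper are illustrative assumptions, not part of the original exercise. The helper is a simple moment-based definition written inline, not a function from a package.

```r
set.seed(123)
# Normal sample with the same parameters as above
normal_values <- rnorm(1000, mean = 108, sd = 7.2)
z_normal <- as.vector(scale(normal_values))

# Hand-computed z-scores: subtract the sample mean, divide by the sample SD
z_manual <- (normal_values - mean(normal_values)) / sd(normal_values)

# Hypothetical right-skewed sample (exponential) for contrast
skewed_values <- rexp(1000, rate = 1)
z_skewed <- as.vector(scale(skewed_values))

# Illustrative moment-based skewness helper (defined here, not from a package)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

cat("mean of z:", round(mean(z_normal), 10), " sd of z:", sd(z_normal), "\n")
cat("skewness before:", round(skewness(skewed_values), 3),
    " after:", round(skewness(z_skewed), 3), "\n")
```

The skewed sample keeps the same skewness after standardizing, which shows that the z-score transformation only moves and stretches the axis.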

3. What are p-values?

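A p-value measures how compatible a sample of data is with a null hypothesis, and it is used to decide whether a result is statistically significant. In hypothesis testing, it is the probability of obtaining a test statistic at least as extreme as the one observed, computed under the assumption that the null hypothesis (the initially assumed distribution or parameter value) is true. It is not the probability that the null hypothesis is true. The closer the p-value is to zero, the stronger the evidence that the assumed distribution does not properly model the data. A result is usually called statistically significant when the p-value falls below a significance level chosen in advance, such as 0.05.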
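To make the definition concrete, here is a minimal sketch of a two-sided one-sample t-test computed by hand and checked against R's built-in `t.test()`. The sample (25 values drawn with a true mean of 110) and the hypothesized mean of 108 are made-up illustration values, chosen only to echo section 2.

```r
set.seed(123)
# Hypothetical sample: 25 draws whose true mean (110) differs from the null value
sample_values <- rnorm(25, mean = 110, sd = 7.2)
mu0 <- 108  # Null hypothesis: the population mean is 108

n <- length(sample_values)
# Test statistic: how many standard errors the sample mean lies from mu0
t_stat <- (mean(sample_values) - mu0) / (sd(sample_values) / sqrt(n))

# Two-sided p-value: probability under H0 of a statistic at least this extreme
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# Same test using R's built-in function
p_builtin <- t.test(sample_values, mu = mu0)$p.value

cat("t =", round(t_stat, 3), " p (manual) =", signif(p_manual, 3),
    " p (t.test) =", signif(p_builtin, 3), "\n")
```

The p-value is an area in the tails of the null distribution (here a t-distribution with n − 1 degrees of freedom). It says how surprising the observed statistic would be if the null hypothesis held.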