1

The (Student's) t distribution converges to the normal distribution as the degrees of freedom increase; beyond roughly 120 degrees of freedom the two are nearly indistinguishable. Please plot a normal distribution and a few t distributions on the same chart, with 2, 5, 15, 30, and 120 degrees of freedom.

x <- seq(-10, 10, by = 0.01)

# Normal distribution
y <- dnorm(x)
plot(x, 
     y, 
     type = 'l', 
     col = 'black',
     main = "Normal Distribution and T Distributions",
     ylab = "Density")
grid() # to help with the comparison 

# T distributions with varying degrees of freedom
dt2   <- dt(x, df = 2)
dt5   <- dt(x, df = 5)
dt15  <- dt(x, df = 15)
dt30  <- dt(x, df = 30)
dt120 <- dt(x, df = 120)

# Lines for the T distributions with colors to differentiate them
lines(x, dt2, col = 'lightblue')
lines(x, dt5, col = 'blue')
lines(x, dt15, col = 'green')
lines(x, dt30, col = 'darkgreen')
lines(x, dt120, col = 'purple')

# Legend with corresponding names/colors/line types
legend("topright", 
       legend = c("Normal", "DF=2", "DF=5", "DF=15", "DF=30", "DF=120"), 
       col = c("black", "lightblue", "blue", "green", "darkgreen", "purple"),
       lty = c(1, 1, 1, 1, 1, 1))
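
As a rough numerical check on the chart, the sketch below measures the largest vertical gap between each t density and the standard normal density on the same grid. The names `dfs` and `max_gap` are introduced here for illustration only and are not part of the original answer.

```r
# Same grid of x values as the plot
x <- seq(-10, 10, by = 0.01)
dfs <- c(2, 5, 15, 30, 120)  # degrees of freedom to compare (illustrative name)

# Largest absolute difference between each t density and the normal density
max_gap <- sapply(dfs, function(k) max(abs(dt(x, df = k) - dnorm(x))))
names(max_gap) <- paste0("DF=", dfs)
print(round(max_gap, 4))
```

The gap should shrink steadily as the degrees of freedom grow, which is the convergence the chart shows visually.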

2

Plot two charts: the normally distributed data (above) and the Z score distribution of the same data. Do they have the same distributional shape? Why or why not?

set.seed(123)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)

par(mfrow = c(1, 2)) # Put them side by side

# Normal Distribution: histogram of the simulated data,
# with the theoretical N(mu, sigma) density overlaid
hist(data_values,
     breaks = 30,
     freq = FALSE, # density scale so the curve is comparable
     main = "Normal Distribution",
     xlab = "Observations",
     ylab = "Density")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE)

# Z Score Distribution
zscores <- (data_values - mu) / sigma # Find the z score from each data_value

# Histogram of the z scores, with the standard normal density overlaid
hist(zscores,
     breaks = 30,
     freq = FALSE,
     main = "Z Score Distribution",
     xlab = "Z Scores",
     ylab = "Density")
curve(dnorm(x), add = TRUE)

The two charts have the same distributional shape. A z score is a linear transformation of each observation: subtract the mean, then divide by the standard deviation. Shifting and rescaling every value in the same way changes the location and scale of the distribution but not its shape, so the z scores of normally distributed data are still normally distributed. The only differences are on the horizontal axis: the original data are centered around the mean (108) with a standard deviation of 7.2, whereas the z scores are centered around 0 with a standard deviation of 1.
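
The claim can also be checked numerically. The sketch below recreates the same simulated data and compares the raw values with their z scores; `skew` is a hypothetical helper defined here for illustration, not something from the original answer.

```r
set.seed(123)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
zscores <- (data_values - mu) / sigma

# Hypothetical helper: sample skewness, a scale-free measure of shape
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

# A linear transformation leaves the ordering and the shape untouched
print(cor(data_values, zscores))            # perfectly linear relationship
print(c(skew(data_values), skew(zscores)))  # the shape measure agrees
print(c(mean(zscores), sd(zscores)))        # close to 0 and 1, not exact
```

The mean and standard deviation of the z scores are only approximately 0 and 1 because the population values `mu` and `sigma` were used to standardize, rather than the sample mean and standard deviation.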

3

In your own words, please explain what a p-value is.

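The p-value is the probability, calculated assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. It measures how much evidence the data provide against the null hypothesis: the smaller the p-value, the stronger the evidence. It’s important to note that the p-value doesn’t tell you the probability that the null hypothesis is true; it’s one component to consider in drawing scientific conclusions.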
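
To make the definition concrete, here is a minimal sketch of a two-sided p-value for a one-sample t test, computed by hand and compared with R's built-in `t.test`. The sample, the seed, and the null value `mu0` are made up for illustration and are not part of the assignment.

```r
set.seed(42)
# Hypothetical sample of 25 observations; the null hypothesis is a true mean of 100
sample_values <- rnorm(n = 25, mean = 103, sd = 10)
mu0 <- 100

# Test statistic: how many standard errors the sample mean is from mu0
n <- length(sample_values)
t_stat <- (mean(sample_values) - mu0) / (sd(sample_values) / sqrt(n))

# Two-sided p-value: probability, under the null, of a |t| at least this large
p_manual <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Built-in equivalent for comparison
p_builtin <- t.test(sample_values, mu = mu0)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

The two values should agree, which shows that the p-value is simply a tail probability of the test statistic under the null distribution.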