Dis10

Part 1

library(ggplot2)

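# Grid of x values shared by every curve
x <- seq(-5, 5, length.out = 200)

# Standard normal density
data_normal <- data.frame(x = x, y = dnorm(x), distribution = 'Normal')

# t densities for several degrees of freedom, stacked into one data frame
dof_values <- c(2, 5, 15, 30, 120)
data_t <- do.call(rbind, lapply(dof_values, function(dof) {
  # paste0() avoids the stray space that paste() leaves before ')'
  data.frame(x = x, y = dt(x, df = dof), distribution = paste0('t (df = ', dof, ')'))
}))
data_all <- rbind(data_normal, data_t)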

# Plot
ggplot(data_all, aes(x = x, y = y, color = distribution)) +
  geom_line() +
  labs(title = 'Normal and t Distributions',
       x = 'Value',
       y = 'Probability Density') +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Part 2

library(ggplot2)

set.seed(123)
mu <- 108
sigma <- 7.2

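# Simulate 1000 draws from N(mu, sigma^2)
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
# Standardize with the known population mean and standard deviation
z_scores <- (data_values - mu) / sigma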

data_normal <- data.frame(Value = data_values, Type = 'Normal Distribution')
data_zscores <- data.frame(Value = z_scores, Type = 'Z-score Distribution')
data_all <- rbind(data_normal, data_zscores)

# Plot
ggplot(data_all, aes(x = Value, fill = Type)) +
  geom_histogram(bins = 30, alpha = 0.7) +
  facet_wrap(~Type, scales = 'free') +
  labs(title = 'Comparison of Normal Distribution and Z-score Distribution',
       x = 'Value',
       y = 'Frequency') +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1")

The two histograms have the same shape. This is expected because the Z-score transformation is a linear standardization: it subtracts the mean and divides by the standard deviation, so the values are rescaled to have a mean of 0 and a standard deviation of 1, but the fundamental shape of the distribution is not altered.
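
As a quick numerical check of this claim, the sketch below standardizes a simulated sample and compares it with the original. The names `values`, `z`, and `skewness` are illustrative, and standardizing with the sample mean and sample standard deviation is an assumption made here; the code above standardizes with the known `mu` and `sigma` instead, which gives a mean and standard deviation close to, but not exactly, 0 and 1.

```r
set.seed(123)
values <- rnorm(n = 1000, mean = 108, sd = 7.2)

# Standardize with the sample mean and sample standard deviation
z <- (values - mean(values)) / sd(values)

# Centered at 0 with unit standard deviation (up to floating-point error)
mean(z)
sd(z)

# A linear transformation with positive slope preserves ordering and
# relative spacing, so the correlation with the original values is 1
cor(values, z)

# One measure of shape: sample skewness is the same before and after
# (skewness() is a small helper defined here, not a base R function)
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
all.equal(skewness(values), skewness(z))
```

Only the location and scale of the axis change, which is why the two histograms look alike once each facet gets its own axis range.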

Part 3

A p-value answers the question: “If the null hypothesis were true, what is the probability of observing a result as extreme as, or more extreme than, what was actually observed?” The p-value does not measure the probability that the null hypothesis is true or false. It only indicates the probability of observing data at least this extreme under the assumption that the null hypothesis is true.
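
To make this definition concrete, here is a minimal sketch of a two-sided one-sample z-test. The hypothesized mean `mu0`, the sample size `n`, and the observed sample mean `x_bar` are made-up values for illustration, not results from this assignment, and `sigma` is assumed to be known.

```r
mu0   <- 108    # hypothesized mean under the null (illustrative)
sigma <- 7.2    # population standard deviation, assumed known
n     <- 36     # hypothetical sample size
x_bar <- 110.5  # hypothetical observed sample mean

# Test statistic: how many standard errors x_bar lies from mu0
z_obs <- (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: P(|Z| >= |z_obs|) when the null hypothesis is true
p_value <- 2 * pnorm(-abs(z_obs))

# The same probability approximated by simulating sample means under the null
set.seed(123)
sim_means <- replicate(10000, mean(rnorm(n, mean = mu0, sd = sigma)))
p_sim <- mean(abs(sim_means - mu0) >= abs(x_bar - mu0))

c(analytic = p_value, simulated = p_sim)
```

Both numbers estimate how often a sample mean at least this far from `mu0` would occur if the null hypothesis were true. Neither is the probability that the null hypothesis itself is true.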

Yinda Chen

2023-11-12
