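library(ggplot2)

# Evaluate all densities on a common grid of x values
x <- seq(-5, 5, length.out = 200)
data_normal <- data.frame(x = x, y = dnorm(x), distribution = 'Normal')

# t densities for several degrees of freedom
dof_values <- c(2, 5, 15, 30, 120)
data_t <- do.call(rbind, lapply(dof_values, function(dof) {
  data.frame(x = x, y = dt(x, df = dof), distribution = paste0('t (df = ', dof, ')'))
}))
data_all <- rbind(data_normal, data_t)

# Keep legend entries in construction order (Normal, then increasing df)
# instead of alphabetical order, which would put "t (df = 120)" before "t (df = 2)"
data_all$distribution <- factor(data_all$distribution,
                                levels = unique(data_all$distribution))

# Plot
ggplot(data_all, aes(x = x, y = y, color = distribution)) +
  geom_line() +
  labs(title = 'Normal and t Distributions',
       x = 'Value',
       y = 'Probability Density') +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")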
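The plot shows the t densities approaching the normal curve as the degrees of freedom grow. A minimal numeric sketch of the same idea compares two-sided tail probabilities P(|X| > 2); the cutoff of 2 and the names `tail_normal`, `tail_t`, and `tail_table` are illustrative choices, not part of the original analysis.

```r
# Two-sided tail probability P(|X| > 2) for the standard normal
tail_normal <- 2 * pnorm(-2)

# Same tail probability for t distributions with the degrees of freedom used above
dof_values <- c(2, 5, 15, 30, 120)
tail_t <- 2 * pt(-2, df = dof_values)

tail_table <- data.frame(
  distribution = c("Normal", paste0("t (df = ", dof_values, ")")),
  p_abs_gt_2   = c(tail_normal, tail_t)
)
print(tail_table, digits = 4)
```

The t tail mass is always larger than the normal one (heavier tails) and shrinks toward it as df increases, matching the visual convergence in the plot.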
library(ggplot2)
set.seed(123)

# Simulate 1000 draws from a normal distribution with mean 108 and SD 7.2
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)

# Standardize with the known population parameters: z = (x - mu) / sigma
z_scores <- (data_values - mu) / sigma

data_normal <- data.frame(Value = data_values, Type = 'Normal Distribution')
data_zscores <- data.frame(Value = z_scores, Type = 'Z-score Distribution')
data_all <- rbind(data_normal, data_zscores)

# Plot: one panel per distribution, each with its own axis range
ggplot(data_all, aes(x = Value, fill = Type)) +
  geom_histogram(bins = 30, alpha = 0.7) +
  facet_wrap(~Type, scales = 'free') +
  labs(title = 'Comparison of Normal Distribution and Z-score Distribution',
       x = 'Value',
       y = 'Frequency') +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1")
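This similarity in distributional shape is expected. The Z-score
transformation z = (x − μ)/σ is a linear standardization: subtracting μ
shifts the data and dividing by σ rescales it, so the result has a mean
of 0 and a standard deviation of 1 (here only approximately, because μ
and σ are the population values rather than the sample mean and SD).
A linear transformation changes location and scale but not the
fundamental shape of the distribution, so both panels show the same
bell shape on different x-axis scales.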
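To check that standardization preserves shape even when the data are not normal, the sketch below standardizes a deliberately skewed exponential sample (an illustrative assumption, not the data above) with base R's `scale()`, which uses the sample mean and sample SD. The `skewness()` helper is a hand-written, hypothetical function, since base R does not provide one.

```r
set.seed(123)
# Deliberately right-skewed sample (illustrative assumption)
x <- rexp(1000, rate = 0.5)

# Hypothetical helper: moment-based sample skewness
skewness <- function(v) {
  m <- mean(v)
  s <- sqrt(mean((v - m)^2))
  mean((v - m)^3) / s^3
}

# scale() subtracts the sample mean and divides by the sample SD
z <- as.vector(scale(x))

c(mean_z = mean(z), sd_z = sd(z))            # essentially 0 and 1
c(skew_x = skewness(x), skew_z = skewness(z)) # identical skewness
cor(x, z)                                     # perfectly linear relationship
```

The z-scores have mean 0 and SD 1, yet their skewness matches the original data: standardizing a skewed sample does not make it normal.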
A p-value answers the question: “If the null hypothesis were true, what is the probability of observing a result as extreme as, or more extreme than, the one actually observed?” It does not measure the probability that the null hypothesis is true or false. It only gives the probability of data at least as extreme as those observed, computed under the assumption that the null hypothesis is true.
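The definition can be made concrete with a simulation. The sketch below assumes a one-sample z-test of H0: μ = 108 with σ = 7.2 treated as known; the sample size `n`, the hypothetical observed sample `x_obs` (drawn with true mean 111), and the names `z_obs`, `p_analytic`, and `p_mc` are all illustrative assumptions. It computes the p-value analytically and then estimates it by generating many datasets in a world where H0 is true and counting how often the statistic is at least as extreme as the observed one.

```r
set.seed(42)
mu0   <- 108   # hypothesized mean under H0 (assumption)
sigma <- 7.2   # treated as known, so a z-test applies (assumption)
n     <- 25    # hypothetical sample size

# Hypothetical observed sample; its true mean of 111 is an illustrative assumption
x_obs <- rnorm(n, mean = 111, sd = sigma)
z_obs <- (mean(x_obs) - mu0) / (sigma / sqrt(n))

# Analytic two-sided p-value: P(|Z| >= |z_obs|) when H0 is true
p_analytic <- 2 * pnorm(-abs(z_obs))

# Monte Carlo: simulate many samples *assuming H0 is true* and count how
# often the test statistic is at least as extreme as the observed one
n_sim  <- 100000
x_null <- matrix(rnorm(n_sim * n, mean = mu0, sd = sigma), nrow = n_sim)
z_null <- (rowMeans(x_null) - mu0) / (sigma / sqrt(n))
p_mc   <- mean(abs(z_null) >= abs(z_obs))

c(z_obs = z_obs, p_analytic = p_analytic, p_monte_carlo = p_mc)
```

Both numbers are computed entirely inside the H0 world; nothing in the calculation estimates the probability that H0 itself is true.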