##Part 1. #The (Student's) t distribution converges to the normal distribution as the degrees of freedom increase (beyond 120). Please plot a normal distribution and a few t distributions on the same chart, with 2, 5, 15, 30, and 120 degrees of freedom.
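library(ggplot2)  # needed for ggplot(), aes(), geom_line(), theme_minimal(), labs()

# Evaluate every density on the same grid of x values
x_seq <- seq(-4, 4, length.out = 500)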
normal_data <- data.frame(x = x_seq, y = dnorm(x_seq), distribution = 'Normal')
t_df2 <- data.frame(x = x_seq, y = dt(x_seq, df = 2), distribution = 't (df=2)')
t_df5 <- data.frame(x = x_seq, y = dt(x_seq, df = 5), distribution = 't (df=5)')
t_df15 <- data.frame(x = x_seq, y = dt(x_seq, df = 15), distribution = 't (df=15)')
t_df30 <- data.frame(x = x_seq, y = dt(x_seq, df = 30), distribution = 't (df=30)')
t_df120 <- data.frame(x = x_seq, y = dt(x_seq, df = 120), distribution = 't (df=120)')
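all_data <- rbind(normal_data, t_df2, t_df5, t_df15, t_df30, t_df120)
# Keep the legend in order of increasing degrees of freedom instead of alphabetical order
all_data$distribution <- factor(all_data$distribution, levels = unique(all_data$distribution))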
ggplot(all_data, aes(x = x, y = y, color = distribution)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Normal and t Distributions",
       x = "Value",
       y = "Density")
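The convergence can also be checked numerically rather than only by eye. The sketch below is an illustration, not part of the assignment: it measures the largest vertical gap between each t density and the standard normal density on a grid from -4 to 4. The names `x_grid`, `dfs`, and `max_gap` are made up for this example.

```r
# Grid of x values on which to compare the densities (hypothetical helper names)
x_grid <- seq(-4, 4, length.out = 500)
dfs <- c(2, 5, 15, 30, 120)

# For each df, the largest absolute difference between the t and normal densities
max_gap <- sapply(dfs, function(df) max(abs(dt(x_grid, df = df) - dnorm(x_grid))))
names(max_gap) <- paste0("df=", dfs)

print(round(max_gap, 4))
```

The gap should shrink steadily as the degrees of freedom grow, which matches the curves drawing closer together in the plot.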
##Part 2 #Plot two charts: the normally distributed data (above) and the Z score distribution of the same data. Do they have the same distributional shape? Why or why not?
set.seed(123)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
z_score <- (data_values - mu) / sigma
hist(data_values,
     main = "Original Normal Data",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black")
hist(z_score,
     main = "Z-Score Distribution",
     xlab = "Z-Score",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")
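Yes, the two histograms have the same bell shape. Computing a z-score is a linear transformation: subtracting the mean shifts the center from about 108 to about 0, and dividing by the standard deviation rescales the spread from about 7.2 to about 1, but neither step changes the relative position of any observation. The z-score histogram does show more bars, but that comes from how hist() chooses its default bin breaks on each axis scale, not from any difference in the underlying distribution.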
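One way to check that standardizing leaves the shape alone is to compare the two versions of the data directly. The sketch below is only an illustration: it regenerates the sample under its own names (`sample_values`, `z_values`, and the `breaks_*` and `counts_*` helpers are hypothetical) and bins both versions with matching breaks.

```r
set.seed(123)
sample_values <- rnorm(n = 1000, mean = 108, sd = 7.2)
z_values <- (sample_values - 108) / 7.2

# A linear rescaling keeps every point's relative position, so the correlation is 1
shape_cor <- cor(sample_values, z_values)

# Use 20 equal-width bins on the raw scale and the same bins mapped to the z scale
breaks_raw <- seq(min(sample_values), max(sample_values), length.out = 21)
breaks_z <- (breaks_raw - 108) / 7.2

counts_raw <- hist(sample_values, breaks = breaks_raw, plot = FALSE)$counts
counts_z <- hist(z_values, breaks = breaks_z, plot = FALSE)$counts

print(shape_cor)
print(rbind(raw = counts_raw, z = counts_z))
```

With matching breaks the bar heights line up bin for bin, so the extra bars in the default plots reflect the binning choice rather than the data.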
#In your own words, please explain what a p-value is.
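A p-value is the probability, computed under the assumption that the null hypothesis is true, of seeing data at least as extreme as what was actually observed. A small p-value is evidence against the null hypothesis. As noted in the ASA article, it does not show whether we can accept an alternative hypothesis, and it is not the probability that the null hypothesis is true; it only indicates how incompatible the data are with a specified statistical model.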
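To make the definition concrete, here is a minimal sketch of a two-sided one-sample t-test computed by hand and then compared with `t.test()`. The sample, the seed, and the names `sample_x`, `null_mean`, `t_stat`, `p_manual`, and `p_builtin` are assumptions made up for this illustration; they are not taken from the assignment data.

```r
set.seed(42)
# Hypothetical sample: 30 draws from a normal distribution with mean 110
sample_x <- rnorm(n = 30, mean = 110, sd = 7.2)
null_mean <- 108  # value claimed by the null hypothesis

# Test statistic: how many standard errors the sample mean is from the null value
t_stat <- (mean(sample_x) - null_mean) / (sd(sample_x) / sqrt(length(sample_x)))

# Two-sided p-value: chance, if the null is true, of a statistic at least this extreme
p_manual <- 2 * pt(-abs(t_stat), df = length(sample_x) - 1)

# t.test() should report the same quantity
p_builtin <- t.test(sample_x, mu = null_mean)$p.value

print(c(manual = p_manual, builtin = p_builtin))
```

The two numbers should agree, which shows that the p-value is just a tail probability of the test statistic under the null model.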