Student's t distribution converges to the normal distribution as the degrees of freedom increase (the two are nearly indistinguishable beyond about 120). Please plot a normal distribution and a few t distributions on the same chart, with 2, 5, 15, 30, and 120 degrees of freedom.
library(ggplot2)
# Set seed for reproducibility
set.seed(224)
# Generating a sequence of values from -4 to 4 with 200 points in between
part1_data <- seq(-4, 4, length.out = 200)
# Creating a data frame for the normal distribution
normal_dist_data <- data.frame(x = part1_data,
                               y = dnorm(part1_data),
                               distribution = 'Normal')
# Creating a data frame for t distributions with different degrees of freedom
t_dist_2 <- data.frame(x = part1_data, y = dt(part1_data, df = 2), distribution = 't (df=2)')
t_dist_5 <- data.frame(x = part1_data, y = dt(part1_data, df = 5), distribution = 't (df=5)')
t_dist_15 <- data.frame(x = part1_data, y = dt(part1_data, df = 15), distribution = 't (df=15)')
t_dist_30 <- data.frame(x = part1_data, y = dt(part1_data, df = 30), distribution = 't (df=30)')
t_dist_120 <- data.frame(x = part1_data, y = dt(part1_data, df = 120), distribution = 't (df=120)')
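# Combine everything
all_t_data <- rbind(normal_dist_data, t_dist_2, t_dist_5, t_dist_15, t_dist_30, t_dist_120)
# Keep the legend in the order the curves were combined (Normal, then increasing df) instead of alphabetical order
all_t_data$distribution <- factor(all_t_data$distribution, levels = unique(all_t_data$distribution))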
# Plot graph
ggplot(all_t_data,
       aes(x = x,
           y = y,
           color = distribution)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Normal and t Distributions",
       x = "Value",
       y = "Density")
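To complement the chart, the convergence can also be checked numerically. The sketch below is an illustrative addition (the names `x_grid`, `dfs`, and `max_gap` are not part of the original answer); it measures the largest vertical gap between each t density and the standard normal density over the same range used for plotting.

```r
# Grid of x values matching the plotting range used above
x_grid <- seq(-4, 4, length.out = 200)

# Degrees of freedom shown in the chart
dfs <- c(2, 5, 15, 30, 120)

# Largest absolute difference between each t density and the normal density
max_gap <- sapply(dfs, function(df) max(abs(dt(x_grid, df = df) - dnorm(x_grid))))
names(max_gap) <- paste0("df=", dfs)

print(round(max_gap, 4))
```

The gaps shrink as the degrees of freedom grow, which matches what the chart shows.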
Let's work with normal data below (1000 observations with a mean of 108 and an sd of 7.2).
set.seed(123) # Set seed for reproducibility
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
Plot two charts: the normally distributed data (above) and the Z-score distribution of the same data. Do they have the same distributional shape? Why or why not?
# Regenerate the data with the same seed as above so both charts describe that sample
set.seed(123)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
# Calculate Z-scores
z_scores <- (data_values - mu) / sigma
# Create a layout for side-by-side histograms
par(mfrow = c(1, 2))
# Plot the original normal data
hist(data_values,
     main = "Original Normal Data",
     xlab = "Value",
     ylab = "Frequency",
     col = "lightgreen",
     border = "black")
# Plot the Z-score distribution
hist(z_scores,
     main = "Z-Score Distribution",
     xlab = "Z-Score",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")
The two graphs, one of the data above and one of its Z-scores, differ in scale but have the same bell shape. Computing a Z-score is a linear transformation (subtract the mean, then divide by the standard deviation), so it shifts and rescales the data without changing the shape of the distribution. Because the original data are normally distributed, so are the Z-scores; the main difference is that the Z-score distribution is centered at 0 with a standard deviation of 1, which is the Standard Normal Distribution (SND). Any small visual differences between the two histograms can come from where the bin edges fall, not from the data.
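A quick numeric check supports this. The sketch below is an illustrative addition that regenerates the sample with the seed given in the question; `skewness_of` is a hypothetical helper written here, not a base R function. It shows that Z-scoring changes the center and scale but leaves a shape measure (skewness) unchanged.

```r
# Regenerate the sample from the question
set.seed(123)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
z_scores <- (data_values - mu) / sigma

# Hypothetical helper: sample skewness (third standardized moment)
skewness_of <- function(v) mean((v - mean(v))^3) / (mean((v - mean(v))^2))^1.5

# Center and spread of the Z-scores
print(c(mean = mean(z_scores), sd = sd(z_scores)))

# Skewness before and after the transformation
print(c(raw = skewness_of(data_values), z = skewness_of(z_scores)))

# Correlation between the raw data and the Z-scores
print(cor(data_values, z_scores))
```

The sample mean and standard deviation of the Z-scores come out close to, but not exactly, 0 and 1 because the scores are computed with the population values `mu` and `sigma` rather than the sample estimates.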
In your own words, please explain what a p-value is.
When conducting hypothesis tests, the p-value is the probability of obtaining results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true. A smaller p-value generally means stronger evidence against the null hypothesis and in favor of the alternative hypothesis.
When conducting hypothesis tests, you state a null hypothesis (H₀) and an alternative hypothesis (H₁ or Hₐ). Generally, the null hypothesis states that there is no effect or difference, while the alternative hypothesis states the opposite of the null hypothesis.
In terms of wording, hypothesis testing has two possible outcomes: you either reject the null or you do not reject the null, but you never say “I accept the null hypothesis”. This distinction matters for my explanation of p-values because comparing the p-value with the significance level is what lets you either reject or not reject the null. The null is rejected if the p-value is less than the significance level, because the results are then considered statistically significant.
For example, with a significance level of 0.05 and a p-value of 0.03, you would reject the null hypothesis, and the results would indicate that there is evidence for the alternative hypothesis. Similarly, with a significance level of 0.05 and a p-value of 0.05 or more, you would not reject the null hypothesis.
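The decision rule can be sketched in a few lines of R. This is an illustrative addition: the test statistic `z_stat` is a made-up value chosen for demonstration, not a result from the data above, and a two-sided z-test is assumed.

```r
alpha <- 0.05      # significance level
z_stat <- 2.17     # hypothetical test statistic (assumed for illustration)

# Two-sided p-value: probability of a statistic at least this extreme if the null is true
p_value <- 2 * pnorm(-abs(z_stat))

# Reject only when the p-value is strictly below the significance level
decision <- if (p_value < alpha) "reject the null" else "do not reject the null"
cat("p-value:", round(p_value, 3), "->", decision, "\n")
```

With this assumed statistic the p-value is about 0.03, so the null is rejected at the 0.05 level, mirroring the first case in the example.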