ADAN7301.02 Week 5 Discussion

Part 1. The Student t distribution converges to the normal distribution as the degrees of freedom increase (beyond about 120 degrees of freedom the two are nearly indistinguishable). Please plot a normal distribution and a few t distributions on the same chart, with 2, 5, 15, 30, and 120 degrees of freedom.

library(ggplot2)

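# Grid of x values at which each density is evaluated
x = seq(-5, 5, by = 0.1)

# Standard normal density and t densities with increasing degrees of freedom
normal_data = data.frame(x = x, y = dnorm(x, mean = 0, sd = 1))
t_2_df = data.frame(x = x, y = dt(x, df = 2))
t_5_df = data.frame(x = x, y = dt(x, df = 5))
t_15_df = data.frame(x = x, y = dt(x, df = 15))
t_30_df = data.frame(x = x, y = dt(x, df = 30))
t_120_df = data.frame(x = x, y = dt(x, df = 120))

# Stack all curves into one long data frame labelled by distribution
plot_data = rbind(cbind(Distribution = "Normal", normal_data), cbind(Distribution = "t(2)", t_2_df), cbind(Distribution = "t(5)", t_5_df), cbind(Distribution = "t(15)", t_15_df), cbind(Distribution = "t(30)", t_30_df), cbind(Distribution = "t(120)", t_120_df))

# Order the legend by degrees of freedom rather than alphabetically
plot_data$Distribution = factor(plot_data$Distribution, levels = c("Normal", "t(2)", "t(5)", "t(15)", "t(30)", "t(120)"))

# geom_line(linewidth = ...) requires ggplot2 3.4.0 or later
ggplot(plot_data, aes(x = x, y = y, color = Distribution)) + geom_line(linewidth = 1) + labs(title = "Normal and Student t-distributions", x = "Value", y = "Density")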
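The chart shows the convergence visually. As a rough numerical check, the sketch below measures the largest vertical gap between each t density and the normal density on the same grid. It also compares two-sided 95% critical values from `qt()` with `qnorm(0.975)`. This check is not part of the assignment, and the names `max_gap` and `crit` are chosen here only for illustration.

```r
# Same grid and degrees of freedom as the plot above
x = seq(-5, 5, by = 0.1)
dfs = c(2, 5, 15, 30, 120)

# Largest absolute difference between each t density and the normal density
max_gap = sapply(dfs, function(k) max(abs(dt(x, df = k) - dnorm(x))))

# Two-sided 95% critical values of the t distribution for each df
crit = qt(0.975, df = dfs)

data.frame(df = dfs, max_gap = round(max_gap, 4), t_crit = round(crit, 3), z_crit = round(qnorm(0.975), 3))
```

Both columns should shrink toward the normal values as the degrees of freedom grow. For example, the t critical value falls from about 4.30 at 2 df to about 1.98 at 120 df, compared with 1.96 for the normal.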

Part 2. Let's work with the normal data below (1000 observations with a mean of 108 and an SD of 7.2). Plot two charts: the normally distributed data and the Z-score distribution of the same data. Do they have the same distributional shape? Why or why not?

set.seed(123)  # Set seed for reproducibility
mu = 108
sigma = 7.2
data_values = rnorm(n = 1000, mean = mu, sd = sigma)  # simulated observations
df = data.frame(data_values)

# Standardize with the population mean and SD used to generate the data
z_scores = (data_values - mu) / sigma
df_z = data.frame(z_scores)

# after_stat(density) puts density, not raw counts, on the y-axis to match the label
ggplot(df, aes(x = data_values, y = after_stat(density))) + geom_histogram(bins = 30) + labs(title = "Histogram With Normal Distribution Data", x = "Value", y = "Density")

ggplot(df_z, aes(x = z_scores, y = after_stat(density))) + geom_histogram(bins = 30) + labs(title = "Histogram With Z-Scores", x = "Z-score", y = "Density")

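Yes, the two distributions have the same shape because they come from the same data. The z-score only standardizes the observations. Standardizing is a linear transformation: subtracting the mean shifts every observation by the same amount, and dividing by the standard deviation rescales every distance by the same factor. This changes the numbers representing the observations (and the labels on the x-axis), but the observations keep the same relative positions, so the overall shape does not change. The raw histogram is centered near 108 with a standard deviation near 7.2. The z-score histogram is centered near zero with a standard deviation near one. These values are only approximate because the z-scores were computed from the population values mu and sigma rather than from the sample mean and standard deviation.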
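To check the "same shape" claim numerically rather than by eye, here is a minimal sketch that regenerates the Part 2 data with the same seed. The helper `skew()` is a hypothetical moment-based skewness function defined here, not a base R function. The sketch shows that the mean and SD change under standardization, while skewness and the histogram bin counts do not (provided the bin edges are transformed in the same way).

```r
# Regenerate the Part 2 data with the same seed and parameters
set.seed(123)
mu = 108
sigma = 7.2
data_values = rnorm(n = 1000, mean = mu, sd = sigma)
z_scores = (data_values - mu) / sigma

# Hypothetical helper: moment-based sample skewness (not a base R function)
skew = function(v) {
  d = v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Location and spread change under standardization...
c(mean = mean(data_values), sd = sd(data_values))
c(mean = mean(z_scores), sd = sd(z_scores))

# ...but skewness, a shape measure, is unchanged up to floating-point error
all.equal(skew(data_values), skew(z_scores))

# Bin counts match when the bin edges are standardized the same way
breaks = seq(floor(min(data_values)) - 1, ceiling(max(data_values)) + 2, by = 2)
counts_raw = hist(data_values, breaks = breaks, plot = FALSE)$counts
counts_z = hist(z_scores, breaks = (breaks - mu) / sigma, plot = FALSE)$counts
identical(counts_raw, counts_z)
```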

Part 3. In your own words, please explain what a p-value is.

The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one we actually observed. Small p-values (typically compared against a significance level of 0.05) indicate a statistically significant result and provide grounds for rejecting the null hypothesis in favor of the alternative. A very low p-value means our data would be surprising if the null hypothesis were true. We would then be inclined to believe that the null hypothesis is false and that something else is causing the unusual observations we are seeing. The p-value is not, however, the probability that the null hypothesis itself is true.
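Here is a small sketch of the definition in action. It uses a made-up sample tested against H0: mean = 108, and the seed, sample size of 25, and true mean of 110 are assumptions chosen only for illustration. The code computes a two-sided one-sample t-test p-value by hand with `pt()` and compares it with `t.test()`. It then estimates the same probability by simulating many samples for which the null hypothesis really is true.

```r
# Hypothetical example: one-sample t-test of H0: mean = 108.
# The true mean of 110, n_obs = 25, and the seed are illustrative assumptions.
set.seed(42)
n_obs = 25
mu_0 = 108
sample_values = rnorm(n = n_obs, mean = 110, sd = 7.2)

# Observed t statistic under H0
t_obs = (mean(sample_values) - mu_0) / (sd(sample_values) / sqrt(n_obs))

# Two-sided p-value: P(|T| >= |t_obs|) for T ~ t with n_obs - 1 df
p_manual = 2 * pt(-abs(t_obs), df = n_obs - 1)

# The same p-value from R's built-in test
p_builtin = t.test(sample_values, mu = mu_0)$p.value

c(t_obs = t_obs, p_manual = p_manual, p_builtin = p_builtin)

# Simulation: 10,000 samples drawn with H0 true, recording each t statistic
t_null = replicate(10000, {
  s = rnorm(n_obs, mean = mu_0, sd = 7.2)
  (mean(s) - mu_0) / (sd(s) / sqrt(n_obs))
})

# Share of null-world statistics at least as extreme as the observed one
p_simulated = mean(abs(t_null) >= abs(t_obs))
p_simulated
```

The simulated proportion should land close to the analytic p-value. This is the point of the definition: the p-value describes how often data this extreme would arise if the null hypothesis were true.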

Grace Inorio