1. The Student's t distribution converges to the normal distribution as the degrees of freedom increase (beyond about 120 the two are nearly indistinguishable). Please plot a normal distribution and a few t distributions with 2, 5, 15, 30, and 120 degrees of freedom on the same chart.

x <- seq(-3,3, 0.1) #Set up x-axis
df <- c(2,5,15,30,120) #define the degrees of freedom

#Plot the normal distribution
plot(x, dnorm(x), type = 'l', lwd = 2, 
     main = "T-Distributions of Varying DF and The Normal Distribution",
     ylab = "Density")

#Plot the various T-distributions
lines(x, dt(x, df[5]), col = 'red')
lines(x, dt(x, df[4]), col = "orange")
lines(x, dt(x, df[3]), col = "yellow")
lines(x, dt(x, df[2]), col = "green")
lines(x, dt(x, df[1]), col = "blue")

#Add a legend
legend(1.35, 0.4, legend = c(df, "Normal"), 
       fill = c("blue", "green", "yellow", "orange", "red", "black"),
       title = "Degrees of Freedom"
       )
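As a quick numeric check (a sketch using the same x grid defined above, not part of the original question), we can compare the t density at 120 degrees of freedom directly against the normal density and contrast it with the 2 degrees of freedom case:

#Sketch: compare t densities against the normal density on the same x grid
max(abs(dt(x, df = 120) - dnorm(x))) #very small - the curves nearly overlap
max(abs(dt(x, df = 2) - dnorm(x)))   #much larger - the heavier tails are still visible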

2. Let's work with the normal data below (1,000 observations with a mean of 108 and a standard deviation of 7.2).

set.seed(407) #Set seed for reproducibility

mu <- 108

sigma <- 7.2

data_values <- rnorm(n = 1000, mean = mu, sd = sigma)

Plot two charts: the normally distributed data (above) and the Z-score distribution of the same data. Do they have the same distributional shape? Why or why not?

First, we can easily plot the data from above using a histogram:

#plot the results from above on a histogram
hist(data_values, xlab = "Result", main = "Normally Distributed Results")

This data set is clearly normal, as we would expect. To make our next plot, we will first have to calculate the Z-score for each data point.

\(Z= \displaystyle \frac{x-\mu}{\sigma}\)

This formula will tell us how many standard deviations (\(\sigma\)) a given data point (\(x\)) is from the mean (\(\mu\)).

zscores <- (data_values-mu)/sigma #calculate Z-score for each data point
hist(zscores, xlab = "Z-Score", main = "Distribution of Z-Scores") #plot results on a histogram

Both charts have the rough shape of a normal distribution, which makes sense. The first is generated from a normal distribution, and the second simply describes how far each data point is from the mean. Because the Z-score is a linear transformation of the data (subtract the mean, then divide by the standard deviation), it shifts and rescales the values but does not change the shape of their distribution. We would expect a lot of entries to be close to the mean, with near-zero Z-scores, and fewer to be far away, with large Z-scores. This gives the second graph the same bell shape as the first.
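A minimal check of that claim (a sketch using the objects defined above): the Z-scores should have a mean near 0 and a standard deviation near 1, and reversing the transformation recovers the original data exactly, confirming that only the location and scale changed.

#Sketch: confirm the Z-score transform only shifts and rescales the data
mean(zscores) #close to 0, since the sample mean is close to mu
sd(zscores)   #close to 1, since the sample sd is close to sigma
all.equal(zscores * sigma + mu, data_values) #TRUE - the transform is invertible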

3. In your own words, please explain what a p-value is.

A p-value describes the probability of seeing data at least as supportive of the alternative hypothesis as our current data set, assuming the null hypothesis is true. This value is important for hypothesis testing, as it is what we ultimately use to decide whether or not to reject the null hypothesis. If the p-value is less than the significance level (\(\alpha\)), then we reject the null hypothesis. For example, if we determine that there is only a 2% chance of seeing results at least as extreme as ours given the null hypothesis, and our significance level is 5%, then we would comfortably reject the null hypothesis.
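As a small illustration (a sketch with made-up numbers, not the assignment data), a one-sample t-test in R returns a p-value that we compare to \(\alpha\) in exactly this way:

#Sketch: one-sample t-test on hypothetical data, H0: true mean = 100
set.seed(1) #hypothetical example data
sample_data <- rnorm(n = 30, mean = 103, sd = 8)
result <- t.test(sample_data, mu = 100) #computes the p-value under H0
result$p.value        #probability of data at least this extreme if H0 is true
result$p.value < 0.05 #reject H0 only if the p-value falls below alpha = 0.05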