x <- seq(-3,3, 0.1) #Set up x-axis
df <- c(2,5,15,30,120) #define the degrees of freedom
#Plot the normal distribution
plot(x, dnorm(x), type = 'l', lwd = 2,
main = "T-Distributions of Varying DF and The Normal Distribution",
ylab = "Density")
#Plot the various T-distributions
lines(x, dt(x, df[5]), col = 'red')
lines(x, dt(x, df[4]), col = "orange")
lines(x, dt(x, df[3]), col = "yellow")
lines(x, dt(x, df[2]), col = "green")
lines(x, dt(x, df[1]), col = "blue")
#Add a legend
legend(1.35, 0.4, legend = c(df, "Normal"),
fill = c("blue", "green", "yellow", "orange", "red", "black"),
title = "Degrees of Freedom"
)
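The plot shows the t-curves approaching the normal curve as the degrees of freedom grow. One way to put a number on that convergence is to compare two-sided 95% critical values. The sketch below uses base R's `qt()` and `qnorm()`. The name `dfs` is our own, chosen so the block does not depend on the `df` vector defined above.

```r
# Degrees of freedom used in the plot above
dfs <- c(2, 5, 15, 30, 120)

# Two-sided 95% critical values: the 97.5th percentile of each distribution
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)   # about 1.96 for the standard normal

# The gap shrinks as df grows, mirroring how the curves converge in the plot
comparison <- data.frame(df = dfs,
                         t_crit = round(t_crit, 3),
                         gap_vs_normal = round(t_crit - z_crit, 3))
comparison
```

The heavier tails at low degrees of freedom show up as larger critical values. With 2 degrees of freedom the cutoff is far above 1.96, and by 120 it is within a few hundredths of it.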
set.seed(407) # Set seed for reproducibility
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
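Before we plot anything, a quick numerical check confirms that the simulated sample resembles the distribution it was drawn from. This sketch reuses the seed and parameters above. The estimates will be close to `mu` and `sigma` but not equal to them, because they come from a finite sample.

```r
set.seed(407)   # same seed as above
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)

# Sample estimates should be close to, but not exactly equal to, mu and sigma
estimates <- c(sample_mean = mean(data_values), sample_sd = sd(data_values))
round(estimates, 2)

# Standard error of the sample mean (sigma / sqrt(n)): a rough yardstick
# for how far the sample mean is likely to land from mu
sigma / sqrt(length(data_values))
```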
We can start by plotting the data from above with a histogram:
#plot the results from above on a histogram
hist(data_values, xlab = "Result", main = "Normally Distributed Results")
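Judging normality from a histogram by eye is only a rough check. The sketch below adds two common base-R visual checks. It overlays the \(N(\mu, \sigma)\) density on a density-scaled histogram, then draws a normal Q-Q plot. Points lying close to the reference line suggest approximately normal data.

```r
set.seed(407)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)

# freq = FALSE puts the histogram on the density scale so a curve can be overlaid
h <- hist(data_values, freq = FALSE, xlab = "Result",
          main = "Normally Distributed Results")
# Overlay the normal density the data were generated from
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)

# Normal Q-Q plot: sample quantiles against theoretical normal quantiles
qqnorm(data_values, main = "Normal Q-Q Plot of Results")
qqline(data_values, col = "red", lwd = 2)
```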
The histogram is roughly bell-shaped and centered near 108, as we would expect for data drawn from a normal distribution. To make our next plot, we first have to calculate the Z-score for each data point.
\(Z= \displaystyle \frac{x-\mu}{\sigma}\)
This formula will tell us how many standard deviations (\(\sigma\)) a given data point (\(x\)) is from the mean (\(\mu\)).
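As a worked example of the formula, take two hypothetical results (122.4 and 97.2, chosen for illustration only) and standardize them with the \(\mu = 108\) and \(\sigma = 7.2\) used above:

```r
mu <- 108
sigma <- 7.2

# Hypothetical individual results, not taken from the simulated data
x_high <- 122.4
x_low  <- 97.2

z_high <- (x_high - mu) / sigma   # (122.4 - 108) / 7.2 = 14.4 / 7.2
z_low  <- (x_low  - mu) / sigma   # (97.2 - 108) / 7.2 = -10.8 / 7.2

c(z_high = z_high, z_low = z_low)
```

So 122.4 lies two standard deviations above the mean, and 97.2 lies one and a half standard deviations below it. The sign of a Z-score tells us which side of the mean the point falls on.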
zscores <- (data_values-mu)/sigma #calculate Z-score for each data point
hist(zscores, xlab = "Z-Score", main = "Distribution of Z-Scores") #plot results on a histogram
Both histograms have the same bell shape, which makes sense. The first was generated from a normal distribution. The second is a linear rescaling of the same data: subtracting \(\mu\) moves the center to 0, and dividing by \(\sigma\) rescales the spread to 1, so the Z-scores follow a standard normal distribution. We would expect many entries to lie close to the mean, with Z-scores near zero, and fewer to lie far away, with large positive or negative Z-scores. This gives the second histogram the same bell shape as the first, with only the axis shifted and rescaled.
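The claim that most Z-scores sit near zero and few are large can be checked against the familiar 68-95-99.7 rule. The sketch below recomputes the Z-scores from the same seed. It compares the observed share within 1, 2 and 3 standard deviations against the standard normal probabilities from `pnorm()`. The observed shares will differ slightly from the theoretical ones because of sampling noise.

```r
set.seed(407)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
zscores <- (data_values - mu) / sigma

# Mean and SD of the Z-scores should be close to 0 and 1
c(mean = mean(zscores), sd = sd(zscores))

# Share of Z-scores within 1, 2 and 3 SDs of the mean, observed vs. theoretical
k <- 1:3
observed <- sapply(k, function(m) mean(abs(zscores) <= m))
expected <- pnorm(k) - pnorm(-k)   # about 0.683, 0.954, 0.997

tab <- rbind(observed, expected)
colnames(tab) <- paste0("|Z| <= ", k)
round(tab, 3)
```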
A p-value is the probability of observing data at least as supportive of the alternative hypothesis as our current data set, assuming the null hypothesis is true. This value is central to hypothesis testing, because it is what we ultimately use to decide whether to reject the null hypothesis. If the p-value is less than the significance level (\(\alpha\)), we reject the null hypothesis. For example, suppose we find only a 2% chance of seeing results at least as extreme as ours under the null hypothesis. If our significance level is 5%, we would reject the null hypothesis.
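To make the definition concrete, the sketch below runs a two-sided one-sample z-test with known \(\sigma\). This is one simple test chosen for illustration; the section does not prescribe a particular test. The sample size of 25 and the observed mean of 111.1 are hypothetical. The p-value is computed analytically with `pnorm()` and again by simulating many samples under the null hypothesis, which mirrors the "assuming the null hypothesis is true" part of the definition.

```r
# Hypothetical setup (not from the original data)
mu0   <- 108    # mean under the null hypothesis
sigma <- 7.2    # population SD, assumed known
n     <- 25     # hypothetical sample size
xbar  <- 111.1  # hypothetical observed sample mean
alpha <- 0.05   # significance level

# Test statistic: how many standard errors the sample mean is from mu0
z_obs <- (xbar - mu0) / (sigma / sqrt(n))

# Two-sided p-value: probability under H0 of a statistic at least this extreme
p_value <- 2 * pnorm(-abs(z_obs))

# Same idea by simulation: draw many sample means assuming H0 is true
set.seed(407)
sim_means <- replicate(10000, mean(rnorm(n, mean = mu0, sd = sigma)))
sim_z <- (sim_means - mu0) / (sigma / sqrt(n))
p_sim <- mean(abs(sim_z) >= abs(z_obs))

c(z = z_obs, p_value = p_value, p_simulated = p_sim)
if (p_value < alpha) "Reject H0" else "Fail to reject H0"
```

The simulated p-value should land close to the analytic one. With these made-up numbers both fall below \(\alpha = 0.05\), so the decision rule above would reject the null hypothesis.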