The Student's t distribution converges to the normal distribution as the degrees of freedom increase (beyond about 120 the two are practically indistinguishable). Please plot a normal distribution and several t distributions on the same chart, with 2, 5, 15, 30, and 120 degrees of freedom.
#Set the seed (not strictly needed here, since dnorm and dt are deterministic)
set.seed(123)
#Create a sequence of x values to evaluate the densities on
x <- seq(-5, 5, length.out = 1000)
#Assign the normal distribution using dnorm
normal_dist <- dnorm(x,mean = 0, sd = 1)
#Set the various t distributions and their degrees of freedom
t_2 <- dt(x,df=2)
t_5 <- dt(x,df=5)
t_15 <- dt(x,df=15)
t_30 <- dt(x,df=30)
t_120 <- dt(x,df=120)
#Plot the normal distribution first, highlighted with a thick black line
plot(x, normal_dist,
     type = "l",
     main = "Normal Distribution vs t Distributions",
     xlab = "x",
     ylab = "Density",
     lwd = 3
)
#Plot the various t distributions with different colors to distinguish each of them
lines(x,t_2,col='blue')
lines(x,t_5,col='red')
lines(x,t_15,col='darkorchid')
lines(x,t_30,col='darkgreen')
lines(x,t_120,col='orange')
#Add a legend to identify each of the t distributions by color
legend("topright",
legend=c("Normal Dist.(Bolded)","DF = 2","DF=5","DF=15","DF=30","DF=120"),
col=c("black","blue", "red","darkorchid","darkgreen","orange"),
lty=1,
bty="o",
title="Graph Legend"
)
Looking at the different t distributions, the higher the degrees of freedom, the closer the t distribution gets to the normal distribution. The orange line (df = 120) sits essentially on top of the black normal curve.
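As a quick optional check (my own addition, reusing the x grid and normal_dist defined above), we can quantify this convergence by computing the largest absolute gap between each t density and the standard normal density; the gap shrinks as the degrees of freedom grow.
#Optional sketch: maximum absolute difference between each t density and the normal density
dfs <- c(2, 5, 15, 30, 120)
max_gap <- sapply(dfs, function(d) max(abs(dt(x, df = d) - normal_dist)))
data.frame(df = dfs, max_abs_diff = round(max_gap, 5))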
Plot two charts - the normally distributed data (above) and the z-score distribution of the same data. Do they have the same distributional shape? Why or why not?
set.seed(123) # Set seed for reproducibility
#Population mean and standard deviation
mu <- 108
sigma <- 7.2
#Simulate 1000 observations from a normal distribution with these parameters
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
#Histogram of the raw, normally distributed data
hist(data_values)
#Standardize the data into z-scores using the known mean and standard deviation
z_score <- (data_values - mu) / sigma
#Histogram of the z-scores
hist(z_score)
Yes, the distributions have the same shape. The z-score standardizes each observation by computing how many standard deviations it lies above or below the mean, which is simply a linear re-scaling of the data (subtract mu, then divide by sigma). Because a linear re-scaling does not change the relative positions of the observations, the histogram of the "raw data" and the histogram of the z-scores have the same shape; only the axis values change, with the z-scores centered at 0 and spread out by about 1 standard deviation.
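As a small optional check (my own addition, using the objects created above), we can confirm that the z-scores are just a linear rescaling of the raw data, which is why the shape is preserved.
#Optional check: z-scores are a linear rescaling of the raw data
mean(z_score)               #close to 0 (exactly 0 only if the sample mean equaled mu)
sd(z_score)                 #close to 1, since the sample sd is close to sigma
cor(data_values, z_score)   #exactly 1: a perfect linear relationship, so the shape is unchanged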
In your own words, please explain what a p-value is.
Note that I could not access the linked article (it said "access denied"), so I used other online resources (linked below) to inform my explanation. A p-value is used to evaluate how strongly the observed data are consistent with the null hypothesis. It is the probability of obtaining a result at least as extreme as the one actually observed, assuming the null hypothesis is true. A higher p-value indicates weaker evidence against the null hypothesis, while a lower p-value indicates stronger evidence against it. For example, a p-value of 0.0001 provides strong evidence against the null hypothesis and points in favor of the alternative hypothesis. The p-value is typically compared to a chosen cutoff (the significance level), which is usually 0.05. In other words, a p-value of 0.0001 tells us that, if the null hypothesis were true, we would see a result at least as extreme as the one observed only 0.01% of the time.
Sources: https://www.scribbr.com/statistics/p-value/, https://www.simplypsychology.org/p-value.html, https://www.investopedia.com/terms/p/p-value.asp
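To make this concrete, here is a small illustrative example of my own (the hypothesized mean of 110 is an arbitrary value chosen for demonstration), reusing the simulated data from the z-score question above.
#Illustrative example: one-sample t-test of whether data_values could plausibly
#come from a population with mean 110
t.test(data_values, mu = 110)
#The reported p-value is the probability of seeing a sample mean at least this far
#from 110 if the true population mean really were 110; if it falls below the chosen
#cutoff (e.g. 0.05), we would reject that null hypothesis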