Discussion 5
Part 1. The (Student's) t distribution converges to the normal distribution as the degrees of freedom increase (beyond 120). Please plot a normal distribution and a few t distributions on the same chart with 2, 5, 15, 30, and 120 degrees of freedom.
library(ggplot2)
# set the range
x <- seq(-4, 4, length.out = 1000)
# labels for the six curves, in legend order
dist_labels <- c("Normal", "t (df=2)", "t (df=5)", "t (df=15)", "t (df=30)", "t (df=120)")
# create a data frame with the densities laid out in the problem
dense <- data.frame(
  x = rep(x, 6),
  density = c(dnorm(x),
              dt(x, df = 2),
              dt(x, df = 5),
              dt(x, df = 15),
              dt(x, df = 30),
              dt(x, df = 120)),
  distribution = factor(rep(dist_labels, each = length(x)),
                        levels = dist_labels)
)
# now plot
ggplot(dense, aes(x = x, y = density, color = distribution)) +
  geom_line(linewidth = 1) +
  labs(title = "Normal vs. Student's t distributions",
       x = "",
       y = "") +
  theme_minimal()
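As a rough numeric companion to the plot, the convergence can also be checked by measuring the largest vertical gap between each t density and the standard normal density over the same range. This is only a sketch in base R; the name max_gap is made up for this illustration.

```r
# Same x grid as the plot: 1000 points from -4 to 4
x <- seq(-4, 4, length.out = 1000)
dfs <- c(2, 5, 15, 30, 120)

# For each degrees-of-freedom value, the largest absolute difference
# between the t density and the standard normal density on the grid
# (max_gap is an illustrative name, not part of the assignment)
max_gap <- sapply(dfs, function(k) max(abs(dt(x, df = k) - dnorm(x))))
names(max_gap) <- paste0("df=", dfs)

print(round(max_gap, 4))
```

The gap should shrink as the degrees of freedom grow, which is the numeric version of the curves stacking on top of the normal curve in the chart.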
Part 2. Let's work with the normal data below (1000 observations with a mean of 108 and an sd of 7.2).
set.seed(123) # Set seed for reproducibility
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma )
Plot two charts: the normally distributed data (above) and the z-score distribution of the same data. Do they have the same distributional shape? Why or why not?
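Yes, these plots have the same distributional shape. The z-score is a linear transformation (subtract the mean, then divide by the standard deviation), so it only shifts and rescales the data: the center moves to 0 and the spread becomes 1, while the shape of the distribution is preserved.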
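One way to back this up without a chart is to compare a few summary numbers before and after standardizing. The sketch below uses base R only; skewness() is a small helper written here for illustration, not a function from a package.

```r
set.seed(123)
data_values <- rnorm(n = 1000, mean = 108, sd = 7.2)
z_scores <- (data_values - mean(data_values)) / sd(data_values)

# Location and scale change: z-scores have mean 0 and sd 1
print(c(mean = mean(z_scores), sd = sd(z_scores)))

# The two variables are perfectly linearly related
print(cor(data_values, z_scores))

# Illustrative helper (an assumption for this sketch): sample skewness,
# a shape measure that ignores location and scale
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3
print(c(raw = skewness(data_values), z = skewness(z_scores)))
```

The mean and sd change, but the skewness is the same for both versions of the data, which is what "same shape" means here.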
# set values
set.seed(123)
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
# compute the z-scores
z_scores <- (data_values - mean(data_values)) / sd(data_values)
# now create a data frame for each plot
df2 <- data.frame(value = data_values)
dfz <- data.frame(value = z_scores)
# now plot the data
p1 <- ggplot(df2, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30,
                 fill = "violet",
                 color = "violet") +
  labs(title = "distribution of data",
       x = "",
       y = "") +
  theme_minimal()
# now plot the z-score distribution
p2 <- ggplot(dfz, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30,
                 fill = "skyblue",
                 color = "skyblue") +
  labs(title = "distribution of z-scores",
       x = "",
       y = "") +
  theme_minimal()
# print and compare
print(p1)
print(p2)
Part 3. In your own words, please explain what a p-value is.
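The p-value is the probability of observing data at least as extreme as what we actually observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true. A small p-value means the observed data would be unlikely if the null hypothesis held, so it counts as evidence against the null. A large p-value means the data are consistent with the null hypothesis, although it does not prove that the null is true.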
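As a small illustration of how a p-value is computed, the sketch below runs a two-sided one-sample t test by hand and compares it with t.test(). The sample (25 observations drawn with a true mean of 110) and the null value of 108 are assumptions chosen for this example only.

```r
set.seed(123)
# Hypothetical sample for illustration: 25 draws with true mean 110
sample_values <- rnorm(n = 25, mean = 110, sd = 7.2)
mu0 <- 108  # null hypothesis: the population mean is 108

# Test statistic: how many standard errors the sample mean is from mu0
n <- length(sample_values)
t_stat <- (mean(sample_values) - mu0) / (sd(sample_values) / sqrt(n))

# Two-sided p-value: probability of a t statistic at least this far
# from 0 in either direction, computed as if the null were true
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test should report the same value
p_builtin <- t.test(sample_values, mu = mu0)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

The key point is in the pt() line: the probability is calculated under the t distribution that would apply if the null hypothesis were true, so the p-value describes the data given the null, not the null given the data.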