x=c(3,6,9,12,15,18,21,24,27,30)
pop_mean=mean(x)
pop_mean
## [1] 16.5
pop_variance = sum((x - pop_mean)^2) / length(x)
pop_variance
## [1] 74.25
d=expand.grid(x,x,x,x)
d$X_bar=rowMeans(d[,1:4])
freq=table(d$X_bar)
prop=prop.table(freq)
head(prop)
## 
##      3   3.75    4.5   5.25      6   6.75 
## 0.0001 0.0004 0.0010 0.0020 0.0035 0.0056
head(freq)
## 
##    3 3.75  4.5 5.25    6 6.75 
##    1    4   10   20   35   56
plot(as.numeric(names(prop)), as.numeric(prop), type = "b", xlab = "Sample Mean (X_bar)", ylab = "Proportion", main = "Sampling Distribution of Sample Mean")

the shape of the plot looks like a Normal Distribution.

mean_of_X_bar=mean(d$X_bar)
mean_of_X_bar
## [1] 16.5

It is exactly equal to the population mean from part 1(a). This shows that the sample mean is an unbiased estimator.

var_of_X_bar=sum((d$X_bar-mean_of_X_bar)^2)/10000
var_of_X_bar
## [1] 18.5625

relationship:as the sample size increases, the precision of the sample mean increases and the spread decreases in proportion.(since pop_variance = sum((x - pop_mean)**2) / length(x) from1(b))

I demonstrated the Central Limit Theorem (CLT),In part (f), the graph looks like a normal curve. In part (g), the average is the same as the population mean. In part (h), the variance rule. Together, we can get the CLT.

n=4
v2=rowMeans(d[, 1:4]^2)-(d$X_bar)^2
v1=v2*(n/(n-1))
b1=mean(v1)-pop_variance
b2=mean(v2)-pop_variance
cat("Bias of S^2:", b1, "\n")
## Bias of S^2: 0
cat("Bias of sigma_hat_sq:", b2, "\n")
## Bias of sigma_hat_sq: -18.5625

The Bias of \(S^2\) is 0, so \(S^2\) is an unbiased estimator. The Bias of \(\hat{\sigma}^2\) is not 0, so \(\hat{\sigma}^2\) is a biased estimator.

var1=sum((v1-mean(v1))^2)/length(v1)
var2=sum((v2 - mean(v2))^2)/length(v2)
mse1=var1+b1^2
mse2=var2+b2^2

cat("S^2: Var =", var1, ", MSE =", mse1, "\n")
## S^2: Var = 1988.044 , MSE = 1988.044
cat("sigma_hat_sq: Var =", var2, ", MSE =", mse2, "\n")
## sigma_hat_sq: Var = 1118.275 , MSE = 1462.841

Although \(\hat{\sigma}^2\) MSE is smaller than \(S^2\).Smaller MSE means the estimator is more accurate. As a result, \(\hat{\sigma}^2\) is a better estimator

sample_a=function(){
  s=runif(2, min=0, max=5) 
  return(mean(s))
}
X_bar_a = replicate(10000, sample_a())
plot(density(X_bar_a), main="n=2, Unif[0,5]")

sample_b=function(){
  s=runif(5, min=0, max=5)
  return(mean(s))
}
X_bar_b=replicate(10000, sample_b())
plot(density(X_bar_b), main="n=5, Unif[0,5]")

sample_c=function(){
  s = rchisq(5, df=1)
  return(mean(s))
}
X_bar_c=replicate(10000, sample_c())
plot(density(X_bar_c), main="n=5, Chi-sq df=1")

sample_d=function(){
  s=rchisq(60, df=1)
  return(mean(s))
}
X_bar_d=replicate(10000, sample_d())
plot(density(X_bar_d), main="n=60, Chi-sq df=1")

sample_e=function(){
  s=rchisq(5, df=60)
  return(mean(s))
}
X_bar_e=replicate(10000, sample_e())
plot(density(X_bar_e), main="n=5, Chi-sq df=60")

how large n should be:if the shape is balanced, \(n=5\) is enough. If the shape is very “inclined” to one side, we need a much bigger \(n\) to make it look Normal.

The role of Skewness:Skewness makes the change slower. And if the original distribution is skewed, we may need more data to make the final graph look like a symmetric bell.