load("more/ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp)
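Ans: The distribution of the sample is skewed, with a heavy right tail. Taking "typical" to mean the sample mean, the typical living area is about 1,506 square feet (the midpoint of the confidence interval computed below). The 60 is the sample size, that is, the number of houses selected at random from the living-area variable, not a typical house size.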
Ans: I do not expect another student's sample to have exactly the same distribution as mine, but I expect it to be similar in shape. The heavy tail shows that the data set contains some outliers, and a random sample that happens to include several of them will look more strongly skewed than one that does not.
Ans: The sample size must be at least 30; the sample observations must be independent; and the population distribution must not be strongly skewed.
Ans: “95% confidence” means that about 95% of the intervals built this way from random samples (the sample mean plus or minus about 2 standard errors) will contain the true population mean.
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1384.617 1628.083
mean(population)
## [1] 1499.69
Ans: Yes, the true population mean, 1499.69, lies within the interval (1384.617, 1628.083).
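The same check can be done in code rather than by eye. This is a small sketch that copies the three numbers from the output above; the names `ci_lower`, `ci_upper`, and `pop_mean` are illustrative and not part of the lab.

```r
# Values copied from the output above
ci_lower <- 1384.617
ci_upper <- 1628.083
pop_mean <- 1499.69

# TRUE if the interval contains the population mean
contains_mean <- ci_lower <= pop_mean & pop_mean <= ci_upper
contains_mean

# Margin of error implied by the interval (half of its width)
margin <- (ci_upper - ci_lower) / 2
margin
```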
Ans: I expect about 95% of the students' intervals to capture the true population mean, because each interval is built at the 95% confidence level (sample mean plus or minus 1.96 standard errors). With 60 students in the class, that is about 57 intervals (0.95 × 60) on average; the exact count will vary from class to class.
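The figure of 57 is an expected count, not a guarantee. Here is a minimal sketch of that reasoning, assuming each student's interval captures the true mean independently with probability 0.95. The variable names are illustrative and do not come from the lab.

```r
# Assumption: 60 students, each interval captures the mean with
# probability 0.95, independently of the others.
n_students <- 60
p_capture  <- 0.95

# Expected number of intervals that capture the population mean
expected_hits <- n_students * p_capture
expected_hits

# Probability that at least 57 of the 60 intervals capture the mean,
# i.e. P(X > 56) for X ~ Binomial(60, 0.95)
p_at_least_57 <- pbinom(56, size = n_students, prob = p_capture,
                        lower.tail = FALSE)
p_at_least_57
```

Under these assumptions the probability of seeing 57 or more captures is well below 1, so a class with fewer than 57 capturing intervals would not be surprising.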
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for(i in 1:50){
samp <- sample(population, n) # obtain a sample of size n = 60 from the population
samp_mean[i] <- mean(samp) # save sample mean in ith element of samp_mean
samp_sd[i] <- sd(samp) # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1338.089 1586.977
plot_ci(lower_vector, upper_vector, mean(population))
Ans: In the plot above, 3 of the 50 intervals (6%) do not include the population mean, so 94% do. That is close to, but not exactly, 95%. With a different set of samples the number of misses would vary; over many repetitions, about 95% of intervals built this way should capture the population mean.
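The proportion can also be computed directly instead of being counted from the plot. The sketch below is self-contained, so it uses a synthetic right-skewed population (`population_sim`, an assumption standing in for `ames$Gr.Liv.Area`) and names ending in `_sim` to avoid overwriting the lab's own vectors. The last three lines work equally well on `lower_vector`, `upper_vector`, and `mean(population)`.

```r
set.seed(1)  # fixed seed, for reproducibility of this illustration only

# Assumption: a synthetic right-skewed population (mean about 1500)
# stands in for the real living-area data.
population_sim <- rgamma(3000, shape = 9, rate = 0.006)
mu_sim <- mean(population_sim)

n <- 60            # sample size, as in the lab
n_intervals <- 50  # number of intervals, as in the lab
mean_sim <- rep(NA, n_intervals)
sd_sim <- rep(NA, n_intervals)
for (i in 1:n_intervals) {
  samp_i <- sample(population_sim, n)
  mean_sim[i] <- mean(samp_i)
  sd_sim[i] <- sd(samp_i)
}
lower_sim <- mean_sim - 1.96 * sd_sim / sqrt(n)
upper_sim <- mean_sim + 1.96 * sd_sim / sqrt(n)

# TRUE where an interval contains the population mean
captured <- lower_sim <= mu_sim & mu_sim <= upper_sim
sum(!captured)   # number of intervals that miss the mean
mean(captured)   # proportion of intervals that capture the mean
```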
Ans: I pick a 90% confidence level, so the critical value is about 1.64 (more precisely, 1.645).
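Critical values like this do not have to be looked up in a table; they can be obtained from the standard normal quantile function `qnorm`. This is a short sketch; the helper name `crit_value` is hypothetical and not part of the lab.

```r
# z* for a two-sided interval at a given confidence level:
# leave (1 - level) / 2 of the probability in each tail
# of the standard normal distribution.
crit_value <- function(level) {
  qnorm(1 - (1 - level) / 2)
}

round(crit_value(0.95), 2)  # 1.96
round(crit_value(0.90), 3)  # 1.645
```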
Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?
lower_vector90 <- samp_mean - 1.64 * samp_sd / sqrt(n)
upper_vector90 <- samp_mean + 1.64 * samp_sd / sqrt(n)
plot_ci(lower_vector90, upper_vector90, mean(population))
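Ans: Using the same 50 samples, 3 of the intervals do not include the population mean. That is 6% of the 50 intervals rather than 10%, so the observed coverage (94%) is somewhat higher than the nominal 90%. With only 50 intervals, a deviation of this size from the confidence level is not unusual.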
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was written for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.