download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
The distribution is unimodal and skewed right. The right skew is consistent with the median (1524) being less than the mean (1539) in the summary output. The typical size is roughly 1,500 square feet, judging by the sample mean and median. I interpret “typical” as the central value that best represents the living area of most homes in the sample.
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp,
col = "Lavender")
summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     720    1240    1524    1539    1824    2690
No, I would not expect another student's distribution to be identical, because each sample is drawn at random and will have its own distinct features. It should still look broadly similar, since every sample comes from the same population.
The sample must be random, the observations must be independent of one another, the population distribution must not be strongly skewed, and the sample size must be at least 30.
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1433.724 1643.642
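The interval calculation above can be wrapped in a small helper so the sample size is not hard-coded. This is only a sketch: `z_interval` and the `toy` vector are hypothetical names and data introduced for illustration, not part of the lab.

```r
# Hypothetical helper: z-based confidence interval for a mean.
# n is taken from the data instead of being typed in as 60.
z_interval <- function(x, z = 1.96) {
  n <- length(x)
  se <- sd(x) / sqrt(n)          # standard error of the sample mean
  c(lower = mean(x) - z * se,
    upper = mean(x) + z * se)
}

# Toy data, assumed purely for illustration
toy <- c(10, 12, 14, 16, 18)
ci <- z_interval(toy)
ci
```

With the lab's own sample, `z_interval(samp)` should reproduce `c(lower, upper)`.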
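It means that if we drew many random samples and built a confidence interval from each one, about 95% of those intervals would contain the true population mean. Equivalently, about 95% of the time the sample mean falls within roughly 2 standard errors of the true mean, so the true mean lies between the lower and upper bounds.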
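The long-run reading of "95% confidence" can be checked with a quick simulation. The sketch below does not use the Ames data: `population_sim` is a made-up right-skewed population (an assumption for illustration), and the seed, sizes, and variable names are likewise hypothetical.

```r
set.seed(1)

# Hypothetical stand-in population: right-skewed, mean near 1500
population_sim <- rgamma(3000, shape = 9, scale = 165)
mu <- mean(population_sim)

n <- 60       # sample size, as in the lab
reps <- 1000  # number of repeated samples

# For each repetition, build a 95% interval and record whether it contains mu
captured <- replicate(reps, {
  s <- sample(population_sim, n)
  se <- sd(s) / sqrt(n)
  lower <- mean(s) - 1.96 * se
  upper <- mean(s) + 1.96 * se
  lower <= mu && mu <= upper
})

coverage_sim <- mean(captured)  # share of intervals that captured mu
coverage_sim
```

The share should land near 0.95, though not exactly on it, because the number of repetitions is finite.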
mean(population)
## [1] 1499.69
Looking at the code from Exercises 3-4, yes, the confidence interval does capture the average house size in Ames: the true mean of 1499.69 lies between the lower bound (1433.724) and the upper bound (1643.642).
I would still expect about 95% of the intervals to capture the true population mean.
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
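n <- 60
for (i in 1:50) {
  samp <- sample(population, n)  # draw a new random sample of size n
  samp_mean[i] <- mean(samp)     # store its mean
  samp_sd[i] <- sd(samp)         # store its standard deviation
}
# 95% interval bounds for each of the 50 samples
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)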
c(lower_vector[1], upper_vector[1])
## [1] 1427.304 1692.296
Out of the 50 confidence intervals, only 3 did not capture the true mean, so 47 of 50 (94%) included the true population mean. This proportion is close to the 95% confidence level but not exactly equal to it. Confidence intervals are not meant to pin down an exact value; they give a range that is likely to contain the true population value, so the observed proportion varies from one batch of 50 intervals to the next.
plot_ci(lower_vector, upper_vector, mean(population))
NTM <- 1 - (3/50)  # proportion of the 50 intervals that captured the true mean (3 missed)
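Instead of counting missed intervals by eye on the plot, the capture rate can be computed directly from the bounds. This is a minimal sketch using made-up bounds: `lower_demo`, `upper_demo`, and `true_mean_demo` are hypothetical values chosen for illustration.

```r
# Hypothetical bounds for five intervals and a hypothetical true mean
lower_demo <- c(1400, 1450, 1510, 1380, 1300)
upper_demo <- c(1600, 1650, 1700, 1490, 1520)
true_mean_demo <- 1500

# TRUE where the interval contains the true mean
captured_demo <- lower_demo <= true_mean_demo & true_mean_demo <= upper_demo

sum(!captured_demo)  # number of intervals that missed
mean(captured_demo)  # proportion that captured the true mean
```

In this toy case two of the five intervals miss, so the proportion is 0.6. Applied to the lab's vectors, the same comparison would be `mean(lower_vector <= mean(population) & mean(population) <= upper_vector)`.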
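Confidence level = 90%, critical value used = 1.28. Note that 1.28 is qnorm(0.90), which leaves 10% in one tail; a two-sided 90% interval would use qnorm(0.95), about 1.645, so intervals built with 1.28 have a nominal two-sided confidence level of about 80%.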
qnorm(0.90)
## [1] 1.281552
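A two-sided interval at confidence level C leaves (1 - C)/2 in each tail, so its critical value is the 1 - (1 - C)/2 quantile of the standard normal. The sketch below tabulates a few levels; `critical_value` is a hypothetical helper name introduced here.

```r
# Hypothetical helper: two-sided z critical value for a confidence level
critical_value <- function(level) {
  qnorm(1 - (1 - level) / 2)
}

levels <- c(0.80, 0.90, 0.95, 0.99)
crit <- round(critical_value(levels), 3)
names(crit) <- paste0(levels * 100, "%")
crit
```

This gives 1.282 for 80%, 1.645 for 90%, 1.960 for 95%, and 2.576 for 99%.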
In this run, 5 of the 50 intervals did not capture the true mean, so 90% did. It makes sense that more intervals miss than at the 95% level: a lower confidence level uses a smaller critical value, which produces narrower intervals.
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}
lower_vector <- samp_mean - 1.28 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.28 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1379.781 1544.786
plot_ci(lower_vector, upper_vector, mean(population))
prop <- 1-(5/50)
prop
## [1] 0.9