download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

Exercise 1: Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

The distribution is unimodal and skewed right. The right skew shows in the summary output, where the median is less than the mean. The typical size in my sample is roughly 1,500 square feet, since the median (1524) and the mean (1539) nearly agree. I interpret “typical” as the central value, that is, the living-area size that most homes in the sample are close to.

population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp, 
     col = "Lavender")

summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     720    1240    1524    1539    1824    2690
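The skew claim can also be checked numerically. The sketch below is a minimal illustration on a simulated right-skewed sample; `x_sim`, `skew_gap`, and `sample_skewness` are hypothetical names introduced here, and the simulated values are not the Ames data.

```r
# Hypothetical stand-in for a right-skewed sample; not the Ames data.
set.seed(1)
x_sim <- rgamma(60, shape = 4, scale = 375)

# Gap between the two centre measures: a positive gap (mean above median)
# is a quick indication of right skew.
skew_gap <- function(x) mean(x) - median(x)

# Moment-based sample skewness: positive values indicate a longer right tail.
sample_skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_gap(x_sim)
sample_skewness(x_sim)
```

With the lab data loaded, `skew_gap(samp)` and `sample_skewness(samp)` would apply the same checks to the actual sample.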

Exercise 2: Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?

No, I would not expect another student’s distribution to be identical, because each sample is drawn at random and will almost certainly contain different homes. I would expect it to be similar, though, since both samples are taken from the same population.
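A small simulation can illustrate this. The sketch below assumes a made-up right-skewed population (`pop_sim`, not the Ames data) and draws two independent samples of 60, as two students would.

```r
# Hypothetical right-skewed population standing in for the lab's `population`.
set.seed(42)
pop_sim <- rgamma(3000, shape = 9, scale = 167)

# Two "students" each draw their own random sample of 60 homes.
samp_a <- sample(pop_sim, 60)
samp_b <- sample(pop_sim, 60)

# The samples themselves are not identical ...
identical(samp_a, samp_b)

# ... but their summaries are usually close, because both come from the same population.
rbind(a = summary(samp_a), b = summary(samp_b))
```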

Exercise 3: For the confidence interval to be valid, the sample mean must be normally distributed and have standard error $s/\sqrt{n}$. What conditions must be met for this to be true?

The observations must come from a random sample and be independent of one another (when sampling without replacement, the sample should be less than 10% of the population). The population distribution must not be strongly skewed, and the sample size should be at least 30.
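The two conditions that can be checked numerically are sketched below. `check_ci_conditions` is a hypothetical helper, and the population size `N = 2000` is an arbitrary value chosen for illustration; randomness and the degree of skew still have to be judged from the sampling design and a histogram.

```r
# Hypothetical helper: rough numeric checks for two of the conditions.
# Randomness cannot be verified from the data; it depends on how the sample was drawn.
check_ci_conditions <- function(n, N) {
  c(
    large_enough = n >= 30,       # rule of thumb for the normal approximation
    under_10_pct = n / N < 0.10   # keeps draws without replacement roughly independent
  )
}

# N = 2000 is an assumed population size, used only for this example.
check_ci_conditions(n = 60, N = 2000)
```

With the lab data loaded, `check_ci_conditions(length(samp), length(population))` would run the same checks on the actual sample.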

sample_mean <- mean(samp)

se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se 
c(lower, upper)
## [1] 1433.724 1643.642
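The same calculation can be wrapped in a small function so that the sample size and critical value are not hard-coded. This is a minimal sketch; `ci_mean` is a hypothetical helper, and the example vector is made up rather than taken from the Ames data.

```r
# Hypothetical helper: normal-based confidence interval for a mean.
ci_mean <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)              # standard error s / sqrt(n)
  z <- qnorm(1 - (1 - level) / 2)    # two-sided critical value (about 1.96 for 95%)
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Example on a small made-up vector (not the Ames data).
ci_mean(c(1200, 1500, 1350, 1800, 1650))
```

Calling `ci_mean(samp)` on the lab's sample would give essentially the interval above, with `qnorm(0.975)` in place of the rounded 1.96.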

Exercise 4: What does “95% confidence” mean?

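It means that if we repeatedly drew samples and built a confidence interval from each one, about 95% of those intervals would contain the true population mean. Equivalently, about 95% of the time the sample mean falls within 1.96 standard errors of the population mean, so the true mean lies between the lower and upper bounds.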
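The long-run interpretation can be illustrated by simulation. The sketch below assumes a made-up population (`pop_sim`, not the Ames data), builds 2,000 intervals from repeated samples of 60, and reports the share that contain the true mean; all names ending in `_sim` are hypothetical.

```r
set.seed(123)
pop_sim <- rgamma(3000, shape = 9, scale = 167)   # hypothetical population
mu_sim <- mean(pop_sim)                           # its true mean
n_sim <- 60

# For each repetition: draw a sample, build a 95% interval, record whether it covers mu_sim.
covers <- replicate(2000, {
  s <- sample(pop_sim, n_sim)
  half <- qnorm(0.975) * sd(s) / sqrt(n_sim)
  (mean(s) - half <= mu_sim) && (mu_sim <= mean(s) + half)
})

mean(covers)  # share of intervals containing mu_sim; expected to be near 0.95
```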

mean(population)
## [1] 1499.69

Exercise 5: Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

Yes, the confidence interval computed above captures the true average house size in Ames: the true mean (1499.69) lies between the lower bound (1433.724) and the upper bound (1643.642).

Exercise 6: Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.

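I would expect about 95% of the intervals to capture the true population mean, because each interval is built by a procedure that captures the true mean 95% of the time.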
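How much the observed proportion can vary is easy to quantify with a binomial model. The sketch below assumes 50 intervals that each capture the true mean independently with probability 0.95; `k` and `p` are names introduced here for illustration.

```r
k <- 50      # number of intervals (assumed independent)
p <- 0.95    # chance that each interval captures the true mean

k * p                    # expected number of capturing intervals: 47.5
sqrt(k * p * (1 - p))    # standard deviation of that count

# Probability that exactly 0, 1, ..., 5 of the 50 intervals miss the true mean.
round(dbinom(0:5, size = k, prob = 1 - p), 3)
```

Under this model a handful of misses in either direction is ordinary, so an observed proportion a few points away from 95% is not surprising.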

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1427.304 1692.296

On Your Own

1.) Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

Out of the 50 confidence intervals, only 3 did not capture the true mean, so 47/50 = 94% of the intervals include the true population mean. This proportion is close to the 95% confidence level but not exactly equal to it. The confidence level describes the long-run capture rate over many repeated samples, so any one set of 50 intervals will vary around it; with 50 intervals an exact match is not even possible, since 95% of 50 is 47.5.

plot_ci(lower_vector, upper_vector, mean(population))

NTM <- 1 - (3/50)  # 3 of the 50 intervals missed the true mean
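Rather than counting misses by eye on the plot, the proportion can be computed directly from the interval bounds. This is a minimal sketch; `coverage` is a hypothetical helper, and the three intervals in the example are made up.

```r
# Hypothetical helper: share of intervals [lower, upper] that contain `truth`.
coverage <- function(lower, upper, truth) {
  mean(lower <= truth & truth <= upper)
}

# Toy check with three made-up intervals around a true value of 10:
# the first two contain 10, the third does not, so the result is 2/3.
coverage(lower = c(8, 9, 11), upper = c(12, 10.5, 13), truth = 10)
```

With the lab's objects, `coverage(lower_vector, upper_vector, mean(population))` would give the proportion for the 50 plotted intervals.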

2.) Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

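Confidence level = 80%, critical value = 1.28. A two-sided 80% interval leaves 10% in each tail, so the critical value is the 0.90 quantile of the standard normal distribution.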

qnorm(0.90)
## [1] 1.281552
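For a two-sided interval at confidence level C, the critical value is the 1 - (1 - C)/2 quantile of the standard normal distribution, because the excluded probability is split between the two tails. The sketch below wraps that rule in a hypothetical helper, `critical_value`.

```r
# Hypothetical helper: two-sided critical value for a given confidence level.
critical_value <- function(level) qnorm(1 - (1 - level) / 2)

critical_value(0.80)  # same as qnorm(0.90), about 1.28
critical_value(0.90)  # same as qnorm(0.95), about 1.645
critical_value(0.95)  # same as qnorm(0.975), about 1.96
```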

3.) Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

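Of the 50 intervals, 5 did not capture the true mean, so 45/50 = 90% did. The critical value 1.28 is qnorm(0.90), which leaves 10% in each tail, so these are 80% intervals. The observed 90% is above that level, though with only 50 intervals the observed proportion can vary noticeably. As expected, these narrower intervals miss the true mean more often than the 95% intervals did.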

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}
lower_vector <- samp_mean - 1.28 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.28 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1379.781 1544.786
plot_ci(lower_vector, upper_vector, mean(population))

prop <- 1-(5/50)
prop
## [1] 0.9
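Because the intervals depend only on the stored sample means and standard deviations, several confidence levels can be compared on the same 50 samples without redrawing them. The sketch below assumes a made-up population (`pop_sim`, not the Ames data); `m_sim`, `s_sim`, `n_sim`, and `coverage_at` are hypothetical names.

```r
set.seed(7)
pop_sim <- rgamma(3000, shape = 9, scale = 167)    # hypothetical population
n_sim <- 60
samps <- replicate(50, sample(pop_sim, n_sim))     # 60 x 50 matrix, one sample per column
m_sim <- colMeans(samps)                           # 50 sample means
s_sim <- apply(samps, 2, sd)                       # 50 sample standard deviations

# Rebuild the 50 intervals at a given level from the stored means and sds,
# then return the share that contain the true mean.
coverage_at <- function(level) {
  z <- qnorm(1 - (1 - level) / 2)
  half <- z * s_sim / sqrt(n_sim)
  mean(m_sim - half <= mean(pop_sim) & mean(pop_sim) <= m_sim + half)
}

sapply(c(0.80, 0.90, 0.95), coverage_at)
```

Since a higher level only widens each interval around the same centre, the coverage on a fixed set of samples can never decrease as the level increases.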