Lab report

Load data:

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

Set a seed:

set.seed(293841)

Exercises:

Exercise 1:

population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp)

summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     796    1149    1450    1491    1733    2649

The distribution of this sample is right skewed and unimodal, and it has some outliers as well. The mean of the sample is 1491 and the median is 1450. A typical living area in this sample falls in the 1000 to 2000 range. In this case “typical” refers to the most common values in the sample.

Exercise 2:

I don’t think another student would have exactly the same distribution as my sample, because we each drew a different random sample of size 60. The distributions would not be identical, but I would expect them to be similar. Since we are sampling from the same population with the same method, the samples should have a similar center, spread, and shape.
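A minimal sketch of this sample-to-sample variation. It assumes a simulated right-skewed stand-in population (`pop_sim`, a hypothetical name) rather than the Ames data, so the numbers it prints are not the lab's results:

```r
# Hypothetical stand-in population: simulated, right-skewed values.
# This is NOT the Ames data; it only illustrates how two samples differ.
set.seed(1)
pop_sim <- rlnorm(3000, meanlog = 7.25, sdlog = 0.35)

# Two students each draw their own random sample of size 60
samp_a <- sample(pop_sim, 60)
samp_b <- sample(pop_sim, 60)

# The two samples are not the same set of values
identical(samp_a, samp_b)

# Their means can still be compared side by side
c(mean(samp_a), mean(samp_b))
```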

Exercise 3:

sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)  # standard error; 60 is the sample size
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1372.750 1608.583

For this confidence interval to be valid, the sample observations must be random and independent, the sample size should be at least 30, and the population distribution should not be strongly skewed.
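The interval calculation can also be wrapped in a small function so that the sample size and critical value are not typed by hand. This is only a sketch: `ci_mean` is a hypothetical helper, not part of the lab, and the input below is made-up toy data of length 60.

```r
# Hypothetical helper (not from the lab): normal-based confidence
# interval for a mean, mean(x) +/- z * sd(x) / sqrt(n).
ci_mean <- function(x, level = 0.95) {
  n <- length(x)                    # sample size taken from the data
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 when level = 0.95
  se <- sd(x) / sqrt(n)             # standard error of the sample mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Toy data only (60 evenly spaced values), to show the call
toy <- seq(1000, 2180, by = 20)
ci_mean(toy)
```

With the lab's objects, the equivalent call would be ci_mean(samp).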

Exercise 4:

mean(population)
## [1] 1499.69

If we took many samples and computed a 95% confidence interval from each one, about 95% of those intervals would contain the true population mean.

Exercise 5:

mean(population)
## [1] 1499.69

Yes, our interval captures the true mean: the population mean is 1499.69, which falls inside our interval of [1372.75, 1608.58].

Exercise 6:

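# Storage for the mean and standard deviation of each of the 50 samples
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)  # draw a new random sample of size n
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}
# 95% confidence bounds for all 50 samples at once
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])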
## [1] 1344.174 1614.826

I would expect about 95% of the intervals to capture the true population mean, since each interval was built with 95% confidence. With 50 intervals, that is roughly 47 or 48 of them.
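The long-run meaning of "95%" can be checked by repeating the procedure many more than 50 times. The sketch below assumes a simulated right-skewed stand-in population (`pop_sim`, hypothetical) instead of the Ames data, and 2000 repetitions is an arbitrary choice:

```r
# Hypothetical stand-in population (simulated; NOT the Ames data)
set.seed(42)
pop_sim <- rlnorm(3000, meanlog = 7.25, sdlog = 0.35)
mu <- mean(pop_sim)

n <- 60
reps <- 2000  # many more than 50, so the long-run rate is visible

# For each repetition: draw a sample, build the 95% interval,
# and record whether it contains the population mean
captured <- replicate(reps, {
  s <- sample(pop_sim, n)
  se <- sd(s) / sqrt(n)
  (mean(s) - 1.96 * se <= mu) && (mu <= mean(s) + 1.96 * se)
})

# Proportion of intervals that contain the population mean
mean(captured)
```

Because the stand-in population is skewed, the proportion may sit slightly below 0.95 rather than exactly on it.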


On your own:

1:

plot_ci(lower_vector, upper_vector, mean(population))

Out of the 50 intervals, 48 capture the true population mean. That is, 96% of the confidence intervals include the true population mean. This proportion is not exactly equal to the 95% confidence level, but it is very close. That is expected, because the confidence level describes the long-run capture rate over many samples, so the rate in any one set of 50 intervals will vary around it.
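Instead of counting intervals by eye from the plot, the count can be computed directly. This is a sketch with a hypothetical helper, `count_captured`, shown on three made-up intervals:

```r
# Hypothetical helper: count intervals [lower, upper] that contain mu
count_captured <- function(lower, upper, mu) {
  sum(lower <= mu & mu <= upper)
}

# Toy check with three made-up intervals and mu = 2.5:
# only the interval [2, 3] contains 2.5
count_captured(c(1, 2, 3), c(2, 3, 4), 2.5)  # 1
```

With the lab's objects, the call would be count_captured(lower_vector, upper_vector, mean(population)), and dividing the result by 50 gives the observed capture rate.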

2:

I picked a confidence level of 99%, and the critical value for this confidence level is 2.58. For comparison, I also used a 90% confidence level, whose critical value is 1.65.
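Critical values like these can be obtained from the standard normal quantile function rather than a table. A short sketch (`conf_levels` and `z_star` are illustrative names):

```r
# Two-sided critical value: half of (1 - level) goes in each tail
conf_levels <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)
round(z_star, 3)  # 1.645 1.960 2.576
```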

3:

lower_vector_90 <- samp_mean - 1.65 * samp_sd / sqrt(n) 
upper_vector_90 <- samp_mean + 1.65 * samp_sd / sqrt(n)
plot_ci(lower_vector_90, upper_vector_90, mean(population))

lower_vector_99 <- samp_mean - 2.58 * samp_sd / sqrt(n) 
upper_vector_99 <- samp_mean + 2.58 * samp_sd / sqrt(n)
plot_ci(lower_vector_99, upper_vector_99, mean(population))

In the first plot, the intervals are built with 90% confidence. Of the 50 intervals, 46 include the population mean, which is 92%. Although the confidence level describes a long-run rate, 92% is close to 90%. In the second plot, all 50 confidence intervals, or 100%, include the population mean. Since my confidence level was 99%, we would expect 49 or 50 of the 50 intervals to capture the mean, so 100% is consistent with it.
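How surprising these counts are can be gauged with a binomial model. The sketch below assumes the 50 intervals are independent and that each captures the mean with probability exactly equal to its confidence level. That is an idealization, not something verified in this lab.

```r
n_int <- 50  # number of intervals in each plot

# Expected number of capturing intervals at 90% and 99% confidence
n_int * c(0.90, 0.99)  # 45.0 49.5

# Chance that all 50 intervals capture the mean at 99% confidence
# (equals 0.99^50, about 0.605)
p_all_99 <- dbinom(50, size = n_int, prob = 0.99)

# Chance of 46 or more captures out of 50 at 90% confidence
p_46plus_90 <- pbinom(45, size = n_int, prob = 0.90, lower.tail = FALSE)

c(p_all_99, p_46plus_90)
```

Under these assumptions, seeing all 50 of the 99% intervals capture the mean happens more often than not.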

Teamwork report

Team member       Attendance  Author    Contribution %
Name of member 1  Yes / No    Yes / No  25%
Name of member 2  Yes / No    Yes / No  25%
Name of member 3  Yes / No    Yes / No  25%
Name of member 4  Yes / No    Yes / No  25%
Total                                   100%