download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area
The sample distribution is right skewed and unimodal: the median (1,514) is less than the mean (1,525). The interquartile range is about 560 (1,160 to 1,720), and there is an extreme outlier above 5,000. Looking at the summary of the sample of size 60, I would say the typical size within the sample is the sample mean, 1,525. To me, this value represents the living area of an average home in Ames.
set.seed(60)
samp <- sample(population, 60)
summary(samp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    729    1160    1510    1520    1720    5100
hist(samp)
Another student's distribution will not be identical to mine, because each person draws a different random sample from the population. However, I would expect it to be similar: if the point estimate is unbiased, the sampling distribution of the estimate is centered at the parameter it estimates.
The observations must come from a random sample and be independent (the sample is less than 10% of the population), the sample size must be greater than 30, and the distribution must not be strongly skewed.
A confidence interval only tries to capture the population parameter; it is not meant to capture individual observations or point estimates. 95% confidence means that roughly 95% of the time the point estimate will be within 2 standard errors of the parameter, so about 95% of intervals built this way will contain it.
sample_mean <- mean(samp)
se <- sd(samp)/sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
95% confidence interval [1] 1373 1677
mean(population)
Population mean [1] 1500
Comparing the 95% confidence interval with the population mean: yes, the interval does include the true average size of the houses in Ames.
Since we do not work in a classroom together, I can't collect data from other students. However, I would still expect about 95% of the intervals to capture the true population mean.
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for(i in 1:50){
samp <- sample(population, n)
samp_mean[i] <- mean(samp)
samp_sd[i] <- sd(samp)
}
lower <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower[1],upper[1])
samples [1] 1314 1541
c(lower[2],upper[2])
samples [1] 1345 1594
Lab 4B Questions
plot_ci(lower, upper, mean(population))
p <- 1-(3/50)
p
proportion of CI plot [1] 0.94
Fifty confidence intervals, each from a sample of size 60, were plotted. Three of the 50 intervals did not capture the true mean, mu = 1499.69, so 47 of 50 (94%) did. This proportion is not exactly equal to the 95% confidence level, but it is not expected to be: the confidence level describes the long-run share of intervals that capture the population mean, so any particular set of 50 intervals will land near 95% rather than exactly on it.
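The count read off the plot can also be checked numerically. The sketch below is only an illustration: `coverage()` is a hypothetical helper (not part of the lab's code), and the toy bounds are made-up numbers chosen so that exactly one of four intervals misses the target.

```r
# Hypothetical helper: share of intervals whose bounds bracket the true mean.
coverage <- function(lower, upper, mu) {
  mean(lower <= mu & upper >= mu)
}

# Toy bounds (made up for illustration); the third interval misses mu = 10.
toy_lower <- c(8, 9, 11, 7)
toy_upper <- c(12, 11, 13, 10.5)
toy_coverage <- coverage(toy_lower, toy_upper, mu = 10)
print(toy_coverage)
```

With the lab's own vectors, a call such as `coverage(lower, upper, mean(population))` should agree with the proportion computed by hand, provided the misses were counted correctly from the plot.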
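I will pick a 90% confidence level. The critical value is the 1 - a/2 quantile of the standard normal distribution, where a = 0.10 is the significance level: 1 - a/2 = 1 - 0.10/2 = 0.95.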
Z <- qnorm(.95)
Z
critical values [1] 1.645
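The same quantile calculation works for any confidence level. The sketch below wraps it in a small function; `z_crit()` is a hypothetical name introduced here for illustration, and it assumes a two-sided interval based on the normal distribution.

```r
# Hypothetical helper: two-sided normal critical value for a confidence level.
z_crit <- function(conf_level) {
  alpha <- 1 - conf_level   # significance level
  qnorm(1 - alpha / 2)      # upper-tail quantile
}

z_90 <- z_crit(0.90)
z_95 <- z_crit(0.95)
z_99 <- z_crit(0.99)
print(round(c(z_90, z_95, z_99), 3))
```

Using the function instead of a hard-coded 1.96 or 1.65 keeps the multiplier consistent with whatever confidence level is chosen.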
It is comparable. Four of the 50 intervals did not capture the true mean, mu = 1499.69, so 46 of 50 (92%) did. This is reasonable because the confidence level was set lower, at 90% (a significance level of 10%).
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for(i in 1:50){
samp <- sample(population, n)
samp_mean[i] <- mean(samp)
samp_sd[i] <- sd(samp)
}
lower <- samp_mean - 1.65 * samp_sd / sqrt(n)
upper <- samp_mean + 1.65 * samp_sd / sqrt(n)
c(lower[1],upper[1])
[1] 1375 1637
plot_ci(lower, upper, mean(population))
p <- 1-(4/50)
p
proportion of CI plot at 90% CI [1] 0.92
Yes, the conditions are satisfied. The sample size of 36 is greater than 30. The sample was collected from schools in a large city, which provides a population large enough for the observations to be unbiased and independent. The histogram does not show strong skew or extreme outliers, and its shape is close to a normal distribution.
sample_mean <- 30.69
sample_sd <- 4.31
se <- sample_sd/sqrt(36)
lower <- sample_mean - 1.65 * se
upper <- sample_mean + 1.65 * se
c(lower, upper)
significance level of 0.10 [1] 29.50 31.88
Since the confidence interval does NOT include 32 months, we can conclude that the average age at which gifted children first count to 10 successfully is less than 32 months.
(30.69-32)/se
Z score [1] -1.824
A Z score of -1.824 corresponds to a lower-tail probability (p-value) of approximately 0.034. Because the p-value is less than the 0.10 significance level, there is sufficient evidence to reject H0 in favor of H1: the average age at which gifted children first count to 10 successfully is less than 32 months.
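The table lookup can be reproduced in R with `pnorm()`, which returns the lower-tail probability of the standard normal distribution. This is a minimal sketch using the summary statistics quoted in this exercise; the names `n_kids`, `null_mean`, and `reject_h0` are introduced here for illustration only.

```r
# Summary statistics from the exercise.
sample_mean <- 30.69
sample_sd <- 4.31
n_kids <- 36       # sample size (illustrative variable name)
null_mean <- 32    # value of the mean under H0

se <- sample_sd / sqrt(n_kids)
z <- (sample_mean - null_mean) / se

# Lower-tail probability, because the alternative is "less than 32 months".
p_value <- pnorm(z)
reject_h0 <- p_value < 0.10
print(c(z = z, p_value = p_value))
```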
sample_mean <- 30.69
sample_sd <- 4.31
se <- sample_sd/sqrt(36)
lower <- sample_mean - 1.65 * se
upper <- sample_mean + 1.65 * se
c(lower, upper)
90% confidence interval [1] 29.50 31.88
I think the results should be the same because a 0.10 significance level is the complement of a 90% confidence level. The hypothesis test gives a p-value smaller than the 0.10 significance level, and the 90% confidence interval does not capture the hypothesized average age of 32 months. Both results agree that the average age at which gifted children first count to 10 successfully is less than 32 months.
Since we are trying to prove that mu > 34, this is a one-sided test. Looking up an upper-tail probability of 0.05 in the normal probability table gives a Z score of 1.65 (the 95th percentile).
se <- 10/sqrt(65)
se
standard error [1] 1.24
Plugging the values into the Z-score formula, 1.65 = (X - 34)/se, and solving for X gives X = 34 + 1.65 * 1.24 = 36.046.
The sample mean therefore has to be at least 36.046 (37 when rounded up to a whole number) for the p-value to be at most 0.05.
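The algebra above can be checked by solving for the cutoff directly. This sketch uses `qnorm(0.95)` rather than the rounded table value of 1.65, so the result differs slightly in the second decimal place; the variable names are illustrative and not taken from the exercise.

```r
null_mean <- 34   # hypothesized mean under H0
sd_value <- 10    # standard deviation used in the standard error above
n_obs <- 65       # sample size

se <- sd_value / sqrt(n_obs)
z_cut <- qnorm(0.95)               # one-sided cutoff for a p-value of 0.05
x_min <- null_mean + z_cut * se    # smallest sample mean that reaches the cutoff
print(round(c(se = se, z_cut = z_cut, x_min = x_min), 3))
```

Any sample mean at or above `x_min` gives a one-sided p-value of 0.05 or less.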
H0: Regulations are being met at the restaurant
HA: Regulations are not being met at the restaurant
The Type 1 error in this context would be the inspector rejecting the null hypothesis that the regulations are being met at the restaurant, even though the restaurant did meet all the regulations.
The Type 2 error in this context would be the inspector failing to reject the null hypothesis that the regulations are being met at the restaurant, even though the restaurant did not meet all the regulations.
Type 1 is more problematic for the restaurant because it means that the restaurant met the regulations but was wrongly accused of failing them.
Type 2 is more problematic for the diners because they may unknowingly be eating at a restaurant with poor sanitation practices, since the inspector failed to detect the violations.
I would prefer that the food safety inspector require strong evidence, which lowers the threshold for evaluating evidence in the investigation. This makes it more likely that a restaurant fails the inspection, which pushes owners to raise their sanitation standards. However, lowering the threshold also increases the chance of making a Type 1 error, which is a concern as well.
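The trade-off between the evidence threshold and the Type 1 error rate can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the data are standard normal draws generated with H0 true, and the significance levels 0.10 and 0.01 are hypothetical stand-ins for a lenient and a strict evidence threshold.

```r
set.seed(1)
reps <- 5000    # number of simulated inspections, all with H0 true
n_obs <- 30     # observations per simulated inspection

# Z statistic for each simulated sample (true mean 0, true sd 1).
z_stats <- replicate(reps, mean(rnorm(n_obs)) * sqrt(n_obs))

# Share of true-H0 cases that are wrongly rejected under each threshold.
type1_lenient <- mean(z_stats > qnorm(1 - 0.10))
type1_strict <- mean(z_stats > qnorm(1 - 0.01))
print(c(lenient = type1_lenient, strict = type1_strict))
```

The lenient threshold rejects a true H0 far more often than the strict one, which is the cost side of asking for less evidence before failing a restaurant.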