Lab 4B - Xialing Walla

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area

Exercise 1 Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

The sample distribution is right-skewed and unimodal, as the median (1,514) is less than the mean (1,525). The interquartile range is about 560 (1,160 to 1,720), and there is an extreme outlier over 5,000. Looking at the summary of the sample of 60 homes, I would say the typical size within the sample is the sample mean, 1,525. By “typical” I mean the center of the distribution: the average living area of the homes in my sample.

set.seed(60)
samp <- sample(population, 60)
summary(samp)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    729    1160    1510    1520    1720    5100

hist(samp)

[Plot: histogram of samp]
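The spread and the outlier can be checked directly from the printed summary values. This is a minimal sketch: the numbers are copied from the summary(samp) output above, and the 1.5 * IQR fence is a common convention assumed here, not something the lab prescribes.

```r
# Five-number summary values copied from the summary(samp) output above
five_num <- c(min = 729, q1 = 1160, median = 1510, q3 = 1720, max = 5100)

# Interquartile range (Q3 - Q1) and full range (max - min)
iqr <- unname(five_num["q3"] - five_num["q1"])
full_range <- unname(five_num["max"] - five_num["min"])
c(IQR = iqr, range = full_range)

# Common 1.5 * IQR convention (an assumption, not part of the lab):
# values above Q3 + 1.5 * IQR are flagged as potential outliers
upper_fence <- unname(five_num["q3"]) + 1.5 * iqr
max_is_outlier <- unname(five_num["max"]) > upper_fence
max_is_outlier
```

Under this convention the maximum of 5,100 sits well above the fence, which is consistent with calling it an extreme outlier.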

Exercise 2 Would you expect another student's distribution to be identical to yours? Would you expect it to be similar? Why, or why not?

Another student's distribution will not be identical to mine, because each person draws a different random sample from the population. However, I would expect it to be similar: if the point estimate is unbiased, then the sampling distribution of the estimate is centered at the parameter it estimates.
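A small simulation can illustrate both halves of this answer. It is only a sketch: toy_population is a simulated right-skewed stand-in (an assumption, not the Ames data), and the seed, population size and number of repetitions are arbitrary choices.

```r
# Hypothetical right-skewed population standing in for Gr.Liv.Area
set.seed(42)
toy_population <- rgamma(3000, shape = 9, rate = 0.006)

# Two "students" each draw their own random sample of 60
samp_a <- sample(toy_population, 60)
samp_b <- sample(toy_population, 60)
same_sample <- identical(samp_a, samp_b)
c(mean_a = mean(samp_a), mean_b = mean(samp_b))

# Many repeated samples: the sample means center on the population mean
sample_means <- replicate(2000, mean(sample(toy_population, 60)))
bias <- mean(sample_means) - mean(toy_population)
bias
```

The two samples differ, but the average of many sample means lands very close to the population mean, which is what an unbiased estimate implies.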

Exercise 3 For the confidence interval to be valid, the sample mean must be normally distributed and have standard error s/sqrt(n). What conditions must be met for this to be true?

The observations must come from a random sample and be independent (the sample should be less than 10% of the population), the sample size should be at least 30, and the population distribution should not be strongly skewed.
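The two numeric conditions can be written as a quick check. This is a sketch under stated assumptions: check_conditions is a hypothetical helper, and N = 3000 is an illustrative population size rather than the actual number of rows in ames. Skewness still has to be judged from a histogram.

```r
# Hypothetical helper: checks the sample-size and 10% conditions only
check_conditions <- function(n, N) {
  c(large_sample = n >= 30,            # sample size of at least 30
    under_10_percent = n / N < 0.10)   # sample is under 10% of the population
}

# n = 60 as in this lab; N = 3000 is an assumed population size
conditions_ok <- check_conditions(n = 60, N = 3000)
conditions_ok
```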

Exercise 4 What does "95% confidence" mean?

A confidence interval only tries to capture the population parameter; it is not meant to capture individual observations or point estimates. 95% confidence means that roughly 95% of the time the point estimate will be within 2 standard errors of the parameter, so about 95% of intervals built this way will contain the parameter.
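The "2 standard errors" figure is a rounding of the normal critical value used in the code for this lab. A short sketch using base R's qnorm and pnorm shows the connection:

```r
# Exact multiplier that leaves 2.5% in each tail of the standard normal
z_star <- qnorm(0.975)
z_star

# Probability within exactly 2 standard errors (a little over 95%)
within_2 <- pnorm(2) - pnorm(-2)
within_2

# Probability within z_star standard errors
within_z_star <- pnorm(z_star) - pnorm(-z_star)
within_z_star
```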

Exercise 5 Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor's interval capture this value?

sample_mean <- mean(samp)
se <- sd(samp)/sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)

95% confidence interval [1] 1373 1677

mean(population)

Population mean [1] 1500

Comparing the 95% confidence interval with the population mean, yes, the interval (1,373 to 1,677) includes the true average size of the houses in Ames (about 1,500).
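The comparison can also be done in code rather than by eye. This sketch hard-codes the rounded values printed above; with the real objects in memory one would compare lower, upper and mean(population) directly.

```r
# Rounded values copied from the printed output above
ci <- c(lower = 1373, upper = 1677)
pop_mean <- 1500

# TRUE when the population mean lies inside the interval
captured <- ci["lower"] <= pop_mean && pop_mean <= ci["upper"]
captured
```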

Exercise 6 Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.

Since we do not work in a classroom together, I can't collect data from other students. However, I would still expect about 95% of the intervals to capture the true population mean, because that is what a 95% confidence level means.

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for(i in 1:50){
  samp <- sample(population, n)   # draw a fresh random sample of size n
  samp_mean[i] <- mean(samp)      # store its mean
  samp_sd[i] <- sd(samp)          # store its standard deviation
}
lower <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper <- samp_mean + 1.96 * samp_sd / sqrt(n)

c(lower[1],upper[1])

samples [1] 1314 1541

c(lower[2],upper[2])

samples [1] 1345 1594

Lab 4B Questions

1. Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

plot_ci(lower, upper, mean(population))

[Plot: output of plot_ci for the 50 intervals]

p <- 1-(3/50)
p

proportion of CI plot [1] 0.94

50 confidence intervals, each from a sample of size 60, were plotted. 3 out of the 50 intervals did not capture the true mean, mu = 1499.69, so 47 of 50 (94%) did. The proportion does not exactly equal the confidence level. This is expected: 95% of 50 is 47.5, so an exact match is impossible, and the confidence level describes the long-run proportion of intervals that capture the true mean, not the proportion in any one set of 50.
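How much the count of missed intervals can vary by chance is easy to sketch with a binomial model. This assumes an idealized setting in which the 50 intervals are independent and each misses with probability exactly 0.05; real intervals built with sd(samp) only approximate this.

```r
# Idealized model: number of misses ~ Binomial(50, 0.05)
n_intervals <- 50
miss_prob <- 0.05

# Expected number of intervals that miss the population mean
expected_misses <- n_intervals * miss_prob
expected_misses

# Probability of seeing 0, 1, ..., 5 misses under this model
miss_probs <- dbinom(0:5, size = n_intervals, prob = miss_prob)
names(miss_probs) <- 0:5
round(miss_probs, 3)

# Probability of exactly 3 misses, as observed in this lab
prob_three <- dbinom(3, size = n_intervals, prob = miss_prob)
```

Under this model, 3 misses out of 50 is an ordinary outcome, so 94% coverage does not contradict a 95% confidence level.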

2. Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

I will pick a 90% confidence level. With significance level a = 0.10, the critical value is the quantile at 1 - a/2 = 1 - 0.10/2 = 0.95.

Z <- qnorm(.95)
Z

critical values [1] 1.645
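The same calculation works for any confidence level. A minimal sketch, where critical_value is a hypothetical helper name introduced only for illustration:

```r
# Hypothetical helper: two-sided normal critical value for a confidence level
critical_value <- function(conf_level) {
  alpha <- 1 - conf_level    # significance level
  qnorm(1 - alpha / 2)       # quantile leaving alpha/2 in the upper tail
}

# Critical values for a few common confidence levels
crit <- sapply(c(0.90, 0.95, 0.99), critical_value)
round(crit, 3)
```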

3. Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals. Calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

It is comparable. 4 out of the 50 intervals did not capture the true mean, mu = 1499.69, so 46 of 50 (92%) did. This is reasonable: we set the confidence level lower, at 90% (a significance level of 10%), so the intervals are narrower and more of them are expected to miss.

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for(i in 1:50){
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}
lower <- samp_mean - 1.65 * samp_sd / sqrt(n)
upper <- samp_mean + 1.65 * samp_sd / sqrt(n)
c(lower[1],upper[1])

[1] 1375 1637

plot_ci(lower, upper, mean(population))

[Plot: output of plot_ci for the 50 intervals at the 90% level]

p <- 1-(4/50)
p

proportion of CI plot at 90% CI [1] 0.92
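As the question notes, the intervals can be recomputed from the stored means and standard deviations without resampling, and the proportion can be computed rather than counted from the plot. The sketch below uses a simulated toy_population (an assumption, not the Ames data) and a hypothetical coverage helper.

```r
# Hypothetical right-skewed population standing in for Gr.Liv.Area
set.seed(7)
toy_population <- rgamma(3000, shape = 9, rate = 0.006)
mu <- mean(toy_population)

# Collect 50 sample means and standard deviations once
n <- 60
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
for (i in 1:50) {
  s <- sample(toy_population, n)
  samp_mean[i] <- mean(s)
  samp_sd[i] <- sd(s)
}

# Hypothetical helper: proportion of intervals that contain mu
# for a given critical value z, reusing the same 50 samples
coverage <- function(z) {
  lower <- samp_mean - z * samp_sd / sqrt(n)
  upper <- samp_mean + z * samp_sd / sqrt(n)
  mean(lower <= mu & mu <= upper)
}

cov_95 <- coverage(qnorm(0.975))
cov_90 <- coverage(qnorm(0.95))
c(cov_95 = cov_95, cov_90 = cov_90)
```

Because both sets of intervals come from the same samples, every 90% interval sits inside its 95% counterpart, so the 90% coverage can never exceed the 95% coverage.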

4. Gifted children: Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of thirty-six children who were identified as gifted children soon after they reached the age of four. The following histogram shows the distribution of the ages (in months) at which these children first counted to 10 successfully. Also provided are some sample statistics.

a) Are conditions for inference satisfied?

Yes, the conditions are satisfied. The sample size of 36 is greater than 30. The children are a random sample from schools in a large city, so the sample is less than 10% of the population and the observations can be treated as independent. The histogram does not show strong skew or extreme outliers, and its shape is close to a normal distribution.

b) Suppose you read on a parenting website that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10.

H0: µ = 32 - The average age at which gifted children first count to 10 successfully is 32 months
H1: µ < 32 - The average age at which gifted children first count to 10 successfully is less than 32 months

sample_mean <- 30.69
sample_sd <- 4.31
se <- sample_sd/sqrt(36)
lower <- sample_mean - 1.65 * se
upper <- sample_mean + 1.65 * se
c(lower, upper)

90% confidence interval [1] 29.50 31.88

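Since the 90% confidence interval does not include 32 months and lies entirely below it, we can conclude that the average age at which gifted children first count to 10 successfully is less than 32 months. (A two-sided 90% interval corresponds to a one-sided test at the 0.05 level, so the conclusion also holds at the 0.10 level.)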

c) Interpret the p-value in context of the hypothesis test and the data.

(30.69-32)/se

Z score [1] -1.824

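A Z score of -1.824 corresponds to a lower-tail probability (p-value) of about 0.034. In context: if the true average age were 32 months, the probability of observing a sample mean of 30.69 months or lower in a random sample of 36 gifted children would be about 0.034. Since the p-value is less than the 0.10 significance level, there is sufficient evidence to reject H0 in favor of H1: the average age at which gifted children first count to 10 successfully is less than 32 months.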
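The p-value can be computed in R instead of read from a table. A minimal sketch using the sample statistics given in the question; the variable names null_mean and p_value are introduced here for illustration.

```r
# Sample statistics from the question
sample_mean <- 30.69
sample_sd <- 4.31
n <- 36
null_mean <- 32

# Standard error and Z score
se <- sample_sd / sqrt(n)
z <- (sample_mean - null_mean) / se
z

# One-sided (lower-tail) p-value, because H1 is mu < 32
p_value <- pnorm(z)
p_value

# Compare with the 0.10 significance level
reject_h0 <- p_value < 0.10
reject_h0
```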

d) Calculate a 90% confidence interval for the average age at which gifted children first count to 10 successfully.

sample_mean <- 30.69
sample_sd <- 4.31
se <- sample_sd/sqrt(36)
lower <- sample_mean - 1.65 * se
upper <- sample_mean + 1.65 * se
c(lower, upper)

90% confidence interval [1] 29.50 31.88

e) Do your results from the hypothesis test and the confidence interval agree? Explain.

Yes, the results agree, as expected, because a 0.10 significance level is the complement of a 90% confidence level. The hypothesis test gives a p-value smaller than the 0.10 significance level, and the 90% confidence interval does not contain the hypothesized mean of 32 months. Both results indicate that the average age at which gifted children first count to 10 successfully is less than 32 months.
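Strictly, a one-sided test at the 0.10 level matches a one-sided 90% upper confidence bound, which uses qnorm(0.90) rather than the two-sided multiplier. The sketch below computes that bound from the same sample statistics; it is an alternative view of the comparison, not the calculation used above.

```r
# Sample statistics from the question
sample_mean <- 30.69
se <- 4.31 / sqrt(36)

# One-sided 90% upper confidence bound for the mean
upper_bound <- sample_mean + qnorm(0.90) * se
upper_bound

# The bound lies below 32, in agreement with rejecting H0 at the 0.10 level
below_null <- upper_bound < 32
below_null
```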

5. Find the sample mean: You are given the following hypotheses: H0: µ = 34, HA: µ > 34. We know that the sample standard deviation is 10 and the sample size is 65. For what sample mean would the p-value be equal to 0.05? Assume that all conditions necessary for inference are satisfied.

Since we are trying to show that the mean is greater than 34, this is a one-sided test. Looking up an upper-tail probability of 0.05 (a cumulative probability of 0.95) in the normal probability table gives a Z score of 1.65.

se <- 10/sqrt(65)
se

standard error [1] 1.24

Plugging the values into the Z score formula: 1.65 = (X - 34)/se

Solving for X gives X = 34 + 1.65 * 1.24 = 36.046, so the sample mean has to be about 36.05 for the p-value to be equal to 0.05.
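The same answer can be reached in R without rounding the critical value. This sketch uses qnorm(0.95) in place of the table value 1.65, so the result comes out slightly smaller; the names x_bar and p_check are introduced here for illustration.

```r
# Values given in the question
null_mean <- 34
s <- 10
n <- 65

# Standard error and the unrounded one-sided critical value
se <- s / sqrt(n)
z_crit <- qnorm(0.95)

# Sample mean at which the upper-tail p-value equals 0.05
x_bar <- null_mean + z_crit * se
x_bar

# Check: the upper-tail p-value for this sample mean
p_check <- 1 - pnorm((x_bar - null_mean) / se)
p_check
```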

6. Testing for food safety: A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.

a) Write the hypotheses in words.

H0: Regulations are being met at the restaurant
HA: Regulations are not being met at the restaurant

b) What is a Type 1 error in this context?

A Type 1 error in this context would be the inspector rejecting the null hypothesis that regulations are being met at the restaurant, even though the restaurant did meet all the regulations.

c) What is a Type 2 error in this context?

A Type 2 error in this context would be the inspector failing to reject the null hypothesis that regulations are being met at the restaurant, even though the restaurant did not meet all the regulations.

d) Which error is more problematic for the restaurant owner? Why?

Type 1 is more problematic for the restaurant owner because it means the restaurant met the regulations but was wrongly found in violation, and its license to serve food could be revoked.

e) Which error is more problematic for the diners? Why?

Type 2 is more problematic for the diners because they may be eating at a restaurant that has poor sanitation practices without knowing it, since the inspector failed to detect the violation.

f) As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant's license? Explain your reasoning.

As a diner, I would prefer that the food safety inspector require only strong evidence, which lowers the threshold of evidence needed to revoke a license. This makes it more likely that a restaurant with poor practices fails the inspection, which pushes restaurant owners to raise their sanitation standards. However, lowering the threshold also increases the chance of a Type 1 error, which would be a concern as well.
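The trade-off between the two error types can be illustrated numerically. This is only a sketch under loud assumptions: it uses a one-sided Z test and a hypothetical true effect of 2 standard errors (effect_in_se), neither of which comes from the food safety question itself.

```r
# Hypothetical true effect, measured in standard errors (an assumption)
effect_in_se <- 2

# Type 1 error rates from lenient ("strong evidence") to strict ("very strong")
alpha <- c(0.10, 0.05, 0.01)

# One-sided rejection cutoffs for each alpha
z_crit <- qnorm(1 - alpha)

# Type 2 error rate: chance the test statistic stays below the cutoff
# when the true effect is effect_in_se standard errors
type2 <- pnorm(z_crit - effect_in_se)

data.frame(alpha = alpha, type2 = round(type2, 3))
```

Requiring stronger evidence (smaller alpha) lowers the Type 1 error rate but raises the Type 2 error rate, which is why a diner might prefer the more lenient standard.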