What is the point estimate for the average height of active individuals?
171.1
What about the median?
170.3
What is the point estimate for the standard deviation of the heights of active individuals?
9.4
What about the IQR?
14
Is a person who is 1 m 80 cm (180 cm) tall considered unusually tall? And is a person who is 1 m 55 cm (155 cm) considered unusually short? Explain your reasoning.
No. 180 cm is less than one SD above the mean (z = (180 - 171.1) / 9.4 = 0.947), so it is not unusually tall.
155 cm is about 1.7 SD below the mean (z = (155 - 171.1) / 9.4 = -1.713). It is clearly on the shorter side of the sample, but it is still within 2 SD of the mean, so by the usual rule of thumb it is not unusually short.
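The two z-scores can be reproduced with a short R sketch. `zscore` is a hypothetical helper name, and the |z| > 2 cutoff for "unusual" is the common rule of thumb, not something fixed by the data.

```r
# Hypothetical helper: standardized distance of x from the sample mean
zscore <- function(x, mean = 171.1, sd = 9.4) (x - mean) / sd

z_tall  <- zscore(180)   # about 0.947
z_short <- zscore(155)   # about -1.713

# Rule of thumb assumed here: |z| > 2 counts as unusual
unusual <- abs(c(z_tall, z_short)) > 2
print(round(c(z_tall, z_short), 3))
print(unusual)
```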
The researchers take another random sample of physically active individuals. Would you expect the mean and the standard deviation of this new sample to be the ones given above? Explain your reasoning.
No, but I would expect them to be close. Sample statistics vary from sample to sample, so it would be rare for two samples to have exactly the same mean and standard deviation. It is always possible to draw an atypical sample (either the first or the second), but generally the two sets of estimates should be close.
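A quick simulation shows this sample-to-sample variation. It assumes, purely for illustration, that heights in the population are normal with mean 171.1 cm and SD 9.4 cm, and it draws two independent samples of 507 individuals.

```r
set.seed(42)  # reproducible illustration
# Assumption for illustration only: population heights ~ Normal(171.1, 9.4)
sample1 <- rnorm(507, mean = 171.1, sd = 9.4)
sample2 <- rnorm(507, mean = 171.1, sd = 9.4)

# The two samples give close, but not identical, estimates
est <- rbind(sample1 = c(mean = mean(sample1), sd = sd(sample1)),
             sample2 = c(mean = mean(sample2), sd = sd(sample2)))
print(round(est, 2))
```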
The sample means obtained are point estimates for the mean height of all active individuals, if the sample of individuals is equivalent to a simple random sample. What measure do we use to quantify the variability of such an estimate (Hint: recall that \(SD_{\bar{x}} = \sigma / \sqrt{n}\))?
Standard error (SE)
Compute this quantity using the data
\(\sigma \approx s = 9.4\)
\(\sqrt{n} = \sqrt{507} \approx 22.52\)
\(SE = SD_{\bar{x}} = 9.4 / 22.52 \approx 0.42\)
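The same arithmetic in R, as a minimal sketch that uses the sample SD as a stand-in for \(\sigma\):

```r
# Standard error of the sample mean: s / sqrt(n), with s standing in for sigma
s <- 9.4   # sample SD of heights (cm)
n <- 507   # number of active individuals in the sample
se <- s / sqrt(n)
print(round(se, 2))
```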
We are 95% confident that the average spending of these 436 American adults is between $80.31 and $89.11.
False. A confidence interval is a statement about the population mean, not the sample. We know the average spending of these 436 adults exactly (the sample mean, $84.71), so we are 100% sure of it and there is nothing to estimate.
This confidence interval is not valid since the distribution of spending in the sample is right skewed.
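False. The spending itself is right skewed, but with a large sample (n = 436) of independent observations, the sampling distribution of the mean is nearly normal by the Central Limit Theorem, so the interval is still valid.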
95% of random samples have a sample mean between $80.31 and $89.11.
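False. A confidence interval describes where the population mean is likely to be, not where other sample means will fall. Each new random sample produces its own mean and its own interval.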
We are 95% confident that the average spending of all American adults is between $80.31 and $89.11.
True. This is the correct interpretation of a confidence interval.
A 90% confidence interval would be narrower than the 95% confidence interval since we don’t need to be as sure about our estimate.
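True. A 90% interval uses a smaller critical value (z* = 1.645 instead of 1.96), so its margin of error is smaller. The trade-off is that a narrower interval is less likely to capture the population mean, although it gives a more targeted range.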
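Comparing the two intervals directly makes the width difference concrete. This R sketch assumes the $80.31 to $89.11 interval was built as mean ± 1.96 × SE. It backs out the SE from that interval and rebuilds the interval at both confidence levels. `ci_at` is a hypothetical helper name.

```r
xbar <- 84.71                                  # sample mean spending ($)
se <- (89.11 - 80.31) / 2 / qnorm(0.975)       # SE implied by the 95% interval (assumed z-based)

# Hypothetical helper: z-based interval at a given confidence level
ci_at <- function(level) {
  z_star <- qnorm(1 - (1 - level) / 2)         # two-sided critical value
  xbar + c(-1, 1) * z_star * se
}
ci95 <- ci_at(0.95)
ci90 <- ci_at(0.90)
print(round(rbind(ci90, ci95), 2))
```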
In order to decrease the margin of error of a 95% confidence interval to a third of what it is now, we would need to use a sample 3 times larger.
False. The margin of error is z* × SE, and SE = s / sqrt(n) depends on the square root of the sample size. To cut the margin of error to a third, sqrt(n) must triple, so the sample must be 9 times larger (3 squared).
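A small R check of this scaling. The SD and starting sample size are hypothetical values chosen only for illustration. The ratios do not depend on them.

```r
# Margin of error for a z-based interval: z* times s / sqrt(n)
me <- function(s, n, conf = 0.95) qnorm(1 - (1 - conf) / 2) * s / sqrt(n)

s <- 10     # hypothetical SD, for illustration only
n <- 100    # hypothetical starting sample size
ratio_3x <- me(s, 3 * n) / me(s, n)   # about 0.577, not 1/3
ratio_9x <- me(s, 9 * n) / me(s, n)   # 1/3
print(c(ratio_3x, ratio_9x))
```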
The margin of error is 4.4.
True. The margin of error is half the width of the interval: (89.11 - 80.31) / 2 = 4.40. (The standard error is about 2, namely 4.40 / 1.96 = 2.24.)
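The point estimate and margin of error can be read directly off the interval. The point estimate is its midpoint and the margin of error is half its width. The SE line assumes a z-based 95% interval.

```r
lower <- 80.31
upper <- 89.11
point_estimate <- (lower + upper) / 2   # 84.71, the sample mean
margin <- (upper - lower) / 2           # 4.40, the margin of error
se <- margin / qnorm(0.975)             # about 2.24, assuming a z-based 95% interval
print(c(point_estimate, margin, se))
```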
Are conditions for inference satisfied?
Yes. The observations (children) are independent, there are more than 30 children in the sample, and the distribution is not highly skewed.
Suppose you read online that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10.
H0: μ = 32. The average age at which gifted children first count to 10 successfully is the same as the general average.
HA: μ < 32. The average age at which gifted children first count to 10 successfully is less than the general average.
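# One-sample z test statistic for H0: mu = nulval
# (a possible extension: an if statement reporting whether H0 is rejected)
hyptest <- function(xbar, nulval, sigma, n) {
  SE <- sigma / sqrt(n)           # standard error of the sample mean
  Z <- (xbar - nulval) / SE       # standardized distance from the null value
  cat("The Z score for this test is", Z, "\n")
  invisible(Z)                    # return Z so it can be reused
}
hyptest(30.69, 32, 4.31, 36)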
## The Z score for this test is -1.823666
A Z score of -1.82 corresponds to a lower-tail p-value of about 0.034. Since 0.034 < 0.10, we reject the null hypothesis.
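The p-value can be computed with `pnorm` instead of a table lookup. The alternative is "less than", so the p-value is the lower-tail area. This minimal sketch recomputes Z from the summary statistics used in the `hyptest` call.

```r
z <- (30.69 - 32) / (4.31 / sqrt(36))   # test statistic, about -1.82
p_value <- pnorm(z)                     # lower-tail p-value, since HA: mu < 32
alpha <- 0.10
print(round(p_value, 4))
print(p_value < alpha)                  # TRUE means reject H0
```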
Interpret the p-value in context of the hypothesis test and the data.
If gifted children first counted to 10 at 32 months on average, like children in general, there would be only about a 3.4% chance of drawing a random sample of 36 gifted children whose mean age is 30.69 months or lower. Because this p-value is below the 0.10 significance level, we reject the null hypothesis in favor of the alternative.
Calculate a 90% confidence interval for the average age at which gifted children first count to 10 successfully.
A 90% confidence interval leaves 5% in each tail, so the critical value is z* = 1.645 (the Z score with 95% of the normal curve below it). The standard error is 4.31 / sqrt(36) = 0.71833.
cilow <- 30.69 - 1.645 * .71833 # point estimate - z* * SE
cihigh <- 30.69 + 1.645 * .71833 # point estimate + z* * SE
cat("The confidence interval is (", cilow, ",", cihigh, ")")
## The confidence interval is ( 29.50835 , 31.87165 )
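A reusable version of this calculation, in the style of `hyptest`. `ci_mean` is a hypothetical helper that looks up the two-sided critical value with `qnorm` instead of a table. It is a sketch that assumes a z-based interval.

```r
# Hypothetical helper: z-based confidence interval for a population mean
ci_mean <- function(xbar, s, n, level = 0.90) {
  se <- s / sqrt(n)                        # standard error
  z_star <- qnorm(1 - (1 - level) / 2)     # about 1.645 for a 90% interval
  c(lower = xbar - z_star * se, upper = xbar + z_star * se)
}

gifted_ci <- ci_mean(30.69, 4.31, 36, level = 0.90)
print(round(gifted_ci, 2))
```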
Do your results from the hypothesis test and the confidence interval agree? Explain.
Yes. The hypothesis test rejects H0: μ = 32 at the 0.10 level, and the 90% confidence interval lies entirely below 32 months. Both indicate that gifted children first count to 10 successfully earlier, on average, than children in general.
Perform a hypothesis test to evaluate if these data provide convincing evidence that the average IQ of mothers of gifted children is different from the average IQ for the population at large, which is 100. Use a significance level of 0.10.
H0: μ = 100. The average IQ of mothers of gifted children is the same as the population average IQ.
HA: μ ≠ 100. The average IQ of mothers of gifted children is different from the population average IQ.
## The Z score for this test is 16.8
A Z score of 16.8 is far beyond the critical values of ±1.645 for a two-sided test at the 0.10 level. The p-value is essentially 0, so we reject the null hypothesis.
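Because the alternative here is two-sided, the p-value doubles the tail area beyond |Z|. This minimal sketch starts from the reported Z score of 16.8.

```r
z <- 16.8                       # test statistic reported above
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value, since HA: mu != 100
print(p_value)
print(p_value < 0.10)
```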
# SE implied by the test above: (118.2 - 100) / 16.8 = 1.0833; z* = 1.645 for 90%
cilow <- 118.2 - 1.645 * 1.0833 # point estimate - z* * SE
cihigh <- 118.2 + 1.645 * 1.0833 # point estimate + z* * SE
cat("The confidence interval is (", cilow, ",", cihigh, ")")
## The confidence interval is ( 116.418 , 119.982 )
Do your results from the hypothesis test and the confidence interval agree? Explain.
Yes. The hypothesis test rejects H0: μ = 100, and the 90% confidence interval lies well above 100. Both indicate that the average IQ of mothers of gifted children differs from (is higher than) the population average.
A sampling distribution is the distribution of a statistic computed from many repeated random samples of the same size drawn from a population. Most often the statistic is the point estimate of the mean. It is commonly graphed as a histogram. As more samples are included, the histogram of sample means forms a nearly normal curve (by the Central Limit Theorem, given a large enough sample size), which allows the mathematical properties of the normal distribution to be used in its analysis.
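A minimal simulation of a sampling distribution. It assumes, for illustration only, a right-skewed population (exponential with mean 10 and SD 10). It draws 5,000 samples of size 50 and plots a histogram of their means, which comes out close to symmetric and centered on the population mean.

```r
set.seed(123)  # reproducible illustration
# Assumed population for illustration: exponential with mean 10 (right-skewed)
n_samples <- 5000
n <- 50
sample_means <- replicate(n_samples, mean(rexp(n, rate = 1 / 10)))

# The means center on the population mean, with spread close to sigma / sqrt(n)
print(c(mean = mean(sample_means), sd = sd(sample_means), theory_se = 10 / sqrt(n)))
hist(sample_means, breaks = 40, main = "Sampling distribution of the mean (n = 50)")
```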
## [1] 1.5
A bulb lasting more than 10,500 hours is more than 1.5 SD above the mean (z = (10,500 - 9,000) / 1,000 = 1.5), so the probability is P(Z > 1.5) = 0.067, about a 6.7% chance.
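The same probability in R, assuming individual lifespans are normal with mean 9,000 hours and SD 1,000 hours. The probability is computed both by standardizing and directly with `pnorm`'s `lower.tail` argument.

```r
# P(single bulb lasts more than 10,500 h) with lifespans ~ Normal(9000, 1000)
z <- (10500 - 9000) / 1000          # 1.5
p_single <- 1 - pnorm(z)            # about 0.0668
# Equivalent without standardizing:
p_direct <- pnorm(10500, mean = 9000, sd = 1000, lower.tail = FALSE)
print(round(c(z, p_single, p_direct), 4))
```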
Describe the distribution of the mean lifespan of 15 light bulbs.
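The mean lifespan of 15 bulbs has a nearly normal sampling distribution (because the individual lifespans are normal). Its mean is 9,000 hours and its standard error is SE = 1,000 / sqrt(15) = 258.2 hours. About 90% of such sample means fall between roughly 8,575 and 9,425 hours (9,000 ± 1.645 × 258.2).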
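A sketch of the numbers that describe this sampling distribution. It assumes, as the exercise does, that individual lifespans are normal with mean 9,000 hours and SD 1,000 hours. The range is the middle 90% of sample means, not a confidence interval.

```r
n <- 15
mu <- 9000      # mean lifespan (h)
sigma <- 1000   # SD of individual lifespans (h)
se <- sigma / sqrt(n)                           # about 258.2
middle90 <- mu + c(-1, 1) * qnorm(0.95) * se    # middle 90% of sample means
print(round(c(se = se, lower = middle90[1], upper = middle90[2]), 1))
```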
## The Z score for this test is -5.809475
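A sample mean about 5.8 standard errors from the population mean is extremely unlikely. The probability of drawing a random sample of 15 bulbs with that mean lifespan is roughly 3 in a billion, so it is almost impossible.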
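This R sketch assumes the question concerns a sample mean of 10,500 hours, which is 1,500 hours above the population mean. That assumption reproduces the magnitude of the Z score above. The sign only depends on whether the difference is taken as (x̄ - μ) or (μ - x̄).

```r
se <- 1000 / sqrt(15)
z <- (10500 - 9000) / se                  # about 5.81 standard errors above the mean
p_mean <- pnorm(z, lower.tail = FALSE)    # P(mean of 15 bulbs > 10,500 h)
print(c(z = z, p = p_mean))
```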
set.seed(1)  # for reproducibility
# Population: lifespans of 1,000 bulbs, normal with mean 9,000 h and SD 1,000 h
bulbpop <- rnorm(1000, mean = 9000, sd = 1000)
# Assumed here: bulbsample holds 1,000 simulated means of samples of 15 bulbs
bulbsample <- replicate(1000, mean(rnorm(15, mean = 9000, sd = 1000)))
dpop <- density(bulbpop)
dsamp <- density(bulbsample)
# Set ylim from both curves so the taller sampling distribution is not clipped
plot(dpop, col = "red", lwd = 2, ylim = c(0, max(dpop$y, dsamp$y)),
     main = "Population vs. sampling distribution")
lines(dsamp, lwd = 2)
Could you estimate the probabilities from parts (a) and (c) if the lifespans of light bulbs had a skewed distribution?
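Not with the normal-model calculations used here. Part (a) is about a single bulb, so it depends directly on the lifespans being normal. Part (c) is about the mean of only 15 bulbs, which is too small a sample for the Central Limit Theorem to guarantee a nearly normal sampling distribution when the population is skewed.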
Suppose you conduct a hypothesis test based on a sample where the sample size is n = 50 and arrive at a p-value of 0.08. You then refer back to your notes and discover that you made a careless mistake: the sample size should have been n = 500. Will your p-value increase, decrease, or stay the same? Explain.
It would decrease. The sample size appears under a square root in the denominator of SE = s / sqrt(n), so a larger n gives a smaller SE. SE is in turn the denominator of Z = (sample mean - null value) / SE, so a smaller SE makes |Z| larger. This pushes the test statistic further into the tail and lowers the p-value (assuming the sample mean and SD stay the same).
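A sketch of the effect, under two assumptions not stated in the question: the original test was two-sided, and the sample mean and SD are unchanged when n changes, so |Z| scales with sqrt(n).

```r
p_50 <- 0.08
z_50 <- qnorm(1 - p_50 / 2)          # |Z| giving a two-sided p of 0.08 (about 1.75)
z_500 <- z_50 * sqrt(500 / 50)       # same mean and SD, so |Z| grows by sqrt(10)
p_500 <- 2 * pnorm(-z_500)           # new two-sided p-value
print(c(z_50 = z_50, z_500 = z_500, p_500 = p_500))
```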