Mean = 171.1 cm
Median = 170.3 cm
Standard deviation (SD) = 9.4 cm
IQR = Q3 - Q1 = 14 cm
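For reference, here is a minimal sketch of how these summary statistics could be computed, assuming the heights are the hgt column (in cm) of the bdims dataset from the openintro package:

library(openintro)
mean(bdims$hgt)    # sample mean
median(bdims$hgt)  # sample median
sd(bdims$hgt)      # sample standard deviation
IQR(bdims$hgt)     # Q3 - Q1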
m <- 171.1
sd <- 9.4
(z180 <- (180 - m) / sd)
## [1] 0.9468085
(z155 <- (155 - m) / sd)
## [1] -1.712766
pnorm(z155)
## [1] 0.0433778

No, the sample mean and sample standard deviation are only point estimates, and as we’ve seen, they have their own sampling distributions. From sample to sample, we expect random variability in these point estimates.
We use the standard error: \(SE_{\bar{x}} = s_{\bar{x}} / \sqrt{n} = 0.42\). Assuming that all the conditions for a normal distribution are met, the sample means \(\bar{x}\) are expected to be normally distributed with a mean equal to the population mean \(\mu\) and a standard deviation equal to the standard error \(SE_{\bar{x}}\).
(se <- sd / sqrt(507))
## [1] 0.4174687

False: the confidence interval by definition includes the sample mean (so we are 100% confident!).
False: the confidence interval for one sample says nothing about the sample means for other samples.
True: the confidence interval has a 95% probability of containing the true population mean.
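As an illustration of this interpretation, here is a small simulation sketch. The normal population and sample size below are made up (the population parameters simply reuse the height sample statistics from earlier); the point is that roughly 95% of intervals constructed this way capture the true mean.

set.seed(42)
mu_pop <- 171.1; sigma_pop <- 9.4; n <- 100
covers <- replicate(10000, {
  x <- rnorm(n, mu_pop, sigma_pop)
  ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(n)
  ci[1] <= mu_pop & mu_pop <= ci[2]
})
mean(covers)  # close to 0.95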
True: a confidence interval with a lower level of confidence is constructed using a smaller critical value \(z^{*}\), which means the confidence interval will be narrower.
False: the margin of error \(= z^{*} SE_{\bar{x}}\), and the standard error \(SE_{\bar{x}} = s_{\bar{x}} / \sqrt{n}\). So in order to decrease the margin of error by a factor of 1/3, we need to increase the sample size n by a factor of 9.
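As a quick numerical check on the last two answers (the specific numbers below simply reuse the height-example values from earlier, purely for illustration):

s <- 9.4; n <- 507
# A lower confidence level means a smaller critical value z*, hence a narrower interval
qnorm(0.975)  # z* for 95% confidence, about 1.96
qnorm(0.95)   # z* for 90% confidence, about 1.64
# The margin of error scales as 1/sqrt(n): to cut it to 1/3, multiply n by 9
qnorm(0.975) * s / sqrt(n)      # original margin of error
qnorm(0.975) * s / sqrt(9 * n)  # one third as large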
True: the margin of error is half the width of the confidence interval, which is 4.4.
(89.11 - 80.31) / 2
## [1] 4.4

\(H_0: \mu_{\text{gifted}} = 32\)
\(H_A: \mu_{\text{gifted}} < 32\)
\(\alpha = 0.10\)
Note that this is a 1-tailed test.
mu <- 32
n <- 36
m <- 30.69
s <- 4.31
(se <- s / sqrt(n))
## [1] 0.7183333
(z <- (m - mu) / se)
## [1] -1.823666
pnorm(z)
## [1] 0.0341013

The 1-tailed p-value of 0.034 indicates that, if the null hypothesis is true, there is only a 3.4% chance that the sample mean would be 30.69 months or less. Since this is less than the significance level of 10%, we reject the null hypothesis in favor of the alternative hypothesis.
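As a sanity check on that p-value, here is a quick simulation sketch (assuming, purely for illustration, an approximately normal population of ages with mean 32 months and standard deviation 4.31 months):

set.seed(1)
sim_means <- replicate(10000, mean(rnorm(36, mean = 32, sd = 4.31)))
mean(sim_means <= 30.69)  # should be close to the p-value of about 0.034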
The critical value is \(z^* = 1.645\) for a two-tailed 90% confidence interval, giving a confidence interval of [29.51, 31.87]. Note that the population mean \(\mu = 32\) falls just barely outside this interval; on the surface, this suggests that the average age at which gifted children first count to 10 differs from 32 at the 90% confidence level, but only marginally.
However, it’s important to keep in mind that the hypothesis test was done as a 1-tailed test, so the 2-tailed confidence interval is not directly comparable. Consider how we would construct a 1-tailed confidence interval about the sample mean \(m = 30.69\), with 10% of the probability in the right tail. Then \(z^* = 1.282\), and the interval would span from 0 (ages cannot be negative) up to \(m + z^* SE_{\bar{x}} = 31.61\). This one-sided interval, [0, 31.61], excludes the population average of 32 by a wider margin than the two-sided interval does, and so lines up more clearly with the result of the 1-tailed test.
(z <- qnorm(0.05))
## [1] -1.644854
qnorm(0.95)
## [1] 1.644854
m + z * se  # lower bound (z = qnorm(0.05) is negative)
## [1] 29.50845
m - z * se  # upper bound
## [1] 31.87155
(z1 <- qnorm(0.9))
## [1] 1.281552
m + z1 * se
## [1] 31.61058

At first they appeared to almost disagree, since the hypothesis test was a 1-tailed test and the usual confidence interval is 2-tailed. However, after adjusting the confidence interval to be 1-tailed, they agree.
\(H_0: \mu_{\text{gifted}} = 100\)
\(H_A: \mu_{\text{gifted}} \neq 100\)
\(\alpha = 0.10\)
mu <- 100
n <- 36
m <- 118.2
s <- 6.5
(se <- s / sqrt(n))
## [1] 1.083333
(z <- (m - mu) / se)
## [1] 16.8
# 2-tailed p-value
pnorm(-z)*2
## [1] 2.44044e-63

For a 90% confidence interval around the sample mean, the critical value is \(z^* = 1.645\), in which case the confidence interval is [116.4, 120.0]. Note that the population average IQ of 100 is dramatically far from the confidence interval.
# 2-tailed critical value for a 90% confidence level (alpha = 0.10)
(zstar <- qnorm(0.95))
## [1] 1.644854
m - zstar * se
## [1] 116.4181
m + zstar * se
## [1] 119.9819

Yes, the results agree: both indicate that, at a 10% significance level, mothers of gifted children have a different (higher) IQ on average than the general population. In fact, this would be true even at a much lower significance level, say 0.01.
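As a quick check of that last claim, the corresponding 99% confidence interval (critical value \(z^* \approx 2.58\)) still excludes 100 by a wide margin:

# 2-tailed critical value for a 99% confidence level
(zstar99 <- qnorm(0.995))
m - zstar99 * se  # roughly 115.4
m + zstar99 * se  # roughly 121.0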
The “sampling distribution” of the mean refers to the distribution that arises when we (a) select an infinite number of independent random samples (each having the same sample size) from the same population, and (b) compute the mean of each sample and plot those means on a histogram.
As the sample size increases, the sampling distribution approaches a normal distribution with mean equal to the population mean and standard deviation equal to the standard error. In other words, as the sample size increases, the sampling distribution becomes more nearly normal in shape and its spread (the standard error) shrinks.
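A small simulation sketch of this behaviour, drawing repeated samples from an arbitrary skewed population (an exponential distribution with mean 1, chosen purely for illustration):

set.seed(2)
se_of_means <- function(n, reps = 5000) {
  sd(replicate(reps, mean(rexp(n, rate = 1))))
}
se_of_means(5)    # close to 1/sqrt(5)   ~ 0.45
se_of_means(50)   # close to 1/sqrt(50)  ~ 0.14
se_of_means(500)  # close to 1/sqrt(500) ~ 0.045

The spread of the sample means shrinks like \(\sigma / \sqrt{n}\), and (not shown here) their histogram looks increasingly normal as \(n\) grows.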
Normal distribution with \(\mu = 9,000\) and \(\sigma = 1,000\)
\(P(x > 10,500) = 6.68\%\)
mu <- 9000
sigma <- 1000
pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)
## [1] 0.0668072
# normalPlot() is provided by the openintro package
normalPlot(mean = mu, sd = sigma, bounds = c(10500, Inf))
Assuming the 15 light bulbs are independently chosen at random from the overall population, which is approximated by a normal distribution, then the sampling distribution should also be approximately normal, centered at \(\mu = 9000\) and with standard deviation \(SE_{\bar{x}} = \sigma / \sqrt{n} = 258.2\). We can see this in the graphic below.
n <- 15
(se <- sigma / sqrt(n))
## [1] 258.1989
# generate samples of size 15
sample_means <- rep(NA, 5000)
for (i in 1:5000) {
samp <- rnorm(n, mean = mu, sd = sigma)
sample_means[i] <- mean(samp)
}
hist(sample_means, breaks = 50, probability = TRUE)
x <- seq(7000, 11000, 0.5)
y <- dnorm(x = x, mean = mu, sd = se)
lines(x = x, y = y, col = "blue")
The Z-score of 10,500 in the sampling distribution is \(Z_{10500} = 5.81\), which corresponds to a p-value of about 3 parts in a billion, i.e., almost 0. This is much smaller than the probability of a single bulb lasting longer than 10,500 hours.
(z <- (10500 - mu) / se)
## [1] 5.809475
pnorm(z, lower.tail = FALSE)
## [1] 3.133452e-09
# alternatively, compute it directly
pnorm(10500, mu, se, lower.tail = FALSE)
## [1] 3.133452e-09

See the graphic below; the blue line graphs the population distribution, while the black line graphs the sampling distribution of the mean.
curve(dnorm(x, mu, se), 7000, 11000)
x <- seq(7000, 11000, 1)
y <- dnorm(x, mu, sigma)
lines(x = x, y = y, col = "blue")
No. If the distribution were skewed, then we couldn’t have used the normal distribution to calculate part (a). And for part (c), the sample size of 15 is \(\ll\) 30, so not large enough to rely on the Central Limit Theorem to provide for a normally distributed sampling distribution.
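To see why, here is an illustrative sketch using a strongly right-skewed lognormal population (not the actual bulb lifetimes): with \(n = 15\), the histogram of the sample means is itself still visibly right-skewed, so the normal approximation used above would not be justified.

set.seed(3)
skewed_means <- replicate(5000, mean(rlnorm(15, meanlog = 9, sdlog = 1)))
hist(skewed_means, breaks = 50, probability = TRUE,
     main = "Sample means (n = 15) from a skewed population")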
The wording in this question is ambiguous, as there seem to be two ways to interpret the question:
First, we hold the sample observations constant (per the question heading “Same observation, different sample size”), which means the sample mean \(\bar{x}\) and sample standard deviation \(s_{\bar{x}}\) remain constant and we just change the sample size \(n\). This corresponds to the case where the sample size was in fact 500 and the observations were made, but in the last step a sample size of 50 was mistakenly used to compute the p-value. In this case, the sample size increases by a factor of 10, and the other quantities change as follows:
\[n = 50 \rightarrow 500 = 10n\] \[SE_{\bar{x}} = s_{\bar{x}} / \sqrt{n} \rightarrow SE_{\bar{x}} / \sqrt{10} = 0.316 SE_{\bar{x}}\] \[Z = (\bar{x} - \mu_0) / SE_{\bar{x}} \rightarrow \sqrt{10} Z = 3.16 Z\]
So we have a smaller standard error (by a factor of 0.316) and a larger Z-score (by a factor of 3.16), which means that the p-value will decrease substantially.
For instance, if we assume a two-tailed hypothesis test, then the p-value of 0.08 corresponds to a Z-score of +1.75 or -1.75. Following the logic above, the Z-score would increase to 5.54 (3.16 * 1.75), in which case the p-value would decrease to about 3 parts in 100 million.
# 2-tailed
(z <- qnorm(0.04, lower.tail = FALSE))
## [1] 1.750686
(z1 <- z * sqrt(10))
## [1] 5.536155
pnorm(-z1) * 2
## [1] 3.091832e-08

On the other hand, the question could be interpreted as follows: the sample of 50 observations was taken, and the sample mean and sample standard deviation were computed correctly for that sample of size 50. In this case, the sampling would have to be redone, this time randomly selecting 500 observations. Now the sample mean and sample standard deviation will change, and we can’t infer in general what the new values will be. However, given that the sample size increases by a factor of 10, we would expect the standard error to decrease, the Z-score to increase, and the p-value to decrease. Of course, this all assumes that the new sample of size 500 has a sample mean and sample standard deviation of the same order of magnitude as the first sample of size 50.
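A rough simulation sketch of this second interpretation (the population values below are invented; only the qualitative behaviour of the p-value matters):

set.seed(4)
mu0 <- 0; true_mu <- 0.25; true_sd <- 1  # hypothetical population
p_value <- function(n) {
  x <- rnorm(n, mean = true_mu, sd = true_sd)
  z <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  2 * pnorm(-abs(z))  # 2-tailed p-value
}
p_value(50)   # a single draw; tends to be modest
p_value(500)  # tends to be far smaller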