if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(bdims)
h <- bdims$hgt
n_h <- length(h)
mean_h <- mean(h); median_h <- median(h)
sd_h <- sd(h); iqr_h <- IQR(h)
se_h <- sd_h / sqrt(n_h)
z_180 <- (180 - mean_h) / sd_h
z_155 <- (155 - mean_h) / sd_h
cat("Sample size (n):", n_h, "\n")
## Sample size (n): 507
cat("Mean:", round(mean_h,3), "cm | Median:", round(median_h,3), "cm\n")
## Mean: 171.144 cm | Median: 170.3 cm
cat("SD:", round(sd_h,3), "cm | IQR:", round(iqr_h,3), "cm | SE:", round(se_h,3), "cm\n\n")
## SD: 9.407 cm | IQR: 14 cm | SE: 0.418 cm
cat("z(180 cm) =", round(z_180,2), "→", ifelse(abs(z_180)>=2,"unusually tall","not unusual"), "\n")
## z(180 cm) = 0.94 → not unusual
cat("z(155 cm) =", round(z_155,2), "→", ifelse(abs(z_155)>=2,"unusually short","not unusual"), "\n")
## z(155 cm) = -1.72 → not unusual
hist(h, breaks = 25, main = "Histogram of Heights (cm)", xlab = "Height (cm)")
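As a quick cross-check on the z-score reasoning above, a minimal sketch (reusing h, mean_h, and sd_h from the chunk above) comparing the empirical share of heights above 180 cm with the normal-model tail probability:
# Empirical proportion above 180 cm vs. normal-model estimate
mean(h > 180)
pnorm(180, mean = mean_h, sd = sd_h, lower.tail = FALSE)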
# Thanksgiving spending example: reconstruct SE and SD from the reported mean and 95% CI
n <- 436
mean_spend <- 84.71
moe <- 89.11 - 84.71 # margin of error from CI
se <- moe / 1.96 # derive standard error (for 95% CI)
sd_spend <- se * sqrt(n)
cat("Sample mean:", mean_spend, "\n")
## Sample mean: 84.71
cat("Margin of Error:", round(moe,2), "\n")
## Margin of Error: 4.4
cat("Standard Error:", round(se,2), "\n")
## Standard Error: 2.24
cat("Implied SD of sample:", round(sd_spend,2), "\n")
## Implied SD of sample: 46.87
Interpretations and Answers: The sample mean Thanksgiving spending is $84.71. Working backward from the reported 95% confidence interval, the margin of error is about $4.40, the standard error about $2.24, and the implied sample standard deviation about $46.87, so the interval is roughly ($80.31, $89.11).
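As a sanity check on that reconstruction, a minimal sketch (reusing mean_spend and se from the chunk above) that rebuilds the full 95% confidence interval from the derived standard error:
# Rebuild the 95% CI: point estimate ± 1.96 × SE
ci_95 <- mean_spend + c(-1, 1) * 1.96 * se
round(ci_95, 2)   # should recover roughly (80.31, 89.11)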
if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(gifted)
x <- gifted$count
n <- length(x)
mean_x <- mean(x)
sd_x <- sd(x)
# (a) Conditions for inference
cond <- "The sample is random (n = 36), independent (<10% of population), and nearly normal since histogram is roughly symmetric → conditions satisfied."
# (b) Hypothesis test: H0: μ = 32, Ha: μ < 32
mu0 <- 32
se <- sd_x / sqrt(n)
t_stat <- (mean_x - mu0) / se
p_val <- pt(t_stat, df = n - 1)
# (c) 90% confidence interval
t_crit <- qt(0.95, df = n - 1)
ci_lower <- mean_x - t_crit * se
ci_upper <- mean_x + t_crit * se
cat("n:", n, "\nMean:", round(mean_x,2), "SD:", round(sd_x,2), "\n")
## n: 36
## Mean: 30.69 SD: 4.31
cat("\n(a)", cond, "\n")
##
## (a) The sample is random (n = 36), independent (<10% of population), and nearly normal since histogram is roughly symmetric → conditions satisfied.
cat("\n(b) H0: mu=32 Ha: mu<32\n")
##
## (b) H0: mu=32 Ha: mu<32
cat(" t =", round(t_stat,3), " p-value =", round(p_val,4), "\n")
## t = -1.815 p-value = 0.039
cat("\n(c) 90% CI: (", round(ci_lower,2), ",", round(ci_upper,2), ")\n")
##
## (c) 90% CI: ( 29.48 , 31.91 )
if (p_val < 0.10) {
cat("\nDecision: Reject H0 → mean is significantly less than 32 months.\n")
} else {
cat("\nDecision: Fail to reject H0 → insufficient evidence that mean < 32 months.\n")
}
##
## Decision: Reject H0 → mean is significantly less than 32 months.
Interpretation:
- (a) Conditions are met (random sample, independent observations, and a roughly normal distribution).
- (b) \(t = (30.69 - 32) / (4.31/\sqrt{36}) \approx -1.82\); \(p \approx 0.039\). Since \(p < 0.10\), we reject \(H_0\): there is evidence that gifted children count to 10 earlier than the general average of 32 months.
- (c) 90% CI = (29.48, 31.91), as shown in the output above.
- (d) The CI lies entirely below 32 (it does not include 32), supporting the same conclusion as the hypothesis test.
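As a cross-check on the by-hand calculations, a short sketch using base R's t.test() (assuming gifted is loaded as above); the first call reproduces the one-sided test in (b), the second the two-sided 90% interval in (c):
# One-sided test of H0: mu = 32 vs Ha: mu < 32
t.test(gifted$count, mu = 32, alternative = "less")
# Two-sided 90% confidence interval for mu
t.test(gifted$count, mu = 32, conf.level = 0.90)$conf.int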
if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(gifted)
x <- gifted$motheriq
x <- x[!is.na(x)]
n <- length(x)
mean_x <- mean(x)
sd_x <- sd(x)
se <- sd_x / sqrt(n)
# (a) Hypothesis test: H0: mu = 100 vs Ha: mu != 100, alpha = 0.10
mu0 <- 100
t_stat <- (mean_x - mu0) / se
p_val <- 2 * (1 - pt(abs(t_stat), df = n - 1))
# (b) 90% CI for mu
t_crit <- qt(0.95, df = n - 1)
ci_lower <- mean_x - t_crit * se
ci_upper <- mean_x + t_crit * se
cat("n:", n, " mean:", round(mean_x,2), " sd:", round(sd_x,2), " se:", round(se,3), "\n")
## n: 36 mean: 118.17 sd: 6.5 se: 1.084
cat("(a) t =", round(t_stat,3), " two-sided p =", signif(p_val,5), "\n")
## (a) t = 16.756 two-sided p = 0
cat("(b) 90% CI: (", round(ci_lower,2), ",", round(ci_upper,2), ")\n")
## (b) 90% CI: ( 116.33 , 120 )
if (p_val < 0.10) {
cat("Decision: Reject H0 at alpha = 0.10.\n")
} else {
cat("Decision: Fail to reject H0 at alpha = 0.10.\n")
}
## Decision: Reject H0 at alpha = 0.10.
Interpretation:
- (a) We test whether the mothers’ average IQ differs from 100: \(H_0: \mu = 100\) vs \(H_a: \mu \neq 100\).
- (b) From the output, \(t \approx 16.76\) on 35 df with a two-sided p-value that is essentially 0; since \(p < 0.10\), we reject \(H_0\).
- (c) The 90% CI (116.33, 120.00) lies entirely above 100, so it does not contain 100, which agrees with the decision to reject \(H_0\).
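The same cross-check applies here: a brief t.test() sketch (reusing the NA-cleaned vector x from the chunk above) that reproduces both the two-sided test of mu = 100 and the 90% interval:
# Two-sided test of H0: mu = 100 with a 90% confidence interval
t.test(x, mu = 100, conf.level = 0.90)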
## Central Limit Theorem (CLT)
Question: Define the term “sampling distribution” of the mean, and describe how the shape, center, and spread of the sampling distribution of the mean change as sample size increases.
Answer:
The sampling distribution of the mean is the distribution of all possible sample means that could be obtained from repeated random samples of the same size (n) from a population.
The shape of the sampling distribution depends on sample size:
If the population is normal, the sampling distribution of the mean is also normal for any sample size.
If the population is not normal, the sampling distribution of the mean becomes approximately normal when the sample size is large (typically n ≥ 30), according to the Central Limit Theorem.
The center of the sampling distribution is the same as the population mean \(\mu\); that is, \(E(\bar{x}) = \mu\).
The spread (variability) of the sampling distribution is given by the standard error: \[ SE = \frac{\sigma}{\sqrt{n}} \] where σ is the population standard deviation and n is the sample size.
As n increases, the denominator \(\sqrt{n}\) increases, causing the standard error to decrease, meaning sample means cluster more tightly around μ.
In summary, larger samples produce sampling distributions that are more normal in shape (by the CLT), keep the same center (\(\mu\)), and have a smaller spread (SE).
Example:
- Suppose the population of daily revenues has mean μ = $1500 and σ = $500.
- For n = 100, \(SE = 500 / \sqrt{100} = 50\).
- The sampling distribution of the mean revenue will be approximately normal with mean $1500 and SD (SE) $50.
- Thus, most sample means will fall close to $1500, with only rare samples much higher or lower.
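A small simulation can make this concrete. This is an illustrative sketch only: it draws repeated samples from a right-skewed gamma population chosen (as an assumption) to have mean $1500 and SD $500, matching the revenue example above, and shows the standard error shrinking as n grows:
# Simulate sampling distributions of the mean from a skewed population
set.seed(123)
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rgamma(n, shape = 9, scale = 1500 / 9)))  # mean 1500, SD 500
}
sd(sim_means(25))    # close to 500 / sqrt(25)  = 100
sd(sim_means(100))   # close to 500 / sqrt(100) = 50
hist(sim_means(100), breaks = 30,
     main = "Simulated sampling distribution of the mean (n = 100)",
     xlab = "Sample mean revenue")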
## CFLBs (Compact Fluorescent Light Bulbs)
# Given information
mu <- 9000 # population mean lifespan (hours)
sigma <- 1000 # population standard deviation (hours)
# (a) Probability that one bulb lasts more than 10,500 hours
x1 <- 10500
z1 <- (x1 - mu) / sigma
p1 <- 1 - pnorm(z1)
cat("(a) P(X > 10,500) =", round(p1,4), "\n")
## (a) P(X > 10,500) = 0.0668
# (b) Sampling distribution for n = 15 bulbs
n <- 15
se <- sigma / sqrt(n)
cat("(b) Sampling distribution of sample mean: N(", mu, ",", round(se,2), ")\n")
## (b) Sampling distribution of sample mean: N( 9000 , 258.2 )
# (c) Probability that sample mean > 10,500 hours
z2 <- (x1 - mu) / se
p2 <- 1 - pnorm(z2)
cat("(c) P( X̄ > 10,500 ) =", signif(p2,5), "\n")
## (c) P( X̄ > 10,500 ) = 3.1335e-09
# (d) (Optional explanation printed)
cat("(d) The population distribution has mean 9000 and SD 1000.\n")
## (d) The population distribution has mean 9000 and SD 1000.
cat(" The sampling distribution (n=15) has mean 9000 and SD", round(se,2), "\n")
## The sampling distribution (n=15) has mean 9000 and SD 258.2
# (e) Comment on skewness
comment <- "Even if the population were skewed, with n=15 the CLT may not hold perfectly; so probabilities are approximate but still reasonable if not extremely skewed."
cat("(e)", comment, "\n")
## (e) Even if the population were skewed, with n=15 the CLT may not hold perfectly; so probabilities are approximate but still reasonable if not extremely skewed.
Interpretations:
- (a) Z = (10,500 − 9,000) / 1,000 = 1.5 → P = 0.0668 → about 6.68% of bulbs last longer than 10,500 hours.
- (b) Sampling distribution of the sample mean: mean = 9,000, SE = 258.2.
- (c) Z = (10,500 − 9,000) / 258.2 ≈ 5.81 → P ≈ 0 → it is extremely rare for the sample mean to exceed 10,500 hours.
- (d) Both distributions are normal with the same center (9,000), but the distribution of the sample mean is much narrower.
- (e) If lifespans were heavily skewed, the CLT might not apply well for n = 15.
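The same probabilities can be verified in one step with pnorm() (reusing mu, sigma, and se from the chunk above):
# Part (a): one bulb;  Part (c): mean of n = 15 bulbs
pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)
pnorm(10500, mean = mu, sd = se, lower.tail = FALSE)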
Question: Suppose you conduct a hypothesis test based on a sample of size n = 50 and arrive at a p-value of 0.08. You then discover that you made a mistake: the sample size should have been n = 500. Will your p-value increase, decrease, or stay the same? Explain.
Answer:
The p-value will decrease.
When the sample size increases, the standard error (SE) becomes smaller, because \[ SE = \frac{\sigma}{\sqrt{n}} \] and \(\sqrt{n}\) is larger.
A smaller SE means that, for the same difference between the sample mean and the hypothesized mean, the test statistic (t or z) becomes larger in magnitude.
This leads to a smaller p-value.
Interpretation:
With n = 50, random variability is higher, so evidence against H₀ is weaker (p = 0.08).
With n = 500, random variability decreases, so the observed difference looks more significant, reducing the p-value.
In general: Larger samples → smaller SE → larger test statistic → smaller p-value (stronger evidence against H₀).
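A quick numerical sketch makes the point, using hypothetical values (an observed difference of 1 and an SD of 4, chosen so that n = 50 gives a p-value near 0.08); only the sample size changes between the two calls:
# Two-sided one-sample t p-value for a fixed observed difference and SD
p_from_n <- function(n, diff = 1, s = 4) {
  t_stat <- diff / (s / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)
}
p_from_n(50)    # about 0.08: weak evidence against H0
p_from_n(500)   # far smaller: the same difference is now highly significant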