if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(bdims)
h <- bdims$hgt
n_h <- length(h)
mean_h <- mean(h); median_h <- median(h)
sd_h <- sd(h); iqr_h <- IQR(h)
se_h <- sd_h / sqrt(n_h)
z_180 <- (180 - mean_h) / sd_h
z_155 <- (155 - mean_h) / sd_h
cat("Sample size (n):", n_h, "\n")
## Sample size (n): 507
cat("Mean:", round(mean_h,3), "cm | Median:", round(median_h,3), "cm\n")
## Mean: 171.144 cm | Median: 170.3 cm
cat("SD:", round(sd_h,3), "cm | IQR:", round(iqr_h,3), "cm | SE:", round(se_h,3), "cm\n\n")
## SD: 9.407 cm | IQR: 14 cm | SE: 0.418 cm
cat("z(180 cm) =", round(z_180,2), "→", ifelse(abs(z_180)>=2,"unusually tall","not unusual"), "\n")
## z(180 cm) = 0.94 → not unusual
cat("z(155 cm) =", round(z_155,2), "→", ifelse(abs(z_155)>=2,"unusually short","not unusual"), "\n")
## z(155 cm) = -1.72 → not unusual
hist(h, breaks = 25, main = "Histogram of Heights (cm)", xlab = "Height (cm)")
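As a quick cross-check on the z-score reasoning above, a minimal sketch (reusing h, mean_h, and sd_h from the chunk above) comparing the empirical share of heights above 180 cm with the normal-model tail probability:
# Empirical proportion above 180 cm vs. normal-model estimate
mean(h > 180)
pnorm(180, mean = mean_h, sd = sd_h, lower.tail = FALSE)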
# Thanksgiving spending example: reconstruct SE and SD from the reported mean and 95% CI
n <- 436
mean_spend <- 84.71
moe <- 89.11 - 84.71 # margin of error from CI
se <- moe / 1.96 # derive standard error (for 95% CI)
sd_spend <- se * sqrt(n)
cat("Sample mean:", mean_spend, "\n")
## Sample mean: 84.71
cat("Margin of Error:", round(moe,2), "\n")
## Margin of Error: 4.4
cat("Standard Error:", round(se,2), "\n")
## Standard Error: 2.24
cat("Implied SD of sample:", round(sd_spend,2), "\n")
## Implied SD of sample: 46.87
Interpretations and Answers: The sample mean Thanksgiving spending is $84.71. Working backward from the reported 95% confidence interval, the margin of error is about $4.40, the standard error about $2.24, and the implied sample standard deviation about $46.87, so the interval is roughly ($80.31, $89.11).
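As a sanity check on that reconstruction, a minimal sketch (reusing mean_spend and se from the chunk above) that rebuilds the full 95% confidence interval from the derived standard error:
# Rebuild the 95% CI: point estimate ± 1.96 × SE
ci_95 <- mean_spend + c(-1, 1) * 1.96 * se
round(ci_95, 2)   # should recover roughly (80.31, 89.11)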
if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(gifted)
x <- gifted$count
n <- length(x)
mean_x <- mean(x)
sd_x <- sd(x)
# (a) Conditions for inference
cond <- "The sample is random (n = 36), independent (<10% of population), and nearly normal since histogram is roughly symmetric → conditions satisfied."
# (b) Hypothesis test: H0: μ = 32, Ha: μ < 32
mu0 <- 32
se <- sd_x / sqrt(n)
t_stat <- (mean_x - mu0) / se
p_val <- pt(t_stat, df = n - 1)
# (c) 90% confidence interval
t_crit <- qt(0.95, df = n - 1)
ci_lower <- mean_x - t_crit * se
ci_upper <- mean_x + t_crit * se
cat("n:", n, "\nMean:", round(mean_x,2), "SD:", round(sd_x,2), "\n")
## n: 36
## Mean: 30.69 SD: 4.31
cat("\n(a)", cond, "\n")
##
## (a) The sample is random (n = 36), independent (<10% of population), and nearly normal since histogram is roughly symmetric → conditions satisfied.
cat("\n(b) H0: mu=32 Ha: mu<32\n")
##
## (b) H0: mu=32 Ha: mu<32
cat(" t =", round(t_stat,3), " p-value =", round(p_val,4), "\n")
## t = -1.815 p-value = 0.039
cat("\n(c) 90% CI: (", round(ci_lower,2), ",", round(ci_upper,2), ")\n")
##
## (c) 90% CI: ( 29.48 , 31.91 )
if (p_val < 0.10) {
cat("\nDecision: Reject H0 → mean is significantly less than 32 months.\n")
} else {
cat("\nDecision: Fail to reject H0 → insufficient evidence that mean < 32 months.\n")
}
##
## Decision: Reject H0 → mean is significantly less than 32 months.
Interpretation:
- (a) Conditions are met (random sample, independent observations, and a roughly normal distribution).
- (b) \(t = (30.69 - 32) / (4.31/\sqrt{36}) \approx -1.82\); \(p \approx 0.039\). Since \(p < 0.10\), we reject \(H_0\): there is evidence that gifted children count to 10 earlier than the general average of 32 months.
- (c) 90% CI = (29.48, 31.91), as shown in the output above.
- (d) The CI lies entirely below 32 (it does not include 32), supporting the same conclusion as the hypothesis test.
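As a cross-check on the by-hand calculations, a short sketch using base R's t.test() (assuming gifted is loaded as above); the first call reproduces the one-sided test in (b), the second the two-sided 90% interval in (c):
# One-sided test of H0: mu = 32 vs Ha: mu < 32
t.test(gifted$count, mu = 32, alternative = "less")
# Two-sided 90% confidence interval for mu
t.test(gifted$count, mu = 32, conf.level = 0.90)$conf.int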
if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(gifted)
x <- gifted$motheriq
x <- x[!is.na(x)]
n <- length(x)
mean_x <- mean(x)
sd_x <- sd(x)
se <- sd_x / sqrt(n)
# (a) Hypothesis test: H0: mu = 100 vs Ha: mu != 100, alpha = 0.10
mu0 <- 100
t_stat <- (mean_x - mu0) / se
p_val <- 2 * (1 - pt(abs(t_stat), df = n - 1))
# (b) 90% CI for mu
t_crit <- qt(0.95, df = n - 1)
ci_lower <- mean_x - t_crit * se
ci_upper <- mean_x + t_crit * se
cat("n:", n, " mean:", round(mean_x,2), " sd:", round(sd_x,2), " se:", round(se,3), "\n")
## n: 36 mean: 118.17 sd: 6.5 se: 1.084
cat("(a) t =", round(t_stat,3), " two-sided p =", signif(p_val,5), "\n")
## (a) t = 16.756 two-sided p = 0
cat("(b) 90% CI: (", round(ci_lower,2), ",", round(ci_upper,2), ")\n")
## (b) 90% CI: ( 116.33 , 120 )
if (p_val < 0.10) {
cat("Decision: Reject H0 at alpha = 0.10.\n")
} else {
cat("Decision: Fail to reject H0 at alpha = 0.10.\n")
}
## Decision: Reject H0 at alpha = 0.10.
Interpretation:
- (a) We test whether the mothers’ average IQ differs from 100: \(H_0: \mu = 100\) vs \(H_a: \mu \neq 100\).
- (b) From the output, \(t \approx 16.76\) on 35 df with a two-sided p-value that is essentially 0; since \(p < 0.10\), we reject \(H_0\).
- (c) The 90% CI (116.33, 120.00) lies entirely above 100, so it does not contain 100, which agrees with the decision to reject \(H_0\).
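The same cross-check applies here: a brief t.test() sketch (reusing the NA-cleaned vector x from the chunk above) that reproduces both the two-sided test of mu = 100 and the 90% interval:
# Two-sided test of H0: mu = 100 with a 90% confidence interval
t.test(x, mu = 100, conf.level = 0.90)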
## Central Limit Theorem (CLT)
Question: Define the term “sampling distribution” of the mean, and describe how the shape, center, and spread of the sampling distribution of the mean change as sample size increases.
Answer:
The sampling distribution of the mean is the distribution of all possible sample means that could be obtained from repeated random samples of the same size (n) from a population.
The shape of the sampling distribution depends on sample size:
If the population is normal, the sampling distribution of the mean is also normal for any sample size.
If the population is not normal, the sampling distribution of the mean becomes approximately normal when the sample size is large (typically n ≥ 30), according to the Central Limit Theorem.
The center of the sampling distribution is the same as the population mean \(\mu\); that is, \(E(\bar{x}) = \mu\).
The spread (variability) of the sampling distribution is given by the standard error: \[ SE = \frac{\sigma}{\sqrt{n}} \] where σ is the population standard deviation and n is the sample size.
As n increases, the denominator \(\sqrt{n}\) increases, causing the standard error to decrease, meaning sample means cluster more tightly around μ.
In summary, larger samples produce sampling distributions that are more normal in shape (by the CLT), keep the same center (\(\mu\)), and have a smaller spread (SE).
Example:
- Suppose the population of daily revenues has mean μ = $1500 and σ = $500.
- For n = 100, \(SE = 500 / \sqrt{100} = 50\).
- The sampling distribution of the mean revenue will be approximately normal with mean $1500 and SD (SE) $50.
- Thus, most sample means will fall close to $1500, with only rare samples much higher or lower.
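A small simulation can make this concrete. This is an illustrative sketch only: it draws repeated samples from a right-skewed gamma population chosen (as an assumption) to have mean $1500 and SD $500, matching the revenue example above, and shows the standard error shrinking as n grows:
# Simulate sampling distributions of the mean from a skewed population
set.seed(123)
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rgamma(n, shape = 9, scale = 1500 / 9)))  # mean 1500, SD 500
}
sd(sim_means(25))    # close to 500 / sqrt(25)  = 100
sd(sim_means(100))   # close to 500 / sqrt(100) = 50
hist(sim_means(100), breaks = 30,
     main = "Simulated sampling distribution of the mean (n = 100)",
     xlab = "Sample mean revenue")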
## CFLBs (Compact Fluorescent Light Bulbs)
# Given information
mu <- 9000 # population mean lifespan (hours)
sigma <- 1000 # population standard deviation (hours)
# (a) Probability that one bulb lasts more than 10,500 hours
x1 <- 10500
z1 <- (x1 - mu) / sigma
p1 <- 1 - pnorm(z1)
cat("(a) P(X > 10,500) =", round(p1,4), "\n")
## (a) P(X > 10,500) = 0.0668
# (b) Sampling distribution for n = 15 bulbs
n <- 15
se <- sigma / sqrt(n)
cat("(b) Sampling distribution of sample mean: N(", mu, ",", round(se,2), ")\n")
## (b) Sampling distribution of sample mean: N( 9000 , 258.2 )
# (c) Probability that sample mean > 10,500 hours
z2 <- (x1 - mu) / se
p2 <- 1 - pnorm(z2)
cat("(c) P( X̄ > 10,500 ) =", signif(p2,5), "\n")
## (c) P( X̄ > 10,500 ) = 3.1335e-09
# (d) (Optional explanation printed)
cat("(d) The population distribution has mean 9000 and SD 1000.\n")
## (d) The population distribution has mean 9000 and SD 1000.
cat(" The sampling distribution (n=15) has mean 9000 and SD", round(se,2), "\n")
## The sampling distribution (n=15) has mean 9000 and SD 258.2
# (e) Comment on skewness
comment <- "Even if the population were skewed, with n=15 the CLT may not hold perfectly; so probabilities are approximate but still reasonable if not extremely skewed."
cat("(e)", comment, "\n")
## (e) Even if the population were skewed, with n=15 the CLT may not hold perfectly; so probabilities are approximate but still reasonable if not extremely skewed.
Interpretations:
- (a) Z = (10,500 − 9,000) / 1,000 = 1.5 → P = 0.0668 → about 6.68% of bulbs last longer than 10,500 hours.
- (b) Sampling distribution of the sample mean: mean = 9,000, SE = 258.2.
- (c) Z = (10,500 − 9,000) / 258.2 ≈ 5.81 → P ≈ 0 → it is extremely rare for the sample mean to exceed 10,500 hours.
- (d) Both distributions are normal with the same center (9,000), but the distribution of the sample mean is much narrower.
- (e) If lifespans were heavily skewed, the CLT might not apply well for n = 15.
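The same probabilities can be verified in one step with pnorm() (reusing mu, sigma, and se from the chunk above):
# Part (a): one bulb;  Part (c): mean of n = 15 bulbs
pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)
pnorm(10500, mean = mu, sd = se, lower.tail = FALSE)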
Question: Suppose you conduct a hypothesis test based on a sample of size n = 50 and arrive at a p-value of 0.08. You then discover that you made a mistake: the sample size should have been n = 500. Will your p-value increase, decrease, or stay the same? Explain.
Answer:
The p-value will decrease.
When the sample size increases, the standard error (SE) becomes smaller, because \[ SE = \frac{\sigma}{\sqrt{n}} \] and \(\sqrt{n}\) is larger.
A smaller SE means that, for the same difference between the sample mean and the hypothesized mean, the test statistic (t or z) becomes larger in magnitude.
This leads to a smaller p-value.
Interpretation:
With n = 50, random variability is higher, so evidence against H₀ is weaker (p = 0.08).
With n = 500, random variability decreases, so the observed difference looks more significant, reducing the p-value.
In general: Larger samples → smaller SE → larger test statistic → smaller p-value (stronger evidence against H₀).
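A quick numerical sketch makes the point, using hypothetical values (an observed difference of 1 and an SD of 4, chosen so that n = 50 gives a p-value near 0.08); only the sample size changes between the two calls:
# Two-sided one-sample t p-value for a fixed observed difference and SD
p_from_n <- function(n, diff = 1, s = 4) {
  t_stat <- diff / (s / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)
}
p_from_n(50)    # about 0.08: weak evidence against H0
p_from_n(500)   # far smaller: the same difference is now highly significant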