Heights of adults (7.7)

if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(bdims)
h <- bdims$hgt

n_h     <- length(h)
mean_h  <- mean(h); median_h <- median(h)
sd_h    <- sd(h);  iqr_h    <- IQR(h)
se_h    <- sd_h / sqrt(n_h)
z_180   <- (180 - mean_h) / sd_h
z_155   <- (155 - mean_h) / sd_h

cat("Sample size (n):", n_h, "\n")
## Sample size (n): 507
cat("Mean:", round(mean_h,3), "cm  |  Median:", round(median_h,3), "cm\n")
## Mean: 171.144 cm  |  Median: 170.3 cm
cat("SD:", round(sd_h,3), "cm  |  IQR:", round(iqr_h,3), "cm  |  SE:", round(se_h,3), "cm\n\n")
## SD: 9.407 cm  |  IQR: 14 cm  |  SE: 0.418 cm
cat("z(180 cm) =", round(z_180,2), "→", ifelse(abs(z_180)>=2,"unusually tall","not unusual"), "\n")
## z(180 cm) = 0.94 → not unusual
cat("z(155 cm) =", round(z_155,2), "→", ifelse(abs(z_155)>=2,"unusually short","not unusual"), "\n")
## z(155 cm) = -1.72 → not unusual
hist(h, breaks = 25, main = "Histogram of Heights (cm)", xlab = "Height (cm)")
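
The z-score labels above can also be reproduced from the printed summary values alone. The following is a minimal sketch, assuming the rounded mean (171.144 cm) and SD (9.407 cm) shown in the output; `label_height()` is a hypothetical helper, not part of `openintro`. Unlike the `ifelse()` calls above, it picks "tall" or "short" from the sign of z.

```r
# Rounded summary values copied from the printed output above (assumption:
# rounding to 3 decimals changes the z-scores only negligibly).
mean_h <- 171.144
sd_h   <- 9.407

# Hypothetical helper: label a height using the |z| >= 2 rule of thumb,
# choosing "tall" or "short" from the sign of the z-score.
label_height <- function(height, mean, sd, cutoff = 2) {
  z <- (height - mean) / sd
  label <- if (z >= cutoff) {
    "unusually tall"
  } else if (z <= -cutoff) {
    "unusually short"
  } else {
    "not unusual"
  }
  list(z = z, label = label)
}

res_180 <- label_height(180, mean_h, sd_h)
res_155 <- label_height(155, mean_h, sd_h)
cat("z(180 cm) =", round(res_180$z, 2), "->", res_180$label, "\n")
cat("z(155 cm) =", round(res_155$z, 2), "->", res_155$label, "\n")
```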

Thanksgiving Spending

# Thanksgiving spending example: work from the reported summary values (no raw data)
n <- 436
mean_spend <- 84.71
moe <- 89.11 - 84.71  # margin of error from CI
se <- moe / 1.96      # derive standard error (for 95% CI)
sd_spend <- se * sqrt(n)

cat("Sample mean:", mean_spend, "\n")
## Sample mean: 84.71
cat("Margin of Error:", round(moe,2), "\n")
## Margin of Error: 4.4
cat("Standard Error:", round(se,2), "\n")
## Standard Error: 2.24
cat("Implied SD of sample:", round(sd_spend,2), "\n")
## Implied SD of sample: 46.87
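
As a quick consistency check, the interval can be rebuilt from the mean and the margin of error. This is a sketch under the same assumption the code above makes, namely that the reported 95% interval used the critical value 1.96. The t-based line is included only to show how little the implied SD would change if a t critical value with n − 1 degrees of freedom had been used instead.

```r
n          <- 436
mean_spend <- 84.71
upper      <- 89.11                  # upper bound of the reported 95% CI
moe        <- upper - mean_spend     # margin of error
lower      <- mean_spend - moe       # implied lower bound

# Implied SD under the z critical value used above and under a t critical value
z_crit <- 1.96
t_crit <- qt(0.975, df = n - 1)
sd_z   <- (moe / z_crit) * sqrt(n)
sd_t   <- (moe / t_crit) * sqrt(n)

cat("Implied 95% CI: (", round(lower, 2), ",", round(upper, 2), ")\n")
cat("Implied SD (z):", round(sd_z, 2), " Implied SD (t):", round(sd_t, 2), "\n")
```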

Interpretations and Answers:

Gifted Children, Part I

if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(gifted)
x <- gifted$count

n <- length(x)
mean_x <- mean(x)
sd_x <- sd(x)

# (a) Conditions for inference
cond <- "The sample is random (n = 36), independent (<10% of population), and nearly normal since histogram is roughly symmetric → conditions satisfied."

# (b) Hypothesis test: H0: μ = 32, Ha: μ < 32
mu0 <- 32
se <- sd_x / sqrt(n)
t_stat <- (mean_x - mu0) / se
p_val <- pt(t_stat, df = n - 1)

# (c) 90% confidence interval
t_crit <- qt(0.95, df = n - 1)
ci_lower <- mean_x - t_crit * se
ci_upper <- mean_x + t_crit * se

cat("n:", n, "\nMean:", round(mean_x,2), "SD:", round(sd_x,2), "\n")
## n: 36 
## Mean: 30.69 SD: 4.31
cat("\n(a)", cond, "\n")
## 
## (a) The sample is random (n = 36), independent (<10% of population), and nearly normal since histogram is roughly symmetric → conditions satisfied.
cat("\n(b) H0: mu=32  Ha: mu<32\n")
## 
## (b) H0: mu=32  Ha: mu<32
cat("   t =", round(t_stat,3), "  p-value =", round(p_val,4), "\n")
##    t = -1.815   p-value = 0.039
cat("\n(c) 90% CI: (", round(ci_lower,2), ",", round(ci_upper,2), ")\n")
## 
## (c) 90% CI: ( 29.48 , 31.91 )
if (p_val < 0.10) {
  cat("\nDecision: Reject H0 → mean is significantly less than 32 months.\n")
} else {
  cat("\nDecision: Fail to reject H0 → insufficient evidence that mean < 32 months.\n")
}
## 
## Decision: Reject H0 → mean is significantly less than 32 months.

Interpretation:
- (a) Conditions are met (random sample, independent observations, and a roughly symmetric, nearly normal distribution).
- (b) \(t = (30.69 - 32) / (4.31/\sqrt{36}) ≈ -1.82\) with \(df = 35\); the code, working from unrounded values, prints \(t = -1.815\) and \(p ≈ 0.039\). Since \(p < 0.10\), we reject \(H_0\). There is evidence that gifted children count to 10 earlier, on average, than the general average of 32 months.
- (c) 90% CI = (29.48, 31.91) months.
- (d) The CI lies entirely below 32, which agrees with the conclusion of the hypothesis test.
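
Parts (b) and (c) can be checked without the raw data. The sketch below assumes only the rounded summary values printed above (n = 36, mean 30.69, SD 4.31); `t_from_summary()` is a hypothetical helper written for this illustration. Because its inputs are rounded, its t statistic and p-value differ slightly from the values computed on the raw data.

```r
# Hypothetical helper: one-sample, lower-tailed t test and two-sided CI
# computed from summary statistics only.
t_from_summary <- function(mean_x, sd_x, n, mu0, conf_level = 0.90) {
  se     <- sd_x / sqrt(n)
  df     <- n - 1
  t_stat <- (mean_x - mu0) / se
  p_val  <- pt(t_stat, df = df)                    # Ha: mu < mu0
  t_crit <- qt(1 - (1 - conf_level) / 2, df = df)  # two-sided critical value
  list(t = t_stat, df = df, p = p_val,
       ci = c(mean_x - t_crit * se, mean_x + t_crit * se))
}

res <- t_from_summary(mean_x = 30.69, sd_x = 4.31, n = 36, mu0 = 32)
cat("t =", round(res$t, 3), " df =", res$df, " p =", round(res$p, 4), "\n")
cat("90% CI: (", round(res$ci[1], 2), ",", round(res$ci[2], 2), ")\n")
cat("CI contains 32?", res$ci[1] <= 32 && 32 <= res$ci[2], "\n")
```

A 90% two-sided interval excludes 32 exactly when the one-sided p-value is below 0.05, so the interval check and a one-sided test at α = 0.10 need not always agree. Here both point the same way.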

Gifted Children, Part II (Mother’s IQ)

if (!requireNamespace("openintro", quietly = TRUE)) install.packages("openintro", quiet = TRUE)
library(openintro)
data(gifted)
x <- gifted$motheriq
x <- x[!is.na(x)]

n <- length(x)
mean_x <- mean(x)
sd_x <- sd(x)
se <- sd_x / sqrt(n)

# (a) Hypothesis test: H0: mu = 100 vs Ha: mu != 100, alpha = 0.10
mu0 <- 100
t_stat <- (mean_x - mu0) / se
p_val <- 2 * (1 - pt(abs(t_stat), df = n - 1))

# (b) 90% CI for mu
t_crit <- qt(0.95, df = n - 1)
ci_lower <- mean_x - t_crit * se
ci_upper <- mean_x + t_crit * se

cat("n:", n, "  mean:", round(mean_x,2), "  sd:", round(sd_x,2), "  se:", round(se,3), "\n")
## n: 36   mean: 118.17   sd: 6.5   se: 1.084
cat("(a) t =", round(t_stat,3), "  two-sided p =", signif(p_val,5), "\n")
## (a) t = 16.756   two-sided p = 0
cat("(b) 90% CI: (", round(ci_lower,2), ",", round(ci_upper,2), ")\n")
## (b) 90% CI: ( 116.33 , 120 )
if (p_val < 0.10) {
  cat("Decision: Reject H0 at alpha = 0.10.\n")
} else {
  cat("Decision: Fail to reject H0 at alpha = 0.10.\n")
}
## Decision: Reject H0 at alpha = 0.10.

Interpretation:
- (a) We test whether the mothers’ average IQ differs from 100: \(H_0: \mu = 100\) vs. \(H_a: \mu \ne 100\), with \(\alpha = 0.10\).
- (b) \(t = 16.756\) with \(df = 35\). The two-sided p-value prints as 0, meaning it is far too small to distinguish from zero in this calculation. Since \(p < 0.10\), we reject \(H_0\).
- (c) The 90% CI is (116.33, 120.00). Because 100 lies far below this interval, the CI agrees with rejecting \(H_0\).
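
The two-sided p-value above prints as 0 because `1 - pt(...)` subtracts two numbers that are nearly equal. The following is a minimal sketch of an alternative, assuming the printed t statistic (16.756) and df = 35: evaluating the lower tail at −|t| avoids that subtraction and should return a small but nonzero value. Both forms use only the base R `pt()` function.

```r
t_stat <- 16.756   # printed t statistic from the output above
df     <- 35       # n - 1 with n = 36

# Form used above: can lose all precision when pt() is very close to 1
p_subtract <- 2 * (1 - pt(abs(t_stat), df = df))

# Alternative: compute the tail area directly at -|t|
p_tail <- 2 * pt(-abs(t_stat), df = df)

cat("1 - pt() form: ", format(p_subtract, digits = 5), "\n")
cat("pt(-|t|) form: ", format(p_tail, digits = 5), "\n")
```

Either way the decision at α = 0.10 is the same; the second form simply reports how small the p-value is.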

## Central Limit Theorem (CLT)

Question: Define the term “sampling distribution” of the mean, and describe how the shape, center, and spread of the sampling distribution of the mean change as sample size increases.

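Answer: The sampling distribution of the mean is the distribution of the sample means \(\bar{x}\) obtained from all possible random samples of a fixed size n drawn from the same population. As n increases, its shape becomes closer to normal, its center stays at the population mean μ, and its spread (the standard error, \(\sigma/\sqrt{n}\)) decreases.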

Example:
- Suppose the population of daily revenues has mean μ = $1500 and σ = $500.
- For n = 100, \(SE = 500 / \sqrt{100} = 50\).
- The sampling distribution of the mean revenue will be approximately normal with mean $1500 and SD (SE) $50.
- Thus, most sample means will fall close to $1500, with only rare samples much higher or lower.
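
The effect of sample size can also be seen by simulation. The sketch below is illustrative only: it assumes a right-skewed gamma population chosen so that its mean is 1500 and its SD is 500 (the example does not specify a shape), and the seed, number of replications, and sample sizes are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

mu    <- 1500
sigma <- 500
shape <- (mu / sigma)^2          # gamma shape = 9
scale <- sigma^2 / mu            # gamma scale, so that mean = mu and SD = sigma

# Draw `reps` samples of size n and return their means
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rgamma(n, shape = shape, scale = scale)))
}

for (n in c(10, 100, 1000)) {
  means <- sim_means(n)
  cat("n =", n,
      " mean of sample means:", round(mean(means), 1),
      " SD of sample means:", round(sd(means), 1),
      " sigma/sqrt(n):", round(sigma / sqrt(n), 1), "\n")
}
```

In each row the simulated SD of the sample means should land close to \(\sigma/\sqrt{n}\), while the mean of the sample means stays near 1500.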

## CFLBs (Compact Fluorescent Light Bulbs)

# Given information
mu <- 9000          # population mean lifespan (hours)
sigma <- 1000       # population standard deviation (hours)

# (a) Probability that one bulb lasts more than 10,500 hours
x1 <- 10500
z1 <- (x1 - mu) / sigma
p1 <- 1 - pnorm(z1)
cat("(a) P(X > 10,500) =", round(p1,4), "\n")
## (a) P(X > 10,500) = 0.0668
# (b) Sampling distribution for n = 15 bulbs
n <- 15
se <- sigma / sqrt(n)
cat("(b) Sampling distribution of sample mean: N(", mu, ",", round(se,2), ")\n")
## (b) Sampling distribution of sample mean: N( 9000 , 258.2 )
# (c) Probability that sample mean > 10,500 hours
z2 <- (x1 - mu) / se
p2 <- 1 - pnorm(z2)
cat("(c) P( X̄ > 10,500 ) =", signif(p2,5), "\n")
## (c) P( X̄ > 10,500 ) = 3.1335e-09
# (d) (Optional explanation printed)
cat("(d) The population distribution has mean 9000 and SD 1000.\n")
## (d) The population distribution has mean 9000 and SD 1000.
cat("    The sampling distribution (n=15) has mean 9000 and SD", round(se,2), "\n")
##     The sampling distribution (n=15) has mean 9000 and SD 258.2
# (e) Comment on skewness
comment <- "If the population were skewed, the normal model in (a) would not apply, and with n=15 the CLT approximation in (c) could be poor; those probabilities would then be rough approximations at best."
cat("(e)", comment, "\n")
## (e) If the population were skewed, the normal model in (a) would not apply, and with n=15 the CLT approximation in (c) could be poor; those probabilities would then be rough approximations at best.

Interpretations:
- (a) Z = (10,500 − 9,000) / 1,000 = 1.5 → P = 0.0668 → about 6.68% of bulbs last longer than 10,500 hours.
- (b) Sampling distribution: mean = 9,000, SE = 258.2.
- (c) Z = (10,500 − 9,000) / 258.2 = 5.81 → P ≈ 0.0000 (about 3.1 × 10⁻⁹) → a sample mean above 10,500 is extremely rare.
- (d) Both distributions are normal with the same center (9,000), but the distribution of the sample mean is narrower.
- (e) If lifespans were heavily skewed, the CLT might not apply well for n = 15.
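
The same probabilities can be obtained without standardizing by hand, by passing the mean and SD to `pnorm()` and requesting the upper tail. This is a sketch assuming, as the code above does, that lifespans are normal with mean 9,000 and SD 1,000 hours; `lower.tail = FALSE` also avoids the `1 - pnorm()` subtraction for the very small probability in (c).

```r
mu    <- 9000
sigma <- 1000
n     <- 15

# (a) One bulb lasting more than 10,500 hours
p_single <- pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)

# (c) Mean of 15 bulbs exceeding 10,500 hours: same center, SD = sigma / sqrt(n)
p_mean <- pnorm(10500, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

cat("(a) P(X > 10,500)    =", round(p_single, 4), "\n")
cat("(c) P(Xbar > 10,500) =", signif(p_mean, 3), "\n")
```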

Same Observation, Different Sample Size

Question: Suppose you conduct a hypothesis test based on a sample where the sample size is n = 50, and arrive at a p-value of 0.08. You then discover that you made a mistake — the sample size should have been n = 500.
Will your p-value increase, decrease, or stay the same? Explain.

Answer: