This week I’m using hypothesis testing to answer specific questions about the Ames housing market. Rather than just describing patterns, I’m testing whether observed differences between groups are statistically significant or could have occurred by chance. I’ll use two different testing frameworks: Neyman-Pearson (with power analysis) for Hypothesis 1, and Fisher’s Significance Testing (p-value interpretation) for Hypothesis 2.
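library(tidyverse)  # dplyr verbs (filter, pull, mutate) and ggplot2
library(scales)     # dollar(), dollar_format()
ames <- read.csv("ames.csv", stringsAsFactors = FALSE)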
cat("Dataset:", nrow(ames), "homes\n")
## Dataset: 2930 homes
Does having central air conditioning increase the sale price of homes in Ames? This matters for homeowners considering upgrades and buyers deciding what features to prioritize.
H₀ (Null): μ_with_AC = μ_without_AC
(The mean sale price of homes with central air equals the mean sale
price of homes without central air)
Hₐ (Alternative): μ_with_AC ≠ μ_without_AC
(The mean sale prices are different)
This is a two-sided test because I want to detect if central air increases OR decreases price (though I expect it increases).
# Define groups
group_a <- ames |> filter(Central.Air == "Y") |> pull(SalePrice)
group_b <- ames |> filter(Central.Air == "N") |> pull(SalePrice)
cat("=== GROUP SIZES ===\n")
## === GROUP SIZES ===
cat("With Central Air (Group A):", length(group_a), "homes\n")
## With Central Air (Group A): 2734 homes
cat("Without Central Air (Group B):", length(group_b), "homes\n")
## Without Central Air (Group B): 196 homes
cat("\n=== DESCRIPTIVE STATISTICS ===\n")
##
## === DESCRIPTIVE STATISTICS ===
cat("Group A mean:", dollar(mean(group_a)), "\n")
## Group A mean: $186,453
cat("Group A SD:", dollar(sd(group_a)), "\n")
## Group A SD: $79,121.36
cat("Group B mean:", dollar(mean(group_b)), "\n")
## Group B mean: $101,890
cat("Group B SD:", dollar(sd(group_b)), "\n")
## Group B SD: $37,597.02
cat("Observed difference:", dollar(mean(group_a) - mean(group_b)), "\n")
## Observed difference: $84,562.31
Choosing Alpha Level (α = 0.05):
I’m using α = 0.05 (5% significance level) because:
- This is the standard in social science research.
- A Type I Error (False Positive) would mean telling homeowners that central air adds value when it doesn’t. This could lead to wasteful renovation spending, which is moderately serious but not catastrophic.
- α = 0.05 balances being too conservative (missing real effects) with being too liberal (finding false effects).
Choosing Power Level (1 - β = 0.80):
I’m targeting 80% power because:
- This means an 80% chance of detecting a real effect of at least the target size if it exists.
- β = 0.20 (20% Type II Error rate) means a 20% chance of a False Negative.
- A Type II Error (False Negative) would mean missing a real central air effect. This is less serious than a False Positive: homeowners might forgo a valuable upgrade, but they won’t waste money on a useless one.
- 80% power is standard practice and represents a good balance.
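To make the two error rates concrete, here is a small simulation sketch. It uses synthetic normal data rather than the Ames data, and it assumes 64 observations per group (the size the 16/d² rule gives for d = 0.5). The helper name `rejection_rate` is hypothetical.

```r
# Simulation sketch with synthetic data (not the Ames data).
set.seed(123)

# Share of simulated two-sample Welch t-tests that reject H0 at the given alpha.
# true_d is the true standardized mean difference between the two groups.
rejection_rate <- function(true_d, n_per_group = 64, alpha = 0.05, n_sims = 2000) {
  rejections <- replicate(n_sims, {
    x <- rnorm(n_per_group, mean = true_d, sd = 1)
    y <- rnorm(n_per_group, mean = 0, sd = 1)
    t.test(x, y)$p.value < alpha
  })
  mean(rejections)
}

type1_rate <- rejection_rate(true_d = 0)    # H0 true: every rejection is a false positive
power_est  <- rejection_rate(true_d = 0.5)  # H0 false: every rejection is a correct detection

cat("Estimated Type I error rate:", type1_rate, "\n")
cat("Estimated power at d = 0.5:", power_est, "\n")
```

The first rate should land near α and the second near the 80% target, up to simulation noise.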
Choosing Minimum Effect Size:
What’s the smallest price difference that actually matters practically?
# Calculate effect size (Cohen's d)
pooled_sd <- sqrt(((length(group_a) - 1) * sd(group_a)^2 +
                   (length(group_b) - 1) * sd(group_b)^2) /
                  (length(group_a) + length(group_b) - 2))
observed_cohens_d <- (mean(group_a) - mean(group_b)) / pooled_sd
cat("=== EFFECT SIZE ===\n")
## === EFFECT SIZE ===
cat("Pooled standard deviation:", dollar(pooled_sd), "\n")
## Pooled standard deviation: $77,054.60
cat("Observed Cohen's d:", round(observed_cohens_d, 3), "\n")
## Observed Cohen's d: 1.097
cat("Interpretation:",
    if (abs(observed_cohens_d) > 0.8) "Large effect" else
    if (abs(observed_cohens_d) > 0.5) "Medium effect" else "Small effect", "\n")
## Interpretation: Large effect
The observed Cohen’s d ≈ 1.10 represents a large effect size. Central air is associated with a price difference of about 1.1 pooled standard deviations, which is a substantial practical difference.
For the minimum detectable effect, I’ll use d = 0.50 (medium effect) because:
- Anything smaller than a medium effect (0.5 × the pooled SD of $77,055, or roughly $38,500) might not justify the installation cost of central air (~$3,000-7,000)
- A medium effect represents a meaningful return on investment for homeowners
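The dollar figure attached to a medium effect comes from scaling d by the pooled standard deviation. A minimal sketch of that conversion, using the pooled SD reported in the effect-size output (the variable `min_dollar_difference` is introduced here only for illustration):

```r
# Values copied from the report output above
pooled_sd <- 77054.60   # pooled standard deviation of SalePrice
target_d  <- 0.50       # minimum effect size of interest (Cohen's d)

# Cohen's d is a mean difference expressed in units of the pooled SD,
# so multiplying by the pooled SD converts it back to dollars.
min_dollar_difference <- target_d * pooled_sd

cat("Minimum price difference of interest:",
    format(round(min_dollar_difference), big.mark = ","), "dollars\n")
```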
# Sample size calculation for two-sample t-test
# Formula: n ≈ 16 / d² for α = 0.05, power = 0.80, two-sided test
target_d <- 0.50 # medium effect
alpha <- 0.05
power <- 0.80
# Approximate sample size per group
n_needed_per_group <- ceiling(16 / (target_d^2))
cat("=== SAMPLE SIZE REQUIREMENT ===\n")
## === SAMPLE SIZE REQUIREMENT ===
cat("Target effect size (Cohen's d):", target_d, "\n")
## Target effect size (Cohen's d): 0.5
cat("Alpha level:", alpha, "\n")
## Alpha level: 0.05
cat("Target power:", power, "\n")
## Target power: 0.8
cat("Sample size needed per group:", n_needed_per_group, "\n\n")
## Sample size needed per group: 64
cat("=== DO WE HAVE ENOUGH DATA? ===\n")
## === DO WE HAVE ENOUGH DATA? ===
cat("Group A (with AC):", length(group_a), "homes\n")
## Group A (with AC): 2734 homes
cat("Group B (without AC):", length(group_b), "homes\n")
## Group B (without AC): 196 homes
cat("Minimum of both groups:", min(length(group_a), length(group_b)), "\n")
## Minimum of both groups: 196
cat("Required:", n_needed_per_group, "\n")
## Required: 64
if (min(length(group_a), length(group_b)) >= n_needed_per_group) {
  cat("\n✓ YES - We have sufficient data to proceed with the test\n")
  cat("Our smallest group (", min(length(group_a), length(group_b)),
      ") exceeds the requirement (", n_needed_per_group, ")\n")
} else {
  cat("\n✗ NO - Insufficient data\n")
}
##
## ✓ YES - We have sufficient data to proceed with the test
## Our smallest group ( 196 ) exceeds the requirement ( 64 )
Conclusion on Sample Size: We have more than enough data. We need only 64 homes per group to detect a medium effect with 80% power, but we have 2,734 homes with central air and 196 without. Even the smaller group is about three times the requirement, so the test is highly powered and could detect effects smaller than d = 0.5 if they exist.
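The 16/d² rule is an approximation, so it is worth cross-checking against base R’s `power.t.test()`. The sketch below also asks how small an effect would be detectable if both groups had only 196 homes. That equal-size setup is an assumption made here as a conservative stand-in for the real 2,734 vs. 196 split, since `power.t.test()` assumes equal group sizes.

```r
# Exact per-group sample size for d = 0.5 (sd = 1 puts delta on the Cohen's d scale)
exact <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                      type = "two.sample", alternative = "two.sided")
n_exact <- ceiling(exact$n)

# Smallest effect detectable with 80% power if both groups had 196 homes
# (conservative: the real design has far more homes in the larger group)
mde <- power.t.test(n = 196, sd = 1, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")
d_detectable <- mde$delta

cat("Exact n per group for d = 0.5:", n_exact, "\n")
cat("Detectable d with n = 196 per group:", round(d_detectable, 2), "\n")
```

The exact calculation should agree with the rule-of-thumb requirement of 64 homes per group, and the detectable effect at n = 196 falls well below the medium-effect threshold.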
# Two-sample t-test
test_result <- t.test(group_a, group_b,
                      alternative = "two.sided",
                      var.equal = FALSE, # Welch's t-test (safer assumption)
                      conf.level = 0.95)
cat("=== HYPOTHESIS TEST RESULTS ===\n\n")
## === HYPOTHESIS TEST RESULTS ===
cat("Test statistic (t):", round(test_result$statistic, 3), "\n")
## Test statistic (t): 27.433
cat("Degrees of freedom:", round(test_result$parameter, 1), "\n")
## Degrees of freedom: 336.1
cat("P-value:", format(test_result$p.value, scientific = TRUE, digits = 3), "\n")
## P-value: 8.78e-88
cat("95% Confidence Interval for difference:",
dollar(test_result$conf.int[1]), "to", dollar(test_result$conf.int[2]), "\n")
## 95% Confidence Interval for difference: $78,498.92 to $90,625.69
cat("=== DECISION ===\n\n")
## === DECISION ===
if (test_result$p.value < alpha) {
  cat("REJECT the null hypothesis (p < 0.05)\n\n")
  cat("CONCLUSION:\n")
  cat("There is statistically significant evidence that homes with central air\n")
  cat("sell for different prices than homes without central air.\n\n")
  cat("The observed difference of", dollar(mean(group_a) - mean(group_b)),
      "is extremely unlikely\n")
  cat("to have occurred by chance if the true difference were zero.\n")
} else {
  cat("FAIL TO REJECT the null hypothesis (p >= 0.05)\n\n")
  cat("CONCLUSION:\n")
  cat("Insufficient evidence to conclude that central air affects sale price.\n")
}
## REJECT the null hypothesis (p < 0.05)
##
## CONCLUSION:
## There is statistically significant evidence that homes with central air
## sell for different prices than homes without central air.
##
## The observed difference of $84,562.31 is extremely unlikely
## to have occurred by chance if the true difference were zero.
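To see where the test statistic, degrees of freedom, and confidence interval come from, the Welch calculation can be sketched by hand from the group summaries printed earlier. The inputs are the rounded values from the report, so the results should match `t.test()` only up to rounding.

```r
# Summary statistics copied from the output above
n_a <- 2734; sd_a <- 79121.36   # with central air
n_b <- 196;  sd_b <- 37597.02   # without central air
mean_diff <- 84562.31           # mean(group_a) - mean(group_b)

# Welch standard error: each group contributes its own variance / n
var_a <- sd_a^2 / n_a
var_b <- sd_b^2 / n_b
se_diff <- sqrt(var_a + var_b)

t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (var_a + var_b)^2 /
  (var_a^2 / (n_a - 1) + var_b^2 / (n_b - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)
ci <- mean_diff + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

cat("t =", round(t_stat, 3), " df =", round(df_welch, 1), "\n")
cat("95% CI:", round(ci[1]), "to", round(ci[2]), "\n")
```

Note that the smaller group dominates the standard error: its variance term is divided by only 196, which is why the degrees of freedom sit far below the total sample size.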
What This Means Practically:
Homes with central air in Ames sell for an average of $84,562 more than homes without central air (95% CI: $78,499 to $90,626). This is a huge difference, far larger than the cost to install central air.
However, correlation doesn’t prove causation. This doesn’t necessarily mean adding central air will increase your home’s value by $84,000. It’s possible that:
- Newer, larger, or higher-quality homes are more likely to have central air
- Central air is just one of many modern features that collectively add value
- The homes without central air might be older, smaller, or in less desirable areas
To truly assess central air’s causal effect, I’d need to compare similar homes that differ only in central air presence, which would require controlling for other variables.
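As a rough illustration of why controlling for other variables matters, here is a sketch on synthetic data, not the Ames data. All variable names and numbers are made up for the example: newer homes are simulated to be both pricier and more likely to have central air, and the true central air effect is set to $10,000.

```r
# Synthetic illustration of confounding (hypothetical numbers, not Ames data)
set.seed(42)
n <- 2000
year_built <- round(runif(n, 1900, 2010))

# Newer homes are more likely to have central air
has_ac <- rbinom(n, 1, plogis((year_built - 1940) / 15))

# Price depends on age AND on central air; the true AC effect is $10,000
true_ac_effect <- 10000
price <- 50000 + 1200 * (year_built - 1900) +
  true_ac_effect * has_ac + rnorm(n, sd = 30000)
sim <- data.frame(price, has_ac, year_built)

# Naive comparison of group means, as in a two-sample t-test
naive_diff <- mean(sim$price[sim$has_ac == 1]) - mean(sim$price[sim$has_ac == 0])

# Regression that holds year built fixed
adjusted_diff <- unname(coef(lm(price ~ has_ac + year_built, data = sim))["has_ac"])

cat("Naive difference in means:", round(naive_diff), "\n")
cat("Difference adjusted for year built:", round(adjusted_diff), "\n")
```

In this setup the naive gap absorbs the age effect and overstates the true $10,000, while the adjusted coefficient should land much closer to it. A similar regression on the real data would need the relevant Ames columns and a judgment about which confounders to include.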
# Create dataframe for visualization
viz_data <- data.frame(
  Group = c(rep("With Central Air", length(group_a)),
            rep("Without Central Air", length(group_b))),
  Price = c(group_a, group_b)
)
# Boxplot with statistical annotation
ggplot(viz_data, aes(x = Group, y = Price, fill = Group)) +
  geom_boxplot(outlier.alpha = 0.3) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4,
               fill = "white", color = "black") +
  scale_y_continuous(labels = dollar_format()) +
  scale_fill_manual(values = c("With Central Air" = "#3498db",
                               "Without Central Air" = "#e74c3c")) +
  labs(title = "Sale Price: Homes With vs. Without Central Air",
       subtitle = paste0("Difference = $84,562 | p < 0.001 | Reject H₀\n",
                         "Diamond = mean, Box = median and quartiles"),
       x = "",
       y = "Sale Price") +
  theme_minimal() +
  theme(legend.position = "none") +
  annotate("text", x = 1.5, y = max(viz_data$Price) * 0.95,
           label = paste0("p < 0.001***\nt = ", round(test_result$statistic, 2)),
           size = 5, fontface = "bold")
Visual Insights: The boxplot dramatically shows that homes without central air (red) have a much lower median and mean price than homes with central air (blue). The distributions barely overlap, which explains the tiny p-value. The white diamonds show the means, and you can see the massive gap between them.
Do recently built homes (constructed since 2000) sell for higher prices than older homes? This helps buyers understand whether they should prioritize newer construction.
# Create binary indicator
ames <- ames |>
mutate(Is_Recent = Year.Built >= 2000)
# Define groups
recent_homes <- ames |> filter(Is_Recent == TRUE) |> pull(SalePrice)
older_homes <- ames |> filter(Is_Recent == FALSE) |> pull(SalePrice)
cat("=== GROUP DEFINITION ===\n")
## === GROUP DEFINITION ===
cat("Recent homes (2000+):", length(recent_homes), "homes\n")
## Recent homes (2000+): 783 homes
cat("Older homes (<2000):", length(older_homes), "homes\n")
## Older homes (<2000): 2147 homes
H₀ (Null): μ_recent = μ_older
(Mean sale price is the same for recent and older homes)
Hₐ (Alternative): μ_recent > μ_older
(Recent homes sell for more than older homes)
This is a one-sided test because my research question specifically asks if recent homes sell for more, not just differently.
cat("=== DESCRIPTIVE STATISTICS ===\n")
## === DESCRIPTIVE STATISTICS ===
cat("Recent homes:\n")
## Recent homes:
cat(" Mean:", dollar(mean(recent_homes)), "\n")
## Mean: $248,521
cat(" Median:", dollar(median(recent_homes)), "\n")
## Median: $226,000
cat(" SD:", dollar(sd(recent_homes)), "\n\n")
## SD: $86,709.92
cat("Older homes:\n")
## Older homes:
cat(" Mean:", dollar(mean(older_homes)), "\n")
## Mean: $156,097
cat(" Median:", dollar(median(older_homes)), "\n")
## Median: $144,000
cat(" SD:", dollar(sd(older_homes)), "\n\n")
## SD: $60,718.91
cat("Observed difference (Recent - Older):",
dollar(mean(recent_homes) - mean(older_homes)), "\n")
## Observed difference (Recent - Older): $92,424.14
# One-sided t-test
test_result_h2 <- t.test(recent_homes, older_homes,
                         alternative = "greater", # one-sided
                         var.equal = FALSE,
                         conf.level = 0.95)
cat("=== HYPOTHESIS TEST RESULTS ===\n\n")
## === HYPOTHESIS TEST RESULTS ===
cat("Test: Welch's two-sample t-test (one-sided)\n")
## Test: Welch's two-sample t-test (one-sided)
cat("Test statistic (t):", round(test_result_h2$statistic, 3), "\n")
## Test statistic (t): 27.471
cat("Degrees of freedom:", round(test_result_h2$parameter, 1), "\n")
## Degrees of freedom: 1074.2
cat("P-value:", format(test_result_h2$p.value, scientific = TRUE, digits = 3), "\n")
## P-value: 1.44e-126
p_val <- test_result_h2$p.value
cat("=== P-VALUE INTERPRETATION ===\n\n")
## === P-VALUE INTERPRETATION ===
cat("P-value:", format(p_val, scientific = TRUE, digits = 3), "\n\n")
## P-value: 1.44e-126
cat("What this means:\n")
## What this means:
cat("If the null hypothesis were true (no difference between recent and older homes),\n")
## If the null hypothesis were true (no difference between recent and older homes),
cat("the probability of observing a difference as large as",
dollar(mean(recent_homes) - mean(older_homes)),
"\nor larger, purely by chance, is essentially ZERO (p < 0.001).\n\n")
## the probability of observing a difference as large as $92,424.14
## or larger, purely by chance, is essentially ZERO (p < 0.001).
if (p_val < 0.001) {
  cat("RECOMMENDATION: STRONG EVIDENCE against the null hypothesis.\n")
  cat("Recent homes sell for significantly more than older homes.\n")
} else if (p_val < 0.05) {
  cat("RECOMMENDATION: MODERATE EVIDENCE against the null hypothesis.\n")
} else {
  cat("RECOMMENDATION: INSUFFICIENT EVIDENCE to reject the null hypothesis.\n")
}
## RECOMMENDATION: STRONG EVIDENCE against the null hypothesis.
## Recent homes sell for significantly more than older homes.
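The one-sided p-value is just the upper-tail area of the t distribution beyond the observed statistic. A minimal sketch using the rounded t and degrees of freedom reported above, so the result should agree with the printed p-value only approximately:

```r
# Test statistic and Welch degrees of freedom as reported above
t_stat <- 27.471
df_welch <- 1074.2

# One-sided p-value for Ha: mu_recent > mu_older (upper tail only)
p_one_sided <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# A two-sided test counts both tails, so its p-value is twice as large here
p_two_sided <- 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)

cat("One-sided p:", format(p_one_sided, scientific = TRUE, digits = 3), "\n")
cat("Two-sided p:", format(p_two_sided, scientific = TRUE, digits = 3), "\n")
```

With a statistic this extreme, the choice between one-sided and two-sided makes no practical difference to the conclusion.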
Data Quality:
Large sample sizes: With 783 recent homes and 2,147 older homes, our estimates are very stable. The standard error is small, so our results aren’t driven by a few unusual homes.
Clear separation: The observed difference ($92,424) is massive, more than half the mean price of an older home ($156,097). This isn’t a subtle effect that requires sophisticated statistics to detect.
Consistent with prior knowledge: We know from previous weeks that:
Statistical Robustness:
Welch’s t-test: I used Welch’s test, which does not assume equal variances. That matters here because the group standard deviations differ noticeably ($86,710 for recent homes vs. $60,719 for older homes).
Large effect size: Cohen’s d ≈ 1.35 indicates a huge practical difference, not just a statistically significant one. Even if we had some data quality issues, the effect is so large it would still be apparent.
One-sided test appropriate: I specified the direction beforehand based on theory (newer things generally cost more), making this a confirmatory test, not exploratory.
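The Cohen’s d quoted for this comparison can be checked from the group summaries printed earlier. A sketch using the same pooled-SD formula as in Hypothesis 1; the helper name `cohens_d_from_summary` is hypothetical:

```r
# Pooled-SD Cohen's d computed from group summaries instead of raw data
cohens_d_from_summary <- function(mean_diff, n1, sd1, n2, sd2) {
  pooled_var <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2)
  mean_diff / sqrt(pooled_var)
}

# Inputs copied from the descriptive statistics above
d_h2 <- cohens_d_from_summary(mean_diff = 92424.14,
                              n1 = 783,  sd1 = 86709.92,   # recent homes
                              n2 = 2147, sd2 = 60718.91)   # older homes

cat("Cohen's d (recent vs. older):", round(d_h2, 2), "\n")
```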
Limitations to Keep in Mind:
Despite these limitations, I’m highly confident that recent construction is associated with higher sale prices in Ames.
# Create visualization dataframe
viz_data_h2 <- data.frame(
  Group = c(rep("Recent (2000+)", length(recent_homes)),
            rep("Older (<2000)", length(older_homes))),
  Price = c(recent_homes, older_homes)
)
# Density plot with means
ggplot(viz_data_h2, aes(x = Price, fill = Group)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = mean(recent_homes),
             color = "#2ecc71", linewidth = 1.5, linetype = "dashed") +
  geom_vline(xintercept = mean(older_homes),
             color = "#e74c3c", linewidth = 1.5, linetype = "dashed") +
  scale_x_continuous(labels = dollar_format()) +
  scale_fill_manual(values = c("Recent (2000+)" = "#2ecc71",
                               "Older (<2000)" = "#e74c3c"),
                    name = "Construction Era") +
  labs(title = "Sale Price Distribution: Recent vs. Older Homes",
       subtitle = paste0("Dashed lines = group means | Difference = $92,424 | p < 0.001\n",
                         "Recent homes (green) shifted substantially rightward"),
       x = "Sale Price",
       y = "Density") +
  theme_minimal() +
  annotate("text", x = mean(recent_homes), y = 0.0000075,
           label = paste0("Recent Mean:\n", dollar(mean(recent_homes))),
           vjust = -0.5, color = "#2ecc71", fontface = "bold", size = 4) +
  annotate("text", x = mean(older_homes), y = 0.0000075,
           label = paste0("Older Mean:\n", dollar(mean(older_homes))),
           vjust = -0.5, color = "#e74c3c", fontface = "bold", size = 4)
Visual Insights: The density plot shows that recent homes (green) have a distribution shifted far to the right of older homes (red). While there’s some overlap, the bulk of recent homes sell in a price range where older homes are rare. The means (dashed lines) are separated by about $92,000, illustrating the massive difference between the groups.
Both hypothesis tests yielded clear, strong results:
Hypothesis 1 (Central Air): Rejected the null hypothesis using the Neyman-Pearson framework. Homes with central air sell for $84,562 more on average (95% CI: $78,499 to $90,626; p < 0.001). With our large sample size (2,734 vs. 196), we had more than enough power to detect even small effects. The observed effect was large (Cohen’s d ≈ 1.10).
Hypothesis 2 (Recent Construction): Using Fisher’s framework, the p-value was essentially zero (p < 0.001), providing overwhelming evidence that recent homes sell for more ($92,424 higher on average). The massive sample sizes (783 vs. 2,147) and huge effect size (Cohen’s d ≈ 1.35) give us high confidence in this finding.
Practical Implications:
For buyers: If you’re budget-constrained, expect to pay a premium for central air and recent construction. You might find better value in older homes you can update.
For sellers: Both central air and recent construction are major selling points. If you have an older home, emphasizing recent renovations might help justify higher asking prices.
For investors: Adding central air to homes that lack it might be a high-ROI upgrade, but you’d need to control for other factors to estimate the true causal effect.
Both tests demonstrate the power of hypothesis testing to move beyond “it looks like there’s a difference” to “there is statistically significant evidence of a difference.”