Ames Housing Data Dive Week 7: Hypothesis Testing

Introduction

This week I’m using hypothesis testing to answer specific questions about the Ames housing market. Rather than just describing patterns, I’m testing whether observed differences between groups are statistically significant or could have occurred by chance. I’ll use two different testing frameworks: Neyman-Pearson (with power analysis) for Hypothesis 1, and Fisher’s Significance Testing (p-value interpretation) for Hypothesis 2.

Data Loading

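# Packages used throughout: dplyr (filter, pull, mutate), ggplot2 (plots), scales (dollar)
library(dplyr); library(ggplot2); library(scales)
ames <- read.csv("ames.csv", stringsAsFactors = FALSE)
cat("Dataset:", nrow(ames), "homes\n")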

## Dataset: 2930 homes

Hypothesis 1: Central Air Conditioning and Sale Price (Neyman-Pearson Framework)

Research Question

Does having central air conditioning increase the sale price of homes in Ames? This matters for homeowners considering upgrades and buyers deciding what features to prioritize.

Defining Groups

Group A: Homes WITH central air conditioning (Central.Air = “Y”)
Group B: Homes WITHOUT central air conditioning (Central.Air = “N”)
Main Variable: SalePrice (continuous)

Null and Alternative Hypotheses

H₀ (Null): μ_with_AC = μ_without_AC
(The mean sale price of homes with central air equals the mean sale price of homes without central air)

Hₐ (Alternative): μ_with_AC ≠ μ_without_AC
(The mean sale prices are different)

This is a two-sided test because I want to detect if central air increases OR decreases price (though I expect it increases).

Setting Test Parameters

# Define groups
group_a <- ames |> filter(Central.Air == "Y") |> pull(SalePrice)
group_b <- ames |> filter(Central.Air == "N") |> pull(SalePrice)

cat("=== GROUP SIZES ===\n")

## === GROUP SIZES ===

cat("With Central Air (Group A):", length(group_a), "homes\n")

## With Central Air (Group A): 2734 homes

cat("Without Central Air (Group B):", length(group_b), "homes\n")

## Without Central Air (Group B): 196 homes

cat("\n=== DESCRIPTIVE STATISTICS ===\n")

## 
## === DESCRIPTIVE STATISTICS ===

cat("Group A mean:", dollar(mean(group_a)), "\n")

## Group A mean: $186,453

cat("Group A SD:", dollar(sd(group_a)), "\n")

## Group A SD: $79,121.36

cat("Group B mean:", dollar(mean(group_b)), "\n")

## Group B mean: $101,890

cat("Group B SD:", dollar(sd(group_b)), "\n")

## Group B SD: $37,597.02

cat("Observed difference:", dollar(mean(group_a) - mean(group_b)), "\n")

## Observed difference: $84,562.31

Choosing Alpha Level (α = 0.05):

I’m using α = 0.05 (5% significance level) because:
- This is the standard in social science research
- A Type I Error (False Positive) would mean telling homeowners that central air adds value when it doesn’t. This could lead to wasteful renovation spending, which is moderately serious but not catastrophic.
- α = 0.05 balances being too conservative (missing real effects) with being too liberal (finding false effects)

Choosing Power Level (1 - β = 0.80):

I’m targeting 80% power because:
- This means an 80% chance of detecting a real effect if it exists
- β = 0.20 (20% Type II Error rate) means a 20% chance of a False Negative
- A Type II Error (False Negative) would mean missing a real central air effect. This is less serious than a False Positive: homeowners might forgo a valuable upgrade, but they won’t waste money on a useless one.
- 80% power is standard practice and represents a good balance
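The meaning of α and power can be checked by simulation. The following is a minimal sketch, not part of the Ames analysis: it assumes normally distributed groups with SD = 1, an illustrative 64 observations per group, and a hypothetical helper `reject_rate()`. Under a true null the rejection rate should land near α; under a true medium effect (d = 0.5) it should land near the targeted power.

```r
# Monte Carlo sketch of Type I error and power for a two-sided Welch t-test.
# All settings below are illustrative assumptions, not values from the Ames data.
set.seed(42)
n_per_group <- 64     # hypothetical group size
alpha       <- 0.05
n_sims      <- 2000

# Proportion of simulated experiments that reject H0 when the true
# standardized mean difference is `d` (both groups normal with SD = 1).
reject_rate <- function(d) {
  rejections <- replicate(n_sims, {
    x <- rnorm(n_per_group, mean = d, sd = 1)
    y <- rnorm(n_per_group, mean = 0, sd = 1)
    t.test(x, y, var.equal = FALSE)$p.value < alpha
  })
  mean(rejections)
}

type1_rate <- reject_rate(0)    # H0 true: rejections are false positives
power_est  <- reject_rate(0.5)  # H0 false: rejections are correct detections

cat("Estimated Type I error rate:", type1_rate, "\n")
cat("Estimated power at d = 0.5: ", power_est, "\n")
```

Because the estimates come from random draws, they will vary slightly with the seed and the number of simulations.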

Choosing Minimum Effect Size:

What’s the smallest price difference that actually matters practically?

# Calculate effect size (Cohen's d)
pooled_sd <- sqrt(((length(group_a) - 1) * sd(group_a)^2 + 
                   (length(group_b) - 1) * sd(group_b)^2) / 
                  (length(group_a) + length(group_b) - 2))

observed_cohens_d <- (mean(group_a) - mean(group_b)) / pooled_sd

cat("=== EFFECT SIZE ===\n")

## === EFFECT SIZE ===

cat("Pooled standard deviation:", dollar(pooled_sd), "\n")

## Pooled standard deviation: $77,054.60

cat("Observed Cohen's d:", round(observed_cohens_d, 3), "\n")

## Observed Cohen's d: 1.097

cat("Interpretation:", 
    if(abs(observed_cohens_d) > 0.8) "Large effect" else 
    if(abs(observed_cohens_d) > 0.5) "Medium effect" else "Small effect", "\n")

## Interpretation: Large effect

The observed Cohen’s d ≈ 1.10 represents a large effect size. This means central air is associated with a price difference of about 1.1 standard deviations, which is a substantial practical difference.

For the minimum detectable effect, I’ll use d = 0.50 (medium effect) because:
- Anything smaller than a medium effect (0.5 × the pooled SD of $77,055, or roughly a $40,000 price difference) might not justify the installation cost of central air (~$3,000-7,000)
- A medium effect represents a meaningful return on investment for homeowners
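To make the dollar figure behind d = 0.50 concrete, a standardized effect can be converted back to dollars by multiplying it by the pooled standard deviation reported above. This is a small sketch; `effect_in_dollars()` is a hypothetical helper, and the 0.2 / 0.5 / 0.8 benchmarks are Cohen's conventional labels rather than anything specific to housing.

```r
# Convert standardized effect sizes (Cohen's d) into dollar differences.
# pooled_sd_usd is the pooled standard deviation reported in the output above.
pooled_sd_usd <- 77054.60

# Hypothetical helper: d is in SD units, so d * SD gives dollars.
effect_in_dollars <- function(d, sd_usd = pooled_sd_usd) d * sd_usd

benchmarks   <- c(small = 0.2, medium = 0.5, large = 0.8)
dollar_diffs <- effect_in_dollars(benchmarks)

print(round(dollar_diffs))
```

The medium benchmark works out to 0.5 × $77,054.60 = $38,527.30, which is the "roughly $40,000" threshold used in the reasoning above.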

Power Analysis and Sample Size Calculation

# Sample size calculation for two-sample t-test
# Formula: n ≈ 16 / d² for α = 0.05, power = 0.80, two-sided test

target_d <- 0.50  # medium effect
alpha <- 0.05
power <- 0.80

# Approximate sample size per group
n_needed_per_group <- ceiling(16 / (target_d^2))

cat("=== SAMPLE SIZE REQUIREMENT ===\n")

## === SAMPLE SIZE REQUIREMENT ===

cat("Target effect size (Cohen's d):", target_d, "\n")

## Target effect size (Cohen's d): 0.5

cat("Alpha level:", alpha, "\n")

## Alpha level: 0.05

cat("Target power:", power, "\n")

## Target power: 0.8

cat("Sample size needed per group:", n_needed_per_group, "\n\n")

## Sample size needed per group: 64

cat("=== DO WE HAVE ENOUGH DATA? ===\n")

## === DO WE HAVE ENOUGH DATA? ===

cat("Group A (with AC):", length(group_a), "homes\n")

## Group A (with AC): 2734 homes

cat("Group B (without AC):", length(group_b), "homes\n")

## Group B (without AC): 196 homes

cat("Minimum of both groups:", min(length(group_a), length(group_b)), "\n")

## Minimum of both groups: 196

cat("Required:", n_needed_per_group, "\n")

## Required: 64

if(min(length(group_a), length(group_b)) >= n_needed_per_group) {
  cat("\n✓ YES - We have sufficient data to proceed with the test\n")
  cat("Our smallest group (", min(length(group_a), length(group_b)), 
      ") exceeds the requirement (", n_needed_per_group, ")\n")
} else {
  cat("\n✗ NO - Insufficient data\n")
}

## 
## ✓ YES - We have sufficient data to proceed with the test
## Our smallest group ( 196 ) exceeds the requirement ( 64 )

Conclusion on Sample Size: We have more than enough data! We need only 64 homes per group to detect a medium effect with 80% power, but we have 2,734 homes with central air and 196 without. This means our test is highly powered; we could detect even small effects if they exist.
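The 16 / d² shortcut can be cross-checked against base R's `power.t.test()`, and the power for the actual, unequal group sizes can be approximated with the noncentral t distribution. This is a sketch under stated assumptions: `power_unequal()` is a hypothetical helper, it assumes a pooled-variance two-sample test, and it ignores the negligible chance of rejecting in the wrong direction (as `power.t.test()` does by default with `strict = FALSE`).

```r
# Cross-check the n = 16 / d^2 shortcut with base R's power.t.test().
exact <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                      type = "two.sample", alternative = "two.sided")
n_exact <- ceiling(exact$n)
cat("Exact n per group for d = 0.5:", n_exact, "\n")

# Approximate power with unequal group sizes (hypothetical helper).
# Uses the noncentral t distribution for a pooled-variance two-sample test;
# only the upper rejection region is counted.
power_unequal <- function(d, n1, n2, alpha = 0.05) {
  df     <- n1 + n2 - 2
  ncp    <- d / sqrt(1 / n1 + 1 / n2)   # noncentrality parameter
  t_crit <- qt(1 - alpha / 2, df)       # two-sided critical value
  pt(t_crit, df, ncp = ncp, lower.tail = FALSE)
}

power_medium <- power_unequal(0.5, n1 = 2734, n2 = 196)
power_small  <- power_unequal(0.2, n1 = 2734, n2 = 196)
cat("Power at d = 0.5:", round(power_medium, 3), "\n")
cat("Power at d = 0.2:", round(power_small, 3), "\n")
```

With unequal groups, power is driven mostly by the smaller group, so the 196 homes without central air matter more here than the 2,734 with it.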

Performing the Hypothesis Test

# Two-sample t-test
test_result <- t.test(group_a, group_b, 
                     alternative = "two.sided",
                     var.equal = FALSE,  # Welch's t-test (safer assumption)
                     conf.level = 0.95)

cat("=== HYPOTHESIS TEST RESULTS ===\n\n")

## === HYPOTHESIS TEST RESULTS ===

cat("Test statistic (t):", round(test_result$statistic, 3), "\n")

## Test statistic (t): 27.433

cat("Degrees of freedom:", round(test_result$parameter, 1), "\n")

## Degrees of freedom: 336.1

cat("P-value:", format(test_result$p.value, scientific = TRUE, digits = 3), "\n")

## P-value: 8.78e-88

cat("95% Confidence Interval for difference:", 
    dollar(test_result$conf.int[1]), "to", dollar(test_result$conf.int[2]), "\n")

## 95% Confidence Interval for difference: $78,498.92 to $90,625.69
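The Welch results above can be reproduced by hand from the reported summary statistics, which makes it clear where the t statistic, degrees of freedom, and confidence interval come from. This is a sketch that takes the rounded values printed earlier as inputs, so the last digits may differ slightly from the `t.test()` output.

```r
# Recompute Welch's t-test from the summary statistics reported above.
sd_a <- 79121.36; n_a <- 2734   # with central air
sd_b <- 37597.02; n_b <- 196    # without central air
diff_means <- 84562.31          # reported difference in mean sale price

# Each group's contribution to the variance of the difference in means
var_term_a <- sd_a^2 / n_a
var_term_b <- sd_b^2 / n_b
se_diff    <- sqrt(var_term_a + var_term_b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- diff_means / se_diff
df_welch <- (var_term_a + var_term_b)^2 /
  (var_term_a^2 / (n_a - 1) + var_term_b^2 / (n_b - 1))

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff

cat("t =", round(t_stat, 3), " df =", round(df_welch, 1), "\n")
cat("95% CI:", round(ci[1]), "to", round(ci[2]), "\n")
```

Note that the degrees of freedom are much closer to the size of the smaller group than to the total sample, because the smaller group contributes most of the uncertainty.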

Decision and Interpretation

cat("=== DECISION ===\n\n")

## === DECISION ===

if(test_result$p.value < alpha) {
  cat("REJECT the null hypothesis (p < 0.05)\n\n")
  cat("CONCLUSION:\n")
  cat("There is statistically significant evidence that homes with central air\n")
  cat("sell for different prices than homes without central air.\n\n")
  cat("The observed difference of", dollar(mean(group_a) - mean(group_b)),
      "is extremely unlikely\n")
  cat("to have occurred by chance if the true difference were zero.\n")
} else {
  cat("FAIL TO REJECT the null hypothesis (p >= 0.05)\n\n")
  cat("CONCLUSION:\n")
  cat("Insufficient evidence to conclude that central air affects sale price.\n")
}

## REJECT the null hypothesis (p < 0.05)
## 
## CONCLUSION:
## There is statistically significant evidence that homes with central air
## sell for different prices than homes without central air.
## 
## The observed difference of $84,562.31 is extremely unlikely
## to have occurred by chance if the true difference were zero.

What This Means Practically:

Homes with central air in Ames sell for an average of $84,562 more than homes without central air (95% CI: $78,499 to $90,626). This is a huge difference, far larger than the cost to install central air.

However, correlation doesn’t prove causation. This doesn’t necessarily mean adding central air will increase your home’s value by $84,000. It’s possible that:
- Newer, larger, or higher-quality homes are more likely to have central air
- Central air is just one of many modern features that collectively add value
- The homes without central air might be older, smaller, or in less desirable areas

To truly assess central air’s causal effect, I’d need to compare similar homes that differ only in central air presence, which would require controlling for other variables.
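The confounding concern can be illustrated with a small simulation. The data below are entirely synthetic, not the Ames data: the latent `quality` variable, the assumed $5,000 causal premium, and every coefficient are made-up values chosen only to show how a raw group difference can dwarf the true effect, and how adjusting for the confounder in a regression recovers it.

```r
# Synthetic illustration of confounding (NOT the Ames data).
set.seed(7)
n <- 5000
quality <- rnorm(n)                                  # hypothetical latent home quality
has_ac  <- rbinom(n, 1, plogis(2 + 1.5 * quality))   # better homes more often have AC
true_ac_effect <- 5000                               # assumed causal premium in dollars
price <- 150000 + 40000 * quality + true_ac_effect * has_ac +
  rnorm(n, sd = 20000)

# Naive comparison: raw difference in group means (mixes AC with quality)
naive_gap <- mean(price[has_ac == 1]) - mean(price[has_ac == 0])

# Adjusted comparison: regression that holds quality constant
adjusted <- coef(lm(price ~ has_ac + quality))[["has_ac"]]

cat("Naive difference in means:", round(naive_gap), "\n")
cat("Adjusted AC coefficient:  ", round(adjusted), "\n")
```

In the real data the analogous step would be a regression of SalePrice on the central air indicator plus controls such as size, age, and quality, though unmeasured confounders could still remain.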

Visualization

# Create dataframe for visualization
viz_data <- data.frame(
  Group = c(rep("With Central Air", length(group_a)),
            rep("Without Central Air", length(group_b))),
  Price = c(group_a, group_b)
)

# Boxplot with statistical annotation
ggplot(viz_data, aes(x = Group, y = Price, fill = Group)) +
  geom_boxplot(outlier.alpha = 0.3) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4, 
               fill = "white", color = "black") +
  scale_y_continuous(labels = dollar_format()) +
  scale_fill_manual(values = c("With Central Air" = "#3498db", 
                               "Without Central Air" = "#e74c3c")) +
  labs(title = "Sale Price: Homes With vs. Without Central Air",
       subtitle = paste0("Difference = $84,562 | p < 0.001 | Reject H₀\n",
                        "Diamond = mean, Box = median and quartiles"),
       x = "",
       y = "Sale Price") +
  theme_minimal() +
  theme(legend.position = "none") +
  annotate("text", x = 1.5, y = max(viz_data$Price) * 0.95,
           label = paste0("p < 0.001***\nt = ", round(test_result$statistic, 2)),
           size = 5, fontface = "bold")

Visual Insights: The boxplot shows that homes without central air (red) have a much lower median and mean price than homes with central air (blue). The gap between the groups is large relative to the spread within each group, which, together with the large sample, explains the tiny p-value. The white diamonds mark the means, and the gap between them is plain to see.

Hypothesis 2: Recent Construction and Sale Price (Fisher’s Framework)

Research Question

Do recently built homes (constructed since 2000) sell for higher prices than older homes? This helps buyers understand whether they should prioritize newer construction.

Defining Groups

# Create binary indicator
ames <- ames |>
  mutate(Is_Recent = Year.Built >= 2000)

# Define groups
recent_homes <- ames |> filter(Is_Recent == TRUE) |> pull(SalePrice)
older_homes <- ames |> filter(Is_Recent == FALSE) |> pull(SalePrice)

cat("=== GROUP DEFINITION ===\n")

## === GROUP DEFINITION ===

cat("Recent homes (2000+):", length(recent_homes), "homes\n")

## Recent homes (2000+): 783 homes

cat("Older homes (<2000):", length(older_homes), "homes\n")

## Older homes (<2000): 2147 homes

Group A: Homes built in 2000 or later
Group B: Homes built before 2000
Main Variable: SalePrice (continuous)

Null and Alternative Hypotheses

H₀ (Null): μ_recent = μ_older
(Mean sale price is the same for recent and older homes)

Hₐ (Alternative): μ_recent > μ_older
(Recent homes sell for more than older homes)

This is a one-sided test because my research question specifically asks if recent homes sell for more, not just differently.

Descriptive Statistics

cat("=== DESCRIPTIVE STATISTICS ===\n")

## === DESCRIPTIVE STATISTICS ===

cat("Recent homes:\n")

## Recent homes:

cat("  Mean:", dollar(mean(recent_homes)), "\n")

##   Mean: $248,521

cat("  Median:", dollar(median(recent_homes)), "\n")

##   Median: $226,000

cat("  SD:", dollar(sd(recent_homes)), "\n\n")

##   SD: $86,709.92

cat("Older homes:\n")

## Older homes:

cat("  Mean:", dollar(mean(older_homes)), "\n")

##   Mean: $156,097

cat("  Median:", dollar(median(older_homes)), "\n")

##   Median: $144,000

cat("  SD:", dollar(sd(older_homes)), "\n\n")

##   SD: $60,718.91

cat("Observed difference (Recent - Older):", 
    dollar(mean(recent_homes) - mean(older_homes)), "\n")

## Observed difference (Recent - Older): $92,424.14

Performing the Hypothesis Test

# One-sided t-test
test_result_h2 <- t.test(recent_homes, older_homes,
                        alternative = "greater",  # one-sided
                        var.equal = FALSE,
                        conf.level = 0.95)

cat("=== HYPOTHESIS TEST RESULTS ===\n\n")

## === HYPOTHESIS TEST RESULTS ===

cat("Test: Welch's two-sample t-test (one-sided)\n")

## Test: Welch's two-sample t-test (one-sided)

cat("Test statistic (t):", round(test_result_h2$statistic, 3), "\n")

## Test statistic (t): 27.471

cat("Degrees of freedom:", round(test_result_h2$parameter, 1), "\n")

## Degrees of freedom: 1074.2

cat("P-value:", format(test_result_h2$p.value, scientific = TRUE, digits = 3), "\n")

## P-value: 1.44e-126
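The effect of choosing a one-sided alternative can be seen by computing the p-value directly from the reported t statistic and degrees of freedom. This sketch uses the rounded values printed above, so the result should match the `t.test()` output only approximately.

```r
# One-sided vs. two-sided p-values from the reported Welch statistics.
t_stat <- 27.471
df     <- 1074.2

# alternative = "greater": area in the upper tail only
p_greater <- pt(t_stat, df, lower.tail = FALSE)

# alternative = "two.sided": both tails, i.e. twice the one-sided value
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# alternative = "less": area below t, which is close to 1 here
p_less <- pt(t_stat, df)

cat("One-sided (greater):", format(p_greater, digits = 3), "\n")
cat("Two-sided:          ", format(p_two_sided, digits = 3), "\n")
cat("One-sided (less):   ", format(p_less, digits = 3), "\n")
```

When the observed difference is in the predicted direction, the one-sided p-value is half the two-sided one, which is why the direction has to be fixed before looking at the data.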

Interpreting the P-Value

p_val <- test_result_h2$p.value

cat("=== P-VALUE INTERPRETATION ===\n\n")

## === P-VALUE INTERPRETATION ===

cat("P-value:", format(p_val, scientific = TRUE, digits = 3), "\n\n")

## P-value: 1.44e-126

cat("What this means:\n")

## What this means:

cat("If the null hypothesis were true (no difference between recent and older homes),\n")

## If the null hypothesis were true (no difference between recent and older homes),

cat("the probability of observing a difference as large as", 
    dollar(mean(recent_homes) - mean(older_homes)),
    "\nor larger, purely by chance, is essentially ZERO (p < 0.001).\n\n")

## the probability of observing a difference as large as $92,424.14 
## or larger, purely by chance, is essentially ZERO (p < 0.001).

if(p_val < 0.001) {
  cat("RECOMMENDATION: STRONG EVIDENCE against the null hypothesis.\n")
  cat("Recent homes sell for significantly more than older homes.\n")
} else if(p_val < 0.05) {
  cat("RECOMMENDATION: MODERATE EVIDENCE against the null hypothesis.\n")
} else {
  cat("RECOMMENDATION: INSUFFICIENT EVIDENCE to reject the null hypothesis.\n")
}

## RECOMMENDATION: STRONG EVIDENCE against the null hypothesis.
## Recent homes sell for significantly more than older homes.

Why We Should Be Confident

Data Quality:

Large sample sizes: With 783 recent homes and 2,147 older homes, our estimates are very stable. The standard error is small, so our results aren’t driven by a few unusual homes.
Clear separation: The observed difference ($92,424) is massive, more than half the mean price of an older home ($156,097). This isn’t a subtle effect that requires sophisticated statistics to detect.
Consistent with prior knowledge: We know from previous weeks that:
- Age negatively correlates with price (Week 6: r = -0.56)
- Newer homes tend to have modern features (central air, updated systems, better insulation)
- The 2000 cutoff captures homes built during/after a construction boom with new building codes

Statistical Robustness:

Welch’s t-test: I used Welch’s test, which does not assume equal variances. That matters here because the two groups have clearly different spreads (SD of $86,710 vs. $60,719).
Large effect size: Cohen’s d ≈ 1.35 indicates a huge practical difference, not just a statistically significant one. Even if we had some data quality issues, the effect is so large it would still be apparent.
One-sided test appropriate: I specified the direction beforehand based on theory (newer things generally cost more), making this a confirmatory test, not exploratory.
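The Cohen's d quoted for this comparison can be recomputed from the descriptive statistics reported earlier, using the same pooled-SD formula as in Hypothesis 1. This is a sketch; `cohens_d()` is a hypothetical helper, and the inputs are the rounded values printed above.

```r
# Cohen's d from summary statistics, using the pooled standard deviation.
# cohens_d() is a hypothetical helper, not a function from a package.
cohens_d <- function(m1, s1, n1, m2, s2, n2) {
  pooled_sd <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

# Recent homes (2000+) vs. older homes (<2000), values from the output above
d_h2 <- cohens_d(m1 = 248521, s1 = 86709.92, n1 = 783,
                 m2 = 156097, s2 = 60718.91, n2 = 2147)

cat("Cohen's d (recent vs. older):", round(d_h2, 2), "\n")
```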

Limitations to Keep in Mind:

This is still observational data, not experimental. Newer homes differ in many ways beyond just age.
The 2000 cutoff is somewhat arbitrary. Would 1995 or 2005 give different results?
Recent homes in this dataset are from 2000-2010, which is now 16-26 years ago. “Recent” has shifted.
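The question of whether the 2000 cutoff is arbitrary could be explored by rerunning the same one-sided Welch test at several cutoffs. The following is a sketch only: `cutoff_sensitivity()` is a hypothetical helper, and the demo data are synthetic (made-up years and prices) so that the block runs on its own. It assumes a data frame with `Year.Built` and `SalePrice` columns, as in the Ames file.

```r
# Sensitivity of the recent-vs-older comparison to the cutoff year.
# cutoff_sensitivity() is a hypothetical helper; the demo data are synthetic.
cutoff_sensitivity <- function(data, cutoffs) {
  rows <- lapply(cutoffs, function(cutoff) {
    recent <- data$SalePrice[data$Year.Built >= cutoff]
    older  <- data$SalePrice[data$Year.Built <  cutoff]
    test   <- t.test(recent, older, alternative = "greater", var.equal = FALSE)
    data.frame(cutoff    = cutoff,
               n_recent  = length(recent),
               mean_diff = mean(recent) - mean(older),
               t_stat    = unname(test$statistic),
               p_value   = test$p.value)
  })
  do.call(rbind, rows)
}

# Synthetic demo data: price rises with construction year, plus noise
set.seed(123)
demo <- data.frame(Year.Built = sample(1900:2010, 1000, replace = TRUE))
demo$SalePrice <- 100000 + 800 * (demo$Year.Built - 1900) +
  rnorm(1000, sd = 40000)

sens <- cutoff_sensitivity(demo, cutoffs = c(1995, 2000, 2005))
print(sens)
```

With the real data, the call would be `cutoff_sensitivity(ames, c(1995, 2000, 2005))`; if the conclusion holds at every cutoff, the choice of 2000 matters less.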

Despite these limitations, I’m highly confident that recent construction is associated with higher sale prices in Ames.

Visualization

# Create visualization dataframe
viz_data_h2 <- data.frame(
  Group = c(rep("Recent (2000+)", length(recent_homes)),
            rep("Older (<2000)", length(older_homes))),
  Price = c(recent_homes, older_homes)
)

# Density plot with means
ggplot(viz_data_h2, aes(x = Price, fill = Group)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = mean(recent_homes), 
             color = "#2ecc71", linewidth = 1.5, linetype = "dashed") +
  geom_vline(xintercept = mean(older_homes), 
             color = "#e74c3c", linewidth = 1.5, linetype = "dashed") +
  scale_x_continuous(labels = dollar_format()) +
  scale_fill_manual(values = c("Recent (2000+)" = "#2ecc71", 
                               "Older (<2000)" = "#e74c3c"),
                    name = "Construction Era") +
  labs(title = "Sale Price Distribution: Recent vs. Older Homes",
       subtitle = paste0("Dashed lines = group means | Difference = $92,424 | p < 0.001\n",
                        "Recent homes (green) shifted substantially rightward"),
       x = "Sale Price",
       y = "Density") +
  theme_minimal() +
  annotate("text", x = mean(recent_homes), y = 0.0000075,
           label = paste0("Recent Mean:\n", dollar(mean(recent_homes))),
           vjust = -0.5, color = "#2ecc71", fontface = "bold", size = 4) +
  annotate("text", x = mean(older_homes), y = 0.0000075,
           label = paste0("Older Mean:\n", dollar(mean(older_homes))),
           vjust = -0.5, color = "#e74c3c", fontface = "bold", size = 4)

Visual Insights: The density plot shows that recent homes (green) have a distribution shifted far to the right of older homes (red). While there’s some overlap, the bulk of recent homes sell in a price range where older homes are rare. The means (dashed lines) are separated by nearly $100,000, illustrating the massive difference between the groups.

Conclusion

Both hypothesis tests yielded clear, strong results:

Hypothesis 1 (Central Air): Rejected the null hypothesis using the Neyman-Pearson framework. Homes with central air sell for $84,562 more on average (p < 0.001). With our large sample size (2,734 vs. 196), we had more than enough power to detect even small effects. The observed effect was large (Cohen’s d ≈ 1.10).

Hypothesis 2 (Recent Construction): Using Fisher’s framework, the p-value was essentially zero (p < 0.001), providing overwhelming evidence that recent homes sell for more ($92,424 higher on average). The massive sample sizes (783 vs. 2,147) and huge effect size (Cohen’s d ≈ 1.35) give us high confidence in this finding.

Practical Implications:

For buyers: If you’re budget-constrained, expect to pay a premium for central air and recent construction. You might find better value in older homes you can update.
For sellers: Both central air and recent construction are major selling points. If you have an older home, emphasizing recent renovations might help justify higher asking prices.
For investors: Adding central air to homes that lack it might be a high-ROI upgrade, but you’d need to control for other factors to estimate the true causal effect.

Both tests demonstrate the power of hypothesis testing to move beyond “it looks like there’s a difference” to “there is statistically significant evidence of a difference.”

Pratik Mane

2026-03-02
