A real estate advisory team in Ames wants to understand whether building type is associated with differences in sale price. The goal is not to predict every home perfectly. The goal is simpler: help agents and clients understand whether some building types tend to sell for more than others, and whether those differences are large enough to matter in practice.
This is a good fit for the original lab because the response is
numeric (sale_price) and the grouping variable is
categorical (bldg_type). That makes a one-way ANOVA a
natural first step.
The main audience is a small internal real estate team: listing agents, pricing analysts, and managers who need a short, defensible summary of the local housing market. A second audience would be clients who want a plain-language explanation of why one property category may be priced differently from another.
The team wants to know whether building type is linked to meaningful differences in sale price so they can improve listing guidance, pricing conversations, and market summaries. A good result here would help answer a practical question: should we treat building types differently when discussing expected sale prices in Ames?
This critique stays inside the original notebook’s scope. It uses the
variables already present in the lab, mainly sale_price and
bldg_type. The original workflow already includes:
That is enough to make a useful critique. The main issue is not that the notebook lacks data. The issue is that the results could be explained more clearly and checked more carefully.
The objective is to determine whether building type is associated with differences in sale price, and to identify which comparisons are worth paying attention to. In plain language, the team should be able to finish this notebook and say:
library(tidyverse)
library(ggthemes)
library(ggrepel)
library(AmesHousing)
library(boot)
library(broom)
library(lindia)
library(car)
options(scipen = 6)
theme_set(theme_minimal())
# Create dataset
ames <- make_ames() |>
  rename_with(tolower)
# Create ames_basic (subset of relevant variables)
ames_basic <- ames |>
  select(sale_price, first_flr_sf, lot_area, overall_qual, bldg_type)
# One-way ANOVA from the original notebook
m <- aov(sale_price ~ bldg_type, data = ames)
summary(m)
##               Df         Sum Sq      Mean Sq F value Pr(>F)
## bldg_type      4   645411087404 161352771851   26.15 <2e-16 ***
## Residuals   2925 18047126022947   6169957615
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The original notebook chooses the right tool for the question. Since the analysis is comparing more than two groups, ANOVA is a better starting point than running a series of plain t-tests. It also moves in the right direction by following the ANOVA with pairwise comparisons and confidence intervals.
What the notebook does not always do well is slow down and explain what each result means in plain language. The critique below keeps the original logic, but makes the interpretation a little more careful and a little more useful.
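As an illustration of the pairwise step mentioned above, the sketch below runs Tukey's HSD on the same one-way model. It is only one reasonable way to do the follow-up, not necessarily the method the original notebook used. It rebuilds `ames` and `m` so it can run on its own; `tukey` and `pairs_tbl` are names introduced here.

```r
library(AmesHousing)

# Rebuild the data and the one-way model so this sketch runs on its own
ames <- make_ames()
names(ames) <- tolower(names(ames))
m <- aov(sale_price ~ bldg_type, data = ames)

# Tukey's Honest Significant Differences: every pairwise difference in means,
# with family-wise 95% confidence intervals and adjusted p-values
tukey <- TukeyHSD(m, "bldg_type", conf.level = 0.95)

# Sort the comparisons by the size of the estimated difference
pairs_tbl <- as.data.frame(tukey$bldg_type)
pairs_tbl <- pairs_tbl[order(-abs(pairs_tbl$diff)), ]
print(round(pairs_tbl, 3))
```

Each row is one pair of building types. An interval (`lwr` to `upr`) that excludes zero marks a comparison worth discussing with the team, and the `diff` column says how many dollars separate the two group means.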
ames |>
  ggplot(aes(y = sale_price, x = bldg_type)) +
  geom_boxplot() +
  # Log-scale the y axis; tick labels stay in thousands of dollars, e.g. "$100K"
  scale_y_log10(labels = \(x) paste0('$', x / 1000, 'K')) +
  annotation_logticks(sides = 'l') +
  labs(
    title = "Sale Price by Building Type",
    x = "Building Type",
    y = "Sale Price (log scale, in $1000s)"
  )
This boxplot is a good first look because it makes the group structure visible right away. Some building types appear to sit at noticeably different price levels, and some groups also seem more spread out than others.
The important point is that the plot suggests a difference, but it does not prove one. It gives the analysis a reason to continue, not a final answer.
# Attach fitted values and residuals to the data for the diagnostic plots
aug_m <- augment(m)
## Warning: The `augment()` method for objects of class `aov` is not maintained by the broom team, and is only supported through the `lm` tidier method. Please be cautious in interpreting and reporting broom output.
##
## This warning is displayed once per session.
p_resid <- ggplot(aug_m, aes(.fitted, .resid)) +
  geom_point(alpha = 0.35) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted values",
    y = "Residuals"
  )
p_qq <- ggplot(aug_m, aes(sample = .std.resid)) +
  stat_qq(alpha = 0.35) +
  stat_qq_line() +
  labs(
    title = "Normal Q-Q Plot of Standardized Residuals",
    x = "Theoretical quantiles",
    y = "Standardized residuals"
  )
p_resid
p_qq
The residual plots matter because ANOVA depends on assumptions. The residuals-vs-fitted plot checks whether the spread is fairly even across predicted values. The Q-Q plot checks whether the residuals are close enough to normal for the test to behave well.
The main takeaway is that the notebook should not only state the assumptions. It should show evidence that the assumptions are at least reasonable. That makes the analysis more honest and easier to defend.
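The visual checks can be paired with a few formal ones. The sketch below is an assumption about what a reasonable follow-up might look like, not something the original notebook ran: Levene's test for equal variances (from `car`), Welch's one-way test, which drops the equal-variance assumption, and a refit on log price as a sensitivity check. The names `lev`, `welch`, and `m_log` are introduced here.

```r
library(AmesHousing)
library(car)

ames <- make_ames()
names(ames) <- tolower(names(ames))

# Levene's test (median-centered by default in car):
# a small p-value suggests the group variances are not equal
lev <- leveneTest(sale_price ~ bldg_type, data = ames)
print(lev)

# Welch's one-way test does not assume equal variances across groups
welch <- oneway.test(sale_price ~ bldg_type, data = ames, var.equal = FALSE)
print(welch)

# Sensitivity check: refit the ANOVA on log10 price, which usually tames right skew
m_log <- aov(log10(sale_price) ~ bldg_type, data = ames)
print(summary(m_log))
```

If the Welch test and the log-scale model lead to the same conclusion as the original ANOVA, the finding does not hinge on the assumptions holding exactly.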
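# Pull the ANOVA table, then compute two effect-size measures
aov_table <- summary(m)[[1]]
# Eta-squared: between-group sum of squares over total sum of squares
eta_sq <- aov_table$`Sum Sq`[1] / sum(aov_table$`Sum Sq`)
# Omega-squared: a less biased estimate that adjusts for degrees of freedom
omega_sq <- (aov_table$`Sum Sq`[1] - aov_table$`Df`[1] * aov_table$`Mean Sq`[2]) /
  (sum(aov_table$`Sum Sq`) + aov_table$`Mean Sq`[2])
eta_sq
## [1] 0.03452774
omega_sq
## [1] 0.03319648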
A significant p-value tells us the group means are not all the same. That is useful, but it is only part of the story. Effect size helps answer a more practical question: how much of the variation in sale price is actually explained by building type? Here eta-squared is about 0.035 and omega-squared is about 0.033, so building type accounts for roughly 3 to 3.5 percent of the variation in sale price.
That matters because a result can be statistically significant and still be modest in practical terms, which is the case here. For a business audience, size usually matters as much as significance.
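A point estimate of effect size is easier to defend when it comes with an interval. The sketch below uses the `boot` package to attach a percentile bootstrap interval to eta-squared. The helper `eta_sq_stat`, the seed, and the choice of 500 resamples are assumptions made for illustration.

```r
library(AmesHousing)
library(boot)

ames <- as.data.frame(make_ames())
names(ames) <- tolower(names(ames))

# Statistic for boot(): eta-squared from a one-way ANOVA on the resampled rows
eta_sq_stat <- function(data, idx) {
  tab <- summary(aov(sale_price ~ bldg_type, data = data[idx, ]))[[1]]
  tab$`Sum Sq`[1] / sum(tab$`Sum Sq`)
}

set.seed(1)  # arbitrary seed, only for reproducibility
boot_out <- boot(ames, statistic = eta_sq_stat, R = 500)

# Percentile interval for eta-squared
ci <- boot.ci(boot_out, conf = 0.95, type = "perc")
print(ci)
```

A narrow interval that stays well below conventional "large effect" territory would support the reading that building type matters, but only modestly.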
summary_tbl <- ames |>
  group_by(bldg_type) |>
  summarise(
    n = n(),
    mean_price = mean(sale_price),
    median_price = median(sale_price),
    .groups = "drop"
  ) |>
  arrange(desc(mean_price))
summary_tbl
## # A tibble: 5 × 4
##   bldg_type     n mean_price median_price
##   <fct>     <int>      <dbl>        <dbl>
## 1 TwnhsE      233    192312.       180000
## 2 OneFam     2425    184812.       165000
## 3 Duplex      109    139809.       136905
## 4 Twnhs       101    135934.       130000
## 5 TwoFmCon     62    125582.       122250
summary_tbl |>
  ggplot(aes(x = reorder(bldg_type, mean_price), y = mean_price)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Average Sale Price by Building Type",
    x = "Building Type",
    y = "Average Sale Price"
  )
This plot is easier to explain to a non-technical audience than a table of coefficients. It shows the group means in a direct way, and it makes the overall ranking easier to see.
The insight here is simple: some building types are clearly priced above others on average, but the gap is not identical across every comparison. That means the grouping variable is useful, but it is not the whole story.
Issue in the lab
The lab assumes a linear relationship between predictors and sale price,
but this is not tested. In practice, variables like
first_flr_sf often have a curved (nonlinear) relationship
with price.
Why it matters
If the relationship is not linear, a straight-line model will
systematically overestimate or underestimate prices at different ranges.
This reduces accuracy, especially for very small or very large
homes.
# ---- Analysis 2: Linearity Check ----
# Baseline linear model
m1 <- lm(sale_price ~ first_flr_sf + overall_qual, data = ames)
# Model with nonlinear term (quadratic)
m2 <- lm(sale_price ~ poly(first_flr_sf, 2) + overall_qual, data = ames)
# Compare models
anova(m1, m2)
## Analysis of Variance Table
##
## Model 1: sale_price ~ first_flr_sf + overall_qual
## Model 2: sale_price ~ poly(first_flr_sf, 2) + overall_qual
##   Res.Df           RSS Df    Sum of Sq      F    Pr(>F)
## 1   2919 4705171693504
## 2   2918 4517739506002  1 187432187502 121.06 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Coefficients and fit statistics for the quadratic model
summary(m2)
##
## Call:
## lm(formula = sale_price ~ poly(first_flr_sf, 2) + overall_qual,
## data = ames)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -280280  -21634   -1688   17908  286496
##
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   68350      19690   3.471 0.000525 ***
## poly(first_flr_sf, 2)1      1091272      46801  23.317  < 2e-16 ***
## poly(first_flr_sf, 2)2      -455202      41371 -11.003  < 2e-16 ***
## overall_qualPoor              21687      22509   0.964 0.335373
## overall_qualFair              31339      20635   1.519 0.128948
## overall_qualBelow_Average     52097      19850   2.624 0.008723 **
## overall_qualAverage           70811      19733   3.588 0.000338 ***
## overall_qualAbove_Average     98300      19738   4.980 6.72e-07 ***
## overall_qualGood             135146      19758   6.840 9.60e-12 ***
## overall_qualVery_Good        185325      19837   9.342  < 2e-16 ***
## overall_qualExcellent        268326      20136  13.325  < 2e-16 ***
## overall_qualVery_Excellent   349972      21155  16.543  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39350 on 2918 degrees of freedom
## Multiple R-squared: 0.7583, Adjusted R-squared: 0.7574
## F-statistic: 832.3 on 11 and 2918 DF, p-value: < 2.2e-16
Interpretation
The nested-model F-test gives F = 121.06 with p < 2.2e-16, far below
0.05, so the quadratic model (m2) fits significantly better than the
straight-line model (m1).
Insight
This tells us that price does not increase at a constant rate with
square footage. Larger homes may increase in value differently than
smaller ones. Ignoring this can lead to consistent pricing errors,
especially at the extremes.
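One way to make the curvature concrete is to predict prices for a few hypothetical homes that differ only in first-floor size. The sketch below refits the quadratic model and compares predictions at evenly spaced sizes. The sizes, the "Average" quality rating, and the name `new_homes` are illustrative assumptions, not values taken from the lab.

```r
library(AmesHousing)

ames <- make_ames()
names(ames) <- tolower(names(ames))

# Same quadratic specification as m2 above
m2 <- lm(sale_price ~ poly(first_flr_sf, 2) + overall_qual, data = ames)

# Hypothetical homes: identical quality rating, evenly spaced first-floor sizes
new_homes <- data.frame(
  first_flr_sf = c(800, 1200, 1600, 2000),
  overall_qual = factor("Average", levels = levels(ames$overall_qual))
)
new_homes$predicted_price <- predict(m2, newdata = new_homes)

# Change in predicted price for each additional 400 square feet
new_homes$step_change <- c(NA, diff(new_homes$predicted_price))
print(new_homes)
```

Under a straight-line model every 400-square-foot step would add the same amount. If the `step_change` column varies from row to row, that variation is the nonlinearity the F-test detected, expressed in dollars.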
Issue in the lab
The lab mentions multicollinearity but does not quantify it. Variables
like first_flr_sf and lot_area are often
strongly correlated.
Why it matters
Multicollinearity:
- Inflates standard errors
- Makes coefficients unstable
- Leads to misleading interpretations
# ---- Analysis 3: Multicollinearity Check ----
library(car)
model <- lm(sale_price ~ first_flr_sf + lot_area + overall_qual, data = ames)
# Variance Inflation Factors
vif(model)
##                  GVIF Df GVIF^(1/(2*Df))
## first_flr_sf 1.574715  1        1.254877
## lot_area     1.136578  1        1.066104
## overall_qual 1.429703  9        1.020058
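Interpretation
- VIF > 5 → moderate multicollinearity
- VIF > 10 → severe multicollinearity
- Here every GVIF is below 1.6, far under either threshold, so
multicollinearity is not a concern for this model.
Insight
If multicollinearity were high, the model would be harder to trust, and
a practical fix would be to remove one of the correlated variables or
combine them into a single feature. With values this low, neither step
is needed here.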
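To show where a VIF comes from, the sketch below computes one by hand: regress a predictor on the other predictors and apply VIF = 1 / (1 - R^2). For a term with a single coefficient, such as first_flr_sf, this should agree with the GVIF that `car::vif()` reports. The names `aux`, `size_cor`, and `vif_first_flr` are introduced here for illustration.

```r
library(AmesHousing)

ames <- make_ames()
names(ames) <- tolower(names(ames))

# Pairwise correlation between the two size-related predictors
size_cor <- cor(ames$first_flr_sf, ames$lot_area)

# Auxiliary regression: how well do the other predictors explain first_flr_sf?
aux <- lm(first_flr_sf ~ lot_area + overall_qual, data = ames)

# VIF = 1 / (1 - R^2) from the auxiliary regression
vif_first_flr <- 1 / (1 - summary(aux)$r.squared)

print(c(correlation = size_cor, vif_first_flr_sf = vif_first_flr))
```

A VIF near 1 means the other predictors explain little of first_flr_sf, so its coefficient's standard error is barely inflated.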
The original notebook already has the right ingredients. The critique mainly improves three things:
That makes the analysis more complete without changing the basic storyline of the notebook.
Housing prices reflect:
- Historical segregation
- Lending discrimination
- Neighborhood inequality
Using these data without context risks reinforcing past inequities. Example: If the model concludes that certain neighborhoods have “lower value,” it may unintentionally perpetuate socioeconomic disparities.
Insight:
The model is not neutral. It carries the history embedded in the data,
and without context, it can quietly reinforce those patterns.
The dataset lacks:
- School quality
- Crime rates
- Environmental hazards
- Proximity to transit
These factors heavily influence price but are not included, meaning the model’s “knowledge” is incomplete. This is an epistemological limitation: the model only knows what is measured, not what matters.
Insight:
Even a well-built model can miss the bigger picture. Just because
something isn’t in the dataset doesn’t mean it isn’t important.
If deployed in real estate platforms (e.g., Zillow‑style automated pricing):
- Buyers may be misled
- Sellers may be underpaid
- Neighborhoods may be stigmatized
Incorrect or biased predictions can have real financial consequences.
Insight:
These results don’t stay in a notebook — they can influence real money
decisions. That makes careful interpretation critical.
Complex models (interactions, nonlinear terms) may confuse stakeholders. Analysts must communicate:
- What the model can and cannot do
- Why certain variables matter
- Where uncertainty exists
Insight:
A model that cannot be explained clearly is hard to trust. Simplicity
and clarity go a long way.
If the model undervalues a home: Who is responsible? The analyst? The firm? The model itself?
Ethical modeling requires:
- Documentation
- Model cards
- Clear disclaimers
- Human oversight
Insight:
Responsibility doesn’t disappear just because a model is involved.
Someone still has to stand behind the outcome.
Overall, the original notebook is a solid teaching example. It uses the right method, asks a sensible question, and gives the reader a reasonable path from plot to test to interpretation.
The main critique is that the analysis should explain itself more clearly. The story gets stronger when the notebook shows the pattern, tests the pattern, and then says what the pattern means in normal language.