Goal: 1. Business scenario

A real estate advisory team in Ames wants to understand whether building type is associated with differences in sale price. The goal is not to predict every home perfectly. The goal is simpler: help agents and clients understand whether some building types tend to sell for more than others, and whether those differences are large enough to matter in practice.

This is a good fit for the original lab because the response is numeric (sale_price) and the grouping variable is categorical (bldg_type). That makes a one-way ANOVA a natural first step.

1.1 Customer or audience

The main audience is a small internal real estate team: listing agents, pricing analysts, and managers who need a short, defensible summary of the local housing market. A second audience would be clients who want a plain-language explanation of why one property category may be priced differently from another.

1.2 Problem statement

The team wants to know whether building type is linked to meaningful differences in sale price so they can improve listing guidance, pricing conversations, and market summaries. A good result here would help answer a practical question: should we treat building types differently when discussing expected sale prices in Ames?

1.3 Scope

This critique stays inside the original notebook’s scope. It uses the variables already present in the lab, mainly sale_price and bldg_type. The original workflow already includes:

  • a boxplot of sale price by building type,
  • a one-way ANOVA,
  • pairwise t-tests with Bonferroni correction,
  • and bootstrapped confidence intervals.

That is enough to make a useful critique. The main issue is not that the notebook lacks data. The issue is that the results could be explained more clearly and checked more carefully.

1.4 Objective

The objective is to determine whether building type is associated with differences in sale price, and to identify which comparisons are worth paying attention to. In plain language, the team should be able to finish this notebook and say:

  • whether building type matters,
  • which groups seem to differ,
  • how confident we are in those differences,
  • and what limitations still remain.

Goal: 2. Model critique and improved analysis

library(tidyverse)
library(ggthemes)
library(ggrepel)
library(AmesHousing)
library(boot)
library(broom)
library(lindia)
library(car)


options(scipen = 6)
theme_set(theme_minimal())

# Create dataset
ames <- make_ames() |>
  rename_with(tolower)

# Create ames_basic (subset of relevant variables)
ames_basic <- ames |>
  select(sale_price, first_flr_sf, lot_area, overall_qual, bldg_type)

# One-way ANOVA from the original notebook
m <- aov(sale_price ~ bldg_type, data = ames)
summary(m)
##               Df         Sum Sq      Mean Sq F value Pr(>F)    
## bldg_type      4   645411087404 161352771851   26.15 <2e-16 ***
## Residuals   2925 18047126022947   6169957615                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.1 What the original notebook gets right

The original notebook chooses the right tool for the question. Since the analysis is comparing more than two groups, ANOVA is a better starting point than running a series of plain t-tests. It also moves in the right direction by following the ANOVA with pairwise comparisons and confidence intervals.

What the notebook does not always do well is slow down and explain what each result means in plain language. The critique below keeps the original logic, but makes the interpretation a little more careful and a little more useful.

2.2 Improved analysis 1: start with the data, then check the model

ames |>
  ggplot(aes(y = sale_price, x = bldg_type)) +
  geom_boxplot() +
  scale_y_log10(labels = \(x) paste0('$', x / 1000, 'K')) +
  annotation_logticks(sides = 'l') +
  labs(
    title = "Sale Price by Building Type",
    x = "Building Type",
    y = "Sale Price (log scale)"
  )

This boxplot is a good first look because it makes the group structure visible right away. Some building types appear to sit at noticeably different price levels, and some groups also seem more spread out than others.

The important point is that the plot suggests a difference, but it does not prove one. It gives the analysis a reason to continue, not a final answer.

summary(m)
##               Df         Sum Sq      Mean Sq F value Pr(>F)    
## bldg_type      4   645411087404 161352771851   26.15 <2e-16 ***
## Residuals   2925 18047126022947   6169957615                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aug_m <- augment(m)
## Warning: The `augment()` method for objects of class `aov` is not maintained by the broom team, and is only supported through the `lm` tidier method. Please be cautious in interpreting and reporting broom output.
## 
## This warning is displayed once per session.
p_resid <- ggplot(aug_m, aes(.fitted, .resid)) +
  geom_point(alpha = 0.35) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted values",
    y = "Residuals"
  )

p_qq <- ggplot(aug_m, aes(sample = .std.resid)) +
  stat_qq(alpha = 0.35) +
  stat_qq_line() +
  labs(
    title = "Normal Q-Q Plot of Standardized Residuals",
    x = "Theoretical quantiles",
    y = "Standardized residuals"
  )

p_resid

p_qq

The residual plots matter because ANOVA depends on assumptions. The residuals-vs-fitted plot checks whether the spread is fairly even across predicted values. The Q-Q plot checks whether the residuals are close enough to normal for the test to behave well.

The main takeaway is that the notebook should not only state the assumptions. It should show evidence that the assumptions are at least reasonable. That makes the analysis more honest and easier to defend.
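One way to back the plots with numbers is sketched below. It is not part of the original notebook: it assumes the same `ames` data and adds Levene's test for equal variances, Welch's one-way test (which does not assume equal variances), and the same ANOVA on log sale price to match the log axis used in the boxplot. The names `levene_res`, `welch_res`, and `m_log` are introduced here for illustration only.

```r
library(dplyr)
library(AmesHousing)
library(car)

# Rebuild the data exactly as the notebook does
ames <- make_ames() |>
  rename_with(tolower)

# Levene's test: the null hypothesis is equal variance across building types
levene_res <- leveneTest(sale_price ~ bldg_type, data = ames)
levene_res

# Welch's one-way test: compares group means without assuming equal variances
welch_res <- oneway.test(sale_price ~ bldg_type, data = ames, var.equal = FALSE)
welch_res

# Same one-way ANOVA on the log scale (assumption: log prices are closer to normal)
m_log <- aov(log(sale_price) ~ bldg_type, data = ames)
summary(m_log)
```

If Levene's test rejects equal variances, the Welch result is the safer one to report alongside the classic ANOVA table.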

2.3 Improved analysis 2: add effect size, not just a p-value

aov_table <- summary(m)[[1]]
eta_sq <- aov_table$`Sum Sq`[1] / sum(aov_table$`Sum Sq`)
omega_sq <- (aov_table$`Sum Sq`[1] - aov_table$`Df`[1] * aov_table$`Mean Sq`[2]) /
  (sum(aov_table$`Sum Sq`) + aov_table$`Mean Sq`[2])

eta_sq
## [1] 0.03452774
omega_sq
## [1] 0.03319648

A significant p-value tells us the group means are not all the same. That is useful, but it is only part of the story. Effect size helps answer a more practical question: how much of the variation in sale price is actually explained by building type?

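That matters because a result can be statistically significant and still be modest in practical terms. Here eta squared is about 0.035 and omega squared about 0.033, so building type alone accounts for roughly 3% to 3.5% of the variation in sale price. The F-test is highly significant largely because the sample is large (2,930 sales), not because the effect is large. For a business audience, size usually matters as much as significance.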
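As a quick sanity check on the effect size, eta squared for a one-way ANOVA is the same quantity as R-squared from the equivalent linear model. The sketch below assumes the same `ames` data and model; `r_sq` and `resid_sd` are names introduced here, not taken from the original notebook.

```r
library(dplyr)
library(AmesHousing)

ames <- make_ames() |>
  rename_with(tolower)

# Eta squared from the ANOVA table, as in the notebook
m <- aov(sale_price ~ bldg_type, data = ames)
aov_table <- summary(m)[[1]]
eta_sq <- aov_table$`Sum Sq`[1] / sum(aov_table$`Sum Sq`)

# R-squared from the same model fitted with lm()
r_sq <- summary(lm(sale_price ~ bldg_type, data = ames))$r.squared

# The two should agree up to floating-point error
all.equal(eta_sq, r_sq)

# Residual standard deviation: typical spread in price left within a building type
resid_sd <- sqrt(aov_table$`Mean Sq`[2])
resid_sd
```

The residual standard deviation is in dollars, which can be easier for a business audience to read than a proportion of variance.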

summary_tbl <- ames |>
  group_by(bldg_type) |>
  summarise(
    n = n(),
    mean_price = mean(sale_price),
    median_price = median(sale_price),
    .groups = "drop"
  ) |>
  arrange(desc(mean_price))

summary_tbl
## # A tibble: 5 × 4
##   bldg_type     n mean_price median_price
##   <fct>     <int>      <dbl>        <dbl>
## 1 TwnhsE      233    192312.       180000
## 2 OneFam     2425    184812.       165000
## 3 Duplex      109    139809.       136905
## 4 Twnhs       101    135934.       130000
## 5 TwoFmCon     62    125582.       122250
summary_tbl |>
  ggplot(aes(x = reorder(bldg_type, mean_price), y = mean_price)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Average Sale Price by Building Type",
    x = "Building Type",
    y = "Average Sale Price"
  )

This plot is easier to explain to a non-technical audience than a table of coefficients. It shows the group means in a direct way, and it makes the overall ranking easier to see.

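The insight here is simple: some building types are clearly priced above others on average (TwnhsE at about $192K and OneFam at about $185K, versus roughly $126K to $140K for the other three), but the gap is not identical across every comparison. The group sizes are also very uneven: 2,425 of the 2,930 sales are OneFam, while TwoFmCon has only 62, so the smaller groups are estimated less precisely. That means the grouping variable is useful, but it is not the whole story.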
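To see which specific pairs of building types differ, the pairwise t-tests with Bonferroni correction mentioned in the scope can be run as sketched below. Tukey's HSD is added here as an alternative that also reports the size of each gap; it is an assumption of this sketch, not something the original notebook is stated to use. The names `pw` and `tukey` are illustrative.

```r
library(dplyr)
library(AmesHousing)

ames <- make_ames() |>
  rename_with(tolower)

# Pairwise t-tests with Bonferroni-adjusted p-values
pw <- pairwise.t.test(
  ames$sale_price,
  ames$bldg_type,
  p.adjust.method = "bonferroni"
)
pw

# Tukey's HSD: difference in means, confidence interval, and adjusted p-value
# for each of the 10 pairs of building types
m <- aov(sale_price ~ bldg_type, data = ames)
tukey <- TukeyHSD(m)
tukey
```

The Tukey table is often the easier one to present, because each row gives a dollar difference with an interval rather than only a p-value.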

2.4 Checking linearity and adding nonlinear terms

Issue in the lab
The lab assumes a linear relationship between predictors and sale price, but this is not tested. In practice, variables like first_flr_sf often have a curved (nonlinear) relationship with price.

Why it matters
If the relationship is not linear, a straight-line model will systematically overestimate or underestimate prices at different ranges. This reduces accuracy, especially for very small or very large homes.

# ---- Analysis 3: Linearity Check ----

# Baseline linear model
m1 <- lm(sale_price ~ first_flr_sf + overall_qual, data = ames)

# Model with nonlinear term (quadratic)
m2 <- lm(sale_price ~ poly(first_flr_sf, 2) + overall_qual, data = ames)

# Compare models
anova(m1, m2)
## Analysis of Variance Table
## 
## Model 1: sale_price ~ first_flr_sf + overall_qual
## Model 2: sale_price ~ poly(first_flr_sf, 2) + overall_qual
##   Res.Df           RSS Df    Sum of Sq      F    Pr(>F)    
## 1   2919 4705171693504                                     
## 2   2918 4517739506002  1 187432187502 121.06 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Summary of improved model
summary(m2)
## 
## Call:
## lm(formula = sale_price ~ poly(first_flr_sf, 2) + overall_qual, 
##     data = ames)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -280280  -21634   -1688   17908  286496 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   68350      19690   3.471 0.000525 ***
## poly(first_flr_sf, 2)1      1091272      46801  23.317  < 2e-16 ***
## poly(first_flr_sf, 2)2      -455202      41371 -11.003  < 2e-16 ***
## overall_qualPoor              21687      22509   0.964 0.335373    
## overall_qualFair              31339      20635   1.519 0.128948    
## overall_qualBelow_Average     52097      19850   2.624 0.008723 ** 
## overall_qualAverage           70811      19733   3.588 0.000338 ***
## overall_qualAbove_Average     98300      19738   4.980 6.72e-07 ***
## overall_qualGood             135146      19758   6.840 9.60e-12 ***
## overall_qualVery_Good        185325      19837   9.342  < 2e-16 ***
## overall_qualExcellent        268326      20136  13.325  < 2e-16 ***
## overall_qualVery_Excellent   349972      21155  16.543  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39350 on 2918 degrees of freedom
## Multiple R-squared:  0.7583, Adjusted R-squared:  0.7574 
## F-statistic: 832.3 on 11 and 2918 DF,  p-value: < 2.2e-16

Interpretation
The model comparison gives F = 121.06 with p < 2.2e-16, well below 0.05, so the quadratic model (m2) fits significantly better than the linear one (m1).

Insight
This tells us that price does not increase at a constant rate with square footage. The negative coefficient on the quadratic term suggests diminishing returns: each additional square foot of first-floor area adds less to the price as homes get larger. Ignoring this can lead to consistent pricing errors, especially at the extremes.
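A simple way to make the curvature visible is sketched below, assuming the same `ames` data and the two models above. It compares the models by AIC and overlays a straight-line fit and a quadratic fit on the raw scatterplot. Note that the plotted curves ignore overall_qual, so they illustrate the shape only and are not the fitted m1 and m2. The names `aic_tbl` and `p_curve` are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)
library(AmesHousing)

ames <- make_ames() |>
  rename_with(tolower)

m1 <- lm(sale_price ~ first_flr_sf + overall_qual, data = ames)
m2 <- lm(sale_price ~ poly(first_flr_sf, 2) + overall_qual, data = ames)

# Lower AIC indicates the better trade-off between fit and complexity
aic_tbl <- AIC(m1, m2)
aic_tbl

# Dashed line: linear fit; solid line: quadratic fit (square footage only)
p_curve <- ggplot(ames, aes(x = first_flr_sf, y = sale_price)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed") +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  labs(
    title = "Sale Price vs First-Floor Area",
    x = "First-floor area (sq ft)",
    y = "Sale price"
  )

p_curve
```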


2.5 Addressing multicollinearity

Issue in the lab
The lab mentions multicollinearity but does not quantify it. Variables like first_flr_sf and lot_area are often strongly correlated.

Why it matters
Multicollinearity:
- Inflates standard errors
- Makes coefficients unstable
- Leads to misleading interpretations

# ---- Analysis 4: Multicollinearity Check ----

model <- lm(sale_price ~ first_flr_sf + lot_area + overall_qual, data = ames)

# Variance Inflation Factors
vif(model)
##                  GVIF Df GVIF^(1/(2*Df))
## first_flr_sf 1.574715  1        1.254877
## lot_area     1.136578  1        1.066104
## overall_qual 1.429703  9        1.020058

Interpretation
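- VIF > 5 → moderate multicollinearity
- VIF > 10 → severe multicollinearity

Here every GVIF is below 1.6, far under either threshold, so multicollinearity is not a concern in this model. For overall_qual, which uses 9 degrees of freedom, the comparable figure is GVIF^(1/(2*Df)), which behaves like the square root of a VIF; at about 1.02 it is also negligible.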

Insight
If multicollinearity were high, the model would be harder to trust, and a practical fix would be to remove one of the correlated variables or combine them into a single feature. With VIF values this low, no such fix is needed for this model.
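To show where a VIF comes from, the sketch below recomputes it by hand for first_flr_sf: regress that predictor on the other predictors and take 1 / (1 - R²). For a term with one degree of freedom this should match the GVIF reported by `car::vif()`. It assumes the same `ames` data and model; `cor_sf_lot`, `aux`, `vif_manual`, and `vif_car` are names introduced here for illustration.

```r
library(dplyr)
library(AmesHousing)
library(car)

ames <- make_ames() |>
  rename_with(tolower)

# Raw correlation between the two numeric predictors
cor_sf_lot <- cor(ames$first_flr_sf, ames$lot_area)
cor_sf_lot

# Auxiliary regression: how well do the other predictors explain first_flr_sf?
aux <- lm(first_flr_sf ~ lot_area + overall_qual, data = ames)
vif_manual <- 1 / (1 - summary(aux)$r.squared)

# Value reported by car::vif() for the same term
model <- lm(sale_price ~ first_flr_sf + lot_area + overall_qual, data = ames)
vif_car <- vif(model)["first_flr_sf", "GVIF"]

# The two should agree up to floating-point error
all.equal(vif_manual, vif_car, tolerance = 1e-6)
```

Reading the VIF this way makes the thresholds easier to explain: a VIF of 5 means the other predictors already explain 80% of that variable's variation.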

2.6 What the critique changes

The original notebook already has the right ingredients. The critique mainly improves three things:

  1. it checks the model more carefully,
  2. it explains the size of the result instead of only the significance,
  3. and it gives the reader a clearer sense of uncertainty.

That makes the analysis more complete without changing the basic storyline of the notebook.
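The uncertainty point can be made concrete with a bootstrapped confidence interval for the difference in mean sale price between two building types. The sketch below assumes the same `ames` data; the pair (OneFam versus Twnhs), the number of resamples, the seed, and the helper `mean_diff` are all illustrative choices, not taken from the original notebook.

```r
library(dplyr)
library(AmesHousing)
library(boot)

ames <- make_ames() |>
  rename_with(tolower)

# Keep only the two groups being compared
two_groups <- ames |>
  filter(bldg_type %in% c("OneFam", "Twnhs")) |>
  mutate(bldg_type = droplevels(bldg_type))

# Statistic: difference in mean sale price on one bootstrap resample
mean_diff <- function(data, idx) {
  d <- data[idx, ]
  mean(d$sale_price[d$bldg_type == "OneFam"]) -
    mean(d$sale_price[d$bldg_type == "Twnhs"])
}

set.seed(1)

# Resample within each building type so group sizes stay fixed
boot_res <- boot(
  two_groups,
  statistic = mean_diff,
  R = 2000,
  strata = two_groups$bldg_type
)

# Percentile interval for the difference in means
boot_ci <- boot.ci(boot_res, type = "perc")
boot_ci
```

An interval that stays well away from zero supports the claim that the gap is real; its width shows how precisely the gap is estimated.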


Goal: 3. Critical thinking and broader impact

Ethical & Epistemological Concerns

1. Bias in Housing Data

Housing prices reflect:

  • historical segregation,
  • lending discrimination,
  • and neighborhood inequality.

Using these data without context risks reinforcing past inequities. Example: if the model concludes that certain neighborhoods have “lower value,” it may unintentionally perpetuate socioeconomic disparities.

Insight:
The model is not neutral. It carries the history embedded in the data, and without context, it can quietly reinforce those patterns.


2. Unmeasured but Crucial Variables

The dataset lacks:

  • school quality,
  • crime rates,
  • environmental hazards,
  • and proximity to transit.

These factors heavily influence price but are not included, meaning the model’s “knowledge” is incomplete. This is an epistemological limitation: the model only knows what is measured, not what matters.

Insight:
Even a well-built model can miss the bigger picture. Just because something isn’t in the dataset doesn’t mean it isn’t important.


3. Risk of Misuse

If deployed in real estate platforms (e.g., Zillow-style automated pricing):

  • buyers may be misled,
  • sellers may be underpaid,
  • and neighborhoods may be stigmatized.

Incorrect or biased predictions can have real financial consequences.

Insight:
These results don’t stay in a notebook — they can influence real money decisions. That makes careful interpretation critical.


4. Transparency & Interpretability

Complex models (interactions, nonlinear terms) may confuse stakeholders. Analysts must communicate:

  • what the model can and cannot do,
  • why certain variables matter,
  • and where uncertainty exists.

Insight:
A model that cannot be explained clearly is hard to trust. Simplicity and clarity go a long way.


5. Accountability

If the model undervalues a home, who is responsible? The analyst? The firm? The model itself? Ethical modeling requires:

  • documentation,
  • model cards,
  • clear disclaimers,
  • and human oversight.

Insight:
Responsibility doesn’t disappear just because a model is involved. Someone still has to stand behind the outcome.

4. Closing note

Overall, the original notebook is a solid teaching example. It uses the right method, asks a sensible question, and gives the reader a reasonable path from plot to test to interpretation.

The main critique is that the analysis should explain itself more clearly. The story gets stronger when the notebook shows the pattern, tests the pattern, and then says what the pattern means in normal language.