Introduction

Hedonic pricing is a method used to estimate the value of a good by decomposing it into the attributes that make it up. In housing markets, the observed house price reflects both structural characteristics and neighborhood characteristics. Hedonic regression allows us to estimate how much each characteristic contributes to the median housing price.

This report examines the hprice2 dataset from the wooldridge package to investigate the relationship between housing prices and several neighborhood and property characteristics.

The main research question: What is the impact of the crime rate on the median house price in a given area, holding other factors constant?

The analysis uses multiple linear regression to estimate the effect of crime while controlling for other relevant variables.

Hypotheses

Variable   Expected Effect on Price   Reason
crime      Negative                   Higher crime lowers attractiveness.
nox        Negative                   More air pollution lowers desirability.
rooms      Positive                   More rooms increase home value.
dist       Positive                   Greater distance may reflect better surroundings.
radial     Ambiguous                  Highway access can help or hurt value.
proptax    Negative                   Higher taxes reduce willingness to pay.
stratio    Negative                   Higher ratios suggest lower school quality.
lowstat    Negative                   Higher lower-status share often lowers prices.

Data Description

The hprice2 dataset contains housing and neighborhood variables commonly used in hedonic price studies. The dependent variable is price, which is the median housing price in a community. The explanatory variables capture pollution, crime, housing structure, location, public services, and neighborhood composition.

Variable Definitions

Variable   Meaning
price      Median housing price, in dollars.
crime      Crimes committed per capita.
nox        Nitrogen oxide concentration in the air, in parts per 100 million.
rooms      Average number of rooms per dwelling.
dist       Weighted distance to five employment centers.
radial     Access index to radial highways.
proptax    Property tax per $1,000 of assessed value.
stratio    Average student-teacher ratio in local schools.
lowstat    Percentage of residents considered lower status.

Exploratory Data Analysis

Exploratory data analysis helps us understand the distribution of the variables and the direction of relationships before estimating the regression model. We begin with summary statistics and then inspect pairwise correlations and scatterplots.

Summary Statistics
Variable   Mean        SD          Min      Median    Max
price      22511.51    9208.856    5000     21200     50001
crime      3.611536    8.590247    0.006    0.2565    88.976
nox        5.549783    1.158395    3.85     5.38      8.71
rooms      6.284051    0.7025938   3.56     6.21      8.78
dist       3.795751    2.106136    1.13     3.21      12.13
radial     9.549407    8.707259    1        5         24
proptax    40.82371    16.85371    18.7     33        71.1
stratio    18.45929    2.16582     12.6     19.1      22
lowstat    12.70148    7.238066    1.73     11.36     39.07
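The summary statistics above already hint at skewed distributions: for crime the mean (3.61) is far above the median (0.26). A minimal sketch of one way to quantify this from the reported numbers alone is the nonparametric skew, (mean - median) / sd. This measure is an illustrative choice and is not part of the original analysis; the values are copied from the summary statistics.

```python
# (mean, sd, median) for each variable, copied from the summary statistics.
summary = {
    "price":   (22511.51, 9208.856, 21200.0),
    "crime":   (3.611536, 8.590247, 0.2565),
    "nox":     (5.549783, 1.158395, 5.38),
    "rooms":   (6.284051, 0.7025938, 6.21),
    "dist":    (3.795751, 2.106136, 3.21),
    "radial":  (9.549407, 8.707259, 5.0),
    "proptax": (40.82371, 16.85371, 33.0),
    "stratio": (18.45929, 2.16582, 19.1),
    "lowstat": (12.70148, 7.238066, 11.36),
}


def nonparametric_skew(mean, sd, median):
    """(mean - median) / sd; bounded by [-1, 1], positive for a longer right tail."""
    return (mean - median) / sd


skew = {name: nonparametric_skew(*stats) for name, stats in summary.items()}

# Print the variables from most right-skewed to most left-skewed.
for name, value in sorted(skew.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {value:+.3f}")
```

On this measure radial, proptax, and crime show the longest right tails, and stratio is the only variable whose mean lies below its median.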

Correlation Interpretation

Appendix 1.1
Relationship         Correlation   Interpretation
price and rooms      +0.70         Strong positive relationship; more rooms are associated with higher housing prices.
price and lowstat    -0.73         Strong negative relationship; areas with more lower-status residents tend to have lower prices.
price and stratio    -0.50         Moderate negative relationship; higher student-teacher ratios are associated with lower prices.
price and proptax    -0.47         Moderate negative relationship; higher property taxes are associated with lower prices.
price and nox        -0.43         Moderate negative relationship; more pollution is associated with lower prices.
price and crime      -0.39         Moderate negative relationship; higher crime is associated with lower prices.
price and radial     -0.38         Moderate negative relationship; highway access is associated with slightly lower prices.
price and dist       +0.25         Weak positive relationship; greater distance shows only a small association with higher prices.
proptax and radial   +0.91         Very strong positive correlation; these variables move closely together and may create multicollinearity.
dist and nox         -0.77         Strong negative correlation; areas farther from employment centers tend to have lower pollution.
lowstat and rooms    -0.61         Moderate negative correlation; lower-status share is associated with fewer rooms.

Model Specification

Regression model:

\[ \text{price} = \beta_0 + \beta_1\,\text{nox} + \beta_2\,\text{crime} + \beta_3\,\text{rooms} + \beta_4\,\text{dist} + \beta_5\,\text{radial} + \beta_6\,\text{proptax} + \beta_7\,\text{stratio} + \beta_8\,\text{lowstat} + u \]

Each slope coefficient measures the partial effect of one characteristic on median housing price, holding the other included variables constant.

The coefficient on crime, \(\beta_2\), is of primary interest because it measures the partial association between the crime rate and median housing price.
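The "holding other variables constant" reading can be written as a partial derivative. As a sketch, assuming the error term has zero conditional mean given the regressors \(x\) (an assumption the report does not state explicitly):

\[ \frac{\partial\, E[\text{price} \mid x]}{\partial\, \text{crime}} = \beta_2 \]

Because the specification is linear in levels, this derivative is the same constant at every crime level and for every value of the other regressors.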

OLS Estimation

Coefficient Table

OLS Regression Results
term          estimate      std.error    statistic   p.value    conf.low      conf.high
(Intercept)   41533.9920    5065.2375      8.1998    <0.0001    31582.0736    51485.9103
nox           -1853.2270     365.3199     -5.0729    <0.0001    -2570.9887    -1135.4653
crime          -122.5444      33.8467     -3.6206     0.0003     -189.0447      -56.0440
rooms          4062.6577     416.9391      9.7440    <0.0001     3243.4772     4881.8383
dist          -1231.3879     167.9645     -7.3312    <0.0001    -1561.3959     -901.3799
radial          293.1667      65.9469      4.4455    <0.0001      163.5975      422.7359
proptax        -122.3608      34.3277     -3.5645     0.0004     -189.8062      -54.9155
stratio       -1102.6969     125.8861     -8.7595    <0.0001    -1350.0313     -855.3624
lowstat        -519.6365      47.6211    -10.9119    <0.0001     -613.2001     -426.0730
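The statistic and confidence-interval columns follow directly from the estimate and its standard error. The sketch below checks this for the crime row using only the reported numbers. It backs out the critical value implied by the reported interval and compares it with the large-sample normal value; that the report's interval uses a t distribution with 497 degrees of freedom is an assumption based on the residual degrees of freedom, not something the output states.

```python
from statistics import NormalDist

# Values copied from the crime row of the coefficient table.
estimate, std_error = -122.5444, 33.8467
conf_low, conf_high = -189.0447, -56.0440

# t statistic for the null hypothesis that the coefficient is zero.
t_stat = estimate / std_error

# Critical value implied by the reported interval: half-width / standard error.
implied_crit = (conf_high - conf_low) / (2 * std_error)

# Large-sample (normal) critical value for a two-sided 95% interval.
normal_crit = NormalDist().inv_cdf(0.975)

# Normal-approximation interval; slightly narrower than the reported one.
approx_low = estimate - normal_crit * std_error
approx_high = estimate + normal_crit * std_error

print(f"t statistic            : {t_stat:.4f}")
print(f"implied critical value : {implied_crit:.4f}")
print(f"normal critical value  : {normal_crit:.4f}")
print(f"approximate 95% CI     : ({approx_low:.2f}, {approx_high:.2f})")
```

The implied critical value sits just above the normal one, which is what a t distribution with several hundred degrees of freedom would produce.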

Variable Estimate Interpretation

Variable      Estimate      Interpretation
(Intercept)   41533.9920    Predicted median price when all predictors equal 0. That point lies far outside the data, so the value is not meaningful on its own.
nox           -1853.2270    A 1-unit increase in NOx (1 part per 100 million) is associated with about a $1,853 decrease in median house price.
crime          -122.5444    One additional crime per capita is associated with about a $123 decrease in price.
rooms          4062.6577    One additional room per dwelling, on average, is associated with about a $4,063 increase in price.
dist          -1231.3879    A 1-unit increase in weighted distance to employment centers is associated with about a $1,231 decrease in price.
radial          293.1667    A 1-unit increase in the radial highway access index is associated with about a $293 increase in price.
proptax        -122.3608    A 1-unit increase in property tax ($1 per $1,000 of assessed value) is associated with about a $122 decrease in price.
stratio       -1102.6969    One additional student per teacher is associated with about a $1,103 decrease in price.
lowstat        -519.6365    A 1-percentage-point increase in lower-status share is associated with about a $520 decrease in price.

Each interpretation holds the other regressors constant.

Model Fit

Model Fit Statistics
Statistic       Value
r.squared       0.7154
adj.r.squared   0.7108
sigma           4952.506
statistic       156.1292
p.value         <0.0001 (reported as 0)
df              8
logLik          -5018.313
AIC             10056.63
BIC             10098.89
deviance        12190077855
df.residual     497
nobs            506

The regression model explains a large share of the variation in housing prices. The \(R^2\) is 0.7154, so the eight regressors account for about 71.5% of the sample variation in median housing prices. The adjusted \(R^2\) of 0.7108 is nearly identical, which indicates that the fit is not an artifact of the number of predictors.

The overall F-statistic is 156.1292 on 8 and 497 degrees of freedom, with a p-value that rounds to 0, so the regressors are jointly statistically significant. The residual standard error is 4952.506, meaning a typical prediction error is roughly $4,950, in the same units as price. The AIC is 10056.63 and the BIC is 10098.89; these are useful mainly for comparing this model with alternative specifications fitted to the same data, where lower values indicate a better trade-off between fit and complexity.
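Several of the fit statistics are linked by simple formulas, so they can be cross-checked against each other. The sketch below recomputes adjusted \(R^2\), the F-statistic, the residual standard error, AIC, and BIC from the reported \(R^2\), sample size, deviance, and log-likelihood. It assumes the information criteria count the eight slopes, the intercept, and the error variance as parameters (ten in total); because the inputs are rounded, the results match the table only approximately.

```python
import math

# Values copied from the model fit statistics.
r2 = 0.7154
n = 506                     # observations
k = 8                       # slope coefficients
log_lik = -5018.313
deviance = 12190077855      # residual sum of squares
df_resid = n - k - 1        # 497

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic for the joint significance of all slopes.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

# Residual standard error: square root of RSS over residual degrees of freedom.
sigma = math.sqrt(deviance / df_resid)

# Information criteria; parameter count is an assumption (slopes + intercept + variance).
n_params = k + 2
aic = -2 * log_lik + 2 * n_params
bic = -2 * log_lik + math.log(n) * n_params

print(f"adjusted R^2 : {adj_r2:.4f}")
print(f"F statistic  : {f_stat:.2f}")
print(f"sigma        : {sigma:.3f}")
print(f"AIC          : {aic:.2f}")
print(f"BIC          : {bic:.2f}")
```

The small gap between the recomputed F-statistic and the reported 156.1292 comes from using \(R^2\) rounded to four decimals.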

Interpretation

The coefficient on crime is the expected change in median house price associated with a one-unit increase in crimes per capita, holding all other included variables constant. The estimate is -122.5444 and is statistically significant (p = 0.0003 with conventional standard errors), so higher crime is associated with lower housing prices: about $123 less per additional unit of crime.

The remaining coefficients are interpreted in the same way:

  • nox: effect of air pollution on price.
  • rooms: effect of an additional room on price.
  • dist: effect of distance to employment centers.
  • radial: effect of highway access.
  • proptax: effect of property taxes.
  • stratio: effect of school crowding.
  • lowstat: effect of neighborhood socioeconomic composition.

Use the sign, magnitude, and p-value together. A large coefficient matters only if it is also statistically distinguishable from zero and economically interpretable, and a statistically significant coefficient can still be too small to matter in practice.
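Because the regressors are measured in very different units, "a one-unit increase" is not comparable across variables. One hedged way to put magnitudes on a common footing is to multiply each coefficient by the sample standard deviation of its variable, giving the dollar change associated with a one-standard-deviation increase. This rescaling is an illustration and not part of the original analysis; the inputs are the reported OLS estimates and summary statistics.

```python
# OLS estimates from the coefficient table.
coef = {
    "nox": -1853.2270, "crime": -122.5444, "rooms": 4062.6577,
    "dist": -1231.3879, "radial": 293.1667, "proptax": -122.3608,
    "stratio": -1102.6969, "lowstat": -519.6365,
}

# Sample standard deviations from the summary statistics.
sd = {
    "nox": 1.158395, "crime": 8.590247, "rooms": 0.7025938,
    "dist": 2.106136, "radial": 8.707259, "proptax": 16.85371,
    "stratio": 2.16582, "lowstat": 7.238066,
}

# Dollar change in median price associated with a one-SD increase in each regressor.
one_sd_effect = {name: coef[name] * sd[name] for name in coef}

# Print from largest to smallest absolute magnitude.
for name, effect in sorted(one_sd_effect.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:8s} {effect:>10.0f}")
```

On this scale lowstat has the largest association (about $3,761 per standard deviation) and crime the smallest (about $1,053), even though crime's standard deviation is stretched by a few very high-crime communities.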

Diagnostic Checks

Multiple regression requires checking whether the model assumptions are reasonable. Two common concerns are multicollinearity and heteroskedasticity. We examine these using correlation patterns, variance inflation factors, and formal tests.

Multicollinearity

##      nox    crime    rooms     dist   radial  proptax  stratio  lowstat 
## 3.687239 1.740550 1.766832 2.576614 6.788810 6.891633 1.530528 2.446163

VIF values are:

  • nox: 3.687
  • crime: 1.741
  • rooms: 1.767
  • dist: 2.577
  • radial: 6.789
  • proptax: 6.892
  • stratio: 1.531
  • lowstat: 2.446

These values suggest that some multicollinearity is present but that it is not severe: every VIF is below the common rule-of-thumb threshold of 10. The main concern is radial and proptax, which both have VIFs near 7, consistent with their pairwise correlation of +0.91. Those two variables overlap considerably, which inflates their standard errors, although both coefficients remain statistically significant.
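A VIF can be read in two equivalent ways. Since VIF = 1 / (1 - R²), where R² comes from regressing that variable on the other regressors, each VIF implies an auxiliary R²; and its square root is the factor by which the standard error is inflated relative to the case of uncorrelated regressors. The sketch below computes both from the reported VIFs, and also the lower bound on the radial and proptax VIFs implied by their pairwise correlation of +0.91. It is an illustration built on the reported values only.

```python
import math

# VIFs copied from the output above.
vif = {
    "nox": 3.687239, "crime": 1.740550, "rooms": 1.766832, "dist": 2.576614,
    "radial": 6.788810, "proptax": 6.891633, "stratio": 1.530528, "lowstat": 2.446163,
}

# R-squared of each regressor on all the others, implied by VIF = 1 / (1 - R^2).
aux_r2 = {name: 1 - 1 / value for name, value in vif.items()}

# Factor by which each standard error is inflated relative to no collinearity.
se_inflation = {name: math.sqrt(value) for name, value in vif.items()}

# Lower bound on the VIF of radial and proptax from their pairwise correlation alone:
# the auxiliary R^2 cannot be smaller than the squared pairwise correlation.
r_pair = 0.91
vif_floor = 1 / (1 - r_pair ** 2)

for name in vif:
    print(f"{name:8s} aux R^2 = {aux_r2[name]:.3f}  SE inflation = {se_inflation[name]:.2f}")
print(f"VIF floor implied by r = {r_pair}: {vif_floor:.2f}")
```

For radial and proptax the implied auxiliary R² is about 0.85, so their standard errors are roughly 2.6 times larger than they would be with uncorrelated regressors; most of that comes from their correlation with each other.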

Heteroskedasticity Tests

## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 48.226, df = 8, p-value = 8.946e-08
## 
##  Goldfeld-Quandt test
## 
## data:  model
## GQ = 2.3308, df1 = 244, df2 = 244, p-value = 3.815e-11
## alternative hypothesis: variance increases from segment 1 to 2
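The Breusch-Pagan p-value above is the upper-tail probability of a chi-square distribution with 8 degrees of freedom evaluated at the test statistic. For an even number of degrees of freedom that tail has a closed form, so it can be checked with the standard library alone. The helper name below is hypothetical and the sketch covers only even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k        # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom from the studentized Breusch-Pagan output.
bp_stat, bp_df = 48.226, 8
p_value = chi2_sf_even_df(bp_stat, bp_df)

print(f"Breusch-Pagan p-value: {p_value:.3e}")
```

The result should agree with the reported 8.946e-08 up to the rounding of the test statistic.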

Robust Standard Errors

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 41533.992   7450.645  5.5746 4.077e-08 ***
## nox         -1853.227    333.743 -5.5529 4.583e-08 ***
## crime        -122.544     24.511 -4.9996 7.976e-07 ***
## rooms        4062.658    759.890  5.3464 1.370e-07 ***
## dist        -1231.388    189.161 -6.5097 1.844e-10 ***
## radial        293.167     63.751  4.5986 5.403e-06 ***
## proptax      -122.361     28.679 -4.2666 2.378e-05 ***
## stratio     -1102.697    111.570 -9.8835 < 2.2e-16 ***
## lowstat      -519.637     89.965 -5.7760 1.350e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both heteroskedasticity tests reject constant variance:

  • Breusch-Pagan test: p-value = 8.946e-08
  • Goldfeld-Quandt test: p-value = 3.815e-11

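This means the error variance is not constant across observations, so the homoskedasticity assumption behind the conventional OLS standard errors is violated. The coefficient estimates are unchanged, but the conventional standard errors, t statistics, and confidence intervals are unreliable. The robust standard errors reported above are therefore the better basis for inference; all eight slope coefficients remain significant at the 0.1% level under them.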
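To see how much the correction matters, the conventional and robust standard errors can be compared coefficient by coefficient. The sketch below uses only the numbers reported in the two coefficient tables. The report does not say which heteroskedasticity-consistent estimator produced the robust values, and the code makes no assumption about it.

```python
# (estimate, conventional SE, robust SE) copied from the two coefficient tables.
coefs = {
    "nox":     (-1853.2270, 365.3199, 333.743),
    "crime":   (-122.5444, 33.8467, 24.511),
    "rooms":   (4062.6577, 416.9391, 759.890),
    "dist":    (-1231.3879, 167.9645, 189.161),
    "radial":  (293.1667, 65.9469, 63.751),
    "proptax": (-122.3608, 34.3277, 28.679),
    "stratio": (-1102.6969, 125.8861, 111.570),
    "lowstat": (-519.6365, 47.6211, 89.965),
}

# Ratio above 1 means the conventional SE understated the uncertainty.
se_ratio = {name: robust / ols for name, (_, ols, robust) in coefs.items()}

# t statistics recomputed with the robust standard errors.
robust_t = {name: est / robust for name, (est, _, robust) in coefs.items()}

for name in coefs:
    print(f"{name:8s} robust/OLS SE = {se_ratio[name]:.2f}  robust t = {robust_t[name]:+.2f}")
```

The correction is uneven: the robust standard errors for rooms and lowstat are nearly twice the conventional ones, while the one for crime is smaller, so the crime coefficient is estimated somewhat more precisely than the conventional output suggests.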

Discussion

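The results support the main hypothesis: the coefficient on crime is negative and statistically significant under both conventional and robust standard errors, which is consistent with more crime lowering housing prices. The coefficient on rooms is positive and significant, consistent with larger homes commanding higher prices. Most other signs match the hypotheses as well: nox, proptax, stratio, and lowstat are all negative. The exception is dist, which was expected to be positive but is estimated to be negative once the other variables are held constant; radial, whose expected sign was ambiguous, is positive.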

Limitations

Conclusion

Overall, the hedonic regression shows that crime is significantly associated with lower housing prices, even after controlling for pollution, rooms, distance, highway access, property taxes, school quality, and neighborhood composition. The model fits the data well, with an \(R^2\) of 0.7154, but the presence of heteroskedasticity means robust standard errors should be used for inference.

References