Hedonic pricing is a method used to estimate the value of a good by decomposing it into the attributes that make it up. In housing markets, the observed house price reflects both structural characteristics and neighborhood characteristics. Hedonic regression allows us to estimate how much each characteristic contributes to the median housing price.
This report examines the hprice2 dataset from the
wooldridge package to investigate the relationship between
housing prices and several neighborhood and property
characteristics.
The main research question: What is the impact of the crime rate on the median house price in a given area, holding other factors constant?
The analysis uses multiple linear regression to estimate the effect of crime while controlling for other relevant variables.
| Variable | Expected Effect on Price | Reason |
|---|---|---|
| crime | Negative | Higher crime lowers attractiveness. |
| nox | Negative | More air pollution lowers desirability. |
| rooms | Positive | More rooms increase home value. |
| dist | Positive | Greater distance may reflect better surroundings. |
| radial | Ambiguous | Highway access can help or hurt value. |
| proptax | Negative | Higher taxes reduce willingness to pay. |
| stratio | Negative | Higher ratios suggest lower school quality. |
| lowstat | Negative | Higher lower-status share often lowers prices. |
The hprice2 dataset contains housing and neighborhood
variables commonly used in hedonic price studies. The dependent variable
is price, which is the median housing price in a community.
The explanatory variables capture pollution, crime, housing structure,
location, public services, and neighborhood composition.
| Variable | Meaning |
|---|---|
| price | Median housing price, dollars. |
| crime | Crimes committed per capita. |
| nox | Nitrogen oxide concentration in the air, measured in parts per 100 million. |
| rooms | Average number of rooms per dwelling. |
| dist | Weighted distance to five employment centers. |
| radial | Access index to radial highways. |
| proptax | Property tax per $1,000 of assessed value. |
| stratio | Average student-teacher ratio in local schools. |
| lowstat | Percentage of residents considered lower status. |
Exploratory data analysis helps us understand the distribution of the variables and the direction of relationships before estimating the regression model. We begin with summary statistics and then inspect pairwise correlations and scatterplots.
| Variable | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|
| price | 22511.51 | 9208.856 | 5000 | 21200 | 50001 |
| crime | 3.611536 | 8.590247 | 0.006 | 0.2565 | 88.976 |
| nox | 5.549783 | 1.158395 | 3.85 | 5.38 | 8.71 |
| rooms | 6.284051 | 0.7025938 | 3.56 | 6.21 | 8.78 |
| dist | 3.795751 | 2.106136 | 1.13 | 3.21 | 12.13 |
| radial | 9.549407 | 8.707259 | 1 | 5 | 24 |
| proptax | 40.82371 | 16.85371 | 18.7 | 33 | 71.1 |
| stratio | 18.45929 | 2.16582 | 12.6 | 19.1 | 22 |
| lowstat | 12.70148 | 7.238066 | 1.73 | 11.36 | 39.07 |
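The summary statistics already hint at which variables are skewed. As a rough illustration that is not part of the original analysis, the sketch below takes the reported mean, standard deviation, and median and computes two simple indicators: the coefficient of variation (sd / mean) and Pearson's second skewness coefficient, 3(mean - median) / sd. The dictionary and function names are illustrative choices.

```python
# Reported (mean, sd, median) for each variable, copied from the summary statistics.
summary = {
    "price":   (22511.51, 9208.856, 21200.0),
    "crime":   (3.611536, 8.590247, 0.2565),
    "nox":     (5.549783, 1.158395, 5.38),
    "rooms":   (6.284051, 0.7025938, 6.21),
    "dist":    (3.795751, 2.106136, 3.21),
    "radial":  (9.549407, 8.707259, 5.0),
    "proptax": (40.82371, 16.85371, 33.0),
    "stratio": (18.45929, 2.165820, 19.1),
    "lowstat": (12.70148, 7.238066, 11.36),
}


def skew_indicators(mean, sd, median):
    """Return (coefficient of variation, Pearson's second skewness coefficient)."""
    cv = sd / mean
    pearson2 = 3 * (mean - median) / sd
    return cv, pearson2


indicators = {name: skew_indicators(*stats) for name, stats in summary.items()}

for name, (cv, skew) in indicators.items():
    print(f"{name:8s} CV = {cv:5.2f}   skewness indicator = {skew:5.2f}")
```

For crime, the mean is far above the median and the standard deviation is more than twice the mean, which points to a long right tail: a few communities have very high crime rates.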
| Relationship | Correlation | Interpretation |
|---|---|---|
| price and rooms | +0.70 | Strong positive relationship; more rooms are associated with higher housing prices. |
| price and lowstat | -0.73 | Strong negative relationship; areas with more lower-status residents tend to have lower prices. |
| price and stratio | -0.50 | Moderate negative relationship; higher student-teacher ratios are associated with lower prices. |
| price and proptax | -0.47 | Moderate negative relationship; higher property taxes are associated with lower prices. |
| price and nox | -0.43 | Moderate negative relationship; more pollution is associated with lower prices. |
| price and crime | -0.39 | Moderate negative relationship; higher crime is associated with lower prices. |
| price and radial | -0.38 | Moderate negative relationship; highway access is associated with slightly lower prices. |
| price and dist | +0.25 | Weak positive relationship; greater distance shows only a small association with higher prices. |
| proptax and radial | +0.91 | Very strong positive correlation; these variables move closely together and may create multicollinearity. |
| dist and nox | -0.77 | Strong negative correlation; areas farther from employment centers tend to have lower pollution. |
| lowstat and rooms | -0.61 | Moderate negative correlation; lower-status share is associated with fewer rooms. |
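The pairwise correlations between regressors give an early warning about multicollinearity. One way to see this, sketched here under the assumption that the rounded correlations in the table are accurate enough: if two regressors have correlation r, regressing one on the other alone already gives an R-squared of r squared, so the variance inflation factor of each is at least 1 / (1 - r^2). The function name below is illustrative.

```python
def vif_lower_bound(r):
    """Smallest VIF consistent with a pairwise correlation r between two regressors."""
    return 1.0 / (1.0 - r ** 2)


# Rounded correlations copied from the correlation table.
r_proptax_radial = 0.91
r_dist_nox = -0.77

bound_proptax_radial = vif_lower_bound(r_proptax_radial)
bound_dist_nox = vif_lower_bound(r_dist_nox)

print(f"proptax/radial: VIF of at least {bound_proptax_radial:.2f}")
print(f"dist/nox:       VIF of at least {bound_dist_nox:.2f}")
```

These bounds are consistent with the variance inflation factors reported in the diagnostics: the values for radial and proptax are close to 7, and those for nox and dist are above 2.5.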
Regression model:
\[ \text{price} = \beta_0 + \beta_1\,\text{nox} + \beta_2\,\text{crime} + \beta_3\,\text{rooms} + \beta_4\,\text{dist} + \beta_5\,\text{radial} + \beta_6\,\text{proptax} + \beta_7\,\text{stratio} + \beta_8\,\text{lowstat} + u \]
Here \(u\) is the error term. Because all regressors enter the model together, each coefficient is the partial effect of that characteristic on median housing price, holding the other variables constant.
The coefficient on crime is of primary interest because
it measures how crime affects housing prices.
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 41533.9920 | 5065.2375 | 8.1998 | < 0.0001 | 31582.0736 | 51485.9103 |
| nox | -1853.2270 | 365.3199 | -5.0729 | < 0.0001 | -2570.9887 | -1135.4653 |
| crime | -122.5444 | 33.8467 | -3.6206 | 0.0003 | -189.0447 | -56.0440 |
| rooms | 4062.6577 | 416.9391 | 9.7440 | < 0.0001 | 3243.4772 | 4881.8383 |
| dist | -1231.3879 | 167.9645 | -7.3312 | < 0.0001 | -1561.3959 | -901.3799 |
| radial | 293.1667 | 65.9469 | 4.4455 | < 0.0001 | 163.5975 | 422.7359 |
| proptax | -122.3608 | 34.3277 | -3.5645 | 0.0004 | -189.8062 | -54.9155 |
| stratio | -1102.6969 | 125.8861 | -8.7595 | < 0.0001 | -1350.0313 | -855.3624 |
| lowstat | -519.6365 | 47.6211 | -10.9119 | < 0.0001 | -613.2001 | -426.0730 |
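The columns of the coefficient table are linked by simple arithmetic. As a small check using the crime row, the sketch below recomputes the t-statistic as estimate / std.error and backs out the critical value implied by the reported confidence interval. Since the Python standard library has no t-distribution quantile, the normal quantile is used as an approximation, which is an assumption that works here only because the residual degrees of freedom (497) are large.

```python
from statistics import NormalDist

# Crime row of the coefficient table.
estimate, std_error = -122.5444, 33.8467
conf_low, conf_high = -189.0447, -56.0440

# The t-statistic is the estimate divided by its standard error.
t_stat = estimate / std_error

# Critical value implied by the reported interval: (upper bound - estimate) / SE.
implied_crit = (conf_high - estimate) / std_error

# Normal approximation to the 97.5% t quantile (reasonable with 497 df).
z_crit = NormalDist().inv_cdf(0.975)
approx_ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)

print(f"t statistic:            {t_stat:.4f}")
print(f"implied critical value: {implied_crit:.4f}")
print(f"approximate 95% CI:     ({approx_ci[0]:.2f}, {approx_ci[1]:.2f})")
```

The implied critical value is slightly above the normal quantile of about 1.96, as expected for a t distribution with finite degrees of freedom, so the approximate interval is marginally narrower than the reported one.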
| Variable | Estimate | Interpretation (holding the other variables constant) |
|---|---|---|
| (Intercept) | 41533.9920 | Expected median price when all predictors are 0. This is not meaningful on its own, because no community in the data has zero rooms or a zero student-teacher ratio. |
| nox | -1853.2270 | A 1-unit increase in NOx (1 part per 100 million) is associated with about a $1,853 decrease in median house price. |
| crime | -122.5444 | A 1-unit increase in crimes per capita is associated with about a $123 decrease in price. |
| rooms | 4062.6577 | One additional room in the average dwelling is associated with about a $4,063 increase in price. |
| dist | -1231.3879 | A 1-unit increase in weighted distance to employment centers is associated with about a $1,231 decrease in price. |
| radial | 293.1667 | A 1-unit increase in the radial highway access index is associated with about a $293 increase in price. |
| proptax | -122.3608 | A 1-unit increase in the property tax rate is associated with about a $122 decrease in price. |
| stratio | -1102.6969 | One additional student per teacher is associated with about a $1,103 decrease in price. |
| lowstat | -519.6365 | A 1-percentage-point increase in lower-status share is associated with about a $520 decrease in price. |
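To make these interpretations concrete, the sketch below plugs values into the estimated equation. It evaluates the fitted price at the sample means from the summary statistics, then adds one room while holding everything else fixed. The `predict` helper and variable names are illustrative and not part of the original analysis.

```python
# Coefficient estimates copied from the regression table.
intercept = 41533.9920
coefs = {
    "nox": -1853.2270, "crime": -122.5444, "rooms": 4062.6577,
    "dist": -1231.3879, "radial": 293.1667, "proptax": -122.3608,
    "stratio": -1102.6969, "lowstat": -519.6365,
}

# Sample means copied from the summary statistics.
means = {
    "nox": 5.549783, "crime": 3.611536, "rooms": 6.284051,
    "dist": 3.795751, "radial": 9.549407, "proptax": 40.82371,
    "stratio": 18.45929, "lowstat": 12.70148,
}


def predict(x):
    """Fitted median price for a dict of regressor values."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)


price_at_means = predict(means)

# Ceteris paribus change: one more room, all other regressors at their means.
one_more_room = dict(means, rooms=means["rooms"] + 1)
room_effect = predict(one_more_room) - price_at_means

print(f"fitted price at the means: {price_at_means:,.2f}")
print(f"effect of one more room:   {room_effect:,.2f}")
```

Because the model includes an intercept, the OLS fit evaluated at the regressor means equals the mean of the dependent variable, so `price_at_means` should land close to the reported mean price of 22,511.51, up to rounding in the published coefficients. The room effect is simply the rooms coefficient, which is what "holding the other variables constant" means in a linear model.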
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7154 | 0.7108 | 4952.506 | 156.1292 | 0 | 8 | -5018.313 | 10056.63 | 10098.89 | 12190077855 | 497 | 506 |
The regression model explains a large share of the variation in housing prices. The \(R^2\) is 0.7154, meaning the model explains about 71.5% of the variation in median housing prices. The adjusted \(R^2\) is 0.7108, which is very close to the regular \(R^2\), so the model still performs well even after accounting for the number of predictors.
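The adjusted \(R^2\) can be recovered from the regular \(R^2\), the sample size, and the number of predictors. A minimal check using the reported figures (n = 506 observations, k = 8 predictors); the function name is an illustrative choice:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2, n, k = 0.7154, 506, 8
adj_r2 = adjusted_r2(r2, n, k)

print(f"residual degrees of freedom: {n - k - 1}")
print(f"adjusted R-squared:          {adj_r2:.4f}")
```

The penalty factor (n - 1) / (n - k - 1) is only about 1.016 here, which is why the two measures are so close.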
The model's overall F-statistic is 156.1292 with a p-value that rounds to 0, which means the predictors are jointly statistically significant. The residual standard error is 4952.506, so typical prediction errors are on the order of $5,000, in the same units as price. The AIC is 10056.63 and the BIC is 10098.89; these are mainly useful for comparing this model with alternative specifications, where lower values indicate a better balance between fit and complexity.
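The fit statistics in the summary row are tied together by standard formulas. The sketch below recomputes them from the reported \(R^2\), log-likelihood, and deviance. The parameter count used for AIC and BIC (eight slopes, the intercept, and the error variance) is the convention that reproduces the reported values; treat it as an assumption about how the statistics were computed.

```python
import math

# Values copied from the model summary row.
r2, n, k = 0.7154, 506, 8
log_lik = -5018.313
deviance = 12190077855  # residual sum of squares

df_resid = n - k - 1

# Overall F-statistic expressed through R-squared.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

# Residual standard error: square root of RSS over residual degrees of freedom.
sigma = math.sqrt(deviance / df_resid)

# Information criteria; n_params counts 8 slopes + intercept + error variance.
n_params = k + 2
aic = -2 * log_lik + 2 * n_params
bic = -2 * log_lik + math.log(n) * n_params

print(f"F = {f_stat:.2f}, sigma = {sigma:.3f}, AIC = {aic:.2f}, BIC = {bic:.2f}")
```

The recomputed F-statistic differs slightly from the reported 156.1292 because the \(R^2\) used here is rounded to four decimals.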
The coefficient on crime is interpreted as the expected
change in median house price associated with a one-unit increase in
crimes per capita, holding all other included variables constant. The
estimate is -122.54 with a p-value of 0.0003, so it is negative and
statistically significant: higher crime is associated with lower housing prices.
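A per-unit effect is easier to judge when scaled by how much the variable actually varies. As an illustration that is not in the original analysis, the sketch below multiplies the crime coefficient by the sample standard deviation of crime and expresses the result as a share of the mean price. All inputs are copied from the tables in this report.

```python
crime_coef = -122.5444   # OLS estimate for crime
crime_sd = 8.590247      # sample standard deviation of crime
price_mean = 22511.51    # sample mean of price

# Change in price associated with a one-standard-deviation increase in crime.
effect_one_sd = crime_coef * crime_sd

# The same change as a share of the mean price.
share_of_mean = effect_one_sd / price_mean

print(f"one-SD increase in crime: {effect_one_sd:,.2f} dollars")
print(f"as a share of mean price: {share_of_mean:.2%}")
```

A one-standard-deviation increase in crime corresponds to a price difference of a little over $1,000, or just under 5% of the mean price, so the effect is statistically clear but moderate in size.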
Interpreting the remaining coefficients in the same way:

- nox: effect of air pollution on price.
- rooms: effect of an additional room on price.
- dist: effect of distance to employment centers.
- radial: effect of highway access.
- proptax: effect of property taxes.
- stratio: effect of school crowding.
- lowstat: effect of neighborhood socioeconomic composition.

Use the sign, magnitude, and p-value together when judging each coefficient. A large coefficient is not necessarily important unless it is statistically meaningful and economically interpretable.
Multiple regression requires checking whether the model assumptions are reasonable. Two common concerns are multicollinearity and heteroskedasticity. We examine these using correlation patterns, variance inflation factors, and formal tests.
## nox crime rooms dist radial proptax stratio lowstat
## 3.687239 1.740550 1.766832 2.576614 6.788810 6.891633 1.530528 2.446163
VIF values are:

- nox: 3.687
- crime: 1.741
- rooms: 1.767
- dist: 2.577
- radial: 6.789
- proptax: 6.892
- stratio: 1.531
- lowstat: 2.446

These values suggest that multicollinearity is present, but it is not extreme for most variables. The largest concern is radial and proptax, which both have VIFs near 7 and a pairwise correlation of +0.91. Those two variables overlap substantially, which inflates their standard errors to some extent.
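A VIF can be translated into two more intuitive quantities. By definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other regressors, so R_j^2 = 1 - 1 / VIF_j. The square root of the VIF is the factor by which the standard error is inflated relative to the case of uncorrelated regressors. The sketch below applies both to the reported values; the dictionary names are illustrative.

```python
import math

# VIF values copied from the output above.
vif = {
    "nox": 3.687239, "crime": 1.740550, "rooms": 1.766832, "dist": 2.576614,
    "radial": 6.788810, "proptax": 6.891633, "stratio": 1.530528, "lowstat": 2.446163,
}

# Share of each regressor's variance explained by the other regressors.
aux_r2 = {name: 1 - 1 / value for name, value in vif.items()}

# Factor by which each standard error is inflated relative to no collinearity.
se_inflation = {name: math.sqrt(value) for name, value in vif.items()}

for name in sorted(vif, key=vif.get, reverse=True):
    print(f"{name:8s} VIF = {vif[name]:5.2f}  aux R2 = {aux_r2[name]:.3f}  "
          f"SE inflation = {se_inflation[name]:.2f}x")
```

For radial and proptax, roughly 85% of the variation is explained by the other regressors, and their standard errors are about 2.6 times larger than they would be without collinearity. Both coefficients remain statistically significant despite this.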
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 48.226, df = 8, p-value = 8.946e-08
##
## Goldfeld-Quandt test
##
## data: model
## GQ = 2.3308, df1 = 244, df2 = 244, p-value = 3.815e-11
## alternative hypothesis: variance increases from segment 1 to 2
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41533.992 7450.645 5.5746 4.077e-08 ***
## nox -1853.227 333.743 -5.5529 4.583e-08 ***
## crime -122.544 24.511 -4.9996 7.976e-07 ***
## rooms 4062.658 759.890 5.3464 1.370e-07 ***
## dist -1231.388 189.161 -6.5097 1.844e-10 ***
## radial 293.167 63.751 4.5986 5.403e-06 ***
## proptax -122.361 28.679 -4.2666 2.378e-05 ***
## stratio -1102.697 111.570 -9.8835 < 2.2e-16 ***
## lowstat -519.637 89.965 -5.7760 1.350e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
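Both heteroskedasticity tests reject constant variance:

- Breusch-Pagan: BP = 48.226, df = 8, p-value = 8.946e-08.
- Goldfeld-Quandt: GQ = 2.3308, p-value = 3.815e-11.

This means the error variance is not constant across observations, so the homoskedasticity assumption behind the conventional OLS standard errors is violated. The coefficient estimates themselves are unchanged, but the conventional standard errors are unreliable. Because of that, the robust standard errors are the better results to use for inference.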
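The Breusch-Pagan p-value can be checked by hand, because the test statistic is compared against a chi-square distribution with 8 degrees of freedom. For an even number of degrees of freedom the upper-tail probability has a closed form, exp(-x/2) times a finite sum of (x/2)^i / i!, which avoids any dependency outside the standard library. The helper name below is illustrative, and the formula applies only to even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    # Accumulate sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


bp_stat, bp_df = 48.226, 8
bp_p_value = chi2_sf_even_df(bp_stat, bp_df)

print(f"Breusch-Pagan p-value: {bp_p_value:.3e}")
```

The result agrees with the reported p-value of 8.946e-08 up to rounding of the test statistic.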
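The results show that the coefficient on crime is negative and remains
significant with robust standard errors (estimate -122.54, robust standard
error 24.51), which supports the idea that more crime lowers
housing prices. The coefficient on rooms is positive and significant
(estimate 4062.66, robust standard error 759.89), which
confirms that larger homes command higher prices.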
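To see how much the heteroskedasticity correction matters for each coefficient, the sketch below compares the conventional and robust standard errors reported above and recomputes the robust t-statistics. The numbers are copied from the two coefficient tables; the report does not state which robust estimator was used, and the code does not assume one.

```python
# Estimates and standard errors copied from the OLS and robust coefficient tables.
estimates = {
    "nox": -1853.227, "crime": -122.544, "rooms": 4062.658, "dist": -1231.388,
    "radial": 293.167, "proptax": -122.361, "stratio": -1102.697, "lowstat": -519.637,
}
ols_se = {
    "nox": 365.3199, "crime": 33.8467, "rooms": 416.9391, "dist": 167.9645,
    "radial": 65.9469, "proptax": 34.3277, "stratio": 125.8861, "lowstat": 47.6211,
}
robust_se = {
    "nox": 333.743, "crime": 24.511, "rooms": 759.890, "dist": 189.161,
    "radial": 63.751, "proptax": 28.679, "stratio": 111.570, "lowstat": 89.965,
}

# Ratio above 1: the conventional standard error understated the uncertainty.
se_ratio = {name: robust_se[name] / ols_se[name] for name in estimates}

# Robust t-statistics use the same estimates with the robust standard errors.
robust_t = {name: estimates[name] / robust_se[name] for name in estimates}

for name in estimates:
    print(f"{name:8s} robust/OLS SE = {se_ratio[name]:.2f}   robust t = {robust_t[name]:6.2f}")
```

The correction cuts both ways: the standard errors for rooms and lowstat nearly double, while the standard error for crime shrinks. Every coefficient remains significant at conventional levels under the robust standard errors.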
Overall, the hedonic regression shows that crime is significantly associated with lower housing prices, even after controlling for pollution, rooms, distance, highway access, property taxes, school quality, and neighborhood composition. The model fits the data well, with an \(R^2\) of 0.7154, but the presence of heteroskedasticity means robust standard errors should be used for inference.
Data source: hprice2 dataset in the
wooldridge R package.