Boston Neighborhood Housing Prices Data

What is the background/context for this data?

This data was accessed via the online dataset collection provided by the Department of Biostatistics at Vanderbilt University. It was originally collected by Harrison and Rubinfeld in 1978 to investigate a number of problems associated with using housing market data to measure the demand for cleaner air. Harrison and Rubinfeld drew the data from the Boston Standard Metropolitan Statistical Area (SMSA).

Data management: How many observations are there? What is the unit of observation? What are the key response variable(s) and explanatory variables? Is there any missing data? If so, are there any obvious patterns to the missingness?

There are 506 total observations, one per census tract. These tracts come from 92 different cities and towns in the Boston area. The unit of observation is therefore the census tract, not the individual household. The key response variable, denoted by cmedv, is the median value of owner-occupied homes in the tract (in thousands of dollars). Some of the key explanatory variables are:

  1. Per capita crime rate (crime)

  2. Nitric Oxides concentration in parts per 10 million (nox)

  3. Average number of rooms per household (rooms)

  4. % Population in lower socio-economic status (lstat)

  5. Pupil to Teacher Ratio (ptratio)

There is no missing data for this dataset.

Table

| Name | Definition | Type | # Missing Observations |
|---|---|---|---|
| crime | Per capita crime rate | continuous | 0 |
| nox | Nitric oxides concentration in parts per 10 million | continuous | 0 |
| rooms | Average number of rooms per household | continuous | 0 |
| lstat | % Population in lower socio-economic status | continuous | 0 |
| ptratio | Pupil to Teacher Ratio | continuous | 0 |
| cmedv | Median value of owner-occupied homes (in thousands of dollars) | continuous | 0 |
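
A per-variable count of missing values, like the last column of the table, could be produced along the following lines. This is only a minimal sketch: the data frame `toy` is a made-up stand-in (not the real housing data), and one `NA` is inserted on purpose so the count is visibly nonzero.

```r
# Toy stand-in for the real data set; all values are made up for illustration.
toy <- data.frame(
  crime   = c(0.1, 0.5, NA, 2.0),
  ptratio = c(15, 18, 20, 21),
  cmedv   = c(20, 25, 30, 35)
)

# Count missing values in each column.
n_missing <- colSums(is.na(toy))
print(n_missing)

# TRUE only when no column contains a missing value.
all_complete <- all(n_missing == 0)
print(all_complete)
```

Applied to the real data frame, every count would be expected to be 0, matching the table.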

Univariate Distributions

1. Crime

This bar chart shows that over 80% of the census tracts have a per capita crime rate below 10, and that about 95% of the tracts in this dataset have a crime rate below about 18.

2. Rooms

This histogram shows that the average number of rooms is roughly normally distributed across tracts. Most tracts fall between 5.5 and 7 rooms, and very few have an average of fewer than 5 or more than 7 rooms.

3. Pupil to Teacher Ratio

This histogram shows that the pupil to teacher ratio does not appear to be normally distributed in this data. Most observations sit at the high end of the range, at around 20 to 21 students per teacher.

Pairs Plot

Simple Linear Regressions

1. Simple Linear Regression - Crime Rate as Predictor

## 
## Call:
## lm(formula = cmedv ~ 1 + crime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.955  -5.467  -2.005   2.527  29.808 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.03165    0.40824  58.867   <2e-16 ***
## crime       -0.41588    0.04379  -9.496   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.465 on 504 degrees of freedom
## Multiple R-squared:  0.1518, Adjusted R-squared:  0.1501 
## F-statistic: 90.18 on 1 and 504 DF,  p-value: < 2.2e-16
Interpretation

Slope of crime: A one-unit increase in a town's per capita crime rate is associated with a decrease of about $416 in the median home value.

Intercept: A tract with a crime rate of 0 is expected, on average, to have a median home value of about $24,032.

R-Squared: The R-squared value here is 0.15, which is a very small value, indicating that only 15% of the variability in the median value of owner-occupied homes is explained by the model.
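
The interpretations above can be checked with a little arithmetic on the reported output. The sketch below copies the coefficients and F statistic from the summary by hand; `predict_cmedv` is a hypothetical helper name introduced here for illustration, not part of the original analysis.

```r
# Coefficients copied from the lm() summary (cmedv is in thousands of dollars).
b0 <- 24.03165   # intercept
b1 <- -0.41588   # slope for crime

# Hypothetical helper: fitted median home value in dollars at a given crime rate.
predict_cmedv <- function(crime) (b0 + b1 * crime) * 1000

predict_cmedv(0)    # the intercept, expressed in dollars
predict_cmedv(10)   # fitted value when the crime rate is 10

# With a single predictor, R-squared can be recovered from the F statistic:
# R^2 = F / (F + residual degrees of freedom).
f_stat    <- 90.18
df_resid  <- 504
r_squared <- f_stat / (f_stat + df_resid)
round(r_squared, 4)
```

The recovered R-squared should agree with the "Multiple R-squared" line of the output up to rounding.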

2. Simple Linear Regression - Pupil to Teacher Ratio as Predictor

## 
## Call:
## lm(formula = cmedv ~ 1 + ptratio)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.7831  -4.7738  -0.6335   3.1470  31.2124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   62.109      3.029   20.51   <2e-16 ***
## ptratio       -2.145      0.163  -13.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.93 on 504 degrees of freedom
## Multiple R-squared:  0.2557, Adjusted R-squared:  0.2542 
## F-statistic: 173.1 on 1 and 504 DF,  p-value: < 2.2e-16
Interpretation

Slope of ptratio: An increase of 1 student per teacher is associated with a decrease of about $2,145 in the median home value.

Intercept: A pupil to teacher ratio of 0 does not make sense in the context of this data, so the intercept has no direct interpretation. It is the model's extrapolated median home value of $62,109 at a ratio of 0, far outside the range of observed ratios.

R-Squared: The R-squared value here is 0.26, which is a relatively small value, indicating that only about 26% of the variability in the median value of owner-occupied homes is explained by the model.
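
To see how the t value in the output relates to the estimate and its standard error, and to attach a rough range to the slope, one could do the following. This is a sketch that uses the rounded values printed in the summary, so the results are approximate; the 95% level is an assumption chosen for illustration.

```r
# Values copied (already rounded) from the ptratio row of the summary.
est      <- -2.145
se       <- 0.163
df_resid <- 504

# The reported t value is the estimate divided by its standard error.
t_value <- est / se
round(t_value, 2)

# Approximate 95% confidence interval for the slope, using the t distribution.
crit <- qt(0.975, df_resid)
ci   <- est + c(-1, 1) * crit * se

# Same interval expressed in dollars per additional student per teacher.
ci * 1000
```

Because the interval lies entirely below zero, it is consistent with the very small p-value reported for ptratio.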

Multiple Linear Regression

1. ptratio and rooms as Predictor Variables

## 
## Call:
## lm(formula = cmedv ~ 1 + ptratio + rooms)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.725  -2.805   0.106   2.746  39.815 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -2.905      4.183  -0.695    0.488    
## ptratio       -1.253      0.134  -9.349   <2e-16 ***
## rooms          7.727      0.413  18.709   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.095 on 503 degrees of freedom
## Multiple R-squared:  0.5611, Adjusted R-squared:  0.5594 
## F-statistic: 321.5 on 2 and 503 DF,  p-value: < 2.2e-16
Interpretation

Slope of ptratio: Holding the average number of rooms constant, an increase of 1 student per teacher is associated with a decrease of about $1,253 in the median home value.

Slope of rooms: Holding the pupil to teacher ratio constant, an increase of 1 in the average number of rooms is associated with an increase of about $7,727 in the median home value.

Intercept: A pupil to teacher ratio of 0 does not make sense in the context of this data, and neither does an average of 0 rooms. The intercept of -$2,905 is the model's extrapolated median home value when both predictors equal 0, and a negative home value makes no sense in the real world. The estimate is also not statistically distinguishable from zero (p = 0.488), so it should be read only as the anchor point of the fitted plane, not as a realistic price.
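
Although the intercept itself is not meaningful, the fitted equation gives sensible values at realistic inputs. The sketch below plugs in a pupil to teacher ratio of 20 and an average of 6 rooms, values chosen here only as an illustration because they fall in the ranges described in the univariate section; the coefficients are copied from the summary.

```r
# Coefficients copied from the lm() summary (cmedv is in thousands of dollars).
b0       <- -2.905   # intercept
b_ptrat  <- -1.253   # slope for ptratio
b_rooms  <-  7.727   # slope for rooms

# Illustrative inputs (assumed, not taken from a specific observation).
ptratio <- 20
rooms   <- 6

# Fitted median home value in thousands of dollars, then in dollars.
fitted_thousands <- b0 + b_ptrat * ptratio + b_rooms * rooms
fitted_dollars   <- fitted_thousands * 1000
fitted_dollars
```

The negative intercept is more than offset by the rooms term at realistic inputs, which is why it only matters as an anchor for the fitted plane.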

2. ptratio and crime as Predictor Variables

## 
## Call:
## lm(formula = cmedv ~ 1 + ptratio + crime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.794  -4.599  -1.045   2.563  32.234 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 57.11195    2.98643  19.124  < 2e-16 ***
## ptratio     -1.81842    0.16293 -11.161  < 2e-16 ***
## crime       -0.28318    0.04101  -6.906 1.52e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.586 on 503 degrees of freedom
## Multiple R-squared:  0.3201, Adjusted R-squared:  0.3174 
## F-statistic: 118.4 on 2 and 503 DF,  p-value: < 2.2e-16
Interpretation

Slope of ptratio: Holding the crime rate constant, an increase of 1 student per teacher is associated with a decrease of about $1,818 in the median home value.

Slope of crime: Holding the pupil to teacher ratio constant, a one-unit increase in per capita crime rate is associated with a decrease of about $283 in the median home value.

Intercept: A pupil to teacher ratio of 0 does not make sense in the context of this data. The intercept of $57,111.95 is the model's extrapolated median home value for a tract with a pupil to teacher ratio of 0 and a crime rate of 0. Since a low pupil to teacher ratio and a low crime rate are both considered desirable, it is unsurprising that this extrapolated value is much higher than the values the model fits for typical tracts.
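
The two multiple regressions can be compared through their adjusted R-squared values, which penalize R-squared for the number of predictors. The sketch below recomputes the adjusted values from the reported R-squared figures; `adj_r2` is a hypothetical helper written for this illustration, using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1).

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n,
# and number of predictors p (not counting the intercept).
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 506  # number of observations reported in the data description

# R-squared values copied from the two multiple regression summaries.
adj_rooms <- adj_r2(0.5611, n, 2)   # cmedv ~ ptratio + rooms
adj_crime <- adj_r2(0.3201, n, 2)   # cmedv ~ ptratio + crime

round(c(ptratio_rooms = adj_rooms, ptratio_crime = adj_crime), 4)

# The model with the larger adjusted R-squared explains more of the variability.
adj_rooms > adj_crime
```

On this measure, pairing ptratio with rooms explains considerably more of the variability in cmedv than pairing it with crime.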