These data were accessed through the online dataset collection maintained by the Department of Biostatistics at Vanderbilt University. They were originally collected by Harrison and Rubinfeld in 1978 to examine the problems involved in using housing data to measure the demand for cleaner air. Harrison and Rubinfeld drew the data from the Boston Standard Metropolitan Statistical Area (SMSA).
There are 506 observations in total, one per census tract, drawn from 92 cities and towns in the Boston area. The unit of observation is therefore the census tract, not the individual household. The key response variable, cmedv, is the median value of owner-occupied homes (in thousands of dollars). The key explanatory variables are:
- Per capita crime rate (crime)
- Nitric oxides concentration in parts per 10 million (nox)
- Average number of rooms per household (rooms)
- % Population in lower socio-economic status (lstat)
- Pupil to teacher ratio (ptratio)
There is no missing data for this dataset.
| Name | Definition | Type | # Missing Observations |
|---|---|---|---|
| crime | Per capita crime rate | continuous | 0 |
| nox | Nitric oxides concentration in parts per 10 million | continuous | 0 |
| rooms | Average number of rooms per household | continuous | 0 |
| lstat | % Population in lower socio-economic status | continuous | 0 |
| ptratio | Pupil to Teacher Ratio | continuous | 0 |
| cmedv | Median value of owner-occupied homes (in thousands of dollars) | continuous | 0 |
This bar chart shows that over 80% of the observations have a per capita crime rate below 10, and that about 95% have a crime rate below about 18.
This histogram shows that the average number of rooms is approximately normally distributed. Most observations fall between 5.5 and 7 rooms, and very few have fewer than 5 or more than 7.
This histogram shows that the pupil-to-teacher ratio does not appear to be normally distributed. Most observations have a relatively high ratio, at around 20 to 21 students per teacher.
##
## Call:
## lm(formula = cmedv ~ 1 + crime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.955 -5.467 -2.005 2.527 29.808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.03165 0.40824 58.867 <2e-16 ***
## crime -0.41588 0.04379 -9.496 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.465 on 504 degrees of freedom
## Multiple R-squared: 0.1518, Adjusted R-squared: 0.1501
## F-statistic: 90.18 on 1 and 504 DF, p-value: < 2.2e-16
Slope of crime: A one-unit increase in a town's per capita crime rate is associated with a median home value that is about $416 lower on average.
Intercept: For a town with a crime rate of 0, the expected median home value is $24,031.65.
R-Squared: The R-squared value is 0.15, which is small: the model explains only about 15% of the variability in the median value of owner-occupied homes.
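To make the slope and intercept concrete, the fitted line can be evaluated by hand. The following is a minimal sketch in R that uses only the numbers printed in the summary output above. The helper name `predict_cmedv` and the example crime rates are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary output above (cmedv ~ 1 + crime).
b0 <- 24.03165   # intercept, in thousands of dollars
b1 <- -0.41588   # slope for crime, in thousands of dollars per unit of crime rate

# Hypothetical helper: fitted median home value in thousands of dollars.
predict_cmedv <- function(crime) b0 + b1 * crime

# Example crime rates chosen for illustration only.
fitted_values <- predict_cmedv(c(0, 1, 10))
print(fitted_values)

# A one-unit increase in crime changes the fitted value by exactly the slope.
slope_check <- predict_cmedv(1) - predict_cmedv(0)
print(slope_check * 1000)  # change expressed in dollars

# Adjusted R-squared recomputed from the reported R-squared,
# with n = 506 observations and p = 1 predictor.
r2 <- 0.1518
n  <- 506
p  <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

The recomputed adjusted R-squared should agree with the reported value up to rounding, which is a quick way to confirm that the summary was read correctly.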
##
## Call:
## lm(formula = cmedv ~ 1 + ptratio)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.7831 -4.7738 -0.6335 3.1470 31.2124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.109 3.029 20.51 <2e-16 ***
## ptratio -2.145 0.163 -13.16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.93 on 504 degrees of freedom
## Multiple R-squared: 0.2557, Adjusted R-squared: 0.2542
## F-statistic: 173.1 on 1 and 504 DF, p-value: < 2.2e-16
Slope of ptratio: One additional student per teacher in a town is associated with a median home value that is about $2,145 lower on average.
Intercept: A pupil-to-teacher ratio of 0 makes no sense in the context of these data. The intercept of $62,109 is the value the fitted line reaches when extrapolated to a ratio of 0, well below the ratios of around 20 to 21 seen in most observations, so it should not be read as a realistic home value.
R-Squared: The R-squared value is 0.26, which is relatively small: the model explains only about 26% of the variability in the median value of owner-occupied homes.
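The test statistics in the summary output above follow directly from the other reported numbers. As a rough sketch, the R code below recomputes the t value from the estimate and its standard error, and the F statistic from R-squared. All inputs are copied from the output; the variable names are illustrative, and small discrepancies are expected because the inputs are already rounded.

```r
# Values copied from the summary output above (cmedv ~ 1 + ptratio).
estimate  <- -2.145
std_error <- 0.163
r2        <- 0.2557
n         <- 506   # number of observations
p         <- 1     # number of predictors

# t value = estimate / standard error.
t_value <- estimate / std_error

# Overall F statistic from R-squared:
# F = (R^2 / p) / ((1 - R^2) / (n - p - 1)).
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

# With a single predictor, F equals t^2 up to rounding of the inputs.
print(round(t_value, 2))
print(round(f_stat, 1))
print(round(t_value^2, 1))
```

Because this model has only one predictor, the F test and the t test on the slope are testing the same hypothesis, which is why both report the same p-value.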
##
## Call:
## lm(formula = cmedv ~ 1 + ptratio + rooms)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.725 -2.805 0.106 2.746 39.815
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.905 4.183 -0.695 0.488
## ptratio -1.253 0.134 -9.349 <2e-16 ***
## rooms 7.727 0.413 18.709 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.095 on 503 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5594
## F-statistic: 321.5 on 2 and 503 DF, p-value: < 2.2e-16
Slope of ptratio: Holding the average number of rooms constant, one additional student per teacher is associated with a median home value that is about $1,253 lower on average.
Slope of rooms: Holding the pupil-to-teacher ratio constant, one additional room on average is associated with a median home value that is about $7,727 higher on average.
Intercept: Neither a pupil-to-teacher ratio of 0 nor an average of 0 rooms makes sense in the context of these data. The intercept of $-2,905 is the model's extrapolation to that point, and a negative home value has no real-world meaning. The intercept is also not significantly different from zero (p = 0.488), so it should be treated as a byproduct of the fitted plane rather than interpreted on its own.
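Evaluating the fitted equation at values near those seen in the histograms is more informative than reading the intercept literally. The sketch below uses the coefficients from the summary output above. The helper `predict_cmedv` and the example inputs (a ratio of 20 students per teacher, 6 or 7 rooms) are illustrative assumptions, not values taken from any particular tract.

```r
# Coefficients copied from the summary output above (cmedv ~ 1 + ptratio + rooms).
coefs <- c(intercept = -2.905, ptratio = -1.253, rooms = 7.727)

# Hypothetical helper: fitted median home value in thousands of dollars.
predict_cmedv <- function(ptratio, rooms) {
  unname(coefs["intercept"] + coefs["ptratio"] * ptratio + coefs["rooms"] * rooms)
}

# Illustrative inputs in the range described by the histograms.
typical   <- predict_cmedv(ptratio = 20, rooms = 6)
more_room <- predict_cmedv(ptratio = 20, rooms = 7)

# Extrapolating to zero for both predictors reproduces the intercept.
at_origin <- predict_cmedv(ptratio = 0, rooms = 0)

print(c(typical = typical, more_room = more_room, at_origin = at_origin))
```

The difference between the first two fitted values equals the rooms slope, while the third value shows that the negative intercept only appears when both predictors are pushed far outside realistic values.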
##
## Call:
## lm(formula = cmedv ~ 1 + ptratio + crime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.794 -4.599 -1.045 2.563 32.234
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.11195 2.98643 19.124 < 2e-16 ***
## ptratio -1.81842 0.16293 -11.161 < 2e-16 ***
## crime -0.28318 0.04101 -6.906 1.52e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.586 on 503 degrees of freedom
## Multiple R-squared: 0.3201, Adjusted R-squared: 0.3174
## F-statistic: 118.4 on 2 and 503 DF, p-value: < 2.2e-16
Slope of ptratio: Holding the crime rate constant, one additional student per teacher is associated with a median home value that is about $1,818.42 lower on average.
Slope of crime: Holding the pupil-to-teacher ratio constant, a one-unit increase in the per capita crime rate is associated with a median home value that is about $283.18 lower on average.
Intercept: A pupil-to-teacher ratio of 0 makes no sense in the context of these data. The intercept of $57,111.95 is the expected median home value extrapolated to a pupil-to-teacher ratio of 0 and a crime rate of 0. Both a low pupil-to-teacher ratio and a low crime rate are associated with higher home values, which is why this extrapolated value is so high, but it lies well outside the observed data and should not be read as a realistic price.
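The four fits can also be set side by side to see how the ptratio slope and the goodness of fit change as predictors are added. The following sketch builds a small comparison table from the values reported in the summary outputs above. The data frame layout and the `slope_shrinkage` column are illustrative choices, not part of the original analysis.

```r
# Values copied from the four summary outputs above.
models <- data.frame(
  formula       = c("cmedv ~ crime", "cmedv ~ ptratio",
                    "cmedv ~ ptratio + rooms", "cmedv ~ ptratio + crime"),
  r_squared     = c(0.1518, 0.2557, 0.5611, 0.3201),
  adj_r_squared = c(0.1501, 0.2542, 0.5594, 0.3174),
  ptratio_slope = c(NA, -2.145, -1.253, -1.81842),
  stringsAsFactors = FALSE
)

# Relative shrinkage of the ptratio slope versus the ptratio-only model
# (NA for the model that does not include ptratio).
baseline <- models$ptratio_slope[2]
models$slope_shrinkage <- 1 - models$ptratio_slope / baseline

# Order by adjusted R-squared, best fit first.
ranked <- models[order(-models$adj_r_squared), ]
print(ranked)
```

Under these numbers, adding rooms shrinks the ptratio slope by roughly 42% and adding crime by roughly 15%, which suggests that part of the ptratio effect in the single-predictor model reflects its correlation with the other variables.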