Introduction

The goal of this analysis was to build a model that predicts a house’s price from characteristics such as the number of stories, the year the home was built, and the total number of rooms. The model I chose predicts a house’s price with a margin of error of about $150,000. These predictions can be used to judge whether a house is overvalued or undervalued.
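The report does not say which error metric produced the $150,000 figure or exactly how a listing is judged over- or undervalued. As a minimal sketch, assuming root mean squared error (RMSE) as the margin of error and a simple tolerance band around each prediction, the comparison could look like this (`rmse` and `valuation_flag` are hypothetical helper names, not part of the original analysis):

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: one common way to express a model's margin of error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def valuation_flag(list_price, predicted_price, margin):
    """Compare a listing price to the model's prediction, allowing +/- margin of tolerance."""
    if list_price > predicted_price + margin:
        return "overvalued"
    if list_price < predicted_price - margin:
        return "undervalued"
    return "within margin"


# Hypothetical example: a home listed at $600,000 that the model prices at $400,000.
print(valuation_flag(600_000, 400_000, margin=150_000))  # overvalued
```

Using the model's own error as the band width is a design choice: it avoids flagging homes whose listing price differs from the prediction only by ordinary model error.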

Takeaway 1

All else being equal, square footage is by far the most important variable in determining the price of a house. Interestingly, the number of bathrooms also turned out to be an important variable, more so than the total number of rooms. In other words, the model makes better use of the number of bathrooms than of the total number of rooms when making predictions. When evaluating the true quality of a house, it may be more effective to prioritize the number of bathrooms over the total number of rooms.
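The report does not say how variable importance was measured. One common model-agnostic approach, assumed here, is permutation importance: shuffle one variable's values across homes and measure how much the prediction error grows. The sketch below uses a hypothetical toy model (`toy_predict`), hypothetical column names, and made-up data; only the relative ranking of the scores is meaningful.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def permutation_importance(predict, rows, targets, feature, n_repeats=10, seed=0):
    """Average increase in RMSE when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(r) for r in rows])
    column = [r[feature] for r in rows]
    increases = []
    for _ in range(n_repeats):
        shuffled = column[:]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled feature.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(rmse(targets, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats


# Hypothetical toy model: price depends on square feet and bathrooms, not total rooms.
def toy_predict(row):
    return 200 * row["sqft"] + 10_000 * row["bathrooms"]


homes = [
    {"sqft": 1000, "bathrooms": 1, "rooms": 5},
    {"sqft": 1500, "bathrooms": 2, "rooms": 6},
    {"sqft": 2000, "bathrooms": 2, "rooms": 8},
    {"sqft": 2500, "bathrooms": 3, "rooms": 7},
    {"sqft": 3000, "bathrooms": 1, "rooms": 9},
]
prices = [toy_predict(h) for h in homes]

importance = {
    f: permutation_importance(toy_predict, homes, prices, f)
    for f in ("sqft", "bathrooms", "rooms")
}
print(importance)
```

Shuffling `rooms` leaves the toy model's error unchanged, so its importance is zero, while shuffling `sqft` raises the error far more than shuffling `bathrooms`. Scores like these rank variables but do not say how much price changes per unit of a variable.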

Takeaway 2

State turned out to be an important predictor of home price. A home in Virginia in a zip code with an average income of $100,000 would probably have a higher price than an identical home in Pennsylvania in a zip code with the same average income. This could be due to factors like the weather or proximity to Washington, D.C. Either way, I would suggest expanding further into Virginia, since houses there are valued higher than comparable houses in Pennsylvania.

Takeaway 3

It is also worth noting which variables are not important in predicting home price. Whether or not the home had a basement was relatively unimportant, as was the description of the home (its property type). If you are considering investing in a given home, it does not matter whether the home is a condominium, multi-family home, rowhome, or single-family home (all else being equal). Instead, more attention should be paid to variables that apply to every home, such as square footage, lot area, and number of bathrooms.

Takeaway 4

One of the most surprising results was the relative importance of roof type. This variable turned out to be more important than year built and exterior finish (holding all other predictors constant).

Takeaway 5

The final prediction model performed far better on homes within the ordinary price range than on homes with very high prices. Specifically, the model had trouble predicting home value when the true price was greater than $1,000,000, but predicted well for homes priced below $500,000. Therefore, this model should be used with caution when predicting the price of high-end homes.
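One way to check this kind of pattern is to compute the error separately within price bands, grouping each home by its true price. A minimal sketch, assuming RMSE as the error measure; the band edges of $500,000 and $1,000,000 and the sample prices are illustrative, and `band_label` and `rmse_by_band` are hypothetical helpers:

```python
import math


def band_label(price, edges):
    """Return a label for the price band containing `price`, given sorted cut points."""
    lower = 0
    for upper in edges:
        if price < upper:
            return f"${lower:,}-${upper:,}"
        lower = upper
    return f"${lower:,}+"


def rmse_by_band(actual, predicted, edges):
    """Group squared errors by the true price's band and return the RMSE of each band."""
    squared_errors = {}
    for a, p in zip(actual, predicted):
        squared_errors.setdefault(band_label(a, edges), []).append((a - p) ** 2)
    return {band: math.sqrt(sum(errs) / len(errs)) for band, errs in squared_errors.items()}


# Hypothetical true and predicted prices, for illustration only.
actual = [200_000, 400_000, 1_500_000, 2_000_000]
predicted = [210_000, 390_000, 1_100_000, 1_600_000]
print(rmse_by_band(actual, predicted, edges=[500_000, 1_000_000]))
# {'$0-$500,000': 10000.0, '$1,000,000+': 400000.0}
```

Bands with no homes (here $500,000 to $1,000,000) simply do not appear in the result. Banding by the true price rather than the predicted price shows where the model's errors actually fall.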

Overall Takeaway

When building this model, I valued predictive power over interpretability. The model I used is very good at predicting the price of a home from a set of input variables, but it offers little information about how each variable affects price. We only know how important each variable is to the model on a relative scale. For instance, the model does not tell us how much the price changes when square footage increases by 100 square feet. Making inferences like this would likely require sacrificing some predictive accuracy.
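For contrast, a more interpretable model such as linear regression answers that kind of question directly: its coefficient on square feet is the estimated change in price per additional square foot. The sketch below fits a one-variable least-squares line in plain Python on made-up data (`fit_simple_ols` is a hypothetical helper); it illustrates the trade-off and is not the model used in this analysis. In a multiple regression with several predictors, each coefficient is read as the effect of that variable holding the others fixed.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x with a single predictor."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: square feet and sale price.
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 275_000, 350_000, 425_000]

intercept, slope = fit_simple_ols(sqft, price)
effect_of_100_sqft = 100 * slope
print(intercept, slope, effect_of_100_sqft)  # 50000.0 150.0 15000.0
```

Here each additional 100 square feet adds an estimated $15,000. A linear model like this may predict less accurately than the model chosen for this analysis, which is the trade-off described above.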