1. Introduction

Real estate is a vital part of the American economy, accounting for a sizable portion of the nation’s wealth. As a result, understanding the dynamics of the real estate market is critical for policymakers, investors, and anyone interested in the country’s economic health. This report presents an analysis of the US real estate market using publicly available data from Zenrows (https://www.zenrows.com/), a popular dataset provider website.

2. Methodology

The analysis uses a dataset of historical home sales and rental prices for thousands of cities and neighborhoods across the United States from 2001 to 2020. Market trends over time are investigated to identify the factors that influence housing prices, such as location, home size, and age.

US housing price dataset head
price      addressCity  addressState  addressZipcode  beds  baths  latitude  longitude
474900000  Windsor      WI            53598           4     3      43.21028  -89.32647
65750000   New York     NY            10019           5     5      40.76620  -73.98100
45000000   Boston       MA            2110            8     12     42.35632  -71.05956
40000000   New York     NY            10007           5     7      40.71520  -74.01250
37000000   Aspen        CO            81611           7     9      39.21248  -106.85021
35000000   Newport      RI            2840            7     9      41.45831  -71.34237

3. Results

3.1. Exploratory data analysis

  • Boxplots for the number of bedrooms, number of bathrooms, and house price: The following figures show the distribution of each variable in the dataset. The median number of bedrooms is 3, the median number of bathrooms is 2, and the median house price is $274,700.
  • Correlation Matrix:

A correlation matrix is plotted to quantify the pairwise relationships between the variables.

The following figure shows that house price is positively correlated with the number of bathrooms (correlation coefficient of 0.41): homes with more bathrooms tend to be priced higher. Price is also positively, though more weakly, correlated with the number of bedrooms (0.23), so homes with more bedrooms also tend to cost more. A strong relationship between bedrooms and bathrooms is also observed, meaning that properties with more bedrooms tend to have more bathrooms as well.
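As a minimal sketch of how such medians and a Pearson correlation matrix can be computed, the Python snippet below uses only the six rows from the dataset head in Section 2. Those rows are the highest-priced listings rather than a representative sample, so the resulting numbers are for illustration only and will not match the full-dataset values reported above. The names `rows`, `pearson`, `medians`, and `corr` are assumptions introduced here, not part of the original analysis.

```python
from math import sqrt
from statistics import median

# Six rows copied from the dataset head in Section 2 (illustration only;
# they are not representative of the full dataset).
rows = [
    {"price": 474900000, "beds": 4, "baths": 3},
    {"price": 65750000, "beds": 5, "baths": 5},
    {"price": 45000000, "beds": 8, "baths": 12},
    {"price": 40000000, "beds": 5, "baths": 7},
    {"price": 37000000, "beds": 7, "baths": 9},
    {"price": 35000000, "beds": 7, "baths": 9},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


columns = ["price", "beds", "baths"]
data = {c: [r[c] for r in rows] for c in columns}

# Median of each variable, as summarized by the boxplots.
medians = {c: median(data[c]) for c in columns}

# Symmetric correlation matrix stored as a nested dictionary.
corr = {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}

for a in columns:
    print(a, [round(corr[a][b], 2) for b in columns])
```

Each entry `corr[a][b]` lies between -1 and 1; values near 1 indicate that the two variables rise together, which is the sense in which bedrooms and bathrooms are described as strongly related.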

3.2. Relation between the house price and number of bedrooms

  • House price has a positive relationship with the number of bedrooms: properties with more bedrooms tend to be more expensive.

3.3. Relation between the house price and number of bathrooms

The following figure shows that houses with more bathrooms cost more, while houses with fewer bathrooms are cheaper.

3.4. Price in different states

The following figure shows prices on a map of the US. The data are denser in the eastern US than in the west.

  • Which state has the highest average price?

Hawaii and New York are the most expensive states in which to buy a house. The figure below shows the 20 most expensive states, ranked by the average house price in each state.
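The per-state ranking can be sketched as a group-by-and-average step followed by a sort. The Python example below is an illustration under stated assumptions: it uses only the six (state, price) pairs from the dataset head in Section 2, and the helper names `mean_price_by_state` and `top_states` are hypothetical. With the full dataset the ranking would differ from the one this tiny sample produces.

```python
from collections import defaultdict

# (state, price) pairs copied from the dataset head in Section 2
# (illustration only; not the full dataset).
listings = [
    ("WI", 474900000),
    ("NY", 65750000),
    ("MA", 45000000),
    ("NY", 40000000),
    ("CO", 37000000),
    ("RI", 35000000),
]


def mean_price_by_state(listings):
    """Return a dict mapping each state to its average listing price."""
    totals = defaultdict(lambda: [0, 0])  # state -> [price sum, listing count]
    for state, price in listings:
        totals[state][0] += price
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


def top_states(listings, n=20):
    """Return the n states with the highest average price, highest first."""
    means = mean_price_by_state(listings)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:n]


ranking = top_states(listings)
for state, avg_price in ranking:
    print(f"{state}: {avg_price:,.0f}")
```

Because a mean is sensitive to a few very expensive listings, a median per state could be substituted in `mean_price_by_state` as a more robust alternative.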

3.5. Price prediction with machine learning

  • Four different machine learning methods (“lasso”, “random forest”, “ridge”, and “glm”) are used to predict housing prices. The results show that “lasso” performed best, with the lowest root mean square error (RMSE) of the four methods.
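RMSE is the square root of the mean squared difference between observed and predicted prices, so lower values indicate better predictions. The Python sketch below shows how models can be compared on this metric. The held-out prices and the predictions of `model_a` and `model_b` are invented for illustration and do not correspond to the four models or the results in this report.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out prices and model predictions (invented values).
actual = [200000, 300000, 250000, 400000]
predictions = {
    "model_a": [210000, 290000, 260000, 390000],
    "model_b": [180000, 330000, 240000, 430000],
}

# Score every model on the same held-out data and pick the lowest RMSE.
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:,.0f}")
print("best model:", best)
```

For the comparison to be fair, every model should be scored on the same held-out data, and RMSE is expressed in the same units as the price.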

3.6. Feature importance analysis

Which features play the most important role in determining the price of a house? The fitted machine learning models are used to analyze the importance of each feature for the housing price. The following figures show that every model ranks the number of bathrooms as a more important feature than the number of bedrooms for predicting house prices. In other words, the number of bathrooms plays a more important role in the housing price.
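The report does not state how feature importance was computed for each model. One model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The Python sketch below illustrates that technique only; `toy_model`, its coefficients, and the small feature table `X` are hypothetical and are not the fitted models or data from this analysis.

```python
import random
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in RMSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(row) for row in X])
    importances = {}
    for feature in X[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving all other columns intact.
            shuffled = [row[feature] for row in X]
            rng.shuffle(shuffled)
            X_perm = [{**row, feature: value} for row, value in zip(X, shuffled)]
            permuted_error = rmse(y, [model(row) for row in X_perm])
            increases.append(permuted_error - baseline)
        importances[feature] = sum(increases) / n_repeats
    return importances


def toy_model(row):
    """Hypothetical fitted model with invented coefficients."""
    return 50000 + 20000 * row["beds"] + 80000 * row["baths"]


# Hypothetical feature table; targets come from the toy model itself,
# so the baseline error is zero.
pairs = [(2, 1), (3, 2), (3, 1), (4, 3), (5, 4), (4, 2), (2, 2), (5, 3)]
X = [{"beds": beds, "baths": baths} for beds, baths in pairs]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
for feature, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {value:,.0f}")
```

A feature whose shuffling raises the error more is one the model relies on more heavily. In this toy setup that is the number of bathrooms by construction, because its invented coefficient is larger.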

4. Conclusions