Recap:

We initially intended to cover the following topics as part of our analysis:

House price index (HPI) movements are generally correlated across states. Because we have house price data for multiple years across states, we can use it to identify trends in the HPI over time and, from those trends, the states that are highly correlated.

Every drop in the correlation could give rise to investment opportunities.

Once trends and investment opportunities have been identified for past values, a similar analysis can be done on the forecasted values, giving us an advance view of suitable investment opportunities.

Predict the house prices
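
The correlation idea in the recap can be sketched as a rolling correlation between two states' month-over-month HPI changes, flagging the months where it falls below a cutoff. This is only a minimal sketch: the two series, the 4-month window, and the 0.8 threshold are made up for illustration and are not the values from our analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def rolling_correlation(x, y, window):
    """Correlation over each consecutive window of `window` observations."""
    return [pearson(x[i - window:i], y[i - window:i])
            for i in range(window, len(x) + 1)]

def correlation_drops(corrs, threshold):
    """Positions where the correlation crosses from >= threshold to below it."""
    return [i for i in range(1, len(corrs))
            if corrs[i] < threshold <= corrs[i - 1]]

# Hypothetical monthly HPI changes for two states (illustrative numbers only).
state_a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
state_b = [1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2]

corrs = rolling_correlation(state_a, state_b, window=4)
drops = correlation_drops(corrs, threshold=0.8)
print(drops)
```

Each flagged position marks a window in which the two states stopped moving together, which is the kind of event the recap treats as a possible investment opportunity.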

Peer Comments Summary

While we received a lot of useful feedback during our proposal presentation, we have incorporated the following:

  1. A problem with finding trends in house prices is that there can be huge variations in the prices.
    • We faced this issue and had to implement data smoothing techniques to reduce the volatility in the data.
  2. The idea was unique and interesting, and I know your targets are managers and investment bankers, but this could also have great implications for small independent investors and homebuyers as well.
    • Yes, indeed. With the kind of forecasts and patterns we have found, individuals can use this to decide on a suitable market position.
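
One common form of the smoothing mentioned in the first response is a moving average. The sketch below is an assumption about what such a step could look like, not the exact technique we applied; the price values and the 3-month window are hypothetical.

```python
from statistics import pstdev

def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed

# Hypothetical monthly prices with large swings (illustrative numbers only).
prices = [100, 120, 90, 130, 95, 140]
smoothed = moving_average(prices, window=3)

raw_volatility = pstdev(prices)
smoothed_volatility = pstdev(smoothed)
print(smoothed, raw_volatility, smoothed_volatility)
```

A wider window reduces volatility further but also delays the point at which a real change in trend becomes visible.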

Peer Comments Summary (contd.)

  1. Idea is good. It might be beneficial to use an LGB model to do the forecast work.
    • While the LGB model is an innovative model, its applications are primarily limited to regression, binary classification, and multiclass classification.
  2. Maybe as a secondary data source you could utilize crime information and determine the most dangerous neighborhoods in a particular city and add a weight to each listing depending on which neighborhood they reside in.
    • We looked for such data, but we needed historical data in order to draw an effective comparison, and none was found.

Peer Comments Summary (contd.)

  1. It would be great to compare the performance and accuracy of your model considering various factors like AUC, ROC, confusion matrix, precision, recall, etc.
    • We have used model validation techniques relevant to ARIMA and time series analysis models.
  2. Because you will be working on the house price index transition, it would be great if you can find any other interesting trends in the data, like is there any particular time of the year where the prices go up or down.
    • We have found such trends and will be sharing them during our walkthrough.
  3. Looking forward to seeing the forecast 5 years down the line.
    • We have tried forecasting for 10 years.
  4. You can consider other external factors like anticipated mortgage rates, tax law, etc. in forecasting house prices.
    • We have included the mortgage rate transition in our analysis.
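
For the question about a particular time of year when prices go up or down, one simple check is to average the month-over-month percentage change per calendar month. The sketch below uses a synthetic index with a June rise and a December dip built in; the data, the function name, and the `start_month` parameter are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def average_change_by_month(index_values, start_month=1):
    """Mean month-over-month % change, grouped by calendar month (1-12).

    index_values[0] is assumed to belong to `start_month`.
    """
    changes = defaultdict(list)
    for i in range(1, len(index_values)):
        month = (start_month - 1 + i) % 12 + 1
        previous = index_values[i - 1]
        changes[month].append((index_values[i] - previous) / previous * 100)
    return {m: sum(v) / len(v) for m, v in sorted(changes.items())}

# Synthetic 3-year index starting in January: +2% every June, -1% every December.
hpi = [100.0]
for i in range(1, 36):
    month = i % 12 + 1
    if month == 6:
        growth = 1.02
    elif month == 12:
        growth = 0.99
    else:
        growth = 1.0
    hpi.append(hpi[-1] * growth)

seasonal = average_change_by_month(hpi, start_month=1)
strongest_month = max(seasonal, key=seasonal.get)
weakest_month = min(seasonal, key=seasonal.get)
print(strongest_month, weakest_month)
```

On real data the monthly averages would be noisy, so several years of history are needed before a pattern like this can be trusted.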

Data Summary

- We are using two different datasets
  - Dataset for house price prediction
      - Data has been consumed from the Zillow API
      - Data has 38021 rows and 21 columns
  - Dataset for time series analysis
      - Data has been consumed from Zillow analytics
      - Data has 50 rows, one for each state
      - 300 time periods corresponding to 300 months/25 years
      - Mortgage rate transition from Quandl

Data Extraction from API

Data Extraction using API

Price transition across all states in USA

Price Transition over the Years

Current Data and Forecast based on Time Series Analysis

Price Transition over the Years

Price Transition vs Mortgage rate transition

Price Transition over the Years

Forecast comparison

Price Transition over the Years

Present vs Forecasted Trends

Price transition across all states in USA

Price Transition over the Years

ML Procedures

  • We used the following ML/NLP procedures:
  • ARIMA Time Series Analysis
  • Linear Regression
  • Gradient Boosting (GBM)
  • Generalized Linear Model for Regression Using H2O
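
To make the ARIMA item more concrete, the sketch below hand-rolls the simplest useful case, ARIMA(1,1,0): difference the series once, fit an AR(1) model to the differences by least squares, then integrate the forecasted differences back into levels. This is an illustration only. The order (1,1,0), the synthetic series, and the function names are assumptions, and in practice a library implementation would be used rather than this code.

```python
def fit_ar1_on_differences(series):
    """Least-squares fit of d_t = c + phi * d_(t-1), where d_t = y_t - y_(t-1)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    lagged, current = diffs[:-1], diffs[1:]
    n = len(lagged)
    mean_lag, mean_cur = sum(lagged) / n, sum(current) / n
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    phi = sxy / sxx if sxx else 0.0
    c = mean_cur - phi * mean_lag
    return c, phi

def forecast_levels(series, steps, c, phi):
    """Forecast differences recursively and add them back to the last level."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        forecasts.append(level)
    return forecasts

# Synthetic index whose monthly changes follow d_t = 1 + 0.5 * d_(t-1).
hpi = [100.0]
change = 4.0
for _ in range(10):
    hpi.append(hpi[-1] + change)
    change = 1.0 + 0.5 * change

c, phi = fit_ar1_on_differences(hpi)
forecast = forecast_levels(hpi, steps=3, c=c, phi=phi)
print(c, phi, forecast)
```

Because the synthetic changes follow the AR(1) rule exactly, the fit recovers the constant and coefficient used to build the series; real HPI data would also need the trend and seasonality checks discussed elsewhere in these slides.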

ML Preprocessing for House Price Predictions

  • Last Sold Price vs Year Sold

Price Transition over the Years

Last Sold Price as per Houses

Price Transition over the Years

ML Model Summary

  • The data is visibly skewed, which biases it, and it is also very sparse. The current models predict the house price indices with high RMSE values; they can be tuned further for better accuracy.

  • Of all the models above, GBM performed best, giving the lowest RMSE. GBM is an excellent ML model for datasets with a Gaussian distribution; however, the distribution family (Gaussian, Poisson, binomial, etc.) can always be specified when applying the GBM model.
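
For reference, the RMSE used to rank the models is the square root of the mean squared prediction error. The sketch below compares two sets of predictions; the actual values, the predictions, and the names `model_a` and `model_b` are placeholders, not our results.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Placeholder values in thousands of dollars (illustrative only).
actual = [200, 250, 300, 350]
predictions = {
    "model_a": [210, 240, 310, 340],
    "model_b": [230, 250, 270, 350],
}

scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

RMSE is in the same units as the target, and because errors are squared before averaging, a few large misses on a skewed dataset can dominate the score.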

Key Takeaways & Future Work

  • We have realized that the data consumed from Zillow is not clean by default and requires significant preprocessing before it can be used in models.
  • Time series forecasting is tricky to implement. While there are advanced models like auto ARIMA, implementing them successfully and accurately requires a good understanding of the trend and seasonality of the data.
  • One of the key success parameters in implementing time series forecasting is the length of the data we have.