10/27/2019

Introduction

Zillow is an American online real estate database that aims to empower consumers with information to help them make informed decisions about buying, selling and renting properties.

Aim: To predict the selling price of properties using the “Last Sold Price” available on Zillow for properties in multiple neighborhoods across different cities.

Data

Primary Data Source

We will use two Zillow APIs as our primary data source:

  • Deepcomps: To get details of properties within a neighborhood, such as size, area and last sold price, for a given Zillow Property ID (ZPID).

  • DeepSearchResults: To get multiple ZPIDs from property addresses (street address, city and ZIP code).
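The two-step lookup described above (address to ZPID, then ZPID to comparable properties) could be sketched roughly as follows. This only builds the request URLs. The base URL, the endpoint file names and the parameter names (`zws-id`, `address`, `citystatezip`, `zpid`, `count`) are assumptions recalled from Zillow's public API documentation and should be verified before use; the helper functions, the API key and the sample values are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL; verify against Zillow's API documentation.
BASE_URL = "http://www.zillow.com/webservice"


def deep_search_url(zws_id, address, city_state_zip):
    """Build a DeepSearchResults request URL (address -> ZPID lookup)."""
    query = urlencode(
        {"zws-id": zws_id, "address": address, "citystatezip": city_state_zip}
    )
    return f"{BASE_URL}/GetDeepSearchResults.htm?{query}"


def deep_comps_url(zws_id, zpid, count=25):
    """Build a Deepcomps request URL (ZPID -> comparable properties)."""
    query = urlencode({"zws-id": zws_id, "zpid": zpid, "count": count})
    return f"{BASE_URL}/GetDeepComps.htm?{query}"


# Placeholder key, address and ZPID, for illustration only.
search_url = deep_search_url("YOUR-ZWS-ID", "2114 Bigelow Ave", "Seattle, WA")
comps_url = deep_comps_url("YOUR-ZWS-ID", 48749425, count=5)
print(search_url)
print(comps_url)
```

The response to the first request would supply the ZPID that is passed to the second; parsing the responses is left out because their structure is not described here.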

Secondary Data Source

To fetch multiple ZPIDs, we need property addresses as input. We plan to take these from the Zillow website and from publicly available verified addresses, which serve as our secondary data sources.

Problem Description and Analytics Plan

Problem Description

When property owners want to sell a property, they often face the following issues:

  • They may not be aware of the property's current price
  • They may not know the prices of comparable properties in the area

Analytics plan

  • Data Cleaning
  • Descriptive analysis (correlation matrix, histograms)
  • Predictive analysis (Linear Regression, Decision Tree, Random Forest)
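
As a rough illustration of the predictive step, the sketch below fits the three model types named above and asks each for a price estimate. It assumes scikit-learn as the modeling library, which the plan does not specify. The data is synthetic, not Zillow data, and the feature names (`size_sqft`, `bedrooms`) and the price formula are invented purely for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT real Zillow data): price grows with size and bedrooms.
rng = np.random.default_rng(0)
n = 300
size_sqft = rng.uniform(600, 3500, n)
bedrooms = rng.integers(1, 6, n)  # whole numbers from 1 to 5
last_sold_price = 150 * size_sqft + 10_000 * bedrooms + rng.normal(0, 20_000, n)

X = np.column_stack([size_sqft, bedrooms])
y = last_sold_price

# The three model types listed in the analytics plan.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Hypothetical property to price: 2000 sq ft, 3 bedrooms.
new_property = np.array([[2000, 3]])

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = float(model.predict(new_property)[0])

for name, price in predictions.items():
    print(f"{name}: predicted price = {price:,.0f}")
```

In the actual analysis the cleaned Zillow features would replace the synthetic arrays, and part of the data would be held out so that the models can be compared on properties they have not seen.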

Evaluation Plan

The output of our analysis is the predicted selling price of a property. After running all the machine learning models, we will analyze the results by:

  • Comparing the models using RMSE and R-squared values
  • Plotting a correlation matrix and histograms to see the distributions of, and dependencies among, the variables
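
To make the two comparison metrics concrete, here is a minimal sketch of how RMSE and R-squared can be computed from actual and predicted prices. The price lists and the `model_a` / `model_b` names are made up for illustration; a lower RMSE and a higher R-squared indicate the better fit.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in price units."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Share of the variance in actual prices that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up prices for four properties and two hypothetical models.
actual = [300_000, 450_000, 520_000, 610_000]
model_a = [310_000, 440_000, 500_000, 640_000]
model_b = [350_000, 400_000, 560_000, 550_000]

for name, predicted in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: RMSE={rmse(actual, predicted):,.0f}, "
          f"R^2={r_squared(actual, predicted):.3f}")
```

RMSE is expressed in the same units as price, which makes it easy to interpret, while R-squared is unitless and so is convenient for comparing models across cities with different price levels.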