Why am I doing this?

I have started taking data science certificate courses to gain practical data science skills. To learn better, I am applying those skills to a project I have long been interested in: the housing market. For this project, I will slice and dice the data Zillow provides using R, which will help me understand housing market trends.

Zillow’s dataset webpage

What problems am I solving?

I am interested in single-family homes, and I have been waiting for the right moment to purchase one. Home prices are higher than ever; some speculate they will keep going up, while others speculate they will go down. By answering the questions below, I want to understand the market trends, which will help me decide whether to hold off on purchasing a house.

Is the ratio of sale price to listing price going up or down? Is the ratio of sale price per sqft to listing price per sqft going up or down?

Load packages necessary for execution

source("dbook.R")
load.packages(c("dplyr", "ggplot2", "tidyr", "knitr","stringr", "DT", "forecast", "fpp","lattice"))
## [1] loading dplyr
## [1] loading ggplot2
## [1] loading tidyr
## [1] loading knitr
## [1] loading stringr
## [1] loading DT
## [1] loading forecast
## [1] loading fpp
## [1] loading lattice
opts_chunk$set(comment=NA, fig.width=8, fig.height=5)

To facilitate the analysis, I also wrote a function that transforms the ‘wide’ format used on the Zillow website (one column per month) into a ‘narrow’ format (one row per month).

read.ztbl <- function(filename){
  prep.ztbl(read.csv(filename))
}

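prep.ztbl <- function(atbl){
  # read.csv() prefixes the month columns with "X"; count the other columns,
  # which are assumed to come first
  c.nondate = length(which(!str_detect(colnames(atbl), "^X")))
  range.datecol = (c.nondate+1):length(colnames(atbl))
  # stack the month columns into Month/Value pairs, one row per month
  tbl = atbl %>% gather("Month", "Value", all_of(range.datecol))
  # build a Date on the first of the month from the year and month in the name
  tbl$Date = as.Date(sprintf("%s-%s-01", str_extract(tbl$Month, "\\d{4}"),
                             str_extract(tbl$Month, "\\d{2}$")))
  tbl$Month = NULL
  tbl
}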
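
To make the wide-to-narrow step concrete, here is a minimal sketch that repeats the same steps inline on a tiny table. The table `wide`, its `RegionName` column, and all of its values are made up for illustration and are not taken from Zillow; I am only assuming that the month columns arrive with names like `X2016.01`, which is how read.csv() renames a header such as `2016-01`.

```r
library(tidyr)
library(stringr)

# Hypothetical toy table in the 'wide' layout (one column per month).
# Region names and values are invented for this example.
wide <- data.frame(
  RegionName = c("Seattle", "Portland"),
  X2016.01   = c(1.00, 0.98),
  X2016.02   = c(1.01, 0.99),
  stringsAsFactors = FALSE
)

# Positions of the month columns: the ones whose names start with "X".
date.cols <- which(str_detect(colnames(wide), "^X"))

# Stack the month columns into Month/Value pairs.
narrow <- gather(wide, "Month", "Value", all_of(date.cols))

# Turn a name like "X2016.01" into the date 2016-01-01.
narrow$Date <- as.Date(sprintf("%s-%s-01",
                               str_extract(narrow$Month, "\\d{4}"),
                               str_extract(narrow$Month, "\\d{2}$")))
narrow$Month <- NULL

print(narrow)
```

The two regions and two month columns become four rows, each with a `RegionName`, a `Value`, and a `Date`, which is the shape that is convenient for plotting a trend over time with ggplot2.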