Introduction

1. Question

Help a real-estate agency understand the zip codes that would generate most profit on short term rentals within New York City.

2. Data

The analysis will make use of:

  1. Zillow Data - Provides the price for two-bedroom properties in NYC
  2. Airbnb Data - Provides the rent charged for properties in NYC

3. Initial Assumptions

  1. The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).
  2. The time value of money discount rate is 0% (i.e. $1 today is worth the same 100 years from now).
  3. All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000 square foot property in a locale such as Bronx or Manhattan generates twice the revenue and costs twice as much as any other 500 square foot property within that same locale.)

Setup

Libraries Required

require(Rmisc)
require(data.table)
require(DT)
require(tidyverse)
require(stringr)
require(car)
require(kableExtra)
require(ggplot2)
require(plotly)
require(R.utils)

Setup Variables

# name of  zillow data file
name_zillow_file <- 'Zip_Zhvi_2bedroom'

# name of airbnb data file
name_airbnb_file <- 'listings' 

Reading data

house_price <- 
  fread(paste0(name_zillow_file, '.csv')) # reading zillow data

house_rent <- 
  fread(paste0(name_airbnb_file, '.csv.gz')) # reading airbnb data

Global Variables

All code related changes should me made in this section only.


Global variables for house buying price data - Zillow

# first date variable in house price data
datetime_start_filter <- "1996-04" 

# last date variable in house price data
datetime_end_filter <- "2017-06"

# city filter
city_filter <- 'New York' 

# first variable in a sequence of factor variables
factor_start_filter <- 'City'

# last variable in a sequence of factor variables
factor_end_filter <- 'CountyName'

Global variables for house rent price data - Airbnb

# subset of variables to be analyzed in house rent data
revenue_vars_filter <- c('zipcode','id', 'last_scraped', 'neighbourhood_group_cleansed',
                         'property_type', 'room_type', 'accommodates', 'bathrooms',
                         'bedrooms', 'beds', 'bed_type', 'square_feet',
                         'security_deposit', 'price', 'cleaning_fee',
                         'guests_included', 'extra_people')

# ASSUMPTION - properties with price greater than $2000 per night (0.01% observations) are removed
price_filter <- 2000

# ASSUMPTION - minimum square feet filter
min_square_feet_filter <- 80 

# ASSUMPTION - maximum square feet filter
max_square_feet_filter <- 10000

# set of numeric columns
numeric_cols_filter <- c('year', 'month', 'day', 'zipcode') 

# set of factor columns
factor_cols_filter <- c('neighbourhood_group_cleansed', 'property_type' ,'room_type') 

# Function to convert price columns to numeric
# Steps:
# 1. store indexes of price columns
# 2. remove the '$' character
# 3. substitute ',' with blank
# 4. convert to numeric
convert_price_to_numeric <- function(data, var)
{
  index <- which(colnames(data) == var) # indexes of price columns
  df1 <- str_sub(data[,index], 2) # remove '$' character
  df2 <- gsub(",","",df1) %>% # substitute ',' with blank
    as.numeric() # convert to numeric
  return(df2)
}

# number of bedrooms filter
filter_bedrooms_filter <- 2 

# ASSUMPTION - years to filter on to calculate the year-on-year price increase
year_filter_subset <- c(2014, 2015, 2016, 2017)

# ASSUMPTION - occupancy rate of house every year
occupancy_rate <- 0.75 

# ASSUMPTION - percentage of stays during which extra people will stay
frequency_extra_people_filter <- 1/5

Preparing Data

Preparing Zillow Data

Key Data Quality Steps

  1. Date columns have been unpivoted
  2. Data types of variables are appropriately converted

Creating a function to prepare the Zillow Data where:

  1. The dataset is tidy:
    • Each variable has its own column
    • Each observation has its own row
    • Each value has its own cell
  2. City filter is applied
  3. Date columns have been unpivoted
  4. RegionName is renamed to zipcode
  5. Data types of variables are appropriately converted
prepare_zillow_data <- function(house_price,
                                datetime_start=datetime_start_filter,
                                datetime_end=datetime_end_filter,
                                city=city_filter,
                                factor_start=factor_start_filter,
                                factor_end=factor_end_filter)
{
  
  ######### Wrangling Zillow Data ######### 
  # Steps:
  # 1. Select first and last index of date variables
  # 2. Read house price data
  # 3. Filter by city
  # 4. Unpivot date columns
  # 5. Rename RegionName to zipcode
  
    index_datetime_start <- 
    which(colnames(house_price) == datetime_start) # index of first date variable
  
    index_datetime_end <- 
      which(colnames(house_price) == datetime_end) # index of last date variable
    
    house_price_filtered <-
      house_price %>% # read house price data
      filter(City %in% city) %>%  # filter by city
      gather(Datetime, Median_price, index_datetime_start:index_datetime_end) %>%  # unpivot date columns
      separate(Datetime, c("year", "month")) %>%  # separate date into year and month
      rename(zipcode=RegionName) # rename RegionName to zipcode
    
  ######### Changing data types of variables #########
    index_factor_start <- 
      which(colnames(house_price_filtered) == factor_start) # index of first factor variable
    
    index_factor_end <- 
      which(colnames(house_price_filtered) == factor_end) # index of last factor variable
    
    house_price_filtered[,index_factor_start:index_factor_end] <- 
      map(house_price_filtered[,index_factor_start:index_factor_end]
          , as.factor) # converting above set of variables to factor variables
    
    house_price_filtered$year <- 
      as.numeric(house_price_filtered$year) # converting year to numeric
    
    house_price_filtered$month <- 
      as.numeric(house_price_filtered$month) # converting month to numeric
    
  ######### return final data #########
    return(house_price_filtered)
}

Using the above function to prepare Zillow Data

house_price_filtered <- prepare_zillow_data(house_price)

Zillow Data Dictionary

Variable Description
RegionID Zillow assigned number only, assigned consecutively when the regions are defined
zipcode Zip code of where the property is located
City City of where the property is located
State State of where the property is located
Metro General name of the surrounding area where the property is located
CountyName Political and administrative division of a state, referred to as a particular part of the state
SizeRank Population of the area; the lower the number the greater the population
year Year when data was pulled
month Month when data was pulled
Median_price Median Price within that area

Zillow Data

# display first 1000 rows
kable(head(house_price_filtered, 1000)) %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% 
  scroll_box(width = "100%", height = "500px")
RegionID zipcode City State Metro CountyName SizeRank year month Median_price
61639 10025 New York NY New York New York 1 1996 4 NA
61637 10023 New York NY New York New York 3 1996 4 NA
61703 10128 New York NY New York New York 14 1996 4 NA
61625 10011 New York NY New York New York 15 1996 4 NA
61617 10003 New York NY New York New York 21 1996 4 NA
62012 11201 New York NY New York Kings 32 1996 4 NA
62044 11234 New York NY New York Kings 52 1996 4 NA
61790 10314 New York NY New York Richmond 68 1996 4 123100
62026 11215 New York NY New York Kings 71 1996 4 NA
61642 10028 New York NY New York New York 109 1996 4 NA
61635 10021 New York NY New York New York 190 1996 4 NA
61628 10014 New York NY New York New York 379 1996 4 NA
61650 10036 New York NY New York New York 580 1996 4 NA
62120 11434 New York NY New York Queens 622 1996 4 NA
61782 10306 New York NY New York Richmond 668 1996 4 138100
61788 10312 New York NY New York Richmond 764 1996 4 153000
61636 10022 New York NY New York New York 894 1996 4 NA
62028 11217 New York NY New York Kings 1555 1996 4 NA
61627 10013 New York NY New York New York 1744 1996 4 NA
62041 11231 New York NY New York Kings 1817 1996 4 NA
61780 10304 New York NY New York Richmond 1958 1996 4 117500
61781 10305 New York NY New York Richmond 2087 1996 4 131900
61785 10309 New York NY New York Richmond 3682 1996 4 129100
61784 10308 New York NY New York Richmond 4149 1996 4 149600
61779 10303 New York NY New York Richmond 4647 1996 4 99200
61639 10025 New York NY New York New York 1 1996 5 NA
61637 10023 New York NY New York New York 3 1996 5 NA
61703 10128 New York NY New York New York 14 1996 5 NA
61625 10011 New York NY New York New York 15 1996 5 NA
61617 10003 New York NY New York New York 21 1996 5 NA
62012 11201 New York NY New York Kings 32 1996 5 NA
62044 11234 New York NY New York Kings 52 1996 5 NA
61790 10314 New York NY New York Richmond 68 1996 5 124000
62026 11215 New York NY New York Kings 71 1996 5 NA
61642 10028 New York NY New York New York 109 1996 5 NA
61635 10021 New York NY New York New York 190 1996 5 NA
61628 10014 New York NY New York New York 379 1996 5 NA
61650 10036 New York NY New York New York 580 1996 5 NA
62120 11434 New York NY New York Queens 622 1996 5 NA
61782 10306 New York NY New York Richmond 668 1996 5 136900
61788 10312 New York NY New York Richmond 764 1996 5 149800
61636 10022 New York NY New York New York 894 1996 5 NA
62028 11217 New York NY New York Kings 1555 1996 5 NA
61627 10013 New York NY New York New York 1744 1996 5 NA
62041 11231 New York NY New York Kings 1817 1996 5 NA
61780 10304 New York NY New York Richmond 1958 1996 5 113700
61781 10305 New York NY New York Richmond 2087 1996 5 131900
61785 10309 New York NY New York Richmond 3682 1996 5 129200
61784 10308 New York NY New York Richmond 4149 1996 5 148200
61779 10303 New York NY New York Richmond 4647 1996 5 102000
61639 10025 New York NY New York New York 1 1996 6 NA
61637 10023 New York NY New York New York 3 1996 6 NA
61703 10128 New York NY New York New York 14 1996 6 NA
61625 10011 New York NY New York New York 15 1996 6 NA
61617 10003 New York NY New York New York 21 1996 6 NA
62012 11201 New York NY New York Kings 32 1996 6 NA
62044 11234 New York NY New York Kings 52 1996 6 NA
61790 10314 New York NY New York Richmond 68 1996 6 123900
62026 11215 New York NY New York Kings 71 1996 6 NA
61642 10028 New York NY New York New York 109 1996 6 NA
61635 10021 New York NY New York New York 190 1996 6 NA
61628 10014 New York NY New York New York 379 1996 6 NA
61650 10036 New York NY New York New York 580 1996 6 NA
62120 11434 New York NY New York Queens 622 1996 6 NA
61782 10306 New York NY New York Richmond 668 1996 6 135900
61788 10312 New York NY New York Richmond 764 1996 6 146900
61636 10022 New York NY New York New York 894 1996 6 NA
62028 11217 New York NY New York Kings 1555 1996 6 NA
61627 10013 New York NY New York New York 1744 1996 6 NA
62041 11231 New York NY New York Kings 1817 1996 6 NA
61780 10304 New York NY New York Richmond 1958 1996 6 111200
61781 10305 New York NY New York Richmond 2087 1996 6 131400
61785 10309 New York NY New York Richmond 3682 1996 6 129000
61784 10308 New York NY New York Richmond 4149 1996 6 146800
61779 10303 New York NY New York Richmond 4647 1996 6 103900
61639 10025 New York NY New York New York 1 1996 7 NA
61637 10023 New York NY New York New York 3 1996 7 NA
61703 10128 New York NY New York New York 14 1996 7 NA
61625 10011 New York NY New York New York 15 1996 7 NA
61617 10003 New York NY New York New York 21 1996 7 NA
62012 11201 New York NY New York Kings 32 1996 7 NA
62044 11234 New York NY New York Kings 52 1996 7 NA
61790 10314 New York NY New York Richmond 68 1996 7 123300
62026 11215 New York NY New York Kings 71 1996 7 NA
61642 10028 New York NY New York New York 109 1996 7 NA
61635 10021 New York NY New York New York 190 1996 7 NA
61628 10014 New York NY New York New York 379 1996 7 NA
61650 10036 New York NY New York New York 580 1996 7 NA
62120 11434 New York NY New York Queens 622 1996 7 NA
61782 10306 New York NY New York Richmond 668 1996 7 134500
61788 10312 New York NY New York Richmond 764 1996 7 144800
61636 10022 New York NY New York New York 894 1996 7 NA
62028 11217 New York NY New York Kings 1555 1996 7 NA
61627 10013 New York NY