rm(list=ls())
library(stargazer)
library(plyr)
library(lfe)
library(zoo)
library(data.table)
library(ggplot2)
library(dplyr)
library(fst)
library(RSQLite)
library(DBI)
dbdir = "/data/ztrax.db"
con <- dbConnect(RSQLite::SQLite(),dbdir)
The following image shows a sample listing from Zillow. We have marked the data we extracted from listings using red squares.
For each home for which we extracted data from Zillow listings, we then merged information in the ZTRAX databse using the home address. We use one home in our sample to demonstrate the extraction process.
In order to extract information for all the homes, we created a vector of home addresses in iterated the following code over that vector using a for loop.
Below are the information of the sample home we use to demonstrate the data extraction process.
street = "1638 e 88th st"
city = "los angeles"
zip = 90002
sale_date = as.Date("2015-10-29")
sale_price = 205000
In ZTRAX a property is uniquely identified using the field rowid in zasmt_main table. Each rowid is uniquely associated with an importparcelid. The importparcelid is used later to identify transactions associated with the property given by the rowid.
The following chunk of code identifies the rowid and the importparcelid for a given property address. In addition, we can use the same query to extract the most recent property tax payment, the census tract, and the lot area, as these fields are also in the zasmt_main table.
query <- paste0("select zm.rowid,
zm.propertyfullstreetaddress,
zm.propertycity,
zm.propertyzip,
zm.propertyaddresscensustractandblock,
zm.importparcelid,
zm.lotsizesquarefeet as lot_size,
zm.taxamount,
zm.taxyear
from zasmt_main as zm
where LOWER(zm.propertyfullstreetaddress) like '%",street,"%' and
LOWER(zm.propertycity) = '",city,"' and
zm.propertyzip=",zip)
home_ids <- data.table(dbGetQuery(con,query))
print(t(home_ids))
## [,1]
## rowid "D8EDC000-0000-0000-1795-000000000000"
## propertyfullstreetaddress "1638 E 88TH ST"
## propertycity "LOS ANGELES"
## propertyzip "90002"
## propertyaddresscensustractandblock "60375351"
## importparcelid "12643800"
## lot_size "6254"
## taxamount "5768.81"
## taxyear "2020"
rowid and importparcelidrowid = 'D8EDC000-0000-0000-1795-000000000000'
importparcelid = 12643800
Following chunk of code is used to extract property characteristics floor area, year built, year remodeled, number of bedrooms, and number of bathrooms using the rowid.
The same code can be used to extract these information for multiple properties. Instead of just one rowid, we can combine multiple rowids as a string separated by commas and passed in to the following query. For example, if there are two rowid’s 1 and 2, then the we can use rowid=1,2 in the above code chunk.
query <- paste0("select
buildingareas.buildingareasqft,
building.yearbuilt,
building.yearremodeled,
building.totalbedrooms,
building.totalcalculatedbathcount
from zasmt_building as building
LEFT JOIN zasmt_buildingareas as buildingareas
on buildingareas.rowid = building.rowid
where building.rowid in ('",rowid,"')
")
home_characteristics <- data.table(dbGetQuery(con,query))
print(t(home_characteristics))
## [,1]
## buildingareasqft "1574"
## yearbuilt NA
## yearremodeled NA
## totalbedrooms "5"
## totalcalculatedbathcount "3"
The following code chunk extracts all the previous transactions for a given importparcelid. As before, the code can be modified to extract the same information for multiple importparcelid’s. This query returns the transaction date (documentdate), transaction price (salespriceamount), and loan amount (loanamount) for all the transactions associated with a given importparcelid.
transactions <- data.table(dbGetQuery(con,paste0("select
zt.transid,
zt.documentdate,
zt.salespriceamount,
zt.loanamount
from ztrans_main as zt
JOIN ztrans_propertyinfo as zpinfo
on zpinfo.transid = zt.transid
where zpinfo.importparcelid in (",importparcelid,")
and zt.salespriceamount>0
order by zt.documentdate desc")))
print(head(transactions))
## transid documentdate salespriceamount loanamount
## 1: 372638756 2015-09-29 205000 0
## 2: 6185873 2005-11-17 345000 276000
## 3: 5653580 2004-08-27 235000 188000
Next step is extracting the previous transaction. From the data extracted from Zillow listing we know the most recent sales price, listing price, sold date, and listed date. To calculate the years of ownership, we need the date on which the seller purchased the home. The following code chunk identifies the purchase transaction associated with the most recent sale identified from Zillow listings.
transactions <- transactions[documentdate<(sale_date-90)]
purchase_info <- transactions[1,]
print(t(purchase_info))
## [,1]
## transid "6185873"
## documentdate "2005-11-17"
## salespriceamount "345000"
## loanamount "276000"
The years of ownership is calculated by subtracting the purchase year from sale year.