In R, we use certain software packages called “libraries” to help us analyze data. We’ll load those here:
library(dplyr)
library(stringr)
library(jsonlite)
library(leaflet)We can get property tax delinquency data from Philadelphia’s Open Data site. This dataset is public and you can reproduce my entire analysis by using the code you see here.
csv <- "https://phl.carto.com/api/v2/sql?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+real_estate_tax_balances&filename=real_estate_tax_balances&format=csv&skipfields=cartodb_id"
taxData <- read.csv(csv, stringsAsFactors = FALSE)Remove columns that don’t interest us:
taxDataCleaned <- taxData %>% select(-c(lien_number))Remove rows that won’t help us because they list no owner:
taxDataCleaned <- taxDataCleaned %>% filter(owner != "")Get just 2016 tax amount owed. We want to see where Philadelphia tax enforcement is now.
taxDataCleaned <- taxDataCleaned %>% filter(tax_period == 2016)We’ll take only those properties that have overdue tax and do some summarizing.
We consider a “deadbeat investor” to be someone with three or more overdue properties. We want to concentrate on these owners who are unlikely (due to the number of delinquent properties) to be simple homeowners down on their luck.
investors <- taxDataCleaned %>%
filter(total >0) %>%
group_by(owner) %>%
summarise(TotalOwed = sum(total),
NumOverdueProperties = n()) %>%
arrange(desc(NumOverdueProperties)) %>%
filter(NumOverdueProperties >=3)We have 1449 potential investors… but we know that some are actually governmental agencies. Let’s remove those as we discover them:
investors <- investors %>% filter(owner != "PHILA HOUSING AUTHORITY")
investors <- investors %>% filter(owner != "CITY OF PHILADELPHIA")
investors <- investors %>% filter(owner != "ADMIN OF VET AFFAIRS")
investors <- investors %>% filter(owner != "OFFICE OF THE DISTRICT AT")
investors <- investors %>% filter(owner != "ADMIN OF VETS AFFAIRS")
investors <- investors %>% filter(owner != "CITY OF PHILA")
investors <- investors %>% filter(owner != "ADMIN OF VETERAN AFFAIRS")
investors <- investors %>% filter(owner != "ADM OF VET AFFAIRS")What’s the impact of deadbeat investors?
sum(investors$TotalOwed)## [1] 9103652
sum(investors$NumOverdueProperties)## [1] 9334
We have over $9 million due on around 9k properties… a huge impact.
Let’s dig a bit deeper.
We wonder where these investor-owned deadbeat properties are distributed across the city. Our data doesn’t include zip code, but it does have geocoded coordinates. We can do something with that!
First, let’s get all of our investment properties, and set up our latitude and longitude fields.
investmentProperties <- taxDataCleaned %>%
filter(total > 0) %>%
filter (owner %in% investors$owner)
investmentProperties$lat <- round(as.numeric(str_match(investmentProperties$coordinates,
"\\((\\d+.\\d+),")[,2]),4)
investmentProperties$lng <- round(as.numeric(str_match(investmentProperties$coordinates,
",\\s+(.+)\\)")[,2]),4)
investmentProperties <- investmentProperties %>%
filter(!is.na(lat), !is.na(lng))Now we can use leaflet to mark these properties. Since we have several thousand, we’ll certainly want to use clustering!
Since this is public data, I don’t think it counts as unfair shaming to put both the owner and overdue 2016 tax information! Note that you can zoom and pan as desired.
investmentProperties %>% leaflet() %>% addTiles() %>%
setView(lng = mean(as.numeric(investmentProperties$lng), na.rm=TRUE),
lat = mean(as.numeric(investmentProperties$lat), na.rm=TRUE), zoom = 11) %>%
addMarkers(clusterOptions = markerClusterOptions(),
popup = paste("Address: ",
investmentProperties$location, " ",
investmentProperties$unit,
"<br>Overdue Tax: $",
round(investmentProperties$total,0),
"<br>Investor Owner: ",
investmentProperties$owner,
sep=""))