Policy Issue & Proposed Solution

Column

Housing Regression Plot

Column

Unfortunately, in the United States access to housing is not a right. You are not guaranteed a home to rest in at night, or a safe place to raise your children. Marginalized communities remain marginalized, and the rich stay rich. Sadly, after a while the homeless encampments surrounding our cities begin to just fade into the landscape. The worst part of it is that financial institutions want this status quo to persist, because it guarantees their position as the dominant social class within our ironically democratic society. In fact, in recent years we have seen an acceleration in the use of housing as an investment vehicle by hedge funds and investment banks. Houses have become assets to keep on the books, left empty rather than fulfilling their intended purpose of providing shelter. On top of this, we have witnessed collateralized debt obligations (CDOs) turn mortgages into a destructive investment weapon that destabilized the world economy. There is a deep need in our nation’s cities to understand how to better utilize housing properties, and where there could be potential for construction of affordable housing.

Rising property prices and the need for more affordable housing are two problems facing the city of Boston. Property that is undervalued because of its location and crime rate could be identified for government purchase. My project aimed to understand how crime correlates with property value, to see whether it would be beneficial for governments to purchase land in high-crime areas and rebuild it into affordable housing, while in the process building up communities with the intent of lowering crime potential. This cycle of property devaluation as crime rates increase is a well-known phenomenon that buyers take into account when purchasing a home. If a house is in a bad neighborhood, then common logic says it is most likely cheaper. This project aims to dive deeper into this logic, to measure it quantitatively and geospatially, so that city governments can leverage the findings for the construction of affordable housing. The example to my left shows the kind of quantitative regression plot I planned, from the start of the course, to make for X=Crime and Y=Property_Value. Through the course of this presentation I will explain how I made this analysis possible using government portals, open source applications and multiple nonstandardized datasets.

Civic Tech

Column

Crime Geospatial Visualization

Housing Geospatial Visualization

Column

My civic tech proposal is very simple; in fact, I used most of these types of tech in my project. I am advocating for the use of open source data and applications for quantitative analytics. Open source methods are transparent and equitable. City governments are capable of leveraging these methods to solve problems in which they are at a disadvantage, such as property purchasing. The Boston city government has an affordable housing crisis that requires new data-driven insights to solve. To the left we can see two examples of geospatial visualizations created in ggplot as prototypes for our analysis, which was later completed in Mapbox.

Geolitica is a “software as a service” product that uses the Microsoft Azure Government cloud to protect its sensitive data. From my understanding, the machine learning models use some sort of supervised learning classification method to locate areas of interest. The data used comes from three types of databases. Record management systems (RMS) are used for crime data released through officer crime reports. Computer-aided dispatch (CAD) databases are used for collision and public safety related data. Lastly, automated vehicle location (AVL) databases are used for real-time officer location data. RMS and CAD records describe unique events located by address or latitude/longitude, much like the records in the Boston open data portal.

Geolitica creates real-time crime analytics dashboards that are easily shareable with community stakeholders. Heatmaps can be created to show whether some areas are being over- or under-patrolled, based on machine learning algorithms for crime hotspots. These easy-to-understand visualizations allow communities to understand how police keep their neighborhoods safe and where high-crime areas are located. Geolitica manages the daily patterns of patrol vehicles, so you know whether officers are going to the geolocations where they are most needed. These locations can be based on hotspots or intel operations, or set automatically for route efficiency on the Geolitica platform. This creates a strong link between the guidance that precincts give and the compliance of their patrols.

Real-time intelligence platforms such as Geolitica are the end goal for this policy initiative, but for the scope of this course we will be using ggplot, Google Data Studio, Mapbox and flexdashboard to create the best geospatial dashboard possible within the bounds of our course capabilities. Since these applications are easy to access, they can be implemented into city government analysis without expanding the current budget.

Policy Literature

Column

Crime and Residential Choice:

Empirical Analysis: This article dives into the relationship between crime (independent variable) and housing prices (dependent variable). The hypothesis is that crime is an early indicator of neighborhood transition. The authors use hedonic regression to quantify the effect of crime on housing prices. Hedonic regression, from my understanding, is a linear regression model that predicts the price of a good from its characteristics.
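To make the idea of hedonic regression concrete, here is a minimal Python sketch that fits a price model by ordinary least squares. It is only an illustration under stated assumptions: the homes, the three characteristics, and the function names are all made up, and it does not reproduce the article's actual model specification.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_hedonic(features, prices):
    """Least-squares fit of price = b0 + b1*x1 + ... using the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(rows, prices)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up homes: (living area in hundreds of sq ft, bedrooms, crimes per 1,000 residents)
homes = [(10, 2, 30), (15, 3, 10), (20, 3, 20), (12, 2, 5), (18, 4, 25), (9, 1, 15)]
# Made-up prices generated from a known rule so the fit can be checked by eye.
prices = [50_000 + 20_000 * a + 10_000 * b - 1_500 * c for a, b, c in homes]

intercept, per_100_sqft, per_bedroom, per_crime = fit_hedonic(homes, prices)
print(round(per_crime, 2))  # implied price change for one more crime per 1,000 residents
```

Each fitted coefficient is read as the implied price of one characteristic with the others held fixed, which is how a hedonic model isolates the effect of crime from the physical features of a home.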

The researchers collected both crime and housing data from the study’s main sample location of Columbus, OH. Crime data was provided by the Columbus Police Department (CPD). It covers crime types such as homicide, rape, robbery, assault, burglary, larceny, and automobile theft. The property value data covers 89 census tracts from 1995-1998. The housing data includes both the physical characteristics of each property and its price.

The researchers found that the average effect of crime rates on housing prices is misleading, because crime affects prices at different rates depending on the income class of the community. The most interesting yet unsurprising finding was that violent crime has the greatest effect on property value.

Spatial-temporal crime predictions in smart cities:

The authors argue that cities, and the way police handle crime, are becoming more complex due to growth in size and technology. The paper outlines a predictive approach that uses spatial analysis and auto-regressive models to detect high-risk regions and forecast crime patterns. The authors’ hypothesis is that both region (independent variable) and time of year (independent variable) can be used to predict crime rate. The first step is to identify the regions with high crime density using spatial analysis. Then a crime prediction model is fit for each region. Lastly, the algorithm produces a spatial-temporal crime forecasting model that gives us both a summary of crime-dense regions and the predictor variables associated with them.
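The first two steps above can be sketched in Python with much simpler stand-ins than the paper's own methods: grid binning in place of its spatial analysis, and a first-order auto-regressive model, y[t] = c + phi * y[t-1], in place of its forecasting models. All coordinates, counts and function names below are made up for illustration.

```python
from collections import Counter


def densest_cells(points, cell_size, top_n):
    """Bin (x, y) points into square grid cells and return the busiest cells."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return counts.most_common(top_n)


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


# Made-up incident coordinates on an arbitrary grid.
incidents = [(0.1, 0.2), (0.4, 0.9), (0.5, 0.5), (1.2, 0.3), (2.5, 2.5)]
hotspots = densest_cells(incidents, cell_size=1.0, top_n=1)

# Made-up monthly crime counts for the densest cell.
monthly_counts = [100, 60, 40, 30, 25]
next_month = forecast_next(monthly_counts)
print(hotspots, next_month)
```

A real application would fit one model per high-density region and would likely need more lags and seasonal terms than this single-lag sketch.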

The data covers both Chicago and New York, specifically Manhattan. The Chicago data was collected from the Plenario platform, an open-source urban data resource. The New York data was collected from the NYC Open Data platform, which is maintained by the city government.

The results of this paper are not meant to give insight into crime in smart cities, but rather to show how the authors’ analytical process works. The results forecasted the number of crimes within a given urban region with high accuracy. Their conclusion shows that their methodology can be replicated, while the actual algorithms must be tailor-made for each region under analysis. In future research papers they hope to implement other machine learning techniques such as spatial clustering models.

Do Affordable Housing Projects Harm Suburban Communities?

This paper evaluates the claims that affordable housing developments often harm communities more than they help. Opponents of affordable housing fear an increase in crime, a drop in property value and a rise in taxes. To analyze these claims the authors use a time series group control design, comparing crime rates, property values and property taxes in Mount Laurel, NJ with the same variables in nearby municipalities that do not have affordable housing developments. These variables are the dependent variables for this controlled comparison study, with Mount Laurel as the treatment group and the nearby municipalities as the control group, much like the variation and control in an A/B test.

The Mount Laurel affordable housing is located adjacent to luxury, market-rate single-family homes and one age-restricted retirement community. This adjacency gives the authors the perfect direct comparison to the surrounding luxury homes. Spatial data was collected by creating a longitudinal series of outcomes for Mount Laurel and the comparison townships, before and after the opening of the affordable housing complexes. From my understanding, they used a statistical Wald test to assess a model with multiple parameters.

The authors found that affordable housing developments in Mount Laurel were not associated with increased crime, lower property values or higher taxes when compared to the surrounding similar municipalities. This is despite previous studies that use regression models to link violent crime with the construction of affordable housing. These previous studies suggest that locating affordable housing in areas with a preexisting history of violent crime increases that violent crime, while affordable housing in suburban or high-income areas has no correlation with an increase in violent crime. I suppose this suggests to us that affordable housing does not decrease the property value of higher market valued homes.

Who Participates in Local Government?

This is a research paper written by the authors of a book I am currently reading for this project called Neighborhood Defenders, an empirical study of the housing market, affordable housing, and gentrification. I will most likely use the actual book itself as a reference for my project, but I found this additional research paper by the same authors on local participation in planning and zoning board meetings for housing development. They argue that participation in board meetings is a luxury that most working-class families do not have time for. If time is money, then high income property owners have the advantage when it comes to local democratic participation.

The authors compiled a data set of instances in which citizens spoke at planning and zoning board meetings for housing development. They matched the individuals who spoke in these meetings to a preexisting voter file to investigate their history of political participation. These datasets resulted in a better understanding of the participatory demographics of the communities, unsurprisingly uncovering another community issue that has led to unequal participation and rising housing prices. In this analysis, time spent speaking in public board meetings (independent variable) can show us the demographics that are most likely to participate in said meetings (dependent variable), allowing us to unearth the underlying pattern in civic participation.

The authors concluded that the individuals participating in the board meetings were more likely to be older, male, longtime residents, voters in local elections and homeowners rather than renters. These individuals also opposed new housing construction, which in turn has resulted in rising housing costs and further participation inequality. I found this research paper to be an insightful and creative method of analyzing the social phenomena affecting the rise of housing costs within communities.

Google Data Studio

Column

Boston Housing Dashboard

Column

This Google Data Studio lesson was extremely useful for me, both in understanding the purpose of dashboards and in incorporating them into my project. The dashboard to my left shows a scatterplot and a treemap. The scatterplot shows how gross area affects total value, categorized by neighborhood and year of construction.

On a relevant side note, right before this course started I accepted a data scientist contracting role for the Department of Energy Artificial Intelligence and Technology Office. The office is very new to DOE, so it has a lot of budgeting and program management experience but limited data analytics experience. The office was interested in incorporating Google products at the same time we had this unit, so it was very fruitful to learn about the capabilities of Google Data Studio. I created several Google Data Studio dashboards using their unclassified AI initiatives data and budget numbers, which the office leadership were impressed with! It really showed me the real-life importance of incorporating new types of tech innovations into government programs, and how I could leverage that knowledge in my career.

Spatial Data Visualization

Row

Row

The above map of Boston shows the 12 police districts within the city. The blue dots mark the locations and assessed values of single-family homes recorded by the assessment department in 2021. The yellow dots mark the locations of assaults recorded by the Boston PD, also in 2021. The shading of the blue dots runs from lower to higher value, with lighter blue being lower cost and darker blue being higher. The yellow dots include both simple and aggravated assault. I filtered down to assault data only because the full crime dataset was too large to load into Mapbox, and because assault seems likely to have a significant impact on neighborhood housing value. As we can see from the map, some districts have more yellow assault dots together with lighter blue home dots, specifically the B2 and B3 police districts. E18 also has lower housing costs but fewer assault records, indicating that there are plenty of lower-priced single-family homes farther inland from the coast. B2, B3 and E18 make a sort of corridor where homes are cheaper, farther inland and exposed to higher crime rates, all of which are factors that create an opportunity for the city government to buy single-family homes for affordable housing projects. On the other hand, assault location is not a perfect indicator, because police patrols could be biased toward racial and ethnic minority neighborhoods that are perceived as higher crime. A better indicator might be theft locations, because thefts are more likely to be reported by victims than witnessed on the street. In the next map we will view the same housing data, but this time with theft locations on top.

Row

Row

The green dots on the map above represent locations of theft reported by the Boston PD in the city of Boston in 2021. As we can see, the locations of reported theft are very similar to the locations of assault depicted in the first map. There do seem to be more reported cases in police districts A1 and D4, but few single-family homes are recorded there, so it is less likely that homes will be on sale there at reasonable prices for city government purchase. I would draw the same conclusion as with the first map: the best opportunities for affordable housing purchases are in the B2, B3 and E18 corridor. I am very surprised to find the maps having such similar clusters. I assumed that, because of the nature of the crimes, they would have different target locations, but this was not the case, as seen in how closely the two maps mirror each other.

Performance Metrics & Data Wrangling

Column

Creating performance metrics was somewhat difficult for my project. I had to wrangle 3 datasets, one of them geospatial, to create my performance metrics. I clipped the single-family home data using an open source ArcGIS alternative called QGIS. I clipped the data by Boston police district, then took the average total house value of each police district. This process gave me a dataset with consistent rows representing all 12 police districts. This was the only way I could think of to combine the housing and crime data in a way that could be displayed in a bivariate regression plot. If anyone has a better or interesting suggestion on how to merge data that isn’t standardized, feel free to comment!

Some Factors to Consider. There are other types of geospatial categories to classify the data into, such as neighborhood or voting precinct. Data must be averaged or counted to make it fit evenly into the dataset. You are limited to open source data wrangling resources such as R, Python, QGIS, etc.
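The clipping and averaging step described above was done in QGIS. As a rough illustration of the same idea in Python, the sketch below assigns each home to the district polygon that contains it, averages the values per district, and joins the result to per-district crime counts. The district names B2 and B3 come from this project, but every coordinate, value, count and function name here is made up.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the ring of polygon vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def average_value_by_district(homes, districts):
    """Average home value per district; homes outside every district are dropped."""
    totals = defaultdict(lambda: [0.0, 0])  # district -> [sum of values, count]
    for x, y, value in homes:
        for name, polygon in districts.items():
            if point_in_polygon(x, y, polygon):
                totals[name][0] += value
                totals[name][1] += 1
                break
    return {name: total / count for name, (total, count) in totals.items()}


# Made-up square "districts" and homes given as (x, y, total value).
districts = {
    "B2": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B3": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
homes = [(0.5, 0.5, 400_000), (1.5, 1.0, 500_000), (3.0, 1.0, 350_000), (5.0, 5.0, 900_000)]
avg_values = average_value_by_district(homes, districts)

# Join with made-up crime counts so each row is one district.
crime_counts = {"B2": 120, "B3": 95}
merged = [(name, crime_counts[name], avg_values[name]) for name in sorted(avg_values)]
print(merged)
```

Because both datasets are reduced to one row per district, the same join works for any other geospatial category, such as neighborhood or voting precinct, as long as its boundary polygons are available.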

Column

Outcome Metrics with Shiny

Column

Theft Regression Plot

Assault Regression Plot

Column

As seen in graph 1, there is a negative linear relationship between theft and single-family home values in Boston. This graph took quite a bit of time, simply because I had to merge the crime and single-family home data together in a way that makes sense (see attached csv). I had to download QGIS to clip all of the single-family home values separately by a police district GeoJSON file, then take the average of those values so that I could have a home value for each police district. Each dot on the scatter plot represents one of the 12 police districts within the city of Boston. This is the only way I could think of, besides a geographic map, to display the bivariate relationship between housing values and crime.
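For readers who want to check a relationship like this numerically, here is a minimal Python sketch of a least-squares line and Pearson correlation over per-district rows. The four data points are made up and are not the project's results, and putting crime counts on the x axis and average home value on the y axis is an assumption.

```python
def fit_line(xs, ys):
    """Return slope, intercept and Pearson r for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Made-up rows, one per district: theft count and average single-family home value.
theft_counts = [100, 200, 300, 400]
avg_home_values = [700_000, 650_000, 500_000, 450_000]

slope, intercept, r = fit_line(theft_counts, avg_home_values)
print(slope, intercept, round(r, 3))
```

A negative slope with r close to -1 corresponds to the downward-sloping pattern in the scatter plot. With only one row per district, a fit like this is descriptive rather than a statistical test.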

As seen in this graph, there is a slightly stronger negative linear relationship between assault and single-family home values in Boston than in the first visualization. The main difference with this visualization is that there are fewer assault counts than theft counts in the Boston PD records, so the y axis is different. I used the same process to collect this data, but kept only the crimes with “assault” as a keyword in the descriptions.
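The keyword filter described above can be sketched in a few lines of Python. The record layout, with its "district" and "description" fields, is a hypothetical stand-in for the real crime export, and the sample records are made up.

```python
from collections import Counter


def count_by_district(records, keyword):
    """Count records per district whose description contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return Counter(
        record["district"]
        for record in records
        if keyword in record["description"].lower()
    )


# Made-up incident records; field names are assumptions, not the real schema.
records = [
    {"district": "B2", "description": "ASSAULT - SIMPLE"},
    {"district": "B2", "description": "ASSAULT - AGGRAVATED"},
    {"district": "D4", "description": "LARCENY SHOPLIFTING"},
    {"district": "B3", "description": "Assault - Simple"},
]

assault_counts = count_by_district(records, "assault")
print(dict(assault_counts))
```

Matching on a substring catches both simple and aggravated assault in one pass, but it would also catch any other description that happens to contain the word, so the matched descriptions are worth reviewing by hand.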

In conclusion, there is a negative linear relationship between crime and single-family home values from what we can see in the police precinct data. From the Mapbox maps we can also see that the best police precincts in which to purchase property for affordable housing are B2, B3 and D4. When looking at these precincts on the linear regression visualizations, we can see that these districts have the highest counts of crime and the lowest average single-family home values. This was a very simple analysis, but I think it is a good starting point for some form of machine learning modeling on the same topic.