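# Packages used below: ggplot2 (map_data, plotting), dplyr (left_join, %>%),
# stringr (str_remove_all), gganimate (transition_time, animate).
# melt() is assumed to come from reshape2; map_data() also needs the maps package installed.
library(ggplot2)
library(dplyr)
library(stringr)
library(gganimate)
library(reshape2)

counties <- read.csv(file = 'counties.csv')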
#head(counties)

#unique(counties$name)

#loading data
PA <- map_data(map = "county", region = "Pennsylvania")
#head(PA)

counties$name <- tolower(str_remove_all(counties$name, " County"))

#merging data
mapdata <- left_join(PA, counties, by = c("subregion" = "name"))

#getting rid of duplicate Pennsylvania 
mapdata <- subset(mapdata, select = -parent.location)
mapdata

#Graph 1: Choropleth Map

library(viridis)
library(ggthemes)
library(transformr)

evic_map <- ggplot(data = mapdata) +
  # na.rm is a layer parameter, not an aesthetic, so it goes outside aes()
  geom_polygon(aes(x = long, y = lat, fill = eviction.rate/100, group = group),
               color = "grey", na.rm = TRUE) +
  expand_limits(x = mapdata$long, y = mapdata$lat) +
  coord_map() +
  scale_fill_viridis(option = "magma", direction = -1, name = "Eviction Rate",
                     labels = scales::percent,
                     guide = guide_colorbar(
                       direction = "horizontal",
                       barheight = unit(2, units = "mm"),
                       barwidth = unit(100, units = "mm"),
                       draw.ulim = FALSE,
                       title.position = 'top',
                       title.hjust = 0.5,
                       title.vjust = 0.5)) + 
  theme_hc() + 
  theme(axis.text.x = element_blank(), 
        axis.text.y = element_blank(), 
        axis.ticks = element_blank(),
        legend.position = "bottom") +
  theme(plot.title = element_text(size = 10)) + 
  xlab(" ") + ylab(" ") +
  labs(title = "Eviction Rate in PA by County in {frame_time}",
       subtitle = "Data from 2000-2016") +
  transition_time(year)

#animate process, it is a good idea to play around with fps and dimensions to best fit your needs. We use a low fps of 5 to slow down the animation speed for demonstration purposes.
#the result is stored as evic_anim so it does not mask gganimate's animate() function
evic_anim <- animate(evic_map, fps = 5, height = 574, width = 875)
evic_anim

#Graph 2: Shiny App (see Graph2.R)

#Graph 3: Scatterplot Animated

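# mapdata repeats each county-year once per polygon vertex, so keep one row per
# county and year to avoid stacking identical points on top of each other
anim = mapdata %>%
  distinct(subregion, year, .keep_all = TRUE) %>%
  ggplot(aes(x = median.gross.rent, y = rent.burden, size = eviction.rate, col = eviction.rate)) + 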
  geom_point(alpha = .5, na.rm = TRUE) + 
  scale_color_viridis_c(option = "C", end = .75, guide = "legend") +
  theme(legend.position = "bottom") + 
  transition_time(year) +
  shadow_mark(size = .1, alpha = .1) +
  labs(
    title = "Median Gross Rent, Rent Burden, and Eviction Rate over time",
    subtitle = 'Date: {frame_time}',
    x = "Median Gross Rent (USD)",
    y = "Rent Burden",
    col = "Eviction Rate",
    size = "Eviction Rate",
    caption = "Source: Eviction Lab"
  )
  

animate(anim, nframes = 17)

#Graph 4: Bar Graph

# counties$name was lowercased and stripped of " County" above, so match "philadelphia"
phillydata = subset(counties, name == "philadelphia")

p = melt(data = phillydata %>% group_by(year),
  measure.vars = c("pct.white", "pct.af.am", "pct.hispanic", "pct.am.ind", 
  "pct.asian", "pct.nh.pi", "pct.multiple", "pct.other")) %>% 
  ggplot(aes(x = year, y = value, fill = variable))+ #define the x and y 
  geom_bar(stat = "identity", position="fill") + #make the stacked bars
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Racial Breakdown in Philadelphia County",
    subtitle = "Data from 2000 to 2016"
  ) +
  theme(axis.title=element_blank()) +
  scale_fill_discrete(name = "Race", labels = c("White", "African American", "Hispanic", "American Indian",
                                                "Asian", "Native Hawaiian/Pacific Islander", "Multiple", "Other"))
p

#ggplotly(p2, tooltip = "text")

#Introduction

For this data visualization project, we chose to investigate the eviction crisis in the United States, an issue that is both widespread and frightening for millions of people across the nation. The majority of poor renting families in America spend over half of their income on housing costs, and even the fear of eviction becomes a significant stressor for all involved. The statistics on evictions are jarring, to say the least: in 2016, over 2 million eviction filings were made across the United States, which is equivalent to a rate of roughly four every minute (cite). Additionally, one in 50 renters was evicted from his or her home. This number is far too high, and given the lasting negative impacts eviction can have on families for generations, the American eviction crisis is a vital issue for our communities and policymakers to address.

Not only does eviction make families susceptible to falling into a long-term cycle of poverty, but it also has severe lasting effects on the mental and physical health of individuals. Evictions have historically made it difficult to find new housing and have thus led to homelessness; one study found that 47 percent of all families in New York City homeless shelters were there as a result of eviction. Furthermore, families who are evicted regularly lose their possessions, lose their jobs, and experience higher rates of depression. For children, the instability caused by eviction can result in worse outcomes in education, health, and future earnings. All in all, evictions place an extremely heavy burden on individuals and families. Combined with the housing affordability crisis and consistent increases in housing prices in major metropolitan areas, eviction and all its negative effects become an increasingly alarming threat to our nation and its citizens.

Notably, COVID-19 has intensified the U.S. eviction crisis and brought it to light. Tens of millions of Americans are potentially at risk of eviction. Many property owners who lack the credit or financial ability to cover missed rental payments will struggle to pay their mortgages and property taxes and to maintain their properties. COVID-19 has sharply increased the risk of foreclosure and bankruptcy, disrupted the affordable housing market, and destabilized communities across the United States. Although rent and eviction freezes are a temporary solution to this problem, the threat of eviction will not go away once these moratoriums lift. The eviction crisis must be addressed. As a result, we believe that it’s important to closely examine the housing and eviction data sets that the Eviction Lab has made readily available to the public, policymakers, and others. Looking for trends within these data sets can help us identify key factors and notable patterns in this crisis, which may give us insight into what next steps can be taken to address it.

Our primary research questions are the following: What clear trends can we identify in the eviction crisis? What patterns do we see that we can begin to address in housing policy and in directly helping communities? The data set we are using comes from the Eviction Lab, which was formed to make nationwide eviction data publicly available, with the goal to help “document the prevalence, causes, and consequences of eviction and to evaluate laws and policies designed to promote residential security and reduce poverty.” We focused specifically on the state of Pennsylvania and obtained a data set that organizes all available eviction data by county for the years 2000 through 2016. With variables such as rent burden, poverty rate, demographic information, and more, our exploratory data analysis will hopefully shed light on the causes and consequences of eviction. Although there is no data from the most recent years, it is still valuable to evaluate trends prior to COVID-19, as this can help describe the trajectory of eviction data even in “normal” circumstances.

#Methods

In order to investigate how the variables we have at hand may affect the eviction rate in communities across Pennsylvania, we plan on making a variety of graphs. We begin broadly by looking at the main variable: eviction rate across counties. We will create a choropleth map visualizing how eviction rates differ across Pennsylvania, and we’ll animate the map to show how these rates change over time for all of the counties. In order to interpret the animation, we’ll look at basic demographics (such as population density) to see how geography plays a role in eviction rates. Additionally, this graph will help readers quickly identify which areas of the state have lower and higher rates. To create this graph, we merged our original data set from the Eviction Lab with the Pennsylvania county data from the maps package. Some counties appear gray at times because of missing data. Unfortunately, there was not much we could do to fix this problem. We searched for the missing eviction rates for specific counties and years, but in most cases we were unable to find them. When we did find them, we were not certain how accurate or reliable the data was, and we did not want to add data points that did not properly align with our data from the Eviction Lab.
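
Before merging, it can help to check which map counties have no match in the eviction table and which county-years are missing an eviction rate, since both show up as gray on the map. The following is a minimal sketch in base R on invented toy data: `map_names` stands in for `unique(PA$subregion)` and `toy_counties` for the Eviction Lab table, and every value in it is made up for illustration.

```r
# Toy stand-ins (all values invented): map_names plays the role of
# unique(PA$subregion), toy_counties the role of the Eviction Lab table.
map_names <- c("adams", "allegheny", "york")
toy_counties <- data.frame(
  name = c("Adams County", "Allegheny County"),
  year = c(2016, 2016),
  eviction.rate = c(1.2, NA)
)

# Same cleaning idea as in the merge code: drop " County" and lowercase.
clean <- tolower(sub(" County", "", toy_counties$name, fixed = TRUE))

# Map counties with no row at all in the eviction table.
unmatched <- setdiff(map_names, clean)

# Counties that are present but have a missing eviction rate.
missing_rate <- clean[is.na(toy_counties$eviction.rate)]

print(unmatched)
print(missing_rate)
```

Running the same two checks on the real `PA` and `counties` objects would list exactly which counties are gray because of a name mismatch and which are gray because the rate itself is missing.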

For our second graph, we decided to create a Shiny App that allows users to explore how different variables are correlated. On the x-axis, one can select median gross rent, percent renter occupied, median household income, or median property value. We chose these variables because they are relevant to the economics of renting a home. We wanted to investigate how these economic factors might be correlated with variables directly related to evictions in Pennsylvania, such as eviction rate, eviction filing rate, poverty rate, and rent burden. This Shiny App lets us explore correlations between the economics of housing and eviction statistics, and address questions such as how income and property value might be related to the rates of eviction and poverty. We may even be able to draw some conclusions about how the pricing of homes affects a community’s poverty rate, which raises interesting policy questions.

For the third graph, we decided to narrow our focus from all of the variables in the second graph to specifically examine how median gross rent, rent burden, and eviction rate correlate over time. This gives us a chance to take a deeper dive into how rent affects the share of income a household spends on rent, and how this ties to the rate of eviction. The animation shows the data points for all of the counties in Pennsylvania over time. We predict that higher rent will increase the rent burden and will also correlate with a higher eviction rate in the county.
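
As an illustration of the kind of app described here, the following is a minimal sketch and not the contents of Graph2.R. It assumes the shiny package, uses invented toy data, and offers only two choices per axis; the column name `median.household.income` is an assumption about the data set, while the other three names appear in our plotting code.

```r
library(shiny)

# Toy stand-in for the county data (all values invented).
set.seed(1)
toy <- data.frame(
  median.gross.rent = runif(30, 500, 1100),
  median.household.income = runif(30, 35000, 80000),  # assumed column name
  eviction.rate = runif(30, 0, 5),
  rent.burden = runif(30, 20, 35)
)

x_vars <- c("median.gross.rent", "median.household.income")
y_vars <- c("eviction.rate", "rent.burden")

# Two drop-down menus choose the axes; the scatter plot redraws on change.
ui <- fluidPage(
  selectInput("xvar", "X axis", choices = x_vars),
  selectInput("yvar", "Y axis", choices = y_vars),
  plotOutput("scatter")
)

server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(toy[[input$xvar]], toy[[input$yvar]],
         xlab = input$xvar, ylab = input$yvar)
  })
}

# Build the app object; launch it interactively with runApp(app).
app <- shinyApp(ui, server)
```

Keeping the selectable variables in two plain character vectors makes it easy to add the remaining x and y options without touching the server logic.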

Finally, one important section of the dataset we have not yet picked apart is the racial and ethnic breakdown of counties in Pennsylvania. It was important for us to create a graph for this, because poverty, eviction, and general socio-economic status are often correlated with racial background. Systemic racism cuts deeply into essential parts of our lives, and we thought it would be essential to visualize how changes in the racial makeup of regions might be correlated with changes in poverty and eviction rates. Before creating our final graph, we had originally made a scatterplot illustrating the correlation between rent burden and poverty rate across counties, and found that Philadelphia County was a massive outlier in the data set, with a much higher poverty rate and rent burden than any other county for the year 2016. Additionally, as Philadelphia is a major metropolitan area with a high population density, ranking as the 6th most populous city in the United States, we decided to focus specifically on the changes in racial breakdown for Philadelphia County. Thus, our final graph is a stacked bar graph, with one bar showing the racial percentages for each year from 2000 to 2016 for Philadelphia County. We hope to identify how the racial makeup has changed, which may provide valuable insight into the changing poverty and eviction rates in Philadelphia.

#Results: Graphs and interpretations

Graph 1: In Graph 1, we created an animated map of the Pennsylvania counties depicting the eviction rate over time from 2000 to 2016. This data was from the Eviction Lab, and our main variable was the eviction rate. We had to merge this dataset with the Pennsylvania county data from the maps package in order to get the longitude and latitude of each county outline, which we drew with geom_polygon. We can see that over time, the eviction rate has slowly decreased across the counties overall.

More specifically, the counties in the middle of Pennsylvania have a relatively low eviction rate throughout the time period. The counties on the eastern and western sides of Pennsylvania fluctuate more over time. We can also notice that in 2016, the most recent year in our dataset, the eviction rate is relatively low overall. In contrast, around 2006 the eviction rates for Pennsylvania counties in general spiked sharply, as the map became darker and darker. We can also see that in Philadelphia County specifically, the eviction rate continues to increase and the county darkens in color. This is quite important to us because our interest in this topic was sparked by the fact that Swarthmore College is located just outside Philadelphia. We also included the changing year in the title of our graph to make the time frame clear to viewers.

Overall, this might be one of our favorite graphs from our final project. It is visually appealing, straightforward, and easy to interpret, and it provides key insight into our topic.

Graph 2: For our second graph, we created an interactive Shiny graph that allows users to pick which variables they would like to see plotted against each other in a scatter plot. This might be the most helpful and insightful graph we have included in our final project. For the x- and y-axes, we provided the options that make the most sense to plot against each other.

When we plot eviction rate against percent renter occupied, we can see that there is no overall linear relationship between the two variables. The data points, which represent the counties, form a cluster with no clear pattern. When we plot eviction rate against median household income, or eviction rate against median property value, we again see no clear linear relationship. There may be a slight negative relationship, as the points converge somewhat toward a lower eviction rate as median household income increases. Looking at the scatter plot as a whole, however, we would say that there is no correlation or linear relationship.

When we plot eviction rate against median gross rent, we see an overall trend that is positive but not strongly linear. When median gross rent is low, it is positively correlated with eviction rate. However, over the full range of the x-axis, for instance when median gross rent is over 900, the correlation with eviction rate is nonexistent. Thus, there might not be an overall linear relationship between median gross rent and eviction rate. Turning to poverty rate, when we plot poverty rate against median gross rent, we find a moderately strong negative linear relationship: as median gross rent increases, the poverty rate decreases. However, there is one prominent outlier that does not fit this pattern. That county is Philadelphia, which is an interesting point to note. With a poverty rate of over 20 percent and a median gross rent of over 900 in 2016, it is an extreme outlier in our dataset. When we plot poverty rate against percent renter occupied, there is no clear linear relationship between the two variables. There is a large cluster of counties that does not indicate much about the relationship, along with an extreme outlier far from the cluster, so we conclude that there is no linear relationship.

When we plot poverty rate against median household income, there is a clear, strong negative linear relationship. As median household income increases, the poverty rate decreases, which makes sense: as the incomes of individuals rise, fewer of them should struggle financially. Again, there is an outlier that makes the relationship between the two look almost like an exponential, concave-up curve. With the outlier removed, the correlation between the two would likely be stronger. The same trends apply when we plot poverty rate against median property value.
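
The effect of a single outlier on a correlation can be checked directly with `cor()`. The sketch below uses invented numbers, not values from the Eviction Lab data: seven points follow a nearly linear negative trend, and the eighth is a low-income, high-poverty outlier.

```r
# Invented toy values: income in thousands of USD, poverty rate in percent.
# The eighth point is a deliberate outlier (low income, very high poverty).
income  <- c(40, 45, 50, 55, 60, 65, 70, 42)
poverty <- c(14, 13, 11, 10,  9,  8,  6, 24)

# Pearson correlation with every point included.
r_all <- cor(income, poverty)

# Same correlation after dropping the outlier (the 8th observation).
r_trim <- cor(income[-8], poverty[-8])

print(c(with_outlier = r_all, without_outlier = r_trim))
```

In this toy example both correlations are negative, and the one computed without the outlier is closer to -1, which is the pattern we would expect to see after setting Philadelphia aside.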

Graph 2 supports many interesting interpretations for whichever pair of variables the user is most interested in. We did not analyze in depth how rent burden and eviction filing rate relate to median gross rent, percent renter occupied, median household income, and median property value, but we hope to have shown the usefulness of Graph 2. It is by far the most helpful graph for users who want to compare how variables do or do not correlate with each other.

Graph 3: This scatter plot of the median gross rent, rent burden, and eviction rates of counties in Pennsylvania over time was particularly interesting, because we noticed some very strong trends. Before interpreting, it is important to note that the original dataset from the Eviction Lab had some odd data points: certain variables, including poverty rate and the variables we are graphing, remained unchanged for four- or five-year stretches. For example, the poverty rate for Adams County remained at 7.12% between 2000 and 2004. It is unclear whether these statistics are only gathered every few years, or whether the unchanging values reflect a truly constant poverty rate over those years. Regardless, the unchanging values explain why the animation sometimes does not show a smooth change over time; the points do not move for several frames because the data values are equal across consecutive years.
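
Runs of unchanged values like these can be found with base R's `rle()`, which collapses a vector into consecutive runs. The sketch below is illustrative only: the 7.12 for 2000 to 2004 matches the Adams County example above, the 6.5 for later years is invented, and the column name `poverty.rate` is an assumption about the data set.

```r
# Toy series for one county: 7.12 for 2000-2004 mirrors the Adams County
# example in the text; the 6.5 for 2005-2009 is an invented value.
toy <- data.frame(
  year = 2000:2009,
  poverty.rate = c(rep(7.12, 5), rep(6.5, 5))  # assumed column name
)

# rle() returns each distinct consecutive value and how long it repeats.
runs <- rle(toy$poverty.rate)

# Index of the first row of each run, used to look up its starting year.
first_row <- cumsum(runs$lengths) - runs$lengths + 1

run_table <- data.frame(
  value = runs$values,
  n_years = runs$lengths,
  start_year = toy$year[first_row]
)
print(run_table)
```

Applied per county to the real data, a table like this would show how many variables are held constant in multi-year blocks and where each block begins.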

This graph illustrates that a higher median gross rent is fairly strongly correlated with a higher rent burden. Additionally, the size of each point indicates the relative eviction rate for that county, and over time the counties with higher eviction rates continue to rise in both median gross rent and rent burden. On the other hand, counties with relatively low eviction rates seem to remain stagnant over time. This suggests that in areas less affected by evictions, the rent burden and median rent change little throughout the years, while in areas with high poverty rates, the rent burden and median rent fluctuate enormously over time. Poorer areas show high levels of rent burden as rent continues to rise, and without resources to cope with rising rent, poverty remains high in these areas: an unsettling cycle of generational poverty.

Graph 4: In this bar graph outlining the racial breakdown of Philadelphia County between 2000 and 2016, there are a few interesting immediate observations. For one, the percentage of white residents in Philadelphia County steadily decreases throughout the years. This trend is often referred to as white flight, and it may be that white residents are moving out of the city and into more suburban areas. While the African American share of the population has not significantly changed, we observed a steady increase in the Hispanic share of Philadelphia's population, as well as in the Asian share. Over these years, the county has also experienced a large increase in rent burden, which may shed light on how the movement of white residents away from the county and the growth of minority groups such as Hispanic and Asian residents correlate with the percent of income spent on rent.
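
To put numbers on shifts like these, one could subtract each group's share in the first year from its share in the last year. The sketch below uses the column names from our plotting code but entirely invented percentages, so the output illustrates the method only and says nothing about Philadelphia's actual figures.

```r
# Invented percentages for illustration only; not Philadelphia's real values.
toy <- data.frame(
  year = c(2000, 2016),
  pct.white = c(50, 40),
  pct.af.am = c(40, 40),
  pct.hispanic = c(6, 12),
  pct.asian = c(4, 8)
)

share_cols <- setdiff(names(toy), "year")

# Rows for the earliest and latest year in the table.
first <- toy[toy$year == min(toy$year), share_cols]
last  <- toy[toy$year == max(toy$year), share_cols]

# Percentage-point change for each group between the two years.
change <- unlist(last - first)
print(change)
```

A negative entry means the group's share fell over the period, and a positive entry means it grew.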

#Discussion

One limitation is that our dataset is missing eviction rate values for certain counties. This is a significant drawback because Graphs 1, 2, and 3 might be somewhat skewed without those few but important eviction rate data points. Another drawback is that our dataset only provides data between 2000 and 2016. This is unfortunate, as it would be interesting to see how the eviction rate has changed in recent years, especially during the pandemic. Future analysis could use the Eviction Lab data set to investigate how eviction trends have fluctuated over time across the United States. It would also be helpful to further investigate the racial breakdown of major metropolitan areas and how it has affected, or been affected by, the eviction crisis.