U.S. Eviction Rates

DATA 110, Project 1

Author

Max Reed

Published

June 17, 2024

A person sits on the ground outside of a house. Luggage and belongings are beside them. A notice of eviction is taped to the door of the house.

Image by David Kovaluk for St. Louis Public Radio. Source: https://www.stlpr.org/government-politics-issues/2021-09-07/st-louis-county-council-reinstates-eviction-moratorium

U.S. Eviction Rates from the Princeton Eviction Lab, Pre-Visualization Work

About the Data

The Eviction Lab at Princeton University works to make eviction data publicly available and accessible nationwide. This dataset tracks how eviction filing counts have trended relative to pre-pandemic averages for 34 cities and 10 states across the U.S. There is no government requirement to track this data, so the Eviction Lab covers only the cities and states that keep these records and choose to share them publicly. The data can be found on their site at the following link: https://evictionlab.org/eviction-tracking/

The variables in the dataset are as follows:

  • “site” and “site_id” track the name of a city/state and its given ID code. State codes are two digits in length, and city codes are five digits.
  • “month” tracks the month in which that row’s data was collected.
  • “month_filings” is the number of filings for that site in the given month.
  • “pct_of_historical” is the ratio of that month’s “month_filings” figure to the site’s “avg_filings” figure (for example, 0.617 means filings were at 61.7% of the pre-pandemic average).
  • “avg_filings” is the average filings for a given site pre-pandemic.
  • “eviction_filing_rate” is the overall proportion of evictions compared to the number of rental properties in the site.
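
As a quick check of how these variables relate, the sketch below recomputes “pct_of_historical” from the first few all-sites rows shown in the data preview in the Import Data section. It assumes the ratio is simply month_filings divided by avg_filings. The `check` data frame and its `recomputed` column are illustrative names, not part of the dataset, and the avg_filings values are the rounded figures from the printed preview.

```r
# Values copied from the head(evictions) preview (site_id 0, Jan-Apr 2020).
# avg_filings is rounded in the printed output, so small gaps are expected.
check <- data.frame(
  month_filings     = c(105523, 91884, 51421, 7152),
  avg_filings       = c(105243, 91014, 83381, 84015),
  pct_of_historical = c(1.00, 1.01, 0.617, 0.085)
)

# Assumption: pct_of_historical = month_filings / avg_filings.
check$recomputed <- round(check$month_filings / check$avg_filings, 3)
print(check)

# Largest gap between the published and recomputed ratios.
max_gap <- max(abs(check$recomputed - check$pct_of_historical))
print(max_gap)
```

If the assumption holds, the gap should be no larger than what display rounding would explain.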

Loading Necessary Libraries

# devtools::install_github("hrbrmstr/streamgraph")  # run once to install "streamgraph" from GitHub
library(streamgraph)  # load the streamgraph package
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)  # already attached by tidyverse above; loaded explicitly for clarity
library(RColorBrewer)

Import Data

setwd("~/24X Course Work/DATA110") #sets where our dataset is stored and will be pulled from
evictions <- read_csv("main_landing_page_data.csv") #import dataset to our workspace
Rows: 2279 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): site_id, site
dbl  (4): month_filings, pct_of_historical, avg_filings, eviction_filing_rate
date (1): month

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(evictions) #preview the data
# A tibble: 6 × 7
  site_id month      month_filings pct_of_historical avg_filings
  <chr>   <date>             <dbl>             <dbl>       <dbl>
1 0       2020-01-01        105523             1.00      105243.
2 0       2020-02-01         91884             1.01       91014.
3 0       2020-03-01         51421             0.617      83381.
4 0       2020-04-01          7152             0.085      84015.
5 0       2020-05-01         11171             0.118      94832.
6 0       2020-06-01         21654             0.214     101231.
# ℹ 2 more variables: eviction_filing_rate <dbl>, site <chr>

Editing the Dataset

# running the Saidi cleaning just to be safe!
names(evictions) <- tolower(names(evictions)) # lowercase all column names
names(evictions) <- gsub(" ", "_", names(evictions)) # replace any spaces with underscores
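
The column names in this dataset are already lowercase with underscores, so the two lines above leave it unchanged. To illustrate what the cleaning step would do on a messier file, here is a minimal sketch using made-up column names (`raw_names` is hypothetical, not taken from the Eviction Lab data):

```r
# Hypothetical column names, as they might appear in a less tidy export.
raw_names <- c("Site ID", "Month Filings", "pct_of_historical")

# Same two steps as above: lowercase first, then swap spaces for underscores.
clean_names <- gsub(" ", "_", tolower(raw_names))
print(clean_names)
```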

Creating Tiers for Eviction Filing Rate

# efr_tier splits eviction_filing_rate into three groups: low, med, and high
# (rows with a missing rate match no condition and get NA)
evictionStates <- evictions |>
  mutate(efr_tier = case_when(
    eviction_filing_rate <= 0.111 ~ "low",
    eviction_filing_rate > 0.111 & eviction_filing_rate < 0.222 ~ "med",
    eviction_filing_rate >= 0.222 ~ "high"))
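
To see how the cutoffs treat boundary and missing values, the sketch below applies the same thresholds to a handful of made-up rates. It uses a small base R helper, `efr_tier_of`, which is a hypothetical stand-in for the `case_when()` call and not part of the original analysis.

```r
# Hypothetical helper mirroring the case_when() thresholds (base R only).
efr_tier_of <- function(rate) {
  ifelse(rate <= 0.111, "low",
         ifelse(rate < 0.222, "med", "high"))
}

# Made-up rates chosen to hit each tier, both cutoffs, and a missing value.
sample_rates <- c(0.05, 0.111, 0.15, 0.222, 0.40, NA)
tiers <- efr_tier_of(sample_rates)
print(data.frame(rate = sample_rates, tier = tiers))
```

A rate exactly at 0.111 lands in "low", a rate exactly at 0.222 lands in "high", and a missing rate stays missing.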

Removing Unneeded Data from Working Dataset

evictionStates$site_id <- as.numeric(evictionStates$site_id) # site_id is read in as character; convert to numeric so it can be filtered by value
evictionStates <- filter(evictionStates, site_id < 100) # keep only states (two-digit codes); cities have five-digit codes
evictionStates <- filter(evictionStates, site_id != 0) # remove all_sites (site_id 0)
head(evictionStates)
# A tibble: 6 × 8
  site_id month      month_filings pct_of_historical avg_filings
    <dbl> <date>             <dbl>             <dbl>       <dbl>
1       9 2020-01-01          1641             0.954       1721 
2       9 2020-02-01          1450             0.938       1546 
3       9 2020-03-01          1238             0.843       1469.
4       9 2020-04-01           167             0.126       1326 
5       9 2020-05-01            23             0.014       1592.
6       9 2020-06-01            29             0.017       1684.
# ℹ 3 more variables: eviction_filing_rate <dbl>, site <chr>, efr_tier <chr>
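
The filtering logic can be tried on a toy table first. The sketch below is a base R illustration with made-up site names and IDs. It assumes state IDs may be stored as zero-padded text such as "09", which becomes 9 once converted to a number.

```r
# Hypothetical rows: one all-sites row, two states, two cities.
toy <- data.frame(
  site_id = c("0", "09", "10", "12031", "48201"),
  site    = c("All sites", "State A", "State B", "City A", "City B"),
  stringsAsFactors = FALSE
)

toy$site_id <- as.numeric(toy$site_id)  # "09" becomes 9

# Keep two-digit (state) codes and drop the all-sites row.
states_only <- subset(toy, site_id < 100 & site_id != 0)
print(states_only)
```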

Visualizations

Box Plots

evic_bp <- boxplot(evictionStates$pct_of_historical ~ evictionStates$site,
            main = "",
            xlab = "States",
            ylab = "Monthly Filings as a Share of Pre-Pandemic Average")
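
Because the result of `boxplot()` is stored in `evic_bp`, the summary statistics behind each box can also be read off directly. The sketch below shows the idea on made-up data. The `toy` data frame and its values are hypothetical, and `plot = FALSE` is used only so the example computes the statistics without drawing anything.

```r
# Made-up ratios for two hypothetical sites.
toy <- data.frame(
  site = rep(c("Site A", "Site B"), each = 5),
  pct_of_historical = c(0.1, 0.5, 0.9, 1.0, 1.1,
                        0.2, 0.4, 0.6, 0.8, 1.0)
)

# boxplot() returns a list; $stats holds one column per group with the
# lower whisker, lower hinge, median, upper hinge, and upper whisker.
bp <- boxplot(pct_of_historical ~ site, data = toy, plot = FALSE)
medians <- setNames(bp$stats[3, ], bp$names)
print(medians)
```

The same pattern, `evic_bp$stats[3, ]` paired with `evic_bp$names`, should give the median ratio for each state in the real data.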

Scatter Plot

evictionStates |>
ggplot(aes(x=avg_filings, y=eviction_filing_rate, group=site)) + #setting axes and grouping by site
  geom_point(aes(color=site)) +
  scale_color_brewer(palette="Paired") +
  labs(
    title = "Eviction Filing Rate v. Average Pre-Pandemic Filings by State",
    x= "Average Monthly Filings (Pre-Pandemic)",
    y= "Actual Filing Rate by Month (By Percentage of All Rentals)")

Streamgraph

streamgraph(evictionStates, key="site", value="month_filings", date="month") |>
  sg_axis_x(1, "year", "%Y") |>
  sg_fill_brewer("Paired") |>
  sg_legend(TRUE, "State: ") |>
  sg_title(title = "Eviction Rates around the United States")
Warning in widget_html(name, package, id = x$id, style = css(width =
validateCssUnit(sizeInfo$width), : streamgraph_html returned an object of class
`list` instead of a `shiny.tag`.
Warning: `bindFillRole()` only works on htmltools::tag() objects (e.g., div(),
p(), etc.), not objects of type 'list'.
Eviction Rates around the United States

Essay Portion

In preparing my data for visualization, I took several precautions to make the work as seamless as possible. I began by loading the libraries needed to create my plots (streamgraph, ggplot2), supply color palettes (RColorBrewer), and run general-purpose code (tidyverse). After some housekeeping to make sure the column names were uniformly lowercase and any spaces were replaced with underscores, I created a new column. Entitled “efr_tier”, short for Eviction Filing Rate Tier, this categorical variable groups filing rates into low, med, or high. Next, I took advantage of the included “site_id” variable to remove all non-state sites, giving me a smaller, more focused dataset to plot and draw conclusions from.

My primary visualization is the streamgraph, which examines the monthly filings of each state over time. In it, we can clearly see how sharply the COVID-19 pandemic affected this data, as the graph pinches around the time lockdowns went into place. There is also a noticeable bump later that same year, perhaps reflecting the delayed effect of lockdown on those who had become unemployed or were otherwise no longer able to pay rent. Although there are no labels for the axes or a title, as I was unable to find such a feature for streamgraphs, the interactive State key allows users to highlight a state of their choice by name without needing to hover. Similarly, hovering lets the reader see the exact filing count for a given month and state, allowing a deeper experience with the data than a static version might allow.
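
The size of the pandemic pinch can be put into numbers. The rough sketch below uses the six all-sites monthly totals printed in the `head(evictions)` preview, not the state-only subset shown in the streamgraph, so it is only an approximation of what the graph displays. The `filings` vector and the derived names are illustrative.

```r
# All-sites monthly filings copied from the head(evictions) preview.
filings <- c("2020-01" = 105523, "2020-02" = 91884, "2020-03" = 51421,
             "2020-04" = 7152,   "2020-05" = 11171, "2020-06" = 21654)

# Each month's filings as a share of January 2020.
share_of_jan <- round(filings / filings[["2020-01"]], 3)
print(share_of_jan)

# Proportional drop from January to the lowest month in this window.
low_month <- names(which.min(filings))
drop_jan_to_low <- 1 - filings[[low_month]] / filings[["2020-01"]]
print(low_month)
print(round(drop_jan_to_low, 3))
```

In these six rows the low point is April 2020, with filings down by more than 90 percent from January.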

Something I wish I could have done was include the data I initially excluded without overwhelming my visualizations. For example, the dataset includes multiple cities in Texas, but Texas itself is not represented since the state does not collect this data across all cities. Including this data would give a broader view of eviction rates across the U.S., and a deeper investigation could compare how this data is gathered from city to city or state to state, perhaps eventually leading to some standardization of the practice. I imagine this is the goal of the Princeton University Eviction Lab, as an encompassing look at data will