Summary

Basement sewer back-ups are an ongoing issue that affected nearly 4,000 properties in Baltimore City in 2019 alone. In April 2018, the EPA and Maryland Department of the Environmental required Baltimore City to launch a pilot Expedited Reimbursement Program to help mitigate the clean-up costs for residents following a basement sewage back-up. However, during the program’s first year, less than two percent of households reporting basement sewage back-ups using 311 applied for reimbursement through the program and only ten households actually received a reimbursement through the program.

Blue Water Baltimore and others are currently working to change the city’s approach to basement sewage back-ups.

This analysis seeks to use public data to answer the following:

Background on basement sewage backups in Baltimore

A 2015 analysis by the Environmental Integrity Project looked at data on sewage overflows from 2011-2015, information on E.coli and Enterococcus levels from 2010-2014, and damage claims made to the city from July 1, 2012 to July 1, 2015. This analysis is available as an interactive map and a published report.

Gather and organize 311 service request data

This analysis uses 311 service requests for basement sewage backups from Open Baltimore, data on building backups from the Department of Public Works quarterly reports, and demographic information from the American Community Survey.

We hoped to supplement our analysis by using detailed tables on building sewage backups included in the quarterly reports published by the Department of Public Works as a requirement of the Sanitary Sewer Consent Decree. We explored extracting these tables using Tabula but were not successful in this effort so the data is not currently included in this analysis.

Import 311 service request data from Open Baltimore

Using the RSocrata package, we imported 39,556 service requests for “basement sewage water” at 24,494 unique addresses (excluding duplicates or transferred requests) from Open Baltimore.1 The relevant columns for this analysis include:

  • Created date: The date and time a service request is created by an operator at the 311 call center or by an individual submitting a service request using the website or mobile app.
  • Service request status: When requests are created, the status is marked as open. A request is closed either when an issue is resolved or, in some cases, when a work order is issed to address the issue. This column also identifies duplicate or transferred requests.
  • Status date: When a request is closed or updated, the date and time is recorded in this column. For this analysis, the days to close is calculated based on the difference between the created date and status date for closed requests.
  • Latitude and longitude: The location of the service request. For requests where we knew the location was missing or inaccurate, we geocode the address to identify the correct location.

While the data includes a column for recent activity, this columns is not used consistently so we did not include this information in our analysis.

# You must create an account on Open Baltimore and get an API key to use RSocrata.
# Sign up for Open Baltimore: https://data.baltimorecity.gov/signup
# More information on the Socrata API: https://dev.socrata.com/docs/endpoints.html
# Following is adapted from this post: https://mattherman.info/blog/point-in-poly/

base <- "https://data.baltimorecity.gov/resource/"
resource <- "9agw-sxsr" # 311 Customer Service Requests
vars <- c("ServiceRequestNum", "CreatedDate", "SRStatus", "StatusDate", "LastActivity", "Outcome", "Address", "Neighborhood", "Longitude", "Latitude") # selected variables
sr_type <- "WW-Sewer Water In Basement" # selected service reuqest type
call <- paste0(base, resource, ".json", "?$select=", paste(vars, collapse = ", "), "&SRType=", sr_type)

# download data. api access token is saved via keyring package
requests <- read.socrata(call, app_token = keyring::key_get("Open Baltimore")) %>%
  as_tibble() %>%
  janitor::clean_names() %>% # Clean variable names
  mutate(
    created_date = ymd_hms(created_date), # Convert variable classes
    status_date = ymd_hms(status_date),
    longitude = as.numeric(longitude),
    latitude = as.numeric(latitude)
  ) %>%
  filter(!(sr_status == "Open (Duplicate)" | sr_status == "Closed (Duplicate)" | sr_status == "Closed (Transferred)")) # Exclude duplicate and transferred requests

Import census tracts, neighborhoods, and council districts

Using the tigris package, we imported boundaries for U.S. Census tracts so we could aggregate service requests at the tract level and compare the distribution of service requests to demographic characteristics based on the American Community Survey.

We also imported boundaries for Baltimore neighborhoods and City Council district boundaries using shapefiles from Open Baltimore.

baltimore_tracts <- tigris::tracts(state = "MD", county = "Baltimore city", class = "sf")
baltimore_council_districts <- st_read("data/baltimore-council-districts.shp")
baltimore_neighborhoods <- st_read("data/baltimore-neighborhoods.shp")

Replace missing or incorrect request locations

After reviewing the basement sewage backup service requests, we found the following issues:

  • 65 requests with missing coordinates
  • 3,087 requests with incorrect coordinates2

We exported these 3,152 service requests to a CSV file, geocoded the addresses using the Census Bureau’s Batch Geocoding service (using Little Geocoder for convenience), and combined this geocoded data with the original data.3

requests %>%
  filter(is.na(latitude) | as.numeric(latitude) == 39) %>% # Filter missing and incorrectly coded coordinates
  write_csv("data/requests_location_issue.csv") # Export ~3,184 requests to CSV for manually geocoding with the Census Bureau's Batch Geocoding service

geocoded_requests <- read_csv("data/requests_location_issue-geocoded.csv") # Import geocoded data
## Warning: Missing column names filled in: 'X1' [1]
geocoded_requests <- geocoded_requests %>%
  filter(is_match == "Match") %>% # Filter to matched addresses
  select(-c(1, 10:20, 23:29)) %>% # Drop columns (The columns may be incorrect if this code is run again from scratch)
  select(1:8, 10, 9)

requests <- requests %>%
  filter(!is.na(latitude)) %>%
  filter(!(as.numeric(latitude) == 39)) %>%
  bind_rows(geocoded_requests)

# Add relevant variables, convert data to an SF object, and exclude requests outside city boundary line
requests_sf <- requests %>%
  mutate(
    year_created = year(created_date), # Add year
    month_created = month(created_date), # Add month
    year_qrt_created = quarter(created_date, with_year = TRUE), # Add a year/quarter
    days_to_closed = ifelse(sr_status == "Closed", int_length(interval(ymd_hms(created_date), ymd_hms(status_date))) / 86400, NA)
  ) %>% # Add days to close column for all closed service requests
  st_as_sf(
    coords = c("longitude", "latitude"),
    agr = "constant",
    crs = (4269), # Matching the CRS of the tigris block group data
    stringsAsFactors = FALSE,
    remove = TRUE
  ) %>%
  st_transform(4269) # %>%
# st_join(st_as_sf(balt_boundary))
# TODO: If there are points outside the city, we need to determine the best way to exclude them or address them in the analysis.

Join requests to tracts, neighborhoods, and city council districts

Using the SF package, the requests are joined to corresponding U.S. census tracts, neighborhoods, and city council districts.

# Join the closed service requests with block groups, tracts, neighborhoods, council districts
# requests_blockgroups <- requests_sf %>%
#  st_join(baltimore_blockgroups, join = st_within)

requests_tracts <- requests_sf %>%
  st_join(baltimore_tracts, join = st_within) %>%
  st_drop_geometry()

# TODO: Figure out if there are CRS issues these joins
requests_neighborhoods <- requests_sf %>%
  st_join(baltimore_neighborhoods, join = st_within) %>%
  st_drop_geometry()

requests_councildistricts <- requests_sf %>%
  st_join(baltimore_council_districts, join = st_within) %>%
  st_drop_geometry()

Exploring service requests

For this analysis, we decided to look at service requests from January 2017 through February 2020. This period allowed us to investigate reports of basements sewage backups both before and following significant policy changes and infrastructure improvement that the Baltimore Department of Public Works is required to complete as part of the Sanitary Sewer Consent Decree Program. Key events in study period include:

analysis_period <- interval(ymd("2017-01-01"), ymd("2020-02-29"))

# Filter request data to the analysis period
requests_sf_analysis <- requests_sf %>%
  filter(ymd_hms(requests_sf$created_date) %within% analysis_period) %>% 
  add_count(address, name = 'requests_at_address')

requests_tracts_analysis <- requests_tracts %>%
  filter(ymd_hms(requests_tracts$created_date) %within% analysis_period)

requests_neighborhoods_analysis <- requests_neighborhoods %>%
  filter(ymd_hms(requests_tracts$created_date) %within% analysis_period)

Citywide

The quarterly counts of unique addresses making service requests has declined since 2014.

Its is unclear whether this represents a reduction in backups or a reduction in service requests. This could be checked against the DPW quarterly report data on building backups.

The median days to close service requests has also declined.

It is unclear if requests are always resolved when the service request is closed.

Requests plotted by month

summary_requests_monthly <- requests_tracts_analysis %>%
  group_by(year_created, month_created) %>%
  summarize(
    request_count = n(),
    address_count = length(unique(address)),
    med_days_to_close = median(days_to_closed, na.rm = TRUE),
    max_days_to_close = max(days_to_closed, na.rm = TRUE),
    min_days_to_close = min(days_to_closed, na.rm = TRUE)
  ) %>%
  mutate(address_pct_rank = percent_rank(address_count)) %>% # Add percent rank column
  mutate(address_quintile = ntile(address_count, 5)) # Add ranked bins column (5 bins)

The following table shows eight months with the greatest number of 311 service requests in this period.

year_created month_created request_count address_count med_days_to_close max_days_to_close
2017 1 674 581 1.8867650 228.82398
2017 12 578 493 1.8004861 427.25054
2018 1 670 560 2.1063252 463.60400
2018 2 645 532 2.0239352 451.18389
2018 5 590 519 2.7074248 446.92450
2018 7 638 540 1.9401852 236.84302
2019 12 583 485 0.8771181 59.28868
2020 1 646 538 1.2004745 39.82922
summary_requests_monthly %>%
  ggplot(aes(ymd(paste0(year_created, "-", month_created, "-01")), address_count, group = 1)) +
  geom_point(color = "#440154FF") +
  geom_line(color = "#440154FF") +
  geom_smooth(method = "loess") +
  labs(
    x = "Month",
    y = "Addresses",
    title = "Addresses with 311 requests for\nbasement sewage back-ups by month",
    subtitle = "January 2017 to February 2020",
    caption = "Source: Open Baltimore"
  ) +
  hrbrthemes::theme_ipsum_rc()

summary_requests_monthly %>%
  ggplot(aes(ymd(paste0(year_created, "-", month_created, "-01")), request_count, group = 1)) +
  geom_point(color = "#440154FF") +
  geom_line(color = "#440154FF") +
  geom_smooth(method = "loess") +
  labs(
    x = "Month",
    y = "Addresses",
    title = "Total 311 requests for\nbasement sewage back-ups by month",
    subtitle = "January 2017 to February 2020",
    caption = "Source: Open Baltimore"
  ) +
  hrbrthemes::theme_ipsum_rc()

summary_requests_monthly %>%
  ggplot(aes(ymd(paste0(year_created, "-", month_created, "-01")), med_days_to_close, group = 1)) +
  geom_point(color = "#440154FF") +
  geom_line(color = "#440154FF") +
  geom_smooth(method = "loess") +
  labs(
    x = "Months",
    y = "Median days to closed",
    title = "Median days to close 311 requests for basement sewage backups by month",
    subtitle = "January 2017 to February 2020",
    caption = "Source: Open Baltimore"
  ) +
  hrbrthemes::theme_ipsum_rc()

Requests mapped by address count

Some addresses are associated with more than one service request in this period. It is unclear if these requests represent distinct basement sewage backup events or if residents submitted a second or third request if the issue remained unresolved after their initial service request. The following table shows the 10 addresses with the highest number of requests in this period.

address neighborhood requests_at_address
4014 Echodale Ave, Baltimore, Md, 21206 Waltherson 14
801 Showell Ct, Baltimore, Md, 21202 Johnston Square 14
500 N Hilton St, Baltimore, Md, 21229 Lower Edmondson Village 14
2305 W Lanvale St, Baltimore, Md, 21216 Evergreen Lawn 14
306 S Augusta Ave, Baltimore City, 21229 Irvington 11
3818 Glenmore Ave, Baltimore, Md, 21206 Rosemont East 11
1016 Marlau Dr, Baltimore, Md, 21212 Chinquapin Park 11
102 W Jeffrey St, Baltimore, Md, 21225 Brooklyn 11
1312 W Lexington St, Baltimore, Md, 21223 Franklin Square 10
6317 Wallis Ave, Baltimore, Md, 21215 Cross Country 10

More than 511 properties in 170 neighborhoods have four or more service requests for basement sewage backups in the study period. The map below shows the location of these properties.

Tracts and neighborhoods

We aggregated service requests into the tracts and neighborhoods then created the following variables:

  • Request count
  • Address count
  • Median days to close
  • Maximum days to close
  • Minimum days to close

In some cases, the maximum days to close is very high but it is unclear if this represents extended delays in addressing the issue or if city staff simply neglected to close the service request until several months after the issue was resolved.

We also created a summary to look at changes in these variables month-by-month throughout the study period.

As of February 28, 2020, this period includes 12,800 service requests or ~32% of the non-duplicate requests with locations information.

For the requests in this period:

  • 72 requests (including 62 addresses) could not be matched to a census tract. There are 2 tracts with no requests.
  • 62 requests (including 56 addresses) could not be matched to a neighborhood. There are 13 neighborhoods with no requests.

Requests by census tract

Mapping the count of unique addresses by census tract suggests that basement sewage back-ups are concentrated in northwest Baltimore and in north Baltimore close to Baltimore County.

# Add summary columns by census tract
summary_requests_tracts <- requests_tracts_analysis %>%
  group_by(GEOID) %>%
  summarize(
    request_count = n(),
    address_count = length(unique(address)),
    med_days_to_close = median(days_to_closed, na.rm = TRUE),
    max_days_to_close = max(days_to_closed, na.rm = TRUE),
    min_days_to_close = min(days_to_closed, na.rm = TRUE)
  ) %>%
  mutate(address_pct_rank = percent_rank(address_count)) %>% # Add percent rank column
  mutate(address_quintile = ntile(address_count, 5)) # Add ranked bins column (5 bins)

# Join tract summary with tract geometry
summary_requests_tracts <- summary_requests_tracts %>%
  full_join(baltimore_tracts, by = "GEOID") %>%
  mutate_at(vars(request_count:address_count), ~ replace(., is.na(.), 0)) %>% # Replace NA with 0 requests for two tracts w/ no requests
  st_as_sf()

Requests by neighborhood

# Add summary columns by neighborhood
summary_requests_neighborhoods <- requests_neighborhoods_analysis %>%
  group_by(label) %>%
  summarize(
    request_count = n(),
    address_count = length(unique(address)),
    med_days_to_close = median(days_to_closed, na.rm = TRUE),
    max_days_to_close = max(days_to_closed, na.rm = TRUE),
    min_days_to_close = min(days_to_closed, na.rm = TRUE)
  ) %>%
  mutate(address_pct_rank = percent_rank(address_count)) %>% # Add percent rank column
  mutate(address_quintile = ntile(address_count, 5)) # Add ranked bins column (5 bins)
## Warning: Factor `label` contains implicit NA, consider using
## `forcats::fct_explicit_na`
# Join neighborhood summary with neighborhood geometry
summary_requests_neighborhoods <- summary_requests_neighborhoods %>%
  full_join(baltimore_neighborhoods, by = "label") %>%
  mutate_at(vars(request_count:address_count), ~ replace(., is.na(.), 0)) %>%
  st_as_sf()

The following table shows ten neighborhoods with the greatest number of unique addresses associated with 311 service requests in this period.

label request_count address_count med_days_to_close max_days_to_close
Belair-Edison 639 450 2.015938 252.6286
Frankford 517 341 2.639687 411.4896
Glen 323 230 1.595799 257.9875
Howard Park 320 216 1.803935 265.0551
Hamilton Hills 303 201 2.025243 251.8736
Loch Raven 275 198 2.943588 145.5322
Coldstream Homestead Montebello 281 196 2.022292 142.9118
Idlewood 277 188 2.102656 130.9457
North Harford Road 268 186 2.146256 170.0750
Glenham-Belhar 289 181 2.146765 424.8217

Mapping the count of unique addresses by neighborhood shows a somewhat different pattern than census tracts. In this map, basement sewage back-ups are concentrated in east Baltimore in the area most effected by backups along the sewer interceptor flowing to the Back River Wastewater Treatment Plant.4

Import American Community Survey data

To expand on this exploratory analysis, we combined the tract-level summary of service requests with American Community Survey data on the demographic characteristics, income, and housing costs.

Select ACS variables

#TODO: Create wide data table that links individual service requests to corresponding ACS data
#TODO: Send file to Erin

# Load a list of available ACS variables
# v18 <- load_variables(2018, "acs5", cache = TRUE)
# View(v18)

# Many of these variables are only available at the tract level
acs_vars <- c(
  total_pop = "B01003_001", # Also used as a summary variable in long format
  white_pop = "B02001_002",
  black_pop = "B02001_003",
  hispanic_total_pop = "B03002_012",
  median_age = "B01002_001",
  household_total = "B11001_001",
  families_total = "B17010_001",
  families_below_poverty = "B17010_002",
  median_monthly_housing_costs = "B25105_001",
  total_housing_units = "B25106_001",
  owner_occupied_housing_units = "B25106_002",
  renter_occupied_housing_units = "B25106_024",
  median_household_income = "B25099_001",
  median_household_income_w_mortgage = "B25099_002",
  median_household_income_wo_mortgage = "B25099_003",
  median_gross_rent = "B25064_001", # Total (dollars)
  median_monthly_owner_cost = "B25088_001",
  median_monthly_owner_cost_pct = "B25092_001"
)

Import and combine ACS data at tract level

acs_tract <- get_acs(
  geography = "tract",
  variables = acs_vars,
  state = "MD", county = "Baltimore city", # geometry = TRUE,
  summary_var = "B01003_001", # Total population
  year = 2018
)

# Switch from a long to wide format (drops margin of error)
acs_tract_wide <- acs_tract %>% pivot_wider(id_cols = "GEOID", names_from = "variable", values_from = "estimate")

# Convert count variables into percentages where appropriate
acs_tract_wide <- acs_tract_wide %>%
  janitor::clean_names(case = "snake") %>%
  mutate(pct_white = white_pop / total_pop) %>%
  mutate(pct_hisp = hispanic_total_pop / total_pop) %>%
  mutate(pct_black_pop = black_pop / total_pop) %>%
  mutate(pct_family_poverty = families_below_poverty / families_total) %>%
  mutate(pct_owner_occupied = owner_occupied_housing_units / total_housing_units) %>%
  mutate(pct_renter_occupied = renter_occupied_housing_units / total_housing_units)

# Add quartile rankings for median income, percent family poverty, and monthly owner costs as a percent of income
acs_tract_wide <- acs_tract_wide %>% 
  mutate(quartile_median_household_income = ntile(median_household_income, 4)) %>%
  mutate(quartile_pct_family_poverty = ntile(pct_family_poverty, 4)) %>%
  mutate(quartile_monthly_owner_cost_pct = ntile(median_monthly_owner_cost_pct, 4))
# Joining the 311 requests with the ACS data
acs_summary_requests_tracts <- summary_requests_tracts %>%
  full_join(acs_tract_wide, by = c("GEOID" = "geoid"))

acs_requests_tracts_analysis <- requests_tracts_analysis %>%
  full_join(acs_tract_wide, by = c("GEOID" = "geoid"))

# acs_summary_requests_tracts %>% write_csv('baltimore_tract_request_count.csv')
# acs_requests_tracts_analysis %>% write_csv('acs_requests_tracts_analysis.csv')

Exploring requests and area characteristics

Blue Water Baltimore and the Environmental Integrity Project have expressed particular concern about the financial costs of cleaning up and fixing a basement sewage backups for residents who cannot afford the expense. While this data does not include the income for specific individuals affected by basement sewage backups, the following variables are used to describe the issue:

Requests by median household income

# Tract address count by median household income (colored by quartile)
acs_summary_requests_tracts %>%
  filter(!is.na(quartile_median_household_income)) %>%
  ggplot(aes(x = median_household_income, y = address_count, color = factor(quartile_median_household_income))) +
  geom_jitter() +
  scale_color_viridis_d(name = "Median income", labels = c("$19,375 to $48,092", "$49,054 to $62,250", "$62,695 to $82,139", "$83,929 to $213,200")) +
  scale_x_log10(labels = dollar) +
  labs(
    title = "Count of basement sewage back-up reports\ncompared to median household income by U.S. census tract",
    x = "Median household income",
    y = "Addresses reporting basement sewage back-ups",
    caption = "Source: Open Baltimore; American Community Survey"
  ) +
  coord_flip() +
  hrbrthemes::theme_ipsum_rc()

Census tracts in the lowest half of median houshold income in the city make up 6,398 of all addresses with associated service requests in this period or 42.5% of the total (15,039). This analysis suggests a relatively even distribution of service requests across all four income quartiles.

However, this distribution does not account for the possibility of under-reporting to 311 by low-income tenants.

quartile_median_household_income min_median_household_income_quartile median_household_income_quartile max_median_household_income_quartile sum(address_count)
1 19375 41136 48092 2283
2 49054 55437 62250 4114
3 62695 71576 82139 3924
4 83929 110817 213200 2243
# Total address count by median household income quartile
acs_summary_requests_tracts %>%
  mutate(quartile_median_household_income = ntile(median_household_income, 4)) %>%
  ggplot(aes(x = quartile_median_household_income, y = address_count, fill = quartile_median_household_income)) +
  geom_col()
## Warning: Removed 10 rows containing missing values (position_stack).


  1. These counts are current as of February 28, 2020.↩︎

  2. The requests with incorrect coordinates were created between January and May 2018 and the values for latitude and longitude were converted to integers (e.g. 39, -77; 39, -76) and lost the relevant location information.↩︎

  3. The geocoded data included both exact and non-exact matches. Both groups are included in the geocoded data that is combined with the original data for analysis. 212 requests could not be matched with a location and are excluded from this analysis.↩︎

  4. Learn more about the backups associated with the Back River Wastewater Treatment Plant and the Headworks Project from the Department of Public Works.↩︎