Basement sewer back-ups are an ongoing issue that affected nearly 4,000 properties in Baltimore City in 2019 alone. In April 2018, the EPA and Maryland Department of the Environmental required Baltimore City to launch a pilot Expedited Reimbursement Program to help mitigate the clean-up costs for residents following a basement sewage back-up. However, during the program’s first year, less than two percent of households reporting basement sewage back-ups using 311 applied for reimbursement through the program and only ten households actually received a reimbursement through the program.
Blue Water Baltimore and others are currently working to change the city’s approach to basement sewage back-ups.
This analysis seeks to use public data to answer the following:
A 2015 analysis by the Environmental Integrity Project looked at data on sewage overflows from 2011-2015, information on E.coli and Enterococcus levels from 2010-2014, and damage claims made to the city from July 1, 2012 to July 1, 2015. This analysis is available as an interactive map and a published report.
This analysis uses 311 service requests for basement sewage backups from Open Baltimore, data on building backups from the Department of Public Works quarterly reports, and demographic information from the American Community Survey.
We hoped to supplement our analysis by using detailed tables on building sewage backups included in the quarterly reports published by the Department of Public Works as a requirement of the Sanitary Sewer Consent Decree. We explored extracting these tables using Tabula but were not successful in this effort so the data is not currently included in this analysis.
Using the RSocrata package, we imported 39,556 service requests for “basement sewage water” at 24,494 unique addresses (excluding duplicates or transferred requests) from Open Baltimore.1 The relevant columns for this analysis include:
While the data includes a column for recent activity, this columns is not used consistently so we did not include this information in our analysis.
# You must create an account on Open Baltimore and get an API key to use RSocrata.
# Sign up for Open Baltimore: https://data.baltimorecity.gov/signup
# More information on the Socrata API: https://dev.socrata.com/docs/endpoints.html
# Following is adapted from this post: https://mattherman.info/blog/point-in-poly/
base <- "https://data.baltimorecity.gov/resource/"
resource <- "9agw-sxsr" # 311 Customer Service Requests
vars <- c("ServiceRequestNum", "CreatedDate", "SRStatus", "StatusDate", "LastActivity", "Outcome", "Address", "Neighborhood", "Longitude", "Latitude") # selected variables
sr_type <- "WW-Sewer Water In Basement" # selected service reuqest type
call <- paste0(base, resource, ".json", "?$select=", paste(vars, collapse = ", "), "&SRType=", sr_type)
# download data. api access token is saved via keyring package
requests <- read.socrata(call, app_token = keyring::key_get("Open Baltimore")) %>%
as_tibble() %>%
janitor::clean_names() %>% # Clean variable names
mutate(
created_date = ymd_hms(created_date), # Convert variable classes
status_date = ymd_hms(status_date),
longitude = as.numeric(longitude),
latitude = as.numeric(latitude)
) %>%
filter(!(sr_status == "Open (Duplicate)" | sr_status == "Closed (Duplicate)" | sr_status == "Closed (Transferred)")) # Exclude duplicate and transferred requests
Using the tigris package, we imported boundaries for U.S. Census tracts so we could aggregate service requests at the tract level and compare the distribution of service requests to demographic characteristics based on the American Community Survey.
We also imported boundaries for Baltimore neighborhoods and City Council district boundaries using shapefiles from Open Baltimore.
baltimore_tracts <- tigris::tracts(state = "MD", county = "Baltimore city", class = "sf")
baltimore_council_districts <- st_read("data/baltimore-council-districts.shp")
baltimore_neighborhoods <- st_read("data/baltimore-neighborhoods.shp")
After reviewing the basement sewage backup service requests, we found the following issues:
We exported these 3,152 service requests to a CSV file, geocoded the addresses using the Census Bureau’s Batch Geocoding service (using Little Geocoder for convenience), and combined this geocoded data with the original data.3
requests %>%
filter(is.na(latitude) | as.numeric(latitude) == 39) %>% # Filter missing and incorrectly coded coordinates
write_csv("data/requests_location_issue.csv") # Export ~3,184 requests to CSV for manually geocoding with the Census Bureau's Batch Geocoding service
geocoded_requests <- read_csv("data/requests_location_issue-geocoded.csv") # Import geocoded data
## Warning: Missing column names filled in: 'X1' [1]
geocoded_requests <- geocoded_requests %>%
filter(is_match == "Match") %>% # Filter to matched addresses
select(-c(1, 10:20, 23:29)) %>% # Drop columns (The columns may be incorrect if this code is run again from scratch)
select(1:8, 10, 9)
requests <- requests %>%
filter(!is.na(latitude)) %>%
filter(!(as.numeric(latitude) == 39)) %>%
bind_rows(geocoded_requests)
# Add relevant variables, convert data to an SF object, and exclude requests outside city boundary line
requests_sf <- requests %>%
mutate(
year_created = year(created_date), # Add year
month_created = month(created_date), # Add month
year_qrt_created = quarter(created_date, with_year = TRUE), # Add a year/quarter
days_to_closed = ifelse(sr_status == "Closed", int_length(interval(ymd_hms(created_date), ymd_hms(status_date))) / 86400, NA)
) %>% # Add days to close column for all closed service requests
st_as_sf(
coords = c("longitude", "latitude"),
agr = "constant",
crs = (4269), # Matching the CRS of the tigris block group data
stringsAsFactors = FALSE,
remove = TRUE
) %>%
st_transform(4269) # %>%
# st_join(st_as_sf(balt_boundary))
# TODO: If there are points outside the city, we need to determine the best way to exclude them or address them in the analysis.
Using the SF package, the requests are joined to corresponding U.S. census tracts, neighborhoods, and city council districts.
# Join the closed service requests with block groups, tracts, neighborhoods, council districts
# requests_blockgroups <- requests_sf %>%
# st_join(baltimore_blockgroups, join = st_within)
requests_tracts <- requests_sf %>%
st_join(baltimore_tracts, join = st_within) %>%
st_drop_geometry()
# TODO: Figure out if there are CRS issues these joins
requests_neighborhoods <- requests_sf %>%
st_join(baltimore_neighborhoods, join = st_within) %>%
st_drop_geometry()
requests_councildistricts <- requests_sf %>%
st_join(baltimore_council_districts, join = st_within) %>%
st_drop_geometry()
For this analysis, we decided to look at service requests from January 2017 through February 2020. This period allowed us to investigate reports of basements sewage backups both before and following significant policy changes and infrastructure improvement that the Baltimore Department of Public Works is required to complete as part of the Sanitary Sewer Consent Decree Program. Key events in study period include:
analysis_period <- interval(ymd("2017-01-01"), ymd("2020-02-29"))
# Filter request data to the analysis period
requests_sf_analysis <- requests_sf %>%
filter(ymd_hms(requests_sf$created_date) %within% analysis_period) %>%
add_count(address, name = 'requests_at_address')
requests_tracts_analysis <- requests_tracts %>%
filter(ymd_hms(requests_tracts$created_date) %within% analysis_period)
requests_neighborhoods_analysis <- requests_neighborhoods %>%
filter(ymd_hms(requests_tracts$created_date) %within% analysis_period)
The quarterly counts of unique addresses making service requests has declined since 2014.
Its is unclear whether this represents a reduction in backups or a reduction in service requests. This could be checked against the DPW quarterly report data on building backups.
The median days to close service requests has also declined.
It is unclear if requests are always resolved when the service request is closed.
summary_requests_monthly <- requests_tracts_analysis %>%
group_by(year_created, month_created) %>%
summarize(
request_count = n(),
address_count = length(unique(address)),
med_days_to_close = median(days_to_closed, na.rm = TRUE),
max_days_to_close = max(days_to_closed, na.rm = TRUE),
min_days_to_close = min(days_to_closed, na.rm = TRUE)
) %>%
mutate(address_pct_rank = percent_rank(address_count)) %>% # Add percent rank column
mutate(address_quintile = ntile(address_count, 5)) # Add ranked bins column (5 bins)
The following table shows eight months with the greatest number of 311 service requests in this period.
| year_created | month_created | request_count | address_count | med_days_to_close | max_days_to_close |
|---|---|---|---|---|---|
| 2017 | 1 | 674 | 581 | 1.8867650 | 228.82398 |
| 2017 | 12 | 578 | 493 | 1.8004861 | 427.25054 |
| 2018 | 1 | 670 | 560 | 2.1063252 | 463.60400 |
| 2018 | 2 | 645 | 532 | 2.0239352 | 451.18389 |
| 2018 | 5 | 590 | 519 | 2.7074248 | 446.92450 |
| 2018 | 7 | 638 | 540 | 1.9401852 | 236.84302 |
| 2019 | 12 | 583 | 485 | 0.8771181 | 59.28868 |
| 2020 | 1 | 646 | 538 | 1.2004745 | 39.82922 |
summary_requests_monthly %>%
ggplot(aes(ymd(paste0(year_created, "-", month_created, "-01")), address_count, group = 1)) +
geom_point(color = "#440154FF") +
geom_line(color = "#440154FF") +
geom_smooth(method = "loess") +
labs(
x = "Month",
y = "Addresses",
title = "Addresses with 311 requests for\nbasement sewage back-ups by month",
subtitle = "January 2017 to February 2020",
caption = "Source: Open Baltimore"
) +
hrbrthemes::theme_ipsum_rc()
summary_requests_monthly %>%
ggplot(aes(ymd(paste0(year_created, "-", month_created, "-01")), request_count, group = 1)) +
geom_point(color = "#440154FF") +
geom_line(color = "#440154FF") +
geom_smooth(method = "loess") +
labs(
x = "Month",
y = "Addresses",
title = "Total 311 requests for\nbasement sewage back-ups by month",
subtitle = "January 2017 to February 2020",
caption = "Source: Open Baltimore"
) +
hrbrthemes::theme_ipsum_rc()
summary_requests_monthly %>%
ggplot(aes(ymd(paste0(year_created, "-", month_created, "-01")), med_days_to_close, group = 1)) +
geom_point(color = "#440154FF") +
geom_line(color = "#440154FF") +
geom_smooth(method = "loess") +
labs(
x = "Months",
y = "Median days to closed",
title = "Median days to close 311 requests for basement sewage backups by month",
subtitle = "January 2017 to February 2020",
caption = "Source: Open Baltimore"
) +
hrbrthemes::theme_ipsum_rc()
Some addresses are associated with more than one service request in this period. It is unclear if these requests represent distinct basement sewage backup events or if residents submitted a second or third request if the issue remained unresolved after their initial service request. The following table shows the 10 addresses with the highest number of requests in this period.
| address | neighborhood | requests_at_address |
|---|---|---|
| 4014 Echodale Ave, Baltimore, Md, 21206 | Waltherson | 14 |
| 801 Showell Ct, Baltimore, Md, 21202 | Johnston Square | 14 |
| 500 N Hilton St, Baltimore, Md, 21229 | Lower Edmondson Village | 14 |
| 2305 W Lanvale St, Baltimore, Md, 21216 | Evergreen Lawn | 14 |
| 306 S Augusta Ave, Baltimore City, 21229 | Irvington | 11 |
| 3818 Glenmore Ave, Baltimore, Md, 21206 | Rosemont East | 11 |
| 1016 Marlau Dr, Baltimore, Md, 21212 | Chinquapin Park | 11 |
| 102 W Jeffrey St, Baltimore, Md, 21225 | Brooklyn | 11 |
| 1312 W Lexington St, Baltimore, Md, 21223 | Franklin Square | 10 |
| 6317 Wallis Ave, Baltimore, Md, 21215 | Cross Country | 10 |
More than 511 properties in 170 neighborhoods have four or more service requests for basement sewage backups in the study period. The map below shows the location of these properties.
We aggregated service requests into the tracts and neighborhoods then created the following variables:
In some cases, the maximum days to close is very high but it is unclear if this represents extended delays in addressing the issue or if city staff simply neglected to close the service request until several months after the issue was resolved.
We also created a summary to look at changes in these variables month-by-month throughout the study period.
As of February 28, 2020, this period includes 12,800 service requests or ~32% of the non-duplicate requests with locations information.
For the requests in this period:
Mapping the count of unique addresses by census tract suggests that basement sewage back-ups are concentrated in northwest Baltimore and in north Baltimore close to Baltimore County.
# Add summary columns by census tract
summary_requests_tracts <- requests_tracts_analysis %>%
group_by(GEOID) %>%
summarize(
request_count = n(),
address_count = length(unique(address)),
med_days_to_close = median(days_to_closed, na.rm = TRUE),
max_days_to_close = max(days_to_closed, na.rm = TRUE),
min_days_to_close = min(days_to_closed, na.rm = TRUE)
) %>%
mutate(address_pct_rank = percent_rank(address_count)) %>% # Add percent rank column
mutate(address_quintile = ntile(address_count, 5)) # Add ranked bins column (5 bins)
# Join tract summary with tract geometry
summary_requests_tracts <- summary_requests_tracts %>%
full_join(baltimore_tracts, by = "GEOID") %>%
mutate_at(vars(request_count:address_count), ~ replace(., is.na(.), 0)) %>% # Replace NA with 0 requests for two tracts w/ no requests
st_as_sf()
# Add summary columns by neighborhood
summary_requests_neighborhoods <- requests_neighborhoods_analysis %>%
group_by(label) %>%
summarize(
request_count = n(),
address_count = length(unique(address)),
med_days_to_close = median(days_to_closed, na.rm = TRUE),
max_days_to_close = max(days_to_closed, na.rm = TRUE),
min_days_to_close = min(days_to_closed, na.rm = TRUE)
) %>%
mutate(address_pct_rank = percent_rank(address_count)) %>% # Add percent rank column
mutate(address_quintile = ntile(address_count, 5)) # Add ranked bins column (5 bins)
## Warning: Factor `label` contains implicit NA, consider using
## `forcats::fct_explicit_na`
# Join neighborhood summary with neighborhood geometry
summary_requests_neighborhoods <- summary_requests_neighborhoods %>%
full_join(baltimore_neighborhoods, by = "label") %>%
mutate_at(vars(request_count:address_count), ~ replace(., is.na(.), 0)) %>%
st_as_sf()
The following table shows ten neighborhoods with the greatest number of unique addresses associated with 311 service requests in this period.
| label | request_count | address_count | med_days_to_close | max_days_to_close |
|---|---|---|---|---|
| Belair-Edison | 639 | 450 | 2.015938 | 252.6286 |
| Frankford | 517 | 341 | 2.639687 | 411.4896 |
| Glen | 323 | 230 | 1.595799 | 257.9875 |
| Howard Park | 320 | 216 | 1.803935 | 265.0551 |
| Hamilton Hills | 303 | 201 | 2.025243 | 251.8736 |
| Loch Raven | 275 | 198 | 2.943588 | 145.5322 |
| Coldstream Homestead Montebello | 281 | 196 | 2.022292 | 142.9118 |
| Idlewood | 277 | 188 | 2.102656 | 130.9457 |
| North Harford Road | 268 | 186 | 2.146256 | 170.0750 |
| Glenham-Belhar | 289 | 181 | 2.146765 | 424.8217 |
Mapping the count of unique addresses by neighborhood shows a somewhat different pattern than census tracts. In this map, basement sewage back-ups are concentrated in east Baltimore in the area most effected by backups along the sewer interceptor flowing to the Back River Wastewater Treatment Plant.4
To expand on this exploratory analysis, we combined the tract-level summary of service requests with American Community Survey data on the demographic characteristics, income, and housing costs.
#TODO: Create wide data table that links individual service requests to corresponding ACS data
#TODO: Send file to Erin
# Load a list of available ACS variables
# v18 <- load_variables(2018, "acs5", cache = TRUE)
# View(v18)
# Many of these variables are only available at the tract level
acs_vars <- c(
total_pop = "B01003_001", # Also used as a summary variable in long format
white_pop = "B02001_002",
black_pop = "B02001_003",
hispanic_total_pop = "B03002_012",
median_age = "B01002_001",
household_total = "B11001_001",
families_total = "B17010_001",
families_below_poverty = "B17010_002",
median_monthly_housing_costs = "B25105_001",
total_housing_units = "B25106_001",
owner_occupied_housing_units = "B25106_002",
renter_occupied_housing_units = "B25106_024",
median_household_income = "B25099_001",
median_household_income_w_mortgage = "B25099_002",
median_household_income_wo_mortgage = "B25099_003",
median_gross_rent = "B25064_001", # Total (dollars)
median_monthly_owner_cost = "B25088_001",
median_monthly_owner_cost_pct = "B25092_001"
)
acs_tract <- get_acs(
geography = "tract",
variables = acs_vars,
state = "MD", county = "Baltimore city", # geometry = TRUE,
summary_var = "B01003_001", # Total population
year = 2018
)
# Switch from a long to wide format (drops margin of error)
acs_tract_wide <- acs_tract %>% pivot_wider(id_cols = "GEOID", names_from = "variable", values_from = "estimate")
# Convert count variables into percentages where appropriate
acs_tract_wide <- acs_tract_wide %>%
janitor::clean_names(case = "snake") %>%
mutate(pct_white = white_pop / total_pop) %>%
mutate(pct_hisp = hispanic_total_pop / total_pop) %>%
mutate(pct_black_pop = black_pop / total_pop) %>%
mutate(pct_family_poverty = families_below_poverty / families_total) %>%
mutate(pct_owner_occupied = owner_occupied_housing_units / total_housing_units) %>%
mutate(pct_renter_occupied = renter_occupied_housing_units / total_housing_units)
# Add quartile rankings for median income, percent family poverty, and monthly owner costs as a percent of income
acs_tract_wide <- acs_tract_wide %>%
mutate(quartile_median_household_income = ntile(median_household_income, 4)) %>%
mutate(quartile_pct_family_poverty = ntile(pct_family_poverty, 4)) %>%
mutate(quartile_monthly_owner_cost_pct = ntile(median_monthly_owner_cost_pct, 4))
# Joining the 311 requests with the ACS data
acs_summary_requests_tracts <- summary_requests_tracts %>%
full_join(acs_tract_wide, by = c("GEOID" = "geoid"))
acs_requests_tracts_analysis <- requests_tracts_analysis %>%
full_join(acs_tract_wide, by = c("GEOID" = "geoid"))
# acs_summary_requests_tracts %>% write_csv('baltimore_tract_request_count.csv')
# acs_requests_tracts_analysis %>% write_csv('acs_requests_tracts_analysis.csv')
Blue Water Baltimore and the Environmental Integrity Project have expressed particular concern about the financial costs of cleaning up and fixing a basement sewage backups for residents who cannot afford the expense. While this data does not include the income for specific individuals affected by basement sewage backups, the following variables are used to describe the issue:
# Tract address count by median household income (colored by quartile)
acs_summary_requests_tracts %>%
filter(!is.na(quartile_median_household_income)) %>%
ggplot(aes(x = median_household_income, y = address_count, color = factor(quartile_median_household_income))) +
geom_jitter() +
scale_color_viridis_d(name = "Median income", labels = c("$19,375 to $48,092", "$49,054 to $62,250", "$62,695 to $82,139", "$83,929 to $213,200")) +
scale_x_log10(labels = dollar) +
labs(
title = "Count of basement sewage back-up reports\ncompared to median household income by U.S. census tract",
x = "Median household income",
y = "Addresses reporting basement sewage back-ups",
caption = "Source: Open Baltimore; American Community Survey"
) +
coord_flip() +
hrbrthemes::theme_ipsum_rc()
Census tracts in the lowest half of median houshold income in the city make up 6,398 of all addresses with associated service requests in this period or 42.5% of the total (15,039). This analysis suggests a relatively even distribution of service requests across all four income quartiles.
However, this distribution does not account for the possibility of under-reporting to 311 by low-income tenants.
| quartile_median_household_income | min_median_household_income_quartile | median_household_income_quartile | max_median_household_income_quartile | sum(address_count) |
|---|---|---|---|---|
| 1 | 19375 | 41136 | 48092 | 2283 |
| 2 | 49054 | 55437 | 62250 | 4114 |
| 3 | 62695 | 71576 | 82139 | 3924 |
| 4 | 83929 | 110817 | 213200 | 2243 |
# Total address count by median household income quartile
acs_summary_requests_tracts %>%
mutate(quartile_median_household_income = ntile(median_household_income, 4)) %>%
ggplot(aes(x = quartile_median_household_income, y = address_count, fill = quartile_median_household_income)) +
geom_col()
## Warning: Removed 10 rows containing missing values (position_stack).
These counts are current as of February 28, 2020.↩︎
The requests with incorrect coordinates were created between January and May 2018 and the values for latitude and longitude were converted to integers (e.g. 39, -77; 39, -76) and lost the relevant location information.↩︎
The geocoded data included both exact and non-exact matches. Both groups are included in the geocoded data that is combined with the original data for analysis. 212 requests could not be matched with a location and are excluded from this analysis.↩︎
Learn more about the backups associated with the Back River Wastewater Treatment Plant and the Headworks Project from the Department of Public Works.↩︎