Housing conditions are an important factor in public health and well-being. Poor residential conditions, such as mold, can contribute to physical health problems and increased stress. Domestic violence is also influenced by environmental and social stressors, which makes housing conditions a relevant area of study.
This project examines the relationship between residential mold complaints and domestic violence (DV) reports in New York City from 2010 to 2024. I use two datasets from NYC Open Data: the 311 Service Requests data, from which I extract residential mold complaints, and NYPD Complaint Data Historic, from which I extract domestic violence reports. With these, I explore whether the two types of reports follow similar patterns over time. The goal is not to determine causation, but to understand whether mold complaints and DV reports tend to rise and fall together.
The analysis focuses on monthly aggregated data and includes exploratory summaries, correlation analyses, and regression models. I also explore delayed (or lagged) relationships and mold complaint resolution time to better understand how timing may play a role.
library(tidyverse)
library(readxl)
library(ggplot2)
library(mosaic)
library(AICcmodavg)
library(knitr)
mold_data <- read_excel("311_Service_Requests_from_2010_to_Present_20251215.xlsx")
dv_data <- read_excel("NYPD_Complaint_Data_Historic_20251218.xlsx")
mold_data_clean <- mold_data %>% filter(
`Location Type` %in% c("RESIDENTIAL BUILDING", "Residential Building", "Loft Residence", "Mixed Use Building", "Apartment", "3+ Family Apartment Building", "1-2 Family Dwelling", "1-2 Family Mixed Use Building", "3+ Family Mixed Use Building", "Single Room Occupancy (SRO)"))
dv_data_clean<- dv_data %>% rename(
"complaint_number" = `CMPLNT_NUM`,
"inc_occur_date" = `CMPLNT_FR_DT`,
"inc_occur_time" = `CMPLNT_FR_TM`,
"inc_end_date" = `CMPLNT_TO_DT`,
"inc_end_time" = `CMPLNT_TO_TM`,
"precinct_occur" = `ADDR_PCT_CD`,
"report_date" = `RPT_DT`,
"key_code" = `KY_CD`,
"offense_type" = `OFNS_DESC`,
"class_code" = `PD_CD`,
"class_code_desc" = `PD_DESC`,
"attempt_completion" = `CRM_ATPT_CPTD_CD`,
"offense_level" = `LAW_CAT_CD`,
"borough" = `BORO_NM`,
"occur_location" = `LOC_OF_OCCUR_DESC`,
"premise_desc" = `PREM_TYP_DESC`,
"juris_code_desc" = `JURIS_DESC`,
"jurisdiction" = `JURISDICTION_CODE`,
"park_occur" = `PARKS_NM`,
"development" = `HADEVELOPT`,
"development_code" = `HOUSING_PSA`,
"x_coord" = `X_COORD_CD`,
"y_coord" = `Y_COORD_CD`,
"suspect_age" = `SUSP_AGE_GROUP`,
"suspect_race" = `SUSP_RACE`,
"suspect_sex" = `SUSP_SEX`,
"transit_district" = `TRANSIT_DISTRICT`,
"patrol_borough" = `PATROL_BORO`,
"station_name" = `STATION_NAME`,
"victim_age" = `VIC_AGE_GROUP`,
"victim_race" = `VIC_RACE`,
"victim_sex" = `VIC_SEX`
)
mold_data_clean <- mold_data_clean %>%
mutate(Created_Date_Original = `Created Date`) %>%
separate(
col = `Created Date`,
into = c("Year","Month","Day"),
sep = "-",
remove = FALSE
)
dv_data_clean <- dv_data_clean %>% separate(
col = inc_occur_date,
into = c("Year","Month","Day"),
sep = "-"
)
mold_data_clean <- mold_data_clean %>%
filter(Descriptor != "Unsafe Mold Cleanup")
mold_data_clean<- mold_data_clean %>% mutate(
`Complaint Type` = recode(
`Complaint Type`,
"UNSANITARY CONDITION" = "Mold",
"Unsanitary Condition" = "Mold",
"MOLD" = "Mold",
"GENERAL" = "Mold",
"GENERAL CONSTRUCTION" = "Mold"
))
mold_data_clean<- mold_data_clean %>% mutate(
`Month` = recode(
`Month`,
"01" = "01 - January",
"02" = "02 - February",
"03" = "03 - March",
"04" = "04 - April",
"05" = "05 - May",
"06" = "06 - June",
"07" = "07 - July",
"08" = "08 - August",
"09" = "09 - September",
"10" = "10 - October",
"11" = "11 - November",
"12" = "12 - December"
))
dv_data_clean<- dv_data_clean %>% mutate(
`Month` = recode(
`Month`,
"01" = "01 - January",
"02" = "02 - February",
"03" = "03 - March",
"04" = "04 - April",
"05" = "05 - May",
"06" = "06 - June",
"07" = "07 - July",
"08" = "08 - August",
"09" = "09 - September",
"10" = "10 - October",
"11" = "11 - November",
"12" = "12 - December"
))
dv_data_clean$complaint <- "DV"
dv_data_clean <- dv_data_clean %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year >= 2010)
mold_data_clean<- mold_data_clean %>% mutate(
Borough = case_when(
City == "NEW YORK" ~ "MANHATTAN",
City == "BROOKLYN" ~ "BROOKLYN",
City == "ARVERNE" ~ "QUEENS",
City == "BRONX" ~ "BRONX",
City == "JAMAICA" ~ "QUEENS",
City == "SPRINGFIELD GARDENS" ~ "QUEENS",
City == "FLUSHING" ~ "QUEENS",
City == "STATEN ISLAND" ~ "STATEN ISLAND",
City == "RICHMOND HILL" ~ "QUEENS",
City == "ASTORIA" ~ "QUEENS",
City == "HOLLIS" ~ "QUEENS",
City == "RIDGEWOOD" ~ "QUEENS",
City == "FOREST HILLS" ~ "QUEENS",
City == "ELMHURST" ~ "QUEENS",
City == "MASPETH" ~ "QUEENS",
City == "SOUTH RICHMOND HILL" ~ "QUEENS",
City == "JACKSON HEIGHTS" ~ "QUEENS",
City == "BAYSIDE" ~ "QUEENS",
City == "FAR ROCKAWAY" ~ "QUEENS",
City == "SAINT ALBANS" ~ "QUEENS",
City == "CORONA" ~ "QUEENS",
City == "WOODSIDE" ~ "QUEENS",
City == "QUEENS VILLAGE" ~ "QUEENS",
City == "REGO PARK" ~ "QUEENS",
City == "ROSEDALE" ~ "QUEENS",
City == "SUNNYSIDE" ~ "QUEENS",
City == "OZONE PARK" ~ "QUEENS",
City == "EAST ELMHURST" ~ "QUEENS",
City == "MIDDLE VILLAGE" ~ "QUEENS",
City == "WOODHAVEN" ~ "QUEENS",
City == "SOUTH OZONE PARK" ~ "QUEENS",
City == "ROCKAWAY PARK" ~ "QUEENS",
City == "KEW GARDENS" ~ "QUEENS",
City == "FRESH MEADOWS" ~ "QUEENS",
City == "COLLEGE POINT" ~ "QUEENS",
City == "LONG ISLAND CITY" ~ "QUEENS",
City == "OAKLAND GARDENS" ~ "QUEENS",
City == "WHITESTONE" ~ "QUEENS",
City == "HOWARD BEACH" ~ "QUEENS",
City == "CAMBRIA HEIGHTS" ~ "QUEENS",
City == "BELLEROSE" ~ "QUEENS",
City == "LITTLE NECK" ~ "QUEENS",
City == "BREEZY POINT" ~ "QUEENS",
City == "GLEN OAKS" ~ "QUEENS",
City == "FLORAL PARK" ~ "QUEENS",
City == "PELHAM" ~ "BRONX",
City == "NEW HYDE PARK" ~ "QUEENS",
City == "QUEENS" ~ "QUEENS",
City == "MANHATTAN" ~ "MANHATTAN",
City == "Far Rockaway" ~ "QUEENS",
City == "Astoria" ~ "QUEENS",
City == "Elmhurst" ~ "QUEENS",
City == "Corona" ~ "QUEENS",
City == "Ozone Park" ~ "QUEENS",
City == "Forest Hills" ~ "QUEENS",
City == "Jamaica" ~ "QUEENS",
City == "Arverne" ~ "QUEENS",
City == "Bayside" ~ "QUEENS",
City == "East Elmhurst" ~ "QUEENS",
City == "Flushing" ~ "QUEENS",
City == "Middle Village" ~ "QUEENS",
City == "Ridgewood" ~ "QUEENS",
City == "Woodside" ~ "QUEENS",
City == "Oakland Gardens" ~ "QUEENS",
City == "Rego Park" ~ "QUEENS",
City == "Hollis" ~ "QUEENS",
City == "Saint Albans" ~ "QUEENS",
City == "Springfield Gardens" ~ "QUEENS",
City == "Kew Gardens" ~ "QUEENS",
City == "Fresh Meadows" ~ "QUEENS",
City == "Howard Beach" ~ "QUEENS",
City == "South Richmond Hill" ~ "QUEENS",
City == "Whitestone" ~ "QUEENS",
City == "South Ozone Park" ~ "QUEENS",
City == "College Point" ~ "QUEENS",
City == "Jackson Heights" ~ "QUEENS",
City == "Maspeth" ~ "QUEENS",
City == "Long Island City" ~ "QUEENS",
City == "Rockaway Park" ~ "QUEENS",
City == "Sunnyside" ~ "QUEENS",
City == "Woodhaven" ~ "QUEENS",
City == "Floral Park" ~ "QUEENS",
City == "Glen Oaks" ~ "QUEENS",
City == "Queens Village" ~ "QUEENS",
City == "Bellerose" ~ "QUEENS",
City == "Little Neck" ~ "QUEENS",
City == "Richmond Hill" ~ "QUEENS",
City == "Rosedale" ~ "QUEENS",
City == "Cambria Heights" ~ "QUEENS",
City == "New Hyde Park" ~ "QUEENS",
City == "Breezy Point" ~ "QUEENS",
)
)
mold_data_clean <- mold_data_clean %>% filter(!is.na(Borough))
mold_data_clean<- mold_data_clean %>% filter(Year != 2025)
mold_data_clean <- mold_data_clean %>%
mutate(
created_date = as.Date(`Created Date`),
closed_date = as.Date(`Closed Date`),
resolution_days = as.numeric(closed_date - created_date)
)
kable(head(mold_data_clean, 3))
| Unique Key | Created Date | Year | Month | Day | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | Cross Street 1 | Cross Street 2 | Intersection Street 1 | Intersection Street 2 | Address Type | City | Landmark | Facility Type | Status | Due Date | Resolution Description | Resolution Action Updated Date | Community Board | BBL | Borough | X Coordinate (State Plane) | Y Coordinate (State Plane) | Open Data Channel Type | Park Facility Name | Park Borough | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Latitude | Longitude | Location | Created_Date_Original | created_date | closed_date | resolution_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63573695 | 2024-12-31T23:06:14.000 | 2024 | 12 - December | 31T23:06:14.000 | 2025-01-19T11:58:36.000 | HPD | Department of Housing Preservation and Development | Mold | MOLD | RESIDENTIAL BUILDING | 10451 | 283 EAST 149 STREET | EAST 149 STREET | NA | NA | NA | NA | ADDRESS | BRONX | NA | NA | Closed | NA | HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. | 2025-01-19T00:00:00.000 | 01 BRONX | 2023310072 | BRONX | 1005910 | 236991 | PHONE | Unspecified | BRONX | NA | NA | NA | NA | NA | NA | NA | 40.81713 | -73.92175 | (40.81713468822815, -73.9217467475281) | 2024-12-31T23:06:14.000 | 2024-12-31 | 2025-01-19 | 19 |
| 63583460 | 2024-12-31T21:19:39.000 | 2024 | 12 - December | 31T21:19:39.000 | 2025-01-03T16:24:36.000 | HPD | Department of Housing Preservation and Development | Mold | MOLD | RESIDENTIAL BUILDING | 10468 | 2719 MORRIS AVENUE | MORRIS AVENUE | NA | NA | NA | NA | ADDRESS | BRONX | NA | NA | Closed | NA | HPD inspected this condition so the complaint has been closed. Violations were issued. The law provides the property owner time to correct the condition(s). Violation descriptions and the dates for the property owner to correct any violations are available at HPDONLINE. If the owner has not corrected the condition by the date provided, you may wish to bring a case in housing court seeking the correction of these conditions.To find out more about how to start a housing court case, visit HPD’s w | 2025-01-03T00:00:00.000 | 07 BRONX | 2033170043 | BRONX | 1013153 | 255563 | PHONE | Unspecified | BRONX | NA | NA | NA | NA | NA | NA | NA | 40.86809 | -73.89550 | (40.86808863174485, -73.89549921281306) | 2024-12-31T21:19:39.000 | 2024-12-31 | 2025-01-03 | 3 |
| 63583408 | 2024-12-31T20:55:28.000 | 2024 | 12 - December | 31T20:55:28.000 | 2025-01-10T16:55:45.000 | HPD | Department of Housing Preservation and Development | Mold | MOLD | RESIDENTIAL BUILDING | 11435 | 85-15 139 STREET | 139 STREET | NA | NA | NA | NA | ADDRESS | JAMAICA | NA | NA | Closed | NA | HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. | 2025-01-10T00:00:00.000 | 08 QUEENS | 4097100002 | QUEENS | 1034989 | 197495 | PHONE | Unspecified | QUEENS | NA | NA | NA | NA | NA | NA | NA | 40.70861 | -73.81699 | (40.7086094282564, -73.8169884642595) | 2024-12-31T20:55:28.000 | 2024-12-31 | 2025-01-10 | 10 |
kable(head(dv_data_clean, 5))
| complaint_number | Year | Month | Day | inc_occur_time | inc_end_date | inc_end_time | precinct_occur | report_date | key_code | offense_type | class_code | class_code_desc | attempt_completion | offense_level | borough | occur_location | premise_desc | juris_code_desc | jurisdiction | park_occur | development | development_code | x_coord | y_coord | suspect_age | suspect_race | suspect_sex | transit_district | Latitude | Longitude | Lat_Lon | patrol_borough | station_name | victim_age | victim_race | victim_sex | complaint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 298690828 | 2024 | 12 - December | 31T00:00:00.000 | 13:00:00 | 2024-12-31T00:00:00.000 | 13:10:00 | 113 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1046104 | 187464 | <18 | BLACK | F | NA | 40.68101 | -73.77699 | (40.681014, -73.776991) | PATROL BORO QUEENS SOUTH | (null) | <18 | BLACK | M | DV |
| 298698016 | 2024 | 12 - December | 31T00:00:00.000 | 08:00:00 | 2024-12-31T00:00:00.000 | 09:00:00 | 116 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1048028 | 178970 | 25-44 | BLACK | M | NA | 40.65769 | -73.77013 | (40.657687, -73.770132) | PATROL BORO QUEENS SOUTH | (null) | 18-24 | WHITE HISPANIC | F | DV |
| 298704508 | 2024 | 12 - December | 31T00:00:00.000 | 16:50:00 | 2024-12-31T00:00:00.000 | 16:56:00 | 107 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE - APT. HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1050645 | 203097 | UNKNOWN | BLACK | F | NA | 40.72389 | -73.76046 | (40.723891, -73.760464) | PATROL BORO QUEENS SOUTH | (null) | 65+ | WHITE | M | DV |
| 298678676 | 2024 | 12 - December | 31T00:00:00.000 | 07:00:00 | 2024-12-31T00:00:00.000 | 07:30:00 | 113 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1051478 | 189936 | 25-44 | BLACK | M | NA | 40.68776 | -73.75759 | (40.687762, -73.757589) | PATROL BORO QUEENS SOUTH | (null) | 45-64 | BLACK | M | DV |
| 298672417 | 2024 | 12 - December | 31T00:00:00.000 | 02:50:00 | 2024-12-31T00:00:00.000 | 02:55:00 | 101 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1054075 | 157436 | 45-64 | BLACK | M | NA | 40.59854 | -73.74856 | (40.598536, -73.74856) | PATROL BORO QUEENS SOUTH | (null) | 45-64 | BLACK | F | DV |
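This chunk does the heavy cleaning of two large datasets. I kept only residential location types in the mold data, dropped "Unsafe Mold Cleanup" complaints, standardized column names and values, derived each mold complaint's Borough from its City column (dropping complaints whose city could not be matched to a borough), removed DV reports from before 2010, and removed mold complaints from 2025. I also split the dates so that Year, Month, and Day could be used individually in the project.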
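The long `case_when()` above lists most neighborhoods twice, once in upper case and once in title case. Below is a minimal base-R sketch of an alternative, assuming a lookup vector keyed on upper-cased city names. The abbreviated `city_to_borough` table and the `map_borough()` helper are hypothetical, not part of the original analysis. The sketch also takes Year and Month directly from the timestamp text with `substr()`, which avoids the time of day ending up in the Day column (as it does when splitting on "-", e.g. "31T23:06:14.000").

```r
# Hypothetical, abbreviated lookup table; a real mapping would need to list every
# City value that appears in the 311 data.
city_to_borough <- c(
  "NEW YORK" = "MANHATTAN", "MANHATTAN" = "MANHATTAN",
  "BROOKLYN" = "BROOKLYN", "BRONX" = "BRONX",
  "STATEN ISLAND" = "STATEN ISLAND",
  "ASTORIA" = "QUEENS", "JAMAICA" = "QUEENS", "FLUSHING" = "QUEENS"
)

# Upper-case and trim first, so "Astoria" and "ASTORIA " share one entry.
# Cities missing from the table (and NA cities) come back as NA.
map_borough <- function(city) unname(city_to_borough[toupper(trimws(city))])
map_borough(c("Astoria", "NEW YORK", "Unknown Town", NA))
# "QUEENS" "MANHATTAN" NA NA

# Take Year and Month straight from the timestamp text by character position,
# so the time of day never lands in a Day column.
created <- c("2024-12-31T23:06:14.000", "2010-01-05T08:00:00.000")
substr(created, 1, 4)  # "2024" "2010"
substr(created, 6, 7)  # "12" "01"
```

With a lookup vector, case variants are handled once by `toupper()`, and unmatched cities still become NA, so they can be dropped with `filter(!is.na(Borough))` exactly as in the code above.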
aggregated_dv_data <- dv_data_clean %>%
group_by(Year, Month, borough) %>%
summarise(
`complaint` = n(),
.groups = "drop"
)
aggregated_dv_data <- aggregated_dv_data %>%
filter(borough != "(null)")
aggregated_dv_data <- aggregated_dv_data %>%
mutate(Year = as.character(Year))
aggregated_dv_data<- aggregated_dv_data %>% rename(
"Borough" = "borough")
aggregated_dv_data<- aggregated_dv_data %>% rename(
"DV Reports" = "complaint"
)
aggregated_mold_data <- mold_data_clean %>%
group_by(Year, Month, Borough) %>%
summarise(
`Complaint Type` = n(),
.groups = "drop"
)
aggregated_mold_data<- aggregated_mold_data %>% rename(
"Mold Complaints" = "Complaint Type"
)
aggregated_dv_mold_data <- left_join(
aggregated_dv_data,
aggregated_mold_data,
by = c("Borough", "Year", "Month")
)
aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
mutate(
year_month = paste(Year, Month)
)
aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
arrange(Year, Month, Borough) %>%
group_by(Borough) %>%
mutate(time_index = row_number()) %>%
ungroup()
kable(head(aggregated_dv_mold_data, 15))
| Year | Month | Borough | DV Reports | Mold Complaints | year_month | time_index |
|---|---|---|---|---|---|---|
| 2010 | 01 - January | BRONX | 910 | 954 | 2010 01 - January | 1 |
| 2010 | 01 - January | BROOKLYN | 1306 | 779 | 2010 01 - January | 1 |
| 2010 | 01 - January | MANHATTAN | 541 | 410 | 2010 01 - January | 1 |
| 2010 | 01 - January | QUEENS | 791 | 315 | 2010 01 - January | 1 |
| 2010 | 01 - January | STATEN ISLAND | 154 | 58 | 2010 01 - January | 1 |
| 2010 | 02 - February | BRONX | 818 | 738 | 2010 02 - February | 2 |
| 2010 | 02 - February | BROOKLYN | 938 | 651 | 2010 02 - February | 2 |
| 2010 | 02 - February | MANHATTAN | 424 | 338 | 2010 02 - February | 2 |
| 2010 | 02 - February | QUEENS | 603 | 273 | 2010 02 - February | 2 |
| 2010 | 02 - February | STATEN ISLAND | 151 | 29 | 2010 02 - February | 2 |
| 2010 | 03 - March | BRONX | 969 | 941 | 2010 03 - March | 3 |
| 2010 | 03 - March | BROOKLYN | 1149 | 870 | 2010 03 - March | 3 |
| 2010 | 03 - March | MANHATTAN | 500 | 415 | 2010 03 - March | 3 |
| 2010 | 03 - March | QUEENS | 758 | 395 | 2010 03 - March | 3 |
| 2010 | 03 - March | STATEN ISLAND | 147 | 55 | 2010 03 - March | 3 |
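In this chunk, I aggregated the domestic violence reports and mold complaints into monthly counts per borough and joined them into a single table, so that trends are easy to analyze. I also added a time index that numbers the months within each borough, starting at 1 for January 2010.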
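The same aggregate-then-join steps can be sketched in base R, which may help show what the dplyr pipeline does. This is an illustration on hypothetical toy records; the `count_by()` helper and the `dv_reports`/`mold_complaints` column names are made up, and the sketch is not a replacement for the code above.

```r
# Hypothetical toy records (one row per report) standing in for the cleaned data.
dv <- data.frame(
  Year    = "2010",
  Month   = c("01 - January", "01 - January", "02 - February", "01 - January"),
  Borough = c("BRONX", "BRONX", "BRONX", "QUEENS")
)
mold <- data.frame(
  Year    = "2010",
  Month   = c("01 - January", "02 - February", "02 - February"),
  Borough = "BRONX"
)

# Count rows per Year/Month/Borough; aggregate() names the count column "x".
count_by <- function(df, count_name) {
  out <- aggregate(rep(1L, nrow(df)), by = df[c("Year", "Month", "Borough")], FUN = sum)
  names(out)[names(out) == "x"] <- count_name
  out
}
dv_counts   <- count_by(dv, "dv_reports")
mold_counts <- count_by(mold, "mold_complaints")

# Left join: keep every DV borough-month, even without a matching mold count.
joined <- merge(dv_counts, mold_counts, by = c("Year", "Month", "Borough"), all.x = TRUE)

# Number the months within each borough after sorting chronologically.
joined <- joined[order(joined$Borough, joined$Year, joined$Month), ]
joined$time_index <- ave(seq_len(nrow(joined)), joined$Borough, FUN = seq_along)
joined
```

`all.x = TRUE` plays the role of `left_join()`: the Queens row keeps its DV count and gets NA for mold, because the toy mold data has no Queens complaints.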
dv_summary <- aggregated_dv_data %>%
summarise(
total_reports = sum(`DV Reports`),
start_year = min(as.numeric(Year)),
end_year = max(as.numeric(Year)),
boroughs = n_distinct(Borough),
avg_monthly = mean(`DV Reports`)
)
kable(dv_summary)
| total_reports | start_year | end_year | boroughs | avg_monthly |
|---|---|---|---|---|
| 669136 | 2010 | 2024 | 5 | 743.4844 |
From January 2010 through December 2024, a total of 669,136 domestic violence incidents were reported across all five boroughs of New York City. Because each row of the aggregated table is one borough in one month, the average of roughly 743 is the number of reports per borough per month; citywide, that works out to about 3,717 reports per month.
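Because each row of `aggregated_dv_data` is one borough in one month (5 boroughs times 180 months, or 900 rows), the `avg_monthly` value averages over borough-months rather than citywide months. A quick check of the arithmetic, using only the totals from the summary table above:

```r
# Figures taken from the summary table above.
total_reports <- 669136
n_months      <- 15 * 12   # January 2010 through December 2024
n_boroughs    <- 5

per_borough_month <- total_reports / (n_months * n_boroughs)  # matches avg_monthly
citywide_month    <- total_reports / n_months                 # all boroughs combined
round(c(per_borough_month = per_borough_month, citywide_month = citywide_month), 1)
# about 743.5 and 3717.4
```

The same reasoning applies to the mold summary: its `avg_monthly` is also a per-borough monthly average.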
dv_by_year_borough <- aggregated_dv_data %>%
group_by(Year, Borough) %>%
summarise(total_reports = sum(`DV Reports`)) %>%
pivot_wider(names_from = Year,
values_from = total_reports)
kable(dv_by_year_borough)
| Borough | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BRONX | 11274 | 11441 | 12207 | 12630 | 12924 | 12594 | 12916 | 12569 | 13662 | 12943 | 12013 | 12499 | 14193 | 14149 | 14905 |
| BROOKLYN | 14094 | 14344 | 15232 | 15032 | 15022 | 14101 | 13771 | 13052 | 13039 | 12250 | 11256 | 12362 | 13286 | 13180 | 13652 |
| MANHATTAN | 6081 | 6128 | 6471 | 6409 | 6627 | 6555 | 6791 | 6586 | 6550 | 6864 | 6117 | 7160 | 7194 | 6880 | 7215 |
| QUEENS | 8726 | 9182 | 9128 | 9425 | 9467 | 8906 | 8697 | 8733 | 9047 | 9324 | 8778 | 9368 | 10205 | 10484 | 12071 |
| STATEN ISLAND | 2015 | 2056 | 2414 | 2242 | 2364 | 2178 | 2112 | 2036 | 2048 | 1868 | 1687 | 1871 | 2099 | 2126 | 2259 |
From 2010 to 2024, Brooklyn and the Bronx consistently had the most domestic violence reports in NYC. Brooklyn had the most reports every year from 2010 through 2017, but starting in 2018 the Bronx overtook it and has had the highest annual count every year since. Staten Island had the fewest reports every year.
Overall, domestic violence reports were higher in 2024 than in 2010 in four of the five boroughs, although the increase was not steady: every borough saw a dip in 2020. Brooklyn is the only borough where the number of reported incidents was lower in 2024 than in 2010. (This could be something interesting to look into!)
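The 2010 versus 2024 comparison can be made concrete as a percent change per borough. This small sketch simply copies the 2010 and 2024 columns from the table above into named vectors:

```r
# 2010 and 2024 DV report totals per borough, copied from the table above.
dv_2010 <- c(BRONX = 11274, BROOKLYN = 14094, MANHATTAN = 6081,
             QUEENS = 8726, "STATEN ISLAND" = 2015)
dv_2024 <- c(BRONX = 14905, BROOKLYN = 13652, MANHATTAN = 7215,
             QUEENS = 12071, "STATEN ISLAND" = 2259)

# Percent change from 2010 to 2024, rounded to one decimal place.
pct_change <- round(100 * (dv_2024 - dv_2010) / dv_2010, 1)
pct_change
# Bronx 32.2, Brooklyn -3.1, Manhattan 18.6, Queens 38.3, Staten Island 12.1
names(pct_change)[pct_change < 0]  # only "BROOKLYN" declined
```

In relative terms, Queens grew the most, even though the Bronx had the largest count by 2024.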
library(ggplot2)
dv_plot_data <- aggregated_dv_data %>%
group_by(Year, Borough) %>%
summarise(total_reports = sum(`DV Reports`))
ggplot(dv_plot_data, aes(x = Year, y = Borough, fill = total_reports)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkred") +
labs(
title = "DV Reports by Borough and Year",
x = "Year",
y = "Borough",
fill = "DV Reports"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45))
Above is a heat map of domestic violence incident reports from 2010 to 2024. It reflects what the previous table showed: the Bronx and Brooklyn tend to have a higher volume of reports, and Staten Island has consistently had the fewest.
mold_summary <- aggregated_mold_data %>%
summarise(
total_complaints = sum(`Mold Complaints`),
start_year = min(as.numeric(Year)),
end_year = max(as.numeric(Year)),
boroughs = n_distinct(Borough),
avg_monthly = mean(`Mold Complaints`)
)
kable(mold_summary)
| total_complaints | start_year | end_year | boroughs | avg_monthly |
|---|---|---|---|---|
| 388293 | 2010 | 2024 | 5 | 431.4367 |
From 2010 to 2024, a total of 388,293 residential mold complaints were made to 311 across all five boroughs of New York City. This is an average of roughly 431 complaints per borough per month, or about 2,157 per month citywide.
mold_by_year_borough <- aggregated_mold_data %>%
group_by(Year, Borough) %>%
summarise(total_complaints = sum(`Mold Complaints`)) %>%
pivot_wider(names_from = Year,
values_from = total_complaints)
kable(mold_by_year_borough)
| Borough | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BRONX | 7973 | 8726 | 7384 | 7778 | 8295 | 8434 | 7946 | 7510 | 9510 | 6460 | 5953 | 9378 | 9513 | 11438 | 12847 |
| BROOKLYN | 8244 | 10404 | 8373 | 9299 | 9280 | 9074 | 7700 | 7908 | 9747 | 6360 | 5397 | 8199 | 8265 | 10389 | 11092 |
| MANHATTAN | 3977 | 4403 | 3683 | 3840 | 4787 | 5062 | 4395 | 4031 | 4942 | 3119 | 2985 | 5027 | 5347 | 6660 | 6757 |
| QUEENS | 3258 | 3907 | 3205 | 3290 | 3351 | 3328 | 2875 | 2824 | 3569 | 2406 | 2027 | 3125 | 3347 | 4229 | 4736 |
| STATEN ISLAND | 615 | 852 | 730 | 757 | 747 | 688 | 720 | 691 | 864 | 529 | 504 | 777 | 721 | 960 | 770 |
From 2010 to 2024, Brooklyn and the Bronx had the most residential mold complaints, and they compete for first place: Brooklyn led in most years through 2018, while the Bronx has led every year since 2019. Staten Island had the fewest residential mold complaints to 311 every year.
In all five boroughs, there were more mold complaints in 2024 than in 2010. The overall trend is upward, although citywide complaints fell sharply in 2019 and 2020 before rising to their highest level in 2024.
mold_plot_data <- aggregated_mold_data %>%
group_by(Year, Borough) %>%
summarise(total_complaints = sum(`Mold Complaints`))
ggplot(mold_plot_data, aes(x = Year, y = Borough, fill = total_complaints)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "wheat", high = "darkgreen") +
labs(
title = "Mold Complaints by Borough and Year",
x = "Year",
y = "Borough",
fill = "Mold Complaints"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45))
Above is a heat map of residential mold complaints to 311 from 2010 to 2024. Its pattern resembles the domestic violence heat map: the Bronx and Brooklyn have a higher volume of complaints, and Staten Island has consistently had the fewest.
cor.test(aggregated_dv_mold_data$`DV Reports`,
aggregated_dv_mold_data$`Mold Complaints`)
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 43.206, df = 898, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7992741 0.8418473
## sample estimates:
## cor
## 0.8217037
Strength: 0.82 (very strong)
Direction: positive
Significance: statistically significant (p<0.05)
I ran a correlation test between domestic violence reports and residential mold complaints to see whether there is a substantial relationship between them, and there is! The relationship is positive and very strong: borough-months with more DV reports tend to also have more mold complaints, and those with fewer DV reports tend to have fewer mold complaints. In other words, the two variables move together. Note, however, that these 900 observations pool five boroughs of very different sizes, so part of this correlation may simply reflect that larger boroughs generate more of both kinds of reports.
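Because the 900 borough-month observations pool boroughs of very different sizes, a strong pooled correlation can arise even when the two counts are unrelated within any single borough. The following minimal base-R sketch illustrates this using hypothetical toy data: four made-up boroughs `A` to `D`, not the NYC data. Computing the correlation separately within each borough, as at the end of the sketch, is one way to check for this in the real data.

```r
# Hypothetical toy data (not the NYC data): four boroughs of very different size,
# four months each. Within every borough the two series are uncorrelated by
# construction; only the borough-level sizes differ.
boroughs <- c("A", "B", "C", "D")
size     <- c(100, 500, 1000, 2000)
dx       <- c(-10, 10, -10, 10)   # within-borough wiggle in series x
dy       <- c(-10, -10, 10, 10)   # within-borough wiggle in series y

toy <- data.frame(
  borough = rep(boroughs, each = 4),
  x = rep(size, each = 4) + rep(dx, times = 4),
  y = 1.5 * rep(size, each = 4) + rep(dy, times = 4)
)

pooled_r <- cor(toy$x, toy$y)  # close to 1, driven by borough size alone
within_r <- sapply(split(toy, toy$borough),
                   function(d) cor(d$x, d$y))  # 0 in every borough
round(pooled_r, 3)
round(within_r, 3)
```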
ggplot(aggregated_dv_mold_data, aes(x = `Mold Complaints`, y = `DV Reports`, color = Borough)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE, color = "gray18") +
labs(title = "DV Reports vs Mold Complaints by Borough",
x = "Mold Complaints",
y = "DV Reports") +
theme_minimal()
However, this only tells us that domestic violence reports and mold complaints co-occur, and it does not tell us anything about causality.
Let’s dive into how domestic violence reports and mold complaints develop over time!
yearly_counts_wide <- aggregated_dv_mold_data %>%
group_by(Year) %>%
summarise(
`DV Reports` = sum(`DV Reports`, na.rm = TRUE),
`Mold Complaints` = sum(`Mold Complaints`, na.rm = TRUE),
.groups = "drop"
) %>%
pivot_longer(
cols = c(`DV Reports`, `Mold Complaints`),
names_to = "Type",
values_to = "Count"
) %>%
pivot_wider(
names_from = Year,
values_from = Count
)
kable(yearly_counts_wide)
| Type | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DV Reports | 42190 | 43151 | 45452 | 45738 | 46404 | 44334 | 44287 | 42976 | 44346 | 43249 | 39851 | 43260 | 46977 | 46819 | 50102 |
| Mold Complaints | 24067 | 28292 | 23375 | 24964 | 26460 | 26586 | 23636 | 22964 | 28632 | 18874 | 16866 | 26506 | 27193 | 33676 | 36202 |
This table shows that DV reports and mold complaints have fluctuated from year to year but trend upward overall: both were higher in 2024 than in 2010, and both reached their lowest yearly totals in 2020.
monthly_counts <- aggregated_dv_mold_data %>%
group_by(Year, Month) %>%
summarise(
total_dv = sum(`DV Reports`, na.rm = TRUE),
total_mold = sum(`Mold Complaints`, na.rm = TRUE),
.groups = "drop"
)
kable(head(monthly_counts, 12))
| Year | Month | total_dv | total_mold |
|---|---|---|---|
| 2010 | 01 - January | 3702 | 2516 |
| 2010 | 02 - February | 2934 | 2029 |
| 2010 | 03 - March | 3523 | 2676 |
| 2010 | 04 - April | 3343 | 2251 |
| 2010 | 05 - May | 3583 | 1787 |
| 2010 | 06 - June | 3867 | 1856 |
| 2010 | 07 - July | 3872 | 1765 |
| 2010 | 08 - August | 3617 | 1958 |
| 2010 | 09 - September | 3618 | 1766 |
| 2010 | 10 - October | 3542 | 1969 |
| 2010 | 11 - November | 3365 | 1661 |
| 2010 | 12 - December | 3224 | 1833 |
Above is a table of total domestic violence reports and mold complaints for each month of each year (the first 12 months are shown). Plotting the monthly counts for each borough shows how the two trend over time relative to one another:
plot_data <- aggregated_dv_mold_data %>%
pivot_longer(
cols = c(`DV Reports`, `Mold Complaints`),
names_to = "Type",
values_to = "Count"
)
ggplot(plot_data, aes(x = time_index, y = Count, color = Type)) +
geom_line(linewidth = 0.5) +
facet_wrap(~Borough) +
scale_color_manual(values = c("DV Reports" = "darkred", "Mold Complaints" = "darkgreen")) +
labs(
title = "Monthly DV and Mold Reports by Borough",
x = "Month (1 = January 2010)",
y = "Number of Reports",
color = "Report Type"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45)
)
Here, we have line plots of domestic violence reports and residential mold complaints per month from January of 2010 to December of 2024 (faceted by borough). We can see similar peaks across the boroughs (especially The Bronx and Brooklyn).
So, how exactly does mold exposure relate to domestic violence reports over time?
We've established that the two variables are related, but we need to look at this data more closely. How do citywide domestic violence reports in a given month correlate with citywide mold complaints in the same month, and how do the two move together?
Month-by-Month DV vs. Mold Counts
cor.test(monthly_counts$total_dv, monthly_counts$total_mold)
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 5.1733, df = 178, p-value = 6.155e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2272817 0.4822876
## sample estimates:
## cor
## 0.3615268
Strength: 0.36
Direction: positive
Significance: statistically significant (p<0.05)
This is a more realistic look at how mold complaints and DV reports move together over time! Summing across boroughs into citywide monthly totals removes the borough size differences that inflated the pooled correlation, so this test compares DV reports and mold complaints recorded in the same month. Month by month, DV reports and mold complaints have a moderate, positive correlation, and this result is statistically significant. This suggests that months with more mold complaints also tend to have more DV reports. Because both series also share longer-term movements (both had their lowest yearly totals in 2020 and their highest in 2024), part of this correlation may reflect those shared trends rather than month-to-month co-movement.
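Two monthly series that both trend upward over the years can be strongly correlated even when their month-to-month movements are unrelated. One common check is to correlate first differences, meaning the change from one month to the next, instead of the raw counts. The sketch below uses two hypothetical toy series (`dv` and `mold` here are made up, not the NYC totals). For this project, the analogous check might be `cor.test(diff(monthly_counts$total_dv), diff(monthly_counts$total_mold))`, assuming `monthly_counts` is sorted chronologically.

```r
# Two hypothetical monthly series (not the NYC data) that share an upward trend
# but have unrelated month-to-month wiggles.
month_idx <- 1:12
dv   <- 10 * month_idx + rep(c(1, -1), times = 6)
mold <- 10 * month_idx + rep(c(1, 1, -1, -1), times = 3)

level_r <- cor(dv, mold)               # correlation of the raw levels
diff_r  <- cor(diff(dv), diff(mold))   # correlation of month-over-month changes
round(c(level_r = level_r, diff_r = diff_r), 2)
# the levels correlate almost perfectly; the changes barely correlate at all
```

If the real citywide correlation drops substantially after differencing, that would suggest the 0.36 is driven mostly by shared trends.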
Counts of mold complaints provide insight into the amount of housing issues that exist, but they do not tell us how long the residents are dealing with the exposure to mold. When investigating predictors of household stress, the length of time a complaint remains unresolved may be important.
So, let’s take a look at how resolution time may play a role in the relationship between domestic violence reports and mold complaints:
res_time<- mold_data_clean %>%
group_by(Borough) %>%
summarise(
avg_resolution_days = mean(resolution_days, na.rm = TRUE),
median_days = median(resolution_days, na.rm = TRUE),
min_days = min(resolution_days, na.rm = TRUE),
max_days = max(resolution_days, na.rm = TRUE),
n_complaints = n()
) %>%
arrange(desc(avg_resolution_days))
kable(res_time)
| Borough | avg_resolution_days | median_days | min_days | max_days | n_complaints |
|---|---|---|---|---|---|
| MANHATTAN | 20.60248 | 11 | 0 | 3090 | 69015 |
| QUEENS | 18.42648 | 12 | 0 | 4184 | 49477 |
| STATEN ISLAND | 17.67769 | 11 | 0 | 805 | 10925 |
| BRONX | 15.02155 | 10 | 0 | 975 | 129145 |
| BROOKLYN | 14.03738 | 9 | 0 | 3980 | 129731 |
Right away, we see large variation in the number of days it takes for a residential mold complaint to be resolved. Some complaints are resolved the same day, while others take years (up to 4,184 days in Queens). The average resolution time varies by borough but falls roughly between 14 and 21 days, or 2 to 3 weeks. The medians (9 to 12 days) are noticeably lower than the means, which suggests a right-skewed distribution in which a relatively small number of very long-running complaints pulls the averages up.
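When a few complaints stay open for years, the mean resolution time can be pulled far above the typical case, which is why every borough's median in the table sits below its mean. Here is a small illustration using a hypothetical vector of resolution times, with made-up values not drawn from the data:

```r
# Hypothetical resolution times in days: most complaints close within two weeks,
# but one stays open for more than ten years.
days <- c(3, 5, 7, 9, 10, 11, 12, 14, 3980)

mean(days)       # about 450.1: dragged up by the single long-running complaint
median(days)     # 10: the middle value, unaffected by how extreme the outlier is
mean(days > 30)  # share of complaints open longer than 30 days (1 of 9)
```

For skewed durations like these, reporting the median, or the share of complaints open beyond some threshold, usually describes the typical resident's experience better than the mean.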
mold_monthly_resolution <- mold_data_clean %>%
group_by(Year, Month) %>%
summarise(
avg_resolution_days = mean(resolution_days, na.rm = TRUE),
n_complaints = n(),
.groups = "drop"
) %>%
arrange(Year, Month)
kable(head(mold_monthly_resolution, 12))
| Year | Month | avg_resolution_days | n_complaints |
|---|---|---|---|
| 2010 | 01 - January | 20.31474 | 2516 |
| 2010 | 02 - February | 16.09571 | 2029 |
| 2010 | 03 - March | 15.59963 | 2676 |
| 2010 | 04 - April | 14.46246 | 2251 |
| 2010 | 05 - May | 14.81757 | 1787 |
| 2010 | 06 - June | 14.81358 | 1856 |
| 2010 | 07 - July | 14.59217 | 1765 |
| 2010 | 08 - August | 14.01535 | 1958 |
| 2010 | 09 - September | 13.32578 | 1766 |
| 2010 | 10 - October | 18.39756 | 1969 |
| 2010 | 11 - November | 20.97167 | 1661 |
| 2010 | 12 - December | 26.94590 | 1833 |
Above are the monthly average resolution times for mold complaints in 2010. They are broadly in line with the borough averages in the previous table, although resolution took longer in the colder months, rising from about 18 days in October to about 27 days in December.
dv_mold_lagged <- aggregated_dv_mold_data %>%
arrange(Borough, time_index) %>%
group_by(Borough) %>%
mutate(DV_next_month = lead(`DV Reports`, n = 1)) %>%
ungroup()
kable(head(dv_mold_lagged, 12))
| Year | Month | Borough | DV Reports | Mold Complaints | year_month | time_index | DV_next_month |
|---|---|---|---|---|---|---|---|
| 2010 | 01 - January | BRONX | 910 | 954 | 2010 01 - January | 1 | 818 |
| 2010 | 02 - February | BRONX | 818 | 738 | 2010 02 - February | 2 | 969 |
| 2010 | 03 - March | BRONX | 969 | 941 | 2010 03 - March | 3 | 875 |
| 2010 | 04 - April | BRONX | 875 | 798 | 2010 04 - April | 4 | 940 |
| 2010 | 05 - May | BRONX | 940 | 576 | 2010 05 - May | 5 | 1015 |
| 2010 | 06 - June | BRONX | 1015 | 582 | 2010 06 - June | 6 | 1043 |
| 2010 | 07 - July | BRONX | 1043 | 553 | 2010 07 - July | 7 | 970 |
| 2010 | 08 - August | BRONX | 970 | 528 | 2010 08 - August | 8 | 983 |
| 2010 | 09 - September | BRONX | 983 | 534 | 2010 09 - September | 9 | 914 |
| 2010 | 10 - October | BRONX | 914 | 562 | 2010 10 - October | 10 | 959 |
| 2010 | 11 - November | BRONX | 959 | 513 | 2010 11 - November | 11 | 878 |
| 2010 | 12 - December | BRONX | 878 | 694 | 2010 12 - December | 12 | 1052 |
Because psychological effects related to mold exposure may not develop immediately, I wanted to explore whether there are delayed temporal patterns between mold complaints and domestic violence reports. Specifically, if mold complaints increase in one month, could this be associated with higher levels of domestic violence in the following month?
To examine this, I created a lagged dataset that pairs each borough's residential mold complaints in one month with its domestic violence reports in the following month.
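The key step is that `lead()` runs inside `group_by(Borough)`, so the "next month" value never spills over from one borough into the next. A minimal sketch on a made-up two-borough table (toy numbers, not the real counts) shows the behavior:

```r
library(dplyr)

# Toy borough-month counts (made-up numbers): two boroughs, three months each
toy <- data.frame(
  Borough    = rep(c("BRONX", "QUEENS"), each = 3),
  time_index = rep(1:3, times = 2),
  dv         = c(10, 20, 30, 5, 6, 7)
)

toy_lagged <- toy %>%
  arrange(Borough, time_index) %>%
  group_by(Borough) %>%
  # lead() looks one row ahead, but only within the current borough
  mutate(dv_next_month = lead(dv, n = 1)) %>%
  ungroup()

toy_lagged$dv_next_month
# 20 30 NA 6 7 NA: the final month of each borough has no following month
```

Applied to the real data (5 boroughs x 180 months = 900 rows), the same pattern leaves 5 missing values, one per borough, which is consistent with the df = 893 reported by the correlation test below (895 complete pairs minus 2).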
ggplot(dv_mold_lagged, aes(x = `Mold Complaints`, y = DV_next_month)) +
  geom_point(alpha = 0.5, color = "darkgreen") +
  geom_smooth(method = "lm", color = "darkred") +
  labs(
    title = "Next Month DV Reports vs Current Month Mold Complaints",
    x = "Mold Complaints (Current Month)",
    y = "DV Reports (Next Month)"
  ) +
  theme_minimal()
These two variables still look very closely related! But does the one-month delay really have anything to do with it?
Let’s conduct some statistical tests to dig deeper!
cor.test(dv_mold_lagged$DV_next_month, dv_mold_lagged$`Mold Complaints`)
##
## Pearson's product-moment correlation
##
## data: x and y
## t = 41.32, df = 893, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7865287 0.8316650
## sample estimates:
## cor
## 0.8102952
I conducted a Pearson’s correlation test to examine the relationship between mold complaints in one month and DV reports the following month.
Strength: r = 0.81 (very strong)
Direction: positive
Significance: statistically significant (p<0.05)
The results show a very strong positive correlation, suggesting that months with higher mold complaint counts are associated with higher domestic violence reports in the following month. However, this result is similar to the basic same-month correlation between mold complaints and DV reports reported earlier, and it does not account for other factors that might influence the relationship, such as borough.
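The reported t statistic follows directly from r and the degrees of freedom through the standard Pearson test statistic, t = r * sqrt(df / (1 - r^2)). A quick check in R using only the printed values:

```r
# Values copied from the cor.test() output above
r  <- 0.8102952  # sample correlation
df <- 893        # degrees of freedom = complete pairs - 2

# Test statistic for H0: true correlation = 0
t_stat <- r * sqrt(df / (1 - r^2))
t_stat  # about 41.32, matching the reported t

# Share of variance in next-month DV reports that is shared with mold complaints
r^2     # about 0.66
```

So roughly two-thirds of the variation in next-month DV reports is shared with current-month mold complaints when all boroughs are pooled together.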
To better understand how additional variables (such as borough and average resolution time) affect this association, I conducted regression analyses.
aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(Borough = factor(Borough)) %>%
  arrange(Year, Month)
lm_dv_mold <- lm(`DV Reports` ~ `Mold Complaints`,
                 data = aggregated_dv_mold_data)
summary(lm_dv_mold)
##
## Call:
## lm(formula = `DV Reports` ~ `Mold Complaints`, data = aggregated_dv_mold_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -922.11 -177.91 -14.37 167.41 609.43
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 306.12047 12.25800 24.97 <2e-16 ***
## `Mold Complaints` 1.01374 0.02346 43.21 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 207.4 on 898 degrees of freedom
## Multiple R-squared: 0.6752, Adjusted R-squared: 0.6748
## F-statistic: 1867 on 1 and 898 DF, p-value: < 2.2e-16
AIC(lm_dv_mold)
## [1] 12160.34
This linear regression model tests the association between monthly residential mold complaints and domestic violence reports, pooling all five boroughs and all months.
Strength: strong (R^2 = 0.68)
Direction: positive
Significance: statistically significant (p<0.05)
AIC: 12160.34
Results suggest a strong and statistically significant positive association between mold complaints and DV reports. On average, months with higher numbers of mold complaints are associated with higher numbers of reported domestic violence incidents. However, this model does not account for differences across boroughs or temporal patterns.
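To make the slope concrete: in this pooled model, each additional mold complaint in a borough-month is associated with about one additional DV report. A worked sketch using the printed coefficients (the value of 500 complaints is an arbitrary illustrative input, not a figure from the data):

```r
# Coefficients copied from summary(lm_dv_mold) above
intercept  <- 306.12047
mold_slope <- 1.01374

# Predicted DV reports for a borough-month with a given number of mold complaints
predict_dv <- function(mold_complaints) {
  intercept + mold_slope * mold_complaints
}

predict_dv(500)                    # about 813 DV reports
predict_dv(600) - predict_dv(500)  # about 101: 100 extra complaints, ~101 extra reports
```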
lm_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough,
  data = aggregated_dv_mold_data
)
summary(lm_borough)
##
## Call:
## lm(formula = `DV Reports` ~ `Mold Complaints` + Borough, data = aggregated_dv_mold_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -257.79 -42.87 -3.50 37.93 432.04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 931.33548 15.54248 59.922 < 2e-16 ***
## `Mold Complaints` 0.19574 0.01975 9.909 < 2e-16 ***
## BoroughBROOKLYN 59.10721 9.02091 6.552 9.56e-11 ***
## BoroughMANHATTAN -452.89589 11.17680 -40.521 < 2e-16 ***
## BoroughQUEENS -198.79959 12.56259 -15.825 < 2e-16 ***
## BoroughSTATEN ISLAND -768.91015 15.80207 -48.659 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 85.58 on 894 degrees of freedom
## Multiple R-squared: 0.9449, Adjusted R-squared: 0.9446
## F-statistic: 3069 on 5 and 894 DF, p-value: < 2.2e-16
AIC(lm_borough)
## [1] 10571.03
This linear regression model adds Borough as a factor, so each borough has its own baseline level of DV reports and the mold association is estimated within boroughs rather than across the city as a whole.
Strength: very strong (R^2 = 0.94)
Direction: positive
Significance: statistically significant (p<0.05)
AIC: 10571.03
The association between mold complaints and DV reports remains positive and statistically significant, but the mold coefficient shrinks from about 1.01 to about 0.20. Including borough greatly increases the R^2, showing that much of the variation in DV reports is explained by differences between boroughs rather than by mold complaints alone. Each borough coefficient compares that borough to the Bronx (the reference level), and their size shows that DV reporting levels differ substantially across boroughs.
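Because Borough enters as a factor with the Bronx as the reference level, each borough gets its own baseline while all five share one mold slope. A sketch of what that implies, plugging the printed coefficients into the same illustrative value of 500 mold complaints for every borough:

```r
# Coefficients copied from summary(lm_borough) above; BRONX is the reference level
intercept  <- 931.33548
mold_slope <- 0.19574
borough_offset <- c(
  "BRONX"         = 0,
  "BROOKLYN"      = 59.10721,
  "MANHATTAN"     = -452.89589,
  "QUEENS"        = -198.79959,
  "STATEN ISLAND" = -768.91015
)

# Same mold count everywhere: boroughs differ only through their offsets
predicted <- intercept + borough_offset + mold_slope * 500
round(predicted, 1)
# BRONX 1029.2, BROOKLYN 1088.3, MANHATTAN 576.3, QUEENS 830.4, STATEN ISLAND 260.3
```

In this additive model the gap between any two boroughs is the same at every mold level; letting the slope differ by borough would require an interaction term such as `` `Mold Complaints` * Borough ``.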
# Attach the citywide monthly resolution averages to every borough-month row
aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  left_join(mold_monthly_resolution, by = c("Year", "Month"))
lm_resolution_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days,
  data = aggregated_dv_mold_data
)
summary(lm_resolution_borough)
##
## Call:
## lm(formula = `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days,
## data = aggregated_dv_mold_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -258.31 -43.84 -2.01 38.16 403.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 994.96968 17.64318 56.394 < 2e-16 ***
## `Mold Complaints` 0.17645 0.01944 9.078 < 2e-16 ***
## BoroughBROOKLYN 59.17001 8.78658 6.734 2.95e-11 ***
## BoroughMANHATTAN -459.33996 10.92506 -42.045 < 2e-16 ***
## BoroughQUEENS -207.33753 12.29650 -16.862 < 2e-16 ***
## BoroughSTATEN ISLAND -781.57967 15.49694 -50.434 < 2e-16 ***
## avg_resolution_days -3.03676 0.43241 -7.023 4.30e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 83.35 on 893 degrees of freedom
## Multiple R-squared: 0.9478, Adjusted R-squared: 0.9475
## F-statistic: 2704 on 6 and 893 DF, p-value: < 2.2e-16
AIC(lm_resolution_borough)
## [1] 10524.65
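One detail worth keeping in mind when reading this model: `mold_monthly_resolution` is grouped by Year and Month only, not by Borough, so the join attaches the same citywide average to all five borough rows in a given month. A toy sketch (made-up values) of what `left_join()` does in that case:

```r
library(dplyr)

# Toy borough-month rows and a toy citywide monthly average (made-up values)
borough_months <- data.frame(
  Year    = 2010,
  Month   = "01 - January",
  Borough = c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"),
  dv      = c(10, 12, 6, 8, 3)
)
citywide <- data.frame(Year = 2010, Month = "01 - January", avg_resolution_days = 20)

joined <- left_join(borough_months, citywide, by = c("Year", "Month"))

# Every borough in the month receives the identical citywide value
joined$avg_resolution_days  # 20 20 20 20 20
```

So `avg_resolution_days` captures month-to-month changes in citywide response time rather than differences between boroughs. A borough-level version would, assuming the cleaned mold data carries a Borough column matching the DV data, need Borough added to both the `group_by()` in the resolution summary and the join keys.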
This linear regression model adds the citywide monthly average resolution time of mold complaints to the borough model.
Strength: very strong (R^2 = 0.95)
Direction: positive for mold complaints; negative for average resolution days
Significance: statistically significant (p<0.05)
AIC: 10524.65
Even after controlling for borough and resolution time, mold complaints remain a statistically significant predictor of DV reports. Average resolution time shows a statistically significant negative association with DV reports: months with longer resolution delays are associated with fewer reported DV incidents. This may seem like the opposite of what we would expect, and many factors could be involved. One possibility is weaker trust between communities and city agencies. Trust may have been lower during periods when officials took longer to respond to mold complaints, and residents who saw fewer resources available to them may have been less willing to report domestic violence.
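To put the resolution coefficient on a practical scale: holding mold complaints and borough fixed, each additional day of average resolution time is associated with about 3 fewer DV reports per borough-month. A worked sketch comparing two illustrative months whose resolution times are close to the slowest and fastest months in the 2010 resolution table:

```r
# Coefficient copied from summary(lm_resolution_borough) above
resolution_slope <- -3.03676  # change in DV reports per extra day of average resolution time

# Illustrative months: about 27 vs about 14 average days to resolve
# (close to the December and September 2010 values in the resolution table)
slow_month_days <- 27
fast_month_days <- 14

# Predicted difference in DV reports per borough-month, other terms held fixed
diff_reports <- (slow_month_days - fast_month_days) * resolution_slope
diff_reports  # about -39.5
```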
Of the three regression models, this final one fits the data best, with the highest R^2 (0.95) and the lowest AIC (10524.65). It offers the strongest support for the idea that domestic violence reports can be predicted from residential mold complaints in the same borough and month, once borough and resolution time are taken into account.
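A common way to compare models by AIC is to look at each model's distance from the best one (delta AIC) and the resulting Akaike weights, which express relative support within this set of three models. A sketch using the three printed AIC values:

```r
# AIC values copied from the three AIC() outputs above
aic <- c(
  lm_dv_mold            = 12160.34,
  lm_borough            = 10571.03,
  lm_resolution_borough = 10524.65
)

# Difference from the best (lowest-AIC) model
delta_aic <- aic - min(aic)
delta_aic  # 1635.69, 46.38, 0

# Akaike weights: relative support for each model within this candidate set
akaike_weights <- exp(-delta_aic / 2) / sum(exp(-delta_aic / 2))
round(akaike_weights, 3)  # 0, 0, 1
```

Under the common rule of thumb that a delta AIC above about 10 leaves a model with essentially no support relative to the best one, both simpler models fall well behind. The AICcmodavg package loaded at the top also provides `aictab()`, which builds a similar table directly from fitted models (using AICc by default).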
Overall, the results show a consistent positive association between residential mold complaints and domestic violence reports. Simple correlations suggest that months with more mold complaints tend to have more DV reports as well. This pattern appears at both yearly and monthly levels.
When borough differences are accounted for in regression models, the relationship between mold complaints and DV reports remains statistically significant but weaker (the mold coefficient drops from about 1.01 to about 0.20). This suggests that while borough-level differences explain much of the variation, mold complaints still have an independent association with DV reports.
Adding average mold resolution time shows that longer resolution delays are associated with lower DV report counts. This may be due to reporting behavior or service engagement rather than a direct effect.
Lagged analyses tested whether mold complaints in one month are related to DV reports in the following month. Although the lagged relationship remains positive, it closely resembles the same-month results, suggesting that the one-month delay adds little beyond the same-month association.
In summary, the analysis suggests a consistent positive relationship between residential mold complaints and domestic violence reports, but borough-level differences and other contextual factors appear to drive much of the variation, highlighting the complexity of environmental and social influences on public health outcomes. In the future, I would like to examine neighborhood-specific trends and to use per-capita DV and mold rates instead of raw counts, since populations vary across boroughs.
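Moving from counts to rates amounts to dividing each count by population. A minimal sketch with two placeholder boroughs and made-up populations (not census figures) shows how areas with very different raw counts can end up with similar rates:

```r
# Hypothetical populations and counts (placeholders, NOT real NYC figures)
population <- c(BOROUGH_A = 1400000, BOROUGH_B = 500000)
dv_reports <- c(BOROUGH_A = 900, BOROUGH_B = 300)

# Reports per 100,000 residents
dv_rate <- dv_reports / population * 100000
round(dv_rate, 1)  # BOROUGH_A 64.3, BOROUGH_B 60.0
```

Borough A has three times as many reports, yet its rate is only slightly higher, which is the kind of difference that raw counts would hide.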