Introduction

Housing conditions are an important factor in public health and well-being. Poor residential environments, such as mold exposure, can contribute to physical health problems and increased stress. Domestic violence is also influenced by environmental and social stressors, making housing conditions a relevant area of study.

This project examines the relationship between residential mold complaints and domestic violence (DV) reports in New York City from 2010 to 2024. I use two datasets from NYC Open Data: 311 Complaint Data, from which I extract residential mold complaints, and NYPD Complaint Data Historic, from which I extract domestic violence reports. With these, I explore whether the two types of reports follow similar patterns over time. The goal is not to determine causation, but to understand whether mold complaints and DV reports tend to rise and fall together.

The analysis focuses on monthly aggregated data and includes exploratory summaries, correlation analyses, and regression models. I also explore delayed (or lagged) relationships and mold complaint resolution time to better understand how timing may play a role.

Loading, Prepping, Cleaning, & Aggregating

Data Preparation & Cleaning

library(tidyverse)
library(readxl)
library(ggplot2)
library(mosaic)
library(AICcmodavg)
library(knitr)
mold_data <- read_excel("311_Service_Requests_from_2010_to_Present_20251215.xlsx")
dv_data <- read_excel("NYPD_Complaint_Data_Historic_20251218.xlsx")
mold_data_clean <- mold_data %>% filter(
  `Location Type` %in% c(
    "RESIDENTIAL BUILDING", "Residential Building", "Loft Residence",
    "Mixed Use Building", "Apartment", "3+ Family Apartment Building",
    "1-2 Family Dwelling", "1-2 Family Mixed Use Building",
    "3+ Family Mixed Use Building", "Single Room Occupancy (SRO)"
  ))

dv_data_clean<- dv_data %>% rename(
  "complaint_number" = `CMPLNT_NUM`,
  "inc_occur_date" = `CMPLNT_FR_DT`,
  "inc_occur_time" = `CMPLNT_FR_TM`,
  "inc_end_date" = `CMPLNT_TO_DT`,
  "inc_end_time" = `CMPLNT_TO_TM`,
  "precinct_occur" = `ADDR_PCT_CD`,
  "report_date" = `RPT_DT`,
  "key_code" = `KY_CD`,
  "offense_type" = `OFNS_DESC`,
  "class_code" = `PD_CD`,
  "class_code_desc" = `PD_DESC`,
  "attempt_completion" = `CRM_ATPT_CPTD_CD`,
  "offense_level" = `LAW_CAT_CD`,
  "borough" = `BORO_NM`,
  "occur_location" = `LOC_OF_OCCUR_DESC`,
  "premise_desc" = `PREM_TYP_DESC`,
  "juris_code_desc" = `JURIS_DESC`,
  "jurisdiction" = `JURISDICTION_CODE`,
  "park_occur" = `PARKS_NM`,
  "development" = `HADEVELOPT`,
  "development_code" = `HOUSING_PSA`,
  "x_coord" = `X_COORD_CD`,
  "y_coord" = `Y_COORD_CD`,
  "suspect_age" = `SUSP_AGE_GROUP`,
  "suspect_race" = `SUSP_RACE`,
  "suspect_sex" = `SUSP_SEX`,
  "transit_district" = `TRANSIT_DISTRICT`,
  "patrol_borough" = `PATROL_BORO`,
  "station_name" = `STATION_NAME`,
  "victim_age" = `VIC_AGE_GROUP`,
  "victim_race" = `VIC_RACE`,
  "victim_sex" = `VIC_SEX`
  )

mold_data_clean <- mold_data_clean %>%
  mutate(Created_Date_Original = `Created Date`) %>% 
  separate(
    col = `Created Date`,
  into = c("Year","Month","Day"),
  sep = "-", 
  remove = FALSE
)

dv_data_clean <- dv_data_clean %>% separate(
  col = inc_occur_date,
  into = c("Year","Month","Day"),
  sep = "-"
)

mold_data_clean <- mold_data_clean %>%
  filter(Descriptor != "Unsafe Mold Cleanup")

mold_data_clean<- mold_data_clean %>% mutate(
  `Complaint Type` = recode(
    `Complaint Type`, 
    "UNSANITARY CONDITION" = "Mold",
    "Unsanitary Condition" = "Mold",
    "MOLD" = "Mold",
    "GENERAL" = "Mold",
    "GENERAL CONSTRUCTION" = "Mold"
  ))

mold_data_clean<- mold_data_clean %>% mutate(
  `Month` = recode(
    `Month`, 
    "01" = "01 - January",
    "02" = "02 - February",
    "03" = "03 - March",
    "04" = "04 - April",
    "05" = "05 - May",
    "06" = "06 - June",
    "07" = "07 - July",
    "08" = "08 - August",
    "09" = "09 - September",
    "10" = "10 - October",
    "11" = "11 - November",
    "12" = "12 - December"
    ))

dv_data_clean<- dv_data_clean %>% mutate(
  `Month` = recode(
    `Month`, 
    "01" = "01 - January",
    "02" = "02 - February",
    "03" = "03 - March",
    "04" = "04 - April",
    "05" = "05 - May",
    "06" = "06 - June",
    "07" = "07 - July",
    "08" = "08 - August",
    "09" = "09 - September",
    "10" = "10 - October",
    "11" = "11 - November",
    "12" = "12 - December"
  ))

dv_data_clean$complaint <- "DV"

dv_data_clean <- dv_data_clean %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Year >= 2010)

mold_data_clean<- mold_data_clean %>% mutate(
  Borough = case_when(
    City == "NEW YORK" ~ "MANHATTAN",
    City == "BROOKLYN" ~ "BROOKLYN",
    City == "ARVERNE" ~ "QUEENS",
    City == "BRONX" ~ "BRONX",
    City == "JAMAICA" ~ "QUEENS",
    City == "SPRINGFIELD GARDENS" ~ "QUEENS",
    City == "FLUSHING" ~ "QUEENS",
    City == "STATEN ISLAND" ~ "STATEN ISLAND",
    City == "RICHMOND HILL" ~ "QUEENS",
    City == "ASTORIA" ~ "QUEENS",
    City == "HOLLIS" ~ "QUEENS",
    City == "RIDGEWOOD" ~ "QUEENS",
    City == "FOREST HILLS" ~ "QUEENS",
    City == "ELMHURST" ~ "QUEENS",
    City == "MASPETH" ~ "QUEENS",
    City == "SOUTH RICHMOND HILL" ~ "QUEENS",
    City == "JACKSON HEIGHTS" ~ "QUEENS",
    City == "BAYSIDE" ~ "QUEENS",
    City == "FAR ROCKAWAY" ~ "QUEENS",
    City == "SAINT ALBANS" ~ "QUEENS",
    City == "CORONA" ~ "QUEENS",
    City == "WOODSIDE" ~ "QUEENS",
    City == "QUEENS VILLAGE" ~ "QUEENS",
    City == "REGO PARK" ~ "QUEENS",
    City == "ROSEDALE" ~ "QUEENS",
    City == "SUNNYSIDE" ~ "QUEENS",
    City == "OZONE PARK" ~ "QUEENS",
    City == "EAST ELMHURST" ~ "QUEENS",
    City == "MIDDLE VILLAGE" ~ "QUEENS",
    City == "WOODHAVEN" ~ "QUEENS",
    City == "SOUTH OZONE PARK" ~ "QUEENS",
    City == "ROCKAWAY PARK" ~ "QUEENS",
    City == "KEW GARDENS" ~ "QUEENS",
    City == "FRESH MEADOWS" ~ "QUEENS",
    City == "COLLEGE POINT" ~ "QUEENS",
    City == "LONG ISLAND CITY" ~ "QUEENS",
    City == "OAKLAND GARDENS" ~ "QUEENS",
    City == "WHITESTONE" ~ "QUEENS",
    City == "HOWARD BEACH" ~ "QUEENS",
    City == "CAMBRIA HEIGHTS" ~ "QUEENS",
    City == "BELLEROSE" ~ "QUEENS",
    City == "LITTLE NECK" ~ "QUEENS",
    City == "BREEZY POINT" ~ "QUEENS",
    City == "GLEN OAKS" ~ "QUEENS",
    City == "FLORAL PARK" ~ "QUEENS",
    City ==  "PELHAM" ~ "BRONX",
    City == "NEW HYDE PARK" ~ "QUEENS",
    City == "QUEENS" ~ "QUEENS",
    City == "MANHATTAN" ~ "MANHATTAN",
    City == "Far Rockaway" ~ "QUEENS",  
    City == "Astoria" ~ "QUEENS",
    City == "Elmhurst" ~ "QUEENS",
    City == "Corona" ~ "QUEENS",       
    City == "Ozone Park" ~ "QUEENS",
    City == "Forest Hills" ~ "QUEENS",
    City == "Jamaica" ~ "QUEENS",  
    City == "Arverne" ~ "QUEENS",
    City == "Bayside" ~ "QUEENS",
    City == "East Elmhurst" ~ "QUEENS", 
    City == "Flushing" ~ "QUEENS",
    City == "Middle Village" ~ "QUEENS",
    City == "Ridgewood" ~ "QUEENS",
    City == "Woodside" ~ "QUEENS",
    City == "Oakland Gardens" ~ "QUEENS",
    City == "Rego Park" ~ "QUEENS",
    City == "Hollis" ~ "QUEENS",
    City == "Saint Albans" ~ "QUEENS",
    City == "Springfield Gardens" ~ "QUEENS",
    City == "Kew Gardens" ~ "QUEENS",
    City == "Fresh Meadows" ~ "QUEENS",
    City == "Howard Beach" ~ "QUEENS",
    City == "South Richmond Hill" ~ "QUEENS",
    City == "Whitestone" ~ "QUEENS",
    City == "South Ozone Park" ~ "QUEENS",
    City == "College Point" ~ "QUEENS",
    City == "Jackson Heights" ~ "QUEENS",
    City == "Maspeth" ~ "QUEENS",
    City == "Long Island City" ~ "QUEENS",
    City == "Rockaway Park" ~ "QUEENS",
    City == "Sunnyside" ~ "QUEENS",
    City == "Woodhaven" ~ "QUEENS",
    City == "Floral Park" ~ "QUEENS",
    City == "Glen Oaks" ~ "QUEENS",
    City == "Queens Village" ~ "QUEENS",
    City == "Bellerose" ~ "QUEENS",
    City == "Little Neck" ~ "QUEENS",
    City == "Richmond Hill" ~ "QUEENS",
    City == "Rosedale" ~ "QUEENS",
    City == "Cambria Heights" ~ "QUEENS",
    City == "New Hyde Park" ~ "QUEENS",
    City == "Breezy Point" ~ "QUEENS",
  )
)

mold_data_clean <- mold_data_clean %>% filter(!is.na(Borough))
mold_data_clean<- mold_data_clean %>% filter(Year != 2025)

mold_data_clean <- mold_data_clean %>%
  mutate(
    created_date = as.Date(`Created Date`),
    closed_date  = as.Date(`Closed Date`),
    resolution_days = as.numeric(closed_date - created_date)
  )

kable(head(mold_data_clean, 3))
Unique Key Created Date Year Month Day Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address Street Name Cross Street 1 Cross Street 2 Intersection Street 1 Intersection Street 2 Address Type City Landmark Facility Type Status Due Date Resolution Description Resolution Action Updated Date Community Board BBL Borough X Coordinate (State Plane) Y Coordinate (State Plane) Open Data Channel Type Park Facility Name Park Borough Vehicle Type Taxi Company Borough Taxi Pick Up Location Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Latitude Longitude Location Created_Date_Original created_date closed_date resolution_days
63573695 2024-12-31T23:06:14.000 2024 12 - December 31T23:06:14.000 2025-01-19T11:58:36.000 HPD Department of Housing Preservation and Development Mold MOLD RESIDENTIAL BUILDING 10451 283 EAST 149 STREET EAST 149 STREET NA NA NA NA ADDRESS BRONX NA NA Closed NA HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. 2025-01-19T00:00:00.000 01 BRONX 2023310072 BRONX 1005910 236991 PHONE Unspecified BRONX NA NA NA NA NA NA NA 40.81713 -73.92175 (40.81713468822815, -73.9217467475281) 2024-12-31T23:06:14.000 2024-12-31 2025-01-19 19
63583460 2024-12-31T21:19:39.000 2024 12 - December 31T21:19:39.000 2025-01-03T16:24:36.000 HPD Department of Housing Preservation and Development Mold MOLD RESIDENTIAL BUILDING 10468 2719 MORRIS AVENUE MORRIS AVENUE NA NA NA NA ADDRESS BRONX NA NA Closed NA HPD inspected this condition so the complaint has been closed. Violations were issued. The law provides the property owner time to correct the condition(s). Violation descriptions and the dates for the property owner to correct any violations are available at HPDONLINE. If the owner has not corrected the condition by the date provided, you may wish to bring a case in housing court seeking the correction of these conditions.To find out more about how to start a housing court case, visit HPD’s w 2025-01-03T00:00:00.000 07 BRONX 2033170043 BRONX 1013153 255563 PHONE Unspecified BRONX NA NA NA NA NA NA NA 40.86809 -73.89550 (40.86808863174485, -73.89549921281306) 2024-12-31T21:19:39.000 2024-12-31 2025-01-03 3
63583408 2024-12-31T20:55:28.000 2024 12 - December 31T20:55:28.000 2025-01-10T16:55:45.000 HPD Department of Housing Preservation and Development Mold MOLD RESIDENTIAL BUILDING 11435 85-15 139 STREET 139 STREET NA NA NA NA ADDRESS JAMAICA NA NA Closed NA HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. 2025-01-10T00:00:00.000 08 QUEENS 4097100002 QUEENS 1034989 197495 PHONE Unspecified QUEENS NA NA NA NA NA NA NA 40.70861 -73.81699 (40.7086094282564, -73.8169884642595) 2024-12-31T20:55:28.000 2024-12-31 2025-01-10 10
kable(head(dv_data_clean, 5))
complaint_number Year Month Day inc_occur_time inc_end_date inc_end_time precinct_occur report_date key_code offense_type class_code class_code_desc attempt_completion offense_level borough occur_location premise_desc juris_code_desc jurisdiction park_occur development development_code x_coord y_coord suspect_age suspect_race suspect_sex transit_district Latitude Longitude Lat_Lon patrol_borough station_name victim_age victim_race victim_sex complaint
298690828 2024 12 - December 31T00:00:00.000 13:00:00 2024-12-31T00:00:00.000 13:10:00 113 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1046104 187464 <18 BLACK F NA 40.68101 -73.77699 (40.681014, -73.776991) PATROL BORO QUEENS SOUTH (null) <18 BLACK M DV
298698016 2024 12 - December 31T00:00:00.000 08:00:00 2024-12-31T00:00:00.000 09:00:00 116 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1048028 178970 25-44 BLACK M NA 40.65769 -73.77013 (40.657687, -73.770132) PATROL BORO QUEENS SOUTH (null) 18-24 WHITE HISPANIC F DV
298704508 2024 12 - December 31T00:00:00.000 16:50:00 2024-12-31T00:00:00.000 16:56:00 107 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE - APT. HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1050645 203097 UNKNOWN BLACK F NA 40.72389 -73.76046 (40.723891, -73.760464) PATROL BORO QUEENS SOUTH (null) 65+ WHITE M DV
298678676 2024 12 - December 31T00:00:00.000 07:00:00 2024-12-31T00:00:00.000 07:30:00 113 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1051478 189936 25-44 BLACK M NA 40.68776 -73.75759 (40.687762, -73.757589) PATROL BORO QUEENS SOUTH (null) 45-64 BLACK M DV
298672417 2024 12 - December 31T00:00:00.000 02:50:00 2024-12-31T00:00:00.000 02:55:00 101 2024-12-31T00:00:00.000 344 ASSAULT 3 & RELATED OFFENSES 101 ASSAULT 3 COMPLETED MISDEMEANOR QUEENS INSIDE RESIDENCE-HOUSE N.Y. POLICE DEPT 0 (null) (null) NA 1054075 157436 45-64 BLACK M NA 40.59854 -73.74856 (40.598536, -73.74856) PATROL BORO QUEENS SOUTH (null) 45-64 BLACK F DV

This chunk handles the heavy cleaning of two large datasets: standardizing column names and values, filling in missing Borough values from the City column, and dropping rows where Borough is still missing. I also split the dates so that Year, Month, and Day can be used individually in the project.
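As an alternative to splitting the timestamp on "-" and recoding each month by hand, the date parts can be derived from a parsed date. This is a minimal base R sketch on two toy timestamps in the same format as `Created Date`; the variable names (`created`, `month_label`, and so on) are illustrative and not part of the original pipeline.

```r
# Toy timestamps in the same ISO-like format as the `Created Date` column
created <- c("2024-12-31T23:06:14.000", "2010-01-05T08:15:00.000")

# The first 10 characters hold the calendar date (YYYY-MM-DD)
created_date <- as.Date(substr(created, 1, 10))

year_num  <- as.integer(format(created_date, "%Y"))
month_num <- as.integer(format(created_date, "%m"))

# Sortable month label such as "12 - December"; month.name is built into base R
month_label <- sprintf("%02d - %s", month_num, month.name[month_num])

print(data.frame(created_date, year_num, month_label))
```

Keeping a real `Date` column alongside the labels also makes later steps, such as computing resolution time, a simple subtraction.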

Aggregating Mold Data & DV Data

aggregated_dv_data <- dv_data_clean %>%
  group_by(Year, Month, borough) %>%
  summarise(
    `complaint` = n(),
    .groups = "drop"
  )
aggregated_dv_data <- aggregated_dv_data %>%
  filter(borough != "(null)")

aggregated_dv_data <- aggregated_dv_data %>%
  mutate(Year = as.character(Year))

aggregated_dv_data<- aggregated_dv_data %>% rename(
  "Borough" = "borough")

aggregated_dv_data<- aggregated_dv_data %>% rename(
  "DV Reports" = "complaint"
)

aggregated_mold_data <- mold_data_clean %>%
  group_by(Year, Month, Borough) %>%
  summarise(
  `Complaint Type` = n(),
  .groups = "drop"
)

aggregated_mold_data<- aggregated_mold_data %>% rename(
  "Mold Complaints" = "Complaint Type"
)

aggregated_dv_mold_data <- left_join(
  aggregated_dv_data,
  aggregated_mold_data,
  by = c("Borough", "Year", "Month")
)

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(
    year_month = paste(Year, Month)
  )

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  group_by(Borough) %>%
  mutate(time_index = row_number()) %>%
  ungroup()

kable(head(aggregated_dv_mold_data, 15))
| Year | Month         | Borough       | DV Reports | Mold Complaints | year_month         | time_index |
|:-----|:--------------|:--------------|-----------:|----------------:|:-------------------|-----------:|
| 2010 | 01 - January  | BRONX         |        910 |             954 | 2010 01 - January  |          1 |
| 2010 | 01 - January  | BROOKLYN      |       1306 |             779 | 2010 01 - January  |          1 |
| 2010 | 01 - January  | MANHATTAN     |        541 |             410 | 2010 01 - January  |          1 |
| 2010 | 01 - January  | QUEENS        |        791 |             315 | 2010 01 - January  |          1 |
| 2010 | 01 - January  | STATEN ISLAND |        154 |              58 | 2010 01 - January  |          1 |
| 2010 | 02 - February | BRONX         |        818 |             738 | 2010 02 - February |          2 |
| 2010 | 02 - February | BROOKLYN      |        938 |             651 | 2010 02 - February |          2 |
| 2010 | 02 - February | MANHATTAN     |        424 |             338 | 2010 02 - February |          2 |
| 2010 | 02 - February | QUEENS        |        603 |             273 | 2010 02 - February |          2 |
| 2010 | 02 - February | STATEN ISLAND |        151 |              29 | 2010 02 - February |          2 |
| 2010 | 03 - March    | BRONX         |        969 |             941 | 2010 03 - March    |          3 |
| 2010 | 03 - March    | BROOKLYN      |       1149 |             870 | 2010 03 - March    |          3 |
| 2010 | 03 - March    | MANHATTAN     |        500 |             415 | 2010 03 - March    |          3 |
| 2010 | 03 - March    | QUEENS        |        758 |             395 | 2010 03 - March    |          3 |
| 2010 | 03 - March    | STATEN ISLAND |        147 |              55 | 2010 03 - March    |          3 |

In this chunk, I counted domestic violence reports and mold complaints by year, month, and borough, then joined the two into a single dataset so that trends can be analyzed together.
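The count-then-join pattern can be seen on a small example. The sketch below uses made-up incident rows (not the NYC data) and `dplyr::count()`, which is shorthand for `group_by()` followed by `summarise(n = n())`. It also shows one property of `left_join()` worth keeping in mind: a borough-month present only in the DV table gets `NA` for mold rather than 0.

```r
library(dplyr)

# Toy incident-level records (hypothetical values, not the NYC data)
dv_toy <- tibble(
  Year    = c("2010", "2010", "2010", "2010"),
  Month   = c("01 - January", "01 - January", "02 - February", "02 - February"),
  Borough = c("BRONX", "BRONX", "BRONX", "QUEENS")
)
mold_toy <- tibble(
  Year    = c("2010", "2010"),
  Month   = c("01 - January", "02 - February"),
  Borough = c("BRONX", "BRONX")
)

# One row per Year / Month / Borough with the number of records in each
dv_counts   <- count(dv_toy, Year, Month, Borough, name = "DV Reports")
mold_counts <- count(mold_toy, Year, Month, Borough, name = "Mold Complaints")

joined <- left_join(dv_counts, mold_counts, by = c("Borough", "Year", "Month"))

# The QUEENS row has no match in mold_counts, so its mold value is NA
print(joined)
```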

Exploring the Data

Domestic Violence Data

Summary Stats

dv_summary <- aggregated_dv_data %>%
  summarise(
    total_reports = sum(`DV Reports`),
    start_year = min(as.numeric(Year)),
    end_year = max(as.numeric(Year)),
    boroughs = n_distinct(Borough),
    avg_monthly = mean(`DV Reports`)
  )

kable(dv_summary)
| total_reports | start_year | end_year | boroughs | avg_monthly |
|--------------:|-----------:|---------:|---------:|------------:|
|        669136 |       2010 |     2024 |        5 |    743.4844 |

From the beginning of 2010 to the end of 2024, a total of 669,136 domestic violence incidents were reported across the five boroughs of New York City. Because the data has one row per borough per month, the average of roughly 743 is the number of reports filed per borough per month; citywide, that works out to roughly 3,717 reports per month.
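The `avg_monthly` figure is a mean over borough-month rows, so it helps to separate it from a citywide monthly figure. The arithmetic below is a quick check using only the total from the summary table; it assumes every borough has a row for each of the 180 months, which matches the 900 observations implied by the correlation test later in the report.

```r
total_reports <- 669136   # from the summary table above
n_boroughs    <- 5
n_months      <- 15 * 12  # January 2010 through December 2024

# Mean over borough-month rows (what mean(`DV Reports`) returns)
per_borough_month <- total_reports / (n_boroughs * n_months)

# Mean number of reports per month across the whole city
citywide_monthly <- total_reports / n_months

round(per_borough_month, 1)  # 743.5
round(citywide_monthly, 1)   # 3717.4
```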

Borough/Year Distribution

dv_by_year_borough <- aggregated_dv_data %>%
  group_by(Year, Borough) %>%
  summarise(total_reports = sum(`DV Reports`)) %>%
  pivot_wider(names_from = Year,
    values_from = total_reports)

kable(dv_by_year_borough)
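| Borough       |  2010 |  2011 |  2012 |  2013 |  2014 |  2015 |  2016 |  2017 |  2018 |  2019 |  2020 |  2021 |  2022 |  2023 |  2024 |
|:--------------|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|
| BRONX         | 11274 | 11441 | 12207 | 12630 | 12924 | 12594 | 12916 | 12569 | 13662 | 12943 | 12013 | 12499 | 14193 | 14149 | 14905 |
| BROOKLYN      | 14094 | 14344 | 15232 | 15032 | 15022 | 14101 | 13771 | 13052 | 13039 | 12250 | 11256 | 12362 | 13286 | 13180 | 13652 |
| MANHATTAN     |  6081 |  6128 |  6471 |  6409 |  6627 |  6555 |  6791 |  6586 |  6550 |  6864 |  6117 |  7160 |  7194 |  6880 |  7215 |
| QUEENS        |  8726 |  9182 |  9128 |  9425 |  9467 |  8906 |  8697 |  8733 |  9047 |  9324 |  8778 |  9368 | 10205 | 10484 | 12071 |
| STATEN ISLAND |  2015 |  2056 |  2414 |  2242 |  2364 |  2178 |  2112 |  2036 |  2048 |  1868 |  1687 |  1871 |  2099 |  2126 |  2259 |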

From 2010 to 2024, Brooklyn and the Bronx consistently had the most domestic violence reports in NYC. Brooklyn had the most reports each year from 2010 through 2017; in 2018 the Bronx overtook it and has had the highest number of reports every year since. Staten Island had the fewest reported incidents each year.

For the most part, domestic violence reports rose across the boroughs between 2010 and 2024, with a dip in every borough in 2020. Brooklyn is the only borough where the number of reported incidents was lower in 2024 than in 2010. (This could be something interesting to look into!)

Heat Map

library(ggplot2)

dv_plot_data <- aggregated_dv_data %>%
  group_by(Year, Borough) %>%
  summarise(total_reports = sum(`DV Reports`))

ggplot(dv_plot_data, aes(x = Year, y = Borough, fill = total_reports)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  labs(
    title = "DV Reports by Borough and Year",
    x = "Year",
    y = "Borough",
    fill = "DV Reports"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

Above is a heat map of domestic violence incident reports from 2010 to 2024. It reflects what was observed in the previous table: the Bronx and Brooklyn tend to have a higher volume of reports, and Staten Island has stayed largely below the average.

Mold Exposure Data

Summary Stats

mold_summary <- aggregated_mold_data %>%
  summarise(
    total_complaints = sum(`Mold Complaints`),
    start_year = min(as.numeric(Year)),
    end_year = max(as.numeric(Year)),
    boroughs = n_distinct(Borough),
    avg_monthly = mean(`Mold Complaints`)
  )

kable(mold_summary)
| total_complaints | start_year | end_year | boroughs | avg_monthly |
|-----------------:|-----------:|---------:|---------:|------------:|
|           388293 |       2010 |     2024 |        5 |    431.4367 |

From 2010 to 2024, there were a total of 388,293 residential mold complaints reported across all five boroughs of New York City. Because the data has one row per borough per month, the average of roughly 431 is the number of residential mold complaints made to 311 per borough per month; citywide, that is roughly 2,157 complaints per month.

Borough/Year Distributions

mold_by_year_borough <- aggregated_mold_data %>%
  group_by(Year, Borough) %>%
  summarise(total_complaints = sum(`Mold Complaints`)) %>%
  pivot_wider(names_from = Year,
    values_from = total_complaints)

kable(mold_by_year_borough)
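| Borough       |  2010 |  2011 |  2012 |  2013 |  2014 |  2015 |  2016 |  2017 |  2018 |  2019 |  2020 |  2021 |  2022 |  2023 |  2024 |
|:--------------|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|
| BRONX         |  7973 |  8726 |  7384 |  7778 |  8295 |  8434 |  7946 |  7510 |  9510 |  6460 |  5953 |  9378 |  9513 | 11438 | 12847 |
| BROOKLYN      |  8244 | 10404 |  8373 |  9299 |  9280 |  9074 |  7700 |  7908 |  9747 |  6360 |  5397 |  8199 |  8265 | 10389 | 11092 |
| MANHATTAN     |  3977 |  4403 |  3683 |  3840 |  4787 |  5062 |  4395 |  4031 |  4942 |  3119 |  2985 |  5027 |  5347 |  6660 |  6757 |
| QUEENS        |  3258 |  3907 |  3205 |  3290 |  3351 |  3328 |  2875 |  2824 |  3569 |  2406 |  2027 |  3125 |  3347 |  4229 |  4736 |
| STATEN ISLAND |   615 |   852 |   730 |   757 |   747 |   688 |   720 |   691 |   864 |   529 |   504 |   777 |   721 |   960 |   770 |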

From 2010 to 2024, Brooklyn and the Bronx had the most mold complaints in residential buildings, and they trade first place from year to year. Staten Island had the lowest number of residential mold complaints to 311 each year.

Across all 5 boroughs, there are more mold complaints in 2024 than there were in 2010, with the overall trend being an increase in 311 complaints for residential mold.

Heat Map

mold_plot_data <- aggregated_mold_data %>%
  group_by(Year, Borough) %>%
  summarise(total_complaints = sum(`Mold Complaints`))

ggplot(mold_plot_data, aes(x = Year, y = Borough, fill = total_complaints)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "wheat", high = "darkgreen") +
  labs(
    title = "Mold Complaints by Borough and Year",
    x = "Year",
    y = "Borough",
    fill = "Mold Complaints"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

Above is a heat map of residential mold complaints to 311 from 2010 to 2024. It is similar in density to the domestic violence report heat map: the Bronx and Brooklyn have a higher volume of complaints, and Staten Island has stayed largely below the average.

Preliminary Correlation

cor.test(aggregated_dv_mold_data$`DV Reports`, 
         aggregated_dv_mold_data$`Mold Complaints`)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = 43.206, df = 898, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7992741 0.8418473
## sample estimates:
##       cor 
## 0.8217037
  • Strength: 0.82 (very strong)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

I ran a correlation between domestic violence reports and residential mold complaints to see if there was a substantial relationship between them, and there is! The relationship is positive and very strong: borough-months with more DV reports also tend to have more mold complaints, and those with fewer DV reports tend to have fewer mold complaints. In other words, the two variables move together.

Let’s visualize this:
ggplot(aggregated_dv_mold_data, aes(x = `Mold Complaints`, y = `DV Reports`, color = Borough)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "gray18") +
  labs(title = "DV Reports vs Mold Complaints by Borough",
       x = "Mold Complaints",
       y = "DV Reports") +
  theme_minimal()

However, this only tells us that domestic violence reports and mold complaints co-occur, and it does not tell us anything about causality.
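One caution about a pooled correlation like the one above: when boroughs differ a lot in overall volume, the gap between boroughs alone can produce a high correlation. The sketch below uses made-up numbers for two hypothetical boroughs ("A" and "B") to show how a pooled correlation and per-borough correlations can disagree. It illustrates the idea only and says nothing about what the NYC data would show.

```r
library(dplyr)

# Hypothetical toy data: two boroughs with very different baseline volumes
toy <- tibble(
  Borough = rep(c("A", "B"), each = 4),
  mold    = c(10, 12, 11, 13, 100, 102, 101, 103),
  dv      = c(23, 20, 22, 21, 203, 200, 202, 201)
)

# Pooled correlation: dominated by the gap between the two boroughs
pooled_r <- cor(toy$mold, toy$dv)

# Within-borough correlations: month-to-month co-movement only
within_r <- toy %>%
  group_by(Borough) %>%
  summarise(r = cor(mold, dv))

pooled_r   # close to 1
within_r   # -0.8 in each toy borough
```

Running the same `group_by(Borough)` summary on `aggregated_dv_mold_data` would be one way to check how much of the 0.82 comes from within-borough movement.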

Let’s dive into how domestic violence reports and mold complaints develop over time!

Statistical Analysis

Lagged Data Correlation Analysis

cor.test(dv_mold_lagged$DV_next_month, dv_mold_lagged$`Mold Complaints`)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## t = 41.32, df = 893, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7865287 0.8316650
## sample estimates:
##       cor 
## 0.8102952

I conducted a Pearson’s correlation test to examine the relationship between mold complaints in one month and DV reports the following month.

  • Strength: 0.81 (very strong)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

The results show a very strong positive correlation, suggesting that months with higher mold complaint counts are associated with higher domestic violence reports in the following month. However, this result is similar to the basic correlation between mold complaints and DV reports (conducted earlier on) and does not account for other factors that might influence the relationship, such as borough.
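The code that builds `dv_mold_lagged` is not shown in this report. One plausible construction, sketched below as an assumption, shifts DV reports forward by one month within each borough using `dplyr::lead()`. The 893 degrees of freedom reported above imply 895 rows, which is consistent with dropping the final month for each of the five boroughs. The toy values are the January to March 2010 counts for the Bronx and Queens from the aggregated table; the `_toy` names are illustrative.

```r
library(dplyr)

# Toy panel: 2 boroughs x 3 months, taken from the aggregated table above
toy <- tibble(
  Borough    = rep(c("BRONX", "QUEENS"), each = 3),
  time_index = rep(1:3, times = 2),
  `Mold Complaints` = c(954, 738, 941, 315, 273, 395),
  `DV Reports`      = c(910, 818, 969, 791, 603, 758)
)

dv_mold_lagged_toy <- toy %>%
  group_by(Borough) %>%
  arrange(time_index, .by_group = TRUE) %>%
  # lead() pulls next month's DV count onto the current month's row
  mutate(DV_next_month = lead(`DV Reports`)) %>%
  ungroup() %>%
  # the last month in each borough has no following month
  filter(!is.na(DV_next_month))

print(dv_mold_lagged_toy)
```

Grouping by borough before calling `lead()` matters: without it, the last month of one borough would be paired with the first month of the next.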

To better understand how additional variables (such as borough and average resolution time) affect this association, I conducted regression analyses.

Regression Models

DV ~ Mold

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(Borough = factor(Borough)) %>%
  arrange(Year, Month)

lm_dv_mold <- lm(`DV Reports` ~ `Mold Complaints`,
  data = aggregated_dv_mold_data)

summary(lm_dv_mold)
## 
## Call:
## lm(formula = `DV Reports` ~ `Mold Complaints`, data = aggregated_dv_mold_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -922.11 -177.91  -14.37  167.41  609.43 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       306.12047   12.25800   24.97   <2e-16 ***
## `Mold Complaints`   1.01374    0.02346   43.21   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 207.4 on 898 degrees of freedom
## Multiple R-squared:  0.6752, Adjusted R-squared:  0.6748 
## F-statistic:  1867 on 1 and 898 DF,  p-value: < 2.2e-16
AIC(lm_dv_mold)
## [1] 12160.34

This linear regression model tests the association between monthly residential mold complaints and domestic violence reports across all 5 boroughs and all time periods.

  • Strength: strong (R^2 = 0.67)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

  • AIC: 12160.34

Results suggest a strong and statistically significant positive association between mold complaints and DV reports. On average, months with higher numbers of mold complaints are associated with higher numbers of reported domestic violence incidents. However, this model does not account for differences across boroughs or temporal patterns.

DV ~ Mold + Borough

lm_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough,
  data = aggregated_dv_mold_data
)

summary(lm_borough)
## 
## Call:
## lm(formula = `DV Reports` ~ `Mold Complaints` + Borough, data = aggregated_dv_mold_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -257.79  -42.87   -3.50   37.93  432.04 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           931.33548   15.54248  59.922  < 2e-16 ***
## `Mold Complaints`       0.19574    0.01975   9.909  < 2e-16 ***
## BoroughBROOKLYN        59.10721    9.02091   6.552 9.56e-11 ***
## BoroughMANHATTAN     -452.89589   11.17680 -40.521  < 2e-16 ***
## BoroughQUEENS        -198.79959   12.56259 -15.825  < 2e-16 ***
## BoroughSTATEN ISLAND -768.91015   15.80207 -48.659  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 85.58 on 894 degrees of freedom
## Multiple R-squared:  0.9449, Adjusted R-squared:  0.9446 
## F-statistic:  3069 on 5 and 894 DF,  p-value: < 2.2e-16
AIC(lm_borough)
## [1] 10571.03

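This linear regression model tests the association between monthly residential mold complaints and domestic violence reports while giving each borough its own baseline level, so the mold coefficient reflects the relationship within boroughs rather than differences between them.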

  • Strength: very strong (R^2 = 0.94)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

  • AIC: 10571.03

The association between mold complaints and DV reports remains positive and statistically significant. Including boroughs greatly increases the R^2, showing that much of the variation in DV reports is explained by differences between boroughs rather than mold alone. The borough coefficients compare each borough to the Bronx and highlight that DV reporting levels differ substantially across boroughs.
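To make the borough coefficients concrete, the fitted equation can be evaluated by hand. The sketch below copies the estimates from `summary(lm_borough)` and predicts DV reports for a hypothetical month with 500 mold complaints; the helper `predict_dv()` and the input of 500 are illustrative choices, not part of the original analysis.

```r
# Coefficients copied from summary(lm_borough) above
intercept <- 931.33548   # Bronx baseline (the reference level)
slope     <- 0.19574     # change in DV reports per additional mold complaint
offset    <- c(BRONX = 0, BROOKLYN = 59.10721, MANHATTAN = -452.89589,
               QUEENS = -198.79959, `STATEN ISLAND` = -768.91015)

# Hypothetical helper: fitted DV reports for a given mold count and borough
predict_dv <- function(mold, borough) {
  unname(intercept + slope * mold + offset[borough])
}

predict_dv(500, "BRONX")      # about 1029.2
predict_dv(500, "MANHATTAN")  # about 576.3
```

At the same mold count, the two predictions differ by exactly the Manhattan coefficient, which is what "compared to the Bronx" means in this model.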

DV ~ Mold + Borough + Average Resolution Days

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  left_join(mold_monthly_resolution, by = c("Year", "Month"))

lm_resolution_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days,
  data = aggregated_dv_mold_data
)

summary(lm_resolution_borough)
## 
## Call:
## lm(formula = `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days, 
##     data = aggregated_dv_mold_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -258.31  -43.84   -2.01   38.16  403.40 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           994.96968   17.64318  56.394  < 2e-16 ***
## `Mold Complaints`       0.17645    0.01944   9.078  < 2e-16 ***
## BoroughBROOKLYN        59.17001    8.78658   6.734 2.95e-11 ***
## BoroughMANHATTAN     -459.33996   10.92506 -42.045  < 2e-16 ***
## BoroughQUEENS        -207.33753   12.29650 -16.862  < 2e-16 ***
## BoroughSTATEN ISLAND -781.57967   15.49694 -50.434  < 2e-16 ***
## avg_resolution_days    -3.03676    0.43241  -7.023 4.30e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 83.35 on 893 degrees of freedom
## Multiple R-squared:  0.9478, Adjusted R-squared:  0.9475 
## F-statistic:  2704 on 6 and 893 DF,  p-value: < 2.2e-16
AIC(lm_resolution_borough)
## [1] 10524.65

This linear regression model continues to test the association while incorporating resolution time of mold complaints.

  • Strength: very strong (R^2 = 0.95)

  • Direction: positive for mold complaints; negative for average resolution days

  • Significance: statistically significant (p<0.05)

  • AIC: 10524.65

Even after controlling for borough and resolution time, mold complaints remain a statistically significant predictor of DV reports. Average resolution days show a statistically significant negative association with DV reports, suggesting that months with longer resolution delays are associated with fewer reported DV incidents. This may seem like the opposite of the expected result, but it could reflect many factors, such as a weak relationship between the community and city services. For instance, if city officials took longer to respond to mold complaints, community members may have had less confidence that resources were available and may have been less likely to file domestic violence reports.
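The join in this section relies on `mold_monthly_resolution`, whose construction is not shown in this report. Since it is joined by `Year` and `Month` only, it is presumably a citywide monthly average of `resolution_days`. The sketch below shows one way such a table could be built, using made-up rows and assuming that complaints without a closed date should be ignored; the `_toy` names are illustrative.

```r
library(dplyr)

# Hypothetical toy complaints with the columns created during cleaning
mold_toy <- tibble(
  Year  = c("2024", "2024", "2024", "2024"),
  Month = c("11 - November", "11 - November", "12 - December", "12 - December"),
  resolution_days = c(4, 10, 19, NA)   # NA = complaint with no closed date
)

mold_monthly_resolution_toy <- mold_toy %>%
  group_by(Year, Month) %>%
  summarise(
    # assumption: skip complaints that have no resolution time yet
    avg_resolution_days = mean(resolution_days, na.rm = TRUE),
    .groups = "drop"
  )

print(mold_monthly_resolution_toy)
```

Because this average is the same for all five boroughs in a given month, it behaves like a time-level control in the regression rather than a borough-specific measure.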

Overall, the final linear regression model is our best predictive model for domestic violence reports. With an R^2 of 0.95 and the lowest AIC of the three regression models (10524.65), this model best supports the hypothesis that domestic violence reports can be predicted by residential mold complaints in the same borough and month.
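The three models can be compared side by side using the AIC values printed by `AIC()` in the sections above. This small sketch computes each model's difference from the lowest AIC; lower is better, and a difference of 0 marks the preferred model.

```r
# AIC values copied from the AIC() output for each model above
aic <- c(
  lm_dv_mold            = 12160.34,
  lm_borough            = 10571.03,
  lm_resolution_borough = 10524.65
)

# Difference from the best (lowest) AIC
delta_aic <- aic - min(aic)

round(delta_aic, 2)     # 1635.69, 46.38, 0.00
names(which.min(aic))   # "lm_resolution_borough"
```

Most of the improvement comes from adding borough; adding average resolution days lowers the AIC by a further 46.38.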

Discussion & Insights

Overall, the results show a consistent positive association between residential mold complaints and domestic violence reports. Simple correlations suggest that months with more mold complaints tend to have more DV reports as well. This pattern appears at both yearly and monthly levels.

When borough differences are accounted for in the regression models, the relationship between mold complaints and DV reports remains statistically significant but is weaker: the mold coefficient falls from about 1.01 to about 0.20. This suggests that while borough-level differences explain much of the variation, mold complaints still have an independent association with DV reports.

Adding average mold resolution time shows that longer resolution delays are associated with lower DV report counts. This may be due to reporting behavior or service engagement rather than a direct effect.

Lagged analyses were used to test whether mold complaints in one month are related to DV reports in the following month. Although the lagged relationship remains positive, it closely resembles the non-lagged results (0.81 versus 0.82), suggesting that a one-month delay adds little beyond the same-month relationship.

In summary, the analysis suggests a consistent positive relationship between residential mold complaints and domestic violence reports, but borough-level differences and other contextual factors appear to drive much of the variation, highlighting the complexity of environmental and social influences on public health outcomes. In the future, I would like to look at neighborhood-specific trends, or at DV and mold rates instead of counts, since populations vary across boroughs.