Data Visualization with Python

Introduction

This project analyzes a subset of New York City’s 311 Service Request dataset, which records over 40 million service calls from 2010 to the present. To focus the analysis, the data was filtered to include only noise-related complaints, which account for a substantial share of all 311 activity. These complaints have grown more common over time, which may reflect rising public sensitivity to noise and a shifting urban landscape in which sound disturbances are an increasingly important quality-of-life issue.

The visualizations explore patterns in noise complaint frequency, types of noise sources, and spatial distribution across the city, highlighting when, where, and what kinds of disturbances are most frequently reported. To better understand relative “noisiness”, population data from the NYC Department of City Planning’s Decennial Census dataset was incorporated, enabling borough-level comparisons normalized by population size.

Data Sources

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '/Users/sarahsmith/anaconda3/plugins/platforms'

# Import libraries.
import pandas as pd
import seaborn as sns
import plotly.express as px
import numpy as np
import json
import folium
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
import matplotlib.patches as mpatches
import warnings
from datetime import datetime

# Import various colors and palettes.
from matplotlib import colors
from palettable.cartocolors.qualitative import Safe_3
from palettable.cartocolors.qualitative import Safe_4
from palettable.cartocolors.qualitative import Safe_5
from palettable.cartocolors.qualitative import Safe_7
from palettable.cartocolors.qualitative import Safe_8
from palettable.cartocolors.qualitative import Vivid_5

warnings.filterwarnings("ignore")

# Create a df from the csv file.
df = pd.read_csv("/Users/sarahsmith/Desktop/Python Data/NYC_311_Noise_Complaints.csv")

# Drop rows with na values without creating a new df.
df.dropna(inplace=True)

# Replace "T" character with empty space.
df["created_date"] = df["created_date"].str.replace("T", " ")

# Format date
df["created_date"] = pd.to_datetime(df["created_date"], format="%Y-%m-%d %H:%M:%S.%f")

# Extract portions of created_date, format as needed, and create new columns containing the values.
df["Date_only"] = df["created_date"].dt.date
df["Date_only"] = pd.to_datetime(df["Date_only"], format="%Y-%m-%d")
df["created_date"].dtype

df["Year"] = df["created_date"].dt.year
df["Month"] = df["created_date"].dt.month
df["Day"] = df["created_date"].dt.day
df["Hour"] = df["created_date"].dt.hour
df["Day Name"] = df["created_date"].dt.strftime("%A")
df["Month Name"] = df["created_date"].dt.strftime("%B")
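The derived date columns can be sanity-checked on a single value. The sketch below uses only the standard library and a made-up timestamp (`raw` is hypothetical, not a row from the dataset). It assumes the same ISO-like layout as `created_date` and an English locale for the day and month names.

```python
from datetime import datetime

# Hypothetical timestamp in the same layout as the created_date column.
raw = "2020-06-21T23:15:00.000"

# Same two steps as the pandas code: swap the "T" for a space, then parse.
parsed = datetime.strptime(raw.replace("T", " "), "%Y-%m-%d %H:%M:%S.%f")

# Mirror the derived columns for this one value.
parts = {
    "Year": parsed.year,
    "Month": parsed.month,
    "Day": parsed.day,
    "Hour": parsed.hour,
    "Day Name": parsed.strftime("%A"),
    "Month Name": parsed.strftime("%B"),
}
print(parts)
```

Checking one known date this way is a quick guard against a format string that parses without error but swaps fields.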

Visualization Tabs

Complaint Counts by Noise Source

The chart below displays the distribution of noise complaints by source. Residential and Street/Sidewalk noise are by far the most frequently reported categories, suggesting that daily neighborhood and public-space activity account for the majority of complaints. In the visualization, blue bars represent categories above the mean complaint count, while yellow bars indicate those below the mean. The remaining sources—such as commercial, vehicle, park, and helicopter—show comparatively lower complaint volumes, reflecting a clear concentration of noise issues in residential and street-level environments.

# Horizontal Bar Chart - Number of Complaints by Noise Source

# Create df with complaint count per year and complaint_type.
df_complaint_sum = (
    df.groupby(["Year", "complaint_type"]).size().reset_index(name="count")
)

def mean_count_colors(my_df, above, within, below):
    """Calculate mean of count in df and assign colors to values
    according to above average, at average and below average
    """

    colors = []

    avg = my_df["count"].mean()

    for item in my_df["count"]:
        if item > avg * 1.01:
            colors.append(above)
        elif item < avg * 0.99:
            colors.append(below)
        else:
            colors.append(within)
    return colors
  
# Create df by broad category of complaint_type.
df_categories = (
    df_complaint_sum.groupby(["complaint_type"])["count"].sum().reset_index()
)

# Sort df by count in ascending order; barh draws the first row at the bottom, so the largest count ends up as the top bar.
df_categories = df_categories.sort_values(by="count")

# Set color values for bars above, within 1% of, and below the mean.
above_color = (0.3647, 0.4118, 0.6941)
at_color = (0.8, 0.4, 0.4666666666666667)
below_color = (0.8666666666666667, 0.8, 0.4666666666666667)

# Call function to determine the color depending on count above average, at average and below average.
colors_category = mean_count_colors(df_categories, above_color, at_color, below_color)

# Horizontal Bar Chart

# Establish figure and ax size.
fig, ax = plt.subplots(figsize=(12, 8))

# Plot horizontal bar chart.
ax.barh(
    df_categories["complaint_type"],
    df_categories["count"],
    color=colors_category,
    edgecolor=None,
)

# Format count labels and position labels at end of bars.
for i, v in enumerate(df_categories["count"]):
    ax.text(
        v + (0.01 * max(df_categories["count"])),
        i,
        f"{v:,.0f}",
        va="center",
        fontsize=12,
    )

# Calculate the mean value.
mean_val = df_categories["count"].mean()

# Plot the mean line.
plt.axvline(mean_val, color="gray", linestyle=":", linewidth=2)

# Set location and format of the mean label.
ax.text(
    mean_val + (0.02 * df_categories["count"].max()),
    7.5,
    f"Mean = {mean_val:,.0f}",
    color="gray",
    fontsize=12,
    fontstyle="italic",
    rotation=0,
    va="center",
    ha="left",
)

# Set and format the x axis labels.
ax.set_xlim(0, df_categories["count"].max() * 1.2)

ax.xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))
ax.tick_params(axis="x", labelsize=12)

# Set and format the y axis labels.
wrapped_labels = [
    "House of\nWorship",
    "Park",
    "Helicopter",
    "Vehicle",
    "Commercial",
    "Unspecified",
    "Street/\nSidewalk",
    "Residential",
]

ax.set_yticks(range(len(wrapped_labels)))
ax.set_yticklabels(wrapped_labels, fontsize=12)

# Define legend values.
Above = mpatches.Patch(color=above_color, label="Above average")
At = mpatches.Patch(color=at_color, label="Within 1% of Average")
Below = mpatches.Patch(color=below_color, label="Below average")

# Position and format the legend for aesthetics and optimal visualization with enlarged color patches and additional spacing.
ax.legend(
    handles=[Above, At, Below],
    fontsize=9,
    loc="lower right",
    frameon=True,
    fancybox=True,
    framealpha=0.9,
    handlelength=2.5,
    handleheight=1.2,
    borderpad=1.2,
    labelspacing=0.8,
)

# Set and format the x, y labels and overall plot title.
ax.set_xlabel("Number of Complaints", fontsize=20)
ax.set_ylabel("Noise Source", fontsize=20)
ax.set_title("Number of Complaints by Noise Source (2010-2025)", fontsize=22)

# Set grid style.
ax.grid(True, color="darkgray", linewidth=.5, alpha=0.5)

# Make sure everything fits within boundaries of figure.
plt.tight_layout()

# Display the plot.
plt.show()
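The coloring rule behind this chart can be checked in isolation. The following is a minimal sketch, assuming the same 1% band around the mean that `mean_count_colors` uses. The function name `classify_against_mean`, the `tolerance` parameter, and the sample counts are hypothetical and chosen only for illustration.

```python
from statistics import mean


def classify_against_mean(counts, tolerance=0.01):
    """Label each count as above, within, or below a band around the mean."""
    avg = mean(counts)
    labels = []
    for value in counts:
        if value > avg * (1 + tolerance):
            labels.append("above")
        elif value < avg * (1 - tolerance):
            labels.append("below")
        else:
            labels.append("within")
    return labels


# Hypothetical counts whose mean is exactly 30.
counts = [10, 20, 30, 60]
labels = classify_against_mean(counts)
print(labels)
```

With a heavily skewed distribution such as the complaint categories, a few large values pull the mean up, so most categories land in the "below" group.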

Distribution of Complaint Types

This stacked bar chart displays the distribution of noise complaint categories by year, showing how each category contributes to the annual total. The year 2020 stands out with roughly 800,000 complaints, far higher than the surrounding years. Vehicle-related complaints, on the other hand, are nearly absent in 2023. Residential noise consistently makes up a large share of total complaints across the entire period.

# Stacked Bar Chart

# Re-arrange the column and row values of df for stacked bar chart.
df_complaint_pivot = df_complaint_sum.pivot(
    index="Year", columns="complaint_type", values="count"
)

# Set variable colors.
colors = Safe_8.mpl_colors

# Establish fig and ax size.
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1, 1, 1)

# Plot the stacked bar chart.
df_complaint_pivot.plot(kind="bar", stacked=True, ax=ax, color=colors, edgecolor="none")

# Set and format labels and overall chart title.
plt.title("Complaint Type Count (2010 - 2025)", fontsize=32)
plt.xlabel("Year", fontsize=25)
plt.ylabel("Count", fontsize=25)

# Set how the axes labels will appear.
plt.xticks(rotation=0)

ax.tick_params(axis="x", which="major", length=8, labelsize=16)
ax.tick_params(axis="y", which="major", length=8, labelsize=16)
ax.yaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))

# Format and position legend for aesthetics and optimal visibility.
plt.legend(
    fontsize=16,
    title="Noise Source",
    title_fontsize=20,
    loc="upper left",
    frameon=False,
    labelspacing=0.8,
)
# Set plot style.
ax.grid(True, color="darkgray", linewidth=.5, alpha=0.5)

# Make sure all text stays within figure bounds.
plt.tight_layout()

# Display the plot.
plt.show()
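The long-to-wide reshaping that feeds the stacked bars can be sketched without pandas. The example below is illustrative only: `pivot_counts` and the sample rows are hypothetical, and missing year/source combinations are filled with 0 here, whereas pandas' `pivot` leaves them as NaN unless `fillna` is applied.

```python
def pivot_counts(rows, fill=0):
    """Reshape (year, source, count) rows into {year: {source: count}}."""
    years = sorted({year for year, _, _ in rows})
    sources = sorted({source for _, source, _ in rows})
    # Start every cell at the fill value so absent combinations are explicit.
    table = {year: {source: fill for source in sources} for year in years}
    for year, source, count in rows:
        table[year][source] = count
    return table


# Hypothetical long-format counts; there is no Vehicle row for 2023.
rows = [
    (2022, "Residential", 5),
    (2022, "Vehicle", 2),
    (2023, "Residential", 7),
]
wide = pivot_counts(rows)
print(wide)
```

A category that disappears in one year, as vehicle complaints nearly do in 2023, shows up as an empty or zero cell in the wide table rather than as a missing column.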

Proportion of Complaint Types Over Time

This chart shows how the proportions of different noise complaint sources have changed over time. After removing the general “Noise” category, each remaining source was converted into a yearly percentage. Residential complaints consistently dominate, dropping from about 83% in 2010 to 55% in 2025. House of Worship and Park sources remain minimal throughout. Helicopter complaints rise starting in 2019 and peak between 2020 and 2023 before declining. In 2020, residential complaints decrease while street/sidewalk complaints increase, and vehicle-related complaints show a sharp drop in 2023.

# Stacked Area Chart

# Create a df without the general "Noise" category.
df_no_noise = (
    df_complaint_sum[df_complaint_sum["complaint_type"] != "Noise"]
    .pivot(index="Year", columns="complaint_type", values="count")
    .fillna(0)
)

# Calculate % to show each noise source as a percentage of total complaints per time period.
df_no_noise_norm = df_no_noise.div(df_no_noise.sum(axis=1), axis=0) * 100

# Stacked Area Chart

# Set colors variable.
colors = Safe_8.mpl_colors

# Plot the stacked area chart.
df_no_noise_norm.plot(kind="area", stacked=True, figsize=(16, 8), color=colors)

# Set how the axes labels will appear.
ax = plt.gca()
ax.tick_params(axis="x", labelsize=18)
ax.tick_params(axis="y", labelsize=18)

# Set and format the labels and overall title.
plt.xlabel("Year", fontsize=24)
plt.ylabel("Share of Total Complaints (%)", fontsize=24)
plt.title("Proportion of Noise Complaint Sources Over Time", fontsize=28)

# Format legend and position it below chart for aesthetics and optimal visibility.
plt.legend(
    title="Noise Source",
    title_fontsize=16,
    fontsize=13,
    loc="upper center",
    bbox_to_anchor=(0.5, -0.15),
    ncol=2,
    frameon=False,
)

# Set plot style.
ax.grid(True, linewidth=0.3, alpha=0.7)
plt.tight_layout()

# Display plot
plt.show()
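The yearly percentage conversion is simple row-wise normalization: each source's count is divided by that year's total and multiplied by 100. A small sketch with hypothetical numbers (the `to_yearly_shares` helper and the counts are invented for illustration):

```python
def to_yearly_shares(table):
    """Convert {year: {source: count}} into {year: {source: percent}}."""
    shares = {}
    for year, row in table.items():
        total = sum(row.values())
        shares[year] = {source: 100 * n / total for source, n in row.items()}
    return shares


# Hypothetical counts, not taken from the 311 data.
counts_by_year = {
    2010: {"Residential": 80, "Street/Sidewalk": 15, "Vehicle": 5},
    2020: {"Residential": 110, "Street/Sidewalk": 70, "Vehicle": 20},
}
shares = to_yearly_shares(counts_by_year)
print(shares[2010])
print(shares[2020])
```

In this toy example Residential counts rise from 80 to 110 while its share falls from 80% to 55%, which is why a declining share in the area chart does not by itself mean fewer complaints.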

Top 10 Noise Descriptors

This horizontal bar chart displays the top 10 noise complaint descriptors, with a dotted vertical line marking the mean. Bars above the mean are shaded blue and those below are yellow. Only “Loud Music/Party” (≈4.3M) and “Loud Talking” (≈1.9M) exceed the average; all remaining descriptors fall below it. Notably, these two highest-frequency descriptors correspond to the broadest noise categories—Residential and Street/Sidewalk—mirroring the broader pattern in the dataset, where neighborhood- and street-level disturbances consistently dominate total complaints.

# Horizontal Bar Chart - Top 10 Descriptors

# Create df by descriptor and its cumulative count.
df_descriptor = (
    df.groupby(["descriptor"])["descriptor"].count().reset_index(name="count")
)

# Order df by largest to smallest count.
df_descriptor.sort_values(by="count", ascending=False, inplace=True)

# Get the top 10 descriptors of noise complaints.
df_top10 = df_descriptor.head(10)

# Set color values for bars above, within 1% of, and below the mean.
above_color = (0.3647, 0.4118, 0.6941)
at_color = (0.8, 0.4, 0.4666666666666667)
below_color = (0.8666666666666667, 0.8, 0.4666666666666667)

# Call function to determine colors of horizontal bars.
colors_top_10 = mean_count_colors(df_top10, above_color, at_color, below_color)

# Bar chart of top 10 descriptors with mean line and colors.

# Set fig and ax size.
fig = plt.figure(figsize=(20, 12))
ax1 = fig.add_subplot(1, 1, 1)

# Create a horizontal bar chart with grid visible.
ax1.barh(df_top10["descriptor"], df_top10["count"], color=colors_top_10, zorder=3)
ax1.invert_yaxis()

# Format count labels and position labels at end of bars.
for i, v in enumerate(df_top10["count"]):
    ax1.text(
        v + (0.01 * max(df_top10["count"])),
        i,
        f"{v:,.0f}",
        va="center",
        fontsize=16,
    )

# Set custom color patches.
Above = mpatches.Patch(color=above_color, label="Above average")
At = mpatches.Patch(color=at_color, label="Within 1% of Average")
Below = mpatches.Patch(color=below_color, label="Below average")

# Calculate the mean.
mean_val = df_top10["count"].mean()

# Plot the mean line.
ax1.axvline(mean_val, color="gray", linestyle=":", linewidth=2)

# Annotate the mean line.
ax1.text(
    mean_val + (0.01 * max(df_top10["count"])),
    -0.5,  # position slightly above top bar
    f"Mean = {mean_val:,.0f}",
    color="gray",
    fontsize=17,
    fontstyle="italic",
)

# Set and format y-axis tick labels.
wrapped_labels = [
    "Loud Music/\nParty",
    "Banging/\nPounding",
    "Loud Talking",
    "Car/Truck Music",
    "Construction\nBefore/After\nHours",
    "Other",
    "Car/Truck Horn",
    "Barking Dog",
    "Construction\nEquipment",
    "Engine Idling",
]
ax1.set_yticks(range(len(wrapped_labels)))
ax1.set_yticklabels(wrapped_labels, fontsize=20)

# Set and format x-axis tick labels.
ax1.tick_params(axis="x", which="major", length=8, labelsize=18)
ax1.xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))

# Set minimum and maximum bounds for the x-axis.
ax1.set_xlim(0, 5_000_000)

# Set and format the axis labels and overall title.
ax1.set_xlabel("Number of Complaints", fontsize=32)
ax1.set_ylabel("Descriptor", fontsize=32)
ax1.set_title("Top 10 Noise Complaint Descriptors", fontsize=36, pad=20)

# Position and format the legend with enlarged color patches and extra space between lines and in the box.
ax1.legend(
    handles=[Above, At, Below],
    fontsize=16,
    loc="lower right",
    frameon=True,
    fancybox=True,
    framealpha=0.9,
    handlelength=2.5,
    handleheight=1.2,
    borderpad=1.2,
    labelspacing=0.8,
)

# Set plot style with the grid drawn behind the bars (zorder=0).
ax1.grid(True, color="darkgray", linewidth=0.5, alpha=0.9, zorder=0)

# Make sure all text stays within figure bounds.
plt.tight_layout()

# Display the plot.
plt.show()
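One detail worth making explicit is that the mean line in this chart is computed over the top 10 descriptors only, not over every descriptor in the dataset. A minimal standard-library sketch of the same idea, using a hypothetical list of descriptors and a top 3 instead of a top 10:

```python
from collections import Counter
from statistics import mean

# Hypothetical complaint descriptors, one entry per complaint.
descriptors = (
    ["Loud Music/Party"] * 5
    + ["Banging/Pounding"] * 3
    + ["Loud Talking"] * 2
    + ["Barking Dog"]
)

# Keep the three most frequent descriptors, largest first.
top3 = Counter(descriptors).most_common(3)

# The mean is taken over the retained rows only.
avg = mean(count for _, count in top3)
above = [name for name, count in top3 if count > avg]
print(top3, avg, above)
```

Because the reference mean depends on how many descriptors are kept, changing the cutoff from 10 to another value would move the dotted line and could change which bars are colored as above average.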

Spatial Distribution of Top 3 Descriptors

This map draws a random sample of 10,000 noise complaints and plots those matching the three most common descriptors (Loud Music/Party, Banging/Pounding, and Loud Talking) at their reported coordinates. Descriptors were used instead of broad categories to provide more specific detail. The pattern closely mirrors the overall spatial distribution of complaints, with the highest concentrations appearing in Manhattan, Brooklyn, and the Bronx, and noticeably fewer points in Queens and Staten Island.

# Folium map showing where the top 3 descriptors occurred, based on a random sample of 10,000 complaints.

# Identify columns needed for map visuals.
map_columns = ['created_date', 'borough', 'descriptor', 'latitude', 'longitude']

# Create a copy of df with only the selected columns.
map_df = df[map_columns].copy()

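def rgb_to_hex(rgb_tuple):
    """Convert an RGB tuple of 0-1 floats to a hex color code."""
    # Round rather than truncate so floating-point error cannot shift a channel down by one.
    return "#%02x%02x%02x" % tuple(round(x * 255) for x in rgb_tuple)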


# Convert specific color values.
orange = rgb_to_hex((0.8980392156862745, 0.5254901960784314, 0.023529411764705882))
blue = rgb_to_hex((0.36470588235294116, 0.4117647058823529, 0.6941176470588235))
turquoise = rgb_to_hex((0.3215686274509804, 0.7372549019607844, 0.6392156862745098))
lime = rgb_to_hex((0.6, 0.788235294117647, 0.27058823529411763))
pink = rgb_to_hex((0.8, 0.3803921568627451, 0.6901960784313725))

# Top 3 ONLY

# Set color values.
orange = "#e58606"
blue = "#5d69b1"
turquoise = "#52bca3"
lime = "#99c945"
pink = "#cc61b0"

# Set the geographic center of New York City.
center_nyc = [40.6958, -73.9171]

# Create the map.
nyc_map = folium.Map(
    location=center_nyc,
    zoom_start=12,
    width="90%",
    height="100%",
    left="5%",
    right="5%",
    top="0%",
)

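# Create the option for user to utilize different backgrounds.
# Note: "StamenTerrain" resolves only in older folium releases (before 0.15);
# newer releases may require removing it or supplying a custom tile URL and attribution.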
tiles = ["OpenStreetMap", "StamenTerrain", "CartoDB Voyager", "CartoDB Positron"]
for tile in tiles:
    folium.TileLayer(tile).add_to(nyc_map)

folium.LayerControl().add_to(nyc_map)

# Pull a sample of the data to plot.
sample_df = map_df.sample(n=10000, random_state=1)

# Create a dictionary with the descriptor and corresponding color.
color_map = {
    "Loud Music/Party": lime,
    "Banging/Pounding": blue,
    "Loud Talking": pink,
}

# Add the color column based on the color map to the sample_df.
sample_df["color"] = sample_df["descriptor"].map(color_map).fillna(orange)

# Add colored circles for the top 3 descriptors only.
for _, row in sample_df[sample_df["color"] != orange].iterrows():
    try:
        folium.Circle(
            location=[row["latitude"], row["longitude"]],
            tooltip=row["descriptor"],
            popup=f"Date: {row['created_date']}\nBorough: {row['borough']}",
            radius=50,
            color=row["color"],
            fill=True,
            fill_color=row["color"],
            fill_opacity=0.2,
        ).add_to(nyc_map)
    except (ValueError, TypeError):
        # Skip rows whose coordinates folium cannot interpret.
        continue

# Save generated map as html file.
nyc_map.save("nyc_map_top3.html")

# Display map.
nyc_map
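The color handling for the map can be tested separately from folium. The sketch below assumes the same 0-1 RGB tuples used earlier and rounds each channel before formatting. The `color_for` helper is hypothetical and returns `None` for descriptors outside the top three, which is one way to express "do not plot this row".

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-1 floats to a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


# Colors for the three plotted descriptors, built from the RGB tuples above.
palette = {
    "Loud Music/Party": rgb_to_hex((0.6, 0.788235294117647, 0.27058823529411763)),
    "Banging/Pounding": rgb_to_hex((0.36470588235294116, 0.4117647058823529, 0.6941176470588235)),
    "Loud Talking": rgb_to_hex((0.8, 0.3803921568627451, 0.6901960784313725)),
}


def color_for(descriptor):
    """Return the marker color, or None when the descriptor is not plotted."""
    return palette.get(descriptor)


print(palette)
print(color_for("Barking Dog"))
```

Returning `None` instead of a sentinel color avoids comparing against a fallback value when deciding which rows to draw.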

Complaint Distribution by Day and Hour

The heatmap shows the distribution of noise complaints by day and hour. Weekend evenings generate the highest volume, with Saturday between 11 PM and midnight standing out as the peak period. In contrast, the 5 AM hour from Tuesday through Friday has the fewest complaints, marking the quietest time of the week.

# Heatmap

# Create a df with complaint count per hour per day of week. 
df_hour = df.groupby(["Hour", "Day Name"])["Hour"].count().reset_index(name="count")
df_hour = pd.DataFrame(df_hour)

# Rename column.
df_hour = df_hour.rename(columns={"Day Name": "DayName"})

# Re-arrange columns and rows for heatmap.
df_heatmap = df_hour.pivot(index="DayName", columns="Hour", values="count")

# Specify the order of the days.
day_order = [
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday",
]
df_heatmap = df_heatmap.reindex(day_order)

# Heatmap
# Set figure size.
plt.figure(figsize=(16, 8))

# Create the heatmap.
ax = sns.heatmap(
    df_heatmap,
    cmap="rocket_r",
    # cmap="YlGnBu",
    linewidths=0.5,
    cbar_kws={"label": "Number of Noise Complaints", "pad": 0.05},
)

# Make cbar label larger and well-spaced.
cbar = ax.collections[0].colorbar
cbar.ax.tick_params(labelsize=14)
cbar.ax.yaxis.label.set_size(18)
cbar.ax.yaxis.labelpad = 15

# Set x and y axis tick labels and appearance.
ax.set_xticks(np.arange(0.5, 24.5, 2))

ax.set_xticklabels(range(0, 24, 2), fontsize=14, rotation=0)

ax.set_yticklabels(ax.get_yticklabels(), fontsize=14, rotation=0)

# Set x and y axis labels and overall title.
plt.title("Noise Complaints by Day of Week and Hour of Day", fontsize=32)

plt.xlabel("Hour of Day", fontsize=22)

plt.ylabel("Day of Week", fontsize=22)

plt.tight_layout()

# Display the plot.
plt.show()
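The grid behind the heatmap is a count of complaints for every (day name, hour) pair. A small sketch of that tally with the standard library, using three hypothetical timestamps and assuming an English locale for day names:

```python
from collections import Counter
from datetime import datetime

# Hypothetical complaint timestamps.
stamps = [
    "2020-06-20 23:10:00",
    "2020-06-20 23:45:00",
    "2020-06-23 05:30:00",
]

# Count complaints per (day name, hour) pair.
counts = Counter()
for stamp in stamps:
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    counts[(ts.strftime("%A"), ts.hour)] += 1

# Lay the counts out as 7 rows (Monday first) by 24 hourly columns.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
grid = [[counts.get((day, hour), 0) for hour in range(24)] for day in day_order]
print(grid[5][23], grid[1][5])
```

Fixing the row order explicitly matters because day names otherwise sort alphabetically, which would put Friday first and scramble the weekday pattern.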

Borough-Level Complaint Counts

This map visualizes the total number of noise complaints by borough, with color intensity indicating complaint volume—darker shades reflect higher counts. The boroughs rank from most to least complaints as follows: Manhattan, Brooklyn, The Bronx, Queens, and Staten Island, which reports fewer than 200,000 complaints.

# Create df of complaint counts per year for each borough.
df_borough = df.groupby(["Year", "borough"])["Year"].count().reset_index(name="count")
df_borough = pd.DataFrame(df_borough)
df_borough["borough"] = df_borough["borough"].str.capitalize()

# Sum complaint count totals for all years across boroughs.
df_borough_sum = df_borough.groupby("borough")["count"].sum().reset_index()

# Choropleth Map

# Load NYC borough GeoJSON
with open("/Users/sarahsmith/Desktop/Python Data/nyc_boroughs_water.json", "r") as f:
    nyc_geo = json.load(f)

# Clean and prepare the borough df.
df_borough_sum = df_borough_sum[df_borough_sum["borough"] != "Unspecified"].copy()
# Remove spaces and capitalize the first letter of each borough name.
df_borough_sum["borough"] = df_borough_sum["borough"].str.strip().str.title()

# Build dictionary from borough df.
borough_to_count = df_borough_sum.set_index("borough")["count"].to_dict()

# Merge dictionary into the GeoJSON features.
for feature in nyc_geo["features"]:
    boro_name = feature["properties"]["BoroName"]
    feature["properties"]["count"] = borough_to_count.get(boro_name, 0)

# Set geographic center of NYC.
center_nyc = [40.6958, -73.9171]

# Create map with folium.
nyc_map2 = folium.Map(
    location=center_nyc,
    zoom_start=10,
    tiles="cartodbpositron",
    width="90%",
    height="100%",
    left="5%",
    right="5%",
    top="0%",
)

# Create choropleth map.
ch_map = folium.Choropleth(
    geo_data=nyc_geo,
    data=df_borough_sum,
    columns=["borough", "count"],
    key_on="feature.properties.BoroName",
    fill_color="BuPu",
    fill_opacity=0.7,
    line_opacity=0.4,
    legend_name="Noise Complaint Count by Borough",
    highlight=True,
).add_to(nyc_map2)

# Create and customize hover tooltip.
ch_map.geojson.add_child(
    folium.features.GeoJsonTooltip(
        fields=["BoroName", "count"],
        aliases=["Borough:", "Complaints:"],
        localize=True,
        labels=True,
        style=(
            "background-color: black; color: white; font-weight: bold;"
            "padding: 4px 6px; border-radius: 4px;"
            "line-height: 1.2; display: inline-block;"
        ),
        sticky=False,
    )
)

# Save the map.
nyc_map2.save("nyc_map_choropleth.html")

# Display the map.
nyc_map2
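Joining the counts to the GeoJSON depends on borough names matching exactly. The sketch below shows the normalize-then-look-up pattern on a tiny, hypothetical feature collection (the `geo` dictionary and `raw_counts` values are invented and much simpler than the real boundary file).

```python
# Hypothetical stand-in for the borough GeoJSON.
geo = {
    "features": [
        {"properties": {"BoroName": "Bronx"}},
        {"properties": {"BoroName": "Staten Island"}},
        {"properties": {"BoroName": "Queens"}},
    ]
}

# Hypothetical counts keyed by inconsistently formatted borough names.
raw_counts = {"BRONX": 120, " staten island ": 30}

# Normalize spacing and capitalization so the keys match BoroName.
counts = {name.strip().title(): n for name, n in raw_counts.items()}

# Attach a count to every feature, defaulting to 0 when no match is found.
for feature in geo["features"]:
    props = feature["properties"]
    props["count"] = counts.get(props["BoroName"], 0)

print([f["properties"] for f in geo["features"]])
```

The default of 0 keeps the tooltip from failing on unmatched names, but it can also hide a naming mismatch, so checking for unexpected zeros after the join is worthwhile.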

Relative Noisiness by Borough

These plots compare borough-level noisiness in 2010 (bottom) and 2020 (top). Each point represents a borough, with population on the x-axis, complaint count on the y-axis, and bubble size showing complaints per 1,000 residents. This metric is calculated by dividing the total number of complaints by the borough’s population and multiplying by 1,000. From 2010 to 2020, census population grew in every borough while noise complaints rose sharply. Staten Island’s complaints nearly doubled, and Queens, Manhattan, and Brooklyn each saw increases of more than 100,000, with the Bronx showing the largest jump at about 180,000. By 2020, the Bronx has the largest bubble, indicating the highest relative noisiness, while Brooklyn and Manhattan show slight decreases in noisiness relative to their population.

# Bubble plots for 2010 and 2020 for the relative noisiness of each borough.

# Extract complaint count totals for 2010 only for scatterplot.
df_borough_2010 = df_borough[df_borough["Year"] == 2010].copy()

# Extract complaint count totals for 2020 only for scatterplot.
df_borough_2020 = df_borough[df_borough["Year"] == 2020].copy()

# Create df of 2020 values.
df_decennial = pd.read_excel("/Users/sarahsmith/Desktop/Python Data/nyc-decennialcensusdata_v2.xlsx")
df_decennial_sum = df_decennial.groupby("GeogName")["Pop1"].sum().reset_index()

# Create df of 2010 values.
df_decennial2 = pd.read_excel("/Users/sarahsmith/Desktop/Python Data/nyc_decennialcensusdatachange-core-geographiesv2.xlsx", sheet_name=1)
df_decennial2_sum = df_decennial2.groupby("Borough")["Pop1"].sum().reset_index()

# Rename columns 
df_decennial_sum.rename(columns={"GeogName": "Borough"}, inplace=True)
df_borough_2020.rename(columns={"borough": "Borough"}, inplace=True)
df_borough_2010.rename(columns={"borough": "Borough"}, inplace=True)

# Standardize spacing and capitalization.
df_borough_2020["Borough"] = df_borough_2020["Borough"].str.strip().str.title()
df_decennial_sum["Borough"] = df_decennial_sum["Borough"].str.strip().str.title()

# Standardize spacing and capitalization.
df_borough_2010["Borough"] = df_borough_2010["Borough"].str.strip().str.title()
df_decennial2_sum["Borough"] = df_decennial2_sum["Borough"].str.strip().str.title()

# Merge population data with borough complaint counts.
df_population = pd.merge(df_borough_2020, df_decennial_sum, on="Borough", how="inner")

# Calculate the rate of noise complaints per 1000 residents.
df_population["complaints_per_1000"] = df_population["count"] / df_population["Pop1"] * 1000
df_population["complaints_per_1000"] = df_population["complaints_per_1000"].round(0).astype(int)

# Merge the 2010 borough complaint counts with the 2010 population data.
df_population2 = pd.merge(df_borough_2010, df_decennial2_sum, on="Borough", how="inner")

# Calculate the rate of noise complaints per 1000 residents.
df_population2["complaints_per_1000"] = (
    df_population2["count"] / df_population2["Pop1"] * 1000
)
df_population2["complaints_per_1000"] = (
    df_population2["complaints_per_1000"].round(0).astype(int)
)

# Assign color values.
colors = Safe_5.mpl_colors

# Create two stacked subplots 2010 on the bottom, and 2020 on the top, sharing the x-axis.
fig, (ax2, ax1) = plt.subplots(2, 1, figsize=(18, 18), sharex=True)

# Plot 2010 scatterplot.
sns.scatterplot(
    data=df_population2,
    x="Pop1",
    y="count",
    size="complaints_per_1000",
    alpha=0.8,
    sizes=(600, 2400),
    hue="Borough",
    palette=colors,
    ax=ax1,
    legend=False,
)

# Set x and y labels and overall title for plot 1.
ax1.set_title(
    "Relative Noisiness by Borough in 2010\n(Complaints per 1,000 Residents)",
    fontsize=20,
)

ax1.set_xlabel("Population", fontsize=18)

ax1.set_ylabel("Noise Complaint Count", fontsize=18)

# Plot 2020 scatterplot.
sns.scatterplot(
    data=df_population,
    x="Pop1",
    y="count",
    size="complaints_per_1000",
    alpha=0.8,
    sizes=(600, 2400),
    hue="Borough",
    palette=colors,
    ax=ax2,
    legend="full",
)

# Set x and y labels and overall title for the 2020 plot.
ax2.set_title(
    "Relative Noisiness by Borough in 2020\n(Complaints per 1,000 Residents)",
    fontsize=20,
)

ax2.set_xlabel("Population", fontsize=18)

ax2.set_ylabel("Noise Complaint Count", fontsize=18)

# Format the tick labels for both subplots and set grid to be visible.
for ax in [ax1, ax2]:
    ax.xaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))
    ax.yaxis.set_major_locator(ticker.MaxNLocator(nbins=8))
    ax.yaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))
    ax.tick_params(axis="x", which="major", length=8, labelsize=13)
    ax.tick_params(axis="y", which="major", length=8, labelsize=13)
    ax.set_facecolor("whitesmoke")
    ax.spines["top"].set_visible(True)
    ax.spines["right"].set_visible(True)
    ax.grid(True, color="darkgray", linewidth=0.5, alpha=0.7, zorder=0)
    for artist in ax.collections:
        artist.set_zorder(3)

# Remove the size legend and create only the borough legend, shown on the 2020 plot.
handles, labels = ax2.get_legend_handles_labels()
filtered = [
    (h, l)
    for h, l in zip(handles, labels)
    if not l.replace(".", "", 1).isdigit() and l.lower() != "complaints_per_1000"
]
handles, labels = zip(*filtered)
ax2.legend(
    handles,
    labels,
    fontsize=14,
    loc="upper right",
    frameon=True,
    facecolor="#FEFEFE",
    edgecolor="lightgray",
    fancybox=True,
    framealpha=0.9,
    markerscale=2.5,
)

plt.tight_layout()

# Display the plot.
plt.show()
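The complaints-per-1,000 metric is easiest to see with a worked example. The helper name and both sets of figures below are hypothetical. The calculation follows the same divide, multiply by 1,000, and round steps as the borough code.

```python
def complaints_per_1000(complaints, population):
    """Complaints per 1,000 residents, rounded to a whole number."""
    return round(complaints / population * 1000)


# Hypothetical borough A: more complaints, but a much larger population.
rate_a = complaints_per_1000(123_456, 1_500_000)

# Hypothetical borough B: fewer complaints in a smaller population.
rate_b = complaints_per_1000(60_000, 500_000)

print(rate_a, rate_b)
```

Borough B files fewer than half as many complaints as borough A yet has the higher rate, which is the distinction the bubble sizes are meant to convey.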

Reporting Modes for Noise Complaints

This visualization shows how frequently different reporting modes were used over time. Mobile reporting first appears in 2013 and grows steadily afterward. Both online and mobile reports rise sharply during the COVID period (2020–2023), with some fluctuations but an overall upward trend. Online reporting experiences a noticeable drop after 2023, followed by a gradual recovery, while mobile reporting continues a consistent upward climb from 2023 onward.

# Create a df with just the reporting modes and count.
df_report_type = (
    df.groupby(["Year", "open_data_channel_type"])["open_data_channel_type"]
    .count()
    .reset_index(name="count")
)

# Drop values that are unknown.
df_report_type = df_report_type[
    df_report_type["open_data_channel_type"] != "UNKNOWN"
].copy()

# Change from all caps to first letter capitalization.
df_report_type["open_data_channel_type"] = (
    df_report_type["open_data_channel_type"].str.strip().str.title()
)

# Remove values that are other.
df_report_type = df_report_type[
    df_report_type["open_data_channel_type"] != "Other"
].copy()

# Multiple Line Plot

# Set fig and ax size.
fig, ax = plt.subplots(figsize=(18, 10))

# Plot the lineplot with Seaborn.
sns.lineplot(
    data=df_report_type,
    x="Year",
    y="count",
    hue="open_data_channel_type",
    palette=Safe_3.mpl_colors,
    linewidth=3,
    marker="8",
    ax=ax,
)

# Format ticks
ax.tick_params(axis="x", labelsize=18)
ax.tick_params(axis="y", labelsize=18)

# Format x-axis
ax.set_xticks(sorted(df_report_type["Year"].unique()))

ax.set_xticklabels(sorted(df_report_type["Year"].unique()))

# Format y-axis
ax.yaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))

# Set and format x and y axis labels and overall title.
plt.title("Noise Complaints by Mode of Reporting (2010–2025)", fontsize=36)

ax.set_xlabel("Year", fontsize=28)

ax.set_ylabel("Count", fontsize=28, labelpad=25)

# Get values for the legend and make the lines slightly thicker for better visibility.
handles, labels = ax.get_legend_handles_labels()
for h in handles:
    h.set_linewidth(5.0)

# Format legend text, placement and spacing.
plt.legend(
    handles,
    labels,
    title="Reporting Mode",
    title_fontsize=20,
    fontsize=18,
    loc="upper left",
    frameon=False,
    borderpad=1.2,
    labelspacing=0.8,
)

# Set plot style.
ax.grid(True, color="darkgray", linewidth=.5, alpha=0.9)
plt.tight_layout()

# Display plot.
plt.show()
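The cleanup applied to the reporting modes (trim, title-case, then drop the unknown and other categories) can be sketched on a few hypothetical rows. The `EXCLUDED` set and the sample values are illustrative assumptions, not values read from the dataset.

```python
# Hypothetical (year, channel, count) rows with inconsistent formatting.
rows = [
    (2013, "MOBILE", 40),
    (2013, "PHONE", 900),
    (2013, "UNKNOWN", 12),
    (2013, " online ", 150),
    (2013, "OTHER", 3),
]

# Categories to leave out of the line plot, in their normalized spelling.
EXCLUDED = {"Unknown", "Other"}

cleaned = []
for year, mode, count in rows:
    label = mode.strip().title()
    if label not in EXCLUDED:
        cleaned.append((year, label, count))

print(cleaned)
```

Normalizing before filtering means a single exclusion list covers every spelling variant of the same category.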

Daily Trend in Complaint Counts

Because COVID had such a profound impact on daily life in 2020, this chart visualizes the day-to-day noise complaint counts alongside a 30-day rolling average. The rolling average smooths short-term fluctuations by averaging each day’s count with its neighbors in a centered 30-day window (roughly 15 days before and 14 days after). This highlights broader trends that may be harder to see in the raw daily counts. In 2020, April 9 shows a notable dip in complaints, followed by a steep rise from June through September. Two clear peaks stand out: June 21, 2020, with more than 12,500 complaints, and October 10, 2020, at just under 10,000 complaints, with pronounced declines between them. After October 10, overall complaint levels gradually trend downward until mid-2021, when new peaks appear in June and July of 2021.
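How a centered window lines up with the data can be seen on a short series. The sketch below is a plain-Python approximation written for illustration (`centered_rolling_mean` is a hypothetical helper). It is intended to mirror the alignment of a pandas rolling mean with `center=True`, where an even window takes one more row before the current day than after it; that alignment is an assumption worth confirming against the pandas documentation.

```python
def centered_rolling_mean(values, window):
    """Centered rolling mean; positions without a full window yield None."""
    before = window // 2        # rows taken before the current position
    after = (window - 1) // 2   # rows taken after the current position
    result = []
    for i in range(len(values)):
        lo, hi = i - before, i + after
        if lo < 0 or hi >= len(values):
            result.append(None)  # not enough neighbors on one side
        else:
            result.append(sum(values[lo:hi + 1]) / window)
    return result


# Hypothetical daily counts with a window of 4 for readability.
smoothed = centered_rolling_mean([1, 2, 3, 4, 5, 6], window=4)
print(smoothed)
```

With a window of 30 the same arithmetic gives 15 rows before and 14 after each day, and the first and last two weeks of the selected period have no rolling value. The window counts rows, so it spans 30 calendar days only if every date is present in the daily table.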

# Daily Noise Complaint Trend with 30-Day Rolling Average (2020-2021)

# Create a df with daily complaint counts from 2010 to 2025.
df_daily = df.groupby(["Year", "Date_only"]).size().reset_index(name="Count")

# Confirm dates are pandas datetime objects.
df_daily["Date_only"] = pd.to_datetime(df_daily["Date_only"])

# Store each date as its ordinal day number (an integer count of days in the proleptic Gregorian calendar).
df_daily["Date_num"] = df_daily["Date_only"].apply(lambda x: x.toordinal()).astype(int)

# Use df_daily to create a df with monthly complaint counts.
df_monthly = df_daily.set_index("Date_only").resample("MS")["Count"].sum().reset_index()

# Use df_monthly to create a df with yearly complaint counts.
df_yearly = (
    df_monthly.groupby(df_monthly["Date_only"].dt.year)["Count"]
    .sum()
    .reset_index(name="yearly_count")
)

# Convert the integer year held in Date_only to a January 1 timestamp so it can be plotted on a date axis.
df_yearly["Date_only"] = pd.to_datetime(df_yearly["Date_only"].astype(str) + "-01-01")

# Create a df with daily complaint counts with only the years 2020 and 2021.
selected_years = [2020, 2021]
df_year_daily = df_daily[df_daily["Year"].isin(selected_years)]

# Daily Noise Complaint Trend with Rolling 30 Average (2020-2021)

# Create copy of df_year_daily.
df_year_daily = df_year_daily.copy()

# Calculate rolling average.
df_year_daily["Rolling30"] = (
    df_year_daily["Count"].rolling(window=30, center=True).mean()
)

# Establish figure, ax and size.
fig, ax = plt.subplots(figsize=(18, 12))

# Create daily line plot.
sns.lineplot(
    df_year_daily,
    x="Date_only",
    y="Count",
    color=(0.5333333333333333, 0.8, 0.9333333333333333),
    alpha=1.0,
    label="Daily Complaints",
)

# Create rolling 30 line plot.
sns.lineplot(
    df_year_daily,
    x="Date_only",
    y="Rolling30",
    color=(0.8, 0.4, 0.4666666666666667),
    lw=2.5,
    label="30-Day Rolling Avg",
)

# Prepare axes, labels and ticks.
# Date_only is a datetime column, so the x-axis already uses Matplotlib date
# units; the locators and formatters below set tick positions and labels.
ax.tick_params(axis="x", which="major", labelrotation=45)

# Set major ticks once per year.
major_locator = mdates.YearLocator()
major_formatter = mdates.DateFormatter("%b %Y")

# Set minor ticks once per quarter and format as month name and year.
minor_locator = mdates.MonthLocator(bymonth=[1, 4, 7, 10])
minor_formatter = mdates.DateFormatter("%b %Y")

# Set placement of major and minor ticks and implement display formatting.
ax.xaxis.set_major_locator(major_locator)
ax.xaxis.set_major_formatter(major_formatter)
ax.xaxis.set_minor_locator(minor_locator)
ax.xaxis.set_minor_formatter(minor_formatter)

# Set how the axes labels will appear.
ax.tick_params(axis="x", which="major", length=8, labelsize=18)
ax.tick_params(axis="x", which="minor", length=4, labelsize=13, labelrotation=45)
ax.tick_params(axis="y", which="major", length=8, labelsize=18)
ax.yaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))

# Set and format title and axes labels.
plt.title(
    "Daily Noise Complaint Counts with 30-Day Rolling Average\n(2020-2021)", fontsize=36
)

plt.xlabel("Date", fontsize=28)

plt.ylabel("Complaint Count", fontsize=28)

# Format and position legend for aesthetics and optimal visibility.
plt.legend(
    fontsize=18,  # text size
    title="Trend Type",  # optional legend title
    title_fontsize=18,  # title size
    loc="upper right",  # legend position
    frameon=False,  # remove box if you prefer minimal look
    labelspacing=0.8,  # spacing between lines
)

# Set the plot style.
ax.grid(True, color="darkgray", linewidth=.5, alpha=0.9)

# Keep text within figure bounds.
plt.tight_layout()

# Display the plot.
plt.show()

Interactive: Daily Trend in Complaint Counts

This figure displays the same daily trend and 30-day rolling average shown in the previous plot, but in an interactive format. Hover over the line to view exact complaint counts for each day.

# Interactive - Daily trend during 2020-2021 with rolling 30 day average.

# Recalculate the 30-day rolling average, rounded to whole complaints for the hover labels.
df_year_daily["Rolling30"] = (
    df_year_daily["Count"].rolling(window=30, center=True).mean().round(0)
)

# Melt the dataframe so both lines plot easily
df_melt = df_year_daily.melt(
    id_vars="Date_only",
    value_vars=["Count", "Rolling30"],
    var_name="Trend Type",
    value_name="Complaint Count",
)
# Take two hex colors from the Safe_3 palette. The list is named line_colors
# so it does not shadow the matplotlib `colors` module imported earlier.
line_colors = [Safe_3.hex_colors[0], Safe_3.hex_colors[1]]

# Create the interactive line plot
fig = px.line(
    df_melt,
    x="Date_only",
    y="Complaint Count",
    color="Trend Type",
    color_discrete_sequence=line_colors,
    labels={
        "Date_only": "Date",
        "Complaint Count": "Complaint Count",
        "Trend Type": "Trend Type",
    },
    title="Daily Noise Complaint Counts with 30-Day Rolling Average (2020–2021)",
)

# Customize the two traces: daily line (thin) and rolling 30-day line (thicker)
fig.update_traces(
    selector=dict(name="Count"),  # Daily complaints
    line=dict(width=0.85, dash="solid"),  # thinner solid line
    opacity=0.55,  # light transparency for daily noise
)

fig.update_traces(
    selector=dict(name="Rolling30"),  # Rolling average
    line=dict(width=1.75, dash="solid"),  # thicker line
    opacity=0.9,  # higher visibility
)

# Format labels and overall title, major ticks, legend and overall plot style.
fig.update_layout(
    title_font=dict(size=16),
    xaxis=dict(
        title="Date",
        tickformat="%b %d %Y",
        tickfont=dict(size=10),
        tickangle=315,
        dtick="M3",
    ),
    yaxis=dict(title="Complaint Count", tickformat=",", tickfont=dict(size=10)),
    legend=dict(
        font=dict(size=8), x=0.02, y=0.98, bgcolor="rgba(255,255,255,0)", borderwidth=0
    ),
    template="plotly_white",
    width=800,
    height=460,
)

# Add minor ticks representing each month with dotted grid lines.
fig.update_xaxes(
    minor=dict(
        dtick="M1", showgrid=True, gridcolor="lightgray", gridwidth=0.5, griddash="dot"
    )
)

# Add date formatting to the hover labels, and revise how complaint count is displayed.
fig.update_traces(hovertemplate="%{x|%b %d, %Y}<br>Complaints: %{y:,.0f}")

# Display the plot.
fig.show()

Monthly and Yearly Trends in Complaints

This chart displays yearly and monthly noise complaint trends together, showing a clear upward trajectory over time. At the yearly level, 2011 has the lowest total at roughly 175,000 complaints, while 2020 stands out as the peak year with approximately 800,000 complaints.

The monthly data reveal dramatic shifts as well. One of the most notable increases occurs from April 2020 (≈17,948 complaints) to June 2020 (≈127,000), reflecting the sharp rise in complaints during the early months of the pandemic. Among the lowest monthly counts, November 2017 saw around 5,500 complaints and December 2024 approximately 12,500, illustrating how the overall baseline has risen over the years.
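
Figures like these can be read directly off the aggregated data rather than estimated from the chart. The sketch below uses a small made-up frame (toy_daily and its values are hypothetical, not the 311 data) to show one way to roll daily counts up to monthly and yearly totals and then locate the peaks.

```python
import pandas as pd

# Hypothetical daily counts for illustration only (not the 311 data).
toy_daily = pd.DataFrame(
    {
        "Date_only": pd.to_datetime(
            ["2020-04-01", "2020-04-15", "2020-06-10", "2020-06-21", "2021-01-05"]
        ),
        "Count": [100, 150, 900, 1200, 300],
    }
)

# Monthly totals, labeled by the first day of each month ("MS" = month start).
toy_monthly = toy_daily.set_index("Date_only").resample("MS")["Count"].sum()

# Yearly totals derived from the monthly totals.
toy_yearly = toy_monthly.groupby(toy_monthly.index.year).sum()

# Locate the busiest month and the busiest year.
peak_month = toy_monthly.idxmax()
peak_year = toy_yearly.idxmax()

print(peak_month.strftime("%b %Y"), int(toy_monthly.max()))
print(peak_year, int(toy_yearly.max()))
```

One caveat when looking for minimums the same way: resample fills months that have no rows with a total of 0, so an idxmin lookup may point at an empty month rather than a genuinely quiet one.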

# Static - Monthly and Yearly Complaint Trends (2010-2025)

# Establish figure and ax size.
fig, ax = plt.subplots(figsize=(18, 12))

# Plot the monthly trend line.
plt.plot(
    df_monthly["Date_only"],
    df_monthly["Count"],
    color=(0.8, 0.38, 0.69),
    alpha=0.9,
    label="Monthly Total",
)

# Plot the yearly trend line.
plt.plot(
    df_yearly["Date_only"],
    df_yearly["yearly_count"],
    color=(0.894, 0.325, 0.047),
    linewidth=3.5,
    marker="o",
    label="Yearly Total",
)

# Set placement and how x and y ticks will appear.
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
# Place minor ticks every 3 months so they fall between the 6-month major ticks.
ax.xaxis.set_minor_locator(mdates.MonthLocator(interval=3))
ax.yaxis.set_major_formatter(ticker.StrMethodFormatter("{x:,.0f}"))

# Set tick length, label size and label rotation.
ax.tick_params(axis="x", labelrotation=30, labelsize=14)
ax.tick_params(axis="x", which="major", length=8, labelsize=16)
ax.tick_params(axis="x", which="minor", length=4, labelsize=12)
ax.tick_params(axis="y", which="major", length=8, labelsize=16)

# Set and format labels and title.
plt.title("Monthly and Yearly Noise Complaint Trends (2010–2025)", fontsize=29)

plt.xlabel("Month", fontsize=22)

plt.ylabel("Number of Complaints", fontsize=22)

# Format and position legend for aesthetics and optimal visibility.
plt.legend(
    fontsize=16,  # text size
    title="Trend Type",  # optional legend title
    title_fontsize=18,  # title size
    loc="upper left",  # legend position
    frameon=False,  # remove box if you prefer minimal look
    labelspacing=0.8,  # spacing between lines
)

# Set grid style.
ax.grid(True, color="darkgray", linewidth=.5, alpha=0.9)

# Keep text within figure bounds.
plt.tight_layout()

# Display the plot.
plt.show()

Interactive: Monthly and Yearly Trends in Complaints

This interactive version presents the same yearly and monthly trends as the static chart. Hover over points or lines to view exact complaint counts for each year and month.

# Interactive Monthly and Yearly Trends

# Confirm that Date_only is datetime
df_monthly["Date_only"] = pd.to_datetime(df_monthly["Date_only"])
df_yearly["Date_only"] = pd.to_datetime(df_yearly["Date_only"])

# Take two hex colors from the Safe_7 palette. The list is named trend_colors
# so it does not shadow the matplotlib `colors` module imported earlier.
trend_colors = [Safe_7.hex_colors[5], Safe_7.hex_colors[6]]

# Plot the monthly line
fig = px.line(
    df_monthly,
    x="Date_only",
    y="Count",
    color_discrete_sequence=[trend_colors[0]],
    labels={"Date_only": "Month", "Count": "Number of Complaints"},
    title="Monthly and Yearly Noise Complaint Trends (2010–2025)",
)

# Rename the monthly line and add it to the legend.
fig.data[0].name = "Monthly Total"
fig.data[0].showlegend = True

# Plot the yearly trend.
fig.add_scatter(
    x=df_yearly["Date_only"],
    y=df_yearly["yearly_count"],
    mode="lines+markers",
    line=dict(color=trend_colors[1], width=2.5),
    name="Yearly Total",
)

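# Format the text and appearance of the hover tooltip. The yearly trace shows
# only the year, since its points are plotted at January 1 of each year.
fig.update_traces(
    selector=dict(name="Monthly Total"),
    hovertemplate="<b>%{x|%b %Y}</b><br>Complaints: %{y:,.0f}",
)
fig.update_traces(
    selector=dict(name="Yearly Total"),
    hovertemplate="<b>%{x|%Y}</b><br>Complaints: %{y:,.0f}",
)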

# Specify and format x and y axes, legend and plot style.
fig.update_layout(
    title_font=dict(size=24),
    xaxis=dict(
        title="Month",
        tickformat="%b %Y",
        tickangle=315,
        tickfont=dict(size=11),
        tickmode="linear",
        dtick="M12",
        showgrid=True,
        gridcolor="lightgray",
        showline=True,
    ),
    yaxis=dict(
        title="Number of Complaints",
        tickformat=",",
        tickfont=dict(size=11),
        showgrid=True,
        gridcolor="lightgray",
    ),
    legend=dict(
        font=dict(size=10), x=0.02, y=0.98, bgcolor="rgba(255,255,255,0)", borderwidth=0
    ),
    template="plotly_white",
    width=800,
    height=460,
)

# Add minor ticks representing every six months with dotted grid lines.
fig.update_xaxes(
    minor=dict(
        dtick="M6", showgrid=True, gridcolor="lightgray", gridwidth=0.5, griddash="dot"
    )
)

fig.show()

Conclusion

Across all visualizations, a consistent story emerges: NYC noise complaints have increased over time, with 2020 representing a major turning point. Residential and Street/Sidewalk disturbances dominate the dataset, reflected both in broad categories and in specific descriptors. Spatial patterns show the highest concentrations in Manhattan, Brooklyn, and the Bronx. COVID-era shifts produced sharp spikes in complaints, changes in noise source proportions, and elevated baseline levels that continue into recent years. Taken together, the findings suggest that noise in NYC is not only persistent but growing, both in volume and impact.