Environmental Citations in Baltimore (2004-2023)

Introduction

This analysis examines the environmental citations issued in the city of Baltimore from 2004 to 2023. It shows when the most citations were issued, both across the years and within a typical year, and which areas of Baltimore receive the most citations. Additional graphs show the total dollar amount of fines issued along with the average fine. This data could be useful for people thinking of moving to Baltimore who want to understand how large fines are, where citations are most common, and which violations occur most often. It can also help current residents maintain their property and avoid violations in their daily activities. Finally, it may help the Baltimore city government understand how much money environmental citations generate.

Dataset

The data used in this analysis was taken from the Baltimore City Government website, which provides a wide range of public data, from government spending to traffic violations. The environmental citations data set includes variables such as ViolationDate (the date and time the citation was issued), Agency (the Baltimore city agency that issued it), and FineAmount (the dollar value of the citation). From these variables I was able to calculate the total number of citations by agency, neighborhood, and other groupings. I also created new variables, such as the average fine per citation and the name of the month in which each violation occurred. Many of the charts below depend on the date, so to display days and months correctly I created separate variables for the weekday, month number, month name, and quarter. The findings below rely mostly on these time-based variables, the fine amounts, and the approximate location (neighborhood) of each violation.
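The derived time variables described above can be sketched as a single helper. This is a minimal illustration rather than the code used for the charts: `add_time_parts` is a hypothetical name, the three timestamps are made up, and it assumes ViolationDate strings follow the `YYYY/MM/DD HH:MM:SS+00` layout that the chart code parses. Weekday and month abbreviations come from `strftime`, so they depend on the system locale.

```python
import pandas as pd


def add_time_parts(df: pd.DataFrame, col: str = "ViolationDate") -> pd.DataFrame:
    """Return a copy of df with the calendar columns used by the charts.

    Assumption: `col` holds strings shaped like 'YYYY/MM/DD HH:MM:SS+00'.
    """
    out = df.copy()
    # %H reads a 24-hour clock; '+00' is matched as literal text
    out[col] = pd.to_datetime(out[col], format="%Y/%m/%d %H:%M:%S+00")
    out["Year"] = out[col].dt.year
    out["Month"] = out[col].dt.month                 # 1-12, sorts in calendar order
    out["MonthName"] = out[col].dt.strftime("%b")    # 'Jan', 'Feb', ... (locale-dependent)
    out["WeekDay"] = out[col].dt.strftime("%a")      # 'Mon', 'Tue', ... (locale-dependent)
    out["Quarter"] = "Quarter " + out[col].dt.quarter.astype(str)
    return out


# Hypothetical sample rows, not taken from the real data set
sample = pd.DataFrame({"ViolationDate": ["2010/07/15 04:00:00+00",
                                         "2010/08/02 13:30:00+00",
                                         "2023/01/07 00:05:00+00"]})
result = add_time_parts(sample)
print(result[["Year", "Month", "MonthName", "WeekDay", "Quarter"]])
```

Keeping the numeric Month next to MonthName is useful because month names sort alphabetically rather than in calendar order.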

Findings

Scatter Plot

This scatter plot shows the total number of citations issued in each month of each year. The data begins in October 2004 and ends in May 2023. As the plot below shows, the largest volume of citations occurred in the later part of 2010, with July and August of that year being the highest months in the entire 19-year span. Although most other points on the plot are smaller than those for 2010, there does appear to be an uptick in citations when the weather is warmer. Notably, the most frequent violation was overgrown grass, which is much more likely to occur during the spring and summer. Potential and current Baltimore residents should pay extra attention to their property during these high-volume months.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")

# Folder that holds the downloaded CSV
path = "C:/users/rzang/OneDrive/Documents/Python_files"

# A '/' is needed between the folder and the file name
filename = path + '/Env_Citations.csv'

# Only the violation timestamp is needed for this chart
df = pd.read_csv(filename, usecols=['ViolationDate'])
# Timestamps are stored as YYYY/MM/DD HH:MM:SS+00; %H reads the 24-hour clock
df['ViolationDate'] = pd.to_datetime(df['ViolationDate'], format='%Y/%m/%d %H:%M:%S+00')
df['Year'] = df['ViolationDate'].dt.year
df['Month'] = df['ViolationDate'].dt.month
df['MonthName'] = df['ViolationDate'].dt.strftime('%B')

# Number of citations for every (Year, Month) pair
x2 = df.groupby(['Year', 'Month']).size().reset_index(name='count')

plt.figure(figsize=(18,10))

# Marker color and marker size both scale with the monthly citation count
plt.scatter(x2['Month'], x2['Year'], marker='o', cmap='summer', c=x2['count'], s=x2['count'], edgecolors='black')

plt.title('Citations by Month and Year', fontsize=18)
plt.xlabel('Months of the Year', fontsize=14)
plt.ylabel('Year', fontsize=14)

cbar = plt.colorbar()
cbar.set_label('Number of Citations', rotation=270, fontsize=14, color='black', labelpad=30)

# Colorbar ticks every 100 citations
my_colorbar_ticks = [*range(0, x2['count'].max(), 100)]
cbar.set_ticks(my_colorbar_ticks)

my_x_ticks = [*range(x2['Month'].min(), x2['Month'].max()+1, 1)]
plt.xticks(my_x_ticks, fontsize=14, color='black')

my_y_ticks = [*range(x2['Year'].min(), x2['Year'].max()+1, 1)]
plt.yticks(my_y_ticks, fontsize=14, color='black')

plt.show()
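The counting step behind the scatter plot can be checked on a few rows without loading the full file. A minimal sketch, assuming a datetime Series like `df['ViolationDate']`; `monthly_counts` is a hypothetical helper and the dates are invented, so the result says nothing about the real data.

```python
import pandas as pd


def monthly_counts(dates: pd.Series) -> pd.DataFrame:
    """Count citations per (Year, Month) from a datetime Series."""
    frame = pd.DataFrame({"Year": dates.dt.year, "Month": dates.dt.month})
    return frame.groupby(["Year", "Month"]).size().reset_index(name="count")


# Invented timestamps for illustration only
dates = pd.to_datetime(pd.Series([
    "2010-07-03", "2010-07-19", "2010-07-28",
    "2010-08-11", "2010-08-25",
    "2011-01-14",
]))
counts = monthly_counts(dates)
# Row with the largest monthly count
peak = counts.loc[counts["count"].idxmax()]
print(counts)
print(f"Busiest month: {int(peak['Year'])}-{int(peak['Month']):02d} "
      f"with {int(peak['count'])} citations")
```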

Bar Chart

The bar chart below shows the neighborhoods in Baltimore with the most environmental citations, along with the average fine in each neighborhood. The Broadway East neighborhood had the most citations by a very wide margin, with 6,277 citations and an average fine of $297.75. It was excluded from this graph to keep the visualization readable, so the chart shows the next ten neighborhoods by citation count. Of these, the two with the most violations are Coldstream Homestead and Central Park Heights, and the two with the highest average fines are Sandtown-Winchester and East Baltimore Midway. These could be areas that have more oversight or are more prone to certain violations.

# Reload the columns needed for the remaining charts
df = pd.read_csv(filename, usecols=['Neighborhood', 'FineAmount', 'ViolationDate'])
df['ViolationDate'] = pd.to_datetime(df['ViolationDate'], format='%Y/%m/%d %H:%M:%S+00')
df['Hour'] = df.ViolationDate.dt.hour
df['day'] = df.ViolationDate.dt.day
df['Month'] = df.ViolationDate.dt.month
df['Year'] = df.ViolationDate.dt.year
df['WeekDay'] = df.ViolationDate.dt.strftime('%a')
df['MonthName'] = df.ViolationDate.dt.strftime('%b')

# Citation count, total fines and average fine for each neighborhood
x = (df.groupby('Neighborhood')
       .agg(count=('FineAmount', 'size'),
            TotalFines=('FineAmount', 'sum'),
            AverFine=('FineAmount', 'mean'))
       .reset_index())
x = x.sort_values('count', ascending=False)
x.reset_index(inplace=True, drop=True)
# Row 0 is Broadway East (the outlier); .loc is inclusive, so rows 1-10 are the next ten
d1 = x.loc[1:10]
def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars: 
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, symbol+format(height, place_of_decimals),
                    fontsize=11, color='black', ha='center', va='bottom')

fig = plt.figure(figsize=(18, 10))
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(10)
count_bars = ax1.bar(x_pos-(0.5*bar_width), d1['count'], bar_width, color='gray', edgecolor='black', label='Citation Count')
aver_fine_bars = ax2.bar(x_pos+(0.5*bar_width), d1['AverFine'], bar_width, color='green', edgecolor='black', label='Average Fine')


ax1.set_ylabel('Count of Violations', fontsize=18, labelpad=20)
ax2.set_ylabel('Average Fine', fontsize=18, rotation=270, labelpad=20)
ax1.tick_params(axis='y', labelsize=14)
ax2.tick_params(axis='y', labelsize=14)

plt.title('Violation Count and Average Fine\n Top 10 Neighborhoods', fontsize=18)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(d1['Neighborhood'], fontsize=14, rotation=315)

count_color, count_label = ax1.get_legend_handles_labels()
fine_color, fine_label = ax2.get_legend_handles_labels()
combined_handles = count_color + fine_color
combined_labels = count_label + fine_label
legend = ax1.legend(combined_handles, combined_labels, loc= 'upper right', frameon=True, ncol=1, shadow=True,
                    borderpad=1, fontsize=14)

ax1.set_ylim(0, d1['count'].max()*1.50)

ax2.set_ylim(0, d1['AverFine'].max()*1.50)

autolabel(count_bars, ax1, '.0f', '')
autolabel(aver_fine_bars, ax2, '.2f', '$')

plt.show()
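The neighborhood ranking, including the step that drops Broadway East, can be written as one function. A sketch under stated assumptions: `rank_neighborhoods` and its `skip`/`top` parameters are hypothetical, and the neighborhoods "A" through "D" with their fines are placeholders. Because `x.loc[1:10]` in the code above is label-inclusive, it returns ten rows; `iloc[skip:skip + top]` selects the same rows on a freshly reset index.

```python
import pandas as pd


def rank_neighborhoods(df: pd.DataFrame, skip: int = 1, top: int = 10) -> pd.DataFrame:
    """Rank neighborhoods by citation count, drop the first `skip`, keep the next `top`."""
    summary = (df.groupby("Neighborhood")
                 .agg(count=("FineAmount", "size"),
                      TotalFines=("FineAmount", "sum"),
                      AverFine=("FineAmount", "mean"))
                 .sort_values("count", ascending=False)
                 .reset_index())
    return summary.iloc[skip:skip + top]


# Placeholder neighborhoods and fines, not real citations
sample = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"],
    "FineAmount":   [100, 100, 100, 100, 50, 150, 100, 500, 300, 75],
})
ranked = rank_neighborhoods(sample, skip=1, top=2)
print(ranked)
```

Here "A" plays the role of the outlier: it is skipped, and the two neighborhoods that follow are reported with their counts and average fines.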

Line Plot

This line plot shows the total dollar amount of fines issued in each calendar month, summed across the 19-year span. Each line represents a day of the week, so the plot shows which months and which days of the week have the highest fine totals. Fines were issued mostly on weekdays, with Thursday and Tuesday reaching the highest points on the plot. Saturday and Sunday were consistently the lowest, generally well below every regular weekday. Similar to the scatter plot, the months in which the weather changes see the highest fine totals, with the highest points on this graph falling in April and September. This again suggests that the time of year affects how often violations are reported.

from matplotlib.ticker import FuncFormatter

# Total fines for every (Month, WeekDay) pair, summed across all years
fine_df = df.groupby(['Month', 'WeekDay'])['FineAmount'].sum().reset_index(name='TotalFines')

fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)

my_colors = {'Mon':'blue',
             'Tue':'red',
             'Wed':'green',
             'Thu':'black',
             'Fri':'purple',
             'Sat':'brown',
             'Sun':'orange'}

# Draw one line per weekday in Mon-Sun order so the legend comes out in that order
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
for key in day_order:
    grp = fine_df[fine_df['WeekDay'] == key]
    if grp.empty:
        continue
    grp.plot(ax=ax, kind='line', x='Month', y='TotalFines', color=my_colors[key], label=key, marker='8')

plt.title('Total Fines by Month', fontsize=18)
ax.set_xlabel('Month', fontsize=18)
ax.set_ylabel('Total Fines', fontsize=18, labelpad=20)
ax.tick_params(axis='x', labelsize=14, rotation=0)
ax.tick_params(axis='y', labelsize=14, rotation=0)

ax.set_xticks(np.arange(1, 13))

plt.legend(loc='best', fontsize=14, ncol=1)

ax.yaxis.set_major_formatter( FuncFormatter( lambda x, pos: f'${x:,.0f}')) 

plt.show()
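The weekday-versus-weekend comparison in the line plot can also be computed as numbers rather than read off the chart. A minimal sketch, assuming a data frame with `Month`, `WeekDay` (abbreviated `Mon` through `Sun`) and `FineAmount` columns; the rows are invented and `fines_by_month_and_day` is a hypothetical helper. Using `pivot_table` with `fill_value=0` keeps a month with no weekend citations from showing up as a gap.

```python
import pandas as pd

DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def fines_by_month_and_day(df: pd.DataFrame) -> pd.DataFrame:
    """Months as rows, weekdays (Mon-Sun) as columns, summed FineAmount as values."""
    table = df.pivot_table(index="Month", columns="WeekDay", values="FineAmount",
                           aggfunc="sum", fill_value=0)
    # Add any weekday that never appears and put the columns in calendar order
    return table.reindex(columns=DAY_ORDER, fill_value=0)


# Invented rows for illustration only
sample = pd.DataFrame({
    "Month":      [4, 4, 4, 9, 9, 9],
    "WeekDay":    ["Tue", "Thu", "Sat", "Thu", "Thu", "Sun"],
    "FineAmount": [200, 300, 50, 250, 250, 25],
})
table = fines_by_month_and_day(sample)
weekday_total = int(table[DAY_ORDER[:5]].to_numpy().sum())
weekend_total = int(table[["Sat", "Sun"]].to_numpy().sum())
print(table)
print(f"Weekday fines: ${weekday_total:,}  Weekend fines: ${weekend_total:,}")
```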

Stacked Bar Chart

This stacked bar chart presents the same data as the previous line plot in a form that is a little easier to read. Each bar shows the total dollar value of fines issued in one calendar month, and each colored segment within a bar represents one day of the week; the taller the segment, the more money it represents. The results match the line plot: the highest fine totals come when the weather is nicest, and days in the middle of the week account for the most fines. This reinforces that Baltimore residents should take care of their property, especially during the warmer months.

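# Total fines for every (Month, WeekDay) pair, reshaped to months x weekdays
stacked_df = df.groupby(['Month', 'WeekDay'])['FineAmount'].sum().reset_index(name='TotalFines')

stacked_df = stacked_df.pivot(index='Month', columns='WeekDay', values='TotalFines')
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Reverse the day order so Sunday sits at the bottom of each stack and Monday on top
stacked_df = stacked_df.reindex(columns=day_order[::-1])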
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)

stacked_df.plot(kind='bar', stacked=True, ax=ax)

plt.ylabel('Total Fines', fontsize=18, labelpad=10)
plt.title('Total Fines Issued by Month and Day \n Stacked Bar Plot', fontsize=18)
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize=14)

plt.yticks(fontsize=14)

plt.xlabel('Month', fontsize = 18)

# Reverse the legend so it lists Mon-Sun from top to bottom, matching the stacks
handles, labels = ax.get_legend_handles_labels()
plt.legend(handles[::-1], labels[::-1], loc='best', fontsize=12, ncol=1)

ax.yaxis.set_major_formatter( FuncFormatter( lambda x, pos: f'${x:,.0f}')) 

plt.show()
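The full height of each stacked bar is simply that month's total fines, and its largest segment is the weekday contributing the most. The sketch below computes both from the same months-by-weekdays table that the chart stacks; the rows are invented and `stack_summary` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def stack_summary(df):
    """Return the months x weekdays table plus each month's bar height and largest segment."""
    table = (df.groupby(["Month", "WeekDay"])["FineAmount"].sum()
               .unstack(fill_value=0)                       # weekdays become columns
               .reindex(columns=DAY_ORDER, fill_value=0))   # fixed Mon-Sun order
    summary = pd.DataFrame({
        "BarHeight": table.sum(axis=1),           # full height of the stacked bar
        "LargestSegment": table.idxmax(axis=1),   # weekday with the most fines
    })
    return table, summary


# Invented rows for illustration only
sample = pd.DataFrame({
    "Month":      [7, 7, 7, 7, 12, 12],
    "WeekDay":    ["Mon", "Wed", "Wed", "Sat", "Tue", "Sun"],
    "FineAmount": [100, 250, 150, 50, 80, 20],
})
table, summary = stack_summary(sample)
print(summary)
```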

Pie Chart

This pie chart shows the total amount of fines issued through environmental citations, broken down by the quarter in which they were issued. Quarter 3 (July, August, and September) accounts for the largest share of fines, while quarter 1 (January, February, and March) accounts for the smallest. This further supports the earlier finding that more violations are issued when the weather is warmer. This data may be most useful to the city, which could use it to understand how much income environmental citations generate, and that understanding could lead to a change in policies.

# Label each citation with its calendar quarter, e.g. 'Quarter 3'
df['Quarter'] = 'Quarter ' + df.ViolationDate.dt.quarter.astype('string')

pie_df = df.groupby(['Quarter', 'MonthName'])['FineAmount'].sum().reset_index(name='TotalFines')
# One color per quarter: every 4th entry of tab20c, i.e. the first shade of each hue
number_outside_colors = len(pie_df.Quarter.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

all_fines = pie_df.TotalFines.sum()

# Each wedge label shows the percentage and the dollar amount for that quarter
pie_df.groupby(['Quarter'])['TotalFines'].sum().plot(
        kind='pie', ax=ax, radius=1, colors=outer_colors, pctdistance=0.85, labeldistance=1.1,
        wedgeprops=dict(edgecolor='White'), textprops={'fontsize':16},
        autopct=lambda p: '{:.2f}%\n(${:,.0f})'.format(p, (p/100)*all_fines),
        startangle=90)
# A white circle in the middle turns the pie into a donut
hole = plt.Circle((0, 0), 0.3, fc='white')
ax.add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Fines by Quarter', fontsize = 18)

ax.text(0, 0, f'Total Fines\n${all_fines:,.0f}', size = 18, ha='center', va = 'center')

ax.axis('equal')

plt.tight_layout()

plt.show()
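Because the wedges are shares of a whole, it is worth checking whether the largest quarter is actually a majority (over 50%) or only the largest slice. A minimal sketch with made-up quarter totals; `quarter_shares` is a hypothetical helper and the quarter labels are placeholders.

```python
import pandas as pd


def quarter_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Total fines per quarter and each quarter's percentage of the overall total."""
    totals = df.groupby("Quarter")["FineAmount"].sum()
    shares = totals / totals.sum() * 100
    return pd.DataFrame({"TotalFines": totals, "Percent": shares.round(2)})


# Made-up fines for illustration only
sample = pd.DataFrame({
    "Quarter": ["Quarter 1", "Quarter 2", "Quarter 2",
                "Quarter 3", "Quarter 3", "Quarter 4"],
    "FineAmount": [100, 150, 100, 200, 150, 300],
})
shares_table = quarter_shares(sample)
top = shares_table["Percent"].idxmax()
is_majority = bool(shares_table.loc[top, "Percent"] > 50)
print(shares_table)
print(f"{top} has the largest share ({shares_table.loc[top, 'Percent']}%); majority: {is_majority}")
```

In this sample the largest quarter holds 35% of the fines: the biggest slice, but not a majority.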

Conclusion

Understanding how these graphs relate to one another is important because it shows how environmental citations affect Baltimore's residents and government. One trend held true in every graph: the number of citations rose during the warm-weather months and was much lower during the colder months, with quarter 3 alone accounting for 28.12% of all fines. Violations are issued mainly in the middle of the week; the multi-line chart showed that the highest fine totals were issued on Tuesdays, Wednesdays, and Thursdays. Location also plays a role, as the Broadway East neighborhood has by far the largest number of violations. Taken together, these patterns suggest that a resident of the Broadway East neighborhood on a Thursday in the third quarter of 2010 would have been more likely to receive a citation than anyone else. Although not displayed on any of the graphs above, the most frequent violation by far is overgrown grass. Current Baltimore residents can use all of this information to help avoid future violations. The city government is likely more interested in the money these citations bring in: a total of $2,753,593 over the 19-year span of this data set, or roughly $145,000 per year. This is not a very large amount of money for a city government; if the city felt it needed more income, it could choose to change its policies, which could lead to a higher volume of citations.