Data Visualization Python Project

import os
os.environ["QT_QPA_PLATFORM_PLUGIN_PATH"] = "D:/Anaconda3/Library/plugins/platforms"

Introduction

This data set exists to display reported parking citations in Los Angeles, California from the year 2001 to 2021. This data is sourced from the city of Los Angeles’ Department of Transportation. It contains 22 variables and over 13 million rows of data. This project utilizes the data set to create visualizations that explain when someone is likely to receive a parking citation and how much money is collected by the city from these citations.

Basic Descriptive Statistics

As previously stated, there is over 13 million rows of data with 22 variables such as Ticket Number, Violation Code, and Fine Amount. The types of data for each variable vary from character, integer, to logic. The variables that were used to create the visualizations include Issue Date, Fine Amount, and Make. As seen in the plots below, the year with the most amount of parking citations was in 2017. The most common make of car that received a parking citation was a Toyota. The most likely time and day of the week to receive a citation is Tuesday at 12pm.

Citations by Month/Year

The year with the most number of citations occurred in 2017. One noticeable difference between the number of citations given each month and year is seen in the months of April - September in the year 2020. This is likely due to the rise of the Covid-19 Pandemic where there was in increase in the number of people staying indoors. This would likely mean that less people were out driving; consequently, less citations were given.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from matplotlib.ticker import FuncFormatter
import warnings

warnings.filterwarnings("ignore")
df = pd.read_csv("parking-citations.csv")

df = df.drop(columns = ["Ticket number", "Meter Id", "Marked Time", "Plate Expiry Date", "VIN", "Body Style", "Agency", "Agency Description", "Body Style Description"])

df['Ticket Date'] = pd.to_datetime(df['Issue Date'], format = '%Y-%m-%d')
df = df.drop(columns = ["Issue Date"])

df['Year'] = df['Ticket Date'].dt.year
df['Month'] = df['Ticket Date'].dt.month
df['Day'] = df['Ticket Date'].dt.day
df['DayName'] = df['Ticket Date'].dt.strftime('%A')
df['WeekDay'] = df['Ticket Date'].dt.strftime('%a')

df2 = df[['Year','Month']].copy()

df2 = df2[df2['Year'].notna() & df2['Month'].notna()]

df2 = df2[(df2.Year <= 2021) & (df2.Year >= 2015)]

df2 = df2.groupby(['Year', 'Month'])['Year'].count().reset_index(name = 'Count')
df2 = pd.DataFrame(df2)

df2['Year'] = df2['Year'].astype('int')
df2['Month'] = df2['Month'].astype('int')

df3 = df2.groupby(['Year','Month'])['Count'].sum().reset_index()
df3 = pd.DataFrame(df3)
df3['count_hundreds'] = round(df3['Count']/100, 0)
df3 = df3[df3.Year <= 2021]

plt.figure(figsize = (18, 10))

plt.scatter(df3['Month'], df3['Year'], marker = '8', cmap = 'viridis', c = df3['count_hundreds'], s = df3['count_hundreds'], edgecolors = 'black')

plt.title('Citations Given by Month and Year', fontsize = 18)
plt.xlabel('Months of the Year', fontsize = 14)
plt.ylabel('Year', fontsize = 14)

cbar = plt.colorbar()
cbar.set_label('Number of Citations', rotation = 270, fontsize = 14, color = 'black', labelpad = 30)

my_colorbar_ticks = [*range(100, int(df3['count_hundreds'].max()), 100)]
cbar.set_ticks(my_colorbar_ticks)

my_colorbar_tick_labels = [*range(1000, int(df3['Count'].max()), 10000)]
my_colorbar_tick_labels = ["{:,}".format(each) for each in my_colorbar_tick_labels]
cbar.set_ticklabels(my_colorbar_tick_labels)

my_x_ticks = [*range(df3['Month'].min(), df3['Month'].max()+1, 1)]
plt.xticks(my_x_ticks, fontsize = 14, color = 'black');

my_y_ticks = [*range(df3['Year'].min(), df3['Year'].max()+1, 1)]
plt.yticks(my_y_ticks, fontsize = 14, color = 'black');

plt.show();

Top 10 Cars

The most common car to receive a citation was a Toyota. Over the course of 20 years, Toyota drivers received 2,144,062 citations with an average of 71.27 US Dollars per citation. Nissan drivers were the fourth most common make of car to receive a citation with around 941,500 but had the highest average fine with 72.12 US Dollars. Dodge drivers were the 10th most common make of car to receive a citation with around 390,009 and an average fine of 71.59 US Dollars.

df4 = df[['Make','Fine amount']].copy()

df4 = df[df['Make'].notna() & df['Fine amount'].notna()]

tp = df4['Make'].value_counts()[0:10]
top_cars = ['TOYT', 'HOND', 'FORD', 'NISS', 'CHEV', 'BMW', 'MERZ', 'VOLK', 'HYUN', 'DODG']

df4 = df4[df4['Make'].isin(top_cars)]

x = df4.groupby(['Make']).agg({'Make':['count'], 'Fine amount':['sum', 'mean']}).reset_index()

x.columns = ['Make', 'Count', 'TotalFines', 'AvgFine']
x = x.sort_values('Count', ascending = False)

def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, 
                     symbol+"{:,}".format(height, place_of_decimals),
                     fontsize = 11, color = 'black', ha = 'center', va = 'bottom')
                     

p = x[0:10]

fig = plt.figure(figsize = (25, 12))
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(10)
count_bars = ax1.bar(x_pos-(0.5*bar_width), p.Count, bar_width, color = 'red', edgecolor = 'black', 
                     label = 'Citation Count')
avg_fine_bars = ax2.bar(x_pos+(0.5*bar_width), round(p.AvgFine, 2), bar_width, color = 'blue', edgecolor = 'black', 
                        label = 'Average Fine')

ax1.set_xlabel('Vehicle Make', fontsize = 18)
ax1.set_ylabel('Count of Citations', fontsize = 18, labelpad = 20)
ax2.set_ylabel('Average Fine', fontsize = 18, rotation = 270, labelpad = 20)
ax1.tick_params(axis = 'y', labelsize = 14)
ax2.tick_params(axis = 'y', labelsize = 14)

plt.title('Citation Count & Average Fine Analysis\n Top 10 Most Frequently Cited Makes of Cars', fontsize = 18)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(p.Make, fontsize = 14)

count_color, count_label = ax1.get_legend_handles_labels()
fine_color, fine_label = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + fine_color, count_label + fine_label, loc = 'upper left', frameon = True, ncol = 1, shadow = True, borderpad = 1, fontsize = 11, bbox_to_anchor = (1.02, 1.01))
ax1.set_ylim(0, p.Count.max()*1.2)

autolabel(count_bars, ax1, '.0f', '')
autolabel(avg_fine_bars, ax2, '.2f', '$')
plt.show();

Total Fines by Hour/Day of the Week

The most common time and day of the week to receive a citation is 12pm on a Tuesday. There is a significant uptick in the number of citations given around 7 and 8 am for each day of the week. This is likely due to the number of people rushing to work in the morning to avoid being late. The weekend is when there is a noticeable drop in the number of citations given.

fines_df = df[["Issue time", "Fine amount", "Year", "Month", "WeekDay"]].copy()
fines_df = fines_df.dropna()

fines_df["Hour"] = fines_df["Issue time"] // 100

fines_df = fines_df.drop(columns = ["Issue time"])

fines_df2 = fines_df.groupby(["Hour", "WeekDay"])["Fine amount"].sum().reset_index(name = "TotalFines")

fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {"Mon" : "blue",
             "Tue" : "red",
             "Wed" : "green",
             "Thu" : "gray",
             "Fri" : "purple",
             "Sat" : "gold",
             "Sun" : "brown"}

for key, grp in fines_df2.groupby(["WeekDay"]):
    grp.plot(ax = ax, kind = "line", x = "Hour", y = "TotalFines", color = my_colors[key], label = key, marker = "8")

plt.title("Total Fines Issued by Hour & Day", fontsize = 18)
ax.set_xlabel("Hour (24 Hours Interval)", fontsize = 18)
ax.set_ylabel("Total Fines ($M)", fontsize = 18, labelpad = 20)
ax.tick_params(axis = "x", labelsize = 14, rotation = 0)
ax.tick_params(axis = "y", labelsize = 14, rotation = 0)

ax.set_xticks(np.arange(24))

handles, labels = ax.get_legend_handles_labels()
handles = [handles[1],handles[5],handles[6],handles[4],handles[0],handles[2],handles[3]]
labels = [labels[1],labels[5],labels[6],labels[4],labels[0],labels[2],labels[3]]
plt.legend(handles, labels, loc = "best", fontsize = 14, ncol = 1)

ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: ("$%1.1fM")%(x*1e-6)))

plt.show();

Quarterly/Monthly Breakdown of Fines

Over the course of 20 years, the city of Los Angeles has made nearly 923 million US dollars from parking citations alone. Quarter 1 had the largest percentage of the overall money created from the citations with 28.03%. Quarter 3 had the smallest percentage of the overall money created from the citations with 22.88%. The month of January had the highest percentage with 9.57%. The month of September had the smallest percentage with 7.24%.

df["Quarter"] = "Quarter " + df["Ticket Date"].dt.quarter.astype("string")
df['MonthName'] = df["Ticket Date"].dt.strftime('%b')

pie_df = df.groupby(["Quarter", "MonthName", "Month"])["Fine amount"].sum().reset_index(name = "TotalFines")
pie_df.sort_values(by = ["Month"], inplace = True)
pie_df.reset_index(inplace = True, drop = True)
# deleting month from data frame so it does mess with pie chart
del pie_df["Month"]

# set up inside and outside reference numbers for colors - reference numbers draw from Color Map (CMAP)

number_outside_colors = len(pie_df.Quarter.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4

number_inside_colors = len(pie_df.MonthName.unique())
all_color_ref_number = np.arange(number_outside_colors + number_inside_colors)

# picking colors not already used
inside_color_ref_number = [ ]
for each in all_color_ref_number:
    if each not in outside_color_ref_number:
        inside_color_ref_number.append(each)

# Nested Pie Chart

fig = plt.figure(figsize = (7, 7))
ax = fig.add_subplot(1, 1, 1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

all_fines = pie_df.TotalFines.sum()

pie_df.groupby(["Quarter"])["TotalFines"].sum().plot(
    kind = "pie", radius = 1, colors = outer_colors, pctdistance = 0.85, labeldistance = 1.1,
    wedgeprops = {"edgecolor":'white'}, textprops = {"fontsize": 11},
    autopct = lambda p: "{:.2f}%\n(${:.1f}M)".format(p, (p/100)*all_fines/1e+6),
    startangle = 90)

inner_colors = colormap(inside_color_ref_number)
pie_df.TotalFines.plot(
    kind = "pie", radius = 0.7, colors = inner_colors, pctdistance = 0.55, labeldistance = 0.8,
    wedgeprops = {"edgecolor":'white'}, textprops = {"fontsize": 9},
    labels = pie_df.MonthName,
    autopct = "%1.2f%%",
    startangle = 90)

hole = plt.Circle((0, 0), 0.3, fc = "white")
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title("Total Fines by Quarter and Month", fontsize = 14)

ax.text(0,0, "Total Fines\n " + "$" + str(round(all_fines/1e6, 2)) + "M", size = 13, ha = "center", va = "center")

ax.axis("equal")

plt.tight_layout()
plt.show();

Actual vs Budget of Monthly Fines

With a created budget of 12 million US dollars, the chart below examines a monthly breakdown of the actual amount of money earned from citations compared to the budget in the year 2017 (most citations given for all years). All months except February and November exceeded the monthly expected budget of 12 million. February only made 11.86 million and November only made 11.53 million. The month of March made the most with 14.73 million. Over the course of 12 months in 2017, the city of Los Angeles made 158.45 million US dollars from parking citations.

waterf = df[df["Year"] == 2017].groupby(["MonthName"])["Fine amount"].sum().reset_index(name = "TotalFines")

waterf["Budget"] = 12e6
waterf["Deviation"] = waterf.TotalFines - waterf.Budget
waterf.loc[waterf.index.max()+1] = ["Total",
                                  waterf.TotalFines.sum(),
                                  waterf.Budget.sum(),
                                  waterf.TotalFines.sum() - waterf.Budget.sum()]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Total"]

# ordering by month
waterf.MonthName = pd.Categorical(waterf.MonthName, categories = months, ordered = True)
waterf.sort_values(by = "MonthName", inplace = True)
waterf.reset_index(inplace = True, drop = True)

if waterf.loc[12, "Deviation"] > 0:
    end_color = "black"
elif waterf.loc[12, "Deviation"] < 0:
    end_color = "red"
else:
    end_color = "blue"

fig1 = go.Figure(go.Waterfall(name = "", orientation = "v", x = waterf["MonthName"], textposition = "outside", measure = ["relative","relative","relative","relative","relative","relative",             "relative","relative","relative","relative","relative","relative", "total"], y = waterf["Deviation"]/1e6, text = ["${:.2f}M".format(each/1e6) for each in waterf["TotalFines"]], decreasing = {"marker":{"color":"red"}}, increasing = {"marker":{"color":"green"}}, totals = {"marker":{"color":end_color}}, hovertemplate = "Cumulative Deviation to Date: " + "$%{y:,.2f}M" + "<br>" + "Total Fines in %{x}: %{text}"))

fig1.layout = go.Layout(yaxis = dict(tickformat = ".1f"))

fig1.update_xaxes(title_text = "Months", title_font = {"size": 18});
fig1.update_yaxes(title_text = "Total Fines (Running Total $M)", title_font = {"size": 18}, dtick = 0.5, tickprefix = "$", ticksuffix = "M", zeroline = True);
fig1.update_layout(title = 
dict(text = "Deviation between Actual and Budgeted Monthly Fines in 2017 <br>" + "Surpluses in Green, Deficits in Red",
font = dict(family = "Arial", size = 18, color = "black")),
template = "simple_white", title_x = 0.5, showlegend = False, autosize = True, margin = dict(l = 30, r = 30, t = 60, b = 30))

#fig1.show();

import plotly.io as pio
path = "U:/"
pio.write_html(fig1, path + "waterfall_diagram.html", auto_open=False)

Conclusion

Overall, this project provides a general understanding of the parking/driving violations that occurs in the city of Los Angeles, California from 2001-2021. Using the visualizations provided, one can argue that the most likely parking/driving citation would be a Toyota driver on a Tuesday at 12 pm. This project only contains data about when parking/driving citations occurred and how much the fines were worth. This could be further expanded using more accurate data about the location a citation was given or data about the people who received the citation. Combining this project with data about a citation location and/or the offender’s age, race, and economic background, would aid in finding more insight into the parking/driving violations that occurs in Los Angeles.