Introduction

Below is a real-world data set containing over 11 million rows of parking citations issued within Los Angeles. The data set has 22 columns, and most of its records fall between 2010 and 2020. The records come from the Los Angeles Department of Transportation, a municipal agency that oversees transportation, planning, design, construction, maintenance, and operations within the city of Los Angeles, California.

Dataset: Los Angeles Parking Citations

The Los Angeles Parking Citations data set includes 11,017,510 rows and 22 columns. It records when each ticket was issued, including the year, month, and day. It also contains each car's VIN, as well as the make and model of the car, along with the fine amount, violation code, and violation description for each ticket. Unlike the Baltimore Citations data set, this data set contains the year and date that each ticket was issued, rather than the expiration year and date for each car. Because parking meters have not been around for too long, the bulk of the information I included in my visualizations is from 2014-2019.

Because this data set was so large, I first ran into problems running everything at once. To make this easier, I dropped columns and read in only the columns I needed for each individual visualization. The few columns that I never used were dropped first, including latitude, longitude, car make, car model, agency description, and color description.
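
The snippet below is a minimal sketch of the column-selective read described above. It uses a tiny in-memory CSV in place of the real file; the sample rows are invented for illustration, and only the column names come from the data set.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for Parking_Citations.csv (values are invented).
sample_csv = io.StringIO(
    "Ticket number,Issue Date,Make,Fine amount\n"
    "1001,01/15/2016,HOND,73.0\n"
    "1002,03/02/2017,TOYT,63.0\n"
    "1003,03/09/2017,FORD,\n"
)

# usecols tells pandas to parse only the listed columns, which keeps memory use down.
small_df = pd.read_csv(sample_csv, usecols=["Issue Date", "Fine amount"])
print(small_df.columns.tolist())
```

Listing the columns per chart, rather than loading all 22 every time, is what keeps each script's memory footprint small.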

Once I started building my first data frame, I realized the first thing I needed to do was pull information out of the date. The “Issue Date” column contains the day, month, and year each parking citation was issued, so I needed to break those up into separate columns. After reading in only the Issue Date column with pandas, I converted it with the to_datetime function and used the .dt accessor to derive more columns. Rather than just having the issue date column, I created new columns for the year, month, month name, day, weekday, and quarter, as well as abbreviated names. Once I had separated these date parts, I was ready to build my first visualization.

My first visualization was a scatterplot. Creating a data frame containing the year, month, and count of each parking citation, I was able to show which months and years contained the most citations.

My second visualization was a little more complex. I was able to create 2 frequency bar charts in one visualization. The first chart showed the frequency of parking citations by VIN for the top 250 cars. The second chart showed the same frequencies for the top 10 cars. Each chart was color coded, with one color for VINs more than 10% above the average citation count, one for VINs more than 10% below it, and a third for VINs within 10% of the average.

Like my previous R data set visualizations, I decided to create another multiple line plot visualization. The multiple line plot creates a better visual picture for people, especially in color coordinating each line. This line plot showed total parking fines by month and weekday.

My 4th visualization was a complex pie chart, or a nested pie chart. I had never created one of these before, but I liked the uniqueness of a pie chart as opposed to looking at the same charts all the time. By creating a nested pie chart, I was able to show a “2 in 1” set of information. My pie chart represented the total dollar amount of parking fines by quarter and month.

Moving on to my 5th visualization, I created a bump chart. This is something I had never done before, but I was able to fit more information into this visualization than into any other I made for this data set. This bump chart focused on rankings. By ranking the total amount of parking fines, I was able to create a visualization showing which month and year had the highest and lowest fine totals.

Last but not least, my 6th visualization was a heatmap. A heatmap is one of my favorite charts because people can easily pick out information from the shading of each box. My heatmap showed the number of parking citations by year and month.

Findings and Data Summary

Below is basic information about my data set.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11017510 entries, 0 to 11017509
Data columns (total 22 columns):
 #   Column                  Dtype
---  ------                  -----
 0   Ticket number           object
 1   Issue Date              object
 2   Issue time              float64
 3   Meter Id                object
 4   Marked Time             float64
 5   RP State Plate          object
 6   Plate Expiry Date       float64
 7   VIN                     object
 8   Make                    object
 9   Body Style              object
 10  Color                   object
 11  Location                object
 12  Route                   object
 13  Agency                  float64
 14  Violation code          object
 15  Violation Description   object
 16  Fine amount             float64
 17  Latitude                float64
 18  Longitude               float64
 19  Agency Description      object
 20  Color Description       object
 21  Body Style Description  object
dtypes: float64(7), object(15)
memory usage: 1.8+ GB

Overall, what I wanted to focus on with this LA Parking Citations data set was when parking citations were issued. Whether it be the most common month, day of the week, year, or even quarter, I focused on creating visualizations that show the most common times parking citations were issued. I was even able to delve a little deeper into this story by creating rankings, so that people could see not only the most and least common occurrences of a parking citation, but everything in between as well. Breaking up my data into quarters captured something different and appealing to the eye. We do not often see visualizations that show quarters rather than just years, months, and days, so using this column made my story even stronger. Through dates, times, counts, and fine amounts, I was able to narrow down this 1.8 GB data set and develop a story for people to see.

Scatterplot

Below is a scatterplot showing the number of parking citations by month and year. Although the data set had years going all the way back to the early 2000s, I decided to include only 2014-2019, since this is where the bulk of the records fall. I believe there was little activity before 2014 mainly because parking meters have not been around for that long and have slowly become more popular. Now that every major city has parking meters, there is much more information and data available.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='c:/ProgramData/Anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

path = "U:/"
filename = 'Parking_Citations.csv'
df = pd.read_csv(path + filename, usecols = ['Issue Date'])
df['Issue Date'] = pd.to_datetime(df['Issue Date'], format = '%m/%d/%Y')
df['Year'] = df['Issue Date'].dt.year
df['Quarter'] = df['Issue Date'].dt.quarter
df['Month'] = df['Issue Date'].dt.month
df['Day'] = df['Issue Date'].dt.day
df['DayOfTheWeek'] = df['Issue Date'].dt.dayofweek
df['MonthName'] = df['Issue Date'].dt.strftime('%B')

x = df.groupby(['Year', 'Month'])['Year'].count().reset_index(name = 'count')
x = pd.DataFrame(x)

# Keep only 2014-2019 and store Year and Month as integers.
x3 = x.loc[x['Year'].between(2014, 2019)].copy()
x3['Year'] = x3['Year'].astype('int')
x3['Month'] = x3['Month'].astype('int')
x3 = x3.reset_index(drop = True)
# Scale the counts down to hundreds so they work as marker sizes and colors.
x3['count_hundreds'] = round(x3['count']/100, 0)

plt.figure(figsize = (16,10))
plt.scatter(x3['Month'], x3['Year'], 
            marker = 'o', 
            cmap = 'plasma', 
            c = x3['count_hundreds'], 
            s = x3['count_hundreds'], 
            edgecolors = 'black')
plt.title('Los Angeles Parking Citations', fontsize = 22)
plt.xlabel('Months of the Year', fontsize = 18)
plt.ylabel('Year', fontsize = 18)
cbar = plt.colorbar()
cbar.set_label('Number of Parking Citations', rotation = 270, fontsize = 16, color = 'black', labelpad = 30)
# Place a tick every 200 color units (count_hundreds), i.e. every 20,000 citations.
my_colorbar_ticks = [*range(0, int(x3['count_hundreds'].max()), 200)]
cbar.set_ticks(my_colorbar_ticks)
# Derive the labels from the ticks so both lists always have the same length.
my_colorbar_tick_labels = ['{:,}'.format(each * 100) for each in my_colorbar_ticks]
cbar.set_ticklabels(my_colorbar_tick_labels)
my_x_ticks = [*range(x3['Month'].min(), x3['Month'].max()+1, 1)]
plt.xticks(my_x_ticks, fontsize = 18, color = 'black')
my_y_ticks = [*range(x3['Year'].min(), x3['Year'].max()+1, 1)]
plt.yticks(my_y_ticks, fontsize = 18, color = 'black')
plt.show()

What is unique about this visualization is that it captures several aspects at once. Not only do bigger circles indicate more parking citations in that year and month, but the colors play a part as well. The lighter the color and the bigger the circle, the more parking citations were issued in that year and month. Conversely, a darker, smaller circle means fewer parking citations were issued. Overall, this chart shows that 2014 and the end of 2019 had the fewest parking citations. There were so few citations issued in 2014 that the circles are too small to even see on this scatterplot. I attribute this largely to the presence of parking meters. They started to come to Los Angeles right around 2012-2014, which would explain why there are so few citations in 2014.

Vertical Bar Charts

Below are 2 vertical bar charts showing the frequency of parking citations by VIN. A VIN is a vehicle identification number. Each car has a unique VIN that encodes details such as where the vehicle was made. Since this data set did not contain each vehicle's license plate number, I used the VIN to distinguish individual vehicles. VINs are also 17 characters long, so I shortened each one to its last 4 characters to make it easier to read in my bar chart.
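
The shortening step described above is not part of the chart script that follows, so here is a minimal sketch of one way it could be done. The VIN values are invented, and the "Not Available" placeholder is an assumption carried over from how the script labels missing VINs.

```python
import pandas as pd

# Hypothetical full-length VINs; the values are invented for illustration.
vins = pd.Series(["1HGCM82633A004352", "JH4KA8260MC000000", None], name="VIN")

# Fill missing values first so the string slice below never sees NaN,
# then keep only the last four characters of each real VIN.
vins = vins.fillna("Not Available")
short_vins = vins.where(vins == "Not Available", vins.str[-4:])
print(short_vins.tolist())
```

Note that two different vehicles can share the same last four characters, so a shortened VIN is easier to read but no longer guaranteed to be unique.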

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='c:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
path = "U:/"
filename = 'Parking_Citations.csv'
df = pd.read_csv(path + filename, usecols = ['Issue Date', 'VIN', 'Fine amount'])
df['VIN'] = df['VIN'].fillna("Not Available")  # assign back so the fill always takes effect
df['Issue Date'] = pd.to_datetime(df['Issue Date'], format = '%m/%d/%Y')
x = df.groupby(['VIN']).agg({'VIN':['count'], 'Fine amount':['sum', 'mean']}).reset_index()
x.columns = ['VIN', 'Count', 'TotalFines', 'AverageFines']
x = x.sort_values('Count', ascending = False)
PossibleBadVIN = x['VIN'].str.contains('Not Available|NOT VISIBLE|COVERED|NV|Not|NOT')
PossibleBadRows = x[x['VIN'].str.contains('Not Available|NOT VISIBLE|COVERED|NV|Not|NOT')]
deleteRows = PossibleBadRows
a = deleteRows.Count.sum()
b = deleteRows.TotalFines.sum()
c = b/a
x = x[~x['VIN'].isin(deleteRows.VIN)]  # ~ is the idiomatic boolean negation for a mask
x.loc[x.index.max()+1] = ['Missing', a, b, c]
x = x.sort_values('Count', ascending = False)
x.reset_index(inplace = True, drop = True)

def pick_colors_according_to_mean_count(this_data):
    colors=[]
    avg = this_data.Count.mean()
    for each in this_data.Count:
        if each > avg*1.10:
            colors.append('violet')
        elif each < avg*0.90:
            colors.append('royalblue')
        else:
            colors.append('silver')
    return colors

import matplotlib.patches as mpatches
bottom1 = 1
top1 = 250
d1 = x.loc[bottom1:top1]
my_colors1 = pick_colors_according_to_mean_count(d1)
my_colors1
bottom2 = 1
top2 = 10
d2 = x.loc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)
my_colors2
Above = mpatches.Patch(color = 'violet', label = 'Above Average')
At = mpatches.Patch(color = 'silver', label = 'Within 10% of the Average')
Below = mpatches.Patch(color = 'royalblue', label = 'Below Average')
fig = plt.figure(figsize = (16,14))
fig.suptitle('Frequency of Parking Citation Analysis by VIN:\n Top ' + str(top1) + ' and Top ' + str(top2), 
             fontsize = 16, fontweight = 'bold')
ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.VIN, d1.Count, label = 'Parking Citation Count', color = my_colors1)
#ax1.legend(fontsize = 14)
ax1.legend(handles=[Above, At, Below], fontsize = 12)
plt.axhline(d1.Count.mean(), color = 'black', linestyle = 'dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.axes.xaxis.set_visible(False)
ax1.set_title('Top ' + str(top1) + ' LA Parking Citations', size = 18)
ax1.text(top1-10, d1.Count.mean()+0.2, 'Mean = ' + str(d1.Count.mean()), rotation = 0, fontsize = 12)
ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.VIN, d2.Count, label = 'Parking Citation Count', color = my_colors2)
#ax1.legend(fontsize = 14)
ax2.legend(handles=[Above, At, Below], fontsize = 10)
plt.axhline(d2.Count.mean(), color = 'black', linestyle = 'solid')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
#ax1.axes.xaxis.set_visible(False)
ax2.set_title('Top ' + str(top2) + ' LA Parking Citations', size = 18)
ax2.text(top2-1, d2.Count.mean()-0.4, 'Mean = ' + str(d2.Count.mean()), rotation = 0, fontsize = 12)
fig.subplots_adjust(hspace = 0.1)
plt.show()

Looking at these bar charts, I quickly realized that most cars that got parking tickets tended not to get them again. The car with the most parking citations had only 8 citations in total. Although this visualization was not what I had expected at all, it was interesting to see the effect that one parking citation seemed to have on drivers. It seems that once someone gets one citation, they are more careful not to get another. Los Angeles is a city of nearly 4 million people, so it was very surprising to find that the most parking citations any car received throughout 2014-2019 was 8.

This visualization also showed the average number of parking citations for both the top 250 cars and the top 10. The top 250 cars averaged only 4.6 citations, while the top 10 cars averaged 7.
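
To make the color rule concrete, the snippet below is a small sketch that applies the same 10% band around the mean to a handful of invented counts. The function name and labels are hypothetical; it returns text labels instead of colors so the result is easy to check.

```python
def classify_against_mean(counts, tolerance=0.10):
    """Label each count as above, below, or within `tolerance` of the mean."""
    avg = sum(counts) / len(counts)
    labels = []
    for value in counts:
        if value > avg * (1 + tolerance):
            labels.append("above")
        elif value < avg * (1 - tolerance):
            labels.append("below")
        else:
            labels.append("within")
    return labels

# Made-up citation counts for five vehicles; the mean is 5.0.
example_counts = [8, 6, 5, 4, 2]
example_labels = classify_against_mean(example_counts)
print(example_labels)
```

Because the mean is computed from whichever slice is passed in, the same vehicle can be "above" in the top-250 chart and "within" in the top-10 chart.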

Multiple Line Plot

Below is a multiple line plot showing the total parking fines by month and day of the week. Drawing 7 lines, each in a different color, makes it clear in which month and on which day of the week the most fines were issued. The maximum amount of money generated per month from parking citations in LA was about 14 million dollars, whereas the minimum amount was closer to 2 million dollars.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='c:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
path = "U:/"
filename = 'Parking_Citations.csv'

df = pd.read_csv(path + filename, usecols = ['Issue Date', 'VIN', 'Fine amount'])
df['VIN'] = df['VIN'].fillna("Not Available")  # assign back so the fill always takes effect
df['Issue Date'] = pd.to_datetime(df['Issue Date'], format = '%m/%d/%Y')
df['Year'] = df['Issue Date'].dt.year
df['Quarter'] = df['Issue Date'].dt.quarter
df['Month'] = df['Issue Date'].dt.month
df['Day'] = df['Issue Date'].dt.day
df['Weekday'] = df['Issue Date'].dt.strftime('%a')
df['MonthName'] = df['Issue Date'].dt.strftime('%b')
fine_df = df.groupby(['Month', 'Weekday'])['Fine amount'].sum().reset_index(name = 'TotalFines')

from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize = (16,10))
ax = fig.add_subplot(1, 1, 1)
my_colors = {'Mon':'indianred',
             'Tue':'deepskyblue',
             'Wed':'lightpink',
             'Thu':'yellowgreen',
             'Fri':'orchid',
             'Sat':'orange',
             'Sun':'midnightblue'}
# Group by a plain column name so each key is a string such as 'Mon', not a 1-tuple.
for key, grp in fine_df.groupby('Weekday'):
    grp.plot(ax=ax, kind = 'line', x = 'Month', y = 'TotalFines', color = my_colors[key], label = key, marker = '8')
plt.title('Total Parking Fines by Month', fontsize = 22)
ax.set_xlabel('Month (Numerical)', fontsize = 18)
ax.set_ylabel('Total Fines', fontsize = 18, labelpad = 20)
ax.tick_params(axis = 'x', labelsize = 14, rotation = 0)
ax.tick_params(axis = 'y', labelsize = 14, rotation = 0)
ax.set_xticks(np.arange(1, 13))  # months run from 1 to 12
handles, labels = ax.get_legend_handles_labels()
handles = [handles[1], handles[5], handles[6], handles[4], handles[0], handles[2], handles[3]]
labels = [labels[1], labels[5], labels[6], labels[4], labels[0], labels[2], labels[3]]
plt.legend(handles, labels, loc = 'best', fontsize = 14, ncol = 1)
ax.get_yaxis().set_major_formatter(FuncFormatter(lambda x, p: '$'+format(int(x), ',')))
plt.show()

Looking at this multiple line plot, the first thing I noticed was the huge drop in parking fines on the weekends. Saturday and Sunday generate much less money from parking citations than the weekdays do. The most likely reason is that many places offer free or reduced-rate parking on the weekends. Fewer people work on the weekends, so there is less traffic and no rush hour. During the week, people travel into the city to get to work, so the city can afford to charge for parking with so much traffic on the streets.

Another thing I noticed in this line plot was the huge dip in parking fines in March and November. There was also a decrease in June, July, and August, which I attribute mainly to the summer. Not as many people work in the summer. A lot of families travel, go on vacation, or even move away from the city for the season, so there is much less traffic.
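
The chart script above reorders the legend entries by hand because weekday abbreviations sort alphabetically. As an alternative sketch (not the method used in the script), an ordered categorical lets pandas keep the weekdays in calendar order by itself. The totals below are invented.

```python
import pandas as pd

# Made-up monthly fine totals keyed by abbreviated weekday name.
toy = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "Weekday": ["Sun", "Mon", "Fri", "Sun", "Mon", "Fri"],
    "TotalFines": [2.0, 9.0, 8.0, 3.0, 10.0, 7.0],
})

# An ordered categorical makes sorting and grouping follow the calendar
# week instead of alphabetical order (Fri, Mon, Sun).
week_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
toy["Weekday"] = pd.Categorical(toy["Weekday"], categories=week_order, ordered=True)

# observed=True keeps only the weekdays that actually appear in the data.
totals = toy.groupby("Weekday", observed=True)["TotalFines"].sum()
print(totals.index.tolist())
```

With the weekdays ordered this way, the weekday versus weekend comparison can be read straight off the grouped totals.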

Nested Pie Chart

Below is the 4th visualization. I decided to go with a nested pie chart to show something different from all of the graphs and plots. This pie chart contains an inner and outer layer, with a hole, or donut, in the middle. The outer layer represents the total parking fines per quarter, both in millions of dollars and as a percentage. The inner layer goes into more detail to show the percentage of total fines for each month within each quarter. The hole in the middle shows the total dollar amount of all LA parking fines in this data set.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='c:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
path = "U:/"
filename = 'Parking_Citations.csv'
df = pd.read_csv(path + filename, usecols = ['Issue Date', 'VIN', 'Fine amount'])
df['VIN'] = df['VIN'].fillna("Not Available")  # assign back so the fill always takes effect
df['Issue Date'] = pd.to_datetime(df['Issue Date'], format = '%m/%d/%Y')
df['Year'] = df['Issue Date'].dt.year
df['Quarter'] = df['Issue Date'].dt.quarter
df['Month'] = df['Issue Date'].dt.month
df['Day'] = df['Issue Date'].dt.day
df['Weekday'] = df['Issue Date'].dt.strftime('%a')
df['MonthName'] = df['Issue Date'].dt.strftime('%b')
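# Overwrite the numeric quarter with a label such as 'Quarter 1' for the pie chart.
# (The earlier df.drop call discarded its result, so it had no effect and is removed.)
df['Quarter'] = 'Quarter ' + df['Issue Date'].dt.quarter.astype('string')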
pie_df = df.groupby(['Quarter','MonthName', 'Month'])['Fine amount'].sum().reset_index(name = 'TotalParkingFines')
pie_df.sort_values(by=['Month'], inplace = True)
pie_df.reset_index(inplace = True, drop = True)
del pie_df['Month']
number_outside_colors = len(pie_df.Quarter.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4
number_inside_colors = len(pie_df.MonthName.unique())
all_color_ref_number = np.arange(number_outside_colors + number_inside_colors)
inside_color_ref_number = []
for each in all_color_ref_number:
    if each not in outside_color_ref_number:
        inside_color_ref_number.append(each)

fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(1, 1, 1)
colormap = plt.get_cmap("tab20")
outer_colors = colormap(outside_color_ref_number)
all_fines = pie_df.TotalParkingFines.sum()
pie_df.groupby(['Quarter'])['TotalParkingFines'].sum().plot(
       kind = 'pie', radius = 1, colors = outer_colors, pctdistance = 0.85, labeldistance = 1.1,
       wedgeprops = dict(edgecolor = 'white'), textprops = {'fontsize':18},
       autopct = lambda p: '{:.2f}%\n(${:.1f}M)'.format(p, (p/100)*all_fines/1e+6),
       startangle = 90)
inner_colors = colormap(inside_color_ref_number)
pie_df.TotalParkingFines.plot(
       kind = 'pie', radius = 0.7, colors = inner_colors, pctdistance = 0.55, labeldistance = 0.8,
       wedgeprops = dict(edgecolor = 'white'), textprops = {'fontsize':13},
       labels = pie_df.MonthName, 
       autopct = '%1.2f%%',
       startangle = 90)
hole = plt.Circle((0,0), 0.3, fc = 'white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)
ax.yaxis.set_visible(False)
plt.title('Total Parking Fines by Month and Quarter', fontsize = 18)
ax.text(0,0, 'Total Fines\n' + '$' + str(round(all_fines/1e+6, 2)) + 'M', size = 18, ha = 'center', va = 'center')
ax.axis('equal')
plt.tight_layout()
plt.show()

Overall, the fines per quarter stay within the same range. What I found interesting was that the fines decreased little by little each quarter. Quarter 1 accounted for 27% of fines, quarter 2 for 26%, quarter 3 for 24%, and quarter 4 for only 22%. As the year progressed, fewer and fewer fines were issued. As for the fines per month, the largest share was issued in the first-quarter months, with February and March carrying about 9% of total fines overall.
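
The percentages on the two rings are both shares of the same grand total. The snippet below is a minimal sketch of that arithmetic on invented monthly totals for two quarters; the column names follow the script, and the "Share" column is a hypothetical addition.

```python
import pandas as pd

# Made-up monthly fine totals (in dollars), three months per quarter.
monthly = pd.DataFrame({
    "Quarter": ["Quarter 1"] * 3 + ["Quarter 2"] * 3,
    "MonthName": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "TotalParkingFines": [30.0, 20.0, 10.0, 15.0, 15.0, 10.0],
})

grand_total = monthly["TotalParkingFines"].sum()

# Outer ring: each quarter's share of the grand total, in percent.
quarter_share = monthly.groupby("Quarter")["TotalParkingFines"].sum() / grand_total * 100

# Inner ring: each month's share of the same grand total.
monthly["Share"] = monthly["TotalParkingFines"] / grand_total * 100
print(quarter_share.round(1).to_dict())
```

Since both rings divide by the same total, the monthly shares inside a quarter always add up to that quarter's share.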

Bump Chart

Below is a bump chart. A bump chart is especially useful for ranking data. In this case, the bump chart ranks the total fines issued each month. It color codes a line for each year so that we can see how total fines change from year to year and month to month. Since I used data from 2010 to 2019 (skipping 2011), each month is ranked 1-9. A rank of 1 means that year generated the most fines for that month, and a rank of 9 means it generated the least. To make things easier to read, I also included circles for each month of each year showing how much money was generated from parking fines (in millions).

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='c:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
path = "U:/"
filename = 'Parking_Citations.csv'
df = pd.read_csv(path + filename, usecols = ['Issue Date', 'VIN', 'Fine amount'])
df['VIN'] = df['VIN'].fillna("Not Available")  # assign back so the fill always takes effect
df['Issue Date'] = pd.to_datetime(df['Issue Date'], format = '%m/%d/%Y')
df['Year'] = df['Issue Date'].dt.year
df['Quarter'] = df['Issue Date'].dt.quarter
df['Month'] = df['Issue Date'].dt.month
df['Day'] = df['Issue Date'].dt.day
df['Weekday'] = df['Issue Date'].dt.strftime('%a')
df['MonthName'] = df['Issue Date'].dt.strftime('%b')
bump_df = df.groupby(['Year', 'MonthName'])['Fine amount'].sum().reset_index(name = 'TotalParkingFines')
bump_df = bump_df.pivot(index = 'Year', columns = 'MonthName', values = 'TotalParkingFines')
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
bump_df = bump_df.reindex(columns = month_order)
bump_df = bump_df.dropna()
bump_df_ranked = bump_df.rank(axis = 0, ascending = False, method = 'min')
bump_df_ranked = bump_df_ranked.T

fig = plt.figure(figsize = (30,22))
ax = fig.add_subplot(1, 1, 1)
bump_df_ranked.plot(kind = 'line', ax = ax, marker = 'o', markeredgewidth = 1, linewidth = 6, 
                   markersize = 80, 
                   markerfacecolor = 'white')

ax.invert_yaxis()
num_rows = bump_df_ranked.shape[0]
num_cols = bump_df_ranked.shape[1]
plt.ylabel('Monthly Ranking', fontsize = 22, labelpad = 10)
plt.title('Ranking of Total Parking Fines by Month and Year \n Bump Chart', fontsize = 22, pad = 12)
plt.xticks(np.arange(num_rows), month_order, fontsize = 14)
plt.yticks(range(1, num_cols+1, 1), fontsize = 22)
ax.set_xlabel('Month', fontsize = 22)
handles, labels = ax.get_legend_handles_labels()
# Reverse the legend entries so the most recent year is listed first.
handles = handles[::-1]
labels = labels[::-1]
ax.legend(handles, labels, bbox_to_anchor=(1.02, 1.02), fontsize = 16, 
         labelspacing = 2,
         markerscale = .2,
         borderpad = 1,
         handletextpad = 0.8)
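# Label each marker with that month's fine total in millions of dollars.
# i walks the months (x positions) and j walks the years (one line per year).
for j in range(bump_df_ranked.shape[1]):
    for i in range(bump_df_ranked.shape[0]):
        this_rank = bump_df_ranked.iloc[i, j]
        ax.text(i, this_rank, '$' + str(round(bump_df.iloc[j, i]/1e6, 4)) + 'M', ha = 'center', va = 'center', fontsize = 12)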
  
plt.show()

Normally for this data set, I have been using years 2014-2019. For this visualization, I wanted to include 2010-2013 as well, to show how little activity they have compared to more recent years. 2011 is not included, most likely because there were few or no parking meters this early on. As you can see from the chart, the fines are shown in millions of dollars. 2010, 2012, 2013, and 2014 had so few fines that they are shown as 0 in the chart. Once you get to the later years, the fine totals are much larger, with the maximum generated in March 2017 at over 14 million dollars. Overall, I created this bump chart because it offers a completely new way of visualizing data, using rankings rather than just numbers.
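
The ranking step behind the bump chart can be illustrated on a tiny table. This is a minimal sketch with invented totals for three years and two months, using the same rank settings as the script.

```python
import pandas as pd

# Made-up fine totals (in millions): rows are years, columns are months.
toy_fines = pd.DataFrame(
    {"Jan": [1.0, 3.0, 2.0], "Feb": [2.5, 2.5, 0.5]},
    index=[2017, 2018, 2019],
)

# axis=0 ranks down each column, so every month is ranked across the years.
# ascending=False gives rank 1 to the largest total; method="min" lets ties
# share the lower rank number.
toy_ranked = toy_fines.rank(axis=0, ascending=False, method="min")
print(toy_ranked)
```

In this toy table 2018 ranks first for January, while 2017 and 2018 tie for first in February and 2019 takes rank 3 rather than 2.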

Heatmap

A heatmap is a visual technique that shows the magnitude of a phenomenon as color in two dimensions. The color varies with intensity, with darker colors meaning larger values. Below is a heatmap of the number of parking citations by year and month. The months with the most citations are the darkest shade of blue, and the lightest shade of blue-green represents the months with the fewest citations.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH']='c:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
path = "U:/"
filename = 'Parking_Citations.csv'
df = pd.read_csv(path + filename, usecols = ['Issue Date', 'VIN', 'Fine amount'])
df['VIN'] = df['VIN'].fillna("Not Available")  # assign back so the fill always takes effect
df['Issue Date'] = pd.to_datetime(df['Issue Date'], format = '%m/%d/%Y')
df['Year'] = df['Issue Date'].dt.year
df['Quarter'] = df['Issue Date'].dt.quarter
df['Month'] = df['Issue Date'].dt.month
df['Day'] = df['Issue Date'].dt.day
df['Weekday'] = df['Issue Date'].dt.strftime('%a')
df['MonthName'] = df['Issue Date'].dt.strftime('%b')
x = df.groupby(['Year', 'Month'])['Year'].count().reset_index(name = 'count')
x = pd.DataFrame(x)
# Keep only 2014-2019 and store Year and Month as integers.
x3 = x.loc[x['Year'].between(2014, 2019)].copy()
x3['Year'] = x3['Year'].astype('int')
x3['Month'] = x3['Month'].astype('int')
x3 = x3.reset_index(drop = True)
hm_df = pd.pivot_table(x3, index = 'Year', columns = 'Month', values = 'count')

import seaborn as sns
from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize = (20,12))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax = sns.heatmap(hm_df, linewidth = 0.2, annot = True, cmap = 'crest', fmt = ',.0f',
                 square = False, annot_kws = {'size': 16},
                 cbar_kws = {'format': comma_fmt, 'orientation':'vertical'})
plt.title('Heatmap of Parking Citations by Year and Month', fontsize = 20, pad=15)
plt.xlabel('Month', fontsize = 18, labelpad = 10)
plt.ylabel('Year', fontsize = 18, labelpad = 10)
plt.yticks(rotation = 0, size = 14)
plt.xticks(size = 14)
ax.invert_yaxis()
cbar = ax.collections[0].colorbar
max_count = int(np.nanmax(hm_df.to_numpy()))  # plain int for range(), ignoring any empty cells
my_colorbar_ticks = [*range(0, max_count, 20000)]
cbar.set_ticks(my_colorbar_ticks)
my_colorbar_tick_labels = ['{:,}'.format(each) for each in my_colorbar_ticks]
cbar.set_ticklabels(my_colorbar_tick_labels)
cbar.set_label('Number of Parking Citations', rotation = 270, fontsize = 14, color = 'black', labelpad = 20)
plt.show()

For this visualization, I went back to using only 2014 through 2019. As you can see from the light shades, 2014 and 2019 had the fewest citations. The most citations were issued in the middle years, with 2016 and 2017 having the darkest shades of blue. What I found interesting was the huge difference between the highest and lowest counts. June of 2014 had only 11 total citations issued, whereas March of 2017 had over 210,000.
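
The reshaping step that feeds the heatmap can be shown on a few rows. This is a minimal sketch with invented counts; it stops at the Year by Month grid and does not draw the chart.

```python
import pandas as pd

# Made-up long-format counts: one row per (Year, Month) pair.
long_counts = pd.DataFrame({
    "Year": [2016, 2016, 2017, 2017],
    "Month": [1, 2, 1, 2],
    "count": [150, 120, 210, 180],
})

# pivot_table reshapes the rows into a Year x Month grid, which is the
# layout a heatmap function such as seaborn's heatmap expects.
grid = pd.pivot_table(long_counts, index="Year", columns="Month", values="count")
print(grid)
```

Each cell of the grid becomes one shaded box, so a missing (Year, Month) pair would show up as an empty cell.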

Conclusion

Overall, these 6 visualizations tell a specific story about the data. The main theme I wanted to visualize was dates and times. Using the “Issue Date” column and pulling out the month, year, day, and quarter that each citation was issued made it much easier for me to build these visualizations. Since this data did not have any information regarding license plate numbers or registration expiration, it was more useful to develop a story using the frequency of fines and the total fine amount. Every one of my visualizations used a date part, whether a quarter, day, month, or year, to show how often citations occurred and how many there were. Through these visualizations, one can see what time of the year the most citations happened and how many there were.

One difficult part about my data set was its size. Because it was almost 2 gigabytes of data and over 11 million rows, I had a lot of trouble with run time. Even using only a few columns in each visualization, the code took a long time to run. Another issue with such a large data set was the number of NAs in the data. In order to create most of my visualizations, I had to take out all of the NAs and restructure my data in a way that still made sense with my overall story. Overall, I think the Los Angeles Parking Citations data set contained a lot of useful information from the LADOT. Through this data set I developed a story that viewers can clearly see through charts, graphs, and plots.