All the data and code for this project are on my GitHub. All plots in
PNG format can be found in my imgur gallery.
2013-10-21 12:00 PM:
Initial hand in version.
2013-10-21 02:00 PM:
Fixed some plots, added colour to existing plots.
2013-10-22 01:00 AM:
Added maps and conclusions. Updated MAKEFILE.R
2013-10-23 11:30 PM:
Fixed typos, bad grammar, bad spelling, etc. Clarified some points. Added a table describing variables in data set.
I investigate a data set containing all municipal parking tickets issued in the city of Vancouver between January 1, 2004 and September 25, 2008. The data set originates from The Vancouver Sun's website where a web interface is provided for the interested reader to query the database. There was no way to easily download the data from the source, so I obtained a dump of the entire database from the website of a local Vancouver programmer, David Grant, who kindly links to the file on his website along with the code used to scrape the data from the source website.
Note that this data set only contains parking offences issued by the city of Vancouver, but does not contain data on fines received in off-street private or public parking lots.
I chose this data set because I wanted to delve into some local data that might be useful for us “Vancouverites”. Also, this data set is pretty sizeable, containing information on 1.6 million parking tickets issued in Vancouver. Maybe we will be able to find something interesting that will help us to avoid parking tickets?
The data set extracted from David Grant's database dump was already quite clean. We load the data and check its dimensions and structure to verify that nothing went horribly wrong (output not shown). Note that I took the liberty of converting the relevant table in the MySQL database into a csv file. The file is quite large, so I provide an rds binary copy of the data set on my GitHub.
# Read in the data
ptDat <- readRDS("../data_01_raw/parkingtickets.rds")
# Check dimensions
dim(ptDat)
# Check structure
str(ptDat)
We have approximately 1.6 million rows and 15 columns. We go through some additional cleaning steps before we begin our analysis:
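- Handle missing values (NAs)
- Convert variables to appropriate types in R (e.g. POSIX date variables, set time zones, convert to factors, etc.)
- Create derived variables (year, month, day, hour, etc.)

After some light data cleaning and processing, we check the structure of the data: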
# Verify that the data set was read properly
# Check dimensions
dim(ptDat)
## [1] 1631387 17
# Check structure
str(ptDat)
## 'data.frame': 1631387 obs. of 17 variables:
## $ datetime : POSIXct, format: "2004-01-02 07:08:00" "2004-01-02 07:09:00" ...
## $ date : POSIXct, format: "2004-01-02" "2004-01-02" ...
## $ time : chr "07:08:00" "07:09:00" "07:11:00" "07:14:00" ...
## $ plate : chr "661DEL" "A87433E" "061JJK" "NJR688" ...
## $ make_denorm : Factor w/ 115 levels "DELOREAN","EDSEL",..: 107 113 107 108 113 102 112 95 114 97 ...
## $ address : chr "1250 Broadway St. W." "1450 Davie St." "1350 Davie St." "1650 Davie St." ...
## $ street_num : int 1250 1450 1350 1650 850 650 1650 1350 1350 1550 ...
## $ street_name : chr "Broadway St. W." "Davie St." "Davie St." "Davie St." ...
## $ offence_denorm: Factor w/ 16 levels "Park too far away from curb",..: 15 15 15 15 15 15 15 15 15 15 ...
## $ make_denorm2 : Factor w/ 80 levels "Other","AUSTIN",..: 72 78 72 73 78 67 77 60 79 62 ...
## $ year : num 2004 2004 2004 2004 2004 ...
## $ month : num 1 1 1 1 1 1 1 1 1 1 ...
## $ day : int 2 2 2 2 2 2 2 2 2 2 ...
## $ wday : Ord.factor w/ 7 levels "Sunday"<"Monday"<..: 6 6 6 6 6 6 6 6 6 6 ...
## $ hour : int 7 7 7 7 7 7 7 7 7 7 ...
## $ holiday : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ holiday_name : Factor w/ 9 levels "New Years","Good Friday",..: NA NA NA NA NA NA NA NA NA NA ...
# The variable names
names(ptDat)
## [1] "datetime" "date" "time" "plate"
## [5] "make_denorm" "address" "street_num" "street_name"
## [9] "offence_denorm" "make_denorm2" "year" "month"
## [13] "day" "wday" "hour" "holiday"
## [17] "holiday_name"
Here is a table describing the variables in the data set.
| Variable Name | Description |
|---|---|
| datetime | The date, including the time (year, month, day, hour, minute) |
| date | The date (year, month, day) |
| time | The time (hour, minute) |
| plate | The license plate of the ticketed vehicle |
| make_denorm | The make of the vehicle (e.g. HONDA, TOYOTA, etc.) |
| address | The address where the vehicle was ticketed (e.g. 1050 Robson St.) |
| street_num | The street number (e.g. 1050) |
| street_name | The street name (e.g. Robson St.) |
| offence_denorm | The parking violation name (e.g. Expired Meter) |
| make_denorm2 | Modified make_denorm, where low frequency makes are binned into an “Other” group |
| year | The year |
| month | The month |
| day | The day |
| wday | The day of the week (e.g. Monday, Tuesday, etc.) |
| hour | The hour |
| holiday | Indicator for holiday |
| holiday_name | Holiday name |
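As a rough illustration of how the derived date variables in the table could be built, here is a minimal sketch in base R. It is not the code used for this report: the toy data frame `raw`, its two character columns, and the choice of the `America/Vancouver` time zone are all assumptions made for the example.

```r
# Toy stand-in for the raw data (assumed format: character date and time)
raw <- data.frame(
  date = c("2004-01-02", "2007-07-01"),
  time = c("07:08:00", "15:30:00"),
  stringsAsFactors = FALSE
)

# Combine date and time into a POSIXct timestamp (time zone is an assumption)
raw$datetime <- as.POSIXct(paste(raw$date, raw$time),
                           format = "%Y-%m-%d %H:%M:%S",
                           tz = "America/Vancouver")

# Derive the calendar components from the timestamp
lt <- as.POSIXlt(raw$datetime)
raw$year  <- lt$year + 1900   # POSIXlt years count from 1900
raw$month <- lt$mon + 1       # POSIXlt months are 0-based
raw$day   <- lt$mday
raw$hour  <- lt$hour

# Ordered day-of-week factor, built from fixed names to avoid locale issues
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
raw$wday <- factor(day_names[lt$wday + 1], levels = day_names, ordered = TRUE)

str(raw)
```

Spelling out the day names by hand keeps the factor levels the same regardless of the locale the script runs in.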
We begin the data analysis by first examining the different types of parking tickets in the data set and the frequency of each offence. It is of interest to know what kind of bad parking behaviours are most frequent.
| Offence | Count | Proportion |
|---|---|---|
| Expired Meter | 737313 | 0.4520 |
| No Stopping | 321845 | 0.1973 |
| Permit/Residential Parking | 119130 | 0.0730 |
| Exceed time limit for free parking | 98167 | 0.0602 |
| Other | 59518 | 0.0365 |
| Stop in a commercial loading zone | 52888 | 0.0324 |
| Stop in a bus zone | 49637 | 0.0304 |
| Stop in a commercial lane | 46921 | 0.0288 |
| Stop too close to an intersection | 43064 | 0.0264 |
| Stop too close to a crossing | 22309 | 0.0137 |
| Stop too close to a stop sign | 19066 | 0.0117 |
| Stop too close to a fire hydrant | 15334 | 0.0094 |
| Stop or park facing the wrong way | 14014 | 0.0086 |
| Stop in area reserved for certain vehicles | 10972 | 0.0067 |
| Park in a no-parking zone | 10884 | 0.0067 |
| Park too far away from curb | 10325 | 0.0063 |
It seems the number one reason for receiving a parking ticket in Vancouver is an expired parking meter. The next two most common reasons seem to arise from drivers wanting to avoid parking meters altogether. Drivers who parked in a “No Stopping” zone and drivers who parked in other people's permit/residential spots account for more than a quarter of the parking tickets in the data set (0.1973 + 0.0730 ≈ 0.27). I guess that goes to show that you should always be mindful of how much money you put into the meter, and if you run out of change, you best not push your luck by not adding more time.
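A count-and-proportion table like the one above can be produced with `table()` in base R. The sketch below uses a tiny made-up `offence` vector rather than the real `offence_denorm` column, so the numbers are purely illustrative.

```r
# Toy offence labels, one entry per ticket (made-up values)
offence <- c("Expired Meter", "Expired Meter", "No Stopping",
             "Expired Meter", "Other", "No Stopping")

# Count each offence and sort from most to least frequent
counts <- sort(table(offence), decreasing = TRUE)

# Assemble counts and proportions into one table
freq <- data.frame(
  Offence    = names(counts),
  Count      = as.vector(counts),
  Proportion = round(as.vector(counts) / sum(counts), 4),
  stringsAsFactors = FALSE
)

print(freq)
```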
Let's check out the frequency of each of the offences by month over the time period of our data.
It appears that the monthly number of tickets issued for each offence is quite stable. There is no immediate trend present in the plot. However, we see some unusual activity around August 2007, where there is a sharp decline in the number of tickets issued in Vancouver. The number of issued parking tickets then recovers to “normal” levels within a couple of months. It is unknown what caused this to occur - maybe it is a data quality issue?
Now we turn our attention to the cars! We have information about the make (i.e. manufacturer) of the car attached to each parking ticket. This may not be terribly descriptive since the model and year are not recorded (e.g. a 1980 Honda Civic is recorded the same way as a 2014 Honda Pilot), but we may see that owners of different car brands have different parking behaviours.
There are 115 different makes of vehicles recorded in the data set, so we do not include a table of counts with this figure. Note that in the above figure, we group some of the 115 makes of cars into a category labelled Other to keep the figure readable.
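Grouping rare makes into an Other category, as make_denorm2 does, can be sketched as follows. The toy `make` vector and the cut-off `min_count` are hypothetical; the threshold actually used for this report is not stated here.

```r
# Toy vehicle makes, one entry per ticket (made-up values)
make <- c("HONDA", "HONDA", "TOYOTA", "HONDA", "EDSEL", "TOYOTA", "DELOREAN")

# Hypothetical cut-off: makes with fewer tickets than this become "Other"
min_count <- 2

# Find the low-frequency makes
make_counts <- table(make)
rare <- names(make_counts)[make_counts < min_count]

# Replace rare makes with "Other" and store the result as a factor
make_binned <- factor(ifelse(make %in% rare, "Other", make))

table(make_binned)
```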
Not much can be said relating to parking tickets from this plot per se. We are most likely seeing which cars are most common in Vancouver from our preceding figure. In the recording period of our data set, it was probably the case that the most common cars in Vancouver were Hondas, Toyotas and Fords. It is probably not the case that roadside deviants are more likely to drive Toyotas, Hondas and Fords. In any case, we cannot say much with our figure.
We can try to see if there are any noticeable differences in parking habits for people who drive different makes of cars by colouring the above plot by the ticket type.
It is hard to tell whether drivers of a certain car type are more likely to commit some offences over others in the above plot. We show a similar plot below, this time with stacked bars that describe the proportion of each offence for each car make.
If we take a quick look at this figure, the “Expired Meter” offence seems to be the most likely reason for receiving a parking ticket. But if we look a little closer, we see a handful of cars that have a relatively larger proportion of parking tickets received due to parking in “No Stopping” zones. These cars are usually unknown or unbranded types of cars (e.g. UNMARKED, UNLISTED, INTERNATIONAL, etc.), or some build of car that I am not familiar with (e.g. FREGHTLINER, HINO, KENWORTH, etc.). I hypothesize that very important visitors, such as diplomats or VIPs with chauffeurs, are the types of people who drive (or are driven in) these types of unmarked cars. Another interesting make of car is the UTILITY vehicle. Drivers that drive UTILITY vehicles seem to receive many “OTHER” parking tickets. One can only guess what this means. Maybe drivers of UTILITY vehicles like to break things as they park their giant cars, warranting an uncategorized parking ticket?
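The per-make proportions behind a stacked bar chart of this kind can be computed with a two-way table normalised by row. This is a sketch on made-up data; the data frame `tickets` and its values are assumptions for illustration only.

```r
# Toy tickets: one row per ticket with the make and the offence (made-up)
tickets <- data.frame(
  make    = c("HONDA", "HONDA", "HONDA", "HONDA", "HINO", "HINO"),
  offence = c("Expired Meter", "Expired Meter", "Expired Meter",
              "No Stopping", "No Stopping", "No Stopping"),
  stringsAsFactors = FALSE
)

# Cross-tabulate make (rows) against offence (columns)
tab <- table(tickets$make, tickets$offence)

# margin = 1 divides each row by its total, so each make sums to 1
props <- prop.table(tab, margin = 1)

round(props, 2)
```

Normalising by row removes the effect of how common each make is, which is what lets small groups stand out.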
Another interesting question is when should we be careful with our parking here in Vancouver? Are there certain months or days or times during the day that we should avoid? Are people more prone to receiving tickets during the weekend? During holidays?
We first take a broad look at the data. We look at the yearly totals:
It looks like the number of parking tickets given out peaked in 2005 and then experienced a steady decline up to 2008. However, this is a little misleading, since the data ranges from January 1, 2004 to September 25, 2008. During the final year of our data set, we have 3 fewer months of data compared to the other years. Also, as we saw before and will see in our next few plots, there is something strange with the data in 2007.
Let us take a look at the monthly totals for the range of our data set. We display the same data with bar plots and with a line chart.
The bar plots and the line charts show the same picture. We focus on the line chart. We seem to be seeing around 30000 parking tickets per month, with some random(?) peaks and troughs (which are exaggerated in the line chart due to the y axis starting at 10000), but suddenly see a great drop in the number of parking tickets issued in 2007 starting in July and lasting until around October before reaching “normal” levels again in November of that year. Doing a quick search in Google did not yield any ideas as to why this occurred. Maybe the appointment of the new police chief, Jim Chu, reminded everyone to follow the law more closely during these months (though, probably not). It would be interesting to find what caused this to occur.
Now we look at which months experienced the largest number of parking tickets in our data set. If we take a look at monthly totals, it appears that May is the worst time to risk a parking infraction, while October seems to be a good time to park without putting money in the parking meter.
However, if we correct for the fact that we do not have data for October, November, or December in 2008, we see a different picture. We take the monthly means and plot below:
Now it appears that November is the riskiest month to park illegally. In fact, July, August and September are probably riskier than they look here, since the anomaly observed in 2007 drags their means down.
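To see why totals and means can disagree when some months are observed in fewer years, here is a small sketch with invented numbers (the data frame `monthly` is hypothetical). May appears in three years and October in only two, mimicking the missing months of 2008.

```r
# Toy monthly ticket counts; October 2008 is missing on purpose
monthly <- data.frame(
  year  = c(2006, 2006, 2007, 2007, 2008),
  month = c(5, 10, 5, 10, 5),
  n     = c(100, 120, 100, 120, 100)
)

# Totals favour months that were observed in more years
totals <- tapply(monthly$n, monthly$month, sum)

# Means correct for the unequal number of observed years
means <- tapply(monthly$n, monthly$month, mean)

totals
means
```

In this toy case May has the larger total while October has the larger mean, which is the same kind of reversal described above.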
However, there may still be a ray of hope for those looking to park for free (albeit wrongfully) at the meter. If we take a look at parking tickets by days of the week and by holidays, we can see that parking enforcement officers may be more strict on some days than others. Let us take a look at the overall parking ticket count by day of the week.
Sunday experiences the lowest number of parking tickets being handed out, followed by Saturday. Then we see a unimodal-like distribution of parking tickets handed out during Monday to Friday, peaking on Wednesday (maybe parking officers are grumpy on hump day?). The low number of tickets on Sunday and Saturday is probably due to weekend parking rules where some areas are free to park only on weekends. In addition, some meters in some locations may be free to park on weekends, and there may be fewer parking officers working during the weekends.
Let us take a look at the proportion of tickets given on each day of the week, faceted by offence. It will be apparent that the distributions are roughly similar. It seems that Sunday is the best time to go out and park in Vancouver with the least worry of receiving parking tickets (except maybe for your expired meter).
Now we will take a look at what times are best for parking. We created an hour variable in the data set to get an idea of which times are bad for parking.
We try looking at the number of parking tickets by hour. Then we will check if the distribution is similar for every day of the week, and then we will see if certain types of offences are committed at different hours of the day.
Perhaps the time when we should be most mindful of our vehicles is around 3:00 PM. The largest number of parking tickets are given out around this time. This pattern persists for all days of the week.
Now, we take a look at the proportion of tickets given out during a specific time given the offence. It appears that different offences have different peak times of getting caught. In particular, we have a very strange distribution for the “No Stopping” offence - 3:00 PM is the peak time for giving out tickets for parking or stopping in a “No Stopping” zone.
Sorry for the rotated x-axis labels. The labels would overlap otherwise!
We zoom in to the “No Stopping” plot from the above plot to get a better look.
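The peak hour for each offence can also be read off numerically rather than from a plot. The sketch below uses a made-up `tickets` data frame; it only illustrates the idea and does not reproduce the figures in this report.

```r
# Toy tickets: offence and hour of day for each ticket (made-up values)
tickets <- data.frame(
  offence = c("No Stopping", "No Stopping", "No Stopping",
              "Expired Meter", "Expired Meter", "Expired Meter"),
  hour    = c(15, 15, 7, 11, 11, 14),
  stringsAsFactors = FALSE
)

# Count tickets for every offence (rows) and hour (columns)
by_hour <- table(tickets$offence, tickets$hour)

# For each offence, pick the hour with the largest count
peak_hour <- as.integer(colnames(by_hour)[apply(by_hour, 1, which.max)])
names(peak_hour) <- rownames(by_hour)

peak_hour
```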
Next, we look at parking during the holidays!
We will check the parking tickets data during the holidays. We consider the 9 statutory holidays we have in B.C.: New Year's, Good Friday, Victoria Day, Canada Day, B.C. Day, Labour Day, Thanksgiving Day, Remembrance Day, and Christmas Day (this data was before the creation of Family Day).
I assume that people have more time to drive around and visit parts of Vancouver during the holidays. Will we see fewer, or more parking violations during the holidays? We can see the number of parking tickets given during each holiday in the range of the data set. To have a point of comparison, we also plot the overall mean daily number of parking tickets (black), the weekday mean number (blue), and the weekend mean number (red).
It seems that during statutory holidays, we experience fewer parking tickets, below even the weekend mean value (red dashed line), except in 2006 and 2007 for a couple of the holidays. During all the holidays, we see fewer parking tickets handed out compared to the overall average. Again, this may be due to parking enforcement officers taking a day off.
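The three reference lines can be computed from daily ticket counts as in the sketch below. The `daily` data frame and its counts are invented, and leaving holidays out of the weekday and weekend means is my assumption for the example, not necessarily what the plot does.

```r
# Toy daily ticket counts around Canada Day 2005 (made-up counts)
daily <- data.frame(
  date    = as.Date(c("2005-06-29", "2005-06-30", "2005-07-01",
                      "2005-07-02", "2005-07-03", "2005-07-04")),
  n       = c(1000, 1100, 300, 500, 400, 900),
  holiday = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# ISO weekday number: 1 = Monday, ..., 7 = Sunday (locale-independent)
iso_wday   <- as.integer(format(daily$date, "%u"))
is_weekend <- iso_wday >= 6

# Reference values; holidays are excluded from the weekday/weekend means
overall_mean <- mean(daily$n)
weekday_mean <- mean(daily$n[!is_weekend & !daily$holiday])
weekend_mean <- mean(daily$n[is_weekend & !daily$holiday])

# Ticket counts on the holidays themselves
holiday_counts <- daily$n[daily$holiday]

c(overall = overall_mean, weekday = weekday_mean, weekend = weekend_mean)
holiday_counts
```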
As they say, “Catch me once, shame on you… catch me twice, shame on me”. A parking fine is supposed to be a deterrent that stops people from parking illegally. Apparently, this is not effective for some people.
If we take a histogram of the number of parking tickets issued to each license plate in the database, we see a highly skewed distribution, where most of the mass is on 1. The vast majority of license plates in the data set have only 1 parking ticket associated with them.
The keen observer will notice that only 99.689% of the data is displayed in the histogram. What happened to the remaining fraction of a percent? We show the top 25 offenders in the table below.
| Plate | Make | No. Tickets |
|---|---|---|
| 0287GE | INTERNATIONAL | 362 |
| 1802EY | INTERNATIONAL | 339 |
| 0652AH | INTERNATIONAL | 306 |
| 1196GJ | MERCEDES | 229 |
| 7962TM | UNMARKED | 223 |
| 540HCL | BENTLEY | 204 |
| 6773NS | FORD | 200 |
| 1191GJ | MERCEDES | 197 |
| 474KEF | VOLKSWAGEN | 188 |
| 9531BA | UNMARKED | 174 |
| 9842FB | INTERNATIONAL | 173 |
| KSD207 | BMW | 172 |
| 772KEA | MERCEDES | 165 |
| 0117DF | INTERNATIONAL | 152 |
| 1193GJ | MERCEDES | 151 |
| 119ACC | JEEP | 149 |
| 4581PY | UNMARKED | 147 |
| INCMPT | FORD | 144 |
| 6617HK | UNLISTED | 141 |
| 627FKR | VOLKSWAGEN | 140 |
| AXN571 | BMW | 140 |
| 8598HS | UNMARKED | 138 |
| 982CXC | HONDA | 137 |
| 180KJM | MERCEDES | 135 |
| 2538AY | UNMARKED | 130 |
There are a handful of people who accrue a lot of fines from parking tickets. If a parking ticket were $35 (the usual amount, if you pay the fine early), these owners would be paying in excess of $3500 in just parking tickets! However, if we sift through the data, we have a number of cars (>1000) with license plates marked as “INCMPT” - probably meaning “Incompatible” or “Incomplete”. These may be foreign cars or cars with custom license plates that cannot be entered into the system properly.
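The per-plate counts and the fine estimate can be sketched as follows. The `plate` vector is made up, and the flat $35 per ticket is the simplifying assumption mentioned above, since the data set itself has no fine amounts.

```r
# Toy plates, one entry per ticket (made-up values)
plate <- c("AAA111", "BBB222", "BBB222", "CCC333",
           "DDD444", "DDD444", "DDD444", "EEE555")

# Number of tickets per plate, largest first
per_plate <- sort(table(plate), decreasing = TRUE)
top_offenders <- data.frame(
  Plate   = names(per_plate),
  Tickets = as.vector(per_plate),
  stringsAsFactors = FALSE
)

# Share of plates with exactly one ticket
share_single <- mean(top_offenders$Tickets == 1)

# Estimated fines, assuming a flat $35 per ticket
fine_per_ticket <- 35
top_offenders$Fines <- top_offenders$Tickets * fine_per_ticket

head(top_offenders, 2)
share_single
```

Applied to the real table, even the 25th plate (130 tickets) would come to 130 × 35 = $4550 under the same assumption.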
Finally, we will check which areas receive the largest number of parking tickets in the City of Vancouver. Looking through the data set, it appears that the address variable is accurate to the 100 block. In other words, all parking tickets issued from 1000 Robson St. to 1100 Robson St. will be marked as 1050 Robson St. in the data. We display the top 25 worst places to park in Vancouver in a table.
| Address | No. Tickets |
|---|---|
| 1050 Robson St. | 17899 |
| 1150 Robson St. | 15674 |
| 850 Hornby St. | 10501 |
| 650 Broadway St. W. | 10426 |
| 1050 Alberni St. | 9729 |
| 1750 Broadway St. W. | 9121 |
| 1050 Hornby St. | 9100 |
| 650 Hornby St. | 8863 |
| 1050 Mainland St | 8328 |
| 1050 Homer St. | 8269 |
| 1150 Hamilton St. | 8094 |
| 850 Howe St. | 7941 |
| 850 Broadway St. W. | 7848 |
| 2250 4th Ave W. | 7634 |
| 950 Hornby St. | 7462 |
| 1950 4th Ave W. | 7403 |
| 2650 Granville St. | 7354 |
| 1650 Broadway St. W. | 7062 |
| 1350 Broadway St. W. | 6961 |
| 1150 Broadway St. W. | 6903 |
| 1050 Hamilton St. | 6889 |
| 950 Broadway St. W. | 6821 |
| 150 Davie St. | 6676 |
| 1150 Homer St. | 6658 |
| 1150 Mainland St | 6562 |
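The data set already arrives with addresses coded to the middle of the 100 block, but the coding itself is easy to sketch. Below, the `tickets` data frame with exact street numbers is hypothetical, and I assume each block includes its lower bound (so 1100 falls in the 1150 block).

```r
# Toy tickets with exact street numbers (made-up values)
tickets <- data.frame(
  street_num  = c(1000, 1023, 1099, 1100, 650),
  street_name = c("Robson St.", "Robson St.", "Robson St.",
                  "Robson St.", "Hornby St."),
  stringsAsFactors = FALSE
)

# Map each street number to the midpoint of its 100 block
block_mid <- (tickets$street_num %/% 100) * 100 + 50

# Build the block-level address and rank addresses by ticket count
tickets$address <- paste(block_mid, tickets$street_name)
top_addresses <- sort(table(tickets$address), decreasing = TRUE)

head(top_addresses, 25)
```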
It looks like Downtown Vancouver and parts of Broadway St. are the hottest places to be for parking enforcement officers! Let us take a look at where these are on the map:
This map is scrollable, and you can click on the tool tips for more information.
We will look at the big picture. Where are the most parking tickets handed out? We overlay a hexbin plot over a static map of the City of Vancouver. The areas with high frequency of parking tickets handed out are highlighted in red, while low frequency areas are coloured in black.
It looks like the Downtown area, as well as the Broadway Corridor, W 4th Ave and the Granville/Cambie areas attract bad parkers.
This raises another question. What kind of tickets are given out in different areas of Vancouver? Are they uniformly distributed across all of Vancouver, or are certain types of parking tickets more likely in some parts of Vancouver than in others? We overlay a scatter plot, faceted by type of parking ticket.
It looks like the stylesheet is shrinking the image, making the text unreadable. Here is a link to the full resolution image.
Interestingly, the spatial distributions of the different parking offences differ. The most common kind of parking ticket, “Expired Meter”, seems to be concentrated in the Downtown area. This is not surprising, as parking meters are not found everywhere in Vancouver. Parking tickets given for stopping in bus zones are concentrated in major roads where buses travel.
I had a lot of fun working with the Vancouver parking tickets data set. This data set was much larger than I was used to dealing with in R, and that presented several challenges, such as efficiently aggregating the data for visualizing, and plotting large amounts of data. It was unfortunate that there were no truly quantitative variables in the data set. On the plus side, there was a spatial aspect to the data set, and I was able to get my hands dirty plotting maps. If I had more time, I would investigate some quantitative variables that might be linked to the data, such as the fine amount associated with each parking ticket, and also try to visualize any spatio-temporal trends that may be present. I also want to spend more time plotting on maps.
The greatest lesson that I learned from this project is that cleaning the data and reading in the data for analysis is the most time consuming step. I underestimated the time that it would take to get to actually start plotting the data due to concerns of data quality, converting different file formats, etc. Note to self: start projects earlier!
Although this report is titled “Avoiding Parking Tickets in Vancouver”, I do not claim that this report will help you avoid getting parking tickets. Just be smart, and park where you're supposed to, and make sure to pay your dues!