This visualization of car towing events timing in the city of Baltimore was inspired by Max Watson’s post Time Maps: Visualizing Discrete Events Across Many Timescales. Please follow the link for more details on the logic and interpretation of time map graphs.
The data was obtained from Open Baltimore data portal and imported into a CSV file named DOT_Towing.csv.
library(readr)
library(lubridate)
library(ggplot2)
library(ggthemes)
library(scales)
library(dplyr)
# markdown options
knitr::opts_chunk$set(tidy=FALSE, fig.align='center')
# raw data from Baltimore portal
raw <- read_csv('../data/raw/DOT_Towing.csv')
Before plotting the time maps of tow events, let’s check the intraday pattern of towing frequency. First, we need to strop the dates from the data/time variable.
# convert from text to date
towedDateTime <- mdy_hms(raw$towedDateTime)
towedTime <- towedDateTime
# change the date to a fixed day in a year (keeps time only)
year(towedTime) <- 2015
month(towedTime) <- 1
day(towedTime) <- 1
Next, we add a day of the week and a weekend variable and remove tow companies with less than 3,000 records.
df_day <-
data_frame(
towCompany = raw$towCompany,
towedDateTime,
towedTime) %>%
# remove missing
filter(!is.na(towCompany) & !is.na(towedDateTime)) %>%
# week day variables
mutate(
week_day = wday(towedDateTime, label = T, abbr = T),
week_end = ifelse(week_day %in% c('Sun','Sat'),
'Weekend','Weekday') %>% factor) %>%
# remove tow companies with few records
group_by(towCompany) %>%
mutate(company_n = n()) %>%
filter(company_n>=3000)
head(df_day)
## Source: local data frame [6 x 6]
## Groups: towCompany [2]
##
## towCompany towedDateTime towedTime week_day
## (chr) (time) (time) (fctr)
## 1 City 2015-11-13 13:05:00 2015-01-01 13:05:00 Fri
## 2 City 2015-11-13 12:41:00 2015-01-01 12:41:00 Fri
## 3 City 2015-11-13 12:33:00 2015-01-01 12:33:00 Fri
## 4 McDels Enterprise Inc. 2015-11-13 12:33:00 2015-01-01 12:33:00 Fri
## 5 McDels Enterprise Inc. 2015-11-13 12:24:00 2015-01-01 12:24:00 Fri
## 6 McDels Enterprise Inc. 2015-11-13 12:23:00 2015-01-01 12:23:00 Fri
## Variables not shown: week_end (fctr), company_n (int)
We use the data above to plot a histogram of all towing events in the dataset by time of day. The plot reveals a spike in towing activity around 8AM and even bigger spike around 5PM.
p <-
ggplot(df_day, aes(towedTime))+
geom_histogram(binwidth=60*60, fill='white', color='black')+
scale_x_datetime(breaks = date_breaks("3 hours"),
labels = date_format("%I%p"))+
theme_few(base_size = 14) +
theme(axis.text.x = element_text(angle = 90, vjust = 0))
p
If we plot Saturday and Sunday separately, we can see that the weekends are much more evenly distributed and do not exhibit the same spikes in towing as the regular weekdays.
p + facet_wrap(~week_end)
Finally, we can compare the timing patterns between the tow companies. Note that the plot scale is different for each company, because we’re interested timing not volume.
p + facet_grid(towCompany~week_end, scales = 'free_y')+
theme(strip.text.y = element_text(angle = 0, hjust = 0),
axis.text.y = element_text(size = 8))
The calculations for making the time maps are straight-forward and are shown below.
df_tm <-
raw %>%
# remove missing companies
filter(!is.na(towCompany)) %>%
select(towCompany, towedDateTime) %>%
# convert date time
mutate(towedDateTime = mdy_hms(towedDateTime)) %>%
# remove companies with few records
group_by(towCompany) %>%
filter(n()>=3000) %>%
# get before and after times
arrange(towedDateTime) %>%
mutate(prv = lag(towedDateTime),
nxt = lead(towedDateTime)) %>%
filter(!is.na(prv) & !is.na(nxt)) %>%
ungroup() %>%
# calculate x and y mapping
mutate(x = as.numeric(towedDateTime - prv),
y = as.numeric(nxt - towedDateTime),
towCompany = factor(towCompany)) %>%
# change 0's to 1 seconds (for log scale plot) and convert to hours
mutate(x = pmax(1, x)/3600,
y = pmax(1, y)/3600)
head(df_tm %>% select(-prv, -nxt))
## Source: local data frame [6 x 4]
##
## towCompany towedDateTime x y
## (fctr) (time) (dbl) (dbl)
## 1 City 2010-10-25 14:49:00 2.618333e+01 1.389767e+03
## 2 City 2010-12-22 12:35:00 1.389767e+03 9.166667e-01
## 3 City 2010-12-22 13:30:00 9.166667e-01 7.500000e-01
## 4 City 2010-12-22 14:15:00 7.500000e-01 3.500000e-01
## 5 City 2010-12-22 14:36:00 3.500000e-01 6.666667e-02
## 6 City 2010-12-22 14:40:00 6.666667e-02 2.777778e-04
Below is the time map for the entire dataset.
p <-
ggplot(
data = df_tm,
mapping = aes(x,y)
)+
geom_density2d()+
stat_density2d(aes(fill = ..level..), geom="polygon")+
scale_fill_gradient(low="greenyellow", high="red")+
scale_x_log10()+ scale_y_log10()+
geom_abline(linetype=2)+
theme_few()+
theme(text=element_text(size=18),
plot.title = element_text(lineheight=1.75, face="bold"),
strip.text = element_text(size = 12, face="bold"),
legend.position="none",
axis.text.x=element_blank(),
axis.text.y=element_blank())+
labs(x='Time from Previous Towing',
y="Time untill Next Towing",
title='Baltimore Time Maps of Tow Events')
p + theme(plot.title = element_text(size=16))
Finally, we can also plot the time map of each individual tow company.
p +
facet_wrap(~towCompany)+
labs(title='Baltimore Time Maps of Tow Events by Company')