This case study is the capstone project for the Google Data Analytics Certificate Course on Coursera. Though the scenario and company in the project are fictional, the data sets included are real and provided for public consumption courtesy of Lyft / Divvy / City of Chicago (and Evanston) and all of them can be found here.
The data used is anonymized by Divvy and does not include any personally identifiable information. The data sets are stored on Divvy servers, in archived .csv documents, with each .csv file containing a month’s worth of user trips for all trips after April 2020. Data prior to April 2020 is broken down by yearly quarters. Divvy updates the data once a month.
The Coursera project challenge is to analyze the last 12 months of data in order to identify differences in how casual users and annual members use the Cyclistic bike share service, in order to strategically deploy marketing and grow annual user subscriptions.
I analyzed data from November 2021 through October 2022. All of the data used in this study was downloaded on November 11, 2022. I performed my analysis using R in RStudio and all data cleaning and manipulation was done non-destructively in the RStudio environment.
Below, I describe how I handled importing and processing the data sets, performed analysis, presented the findings, and concluded with recommendations for next steps. I have shown all my code and I’m always open to feedback on how to improve it. Thank you!
R packages I used for the study:
library(tidyverse) # tidying + wrangling data
library(scales) # number formatting
library(lubridate) # date+time formatting
library(hms) # time formatting
library(leaflet) # maps! I batch-imported all .csv files using data.table’s fread into a single data table.
cyc_trips <-
dir(path =".",
pattern = "*.csv") %>%
map(data.table::fread) %>%
reduce(rbind)When checking the structure of the imported data, we can see that all trips are sorted by a unique ID, and each trip includes data about a type of bike used, whether the user was casual or an annual member, the date and time the trip started and ended, and the location where the trip started and ended. That’s a good start, but before analyzing further I would need to ensure there are no bad or missing entries which could affect the interpretation.
str(cyc_trips) # check structure## Classes 'data.table' and 'data.frame': 5755694 obs. of 13 variables:
## $ ride_id : chr "7C00A93E10556E47" "90854840DFD508BA" "0A7D10CDD144061C" "2F3BE33085BCFF02" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct, format: "2021-11-27 13:27:38" "2021-11-27 13:38:25" ...
## $ ended_at : POSIXct, format: "2021-11-27 13:46:38" "2021-11-27 13:56:10" ...
## $ start_station_name: chr "" "" "" "" ...
## $ start_station_id : chr "" "" "" "" ...
## $ end_station_name : chr "" "" "" "" ...
## $ end_station_id : chr "" "" "" "" ...
## $ start_lat : num 41.9 42 42 41.9 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.7 -87.8 -87.6 ...
## $ end_lat : num 42 41.9 42 41.9 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.8 -87.6 ...
## $ member_casual : chr "casual" "casual" "casual" "casual" ...
## - attr(*, ".internal.selfref")=<externalptr>
I started by checking for missing data. After removing all rows with
NA and empty string values, I will be able to analyze the
remaining 76.6% of the data.
# replace all "", (empty strings) with NA, which is easier to wrangle
cyc_trips <- cyc_trips %>%
mutate_if(is.character, ~na_if(.,""))
# count incomplete cases (missing/NA values)
# divide by all rows to get percentage of remaining observations
# neatly print percentage of complete data
rows_dirty <- sum(!complete.cases(cyc_trips))
rows_clean <- sum(complete.cases(cyc_trips))
rows_all <- nrow(cyc_trips)
paste0("Remaining Rows:", label_percent(accuracy=.01)(rows_clean/rows_all)) ## [1] "Remaining Rows:76.63%"
cyc_trips <- na.omit(cyc_trips) # recreate the data.table with NA rows omittedThe dataset included three bike types. I confirmed with Divvy’s data team that only two types of rides are available, “classic_bike” and “electric_bike”, and that “docked_bike” is effectively “classic_bike”. I renamed all “docked_bike” entries to “classic_bike” for a more consistent analysis of results.
cyc_trips %>%
group_by(rideable_type) %>%
summarize(number_of_trips = n())## # A tibble: 3 × 2
## rideable_type number_of_trips
## <chr> <int>
## 1 classic_bike 2633599
## 2 docked_bike 179635
## 3 electric_bike 1597204
# rename all docked_bike to classic_bike
cyc_trips$rideable_type[cyc_trips$rideable_type == "docked_bike"] <-
"classic_bike"
cyc_trips %>%
group_by(rideable_type) %>%
summarize(number_of_trips = n())## # A tibble: 2 × 2
## rideable_type number_of_trips
## <chr> <int>
## 1 classic_bike 2813234
## 2 electric_bike 1597204
The data set combined times and dates into “started_at” and “ended_at” columns. I extracted the values into separate columns for more descriptive analysis and grouping of data. I made extra columns here that I wanted to have available to analysis, but I will remove them as I optimize the code.
started <- cyc_trips$started_at
ended <- cyc_trips$ended_at
# split into groups for easier subsetting
cyc_trips <- cyc_trips %>% mutate(
started_date = as.Date (started),
started_day = wday(started, label = TRUE),
started_month = month(started, label = TRUE),
started_time = as_hms(started),
started_hour = hour(started),
started_min = minute(started),
ended_time = as_hms(ended),
ended_hour = hour(ended),
ended_min = minute(ended),
trip_dur_sec = duration(as.numeric(ended - started))
) %>%
relocate(member_casual, .after = rideable_type)
str(cyc_trips)## Classes 'data.table' and 'data.frame': 4410438 obs. of 23 variables:
## $ ride_id : chr "4CA9676997DAFFF6" "F3E84A230AF2D676" "A1F2C92308007968" "9B871C3B14E9BEC4" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ member_casual : chr "casual" "casual" "casual" "casual" ...
## $ started_at : POSIXct, format: "2021-11-26 10:27:28" "2021-11-15 09:35:03" ...
## $ ended_at : POSIXct, format: "2021-11-26 11:22:13" "2021-11-15 09:42:08" ...
## $ start_station_name: chr "Michigan Ave & Oak St" "Clark St & Grace St" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
## $ start_station_id : chr "13042" "TA1307000127" "307" "TA1306000003" ...
## $ end_station_name : chr "Michigan Ave & Oak St" "Clark St & Leland Ave" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
## $ end_station_id : chr "13042" "TA1309000014" "307" "TA1306000003" ...
## $ start_lat : num 41.9 42 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.7 -87.8 -87.6 -87.6 ...
## $ end_lat : num 41.9 42 41.9 41.9 41.9 ...
## $ end_lng : num -87.6 -87.7 -87.8 -87.6 -87.6 ...
## $ started_date : Date, format: "2021-11-26" "2021-11-15" ...
## $ started_day : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 6 2 4 3 7 5 3 6 5 1 ...
## $ started_month : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 11 11 11 11 11 11 11 11 11 11 ...
## $ started_time : 'hms' num 10:27:28 09:35:03 16:27:02 19:51:36 ...
## ..- attr(*, "units")= chr "secs"
## $ started_hour : int 10 9 16 19 19 11 22 16 8 20 ...
## $ started_min : int 27 35 27 51 14 58 14 48 40 29 ...
## $ ended_time : 'hms' num 11:22:13 09:42:08 17:04:28 20:11:17 ...
## ..- attr(*, "units")= chr "secs"
## $ ended_hour : int 11 9 17 20 19 12 22 16 8 20 ...
## $ ended_min : int 22 42 4 11 33 8 44 53 48 33 ...
## $ trip_dur_sec :Formal class 'Duration' [package "lubridate"] with 1 slot
## .. ..@ .Data: num 3285 425 2246 1181 1149 ...
## - attr(*, ".internal.selfref")=<externalptr>
With calculable durations, I came across three types of edge scenarios that would impact the results:
# there are 220 trips over 24hrs
cyc_trips %>%
filter(trip_dur_sec > 86400) %>%
summarize(n = n())## n
## 1 220
# there are 74 trips with negative durations
cyc_trips %>%
filter(trip_dur_sec < 0) %>%
summarize(n = n()) ## n
## 1 74
# there are 75206 trips less than a minute long
cyc_trips %>%
filter(trip_dur_sec >= 0 & trip_dur_sec < 60) %>%
summarize(n = n())## n
## 1 75206
Users that do not return their bikes within 24 hours may be charged a lost or stolen bike fee of $1,200 plus tax (source). There were enough trips started between November 2021 and October 2022 to warrant potentially charging $264,000 in fines, so it may be worth looking at separately.
The amount of negative-duration trips was negligible and I didn’t want to investigate it further but I imagine something with leap years, unix time goofs, etc.
I decided to drop all the rows containing both of these.
Finally, for trips between 0-60 seconds long, Divvy support acknowledged they don’t categorize or track specific reasons for those short trips, and speculated they may be due to users re-docking bikes to make sure they’re secure. As an ex-Chicago biker, I imagine there were also a number of cases where bikes had flats or other damage that has not been serviced yet, and the user only discovered this after taking the bike out and getting on before quickly returning it.
I decided to keep those shorter trips in there since I can also imagine someone downtown Chicago taking a bike out for a quick ride down the block which may take less than a minute, and Divvy does round up each ride up to 1 minute, so there is some potential revenue there.
Dealing with the edge cases still let me proceed with 76.6% of the data, and I felt comfortable with analyzing the rest.
# filtering out all rows that are less than 0 seconds, and longer than 24h (86400 seconds)
cyc_trips <- cyc_trips %>%
filter(trip_dur_sec >= 0 & trip_dur_sec < 86400)#count current clean rows vs original number of rows
rows_clean2 <- nrow(cyc_trips)
paste0("Remaining Rows:", label_percent(accuracy=.01)(rows_clean2/rows_all)) #neatly print percentage## [1] "Remaining Rows:76.62%"
With a cleaner data set, we can start to look for trends and patterns. The goal of the marketing campaign in this study is to convert casual users into annual members, so I want to examine the habits of both types of users and find when and where there may be opportunities to target advertising or create incentives.
For reference, here’s a current offering of Divvy’s plan options:
Since the publicly provided data is anonymized, it is impossible to tie the number of rides to individual users. It is also impossible to tell which rides by casual users used a day pass. While Divvy likely has more data available internally, it’s still possible to make more general observations and identify the best times and places to deploy marketing.
First, let’s take a look at the number of the rides taken by users and casual members over the last year, broken down by month, day of week, and time of day.
Casual users started significantly more rides in May than the months prior, and the trip count remained high until October. June through August were the peak months for casual riders and may be the best months to target advertising.
ntrips_monthly_memvcas <- cyc_trips %>%
group_by(started_month,
member_casual) %>%
summarize(n = n())
ntrips_monthly_memvcas %>%
ggplot(aes(
x = started_month,
y = n,
fill = member_casual,
group = member_casual
)) +
geom_col(width = .7, position = position_dodge(width = .8)) +
#labels
labs(title = "Number of Trips Started - Month",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
scale_y_continuous(breaks = breaks_width(50000),
expand = expansion(mult = c(0, .1))) +
#months, reversed scale
scale_x_discrete(expand = expansion(add = c(1, 1)),
limits = rev(
c(
"Nov",
"Dec",
"Jan",
"Feb",
"Mar",
"Apr",
"May",
"Jun",
"Jul",
"Aug",
"Sep",
"Oct"
)
)) +
geom_text(
aes(
x = started_month,
y = n,
label = number(n, accuracy = 1, scale_cut = cut_short_scale()),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
theme_minimal() +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
guides(fill = guide_legend(reverse = TRUE),
# THIS REVERSES THE LEGEND ORDER!
color = guide_legend(reverse = TRUE)) +
coord_flip(clip = "off") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium", size = 18),
axis.line.y = element_line(lineend = "round", color = "#5D5D5D"),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
#panel.grid.major.y = element_line(linetype = "dotted", color = "#dfe3ee"),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)We can also see that while annual members ride more Monday - Friday, casual users outnumber members on the weekends. Advertising targeting Saturday and Sunday would catch the most casual users.
ntrips_daily_memvcas <- cyc_trips %>%
group_by(started_day,
member_casual) %>%
summarize(n = n())
ntrips_daily_memvcas %>%
ggplot(aes(
x = started_day,
y = n,
fill = member_casual,
group = member_casual
)) +
geom_col(width = .7, position = position_dodge(width = .8)) +
#labels
labs(title = "Number of Trips Started - Day",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
scale_y_continuous(breaks = breaks_width(50000),
expand = expansion(mult = c(0, .1))) +
scale_x_discrete(expand = expansion(add = c(1, 1)),
limits = rev(c(
"Mon",
"Tue",
"Wed",
"Thu",
"Fri",
"Sat",
"Sun"
))) +
geom_text(
aes(
x = started_day,
y = n,
label = number(n, accuracy = 1, scale_cut = cut_short_scale()),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
theme_minimal() +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
# THIS REVERSES THE LEGEND ORDER!
guides(fill = guide_legend(reverse = TRUE),
color = guide_legend(reverse = TRUE)) +
coord_flip(clip = "off") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line.y = element_line(lineend = "round", color = "#5D5D5D"),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
#panel.grid.major.y = element_line(linetype = "dotted", color = "#dfe3ee"),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)When we look at times of day, we can see a spike of usage for annual members followed by a dip around 8:00 and right after, however casual usage rises gradually throughout the day, peaking at 17:00 (5pm). This is particularly apparent from Monday through Friday.
On Saturday and Sunday, the usage of both groups gradually rises throughout the day and we can see the heaviest usage between 12:00 and 16:00 (4pm), with heaviest usage for casual users around 15:00 (3pm).
This suggests advertising should target mid-day and late-afternoon to reach casual users.
ntrips_hourly_memvcas <- cyc_trips %>%
group_by(started_hour,
member_casual) %>%
summarize(n = n())
ntrips_hourly_memvcas %>%
ggplot(
aes(
x = started_hour,
y = n,
color = member_casual,
fill = member_casual,
group = member_casual,
)
) +
geom_col(width = .4, position = position_dodge(width = .5)) +
geom_text(
aes(
x = started_hour,
y = n,
label = number(n, accuracy = .1, scale_cut = cut_short_scale()),
hjust = -.3,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .5),
size = 3
) +
labs(title = "Number of Trips - Time of Day",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
scale_x_reverse(
expand = expansion(add = c(0, 0)),
# reverse to make it make more sense time-wise
breaks = breaks_width(-1),
# USE NEGATIVE VALUE! breakdown DOWN by one
labels = label_number(suffix = ":00")
) +
theme_minimal() +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
# THIS REVERSES THE LEGEND ORDER!
guides(fill = guide_legend(reverse = TRUE),
color = guide_legend(reverse = TRUE)) +
coord_flip(clip = "off") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line.y = element_line(lineend = "round", color = "#5D5D5D"),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid = element_blank(),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)When we look at times of day, we can see a spike of usage for annual members followed by a dip around 8:00 and right after, however casual usage rises gradually throughout the day, peaking at 17:00 (5pm). This is particularly apparent from Monday through Friday.
On Saturday and Sunday, the usage of both groups gradually rises throughout the day and we can see the heaviest usage between 12:00 and 16:00 (4pm), with heaviest usage for casual users around 15:00 (3pm).
This suggests advertising should target mid-day and late-afternoon to reach casual users.
ntrips_hourly_memvcas <- cyc_trips %>%
filter(!(started_day == "Sat" | started_day == "Sun")) %>%
group_by(started_hour,
member_casual) %>%
summarize(n = n())
ntrips_hourly_memvcas %>%
ggplot(
aes(
x = started_hour,
y = n,
color = member_casual,
fill = member_casual,
group = member_casual,
)
) +
geom_col(width = .4, position = position_dodge(width = .5)) +
geom_text(
aes(
x = started_hour,
y = n,
label = number(n, accuracy = .1, scale_cut = cut_short_scale()),
hjust = -.3,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .5),
size = 3
) +
labs(title = "Number of Trips - Time of Day (Mon - Fri)",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
scale_x_reverse(
expand = expansion(add = c(0, 0)),
# reverse to make it make more sense time-wise
breaks = breaks_width(-1),
# USE NEGATIVE VALUE! breakdown DOWN by one
labels = label_number(suffix = ":00")
) +
theme_minimal() +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
# THIS REVERSES THE LEGEND ORDER!
guides(fill = guide_legend(reverse = TRUE),
color = guide_legend(reverse = TRUE)) +
coord_flip(clip = "off") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line.y = element_line(lineend = "round", color = "#5D5D5D"),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid = element_blank(),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)When we look at times of day, we can see a spike of usage for annual members followed by a dip around 8:00 and right after, however casual usage rises gradually throughout the day, peaking at 17:00 (5pm). This is particularly apparent from Monday through Friday.
On Saturday and Sunday, the usage of both groups gradually rises throughout the day and we can see the heaviest usage between 12:00 and 16:00 (4pm), with heaviest usage for casual users around 15:00 (3pm).
This suggests advertising should target mid-day and late-afternoon to reach casual users.
ntrips_hourly_memvcas <- cyc_trips %>%
filter(started_day == "Sat" | started_day == "Sun") %>%
group_by(started_hour,
member_casual) %>%
summarize(n = n())
ntrips_hourly_memvcas %>%
ggplot(
aes(
x = started_hour,
y = n,
color = member_casual,
fill = member_casual,
group = member_casual,
)
) +
geom_col(width = .4, position = position_dodge(width = .5)) +
geom_text(
aes(
x = started_hour,
y = n,
label = number(n, accuracy = .1, scale_cut = cut_short_scale()),
hjust = -.3,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .5),
size = 3
) +
labs(title = "Number of Trips - Time of Day (Sat - Sun)",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
scale_x_reverse(
expand = expansion(add = c(0, 0)),
# reverse to make it make more sense time-wise
breaks = breaks_width(-1),
# USE NEGATIVE VALUE! breakdown DOWN by one
labels = label_number(suffix = ":00")
) +
theme_minimal() +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
# THIS REVERSES THE LEGEND ORDER!
guides(fill = guide_legend(reverse = TRUE),
color = guide_legend(reverse = TRUE)) +
coord_flip(clip = "off") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line.y = element_line(lineend = "round", color = "#5D5D5D"),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
panel.grid = element_blank(),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)Next, let’s see the average trip lengths to find opportunities where casual riders take the longest trip, on average.
I wrote a quick calculation for referencing the cost of trips for casual users that you can review as you check out the average trip durations. I’ll summarize the average trip duration for casual users and members here, and the data is broken down further in the following tabs.
avg_trip_duration <- cyc_trips %>%
group_by(member_casual) %>%
summarize(trip_duration = seconds_to_period(ceiling(mean(trip_dur_sec))))
knitr::kable(avg_trip_duration, col.names=c("Rider Type", "Average Trip Duration"))| Rider Type | Average Trip Duration |
|---|---|
| casual | 23M 55S |
| member | 12M 27S |
As a reminder, Annual membership runs $119 for the
year and members ride for free for the first 45
minutes.
Casual users pay $1 to unlock the bike and then
.16¢/minute thereafter.
A day pass for casual users costs $8
and allows for unlimited trips for up to 3 hours.
We can see that casual users’ average trip length rounds up to 24 minutes, which is costing those users $4.84 per trip.
The average casual user would have to ride 25+ times in one year for the membership to beat the membership price.
\[25 * \$4.84 = \$121\]
The membership also becomes worth it when compared to the cost of 8 Day Passes.
\[8 * \$15 = \$120\]
day_in_mins <- (0:1440)
cost_casual <- (1 + day_in_mins * .16)
cost_member <- (-7.20 + day_in_mins * .16)
cyc_costs <-
data.frame(
cost_casual = cost_casual,
cost_member = cost_member
) %>%
mutate(trip_dur = seconds_to_period(duration(day_in_mins, unit = "min")),.before=cost_casual)
cyc_costs$cost_member[cyc_costs$cost_member < 0] <- 0
cyc_costs <- cyc_costs %>%
mutate(cost_member = label_dollar()(cost_member)) %>%
mutate(cost_casual = label_dollar()(cost_casual))
rmarkdown::paged_table(cyc_costs)By looking at the average ride length, we can identify when casual users take the longest rides and target advertising there, as those users would see the biggest benefit from the membership price incentive of unlimited rides.
It seems that every month, casual users take the longest trips in January, March, and May. By getting a membership earlier on in the year, casual users will reap a larger benefit throughout the rest of the year.
avglength_monthly_memvcas <- cyc_trips %>%
group_by(started_month,
member_casual) %>%
summarize(mean = ceiling(mean(trip_dur_sec)))
avglength_monthly_memvcas %>%
ggplot(aes(x = mean,
y = started_month,
fill = member_casual)) +
geom_col(aes(fill = member_casual),
width = .7,
position = position_dodge(width = .8)) +
geom_text(
aes(
x = mean,
y = started_month,
label = seconds_to_period(mean),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
coord_cartesian(clip = "off") +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_x_time(breaks = breaks_width("5 min"),
#minor_breaks = breaks_width("1 min"),
expand = expansion(add = c(0, 1))) +
scale_y_discrete(expand = expansion(add = c(1, 1)),
limits = rev(
c(
"Nov",
"Dec",
"Jan",
"Feb",
"Mar",
"Apr",
"May",
"Jun",
"Jul",
"Aug",
"Sep",
"Oct"
)
)) +
theme_minimal() +
labs(title = "Average Trip Length - Month",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
xlab(label = "Trip Length") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line = element_line(lineend = "butt", color = "#5D5D5D"),
axis.line.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)Somewhat unsurprisingly, casual users ride longer on the weekends. Saturday - Monday rides are 4-6 minutes longer than the other days of the week. Monday seems odd!
avglength_monthly_memvcas <- cyc_trips %>%
group_by(started_day,
member_casual) %>%
summarize(mean = ceiling(mean(trip_dur_sec)))
avglength_monthly_memvcas %>%
ggplot(aes(x = mean,
y = started_day,
fill = member_casual)) +
geom_col(aes(fill = member_casual),
width = .7,
position = position_dodge(width = .8)) +
geom_text(
aes(
x = mean,
y = started_day,
label = seconds_to_period(mean),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
coord_cartesian(clip = "off") +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_x_time(breaks = breaks_width("5 min"),
#minor_breaks = breaks_width("1 min"),
expand = expansion(add = c(0, 1))) +
scale_y_discrete(expand = expansion(add = c(1, 1)),
limits = rev(c(
"Mon",
"Tue",
"Wed",
"Thu",
"Fri",
"Sat",
"Sun"
))) +
theme_minimal() +
labs(title = "Average Trip Length - Day",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
xlab(label = "Trip Length") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line = element_line(lineend = "butt", color = "#5D5D5D"),
axis.line.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
#panel.grid.major.x = element_line(linetype = "solid", color="#e7e7e7"),
#panel.grid.minor.x = element_line(linetype = "dotted", color="#cfcfcf"),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)When examining casual trip lengths for the whole week, there are two times where trips are longer than most of the day, around 1:00 and 11:00.
avglength_monthly_memvcas <- cyc_trips %>%
group_by(started_hour,
member_casual) %>%
summarize(mean = ceiling(mean(trip_dur_sec)))
avglength_monthly_memvcas %>%
ggplot(aes(x = started_hour,
y = mean,
fill = member_casual)) +
geom_col(width = .7,
position = position_dodge(width = .8)) +
geom_text(
aes(
x = started_hour,
y = mean,
label = seconds_to_period(mean),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
coord_flip(clip = "off") +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_y_time(breaks = breaks_width("5 min"),
#minor_breaks = breaks_width("1 min"),
expand = expansion(add = c(0, 1))) +
scale_x_reverse(
expand = expansion(add = c(0, 0)),
# reverse to make it make more sense time-wise
breaks = breaks_width(-1),
# USE NEGATIVE VALUE! breakdown DOWN by one
labels = label_number(suffix = ":00")
) +
theme_minimal() +
labs(title = "Average Trip Length - Time of Day (Whole Week)",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
xlab(label = "Trip Length") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line = element_line(lineend = "butt", color = "#5D5D5D"),
axis.line.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
#panel.grid.major.x = element_line(linetype = "solid", color="#e7e7e7"),
#panel.grid.minor.x = element_line(linetype = "dotted", color="#cfcfcf"),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)Casual users take longer rides on average at 1:00 and 11:00.
avglength_monthly_memvcas <- cyc_trips %>%
filter(!(started_day == "Sat" | started_day == "Sun")) %>%
group_by(started_hour,
member_casual) %>%
summarize(mean = ceiling(mean(trip_dur_sec)))
avglength_monthly_memvcas %>%
ggplot(aes(x = started_hour,
y = mean,
fill = member_casual)) +
geom_col(width = .7,
position = position_dodge(width = .8)) +
geom_text(
aes(
x = started_hour,
y = mean,
label = seconds_to_period(mean),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
coord_flip(clip = "off") +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_y_time(breaks = breaks_width("5 min"),
#minor_breaks = breaks_width("1 min"),
expand = expansion(add = c(0, 1))) +
scale_x_reverse(
expand = expansion(add = c(0, 0)),
# reverse to make it make more sense time-wise
breaks = breaks_width(-1),
# USE NEGATIVE VALUE! breakdown DOWN by one
labels = label_number(suffix = ":00")
) +
theme_minimal() +
labs(title = "Average Trip Length - Time of Day (Mon - Fri)",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
xlab(label = "Trip Length") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line = element_line(lineend = "butt", color = "#5D5D5D"),
axis.line.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
#panel.grid.major.x = element_line(linetype = "solid", color="#e7e7e7"),
#panel.grid.minor.x = element_line(linetype = "dotted", color="#cfcfcf"),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)The average trip gets longer progressively throughout the day, peaking at 14:00 (2pm) until it starts rising again at around midnight.
avglength_monthly_memvcas <- cyc_trips %>%
filter(started_day == "Sat" | started_day == "Sun") %>%
group_by(started_hour,
member_casual) %>%
summarize(mean = ceiling(mean(trip_dur_sec)))
avglength_monthly_memvcas %>%
ggplot(aes(x = started_hour,
y = mean,
fill = member_casual)) +
geom_col(width = .7,
position = position_dodge(width = .8)) +
geom_text(
aes(
x = started_hour,
y = mean,
label = seconds_to_period(mean),
hjust = -.1,
vjust = .5,
color = member_casual,
),
position = position_dodge(width = .8)
) +
coord_flip(clip = "off") +
scale_color_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_fill_manual(
labels = c("Casual User", "Annual Member"),
values = c(casual = "#3B984B", member = "#3b5998")
) +
scale_y_time(breaks = breaks_width("5 min"),
#minor_breaks = breaks_width("1 min"),
expand = expansion(add = c(0, 1))) +
scale_x_reverse(
expand = expansion(add = c(0, 0)),
# reverse to make it make more sense time-wise
breaks = breaks_width(-1),
# USE NEGATIVE VALUE! breakdown DOWN by one
labels = label_number(suffix = ":00")
) +
theme_minimal() +
labs(title = "Average Trip Length - Time of Day (Sat - Sun)",
subtitle = "November 2021 — October 2022",
caption = "Data downloaded on November 11, 2022") +
xlab(label = "Trip Length") +
theme(
text = element_text(family = "Fira Sans"),
plot.background = element_rect(fill = "#f7f7f7", color = NA),
plot.margin = unit(rep(2, 4), "cm"),
plot.title = element_text(family = "Fira Sans Medium"),
axis.line = element_line(lineend = "butt", color = "#5D5D5D"),
axis.line.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "#5D5D5D"),
panel.grid = element_blank(),
#panel.grid.major.x = element_line(linetype = "solid", color="#e7e7e7"),
#panel.grid.minor.x = element_line(linetype = "dotted", color="#cfcfcf"),
legend.position = "top",
legend.justification = "left",
legend.title = element_blank(),
legend.text = element_text(),
legend.key.width = unit(1, "cm")
)To help identify the locations to target advertising, I’ve included a list of top 10 locations for both casual users and members, as well as interactive maps that allows you to see the number of trips started and ended for each recorded station.
Zoom in and out to see the locations where the most trips started. The locations will cluster and come apart as you zoom in and out, showing you the total number of trips started in each cluster down to the individual location. This looks crummy on mobile right now and I will clean it up soon! But the data is there and works.
start_locs_member <- cyc_trips %>%
filter(member_casual == "member") %>%
group_by(start_station_name) %>%
summarize(
member_trip_count = n(),
start_lng = round(mean(start_lng), digits = 5),
start_lat = round(mean(start_lat), digits = 5))
start_locs_casual <- cyc_trips %>%
filter(member_casual == "casual") %>%
group_by(start_station_name) %>%
summarize(
casual_trip_count = n(),
start_lng = round(mean(start_lng), digits = 5),
start_lat = round(mean(start_lat), digits = 5))
start_locs <- full_join(start_locs_casual, start_locs_member, by="start_station_name")
start_locs <- start_locs %>%
mutate(start_lng = coalesce(start_lng.x, start_lng.y),
start_lat = coalesce(start_lat.x, start_lat.y))
start_locs <- start_locs[c(-3,-4,-6,-7)] %>%
replace(is.na(.),0)
leaflet( height = "600px",
width = "100%",
options = leafletOptions(
zoomControl = TRUE,
minZoom = 10,
)) %>% setView(lng = -87.63,
lat = 41.8781,
zoom = 14) %>%
setMaxBounds(
lng1 = -87.93,
lat1 = 41.58,
lng2 = -87.51,
lat2 = 42.11
) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addProviderTiles(providers$CartoDB.PositronOnlyLabels) %>%
addMarkers(
data = start_locs,
lng = ~ start_lng,
lat = ~ start_lat,
label = ~ start_station_name,
options = markerOptions(n = ~ (casual_trip_count + member_trip_count)),
popupOptions = popupOptions(closePopupOnClick = F),
popup =
paste(
"<b>",
start_locs$start_station_name,
"</b>",
"",
"<br/>",
"Casual Users: ",
prettyNum(start_locs$casual_trip_count, big.mark = ","),
"<br/>",
"Members: ",
prettyNum(start_locs$member_trip_count, big.mark = ",")
),
clusterOptions = markerClusterOptions(
maxClusterRadius = 120,
iconCreateFunction = JS(
"function (cluster) {
var markers = cluster.getAllChildMarkers();
var sum = 0;
for (i = 0; i < markers.length; i++) {
sum += Number(markers[i].options.n);
sum += 1;
}
var childCount = sum;
var c = ' marker-cluster-';
if (childCount < 10000) {
c += 'small';
}
else if (childCount < 100000) {
c += 'medium';
}
else {
c += 'large';
}
return new L.DivIcon({ html: '<div><span>' + sum.toLocaleString(\"en-US\") + '</span></div>',
className: 'marker-cluster' +c, iconSize: new L.Point(40,40)});
}"
)
))Here we can see where casual users like to start, and end their trips the most. It appears that casual users start their trips near popular tourist destinations, e.g. the number one location is Streeter Dr & Grand Ave, which happens to be outside of Navy Pier.
start_locs_member <- cyc_trips %>%
filter(member_casual == "member") %>%
group_by(start_station_name) %>%
summarize(
member_trip_count = n(),
start_lng = mean(start_lng),
start_lat = mean(start_lat)
) %>% arrange(desc(member_trip_count))
start_locs_casual <- cyc_trips %>%
filter(member_casual == "casual") %>%
group_by(start_station_name) %>%
summarize(
casual_trip_count = n(),
start_lng = mean(start_lng),
start_lat = mean(start_lat)
) %>% arrange(desc(casual_trip_count))
top_start_locs_cas <-
start_locs_casual[1:10,] %>% mutate(rank = (1:10), .before = start_station_name)
top_start_locs_mem <-
start_locs_member[1:10,] %>% mutate(rank = (1:10), .before = start_station_name)
knitr::kable(
top_start_locs_cas[1:3],
col.names = c(" ", "Location Name", "Trips Started"),
align = "clr",
caption = "Trips Started by Casual Users"
)| Location Name | Trips Started | |
|---|---|---|
| 1 | Streeter Dr & Grand Ave | 55295 |
| 2 | DuSable Lake Shore Dr & Monroe St | 30938 |
| 3 | Millennium Park | 24354 |
| 4 | Michigan Ave & Oak St | 23795 |
| 5 | DuSable Lake Shore Dr & North Blvd | 22275 |
| 6 | Shedd Aquarium | 19578 |
| 7 | Theater on the Lake | 17451 |
| 8 | Wells St & Concord Ln | 15003 |
| 9 | Dusable Harbor | 13438 |
| 10 | Clark St & Armitage Ave | 12920 |
leaflet(
height = "600px",
width = "100%",
options = leafletOptions(
zoomControl = FALSE,
minZoom = 13,
maxZoom = 13,
dragging = FALSE
)
) %>% setView(lng = -87.622,
lat = 41.897,
zoom = 13) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addProviderTiles(providers$CartoDB.PositronOnlyLabels) %>%
#addProviderTiles(providers$Stamen.TonerLite) %>%
addCircles(
data = top_start_locs_cas,
lng = ~ start_lng,
lat = ~ start_lat,
radius = ~ casual_trip_count / 100,
stroke = F,
color = "#3B984B",
opacity = 1,
fillOpacity = .618,
label = paste(
top_start_locs_mem$rank,
": ",
top_start_locs_mem$start_station_name
),
labelOptions = labelOptions(direction = "top"),
popup = paste(
"<b>",
top_start_locs_cas$start_station_name,
"</b>",
"",
"<br/>",
"Casual Users: ",
prettyNum(top_start_locs_cas$casual_trip_count, big.mark = ",")
),
popupOptions = popupOptions(closeOnClick = T)
)knitr::kable(
top_start_locs_mem[1:3],
col.names = c(" ", "Location Name", "Trips Started"),
align = "clr",
caption = "Trips Started by Members"
)| Location Name | Trips Started | |
|---|---|---|
| 1 | Kingsbury St & Kinzie St | 24408 |
| 2 | Clark St & Elm St | 21250 |
| 3 | Wells St & Concord Ln | 20310 |
| 4 | Clinton St & Washington Blvd | 19008 |
| 5 | Clinton St & Madison St | 18512 |
| 6 | Loomis St & Lexington St | 18477 |
| 7 | Wells St & Elm St | 18017 |
| 8 | University Ave & 57th St | 17642 |
| 9 | Ellis Ave & 60th St | 17609 |
| 10 | Streeter Dr & Grand Ave | 16312 |
leaflet(
height = "600px",
width = "100%",
options = leafletOptions(
zoomControl = FALSE,
minZoom = 12,
maxZoom = 12,
dragging = FALSE
)
) %>% setView(lng = -87.621,
lat = 41.85,
zoom = 11) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addProviderTiles(providers$CartoDB.PositronOnlyLabels) %>%
#addProviderTiles(providers$Stamen.TonerLite) %>%
addCircles(
data = top_start_locs_mem,
lng = ~ start_lng,
lat = ~ start_lat,
radius = ~ member_trip_count / 50,
stroke = F,
color = "#3b5998",
opacity = 1,
fillOpacity = .618,
label = paste(
top_start_locs_mem$rank,
": ",
top_start_locs_mem$start_station_name
),
labelOptions = labelOptions(direction = "top"),
popup = paste(
"<b>",
top_start_locs_mem$start_station_name,
"</b>",
"",
"<br/>",
"Members: ",
prettyNum(top_start_locs_mem$member_trip_count, big.mark = ",")
),
popupOptions = popupOptions(closeOnClick = T)
)end_locs_member <- cyc_trips %>%
filter(member_casual == "member") %>%
group_by(end_station_name) %>%
summarize(
member_trip_count = n(),
end_lng = round(mean(end_lng), digits = 5),
end_lat = round(mean(end_lat), digits = 5))
end_locs_casual <- cyc_trips %>%
filter(member_casual == "casual") %>%
group_by(end_station_name) %>%
summarize(
casual_trip_count = n(),
end_lng = round(mean(end_lng), digits = 5),
end_lat = round(mean(end_lat), digits = 5))
end_locs <- full_join(end_locs_casual, end_locs_member, by="end_station_name")
end_locs <- end_locs %>%
mutate(end_lng = coalesce(end_lng.x, end_lng.y),
end_lat = coalesce(end_lat.x, end_lat.y))
end_locs <- end_locs[c(-3,-4,-6,-7)] %>%
replace(is.na(.),0)
leaflet( height = "600px",
width = "100%",
options = leafletOptions(
zoomControl = FALSE,
minZoom = 10,
)) %>% setView(lng = -87.63,
lat = 41.8781,
zoom = 14) %>%
setMaxBounds(
lng1 = -87.93,
lat1 = 41.58,
lng2 = -87.51,
lat2 = 42.11
) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addProviderTiles(providers$CartoDB.PositronOnlyLabels) %>%
#addProviderTiles(providers$Stamen.TonerLite) %>%
addMarkers(
data = end_locs,
lng = ~ end_lng,
lat = ~ end_lat,
label = paste(top_start_locs_mem$rank,
": ",
top_start_locs_mem$start_station_name),
options = markerOptions(n = ~ (casual_trip_count + member_trip_count)),
popupOptions = popupOptions(closePopupOnClick = F),
popup =
paste("<b>",
end_locs$end_station_name,
"</b>",
"",
"<br/>",
"Casual Users: ",
prettyNum(end_locs$casual_trip_count, big.mark = ","),
"<br/>",
"Members: ",
prettyNum(end_locs$member_trip_count, big.mark = ",")
),
clusterOptions = markerClusterOptions(
maxClusterRadius = 120,
iconCreateFunction = JS(
"function (cluster) {
var markers = cluster.getAllChildMarkers();
var sum = 0;
for (i = 0; i < markers.length; i++) {
sum += Number(markers[i].options.n);
sum += 1;
}
var childCount = sum;
var c = ' marker-cluster-';
if (childCount < 10000) {
c += 'small';
}
else if (childCount < 100000) {
c += 'medium';
}
else {
c += 'large';
}
return new L.DivIcon({ html: '<div><span>' + sum.toLocaleString(\"en-US\") + '</span></div>',
className: 'marker-cluster' +c, iconSize: new L.Point(40,40)});
}"
)
))end_locs_member <- cyc_trips %>%
filter(member_casual == "member") %>%
group_by(end_station_name) %>%
summarize(
member_trip_count = n(),
end_lng = mean(end_lng),
end_lat = mean(end_lat)
) %>% arrange(desc(member_trip_count))
end_locs_casual <- cyc_trips %>%
filter(member_casual == "casual") %>%
group_by(end_station_name) %>%
summarize(
casual_trip_count = n(),
end_lng = mean(end_lng),
end_lat = mean(end_lat)
) %>% arrange(desc(casual_trip_count))
top_end_locs_cas <-
end_locs_casual[1:10,] %>% mutate(rank = (1:10), .before = end_station_name)
top_end_locs_mem <-
end_locs_member[1:10,] %>% mutate(rank = (1:10), .before = end_station_name)
knitr::kable(
top_end_locs_cas[1:3],
col.names = c(" ", "Location Name", "Trips ended"),
align = "clr",
caption = "Trips Ended by Casual Users"
)| Location Name | Trips ended | |
|---|---|---|
| 1 | Streeter Dr & Grand Ave | 58182 |
| 2 | DuSable Lake Shore Dr & Monroe St | 28929 |
| 3 | Millennium Park | 26171 |
| 4 | Michigan Ave & Oak St | 25494 |
| 5 | DuSable Lake Shore Dr & North Blvd | 25409 |
| 6 | Theater on the Lake | 18789 |
| 7 | Shedd Aquarium | 18244 |
| 8 | Wells St & Concord Ln | 14555 |
| 9 | Clark St & Armitage Ave | 13190 |
| 10 | Clark St & Lincoln Ave | 12991 |
leaflet(
height = "600px",
width = "100%",
options = leafletOptions(
zoomControl = FALSE,
minZoom = 13,
maxZoom = 13,
dragging = FALSE
)
) %>% setView(lng = -87.622,
lat = 41.897,
zoom = 13) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addProviderTiles(providers$CartoDB.PositronOnlyLabels) %>%
#addProviderTiles(providers$Stamen.TonerLite) %>%
addCircles(
data = top_end_locs_cas,
lng = ~ end_lng,
lat = ~ end_lat,
radius = ~ casual_trip_count / 100,
stroke = F,
color = "#3B984B",
opacity = 1,
fillOpacity = .618,
label = paste(
top_end_locs_mem$rank,
": ",
top_end_locs_mem$end_station_name
),
labelOptions = labelOptions(direction = "top"),
popup = paste(
"<b>",
top_end_locs_cas$end_station_name,
"</b>",
"",
"<br/>",
"Casual Users: ",
prettyNum(top_end_locs_cas$casual_trip_count, big.mark = ",")
),
popupOptions = popupOptions(closeOnClick = F)
)knitr::kable(
top_end_locs_mem[1:3],
col.names = c(" ", "Location Name", "Trips ended"),
align = "clr",
caption = "Trips Ended by Members"
)| Location Name | Trips ended | |
|---|---|---|
| 1 | Kingsbury St & Kinzie St | 23918 |
| 2 | Clark St & Elm St | 21556 |
| 3 | Wells St & Concord Ln | 20862 |
| 4 | Clinton St & Washington Blvd | 19673 |
| 5 | Clinton St & Madison St | 18954 |
| 6 | University Ave & 57th St | 18748 |
| 7 | Loomis St & Lexington St | 18081 |
| 8 | Wells St & Elm St | 17854 |
| 9 | Ellis Ave & 60th St | 17363 |
| 10 | Broadway & Barry Ave | 16742 |
leaflet(
height = "600px",
width = "100%",
options = leafletOptions(
zoomControl = FALSE,
minZoom = 12,
maxZoom = 12,
dragging = FALSE
)
) %>% setView(lng = -87.621,
lat = 41.85,
zoom = 11) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addProviderTiles(providers$CartoDB.PositronOnlyLabels) %>%
#addProviderTiles(providers$Stamen.TonerLite) %>%
addCircles(
data = top_end_locs_mem,
lng = ~ end_lng,
lat = ~ end_lat,
radius = ~ member_trip_count / 50,
stroke = F,
color = "#3b5998",
opacity = 1,
fillOpacity = .618,
label = paste(
top_end_locs_mem$rank,
": ",
top_end_locs_mem$end_station_name
),
labelOptions = labelOptions(direction = "top"),
popup = paste(
"<b>",
top_end_locs_mem$end_station_name,
"</b>",
"",
"<br/>",
"Members: ",
prettyNum(top_end_locs_mem$member_trip_count, big.mark = ",")
),
popupOptions = popupOptions(closeOnClick = F)
)Analyzing nearly 5 million individual trip entries allowed us to discover some insights into casual user and member habits that could help us target casual users and drive conversion to annual membership.
An average user would see benefits to the memberships after taking 25 average individual trips or if they plan on using more than 8 day passes in a 12 month period.
We discovered that most trips got started June through August. We also saw that the weekends were the the most popular days for casual users to ride, and we identified casual usage gradually increased throughout those days, peaking at around 15:00, or 3:00pm.
Casual users started their trips near popular tourist destinations, such as Navy Pier and Shedd Aquarium, along the lake front, and near major transportation hubs, like Merchandise Mart, and Ogilvie and Union stations.
Partnerships with those popular destinations, as well as Metra/Pace/CTA, that could produce benefits for users could drive conversion to memberships. Campaigns that begin in Spring that advertise easier access to popular events and festivals, such as Taste of Chicago or Lollapalooza could also drive increased interest.
For a digital marketing strategy, the company could use internal data to show in-app content to users that have purchased around 8 day passes in the last 12 month period, and to users whose first few rides, or several subsequent rides, end up lasting around the average length of 24 minutes.
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.
Wickham H, Seidel D (2022). scales: Scale Functions for Visualization. R package version 1.2.1, https://CRAN.R-project.org/package=scales.
Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.
Müller K (2022). hms: Pretty Time of Day. R package version 1.1.2, https://CRAN.R-project.org/package=hms.
Cheng J, Karambelkar B, Xie Y (2022). leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library. R package version 2.1.1, https://CRAN.R-project.org/package=leaflet.