The Cyclistic Bike Share Case Study is my capstone project for the Google Data Analytics Certificate. The analysis uses the R programming language and RStudio. I follow the six-step data analysis process taught in the course: ask, prepare, process, analyze, share, and act.
You are a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Lily Moreno, the director of marketing and your manager, has assigned you the following question to answer: How do annual members and casual riders use Cyclistic bikes differently?
The business objective of the case study is to identify opportunities for marketing campaigns that convert casual riders into annual members. This will be done by analyzing bike trip data to understand user behavior and preferences. The ultimate goal is to increase profit.
- Analyze Cyclistic’s historical bike trip data.
- Identify trends and patterns among casual riders and members.
- Understand the behavior and preferences of these two user groups.
- Identify ways to convert casual riders into annual members.
- Use insights from the analysis to inform the development of a new marketing strategy for the company.
- Goal: increase profitability and drive future growth.
Lily Moreno: Director of Marketing at Cyclistic, who is responsible for implementing the marketing campaigns at Cyclistic.
Cyclistic’s marketing team: They will be responsible for conducting the analysis and developing the marketing strategy based on the insights gained.
Cyclistic’s investors and shareholders: They have a financial interest in the company’s success and may be interested in the results of the analysis and any changes to the marketing strategy.
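The data for this analysis can be accessed through the provided link. It includes 12 months of historical trip data from 2019 and 2020 for Cyclistic, a fictional bike-share company based in Chicago. The code below works with the first-quarter files for each year, Divvy_Trips_2019_Q1.csv and Divvy_Trips_2020_Q1.csv.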
Motivate, Inc. collected the data for this analysis directly through its management of the Cyclistic Bike Share program for the City of Chicago. The data is comprehensive and consistent because it includes every ride taken by users, not just a sample. It is also current, as it is released on a monthly basis, and the City of Chicago makes it available to the public.
Here, we clean the data and check its integrity, making sure it is complete, correct, and relevant.
library(tidyverse)
library(conflicted)
library(ggplot2)
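# Assign the result so later steps can refer to it; the outer parentheses also print it
(Divvy_Trips_2019_Q1 <- read_csv("Divvy_Trips_2019_Q1.csv"))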
## Rows: 365069 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): start_time, end_time, from_station_name, to_station_name, usertype,...
## dbl (5): trip_id, bikeid, from_station_id, to_station_id, birthyear
## num (1): tripduration
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 365,069 × 12
## trip_id start_time end_time bikeid tripduration from_station_id
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 21742443 2019-01-01 0:04:37 2019-01-01 0… 2167 390 199
## 2 21742444 2019-01-01 0:08:13 2019-01-01 0… 4386 441 44
## 3 21742445 2019-01-01 0:13:23 2019-01-01 0… 1524 829 15
## 4 21742446 2019-01-01 0:13:45 2019-01-01 0… 252 1783 123
## 5 21742447 2019-01-01 0:14:52 2019-01-01 0… 1170 364 173
## 6 21742448 2019-01-01 0:15:33 2019-01-01 0… 2437 216 98
## 7 21742449 2019-01-01 0:16:06 2019-01-01 0… 2708 177 98
## 8 21742450 2019-01-01 0:18:41 2019-01-01 0… 2796 100 211
## 9 21742451 2019-01-01 0:18:43 2019-01-01 0… 6205 1727 150
## 10 21742452 2019-01-01 0:19:18 2019-01-01 0… 3939 336 268
## # ℹ 365,059 more rows
## # ℹ 6 more variables: from_station_name <chr>, to_station_id <dbl>,
## # to_station_name <chr>, usertype <chr>, gender <chr>, birthyear <dbl>
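# Assign the result so later steps can refer to it; the outer parentheses also print it
(Divvy_Trips_2020_Q1 <- read_csv("Divvy_Trips_2020_Q1.csv"))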
## Rows: 426887 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, started_at, ended_at, start_station_name, e...
## dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, en...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 426,887 × 13
## ride_id rideable_type started_at ended_at start_station_name start_station_id
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 EACB19… docked_bike 2020-01-2… 2020-01… Western Ave & Lel… 239
## 2 8FED87… docked_bike 2020-01-3… 2020-01… Clark St & Montro… 234
## 3 789F3C… docked_bike 2020-01-0… 2020-01… Broadway & Belmon… 296
## 4 C9A388… docked_bike 2020-01-0… 2020-01… Clark St & Randol… 51
## 5 943BC3… docked_bike 2020-01-3… 2020-01… Clinton St & Lake… 66
## 6 6D9C8A… docked_bike 2020-01-1… 2020-01… Wells St & Hubbar… 212
## 7 31EB9B… docked_bike 2020-01-1… 2020-01… Desplaines St & R… 96
## 8 A2B24E… docked_bike 2020-01-1… 2020-01… Desplaines St & R… 96
## 9 5E3F01… docked_bike 2020-01-3… 2020-01… Wells St & Hubbar… 212
## 10 19DC57… docked_bike 2020-01-3… 2020-01… Clark St & Lake St 38
## # ℹ 426,877 more rows
## # ℹ 7 more variables: end_station_name <chr>, end_station_id <dbl>,
## # start_lat <dbl>, start_lng <dbl>, end_lat <dbl>, end_lng <dbl>,
## # member_casual <chr>
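The two printouts above show that the 2019 and 2020 files use different column names and types. As a minimal sketch of how such mismatches could be listed before combining, the example below uses two tiny tibbles, `q1_2019` and `q1_2020`, as hypothetical stand-ins for the real files; only three columns are reproduced and the 2020 values are made up.

```r
library(tibble)

# Hypothetical stand-ins for the two files (illustrative values only)
q1_2019 <- tibble(trip_id = 21742443,
                  start_time = "2019-01-01 0:04:37",
                  usertype = "Subscriber")
q1_2020 <- tibble(ride_id = "A1",
                  started_at = "2020-01-01 00:00:00",
                  member_casual = "member")

# Column names that appear in one file but not in the other
only_2019 <- setdiff(names(q1_2019), names(q1_2020))
only_2020 <- setdiff(names(q1_2020), names(q1_2019))
print(only_2019)
print(only_2020)

# Column types, to spot columns that will need converting
types_2019 <- sapply(q1_2019, class)
types_2020 <- sapply(q1_2020, class)
print(types_2019)
print(types_2020)
```

On the real data, the same two `setdiff()` calls would list every column that needs renaming or dropping.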
Rename the 2019 columns to make them consistent with the 2020 data
(Divvy_Trips_2019_Q1 <- rename(Divvy_Trips_2019_Q1
,ride_id = trip_id
,rideable_type = bikeid
,started_at = start_time
,ended_at = end_time
,start_station_name = from_station_name
,start_station_id = from_station_id
,end_station_name = to_station_name
,end_station_id = to_station_id
,member_casual = usertype
))
## # A tibble: 365,069 × 12
## ride_id started_at ended_at rideable_type tripduration start_station_id
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 21742443 2019-01-01 0:0… 2019-01… 2167 390 199
## 2 21742444 2019-01-01 0:0… 2019-01… 4386 441 44
## 3 21742445 2019-01-01 0:1… 2019-01… 1524 829 15
## 4 21742446 2019-01-01 0:1… 2019-01… 252 1783 123
## 5 21742447 2019-01-01 0:1… 2019-01… 1170 364 173
## 6 21742448 2019-01-01 0:1… 2019-01… 2437 216 98
## 7 21742449 2019-01-01 0:1… 2019-01… 2708 177 98
## 8 21742450 2019-01-01 0:1… 2019-01… 2796 100 211
## 9 21742451 2019-01-01 0:1… 2019-01… 6205 1727 150
## 10 21742452 2019-01-01 0:1… 2019-01… 3939 336 268
## # ℹ 365,059 more rows
## # ℹ 6 more variables: start_station_name <chr>, end_station_id <dbl>,
## # end_station_name <chr>, member_casual <chr>, gender <chr>, birthyear <dbl>
Divvy_Trips_2019_Q1 <- mutate(Divvy_Trips_2019_Q1, ride_id = as.character(ride_id)
,rideable_type = as.character(rideable_type))
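The conversion to character matters because `bind_rows()` refuses to stack columns whose types are incompatible. The sketch below illustrates this with two one-row tibbles; `a` and `b` are hypothetical stand-ins, and the 2020 values are made up.

```r
library(dplyr)
library(tibble)

a <- tibble(ride_id = 21742443, rideable_type = 2167)       # numeric, as in the 2019 file
b <- tibble(ride_id = "A1", rideable_type = "docked_bike")  # character, as in the 2020 file

# Stacking directly fails: double and character columns cannot be combined
direct <- tryCatch(bind_rows(a, b), error = function(e) "error")
print(direct)

# After converting the numeric columns to character, the rows stack cleanly
a_chr <- mutate(a,
                ride_id = as.character(ride_id),
                rideable_type = as.character(rideable_type))
combined <- bind_rows(a_chr, b)
print(combined)
```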
# Stack the two quarters into one data frame
all_trips <- bind_rows(Divvy_Trips_2019_Q1, Divvy_Trips_2020_Q1)
# Drop columns that exist in only one of the two files
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))
all_trips <- all_trips %>%
mutate(member_casual = recode(member_casual
,"Subscriber" = "member"
,"Customer" = "casual"))
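One way to confirm that the relabeling worked is to count the distinct values afterwards. This is a small sketch on a made-up four-row tibble named `toy`, not the author's check: `recode()` leaves values it does not match unchanged, so only two labels should remain.

```r
library(dplyr)
library(tibble)

# Made-up rows mixing the 2019 labels and the 2020 labels
toy <- tibble(member_casual = c("Subscriber", "Customer", "member", "casual"))

toy <- toy %>%
  mutate(member_casual = recode(member_casual,
                                "Subscriber" = "member",
                                "Customer" = "casual"))

# Only "casual" and "member" should appear in the counts
label_counts <- table(toy$member_casual)
print(label_counts)
```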
all_trips$ride_length <- as.numeric(difftime(all_trips$ended_at, all_trips$started_at, units = "mins"))
all_trips <- all_trips %>% dplyr::filter(start_station_name != "HQ QR", ride_length >= 0)
all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
# Store ride_length in seconds, stating the unit explicitly instead of relying on difftime()'s automatic choice
all_trips$ride_length <- as.numeric(difftime(all_trips$ended_at, all_trips$started_at, units = "secs"))
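When `units` is not given, `difftime()` picks a unit automatically based on the size of the differences, so the same ride can come back in seconds, minutes, hours, or days. The sketch below uses two made-up timestamps to show why stating the unit keeps `ride_length` unambiguous.

```r
# Made-up start and end times, ten minutes apart
start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-01-01 00:10:00", tz = "UTC")

# Explicit units give a predictable number
secs <- as.numeric(difftime(end, start, units = "secs"))
mins <- as.numeric(difftime(end, start, units = "mins"))
print(c(secs = secs, mins = mins))

# Without `units`, the unit is chosen automatically
auto_units <- units(difftime(end, start))
print(auto_units)
```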
``` r
summary(all_trips$ride_length)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##        1      331      539     1189      912 10628422
```
Order the days of the week and compare casual riders with members (ride lengths are in seconds)
all_trips$day_of_week <- ordered(all_trips$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = mean)
## all_trips$member_casual all_trips$ride_length
## 1 casual 5370.7685
## 2 member 795.1324
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = median)
## all_trips$member_casual all_trips$ride_length
## 1 casual 1393
## 2 member 508
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = max)
## all_trips$member_casual all_trips$ride_length
## 1 casual 10628422
## 2 member 6096428
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = min)
## all_trips$member_casual all_trips$ride_length
## 1 casual 2
## 2 member 1
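To make the figures above easier to read, the sketch below converts the reported means and medians to minutes and computes the casual-to-member ratio. It assumes `ride_length` is in seconds, which matches a median ride of 539; the numbers are copied from the output above and the variable names are illustrative.

```r
# Values copied from the aggregate() output above (assumed to be seconds)
mean_secs   <- c(casual = 5370.7685, member = 795.1324)
median_secs <- c(casual = 1393, member = 508)

# Same figures in minutes, rounded to one decimal place
mean_mins   <- round(mean_secs / 60, 1)
median_mins <- round(median_secs / 60, 1)
print(mean_mins)
print(median_mins)

# How many times longer a casual ride is than a member ride
mean_ratio   <- round(mean_secs[["casual"]] / mean_secs[["member"]], 2)
median_ratio <- round(median_secs[["casual"]] / median_secs[["member"]], 2)
print(c(mean_ratio = mean_ratio, median_ratio = median_ratio))
```

The mean ratio is much larger than the median ratio, which suggests that a small number of very long casual rides (such as the maximum of 10,628,422) pull the casual mean upward.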
The average ride time (in seconds) on each day of the week for members vs. casual riders
aggregate(all_trips$ride_length ~ all_trips$member_casual + all_trips$day_of_week, FUN = mean)
## all_trips$member_casual all_trips$day_of_week all_trips$ride_length
## 1 casual Sunday 5059.1813
## 2 member Sunday 972.1609
## 3 casual Monday 4750.7627
## 4 member Monday 822.2134
## 5 casual Tuesday 4560.8190
## 6 member Tuesday 769.4416
## 7 casual Wednesday 4480.3724
## 8 member Wednesday 711.9838
## 9 casual Thursday 8450.1557
## 10 member Thursday 707.2093
## 11 casual Friday 6088.9402
## 12 member Friday 796.6713
## 13 casual Saturday 4946.4956
## 14 member Saturday 973.7094
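The column headers in the output above carry the `all_trips$` prefix because the formula refers to the columns through `$`. As a sketch of an alternative, passing the data frame through the `data` argument of `aggregate()` gives plain column names. The `toy` data frame below is made up for illustration.

```r
# Made-up rides: two casual rides on Sunday, member rides on Sunday and Monday
toy <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  day_of_week   = c("Sunday", "Sunday", "Sunday", "Monday"),
  ride_length   = c(1200, 1800, 600, 500)
)

# Bare column names in the formula, with the data frame supplied via `data`
by_day <- aggregate(ride_length ~ member_casual + day_of_week, data = toy, FUN = mean)
print(by_day)
```

The result has columns named `member_casual`, `day_of_week`, and `ride_length`, which are easier to reuse in later steps.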
The relationship between rides and days of the week
all_trips %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups: member_casual [2]
## member_casual weekday number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sun 18652 5059.
## 2 casual Mon 5591 4751.
## 3 casual Tue 7311 4561.
## 4 casual Wed 7690 4480.
## 5 casual Thu 7147 8450.
## 6 casual Fri 8013 6089.
## 7 casual Sat 13473 4946.
## 8 member Sun 60197 972.
## 9 member Mon 110430 822.
## 10 member Tue 127974 769.
## 11 member Wed 121902 712.
## 12 member Thu 125228 707.
## 13 member Fri 115168 797.
## 14 member Sat 59413 974.
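The ride counts in the table above can be summarized and plotted. The sketch below is one possible way to do it, not necessarily the chart the author produced. It re-enters the reported counts by hand into a data frame named `rides` (an illustrative name), computes the share of rides taken on Saturday and Sunday for each group, and draws a grouped bar chart with ggplot2.

```r
library(ggplot2)

day_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Ride counts copied from the summarise() output above
rides <- data.frame(
  member_casual = rep(c("casual", "member"), each = 7),
  weekday = factor(rep(day_levels, times = 2), levels = day_levels),
  number_of_rides = c(18652, 5591, 7311, 7690, 7147, 8013, 13473,
                      60197, 110430, 127974, 121902, 125228, 115168, 59413)
)

# Share of each group's rides that fall on Saturday or Sunday
is_weekend <- rides$weekday %in% c("Sat", "Sun")
total   <- tapply(rides$number_of_rides, rides$member_casual, sum)
weekend <- tapply(rides$number_of_rides[is_weekend],
                  rides$member_casual[is_weekend], sum)
weekend_share <- weekend / total
print(round(100 * weekend_share, 1))

# Grouped bar chart of rides per weekday for each rider type
p <- ggplot(rides, aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(x = "Day of week", y = "Number of rides", fill = "Rider type")
print(p)
```

With these counts, casual riders take roughly 47% of their rides on weekends, compared with roughly 17% for members.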
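This analysis shows clear differences between members and casual riders. Casual riders take much longer rides on average (a mean of about 5,371 seconds versus about 795 seconds for members) and take nearly half of their rides on Saturdays and Sundays, while members ride mostly on weekdays. By tailoring strategies to these differences and preferences, Cyclistic can more effectively convert casual riders into annual members.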