This analysis is the capstone project for the Google Data Analytics Certificate. Since this isn’t my first analysis, I wanted to try R as the tool because I don’t have much experience with it. I took the certificate to back up my experience and to learn things I probably missed on my self-taught learning path. It was a great experience, full of new things, headaches, and research. I’m excited to see what comes next in this beautiful data world I’m discovering.
The primary goal of this analysis is to understand the key differences in how annual members and casual riders use Cyclistic bike-share services. By identifying distinct usage patterns, we can develop targeted marketing strategies to convert casual riders into more profitable annual members.
This report is based on Cyclistic’s historical trip data from the previous 12 months (January 2024 - December 2024). The data was made available by Motivate International Inc. and is considered reliable for this analysis. Due to data privacy policies, no personally identifiable information was used.
The raw data from the 12 separate monthly files was merged into a single dataset after reviewing columns and data.
# Loading libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rmarkdown)
## Warning: package 'rmarkdown' was built under R version 4.5.1
library(naniar)
## Warning: package 'naniar' was built under R version 4.5.1
# Loading all the tables
D01_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202401-divvy-tripdata.csv")
## Rows: 144873 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D02_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202402-divvy-tripdata.csv")
## Rows: 223164 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D03_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202403-divvy-tripdata.csv")
## Rows: 301687 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D04_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202404-divvy-tripdata.csv")
## Rows: 415025 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D05_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202405-divvy-tripdata.csv")
## Rows: 609493 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D06_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202406-divvy-tripdata.csv")
## Rows: 710721 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D07_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202407-divvy-tripdata.csv")
## Rows: 748962 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D08_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202408-divvy-tripdata.csv")
## Rows: 755639 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D09_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202409-divvy-tripdata.csv")
## Rows: 821276 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D10_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202410-divvy-tripdata.csv")
## Rows: 616281 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D11_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202411-divvy-tripdata.csv")
## Rows: 335075 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
D12_2024 <- read_csv("C:/Users/Barba/Projects/Data Projects/Bikes/Data/202412-divvy-tripdata.csv")
## Rows: 178372 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Exploring the tables
str(D01_2024)
## spc_tbl_ [144,873 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:144873] "C1D650626C8C899A" "EECD38BDB25BFCB0" "F4A9CE78061F17F7" "0A0D9E15EE50B171" ...
## $ rideable_type : chr [1:144873] "electric_bike" "electric_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:144873], format: "2024-01-12 15:30:27" "2024-01-08 15:45:46" ...
## $ ended_at : POSIXct[1:144873], format: "2024-01-12 15:37:59" "2024-01-08 15:52:59" ...
## $ start_station_name: chr [1:144873] "Wells St & Elm St" "Wells St & Elm St" "Wells St & Elm St" "Wells St & Randolph St" ...
## $ start_station_id : chr [1:144873] "KA1504000135" "KA1504000135" "KA1504000135" "TA1305000030" ...
## $ end_station_name : chr [1:144873] "Kingsbury St & Kinzie St" "Kingsbury St & Kinzie St" "Kingsbury St & Kinzie St" "Larrabee St & Webster Ave" ...
## $ end_station_id : chr [1:144873] "KA1503000043" "KA1503000043" "KA1503000043" "13193" ...
## $ start_lat : num [1:144873] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:144873] -87.6 -87.6 -87.6 -87.6 -87.7 ...
## $ end_lat : num [1:144873] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:144873] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:144873] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D02_2024)
## spc_tbl_ [223,164 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:223164] "FCB05EB1758F85E8" "7FB986AD5D3DE9D6" "40CA13E15B5B470D" "D47A1660919E8861" ...
## $ rideable_type : chr [1:223164] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:223164], format: "2024-02-03 14:14:18" "2024-02-05 21:10:06" ...
## $ ended_at : POSIXct[1:223164], format: "2024-02-03 14:21:00" "2024-02-05 21:15:44" ...
## $ start_station_name: chr [1:223164] "Clark St & Newport St" "Michigan Ave & Washington St" "Leavitt St & Armitage Ave" "Southport Ave & Waveland Ave" ...
## $ start_station_id : chr [1:223164] "632" "13001" "TA1309000029" "13235" ...
## $ end_station_name : chr [1:223164] "Southport Ave & Waveland Ave" "Wabash Ave & Grand Ave" "Milwaukee Ave & Wabansia Ave" "Southport Ave & Belmont Ave" ...
## $ end_station_id : chr [1:223164] "13235" "TA1307000117" "13243" "13229" ...
## $ start_lat : num [1:223164] 41.9 41.9 41.9 41.9 41.8 ...
## $ start_lng : num [1:223164] -87.7 -87.6 -87.7 -87.7 -87.6 ...
## $ end_lat : num [1:223164] 41.9 41.9 41.9 41.9 41.8 ...
## $ end_lng : num [1:223164] -87.7 -87.6 -87.7 -87.7 -87.6 ...
## $ member_casual : chr [1:223164] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D03_2024)
## spc_tbl_ [301,687 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:301687] "64FBE3BAED5F29E6" "9991629435C5E20E" "E5C9FECD5B71BEBD" "4CEA3EC8906DAEA8" ...
## $ rideable_type : chr [1:301687] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:301687], format: "2024-03-05 18:33:11" "2024-03-06 17:15:14" ...
## $ ended_at : POSIXct[1:301687], format: "2024-03-05 18:51:48" "2024-03-06 17:16:04" ...
## $ start_station_name: chr [1:301687] NA NA NA NA ...
## $ start_station_id : chr [1:301687] NA NA NA NA ...
## $ end_station_name : chr [1:301687] NA NA NA NA ...
## $ end_station_id : chr [1:301687] NA NA NA NA ...
## $ start_lat : num [1:301687] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:301687] -87.7 -87.6 -87.6 -87.6 -87.7 ...
## $ end_lat : num [1:301687] 42 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:301687] -87.7 -87.6 -87.6 -87.6 -87.7 ...
## $ member_casual : chr [1:301687] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D04_2024)
## spc_tbl_ [415,025 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:415025] "743252713F32516B" "BE90D33D2240C614" "D47BBDDE7C40DD61" "6684E760BF9EA9B5" ...
## $ rideable_type : chr [1:415025] "classic_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:415025], format: "2024-04-22 19:08:21" "2024-04-11 06:19:24" ...
## $ ended_at : POSIXct[1:415025], format: "2024-04-22 19:12:56" "2024-04-11 06:22:21" ...
## $ start_station_name: chr [1:415025] "Aberdeen St & Jackson Blvd" "Aberdeen St & Jackson Blvd" "Sheridan Rd & Montrose Ave" "Aberdeen St & Jackson Blvd" ...
## $ start_station_id : chr [1:415025] "13157" "13157" "TA1307000107" "13157" ...
## $ end_station_name : chr [1:415025] "Desplaines St & Jackson Blvd" "Desplaines St & Jackson Blvd" "Ashland Ave & Belle Plaine Ave" "Desplaines St & Jackson Blvd" ...
## $ end_station_id : chr [1:415025] "15539" "15539" "13249" "15539" ...
## $ start_lat : num [1:415025] 41.9 41.9 42 41.9 42 ...
## $ start_lng : num [1:415025] -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:415025] 41.9 41.9 42 41.9 41.9 ...
## $ end_lng : num [1:415025] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:415025] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D05_2024)
## spc_tbl_ [609,493 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:609493] "7D9F0CE9EC2A1297" "02EC47687411416F" "101370FB2D3402BE" "E97E396331ED6913" ...
## $ rideable_type : chr [1:609493] "classic_bike" "classic_bike" "classic_bike" "electric_bike" ...
## $ started_at : POSIXct[1:609493], format: "2024-05-25 15:52:42" "2024-05-14 15:11:51" ...
## $ ended_at : POSIXct[1:609493], format: "2024-05-25 16:11:50" "2024-05-14 15:22:00" ...
## $ start_station_name: chr [1:609493] "Streeter Dr & Grand Ave" "Sheridan Rd & Greenleaf Ave" "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" ...
## $ start_station_id : chr [1:609493] "13022" "KA1504000159" "13022" "13022" ...
## $ end_station_name : chr [1:609493] "Clark St & Elm St" "Sheridan Rd & Loyola Ave" "Wabash Ave & 9th St" "Sheffield Ave & Wellington Ave" ...
## $ end_station_id : chr [1:609493] "TA1307000039" "RP-009" "TA1309000010" "TA1307000052" ...
## $ start_lat : num [1:609493] 41.9 42 41.9 41.9 41.9 ...
## $ start_lng : num [1:609493] -87.6 -87.7 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:609493] 41.9 42 41.9 41.9 41.9 ...
## $ end_lng : num [1:609493] -87.6 -87.7 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:609493] "casual" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D06_2024)
## spc_tbl_ [710,721 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:710721] "CDE6023BE6B11D2F" "462B48CD292B6A18" "9CFB6A858D23ABF7" "6365EFEB64231153" ...
## $ rideable_type : chr [1:710721] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:710721], format: "2024-06-11 17:20:06" "2024-06-11 17:19:21" ...
## $ ended_at : POSIXct[1:710721], format: "2024-06-11 17:21:39" "2024-06-11 17:19:36" ...
## $ start_station_name: chr [1:710721] NA NA NA NA ...
## $ start_station_id : chr [1:710721] NA NA NA NA ...
## $ end_station_name : chr [1:710721] NA NA NA NA ...
## $ end_station_id : chr [1:710721] NA NA NA NA ...
## $ start_lat : num [1:710721] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:710721] -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:710721] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:710721] -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:710721] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D07_2024)
## spc_tbl_ [748,962 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:748962] "2658E319B13141F9" "B2176315168A47CE" "C2A9D33DF7EBB422" "8BFEA406DF01D8AD" ...
## $ rideable_type : chr [1:748962] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:748962], format: "2024-07-11 08:15:14" "2024-07-11 15:45:07" ...
## $ ended_at : POSIXct[1:748962], format: "2024-07-11 08:17:56" "2024-07-11 16:06:04" ...
## $ start_station_name: chr [1:748962] NA NA NA NA ...
## $ start_station_id : chr [1:748962] NA NA NA NA ...
## $ end_station_name : chr [1:748962] NA NA NA NA ...
## $ end_station_id : chr [1:748962] NA NA NA NA ...
## $ start_lat : num [1:748962] 41.8 41.8 41.8 41.9 42 ...
## $ start_lng : num [1:748962] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:748962] 41.8 41.8 41.8 41.9 41.9 ...
## $ end_lng : num [1:748962] -87.6 -87.6 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:748962] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D08_2024)
## spc_tbl_ [755,639 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:755639] "BAA154388A869E64" "8752245932EFF67A" "44DDF9F57A9A161F" "44AAAF069B0C78C3" ...
## $ rideable_type : chr [1:755639] "classic_bike" "electric_bike" "classic_bike" "electric_bike" ...
## $ started_at : POSIXct[1:755639], format: "2024-08-02 13:35:14" "2024-08-02 15:33:13" ...
## $ ended_at : POSIXct[1:755639], format: "2024-08-02 13:48:24" "2024-08-02 15:55:23" ...
## $ start_station_name: chr [1:755639] "State St & Randolph St" "Franklin St & Monroe St" "Franklin St & Monroe St" "Clark St & Elm St" ...
## $ start_station_id : chr [1:755639] "TA1305000029" "TA1309000007" "TA1309000007" "TA1307000039" ...
## $ end_station_name : chr [1:755639] "Wabash Ave & 9th St" "Damen Ave & Cortland St" "Clark St & Elm St" "McClurg Ct & Ohio St" ...
## $ end_station_id : chr [1:755639] "TA1309000010" "13133" "TA1307000039" "TA1306000029" ...
## $ start_lat : num [1:755639] 41.9 41.9 41.9 41.9 42 ...
## $ start_lng : num [1:755639] -87.6 -87.6 -87.6 -87.6 -87.7 ...
## $ end_lat : num [1:755639] 41.9 41.9 41.9 41.9 42 ...
## $ end_lng : num [1:755639] -87.6 -87.7 -87.6 -87.6 -87.7 ...
## $ member_casual : chr [1:755639] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D09_2024)
## spc_tbl_ [821,276 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:821276] "31D38723D5A8665A" "67CB39987F4E895B" "DA61204FD26EC681" "06F160D46AF235DD" ...
## $ rideable_type : chr [1:821276] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:821276], format: "2024-09-26 15:30:58" "2024-09-26 15:31:32" ...
## $ ended_at : POSIXct[1:821276], format: "2024-09-26 15:30:59" "2024-09-26 15:53:13" ...
## $ start_station_name: chr [1:821276] NA NA NA NA ...
## $ start_station_id : chr [1:821276] NA NA NA NA ...
## $ end_station_name : chr [1:821276] NA NA NA NA ...
## $ end_station_id : chr [1:821276] NA NA NA NA ...
## $ start_lat : num [1:821276] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:821276] -87.6 -87.6 -87.6 -87.6 -87.7 ...
## $ end_lat : num [1:821276] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:821276] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:821276] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D10_2024)
## spc_tbl_ [616,281 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:616281] "4422E707103AA4FF" "19DB722B44CBE82F" "20AE2509FD68C939" "D0F17580AB9515A9" ...
## $ rideable_type : chr [1:616281] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:616281], format: "2024-10-14 03:26:04" "2024-10-13 19:33:38" ...
## $ ended_at : POSIXct[1:616281], format: "2024-10-14 03:32:56" "2024-10-13 19:39:04" ...
## $ start_station_name: chr [1:616281] NA NA NA NA ...
## $ start_station_id : chr [1:616281] NA NA NA NA ...
## $ end_station_name : chr [1:616281] NA NA NA NA ...
## $ end_station_id : chr [1:616281] NA NA NA NA ...
## $ start_lat : num [1:616281] 42 42 42 42 42 ...
## $ start_lng : num [1:616281] -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:616281] 42 42 42 42 42 ...
## $ end_lng : num [1:616281] -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:616281] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D11_2024)
## spc_tbl_ [335,075 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:335075] "578DDD7CE1771FFA" "78B141C50102ABA6" "1E794CF36394E2D7" "E5DD2CAB58D73F98" ...
## $ rideable_type : chr [1:335075] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:335075], format: "2024-11-07 19:21:58" "2024-11-22 14:49:00" ...
## $ ended_at : POSIXct[1:335075], format: "2024-11-07 19:28:57" "2024-11-22 14:56:15" ...
## $ start_station_name: chr [1:335075] "Walsh Park" "Walsh Park" "Walsh Park" "Clark St & Elm St" ...
## $ start_station_id : chr [1:335075] "18067" "18067" "18067" "TA1307000039" ...
## $ end_station_name : chr [1:335075] "Leavitt St & North Ave" "Leavitt St & Armitage Ave" "Damen Ave & Cortland St" "Clark St & Drummond Pl" ...
## $ end_station_id : chr [1:335075] "TA1308000005" "TA1309000029" "13133" "TA1307000142" ...
## $ start_lat : num [1:335075] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:335075] -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:335075] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:335075] -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:335075] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(D12_2024)
## spc_tbl_ [178,372 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:178372] "6C960DEB4F78854E" "C0913EEB2834E7A2" "848A37DD4723078A" "3FA09C762ECB48BD" ...
## $ rideable_type : chr [1:178372] "electric_bike" "classic_bike" "classic_bike" "electric_bike" ...
## $ started_at : POSIXct[1:178372], format: "2024-12-31 01:38:35" "2024-12-21 18:41:26" ...
## $ ended_at : POSIXct[1:178372], format: "2024-12-31 01:48:45" "2024-12-21 18:47:33" ...
## $ start_station_name: chr [1:178372] "Halsted St & Roscoe St" "Clark St & Wellington Ave" "Sheridan Rd & Montrose Ave" "Aberdeen St & Jackson Blvd" ...
## $ start_station_id : chr [1:178372] "TA1309000025" "TA1307000136" "TA1307000107" "13157" ...
## $ end_station_name : chr [1:178372] "Clark St & Winnemac Ave" "Halsted St & Roscoe St" "Broadway & Barry Ave" "Green St & Randolph St*" ...
## $ end_station_id : chr [1:178372] "TA1309000035" "TA1309000025" "13137" "chargingstx3" ...
## $ start_lat : num [1:178372] 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num [1:178372] -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:178372] 42 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:178372] -87.7 -87.6 -87.6 -87.6 -87.7 ...
## $ member_casual : chr [1:178372] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
# Combine the data into one table
travels_in_2024 <- bind_rows(D01_2024,D02_2024,D03_2024,D04_2024,D05_2024,D06_2024,D07_2024,D08_2024,D09_2024,D10_2024,D11_2024,D12_2024)
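The twelve separate `read_csv()` calls and the final `bind_rows()` could also be expressed as one loop over the files. The following is only a sketch of that idea, not the code used in this report: the folder `divvy_demo`, the two tiny files written into it, and the name `travels_demo` are hypothetical stand-ins so the example runs on its own.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical stand-in for the real data folder: write two tiny monthly
# files into a temporary directory so the example is self-contained.
data_dir <- file.path(tempdir(), "divvy_demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(
  tibble(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
  file.path(data_dir, "202401-divvy-tripdata.csv")
)
write_csv(
  tibble(ride_id = "B1", member_casual = "member"),
  file.path(data_dir, "202402-divvy-tripdata.csv")
)

# Find every monthly file by its naming pattern, sorted by name (and so by month)
csv_files <- sort(list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE))

# Read each file, then stack the results into one table
travels_demo <- csv_files %>%
  map(function(path) read_csv(path, show_col_types = FALSE)) %>%
  bind_rows()

nrow(travels_demo)
```

With the real data, `data_dir` would point to the folder that holds the twelve monthly files. Reviewing each file's structure before merging, as done above with `str()`, is still worthwhile because `bind_rows()` matches columns by name.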
The following steps were taken to ensure data integrity:
-Reviewed Missing Data: A random sample of 50,000 rides was plotted to see which columns had missing values.
# Sample of table
data_sample_1 <- slice_sample(travels_in_2024, n = 50000)
vis_miss(data_sample_1)
-New Columns: Calculated Ride Length: a new column, ride_length_mins, holds the duration of each trip in minutes. Weekday and Month: two new columns, day_of_week and month_number, were extracted from started_at. Travel: a new column, travel, joins start_station_name and end_station_name to identify each route.
travels_in_2024 <- travels_in_2024 %>%
  mutate(
    # Calculate the duration of each trip in minutes
    ride_length_mins = as.numeric(difftime(ended_at, started_at, units = "mins")),
    # Extract the day of the week (1 = Monday, 7 = Sunday, because week_start = 1)
    day_of_week = wday(started_at, week_start = 1),
    # Extract the month (1 = January, 12 = December)
    month_number = month(started_at),
    # Build a route column from the start and end station names
    travel = paste(start_station_name, end_station_name, sep = " - ")
  )
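Because `week_start = 1` is passed to `wday()`, the week is numbered from Monday, which matters when reading the weekday chart later on. A small check on known dates makes the numbering explicit. The `dates` vector and the hand-built `day_label` factor are illustrative additions, not part of the original analysis.

```r
library(lubridate)

# 2024-01-01 was a Monday, 2024-01-06 a Saturday, 2024-01-07 a Sunday
dates <- as.Date(c("2024-01-01", "2024-01-06", "2024-01-07"))

# With week_start = 1 the week starts on Monday, so Monday = 1 and Sunday = 7
day_number <- wday(dates, week_start = 1)
day_number

# Illustrative labels, built by hand so they do not depend on the system locale
day_label <- factor(
  day_number,
  levels = 1:7,
  labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
)
as.character(day_label)
```

A labelled factor like `day_label` could be used on the x axis of the weekday chart so that readers do not have to decode the numbers.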
-Filtered Outliers: Trips lasting one minute or less, or 24 hours (1,440 minutes) or more, were removed to eliminate potential data errors.
# Filter outliers and drop rows with missing values
analyst_of_behavior_2024 <- travels_in_2024 %>%
  filter(ride_length_mins > 1 & ride_length_mins < 1440) %>%
  drop_na()
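To see exactly which rows this step keeps, the same filter can be run on a handful of made-up trips. The table `toy_trips` below is invented for illustration only. It shows that both comparisons are strict and that `drop_na()` also removes rows with a missing station name.

```r
library(dplyr)
library(tidyr)

# Made-up trips covering the boundary cases
toy_trips <- tibble(
  ride_id = c("r1", "r2", "r3", "r4", "r5"),
  ride_length_mins = c(0.5, 1, 30, 1440, 45),
  start_station_name = c("A", "A", "A", "A", NA)
)

kept <- toy_trips %>%
  # Strict inequalities: exactly 1 minute and exactly 1440 minutes are dropped
  filter(ride_length_mins > 1 & ride_length_mins < 1440) %>%
  # Drops r5, which passes the duration filter but has no start station
  drop_na()

kept$ride_id
```

Only `r3` survives both steps in this toy table.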
-Handled Missing Data: A significant portion of rides (18-19%) were missing start or end station names. These records were removed to ensure the accuracy of location-based analysis. After cleaning, the dataset was confirmed to have no missing values.
# Sample of table
data_sample_2 <- slice_sample(analyst_of_behavior_2024, n = 50000)
vis_miss(data_sample_2)
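The share of missing values can also be computed directly instead of being read off the `vis_miss()` plot. The sketch below uses a made-up table, `toy_rides`, so the percentages it returns are not the ones from this dataset. On the real data, the same two steps would be applied to `travels_in_2024` before cleaning.

```r
library(dplyr)

# Made-up rides with some missing station names
toy_rides <- tibble(
  start_station_name = c("A", NA, "B", "C", NA),
  end_station_name   = c("B", "C", NA, "A", "B"),
  member_casual      = c("member", "casual", "casual", "member", "member")
)

# Percentage of missing values in each column
missing_pct <- toy_rides %>%
  summarise(across(everything(), ~ mean(is.na(.x)) * 100))
missing_pct

# Percentage of rides missing a start or an end station name
any_station_missing_pct <- mean(
  is.na(toy_rides$start_station_name) | is.na(toy_rides$end_station_name)
) * 100
any_station_missing_pct
```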
Our analysis reveals significant behavioral differences between casual riders and annual members.
analyst_of_behavior_2024 %>%
  group_by(member_casual) %>%
  summarise(
    mean_ride_length = mean(ride_length_mins),
    median_ride_length = median(ride_length_mins),
    max_ride_length = max(ride_length_mins),
    min_ride_length = min(ride_length_mins)
  )
## # A tibble: 2 × 5
## member_casual mean_ride_length median_ride_length max_ride_length
## <chr> <dbl> <dbl> <dbl>
## 1 casual 24.2 13.6 1440.
## 2 member 12.6 8.9 1438.
## # ℹ 1 more variable: min_ride_length <dbl>
Casual riders use the bikes for significantly longer durations than members. The average ride for a casual user is 24.2 minutes, while for a member, it’s only 12.6 minutes. This suggests that casual riders are more likely using the bikes for recreation, while members are likely using them for shorter, more purposeful trips like commuting.
ggplot(data = analyst_of_behavior_2024, aes(x = member_casual, y = ride_length_mins, fill = member_casual)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Average time of travel: Members vs Casuals", x = "User", y = "Average ride length (minutes)") +
  scale_y_continuous(labels = scales::comma)
Member usage is highest during the typical work week (Monday-Friday), peaking on Wednesday. This reinforces the idea that members use the bikes for commuting.
In contrast, casual rider usage peaks dramatically on the weekends (Saturday and Sunday), indicating a preference for leisure and recreational rides.
# Graphic: Number of travels per day of the week
ggplot(data = analyst_of_behavior_2024, aes(x = day_of_week, fill = member_casual)) +
  geom_bar(position = "dodge") +
  labs(title = "Number of travels per day of the week", x = "Day of the week", y = "Number of travels") +
  scale_y_continuous(labels = scales::comma)
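Because members take more rides overall, raw counts can make the two groups hard to compare. One option is to look at each weekday's share of rides within each rider type. The sketch below runs on a made-up table, `toy_days`, with the same two columns used in the chart, so its numbers are illustrative and not results from this analysis.

```r
library(dplyr)

# Made-up rides: day_of_week uses 1 = Monday ... 7 = Sunday
toy_days <- tibble(
  member_casual = c("member", "member", "member", "casual", "casual", "casual", "casual"),
  day_of_week   = c(1, 3, 3, 6, 6, 7, 2)
)

weekday_share <- toy_days %>%
  count(member_casual, day_of_week) %>%
  group_by(member_casual) %>%
  # Share of each rider type's rides that fall on each weekday
  mutate(share = n / sum(n)) %>%
  ungroup()

weekday_share
```

Within each rider type the shares sum to 1, so a weekend peak for casual riders and a midweek peak for members can be compared on the same scale.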
Both casual riders and members use the bike-share service most during the warmer months, from May to October. However, the increase in ridership during these months is much more pronounced for casual users.
# Graphic: Number of travels per month
ggplot(data = analyst_of_behavior_2024, aes(x = month_number, fill = member_casual)) +
  geom_bar(position = "dodge") +
  labs(title = "Number of travels per month", x = "Month", y = "Number of travels") +
  scale_y_continuous(labels = scales::comma)
The most popular routes for casual riders are concentrated in areas known for tourism and recreation, such as Streeter Dr & Grand Ave, DuSable Lake Shore Dr, and Millennium Park. Many of the top routes are round trips, starting and ending at the same station. Members, by contrast, appear to ride mainly to commute: their most frequent routes run between State St and Calumet Ave along 33rd St, and between 55th St and 60th St along Ellis Ave, in both directions.
# Graphic: Top 10 routes for each rider type
top_10_routes <- analyst_of_behavior_2024 %>%
  count(member_casual, travel) %>%
  group_by(member_casual) %>%
  slice_max(order_by = n, n = 10) %>%
  mutate(travel = fct_inorder(travel))
ggplot(data = top_10_routes, aes(x = travel, y = n, fill = member_casual)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = n), size = 4, hjust = 1.1, color = "white") +
  labs(title = "Top 10 routes", x = "Routes", y = "Number of travels") +
  coord_flip()
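The observation about round trips can be checked by flagging rides that start and end at the same station and comparing the share by rider type. This is a sketch on a made-up table, `toy_routes`. The station names are taken from this report, but the rows and the resulting shares are invented for illustration.

```r
library(dplyr)

# Made-up rides using station names that appear in this report
toy_routes <- tibble(
  member_casual = c("casual", "casual", "casual", "member", "member"),
  start_station_name = c(
    "Streeter Dr & Grand Ave", "Streeter Dr & Grand Ave", "Millennium Park",
    "Clark St & Elm St", "Wabash Ave & 9th St"
  ),
  end_station_name = c(
    "Streeter Dr & Grand Ave", "Millennium Park", "Millennium Park",
    "Wabash Ave & 9th St", "Clark St & Elm St"
  )
)

round_trip_share <- toy_routes %>%
  # A round trip starts and ends at the same station
  mutate(round_trip = start_station_name == end_station_name) %>%
  group_by(member_casual) %>%
  summarise(round_trip_share = mean(round_trip))

round_trip_share
```

Run on `analyst_of_behavior_2024`, the same steps would give the actual share of round trips for casual riders and members.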
Based on these findings, we recommend the following strategies to convert casual riders into annual members:
Create a discounted annual membership that offers enhanced benefits for weekend rides. This could appeal to casual riders who primarily use the service for leisure on Saturdays and Sundays. Marketing for this could be targeted at popular weekend locations like parks and lakefront stations.
During the peak riding months (May-October), launch a promotional campaign offering a discounted 1-month or 3-month membership. Target this campaign with geo-fenced digital ads around the most popular member start stations (e.g., Kingsbury St & Kinzie St, Clinton St & Washington Blvd) to attract casual riders who might be using the bikes for work-related travel.
Develop digital media content that showcases the benefits of membership for both commuting and leisure.
“Ride to Own Your Commute”: Highlight the cost savings and convenience of an annual membership for daily commuters compared to single-ride passes.
“Become a City Explorer”: Create suggested tour routes and highlight points of interest that can be easily accessed via Cyclistic bikes. Offer these as a perk to new annual members to show the value beyond just getting from point A to point B. This would appeal to the recreational nature of casual riders.
By implementing these targeted strategies, we can more effectively demonstrate the value of an annual membership to our casual riders and drive the future growth of Cyclistic.