Case Study: How Does a Bike-Share Navigate Speedy Success?

Introduction

Welcome to the Cyclistic bike-share analysis case study! In this case study, I performed many real-world tasks of a junior data analyst for a fictional company, Cyclistic. In order to answer the key business questions, I followed the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share and Act.

Scenario

Cyclistic is a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, our team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, our team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve the recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams

● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.

● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Ask

Key tasks

  1. Identify the business task.

The primary question that will guide the future marketing program is: how do annual members and casual riders use Cyclistic bikes differently? To answer it, we need to study the bike usage behavior of both casual riders and annual members. We then need to derive recommendations from the insights gained that will lead to the conversion of casual riders into annual members.

  2. Consider key stakeholders.
     1. Lily Moreno: The director of marketing and manager
     2. Cyclistic executive team

Prepare

This phase addresses the following questions: Where is the data located? How is it organized? Are there bias or credibility issues in the data? How are licensing, privacy, security and accessibility handled? Does the data ROCCC?

For the purpose of this case study, Cyclistic’s trip data for the most recent 12 months, i.e. from Feb 2021 to Jan 2022, is used to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. This is a public dataset which can be downloaded from here. The dataset is provided as zipped CSV files, each containing one month of trip data. ROCCC stands for Reliable, Original, Comprehensive, Current and Cited. As noted above, this data is from a fictional company, so we cannot vouch for its reliability or for whether it is cited. As for originality, the data was acquired by the company itself. Sensitive information, such as names and addresses of users, was not provided. The data is licensed, private, secure and accessible. Each data file consists of 13 columns and up to hundreds of thousands of rows.

Loading the required R packages for data preparation:

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(dplyr)
library(ggplot2)
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(geosphere)

Reading all the data files.

Feb <- read.csv('feb_2021.csv')
Mar <- read.csv('mar_2021.csv')
Apr <- read.csv('apr_2021.csv')
May <- read.csv('may_2021.csv')
Jun <- read.csv('jun_2021.csv')
Jul <- read.csv('jul_2021.csv')
Aug <- read.csv('aug_2021.csv')
Sep <- read.csv('sep_2021.csv')
Oct <- read.csv('oct_2021.csv')
Nov <- read.csv('nov_2021.csv')
Dec <- read.csv('dec_2021.csv')
Jan <- read.csv('jan_2022.csv')
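
The twelve read.csv() calls above could also be written as a loop over the files in a folder. The following is only a sketch under stated assumptions: the temporary folder, the two tiny CSV files and the names data_dir, csv_files and monthly are hypothetical and exist only so that the example runs on its own.

```r
# Hypothetical setup: write two tiny CSVs to a temporary folder so the sketch
# is self-contained; the real monthly files would already be on disk.
data_dir <- tempfile("trips_")
dir.create(data_dir)
write.csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(data_dir, "feb_2021.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "casual"),
          file.path(data_dir, "mar_2021.csv"), row.names = FALSE)

# Read every CSV in the folder into a named list, one data frame per file.
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
monthly <- lapply(csv_files, read.csv)
names(monthly) <- tools::file_path_sans_ext(basename(csv_files))

# Row counts per file, useful as a quick check after loading.
sapply(monthly, nrow)
```

Note that list.files() returns names in alphabetical order, not calendar order, so the list would need reordering if the month sequence matters.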

Inspecting the structure of all the data files to ensure equal columns and appropriate datatypes.

colnames(Jan)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Jan)
## 'data.frame':    103770 obs. of  13 variables:
##  $ ride_id           : chr  "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr  "2022-01-13 11:59:47" "2022-01-10 08:41:56" "2022-01-25 04:53:40" "2022-01-04 00:18:04" ...
##  $ ended_at          : chr  "2022-01-13 12:02:44" "2022-01-10 08:46:17" "2022-01-25 04:58:01" "2022-01-04 00:33:00" ...
##  $ start_station_name: chr  "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
##  $ start_station_id  : chr  "525" "525" "TA1306000016" "KA1504000151" ...
##  $ end_station_name  : chr  "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
##  $ end_station_id    : chr  "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
##  $ start_lat         : num  42 42 41.9 42 41.9 ...
##  $ start_lng         : num  -87.7 -87.7 -87.7 -87.7 -87.6 ...
##  $ end_lat           : num  42 42 41.9 42 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.7 -87.7 -87.6 ...
##  $ member_casual     : chr  "casual" "casual" "member" "casual" ...
colnames(Feb)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Feb)
## 'data.frame':    49622 obs. of  13 variables:
##  $ ride_id           : chr  "89E7AA6C29227EFF" "0FEFDE2603568365" "E6159D746B2DBB91" "B32D3199F1C2E75B" ...
##  $ rideable_type     : chr  "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-02-12 16:14:56" "2021-02-14 17:52:38" "2021-02-09 19:10:18" "2021-02-02 17:49:41" ...
##  $ ended_at          : chr  "2021-02-12 16:21:43" "2021-02-14 18:12:09" "2021-02-09 19:19:10" "2021-02-02 17:54:06" ...
##  $ start_station_name: chr  "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Clark St & Lake St" "Wood St & Chicago Ave" ...
##  $ start_station_id  : chr  "525" "525" "KA1503000012" "637" ...
##  $ end_station_name  : chr  "Sheridan Rd & Columbia Ave" "Bosworth Ave & Howard St" "State St & Randolph St" "Honore St & Division St" ...
##  $ end_station_id    : chr  "660" "16806" "TA1305000029" "TA1305000034" ...
##  $ start_lat         : num  42 42 41.9 41.9 41.8 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ end_lat           : num  42 42 41.9 41.9 41.8 ...
##  $ end_lng           : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ member_casual     : chr  "member" "casual" "member" "member" ...
colnames(Mar)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Mar)
## 'data.frame':    228496 obs. of  13 variables:
##  $ ride_id           : chr  "CFA86D4455AA1030" "30D9DC61227D1AF3" "846D87A15682A284" "994D05AA75A168F2" ...
##  $ rideable_type     : chr  "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-03-16 08:32:30" "2021-03-28 01:26:28" "2021-03-11 21:17:29" "2021-03-11 13:26:42" ...
##  $ ended_at          : chr  "2021-03-16 08:36:34" "2021-03-28 01:36:55" "2021-03-11 21:33:53" "2021-03-11 13:55:41" ...
##  $ start_station_name: chr  "Humboldt Blvd & Armitage Ave" "Humboldt Blvd & Armitage Ave" "Shields Ave & 28th Pl" "Winthrop Ave & Lawrence Ave" ...
##  $ start_station_id  : chr  "15651" "15651" "15443" "TA1308000021" ...
##  $ end_station_name  : chr  "Stave St & Armitage Ave" "Central Park Ave & Bloomingdale Ave" "Halsted St & 35th St" "Broadway & Sheridan Rd" ...
##  $ end_station_id    : chr  "13266" "18017" "TA1308000043" "13323" ...
##  $ start_lat         : num  41.9 41.9 41.8 42 42 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.9 41.8 42 42.1 ...
##  $ end_lng           : num  -87.7 -87.7 -87.6 -87.6 -87.7 ...
##  $ member_casual     : chr  "casual" "casual" "casual" "casual" ...
colnames(Apr)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Apr)
## 'data.frame':    337230 obs. of  13 variables:
##  $ ride_id           : chr  "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
##  $ rideable_type     : chr  "classic_bike" "docked_bike" "docked_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-04-12 18:25:36" "2021-04-27 17:27:11" "2021-04-03 12:42:45" "2021-04-17 09:17:42" ...
##  $ ended_at          : chr  "2021-04-12 18:56:55" "2021-04-27 18:31:29" "2021-04-07 11:40:24" "2021-04-17 09:42:48" ...
##  $ start_station_name: chr  "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
##  $ start_station_id  : chr  "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
##  $ end_station_name  : chr  "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
##  $ end_station_id    : chr  "13235" "KA1503000069" "20121" "13235" ...
##  $ start_lat         : num  41.9 41.8 41.7 41.9 41.7 ...
##  $ start_lng         : num  -87.6 -87.6 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.8 41.7 41.9 41.7 ...
##  $ end_lng           : num  -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "member" "casual" "casual" "member" ...
colnames(May)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(May)
## 'data.frame':    531633 obs. of  13 variables:
##  $ ride_id           : chr  "C809ED75D6160B2A" "DD59FDCE0ACACAF3" "0AB83CB88C43EFC2" "7881AC6D39110C60" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-05-30 11:58:15" "2021-05-30 11:29:14" "2021-05-30 14:24:01" "2021-05-30 14:25:51" ...
##  $ ended_at          : chr  "2021-05-30 12:10:39" "2021-05-30 12:14:09" "2021-05-30 14:25:13" "2021-05-30 14:41:04" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.6 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.8 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.6 -87.6 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "casual" "casual" "casual" "casual" ...
colnames(Jun)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Jun)
## 'data.frame':    729595 obs. of  13 variables:
##  $ ride_id           : chr  "99FEC93BA843FB20" "06048DCFC8520CAF" "9598066F68045DF2" "B03C0FE48C412214" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-06-13 14:31:28" "2021-06-04 11:18:02" "2021-06-04 09:49:35" "2021-06-03 19:56:05" ...
##  $ ended_at          : chr  "2021-06-13 14:34:11" "2021-06-04 11:24:19" "2021-06-04 09:55:34" "2021-06-03 20:21:55" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.8 41.8 41.8 41.8 41.8 ...
##  $ start_lng         : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num  41.8 41.8 41.8 41.8 41.8 ...
##  $ end_lng           : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
colnames(Jul)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Jul)
## 'data.frame':    822410 obs. of  13 variables:
##  $ ride_id           : chr  "0A1B623926EF4E16" "B2D5583A5A5E76EE" "6F264597DDBF427A" "379B58EAB20E8AA5" ...
##  $ rideable_type     : chr  "docked_bike" "classic_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-07-02 14:44:36" "2021-07-07 16:57:42" "2021-07-25 11:30:55" "2021-07-08 22:08:30" ...
##  $ ended_at          : chr  "2021-07-02 15:19:58" "2021-07-07 17:16:09" "2021-07-25 11:48:45" "2021-07-08 22:23:32" ...
##  $ start_station_name: chr  "Michigan Ave & Washington St" "California Ave & Cortez St" "Wabash Ave & 16th St" "California Ave & Cortez St" ...
##  $ start_station_id  : chr  "13001" "17660" "SL-012" "17660" ...
##  $ end_station_name  : chr  "Halsted St & North Branch St" "Wood St & Hubbard St" "Rush St & Hubbard St" "Carpenter St & Huron St" ...
##  $ end_station_id    : chr  "KA1504000117" "13432" "KA1503000044" "13196" ...
##  $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.7 -87.6 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.6 -87.7 -87.6 -87.7 -87.7 ...
##  $ member_casual     : chr  "casual" "casual" "member" "member" ...
colnames(Aug)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Aug)
## 'data.frame':    804352 obs. of  13 variables:
##  $ ride_id           : chr  "99103BB87CC6C1BB" "EAFCCCFB0A3FC5A1" "9EF4F46C57AD234D" "5834D3208BFAF1DA" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-08-10 17:15:49" "2021-08-10 17:23:14" "2021-08-21 02:34:23" "2021-08-21 06:52:55" ...
##  $ ended_at          : chr  "2021-08-10 17:22:44" "2021-08-10 17:39:24" "2021-08-21 02:50:36" "2021-08-21 07:08:13" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.8 41.8 42 42 41.8 ...
##  $ start_lng         : num  -87.7 -87.7 -87.7 -87.7 -87.6 ...
##  $ end_lat           : num  41.8 41.8 42 42 41.8 ...
##  $ end_lng           : num  -87.7 -87.6 -87.7 -87.7 -87.6 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
colnames(Sep)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Sep)
## 'data.frame':    756147 obs. of  13 variables:
##  $ ride_id           : chr  "9DC7B962304CBFD8" "F930E2C6872D6B32" "6EF72137900BB910" "78D1DE133B3DBF55" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-09-28 16:07:10" "2021-09-28 14:24:51" "2021-09-28 00:20:16" "2021-09-28 14:51:17" ...
##  $ ended_at          : chr  "2021-09-28 16:09:54" "2021-09-28 14:40:05" "2021-09-28 00:23:57" "2021-09-28 15:00:06" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.9 41.9 41.8 41.8 41.9 ...
##  $ start_lng         : num  -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 42 41.8 41.8 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "casual" "casual" "casual" "casual" ...
colnames(Oct)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Oct)
## 'data.frame':    631226 obs. of  13 variables:
##  $ ride_id           : chr  "620BC6107255BF4C" "4471C70731AB2E45" "26CA69D43D15EE14" "362947F0437E1514" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-10-22 12:46:42" "2021-10-21 09:12:37" "2021-10-16 16:28:39" "2021-10-16 16:17:48" ...
##  $ ended_at          : chr  "2021-10-22 12:49:50" "2021-10-21 09:14:14" "2021-10-16 16:36:26" "2021-10-16 16:19:03" ...
##  $ start_station_name: chr  "Kingsbury St & Kinzie St" "" "" "" ...
##  $ start_station_id  : chr  "KA1503000043" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.7 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.6 -87.7 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
colnames(Nov)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Nov)
## 'data.frame':    359978 obs. of  13 variables:
##  $ ride_id           : chr  "7C00A93E10556E47" "90854840DFD508BA" "0A7D10CDD144061C" "2F3BE33085BCFF02" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-11-27 13:27:38" "2021-11-27 13:38:25" "2021-11-26 22:03:34" "2021-11-27 09:56:49" ...
##  $ ended_at          : chr  "2021-11-27 13:46:38" "2021-11-27 13:56:10" "2021-11-26 22:05:56" "2021-11-27 10:01:50" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.9 42 42 41.9 41.9 ...
##  $ start_lng         : num  -87.7 -87.7 -87.7 -87.8 -87.6 ...
##  $ end_lat           : num  42 41.9 42 41.9 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.7 -87.8 -87.6 ...
##  $ member_casual     : chr  "casual" "casual" "casual" "casual" ...
colnames(Dec)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(Dec)
## 'data.frame':    247540 obs. of  13 variables:
##  $ ride_id           : chr  "46F8167220E4431F" "73A77762838B32FD" "4CF42452054F59C5" "3278BA87BF698339" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-12-07 15:06:07" "2021-12-11 03:43:29" "2021-12-15 23:10:28" "2021-12-26 16:16:10" ...
##  $ ended_at          : chr  "2021-12-07 15:13:42" "2021-12-11 04:10:23" "2021-12-15 23:23:14" "2021-12-26 16:30:53" ...
##  $ start_station_name: chr  "Laflin St & Cullerton St" "LaSalle Dr & Huron St" "Halsted St & North Branch St" "Halsted St & North Branch St" ...
##  $ start_station_id  : chr  "13307" "KP1705001026" "KA1504000117" "KA1504000117" ...
##  $ end_station_name  : chr  "Morgan St & Polk St" "Clarendon Ave & Leland Ave" "Broadway & Barry Ave" "LaSalle Dr & Huron St" ...
##  $ end_station_id    : chr  "TA1307000130" "TA1307000119" "13137" "KP1705001026" ...
##  $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.7 -87.6 -87.6 -87.6 -87.7 ...
##  $ end_lat           : num  41.9 42 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.6 -87.6 -87.6 ...
##  $ member_casual     : chr  "member" "casual" "member" "member" ...
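
Instead of comparing the printed output by eye, the same check can be done programmatically. This is a minimal sketch on toy stand-ins for the monthly data frames; the values and the names months, same_names and same_types are assumptions for illustration only.

```r
# Toy stand-ins for two of the monthly data frames (hypothetical values).
Feb <- data.frame(ride_id = "A1", started_at = "2021-02-12 16:14:56",
                  member_casual = "member")
Mar <- data.frame(ride_id = "B1", started_at = "2021-03-16 08:32:30",
                  member_casual = "casual")
months <- list(Feb = Feb, Mar = Mar)

# Use the first frame as the reference for column names and column classes.
reference_names <- colnames(months[[1]])
reference_types <- sapply(months[[1]], class)

# TRUE for every frame whose columns match the reference exactly.
same_names <- sapply(months, function(df) identical(colnames(df), reference_names))
same_types <- sapply(months, function(df) identical(sapply(df, class), reference_types))

same_names
same_types
```

With the real data, the list would hold all twelve monthly frames, and any FALSE entry would point to the month that needs a closer look before merging.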

Merging all the individual data frames of monthly data into a single data frame.

trip_data <- bind_rows(Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Jan)
str(trip_data)
## 'data.frame':    5601999 obs. of  13 variables:
##  $ ride_id           : chr  "89E7AA6C29227EFF" "0FEFDE2603568365" "E6159D746B2DBB91" "B32D3199F1C2E75B" ...
##  $ rideable_type     : chr  "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-02-12 16:14:56" "2021-02-14 17:52:38" "2021-02-09 19:10:18" "2021-02-02 17:49:41" ...
##  $ ended_at          : chr  "2021-02-12 16:21:43" "2021-02-14 18:12:09" "2021-02-09 19:19:10" "2021-02-02 17:54:06" ...
##  $ start_station_name: chr  "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Clark St & Lake St" "Wood St & Chicago Ave" ...
##  $ start_station_id  : chr  "525" "525" "KA1503000012" "637" ...
##  $ end_station_name  : chr  "Sheridan Rd & Columbia Ave" "Bosworth Ave & Howard St" "State St & Randolph St" "Honore St & Division St" ...
##  $ end_station_id    : chr  "660" "16806" "TA1305000029" "TA1305000034" ...
##  $ start_lat         : num  42 42 41.9 41.9 41.8 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ end_lat           : num  42 42 41.9 41.9 41.8 ...
##  $ end_lng           : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ member_casual     : chr  "member" "casual" "member" "member" ...
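
A quick sanity check after merging is that no rows were lost or duplicated. The twelve monthly row counts printed earlier sum to 5,601,999, which matches the merged data frame. The sketch below shows the same check on toy data; it uses base R's rbind() as a stand-in for bind_rows() so that it runs without extra packages, and the toy values are hypothetical.

```r
# Toy stand-ins for the monthly data frames (hypothetical values).
Feb <- data.frame(ride_id = c("A1", "A2"),
                  member_casual = c("member", "casual"))
Mar <- data.frame(ride_id = "B1", member_casual = "casual")
months <- list(Feb, Mar)

# Stack the frames; base-R stand-in for dplyr::bind_rows().
trip_data <- do.call(rbind, months)

# The merged frame should have exactly as many rows as the inputs combined.
expected_rows <- sum(sapply(months, nrow))
stopifnot(nrow(trip_data) == expected_rows)

# ride_id is expected to be unique; anyDuplicated() returns 0 when it is.
anyDuplicated(trip_data$ride_id)
```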

Process

Processing the data for analysis includes checking for data errors, documenting the cleaning process, and transforming the data to work with it effectively. Because the dataset is very large (about 5.6 million rows), R is used as the data processing and analysis tool.

head(trip_data)    #first 6 rows of data frame
##            ride_id rideable_type          started_at            ended_at
## 1 89E7AA6C29227EFF  classic_bike 2021-02-12 16:14:56 2021-02-12 16:21:43
## 2 0FEFDE2603568365  classic_bike 2021-02-14 17:52:38 2021-02-14 18:12:09
## 3 E6159D746B2DBB91 electric_bike 2021-02-09 19:10:18 2021-02-09 19:19:10
## 4 B32D3199F1C2E75B  classic_bike 2021-02-02 17:49:41 2021-02-02 17:54:06
## 5 83E463F23575F4BF electric_bike 2021-02-23 15:07:23 2021-02-23 15:22:37
## 6 BDAA7E3494E8D545 electric_bike 2021-02-24 15:43:33 2021-02-24 15:49:05
##           start_station_name start_station_id           end_station_name
## 1   Glenwood Ave & Touhy Ave              525 Sheridan Rd & Columbia Ave
## 2   Glenwood Ave & Touhy Ave              525   Bosworth Ave & Howard St
## 3         Clark St & Lake St     KA1503000012     State St & Randolph St
## 4      Wood St & Chicago Ave              637    Honore St & Division St
## 5         State St & 33rd St            13216      Emerald Ave & 31st St
## 6 Fairbanks St & Superior St            18003      LaSalle Dr & Huron St
##   end_station_id start_lat start_lng  end_lat   end_lng member_casual
## 1            660  42.01270 -87.66606 42.00458 -87.66141        member
## 2          16806  42.01270 -87.66606 42.01954 -87.66956        casual
## 3   TA1305000029  41.88579 -87.63110 41.88487 -87.62750        member
## 4   TA1305000034  41.89563 -87.67207 41.90312 -87.67394        member
## 5   TA1309000055  41.83473 -87.62583 41.83816 -87.64512        member
## 6   KP1705001026  41.89581 -87.62025 41.89489 -87.63198        casual
colnames(trip_data)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
nrow(trip_data)   #no. of rows in data frame
## [1] 5601999
dim(trip_data)   #dimensions of data frame
## [1] 5601999      13
summary(trip_data)   #statistical summary of data mainly for numerics
##    ride_id          rideable_type       started_at          ended_at        
##  Length:5601999     Length:5601999     Length:5601999     Length:5601999    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_name start_station_id   end_station_name   end_station_id    
##  Length:5601999     Length:5601999     Length:5601999     Length:5601999    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    start_lat       start_lng         end_lat         end_lng      
##  Min.   :41.64   Min.   :-87.84   Min.   :41.39   Min.   :-88.97  
##  1st Qu.:41.88   1st Qu.:-87.66   1st Qu.:41.88   1st Qu.:-87.66  
##  Median :41.90   Median :-87.64   Median :41.90   Median :-87.64  
##  Mean   :41.90   Mean   :-87.65   Mean   :41.90   Mean   :-87.65  
##  3rd Qu.:41.93   3rd Qu.:-87.63   3rd Qu.:41.93   3rd Qu.:-87.63  
##  Max.   :45.64   Max.   :-73.80   Max.   :42.17   Max.   :-87.49  
##                                   NA's   :4754    NA's   :4754    
##  member_casual     
##  Length:5601999    
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
str(trip_data)  #list of columns and datatypes
## 'data.frame':    5601999 obs. of  13 variables:
##  $ ride_id           : chr  "89E7AA6C29227EFF" "0FEFDE2603568365" "E6159D746B2DBB91" "B32D3199F1C2E75B" ...
##  $ rideable_type     : chr  "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-02-12 16:14:56" "2021-02-14 17:52:38" "2021-02-09 19:10:18" "2021-02-02 17:49:41" ...
##  $ ended_at          : chr  "2021-02-12 16:21:43" "2021-02-14 18:12:09" "2021-02-09 19:19:10" "2021-02-02 17:54:06" ...
##  $ start_station_name: chr  "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Clark St & Lake St" "Wood St & Chicago Ave" ...
##  $ start_station_id  : chr  "525" "525" "KA1503000012" "637" ...
##  $ end_station_name  : chr  "Sheridan Rd & Columbia Ave" "Bosworth Ave & Howard St" "State St & Randolph St" "Honore St & Division St" ...
##  $ end_station_id    : chr  "660" "16806" "TA1305000029" "TA1305000034" ...
##  $ start_lat         : num  42 42 41.9 41.9 41.8 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ end_lat           : num  42 42 41.9 41.9 41.8 ...
##  $ end_lng           : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ member_casual     : chr  "member" "casual" "member" "member" ...
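
The output above hints at two kinds of missing data: NA values in end_lat and end_lng, and empty strings in the station columns. summary() reports only the former. A minimal sketch of counting both per column, on a hypothetical toy data frame named trips:

```r
# Toy data with one NA coordinate and two blank station names (hypothetical).
trips <- data.frame(
  start_station_name = c("Clark St & Lake St", "", ""),
  end_lat = c(41.88, NA, 41.90),
  stringsAsFactors = FALSE
)

# NA values per column.
na_counts <- colSums(is.na(trips))

# Empty strings per column; NA entries are excluded so they are not counted twice.
blank_counts <- sapply(trips, function(col) sum(!is.na(col) & col == ""))

na_counts
blank_counts
```

How such rows should be handled (dropped, kept, or filled in) depends on the analysis; the sketch only makes them visible.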

Adding individual columns for date, day, month, year and day of the week to ease in-depth analysis.

# Adding columns for date, month, day, year and day of the week to the data frame
trip_data$date <- as.Date(trip_data$started_at)         # calendar date of the ride start
trip_data$month <- format(trip_data$date, "%m")         # two-digit month
trip_data$day <- format(trip_data$date, "%d")           # two-digit day of month
trip_data$year <- format(trip_data$date, "%Y")          # four-digit year
trip_data$day_of_week <- format(trip_data$date, "%A")   # full weekday name
colnames(trip_data)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "date"               "month"             
## [16] "day"                "year"               "day_of_week"
head(trip_data)
##            ride_id rideable_type          started_at            ended_at
## 1 89E7AA6C29227EFF  classic_bike 2021-02-12 16:14:56 2021-02-12 16:21:43
## 2 0FEFDE2603568365  classic_bike 2021-02-14 17:52:38 2021-02-14 18:12:09
## 3 E6159D746B2DBB91 electric_bike 2021-02-09 19:10:18 2021-02-09 19:19:10
## 4 B32D3199F1C2E75B  classic_bike 2021-02-02 17:49:41 2021-02-02 17:54:06
## 5 83E463F23575F4BF electric_bike 2021-02-23 15:07:23 2021-02-23 15:22:37
## 6 BDAA7E3494E8D545 electric_bike 2021-02-24 15:43:33 2021-02-24 15:49:05
##           start_station_name start_station_id           end_station_name
## 1   Glenwood Ave & Touhy Ave              525 Sheridan Rd & Columbia Ave
## 2   Glenwood Ave & Touhy Ave              525   Bosworth Ave & Howard St
## 3         Clark St & Lake St     KA1503000012     State St & Randolph St
## 4      Wood St & Chicago Ave              637    Honore St & Division St
## 5         State St & 33rd St            13216      Emerald Ave & 31st St
## 6 Fairbanks St & Superior St            18003      LaSalle Dr & Huron St
##   end_station_id start_lat start_lng  end_lat   end_lng member_casual
## 1            660  42.01270 -87.66606 42.00458 -87.66141        member
## 2          16806  42.01270 -87.66606 42.01954 -87.66956        casual
## 3   TA1305000029  41.88579 -87.63110 41.88487 -87.62750        member
## 4   TA1305000034  41.89563 -87.67207 41.90312 -87.67394        member
## 5   TA1309000055  41.83473 -87.62583 41.83816 -87.64512        member
## 6   KP1705001026  41.89581 -87.62025 41.89489 -87.63198        casual
##         date month day year day_of_week
## 1 2021-02-12    02  12 2021      Friday
## 2 2021-02-14    02  14 2021      Sunday
## 3 2021-02-09    02  09 2021     Tuesday
## 4 2021-02-02    02  02 2021     Tuesday
## 5 2021-02-23    02  23 2021     Tuesday
## 6 2021-02-24    02  24 2021   Wednesday
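
The columns above are derived by formatting the date as text, so day_of_week is a character column and later sorts alphabetically. A minimal sketch of the same step on a toy vector, parsing the timestamp once and storing the weekday as a factor ordered Monday to Sunday. The three timestamps are copied from the output above; the "UTC" time zone, the Monday-first ordering and the names `toy`, `day_names` and `wday_index` are assumptions made for this example.

```r
# Toy timestamps copied from the first rows of the output above
started_at <- c("2021-02-12 16:14:56", "2021-02-14 17:52:38", "2021-02-09 19:10:18")

# Parse once; tz = "UTC" is an assumption for this sketch
started <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXlt$wday runs from 0 (Sunday) to 6 (Saturday) and does not depend on the locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
wday_index <- as.POSIXlt(started)$wday

toy <- data.frame(
  date = as.Date(started),
  month = format(started, "%m"),
  # Levels reordered so that Monday comes first
  day_of_week = factor(day_names[wday_index + 1],
                       levels = c(day_names[2:7], day_names[1]))
)
print(toy)
```

With a factor like this, grouped summaries and bar charts would list the days in calendar order rather than alphabetically.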

Adding a column with the duration of each ride.

# Adding ride_length column to the data frame (units fixed to seconds)
trip_data$ride_length <- difftime(trip_data$ended_at, trip_data$started_at, units = "secs")
str(trip_data)
## 'data.frame':    5601999 obs. of  19 variables:
##  $ ride_id           : chr  "89E7AA6C29227EFF" "0FEFDE2603568365" "E6159D746B2DBB91" "B32D3199F1C2E75B" ...
##  $ rideable_type     : chr  "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr  "2021-02-12 16:14:56" "2021-02-14 17:52:38" "2021-02-09 19:10:18" "2021-02-02 17:49:41" ...
##  $ ended_at          : chr  "2021-02-12 16:21:43" "2021-02-14 18:12:09" "2021-02-09 19:19:10" "2021-02-02 17:54:06" ...
##  $ start_station_name: chr  "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Clark St & Lake St" "Wood St & Chicago Ave" ...
##  $ start_station_id  : chr  "525" "525" "KA1503000012" "637" ...
##  $ end_station_name  : chr  "Sheridan Rd & Columbia Ave" "Bosworth Ave & Howard St" "State St & Randolph St" "Honore St & Division St" ...
##  $ end_station_id    : chr  "660" "16806" "TA1305000029" "TA1305000034" ...
##  $ start_lat         : num  42 42 41.9 41.9 41.8 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ end_lat           : num  42 42 41.9 41.9 41.8 ...
##  $ end_lng           : num  -87.7 -87.7 -87.6 -87.7 -87.6 ...
##  $ member_casual     : chr  "member" "casual" "member" "member" ...
##  $ date              : Date, format: "2021-02-12" "2021-02-14" ...
##  $ month             : chr  "02" "02" "02" "02" ...
##  $ day               : chr  "12" "14" "09" "02" ...
##  $ year              : chr  "2021" "2021" "2021" "2021" ...
##  $ day_of_week       : chr  "Friday" "Sunday" "Tuesday" "Tuesday" ...
##  $ ride_length       : 'difftime' num  407 1171 532 265 ...
##   ..- attr(*, "units")= chr "secs"
glimpse(trip_data)
## Rows: 5,601,999
## Columns: 19
## $ ride_id            <chr> "89E7AA6C29227EFF", "0FEFDE2603568365", "E6159D746B~
## $ rideable_type      <chr> "classic_bike", "classic_bike", "electric_bike", "c~
## $ started_at         <chr> "2021-02-12 16:14:56", "2021-02-14 17:52:38", "2021~
## $ ended_at           <chr> "2021-02-12 16:21:43", "2021-02-14 18:12:09", "2021~
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A~
## $ start_station_id   <chr> "525", "525", "KA1503000012", "637", "13216", "1800~
## $ end_station_name   <chr> "Sheridan Rd & Columbia Ave", "Bosworth Ave & Howar~
## $ end_station_id     <chr> "660", "16806", "TA1305000029", "TA1305000034", "TA~
## $ start_lat          <dbl> 42.01270, 42.01270, 41.88579, 41.89563, 41.83473, 4~
## $ start_lng          <dbl> -87.66606, -87.66606, -87.63110, -87.67207, -87.625~
## $ end_lat            <dbl> 42.00458, 42.01954, 41.88487, 41.90312, 41.83816, 4~
## $ end_lng            <dbl> -87.66141, -87.66956, -87.62750, -87.67394, -87.645~
## $ member_casual      <chr> "member", "casual", "member", "member", "member", "~
## $ date               <date> 2021-02-12, 2021-02-14, 2021-02-09, 2021-02-02, 20~
## $ month              <chr> "02", "02", "02", "02", "02", "02", "02", "02", "02~
## $ day                <chr> "12", "14", "09", "02", "23", "24", "01", "11", "27~
## $ year               <chr> "2021", "2021", "2021", "2021", "2021", "2021", "20~
## $ day_of_week        <chr> "Friday", "Sunday", "Tuesday", "Tuesday", "Tuesday"~
## $ ride_length        <drtn> 407 secs, 1171 secs, 532 secs, 265 secs, 914 secs,~

Changing the data type of the ride length column to numeric (seconds).

# converting ride_length to numeric
trip_data$ride_length <- as.numeric(trip_data$ride_length, units = "secs")
is.numeric(trip_data$ride_length)
## [1] TRUE
glimpse(trip_data)
## Rows: 5,601,999
## Columns: 19
## $ ride_id            <chr> "89E7AA6C29227EFF", "0FEFDE2603568365", "E6159D746B~
## $ rideable_type      <chr> "classic_bike", "classic_bike", "electric_bike", "c~
## $ started_at         <chr> "2021-02-12 16:14:56", "2021-02-14 17:52:38", "2021~
## $ ended_at           <chr> "2021-02-12 16:21:43", "2021-02-14 18:12:09", "2021~
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A~
## $ start_station_id   <chr> "525", "525", "KA1503000012", "637", "13216", "1800~
## $ end_station_name   <chr> "Sheridan Rd & Columbia Ave", "Bosworth Ave & Howar~
## $ end_station_id     <chr> "660", "16806", "TA1305000029", "TA1305000034", "TA~
## $ start_lat          <dbl> 42.01270, 42.01270, 41.88579, 41.89563, 41.83473, 4~
## $ start_lng          <dbl> -87.66606, -87.66606, -87.63110, -87.67207, -87.625~
## $ end_lat            <dbl> 42.00458, 42.01954, 41.88487, 41.90312, 41.83816, 4~
## $ end_lng            <dbl> -87.66141, -87.66956, -87.62750, -87.67394, -87.645~
## $ member_casual      <chr> "member", "casual", "member", "member", "member", "~
## $ date               <date> 2021-02-12, 2021-02-14, 2021-02-09, 2021-02-02, 20~
## $ month              <chr> "02", "02", "02", "02", "02", "02", "02", "02", "02~
## $ day                <chr> "12", "14", "09", "02", "23", "24", "01", "11", "27~
## $ year               <chr> "2021", "2021", "2021", "2021", "2021", "2021", "20~
## $ day_of_week        <chr> "Friday", "Sunday", "Tuesday", "Tuesday", "Tuesday"~
## $ ride_length        <dbl> 407, 1171, 532, 265, 914, 332, 51, 76, 1377, 1042, ~
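
When no units are given, difftime() chooses them automatically, so the number obtained after conversion depends on the data. A small sketch on two made-up timestamps showing the difference between automatic units and seconds requested explicitly; the timestamps, the "UTC" time zone and the variable names are illustrative assumptions.

```r
# Two made-up timestamps two hours apart (illustrative only)
ride_start <- as.POSIXct("2021-02-12 16:00:00", tz = "UTC")
ride_end   <- as.POSIXct("2021-02-12 18:00:00", tz = "UTC")

auto_diff <- difftime(ride_end, ride_start)                  # units chosen automatically
secs_diff <- difftime(ride_end, ride_start, units = "secs")  # units fixed to seconds

print(units(auto_diff))       # unit picked for this pair
print(as.numeric(secs_diff))  # duration in seconds
```

Fixing the units at creation time keeps ride_length comparable if the data are later filtered or reloaded.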

Inspecting bad ride lengths, i.e. rides with a ride length of zero or less.

# checking bad ride length
sum(trip_data$ride_length <= 0)
## [1] 652
nrow(trip_data)
## [1] 5601999

Removing the bad data.

# Removing bad ride length data
trip_data <- trip_data[!(trip_data$ride_length <= 0),]
sum(trip_data$ride_length <= 0)
## [1] 0
nrow(trip_data)
## [1] 5601347
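
The filter above works because ride_length has no missing values here (the row count drops by exactly 652). A hedged sketch of a slightly more defensive version on a made-up data frame: with logical row indexing, an NA in the condition would return an all-NA row, so the NA check is written out. The toy values and the names `toy`, `keep` and `clean` are assumptions.

```r
# Toy ride lengths in seconds; NA stands in for a ride with a missing timestamp
toy <- data.frame(ride_id = c("a", "b", "c", "d"),
                  ride_length = c(407, 0, -15, NA))

# Keep only rows with a known, strictly positive duration
keep <- !is.na(toy$ride_length) & toy$ride_length > 0
clean <- toy[keep, ]

dropped <- nrow(toy) - nrow(clean)
print(dropped)  # number of rows removed
```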

Adding another column for the period of the day in which each ride starts: night, morning, afternoon or evening.

# Creating breaks (these evaluate to the hours 0, 6, 12, 18 and 23)
breaks <- hour(hm("00:00", "6:00", "12:00", "18:00", "23:59"))
# labels for the breaks
labels <- c("Night", "Morning", "Afternoon", "Evening")
# Defining time of the day (night, morning, afternoon, evening)
# cut() intervals are closed on the right, so rides starting in hours 6, 12 and 18 fall into the earlier period
trip_data$time_of_the_trip <- cut(x = hour(trip_data$started_at), breaks = breaks, labels = labels, include.lowest = TRUE)
colnames(trip_data)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "date"               "month"             
## [16] "day"                "year"               "day_of_week"       
## [19] "ride_length"        "time_of_the_trip"
head(trip_data)
##            ride_id rideable_type          started_at            ended_at
## 1 89E7AA6C29227EFF  classic_bike 2021-02-12 16:14:56 2021-02-12 16:21:43
## 2 0FEFDE2603568365  classic_bike 2021-02-14 17:52:38 2021-02-14 18:12:09
## 3 E6159D746B2DBB91 electric_bike 2021-02-09 19:10:18 2021-02-09 19:19:10
## 4 B32D3199F1C2E75B  classic_bike 2021-02-02 17:49:41 2021-02-02 17:54:06
## 5 83E463F23575F4BF electric_bike 2021-02-23 15:07:23 2021-02-23 15:22:37
## 6 BDAA7E3494E8D545 electric_bike 2021-02-24 15:43:33 2021-02-24 15:49:05
##           start_station_name start_station_id           end_station_name
## 1   Glenwood Ave & Touhy Ave              525 Sheridan Rd & Columbia Ave
## 2   Glenwood Ave & Touhy Ave              525   Bosworth Ave & Howard St
## 3         Clark St & Lake St     KA1503000012     State St & Randolph St
## 4      Wood St & Chicago Ave              637    Honore St & Division St
## 5         State St & 33rd St            13216      Emerald Ave & 31st St
## 6 Fairbanks St & Superior St            18003      LaSalle Dr & Huron St
##   end_station_id start_lat start_lng  end_lat   end_lng member_casual
## 1            660  42.01270 -87.66606 42.00458 -87.66141        member
## 2          16806  42.01270 -87.66606 42.01954 -87.66956        casual
## 3   TA1305000029  41.88579 -87.63110 41.88487 -87.62750        member
## 4   TA1305000034  41.89563 -87.67207 41.90312 -87.67394        member
## 5   TA1309000055  41.83473 -87.62583 41.83816 -87.64512        member
## 6   KP1705001026  41.89581 -87.62025 41.89489 -87.63198        casual
##         date month day year day_of_week ride_length time_of_the_trip
## 1 2021-02-12    02  12 2021      Friday         407        Afternoon
## 2 2021-02-14    02  14 2021      Sunday        1171        Afternoon
## 3 2021-02-09    02  09 2021     Tuesday         532          Evening
## 4 2021-02-02    02  02 2021     Tuesday         265        Afternoon
## 5 2021-02-23    02  23 2021     Tuesday         914        Afternoon
## 6 2021-02-24    02  24 2021   Wednesday         332        Afternoon
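
How cut() treats the boundary hours is easy to miss, so here is a minimal sketch comparing the right-closed breaks used above with a left-closed alternative. The alternative breaks (ending at 24 with right = FALSE) are an assumption, not the method used in this analysis; the hours tested are just the boundary values.

```r
hours <- c(0, 6, 12, 18, 23)
labels <- c("Night", "Morning", "Afternoon", "Evening")

# Breaks as used above: intervals [0,6], (6,12], (12,18], (18,23]
right_closed <- cut(hours, breaks = c(0, 6, 12, 18, 23),
                    labels = labels, include.lowest = TRUE)

# Alternative (assumption): intervals [0,6), [6,12), [12,18), [18,24)
left_closed <- cut(hours, breaks = c(0, 6, 12, 18, 24),
                   labels = labels, right = FALSE)

print(data.frame(hours, right_closed, left_closed))
```

Under the right-closed version a ride starting at 06:30 is labelled Night, while the left-closed version labels it Morning; which convention is preferable depends on how the periods are meant to be read.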

Analyze

The data has been prepared and processed and is now ready for descriptive analysis. The analysis involves performing calculations on the cleaned, consistent data and identifying trends, patterns and relationships.

Performing statistical analysis by calculating the mean, median, maximum and minimum of the ride length column (in seconds) for both casual riders and members.

# finding mean (total ride length / total rides), median (midpoint), max (longest), min (shortest) for ride_length
trip_data %>%
  group_by(member_casual) %>%
  summarise(average_ride_length = mean(ride_length),
            median_length = median(ride_length),
            max_ride_length = max(ride_length),
            min_ride_length = min(ride_length))
## # A tibble: 2 x 5
##   member_casual average_ride_leng~ median_length max_ride_length min_ride_length
##   <chr>                      <dbl>         <dbl>           <dbl>           <dbl>
## 1 casual                     1922.           957         3356649               1
## 2 member                      816.           574           93596               1
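
The longest casual ride above, 3,356,649 seconds, is close to 39 days, which suggests the mean is pulled up by a few extreme rides while the median is not. A toy sketch of that effect; the four short rides and the 24-hour cap are made-up values for illustration, not a rule taken from the data.

```r
# Toy ride lengths in seconds: four ordinary rides plus the longest casual ride above
ride_length <- c(300, 600, 900, 1200, 3356649)

mean_all <- mean(ride_length)      # dominated by the single long ride
median_all <- median(ride_length)  # unaffected by it

# Hypothetical cap of 24 hours, used only to show the effect on the mean
day_cap <- 24 * 60 * 60
mean_capped <- mean(ride_length[ride_length <= day_cap])

print(c(mean_all = mean_all, median_all = median_all, mean_capped = mean_capped))
```

This is why the median is reported alongside the mean in the summary above.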

Calculating the total number of rides.

# total rides taken (ride count) by members and casual riders
trip_data %>%
  group_by(member_casual) %>%
  summarise(ride_count = n())
## # A tibble: 2 x 2
##   member_casual ride_count
##   <chr>              <int>
## 1 casual           2529064
## 2 member           3072283

Calculating the number of rides and the average ride length for each day of the week.

# calculating total rides and average ride length by day for members and casual riders
trip_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(),
            average_ride_length = mean(ride_length), .groups = "drop")
## # A tibble: 14 x 4
##    member_casual day_of_week number_of_rides average_ride_length
##    <chr>         <chr>                 <int>               <dbl>
##  1 casual        Friday               363656               1822.
##  2 casual        Monday               286681               1916.
##  3 casual        Saturday             557722               2085.
##  4 casual        Sunday               480699               2254.
##  5 casual        Thursday             286233               1669.
##  6 casual        Tuesday              274868               1676.
##  7 casual        Wednesday            279205               1665.
##  8 member        Friday               445093                799.
##  9 member        Monday               418420                792.
## 10 member        Saturday             431674                914.
## 11 member        Sunday               376207                939.
## 12 member        Thursday             453535                765.
## 13 member        Tuesday              468659                767.
## 14 member        Wednesday            478695                766.
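
To make the weekend pattern in this table explicit, the sketch below computes the share of rides taken on Saturday and Sunday for each rider type. The ride counts are copied from the table above; the data frame is rebuilt by hand and base R is used so that the example runs on its own, which is an assumption of this sketch rather than how the analysis itself was run.

```r
# Ride counts copied from the day-of-week table above
rides <- data.frame(
  member_casual = rep(c("casual", "member"), each = 7),
  day_of_week = rep(c("Friday", "Monday", "Saturday", "Sunday",
                      "Thursday", "Tuesday", "Wednesday"), times = 2),
  number_of_rides = c(363656, 286681, 557722, 480699, 286233, 274868, 279205,
                      445093, 418420, 431674, 376207, 453535, 468659, 478695)
)

is_weekend <- rides$day_of_week %in% c("Saturday", "Sunday")

# Sum rides per rider type, for weekends only and for all days
weekend <- tapply(rides$number_of_rides[is_weekend],
                  rides$member_casual[is_weekend], sum)
total <- tapply(rides$number_of_rides, rides$member_casual, sum)

weekend_share <- round(100 * weekend / total, 1)  # percent of rides on weekends
print(weekend_share)
```

On these figures about 41% of casual rides fall on a weekend, against about 26% of member rides.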

Comparing ride lengths between different times of the day.

# Comparing time period (night, morning, afternoon, evening) of ride with ride length for both rider types
trip_data %>%
  group_by(member_casual, time_of_the_trip) %>%
  summarise(number_of_rides = n(),
            average_ride_length = mean(ride_length), .groups = "drop")
## # A tibble: 8 x 4
##   member_casual time_of_the_trip number_of_rides average_ride_length
##   <chr>         <fct>                      <int>               <dbl>
## 1 casual        Night                     181708               2121.
## 2 casual        Morning                   588852               1852.
## 3 casual        Afternoon                1195470               1933.
## 4 casual        Evening                   563034               1909.
## 5 member        Night                     195833                785.
## 6 member        Morning                   920021                776.
## 7 member        Afternoon                1404941                843.
## 8 member        Evening                   551488                827.

Share

In this phase, the insights and findings are shared through data visualizations. Bar charts are used to present the analysis above.

# Visualizing total rides taken by members and casual riders
trip_data %>%
  group_by(member_casual) %>%
  summarise(ride_count = n()) %>%
  ggplot() +
  geom_col(mapping = aes(x = member_casual, y = ride_count, fill = member_casual), show.legend = FALSE) +
  labs(title = "Total number of rides")

# Visualizing the days of the week with no. of rides taken by riders
trip_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  arrange(member_casual, day_of_week) %>%
  ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  labs(title = "Total rides vs. day of the week") +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  scale_y_continuous(labels = function(x) format(x,scientific = FALSE))

# Visualizing average ride length (seconds) by day of the week
trip_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(average_ride_length = mean(ride_length), .groups = "drop") %>%
  ggplot(aes(x = day_of_week, y = average_ride_length, fill = member_casual)) +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  labs(title = "Average ride length vs. day of the week")

# visualizing total rides taken by members and casuals by month
trip_data %>%
  group_by(member_casual, month) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  arrange(member_casual, month) %>%
  ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) +
  labs(title = "Total rides vs. month") +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  scale_y_continuous(labels = function(x) format(x,scientific = FALSE))

# visualizing average ride length by month
trip_data %>%
  group_by(member_casual, month) %>%
  summarise(average_ride_length = mean(ride_length), .groups = "drop") %>%
  ggplot(aes(x = month, y = average_ride_length, fill = member_casual)) +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  labs(title = "Average ride length vs. month")

# visualizing and comparing casual and member rides by average ride duration
# (ride_length is a duration in seconds, not a distance)
trip_data %>%
  group_by(member_casual) %>%
  summarise(average_ride_length = mean(ride_length)) %>%
  ggplot() +
  geom_col(mapping = aes(x = member_casual, y = average_ride_length, fill = member_casual), show.legend = FALSE) +
  labs(title = "Mean ride length (seconds)")
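
The chart above is based on ride_length, which is a duration in seconds. If a distance comparison is wanted, one option is to approximate the straight-line distance between the start and end coordinates with the haversine formula. The sketch below is an assumption-laden illustration: `haversine_km` is a hypothetical helper, 6371 km is the usual mean Earth radius, and the result is a great-circle distance, not the route actually ridden (a round trip returning to its start would score zero).

```r
# Great-circle distance in kilometres between two points (haversine formula).
# earth_radius_km = 6371 is the commonly used mean-radius approximation.
haversine_km <- function(lat1, lng1, lat2, lng2, earth_radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

# Coordinates of the first ride in the head() output above
first_ride_km <- haversine_km(42.01270, -87.66606, 42.00458, -87.66141)
print(first_ride_km)
```

The function is vectorised, so it could be applied to the start_lat, start_lng, end_lat and end_lng columns directly and then averaged per rider type.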

# Visualizing time period (night, morning, afternoon, evening) of rides with total number of rides
trip_data %>%
  group_by(member_casual, time_of_the_trip) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  ggplot() +
  geom_col(mapping = aes(x = time_of_the_trip, y = number_of_rides, fill = member_casual), show.legend = TRUE) +
  labs(title = "Total no. of rides vs. the time of the trip")

# Visualizing comparison of total rides with the type of ride
trip_data %>%
  group_by(member_casual, rideable_type) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  ggplot() +
  geom_col(mapping = aes(x = rideable_type, y = number_of_rides, fill = member_casual), show.legend = TRUE) +
  labs(title = "Total no. of rides vs. ride type")

Visualizing the start and end positions of rides using latitude and longitude coordinates.

# Visualizing and analyzing on map via latitudes and longitudes 

# Adding a new data frame with only the most popular routes (> 200 rides)
coordinates_df <- trip_data %>%
  # keep rides whose start and end coordinates differ (compare latitude with latitude, longitude with longitude)
  filter(start_lat != end_lat & start_lng != end_lng) %>%
  group_by(start_lng, start_lat, end_lng, end_lat, member_casual, rideable_type) %>%
  summarise(total_rides = n(), .groups = "drop") %>%
  filter(total_rides > 200)

casual_riders <- coordinates_df %>%
  filter(member_casual == "casual")
member_riders <- coordinates_df %>%
  filter(member_casual =="member")

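# Storing map of Chicago
# (Stamen tiles have since moved to Stadia Maps; newer ggmap releases provide get_stadiamap(), which needs an API key)
chicago <- c(left = -87.700424, bottom = 41.790769, right = -87.554855, top = 41.990119)
chicago_map <- get_stamenmap(bbox = chicago, zoom = 12, maptype = "terrain")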
# maps for casual and member riders
ggmap(chicago_map, darken = c(0.1, "white")) +
  geom_point(casual_riders, mapping = aes(x = start_lng, y = start_lat, color = rideable_type), size = 2) +
  coord_fixed(0.8) +
  labs(title = "Hotspots of casual riders", x = NULL, y = NULL) +
  theme(legend.position = "right")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 49 rows containing missing values (geom_point).

ggmap(chicago_map, darken = c(0.1, "white")) +
  geom_point(member_riders, mapping = aes(x = start_lng, y = start_lat, color = rideable_type), size = 2) +
  coord_fixed(0.8) +
  labs(title = "Hotspots of member riders", x = NULL, y = NULL) +
  theme(legend.position = "right")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 109 rows containing missing values (geom_point).
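
The "Removed ... rows" warnings most likely come from start points that lie outside the map's bounding box, since the plot is limited to that box. A small sketch, under that assumption, of counting such points before plotting. The bounding box is the one defined above and the four start points are copied from the head() output; the names `pts` and `inside` are hypothetical.

```r
# Bounding box used for the map above
chicago <- c(left = -87.700424, bottom = 41.790769,
             right = -87.554855, top = 41.990119)

# Start points copied from the head() output earlier in this section
pts <- data.frame(
  start_lng = c(-87.66606, -87.63110, -87.67207, -87.62583),
  start_lat = c(42.01270, 41.88579, 41.89563, 41.83473)
)

# TRUE for points that fall within the box on both axes
inside <- pts$start_lng >= chicago[["left"]] & pts$start_lng <= chicago[["right"]] &
  pts$start_lat >= chicago[["bottom"]] & pts$start_lat <= chicago[["top"]]

outside_count <- sum(!inside)
print(outside_count)  # points that would not be drawn on the map
```

Here the first point (Glenwood Ave & Touhy Ave, latitude 42.0127) sits north of the box's top edge, so it would be dropped; widening the box would be one way to include such stations.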

Act

Now that we have finished creating visualizations, it's time to act on our findings and propose our top three recommendations based on the analysis.

  1. Weekend membership: Casual riders take the most rides on weekends, so a weekend membership could attract both new and existing casual riders, and its benefits could be used to encourage them to move to an extended membership.

  2. Marketing and promotional campaigns: The busiest time of the year for Cyclistic is the third quarter, when rides peak for both types of riders, which makes it the best time for promotional activities and campaigns. These can be run near riding hotspots. Classic bikes are used the most, so offers can be built around them.

  3. Discounts and riding competitions: Cyclistic can organize bike riding competitions with exciting prizes and offer discounted annual memberships to the participants.

Additional data, such as pricing details, could be used to expand the findings and the scope of the analysis, but the provided data is sufficient to support our conclusions and accomplish the business task.

Resources

RStudio, Medium, LinkedIn and the Kaggle community.

For ggmap:
http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
https://cran.r-project.org/web/packages/ggmap/citation.html