Data-driven decision-making (DDDM) is defined as using facts, metrics, and data to guide strategic business decisions that align with your goals, objectives, and initiatives. Various of Data-driven decision-making process has been defined by Industry experts. There is no universal DDDM process but most of the DDDM follows the almost same process. The method used in this project is Ask - Prepare - Process - Analyze - Share - Act ( APPASA )
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments.One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes,and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. This data analysis has been done to obtain following :
To know how annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
Primary Stakeholder : Lily Moreno, Cyclistic executive team
Secondary Stakeholder : Cyclistic marketing analytics team
“divvy-tripdata” is historical trip data of company Divvy and data has been made available to public. Data set comes under Divvy Data License Agreement and it released monthly in schedule. The Data of Each trip is anonymized and includes: Trip start day and time Trip end day and time Trip start station Trip end station Rider type (Member, Single Ride, and Day Pass) Bike type (Classic bike, Docked bike, Electric bike)
ROCCC analysis
Reliable
Original
Comprehensive
Current
Cited
Data set checks yes in all criteria of ROCCC analysis.
All the data used in this analysis can be downlaoded from here.
The packages used in this data analysis are : tidyverse, dplyr, skimr, janitor, lubridate, readr
and installed using : install.packages(“package_name”)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(here)
## here() starts at C:/Users/mayank/Documents
library(dplyr)
library(skimr)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(readr)
library(readr)
Feb_2021_trip <- read_csv("Bluetooth Folder/excel_data/Feb_2021_trip.csv")
## Rows: 49622 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (9): ride_id, rideable_type, started_at, ended_at, start_station_name, s...
## dbl (4): start_lat, start_lng, end_lat, end_lng
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
March_2021_trip <- read_csv("Bluetooth Folder/excel_data/March_2021_trip.csv")
## Rows: 228496 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
April_2021_trip <- read_csv("Bluetooth Folder/excel_data/April_2021_trip.csv")
## Rows: 337230 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
May_2021_trip <- read_csv("Bluetooth Folder/excel_data/May_2021_trip.csv")
## Rows: 531633 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
June_2021_trip <- read_csv("Bluetooth Folder/excel_data/June_2021_trip.csv")
## Rows: 729595 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
July_2021_trip <- read_csv("Bluetooth Folder/excel_data/July_2021_trip.csv")
## Rows: 822410 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
August_2021_trip <- read_csv("Bluetooth Folder/excel_data/August_2021_trip.csv")
## Rows: 804352 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
Sept_2021_trip <- read_csv("Bluetooth Folder/excel_data/Sept_2021_trip.csv")
## Rows: 756147 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
Oct_2021_trip <- read_csv("Bluetooth Folder/excel_data/Oct_2021_trip.csv")
## Rows: 631226 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
Nov_2021_trip <- read_csv("Bluetooth Folder/excel_data/Nov_2021_trip.csv")
## Rows: 359978 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
Dec_2021_trip <- read_csv("Bluetooth Folder/excel_data/Dec_2021_trip.csv")
## Rows: 247540 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
Jan_2022_trip <- read_csv("Bluetooth Folder/excel_data/Jan_2022_trip.csv")
## Rows: 103770 Columns: 13
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(Feb_2021_trip)
## spec_tbl_df [49,622 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:49622] "89E7AA6C29227EFF" "0FEFDE2603568365" "E6159D746B2DBB91" "B32D3199F1C2E75B" ...
## $ rideable_type : chr [1:49622] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : chr [1:49622] "2/12/2021 16:14" "2/14/2021 17:52" "2/9/2021 19:10" "2/2/2021 17:49" ...
## $ ended_at : chr [1:49622] "2/12/2021 16:21" "2/14/2021 18:12" "2/9/2021 19:19" "2/2/2021 17:54" ...
## $ start_station_name: chr [1:49622] "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Clark St & Lake St" "Wood St & Chicago Ave" ...
## $ start_station_id : chr [1:49622] "525" "525" "KA1503000012" "637" ...
## $ end_station_name : chr [1:49622] "Sheridan Rd & Columbia Ave" "Bosworth Ave & Howard St" "State St & Randolph St" "Honore St & Division St" ...
## $ end_station_id : chr [1:49622] "660" "16806" "TA1305000029" "TA1305000034" ...
## $ start_lat : num [1:49622] 42 42 41.9 41.9 41.8 ...
## $ start_lng : num [1:49622] -87.7 -87.7 -87.6 -87.7 -87.6 ...
## $ end_lat : num [1:49622] 42 42 41.9 41.9 41.8 ...
## $ end_lng : num [1:49622] -87.7 -87.7 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:49622] "member" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_character(),
## .. ended_at = col_character(),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(March_2021_trip)
## spec_tbl_df [228,496 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:228496] "CFA86D4455AA1030" "30D9DC61227D1AF3" "846D87A15682A284" "994D05AA75A168F2" ...
## $ rideable_type : chr [1:228496] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:228496], format: "2021-03-16 08:32:30" "2021-03-28 01:26:28" ...
## $ ended_at : POSIXct[1:228496], format: "2021-03-16 08:36:34" "2021-03-28 01:36:55" ...
## $ start_station_name: chr [1:228496] "Humboldt Blvd & Armitage Ave" "Humboldt Blvd & Armitage Ave" "Shields Ave & 28th Pl" "Winthrop Ave & Lawrence Ave" ...
## $ start_station_id : chr [1:228496] "15651" "15651" "15443" "TA1308000021" ...
## $ end_station_name : chr [1:228496] "Stave St & Armitage Ave" "Central Park Ave & Bloomingdale Ave" "Halsted St & 35th St" "Broadway & Sheridan Rd" ...
## $ end_station_id : chr [1:228496] "13266" "18017" "TA1308000043" "13323" ...
## $ start_lat : num [1:228496] 41.9 41.9 41.8 42 42 ...
## $ start_lng : num [1:228496] -87.7 -87.7 -87.6 -87.7 -87.7 ...
## $ end_lat : num [1:228496] 41.9 41.9 41.8 42 42.1 ...
## $ end_lng : num [1:228496] -87.7 -87.7 -87.6 -87.6 -87.7 ...
## $ member_casual : chr [1:228496] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(April_2021_trip)
## spec_tbl_df [337,230 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:337230] "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
## $ rideable_type : chr [1:337230] "classic_bike" "docked_bike" "docked_bike" "classic_bike" ...
## $ started_at : POSIXct[1:337230], format: "2021-04-12 18:25:36" "2021-04-27 17:27:11" ...
## $ ended_at : POSIXct[1:337230], format: "2021-04-12 18:56:55" "2021-04-27 18:31:29" ...
## $ start_station_name: chr [1:337230] "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
## $ start_station_id : chr [1:337230] "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
## $ end_station_name : chr [1:337230] "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
## $ end_station_id : chr [1:337230] "13235" "KA1503000069" "20121" "13235" ...
## $ start_lat : num [1:337230] 41.9 41.8 41.7 41.9 41.7 ...
## $ start_lng : num [1:337230] -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:337230] 41.9 41.8 41.7 41.9 41.7 ...
## $ end_lng : num [1:337230] -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:337230] "member" "casual" "casual" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(May_2021_trip)
## spec_tbl_df [531,633 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:531633] "C809ED75D6160B2A" "DD59FDCE0ACACAF3" "0AB83CB88C43EFC2" "7881AC6D39110C60" ...
## $ rideable_type : chr [1:531633] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:531633], format: "2021-05-30 11:58:15" "2021-05-30 11:29:14" ...
## $ ended_at : POSIXct[1:531633], format: "2021-05-30 12:10:39" "2021-05-30 12:14:09" ...
## $ start_station_name: chr [1:531633] NA NA NA NA ...
## $ start_station_id : chr [1:531633] NA NA NA NA ...
## $ end_station_name : chr [1:531633] NA NA NA NA ...
## $ end_station_id : chr [1:531633] NA NA NA NA ...
## $ start_lat : num [1:531633] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:531633] -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:531633] 41.9 41.8 41.9 41.9 41.9 ...
## $ end_lng : num [1:531633] -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:531633] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(June_2021_trip)
## spec_tbl_df [729,595 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:729595] "99FEC93BA843FB20" "06048DCFC8520CAF" "9598066F68045DF2" "B03C0FE48C412214" ...
## $ rideable_type : chr [1:729595] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:729595], format: "2021-06-13 14:31:28" "2021-06-04 11:18:02" ...
## $ ended_at : POSIXct[1:729595], format: "2021-06-13 14:34:11" "2021-06-04 11:24:19" ...
## $ start_station_name: chr [1:729595] NA NA NA NA ...
## $ start_station_id : chr [1:729595] NA NA NA NA ...
## $ end_station_name : chr [1:729595] NA NA NA NA ...
## $ end_station_id : chr [1:729595] NA NA NA NA ...
## $ start_lat : num [1:729595] 41.8 41.8 41.8 41.8 41.8 ...
## $ start_lng : num [1:729595] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:729595] 41.8 41.8 41.8 41.8 41.8 ...
## $ end_lng : num [1:729595] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:729595] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(July_2021_trip)
## spec_tbl_df [822,410 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:822410] "0A1B623926EF4E16" "B2D5583A5A5E76EE" "6F264597DDBF427A" "379B58EAB20E8AA5" ...
## $ rideable_type : chr [1:822410] "docked_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:822410], format: "2021-07-02 14:44:36" "2021-07-07 16:57:42" ...
## $ ended_at : POSIXct[1:822410], format: "2021-07-02 15:19:58" "2021-07-07 17:16:09" ...
## $ start_station_name: chr [1:822410] "Michigan Ave & Washington St" "California Ave & Cortez St" "Wabash Ave & 16th St" "California Ave & Cortez St" ...
## $ start_station_id : chr [1:822410] "13001" "17660" "SL-012" "17660" ...
## $ end_station_name : chr [1:822410] "Halsted St & North Branch St" "Wood St & Hubbard St" "Rush St & Hubbard St" "Carpenter St & Huron St" ...
## $ end_station_id : chr [1:822410] "KA1504000117" "13432" "KA1503000044" "13196" ...
## $ start_lat : num [1:822410] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:822410] -87.6 -87.7 -87.6 -87.7 -87.7 ...
## $ end_lat : num [1:822410] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:822410] -87.6 -87.7 -87.6 -87.7 -87.7 ...
## $ member_casual : chr [1:822410] "casual" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(August_2021_trip)
## spec_tbl_df [804,352 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:804352] "99103BB87CC6C1BB" "EAFCCCFB0A3FC5A1" "9EF4F46C57AD234D" "5834D3208BFAF1DA" ...
## $ rideable_type : chr [1:804352] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:804352], format: "2021-08-10 17:15:49" "2021-08-10 17:23:14" ...
## $ ended_at : POSIXct[1:804352], format: "2021-08-10 17:22:44" "2021-08-10 17:39:24" ...
## $ start_station_name: chr [1:804352] NA NA NA NA ...
## $ start_station_id : chr [1:804352] NA NA NA NA ...
## $ end_station_name : chr [1:804352] NA NA NA NA ...
## $ end_station_id : chr [1:804352] NA NA NA NA ...
## $ start_lat : num [1:804352] 41.8 41.8 42 42 41.8 ...
## $ start_lng : num [1:804352] -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num [1:804352] 41.8 41.8 42 42 41.8 ...
## $ end_lng : num [1:804352] -87.7 -87.6 -87.7 -87.7 -87.6 ...
## $ member_casual : chr [1:804352] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(Sept_2021_trip)
## spec_tbl_df [756,147 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:756147] "9DC7B962304CBFD8" "F930E2C6872D6B32" "6EF72137900BB910" "78D1DE133B3DBF55" ...
## $ rideable_type : chr [1:756147] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:756147], format: "2021-09-28 16:07:10" "2021-09-28 14:24:51" ...
## $ ended_at : POSIXct[1:756147], format: "2021-09-28 16:09:54" "2021-09-28 14:40:05" ...
## $ start_station_name: chr [1:756147] NA NA NA NA ...
## $ start_station_id : chr [1:756147] NA NA NA NA ...
## $ end_station_name : chr [1:756147] NA NA NA NA ...
## $ end_station_id : chr [1:756147] NA NA NA NA ...
## $ start_lat : num [1:756147] 41.9 41.9 41.8 41.8 41.9 ...
## $ start_lng : num [1:756147] -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:756147] 41.9 42 41.8 41.8 41.9 ...
## $ end_lng : num [1:756147] -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:756147] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(Oct_2021_trip)
## spec_tbl_df [631,226 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:631226] "620BC6107255BF4C" "4471C70731AB2E45" "26CA69D43D15EE14" "362947F0437E1514" ...
## $ rideable_type : chr [1:631226] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:631226], format: "2021-10-22 12:46:42" "2021-10-21 09:12:37" ...
## $ ended_at : POSIXct[1:631226], format: "2021-10-22 12:49:50" "2021-10-21 09:14:14" ...
## $ start_station_name: chr [1:631226] "Kingsbury St & Kinzie St" NA NA NA ...
## $ start_station_id : chr [1:631226] "KA1503000043" NA NA NA ...
## $ end_station_name : chr [1:631226] NA NA NA NA ...
## $ end_station_id : chr [1:631226] NA NA NA NA ...
## $ start_lat : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:631226] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(Nov_2021_trip)
## spec_tbl_df [359,978 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:359978] "7C00A93E10556E47" "90854840DFD508BA" "0A7D10CDD144061C" "2F3BE33085BCFF02" ...
## $ rideable_type : chr [1:359978] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:359978], format: "2021-11-27 13:27:38" "2021-11-27 13:38:25" ...
## $ ended_at : POSIXct[1:359978], format: "2021-11-27 13:46:38" "2021-11-27 13:56:10" ...
## $ start_station_name: chr [1:359978] NA NA NA NA ...
## $ start_station_id : chr [1:359978] NA NA NA NA ...
## $ end_station_name : chr [1:359978] NA NA NA NA ...
## $ end_station_id : chr [1:359978] NA NA NA NA ...
## $ start_lat : num [1:359978] 41.9 42 42 41.9 41.9 ...
## $ start_lng : num [1:359978] -87.7 -87.7 -87.7 -87.8 -87.6 ...
## $ end_lat : num [1:359978] 42 41.9 42 41.9 41.9 ...
## $ end_lng : num [1:359978] -87.7 -87.7 -87.7 -87.8 -87.6 ...
## $ member_casual : chr [1:359978] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(Dec_2021_trip)
## spec_tbl_df [247,540 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:247540] "46F8167220E4431F" "73A77762838B32FD" "4CF42452054F59C5" "3278BA87BF698339" ...
## $ rideable_type : chr [1:247540] "electric_bike" "electric_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:247540], format: "2021-12-07 15:06:07" "2021-12-11 03:43:29" ...
## $ ended_at : POSIXct[1:247540], format: "2021-12-07 15:13:42" "2021-12-11 04:10:23" ...
## $ start_station_name: chr [1:247540] "Laflin St & Cullerton St" "LaSalle Dr & Huron St" "Halsted St & North Branch St" "Halsted St & North Branch St" ...
## $ start_station_id : chr [1:247540] "13307" "KP1705001026" "KA1504000117" "KA1504000117" ...
## $ end_station_name : chr [1:247540] "Morgan St & Polk St" "Clarendon Ave & Leland Ave" "Broadway & Barry Ave" "LaSalle Dr & Huron St" ...
## $ end_station_id : chr [1:247540] "TA1307000130" "TA1307000119" "13137" "KP1705001026" ...
## $ start_lat : num [1:247540] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:247540] -87.7 -87.6 -87.6 -87.6 -87.7 ...
## $ end_lat : num [1:247540] 41.9 42 41.9 41.9 41.9 ...
## $ end_lng : num [1:247540] -87.7 -87.7 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:247540] "member" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(Jan_2022_trip)
## spec_tbl_df [103,770 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:103770] "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
## $ rideable_type : chr [1:103770] "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:103770], format: "2022-01-13 11:59:47" "2022-01-10 08:41:56" ...
## $ ended_at : POSIXct[1:103770], format: "2022-01-13 12:02:44" "2022-01-10 08:46:17" ...
## $ start_station_name: chr [1:103770] "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
## $ start_station_id : chr [1:103770] "525" "525" "TA1306000016" "KA1504000151" ...
## $ end_station_name : chr [1:103770] "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
## $ end_station_id : chr [1:103770] "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
## $ start_lat : num [1:103770] 42 42 41.9 42 41.9 ...
## $ start_lng : num [1:103770] -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num [1:103770] 42 42 41.9 42 41.9 ...
## $ end_lng : num [1:103770] -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr [1:103770] "casual" "casual" "member" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Feb_2021_trip <- mutate(Feb_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
March_2021_trip <- mutate(March_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
April_2021_trip <- mutate(April_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
May_2021_trip <- mutate(May_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
June_2021_trip <- mutate(June_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
July_2021_trip <- mutate(July_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
August_2021_trip <- mutate(August_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
Sept_2021_trip <- mutate(Sept_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
Oct_2021_trip <- mutate(Oct_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
Nov_2021_trip <- mutate(Nov_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
Dec_2021_trip <- mutate(Dec_2021_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
Jan_2022_trip <- mutate(Jan_2022_trip, start_station_id = as.character(start_station_id), end_station_id = as.character(end_station_id), started_at = as.POSIXct(started_at, format="%Y-%m-%d %H:%M:%S"), ended_at = as.POSIXct(ended_at, format="%Y-%m-%d %H:%M:%S"))
All_trip <- bind_rows(Feb_2021_trip, March_2021_trip, April_2021_trip, May_2021_trip, June_2021_trip, July_2021_trip, August_2021_trip, Sept_2021_trip, Oct_2021_trip, Nov_2021_trip, Dec_2021_trip, Jan_2022_trip)
#checking Column name
colnames(All_trip)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
#checking number of row
nrow(All_trip)
## [1] 5601999
#Checking Dimensions
dim(All_trip)
## [1] 5601999 13
#checking First six row of data
head(All_trip)
## # A tibble: 6 x 13
## ride_id rideable_type started_at ended_at start_station_name
## <chr> <chr> <dttm> <dttm> <chr>
## 1 89E7AA~ classic_bike NA NA Glenwood Ave & Touhy Ave
## 2 0FEFDE~ classic_bike NA NA Glenwood Ave & Touhy Ave
## 3 E6159D~ electric_bike NA NA Clark St & Lake St
## 4 B32D31~ classic_bike NA NA Wood St & Chicago Ave
## 5 83E463~ electric_bike NA NA State St & 33rd St
## 6 BDAA7E~ electric_bike NA NA Fairbanks St & Superior St
## # ... with 8 more variables: start_station_id <chr>, end_station_name <chr>,
## # end_station_id <chr>, start_lat <dbl>, start_lng <dbl>, end_lat <dbl>,
## # end_lng <dbl>, member_casual <chr>
#Checking Column names and Data types
str(All_trip)
## tibble [5,601,999 x 13] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:5601999] "89E7AA6C29227EFF" "0FEFDE2603568365" "E6159D746B2DBB91" "B32D3199F1C2E75B" ...
## $ rideable_type : chr [1:5601999] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:5601999], format: NA NA ...
## $ ended_at : POSIXct[1:5601999], format: NA NA ...
## $ start_station_name: chr [1:5601999] "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Clark St & Lake St" "Wood St & Chicago Ave" ...
## $ start_station_id : chr [1:5601999] "525" "525" "KA1503000012" "637" ...
## $ end_station_name : chr [1:5601999] "Sheridan Rd & Columbia Ave" "Bosworth Ave & Howard St" "State St & Randolph St" "Honore St & Division St" ...
## $ end_station_id : chr [1:5601999] "660" "16806" "TA1305000029" "TA1305000034" ...
## $ start_lat : num [1:5601999] 42 42 41.9 41.9 41.8 ...
## $ start_lng : num [1:5601999] -87.7 -87.7 -87.6 -87.7 -87.6 ...
## $ end_lat : num [1:5601999] 42 42 41.9 41.9 41.8 ...
## $ end_lng : num [1:5601999] -87.7 -87.7 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:5601999] "member" "casual" "member" "member" ...
#checking Summary of data frame
summary(All_trip)
## ride_id rideable_type started_at
## Length:5601999 Length:5601999 Min. :2021-03-01 00:01:09
## Class :character Class :character 1st Qu.:2021-06-12 13:30:12
## Mode :character Mode :character Median :2021-08-05 21:12:13
## Mean :2021-08-06 08:30:19
## 3rd Qu.:2021-09-29 07:41:17
## Max. :2022-01-31 23:58:37
## NA's :49622
## ended_at start_station_name start_station_id
## Min. :2021-03-01 00:06:28 Length:5601999 Length:5601999
## 1st Qu.:2021-06-12 13:57:47 Class :character Class :character
## Median :2021-08-05 21:33:22 Mode :character Mode :character
## Mean :2021-08-06 08:52:13
## 3rd Qu.:2021-09-29 07:55:19
## Max. :2022-02-01 01:46:16
## NA's :49622
## end_station_name end_station_id start_lat start_lng
## Length:5601999 Length:5601999 Min. :41.64 Min. :-87.84
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :45.64 Max. :-73.80
##
## end_lat end_lng member_casual
## Min. :41.39 Min. :-88.97 Length:5601999
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character
## Median :41.90 Median :-87.64 Mode :character
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.17 Max. :-87.49
## NA's :4754 NA's :4754
#Viewing some data
glimpse(All_trip)
## Rows: 5,601,999
## Columns: 13
## $ ride_id <chr> "89E7AA6C29227EFF", "0FEFDE2603568365", "E6159D746B~
## $ rideable_type <chr> "classic_bike", "classic_bike", "electric_bike", "c~
## $ started_at <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ ended_at <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A~
## $ start_station_id <chr> "525", "525", "KA1503000012", "637", "13216", "1800~
## $ end_station_name <chr> "Sheridan Rd & Columbia Ave", "Bosworth Ave & Howar~
## $ end_station_id <chr> "660", "16806", "TA1305000029", "TA1305000034", "TA~
## $ start_lat <dbl> 42.01270, 42.01270, 41.88579, 41.89563, 41.83473, 4~
## $ start_lng <dbl> -87.66606, -87.66606, -87.63110, -87.67207, -87.625~
## $ end_lat <dbl> 42.00458, 42.01954, 41.88487, 41.90312, 41.83816, 4~
## $ end_lng <dbl> -87.66141, -87.66956, -87.62750, -87.67394, -87.645~
## $ member_casual <chr> "member", "casual", "member", "member", "member", "~
colSums(is.na(All_trip))
## ride_id rideable_type started_at ended_at
## 0 0 49622 49622
## start_station_name start_station_id end_station_name end_station_id
## 698444 698441 746820 746820
## start_lat start_lng end_lat end_lng
## 0 0 4754 4754
## member_casual
## 0
All_trip_cleaned <- All_trip[complete.cases(All_trip), ]
# Filtering data In which trip Trip start time is greater than Trip end time
All_trip_cleaned <- All_trip_cleaned %>% filter(All_trip_cleaned$started_at<All_trip_cleaned$ended_at)
#Saving new column date
All_trip_cleaned$date <- as.Date(All_trip_cleaned$started_at, format= "%m/%d/%Y %H:%M")
#Saving new column month
All_trip_cleaned$month <- format(as.Date(All_trip_cleaned$date), "%m")
#Saving new column dates
All_trip_cleaned$dates <- format(as.Date(All_trip_cleaned$date), "%d")
#Saving new column year
All_trip_cleaned$year <- format(as.Date(All_trip_cleaned$date), "%Y")
#Saving new column day
All_trip_cleaned$day <- format(as.Date(All_trip_cleaned$date), "%A")
#Viewing some data to check everything is uniform and consistent
glimpse(All_trip_cleaned)
## Rows: 4,541,730
## Columns: 18
## $ ride_id <chr> "CFA86D4455AA1030", "30D9DC61227D1AF3", "846D87A156~
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl~
## $ started_at <dttm> 2021-03-16 08:32:30, 2021-03-28 01:26:28, 2021-03-~
## $ ended_at <dttm> 2021-03-16 08:36:34, 2021-03-28 01:36:55, 2021-03-~
## $ start_station_name <chr> "Humboldt Blvd & Armitage Ave", "Humboldt Blvd & Ar~
## $ start_station_id <chr> "15651", "15651", "15443", "TA1308000021", "525", "~
## $ end_station_name <chr> "Stave St & Armitage Ave", "Central Park Ave & Bloo~
## $ end_station_id <chr> "13266", "18017", "TA1308000043", "13323", "E008", ~
## $ start_lat <dbl> 41.91751, 41.91751, 41.84273, 41.96881, 42.01270, 4~
## $ start_lng <dbl> -87.70181, -87.70181, -87.63549, -87.65766, -87.666~
## $ end_lat <dbl> 41.91774, 41.91417, 41.83066, 41.95283, 42.05049, 4~
## $ end_lng <dbl> -87.69139, -87.71676, -87.64717, -87.64999, -87.677~
## $ member_casual <chr> "casual", "casual", "casual", "casual", "casual", "~
## $ date <date> 2021-03-16, 2021-03-28, 2021-03-11, 2021-03-11, 20~
## $ month <chr> "03", "03", "03", "03", "03", "03", "03", "03", "03~
## $ dates <chr> "16", "28", "11", "11", "21", "20", "20", "23", "31~
## $ year <chr> "2021", "2021", "2021", "2021", "2021", "2021", "20~
## $ day <chr> "Tuesday", "Sunday", "Thursday", "Thursday", "Sunda~
#Calculating ride duration/ Trip duration and dividing it by 60 to convert it to minutes
All_trip_cleaned$ride_duration <- difftime(All_trip_cleaned$ended_at,All_trip_cleaned$started_at)/60
str(All_trip_cleaned)
## tibble [4,541,730 x 19] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4541730] "CFA86D4455AA1030" "30D9DC61227D1AF3" "846D87A15682A284" "994D05AA75A168F2" ...
## $ rideable_type : chr [1:4541730] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:4541730], format: "2021-03-16 08:32:30" "2021-03-28 01:26:28" ...
## $ ended_at : POSIXct[1:4541730], format: "2021-03-16 08:36:34" "2021-03-28 01:36:55" ...
## $ start_station_name: chr [1:4541730] "Humboldt Blvd & Armitage Ave" "Humboldt Blvd & Armitage Ave" "Shields Ave & 28th Pl" "Winthrop Ave & Lawrence Ave" ...
## $ start_station_id : chr [1:4541730] "15651" "15651" "15443" "TA1308000021" ...
## $ end_station_name : chr [1:4541730] "Stave St & Armitage Ave" "Central Park Ave & Bloomingdale Ave" "Halsted St & 35th St" "Broadway & Sheridan Rd" ...
## $ end_station_id : chr [1:4541730] "13266" "18017" "TA1308000043" "13323" ...
## $ start_lat : num [1:4541730] 41.9 41.9 41.8 42 42 ...
## $ start_lng : num [1:4541730] -87.7 -87.7 -87.6 -87.7 -87.7 ...
## $ end_lat : num [1:4541730] 41.9 41.9 41.8 42 42.1 ...
## $ end_lng : num [1:4541730] -87.7 -87.7 -87.6 -87.6 -87.7 ...
## $ member_casual : chr [1:4541730] "casual" "casual" "casual" "casual" ...
## $ date : Date[1:4541730], format: "2021-03-16" "2021-03-28" ...
## $ month : chr [1:4541730] "03" "03" "03" "03" ...
## $ dates : chr [1:4541730] "16" "28" "11" "11" ...
## $ year : chr [1:4541730] "2021" "2021" "2021" "2021" ...
## $ day : chr [1:4541730] "Tuesday" "Sunday" "Thursday" "Thursday" ...
## $ ride_duration : 'difftime' num [1:4541730] 4.06666666666667 10.45 16.4 28.9833333333333 ...
## ..- attr(*, "units")= chr "secs"
#Converting “ride_length” from Factor to numeric in order to do calculations
is.factor(All_trip_cleaned$ride_duration)
## [1] FALSE
All_trip_cleaned$ride_duration <- as.numeric(as.character(All_trip_cleaned$ride_duration))
is.numeric(All_trip_cleaned$ride_duration)
## [1] TRUE
#Removing “bad” data and creating a new data frame
All_trip_cleaned_v1<- All_trip_cleaned[!(All_trip_cleaned$start_station_name == "HQ QR" |All_trip_cleaned$ride_duration < 0),]
#calculating Average ride duration
mean(All_trip_cleaned_v1$ride_duration)
## [1] 21.79437
#calculating Median ride duration
median(All_trip_cleaned_v1$ride_duration)
## [1] 12.18333
#calculating maximum ride duration
max(All_trip_cleaned_v1$ride_duration)
## [1] 55944.15
#Calculating minimum rude duration
min(All_trip_cleaned_v1$ride_duration)
## [1] 0.01666667
#Viewing Average, median, maximum and minimum ride duration for better comparison
summary(All_trip_cleaned_v1$ride_duration)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02 6.93 12.18 21.79 22.08 55944.15
aggregate(All_trip_cleaned_v1$ride_duration ~ All_trip_cleaned_v1$member_casual, FUN = mean)
## All_trip_cleaned_v1$member_casual All_trip_cleaned_v1$ride_duration
## 1 casual 32.46070
## 2 member 13.11519
aggregate(All_trip_cleaned_v1$ride_duration ~ All_trip_cleaned_v1$member_casual, FUN = median)
## All_trip_cleaned_v1$member_casual All_trip_cleaned_v1$ride_duration
## 1 casual 16.633333
## 2 member 9.666667
aggregate(All_trip_cleaned_v1$ride_duration ~ All_trip_cleaned_v1$member_casual, FUN = min)
## All_trip_cleaned_v1$member_casual All_trip_cleaned_v1$ride_duration
## 1 casual 0.01666667
## 2 member 0.01666667
aggregate(All_trip_cleaned_v1$ride_duration ~ All_trip_cleaned_v1$member_casual, FUN = max)
## All_trip_cleaned_v1$member_casual All_trip_cleaned_v1$ride_duration
## 1 casual 55944.150
## 2 member 1495.633
# Arranging and maintaining the order of day by week
All_trip_cleaned_v1$day <- ordered(All_trip_cleaned_v1$day, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(All_trip_cleaned_v1$ride_duration ~ All_trip_cleaned_v1$member_casual + All_trip_cleaned_v1$day, FUN = mean)
## All_trip_cleaned_v1$member_casual All_trip_cleaned_v1$day
## 1 casual Sunday
## 2 member Sunday
## 3 casual Monday
## 4 member Monday
## 5 casual Tuesday
## 6 member Tuesday
## 7 casual Wednesday
## 8 member Wednesday
## 9 casual Thursday
## 10 member Thursday
## 11 casual Friday
## 12 member Friday
## 13 casual Saturday
## 14 member Saturday
## All_trip_cleaned_v1$ride_duration
## 1 37.59261
## 2 15.14081
## 3 32.66038
## 4 12.66732
## 5 28.69078
## 6 12.31038
## 7 28.32422
## 8 12.37723
## 9 28.19142
## 10 12.28490
## 11 30.78521
## 12 12.74821
## 13 34.68589
## 14 14.74594
All_trip_cleaned_v1 %>% mutate(days = wday(started_at, label = TRUE)) %>% group_by(member_casual,days) %>% summarise(number_of_rides = n(), average_duration = mean(ride_duration)) %>% arrange(member_casual,days)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 x 4
## # Groups: member_casual [2]
## member_casual days number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sun 401873 37.6
## 2 casual Mon 228420 32.7
## 3 casual Tue 214205 28.7
## 4 casual Wed 217115 28.3
## 5 casual Thu 223205 28.2
## 6 casual Fri 288289 30.8
## 7 casual Sat 464497 34.7
## 8 member Sun 306851 15.1
## 9 member Mon 343722 12.7
## 10 member Tue 384700 12.3
## 11 member Wed 392458 12.4
## 12 member Thu 369009 12.3
## 13 member Fri 358201 12.7
## 14 member Sat 349185 14.7
All_trip_cleaned_v1 %>% group_by(member_casual, day) %>% summarise(Number_of_rides = n(), .groups = 'drop') %>% ggplot(aes(x = day, y = Number_of_rides, fill = member_casual)) + geom_col(position = "dodge") + scale_y_continuous(labels = scales::comma) +labs(title = "Number of Rides by Day: Members vs. Casual Riders")
All_trip_cleaned_v1 %>% group_by(member_casual, day) %>% summarise(Average_duration = mean(ride_duration), .groups = 'drop') %>% ggplot(aes(x = day, y =Average_duration, fill = member_casual )) + geom_col(position = "dodge") + labs(title = "Average ride duration per day Vs Member/Casual")
All_trip_cleaned_v1 %>% group_by(month,member_casual) %>% summarise( number_of_rides= n(), .groups = 'drop') %>% ggplot(aes(x = month,y = number_of_rides, fill = member_casual)) + geom_bar(position = 'dodge', stat = 'identity') + scale_y_continuous(labels = scales::comma) + labs(title = "Number of Rides by Month: Casual Vs member")
All_trip_cleaned_v1 %>% group_by(month,member_casual) %>% summarise( average_ride_duration = mean(ride_duration), .groups = 'drop') %>% ggplot(aes(x = month,y = average_ride_duration, fill = member_casual)) + geom_bar(position = 'dodge', stat = 'identity') + scale_y_continuous(labels = scales::comma) + labs(title = "Average ride duration by Month: Casual Vs member")
All_trip_cleaned_v1 %>% group_by(member_casual) %>% summarize(number_of_rides = n()) %>% mutate(percentage = round(number_of_rides*100/ sum(number_of_rides))) %>% ggplot(aes(x = "", y = number_of_rides, fill = member_casual)) + geom_bar(width = 1, stat="identity", color = "white", show.legend = FALSE) + coord_polar("y", start=0) + geom_text(aes(label = paste(member_casual,paste(percentage,"%"), sep = "\n")), position = position_stack(vjust = 0.5),color ="white") + labs(title = "Ride Percentage of Rider type ") + theme_void()
#Combining all Station's name
#Removing No station name
All_stations <- bind_rows( data.frame("station" = All_trip_cleaned_v1$start_station_name,"member_casual" = All_trip_cleaned_v1$member_casual), data.frame("station" = All_trip_cleaned_v1$end_station_name, "member_casual" = All_trip_cleaned_v1$member_casual))
All_stations_v1 <- All_stations[!(All_stations$station == "" | is.na(All_stations$station)), ]
#Separating rider type
All_station_member <- All_stations_v1[All_stations_v1$member_casual == 'member',]
All_station_casual <- All_stations_v1[All_stations_v1$member_casual == 'casual',]
#Top 10 Station's combined casual & member
Top_station <- All_stations_v1 %>% group_by(station) %>% summarise(station_count =n()) %>% arrange(desc(station_count)) %>% arrange(desc(station_count)) %>% slice(1:10)
#Top 10 station member rider type
Top_10_station_member <- All_station_member %>% group_by(station) %>% summarise(station_count = n()) %>% arrange(desc(station_count)) %>% head(n=10)
#Top 10 station casual rider type
Top_10_station_casual <- All_station_casual %>% group_by(station) %>% summarise(station_count = n()) %>% arrange(desc(station_count)) %>% head(n=10)
ggplot(data = Top_10_station_member) + geom_col(aes(x= station, y= station_count, fill = "thistle")) + labs(title = "Top 10 Used Stations by Members", y = "Number of Rides", x = "") + scale_y_continuous(labels = scales::comma)+ coord_flip() + theme_minimal()
ggplot(data = Top_10_station_casual) + geom_col(aes(x= station, y= station_count, fill = "thistle")) + labs(title = "Top 10 Used Stations by Casual", y = "Number of Rides", x = "") + scale_y_continuous(labels = scales::comma)+ coord_flip() + theme_minimal()
#Bike type preference of rider type
All_trip_cleaned_v1 %>% group_by(rideable_type, member_casual) %>% summarize(number_of_rides = n(), .groups = 'drop')
## # A tibble: 5 x 3
## rideable_type member_casual number_of_rides
## <chr> <chr> <int>
## 1 classic_bike casual 1254587
## 2 classic_bike member 1945901
## 3 docked_bike casual 309607
## 4 electric_bike casual 473410
## 5 electric_bike member 558225
All_trip_cleaned_v1 %>% group_by(rideable_type, member_casual) %>% summarize(number_of_rides = n(), .groups = 'drop') %>% ggplot(aes(x = rideable_type, y =number_of_rides, fill = member_casual )) + geom_col(position = "dodge") + labs(title = "Number of rides & Rideable type Vs Member/Casual")
All_trip_cleaned_v1 %>% group_by(rideable_type) %>% summarize(number_of_rides = n()) %>% mutate(percentage = round(number_of_rides*100/ sum(number_of_rides))) %>% ggplot(aes(x = "", y = number_of_rides, fill = rideable_type)) + geom_bar(width = 1, stat="identity", color = "white", show.legend = FALSE) + coord_polar("y", start=0) + geom_text(aes(label = paste(rideable_type,paste(percentage,"%"), sep = "\n")), position = position_stack(vjust = 0.5),color ="white") + labs(title = "Ride Percentage of Bike type") + theme_void()
All_trip_cleaned_v1 %>% group_by(month, member_casual, rideable_type) %>% summarize(number_of_rides = n(), .groups = 'drop') %>% drop_na() %>% ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) + geom_col(position = "dodge") + scale_y_continuous(labels = scales::comma) + facet_grid(member_casual~rideable_type) + labs(title = "Number of Rides by Month") + theme(axis.text.x = element_text(angle = 90))
All_trip_cleaned_v1 %>% group_by(month, member_casual, rideable_type) %>% summarize(average_ride_duration = mean(ride_duration), .groups = 'drop') %>% drop_na() %>% ggplot(aes(x = month, y = average_ride_duration, fill = member_casual)) + geom_col(position = "dodge") + scale_y_continuous(labels = scales::comma) + facet_grid(member_casual~rideable_type) + labs(title = "Average Ride duration by Month") + theme(axis.text.x = element_text(angle = 90))
The sole purpose of this data analysis is to design a marketing strategies to convert casual rider into member rider by using digital media and to perform data analysis using Data driven decision making process : Ask - Prepare - Process - Analyze - Share - Act.
Number of rides of Casual rider is higher On Saturday & Sunday, Number of rides of Member rider is higher from Monday to Friday.
Average ride duration of Casual rider is almost double compared to Member rider type on any given day of week.
In any given month of calender the Average ride duration of Casual rider is almost double of Member rider type.
In last 12 months the Ride percentage of Rider type was Casual Rider : 45 % Member Rider : 55 %
In last 12 months the Bike preference type of rider was Classic Bike : 71% Electric Bike : 23% Docked Bike : 7%
Docked bike is only Preferred by casual member
Design a digital marketing campaign to offer discount on Membership and start running campaign since Thursday and end it on Saturday morning.
Design a digital marketing campaign (Video or series of posts) to educate : How using Cyclist user can avoid heavy traffic, Health benefit of using a bike over other commuting option, Emission free commute and user’s contribution toward Environment.
as well as keeping user Interest toward Membership.
Design a marketing campaign for local business for choosing Cyclist as their commuting partner, showing advantages compared to other commuting options.
Further analysis of Geometric coordinate, Ride distance analysis, Ride route preferences as well as Regression can be done to further enhance the result.
Survey can be conducted on Casual rider to understand more about user and can be mixed survey data with this analysis to increase effectiveness of marketing camping.
This data analysis follows : Ask - Prepare - Process - Analyze - Share - Act. We followed Ask- Prepare - Process - Analyze as well as created effective visualization and make recommendation.