This report presents my analysis of the bike-share company Divvy by Lyft. The data used in this project is made public by Divvy.
Three questions will guide the future marketing program.
This report works towards an answer to question 1:
Gain insight into how Divvy’s annual members and casual riders use the service differently, in order to gain more annual members.
I believe this is the most reasonable course of action, since casual riders have already tried the service.
Converting existing users should be easier than winning new annual members from people unfamiliar with the service.
Annual members should also be more profitable in the long run when we look at customer lifetime value.
The rest of the business tasks will be carried out by my competent team members and include:
Figure out why casual riders would buy a Divvy annual membership.
Understand how Divvy can use digital media to influence casual riders to become members.
I have been assigned the task of identifying the differences between members and casual riders, with the aim of making recommendations to grow annual membership.
Responsibilities include: Collecting, analyzing, and reporting data that helps guide Divvy’s marketing strategy.
Will decide whether to approve the recommended marketing program.
The data is currently stored on Amazon Web Services (AWS): https://divvy-tripdata.s3.amazonaws.com/index.html
It is organized in zip folders. Some data sets are named by year, month, and then what the data includes; others are named by what the data includes, year, and then quarter.
Historical trip data available to the public.
https://www.divvybikes.com/system-data
Here you’ll find Divvy’s trip data for public use. So whether you’re a policy maker, transportation professional, web developer, designer, or just plain curious, feel free to download it, map it, animate it, or bring it to life!
Note: This data is provided according to the Divvy Data License Agreement and released on a monthly schedule.
https://www.divvybikes.com/data-license-agreement
Each trip is anonymized and includes:
Trip start day and time
Trip end day and time
Trip start station
Trip end station
Rider type (Member, Single Ride, and Day Pass)
The data has been processed to remove trips that are taken by staff as they service and inspect the system; and any trips that were below 60 seconds in length (potentially false starts or users trying to re-dock a bike to ensure it was secure).
Download Divvy trip history data.
https://divvy-tripdata.s3.amazonaws.com/index.html
You can get live station info on our station GBFS JSON feed.
library(tidyverse) # collection of R packages for data wrangling
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages ---------------------------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5 v purrr 0.3.4
v tibble 3.1.5 v dplyr 1.0.7
v tidyr 1.1.4 v stringr 1.4.0
v readr 2.0.2 v forcats 0.5.1
-- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(lubridate) # wrangle date attributes
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
library(ggplot2) # data visualization
library(readr)
X2020_09 <- read_csv("input/divvy-data/202009-divvy-tripdata.csv")
Rows: 532958 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (5): ride_id, rideable_type, start_station_name, end_station_name, member_casual
dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2020_10 <- read_csv("input/divvy-data/202010-divvy-tripdata.csv")
Rows: 388653 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (5): ride_id, rideable_type, start_station_name, end_station_name, member_casual
dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2020_11 <- read_csv("input/divvy-data/202011-divvy-tripdata.csv")
Rows: 259716 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (5): ride_id, rideable_type, start_station_name, end_station_name, member_casual
dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2020_12 <- read_csv("input/divvy-data/202012-divvy-tripdata.csv")
Rows: 131573 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_01 <- read_csv("input/divvy-data/202101-divvy-tripdata.csv")
Rows: 96834 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_02 <- read_csv("input/divvy-data/202102-divvy-tripdata.csv")
Rows: 49622 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_03 <- read_csv("input/divvy-data/202103-divvy-tripdata.csv")
Rows: 228496 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_04 <- read_csv("input/divvy-data/202104-divvy-tripdata.csv")
Rows: 337230 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_05 <- read_csv("input/divvy-data/202105-divvy-tripdata.csv")
Rows: 531633 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_06 <- read_csv("input/divvy-data/202106-divvy-tripdata.csv")
Rows: 729595 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
X2021_07 <- read_csv("input/divvy-data/202107-divvy-tripdata.csv")
Rows: 822410 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
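# August 2021 file; the run recorded below read "202102-divvy-tripdata.csv" here, so its row counts duplicate February 2021
X2021_08 <- read_csv("input/divvy-data/202108-divvy-tripdata.csv")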
Rows: 49622 Columns: 13
-- Column specification ---------------------------------------------------------------------------------------
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, m...
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
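The monthly file paths used above can also be generated from a sequence of months, which removes the chance of typing a wrong file name. The following is a minimal base R sketch, not the code used for this report: `monthly_paths()` and `read_trips()` are hypothetical helper names, and it assumes every file follows the `YYYYMM-divvy-tripdata.csv` naming seen above. Reading the station id columns as character in every file also keeps their type the same across months.

```r
# Hypothetical helper: build one file path per month, starting at first_month.
monthly_paths <- function(first_month, n_months, dir = "input/divvy-data") {
  months <- seq(as.Date(first_month), by = "month", length.out = n_months)
  file.path(dir, paste0(format(months, "%Y%m"), "-divvy-tripdata.csv"))
}

# Hypothetical helper: read every file and stack the rows.
# Station ids are forced to character so numeric and text ids can be combined.
read_trips <- function(paths) {
  frames <- lapply(
    paths, read.csv,
    stringsAsFactors = FALSE,
    colClasses = c(start_station_id = "character", end_station_id = "character")
  )
  do.call(rbind, frames)
}

paths <- monthly_paths("2020-09-01", 12)
print(basename(paths))

# With the files in place, all months could then be loaded in one step:
# all_trips <- read_trips(paths)
```

Because the paths come from `seq()`, each month appears exactly once, and the last entry is the August 2021 file.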
Tools used:
R is my main tool in this project, as it can handle larger data sets and visualize data.
SQL was used to quickly gain insight into what the data contains and how much data is in the files.
Excel was used to test the limits of what a spreadsheet can handle. A little cleaning and some small calculations are possible there, but for large data sets R and SQL are recommended.
Other tools to consider: Dashboard software like Tableau or Power BI could be used together with Divvy’s live station data. Since my assignment for now is to recommend a longer-term plan, this is not as beneficial to start with. It could become useful for service work or for a campaign with handouts and Divvy representatives helping casual riders upgrade to a membership.
str(X2020_09)
spec_tbl_df [532,958 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:532958] "2B22BD5F95FB2629" "A7FB70B4AFC6CAF2" "86057FA01BAC778E" "57F6DC9A153DB98C" ...
$ rideable_type : chr [1:532958] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : POSIXct[1:532958], format: "2020-09-17 14:27:11" "2020-09-17 15:07:31" "2020-09-17 15:09:04" "2020-09-17 18:10:46" ...
$ ended_at : POSIXct[1:532958], format: "2020-09-17 14:44:24" "2020-09-17 15:07:45" "2020-09-17 15:09:35" "2020-09-17 18:35:49" ...
$ start_station_name: chr [1:532958] "Michigan Ave & Lake St" "W Oakdale Ave & N Broadway" "W Oakdale Ave & N Broadway" "Ashland Ave & Belle Plaine Ave" ...
$ start_station_id : num [1:532958] 52 NA NA 246 24 94 291 NA NA NA ...
$ end_station_name : chr [1:532958] "Green St & Randolph St" "W Oakdale Ave & N Broadway" "W Oakdale Ave & N Broadway" "Montrose Harbor" ...
$ end_station_id : num [1:532958] 112 NA NA 249 24 NA 256 NA NA NA ...
$ start_lat : num [1:532958] 41.9 41.9 41.9 42 41.9 ...
$ start_lng : num [1:532958] -87.6 -87.6 -87.6 -87.7 -87.6 ...
$ end_lat : num [1:532958] 41.9 41.9 41.9 42 41.9 ...
$ end_lng : num [1:532958] -87.6 -87.6 -87.6 -87.6 -87.6 ...
$ member_casual : chr [1:532958] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_datetime(format = ""),
.. ended_at = col_datetime(format = ""),
.. start_station_name = col_character(),
.. start_station_id = col_double(),
.. end_station_name = col_character(),
.. end_station_id = col_double(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
I notice that the station id data type changed in December 2020 from num to chr.
To ensure all my data shares the newest data type, I convert the station id columns in the data sets from November 2020 and earlier to chr.
Converting data type:
X2020_11 <- mutate(X2020_11, across(c(start_station_id, end_station_id), as.character))
X2020_10 <- mutate(X2020_10, across(c(start_station_id, end_station_id), as.character))
X2020_09 <- mutate(X2020_09, across(c(start_station_id, end_station_id), as.character))
all_trips <- bind_rows(X2020_09, X2020_10, X2020_11, X2020_12, X2021_01, X2021_02, X2021_03, X2021_04, X2021_05, X2021_06, X2021_07, X2021_08)
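A type mismatch like the one in the station id columns can be detected before the data sets are combined. The sketch below is a small base R illustration on made-up rows; `mismatched_columns()` is a hypothetical helper written for this example, not a dplyr function.

```r
# Hypothetical helper: names of columns whose class differs between data frames.
# Assumes all data frames have the same columns in the same order.
mismatched_columns <- function(frames) {
  # One column of class names per data frame, one row per column name
  classes <- do.call(cbind, lapply(frames, function(df) {
    vapply(df, function(col) class(col)[1], character(1))
  }))
  differs <- apply(classes, 1, function(x) length(unique(x)) > 1)
  rownames(classes)[differs]
}

# Made-up rows: a numeric station id in one data set, a text id in the other
old_format <- data.frame(ride_id = "A1", start_station_id = 52,
                         stringsAsFactors = FALSE)
new_format <- data.frame(ride_id = "B2", start_station_id = "ID-A",
                         stringsAsFactors = FALSE)

print(mismatched_columns(list(old_format, new_format)))
```

Running such a check on the monthly data sets would point at the station id columns before any rows are bound together.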
colnames(all_trips) # Check column names
[1] "ride_id" "rideable_type" "started_at" "ended_at" "start_station_name"
[6] "start_station_id" "end_station_name" "end_station_id" "start_lat" "start_lng"
[11] "end_lat" "end_lng" "member_casual"
head(all_trips) # Check first 6 rows of the dataframe
str(all_trips) # Check structure for all_trips
spec_tbl_df [4,158,342 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:4158342] "2B22BD5F95FB2629" "A7FB70B4AFC6CAF2" "86057FA01BAC778E" "57F6DC9A153DB98C" ...
$ rideable_type : chr [1:4158342] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : POSIXct[1:4158342], format: "2020-09-17 14:27:11" "2020-09-17 15:07:31" "2020-09-17 15:09:04" "2020-09-17 18:10:46" ...
$ ended_at : POSIXct[1:4158342], format: "2020-09-17 14:44:24" "2020-09-17 15:07:45" "2020-09-17 15:09:35" "2020-09-17 18:35:49" ...
$ start_station_name: chr [1:4158342] "Michigan Ave & Lake St" "W Oakdale Ave & N Broadway" "W Oakdale Ave & N Broadway" "Ashland Ave & Belle Plaine Ave" ...
$ start_station_id : chr [1:4158342] "52" NA NA "246" ...
$ end_station_name : chr [1:4158342] "Green St & Randolph St" "W Oakdale Ave & N Broadway" "W Oakdale Ave & N Broadway" "Montrose Harbor" ...
$ end_station_id : chr [1:4158342] "112" NA NA "249" ...
$ start_lat : num [1:4158342] 41.9 41.9 41.9 42 41.9 ...
$ start_lng : num [1:4158342] -87.6 -87.6 -87.6 -87.7 -87.6 ...
$ end_lat : num [1:4158342] 41.9 41.9 41.9 42 41.9 ...
$ end_lng : num [1:4158342] -87.6 -87.6 -87.6 -87.6 -87.6 ...
$ member_casual : chr [1:4158342] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_datetime(format = ""),
.. ended_at = col_datetime(format = ""),
.. start_station_name = col_character(),
.. start_station_id = col_double(),
.. end_station_name = col_character(),
.. end_station_id = col_double(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
dim(all_trips)
[1] 4158342 13
nrow(all_trips)
[1] 4158342
summary(all_trips) # Summary
ride_id rideable_type started_at ended_at
Length:4158342 Length:4158342 Min. :2020-09-01 00:00:07 Min. :2020-09-01 00:04:43
Class :character Class :character 1st Qu.:2020-11-09 16:53:46 1st Qu.:2020-11-09 17:14:31
Mode :character Mode :character Median :2021-05-01 11:42:16 Median :2021-05-01 12:06:31
Mean :2021-03-17 01:06:13 Mean :2021-03-17 01:27:19
3rd Qu.:2021-06-20 07:01:36 3rd Qu.:2021-06-20 07:52:01
Max. :2021-07-31 23:59:58 Max. :2021-08-12 17:45:41
start_station_name start_station_id end_station_name end_station_id start_lat start_lng
Length:4158342 Length:4158342 Length:4158342 Length:4158342 Min. :41.64 Min. :-87.84
Class :character Class :character Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
Mode :character Mode :character Mode :character Mode :character Median :41.90 Median :-87.64
Mean :41.90 Mean :-87.64
3rd Qu.:41.93 3rd Qu.:-87.63
Max. :42.08 Max. :-87.52
end_lat end_lng member_casual
Min. :41.51 Min. :-88.07 Length:4158342
1st Qu.:41.88 1st Qu.:-87.66 Class :character
Median :41.90 Median :-87.64 Mode :character
Mean :41.90 Mean :-87.64
3rd Qu.:41.93 3rd Qu.:-87.63
Max. :42.15 Max. :-87.44
NA's :4523 NA's :4523
table(all_trips$rideable_type) # Checking what rideable types are included
classic_bike docked_bike electric_bike
1820526 1003455 1334361
table(all_trips$member_casual) # Check membership and casual users
casual member
1822549 2335793
all_trips$date <- as.Date(all_trips$started_at)
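# Extract the date parts as text with format(); as.Date(x, "%y") returns the full date unchanged, which is why the output recorded below lists these columns as Date
all_trips$year <- format(all_trips$date, "%Y")
all_trips$month <- format(all_trips$date, "%m")
all_trips$day <- format(all_trips$date, "%d")
all_trips$day_of_week <- format(all_trips$date, "%A") # full weekday name, in the session's locale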
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at)
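To show what the derived columns are meant to contain, here is a minimal base R sketch on the first two trips listed in the `str()` output above. It is an illustration only: the UTC time zone is an assumption made so the example behaves the same on every machine, and the date parts are stored as text.

```r
# Start and end times of the first two trips shown in the str() output
trips <- data.frame(
  started_at = as.POSIXct(c("2020-09-17 14:27:11", "2020-09-17 15:07:31"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2020-09-17 14:44:24", "2020-09-17 15:07:45"), tz = "UTC")
)

# Calendar date of the trip start, then its parts as character strings
trips$date  <- as.Date(trips$started_at)
trips$year  <- format(trips$date, "%Y")
trips$month <- format(trips$date, "%m")
trips$day   <- format(trips$date, "%d")

# Ride length in seconds; fixing the units avoids difftime picking minutes or hours
trips$ride_length <- as.numeric(difftime(trips$ended_at, trips$started_at, units = "secs"))

print(trips[, c("year", "month", "day", "ride_length")])
```

The two ride lengths come out as 1033 and 14 seconds, which matches the first values of `ride_length` in the recorded output.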
str(all_trips) # Checking structure of columns
spec_tbl_df [4,158,342 x 19] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:4158342] "2B22BD5F95FB2629" "A7FB70B4AFC6CAF2" "86057FA01BAC778E" "57F6DC9A153DB98C" ...
$ rideable_type : chr [1:4158342] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : POSIXct[1:4158342], format: "2020-09-17 14:27:11" "2020-09-17 15:07:31" "2020-09-17 15:09:04" "2020-09-17 18:10:46" ...
$ ended_at : POSIXct[1:4158342], format: "2020-09-17 14:44:24" "2020-09-17 15:07:45" "2020-09-17 15:09:35" "2020-09-17 18:35:49" ...
$ start_station_name: chr [1:4158342] "Michigan Ave & Lake St" "W Oakdale Ave & N Broadway" "W Oakdale Ave & N Broadway" "Ashland Ave & Belle Plaine Ave" ...
$ start_station_id : chr [1:4158342] "52" NA NA "246" ...
$ end_station_name : chr [1:4158342] "Green St & Randolph St" "W Oakdale Ave & N Broadway" "W Oakdale Ave & N Broadway" "Montrose Harbor" ...
$ end_station_id : chr [1:4158342] "112" NA NA "249" ...
$ start_lat : num [1:4158342] 41.9 41.9 41.9 42 41.9 ...
$ start_lng : num [1:4158342] -87.6 -87.6 -87.6 -87.7 -87.6 ...
$ end_lat : num [1:4158342] 41.9 41.9 41.9 42 41.9 ...
$ end_lng : num [1:4158342] -87.6 -87.6 -87.6 -87.6 -87.6 ...
$ member_casual : chr [1:4158342] "casual" "casual" "casual" "casual" ...
$ date : Date[1:4158342], format: "2020-09-17" "2020-09-17" "2020-09-17" "2020-09-17" ...
$ year : Date[1:4158342], format: "2020-09-17" "2020-09-17" "2020-09-17" "2020-09-17" ...
$ month : Date[1:4158342], format: "2020-09-17" "2020-09-17" "2020-09-17" "2020-09-17" ...
$ day : Date[1:4158342], format: "2020-09-17" "2020-09-17" "2020-09-17" "2020-09-17" ...
$ day_of_week : Date[1:4158342], format: "2020-09-17" "2020-09-17" "2020-09-17" "2020-09-17" ...
$ ride_length : 'difftime' num [1:4158342] 1033 14 31 1503 ...
..- attr(*, "units")= chr "secs"
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_datetime(format = ""),
.. ended_at = col_datetime(format = ""),
.. start_station_name = col_character(),
.. start_station_id = col_double(),
.. end_station_name = col_character(),
.. end_station_id = col_double(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length) # Checking if converted to numeric.
[1] TRUE
all_trips_v1 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length < 0), ] # Remove negative ride lengths and bikes taken out for quality control
all_trips_clean <- drop_na(all_trips_v1)
summary(all_trips_clean) # Check summary of new data set
ride_id rideable_type started_at ended_at
Length:3596608 Length:3596608 Min. :2020-09-01 00:00:07 Min. :2020-09-01 00:04:43
Class :character Class :character 1st Qu.:2020-11-07 12:56:25 1st Qu.:2020-11-07 13:23:47
Mode :character Mode :character Median :2021-04-26 17:36:30 Median :2021-04-26 17:58:28
Mean :2021-03-13 20:48:29 Mean :2021-03-13 21:12:12
3rd Qu.:2021-06-19 11:00:00 3rd Qu.:2021-06-19 11:26:40
Max. :2021-07-31 23:59:57 Max. :2021-08-12 17:45:41
start_station_name start_station_id end_station_name end_station_id start_lat start_lng
Length:3596608 Length:3596608 Length:3596608 Length:3596608 Min. :41.65 Min. :-87.78
Class :character Class :character Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.65
Mode :character Mode :character Mode :character Mode :character Median :41.90 Median :-87.64
Mean :41.90 Mean :-87.64
3rd Qu.:41.93 3rd Qu.:-87.63
Max. :42.06 Max. :-87.53
end_lat end_lng member_casual date year
Min. :41.65 Min. :-87.78 Length:3596608 Min. :2020-09-01 Min. :2020-09-01
1st Qu.:41.88 1st Qu.:-87.66 Class :character 1st Qu.:2020-11-07 1st Qu.:2020-11-07
Median :41.90 Median :-87.64 Mode :character Median :2021-04-26 Median :2021-04-26
Mean :41.90 Mean :-87.64 Mean :2021-03-13 Mean :2021-03-13
3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:2021-06-19 3rd Qu.:2021-06-19
Max. :42.08 Max. :-87.52 Max. :2021-07-31 Max. :2021-07-31
month day day_of_week ride_length
Min. :2020-09-01 Min. :2020-09-01 Min. :2020-09-01 Min. : 0
1st Qu.:2020-11-07 1st Qu.:2020-11-07 1st Qu.:2020-11-07 1st Qu.: 438
Median :2021-04-26 Median :2021-04-26 Median :2021-04-26 Median : 776
Mean :2021-03-13 Mean :2021-03-13 Mean :2021-03-13 Mean : 1422
3rd Qu.:2021-06-19 3rd Qu.:2021-06-19 3rd Qu.:2021-06-19 3rd Qu.: 1413
Max. :2021-07-31 Max. :2021-07-31 Max. :2021-07-31 Max. :3356649
head(all_trips_clean) # Check new data set
sum(is.na(all_trips_clean)) # Check that all NA entries were removed correctly
[1] 0
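The cleaning step removes rows for three different reasons, and it can be useful to know how many rows each rule accounts for. The following base R sketch uses four made-up rows to show one way of counting them; the variable names are chosen for this example only.

```r
# Made-up rows: one valid trip, one quality-control trip,
# one trip with a missing station name, one with a negative length
trips <- data.frame(
  start_station_name = c("Michigan Ave & Lake St", "HQ QR", NA, "Montrose Harbor"),
  ride_length = c(1033, 50, 300, -5),
  stringsAsFactors = FALSE
)

# Each rule as its own logical vector; NA comparisons are treated as "rule does not apply"
is_quality_control <- !is.na(trips$start_station_name) & trips$start_station_name == "HQ QR"
is_negative        <- !is.na(trips$ride_length) & trips$ride_length < 0
has_missing        <- !complete.cases(trips)

# Rows removed by each rule
removed <- c(
  quality_control = sum(is_quality_control),
  negative_length = sum(is_negative),
  missing_values  = sum(has_missing)
)
print(removed)

# Keep only rows that none of the rules flag
trips_clean <- trips[!(is_quality_control | is_negative | has_missing), ]
print(nrow(trips_clean))
```

Handling NA explicitly keeps the filter from producing all-NA rows, so the result does not rely on a later NA-dropping step to tidy up.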
Now it is time to analyze the data to gain valuable insights.
min(all_trips_clean$ride_length) # Shortest ride time - in seconds
[1] 0
max(all_trips_clean$ride_length) # Longest ride time - in seconds
[1] 3356649
median(all_trips_clean$ride_length) # Median ride time - in seconds
[1] 776
mean(all_trips_clean$ride_length) # Mean ride time - in seconds
[1] 1422.252
summary(all_trips_clean$ride_length)/60
Min. 1st Qu. Median Mean 3rd Qu.
0.00 7.30 12.93 23.70 23.55
Max.
55944.15
Median ride length in minutes
aggregate(all_trips_clean$ride_length/60 ~ all_trips_clean$member_casual, FUN = median)
Mean ride length in minutes
aggregate(all_trips_clean$ride_length/60 ~ all_trips_clean$member_casual, FUN = mean)
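As a small illustration of the two comparisons above, the sketch below runs `aggregate()` on six made-up rides. Passing the data frame through the `data` argument keeps the column names in the result short; the numbers are invented for the example and say nothing about the real data.

```r
# Made-up ride lengths in seconds for each rider type
trips <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length   = c(600, 1800, 4800, 300, 600, 900)
)

# Convert once to minutes, then summarise per rider type
trips$ride_minutes <- trips$ride_length / 60
median_by_type <- aggregate(ride_minutes ~ member_casual, data = trips, FUN = median)
mean_by_type   <- aggregate(ride_minutes ~ member_casual, data = trips, FUN = mean)

print(median_by_type)
print(mean_by_type)
```

In this toy data the casual mean (40 minutes) sits well above the casual median (30 minutes) because of one long ride, which is the reason for reporting both statistics.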
Order week days
all_trips_clean$day_of_week <- ordered(all_trips_clean$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
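Weekday names produced from dates follow the language of the R session, so in a non-English session they would not match the English levels used above and would turn into NA. A locale-independent alternative is sketched below in base R; `weekday_factor()` and `day_names` are hypothetical names introduced for this example.

```r
# English weekday names in the order used for the ordered factor
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Hypothetical helper: ordered weekday factor that does not depend on the locale
weekday_factor <- function(dates) {
  # as.POSIXlt()$wday counts from 0 (Sunday) to 6 (Saturday)
  idx <- as.POSIXlt(as.Date(dates))$wday + 1
  factor(day_names[idx], levels = day_names, ordered = TRUE)
}

example_days <- weekday_factor(as.Date(c("2020-09-17", "2020-09-20")))
print(example_days)
```

The first example date is the start date of the first trip in the data, a Thursday; the second is the following Sunday.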
Let’s look at a table of how each rider type uses the service, and for how long, by day of the week.
# analyze ridership data by type and weekday
all_trips_clean %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>% # creates weekday field using wday()
  group_by(member_casual, weekday) %>% # groups by user type and weekday
  summarise(number_of_rides = n(), # calculates the number of rides
            average_duration = mean(ride_length)) %>% # calculates the average duration in seconds
  arrange(member_casual, weekday) # sorts
`summarise()` has grouped output by 'member_casual'. You can override using the `.groups` argument.
Actionable advice:
If users have allowed us to send relevant advertising to them, I believe we should:
Identify users with ride lengths longer than 30 minutes and inform them about the annual plan, which includes 45 minutes free for each ride.
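The 30-minute threshold can be checked against the trip data at the ride level. The public data contains no rider identifier, so the sketch below only measures what share of rides exceeds the threshold for each rider type; linking rides to individual users would need internal data that is assumed here, not shown. The rows are made up for illustration.

```r
# Made-up ride lengths in seconds
trips <- data.frame(
  member_casual = c("casual", "casual", "casual", "casual", "member", "member"),
  ride_length   = c(600, 2400, 3000, 1200, 900, 1000)
)

# Flag rides longer than 30 minutes
threshold_secs <- 30 * 60
trips$over_30_min <- trips$ride_length > threshold_secs

# Share of flagged rides per rider type (mean of a logical is a proportion)
share_over_30 <- tapply(trips$over_30_min, trips$member_casual, mean)
print(share_over_30)
```

A high share among casual riders would suggest that a message about the 45 free minutes per ride is relevant to many of them.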
Based on my findings, I recommend we dig into why casual riders take longer rides than our annual members.
I think this is partly due to the old pricing models.
What was the purpose of the ride?
Joy ride
Commute
Transport of you and other stuff
Other (please specify)
We could organize some fun marketing campaigns in the summer months, e.g. sign up and get a drink to stay hydrated, or an ice cream.