A little about the case study
This case study is part of the Google Data Analytics Professional Certificate that I completed on Coursera. For this final course, I use the R programming language.
The case study requires following the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.
Scenario
You are a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Ask
Three questions will guide the future marketing program:
1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?
Problem statement assigned by my manager
1. How do annual members and casual riders use Cyclistic bikes differently?
Key tasks
Identify the business task -> The business objective is to understand the key differences in bike usage between annual members and casual riders, so that Cyclistic can maximize profit by converting casual riders into annual members.
Consider key stakeholders -> The key stakeholders are the Director of Marketing (Lily Moreno), the Marketing Analytics team, and the Executive team.
Deliverable
1. A clear statement of the business task -> "Our goal is to identify how annual members and casual riders differ in their use of Cyclistic bikes."
Prepare
The data I used has been made available by Motivate International Inc. under this license. The datasets are available here (previous 12 months of data).
Key tasks
1. Download data and store it appropriately. -> The data was downloaded from the provided source and stored properly.
2. Identify how it’s organized. -> The data is in CSV format. Two folders have been created, one for .XLSX files and another for .CSV files, for future reference. Each file has multiple columns describing a rider's trip, including rideable_type, started_at, ended_at, start_station_id, end_station_id, and member_casual. The dataset has ride_id as the primary key.
3. Sort and filter the data. For this case study, I will be using the data for the year 2023.
Deliverable
1. A description of all the data sources used
-> All the data used was provided by Cyclistic.
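The statement that ride_id is the primary key can be checked rather than assumed. The sketch below shows one way to do that in base R. The `trips` data frame is a made-up stand-in, not the Cyclistic data, and it deliberately contains one repeated ID so the check has something to find.

```r
# Toy stand-in for a table of trips (values are invented for illustration).
trips <- data.frame(
  ride_id = c("A1", "B2", "C3", "B2"),
  member_casual = c("member", "casual", "member", "casual"),
  stringsAsFactors = FALSE
)

# A primary key must have no missing values and no repeats.
n_missing <- sum(is.na(trips$ride_id))
n_duplicated <- sum(duplicated(trips$ride_id))

cat("missing ride_id values:", n_missing, "\n")
cat("duplicated ride_id values:", n_duplicated, "\n")

# TRUE only when ride_id can safely be treated as a primary key.
is_primary_key <- n_missing == 0 && n_duplicated == 0
```

On the real data, the same two counts could be computed on the ride_id column once the files are loaded. Both should be zero if ride_id really is unique.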
Process
Installing and loading packages
options(repos = c(CRAN = "https://cran.r-project.org"))
install.packages("tidyverse", repos = "https://cran.r-project.org")
## Installing package into 'C:/Users/smart/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\smart\AppData\Local\Temp\RtmpGqSsQ3\downloaded_packages
library("tidyverse")
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tibble' was built under R version 4.3.3
## Warning: package 'tidyr' was built under R version 4.3.3
## Warning: package 'readr' was built under R version 4.3.3
## Warning: package 'purrr' was built under R version 4.3.3
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'stringr' was built under R version 4.3.3
## Warning: package 'forcats' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
install.packages("dplyr")
## Warning: package 'dplyr' is in use and will not be installed
library("dplyr")
install.packages("skimr")
## Installing package into 'C:/Users/smart/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'skimr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\smart\AppData\Local\Temp\RtmpGqSsQ3\downloaded_packages
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.3
install.packages("ggplot2")
## Warning: package 'ggplot2' is in use and will not be installed
library(ggplot2)
install.packages("janitor")
## Installing package into 'C:/Users/smart/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'janitor' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\smart\AppData\Local\Temp\RtmpGqSsQ3\downloaded_packages
library(janitor)
## Warning: package 'janitor' was built under R version 4.3.3
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
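Several of the install calls above only produce an "is in use and will not be installed" warning, because dplyr and ggplot2 are already attached by the tidyverse. One possible alternative is to install only what is missing. The sketch below assumes this approach. `missing_packages()` and `install_if_missing()` are hypothetical helper names, not functions from any package.

```r
# Return the names of packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the packages that are actually missing.
install_if_missing <- function(pkgs, repos = "https://cran.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# Usage for this report (uncomment to run; CRAN is contacted only if needed):
# install_if_missing(c("tidyverse", "skimr", "janitor"))
```

With this pattern, re-running the report does not reinstall packages that are already present. The library() calls are still needed to attach them.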
Importing and loading the data in R
For the case study, I am using 12 months of data from the year 2023.
read_csv("202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 190,301 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## 7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
## 8 DB1CF84154D6A049 classic_bike 2023-01-25 10:49:01 2023-01-25 10:58:22
## 9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike 2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 190,291 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
jan_trips <- read_csv("202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 190,445 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 CBCD0D7777F0E45F classic_bike 2023-02-14 11:59:42 2023-02-14 12:13:38
## 2 F3EC5FCE5FF39DE9 electric_bike 2023-02-15 13:53:48 2023-02-15 13:59:08
## 3 E54C1F27FA9354FF classic_bike 2023-02-19 11:10:57 2023-02-19 11:35:01
## 4 3D561E04F739CC45 electric_bike 2023-02-26 16:12:05 2023-02-26 16:39:55
## 5 0CB4B4D53B2DBE05 electric_bike 2023-02-20 11:55:23 2023-02-20 12:05:48
## 6 C67EB62172C472EB classic_bike 2023-02-24 18:50:16 2023-02-24 18:56:40
## 7 08A1E9326F68ACF7 classic_bike 2023-02-28 12:58:03 2023-02-28 13:03:33
## 8 904C61FB3984A60E classic_bike 2023-02-27 20:26:01 2023-02-27 20:31:24
## 9 A96A6DA2D96544E6 classic_bike 2023-02-08 19:56:36 2023-02-08 20:02:22
## 10 DA895AE47787D208 classic_bike 2023-02-21 18:52:20 2023-02-21 18:57:57
## # ℹ 190,435 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
feb_trips <- read_csv("202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 258,678 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 6842AA605EE9FBB3 electric_bike 2023-03-16 08:20:34 2023-03-16 08:22:52
## 2 F984267A75B99A8C electric_bike 2023-03-04 14:07:06 2023-03-04 14:15:31
## 3 FF7CF57CFE026D02 classic_bike 2023-03-31 12:28:09 2023-03-31 12:38:47
## 4 6B61B916032CB6D6 classic_bike 2023-03-22 14:09:08 2023-03-22 14:24:51
## 5 E55E61A5F1260040 electric_bike 2023-03-09 07:15:00 2023-03-09 07:26:00
## 6 123AAD676850F53C classic_bike 2023-03-22 17:47:02 2023-03-22 18:01:29
## 7 5929D3080983AF4F classic_bike 2023-03-08 19:58:44 2023-03-08 20:05:39
## 8 B2624BAEDDDA3FB1 docked_bike 2023-03-22 17:28:24 2023-03-22 17:50:24
## 9 979C41EAC356278F classic_bike 2023-03-16 19:31:14 2023-03-16 19:41:01
## 10 6C1DCA9593CA8F5F classic_bike 2023-03-16 17:33:50 2023-03-16 17:45:47
## # ℹ 258,668 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
march_trips <- read_csv("202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 426,590 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 8FE8F7D9C10E88C7 electric_bike 2023-04-02 08:37:28 2023-04-02 08:41:37
## 2 34E4ED3ADF1D821B electric_bike 2023-04-19 11:29:02 2023-04-19 11:52:12
## 3 5296BF07A2F77CB5 electric_bike 2023-04-19 08:41:22 2023-04-19 08:43:22
## 4 40759916B76D5D52 electric_bike 2023-04-19 13:31:30 2023-04-19 13:35:09
## 5 77A96F460101AC63 electric_bike 2023-04-19 12:05:36 2023-04-19 12:10:26
## 6 8D6A2328E19DC168 electric_bike 2023-04-19 12:17:34 2023-04-19 12:21:38
## 7 C97BBA66E07889F9 electric_bike 2023-04-19 09:35:48 2023-04-19 09:45:00
## 8 6687AD4C575FF734 electric_bike 2023-04-11 16:13:43 2023-04-11 16:18:41
## 9 A8FA4F73B22BC11F electric_bike 2023-04-11 16:29:24 2023-04-11 16:40:23
## 10 81E158FE63D99994 electric_bike 2023-04-19 17:35:40 2023-04-19 17:36:11
## # ℹ 426,580 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
april_trips <- read_csv("202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 604,827 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 0D9FA920C3062031 electric_bike 2023-05-07 19:53:48 2023-05-07 19:58:32
## 2 92485E5FB5888ACD electric_bike 2023-05-06 18:54:08 2023-05-06 19:03:35
## 3 FB144B3FC8300187 electric_bike 2023-05-21 00:40:21 2023-05-21 00:44:36
## 4 DDEB93BC2CE9AA77 classic_bike 2023-05-10 16:47:01 2023-05-10 16:59:52
## 5 C07B70172FC92F59 classic_bike 2023-05-09 18:30:34 2023-05-09 18:39:28
## 6 2BA66385DF8F815A classic_bike 2023-05-30 15:01:21 2023-05-30 15:17:00
## 7 31EFCCB05F12D8EF docked_bike 2023-05-09 14:13:40 2023-05-09 14:47:20
## 8 71DFF834E1D3CE0B classic_bike 2023-05-06 16:47:22 2023-05-06 16:52:13
## 9 2117485899B4CEA4 classic_bike 2023-05-15 12:47:26 2023-05-15 13:00:05
## 10 811149F69AAE82DD electric_bike 2023-05-19 05:44:26 2023-05-19 05:47:24
## # ℹ 604,817 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
may_trips <- read_csv("202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 719,618 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 6F1682AC40EB6F71 electric_bike 2023-06-05 13:34:12 2023-06-05 14:31:56
## 2 622A1686D64948EB electric_bike 2023-06-05 01:30:22 2023-06-05 01:33:06
## 3 3C88859D926253B4 electric_bike 2023-06-20 18:15:49 2023-06-20 18:32:05
## 4 EAD8A5E0259DEC88 electric_bike 2023-06-19 14:56:00 2023-06-19 15:00:35
## 5 5A36F21930D6A55C electric_bike 2023-06-19 15:03:34 2023-06-19 15:07:16
## 6 CF682EA7D0F961DB electric_bike 2023-06-09 21:30:25 2023-06-09 21:49:52
## 7 4910FBB710157754 electric_bike 2023-06-03 13:34:09 2023-06-03 13:34:28
## 8 EA19D850A42F56D8 electric_bike 2023-06-03 13:34:46 2023-06-03 13:35:00
## 9 E68F43784662A2D0 electric_bike 2023-06-02 22:27:35 2023-06-02 22:35:26
## 10 5A013E29CC001611 electric_bike 2023-06-02 21:18:31 2023-06-03 01:27:19
## # ℹ 719,608 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
june_trips <- read_csv("202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202307-divvy-tripdata.csv")
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 767,650 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 9340B064F0AEE130 electric_bike 2023-07-23 20:06:14 2023-07-23 20:22:44
## 2 D1460EE3CE0D8AF8 classic_bike 2023-07-23 17:05:07 2023-07-23 17:18:37
## 3 DF41BE31B895A25E classic_bike 2023-07-23 10:14:53 2023-07-23 10:24:29
## 4 9624A293749EF703 electric_bike 2023-07-21 08:27:44 2023-07-21 08:32:40
## 5 2F68A6A4CDB4C99A classic_bike 2023-07-08 15:46:42 2023-07-08 15:58:08
## 6 9AEE973E6B941A9C classic_bike 2023-07-10 08:44:47 2023-07-10 08:49:41
## 7 E366E997FDA1582B classic_bike 2023-07-25 14:30:44 2023-07-25 14:37:45
## 8 1BB3E73851E6C2C1 classic_bike 2023-07-07 10:11:53 2023-07-07 10:17:55
## 9 DA1E1D0866E6566E electric_bike 2023-07-04 21:57:27 2023-07-04 22:08:27
## 10 39BF4A73A704CA85 classic_bike 2023-07-29 10:51:17 2023-07-29 11:03:13
## # ℹ 767,640 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
july_trips <- read_csv("202307-divvy-tripdata.csv")
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202308-divvy-tripdata.csv")
## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 771,693 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 903C30C2D810A53B electric_bike 2023-08-19 15:41:53 2023-08-19 15:53:36
## 2 F2FB18A98E110A2B electric_bike 2023-08-18 15:30:18 2023-08-18 15:45:25
## 3 D0DEC7C94E4663DA electric_bike 2023-08-30 16:15:08 2023-08-30 16:27:37
## 4 E0DDDC5F84747ED9 electric_bike 2023-08-30 16:24:07 2023-08-30 16:33:34
## 5 7797A4874BA260CA electric_bike 2023-08-22 15:59:44 2023-08-22 16:20:38
## 6 DF4DE734EBC4DF66 electric_bike 2023-08-24 12:27:24 2023-08-24 12:54:59
## 7 EE60FB066E69AFAC electric_bike 2023-08-31 20:42:14 2023-08-31 20:54:38
## 8 A115DA6AA13DE5EF electric_bike 2023-08-17 15:15:51 2023-08-17 15:22:27
## 9 86DBB19374245893 electric_bike 2023-08-24 21:37:19 2023-08-24 21:47:22
## 10 2905CBC8B8EE392C electric_bike 2023-08-28 14:53:38 2023-08-28 14:59:35
## # ℹ 771,683 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
august_trips <- read_csv("202308-divvy-tripdata.csv")
## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202309-divvy-tripdata.csv")
## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 666,371 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 011C1903BF4E2E28 classic_bike 2023-09-23 00:27:50 2023-09-23 00:33:27
## 2 87DB80E048A1BF9F classic_bike 2023-09-02 09:26:43 2023-09-02 09:38:19
## 3 7C2EB7AF669066E3 electric_bike 2023-09-25 18:30:11 2023-09-25 18:41:39
## 4 57D197B010269CE3 classic_bike 2023-09-13 15:30:49 2023-09-13 15:39:18
## 5 8A2CEA7C8C8074D8 classic_bike 2023-09-18 15:58:58 2023-09-18 16:05:04
## 6 03F7044D1304CD58 electric_bike 2023-09-15 20:19:25 2023-09-15 20:30:27
## 7 672503E0FC0835EC electric_bike 2023-09-27 16:52:18 2023-09-27 17:03:22
## 8 1D806492F95973AC electric_bike 2023-09-17 11:07:05 2023-09-17 11:13:39
## 9 40D9EF382CC6C53D classic_bike 2023-09-17 11:58:50 2023-09-17 12:08:36
## 10 C60CE661AF7ECC93 electric_bike 2023-09-07 20:52:43 2023-09-07 21:06:51
## # ℹ 666,361 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
september_trips <- read_csv("202309-divvy-tripdata.csv")
## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202310-divvy-tripdata.csv")
## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 537,113 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 4449097279F8BBE7 classic_bike 2023-10-08 10:36:26 2023-10-08 10:49:19
## 2 9CF060543CA7B439 electric_bike 2023-10-11 17:23:59 2023-10-11 17:36:08
## 3 667F21F4D6BDE69C electric_bike 2023-10-12 07:02:33 2023-10-12 07:06:53
## 4 F92714CC6B019B96 classic_bike 2023-10-24 19:13:03 2023-10-24 19:18:29
## 5 5E34BA5DE945A9CC classic_bike 2023-10-09 18:19:26 2023-10-09 18:30:56
## 6 F7D7420AFAC53CD9 electric_bike 2023-10-04 17:10:59 2023-10-04 17:25:21
## 7 870B2D4CD112D7B7 electric_bike 2023-10-31 17:32:20 2023-10-31 17:44:20
## 8 D9179D36E32D456C classic_bike 2023-10-02 18:51:51 2023-10-02 18:57:09
## 9 F8E131281F722FEF classic_bike 2023-10-17 08:28:18 2023-10-17 08:50:03
## 10 91938B71748FA405 classic_bike 2023-10-17 19:17:38 2023-10-17 19:32:23
## # ℹ 537,103 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
october_trips <- read_csv("202310-divvy-tripdata.csv")
## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202311-divvy-tripdata.csv")
## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 362,518 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 4EAD8F1AD547356B electric_bike 2023-11-30 21:50:05 2023-11-30 22:13:27
## 2 6322270563BF5470 electric_bike 2023-11-03 09:44:02 2023-11-03 10:17:15
## 3 B37BDE091ECA38E0 electric_bike 2023-11-30 11:39:44 2023-11-30 11:40:08
## 4 CF0CA5DD26E4F90E classic_bike 2023-11-08 10:01:45 2023-11-08 10:27:05
## 5 EB8381AA641348DB classic_bike 2023-11-03 16:20:25 2023-11-03 16:54:25
## 6 B8CF14EA423D6886 electric_bike 2023-11-30 16:15:53 2023-11-30 16:39:52
## 7 1763B0A2778C185E classic_bike 2023-11-09 11:55:54 2023-11-09 13:08:18
## 8 8307B5F616A3D2EE classic_bike 2023-11-19 14:37:02 2023-11-19 14:59:07
## 9 90B4E47C4977935E classic_bike 2023-11-19 15:12:54 2023-11-19 15:27:50
## 10 A9A78F624F996079 classic_bike 2023-11-09 19:34:57 2023-11-09 19:37:53
## # ℹ 362,508 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
november_trips <- read_csv("202311-divvy-tripdata.csv")
## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
read_csv("202312-divvy-tripdata.csv")
## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 224,073 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 C9BD54F578F57246 electric_bike 2023-12-02 18:44:01 2023-12-02 18:47:51
## 2 CDBD92F067FA620E electric_bike 2023-12-02 18:48:19 2023-12-02 18:54:48
## 3 ABC0858E52CBFC84 electric_bike 2023-12-24 01:56:32 2023-12-24 02:04:09
## 4 F44B6F0E8F76DC90 electric_bike 2023-12-24 10:58:12 2023-12-24 11:03:04
## 5 3C876413281A90DF electric_bike 2023-12-24 12:43:16 2023-12-24 12:44:57
## 6 28C0D6EFB81E1769 electric_bike 2023-12-24 13:59:57 2023-12-24 14:10:57
## 7 8A38729DE7B2FAFE electric_bike 2023-12-24 09:01:58 2023-12-24 09:07:51
## 8 19FD7AA9B32E12AD electric_bike 2023-12-24 08:21:38 2023-12-24 08:27:09
## 9 055C15FE4A207408 electric_bike 2023-12-11 18:17:46 2023-12-11 18:22:43
## 10 A73B25A7D94889C9 electric_bike 2023-12-03 06:05:56 2023-12-03 06:06:06
## # ℹ 224,063 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
december_trips <- read_csv("202312-divvy-tripdata.csv")
## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
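The twelve imports above all follow the same pattern, so they could also be done in one pass. The sketch below is one possible way, not the method used in this report. `read_monthly_trips()` and `demo_dir` are hypothetical names, and the two tiny CSV files are made-up stand-ins so the example runs without the real downloads. `show_col_types = FALSE` is the option that the readr message above suggests for silencing the column specification.

```r
library(readr)
library(dplyr)

# Read every monthly trip file in a folder and stack them into one tibble.
read_monthly_trips <- function(dir = ".") {
  files <- list.files(
    dir,
    pattern = "^2023[0-9]{2}-divvy-tripdata\\.csv$",
    full.names = TRUE
  )
  # list.files() returns names in alphabetical order, so months stay in order.
  bind_rows(lapply(files, read_csv, show_col_types = FALSE))
}

# Demonstration with two invented files in a temporary folder.
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(
  data.frame(ride_id = c("A1", "B2"), member_casual = c("member", "casual")),
  file.path(demo_dir, "202301-divvy-tripdata.csv")
)
write_csv(
  data.frame(ride_id = "C3", member_casual = "member"),
  file.path(demo_dir, "202302-divvy-tripdata.csv")
)

demo_trips <- read_monthly_trips(demo_dir)
nrow(demo_trips)
```

Reading each file once and assigning the result also avoids printing every table twice, as happens when a file is read without assignment and then read again.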
Combining the Cyclistic trip data for the separate months into one data frame named combined_trips
combined_trips <- rbind(jan_trips, feb_trips, march_trips, april_trips, may_trips, june_trips,
                        july_trips, august_trips, september_trips, october_trips, november_trips, december_trips)
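Stacking data frames with rbind() relies on every month having the same column names. A quick check before combining can catch a file whose columns differ. The sketch below is a minimal illustration in base R. `same_columns()` is a hypothetical helper, and the three `*_demo` data frames are invented, with the third one deliberately inconsistent.

```r
# TRUE when every data frame has the same column names in the same order.
same_columns <- function(frames) {
  reference <- names(frames[[1]])
  all(vapply(frames, function(df) identical(names(df), reference), logical(1)))
}

# Invented examples; mar_demo uses a different column name on purpose.
jan_demo <- data.frame(ride_id = "A1", member_casual = "member")
feb_demo <- data.frame(ride_id = "B2", member_casual = "casual")
mar_demo <- data.frame(ride_id = "C3", rider_type = "casual")

ok_two <- same_columns(list(jan_demo, feb_demo))
ok_three <- same_columns(list(jan_demo, feb_demo, mar_demo))

print(ok_two)
print(ok_three)
```

In this report, the column specifications printed by read_csv() are identical for all twelve months, so the monthly data frames are expected to pass such a check.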
Checking the structure of the new data frame after combining the data
str(combined_trips)
## spc_tbl_ [5,719,877 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:5719877] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
## $ rideable_type : chr [1:5719877] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:5719877], format: "2023-01-21 20:05:42" "2023-01-10 15:37:36" ...
## $ ended_at : POSIXct[1:5719877], format: "2023-01-21 20:16:33" "2023-01-10 15:46:05" ...
## $ start_station_name: chr [1:5719877] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
## $ start_station_id : chr [1:5719877] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
## $ end_station_name : chr [1:5719877] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
## $ end_station_id : chr [1:5719877] "202480.0" "TA1308000002" "599" "TA1308000002" ...
## $ start_lat : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
## $ start_lng : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
## $ end_lng : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:5719877] "member" "member" "casual" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
also checking the first 10 rows
as_tibble(combined_trips)
## # A tibble: 5,719,877 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## 7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
## 8 DB1CF84154D6A049 classic_bike 2023-01-25 10:49:01 2023-01-25 10:58:22
## 9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike 2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 5,719,867 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
re-parsing started_at and ended_at as date and time values (read_csv() already read both columns as POSIXct date-times rather than chr, as the str() output above shows, so this step only changes their class to POSIXlt)
combined_trips$started_at = strptime(combined_trips$started_at,"%Y-%m-%d %H:%M:%S")
combined_trips$ended_at = strptime(combined_trips$ended_at,"%Y-%m-%d %H:%M:%S")
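As a small aside on the two date-time classes involved here: strptime() returns a list-based POSIXlt object, while as.POSIXct() returns a POSIXct object stored as a single number of seconds, which is usually the more convenient class inside a data frame. The sketch below uses one timestamp from the first row of the data; the "UTC" time zone is an assumption made only to keep the example reproducible.

```r
# Timestamp text taken from the first row shown above; the time zone is an assumption
x <- "2023-01-21 20:05:42"

# strptime() gives a list-based POSIXlt object
as_lt <- strptime(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# as.POSIXct() gives a compact numeric POSIXct object
as_ct <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(as_lt)
class(as_ct)

# Both objects represent the same instant in time
as_ct == as.POSIXct(as_lt)
```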
checking if the date and time format applied
str(combined_trips)
## spc_tbl_ [5,719,877 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:5719877] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
## $ rideable_type : chr [1:5719877] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXlt[1:5719877], format: "2023-01-21 20:05:42" "2023-01-10 15:37:36" ...
## $ ended_at : POSIXlt[1:5719877], format: "2023-01-21 20:16:33" "2023-01-10 15:46:05" ...
## $ start_station_name: chr [1:5719877] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
## $ start_station_id : chr [1:5719877] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
## $ end_station_name : chr [1:5719877] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
## $ end_station_id : chr [1:5719877] "202480.0" "TA1308000002" "599" "TA1308000002" ...
## $ start_lat : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
## $ start_lng : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
## $ end_lng : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:5719877] "member" "member" "casual" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
now making the data ready for analysis by adding columns for the ride_length and the day_of_week
combined_trips<-mutate(combined_trips,ride_length=difftime(ended_at,started_at, units = "secs"))
combined_trips$day_of_week<-format(as.Date(combined_trips$started_at),"%A")
now viewing the newly created columns
head(combined_trips)
## # A tibble: 6 × 15
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## # ℹ 11 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>,
## # ride_length <drtn>, day_of_week <chr>
filtering out rides whose ride_length is 0 secs or less, since we don’t want those to be counted; the code keeps only the rides that have a ride_length greater than 0.
combined_trips <- filter(combined_trips, ride_length > 0)
after filtering out the invalid ride lengths we can remove the rows with null/missing or blank values that may alter the analysis
combined_trips <-combined_trips%>%
na.omit()
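It is worth being aware of how much this step removes: na.omit() drops a row if any of its columns is NA, including columns such as the station names that the later summaries do not use. The summaries further down count 4,331,106 rides (1,531,517 casual plus 2,799,589 member) out of the 5,719,877 rows loaded, so the two cleaning steps together removed about 1.39 million rows. The sketch below contrasts na.omit() with a check restricted to selected columns; the toy data frame and the choice of analysis_cols are hypothetical.

```r
# Hypothetical toy data: the second ride has no end station recorded
trips <- data.frame(
  ride_id = c("r1", "r2", "r3"),
  ride_length = c(651, 509, 794),
  end_station_name = c("Hampden Ct & Diversey Ave", NA, "Greenwood Ave & 47th St")
)

# na.omit() drops a row if ANY column is NA, so r2 is removed
all_complete <- na.omit(trips)

# Alternative: drop rows only when the columns used in the analysis are missing
analysis_cols <- c("ride_id", "ride_length")
needed_complete <- trips[complete.cases(trips[, analysis_cols]), ]

nrow(all_complete)
nrow(needed_complete)
```

Which of the two is appropriate depends on whether rides without station information should count towards ride totals and average ride lengths.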
we can also add a month column so that each ride can be identified by the month in which it started
combined_trips$month<-format(as.Date(combined_trips$started_at),"%m")
Analyzing the cleaned data
determining the average ride_length for each rider type (member_casual)
combined_trips %>%
group_by(member_casual) %>%
summarise(average_ride_length=mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <drtn>
## 1 casual 1376.4738 secs
## 2 member 727.9772 secs
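The averages above are reported in seconds, which is hard to read at a glance. A minimal sketch of converting them to minutes, using the two values copied from the output above (the variable names avg_secs and avg_mins are illustrative only):

```r
# Averages copied from the summary output above, in seconds (casual first, then member)
avg_secs <- as.difftime(c(1376.4738, 727.9772), units = "secs")

# as.numeric() with a units argument converts the difftime to plain minutes
avg_mins <- as.numeric(avg_secs, units = "mins")

round(avg_mins, 1)  # 22.9 12.1
```

In other words, the average casual ride lasts about 23 minutes against about 12 minutes for members, roughly 1.9 times as long. A plain numeric column like this also avoids the ggplot2 message about not knowing how to pick a scale for a difftime object.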
similarly determining the median, min, max ride length and total_rides for members and casual
combined_trips %>%
group_by(member_casual) %>%
summarise(median_ride_length=median(ride_length), min_ride_length=min(ride_length), max_ride_length=max(ride_length), total_rides=length(ride_id))
## # A tibble: 2 × 5
## member_casual median_ride_length min_ride_length max_ride_length total_rides
## <chr> <drtn> <drtn> <drtn> <int>
## 1 casual 765 secs 1 secs 728178 secs 1531517
## 2 member 517 secs 1 secs 89872 secs 2799589
calculating the average_ride_length and total_rides by member_casual and day_of_the_week
combined_trips %>%
group_by(member_casual,day_of_week) %>%
summarise(average_ride_length = mean(ride_length), total_rides = length(ride_id))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups: member_casual [2]
## member_casual day_of_week average_ride_length total_rides
## <chr> <chr> <drtn> <int>
## 1 casual Friday 1339.3150 secs 227826
## 2 casual Monday 1352.1903 secs 175381
## 3 casual Saturday 1555.1620 secs 310123
## 4 casual Sunday 1594.4702 secs 254710
## 5 casual Thursday 1199.9962 secs 198904
## 6 casual Tuesday 1230.8860 secs 181510
## 7 casual Wednesday 1176.0574 secs 183063
## 8 member Friday 722.4142 secs 400467
## 9 member Monday 693.0606 secs 386648
## 10 member Saturday 815.0967 secs 350592
## 11 member Sunday 817.0324 secs 307818
## 12 member Thursday 696.1708 secs 452609
## 13 member Tuesday 698.9975 secs 448778
## 14 member Wednesday 695.2239 secs 452677
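The days in the table above are sorted alphabetically because day_of_week is a character column. One way to get calendar order in tables and on plot axes is to turn the column into a factor with explicit levels. The sketch below assumes English day names (the output of format(..., "%A") depends on the locale), and starting the week on Monday is an arbitrary choice.

```r
# Day names as they appear in the summary above (alphabetical order)
day_of_week <- c("Friday", "Monday", "Saturday", "Sunday",
                 "Thursday", "Tuesday", "Wednesday")

# Calendar order; starting the week on Monday is an arbitrary choice
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# A factor sorts by its levels instead of alphabetically
day_factor <- factor(day_of_week, levels = week_levels)

as.character(sort(day_factor))
```

Applied to the real data, the same idea would be to wrap the day_of_week column in factor(..., levels = week_levels) before grouping or plotting.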
share phase of the analysis
sharing the findings by comparing the total number of rides among casual and member riders
combined_trips %>%
group_by(member_casual) %>%
summarise(total_rides=length(ride_id)) %>%
ggplot(mapping = aes(x=member_casual, y=total_rides, fill=member_casual))+geom_col()
saving the currently created plot
ggsave(".png")
## Saving 7 x 5 in image
visualizing the total_rides taken by the member and casuals on different days of the week
combined_trips %>%
group_by(member_casual,day_of_week) %>%
summarise(total_rides=length(ride_id)) %>%
ggplot(mapping = aes(x=day_of_week, y=total_rides, fill=member_casual))+geom_col(width = 0.5, position = position_dodge(width = 0.5))+
labs(title="total_rides of member and casuals vs days of the week")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
visualizing the average_ride_length of member and casuals on different days of the week
combined_trips %>%
group_by(member_casual,day_of_week) %>%
summarise(average_ride_length=mean(ride_length)) %>%
ggplot(mapping = aes(x=day_of_week,y=average_ride_length, fill=member_casual)) +geom_col(width = 0.5, position = position_dodge(width = 0.5))+
labs(title="average_ride_length vs days of the week for member_casual")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.
checking for the usage of rideable type among riders
combined_trips %>%
group_by(member_casual,rideable_type) %>%
summarise(average_ride_length=mean(ride_length)) %>%
ggplot(mapping = aes(x=rideable_type, y=average_ride_length, fill=member_casual))+geom_col()+
labs(title="average_ride_length vs rideable_type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.
ACT PHASE sharing findings with the executive team and manager
We found that casual riders have a higher average_ride_length than members (about 1,376 seconds versus 728 seconds), and that it increases further over the weekend. This suggests that casual riders use the bikes for leisure activities at weekends.
A special offer, such as a special price on weekend rides for members, could attract casual riders to switch to an annual membership.
During weekdays the average_ride_length of casual riders is fairly consistent, so it is plausible that casual riders also use the bikes for work.
Therefore, a weekly pass could attract casual riders to apply for membership.
Casual riders may also prefer classic bikes over electric ones; however, the average_ride_length for docked bikes may be affected by outliers.