Cyclistic case study

About the case study

This case study is part of the Google Data Analytics Professional Certificate, which I completed through Coursera. For this final course project, I use the R programming language.

The case study follows the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.

Scenario

You are a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Ask

Three questions will guide the future marketing program:

1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

Problem statement assigned by my manager

1. How do annual members and casual riders use Cyclistic bikes differently?

Key tasks

Identify the business task -> The business objective is to understand the key differences in bike usage between annual members and casual riders, so that Cyclistic can maximize profit by converting casual riders into annual members.
Consider key stakeholders -> The key stakeholders are the Director of Marketing (Lily Moreno), the marketing analytics team, and the executive team.

Deliverable

1. A clear statement of the business task -> “Our goal is to identify the differences in how annual members and casual riders use Cyclistic bikes.”

Prepare

The data I used has been made available by Motivate International Inc. under this license. The datasets, covering the previous 12 months of trips, are available here.

Key tasks

1. Download the data and store it appropriately. -> The data was downloaded from the provided source and stored properly.

2. Identify how it’s organized. -> The data is in CSV format. Two folders were created, one for .XLSX files and another for CSV files, for future reference. The data is organized in columns containing trip information, including ride_id, rideable_type, started_at, ended_at, start_station_id, end_station_id, and member_casual. The dataset has ride_id as the primary key.

3. Sort and filter the data. -> For this case study, I use the data for the year 2023.

4. Determine the credibility of the data. -> The data is credible: it comes from a reliable source and is original, comprehensive, current, and cited.
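The statement in step 2 that ride_id is the primary key can be checked before any cleaning. The following is a minimal sketch, assuming a helper named `is_primary_key` (hypothetical, not part of the original analysis) and a small toy data frame; on the real data the same call would be applied to the full trips data frame.

```r
# Hypothetical helper (not from the original analysis): returns TRUE when
# `key` has no missing values and no repeated values in `df`.
is_primary_key <- function(df, key) {
  values <- df[[key]]
  !anyNA(values) && anyDuplicated(values) == 0
}

# Toy data for illustration only: the ride IDs are copied from the
# January 2023 preview in this report, the member_casual values are made up.
toy_trips <- data.frame(
  ride_id = c("F96D5A74A3E41399", "13CB7EB698CEDB88", "BD88A2E670661CE5"),
  member_casual = c("member", "casual", "member")
)

# Unique IDs: the check passes.
is_primary_key(toy_trips, "ride_id")

# Appending a copy of the first row introduces a duplicate ID: the check fails.
toy_with_duplicate <- rbind(toy_trips, toy_trips[1, ])
is_primary_key(toy_with_duplicate, "ride_id")
```

If the check failed on the real data, the duplicated rides would need to be inspected before treating ride_id as a key.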

Deliverable

1. A description of all the data sources used

-> All the data used was provided by Cyclistic and made available by Motivate International Inc.

Process

Installing and loading packages

options(repos = c(CRAN = "https://cran.r-project.org"))

install.packages("tidyverse", repos = "https://cran.r-project.org")

## Installing package into 'C:/Users/smart/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)

## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\smart\AppData\Local\Temp\RtmpGqSsQ3\downloaded_packages

library("tidyverse")

## Warning: package 'tidyverse' was built under R version 4.3.3

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tibble' was built under R version 4.3.3

## Warning: package 'tidyr' was built under R version 4.3.3

## Warning: package 'readr' was built under R version 4.3.3

## Warning: package 'purrr' was built under R version 4.3.3

## Warning: package 'dplyr' was built under R version 4.3.3

## Warning: package 'stringr' was built under R version 4.3.3

## Warning: package 'forcats' was built under R version 4.3.3

## Warning: package 'lubridate' was built under R version 4.3.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

install.packages("dplyr")

## Warning: package 'dplyr' is in use and will not be installed

library("dplyr")
install.packages("skimr")

## Installing package into 'C:/Users/smart/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)

## package 'skimr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\smart\AppData\Local\Temp\RtmpGqSsQ3\downloaded_packages

library(skimr)

## Warning: package 'skimr' was built under R version 4.3.3

install.packages("ggplot2")

## Warning: package 'ggplot2' is in use and will not be installed

library(ggplot2)
install.packages("janitor")

## Installing package into 'C:/Users/smart/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)

## package 'janitor' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\smart\AppData\Local\Temp\RtmpGqSsQ3\downloaded_packages

library(janitor)

## Warning: package 'janitor' was built under R version 4.3.3

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
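The warnings above ("package 'dplyr' is in use and will not be installed") appear because dplyr and ggplot2 are already attached by tidyverse. A minimal sketch of one way to avoid reinstalling packages on every run follows. The helper names `missing_packages` and `install_if_missing` are hypothetical, and the default repository URL is the one used earlier in this section.

```r
# Hypothetical helper: returns the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Hypothetical helper: installs only the packages that are missing and
# returns (invisibly) the names of the packages it tried to install.
install_if_missing <- function(pkgs, repos = "https://cran.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}
```

Calling `install_if_missing(c("tidyverse", "skimr", "janitor"))` once, followed by the library() calls, would skip the install step when everything is already present. Since tidyverse attaches dplyr and ggplot2 itself, the separate install.packages("dplyr") and install.packages("ggplot2") calls are not needed.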

Importing and loading the data in R

For the case study, I am using 12 months of data from the year 2023.

read_csv("202301-divvy-tripdata.csv")

## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 190,301 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
##  2 13CB7EB698CEDB88 classic_bike  2023-01-10 15:37:36 2023-01-10 15:46:05
##  3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
##  4 C90792D034FED968 classic_bike  2023-01-22 10:52:58 2023-01-22 11:01:44
##  5 3397017529188E8A classic_bike  2023-01-12 13:58:01 2023-01-12 14:13:20
##  6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
##  7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
##  8 DB1CF84154D6A049 classic_bike  2023-01-25 10:49:01 2023-01-25 10:58:22
##  9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike  2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 190,291 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

jan_trips <- read_csv("202301-divvy-tripdata.csv")

## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202302-divvy-tripdata.csv")

## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 190,445 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 CBCD0D7777F0E45F classic_bike  2023-02-14 11:59:42 2023-02-14 12:13:38
##  2 F3EC5FCE5FF39DE9 electric_bike 2023-02-15 13:53:48 2023-02-15 13:59:08
##  3 E54C1F27FA9354FF classic_bike  2023-02-19 11:10:57 2023-02-19 11:35:01
##  4 3D561E04F739CC45 electric_bike 2023-02-26 16:12:05 2023-02-26 16:39:55
##  5 0CB4B4D53B2DBE05 electric_bike 2023-02-20 11:55:23 2023-02-20 12:05:48
##  6 C67EB62172C472EB classic_bike  2023-02-24 18:50:16 2023-02-24 18:56:40
##  7 08A1E9326F68ACF7 classic_bike  2023-02-28 12:58:03 2023-02-28 13:03:33
##  8 904C61FB3984A60E classic_bike  2023-02-27 20:26:01 2023-02-27 20:31:24
##  9 A96A6DA2D96544E6 classic_bike  2023-02-08 19:56:36 2023-02-08 20:02:22
## 10 DA895AE47787D208 classic_bike  2023-02-21 18:52:20 2023-02-21 18:57:57
## # ℹ 190,435 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

feb_trips <- read_csv("202302-divvy-tripdata.csv")

## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202303-divvy-tripdata.csv")

## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 258,678 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 6842AA605EE9FBB3 electric_bike 2023-03-16 08:20:34 2023-03-16 08:22:52
##  2 F984267A75B99A8C electric_bike 2023-03-04 14:07:06 2023-03-04 14:15:31
##  3 FF7CF57CFE026D02 classic_bike  2023-03-31 12:28:09 2023-03-31 12:38:47
##  4 6B61B916032CB6D6 classic_bike  2023-03-22 14:09:08 2023-03-22 14:24:51
##  5 E55E61A5F1260040 electric_bike 2023-03-09 07:15:00 2023-03-09 07:26:00
##  6 123AAD676850F53C classic_bike  2023-03-22 17:47:02 2023-03-22 18:01:29
##  7 5929D3080983AF4F classic_bike  2023-03-08 19:58:44 2023-03-08 20:05:39
##  8 B2624BAEDDDA3FB1 docked_bike   2023-03-22 17:28:24 2023-03-22 17:50:24
##  9 979C41EAC356278F classic_bike  2023-03-16 19:31:14 2023-03-16 19:41:01
## 10 6C1DCA9593CA8F5F classic_bike  2023-03-16 17:33:50 2023-03-16 17:45:47
## # ℹ 258,668 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

march_trips <- read_csv("202303-divvy-tripdata.csv")

## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202304-divvy-tripdata.csv")

## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 426,590 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 8FE8F7D9C10E88C7 electric_bike 2023-04-02 08:37:28 2023-04-02 08:41:37
##  2 34E4ED3ADF1D821B electric_bike 2023-04-19 11:29:02 2023-04-19 11:52:12
##  3 5296BF07A2F77CB5 electric_bike 2023-04-19 08:41:22 2023-04-19 08:43:22
##  4 40759916B76D5D52 electric_bike 2023-04-19 13:31:30 2023-04-19 13:35:09
##  5 77A96F460101AC63 electric_bike 2023-04-19 12:05:36 2023-04-19 12:10:26
##  6 8D6A2328E19DC168 electric_bike 2023-04-19 12:17:34 2023-04-19 12:21:38
##  7 C97BBA66E07889F9 electric_bike 2023-04-19 09:35:48 2023-04-19 09:45:00
##  8 6687AD4C575FF734 electric_bike 2023-04-11 16:13:43 2023-04-11 16:18:41
##  9 A8FA4F73B22BC11F electric_bike 2023-04-11 16:29:24 2023-04-11 16:40:23
## 10 81E158FE63D99994 electric_bike 2023-04-19 17:35:40 2023-04-19 17:36:11
## # ℹ 426,580 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

april_trips <- read_csv("202304-divvy-tripdata.csv")

## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202305-divvy-tripdata.csv")

## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 604,827 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 0D9FA920C3062031 electric_bike 2023-05-07 19:53:48 2023-05-07 19:58:32
##  2 92485E5FB5888ACD electric_bike 2023-05-06 18:54:08 2023-05-06 19:03:35
##  3 FB144B3FC8300187 electric_bike 2023-05-21 00:40:21 2023-05-21 00:44:36
##  4 DDEB93BC2CE9AA77 classic_bike  2023-05-10 16:47:01 2023-05-10 16:59:52
##  5 C07B70172FC92F59 classic_bike  2023-05-09 18:30:34 2023-05-09 18:39:28
##  6 2BA66385DF8F815A classic_bike  2023-05-30 15:01:21 2023-05-30 15:17:00
##  7 31EFCCB05F12D8EF docked_bike   2023-05-09 14:13:40 2023-05-09 14:47:20
##  8 71DFF834E1D3CE0B classic_bike  2023-05-06 16:47:22 2023-05-06 16:52:13
##  9 2117485899B4CEA4 classic_bike  2023-05-15 12:47:26 2023-05-15 13:00:05
## 10 811149F69AAE82DD electric_bike 2023-05-19 05:44:26 2023-05-19 05:47:24
## # ℹ 604,817 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

may_trips <- read_csv("202305-divvy-tripdata.csv")

## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202306-divvy-tripdata.csv")

## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 719,618 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 6F1682AC40EB6F71 electric_bike 2023-06-05 13:34:12 2023-06-05 14:31:56
##  2 622A1686D64948EB electric_bike 2023-06-05 01:30:22 2023-06-05 01:33:06
##  3 3C88859D926253B4 electric_bike 2023-06-20 18:15:49 2023-06-20 18:32:05
##  4 EAD8A5E0259DEC88 electric_bike 2023-06-19 14:56:00 2023-06-19 15:00:35
##  5 5A36F21930D6A55C electric_bike 2023-06-19 15:03:34 2023-06-19 15:07:16
##  6 CF682EA7D0F961DB electric_bike 2023-06-09 21:30:25 2023-06-09 21:49:52
##  7 4910FBB710157754 electric_bike 2023-06-03 13:34:09 2023-06-03 13:34:28
##  8 EA19D850A42F56D8 electric_bike 2023-06-03 13:34:46 2023-06-03 13:35:00
##  9 E68F43784662A2D0 electric_bike 2023-06-02 22:27:35 2023-06-02 22:35:26
## 10 5A013E29CC001611 electric_bike 2023-06-02 21:18:31 2023-06-03 01:27:19
## # ℹ 719,608 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

june_trips <- read_csv("202306-divvy-tripdata.csv")

## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202307-divvy-tripdata.csv")

## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 767,650 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 9340B064F0AEE130 electric_bike 2023-07-23 20:06:14 2023-07-23 20:22:44
##  2 D1460EE3CE0D8AF8 classic_bike  2023-07-23 17:05:07 2023-07-23 17:18:37
##  3 DF41BE31B895A25E classic_bike  2023-07-23 10:14:53 2023-07-23 10:24:29
##  4 9624A293749EF703 electric_bike 2023-07-21 08:27:44 2023-07-21 08:32:40
##  5 2F68A6A4CDB4C99A classic_bike  2023-07-08 15:46:42 2023-07-08 15:58:08
##  6 9AEE973E6B941A9C classic_bike  2023-07-10 08:44:47 2023-07-10 08:49:41
##  7 E366E997FDA1582B classic_bike  2023-07-25 14:30:44 2023-07-25 14:37:45
##  8 1BB3E73851E6C2C1 classic_bike  2023-07-07 10:11:53 2023-07-07 10:17:55
##  9 DA1E1D0866E6566E electric_bike 2023-07-04 21:57:27 2023-07-04 22:08:27
## 10 39BF4A73A704CA85 classic_bike  2023-07-29 10:51:17 2023-07-29 11:03:13
## # ℹ 767,640 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

july_trips <- read_csv("202307-divvy-tripdata.csv")

## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202308-divvy-tripdata.csv")

## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 771,693 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 903C30C2D810A53B electric_bike 2023-08-19 15:41:53 2023-08-19 15:53:36
##  2 F2FB18A98E110A2B electric_bike 2023-08-18 15:30:18 2023-08-18 15:45:25
##  3 D0DEC7C94E4663DA electric_bike 2023-08-30 16:15:08 2023-08-30 16:27:37
##  4 E0DDDC5F84747ED9 electric_bike 2023-08-30 16:24:07 2023-08-30 16:33:34
##  5 7797A4874BA260CA electric_bike 2023-08-22 15:59:44 2023-08-22 16:20:38
##  6 DF4DE734EBC4DF66 electric_bike 2023-08-24 12:27:24 2023-08-24 12:54:59
##  7 EE60FB066E69AFAC electric_bike 2023-08-31 20:42:14 2023-08-31 20:54:38
##  8 A115DA6AA13DE5EF electric_bike 2023-08-17 15:15:51 2023-08-17 15:22:27
##  9 86DBB19374245893 electric_bike 2023-08-24 21:37:19 2023-08-24 21:47:22
## 10 2905CBC8B8EE392C electric_bike 2023-08-28 14:53:38 2023-08-28 14:59:35
## # ℹ 771,683 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

august_trips <- read_csv("202308-divvy-tripdata.csv")

## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202309-divvy-tripdata.csv")

## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 666,371 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 011C1903BF4E2E28 classic_bike  2023-09-23 00:27:50 2023-09-23 00:33:27
##  2 87DB80E048A1BF9F classic_bike  2023-09-02 09:26:43 2023-09-02 09:38:19
##  3 7C2EB7AF669066E3 electric_bike 2023-09-25 18:30:11 2023-09-25 18:41:39
##  4 57D197B010269CE3 classic_bike  2023-09-13 15:30:49 2023-09-13 15:39:18
##  5 8A2CEA7C8C8074D8 classic_bike  2023-09-18 15:58:58 2023-09-18 16:05:04
##  6 03F7044D1304CD58 electric_bike 2023-09-15 20:19:25 2023-09-15 20:30:27
##  7 672503E0FC0835EC electric_bike 2023-09-27 16:52:18 2023-09-27 17:03:22
##  8 1D806492F95973AC electric_bike 2023-09-17 11:07:05 2023-09-17 11:13:39
##  9 40D9EF382CC6C53D classic_bike  2023-09-17 11:58:50 2023-09-17 12:08:36
## 10 C60CE661AF7ECC93 electric_bike 2023-09-07 20:52:43 2023-09-07 21:06:51
## # ℹ 666,361 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

september_trips <- read_csv("202309-divvy-tripdata.csv")

## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202310-divvy-tripdata.csv")

## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 537,113 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 4449097279F8BBE7 classic_bike  2023-10-08 10:36:26 2023-10-08 10:49:19
##  2 9CF060543CA7B439 electric_bike 2023-10-11 17:23:59 2023-10-11 17:36:08
##  3 667F21F4D6BDE69C electric_bike 2023-10-12 07:02:33 2023-10-12 07:06:53
##  4 F92714CC6B019B96 classic_bike  2023-10-24 19:13:03 2023-10-24 19:18:29
##  5 5E34BA5DE945A9CC classic_bike  2023-10-09 18:19:26 2023-10-09 18:30:56
##  6 F7D7420AFAC53CD9 electric_bike 2023-10-04 17:10:59 2023-10-04 17:25:21
##  7 870B2D4CD112D7B7 electric_bike 2023-10-31 17:32:20 2023-10-31 17:44:20
##  8 D9179D36E32D456C classic_bike  2023-10-02 18:51:51 2023-10-02 18:57:09
##  9 F8E131281F722FEF classic_bike  2023-10-17 08:28:18 2023-10-17 08:50:03
## 10 91938B71748FA405 classic_bike  2023-10-17 19:17:38 2023-10-17 19:32:23
## # ℹ 537,103 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

october_trips <- read_csv("202310-divvy-tripdata.csv")

## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202311-divvy-tripdata.csv")

## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 362,518 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 4EAD8F1AD547356B electric_bike 2023-11-30 21:50:05 2023-11-30 22:13:27
##  2 6322270563BF5470 electric_bike 2023-11-03 09:44:02 2023-11-03 10:17:15
##  3 B37BDE091ECA38E0 electric_bike 2023-11-30 11:39:44 2023-11-30 11:40:08
##  4 CF0CA5DD26E4F90E classic_bike  2023-11-08 10:01:45 2023-11-08 10:27:05
##  5 EB8381AA641348DB classic_bike  2023-11-03 16:20:25 2023-11-03 16:54:25
##  6 B8CF14EA423D6886 electric_bike 2023-11-30 16:15:53 2023-11-30 16:39:52
##  7 1763B0A2778C185E classic_bike  2023-11-09 11:55:54 2023-11-09 13:08:18
##  8 8307B5F616A3D2EE classic_bike  2023-11-19 14:37:02 2023-11-19 14:59:07
##  9 90B4E47C4977935E classic_bike  2023-11-19 15:12:54 2023-11-19 15:27:50
## 10 A9A78F624F996079 classic_bike  2023-11-09 19:34:57 2023-11-09 19:37:53
## # ℹ 362,508 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

november_trips <- read_csv("202311-divvy-tripdata.csv")

## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

read_csv("202312-divvy-tripdata.csv")

## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 224,073 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 C9BD54F578F57246 electric_bike 2023-12-02 18:44:01 2023-12-02 18:47:51
##  2 CDBD92F067FA620E electric_bike 2023-12-02 18:48:19 2023-12-02 18:54:48
##  3 ABC0858E52CBFC84 electric_bike 2023-12-24 01:56:32 2023-12-24 02:04:09
##  4 F44B6F0E8F76DC90 electric_bike 2023-12-24 10:58:12 2023-12-24 11:03:04
##  5 3C876413281A90DF electric_bike 2023-12-24 12:43:16 2023-12-24 12:44:57
##  6 28C0D6EFB81E1769 electric_bike 2023-12-24 13:59:57 2023-12-24 14:10:57
##  7 8A38729DE7B2FAFE electric_bike 2023-12-24 09:01:58 2023-12-24 09:07:51
##  8 19FD7AA9B32E12AD electric_bike 2023-12-24 08:21:38 2023-12-24 08:27:09
##  9 055C15FE4A207408 electric_bike 2023-12-11 18:17:46 2023-12-11 18:22:43
## 10 A73B25A7D94889C9 electric_bike 2023-12-03 06:05:56 2023-12-03 06:06:06
## # ℹ 224,063 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

december_trips <- read_csv("202312-divvy-tripdata.csv")

## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Combining the Cyclistic trip data for the separate months into one data frame named combined_trips

combined_trips <- rbind(jan_trips,feb_trips,march_trips,april_trips,may_trips,june_trips,july_trips,august_trips,september_trips,october_trips,november_trips,december_trips)
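The monthly reads and the combine step can also be written as a loop over file names. This is a minimal sketch, assuming the monthly files follow the `YYYYMM-divvy-tripdata.csv` naming seen above. The temporary directory, the two tiny demo files and the name `combined_demo` are made up so the example runs on its own.

```r
library(readr)
library(dplyr)

# Hypothetical demo data: two tiny monthly files written to a temp directory.
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "202301-divvy-tripdata.csv"))
write_csv(tibble(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202302-divvy-tripdata.csv"))

# Find every monthly file, read each one, and stack them into one data frame.
trip_files <- list.files(demo_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
combined_demo <- bind_rows(lapply(trip_files, read_csv, show_col_types = FALSE))

nrow(combined_demo)
```

bind_rows() matches columns by name, so it is a little more forgiving than rbind() if a monthly file has its columns in a different order.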

checking the structure of the new data frame after combining the data

str(combined_trips)

## spc_tbl_ [5,719,877 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5719877] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
##  $ rideable_type     : chr [1:5719877] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : POSIXct[1:5719877], format: "2023-01-21 20:05:42" "2023-01-10 15:37:36" ...
##  $ ended_at          : POSIXct[1:5719877], format: "2023-01-21 20:16:33" "2023-01-10 15:46:05" ...
##  $ start_station_name: chr [1:5719877] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
##  $ start_station_id  : chr [1:5719877] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
##  $ end_station_name  : chr [1:5719877] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
##  $ end_station_id    : chr [1:5719877] "202480.0" "TA1308000002" "599" "TA1308000002" ...
##  $ start_lat         : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
##  $ start_lng         : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
##  $ end_lng           : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr [1:5719877] "member" "member" "casual" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

also checking for the first 10 rows

as_tibble(combined_trips)

## # A tibble: 5,719,877 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
##  2 13CB7EB698CEDB88 classic_bike  2023-01-10 15:37:36 2023-01-10 15:46:05
##  3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
##  4 C90792D034FED968 classic_bike  2023-01-22 10:52:58 2023-01-22 11:01:44
##  5 3397017529188E8A classic_bike  2023-01-12 13:58:01 2023-01-12 14:13:20
##  6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
##  7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
##  8 DB1CF84154D6A049 classic_bike  2023-01-25 10:49:01 2023-01-25 10:58:22
##  9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike  2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 5,719,867 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

read_csv() has already parsed started_at and ended_at as date-time (POSIXct) columns, as the str() output above shows, so they are not in chr format. Re-parsing them with strptime() converts them to the POSIXlt date-time class.

combined_trips$started_at = strptime(combined_trips$started_at,"%Y-%m-%d %H:%M:%S")
combined_trips$ended_at = strptime(combined_trips$ended_at,"%Y-%m-%d %H:%M:%S")
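For reference, strptime() returns the list-based POSIXlt class, while as.POSIXct() returns the compact POSIXct class that readr produces. This is a small sketch on made-up character timestamps, and the UTC time zone is an assumption.

```r
# Made-up character timestamps in the same layout as the trip data.
raw_times <- c("2023-01-21 20:05:42", "2023-01-10 15:37:36")

# strptime() gives POSIXlt; as.POSIXct() gives POSIXct.
as_lt <- strptime(raw_times, "%Y-%m-%d %H:%M:%S", tz = "UTC")
as_ct <- as.POSIXct(raw_times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(as_lt)
class(as_ct)
```

Both classes work with difftime(), but POSIXct is usually the safer choice for a data frame column.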

checking if the date and time format applied

str(combined_trips)

## spc_tbl_ [5,719,877 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5719877] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
##  $ rideable_type     : chr [1:5719877] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : POSIXlt[1:5719877], format: "2023-01-21 20:05:42" "2023-01-10 15:37:36" ...
##  $ ended_at          : POSIXlt[1:5719877], format: "2023-01-21 20:16:33" "2023-01-10 15:46:05" ...
##  $ start_station_name: chr [1:5719877] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
##  $ start_station_id  : chr [1:5719877] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
##  $ end_station_name  : chr [1:5719877] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
##  $ end_station_id    : chr [1:5719877] "202480.0" "TA1308000002" "599" "TA1308000002" ...
##  $ start_lat         : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
##  $ start_lng         : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
##  $ end_lng           : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr [1:5719877] "member" "member" "casual" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

Now making the data ready for analysis by adding two columns: ride_length (the trip duration in seconds) and day_of_week

combined_trips<-mutate(combined_trips,ride_length=difftime(ended_at,started_at, units = "secs"))
combined_trips$day_of_week<-format(as.Date(combined_trips$started_at),"%A")
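Because day_of_week is stored as text, it sorts alphabetically (Friday first) in tables and plots. The sketch below works on two made-up rides and stores the day as a factor in calendar order. Starting the week on Monday is an assumption, and the day names are looked up by index so the result does not depend on the system locale.

```r
library(dplyr)

# Two made-up rides using timestamps in the same layout as the trip data.
demo_trips <- tibble(
  started_at = as.POSIXct(c("2023-01-21 20:05:42", "2023-01-10 15:37:36"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2023-01-21 20:16:33", "2023-01-10 15:46:05"), tz = "UTC")
)

# English day names; POSIXlt$wday counts from 0 = Sunday.
day_names  <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
day_levels <- c(day_names[2:7], day_names[1])  # Monday first (assumed week start)

demo_trips <- demo_trips %>%
  mutate(
    # Trip duration as a plain number of seconds.
    ride_length = as.numeric(difftime(ended_at, started_at, units = "secs")),
    # Day of the week as a factor so it sorts in calendar order.
    day_of_week = factor(day_names[as.POSIXlt(started_at)$wday + 1], levels = day_levels)
  )

demo_trips
```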

Now viewing the newly created columns

head(combined_trips)

## # A tibble: 6 × 15
##   ride_id          rideable_type started_at          ended_at           
##   <chr>            <chr>         <dttm>              <dttm>             
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike  2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike  2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike  2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## # ℹ 11 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>,
## #   ride_length <drtn>, day_of_week <chr>

Filtering out rows whose ride_length is 0 seconds or less, since we don't want those to be counted. The code keeps only the rows with a ride_length greater than 0.

combined_trips <- filter(combined_trips, ride_length > 0)
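It can help to count how many rows a filter will drop before applying it. This is a minimal sketch on made-up durations, and the names `demo_trips`, `n_dropped` and `clean_trips` are illustrative only.

```r
library(dplyr)

# Made-up rides: one negative, one zero-length, two valid durations (seconds).
demo_trips <- tibble(
  ride_id     = c("A", "B", "C", "D"),
  ride_length = as.difftime(c(-30, 0, 651, 509), units = "secs")
)

# Count what the filter will drop, then apply it.
n_dropped   <- sum(demo_trips$ride_length <= 0)
clean_trips <- filter(demo_trips, ride_length > 0)

n_dropped
nrow(clean_trips)
```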

After filtering out the invalid ride lengths, we can remove rows with null/missing values that may alter the analysis. Note that na.omit() drops a row if any of its columns is missing.

combined_trips <-combined_trips%>% 
    na.omit()
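Before dropping rows, it is worth checking which columns actually contain missing values. This is a hedged sketch on made-up rides where, purely for illustration, the gaps sit only in the station columns. It compares na.omit() with tidyr's drop_na() limited to a single column.

```r
library(dplyr)
library(tidyr)

# Made-up rides with gaps in the station columns only.
demo_trips <- tibble(
  ride_id            = c("A", "B", "C"),
  start_station_name = c("Kimbark Ave & 53rd St", NA, "Western Ave & Lunt Ave"),
  end_station_name   = c(NA, NA, "Valli Produce - Evanston Plaza"),
  member_casual      = c("member", "casual", "casual")
)

# Missing values per column.
na_counts <- colSums(is.na(demo_trips))

# na.omit() drops a row if any column is missing;
# drop_na() can be limited to the columns the analysis needs.
all_complete   <- na.omit(demo_trips)
rider_complete <- drop_na(demo_trips, member_casual)

na_counts
nrow(all_complete)
nrow(rider_complete)
```

If a summary uses only ride_length and member_casual, a targeted drop keeps rides that are missing nothing but a station name.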

We can also add a month column to identify the month in which each ride started

combined_trips$month<-format(as.Date(combined_trips$started_at),"%m")

Analyzing the cleaned data

determining the average ride_length for member_casual

combined_trips %>% 
  group_by(member_casual) %>% 
summarise(average_ride_length=mean(ride_length))

## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>         <drtn>             
## 1 casual        1376.4738 secs     
## 2 member         727.9772 secs
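The averages are easier to read in minutes. This is a small worked conversion using the two values from the output above: roughly 23 minutes for casual riders and 12 minutes for members, a ratio of about 1.9.

```r
# Average ride lengths in seconds, copied from the summary above.
avg_secs <- c(casual = 1376.4738, member = 727.9772)

# Convert to minutes and compare the two groups.
avg_mins <- avg_secs / 60
ratio    <- avg_secs[["casual"]] / avg_secs[["member"]]

round(avg_mins, 1)
round(ratio, 2)
```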

similarly determining the median, min, max ride length and total_rides for members and casual

combined_trips %>% 
  group_by(member_casual) %>% 
  summarise(median_ride_length=median(ride_length), min_ride_length=min(ride_length), max_ride_length=max(ride_length), total_rides=length(ride_id))

## # A tibble: 2 × 5
##   member_casual median_ride_length min_ride_length max_ride_length total_rides
##   <chr>         <drtn>             <drtn>          <drtn>                <int>
## 1 casual        765 secs           1 secs          728178 secs         1531517
## 2 member        517 secs           1 secs           89872 secs         2799589
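The casual maximum of 728178 seconds is about 8.4 days, which may reflect bikes that were not returned properly rather than real trips. This is a hedged sketch of flagging such rides on made-up data. The 24-hour cutoff is an assumption for illustration, not a rule taken from the data provider.

```r
library(dplyr)

# Made-up ride lengths in seconds; the last value mirrors the maximum above.
demo_trips <- tibble(
  member_casual = c("casual", "member", "casual"),
  ride_length   = c(765, 517, 728178)
)

max_secs <- 24 * 60 * 60  # assumed cutoff: 24 hours

# Express each ride in days and flag the ones above the cutoff.
flagged <- demo_trips %>%
  mutate(ride_days = ride_length / 86400,
         too_long  = ride_length > max_secs)

# Average ride length per group without the flagged rides.
filter(flagged, !too_long) %>%
  group_by(member_casual) %>%
  summarise(average_ride_length = mean(ride_length), .groups = "drop")
```

Comparing the mean with and without the flagged rides shows how much a few extreme values pull the average.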

Calculating the average_ride_length and total_rides by member_casual and day_of_week

combined_trips %>% 
  group_by(member_casual,day_of_week) %>% 
  summarise(average_ride_length = mean(ride_length), total_rides = length(ride_id))

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual day_of_week average_ride_length total_rides
##    <chr>         <chr>       <drtn>                    <int>
##  1 casual        Friday      1339.3150 secs           227826
##  2 casual        Monday      1352.1903 secs           175381
##  3 casual        Saturday    1555.1620 secs           310123
##  4 casual        Sunday      1594.4702 secs           254710
##  5 casual        Thursday    1199.9962 secs           198904
##  6 casual        Tuesday     1230.8860 secs           181510
##  7 casual        Wednesday   1176.0574 secs           183063
##  8 member        Friday       722.4142 secs           400467
##  9 member        Monday       693.0606 secs           386648
## 10 member        Saturday     815.0967 secs           350592
## 11 member        Sunday       817.0324 secs           307818
## 12 member        Thursday     696.1708 secs           452609
## 13 member        Tuesday      698.9975 secs           448778
## 14 member        Wednesday    695.2239 secs           452677

sharing phase of analysis

sharing the findings by comparing the total number of rides among casual and member riders

combined_trips %>% 
  group_by(member_casual) %>% 
 summarise(total_rides=length(ride_id)) %>% 
  ggplot(mapping = aes(x=member_casual, y=total_rides, fill=member_casual))+geom_col()

saving the currently created plot

ggsave(".png")

## Saving 7 x 5 in image
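Passing the plot object and a descriptive file name to ggsave() makes it clear which plot ends up in which file. This is a minimal sketch: the file name `total_rides_by_rider_type.png` is hypothetical, and the totals are copied from the summary earlier in the analysis.

```r
library(ggplot2)

# Total rides per rider type, copied from the summary table.
ride_totals <- data.frame(
  member_casual = c("casual", "member"),
  total_rides   = c(1531517, 2799589)
)

total_rides_plot <- ggplot(ride_totals,
                           aes(x = member_casual, y = total_rides, fill = member_casual)) +
  geom_col()

# Hypothetical file name; writing to tempdir() keeps the example self-contained.
plot_path <- file.path(tempdir(), "total_rides_by_rider_type.png")
ggsave(plot_path, plot = total_rides_plot, width = 7, height = 5)
```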

visualizing the total_rides taken by the member and casuals on different days of the week

combined_trips %>% 
  group_by(member_casual,day_of_week) %>% 
  summarise(total_rides=length(ride_id)) %>% 
  ggplot(mapping = aes(x=day_of_week, y=total_rides, fill=member_casual))+geom_col(width = 0.5, position = position_dodge(width = 0.5))+
  labs(title="total_rides of member and casuals vs days of the week")

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

visualizing the average_ride_length of member and casuals on different days of the week

combined_trips %>% 
  group_by(member_casual,day_of_week) %>% 
  summarise(average_ride_length=mean(ride_length)) %>% 
ggplot(mapping = aes(x=day_of_week,y=average_ride_length, fill=member_casual)) +geom_col(width = 0.5, position = position_dodge(width = 0.5))+
  labs(title="average_ride_length vs days of the week for member_casual")

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.
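The difftime message appears because ride_length is a difftime object rather than a plain number. One way to avoid it, sketched here on made-up rides, is to convert the duration to numeric minutes before plotting. Setting `.groups = "drop"` also silences the grouping message. The column name `average_ride_mins` is illustrative.

```r
library(dplyr)
library(ggplot2)

# Made-up rides; ride_length is a difftime in seconds, as in combined_trips.
demo_trips <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  day_of_week   = c("Saturday", "Monday", "Saturday", "Monday"),
  ride_length   = as.difftime(c(1500, 1300, 800, 700), units = "secs")
)

avg_by_day <- demo_trips %>%
  group_by(member_casual, day_of_week) %>%
  # Convert difftime to plain minutes so ggplot2 gets a numeric y value.
  summarise(average_ride_mins = mean(as.numeric(ride_length, units = "mins")),
            .groups = "drop")

ggplot(avg_by_day, aes(x = day_of_week, y = average_ride_mins, fill = member_casual)) +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  labs(y = "average ride length (minutes)")
```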

checking for the usage of rideable type among riders

combined_trips %>% 
  group_by(member_casual,rideable_type) %>% 
  summarise(average_ride_length=mean(ride_length)) %>% 
ggplot(mapping = aes(x=rideable_type, y=average_ride_length, fill=member_casual))+geom_col()+
  labs(title="average_ride_length vs rideable_type")

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.
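geom_col() stacks bars by default, so in the rideable_type plot each bar's height is the sum of the casual and member averages. A sketch of the side-by-side alternative follows. The averages are made-up numbers in minutes, not results from the data.

```r
library(ggplot2)

# Made-up averages in minutes, standing in for the summarised data.
avg_by_type <- data.frame(
  member_casual     = c("casual", "member", "casual", "member"),
  rideable_type     = c("classic_bike", "classic_bike", "electric_bike", "electric_bike"),
  average_ride_mins = c(25, 13, 14, 11)
)

# position = "dodge" places the two rider types side by side instead of stacking them.
type_plot <- ggplot(avg_by_type,
                    aes(x = rideable_type, y = average_ride_mins, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "average_ride_length vs rideable_type")

type_plot
```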

ACT PHASE sharing findings with the executive team and manager

We found that casual riders have a higher average_ride_length than members, and that it increases further on weekends. This suggests that casual riders use bikes for leisure activities on weekends.
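The weekend pattern also shows up in the ride counts. This is a short worked calculation using the total_rides values from the day-of-week table in the analysis: about 37% of casual rides fall on a Saturday or Sunday, compared with about 24% of member rides.

```r
# Total rides per day, copied from the day-of-week table in the analysis.
casual <- c(Friday = 227826, Monday = 175381, Saturday = 310123, Sunday = 254710,
            Thursday = 198904, Tuesday = 181510, Wednesday = 183063)
member <- c(Friday = 400467, Monday = 386648, Saturday = 350592, Sunday = 307818,
            Thursday = 452609, Tuesday = 448778, Wednesday = 452677)

weekend <- c("Saturday", "Sunday")

# Share of each group's rides that fall on a weekend.
casual_weekend_share <- sum(casual[weekend]) / sum(casual)
member_weekend_share <- sum(member[weekend]) / sum(member)

round(100 * casual_weekend_share, 1)
round(100 * member_weekend_share, 1)
```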

A special offer, such as a discounted weekend price for members, could encourage casual riders to switch to an annual membership.

On weekdays the average_ride_length of casual riders is fairly consistent, which may indicate that some casual riders use bikes to commute to work.

A weekly pass could therefore attract these casual riders to apply for membership.

Casual riders may also prefer classic bikes over electric ones. However, the average_ride_length for docked bikes may be affected by outliers.