I am working as a junior data analyst on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, the team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, the team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve the recommendations, so they must be backed up with compelling data insights and professional data visualizations.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, the director believes there is a solid opportunity to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Three questions will guide the marketing team’s future work:
The director of marketing has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
In this assignment, a report with the following deliverables will be produced:
Note: This case study follows Google’s data analysis process (Ask, Prepare, Process, Analyze, Share, Act).
1. Business Task: To maximize the number of annual memberships, I, as the data analyst, will find trends and patterns among casual riders and members, and identify riders who could benefit from an annual membership. I do not need to raise awareness of the annual membership among casual riders, as they are already aware of the program.
2. Stakeholders
3. Stakeholders’ expectations: Design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. The marketing team is interested in analyzing the Cyclistic historical bike trip data to identify trends.
About the data set:
Since Cyclistic is a fictional company, I will use data from Divvy, a bike-share program based in Chicago, covering January 2023 to December 2023 to complete this case study. This data was made public by Motivate International Inc. under this license. Due to data privacy issues, personal information has been removed or encrypted.
In this phase, the required packages are loaded.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(geosphere)
The monthly files are loaded, verified, and merged into a single data frame
all_trips <- list.files(path = "Trip_data", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
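The loading step above can be sketched on a small scale as follows. This is a minimal illustration, not the code used for the report: the folder `trip_demo` and its two tiny files are hypothetical stand-ins for the monthly Divvy files, and `show_col_types = FALSE` is used only to quiet the column specification messages. It also checks that every file has the same columns before the rows are stacked.

```r
library(readr)
library(dplyr)

# Hypothetical demo folder with two tiny monthly files (not the real Divvy data)
demo_dir <- file.path(tempdir(), "trip_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(demo_dir, "202301.csv"))
write_csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202302.csv"))

# Read every CSV in the folder without printing the column specification
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
monthly <- lapply(files, read_csv, show_col_types = FALSE)

# Confirm that all files share the same column names before stacking them
same_cols <- all(sapply(monthly, function(df) {
  identical(names(df), names(monthly[[1]]))
}))
stopifnot(same_cols)

demo_trips <- bind_rows(monthly)
nrow(demo_trips)
```

Checking the column names first makes a mismatch fail loudly instead of silently producing extra columns filled with NA.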
Check the column names, dimensions, and summary of the data
colnames(all_trips)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
dim(all_trips)
## [1] 5719877 13
head(all_trips)
## # A tibble: 6 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
summary(all_trips)
## ride_id rideable_type started_at
## Length:5719877 Length:5719877 Min. :2023-01-01 00:01:58.00
## Class :character Class :character 1st Qu.:2023-05-21 12:50:44.00
## Mode :character Mode :character Median :2023-07-20 18:02:50.00
## Mean :2023-07-16 10:27:50.01
## 3rd Qu.:2023-09-16 20:08:49.00
## Max. :2023-12-31 23:59:38.00
##
## ended_at start_station_name start_station_id
## Min. :2023-01-01 00:02:41.00 Length:5719877 Length:5719877
## 1st Qu.:2023-05-21 13:14:09.00 Class :character Class :character
## Median :2023-07-20 18:19:47.00 Mode :character Mode :character
## Mean :2023-07-16 10:46:00.18
## 3rd Qu.:2023-09-16 20:28:10.00
## Max. :2024-01-01 23:50:51.00
##
## end_station_name end_station_id start_lat start_lng
## Length:5719877 Length:5719877 Min. :41.63 Min. :-87.94
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.07 Max. :-87.46
##
## end_lat end_lng member_casual
## Min. : 0.00 Min. :-88.16 Length:5719877
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character
## Median :41.90 Median :-87.64 Mode :character
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.18 Max. : 0.00
## NA's :6990 NA's :6990
Data Cleaning before conducting analysis
Add columns for the date, month, day, year, and day of the week of each ride, since the ride data may need to be aggregated by month, day, or year. The date uses the default yyyy-mm-dd format. The new columns are then verified.
all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date),"%m")
all_trips$day <- format(as.Date(all_trips$date),"%d")
all_trips$year <- format(as.Date(all_trips$date),"%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date),"%A")
colnames(all_trips)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual" "date" "month"
## [16] "day" "year" "day_of_week"
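To make the derived columns concrete, here is a small sketch that applies the same `format()` calls to a single timestamp, the start time of the first ride shown in the `head()` output. The UTC time zone is an assumption made for the example; the weekday name returned by `%A` depends on the locale.

```r
# Start time of the first ride in the head() output (time zone assumed to be UTC)
started_at <- as.POSIXct("2023-01-21 20:05:42", tz = "UTC")

ride_date <- as.Date(started_at)

ride_month <- format(ride_date, "%m")  # two-digit month
ride_day   <- format(ride_date, "%d")  # two-digit day of the month
ride_year  <- format(ride_date, "%Y")  # four-digit year
ride_dow   <- format(ride_date, "%A")  # full weekday name (locale dependent)

c(ride_month, ride_day, ride_year, ride_dow)
```

All four values are character strings, which is why the summary output lists these columns as class `character`.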
Add a “ride_length” column to all_trips (in seconds) so that ride lengths can be compared across rides
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")
Convert “ride_length” from difftime to numeric so that calculations can be run on it
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
## [1] TRUE
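The sketch below shows why it helps to state the unit explicitly when computing a ride length. It uses the start and end times of the first ride in the `head()` output, with the time zone assumed to be UTC. With the default `units = "auto"`, `difftime()` picks a unit based on the size of the differences, so a single ride of about eleven minutes is reported in minutes rather than seconds.

```r
# Start and end of the first ride in the head() output (time zone assumed UTC)
started <- as.POSIXct("2023-01-21 20:05:42", tz = "UTC")
ended   <- as.POSIXct("2023-01-21 20:16:33", tz = "UTC")

# Explicit unit: the result is always in seconds
ride_secs <- as.numeric(difftime(ended, started, units = "secs"))

# Default unit ("auto"): for this single ride, the result is in minutes
auto_diff <- difftime(ended, started)

ride_secs         # 651
units(auto_diff)  # "mins"
```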
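Remove rides whose length is zero or negative, or above the upper cutoff, since a ride length should be neither negative nor longer than one day. Note that ride_length is in seconds, so the cutoff of 1440 in the code below keeps only rides of up to 24 minutes; a one-day cutoff would be 86,400 seconds (1,440 minutes). The filtered records go into a new data frame, which is then checked: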
all_trips_v2 <- all_trips[!(all_trips$ride_length <= 0 | all_trips$ride_length > 1440),]
dim(all_trips_v2)
## [1] 4902180 19
summary(all_trips_v2)
## ride_id rideable_type started_at
## Length:4902180 Length:4902180 Min. :2023-01-01 00:01:58.00
## Class :character Class :character 1st Qu.:2023-05-18 22:29:22.25
## Mode :character Mode :character Median :2023-07-20 20:34:25.50
## Mean :2023-07-16 07:43:44.45
## 3rd Qu.:2023-09-19 18:09:40.50
## Max. :2023-12-31 23:59:38.00
##
## ended_at start_station_name start_station_id
## Min. :2023-01-01 00:02:41.00 Length:4902180 Length:4902180
## 1st Qu.:2023-05-18 22:37:03.75 Class :character Class :character
## Median :2023-07-20 20:44:02.00 Mode :character Mode :character
## Mean :2023-07-16 07:53:04.57
## 3rd Qu.:2023-09-19 18:18:52.25
## Max. :2024-01-01 00:06:08.00
##
## end_station_name end_station_id start_lat start_lng
## Length:4902180 Length:4902180 Min. :41.64 Min. :-87.92
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.07 Max. :-87.52
##
## end_lat end_lng member_casual date
## Min. : 0.00 Min. :-87.99 Length:4902180 Min. :2023-01-01
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character 1st Qu.:2023-05-18
## Median :41.90 Median :-87.65 Mode :character Median :2023-07-20
## Mean :41.90 Mean :-87.65 Mean :2023-07-15
## 3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:2023-09-19
## Max. :42.09 Max. : 0.00 Max. :2023-12-31
## NA's :146 NA's :146
## month day year day_of_week
## Length:4902180 Length:4902180 Length:4902180 Length:4902180
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## ride_length
## Min. : 1.0
## 1st Qu.: 295.0
## Median : 492.0
## Mean : 560.1
## 3rd Qu.: 778.0
## Max. :1440.0
##
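Because `ride_length` is stored in seconds, the choice of cutoff changes which rides survive the filter. The following sketch, on hypothetical toy values, compares a cutoff of 1440 seconds with a one-day cutoff expressed in seconds. It is an illustration of the unit arithmetic only and does not reproduce the figures in this report.

```r
# Hypothetical ride lengths in seconds: negative, 10 min, 50 min, 25 hours
rides <- data.frame(
  ride_id     = c("a", "b", "c", "d"),
  ride_length = c(-5, 600, 3000, 90000)
)

one_day_secs <- 24 * 60 * 60  # 86400 seconds = 1440 minutes

# Cutoff of 1440 seconds (24 minutes)
kept_24min <- rides[rides$ride_length > 0 & rides$ride_length <= 1440, ]

# Cutoff of one day, expressed in seconds
kept_1day <- rides[rides$ride_length > 0 & rides$ride_length <= one_day_secs, ]

kept_24min$ride_id  # "b"
kept_1day$ride_id   # "b" "c"
```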
Remove rows with NA values from all_trips_v2 so that the later calculations use complete records.
all_trips_v2 <- drop_na(all_trips_v2)
summary(all_trips_v2)
## ride_id rideable_type started_at
## Length:3667286 Length:3667286 Min. :2023-01-01 00:03:26.00
## Class :character Class :character 1st Qu.:2023-05-17 15:06:54.50
## Mode :character Mode :character Median :2023-07-20 17:09:49.00
## Mean :2023-07-15 13:37:51.17
## 3rd Qu.:2023-09-19 16:28:53.00
## Max. :2023-12-31 23:58:55.00
## ended_at start_station_name start_station_id
## Min. :2023-01-01 00:07:23.00 Length:3667286 Length:3667286
## 1st Qu.:2023-05-17 15:15:37.00 Class :character Class :character
## Median :2023-07-20 17:20:26.00 Mode :character Mode :character
## Mean :2023-07-15 13:47:20.02
## 3rd Qu.:2023-09-19 16:38:41.50
## Max. :2024-01-01 00:06:08.00
## end_station_name end_station_id start_lat start_lng
## Length:3667286 Length:3667286 Min. :41.65 Min. :-87.84
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.06 Max. :-87.53
## end_lat end_lng member_casual date
## Min. : 0.00 Min. :-87.84 Length:3667286 Min. :2023-01-01
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character 1st Qu.:2023-05-17
## Median :41.90 Median :-87.64 Mode :character Median :2023-07-20
## Mean :41.90 Mean :-87.65 Mean :2023-07-14
## 3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:2023-09-19
## Max. :42.06 Max. : 0.00 Max. :2023-12-31
## month day year day_of_week
## Length:3667286 Length:3667286 Length:3667286 Length:3667286
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## ride_length
## Min. : 1.0
## 1st Qu.: 304.0
## Median : 501.0
## Mean : 568.9
## 3rd Qu.: 786.0
## Max. :1440.0
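Before dropping rows, it can be useful to see which columns contain the missing values. In the summary above only 146 rows lack end coordinates, while the row count falls from 4,902,180 to 3,667,286, so most of the dropped rows must be missing a value in one of the character columns, such as a station name or ID. The sketch below shows one way to count NAs per column; the toy data frame and its station names are hypothetical.

```r
library(tidyr)

# Hypothetical toy data with missing values in two different columns
toy <- data.frame(
  ride_id          = c("a", "b", "c"),
  end_station_name = c("Clark St", NA, "State St"),
  end_lat          = c(41.90, 41.88, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Keep only rows with no missing value in any column
complete_rows <- drop_na(toy)

na_per_column
nrow(complete_rows)
```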
Calculate the distance of each ride (in meters) from the start and end coordinates, then view the data frame and its summary
all_trips_v2$ride_distance <- distGeo(matrix(c(all_trips_v2$start_lng,all_trips_v2$start_lat),ncol = 2),
matrix(c(all_trips_v2$end_lng,all_trips_v2$end_lat),ncol = 2))
View(all_trips_v2)
summary(all_trips_v2)
## ride_id rideable_type started_at
## Length:3667286 Length:3667286 Min. :2023-01-01 00:03:26.00
## Class :character Class :character 1st Qu.:2023-05-17 15:06:54.50
## Mode :character Mode :character Median :2023-07-20 17:09:49.00
## Mean :2023-07-15 13:37:51.17
## 3rd Qu.:2023-09-19 16:28:53.00
## Max. :2023-12-31 23:58:55.00
## ended_at start_station_name start_station_id
## Min. :2023-01-01 00:07:23.00 Length:3667286 Length:3667286
## 1st Qu.:2023-05-17 15:15:37.00 Class :character Class :character
## Median :2023-07-20 17:20:26.00 Mode :character Mode :character
## Mean :2023-07-15 13:47:20.02
## 3rd Qu.:2023-09-19 16:38:41.50
## Max. :2024-01-01 00:06:08.00
## end_station_name end_station_id start_lat start_lng
## Length:3667286 Length:3667286 Min. :41.65 Min. :-87.84
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.06 Max. :-87.53
## end_lat end_lng member_casual date
## Min. : 0.00 Min. :-87.84 Length:3667286 Min. :2023-01-01
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character 1st Qu.:2023-05-17
## Median :41.90 Median :-87.64 Mode :character Median :2023-07-20
## Mean :41.90 Mean :-87.65 Mean :2023-07-14
## 3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:2023-09-19
## Max. :42.06 Max. : 0.00 Max. :2023-12-31
## month day year day_of_week
## Length:3667286 Length:3667286 Length:3667286 Length:3667286
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## ride_length ride_distance
## Min. : 1.0 Min. : 0
## 1st Qu.: 304.0 1st Qu.: 857
## Median : 501.0 Median : 1414
## Mean : 568.9 Mean : 1733
## 3rd Qu.: 786.0 3rd Qu.: 2325
## Max. :1440.0 Max. :9818680
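The maximum ride_distance of about 9,800 km in the summary coincides with end coordinates of 0 (end_lat has a minimum of 0.00 and end_lng a maximum of 0.00), which suggests placeholder coordinates rather than real trips. The sketch below shows how `distGeo()` is called for one pair of points and one possible way to exclude rows with zero coordinates. The coordinates and the toy data frame are hypothetical, and this filter is not applied in the report.

```r
library(geosphere)

# Hypothetical start and end points in Chicago, given as (longitude, latitude)
start <- cbind(lng = -87.64, lat = 41.88)
end   <- cbind(lng = -87.63, lat = 41.90)

# distGeo() returns the geodesic distance in meters
d <- distGeo(start, end)

# Toy rows: the second has end coordinates of 0, like the outliers in the summary
toy <- data.frame(end_lat = c(41.90, 0), end_lng = c(-87.63, 0))
valid <- toy[toy$end_lat != 0 & toy$end_lng != 0, ]

d
nrow(valid)
```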
First, find the number of rides by rider type. Assign the correct order to the days of the week
all_trips_v2$day_of_week <-
ordered(all_trips_v2$day_of_week, levels = c('Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday', 'Sunday'))
all_trips_v2 %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_ride = n(), .groups = 'drop') %>%
arrange(day_of_week)
## # A tibble: 14 × 3
## member_casual day_of_week number_of_ride
## <chr> <ord> <int>
## 1 casual Monday 132295
## 2 member Monday 351944
## 3 casual Tuesday 143391
## 4 member Tuesday 408089
## 5 casual Wednesday 148147
## 6 member Wednesday 413726
## 7 casual Thursday 159769
## 8 member Thursday 412675
## 9 casual Friday 173226
## 10 member Friday 362254
## 11 casual Saturday 215442
## 12 member Saturday 304606
## 13 casual Sunday 175226
## 14 member Sunday 266496
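The counts above are easier to compare as shares. The sketch below copies the ride counts from the table and computes the fraction of each day's rides taken by casual riders. Under these figures, casual riders account for roughly 41% of Saturday rides and about 26% of Tuesday rides.

```r
# Ride counts copied from the table above
counts <- data.frame(
  day_of_week = c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"),
  casual = c(132295, 143391, 148147, 159769, 173226, 215442, 175226),
  member = c(351944, 408089, 413726, 412675, 362254, 304606, 266496)
)

# Share of each day's rides taken by casual riders
counts$casual_share <- counts$casual / (counts$casual + counts$member)

# Days sorted from the highest casual share to the lowest
counts[order(-counts$casual_share), c("day_of_week", "casual_share")]
```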
Assign an order to the months of the year. The levels below start at May ('05') and wrap around to April ('04'), so May is listed first in the output
all_trips_v2$month <-
ordered(all_trips_v2$month, levels = c('05', '06', '07', '08', '09', '10', '11', '12', '01', '02', '03', '04'))
all_trips_v2 %>%
group_by(member_casual, month) %>%
summarise(number_of_ride = n(), .groups = 'drop') %>%
arrange(month)
## # A tibble: 24 × 3
## member_casual month number_of_ride
## <chr> <ord> <int>
## 1 casual 05 126599
## 2 member 05 253682
## 3 casual 06 160526
## 4 member 06 278485
## 5 casual 07 174088
## 6 member 07 287690
## 7 casual 08 170883
## 8 member 08 309150
## 9 casual 09 145166
## 10 member 09 275832
## # ℹ 14 more rows
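Since the data covers January to December 2023, a calendar ordering is another option for the month factor. The sketch below builds the twelve two-digit labels with `sprintf()` and applies them to a few hypothetical month values. It is an alternative for illustration, not the ordering used in this report.

```r
# Two-digit month labels in calendar order: "01", "02", ..., "12"
calendar_levels <- sprintf("%02d", 1:12)

# Hypothetical month values in the same format as all_trips_v2$month
month_demo <- c("05", "01", "12", "07")
month_ordered <- ordered(month_demo, levels = calendar_levels)

# Sorting now follows the calendar
sort(month_ordered)
```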
Now, find out whether ride_length differs by rider type.
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN=mean)
## all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1 casual Monday 612.7304
## 2 member Monday 523.9225
## 3 casual Tuesday 606.4741
## 4 member Tuesday 532.6762
## 5 casual Wednesday 601.8963
## 6 member Wednesday 534.7719
## 7 casual Thursday 609.6153
## 8 member Thursday 535.8058
## 9 casual Friday 631.5177
## 10 member Friday 533.8625
## 11 casual Saturday 679.1902
## 12 member Saturday 564.8961
## 13 casual Sunday 671.2377
## 14 member Sunday 557.8728
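The same averages can also be computed with dplyr, which gives shorter column names and makes it easy to add a version in minutes. This is a minimal sketch on hypothetical toy rides; the column names `mean_secs` and `mean_mins` are chosen for the example.

```r
library(dplyr)

# Hypothetical toy rides; ride_length is in seconds, as in all_trips_v2
toy_rides <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  day_of_week   = c("Saturday", "Saturday", "Monday", "Monday"),
  ride_length   = c(600, 900, 300, 500)
)

# Mean ride length per rider type and day, in seconds and in minutes
avg_by_group <- toy_rides %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    mean_secs = mean(ride_length),
    mean_mins = mean(ride_length) / 60,
    .groups = "drop"
  )

avg_by_group
```

Read this way, the weekday averages in the table above range from roughly 9 to 11 minutes.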
all_trips_v2 %>%
group_by(member_casual, month) %>%
summarise(average_ride_length = mean(ride_length), .groups = 'drop') %>%
arrange(month)
## # A tibble: 24 × 3
## member_casual month average_ride_length
## <chr> <ord> <dbl>
## 1 casual 05 653.
## 2 member 05 554.
## 3 casual 06 661.
## 4 member 06 571.
## 5 casual 07 669.
## 6 member 07 573.
## 7 casual 08 665.
## 8 member 08 572.
## 9 casual 09 648.
## 10 member 09 559.
## # ℹ 14 more rows
Next, compare how each type of rider uses the bikes by looking at ride distance.
all_trips_v2 %>%
group_by(member_casual, day_of_week) %>%
summarise(distance_of_ride = mean(ride_distance), .groups = 'drop') %>%
arrange(day_of_week)
## # A tibble: 14 × 3
## member_casual day_of_week distance_of_ride
## <chr> <ord> <dbl>
## 1 casual Monday 1677.
## 2 member Monday 1708.
## 3 casual Tuesday 1728.
## 4 member Tuesday 1742.
## 5 casual Wednesday 1732.
## 6 member Wednesday 1750.
## 7 casual Thursday 1799.
## 8 member Thursday 1769.
## 9 casual Friday 1707.
## 10 member Friday 1705.
## 11 casual Saturday 1716.
## 12 member Saturday 1739.
## 13 casual Sunday 1713.
## 14 member Sunday 1730.
all_trips_v2 %>%
group_by(member_casual, month) %>%
summarise(distance_of_ride = mean(ride_distance), .groups = 'drop') %>%
arrange(month)
## # A tibble: 24 × 3
## member_casual month distance_of_ride
## <chr> <ord> <dbl>
## 1 casual 05 1766.
## 2 member 05 1803.
## 3 casual 06 1849.
## 4 member 06 1875.
## 5 casual 07 1753.
## 6 member 07 1820.
## 7 casual 08 1756.
## 8 member 08 1794.
## 9 casual 09 1727.
## 10 member 09 1764.
## # ℹ 14 more rows
Finally, to support my assumption, find out how many rides start and end at the same place (ride_distance of less than 1 meter).
all_trips_v2 %>%
group_by(member_casual) %>%
summarize(number_of_rides = n() , .groups = 'drop')
## # A tibble: 2 × 2
## member_casual number_of_rides
## <chr> <int>
## 1 casual 1147496
## 2 member 2519790
all_trips_v2 %>%
group_by(member_casual) %>%
filter(ride_distance < 1) %>%
summarize(number_of_rides = n() , .groups = 'drop')
## # A tibble: 2 × 2
## member_casual number_of_rides
## <chr> <int>
## 1 casual 53738
## 2 member 57589
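Because members take more than twice as many rides as casual riders, the raw counts above are easier to interpret as percentages. The sketch below copies the four counts from the two tables and computes the share of rides with a distance under 1 meter for each rider type. Under these figures the share is about 4.7% for casual riders and about 2.3% for members.

```r
# Total rides and rides with ride_distance < 1, copied from the tables above
total_rides      <- c(casual = 1147496, member = 2519790)
same_place_rides <- c(casual = 53738, member = 57589)

# Percentage of rides that start and end at the same place, by rider type
share_pct <- round(100 * same_place_rides / total_rides, 2)

share_pct
```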
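Casual riders take the most rides on weekends, peaking on Saturday, and ride longer on average on every day of the week, which suggests leisure and tourism use.
Annual members take the most rides from Tuesday to Thursday and have shorter average rides, which suggests commuting or other pragmatic use on weekdays.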
***End of the Report***