This capstone project is required for the Google Course Certificate Program. It analyzes a fictional company, Cyclistic, a bike-share company in Chicago.
The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. This analysis is to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, a new marketing strategy targeted at the most profitable riding category will be designed.
Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members or annual members.
This analysis follows the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. Trends are analyzed using the company's historical trip data.
The study uses the SMART methodology to ask questions that help solve the business problem and align with the business task. The methodology ensures that the questions asked are specific, measurable, action-oriented, relevant, and time-bound.
Three questions will guide the future marketing program:
How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
The business task is to identify how casual riders and annual members use Cyclistic bikes differently and provide actionable insights that will help to design a new marketing strategy to convert casual riders into annual members.
The key stakeholders are the marketing team members, including the director of marketing, Lily Moreno, and the executive team members.
The data is made available by Motivate International Inc. and is publicly available for download. It contains the historical trip data of the Cyclistic bike-share company, organized into monthly and quarterly files. All data files are distributed as zip archives, which extract to comma-delimited (.csv) files for convenient processing. The data can be viewed or downloaded at https://divvy-tripdata.s3.amazonaws.com/index.html
The population of the data set is Cyclistic bike riders. The data was gathered by the fictional company itself, making it first-hand primary data that is original and reliable. It is also downloaded from an open source, which makes it accessible. The most recent data is available for the analysis; it includes millions of rows and several columns of trip details (routes, start and end times with their corresponding station names and IDs) that are relevant to the business questions.
To verify the credibility of the data, a quick summary of the whole data set was run. The columns are consistent across the individual files, although some rows have missing values.
The data initially comes as zip files. These were extracted to .csv files, loaded into RStudio for analysis, and then merged into a single data frame. Rows with missing start station details (1,113,914 rows) or missing end station details (1,144,509 rows) were deleted to ensure data quality.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
library(skimr)
library(ggplot2)
These calls load the packages needed for the analysis.
The next step is to load the data used in the study: the 12 most recent months of trip data, December 2023 through November 2024.
library(readr)
dec_2023_tripdata <- read_csv("202312-divvy-tripdata.csv")
## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
jan_2024_tripdata <- read_csv("202401-divvy-tripdata.csv")
## Rows: 144873 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
feb_2024_tripdata <- read_csv("202402-divvy-tripdata.csv")
## Rows: 223164 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
mar_2024_tripdata <- read_csv("202403-divvy-tripdata.csv")
## Rows: 301687 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
apr_2024_tripdata <- read_csv("202404-divvy-tripdata.csv")
## Rows: 415025 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
may_2024_tripdata <- read_csv("202405-divvy-tripdata.csv")
## Rows: 609493 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
jun_2024_tripdata <- read_csv("202406-divvy-tripdata.csv")
## Rows: 710721 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
jul_2024_tripdata <- read_csv("202407-divvy-tripdata.csv")
## Rows: 748962 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
aug_2024_tripdata <- read_csv("202408-divvy-tripdata.csv")
## Rows: 755639 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
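# September 2024 is file 202409; the logged output below came from a run that read 202408 a second time
sep_2024_tripdata <- read_csv("202409-divvy-tripdata.csv")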
## Rows: 755639 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
# October 2024 is file 202410; the logged output below came from a run that read 202409 here
oct_2024_tripdata <- read_csv("202410-divvy-tripdata.csv")
## Rows: 821276 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
nov_2024_tripdata <- read_csv("202411-divvy-tripdata.csv")
## Rows: 335075 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# merge the 12 separate monthly data frames into one data frame
total_trip_data <- rbind(dec_2023_tripdata, jan_2024_tripdata, feb_2024_tripdata, mar_2024_tripdata, apr_2024_tripdata, may_2024_tripdata, jun_2024_tripdata, jul_2024_tripdata, aug_2024_tripdata, sep_2024_tripdata, oct_2024_tripdata, nov_2024_tripdata)
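The twelve `read_csv()` calls and the `rbind()` above could also be written as one loop over file names, which reduces the chance of pairing a month with the wrong file. The following is a minimal sketch, not the code used for this report. The helper name `read_trip_files` and the assumption that all monthly CSV files sit in one folder are hypothetical.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read every "*-divvy-tripdata.csv" file in `dir`
# and stack the rows into one data frame.
read_trip_files <- function(dir = ".") {
  files <- list.files(dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
  stopifnot(length(files) > 0)  # fail early if no file matched
  # Read each file without printing the column specification.
  tables <- lapply(files, read_csv, show_col_types = FALSE)
  names(tables) <- basename(files)
  # Row-bind all tables; `source_file` records which file each row came from.
  bind_rows(tables, .id = "source_file")
}
```

With the files in the working directory, `total_trip_data <- read_trip_files(".")` would produce the merged table, and `count(total_trip_data, source_file)` would show how many rows each file contributed.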
skim_without_charts(total_trip_data) #in order to have a comprehensive summary of the data set
| Name | total_trip_data |
| Number of rows | 6045627 |
| Number of columns | 13 |
| _______________________ | |
| Column type frequency: | |
| character | 7 |
| numeric | 4 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ride_id | 0 | 1.00 | 16 | 16 | 0 | 5289777 | 0 |
| rideable_type | 0 | 1.00 | 12 | 16 | 0 | 3 | 0 |
| start_station_name | 1113914 | 0.82 | 10 | 64 | 0 | 1775 | 0 |
| start_station_id | 1113914 | 0.82 | 3 | 35 | 0 | 1727 | 0 |
| end_station_name | 1144509 | 0.81 | 10 | 64 | 0 | 1788 | 0 |
| end_station_id | 1144509 | 0.81 | 3 | 36 | 0 | 1740 | 0 |
| member_casual | 0 | 1.00 | 6 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| start_lat | 0 | 1 | 41.90 | 0.04 | 41.64 | 41.88 | 41.90 | 41.93 | 42.07 |
| start_lng | 0 | 1 | -87.65 | 0.03 | -87.91 | -87.66 | -87.64 | -87.63 | -87.52 |
| end_lat | 7799 | 1 | 41.90 | 0.06 | 16.06 | 41.88 | 41.90 | 41.93 | 87.96 |
| end_lng | 7799 | 1 | -87.65 | 0.06 | -144.05 | -87.66 | -87.64 | -87.63 | 1.72 |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| started_at | 0 | 1 | 2023-12-01 00:00:03 | 2024-11-30 23:52:17 | 2024-07-18 00:44:23 | 5063177 |
| ended_at | 0 | 1 | 2023-12-01 00:04:12 | 2024-11-30 23:57:43 | 2024-07-18 01:13:44 | 5066285 |
#to familiarize with the data structure and columns
str(total_trip_data)
## spc_tbl_ [6,045,627 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:6045627] "C9BD54F578F57246" "CDBD92F067FA620E" "ABC0858E52CBFC84" "F44B6F0E8F76DC90" ...
## $ rideable_type : chr [1:6045627] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:6045627], format: "2023-12-02 18:44:01" "2023-12-02 18:48:19" ...
## $ ended_at : POSIXct[1:6045627], format: "2023-12-02 18:47:51" "2023-12-02 18:54:48" ...
## $ start_station_name: chr [1:6045627] NA NA NA NA ...
## $ start_station_id : chr [1:6045627] NA NA NA NA ...
## $ end_station_name : chr [1:6045627] NA NA NA NA ...
## $ end_station_id : chr [1:6045627] NA NA NA NA ...
## $ start_lat : num [1:6045627] 41.9 41.9 41.9 42 41.9 ...
## $ start_lng : num [1:6045627] -87.7 -87.7 -87.6 -87.7 -87.6 ...
## $ end_lat : num [1:6045627] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:6045627] -87.7 -87.6 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:6045627] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
colnames(total_trip_data)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
All files have consistent columns.
# have a view of first few rows of the data frame
head(total_trip_data)
## # A tibble: 6 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 C9BD54F578F57246 electric_bike 2023-12-02 18:44:01 2023-12-02 18:47:51
## 2 CDBD92F067FA620E electric_bike 2023-12-02 18:48:19 2023-12-02 18:54:48
## 3 ABC0858E52CBFC84 electric_bike 2023-12-24 01:56:32 2023-12-24 02:04:09
## 4 F44B6F0E8F76DC90 electric_bike 2023-12-24 10:58:12 2023-12-24 11:03:04
## 5 3C876413281A90DF electric_bike 2023-12-24 12:43:16 2023-12-24 12:44:57
## 6 28C0D6EFB81E1769 electric_bike 2023-12-24 13:59:57 2023-12-24 14:10:57
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
# alternatively, use base R to check for columns with missing values
sapply(total_trip_data, function(x) sum(is.na(x)))
## ride_id rideable_type started_at ended_at
## 0 0 0 0
## start_station_name start_station_id end_station_name end_station_id
## 1113914 1113914 1144509 1144509
## start_lat start_lng end_lat end_lng
## 0 0 7799 7799
## member_casual
## 0
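The per-column count above can also be obtained with `colSums(is.na(...))`, and `complete.cases()` gives the share of rows affected. This is a small sketch on made-up data; the data frame `toy` and its values are hypothetical stand-ins for `total_trip_data`.

```r
# Toy data standing in for total_trip_data (values are made up).
toy <- data.frame(
  ride_id = c("A", "B", "C"),
  start_station_name = c("Clark St", NA, NA),
  end_lat = c(41.9, NA, 41.8)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Share of rows with at least one missing value.
share_incomplete <- mean(!complete.cases(toy))
print(share_incomplete)
```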
# drop rows with missing station names or IDs
# rows missing only end_lat and end_lng are not dropped explicitly, because the other fields relevant to the study are complete in those rows
clean_total_tripdata <- total_trip_data %>%
  drop_na(start_station_name, start_station_id, end_station_name, end_station_id)
# confirm that no missing values remain
sapply(clean_total_tripdata, function(x) sum(is.na(x)))
## ride_id rideable_type started_at ended_at
## 0 0 0 0
## start_station_name start_station_id end_station_name end_station_id
## 0 0 0 0
## start_lat start_lng end_lat end_lng
## 0 0 0 0
## member_casual
## 0
unique(clean_total_tripdata) # prints the de-duplicated rows; the result is not assigned, so clean_total_tripdata itself is unchanged
## # A tibble: 3,795,608 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 84BFC1F137684EAB classic_bike 2023-12-02 23:12:51 2023-12-02 23:21:01
## 2 EEC92D30A70471E5 classic_bike 2023-12-14 13:43:14 2023-12-14 13:44:14
## 3 1C33464DEEB1F23C electric_bike 2023-12-04 11:57:04 2023-12-04 12:13:59
## 4 E0A61810C305E5EC classic_bike 2023-12-04 09:34:22 2023-12-04 09:35:56
## 5 0706CEB2E1924F3D classic_bike 2023-12-04 09:36:27 2023-12-04 09:36:40
## 6 EB09035006DCCB2C electric_bike 2023-12-02 06:06:32 2023-12-02 06:09:06
## 7 81EE8687F217E531 classic_bike 2023-12-27 23:55:45 2023-12-28 01:43:13
## 8 2C519D5FC6290C41 electric_bike 2023-12-02 13:08:54 2023-12-02 13:14:45
## 9 BACE7E3BCE0919A8 electric_bike 2023-12-24 07:38:07 2023-12-24 07:45:46
## 10 DCCFC2DE81C0B1F9 electric_bike 2023-12-25 10:23:13 2023-12-25 10:25:53
## # ℹ 3,795,598 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
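`unique()` returns the de-duplicated rows but leaves its input unchanged unless the result is assigned. The skim summary earlier reports 5,289,777 unique `ride_id` values among 6,045,627 rows, so counting repeated IDs directly may be worthwhile. The sketch below shows one way to do that on made-up data; `toy`, `n_duplicates`, and `deduped` are hypothetical names, and keeping the first row per `ride_id` is an assumption about the desired behavior.

```r
library(dplyr)

# Toy data with one repeated ride_id (values are made up).
toy <- tibble(
  ride_id = c("A", "B", "B", "C"),
  member_casual = c("member", "casual", "casual", "member")
)

# Number of rows whose ride_id already appeared earlier in the table.
n_duplicates <- sum(duplicated(toy$ride_id))

# Keep the first row for each ride_id and store the result.
deduped <- distinct(toy, ride_id, .keep_all = TRUE)
```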
# extract year, month, day of month, day of week, and start hour; also calculate ride length
trip_data <- clean_total_tripdata %>%
  mutate(year = format(as.Date(started_at), "%Y")) %>% # extract year
  mutate(month = format(as.Date(started_at), "%B")) %>% # extract month name
  mutate(date = format(as.Date(started_at), "%d")) %>% # extract day of month
  mutate(day_of_week = format(as.Date(started_at), "%A")) %>% # extract day of week
  mutate(ride_length = difftime(ended_at, started_at)) %>% # ride duration; difftime() chooses the units (seconds here)
  mutate(start_time = strftime(started_at, "%H")) # start hour; strftime() uses the session's local time zone
trip_data <- trip_data %>%
  mutate(ride_length = as.numeric(ride_length))
is.numeric(trip_data$ride_length) # check that it is in the right format
## [1] TRUE
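`difftime()` picks its units automatically unless `units` is given, and `strftime()` formats in the session's local time zone, so the same code can give different results on different machines. The sketch below is one possible way to make both explicit, using the first two rides shown in the output above. The names `toy` and `start_hour`, and the choice of UTC as the time zone, are assumptions for illustration.

```r
library(dplyr)
library(lubridate)

# Two rides taken from the printed output above; UTC is an assumed time zone.
toy <- tibble(
  started_at = ymd_hms(c("2023-12-02 23:12:51", "2023-12-14 13:43:14"), tz = "UTC"),
  ended_at   = ymd_hms(c("2023-12-02 23:21:01", "2023-12-14 13:44:14"), tz = "UTC")
)

toy <- toy %>%
  mutate(
    # Ride length in seconds, with the unit stated explicitly.
    ride_length = as.numeric(difftime(ended_at, started_at, units = "secs")),
    # Hour of day (0-23) in the time zone attached to started_at.
    start_hour = hour(started_at),
    # Day of week as an ordered factor, Monday first.
    day_of_week = wday(started_at, label = TRUE, abbr = FALSE, week_start = 1)
  )
```

Here `ride_length` is 490 and 60 seconds, matching the first two values in the `str()` output of the cleaned data.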
# keep only rides longer than 1 second, which also removes zero and negative ride lengths
clean_trip_data <- filter(trip_data, ride_length > 1)
str(clean_trip_data)
## tibble [4,336,182 × 19] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4336182] "84BFC1F137684EAB" "EEC92D30A70471E5" "1C33464DEEB1F23C" "E0A61810C305E5EC" ...
## $ rideable_type : chr [1:4336182] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:4336182], format: "2023-12-02 23:12:51" "2023-12-14 13:43:14" ...
## $ ended_at : POSIXct[1:4336182], format: "2023-12-02 23:21:01" "2023-12-14 13:44:14" ...
## $ start_station_name: chr [1:4336182] "DuSable Museum" "California Ave & Division St" "Chicago State University" "Cottage Grove Ave & 51st St" ...
## $ start_station_id : chr [1:4336182] "KA1503000075" "13256" "20106" "TA1309000067" ...
## $ end_station_name : chr [1:4336182] "Cottage Grove Ave & 51st St" "California Ave & Division St" "Chicago State University" "Cottage Grove Ave & 51st St" ...
## $ end_station_id : chr [1:4336182] "TA1309000067" "13256" "20106" "TA1309000067" ...
## $ start_lat : num [1:4336182] 41.8 41.9 41.7 41.8 41.8 ...
## $ start_lng : num [1:4336182] -87.6 -87.7 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:4336182] 41.8 41.9 41.7 41.8 41.8 ...
## $ end_lng : num [1:4336182] -87.6 -87.7 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:4336182] "member" "casual" "casual" "casual" ...
## $ year : chr [1:4336182] "2023" "2023" "2023" "2023" ...
## $ month : chr [1:4336182] "December" "December" "December" "December" ...
## $ date : chr [1:4336182] "02" "14" "04" "04" ...
## $ day_of_week : chr [1:4336182] "Saturday" "Thursday" "Monday" "Monday" ...
## $ ride_length : num [1:4336182] 490 60 1015 94 13 ...
## $ start_time : chr [1:4336182] "00" "14" "12" "10" ...
#to have a view of the data
#checking details of the cleaned data set, in summary
summary(clean_trip_data)
## ride_id rideable_type started_at
## Length:4336182 Length:4336182 Min. :2023-12-01 00:00:20.00
## Class :character Class :character 1st Qu.:2024-05-08 05:15:29.00
## Mode :character Mode :character Median :2024-07-15 14:12:23.23
## Mean :2024-06-26 21:18:35.27
## 3rd Qu.:2024-08-23 14:54:12.25
## Max. :2024-11-30 23:50:53.45
## ended_at start_station_name start_station_id
## Min. :2023-12-01 00:05:59.00 Length:4336182 Length:4336182
## 1st Qu.:2024-05-08 05:27:56.75 Class :character Class :character
## Median :2024-07-15 14:30:30.95 Mode :character Mode :character
## Mean :2024-06-26 21:35:30.93
## 3rd Qu.:2024-08-23 15:12:52.67
## Max. :2024-11-30 23:57:43.00
## end_station_name end_station_id start_lat start_lng
## Length:4336182 Length:4336182 Min. :41.65 Min. :-87.86
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.89 Median :-87.64
## Mean :41.90 Mean :-87.64
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.06 Max. :-87.53
## end_lat end_lng member_casual year
## Min. :41.65 Min. :-87.84 Length:4336182 Length:4336182
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character Class :character
## Median :41.90 Median :-87.64 Mode :character Mode :character
## Mean :41.90 Mean :-87.64
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.06 Max. :-87.53
## month date day_of_week ride_length
## Length:4336182 Length:4336182 Length:4336182 Min. : 1.01
## Class :character Class :character Class :character 1st Qu.: 355.66
## Mode :character Mode :character Mode :character Median : 618.76
## Mean : 1015.65
## 3rd Qu.: 1112.30
## Max. :90562.00
## start_time
## Length:4336182
## Class :character
## Mode :character
##
##
##
# descriptive analysis of the data
# ride length analysis (ride_length is in seconds)
# mean of ride length = average ride duration
# median of ride length = midpoint of the ride durations
# max ride length = longest ride
# min ride length = shortest ride
clean_trip_data %>%
summarize(average_ride_length = mean(ride_length), median_ride_length = median(ride_length), max_ride_length = max(ride_length), min_ride_length = min(ride_length))
## # A tibble: 1 × 4
## average_ride_length median_ride_length max_ride_length min_ride_length
## <dbl> <dbl> <dbl> <dbl>
## 1 1016. 619. 90562 1.01
clean_trip_data %>%
group_by(member_casual) %>%
summarize(count = n()) %>%
mutate(percentage = count/sum(count)*100)
## # A tibble: 2 × 3
## member_casual count percentage
## <chr> <int> <dbl>
## 1 casual 1598947 36.9
## 2 member 2737235 63.1
#member_casual is the customer type.
The average ride length is about 1,016 seconds (roughly 17 minutes). About 63% of rides were taken by customers with an annual membership, while the remaining 37% were taken by casual riders who bought single-ride or day passes.
# finding the trends in how different customers ride the bikes
# analyze the frequency and pattern of rides daily and monthly
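Because `ride_length` is stored in seconds, dividing by 60 gives minutes, and the share per rider type is each group's count divided by the total. The sketch below shows that arithmetic per rider type on made-up data; `toy`, `rider_summary`, and the ride lengths are hypothetical.

```r
library(dplyr)

toy <- tibble(
  member_casual = c("casual", "member", "member", "casual", "member"),
  ride_length = c(1200, 600, 300, 1800, 900)  # seconds, made-up values
)

rider_summary <- toy %>%
  group_by(member_casual) %>%
  summarize(
    rides = n(),
    mean_minutes = mean(ride_length) / 60,  # seconds to minutes
    .groups = "drop"
  ) %>%
  # Share of all rides taken by each rider type.
  mutate(percentage = rides / sum(rides) * 100)
```

Applied to the overall mean reported above, 1015.65 / 60 is about 16.9 minutes.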
clean_trip_data %>%
group_by(month, member_casual) %>%
summarize(mean(ride_length), min(ride_length), max(ride_length))
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
## # A tibble: 24 × 5
## # Groups: month [12]
## month member_casual `mean(ride_length)` `min(ride_length)` `max(ride_length)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 April casual 1487. 2 89613
## 2 April member 737. 2 89007
## 3 Augu… casual 1486. 1.06 89853.
## 4 Augu… member 788. 1.01 86887.
## 5 Dece… casual 992. 2 84885
## 6 Dece… member 648. 2 89668
## 7 Febr… casual 1190. 2 89100
## 8 Febr… member 705. 2 89859
## 9 Janu… casual 932. 2 88737
## 10 Janu… member 694. 2 89839
## # ℹ 14 more rows
#for daily trends
clean_trip_data %>%
group_by(day_of_week, member_casual) %>%
summarize(mean(ride_length), min(ride_length), max(ride_length))
## `summarise()` has grouped output by 'day_of_week'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 5
## # Groups: day_of_week [7]
## day_of_week member_casual `mean(ride_length)` `min(ride_length)`
## <chr> <chr> <dbl> <dbl>
## 1 Friday casual 1407. 1.18
## 2 Friday member 733. 1.10
## 3 Monday casual 1406. 1.07
## 4 Monday member 720. 1.03
## 5 Saturday casual 1642. 1.03
## 6 Saturday member 848. 1.02
## 7 Sunday casual 1658. 1.01
## 8 Sunday member 849. 1.07
## 9 Thursday casual 1276. 1.03
## 10 Thursday member 723. 1.01
## 11 Tuesday casual 1262. 1.08
## 12 Tuesday member 721. 1.01
## 13 Wednesday casual 1320. 1.05
## 14 Wednesday member 740. 1.03
## # ℹ 1 more variable: `max(ride_length)` <dbl>
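The monthly and daily tables above are sorted alphabetically because `month` and `day_of_week` are character columns. Converting them to factors with explicit levels is one way to get chronological order, and `.groups = "drop"` silences the grouping message. This is a sketch on made-up data; `toy`, `day_levels`, `daily_summary`, and the choice of Monday as the first day are assumptions.

```r
library(dplyr)

toy <- tibble(
  day_of_week = c("Monday", "Sunday", "Friday", "Monday"),
  member_casual = c("casual", "casual", "member", "member"),
  ride_length = c(1400, 1650, 730, 720)  # seconds, made-up values
)

# Desired display order; Monday first is an arbitrary choice.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

daily_summary <- toy %>%
  # A factor sorts by its level order instead of alphabetically.
  mutate(day_of_week = factor(day_of_week, levels = day_levels)) %>%
  group_by(day_of_week, member_casual) %>%
  summarize(mean_ride_length = mean(ride_length), .groups = "drop")
```

The same pattern would apply to `month`, with the levels listed from December through November to match the period studied.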