Google Capstone: Cyclist Case Study
2024-05-14
Introduction
Source: Google Capstone Bike Case Study
This case study is part of my Google Data Analytics Professional Certificate, which I completed through Coursera. As part of my final course, I will use the R programming language to conduct the analysis.
The case study follows the six crucial steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.
Scenario
I am a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team and I want to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, we will design a new marketing strategy to convert casual riders into annual members. However, before we proceed, Cyclistic executives must approve our recommendations, so they need to be backed up with compelling data insights and professional data visualizations.
1. Ask Phase
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
| Case Study Roadmap - Ask |
|---|
| Guiding questions |
| Key tasks |
| Deliverable |
2. Preparation Phase
The data I used has been made available by Motivate International Inc. under this license. The datasets, covering the previous 12 months of data, are available here.
| Key Tasks |
|---|
|
3. Process Phase
Installing and loading packages
options(repos = c(CRAN = "https://cran.r-project.org"))
install.packages("tidyverse", repos = "https://cran.r-project.org")
## Installing package into 'C:/Users/Guest123/AppData/Local/R/win-library/4.4'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Guest123\AppData\Local\Temp\RtmpIrOaKA\downloaded_packages
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("dplyr")
library(skimr)
library(ggplot2)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
Importing and Loading the data in R
read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 190,301 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## 7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
## 8 DB1CF84154D6A049 classic_bike 2023-01-25 10:49:01 2023-01-25 10:58:22
## 9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike 2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 190,291 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
jan_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
feb_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mar_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
apr_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
may_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jun_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jul_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202307-divvy-tripdata.csv")
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
aug_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202308-divvy-tripdata.csv")
## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sep_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202309-divvy-tripdata.csv")
## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
oct_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202310-divvy-tripdata.csv")
## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nov_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202311-divvy-tripdata.csv")
## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dec_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202312-divvy-tripdata.csv")
## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Combining the Cyclistic trip data for the separate months into one data frame named combined_trips.
combined_trips <- rbind(jan_ride, feb_ride, mar_ride, apr_ride, may_ride, jun_ride, jul_ride, aug_ride, sep_ride, oct_ride, nov_ride, dec_ride)
Checking the structure of the new data frame after combining the data
str(combined_trips)
## spc_tbl_ [5,719,877 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:5719877] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
## $ rideable_type : chr [1:5719877] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:5719877], format: "2023-01-21 20:05:42" "2023-01-10 15:37:36" ...
## $ ended_at : POSIXct[1:5719877], format: "2023-01-21 20:16:33" "2023-01-10 15:46:05" ...
## $ start_station_name: chr [1:5719877] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
## $ start_station_id : chr [1:5719877] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
## $ end_station_name : chr [1:5719877] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
## $ end_station_id : chr [1:5719877] "202480.0" "TA1308000002" "599" "TA1308000002" ...
## $ start_lat : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
## $ start_lng : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
## $ end_lng : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:5719877] "member" "member" "casual" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
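The twelve separate imports and the rbind() call above could also be written as one read-and-stack step. The following is only a minimal sketch under assumptions: it uses base R so it runs without extra packages, and the folder `divvy_demo`, the two tiny files, and the name `demo_trips` are hypothetical stand-ins, not the real Divvy data.

```r
# Hypothetical demo folder with two tiny monthly files (not the real Divvy data)
csv_dir <- file.path(tempdir(), "divvy_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(csv_dir, "202301-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(csv_dir, "202302-divvy-tripdata.csv"), row.names = FALSE)

# Find every monthly file; sorting keeps them in calendar order (YYYYMM prefix)
csv_files <- sort(list.files(csv_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE))

# Read each file into a list, then stack all data frames row-wise
monthly_list <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
demo_trips <- do.call(rbind, monthly_list)

nrow(demo_trips)  # 3 rows: 2 from the first file, 1 from the second
```

With the real files, pointing `csv_dir` at the CSV folder would replace the demo setup. This only works because every monthly file shares the same 13 columns, as the column specifications above show.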
Checking for the first 10 rows
as_tibble(combined_trips)
## # A tibble: 5,719,877 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## 7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
## 8 DB1CF84154D6A049 classic_bike 2023-01-25 10:49:01 2023-01-25 10:58:22
## 9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike 2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 5,719,867 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
The started_at and ended_at columns need to be stored as date-times. The str() output above shows that read_csv() already parsed them as POSIXct, so the conversion below acts as a safeguard that enforces the %Y-%m-%d %H:%M:%S format.
combined_trips$started_at <- strptime(combined_trips$started_at, "%Y-%m-%d %H:%M:%S")
combined_trips$ended_at <- strptime(combined_trips$ended_at, "%Y-%m-%d %H:%M:%S")
Let’s now make the data ready for analysis by adding columns for ride_length and day_of_week.
combined_trips <- mutate(combined_trips, ride_length = difftime(ended_at, started_at, units = "secs"))
combined_trips$day_of_week <- format(as.Date(combined_trips$started_at), "%A")
Let’s see the columns created
head(combined_trips)
## # A tibble: 6 × 15
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike 2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike 2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike 2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## # ℹ 11 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>,
## # ride_length <drtn>, day_of_week <chr>
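To make the two derived columns concrete, here is a small worked sketch on the first two rides shown in the output above. The timestamps are copied from that output; the UTC time zone and the variable names are assumptions for illustration only.

```r
# First two rides from the preview above (time zone assumed to be UTC)
started_at <- as.POSIXct(c("2023-01-21 20:05:42", "2023-01-10 15:37:36"), tz = "UTC")
ended_at   <- as.POSIXct(c("2023-01-21 20:16:33", "2023-01-10 15:46:05"), tz = "UTC")

# Ride length in seconds, the same difftime() call used in the analysis
ride_length <- difftime(ended_at, started_at, units = "secs")
as.numeric(ride_length)  # 651 509

# Weekday name as in the analysis; the text returned depends on the R locale
day_of_week <- format(as.Date(started_at), "%A")

# Locale-independent weekday index (0 = Sunday, ..., 6 = Saturday)
weekday_index <- as.POSIXlt(started_at)$wday
weekday_index  # 6 2
```

The first ride lasts 10 minutes 51 seconds (651 seconds) and falls on a Saturday; the second lasts 8 minutes 29 seconds (509 seconds) and falls on a Tuesday.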
I’m filtering out rides whose ride_length is 0 seconds or less; we don’t want those to be counted.
combined_trips <- filter(combined_trips, ride_length > 0)
After filtering out these invalid rides, I can remove any rows with null or missing values that may alter the analysis
combined_trips <- combined_trips %>%
  na.omit()
Now I can add a month column to identify the month of each ride
combined_trips$month <- format(as.Date(combined_trips$started_at), "%m")
4. Analyzing Phase
Cleaned Data
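As a quick check on the two cleaning steps (the ride_length filter and na.omit()), the sketch below applies them to a four-row toy table. The table, its values, and the names `toy` and `kept` are made up for illustration. In the real data the two steps together reduce the table from 5,719,877 rows to 4,331,106, which is the sum of the total_rides figures reported in the summaries that follow.

```r
# Toy table (made-up values): one valid ride, one zero-length ride,
# one negative-length ride, and one ride with a missing end station
toy <- data.frame(
  ride_id          = c("r1", "r2", "r3", "r4"),
  ride_length      = c(651, 0, -30, 509),
  end_station_name = c("A", "B", "C", NA),
  stringsAsFactors = FALSE
)

# Step 1: keep only rides with a positive length
kept <- toy[toy$ride_length > 0, ]

# Step 2: drop every row that still contains a missing value
kept <- na.omit(kept)

kept$ride_id  # "r1"
```

Only r1 survives: r2 and r3 fail the length filter, and r4 is dropped by na.omit() because its end station is missing.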
Determining the average ride_length for member_casual
combined_trips %>%
  group_by(member_casual) %>%
  summarise(average_ride_length = mean(ride_length))
## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <drtn>
## 1 casual 1376.4150 secs
## 2 member 727.9566 secs
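The averages above are in seconds, which is hard to read at a glance. A short conversion sketch, using only the two figures printed above (the variable names are illustrative):

```r
# Average ride lengths in seconds, copied from the summary above
avg_secs <- c(casual = 1376.4150, member = 727.9566)

# Convert to minutes
avg_mins <- round(avg_secs / 60, 2)
avg_mins  # casual 22.94, member 12.13

# How many times longer the average casual ride is
ratio <- round(avg_secs[["casual"]] / avg_secs[["member"]], 2)
ratio  # 1.89
```

So the average casual ride is about 23 minutes against about 12 minutes for members, roughly 1.9 times as long.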
Determining the median, min, max ride length and total_rides for members and casual
combined_trips %>%
  group_by(member_casual) %>%
  summarise(median_ride_length = median(ride_length), min_ride_length = min(ride_length), max_ride_length = max(ride_length), total_rides = length(ride_id))
## # A tibble: 2 × 5
## member_casual median_ride_length min_ride_length max_ride_length total_rides
## <chr> <drtn> <drtn> <drtn> <int>
## 1 casual 765 secs 1 secs 728178 secs 1531517
## 2 member 517 secs 1 secs 89872 secs 2799589
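The casual mean (about 1376 seconds) sits well above the casual median (765 seconds), which suggests a few very long rides are pulling the mean upward. The sketch below shows the effect on a made-up vector of five rides; only the largest value, 728178 seconds, is taken from the maximum reported above.

```r
# Four ordinary rides (made-up values) plus the longest casual ride reported above
ride_secs <- c(500, 700, 765, 800, 728178)

mean(ride_secs)    # 146188.6, dominated by the single extreme ride
median(ride_secs)  # 765, unaffected by the extreme ride

# The longest casual ride expressed in days
728178 / 86400     # about 8.43 days
```

This is why the median is reported alongside the mean: it is a more robust description of a typical ride when such outliers are present.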
Calculating the average_ride_length and total_rides by member_casual and day_of_week
combined_trips %>%
  group_by(member_casual, day_of_week) %>%
  summarise(average_ride_length = mean(ride_length), total_rides = length(ride_id))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups: member_casual [2]
## member_casual day_of_week average_ride_length total_rides
## <chr> <chr> <drtn> <int>
## 1 casual Friday 1339.3150 secs 227826
## 2 casual Monday 1352.1903 secs 175381
## 3 casual Saturday 1555.1620 secs 310123
## 4 casual Sunday 1594.1168 secs 254710
## 5 casual Thursday 1199.9962 secs 198904
## 6 casual Tuesday 1230.8860 secs 181510
## 7 casual Wednesday 1176.0574 secs 183063
## 8 member Friday 722.4142 secs 400467
## 9 member Monday 693.0606 secs 386648
## 10 member Saturday 815.0762 secs 350592
## 11 member Sunday 816.8686 secs 307818
## 12 member Thursday 696.1708 secs 452609
## 13 member Tuesday 698.9975 secs 448778
## 14 member Wednesday 695.2239 secs 452677
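One way to read the table above is to ask what share of each group's rides falls on a weekend. The sketch below recomputes that share from the printed total_rides values; the data frame and variable names are illustrative, and the counts are copied from the output above.

```r
# Ride counts copied from the day-of-week summary above
total_rides <- data.frame(
  member_casual = rep(c("casual", "member"), each = 7),
  day_of_week   = rep(c("Friday", "Monday", "Saturday", "Sunday",
                        "Thursday", "Tuesday", "Wednesday"), times = 2),
  rides = c(227826, 175381, 310123, 254710, 198904, 181510, 183063,
            400467, 386648, 350592, 307818, 452609, 448778, 452677)
)

# Sum weekend rides and all rides per rider type
is_weekend <- total_rides$day_of_week %in% c("Saturday", "Sunday")
weekend_rides <- tapply(total_rides$rides[is_weekend],
                        total_rides$member_casual[is_weekend], sum)
all_rides <- tapply(total_rides$rides, total_rides$member_casual, sum)

# Weekend share in percent
weekend_share <- round(100 * weekend_rides / all_rides, 1)
weekend_share  # casual 36.9, member 23.5
```

About 37% of casual rides take place on Saturday or Sunday, compared with about 24% of member rides.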
5. Sharing Phase
Sharing the findings by comparing the total number of rides among casual and member riders
library(dplyr)
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
combined_trips <- combined_trips %>%
mutate(day_of_week = factor(weekdays(as.Date(started_at)),
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")))
# Summarize total rides by membership type and day of the week
weekly_summary <- combined_trips %>%
group_by(member_casual, day_of_week) %>%
summarise(total_rides = n(), .groups = 'drop')
# Plot the data
ggplot(weekly_summary, aes(x = day_of_week, y = total_rides, fill = member_casual)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = comma) + # Use comma format for y-axis
scale_fill_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) + # Professional color scheme
labs(x = "Day of the Week", y = "Total Rides", title = "Total Rides by Day of the Week and Membership Type", caption = "Made by Kliz John Millares") +
theme_minimal() +
theme(legend.position = "right")
Total number of bike rides per month categorized by membership type
library(lubridate)
combined_trips <- combined_trips %>%
mutate(month = factor(month(started_at, label = TRUE, abbr = FALSE),
levels = month.name))
# Summarize total rides by membership type and month
monthly_summary <- combined_trips %>%
group_by(member_casual, month) %>%
summarise(total_rides = n(), .groups = 'drop')
# Plot the data
ggplot(monthly_summary, aes(x = month, y = total_rides, fill = member_casual)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = comma) + # Use comma format for y-axis
scale_fill_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) + # Professional color scheme
labs(x = "Month", y = "Total Rides", title = "Total Rides by Month and Membership Type", caption = "Made by Kliz John Millares") +
theme_minimal() +
theme(legend.position = "right")
This next visualization helps in understanding the hourly patterns of
bike usage throughout the week, allowing you to identify peak hours for
each day.
combined_trips <- combined_trips %>%
mutate(day_of_week = factor(weekdays(as.Date(started_at)),
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")),
hour = hour(started_at))
# Summarize total rides by day of the week and hour
hourly_summary <- combined_trips %>%
group_by(day_of_week, hour) %>%
summarise(total_rides = n(), .groups = 'drop')
# Plot the data
ggplot(hourly_summary, aes(x = hour, y = total_rides, color = day_of_week, group = day_of_week)) +
geom_line(linewidth = 1) + # linewidth replaces the size aesthetic for lines, deprecated in ggplot2 3.4.0
scale_y_continuous(labels = comma) + # Use comma format for y-axis
scale_color_manual(values = c("Sunday" = "#E41A1C", "Monday" = "#377EB8", "Tuesday" = "#4DAF4A",
"Wednesday" = "#984EA3", "Thursday" = "#FF7F00", "Friday" = "#FFFF33",
"Saturday" = "#A65628")) + # Professional color scheme
labs(x = "Hour of the Day", y = "Total Rides", title = "Hourly Bike Usage Throughout the Week", caption = "Made by Kliz Andrei Millares") +
theme_minimal() +
theme(legend.position = "right")
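Besides reading peak hours off the chart, they can be pulled out of a summary table directly. The sketch below is a base R illustration on a made-up table with the same columns as hourly_summary; the counts and the names `hourly_demo` and `peak_hours` are hypothetical.

```r
# Toy hourly summary (made-up counts, not the real Divvy totals)
hourly_demo <- data.frame(
  day_of_week = c("Monday", "Monday", "Monday", "Saturday", "Saturday", "Saturday"),
  hour        = c(8, 17, 23, 8, 14, 23),
  total_rides = c(120, 200, 15, 40, 180, 30)
)

# Split the table by day, then keep the row with the largest ride count in each piece
peak_rows <- lapply(split(hourly_demo, hourly_demo$day_of_week),
                    function(d) d[which.max(d$total_rides), ])
peak_hours <- do.call(rbind, peak_rows)

peak_hours$hour  # 17 for Monday, 14 for Saturday
```

Applied to the real hourly_summary, the same pattern would return one peak hour per weekday.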
I will compare the monthly ride trends between members and casual
users
combined_trips <- combined_trips %>%
mutate(month = factor(month(started_at, label = TRUE, abbr = FALSE),
levels = month.name))
# Summarize total rides by membership type and month
monthly_summary <- combined_trips %>%
group_by(month, member_casual) %>%
summarise(total_rides = n(), .groups = 'drop')
# Plot the data
ggplot(monthly_summary, aes(x = month, y = total_rides, color = member_casual, group = member_casual)) +
geom_line(linewidth = 1) +
labs(x = "Month", y = "Total Rides", title = "Monthly Ride Trends by Membership Type", caption = "Made by Kliz John Millares") +
scale_color_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) +
scale_y_continuous(labels = scales::comma) + # Format y-axis with commas
theme_minimal()
Checking for the usage of rideable type among riders
# Summarize average ride length by membership type and rideable type
average_ride_length_summary <- combined_trips %>%
group_by(member_casual, rideable_type) %>%
summarise(average_ride_length = mean(as.numeric(ride_length, units = "mins")), .groups = 'drop') # ride_length is stored in seconds; convert to minutes to match the axis label
# Plot the data
ggplot(average_ride_length_summary, aes(x = rideable_type, y = average_ride_length, fill = member_casual)) +
geom_col(position = "dodge") +
labs(title = "Average Ride Length vs Rideable Type", x = "Rideable Type", y = "Average Ride Length (minutes)", caption = "Made by Kliz John Millares") +
scale_fill_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) + # Color scheme
theme_minimal()
6. Act Phase
| Findings | Conclusion | Recommendation |
|---|---|---|
| Casual riders have a higher average ride length than members, especially on weekends. | Casual riders likely use bikes for leisure activities on weekends. | Offer special weekend prices for members to encourage casual riders to switch to annual membership. |
| Average ride length for casual riders is consistent during weekdays. | Casual riders likely use bikes for commuting to work on weekdays. | Introduce a weekly pass to attract casual riders to apply for membership. |
| Casual riders tend to prefer classic bikes over electric ones. | There may be outliers affecting the average ride length for docked bikes. | N/A |