Google Capstone: Cyclistic Case Study

2024-05-14

Kliz John Andrei Millares

Introduction

Source:

This case study is the capstone project of my Google Data Analytics Professional Certificate, which I completed through Coursera. For this final course, I use the R programming language to conduct the analysis.

The case study follows the six crucial steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.

Scenario

I am a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team and I want to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, we will design a new marketing strategy to convert casual riders into annual members. However, before we proceed, Cyclistic executives must approve our recommendations, so they need to be backed up with compelling data insights and professional data visualizations.

1. Ask Phase

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Case Study Roadmap - Ask
Guiding questions: What is the problem I’m trying to solve? How can my insights drive business decisions?
Key tasks: Identify the business objective, which is to understand the key differences in bike usage between annual members and casual riders in order to maximize profit by converting casual riders into annual members. Consider key stakeholders: the Director of Marketing (Lily Moreno), the Marketing Analytics team, and the Executive team.
Deliverable: Our goal is to identify how annual members and casual riders use Cyclistic bikes differently.

2. Prepare Phase

The data I used has been made available by Motivate International Inc. under this license. The datasets (previous 12 months of data) are available here.

Key Tasks
Download the data and store it appropriately. The data comes in CSV format. I created two folders for future reference: one for the CSV files and another for .XLSX copies.
Sort and filter the data. For this case study, I use the data for the year 2023.
The data is credible: it comes from a reliable source and is original, comprehensive, current, and cited.

3. Process Phase

Installing and loading packages

options(repos = c(CRAN = "https://cran.r-project.org"))
install.packages("tidyverse", repos = "https://cran.r-project.org")

## Installing package into 'C:/Users/Guest123/AppData/Local/R/win-library/4.4'
## (as 'lib' is unspecified)

## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Guest123\AppData\Local\Temp\RtmpIrOaKA\downloaded_packages

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# dplyr and ggplot2 are already attached by library("tidyverse") above
library(skimr)    # summary statistics for data frames
library(janitor)  # data-cleaning helpers

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Importing and Loading the data in R

read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202301-divvy-tripdata.csv")

## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 190,301 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
##  2 13CB7EB698CEDB88 classic_bike  2023-01-10 15:37:36 2023-01-10 15:46:05
##  3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
##  4 C90792D034FED968 classic_bike  2023-01-22 10:52:58 2023-01-22 11:01:44
##  5 3397017529188E8A classic_bike  2023-01-12 13:58:01 2023-01-12 14:13:20
##  6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
##  7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
##  8 DB1CF84154D6A049 classic_bike  2023-01-25 10:49:01 2023-01-25 10:58:22
##  9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike  2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 190,291 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

jan_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202301-divvy-tripdata.csv")

## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

feb_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202302-divvy-tripdata.csv")

## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

mar_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202303-divvy-tripdata.csv")

## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

apr_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202304-divvy-tripdata.csv")

## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

may_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202305-divvy-tripdata.csv")

## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

jun_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202306-divvy-tripdata.csv")

## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

jul_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202307-divvy-tripdata.csv")

## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

aug_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202308-divvy-tripdata.csv")

## Rows: 771693 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

sep_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202309-divvy-tripdata.csv")

## Rows: 666371 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

oct_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202310-divvy-tripdata.csv")

## Rows: 537113 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

nov_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202311-divvy-tripdata.csv")

## Rows: 362518 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dec_ride <- read_csv("C:/Users/Guest123/Documents/Rstudio Prac/Projects/Bike Case Study/CSV/202312-divvy-tripdata.csv")

## Rows: 224073 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
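The twelve read_csv() calls above differ only in the month number, so the file names could also be generated and read in a loop. Below is a minimal base R sketch of that idea. It assumes the files follow the `2023MM-divvy-tripdata.csv` naming pattern shown above and sit in a single folder. The names `data_dir`, `read_month()` and `combined_sketch` are hypothetical. So that the sketch runs on its own, it creates tiny stand-in files (two rides each) in a temporary folder.

```r
# Hypothetical folder; a temporary directory stands in for the CSV folder
data_dir <- file.path(tempdir(), "divvy_csv")
dir.create(data_dir, showWarnings = FALSE)

# File names follow the pattern used above: 202301 ... 202312
file_names <- sprintf("2023%02d-divvy-tripdata.csv", 1:12)

# Write tiny stand-in files (two rides each) so the sketch is self-contained
for (i in seq_along(file_names)) {
  stand_in <- data.frame(
    ride_id = sprintf("ride_%02d_%d", i, 1:2),
    member_casual = c("member", "casual")
  )
  write.csv(stand_in, file.path(data_dir, file_names[i]), row.names = FALSE)
}

# Hypothetical helper: read one month's file as a data frame
read_month <- function(file_name) {
  read.csv(file.path(data_dir, file_name), stringsAsFactors = FALSE)
}

# Read all twelve months into a list, then stack them row-wise
monthly_list <- lapply(file_names, read_month)
combined_sketch <- do.call(rbind, monthly_list)

print(nrow(combined_sketch))  # 12 files x 2 rows = 24
```

With readr 2.0 or later, read_csv() also accepts a vector of file paths and stacks the files itself. That may be simpler than the loop when staying within the tidyverse.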

Combining the Cyclistic trip data for the separate months into one data frame named combined_trips.

combined_trips <- rbind(jan_ride,feb_ride,mar_ride,apr_ride,may_ride,jun_ride,jul_ride,aug_ride,sep_ride,oct_ride,nov_ride,dec_ride)

Checking the structure of the new data frame after combining the data

str(combined_trips)

## spc_tbl_ [5,719,877 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5719877] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
##  $ rideable_type     : chr [1:5719877] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : POSIXct[1:5719877], format: "2023-01-21 20:05:42" "2023-01-10 15:37:36" ...
##  $ ended_at          : POSIXct[1:5719877], format: "2023-01-21 20:16:33" "2023-01-10 15:46:05" ...
##  $ start_station_name: chr [1:5719877] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
##  $ start_station_id  : chr [1:5719877] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
##  $ end_station_name  : chr [1:5719877] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
##  $ end_station_id    : chr [1:5719877] "202480.0" "TA1308000002" "599" "TA1308000002" ...
##  $ start_lat         : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
##  $ start_lng         : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num [1:5719877] 41.9 41.8 42 41.8 41.8 ...
##  $ end_lng           : num [1:5719877] -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr [1:5719877] "member" "member" "casual" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
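Here is a quick sanity check on the combining step. The row counts that read_csv() printed for the twelve months should add up to the 5,719,877 rows that str() reports, which confirms that rbind() neither dropped nor duplicated rows. The sketch simply re-adds the logged counts. In a live session, comparing nrow(combined_trips) with the sum of nrow() over the twelve monthly data frames does the same job.

```r
# Row counts reported by read_csv() for each 2023 monthly file
monthly_rows <- c(
  Jan = 190301, Feb = 190445, Mar = 258678, Apr = 426590,
  May = 604827, Jun = 719618, Jul = 767650, Aug = 771693,
  Sep = 666371, Oct = 537113, Nov = 362518, Dec = 224073
)

# The combined data frame should contain exactly this many rows
total_rows <- sum(monthly_rows)
print(total_rows)  # 5719877, matching str(combined_trips)
```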

Previewing the first 10 rows of the combined data

as_tibble(combined_trips)

## # A tibble: 5,719,877 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
##  2 13CB7EB698CEDB88 classic_bike  2023-01-10 15:37:36 2023-01-10 15:46:05
##  3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
##  4 C90792D034FED968 classic_bike  2023-01-22 10:52:58 2023-01-22 11:01:44
##  5 3397017529188E8A classic_bike  2023-01-12 13:58:01 2023-01-12 14:13:20
##  6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
##  7 2F7194B6012A98D4 electric_bike 2023-01-15 21:18:36 2023-01-15 21:32:36
##  8 DB1CF84154D6A049 classic_bike  2023-01-25 10:49:01 2023-01-25 10:58:22
##  9 34EAB943F88C4C5D electric_bike 2023-01-25 20:49:47 2023-01-25 21:02:14
## 10 BC8AB1AA51DA9115 classic_bike  2023-01-06 16:37:19 2023-01-06 16:49:52
## # ℹ 5,719,867 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

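To calculate ride durations, the started_at and ended_at columns must be in a date-time format. read_csv() already parsed both columns as date-times (dttm), so no string parsing is needed. I coerce them explicitly with as.POSIXct() only as a safeguard. strptime() would return POSIXlt objects, which store each date-time as a list of components and are less convenient as data-frame columns.

combined_trips$started_at <- as.POSIXct(combined_trips$started_at)
combined_trips$ended_at <- as.POSIXct(combined_trips$ended_at)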
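If the timestamps had arrived as text (for example, after reading the files with read.csv()), they could be parsed straight into POSIXct with an explicit format. Below is a minimal sketch using the first two rides from the January preview. The choice of tz = "UTC" is an assumption that mirrors readr's default time zone, since the files themselves do not state one.

```r
# Start and end times of the first two rides in the January preview, as text
started_chr <- c("2023-01-21 20:05:42", "2023-01-10 15:37:36")
ended_chr   <- c("2023-01-21 20:16:33", "2023-01-10 15:46:05")

# as.POSIXct() with an explicit format returns POSIXct (not POSIXlt);
# tz = "UTC" is an assumed time zone matching readr's default
started_at <- as.POSIXct(started_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
ended_at   <- as.POSIXct(ended_chr,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(class(started_at))  # "POSIXct" "POSIXt"
print(difftime(ended_at, started_at, units = "secs"))
```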

Next, I prepare the data for analysis by adding two columns: ride_length (the trip duration in seconds) and day_of_week (the weekday on which the ride started).

combined_trips<-mutate(combined_trips,ride_length=difftime(ended_at,started_at, units = "secs"))
combined_trips$day_of_week<-format(as.Date(combined_trips$started_at),"%A")
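To see what the two new columns contain, here is the same computation in base R on the first three rides of the January preview. format(..., "%A") returns day names in the session's locale, so English day names assume an English or C locale.

```r
# First three rides from the January preview (parsed as UTC, like readr)
rides <- data.frame(
  started_at = as.POSIXct(c("2023-01-21 20:05:42", "2023-01-10 15:37:36",
                            "2023-01-02 07:51:57"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2023-01-21 20:16:33", "2023-01-10 15:46:05",
                            "2023-01-02 08:05:11"), tz = "UTC")
)

# ride_length: end time minus start time, expressed in seconds
rides$ride_length <- difftime(rides$ended_at, rides$started_at, units = "secs")

# day_of_week: full weekday name of the start date (locale-dependent)
rides$day_of_week <- format(as.Date(rides$started_at), "%A")

print(rides)
```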

Let’s check the new columns:

head(combined_trips)

## # A tibble: 6 × 15
##   ride_id          rideable_type started_at          ended_at           
##   <chr>            <chr>         <dttm>              <dttm>             
## 1 F96D5A74A3E41399 electric_bike 2023-01-21 20:05:42 2023-01-21 20:16:33
## 2 13CB7EB698CEDB88 classic_bike  2023-01-10 15:37:36 2023-01-10 15:46:05
## 3 BD88A2E670661CE5 electric_bike 2023-01-02 07:51:57 2023-01-02 08:05:11
## 4 C90792D034FED968 classic_bike  2023-01-22 10:52:58 2023-01-22 11:01:44
## 5 3397017529188E8A classic_bike  2023-01-12 13:58:01 2023-01-12 14:13:20
## 6 58E68156DAE3E311 electric_bike 2023-01-31 07:18:03 2023-01-31 07:21:16
## # ℹ 11 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>,
## #   ride_length <drtn>, day_of_week <chr>

I’m filtering out rides with a ride_length of 0 seconds or less. These are not valid trips, so they shouldn’t be counted.

combined_trips <- filter(combined_trips, ride_length > 0)

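After removing the invalid ride lengths, I remove rows with null, missing, or blank values that could distort the analysis. na.omit() drops a row if any of its columns is NA, and read_csv() reads blank fields as NA by default.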

combined_trips <-combined_trips%>% 
    na.omit()
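Because na.omit() drops a whole row when any column (including station names and IDs) is missing, it is worth checking how many rows each cleaning step removes. The totals reported later (1,531,517 casual + 2,799,589 member rides = 4,331,106 of the original 5,719,877) show that the two steps together remove about 1.39 million rows. The sketch below uses an invented five-row data frame to show one way of counting the rows removed at each step. Only the column names come from the trip data; all values are made up.

```r
# Invented five-ride example; NA marks a missing station name
trips <- data.frame(
  ride_length        = c(600, 0, 420, -5, 900),
  start_station_name = c("Station A", "Station B", NA, "Station C", "Station D"),
  member_casual      = c("member", "casual", "casual", "member", "casual")
)

# Step 1: keep only strictly positive ride lengths, like filter(ride_length > 0)
step1 <- trips[trips$ride_length > 0, ]
removed_nonpositive <- nrow(trips) - nrow(step1)

# Missing values per column before dropping them
print(colSums(is.na(step1)))

# Step 2: drop any row that still contains an NA
step2 <- na.omit(step1)
removed_na <- nrow(step1) - nrow(step2)

cat("Removed (ride_length <= 0):", removed_nonpositive, "\n")
cat("Removed (missing values):", removed_na, "\n")
```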

Finally, I add a month column (the two-digit month number, for example "01" for January) so that rides can be grouped by month.

combined_trips$month<-format(as.Date(combined_trips$started_at),"%m")
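format(..., "%m") stores the month as a zero-padded string ("01" to "12"). Such strings sort correctly but are not very readable in charts. The base R sketch below uses made-up dates to turn that string into an ordered factor of English month names, via R's built-in month.name constant. This mirrors the later lubridate-based month step without depending on the session locale.

```r
# Made-up ride start times
started_at <- as.POSIXct(c("2023-01-21 20:05:42", "2023-07-04 09:00:00",
                           "2023-12-31 23:59:59"), tz = "UTC")

# Two-digit month string, as produced by format(as.Date(...), "%m")
month_num <- format(as.Date(started_at), "%m")
print(month_num)  # "01" "07" "12"

# Map to English month names and keep January-to-December order for plotting
month_label <- factor(month.name[as.integer(month_num)], levels = month.name)
print(as.character(month_label))  # "January" "July" "December"
```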

4. Analyze Phase

Cleaned Data

Determining the average ride_length for each rider type (member_casual)

combined_trips %>% 
  group_by(member_casual) %>% 
  summarise(average_ride_length = mean(ride_length))

## # A tibble: 2 × 2
##   member_casual average_ride_length
##   <chr>         <drtn>             
## 1 casual        1376.4150 secs     
## 2 member         727.9566 secs
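The averages above are in seconds. In minutes they are roughly 22.9 for casual riders and 12.1 for members, so the average casual ride lasts about 1.9 times as long. The sketch below reproduces that conversion from the two printed averages. It relies on as.numeric() accepting a units argument for difftime objects.

```r
# Average ride lengths printed above (casual, member), as difftime in seconds
avg_secs <- as.difftime(c(1376.4150, 727.9566), units = "secs")

# Convert the difftime values to plain numbers in minutes
avg_mins <- as.numeric(avg_secs, units = "mins")
print(round(avg_mins, 1))  # 22.9 12.1

# Ratio of the casual average to the member average
print(round(avg_mins[1] / avg_mins[2], 2))  # 1.89
```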

Determining the median, minimum, and maximum ride length and the total number of rides for members and casual riders

combined_trips %>% 
  group_by(member_casual) %>% 
  summarise(median_ride_length=median(ride_length), min_ride_length=min(ride_length), max_ride_length=max(ride_length), total_rides=length(ride_id))

## # A tibble: 2 × 5
##   member_casual median_ride_length min_ride_length max_ride_length total_rides
##   <chr>         <drtn>             <drtn>          <drtn>                <int>
## 1 casual        765 secs           1 secs          728178 secs         1531517
## 2 member        517 secs           1 secs           89872 secs         2799589
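The maximum ride lengths are extreme compared with the medians. For a casual rider, 728,178 seconds is about 8.4 days. For a member, 89,872 seconds is just under 25 hours. The sketch below converts those maxima into hours and days and flags values above a cutoff. The 24-hour cutoff is an illustrative assumption, not part of the original analysis.

```r
# Maximum ride lengths printed above, in seconds
max_secs <- c(casual = 728178, member = 89872)

# Convert to hours and days for readability
max_hours <- max_secs / 3600
max_days  <- max_secs / 86400
print(round(max_hours, 1))  # casual 202.3, member 25.0
print(round(max_days, 1))   # casual 8.4, member 1.0

# Hypothetical cutoff: flag rides longer than 24 hours as possible outliers
cutoff_secs <- 24 * 3600
print(max_secs > cutoff_secs)
```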

Calculating the average ride length and the total number of rides by member_casual and day_of_week

combined_trips %>% 
  group_by(member_casual,day_of_week) %>% 
  summarise(average_ride_length = mean(ride_length), total_rides = length(ride_id))

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual day_of_week average_ride_length total_rides
##    <chr>         <chr>       <drtn>                    <int>
##  1 casual        Friday      1339.3150 secs           227826
##  2 casual        Monday      1352.1903 secs           175381
##  3 casual        Saturday    1555.1620 secs           310123
##  4 casual        Sunday      1594.1168 secs           254710
##  5 casual        Thursday    1199.9962 secs           198904
##  6 casual        Tuesday     1230.8860 secs           181510
##  7 casual        Wednesday   1176.0574 secs           183063
##  8 member        Friday       722.4142 secs           400467
##  9 member        Monday       693.0606 secs           386648
## 10 member        Saturday     815.0762 secs           350592
## 11 member        Sunday       816.8686 secs           307818
## 12 member        Thursday     696.1708 secs           452609
## 13 member        Tuesday      698.9975 secs           448778
## 14 member        Wednesday    695.2239 secs           452677
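The table can also be read as weekend shares. Summing the Saturday and Sunday counts and dividing by each group's total shows what fraction of its rides happen on weekends. The sketch below uses only the totals printed above.

```r
# Ride counts from the table above
casual_weekend <- 310123 + 254710   # Saturday + Sunday
member_weekend <- 350592 + 307818
casual_total   <- 1531517           # sum of the seven casual rows
member_total   <- 2799589           # sum of the seven member rows

# Share of each group's rides taken on weekends, in percent
weekend_share <- c(casual = casual_weekend / casual_total,
                   member = member_weekend / member_total)
print(round(100 * weekend_share, 1))  # casual 36.9, member 23.5
```

If rides were spread evenly across the week, weekends would account for 2/7, or about 28.6%. Casual riders sit well above that baseline and members well below it, which is consistent with casual riders using the bikes more for weekend leisure.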

5. Share Phase

Comparing the total number of rides by casual riders and members on each day of the week

library(dplyr)
library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

combined_trips <- combined_trips %>%
  mutate(day_of_week = factor(weekdays(as.Date(started_at)), 
                              levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")))

# Summarize total rides by membership type and day of the week
weekly_summary <- combined_trips %>%
  group_by(member_casual, day_of_week) %>%
  summarise(total_rides = n(), .groups = 'drop')

# Plot the data
ggplot(weekly_summary, aes(x = day_of_week, y = total_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = comma) +  # Use comma format for y-axis
  scale_fill_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) +  # Professional color scheme
  labs(x = "Day of the Week", y = "Total Rides", title = "Total Rides by Day of the Week and Membership Type", caption = "Made by Kliz John Millares") +
  theme_minimal() +
  theme(legend.position = "right")
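Converting day_of_week to a factor with explicit levels makes ggplot2 order the days from Sunday to Saturday. Without it, the days would be sorted alphabetically, which is the order in the earlier summary table. weekdays() returns names in the session's locale, so the levels above assume English day names. The small base R illustration below shows both behaviours.

```r
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
days <- c("Saturday", "Monday", "Friday", "Sunday")

# As plain strings, sorting is alphabetical
print(sort(days))  # "Friday" "Monday" "Saturday" "Sunday"

# As a factor with explicit levels, sorting follows the week order
days_f <- factor(days, levels = day_levels)
print(as.character(sort(days_f)))  # "Sunday" "Monday" "Friday" "Saturday"

# A name outside the levels (e.g. from a non-English locale) becomes NA
print(factor("Samstag", levels = day_levels))
```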

Total number of bike rides per month categorized by membership type

library(lubridate)

combined_trips <- combined_trips %>%
  mutate(month = factor(month(started_at, label = TRUE, abbr = FALSE), 
                        levels = month.name))

# Summarize total rides by membership type and month
monthly_summary <- combined_trips %>%
  group_by(member_casual, month) %>%
  summarise(total_rides = n(), .groups = 'drop')

# Plot the data
ggplot(monthly_summary, aes(x = month, y = total_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = comma) +  # Use comma format for y-axis
  scale_fill_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) +  # Professional color scheme
  labs(x = "Month", y = "Total Rides", title = "Total Rides by Month and Membership Type", caption = "Made by Kliz John Millares") +
  theme_minimal() +
  theme(legend.position = "right")

The next visualization shows hourly bike usage across the week for all riders, which helps identify the peak hours on each day.

combined_trips <- combined_trips %>%
  mutate(day_of_week = factor(weekdays(as.Date(started_at)), 
                              levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")),
         hour = hour(started_at))

# Summarize total rides by day of the week and hour
hourly_summary <- combined_trips %>%
  group_by(day_of_week, hour) %>%
  summarise(total_rides = n(), .groups = 'drop')

# Plot the data
ggplot(hourly_summary, aes(x = hour, y = total_rides, color = day_of_week, group = day_of_week)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size aesthetic for lines
  scale_y_continuous(labels = comma) +  # Use comma format for y-axis
  scale_color_manual(values = c("Sunday" = "#E41A1C", "Monday" = "#377EB8", "Tuesday" = "#4DAF4A",
                                "Wednesday" = "#984EA3", "Thursday" = "#FF7F00", "Friday" = "#FFFF33", 
                                "Saturday" = "#A65628")) +  # One color per day (ColorBrewer Set1 palette)
  labs(x = "Hour of the Day", y = "Total Rides", title = "Hourly Bike Usage Throughout the Week", caption = "Made by Kliz John Millares") +
  theme_minimal() +
  theme(legend.position = "right")
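lubridate's hour() extracts the hour (0 to 23) from each start time, and group_by() plus summarise(n()) counts the rides in each hour. The same two steps can be sketched in base R to check the logic on a handful of rows. The six start times below come from the January preview, and table() stands in for the dplyr counting step. The hourly chart pools all riders; adding member_casual to group_by() would split the hourly counts by rider type.

```r
# Start times of the first six rides in the January preview
started_at <- as.POSIXct(c("2023-01-21 20:05:42", "2023-01-10 15:37:36",
                           "2023-01-02 07:51:57", "2023-01-22 10:52:58",
                           "2023-01-12 13:58:01", "2023-01-31 07:18:03"),
                         tz = "UTC")

# Base R equivalent of lubridate::hour(): format the hour, convert to integer
start_hour <- as.integer(format(started_at, "%H"))
print(start_hour)  # 20 15 7 10 13 7

# Count rides per hour (like group_by(hour) + summarise(total_rides = n()))
rides_per_hour <- table(start_hour)
print(rides_per_hour)
```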

Next, I compare the monthly ride trends of members and casual riders

combined_trips <- combined_trips %>%
  mutate(month = factor(month(started_at, label = TRUE, abbr = FALSE), 
                        levels = month.name))

# Summarize total rides by membership type and month
monthly_summary <- combined_trips %>%
  group_by(month, member_casual) %>%
  summarise(total_rides = n(), .groups = 'drop')

# Plot the data
ggplot(monthly_summary, aes(x = month, y = total_rides, color = member_casual, group = member_casual)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size aesthetic for lines
  labs(x = "Month", y = "Total Rides", title = "Monthly Ride Trends by Membership Type", caption = "Made by Kliz John Millares") +
  scale_color_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) +
  scale_y_continuous(labels = scales::comma) +  # Format y-axis with commas
  theme_minimal()

Comparing the average ride length by rideable type for members and casual riders

# Summarize average ride length by membership type and rideable type
average_ride_length_summary <- combined_trips %>%
  group_by(member_casual, rideable_type) %>%
  summarise(average_ride_length = as.numeric(mean(ride_length), units = "mins"), .groups = 'drop')  # convert seconds to minutes

# Plot the data
ggplot(average_ride_length_summary, aes(x = rideable_type, y = average_ride_length, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Average Ride Length vs Rideable Type", x = "Rideable Type", y = "Average Ride Length (minutes)", caption = "Made by Kliz John Millares") +
  scale_fill_manual(values = c("member" = "#0073C2FF", "casual" = "#EFC000FF")) +  # Color scheme
  theme_minimal()

6. Act Phase

Finding 1: Casual riders have a higher average ride length than members, especially on weekends.
Conclusion: Casual riders likely use bikes for leisure activities on weekends.
Recommendation: Offer special weekend prices for members to encourage casual riders to switch to annual membership.

Finding 2: Average ride length for casual riders is consistent during weekdays.
Conclusion: Casual riders likely use bikes for commuting to work on weekdays.
Recommendation: Introduce a weekly pass to attract casual riders to apply for membership.

Finding 3: Casual riders tend to prefer classic bikes over electric ones.
Conclusion: There may be outliers affecting the average ride length for docked bikes.
Recommendation: N/A