Case study: How does a bike-share navigate speedy success?

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Stakeholders; Director of Marketing: Lily Moreno Cyclists Cyclist executives

Business Task; Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics.

Description of data sources used

I will be using Cyclistic’s public historical trip data to analyze and identify trends. I will be working with the latest 12 months datasets for 2024 from Divvy. The data has been made available by Motivate International Inc. under this licence

Data cleaning and manipulation

  • Downloaded 12 datasets for 2024
  • Saved .csv and .xls versions in different folders
  • Reviewed all 12 datasets and performed the following actions using excel
  • Filtered variables, to check out blanks (missing data) mainly noticed in station_name, station_id, end_station_name, end_station_id
  • Created the ride_length variable in each of the 12 datasets
    • This was a difference between start_at and ended_at time
  • Created day_of_week variable
    • This was picked from started_at date variable (had date & time) using excel WEEKDAY function with returning number type 1 (Sunday through Saturday)
  • Observed some start_at time is after ended_at time (negative ride_lengths) while for others the time was same for both
  • June 2024 had only time recorded start_at & end_at data instead of dd/mm/yyyy hh:mm
  • Changed start_at & end_at custom date format from mm:ss to dd/mm/yyyy hh:mm for July, August and September
  • Imported all 12 data sets into R studio
  • Reviewed column names to ensure uniformity
  • Merged 11 data sets into 1 data set
    • Left out June because of incomplete start_at data
    • The 2 created variables were both empty for June due to the missing data
  • Renamed variable “member_casual” to new name “user_type”
  • Renamed variable “rideable_type” to new name “ride_type”
  • Removed duplicate data with the distinct() function
  • Removed missing data from ride_id:ride_type and end_lat:end_lng using the drop_na() function
  • Viewed merged data set using glimpse() and head() functions
  • Changed time format for variable ride_length from seconds to minutes using the lubridate() function

Install these packages, if not yet install

install.packages(“skimr”) install.packages(“janitor”) install.packages(“hms”) install.packages(“lubridate”)

Set working directory

setwd("G:/My Drive/Learning/Google Data Analytics/Course 8/Case study/Dataset/Excel/CSV")

Import data sets into R studio

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
Jan_24 <- read_csv("202401-divvy-tripdata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 144873 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Feb_24 <- read_csv("202402-divvy-tripdata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 223164 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Mar_24 <- read_csv("202403-divvy-tripdata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 301687 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Apr_24 <- read_csv("202404-divvy-tripdata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 415025 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
May_24 <- read_csv("202405-divvy-tripdata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 609493 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Jun_24 <- read_csv("202406-divvy-tripdata.csv")
## Rows: 710721 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## lgl  (2): ride_length, day_of_week
## time (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Jul_24 <- read_csv("202407-divvy-tripdata.csv")
## Rows: 748962 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Aug_24 <- read_csv("202408-divvy-tripdata.csv")
## Rows: 755639 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Sep_24 <- read_csv("202409-divvy-tripdata.csv")
## Rows: 821276 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Oct_24 <- read_csv("202410-divvy-tripdata.csv")
## Rows: 616281 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Nov_24 <- read_csv("202411-divvy-tripdata.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 335075 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Dec_24 <- read_csv("202412-divvy-tripdata.csv")
## Rows: 178372 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): ride_id, rideable_type, started_at, ended_at, start_station_name, ...
## dbl  (5): start_lat, start_lng, end_lat, end_lng, day_of_week
## time (1): ride_length
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Review all column names

colnames(Jan_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Feb_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Mar_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Apr_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(May_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Jun_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Jul_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Aug_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Sep_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Oct_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Nov_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"
colnames(Dec_24)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "ride_length"        "day_of_week"

Merge data from 11 months into one dataset

Dataset for June was left out because the Start_at variabe only has time recorded intead of date and time hence I couldn’t calculate the ride_length and day_of_week variables

library(dplyr)
Merged_2024 <- bind_rows(Jan_24, Feb_24, Mar_24, Apr_24, May_24, Jul_24, Aug_24,
                        Sep_24, Oct_24, Nov_24, Dec_24)

Rename variable member_casual to new name ‘user type’ Memeber_casual is abit confusing

library(tidyverse)
Merged_2024 <- Merged_2024 %>% 
  rename(user_type = member_casual)

Rename rideable_type to new name ‘ride_type’

library(tidyverse)
Merged_2024 <- Merged_2024 %>% 
  rename(ride_type = rideable_type)

Remove duplicate data using distinct function

library(tidyverse)
Merged_2024 %>% distinct()
## # A tibble: 5,149,847 × 15
##    ride_id     ride_type started_at ended_at start_station_name start_station_id
##    <chr>       <chr>     <chr>      <chr>    <chr>              <chr>           
##  1 C1D650626C… electric… 12/01/202… 12/01/2… Wells St & Elm St  KA1504000135    
##  2 EECD38BDB2… electric… 08/01/202… 08/01/2… Wells St & Elm St  KA1504000135    
##  3 F4A9CE7806… electric… 27/01/202… 27/01/2… Wells St & Elm St  KA1504000135    
##  4 0A0D9E15EE… classic_… 29/01/202… 29/01/2… Wells St & Randol… TA1305000030    
##  5 33FFC9805E… classic_… 31/01/202… 31/01/2… Lincoln Ave & Wav… 13253           
##  6 C96080812C… classic_… 07/01/202… 07/01/2… Wells St & Elm St  KA1504000135    
##  7 0EA7CB313D… classic_… 05/01/202… 05/01/2… Wells St & Elm St  KA1504000135    
##  8 EE11F3A3B3… electric… 04/01/202… 04/01/2… Wells St & Elm St  KA1504000135    
##  9 63E83DE8E3… classic_… 01/01/202… 01/01/2… Wells St & Elm St  KA1504000135    
## 10 8005682869… electric… 03/01/202… 03/01/2… Clark St & Ida B … TA1305000009    
## # ℹ 5,149,837 more rows
## # ℹ 9 more variables: end_station_name <chr>, end_station_id <chr>,
## #   start_lat <dbl>, start_lng <dbl>, end_lat <dbl>, end_lng <dbl>,
## #   user_type <chr>, ride_length <time>, day_of_week <dbl>

Remove missing data

Merged_2024 <- Merged_2024 %>%
  drop_na(ride_id:ride_type)

colSums(is.na(Merged_2024))
##            ride_id          ride_type         started_at           ended_at 
##                  0                  0                  0                  0 
## start_station_name   start_station_id   end_station_name     end_station_id 
##             929926             929926             956626             956626 
##          start_lat          start_lng            end_lat            end_lng 
##                  0                  0               6108               6108 
##          user_type        ride_length        day_of_week 
##                  0                105                  0

Realized there are 929926 null values for the variables start_station_name $ start_station_id. end_station_name $ end_station_id are missing 956626 values as well while end_lat $ end_lng are missing 6108 values. For now I’ll remove the the null values for end_lat $ end_lng and ignore start_station_name $ start_station_id. end_station_name $ end_station_id

Merged_2024 <- Merged_2024 %>%
  drop_na(end_lat:end_lng)
glimpse(Merged_2024)
## Rows: 5,143,739
## Columns: 15
## $ ride_id            <chr> "C1D650626C8C899A", "EECD38BDB25BFCB0", "F4A9CE7806…
## $ ride_type          <chr> "electric_bike", "electric_bike", "electric_bike", …
## $ started_at         <chr> "12/01/2024 15:30", "08/01/2024 15:45", "27/01/2024…
## $ ended_at           <chr> "12/01/2024 15:37", "08/01/2024 15:52", "27/01/2024…
## $ start_station_name <chr> "Wells St & Elm St", "Wells St & Elm St", "Wells St…
## $ start_station_id   <chr> "KA1504000135", "KA1504000135", "KA1504000135", "TA…
## $ end_station_name   <chr> "Kingsbury St & Kinzie St", "Kingsbury St & Kinzie …
## $ end_station_id     <chr> "KA1503000043", "KA1503000043", "KA1503000043", "13…
## $ start_lat          <dbl> 41.90327, 41.90294, 41.90295, 41.88430, 41.94880, 4…
## $ start_lng          <dbl> -87.63474, -87.63444, -87.63447, -87.63396, -87.675…
## $ end_lat            <dbl> 41.88918, 41.88918, 41.88918, 41.92182, 41.88918, 4…
## $ end_lng            <dbl> -87.63851, -87.63851, -87.63851, -87.64414, -87.638…
## $ user_type          <chr> "member", "member", "member", "member", "member", "…
## $ ride_length        <time> 00:07:00, 00:07:00, 00:08:00, 00:30:00, 00:26:00, …
## $ day_of_week        <dbl> 6, 2, 7, 2, 4, 1, 6, 5, 2, 4, 4, 4, 4, 6, 1, 4, 7, …
head(Merged_2024)
## # A tibble: 6 × 15
##   ride_id      ride_type started_at ended_at start_station_name start_station_id
##   <chr>        <chr>     <chr>      <chr>    <chr>              <chr>           
## 1 C1D650626C8… electric… 12/01/202… 12/01/2… Wells St & Elm St  KA1504000135    
## 2 EECD38BDB25… electric… 08/01/202… 08/01/2… Wells St & Elm St  KA1504000135    
## 3 F4A9CE78061… electric… 27/01/202… 27/01/2… Wells St & Elm St  KA1504000135    
## 4 0A0D9E15EE5… classic_… 29/01/202… 29/01/2… Wells St & Randol… TA1305000030    
## 5 33FFC9805E3… classic_… 31/01/202… 31/01/2… Lincoln Ave & Wav… 13253           
## 6 C96080812CD… classic_… 07/01/202… 07/01/2… Wells St & Elm St  KA1504000135    
## # ℹ 9 more variables: end_station_name <chr>, end_station_id <chr>,
## #   start_lat <dbl>, start_lng <dbl>, end_lat <dbl>, end_lng <dbl>,
## #   user_type <chr>, ride_length <time>, day_of_week <dbl>

Change time format for variable ride_length from seconds to minutes

Install these packages, if not yet installed install.packages(“hms”) install.packages(“lubridate”) library(lubridate) library(hms)

Merged_2024$day_of_week <- format(as.Date(Merged_2024$started_at), "%A")

Analyse

Min, Max & mean

Removed missing values and ensured ride_length was greater than 0

library(dplyr)
min_max_avg <- Merged_2024 %>% 
  group_by(user_type) %>% 
  summarise(average_ride_length = mean(ride_length[ride_length>0],na.rm = T), max_ride_length = max(ride_length[ride_length>0],na.rm = T), min_ride_length = min(ride_length[ride_length>0],na.rm = T))

Calculate mode of the day of the week

library(DescTools)
  Mode(Merged_2024$day_of_week)
## [1] "Sunday"
## attr(,"freq")
## [1] 771361

Calculating total rides by user type

ride_counts <- as.data.frame(table(Merged_2024$user_type))
colnames(ride_counts) <- c("user_type", "total_rides")

Average rides

library(tidyverse)
Average_rides <- Merged_2024 %>% 
  group_by(user_type, day_of_week) %>%                 
  summarise(rides_per_day = n(), average_ride_length = mean(ride_length[ride_length>0],na.rm = T)) %>%
  arrange(-rides_per_day, day_of_week)                    
## `summarise()` has grouped output by 'user_type'. You can override using the
## `.groups` argument.

Count of start stations

library(tidyverse)
Stations <- Merged_2024 %>% 
  drop_na(start_station_name) %>% 
  group_by(user_type, start_station_name) %>% 
  summarise(station_count = n()) %>% 
  arrange(-station_count, start_station_name)
## `summarise()` has grouped output by 'user_type'. You can override using the
## `.groups` argument.

Convert average time from seconds to minutes under new dataframe average_rides_new

Average_rides_new <- Average_rides %>%
  mutate(average_time = average_ride_length/360)

Visualisations

Total number of rides by user type

library(ggplot2)
ggplot(ride_counts, aes(x = "", y = total_rides, fill = user_type)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  ggtitle("Total Rides by User Type") +
  scale_fill_manual(values = c("casual" = "#F8766D", "member" = "#00BFC4")) +
  geom_text(aes(label = paste0(round(total_rides / sum(total_rides) * 100, 1), "%")),
            position = position_stack(vjust = 0.5), color = "white", size = 5)

## User type against type of bike

ggplot(data =Merged_2024) +
  geom_bar(mapping = aes(x = ride_type, fill=ride_type)) +
  facet_wrap(~user_type) +
  theme(axis.text.x = element_text(angle = 45)) +
  guides(fill="none") +
  labs(title = "Biker types Vs User subscription types ", x="Type of bike", 
       y="Number of riders")

## Total number of rides per day

ggplot(Merged_2024) +
  geom_bar(mapping=aes(x = day_of_week, fill=day_of_week)) +
  facet_wrap(~user_type) +
  theme(axis.text.x = element_text(angle = 45)) +
  guides(fill="none") +
  labs(title = "Total Rides per day ", x="Day of the week",y="Total rides")

## Average ride duration per day

ggplot(data = Average_rides_new) + 
  geom_col(mapping = aes(x = day_of_week, y = average_time, fill=day_of_week))+
  facet_wrap(~user_type) +
  theme(axis.text.x = element_text(angle = 45)) +
  guides(fill="none") +
  labs(title = "Average ride duration per day", x="Day of the week",y="Average duration in minutes")
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.

## Top 10 start stations by user type

library(ggplot2)
library(dplyr)
Top_ten <- Stations %>%
  slice_max(station_count, n = 10)
ggplot(data = Top_ten) +
 geom_col(mapping = aes(x = start_station_name, y = station_count, fill=start_station_name)) +
 facet_wrap(~user_type) +
 coord_flip() + 
 guides(fill="none") +
 labs(title = "Top 10 start stations by user type", x = "Start station name", y = "station count")

## Top ten end stations

library(tidyverse)
End_stations <- Merged_2024 %>% 
  drop_na(end_station_name) %>% 
  group_by(user_type, end_station_name) %>% 
  summarise(station_count = n()) %>% 
  arrange(-station_count, end_station_name)
## `summarise()` has grouped output by 'user_type'. You can override using the
## `.groups` argument.
library(ggplot2)
library(dplyr)
End_top_ten <- End_stations %>%
  slice_max(station_count, n = 10)
ggplot(data = End_top_ten) +
 geom_col(mapping = aes(x = end_station_name, y = station_count, fill= end_station_name)) +
 facet_wrap(~user_type) +
 coord_flip() + 
 guides(fill="none") +
 labs(title = "Top 10 end stations by user type", x = "End station name", y = "station count")

## Summary of Anlysis

  • Average ride length: casual 3.5 mins and member 2 mins
  • Max ride length: Casual and member was 24 hrs for both
  • Min ride length: Casual and member was 1 sec for both (could be due to data error)

Mode of day of the week: Sunday (771361)

Ride total counts by user type

  • Casual 35.1%
  • Member 64.1%

Type of bike by user type

  • Both Casual and Member users used the electric bike the most (most used)
  • Least used was the electronic scooter for both

Total number of rides per day

  • Saturday and Sunday were the busiest days for both casual and member users
  • Least days were Wednesday, Thursday & Friday for members and Monday, Wednesday & Friday for Casual users

Most frequent start and end stations

  • There were no stations used by both users/subscribers
  • Streeter dr & grand ave and Dusable lake shore dr & monroe st were the most used start and end stations for casual users
  • Kingsbury st & Kimzie st and Clinton st & Washington blvd were the most used start and end stations for member users

Recommendation

These are my top three recommendations based on the results of my analysis

One: From the results, we saw that both Casual and member users mostly preferred to start and end their rides at Streeter dr & grand ave and Dusable lake shore dr & monroe st. The company can use this station to advertise their memebership packages through billboards, fliers to attract more causal riders into membership.

Two:Casual users preffered to use electric bikes (so did the members). Use promotions like discounted costs for electronic bikes for casual users who enroll into membership category.

Three: For both casual and member users, we noticed the highest number of rides were recorded on Saturday and Sunday. The company could come up with attractive membership packages like weekend package to attract more casual riders into membership

References: https://www.kaggle.com/code/michellemariesanna/cyclistic-case-study. This helped to draw a pie-chart