Introduction

Welcome to the Cyclistic bike-share analysis case study. This project analyzes data for a fictional company called Cyclistic and follows the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. The focus is on increasing annual memberships.

Key points include: the differences in bike usage between annual members and casual riders; the potential revenue increase from transitioning casual riders to annual memberships; and the strategies Cyclistic can implement using digital media to motivate casual riders to become members. Additionally, a new marketing strategy will be recommended to facilitate this conversion.

Stakeholders

⦁ Lily Moreno: The director of marketing
⦁ Cyclistic marketing analytics team
⦁ Cyclistic executive team

The report needs to include the following deliverables:
- ASK: A clear statement of the business task.
- PREPARE: A description of all data sources used.
- PROCESS: Documentation of any cleaning or manipulation of data.
- ANALYZE: A summary of your analysis.
- SHARE: Supporting visualizations and key findings.
- ACT: Your top three recommendations based on your analysis.

Ask

Business Task

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Prepare

Data Source Description

The data was obtained from https://divvy-tripdata.s3.amazonaws.com/index.html. It has been made available by Motivate International Inc. The data is public; personal data was removed for privacy reasons.

The data was checked according to ROCCC:
- Reliable: the data was collected by a credible source and has proven to be reliable
- Original: the data was collected first-hand by Cyclistic
- Comprehensive: the dataframe includes the data needed to answer the business tasks
- Current: the data was collected in the last 12 months
- Cited: the data is authorized under license

The merged dataframe consists of 5,699,639 rows and 13 columns; 5,561,700 rows remain after cleaning.

Process

Setting up environment

install.packages("tidyverse")

install.packages("skimr")

library(tidyverse)
library(lubridate)
library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)
library(skimr)

Load the data

data_202309 <- read.csv("202309-divvy-tripdata.csv")
data_202310 <- read.csv("202310-divvy-tripdata.csv")
data_202311 <- read.csv("202311-divvy-tripdata.csv")
data_202312 <- read.csv("202312-divvy-tripdata.csv")
data_202401 <- read.csv("202401-divvy-tripdata.csv")
data_202402 <- read.csv("202402-divvy-tripdata.csv")
data_202403 <- read.csv("202403-divvy-tripdata.csv")
data_202404 <- read.csv("202404-divvy-tripdata.csv")
data_202405 <- read.csv("202405-divvy-tripdata.csv")
data_202406 <- read.csv("202406-divvy-tripdata.csv")
data_202407 <- read.csv("202407-divvy-tripdata.csv")
data_202408 <- read.csv("202408-divvy-tripdata.csv")
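
The twelve read.csv() calls can also be written as a loop over file names. This is a minimal sketch in base R, assuming the monthly files sit in one folder, follow the YYYYMM-divvy-tripdata.csv naming pattern and share the same columns. The helper name read_monthly_trips and the two tiny demo files are made up for illustration.

```r
# Hypothetical helper: read every monthly CSV in `dir` and stack the rows.
read_monthly_trips <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]{6}-divvy-tripdata\\.csv$", full.names = TRUE)
  monthly <- lapply(sort(files), read.csv)
  do.call(rbind, monthly)  # requires identical column names in every file
}

# Demo on two tiny made-up files written to a temporary folder.
demo_dir <- tempfile("trips_")
dir.create(demo_dir)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "202309-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202310-divvy-tripdata.csv"), row.names = FALSE)

demo_trips <- read_monthly_trips(demo_dir)
nrow(demo_trips)
```

Reading the files this way skips the twelve intermediate objects, so there is nothing to remove from the workspace afterwards.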

Preview the data

colnames(data_202309)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202310)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202311)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202312)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202401)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202402)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202403)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202404)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202405)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202406)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202407)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(data_202408)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
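
Comparing twelve colnames() printouts by eye can be replaced by a programmatic check. The sketch below uses made-up stand-ins for the monthly data frames; the names monthly and same_columns are illustrative only.

```r
# Made-up stand-ins for the monthly data frames.
monthly <- list(
  data_202309 = data.frame(ride_id = "A1", member_casual = "member"),
  data_202310 = data.frame(ride_id = "B1", member_casual = "casual"),
  data_202311 = data.frame(ride_id = "C1", member_casual = "member")
)

# TRUE for each data frame whose column names match the first one exactly.
same_columns <- vapply(monthly,
                       function(d) identical(names(d), names(monthly[[1]])),
                       logical(1))
all(same_columns)
```

With the real objects, a list like monthly could be built with mget(ls(pattern = "^data_")), assuming no other objects in the workspace share that prefix.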

Merge the data

all_trips <- bind_rows(data_202309, data_202310, data_202311, data_202312, data_202401, data_202402, data_202403, data_202404, data_202405, data_202406, data_202407, data_202408)

View the dataset

str(all_trips)

## 'data.frame':    5699639 obs. of  13 variables:
##  $ ride_id           : chr  "011C1903BF4E2E28" "87DB80E048A1BF9F" "7C2EB7AF669066E3" "57D197B010269CE3" ...
##  $ rideable_type     : chr  "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr  "2023-09-23 00:27:50" "2023-09-02 09:26:43" "2023-09-25 18:30:11" "2023-09-13 15:30:49" ...
##  $ ended_at          : chr  "2023-09-23 00:33:27" "2023-09-02 09:38:19" "2023-09-25 18:41:39" "2023-09-13 15:39:18" ...
##  $ start_station_name: chr  "Halsted St & Wrightwood Ave" "Clark St & Drummond Pl" "Financial Pl & Ida B Wells Dr" "Clark St & Drummond Pl" ...
##  $ start_station_id  : chr  "TA1309000061" "TA1307000142" "SL-010" "TA1307000142" ...
##  $ end_station_name  : chr  "Sheffield Ave & Wellington Ave" "Racine Ave & Fullerton Ave" "Racine Ave & 15th St" "Racine Ave & Belmont Ave" ...
##  $ end_station_id    : chr  "TA1307000052" "TA1306000026" "13304" "TA1308000019" ...
##  $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num  41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...

skim_without_charts(all_trips)

Data summary
Name	all_trips
Number of rows	5699639
Number of columns	13
_______________________
Column type frequency:
character	9
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
ride_id	1	16	16	0	5699428
rideable_type	1	12	13	0	2
started_at	1	19	23	0	5232178
ended_at	1	19	23	0	5238399
start_station_name	1	0	64	968697	1727
start_station_id	1	0	14	968697	1694
end_station_name	1	0	64	1006133	1739
end_station_id	1	0	36	1006133	1703
member_casual	1	6	6	0	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
start_lat	0	1	41.90	0.05	41.64	41.88	41.90	41.93	42.07
start_lng	0	1	-87.65	0.03	-87.94	-87.66	-87.64	-87.63	-87.52
end_lat	7526	1	41.90	0.05	16.06	41.88	41.90	41.93	87.96
end_lng	7526	1	-87.65	0.04	-144.05	-87.66	-87.64	-87.63	-79.02

Delete monthly files, create a copy of the dataset to preserve data integrity

#delete
remove(data_202309, data_202310, data_202311, data_202312, data_202401, data_202402, data_202403, data_202404, data_202405, data_202406, data_202407, data_202408)
#copy_df
all_trips_2 <- all_trips

Detect errors, patterns and outliers

# quick glance at the dataset
head(all_trips, 15)

##             ride_id rideable_type          started_at            ended_at
## 1  011C1903BF4E2E28  classic_bike 2023-09-23 00:27:50 2023-09-23 00:33:27
## 2  87DB80E048A1BF9F  classic_bike 2023-09-02 09:26:43 2023-09-02 09:38:19
## 3  7C2EB7AF669066E3 electric_bike 2023-09-25 18:30:11 2023-09-25 18:41:39
## 4  57D197B010269CE3  classic_bike 2023-09-13 15:30:49 2023-09-13 15:39:18
## 5  8A2CEA7C8C8074D8  classic_bike 2023-09-18 15:58:58 2023-09-18 16:05:04
## 6  03F7044D1304CD58 electric_bike 2023-09-15 20:19:25 2023-09-15 20:30:27
## 7  672503E0FC0835EC electric_bike 2023-09-27 16:52:18 2023-09-27 17:03:22
## 8  1D806492F95973AC electric_bike 2023-09-17 11:07:05 2023-09-17 11:13:39
## 9  40D9EF382CC6C53D  classic_bike 2023-09-17 11:58:50 2023-09-17 12:08:36
## 10 C60CE661AF7ECC93 electric_bike 2023-09-07 20:52:43 2023-09-07 21:06:51
## 11 3812B98E9406040E  classic_bike 2023-09-12 16:01:28 2023-09-12 16:17:47
## 12 EBA56298CB3C803F  classic_bike 2023-09-24 13:17:23 2023-09-24 13:50:43
## 13 C6BD5AF648F11D11 electric_bike 2023-09-28 18:09:40 2023-09-28 18:15:04
## 14 585C82FA2E006DE9  classic_bike 2023-09-22 12:30:41 2023-09-22 12:42:21
## 15 95E72C49D692F822  classic_bike 2023-09-07 16:28:17 2023-09-07 16:31:25
##                start_station_name start_station_id
## 1     Halsted St & Wrightwood Ave     TA1309000061
## 2          Clark St & Drummond Pl     TA1307000142
## 3   Financial Pl & Ida B Wells Dr           SL-010
## 4          Clark St & Drummond Pl     TA1307000142
## 5     Halsted St & Wrightwood Ave     TA1309000061
## 6  Southport Ave & Wrightwood Ave     TA1307000113
## 7      Kedzie Ave & Milwaukee Ave            13085
## 8          Jeffery Blvd & 71st St     KA1503000018
## 9      Kedzie Ave & Milwaukee Ave            13085
## 10 Southport Ave & Wrightwood Ave     TA1307000113
## 11  Financial Pl & Ida B Wells Dr           SL-010
## 12       Clark St & Schreiber Ave     KA1504000156
## 13    Halsted St & Wrightwood Ave     TA1309000061
## 14    Halsted St & Wrightwood Ave     TA1309000061
## 15         Clark St & Drummond Pl     TA1307000142
##                  end_station_name end_station_id start_lat start_lng  end_lat
## 1  Sheffield Ave & Wellington Ave   TA1307000052  41.92914 -87.64908 41.93625
## 2      Racine Ave & Fullerton Ave   TA1306000026  41.93125 -87.64434 41.92557
## 3            Racine Ave & 15th St          13304  41.87506 -87.63314 41.86127
## 4        Racine Ave & Belmont Ave   TA1308000019  41.93125 -87.64434 41.93974
## 5      Racine Ave & Fullerton Ave   TA1306000026  41.92914 -87.64908 41.92557
## 6                                                 41.92884 -87.66387 41.90000
## 7                                                 41.92956 -87.70796 41.93000
## 8                                                 41.76659 -87.57645 41.77000
## 9  California Ave & Milwaukee Ave          13084  41.92957 -87.70786 41.92269
## 10                                                41.92882 -87.66391 41.90000
## 11              Adler Planetarium          13431  41.87502 -87.63309 41.86610
## 12         Oakley Ave & Touhy Ave         RP-004  41.99990 -87.67007 42.01234
## 13         Halsted St & Roscoe St   TA1309000025  41.92919 -87.64914 41.94367
## 14         Halsted St & Roscoe St   TA1309000025  41.92914 -87.64908 41.94367
## 15      Clark St & Wellington Ave   TA1307000136  41.93125 -87.64434 41.93650
##      end_lng member_casual
## 1  -87.65266        member
## 2  -87.65842        member
## 3  -87.65663        member
## 4  -87.65887        member
## 5  -87.65842        member
## 6  -87.64000        member
## 7  -87.66000        member
## 8  -87.57000        member
## 9  -87.69715        member
## 10 -87.63000        member
## 11 -87.60727        member
## 12 -87.68824        member
## 13 -87.64895        member
## 14 -87.64895        member
## 15 -87.64754        member

# checking for duplicates
nrow(all_trips)

## [1] 5699639

# number of rows is larger than n_unique value for ride_id, we need to delete duplicates
cleaned_all_trips <- all_trips %>%
  distinct(ride_id, .keep_all = TRUE)

# remove rows with NA values (here: missing end_lat/end_lng); empty station names are empty strings, not NA, so those rows stay
cleaned_all_trips <- na.omit(cleaned_all_trips)
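
The skim output reports the missing station names under empty rather than as missing values, so na.omit() does not touch them. If those rows should also be counted or dropped, the empty strings need to be recoded first. This is a minimal sketch on made-up rows; the toy data frame is not part of the Divvy data.

```r
# Toy data: one NA coordinate and one empty station name (made-up values).
toy <- data.frame(
  ride_id = c("A1", "A2", "A3"),
  start_station_name = c("Clark St & Drummond Pl", "", "Adler Planetarium"),
  end_lat = c(41.93, 41.90, NA),
  stringsAsFactors = FALSE
)

rows_after_na_omit <- nrow(na.omit(toy))  # only the row with NA in end_lat is dropped

# Recode empty strings as NA in character columns, then count gaps per column.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], function(col) {
  col[col %in% ""] <- NA
  col
})
colSums(is.na(toy))
```

Whether rides without a station name should be removed is a separate decision; the analysis in this report keeps them.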

# str() showed that started_at and ended_at are stored as chr; convert them to datetime format
cleaned_all_trips$started_at <- as.POSIXct(cleaned_all_trips$started_at)
cleaned_all_trips$ended_at <- as.POSIXct(cleaned_all_trips$ended_at)

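# take the calendar date from the local clock time; as.Date() on a POSIXct converts via UTC
# by default, which would move late-evening rides to the next day
cleaned_all_trips$date <- as.Date(format(cleaned_all_trips$started_at, "%Y-%m-%d"))
cleaned_all_trips$day_of_week <- format(cleaned_all_trips$date, "%A")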
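
The calendar date derived from a timestamp depends on the time zone used for the conversion. The sketch below takes one evening timestamp from the data and assumes it is Chicago local time ("America/Chicago" is an assumption based on the Divvy coordinates, not something stated in the files).

```r
# One evening start time from the data, parsed as Chicago local time (assumed zone).
x <- as.POSIXct("2023-09-15 20:19:25", tz = "America/Chicago")

date_utc   <- as.Date(x, tz = "UTC")              # date of the same instant in UTC
date_local <- as.Date(x, tz = "America/Chicago")  # date on the local clock
date_text  <- format(x, "%Y-%m-%d")               # local date as text

c(utc = format(date_utc), local = format(date_local), text = date_text)
```

The UTC conversion lands on the following day, which in turn changes the weekday and, at month ends, the month assigned to the ride.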

# create ride_length column, calculated by subtracting started_at from ended_at
cleaned_all_trips$ride_length <-as.numeric(difftime(cleaned_all_trips$ended_at,cleaned_all_trips$started_at, units = "mins"))

#remove rows where ride_length is <= 0
cleaned_all_trips <- cleaned_all_trips %>%
  filter(ride_length >0)

# rename member_casual to member_type
cleaned_all_trips <- cleaned_all_trips %>% rename(member_type = member_casual)

# overview the final dataset
glimpse(cleaned_all_trips)

## Rows: 5,690,679
## Columns: 16
## $ ride_id            <chr> "011C1903BF4E2E28", "87DB80E048A1BF9F", "7C2EB7AF66…
## $ rideable_type      <chr> "classic_bike", "classic_bike", "electric_bike", "c…
## $ started_at         <dttm> 2023-09-23 00:27:50, 2023-09-02 09:26:43, 2023-09-…
## $ ended_at           <dttm> 2023-09-23 00:33:27, 2023-09-02 09:38:19, 2023-09-…
## $ start_station_name <chr> "Halsted St & Wrightwood Ave", "Clark St & Drummond…
## $ start_station_id   <chr> "TA1309000061", "TA1307000142", "SL-010", "TA130700…
## $ end_station_name   <chr> "Sheffield Ave & Wellington Ave", "Racine Ave & Ful…
## $ end_station_id     <chr> "TA1307000052", "TA1306000026", "13304", "TA1308000…
## $ start_lat          <dbl> 41.92914, 41.93125, 41.87506, 41.93125, 41.92914, 4…
## $ start_lng          <dbl> -87.64908, -87.64434, -87.63314, -87.64434, -87.649…
## $ end_lat            <dbl> 41.93625, 41.92557, 41.86127, 41.93974, 41.92557, 4…
## $ end_lng            <dbl> -87.65266, -87.65842, -87.65663, -87.65887, -87.658…
## $ member_type        <chr> "member", "member", "member", "member", "member", "…
## $ date               <date> 2023-09-23, 2023-09-02, 2023-09-25, 2023-09-13, 20…
## $ day_of_week        <chr> "Saturday", "Saturday", "Monday", "Wednesday", "Mon…
## $ ride_length        <dbl> 5.6166667, 11.6000000, 11.4666667, 8.4833333, 6.100…

# identify bad data and outliers
cleaned_all_trips %>%
  select(member_type, ride_length) %>%
  group_by(member_type) %>%
  dplyr::summarize(min_ride_length = min(ride_length), max_ride_length = max(ride_length))

## # A tibble: 2 × 3
##   member_type min_ride_length max_ride_length
##   <chr>                 <dbl>           <dbl>
## 1 casual             0.00172            1501.
## 2 member             0.000650           1500.

The minimum and maximum ride lengths show that the data contains rides shorter than 1 minute and longer than 24 hours (24 hours = 1,440 minutes).

# remove bad data and outliers
cleaned_all_trips <- cleaned_all_trips %>%
  filter(ride_length >= 1 & ride_length < 1440)
nrow(cleaned_all_trips)

## [1] 5561700
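
The effect of the cleaning steps can be summarized from the row counts printed above. The arithmetic below uses only numbers that appear in this report; the variable names are illustrative.

```r
rows_merged  <- 5699639   # nrow(all_trips) after bind_rows()
unique_ids   <- 5699428   # n_unique for ride_id in the skim output
rows_cleaned <- 5561700   # nrow(cleaned_all_trips) after the final filter

duplicate_rows <- rows_merged - unique_ids     # rows dropped by distinct()
rows_removed   <- rows_merged - rows_cleaned   # rows dropped by all steps together
pct_removed    <- round(100 * rows_removed / rows_merged, 2)

c(duplicates = duplicate_rows, removed = rows_removed, pct_removed = pct_removed)
```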

Analyze

Descriptive Analysis

# identify minimum, quartiles, median, mean and maximum
summary(cleaned_all_trips$ride_length)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    1.000    5.815    9.967   15.839   17.550 1439.867

The minimum ride length is 1 min, 25% of rides last 5.815 min or less, the median ride lasts 9.967 min, the mean ride lasts 15.839 min, 75% of rides last 17.550 min or less, and the longest ride lasts 1439.867 min (almost 24 hours).

We need to identify these values for two different rider types – member and casual
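
One way to get these statistics per rider type is to split the ride lengths by member type and summarize each group. This is a minimal sketch in base R on made-up values; toy and stats_by_type are illustrative names, and the numbers are not from the Divvy data.

```r
# Toy ride lengths in minutes (made-up values).
toy <- data.frame(
  member_type = c("member", "member", "member", "casual", "casual", "casual"),
  ride_length = c(5, 8, 11, 10, 20, 60)
)

# One row per rider type with mean, median, min and max ride length.
stats_by_type <- do.call(rbind, lapply(
  split(toy$ride_length, toy$member_type),
  function(x) data.frame(mean = mean(x), median = median(x), min = min(x), max = max(x))
))
stats_by_type
```

In dplyr terms the same result would come from group_by(member_type) followed by summarize().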

# identify % of member and casual riders
result_percentage <- cleaned_all_trips %>%
  group_by(member_type) %>%
  summarize(total_count = n()) %>%
  mutate(percentage = (total_count / sum(total_count)) * 100)

print(result_percentage)

## # A tibble: 2 × 3
##   member_type total_count percentage
##   <chr>             <int>      <dbl>
## 1 casual          1981364       35.6
## 2 member          3580336       64.4

ggplot(result_percentage, aes(x = "", y = percentage, fill = member_type)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y") +
  labs(title = "Percentage of each member type") +
  theme_void() +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange")) +
  geom_text(aes(label = paste0(round(percentage, 1), "%")),
            position = position_stack(vjust = 0.5),
            color = "black")

# identify number of member and casual riders by bicycle type
bike_type_dist <- cleaned_all_trips %>%
  group_by(rideable_type, member_type) %>%
  summarize(count_trips = n(), .groups = 'drop') %>%
  group_by(rideable_type) %>%
  mutate(perc = (count_trips / sum(count_trips)) * 100) 

# create a viz
ggplot(bike_type_dist, aes(x = rideable_type, y = count_trips, fill = member_type, color = member_type)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  geom_text(aes(label = paste0(round(perc, 1), "%")),
            color = "black",
            position = position_dodge(width = 0.9),
            vjust = -0.5) +
  theme_bw() +
  labs(title="Percentage of rides by bicycle and member type", x = "Bicycle type", y = "Number of rides") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))

### Separating date and time, transforming day_of_week, date to month and season

# separate started_at to date and time format
cleaned_all_trips$started_date <- format(cleaned_all_trips$started_at, "%m%d%y")
cleaned_all_trips$started_time<- format(cleaned_all_trips$started_at, "%H:%M:%S")

Visualization of rides per hour

cleaned_all_trips <- cleaned_all_trips %>%
  mutate(started_hour = factor(hour(started_at), levels = 0:23))
cleaned_all_trips %>%
  group_by(started_hour, member_type) %>%
  summarize(ride_number = n(),
            avg_duration = mean(ride_length),
            .groups = 'drop') %>%
  ggplot(aes(x = started_hour, y = ride_number, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Number of rides by hour and member type", x = "Hour of day", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))

Visualization of rides per day of week

cleaned_all_trips <- cleaned_all_trips %>%
  mutate(day_of_week = factor(day_of_week,
                              levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")))

cleaned_all_trips %>%
  group_by(day_of_week, member_type) %>%
  summarize(num_dow = n(),
            .groups = 'drop') %>%
  arrange(day_of_week) %>%
  ggplot(aes(x = day_of_week, y = num_dow, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Weekly distribution of rides by member type", x= "Days", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))

# create a column for months (date taken from the local clock time, not converted via UTC)
cleaned_all_trips$started_date_object <- as.Date(format(cleaned_all_trips$started_at, "%Y-%m-%d"))
cleaned_all_trips$month_number <- month(cleaned_all_trips$started_date_object)
cleaned_all_trips$month_name <- month(cleaned_all_trips$started_date_object, label = TRUE)

# create a column for seasons
cleaned_all_trips <- cleaned_all_trips %>%
  mutate(season = case_when(
    month_number %in% c(12,1,2) ~ "Winter",
    month_number %in% c(3,4,5) ~ "Spring",
    month_number %in% c(6,7,8) ~ "Summer",
    month_number %in% c(9,10,11) ~ "Fall",
    TRUE ~ "UNKNOWN"
  ))

# checking if there are outliers in month_name and season
num_unique_months <-n_distinct(cleaned_all_trips$month_name)
print(num_unique_months)

## [1] 12

num_unique_seasons <- n_distinct(cleaned_all_trips$season)
print(num_unique_seasons)

## [1] 4

head(cleaned_all_trips)

##            ride_id rideable_type          started_at            ended_at
## 1 011C1903BF4E2E28  classic_bike 2023-09-23 00:27:50 2023-09-23 00:33:27
## 2 87DB80E048A1BF9F  classic_bike 2023-09-02 09:26:43 2023-09-02 09:38:19
## 3 7C2EB7AF669066E3 electric_bike 2023-09-25 18:30:11 2023-09-25 18:41:39
## 4 57D197B010269CE3  classic_bike 2023-09-13 15:30:49 2023-09-13 15:39:18
## 5 8A2CEA7C8C8074D8  classic_bike 2023-09-18 15:58:58 2023-09-18 16:05:04
## 6 03F7044D1304CD58 electric_bike 2023-09-15 20:19:25 2023-09-15 20:30:27
##               start_station_name start_station_id
## 1    Halsted St & Wrightwood Ave     TA1309000061
## 2         Clark St & Drummond Pl     TA1307000142
## 3  Financial Pl & Ida B Wells Dr           SL-010
## 4         Clark St & Drummond Pl     TA1307000142
## 5    Halsted St & Wrightwood Ave     TA1309000061
## 6 Southport Ave & Wrightwood Ave     TA1307000113
##                 end_station_name end_station_id start_lat start_lng  end_lat
## 1 Sheffield Ave & Wellington Ave   TA1307000052  41.92914 -87.64908 41.93625
## 2     Racine Ave & Fullerton Ave   TA1306000026  41.93125 -87.64434 41.92557
## 3           Racine Ave & 15th St          13304  41.87506 -87.63314 41.86127
## 4       Racine Ave & Belmont Ave   TA1308000019  41.93125 -87.64434 41.93974
## 5     Racine Ave & Fullerton Ave   TA1306000026  41.92914 -87.64908 41.92557
## 6                                                41.92884 -87.66387 41.90000
##     end_lng member_type       date day_of_week ride_length started_date
## 1 -87.65266      member 2023-09-23    Saturday    5.616667       092323
## 2 -87.65842      member 2023-09-02    Saturday   11.600000       090223
## 3 -87.65663      member 2023-09-25      Monday   11.466667       092523
## 4 -87.65887      member 2023-09-13   Wednesday    8.483333       091323
## 5 -87.65842      member 2023-09-18      Monday    6.100000       091823
## 6 -87.64000      member 2023-09-16    Saturday   11.033333       091523
##   started_time started_hour started_date_object month_number month_name season
## 1     00:27:50            0          2023-09-23            9        Sep   Fall
## 2     09:26:43            9          2023-09-02            9        Sep   Fall
## 3     18:30:11           18          2023-09-25            9        Sep   Fall
## 4     15:30:49           15          2023-09-13            9        Sep   Fall
## 5     15:58:58           15          2023-09-18            9        Sep   Fall
## 6     20:19:25           20          2023-09-16            9        Sep   Fall

# create a visualization for months
monthly_distribution <- cleaned_all_trips %>%
  group_by(month_name, member_type) %>%
  summarize(num_rides_month = n(), .groups = 'drop')

ggplot(monthly_distribution, aes(x = month_name, y = num_rides_month, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Monthly distribution by member type",
       x = "Month", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange")) +
  theme_minimal() +
  scale_x_discrete(limits = levels(cleaned_all_trips$month_name))

# create a visualization for seasons
season_distribution <- cleaned_all_trips %>%
  group_by(season, member_type) %>%
  summarize(num_rides_season = n(), .groups = 'drop')

ggplot(season_distribution, aes(x = season, y = num_rides_season, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Rides distribution by member type and season",
       x = "Season", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange")) +
  theme_minimal() +
  # season is a character column, so levels() returns NULL; set the order explicitly
  scale_x_discrete(limits = c("Winter", "Spring", "Summer", "Fall"))

Basic statistics

Members and casual riders distribution

Percentage of Members: 64.4%
Percentage of Casual Riders: 35.6%

Cyclistic rides distribution by member type

Bicycle usage by member type

Both members and casual riders exhibit similar preferences when it comes to bicycle types, with a near-equal distribution in usage across the different types of bikes.

Ride length

Minimum ride length: 1 minute
Maximum ride length: 1439.867 minutes (nearly 24 hours)
Average ride length: 15.839 minutes

Rides by hour and member type

Members ride most in the morning and evening hours, which suggests that they use Cyclistic bikes for commuting. Casual riders peak in the late afternoon, around 3-4 pm, which may indicate that they use bikes for leisure or other non-commuting purposes. Both groups ride more during daytime hours.

Rides by day of week and member type

Members show consistently higher bicycle usage throughout the weekdays, which supports the assumption that they use bicycles for commuting. Casual riders tend to ride on weekends, especially Sundays, which supports the assumption that they use bikes mostly for leisure.
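
The weekday versus weekend pattern can also be checked numerically rather than read off the chart. This is a sketch on made-up rides, assuming that weekend means Saturday and Sunday; toy, day_group and share are illustrative names.

```r
# Toy rides (made-up): rider type and the weekday each ride started on.
toy <- data.frame(
  member_type = c("member", "member", "member", "member",
                  "casual", "casual", "casual", "casual"),
  day_of_week = c("Monday", "Tuesday", "Thursday", "Saturday",
                  "Saturday", "Sunday", "Sunday", "Friday")
)

toy$day_group <- ifelse(toy$day_of_week %in% c("Saturday", "Sunday"), "weekend", "weekday")

# Row-wise shares: the percentage of each rider type's rides in each day group.
share <- round(100 * prop.table(table(toy$member_type, toy$day_group), margin = 1), 1)
share
```

Normalizing within each rider type matters because members account for more rides overall, so raw counts alone can hide how strongly casual riders lean toward weekends.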

Rides by months and member types

The summer months (June to August) see the highest number of rides, with July showing a peak for both members and casual riders. A steady decline in bicycle usage occurs during the fall and winter months. Both groups ride more in the summer, but casual riders have a sharper peak than members. This indicates that casual riders are more influenced by the seasonal appeal of biking, likely for leisure or tourism. As the weather cools, ridership declines for both groups starting in September, but the drop is more dramatic for casual riders. Members continue to use bikes at relatively higher rates into the fall months, whereas casual riders sharply reduce their usage, especially by November and December. Members therefore seem less affected by the changing seasons, possibly due to commuting needs, while casual riders appear to limit their biking once the cooler months set in.

Rides by seasons and member types

As mentioned earlier, members' ridership fluctuates more gradually across the seasons, with peaks in summer and lows in winter, but they maintain a presence year-round. Their usage appears to be more practical, such as commuting or other routine transportation needs, making them less sensitive to extreme weather. Casual riders are highly seasonal, with their ridership concentrated heavily in summer and almost nonexistent in winter. Their usage pattern is clearly tied to favorable weather and is likely driven by recreational or occasional use.

Act

In the analysis above we showed how members and casual riders use Cyclistic bikes differently. There are two more questions from the Business Task section that we need to answer.

Why would casual riders buy Cyclistic annual memberships?

Casual riders may be motivated to purchase an annual membership due to significant cost savings compared to the cumulative cost of single rides, particularly for those who ride frequently, especially during peak seasons like summer and weekends. The membership offers unlimited rides, making it ideal for regular commuting and spontaneous trips without worrying about additional fees. The convenience of having a bike readily available for work or leisure activities can also enhance the overall experience, encouraging casual riders to transition to membership.

The ride distribution analysis shows that members ride most frequently on weekdays, suggesting a high number of commuting trips. Casual riders, however, tend to ride more on weekends. This indicates that casual riders might not be using bikes for regular commuting yet but could be persuaded to do so. Marketing the membership as an affordable, reliable commuting option can appeal to this group.
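
The cost-savings argument can be made concrete with a break-even calculation: the number of rides per year at which an annual membership costs no more than paying per ride. The helper break_even_rides and both prices below are hypothetical placeholders; the dataset contains no pricing information, so Cyclistic's real prices would have to be substituted.

```r
# Hypothetical helper: rides per year at which the annual fee is covered.
break_even_rides <- function(annual_fee, cost_per_ride) {
  stopifnot(annual_fee > 0, cost_per_ride > 0)
  ceiling(annual_fee / cost_per_ride)
}

# Placeholder prices for illustration only, not Cyclistic's actual price list.
break_even_rides(annual_fee = 120, cost_per_ride = 4)
```

A casual rider whose yearly ride count is above the break-even number would save money with a membership, which is the group a conversion campaign would most plausibly target.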

How can Cyclistic use digital media to influence casual riders to become members?

Cyclistic can leverage targeted digital advertising on platforms like social media, ride-sharing apps, and local community forums to reach casual riders effectively. Highlighting the cost savings associated with membership in these campaigns can resonate with potential customers. Additionally, showcasing the convenience of city bikes for commuting, especially during rush hours, can attract riders looking for efficient transportation options. Seasonal promotions, such as discounts or special offers during peak riding months, can further entice casual users. Personalization of offers based on riding behavior can also be effective, demonstrating to casual riders that membership provides tailored benefits suited to their needs.

Cyclistic Capstone Project

chekuuche

2024-10-04