Welcome to the Cyclistic bike-share analysis case study. Cyclistic is a fictional bike-share company, and this project follows the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. The business goal is to increase the number of annual memberships.
Key points include: the differences in bike usage between annual members and casual riders; the potential revenue increase from transitioning casual riders to annual memberships; and the strategies Cyclistic can implement using digital media to motivate casual riders to become members. Additionally, a new marketing strategy will be recommended to facilitate this conversion.
Key stakeholders:
⦁ Lily Moreno: The director of marketing
⦁ Cyclistic marketing analytics team
⦁ Cyclistic executive team
The report must include the following deliverables:
- ASK: A clear statement of the business task.
- PREPARE: A description of all data sources used.
- PROCESS: Documentation of any cleaning or manipulation of data.
- ANALYZE: A summary of your analysis.
- SHARE: Supporting visualizations and key findings.
- ACT: Your top three recommendations based on your analysis.
The data was obtained from https://divvy-tripdata.s3.amazonaws.com/index.html. It has been made available by Motivate International Inc. The data is public, but personal data was removed for privacy reasons.
The data was checked according to the ROCCC criteria:
- Reliable: the data was collected by a credible source and has proven to be reliable
- Original: the data was collected first-hand by Cyclistic
- Comprehensive: the dataframe includes the data needed to answer the business task
- Current: the data was collected in the last 12 months
- Cited: the data is authorized under license
The raw combined dataframe consists of 5,699,639 rows and 13 columns. After cleaning, 5,561,700 rows remain.
install.packages("tidyverse")
install.packages("skimr")
library(tidyverse)
library(lubridate)
library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)
library(skimr)
data_202309 <- read.csv("202309-divvy-tripdata.csv")
data_202310 <- read.csv("202310-divvy-tripdata.csv")
data_202311 <- read.csv("202311-divvy-tripdata.csv")
data_202312 <- read.csv("202312-divvy-tripdata.csv")
data_202401 <- read.csv("202401-divvy-tripdata.csv")
data_202402 <- read.csv("202402-divvy-tripdata.csv")
data_202403 <- read.csv("202403-divvy-tripdata.csv")
data_202404 <- read.csv("202404-divvy-tripdata.csv")
data_202405 <- read.csv("202405-divvy-tripdata.csv")
data_202406 <- read.csv("202406-divvy-tripdata.csv")
data_202407 <- read.csv("202407-divvy-tripdata.csv")
data_202408 <- read.csv("202408-divvy-tripdata.csv")
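The twelve `read.csv()` calls above can also be expressed as a loop over file names. The following is a minimal sketch under stated assumptions, not the method used in this report: it writes two tiny hypothetical CSV files to a temporary folder (the folder, the rows, and the names `tmp_dir`, `monthly`, and `demo_trips` are made up for illustration) and reads every file that matches the Divvy naming pattern.

```r
# Hypothetical demo data: two tiny files that follow the Divvy naming pattern
tmp_dir <- tempfile("divvy_demo_")
dir.create(tmp_dir)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(tmp_dir, "202309-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(tmp_dir, "202310-divvy-tripdata.csv"), row.names = FALSE)

# Find every monthly file and read them into a list of data frames
csv_files <- list.files(tmp_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
monthly <- lapply(csv_files, read.csv)

# Name each list element after its month prefix, e.g. "202309"
names(monthly) <- sub("-divvy-tripdata\\.csv$", "", basename(csv_files))

# Stack the monthly data frames into one
demo_trips <- do.call(rbind, monthly)
nrow(demo_trips)
```

Keeping the months in a named list makes it easy to add or drop a month without editing a dozen lines.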
colnames(data_202309)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202310)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202311)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202312)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202401)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202402)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202403)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202404)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202405)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202406)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202407)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(data_202408)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
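The twelve `colnames()` outputs above are compared by eye. As a hedged alternative, the same check can be done programmatically. The sketch below uses two small hypothetical data frames in place of the real monthly files; `monthly`, `reference_cols`, and `same_cols` are illustrative names.

```r
# Hypothetical stand-ins for the monthly data frames
monthly <- list(
  data_202309 = data.frame(ride_id = "A1", started_at = "2023-09-01 08:00:00",
                           member_casual = "member"),
  data_202310 = data.frame(ride_id = "B1", started_at = "2023-10-01 09:00:00",
                           member_casual = "casual")
)

# Use the first month's column names as the reference
reference_cols <- colnames(monthly[[1]])

# TRUE for each month whose column names and order match the reference exactly
same_cols <- vapply(monthly,
                    function(df) identical(colnames(df), reference_cols),
                    logical(1))
all(same_cols)
```

If `all(same_cols)` is `TRUE`, the data frames can be stacked without column mismatches.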
all_trips <- bind_rows(data_202309, data_202310, data_202311, data_202312, data_202401, data_202402, data_202403, data_202404, data_202405, data_202406, data_202407, data_202408)
str(all_trips)
## 'data.frame': 5699639 obs. of 13 variables:
## $ ride_id : chr "011C1903BF4E2E28" "87DB80E048A1BF9F" "7C2EB7AF669066E3" "57D197B010269CE3" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : chr "2023-09-23 00:27:50" "2023-09-02 09:26:43" "2023-09-25 18:30:11" "2023-09-13 15:30:49" ...
## $ ended_at : chr "2023-09-23 00:33:27" "2023-09-02 09:38:19" "2023-09-25 18:41:39" "2023-09-13 15:39:18" ...
## $ start_station_name: chr "Halsted St & Wrightwood Ave" "Clark St & Drummond Pl" "Financial Pl & Ida B Wells Dr" "Clark St & Drummond Pl" ...
## $ start_station_id : chr "TA1309000061" "TA1307000142" "SL-010" "TA1307000142" ...
## $ end_station_name : chr "Sheffield Ave & Wellington Ave" "Racine Ave & Fullerton Ave" "Racine Ave & 15th St" "Racine Ave & Belmont Ave" ...
## $ end_station_id : chr "TA1307000052" "TA1306000026" "13304" "TA1308000019" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.7 -87.7 ...
## $ member_casual : chr "member" "member" "member" "member" ...
skim_without_charts(all_trips)
| Name | all_trips |
| Number of rows | 5699639 |
| Number of columns | 13 |
| _______________________ | |
| Column type frequency: | |
| character | 9 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ride_id | 0 | 1 | 16 | 16 | 0 | 5699428 | 0 |
| rideable_type | 0 | 1 | 12 | 13 | 0 | 2 | 0 |
| started_at | 0 | 1 | 19 | 23 | 0 | 5232178 | 0 |
| ended_at | 0 | 1 | 19 | 23 | 0 | 5238399 | 0 |
| start_station_name | 0 | 1 | 0 | 64 | 968697 | 1727 | 0 |
| start_station_id | 0 | 1 | 0 | 14 | 968697 | 1694 | 0 |
| end_station_name | 0 | 1 | 0 | 64 | 1006133 | 1739 | 0 |
| end_station_id | 0 | 1 | 0 | 36 | 1006133 | 1703 | 0 |
| member_casual | 0 | 1 | 6 | 6 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| start_lat | 0 | 1 | 41.90 | 0.05 | 41.64 | 41.88 | 41.90 | 41.93 | 42.07 |
| start_lng | 0 | 1 | -87.65 | 0.03 | -87.94 | -87.66 | -87.64 | -87.63 | -87.52 |
| end_lat | 7526 | 1 | 41.90 | 0.05 | 16.06 | 41.88 | 41.90 | 41.93 | 87.96 |
| end_lng | 7526 | 1 | -87.65 | 0.04 | -144.05 | -87.66 | -87.64 | -87.63 | -79.02 |
# remove the monthly data frames to free memory
remove(data_202309, data_202310, data_202311, data_202312, data_202401, data_202402, data_202403, data_202404, data_202405, data_202406, data_202407, data_202408)
# keep a backup copy of the combined raw data
all_trips_2 <- all_trips
# quick glance at the dataset
head(all_trips, 15)
## ride_id rideable_type started_at ended_at
## 1 011C1903BF4E2E28 classic_bike 2023-09-23 00:27:50 2023-09-23 00:33:27
## 2 87DB80E048A1BF9F classic_bike 2023-09-02 09:26:43 2023-09-02 09:38:19
## 3 7C2EB7AF669066E3 electric_bike 2023-09-25 18:30:11 2023-09-25 18:41:39
## 4 57D197B010269CE3 classic_bike 2023-09-13 15:30:49 2023-09-13 15:39:18
## 5 8A2CEA7C8C8074D8 classic_bike 2023-09-18 15:58:58 2023-09-18 16:05:04
## 6 03F7044D1304CD58 electric_bike 2023-09-15 20:19:25 2023-09-15 20:30:27
## 7 672503E0FC0835EC electric_bike 2023-09-27 16:52:18 2023-09-27 17:03:22
## 8 1D806492F95973AC electric_bike 2023-09-17 11:07:05 2023-09-17 11:13:39
## 9 40D9EF382CC6C53D classic_bike 2023-09-17 11:58:50 2023-09-17 12:08:36
## 10 C60CE661AF7ECC93 electric_bike 2023-09-07 20:52:43 2023-09-07 21:06:51
## 11 3812B98E9406040E classic_bike 2023-09-12 16:01:28 2023-09-12 16:17:47
## 12 EBA56298CB3C803F classic_bike 2023-09-24 13:17:23 2023-09-24 13:50:43
## 13 C6BD5AF648F11D11 electric_bike 2023-09-28 18:09:40 2023-09-28 18:15:04
## 14 585C82FA2E006DE9 classic_bike 2023-09-22 12:30:41 2023-09-22 12:42:21
## 15 95E72C49D692F822 classic_bike 2023-09-07 16:28:17 2023-09-07 16:31:25
## start_station_name start_station_id
## 1 Halsted St & Wrightwood Ave TA1309000061
## 2 Clark St & Drummond Pl TA1307000142
## 3 Financial Pl & Ida B Wells Dr SL-010
## 4 Clark St & Drummond Pl TA1307000142
## 5 Halsted St & Wrightwood Ave TA1309000061
## 6 Southport Ave & Wrightwood Ave TA1307000113
## 7 Kedzie Ave & Milwaukee Ave 13085
## 8 Jeffery Blvd & 71st St KA1503000018
## 9 Kedzie Ave & Milwaukee Ave 13085
## 10 Southport Ave & Wrightwood Ave TA1307000113
## 11 Financial Pl & Ida B Wells Dr SL-010
## 12 Clark St & Schreiber Ave KA1504000156
## 13 Halsted St & Wrightwood Ave TA1309000061
## 14 Halsted St & Wrightwood Ave TA1309000061
## 15 Clark St & Drummond Pl TA1307000142
## end_station_name end_station_id start_lat start_lng end_lat
## 1 Sheffield Ave & Wellington Ave TA1307000052 41.92914 -87.64908 41.93625
## 2 Racine Ave & Fullerton Ave TA1306000026 41.93125 -87.64434 41.92557
## 3 Racine Ave & 15th St 13304 41.87506 -87.63314 41.86127
## 4 Racine Ave & Belmont Ave TA1308000019 41.93125 -87.64434 41.93974
## 5 Racine Ave & Fullerton Ave TA1306000026 41.92914 -87.64908 41.92557
## 6 41.92884 -87.66387 41.90000
## 7 41.92956 -87.70796 41.93000
## 8 41.76659 -87.57645 41.77000
## 9 California Ave & Milwaukee Ave 13084 41.92957 -87.70786 41.92269
## 10 41.92882 -87.66391 41.90000
## 11 Adler Planetarium 13431 41.87502 -87.63309 41.86610
## 12 Oakley Ave & Touhy Ave RP-004 41.99990 -87.67007 42.01234
## 13 Halsted St & Roscoe St TA1309000025 41.92919 -87.64914 41.94367
## 14 Halsted St & Roscoe St TA1309000025 41.92914 -87.64908 41.94367
## 15 Clark St & Wellington Ave TA1307000136 41.93125 -87.64434 41.93650
## end_lng member_casual
## 1 -87.65266 member
## 2 -87.65842 member
## 3 -87.65663 member
## 4 -87.65887 member
## 5 -87.65842 member
## 6 -87.64000 member
## 7 -87.66000 member
## 8 -87.57000 member
## 9 -87.69715 member
## 10 -87.63000 member
## 11 -87.60727 member
## 12 -87.68824 member
## 13 -87.64895 member
## 14 -87.64895 member
## 15 -87.64754 member
# checking for duplicates
nrow(all_trips)
## [1] 5699639
# the number of rows (5699639) is larger than n_unique for ride_id (5699428), so 211 rows are duplicates and need to be removed
cleaned_all_trips <- all_trips %>%
  distinct(ride_id, .keep_all = TRUE)
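The duplicate count follows from the numbers reported earlier: 5699639 rows minus 5699428 unique `ride_id` values leaves 211 duplicate rows. A minimal base R sketch of the same check on a tiny hypothetical data frame (`demo`, `n_dupes`, and `deduped` are illustrative names):

```r
# Hypothetical data with one repeated ride_id
demo <- data.frame(ride_id = c("A1", "A2", "A2", "A3"),
                   ride_length = c(5, 7, 7, 9))

# Rows minus unique ids gives the number of duplicate rows
n_dupes <- nrow(demo) - length(unique(demo$ride_id))

# Keep the first occurrence of each ride_id, like distinct(ride_id, .keep_all = TRUE)
deduped <- demo[!duplicated(demo$ride_id), ]
nrow(deduped)
```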
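# remove rows with NA values (here: missing end_lat/end_lng); blank station names are empty strings, not NA, so those rows are kept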
cleaned_all_trips <- na.omit(cleaned_all_trips)
# str() showed that started_at and ended_at are stored as chr; convert them to date-time (POSIXct)
cleaned_all_trips$started_at <- as.POSIXct(cleaned_all_trips$started_at)
cleaned_all_trips$ended_at <- as.POSIXct(cleaned_all_trips$ended_at)
# take the calendar date from the formatted timestamp: as.Date() on a POSIXct value can convert to UTC first and move evening rides to the next day
cleaned_all_trips$date <- as.Date(format(cleaned_all_trips$started_at, "%Y-%m-%d"))
cleaned_all_trips$day_of_week <- format(cleaned_all_trips$date, "%A")
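Time zones matter when a calendar date is derived from a timestamp. The sketch below shows how the same evening timestamp maps to different dates depending on the zone used for the conversion. It assumes the trips are recorded in Chicago local time (`"America/Chicago"` is an assumption; the source data does not state a zone), and the variable names are illustrative.

```r
# An evening start time, interpreted as Chicago local time (assumed zone)
x <- as.POSIXct("2023-09-15 20:19:25", tz = "America/Chicago")

# Converting via UTC pushes the ride past midnight
date_utc <- as.Date(x, tz = "UTC")

# Converting in the local zone keeps the wall-clock date
date_local <- as.Date(x, tz = "America/Chicago")

# Formatting first and then parsing also keeps the wall-clock date
date_wallclock <- as.Date(format(x, "%Y-%m-%d"))

date_utc
date_local
```

In this example the UTC conversion returns 2023-09-16 while the local conversions return 2023-09-15, so the weekday derived from the date changes as well.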
# create the ride_length column (in minutes) by subtracting started_at from ended_at
cleaned_all_trips$ride_length <- as.numeric(difftime(cleaned_all_trips$ended_at, cleaned_all_trips$started_at, units = "mins"))
#remove rows where ride_length is <= 0
cleaned_all_trips <- cleaned_all_trips %>%
  filter(ride_length > 0)
# rename member_casual to member_type
cleaned_all_trips <- cleaned_all_trips %>% rename(member_type = member_casual)
# overview the final dataset
glimpse(cleaned_all_trips)
## Rows: 5,690,679
## Columns: 16
## $ ride_id <chr> "011C1903BF4E2E28", "87DB80E048A1BF9F", "7C2EB7AF66…
## $ rideable_type <chr> "classic_bike", "classic_bike", "electric_bike", "c…
## $ started_at <dttm> 2023-09-23 00:27:50, 2023-09-02 09:26:43, 2023-09-…
## $ ended_at <dttm> 2023-09-23 00:33:27, 2023-09-02 09:38:19, 2023-09-…
## $ start_station_name <chr> "Halsted St & Wrightwood Ave", "Clark St & Drummond…
## $ start_station_id <chr> "TA1309000061", "TA1307000142", "SL-010", "TA130700…
## $ end_station_name <chr> "Sheffield Ave & Wellington Ave", "Racine Ave & Ful…
## $ end_station_id <chr> "TA1307000052", "TA1306000026", "13304", "TA1308000…
## $ start_lat <dbl> 41.92914, 41.93125, 41.87506, 41.93125, 41.92914, 4…
## $ start_lng <dbl> -87.64908, -87.64434, -87.63314, -87.64434, -87.649…
## $ end_lat <dbl> 41.93625, 41.92557, 41.86127, 41.93974, 41.92557, 4…
## $ end_lng <dbl> -87.65266, -87.65842, -87.65663, -87.65887, -87.658…
## $ member_type <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2023-09-23, 2023-09-02, 2023-09-25, 2023-09-13, 20…
## $ day_of_week <chr> "Saturday", "Saturday", "Monday", "Wednesday", "Mon…
## $ ride_length <dbl> 5.6166667, 11.6000000, 11.4666667, 8.4833333, 6.100…
# identify bad data and outliers
cleaned_all_trips %>%
  select(member_type, ride_length) %>%
  group_by(member_type) %>%
  dplyr::summarize(min_ride_length = min(ride_length), max_ride_length = max(ride_length))
## # A tibble: 2 × 3
## member_type min_ride_length max_ride_length
## <chr> <dbl> <dbl>
## 1 casual 0.00172 1501.
## 2 member 0.000650 1500.
The minimum and maximum ride lengths include values shorter than 1 minute and longer than 24 hours (24 hours = 1,440 minutes). We treat both as bad data or outliers and remove them.
# remove bad data and outliers
cleaned_all_trips <- cleaned_all_trips %>%
  filter(ride_length >= 1 & ride_length < 1440)
nrow(cleaned_all_trips)
## [1] 5561700
# summary statistics: minimum, quartiles, median, mean, maximum
summary(cleaned_all_trips$ride_length)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 5.815 9.967 15.839 17.550 1439.867
The shortest ride is 1 min, 25% of rides last 5.815 min or less, the median ride lasts 9.967 min, the mean is 15.839 min, 75% of rides last 17.550 min or less, and the longest ride is 1439.867 min (almost 24 hours). The mean is well above the median, so a relatively small number of long rides pulls the average up.
We need to identify these values for two different rider types – member and casual
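One way to get the same summary statistics per rider type is to split `ride_length` by `member_type` and apply `summary()` to each group. This is a minimal sketch on hypothetical toy values, not results from the Cyclistic data; `demo` and `by_type` are illustrative names.

```r
# Hypothetical ride lengths in minutes for each rider type
demo <- data.frame(
  member_type = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length = c(12, 30, 45, 6, 9, 12)
)

# One column per rider type, one row per statistic (Min., 1st Qu., Median, Mean, 3rd Qu., Max.)
by_type <- sapply(split(demo$ride_length, demo$member_type), summary)
by_type
```

The result is a matrix, so a single value can be read with, for example, `by_type["Median", "member"]`.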
# identify % of member and casual riders
result_percentage <- cleaned_all_trips %>%
  group_by(member_type) %>%
  summarize(total_count = n()) %>%
  mutate(percentage = (total_count / sum(total_count)) * 100)
print(result_percentage)
## # A tibble: 2 × 3
## member_type total_count percentage
## <chr> <int> <dbl>
## 1 casual 1981364 35.6
## 2 member 3580336 64.4
ggplot(result_percentage, aes(x = "", y = percentage, fill = member_type)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y") +
  labs(title = "Percentage of each member type") +
  theme_void() +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange")) +
  geom_text(aes(label = paste0(round(percentage, 1), "%")),
            position = position_stack(vjust = 0.5),
            color = "black")
# identify number of member and casual riders by bicycle type
bike_type_dist <- cleaned_all_trips %>%
  group_by(rideable_type, member_type) %>%
  summarize(count_trips = n(), .groups = 'drop') %>%
  group_by(rideable_type) %>%
  mutate(perc = (count_trips / sum(count_trips)) * 100)
# create a viz
ggplot(bike_type_dist, aes(x = rideable_type, y = count_trips, fill = member_type, color = member_type)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  geom_text(aes(label = paste0(round(perc, 1), "%")),
            color = "black",
            position = position_dodge(width = 0.9),
            vjust = -0.5) +
  theme_bw() +
  labs(title = "Percentage of rides by bicycle and member type", x = "Bicycle type", y = "Number of rides") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))
### Separating date and time, transforming day_of_week, date to month and season
# separate started_at to date and time format
cleaned_all_trips$started_date <- format(cleaned_all_trips$started_at, "%m%d%y")
cleaned_all_trips$started_time<- format(cleaned_all_trips$started_at, "%H:%M:%S")
cleaned_all_trips <- cleaned_all_trips %>%
  mutate(started_hour = factor(hour(started_at), levels = 0:23))
# number of rides by start hour and member type (the y-axis shows ride counts, not ride length)
cleaned_all_trips %>%
  group_by(started_hour, member_type) %>%
  summarize(ride_number = n(),
            avg_duration = mean(ride_length),
            .groups = 'drop') %>%
  ggplot(aes(x = started_hour, y = ride_number, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Number of rides by hour of day", x = "Hour of day", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))
cleaned_all_trips <- cleaned_all_trips %>%
  mutate(day_of_week = factor(day_of_week,
                              levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")))
cleaned_all_trips %>%
  group_by(day_of_week, member_type) %>%
  summarize(num_dow = n(),
            .groups = 'drop') %>%
  arrange(day_of_week) %>%
  ggplot(aes(x = day_of_week, y = num_dow, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Weekly distribution of rides by member type", x = "Days", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))
# create a column for months
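# format first so the calendar date matches the recorded start time (as.Date() on a POSIXct value can shift evening rides to the next day via UTC)
cleaned_all_trips$started_date_object <- as.Date(format(cleaned_all_trips$started_at, "%Y-%m-%d"))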
cleaned_all_trips$month_number <- month(cleaned_all_trips$started_date_object)
cleaned_all_trips$month_name <- month(cleaned_all_trips$started_date_object, label = TRUE)
# create a column for seasons
cleaned_all_trips <- cleaned_all_trips %>%
  mutate(season = case_when(
    month_number %in% c(12, 1, 2) ~ "Winter",
    month_number %in% c(3, 4, 5) ~ "Spring",
    month_number %in% c(6, 7, 8) ~ "Summer",
    month_number %in% c(9, 10, 11) ~ "Fall",
    TRUE ~ "UNKNOWN"
  ))
# check that month_name and season contain only the expected values (12 months, 4 seasons, no "UNKNOWN")
num_unique_months <- n_distinct(cleaned_all_trips$month_name)
print(num_unique_months)
## [1] 12
num_unique_seasons <- n_distinct(cleaned_all_trips$season)
print(num_unique_seasons)
## [1] 4
head(cleaned_all_trips)
## ride_id rideable_type started_at ended_at
## 1 011C1903BF4E2E28 classic_bike 2023-09-23 00:27:50 2023-09-23 00:33:27
## 2 87DB80E048A1BF9F classic_bike 2023-09-02 09:26:43 2023-09-02 09:38:19
## 3 7C2EB7AF669066E3 electric_bike 2023-09-25 18:30:11 2023-09-25 18:41:39
## 4 57D197B010269CE3 classic_bike 2023-09-13 15:30:49 2023-09-13 15:39:18
## 5 8A2CEA7C8C8074D8 classic_bike 2023-09-18 15:58:58 2023-09-18 16:05:04
## 6 03F7044D1304CD58 electric_bike 2023-09-15 20:19:25 2023-09-15 20:30:27
## start_station_name start_station_id
## 1 Halsted St & Wrightwood Ave TA1309000061
## 2 Clark St & Drummond Pl TA1307000142
## 3 Financial Pl & Ida B Wells Dr SL-010
## 4 Clark St & Drummond Pl TA1307000142
## 5 Halsted St & Wrightwood Ave TA1309000061
## 6 Southport Ave & Wrightwood Ave TA1307000113
## end_station_name end_station_id start_lat start_lng end_lat
## 1 Sheffield Ave & Wellington Ave TA1307000052 41.92914 -87.64908 41.93625
## 2 Racine Ave & Fullerton Ave TA1306000026 41.93125 -87.64434 41.92557
## 3 Racine Ave & 15th St 13304 41.87506 -87.63314 41.86127
## 4 Racine Ave & Belmont Ave TA1308000019 41.93125 -87.64434 41.93974
## 5 Racine Ave & Fullerton Ave TA1306000026 41.92914 -87.64908 41.92557
## 6 41.92884 -87.66387 41.90000
## end_lng member_type date day_of_week ride_length started_date
## 1 -87.65266 member 2023-09-23 Saturday 5.616667 092323
## 2 -87.65842 member 2023-09-02 Saturday 11.600000 090223
## 3 -87.65663 member 2023-09-25 Monday 11.466667 092523
## 4 -87.65887 member 2023-09-13 Wednesday 8.483333 091323
## 5 -87.65842 member 2023-09-18 Monday 6.100000 091823
## 6 -87.64000 member 2023-09-16 Saturday 11.033333 091523
## started_time started_hour started_date_object month_number month_name season
## 1 00:27:50 0 2023-09-23 9 Sep Fall
## 2 09:26:43 9 2023-09-02 9 Sep Fall
## 3 18:30:11 18 2023-09-25 9 Sep Fall
## 4 15:30:49 15 2023-09-13 9 Sep Fall
## 5 15:58:58 15 2023-09-18 9 Sep Fall
## 6 20:19:25 20 2023-09-16 9 Sep Fall
# create a visualization for months
monthly_distribution <- cleaned_all_trips %>%
  group_by(month_name, member_type) %>%
  summarize(num_rides_month = n(), .groups = 'drop')
ggplot(monthly_distribution, aes(x = month_name, y = num_rides_month, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Monthly distribution by member type",
       x = "Month", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange")) +
  theme_minimal() +
  scale_x_discrete(limits = levels(cleaned_all_trips$month_name))
# create a visualization for seasons
season_distribution <- cleaned_all_trips %>%
  group_by(season, member_type) %>%
  summarize(num_rides_season = n(), .groups = 'drop')
ggplot(season_distribution, aes(x = season, y = num_rides_season, fill = member_type)) +
  geom_col(position = "dodge") +
  labs(title = "Rides distribution by member type and season",
       x = "Season", y = "Number of rides", fill = "Member type") +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange")) +
  theme_minimal() +
  # season is a character column, so levels() would return NULL; set the axis order explicitly
  scale_x_discrete(limits = c("Winter", "Spring", "Summer", "Fall"))
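The `season` column is created by `case_when()` with string results, so it is a character vector rather than a factor, and a character vector has no levels. The sketch below illustrates this on a small hypothetical vector and shows one way to fix the order by converting to a factor; `season` and `season_f` here are stand-alone illustrative objects, not the columns of `cleaned_all_trips`.

```r
# A character vector, like the season column produced by case_when()
season <- c("Fall", "Winter", "Spring", "Summer", "Fall")

# Character vectors have no levels, so levels() returns NULL
is.null(levels(season))

# Converting to a factor with explicit levels fixes the display order
season_f <- factor(season, levels = c("Winter", "Spring", "Summer", "Fall"))
levels(season_f)

# Counts now follow the chosen order instead of alphabetical order
table(season_f)
```

Storing the order in the factor means every later plot and table uses it without extra arguments.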
The analysis results above show how members and casual riders use Cyclistic bikes differently. There are two more questions from the business task that we need to answer.
Why would casual riders buy a Cyclistic annual membership?
Casual riders may be motivated to purchase an annual membership by the cost savings compared to the cumulative cost of single rides, particularly those who ride frequently during peak periods such as summer and weekends. The membership offers unlimited rides, making it well suited to regular commuting and spontaneous trips without worrying about additional fees. The convenience of having a bike readily available for work or leisure can also improve the overall experience and encourage casual riders to become members.
The ride distribution analysis shows that members ride most frequently on weekdays, suggesting a high number of commuting trips. Casual riders, however, tend to ride more on weekends. This indicates that casual riders might not be using bikes for regular commuting yet but could be persuaded to do so. Marketing the membership as an affordable, reliable commuting option can appeal to this group.
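The weekday versus weekend pattern can be summarized as a single number per rider type: the share of rides that start on a Saturday or Sunday. The following is a minimal sketch on hypothetical toy rows, not figures from the Cyclistic data; `demo`, `is_weekend`, and `weekend_share` are illustrative names.

```r
# Hypothetical rides: one row per ride
demo <- data.frame(
  member_type = c("member", "member", "member", "member",
                  "casual", "casual", "casual", "casual"),
  day_of_week = c("Monday", "Tuesday", "Thursday", "Saturday",
                  "Saturday", "Sunday", "Sunday", "Friday")
)

# Flag weekend rides
demo$is_weekend <- demo$day_of_week %in% c("Saturday", "Sunday")

# The mean of a logical vector is the proportion of TRUE values; scale to percent
weekend_share <- tapply(demo$is_weekend, demo$member_type, mean) * 100
weekend_share
```

A clearly higher weekend share for casual riders than for members would support the leisure versus commuting interpretation.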
How can Cyclistic use digital media to influence casual riders to become members?
Cyclistic can use targeted digital advertising on social media, ride-sharing apps, and local community forums to reach casual riders. Campaigns that highlight the cost savings of a membership can resonate with potential customers. Showcasing the convenience of city bikes for commuting, especially during rush hours, can attract riders looking for efficient transportation. Seasonal promotions, such as discounts or special offers during peak riding months, can further entice casual users. Personalizing offers based on riding behavior can also be effective, because it shows casual riders that a membership fits the way they already ride.