Cyclistic, a prominent bike-share company headquartered in Chicago, has rapidly gained traction in the city’s transportation landscape. In an effort to delve deeper into their customer base and refine their marketing strategies, Cyclistic seeks to understand the distinct behaviors and preferences of casual riders versus annual members.
The company recognizes the need to tailor its marketing approach to effectively convert casual riders into committed annual members. By leveraging data-driven insights, Cyclistic aims to develop a comprehensive understanding of how these two customer segments interact with their services differently.
Business Task
The objective of this business task is to develop a comprehensive marketing strategy for Cyclistic that addresses the distinct needs and behaviors of both annual members and casual riders. By answering the following three questions, we aim to optimize marketing efforts, increase customer engagement, and drive conversions from casual riders to annual members.
Data Background
The dataset was provided by Motivate International Inc., which made the data available under its license.
For this project, I downloaded data for twelve months (January to December 2020). The zipped CSVs were downloaded and unzipped into a folder.
The combined Cyclistic bike-trip dataset for 2020 has 3,541,683 rows and 13 columns.
Because the dataset is large, we use R to analyse it efficiently.
In R, the library() function loads an installed package into the current R session.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(readr)
library(geosphere)
rm(list=ls())
The 2020 Cyclistic bike-share data was downloaded and saved as CSV files. Here read.csv() reads each file, and rbind() stacks them into a single data frame.
df1 <- read.csv("Divvy_Trips_2020_Q1.csv")
df2 <- read.csv("202004-divvy-tripdata.csv")
df3 <- read.csv("202005-divvy-tripdata.csv")
df4 <- read.csv("202006-divvy-tripdata.csv")
df5 <- read.csv("202007-divvy-tripdata.csv")
df6 <- read.csv("202008-divvy-tripdata.csv")
df7 <- read.csv("202009-divvy-tripdata.csv")
df8 <- read.csv("202010-divvy-tripdata.csv")
df9 <- read.csv("202011-divvy-tripdata.csv")
df10 <- read.csv("202012-divvy-tripdata.csv")
df20 <- rbind(df1, df2, df3, df4, df5, df6, df7, df8, df9, df10)
Save the combined dataset as a CSV file.
write.csv(df20, file = "df20.CSV", row.names = FALSE)
df20 <- read_csv("C:/Users/nisha/Desktop/New folder/DataAnalytics_NishaP/Dataset/df20.CSV")
## Rows: 3541683 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
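The ten read.csv() calls above can also be written as a loop over the files in a folder. The sketch below is only an illustration: the folder name `divvy_demo` and the two tiny CSV files are made up so that the example runs anywhere, and it uses base R only.

```r
# Hypothetical demo data: two small CSV files written to a temporary folder.
# In the real project this folder would hold the downloaded trip files.
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(demo_dir, "202004-demo-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202005-demo-tripdata.csv"), row.names = FALSE)

# List every CSV in the folder, read each one, and stack the rows.
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
trip_list <- lapply(csv_files, read.csv)
combined  <- do.call(rbind, trip_list)

dim(combined)
```

This only works when every file has the same column names, which is the same requirement rbind() has in the original code.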
In R, the head() function is used to view the first few rows of a data frame or a matrix. It allows you to quickly inspect the structure and content of your data without displaying the entire dataset.
head(df20)
## # A tibble: 6 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 EACB19130B0CDA4A docked_bike 2020-01-21 20:06:59 2020-01-21 20:14:30
## 2 8FED874C809DC021 docked_bike 2020-01-30 14:22:39 2020-01-30 14:26:22
## 3 789F3C21E472CA96 docked_bike 2020-01-09 19:29:26 2020-01-09 19:32:17
## 4 C9A388DAC6ABF313 docked_bike 2020-01-06 16:17:07 2020-01-06 16:25:56
## 5 943BC3CBECCFD662 docked_bike 2020-01-30 08:37:16 2020-01-30 08:42:48
## 6 6D9C8A6938165C11 docked_bike 2020-01-10 12:33:05 2020-01-10 12:37:54
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
janitor is an R package that provides functions for cleaning and preprocessing data frames. Here remove_empty() checks for columns and rows that are entirely empty.
df20_cleanedcols <- janitor::remove_empty(df20,which =c("cols"))
df20_cleanedrows <- janitor::remove_empty(df20,which =c("rows"))
dim(df20_cleanedcols)
## [1] 3541683 13
dim(df20_cleanedrows)
## [1] 3541683 13
df20_clean <- na.omit(df20)
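# Check for duplicates: unique() is only printed here, not assigned. It returns the same number of rows as df20_clean, so there are no duplicate rows.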
unique(df20_clean)
## # A tibble: 3,389,381 × 13
## ride_id rideable_type started_at ended_at
## <chr> <chr> <dttm> <dttm>
## 1 EACB19130B0CDA4A docked_bike 2020-01-21 20:06:59 2020-01-21 20:14:30
## 2 8FED874C809DC021 docked_bike 2020-01-30 14:22:39 2020-01-30 14:26:22
## 3 789F3C21E472CA96 docked_bike 2020-01-09 19:29:26 2020-01-09 19:32:17
## 4 C9A388DAC6ABF313 docked_bike 2020-01-06 16:17:07 2020-01-06 16:25:56
## 5 943BC3CBECCFD662 docked_bike 2020-01-30 08:37:16 2020-01-30 08:42:48
## 6 6D9C8A6938165C11 docked_bike 2020-01-10 12:33:05 2020-01-10 12:37:54
## 7 31EB9B8F406D4C82 docked_bike 2020-01-10 13:07:35 2020-01-10 13:12:24
## 8 A2B24E3F9C9720E3 docked_bike 2020-01-10 07:24:53 2020-01-10 07:29:50
## 9 5E3F01E1441730B7 docked_bike 2020-01-31 16:37:16 2020-01-31 16:42:11
## 10 19DC57F7E3140131 docked_bike 2020-01-31 09:39:17 2020-01-31 09:42:40
## # ℹ 3,389,371 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## # end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## # start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>
dim(df20_clean)
## [1] 3389381 13
df20_clean <- df20_clean %>% filter(start_station_name != " ")
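To make the cleaning steps concrete, here is a minimal base-R sketch on made-up data (the four rows below are invented for illustration). It shows that na.omit() drops rows containing NA, that duplicated() counts repeated rows, and that the result of unique() has to be assigned before it changes anything.

```r
# Toy data (invented): one exact duplicate row and one row with a missing
# station name.
rides <- data.frame(
  ride_id            = c("A1", "A2", "A2", "A3"),
  start_station_name = c("Clark St & Elm St", "Millennium Park",
                         "Millennium Park", NA),
  stringsAsFactors   = FALSE
)

rides_no_na <- na.omit(rides)                # drops the row containing NA
n_dupes     <- sum(duplicated(rides_no_na))  # number of repeated rows
rides_clean <- unique(rides_no_na)           # assign the result to keep it

n_dupes
nrow(rides_clean)
```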
lubridate is an R package that makes it easier to work with dates and times. It provides functions that simplify common tasks such as parsing, manipulating, and formatting dates and times. We use ymd_hms() and as.Date() to convert the started_at and ended_at columns.
difftime() calculates the difference between two times, which lets us find and analyse the duration of each ride.
df <- df20_clean
df$started_date <- as.Date(df$started_at)
df$ended_date <- as.Date(df$ended_at)
#time as hours and minutes
df$started_at <- lubridate::ymd_hms(df$started_at)
## Warning: 11 failed to parse.
df$ended_at <- lubridate::ymd_hms(df$ended_at)
## Warning: 1 failed to parse.
df$Start_time <- format(df$started_at,"%H:%M:%S")
df$End_time <- format(df$ended_at,"%H:%M:%S")
df$day_of_the_week <- weekdays(df$started_at)
df$month <- month(df$started_at, label = TRUE, abbr = TRUE)
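# trip duration in minutes (units set explicitly instead of relying on difftime()'s automatic choice)
df$trip_duration <- as.double(difftime(df$ended_at, df$started_at, units = "mins"))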
df<-df %>%
filter(trip_duration > 0)
glimpse(df)
## Rows: 3,378,424
## Columns: 20
## $ ride_id <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at <dttm> 2020-01-21 20:06:59, 2020-01-30 14:22:39, 2020-01-…
## $ ended_at <dttm> 2020-01-21 20:14:30, 2020-01-30 14:26:22, 2020-01-…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ start_station_id <chr> "239", "234", "296", "51", "66", "212", "96", "96",…
## $ end_station_name <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ end_station_id <chr> "326", "318", "117", "24", "212", "96", "212", "212…
## $ start_lat <dbl> 41.9665, 41.9616, 41.9401, 41.8846, 41.8856, 41.889…
## $ start_lng <dbl> -87.6884, -87.6660, -87.6455, -87.6319, -87.6418, -…
## $ end_lat <dbl> 41.9671, 41.9542, 41.9402, 41.8918, 41.8899, 41.884…
## $ end_lng <dbl> -87.6674, -87.6644, -87.6530, -87.6206, -87.6343, -…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ started_date <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ ended_date <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ Start_time <chr> "20:06:59", "14:22:39", "19:29:26", "16:17:07", "08…
## $ End_time <chr> "20:14:30", "14:26:22", "19:32:17", "16:25:56", "08…
## $ day_of_the_week <chr> "Tuesday", "Thursday", "Thursday", "Monday", "Thurs…
## $ month <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ trip_duration <dbl> 7.516667, 3.716667, 2.850000, 8.816667, 5.533333, 4…
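As a quick check of the duration column, the sketch below recomputes the first ride's duration in base R. The timestamps are copied from the first row of the output above; the UTC time zone is an assumption made only so that the example is reproducible.

```r
# Start and end of the first ride shown in the output above.
# tz = "UTC" is an assumption for reproducibility.
started <- as.POSIXct("2020-01-21 20:06:59", tz = "UTC")
ended   <- as.POSIXct("2020-01-21 20:14:30", tz = "UTC")

# Asking for minutes explicitly avoids depending on the units that
# difftime() would otherwise pick automatically.
trip_minutes <- as.numeric(difftime(ended, started, units = "mins"))
trip_minutes
```

The result is 451 seconds expressed in minutes, which agrees with the first trip_duration value printed above (7.516667).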
The dim() function retrieves or sets the dimensions of an object, such as a data frame, matrix, or array.
Here it returns the number of rows and columns of df:
dim(df)
## [1] 3378424 20
The distHaversine() function from the geosphere package calculates the great-circle distance between two points on the Earth's surface from their longitude and latitude coordinates. The distance is computed with the Haversine formula, which treats the Earth as a sphere, and is returned in meters.
df$distance <- mapply(function(lat1, lon1, lat2, lon2) {
distHaversine(c(lon1, lat1), c(lon2, lat2))
}, df$start_lat, df$start_lng, df$end_lat, df$end_lng)
#change to km
df$distance <- df$distance/1000
glimpse(df)
## Rows: 3,378,424
## Columns: 21
## $ ride_id <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at <dttm> 2020-01-21 20:06:59, 2020-01-30 14:22:39, 2020-01-…
## $ ended_at <dttm> 2020-01-21 20:14:30, 2020-01-30 14:26:22, 2020-01-…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ start_station_id <chr> "239", "234", "296", "51", "66", "212", "96", "96",…
## $ end_station_name <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ end_station_id <chr> "326", "318", "117", "24", "212", "96", "212", "212…
## $ start_lat <dbl> 41.9665, 41.9616, 41.9401, 41.8846, 41.8856, 41.889…
## $ start_lng <dbl> -87.6884, -87.6660, -87.6455, -87.6319, -87.6418, -…
## $ end_lat <dbl> 41.9671, 41.9542, 41.9402, 41.8918, 41.8899, 41.884…
## $ end_lng <dbl> -87.6674, -87.6644, -87.6530, -87.6206, -87.6343, -…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ started_date <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ ended_date <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ Start_time <chr> "20:06:59", "14:22:39", "19:29:26", "16:17:07", "08…
## $ End_time <chr> "20:14:30", "14:26:22", "19:32:17", "16:25:56", "08…
## $ day_of_the_week <chr> "Tuesday", "Thursday", "Thursday", "Monday", "Thurs…
## $ month <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ trip_duration <dbl> 7.516667, 3.716667, 2.850000, 8.816667, 5.533333, 4…
## $ distance <dbl> 1.7394455, 0.8343444, 0.6211318, 1.2326158, 0.78450…
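For readers who want to see the formula itself, here is a base-R sketch of the Haversine calculation. The function name `haversine_km` is hypothetical, not part of geosphere, and the radius of 6378137 m is assumed to match the default used by distHaversine(). The coordinates are those of the first ride in the output above.

```r
# Haversine great-circle distance in kilometres.
# radius_m is an assumption chosen to match geosphere's default sphere radius.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_m = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_m * asin(pmin(1, sqrt(a))) / 1000
}

# First ride: start and end coordinates as printed by glimpse().
first_ride_km <- haversine_km(41.9665, -87.6884, 41.9671, -87.6674)
first_ride_km
```

Because the function uses only vectorised arithmetic, it accepts whole columns at once, for example `haversine_km(df$start_lat, df$start_lng, df$end_lat, df$end_lng)`, without an element-by-element loop.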
df <- df %>%
select(-start_station_id,-end_station_id,-start_lat,-end_lat,-start_lng,-end_lng)
glimpse(df)
## Rows: 3,378,424
## Columns: 15
## $ ride_id <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at <dttm> 2020-01-21 20:06:59, 2020-01-30 14:22:39, 2020-01-…
## $ ended_at <dttm> 2020-01-21 20:14:30, 2020-01-30 14:26:22, 2020-01-…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ end_station_name <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ started_date <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ ended_date <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ Start_time <chr> "20:06:59", "14:22:39", "19:29:26", "16:17:07", "08…
## $ End_time <chr> "20:14:30", "14:26:22", "19:32:17", "16:25:56", "08…
## $ day_of_the_week <chr> "Tuesday", "Thursday", "Thursday", "Monday", "Thurs…
## $ month <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ trip_duration <dbl> 7.516667, 3.716667, 2.850000, 8.816667, 5.533333, 4…
## $ distance <dbl> 1.7394455, 0.8343444, 0.6211318, 1.2326158, 0.78450…
## calculate rideable_type usage
sum_df <- df %>%
select(rideable_type,member_casual,started_at,start_station_name,day_of_the_week,month,trip_duration,distance) %>%
group_by(rideable_type,member_casual) %>%
summarise(Total_Duration = sum(trip_duration),Count = n(),Total_distance = sum(distance)) %>%
ungroup()
## `summarise()` has grouped output by 'rideable_type'. You can override using the
## `.groups` argument.
glimpse(sum_df)
## Rows: 6
## Columns: 5
## $ rideable_type <chr> "classic_bike", "classic_bike", "docked_bike", "docked_…
## $ member_casual <chr> "casual", "member", "casual", "member", "casual", "memb…
## $ Total_Duration <dbl> 261577.0, 747037.1, 58895632.9, 29011008.2, 3052981.5, …
## $ Count <int> 11259, 59141, 1140592, 1810758, 145379, 211295
## $ Total_distance <dbl> 22434.51, 112343.41, 2422396.66, 3948031.01, 362682.23,…
## member Vs Casual distribution
Member_type<- df %>%
group_by(member_casual) %>%
summarise(Count = n(),Total_duration = sum(trip_duration),Total_distance = sum(distance)) %>%
ungroup()
glimpse(Member_type)
## Rows: 2
## Columns: 4
## $ member_casual <chr> "casual", "member"
## $ Count <int> 1297230, 2081194
## $ Total_duration <dbl> 62210191, 32533813
## $ Total_distance <dbl> 2807513, 4603502
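The totals above are easier to compare once they are expressed per ride. The sketch below rebuilds the two-row summary by hand from the printed values (so the totals carry the rounding of the printout) and derives each group's share of rides, average duration, and average distance. The lower-case column names are chosen for this example only.

```r
# Values copied from the Member_type output above.
member_type <- data.frame(
  member_casual  = c("casual", "member"),
  count          = c(1297230, 2081194),
  total_duration = c(62210191, 32533813),  # minutes
  total_distance = c(2807513, 4603502)     # kilometres
)

# Share of all rides, and per-ride averages.
member_type$share_pct    <- 100 * member_type$count / sum(member_type$count)
member_type$avg_duration <- member_type$total_duration / member_type$count
member_type$avg_distance <- member_type$total_distance / member_type$count

member_type
```

On these figures, casual rides last roughly 48 minutes on average against roughly 16 minutes for members, while the average distance is about 2.2 km for both groups.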
# Daily ride
ride_per_day <- df %>%
group_by(started_date,member_casual) %>%
summarise(Avg_Trip = mean(trip_duration),Avg_distance = mean(distance),Count = n()) %>%
arrange(started_date) %>%
ungroup()
## `summarise()` has grouped output by 'started_date'. You can override using the
## `.groups` argument.
glimpse(ride_per_day)
## Rows: 726
## Columns: 5
## $ started_date <date> 2020-01-01, 2020-01-01, 2020-01-02, 2020-01-02, 2020-01…
## $ member_casual <chr> "casual", "member", "casual", "member", "casual", "membe…
## $ Avg_Trip <dbl> 82.751572, 12.622806, 102.745953, 11.360752, 31.038227, …
## $ Avg_distance <dbl> 1.856419, 1.807829, 2.067135, 1.945271, 2.050374, 1.8494…
## $ Count <int> 477, 1664, 663, 5816, 453, 5437, 390, 2797, 431, 2604, 2…
## weekly ride
Weekly_ride <- df %>%
group_by(day_of_the_week,member_casual) %>%
summarise(Avg_Trip = mean(trip_duration),
Avg_distance = mean(distance),Count = n()) %>%
arrange(day_of_the_week) %>%
ungroup()
## `summarise()` has grouped output by 'day_of_the_week'. You can override using
## the `.groups` argument.
glimpse(Weekly_ride)
## Rows: 14
## Columns: 5
## $ day_of_the_week <chr> "Friday", "Friday", "Monday", "Monday", "Saturday", "S…
## $ member_casual <chr> "casual", "member", "casual", "member", "casual", "mem…
## $ Avg_Trip <dbl> 46.86454, 15.37371, 45.70907, 14.86617, 49.65495, 17.9…
## $ Avg_distance <dbl> 2.145251, 2.184277, 2.042295, 2.151856, 2.299606, 2.34…
## $ Count <int> 199038, 314621, 131980, 279927, 300853, 297345, 236559…
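Note that day_of_the_week is a character column, so the summary above is sorted alphabetically (Friday, Monday, Saturday, ...) and a plot would use the same order. One possible fix, sketched below in base R, is to convert the names to an ordered factor. The sketch assumes English day names, since weekdays() returns names in the session's locale.

```r
# Calendar order for the factor levels (Monday first is a choice, not a rule).
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Day names in the alphabetical order a character column sorts into.
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")

# An ordered factor sorts, and plots, in the order of its levels.
days_ordered <- factor(days, levels = day_levels, ordered = TRUE)
sort(days_ordered)
```

Applied to the data, the same idea would be `df$day_of_the_week <- factor(df$day_of_the_week, levels = day_levels, ordered = TRUE)` before grouping.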
# monthly Ride
monthly_ride <- df %>%
group_by(month, member_casual) %>%
summarise(Avg_Trip = mean(trip_duration),
Avg_distance = mean(distance),Count = n()) %>%
arrange(month) %>%
ungroup()
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
glimpse(monthly_ride)
## Rows: 24
## Columns: 5
## $ month <ord> Jan, Jan, Feb, Feb, Mar, Mar, Apr, Apr, May, May, Jun, J…
## $ member_casual <chr> "casual", "member", "casual", "member", "casual", "membe…
## $ Avg_Trip <dbl> 161.64949, 11.14904, 127.63017, 12.80652, 63.11613, 14.3…
## $ Avg_distance <dbl> 1.944444, 1.783580, 1.955242, 1.768710, 1.962375, 1.9841…
## $ Count <int> 7785, 136099, 12860, 126715, 27631, 115617, 23584, 61065…
# Popular start station
Popular_top_start_stations <- df %>%
count(start_station_name) %>%
arrange(desc(n)) %>%
head(10)
# Top 20 start station
top_start_stations <- df %>%
group_by(start_station_name,member_casual) %>%
count(start_station_name) %>%
arrange(desc(n)) %>%
head(20) %>%
ungroup()
#Top 20 end Station
top_end_stations <- df %>%
group_by(end_station_name,member_casual) %>%
count(end_station_name) %>%
arrange(desc(n)) %>%
head(20) %>%
ungroup()
head(top_start_stations)
## # A tibble: 6 × 3
## start_station_name member_casual n
## <chr> <chr> <int>
## 1 Streeter Dr & Grand Ave casual 25859
## 2 Clark St & Elm St member 20193
## 3 Lake Shore Dr & Monroe St casual 19892
## 4 Millennium Park casual 18368
## 5 Kingsbury St & Kinzie St member 16431
## 6 St. Clair St & Erie St member 15814
head(top_end_stations)
## # A tibble: 6 × 3
## end_station_name member_casual n
## <chr> <chr> <int>
## 1 Streeter Dr & Grand Ave casual 28463
## 2 Clark St & Elm St member 20882
## 3 Millennium Park casual 19419
## 4 Lake Shore Dr & Monroe St casual 19253
## 5 St. Clair St & Erie St member 17654
## 6 Kingsbury St & Kinzie St member 16630
# top station with large distance ride
dis_df <- df %>%
group_by(start_station_name,member_casual) %>%
summarise(Avg_distance = mean(distance)) %>%
arrange(desc(Avg_distance)) %>%
head(20) %>%
ungroup()
## `summarise()` has grouped output by 'start_station_name'. You can override
## using the `.groups` argument.
head(dis_df)
## # A tibble: 6 × 3
## start_station_name member_casual Avg_distance
## <chr> <chr> <dbl>
## 1 Stony Island Ave & 90th St member 7.69
## 2 Vincennes Ave & 104th St member 7.19
## 3 Dodge Ave & Main St casual 6.87
## 4 Michigan Ave & 71st St member 6.65
## 5 Oglesby Ave & 100th St member 6.61
## 6 Ashland Ave & 74th St member 6.08
#hourly Bike Demand
df <- df %>%
mutate(start_hour = lubridate::hour(started_at))
hourly_need <- df %>%
group_by(member_casual,start_hour) %>%
summarise(number_of_trips = n()) %>%
ungroup()
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
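The hourly count can be illustrated on a handful of timestamps in base R. In the sketch below the first three start times come from the output shown earlier and the fourth is invented; the UTC time zone is an assumption for reproducibility. Hours run from 0 to 23, and listing all 24 levels keeps hours with zero trips in the result.

```r
# Three start times from the data shown earlier plus one invented value.
started_at <- as.POSIXct(c("2020-01-21 20:06:59", "2020-01-30 14:22:39",
                           "2020-01-09 19:29:26", "2020-01-30 14:59:00"),
                         tz = "UTC")

# Extract the hour of day (0-23) from each timestamp.
start_hour <- as.POSIXlt(started_at)$hour

# Count trips per hour, keeping every hour even when its count is zero.
trips_per_hour <- table(factor(start_hour, levels = 0:23))
trips_per_hour
```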
The ggplot() function is the primary function used in the ggplot2 package, a popular data visualization package in R. It is used to create and customize plots based on a grammar of graphics approach, allowing users to create complex and highly customizable visualizations with relatively simple syntax.
Here’s how ggplot() is used to plot the summaries computed above, starting with each rider type’s share of rides:
Member_type$Percentage <- round(Member_type$Count/sum(Member_type$Count)*100)
ggplot(Member_type,mapping = aes(x = " ", y = Percentage, fill = member_casual)) +
geom_col(color = "black") +
geom_text(aes(label=paste(member_casual, paste(Percentage,"%"),sep="\n")), position = position_stack(vjust=0.5), color="black") +
labs(title = "Members vs Casual Distribution") +
coord_polar(theta = "y") +
scale_fill_brewer() +
theme_bw()
#Most used bike type
ggplot(sum_df,mapping = aes(x = rideable_type ,y = Count,fill = rideable_type)) +
geom_bar(stat = "identity") +
facet_wrap(~member_casual, nrow = 1) +
theme(legend.position = "none") +
labs(title = "Rider Bike type Usage",x = "Bike_type",y = "Count")
# Total Ride per day
ggplot(ride_per_day, aes(x = started_date, y = Count,fill = factor(member_casual))) +
geom_col()+ labs( title ="Ride taken Per Day",
x = "Date",
y = "Count") +
theme_minimal()
#Total Ride per Week
ggplot(Weekly_ride, aes(x = day_of_the_week, y = Count,fill = factor(member_casual))) +
geom_col()+ labs( title ="Weekly Ride Count",
x = "Day of Week",
y = "Count") +
theme_minimal()
ggplot(monthly_ride, aes(x = month, y = Count ,fill = factor(member_casual))) +
geom_col()+ labs( title ="Monthly Ride Count",
x = "Month(year 2020)",
y = "Count") +theme_minimal()+
theme(axis.text.x = element_text(angle = 45))
## Average Distance ride by member type in a year
ggplot(dis_df, aes(x = Avg_distance, y = reorder(start_station_name,Avg_distance), fill = factor(member_casual))) +
geom_col() +
labs(title = "Large Distance Covered from Various Start Station(Top 20)",
x = "Distance Covered (km)",
y = "Station Name",
fill = "Rider Type") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 90))
Popular_top_start_stations %>%
ggplot() + geom_col(aes(x=n,y= reorder(start_station_name,n)))+ scale_x_continuous(labels = comma)+
labs(title = "Top 10 popular Start Station", x = "No of Rides", y = "Station Name")
#Top Start Station Name
ggplot(top_start_stations, aes(x = n, y = reorder(start_station_name,n), fill = factor(member_casual))) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Top 20 Start Station Name",
x = "Ride Count",
y = "Station Name",
fill = "Rider Type") +
theme_minimal()
#Top End Station Name
ggplot(top_end_stations, aes(x = n, y = reorder(end_station_name,n), fill = factor(member_casual))) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Top 20 End Station Name",
x = "Ride Count",
y = "Station Name",
fill = "Rider Type") +
theme_minimal()
hourly_need %>%
ggplot()+geom_line(aes(x=start_hour,y= number_of_trips,color = member_casual))+
labs(title = "Hourly Bike Demand",
x = "Hour",
y = "No of Trips",
color = "Rider Type") +
scale_x_continuous(limits = c(0,24), name ="Hours")+
theme_minimal()
1.Enhance Membership Programs:
Since 62% of rides are taken by members, there is an opportunity to further strengthen membership benefits to retain and attract more long-term users. Consider offering loyalty programs, discounts for long-term memberships, or exclusive benefits during peak seasons (April to September).
2.Bike Type Optimization:
Given that both members and casual riders prefer docked bikes, ensure that there are sufficient docked bikes available at high-demand stations, especially during peak hours. Consider investing in more docked bikes and maintaining a balance with other bike types.
3.Seasonal Promotions:
Since ride frequency increases from April and decreases after September, plan for seasonal promotions and marketing campaigns to maximize ridership during these months. This could include discounted rides, special events, or partnerships with local businesses to encourage more usage.
4.Improve Weekend Services:
With higher ride volumes on weekends, ensure there are adequate resources and bike availability. Consider running special weekend events or promotions to further boost ridership.
5.Focus on High-Demand Stations:
Streeter Dr & Grand Ave has the highest ride counts, and Vincennes Ave & 104th St is among the start stations with the longest average ride distance. Enhance services at these stations, such as better bike maintenance, increased docking stations, and potentially setting up customer service points.
6.Adjust for Peak Hours:
With peak demand between 3 pm and 6 pm, allocate more bikes and ensure efficient redistribution of bikes to meet demand. Consider offering incentives for riders who choose to ride outside of these peak hours to balance the load.
1.Targeted Conversion Campaigns:
Develop targeted campaigns to convert casual riders to members. Highlight the benefits of membership, such as cost savings, exclusive access to promotions, and convenience.
2.Event-Based Marketing:
Use the popularity of stations like Streeter Dr & Grand Ave to create event-based marketing. For instance, set up pop-up events, offer free refreshments, or partner with nearby attractions to draw in more riders.
3.Seasonal Campaigns:
Utilize data showing increased rides from April to September to launch time-limited offers and campaigns. Engage with riders through social media, email newsletters, and local advertisements to promote these offers.
4.Weekend Specials:
Since weekends see higher ridership, promote special weekend passes or family packages to attract group rides. Collaborate with local tourist attractions or restaurants to offer combined deals.
5.Highlight Environmental Impact:
Emphasize the environmental benefits of using bike share programs in your marketing materials. Share statistics on carbon footprint reduction and promote the sustainable aspect of biking to attract eco-conscious riders.
6.Dynamic Pricing:
Implement dynamic pricing strategies during peak hours and seasons to manage demand and encourage off-peak usage. Offer discounted rates for rides starting early in the morning or late at night.
By addressing these, Cyclistic can enhance user experience, optimize operations, and effectively increase ridership and membership.
Thank you,
Nisha Prasanth.