Cyclistic, a prominent bike-share company headquartered in Chicago, has rapidly gained traction in the city’s transportation landscape. In an effort to delve deeper into their customer base and refine their marketing strategies, Cyclistic seeks to understand the distinct behaviors and preferences of casual riders versus annual members.
The company recognizes the need to tailor its marketing approach to effectively convert casual riders into committed annual members. By leveraging data-driven insights, Cyclistic aims to develop a comprehensive understanding of how these two customer segments interact with their services differently.
Business Task
The objective of this business task is to develop a comprehensive marketing strategy for Cyclistic that addresses the distinct needs and behaviors of both annual members and casual riders. By examining how the two groups use the service, we aim to optimize marketing efforts, increase customer engagement, and drive conversions from casual riders to annual members.
Data Background
The dataset was acquired from the index of the “divvy-tripdata” bucket. The data are appropriate for this task and make it possible to analyse and identify trends. Motivate International Inc. made the data available under this license.
For this project, I downloaded data for twelve months (January to December 2020). The zipped CSVs were downloaded and unzipped into a folder.
The Cyclistic bike-trip dataset for the year 2020 is shown below. It has 3541683 rows and 13 columns.
Because the dataset is large, we use R to analyse it efficiently.
In R, the library() function loads a package into the current R session.
library (tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library (janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library (lubridate)
library (scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
rm(list=ls())
The 2020 data for the Cyclistic bike-share program were downloaded and saved as CSV files. Here read.csv() reads each file, and rbind() stacks the results into a single data frame.
df1 <- read.csv("Divvy_Trips_2020_Q1.csv")
df2 <- read.csv("202004.csv")
df3 <- read.csv("202005.csv")
df4 <- read.csv("202006.csv")
df5 <- read.csv("202007.csv")
df6 <- read.csv("202008.csv")
df7 <- read.csv("202009.csv")
df8 <- read.csv("202010.csv")
df9 <- read.csv("202011.csv")
df10 <- read.csv("202012.csv")
df20 <- rbind(df1,df2,df3,df4,df5,df6,df7,df8,df9,df10)
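When the number of monthly files grows, the same read-and-stack step can be written as a loop over file names. The following is a minimal sketch, not the method used in this report: the folder, the two tiny CSV files, and the object name `trips` are made up so that the example runs on its own.

```r
# Hypothetical setup: write two tiny CSV files into a temporary folder
tmp <- tempfile("trips_")
dir.create(tmp)
write.csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(tmp, "202004.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(tmp, "202005.csv"), row.names = FALSE)

# List every CSV in the folder, read each one, and stack the data frames
files <- sort(list.files(tmp, pattern = "\\.csv$", full.names = TRUE))
trips <- do.call(rbind, lapply(files, read.csv))

dim(trips)
```

This only works when every file has the same column names, which is also what rbind() requires in the explicit version.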
In R, the head() function is used to view the first few rows of a data frame or a matrix. It allows you to quickly inspect the structure and content of your data without displaying the entire dataset.
head(df20)
## ride_id rideable_type started_at ended_at
## 1 EACB19130B0CDA4A docked_bike 2020-01-21 20:06:59 2020-01-21 20:14:30
## 2 8FED874C809DC021 docked_bike 2020-01-30 14:22:39 2020-01-30 14:26:22
## 3 789F3C21E472CA96 docked_bike 2020-01-09 19:29:26 2020-01-09 19:32:17
## 4 C9A388DAC6ABF313 docked_bike 2020-01-06 16:17:07 2020-01-06 16:25:56
## 5 943BC3CBECCFD662 docked_bike 2020-01-30 08:37:16 2020-01-30 08:42:48
## 6 6D9C8A6938165C11 docked_bike 2020-01-10 12:33:05 2020-01-10 12:37:54
## start_station_name start_station_id end_station_name
## 1 Western Ave & Leland Ave 239 Clark St & Leland Ave
## 2 Clark St & Montrose Ave 234 Southport Ave & Irving Park Rd
## 3 Broadway & Belmont Ave 296 Wilton Ave & Belmont Ave
## 4 Clark St & Randolph St 51 Fairbanks Ct & Grand Ave
## 5 Clinton St & Lake St 66 Wells St & Hubbard St
## 6 Wells St & Hubbard St 212 Desplaines St & Randolph St
## end_station_id start_lat start_lng end_lat end_lng member_casual
## 1 326 41.9665 -87.6884 41.9671 -87.6674 member
## 2 318 41.9616 -87.6660 41.9542 -87.6644 member
## 3 117 41.9401 -87.6455 41.9402 -87.6530 member
## 4 24 41.8846 -87.6319 41.8918 -87.6206 member
## 5 212 41.8856 -87.6418 41.8899 -87.6343 member
## 6 96 41.8899 -87.6343 41.8846 -87.6446 member
janitor is an R package that provides a set of functions to clean and preprocess data in R data frames.
# check for columns and rows that are entirely empty
df20_cleanedcols <- janitor::remove_empty(df20, which = "cols")
df20_cleanedrows <- janitor::remove_empty(df20, which = "rows")
dim(df20_cleanedcols)
dim(df20_cleanedrows)
# drop rows that contain NA in any column
df20_clean <- na.omit(df20)
# remove duplicate rows (the result must be assigned, otherwise it is only printed)
df20_clean <- unique(df20_clean)
dim(df20_clean)
# keep rows whose start station name is not a single blank
df20_clean <- df20_clean %>% filter(start_station_name != " ")
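To see what each cleaning step removes, it can help to run the same sequence on a few hand-made rows. This is a small sketch on invented data (the `toy` data frame and its `notes` column are hypothetical). The last step assumes that missing station names appear as empty or whitespace-only strings, which should be checked against the real files.

```r
library(janitor)

# Invented rows: one duplicate, one all-NA row, one blank station, one empty column
toy <- data.frame(
  ride_id            = c("r1", "r2", "r2", NA, "r4"),
  start_station_name = c("Clark St & Elm St", "Millennium Park",
                         "Millennium Park", NA, ""),
  notes              = NA,
  stringsAsFactors   = FALSE
)

toy <- remove_empty(toy, which = c("rows", "cols"))  # drops the all-NA row and column
toy <- na.omit(toy)                                  # drops rows with any remaining NA
toy <- unique(toy)                                   # drops the repeated r2 row
toy <- toy[trimws(toy$start_station_name) != "", ]   # assumption: blanks are "" or spaces

dim(toy)
```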
lubridate is an R package designed to make it easier to work with dates and times in R. It provides a set of functions that simplify common tasks such as parsing, manipulating, and formatting dates and times. We use ymd_hms() and as.Date() to convert the started_at and ended_at columns.
difftime() calculates the difference between two times. We use it to find and analyse the duration of each ride.
## convert date and time columns
df <- df20_clean
# calendar dates
df$started_date <- as.Date(df$started_at)
df$ended_date <- as.Date(df$ended_at)
# parse the timestamps, then extract the hour of day
df$started_at <- lubridate::ymd_hms(df$started_at)
df$ended_at <- lubridate::ymd_hms(df$ended_at)
df$start_hour <- lubridate::hour(df$started_at)
df$ended_hour <- lubridate::hour(df$ended_at)
# ride duration in hours and in minutes
df$Hours <- difftime(df$ended_at, df$started_at, units = "hours")
df$Minutes <- difftime(df$ended_at, df$started_at, units = "mins")
# keep only rides with a positive duration
df <- df %>%
  filter(Minutes > 0)
View(df)
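The conversion can be checked on a single ride. The sketch below uses the start and end times of the first row printed by head(df20) above. The variable names are chosen for illustration only.

```r
library(lubridate)

# First ride from the head(df20) output
started <- ymd_hms("2020-01-21 20:06:59")
ended   <- ymd_hms("2020-01-21 20:14:30")

start_hour <- hour(started)                               # hour of day, 0 to 23
minutes    <- as.numeric(difftime(ended, started, units = "mins"))
ride_date  <- as.Date(started)
week_start <- floor_date(ride_date, "week")               # weeks start on Sunday by default

start_hour
minutes
week_start
```

The ride lasts 7 minutes 31 seconds, so `minutes` is a little over 7.5. Wrapping difftime() in as.numeric() gives a plain number, which is often easier to summarise and plot.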
The dim() function retrieves or sets the dimensions of an object such as a data frame, a matrix, or an array.
Here’s how it works:
dim(df)
## [1] 3395919 19
df2 <- df %>%
  group_by(weekly = floor_date(started_date, "week"), start_hour) %>%
  # Minutes is totalled last so that mean, Max and min use the per-ride values
  summarise(mean = mean(Minutes), Max = max(Minutes),
            min = min(Minutes), count = n(),
            Minutes = sum(Minutes)) %>%
  ungroup()
## `summarise()` has grouped output by 'weekly'. You can override using the
## `.groups` argument.
View(df2)
Here is what it looks like:
summary(df2$count)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2 465 1586 2680 3805 15285
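One detail of grouped summaries is easy to miss: summarise() evaluates its arguments in order, so a later expression sees any column created earlier in the same call. The sketch below shows the effect on three invented rides. The data frame and object names are hypothetical.

```r
library(dplyr)

rides <- data.frame(group = c("a", "a", "b"), Minutes = c(10, 20, 30))

# Total first: mean() now sees the summed Minutes, one value per group
total_first <- rides %>%
  group_by(group) %>%
  summarise(Minutes = sum(Minutes), mean = mean(Minutes), .groups = "drop")

# Total last: mean() still sees the per-ride values
total_last <- rides %>%
  group_by(group) %>%
  summarise(mean = mean(Minutes), count = n(),
            Minutes = sum(Minutes), .groups = "drop")

total_first$mean
total_last$mean
```

For group "a" the first version reports a mean of 30 (the total), while the second reports 15. Giving the total a different name, such as a hypothetical `total_minutes`, avoids the issue as well.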
#table of counts by hours
xtabs(df2$count~df2$start_hour)
## df2$start_hour
## 0 1 2 3 4 5 6 7 8 9 10
## 30995 18593 10371 5789 6708 23857 73771 131468 156179 129798 144410
## 11 12 13 14 15 16 17 18 19 20 21
## 186267 220570 227164 232564 256197 304646 356837 303557 217744 140971 92470
## 22 23
## 72082 52911
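With a count on the left of the formula, xtabs() sums that count within each level of the right-hand variable. This is how the weekly rows collapse into one total per hour. A small sketch on invented numbers (the `weekly_hourly` data frame is hypothetical):

```r
# Two weeks, two start hours each, with made-up ride counts
weekly_hourly <- data.frame(
  weekly     = as.Date(c("2020-01-05", "2020-01-05", "2020-01-12", "2020-01-12")),
  start_hour = c(8, 17, 8, 17),
  count      = c(120, 300, 150, 280)
)

# Sum of count for each start hour, across all weeks
by_hour <- xtabs(count ~ start_hour, data = weekly_hourly)
by_hour
```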
# month number of each week
df2$Monthly <- lubridate::month(df2$weekly)
The ggplot() function is the primary function used in the ggplot2 package, a popular data visualization package in R. It is used to create and customize plots based on a grammar of graphics approach, allowing users to create complex and highly customizable visualizations with relatively simple syntax.
Here is how ggplot() is used to plot the ride counts:
# ride counts per week (the hourly rows of each week are stacked into one bar)
df2 %>%
  ggplot() + geom_col(aes(x = weekly, y = count)) + scale_y_continuous(labels = comma) +
  labs(title = "Count of rides per week", y = "Rides per week")
df2 %>%
  ggplot() + geom_col(aes(x = weekly, y = count)) + scale_y_continuous(labels = comma) +
  labs(title = "Count of rides per week", subtitle = "weekly totals across all start hours", y = "Rides per week")
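The same grammar can be pointed at the hour of day instead of the week. The following is a sketch on invented counts, not a figure from this analysis: the `weekly_hourly` and `hourly` objects are hypothetical, and the counts are first summed per start hour so that each bar is a single value.

```r
library(ggplot2)
library(scales)

# Made-up counts for two weeks and two start hours
weekly_hourly <- data.frame(
  weekly     = as.Date(c("2020-06-07", "2020-06-07", "2020-06-14", "2020-06-14")),
  start_hour = c(8, 17, 8, 17),
  count      = c(900, 2100, 1100, 2400)
)

# One total per start hour
hourly <- aggregate(count ~ start_hour, data = weekly_hourly, FUN = sum)

p <- ggplot(hourly) +
  geom_col(aes(x = start_hour, y = count)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Count of rides by start hour", x = "Start hour", y = "Rides")
```

Printing `p` draws the chart. Aggregating before plotting keeps the axis title and the bar heights in agreement.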
3. Number of Rides per Month
# weekly ride counts across the months of 2020
df2 %>%
  ggplot() + geom_col(aes(x = weekly, y = count)) + scale_y_continuous(labels = comma) +
  labs(title = "Count of rides per week", y = "Rides per week")
4. Ride Variation between Members and Casual Riders
df_biketype <- df %>%
  group_by(member_casual, rideable_type, weekly = floor_date(started_date, "week")) %>%
  # Minutes is totalled last so that mean, Max and min use the per-ride values
  summarise(mean = mean(Minutes), Max = max(Minutes),
            min = min(Minutes), count = n(),
            Minutes = sum(Minutes)) %>%
  ungroup()
## `summarise()` has grouped output by 'member_casual', 'rideable_type'. You can
## override using the `.groups` argument.
View(df_biketype)
#count by rider type
ggplot(data = df_biketype) + geom_area( aes(x=weekly,y=count,fill = member_casual))+scale_y_continuous(labels = comma)+
labs(title = "Count of rides by rider type")
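Because df_biketype holds one row per rider type, bike type, and week, each week can appear more than once for the same rider type. One possible way to get a single value per rider type and week before drawing an area chart is to sum over bike types first. The sketch below uses invented counts, and the names `toy_biketype` and `by_rider` are hypothetical.

```r
library(dplyr)

# Made-up weekly counts split by rider type and bike type
toy_biketype <- data.frame(
  member_casual = c("member", "member", "casual", "casual"),
  rideable_type = c("docked_bike", "electric_bike", "docked_bike", "electric_bike"),
  weekly        = as.Date("2020-08-02"),
  count         = c(500, 100, 700, 50)
)

# Collapse bike types so each rider type has one row per week
by_rider <- toy_biketype %>%
  group_by(member_casual, weekly) %>%
  summarise(count = sum(count), .groups = "drop")

by_rider
```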
5. Understanding Bike Type Usage
#count by bike type (total by week)
ggplot(df_biketype) + geom_area(aes(x=weekly,y=count,fill = rideable_type))+ scale_y_continuous(labels = comma)+
labs(title = "Count of rides by bike type",subtitle = "Weekly totals over 12 months")
6. Identifying the Top 20 Stations by Ride Count
df %>% count(start_station_name,sort = TRUE) %>% top_n(20)
## Selecting by n
## start_station_name n
## 1 Streeter Dr & Grand Ave 34984
## 2 Clark St & Elm St 31459
## 3 Theater on the Lake 29117
## 4 Lake Shore Dr & Monroe St 28836
## 5 Lake Shore Dr & North Blvd 26299
## 6 Wells St & Concord Ln 24711
## 7 Indiana Ave & Roosevelt Rd 24346
## 8 Millennium Park 23956
## 9 Dearborn St & Erie St 23930
## 10 Columbus Dr & Randolph St 23574
## 11 Broadway & Barry Ave 23485
## 12 Clark St & Armitage Ave 23377
## 13 Wells St & Huron St 22623
## 14 Wells St & Elm St 22169
## 15 Kingsbury St & Kinzie St 22133
## 16 Wabash Ave & Grand Ave 21838
## 17 Clark St & Lincoln Ave 21768
## 18 St. Clair St & Erie St 21506
## 19 Michigan Ave & Oak St 21398
## 20 Desplaines St & Kinzie St 21181
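In current dplyr, top_n() is superseded by slice_max(), which names the ranking column explicitly and so does not print the "Selecting by n" message. A minimal sketch on invented station names (`toy_rides` and `top2` are hypothetical):

```r
library(dplyr)

# Six made-up rides from three stations
toy_rides <- data.frame(
  start_station_name = c("A St", "A St", "A St", "B St", "B St", "C St")
)

# Count rides per station, then keep the two stations with the most rides
top2 <- toy_rides %>%
  count(start_station_name, sort = TRUE) %>%
  slice_max(n, n = 2)

top2
```

Like top_n(), slice_max() keeps ties by default, so the result can hold more rows than requested when several stations share the cut-off count.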
# top 20 start stations by ride count
df %>% count(start_station_name, sort = TRUE) %>% top_n(20) %>%
  ggplot() + geom_col(aes(x = reorder(start_station_name, n), y = n)) +
  coord_flip() + labs(title = "Top 20 start stations by ride count", x = "Station name", y = "Count of rides") + scale_y_continuous(labels = comma)
## Selecting by n
Streeter Dr & Grand Ave has the largest ride count, at 34984.
Casual riders have a higher ride count than members. Most rides took place in the summer months of April to September.
Docked bikes are the type most used by both rider groups, and ridership peaks in summer.
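Findings like these can be cross-checked with a simple two-way table of rides by month and rider type. The sketch below runs on five invented rides and only illustrates the shape of such a check. The `toy` data frame and the `rides_by_month` name are hypothetical.

```r
# Five made-up rides
toy <- data.frame(
  started_date  = as.Date(c("2020-01-15", "2020-07-04", "2020-07-05",
                            "2020-07-18", "2020-08-01")),
  member_casual = c("member", "casual", "casual", "member", "casual")
)

# Month number of each ride, then a month-by-rider-type table of ride counts
toy$month <- as.integer(format(toy$started_date, "%m"))
rides_by_month <- table(month = toy$month, rider = toy$member_casual)

rides_by_month
```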
Based on the analysis output indicating a high ride count at Streeter Dr & Grand Ave, casual riders outnumbering members, peak ride activity during the summer months, and docked bikes being the type most used by both rider groups, the marketing strategy team can implement the following recommendations:
Targeted Summer Campaigns: Launch targeted marketing campaigns during the summer months, especially from April to September, to capitalize on the peak ride activity. Focus on promoting Cyclistic’s services and memberships to casual riders, highlighting the benefits of biking during the warmer seasons, such as enjoying the outdoors and avoiding traffic congestion.
Membership Incentives: Offer special incentives and promotions to encourage casual riders to sign up for annual memberships. Highlight the cost-effectiveness and convenience of becoming a Cyclistic member, especially during periods of high bike usage like summer, when demand for rentals is at its peak.
Enhanced Docking Stations: Ensure that docking stations, especially at popular locations like Streeter Dr & Grand Ave, are well-maintained and stocked with a sufficient number of bikes, including docked bikes. This will improve the overall user experience and make it easier for both casual riders and members to access bikes when needed.
Social Media Engagement: Leverage social media platforms to engage with potential customers and promote Cyclistic’s services. Share user-generated content, testimonials, and tips for biking in the city during the summer months. Encourage followers to become members and take advantage of exclusive benefits.
Data-Driven Decision Making: Continuously analyze ride data to identify trends and patterns in bike usage. Use this information to refine marketing strategies, optimize bike distribution, and make data-driven decisions that enhance the overall effectiveness of Cyclistic’s services.
By implementing these recommendations, the marketing strategy team can effectively capitalize on the high ride count at Streeter Dr & Grand Ave, increase membership conversions among casual riders, and maximize the utilization of Cyclistic’s bike rental service during the summer months.
Thank you,
Nisha Prasanth.