Case study: How does a bike-share navigate speedy success?

Overview

Cyclistic, a prominent bike-share company headquartered in Chicago, has rapidly gained traction in the city’s transportation landscape. In an effort to delve deeper into their customer base and refine their marketing strategies, Cyclistic seeks to understand the distinct behaviors and preferences of casual riders versus annual members.

The company recognizes the need to tailor its marketing approach to effectively convert casual riders into committed annual members. By leveraging data-driven insights, Cyclistic aims to develop a comprehensive understanding of how these two customer segments interact with their services differently.

Business Task

The objective of this business task is to develop a comprehensive marketing strategy for Cyclistic that addresses the distinct needs and behaviors of both annual members and casual riders. By answering the following three questions, we aim to optimize marketing efforts, increase customer engagement, and drive conversions from casual riders to annual members.

Understanding Usage Patterns: Analyze Cyclistic’s dataset to identify differences in how annual members and casual riders utilize Cyclistic bikes.

Data Background

The dataset was acquired from the Index of bucket “divvy-tripdata”. The data are appropriate for this analysis and make it possible to identify trends. Motivate International Inc. made the data available under this license.

For this project, I downloaded data for twelve months (January to December 2020). The zipped CSVs were downloaded and unzipped into a folder.

Shown below is the Cyclistic bike-trip dataset for the year 2020. The dataset has 3541683 rows and 13 columns.

Because the dataset is large, we use R to analyse it efficiently.

R Programming

Loading Packages

An R package is a collection of R functions, data sets, and compiled code that extends the functionality of R. Here we use four packages to analyse the data.

In R, the library() function is used to load R packages into your current R session.

library (tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library (janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library (lubridate)
library (scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

rm(list=ls())

Read CSV file

The 2020 data from the Cyclistic bike-share program were downloaded and saved as CSV files. Here read.csv() is used to read the CSV files, and rbind() combines them into a single data frame.

# Q1 is a single quarterly file; April to December are monthly files
df1 <- read.csv("Divvy_Trips_2020_Q1.csv")
df2 <- read.csv("202004.csv")
df3 <- read.csv("202005.csv")
df4 <- read.csv("202006.csv")
df5 <- read.csv("202007.csv")
df6 <- read.csv("202008.csv")
df7 <- read.csv("202009.csv")
df8 <- read.csv("202010.csv")
df9 <- read.csv("202011.csv")
df10 <- read.csv("202012.csv")
# stack all files into one data frame
df20 <- rbind(df1, df2, df3, df4, df5, df6, df7, df8, df9, df10)
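
Reading the files one by one works, but the same step can also be sketched as a loop over a folder. This is only a minimal sketch, not the method used above: the helper name read_trip_csvs and the folder name "data" are assumptions, and it relies on every CSV in the folder having the same columns.

```r
# Sketch only: `read_trip_csvs` and the "data" folder are hypothetical names.
read_trip_csvs <- function(folder) {
  # Find every CSV file in the folder, in a stable (sorted) order
  files <- sort(list.files(folder, pattern = "\\.csv$", full.names = TRUE))
  # Read each file, then stack the resulting data frames row-wise
  do.call(rbind, lapply(files, read.csv))
}

# Usage (assumes the unzipped CSVs live in a folder called "data"):
# df20 <- read_trip_csvs("data")
```

Using rbind() here mirrors the step above, so the combined data frame should have the same shape as one built file by file.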

In R, the head() function is used to view the first few rows of a data frame or a matrix. It allows you to quickly inspect the structure and content of your data without displaying the entire dataset.

head(df20)

##            ride_id rideable_type          started_at            ended_at
## 1 EACB19130B0CDA4A   docked_bike 2020-01-21 20:06:59 2020-01-21 20:14:30
## 2 8FED874C809DC021   docked_bike 2020-01-30 14:22:39 2020-01-30 14:26:22
## 3 789F3C21E472CA96   docked_bike 2020-01-09 19:29:26 2020-01-09 19:32:17
## 4 C9A388DAC6ABF313   docked_bike 2020-01-06 16:17:07 2020-01-06 16:25:56
## 5 943BC3CBECCFD662   docked_bike 2020-01-30 08:37:16 2020-01-30 08:42:48
## 6 6D9C8A6938165C11   docked_bike 2020-01-10 12:33:05 2020-01-10 12:37:54
##         start_station_name start_station_id               end_station_name
## 1 Western Ave & Leland Ave              239          Clark St & Leland Ave
## 2  Clark St & Montrose Ave              234 Southport Ave & Irving Park Rd
## 3   Broadway & Belmont Ave              296       Wilton Ave & Belmont Ave
## 4   Clark St & Randolph St               51       Fairbanks Ct & Grand Ave
## 5     Clinton St & Lake St               66          Wells St & Hubbard St
## 6    Wells St & Hubbard St              212    Desplaines St & Randolph St
##   end_station_id start_lat start_lng end_lat  end_lng member_casual
## 1            326   41.9665  -87.6884 41.9671 -87.6674        member
## 2            318   41.9616  -87.6660 41.9542 -87.6644        member
## 3            117   41.9401  -87.6455 41.9402 -87.6530        member
## 4             24   41.8846  -87.6319 41.8918 -87.6206        member
## 5            212   41.8856  -87.6418 41.8899 -87.6343        member
## 6             96   41.8899  -87.6343 41.8846 -87.6446        member

Cleaning Data

janitor is an R package that provides a set of functions to clean and preprocess data in R data frames. Here remove_empty() is used to drop columns and rows that are entirely empty.

df20_cleanedcols <- janitor::remove_empty(df20,which =c("cols"))
df20_cleanedrows <- janitor::remove_empty(df20,which =c("rows"))
dim(df20_cleanedcols)
dim(df20_cleanedrows)
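
As a small illustration of what remove_empty() does, the sketch below applies it to a made-up data frame (the toy values are assumptions, not Divvy data). Only rows and columns that are entirely NA are dropped.

```r
library(janitor)

# Toy data: row 2 and column b contain nothing but NA
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Drop all-NA rows and all-NA columns in one call
cleaned <- remove_empty(toy, which = c("rows", "cols"))
dim(cleaned)
```

With these toy values, two rows and two columns remain. Rows that have at least one non-missing value are kept.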

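Removing duplicates and NA values

# drop rows that contain NA values
df20_clean <- na.omit(df20)
# keep only unique rows (removes exact duplicates)
df20_clean <- unique(df20_clean)
dim(df20_clean)
# drop rows whose start station name is a single space
df20_clean <- df20_clean %>% filter(start_station_name != " ")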
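
The three cleaning steps can be checked on a small made-up data frame. This is a sketch under assumptions: the toy rows are invented, and blank station names are assumed to be stored as empty or whitespace-only strings, which may not match the real files.

```r
library(dplyr)

# Toy data (made up for illustration, not from the Divvy files)
trips <- data.frame(
  ride_id            = c("A1", "A1", "B2", "C3", "D4"),
  start_station_name = c("Clark St", "Clark St", "", "Wells St", "State St"),
  end_lat            = c(41.9, 41.9, 41.8, NA, 41.7),
  stringsAsFactors   = FALSE
)

cleaned <- trips %>%
  na.omit() %>%                               # drop rows containing any NA
  distinct() %>%                              # drop exact duplicate rows
  filter(trimws(start_station_name) != "")    # drop blank station names

nrow(cleaned)
```

With these toy values, two rows remain (A1 and D4): one row is lost to the NA, one to duplication, and one to the blank station name.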

Organising Data

Lubridate is an R package designed to make it easier to work with dates and times in R. It provides a set of functions that simplify common tasks such as parsing, manipulating, and formatting dates and times. We use ymd_hms() to parse the started_at and ended_at columns as date-times, and as.Date() to extract the date from each.

difftime() is used to calculate the difference between two times. This gives the duration of each ride for analysis.

# convert time and date
df <- df20_clean
# date only
df$started_date <- as.Date(df$started_at)
df$ended_date <- as.Date(df$ended_at)
# parse the timestamps as date-times
df$started_at <- lubridate::ymd_hms(df$started_at)
df$ended_at <- lubridate::ymd_hms(df$ended_at)

# hour of day for the start and end of each ride
df$start_hour <- lubridate::hour(df$started_at)
df$ended_hour <- lubridate::hour(df$ended_at)

# ride duration in hours and in minutes
df$Hours <- difftime(df$ended_at, df$started_at, units = "hours")
df$Minutes <- difftime(df$ended_at, df$started_at, units = "mins")

# keep only rides with a positive duration
df <- df %>% 
  filter(Minutes>0)

View(df)
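
To make the date and time steps concrete, here is a minimal sketch on a single ride. The two timestamps are taken from the first row of the head() output shown earlier; the variable names started and ended are only illustrative.

```r
library(lubridate)

# Timestamps from the first row of the head() output
started <- ymd_hms("2020-01-21 20:06:59")
ended   <- ymd_hms("2020-01-21 20:14:30")

as.Date(started)                          # calendar date of the ride start
hour(started)                             # hour of day, 0-23
difftime(ended, started, units = "mins")  # ride duration in minutes
```

For this ride the start hour is 20 and the duration is 7 minutes 31 seconds, which is about 7.5 minutes.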

The dim() function retrieves or sets the dimensions of an object, such as a matrix, an array, or a data frame.

Here’s how it works:

dim(df)

## [1] 3395919      19

Summarize the data

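# mean, max and min are computed before Minutes is reassigned to the sum;
# otherwise they would be calculated on the group total, not the per-ride values
df2 <- df %>% 
    group_by(weekly = floor_date(started_date,"week"),start_hour) %>% 
    summarise(mean = mean(Minutes),Max = max(Minutes),
              min = min(Minutes),Minutes = sum(Minutes),
              count = n()) %>% 
              ungroup()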

## `summarise()` has grouped output by 'weekly'. You can override using the
## `.groups` argument.
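
One detail of summarise() is worth keeping in mind: its expressions are evaluated in order, so a summary that reuses an existing column name (such as Minutes) replaces that column for every later expression. The sketch below shows the effect on made-up durations; the data frame and the names overwritten and ordered are assumptions for illustration.

```r
library(dplyr)

# Toy ride durations in minutes (made up for illustration)
rides <- data.frame(
  rider   = c("member", "member", "casual"),
  Minutes = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Reusing the name `Minutes` first means mean() then sees the group total
overwritten <- rides %>%
  group_by(rider) %>%
  summarise(Minutes = sum(Minutes), mean = mean(Minutes), .groups = "drop")

# Computing the mean before the sum keeps both statistics correct
ordered <- rides %>%
  group_by(rider) %>%
  summarise(mean = mean(Minutes), Minutes = sum(Minutes), .groups = "drop")
```

With these toy values, overwritten$mean is 30 for both groups, while ordered$mean is 30 for casual and 15 for member.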

View(df2)

Here is a summary of the ride counts per week and start hour:

summary(df2$count)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2     465    1586    2680    3805   15285

#table of counts by hours
xtabs(df2$count~df2$start_hour)

## df2$start_hour
##      0      1      2      3      4      5      6      7      8      9     10 
##  30995  18593  10371   5789   6708  23857  73771 131468 156179 129798 144410 
##     11     12     13     14     15     16     17     18     19     20     21 
## 186267 220570 227164 232564 256197 304646 356837 303557 217744 140971  92470 
##     22     23 
##  72082  52911

# add a month column derived from the week start date
df2$Monthy <- lubridate::month(df2$weekly)

Data Visualisation

The ggplot() function is the primary function used in the ggplot2 package, a popular data visualization package in R. It is used to create and customize plots based on a grammar of graphics approach, allowing users to create complex and highly customizable visualizations with relatively simple syntax.

Here’s how ggplot() is used to visualise the data:

1. Rides Done Per Day

# daily ride count
df %>% 
  count(started_date, name = "count") %>% 
  ggplot() + geom_col(aes(x=started_date,y=count))+ scale_y_continuous(labels = comma)+
  labs(title = "Count of rides per day", x = "Date", y = "Rides per day")

2. Calculating Average Rides Per Day

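# 28-day trailing moving average of daily rides (dplyr masks filter(), hence stats::)
df %>% count(started_date, name = "count") %>% 
  mutate(avg = as.numeric(stats::filter(count, rep(1/28, 28), sides = 1))) %>% 
  ggplot() + geom_line(aes(x=started_date,y=avg))+ scale_y_continuous(labels = comma)+
  labs(title = "Count of rides per day", subtitle = "based on 28 day moving average", y = "Avg rides per day")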
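
A trailing moving average of daily ride counts, the idea behind the 28-day subtitle, can be sketched on a small made-up series. This is an illustrative sketch only: the counts are invented, and a 3-day window is used so the arithmetic is easy to follow (a value of 28 would give a 28-day average).

```r
# Made-up daily ride counts for one week (illustration only)
daily <- data.frame(
  started_date = as.Date("2020-07-01") + 0:6,
  count        = c(10, 20, 30, 40, 50, 60, 70)
)

window <- 3  # use 28 for a 28-day moving average

# stats::filter() is called with its namespace because dplyr masks filter();
# sides = 1 averages the current day and the days before it
daily$moving_avg <- as.numeric(
  stats::filter(daily$count, rep(1 / window, window), sides = 1)
)
daily$moving_avg
```

Here the first two entries are NA because a full 3-day window is not yet available, and the remaining values are 20, 30, 40, 50 and 60.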

Summarise data with rideable_type

# mean, max and min are computed before Minutes is reassigned to the sum
df_biketype <- df %>% 
  group_by(member_casual,rideable_type,weekly = floor_date(started_date,"week")) %>% 
  summarise(mean = mean(Minutes),Max = max(Minutes),
            min = min(Minutes),Minutes = sum(Minutes),
            count = n()) %>% 
  ungroup()

## `summarise()` has grouped output by 'member_casual', 'rideable_type'. You can
## override using the `.groups` argument.

3. No. of Rides Done Per Month

# count of rides by month

df %>% 
  count(month = lubridate::month(started_date, label = TRUE), name = "count") %>% 
  ggplot() + geom_col(aes(x=month,y=count))+ scale_y_continuous(labels = comma)+
  labs(title = "Count of rides per month", x = "Month", y = "Rides per month")

4. Ride Variation Between Members and Casual Riders

# mean, max and min are computed before Minutes is reassigned to the sum
df_biketype <- df %>% 
  group_by(member_casual,rideable_type,weekly = floor_date(started_date,"week")) %>% 
  summarise(mean = mean(Minutes),Max = max(Minutes),
            min = min(Minutes),Minutes = sum(Minutes),
            count = n()) %>% 
  ungroup()

## `summarise()` has grouped output by 'member_casual', 'rideable_type'. You can
## override using the `.groups` argument.

View(df_biketype)

#count by rider type
  
  ggplot(data = df_biketype) + geom_area( aes(x=weekly,y=count,fill = member_casual))+scale_y_continuous(labels = comma)+
    labs(title = "Count of rides by rider type")

5. Understanding the Most Used Bike Type

#count by bike type (total by week)

  ggplot(df_biketype) + geom_area(aes(x=weekly,y=count,fill = rideable_type))+ scale_y_continuous(labels = comma)+
    labs(title = "Count of rides by bike type",subtitle = "For the count of 12 months")

6. Identifying the Top 20 Stations by Ride Count

df %>% count(start_station_name,sort = TRUE) %>% top_n(20)

## Selecting by n

##            start_station_name     n
## 1     Streeter Dr & Grand Ave 34984
## 2           Clark St & Elm St 31459
## 3         Theater on the Lake 29117
## 4   Lake Shore Dr & Monroe St 28836
## 5  Lake Shore Dr & North Blvd 26299
## 6       Wells St & Concord Ln 24711
## 7  Indiana Ave & Roosevelt Rd 24346
## 8             Millennium Park 23956
## 9       Dearborn St & Erie St 23930
## 10  Columbus Dr & Randolph St 23574
## 11       Broadway & Barry Ave 23485
## 12    Clark St & Armitage Ave 23377
## 13        Wells St & Huron St 22623
## 14          Wells St & Elm St 22169
## 15   Kingsbury St & Kinzie St 22133
## 16     Wabash Ave & Grand Ave 21838
## 17     Clark St & Lincoln Ave 21768
## 18     St. Clair St & Erie St 21506
## 19      Michigan Ave & Oak St 21398
## 20  Desplaines St & Kinzie St 21181

# top 20 start stations by ride count
# labs() refers to the aesthetics, so x is the station name even after coord_flip()
   df %>% count(start_station_name,sort = TRUE) %>% top_n(20) %>% ggplot()+geom_col(aes(x=reorder(start_station_name,n),y=n))+
  coord_flip()+labs(title = "Top 20 start stations by ride count", x = "Station name", y = "Count of rides")+ scale_y_continuous(labels = comma)

## Selecting by n
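
In recent versions of dplyr, top_n() is marked as superseded in favour of slice_max(). The sketch below shows the equivalent call on made-up station names; the data frame and the name top_stations are assumptions for illustration, and n = 2 stands in for the 20 used above.

```r
library(dplyr)

# Made-up station names for illustration
rides <- data.frame(
  start_station_name = c("A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

top_stations <- rides %>%
  count(start_station_name, sort = TRUE) %>%  # rides per station, busiest first
  slice_max(n, n = 2)                         # keep the 2 busiest stations

top_stations
```

With these toy values the result keeps station A (3 rides) and station B (2 rides). Like top_n(), slice_max() keeps ties by default, so more rows can be returned when counts are equal at the cut-off.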

Key Findings

Streeter Dr & Grand Ave has the highest ride count, at 34984.
Casual riders have a higher ride count than members. Most rides took place in the summer months, from April to September.
Docked bikes are the type most used by both rider groups. Ridership peaks in summer.

Recommendations

Based on the analysis, which shows a high ride count at Streeter Dr & Grand Ave, casual riders outnumbering members, peak ride activity during the summer months, and docked bikes as the most used bike type for both rider groups, the marketing strategy team can implement the following recommendations:

Targeted Summer Campaigns: Launch targeted marketing campaigns during the summer months, especially from April to September, to capitalize on the peak ride activity. Focus on promoting Cyclistic’s services and memberships to casual riders, highlighting the benefits of biking during the warmer seasons, such as enjoying the outdoors and avoiding traffic congestion.
Membership Incentives: Offer special incentives and promotions to encourage casual riders to sign up for annual memberships. Highlight the cost-effectiveness and convenience of becoming a Cyclistic member, especially during periods of high bike usage like summer, when demand for rentals is at its peak.
Enhanced Docking Stations: Ensure that docking stations, especially at popular locations like Streeter Dr & Grand Ave, are well-maintained and stocked with a sufficient number of bikes, including docked bikes. This will improve the overall user experience and make it easier for both casual riders and members to access bikes when needed.
Social Media Engagement: Leverage social media platforms to engage with potential customers and promote Cyclistic’s services. Share user-generated content, testimonials, and tips for biking in the city during the summer months. Encourage followers to become members and take advantage of exclusive benefits.
Data-Driven Decision Making: Continuously analyze ride data to identify trends and patterns in bike usage. Use this information to refine marketing strategies, optimize bike distribution, and make data-driven decisions that enhance the overall effectiveness of Cyclistic’s services.

By implementing these recommendations, the marketing strategy team can effectively capitalize on the high ride count at Streeter Dr & Grand Ave, increase membership conversions among casual riders, and maximize the utilization of Cyclistic’s bike rental service during the summer months.

Thank you,

Nisha Prasanth.

Nisha P

2024-03-31
