How does a Bike-Share navigate speedy success?

Summary

This is a case study for the Google Data Analytics Certificate. It analyzes public trip data from Cyclistic Bike-Share covering June 2022 to May 2023. Cyclistic is a bike-share company in Chicago. The goal of the analysis is to recommend ways to maximize the number of annual memberships. The analysis follows the six phases “ask”, “prepare”, “process”, “analyze”, “share”, and “act” to answer the key business questions.

Objectives

  • An analytical report, including:
    • A clear statement of the business task
    • A description of all data sources used
    • Documentation of any cleaning or manipulation of data
    • A summary of the analysis
    • Supporting visualizations and key findings
    • Top three recommendations based on the analysis

1.ASK

This section clarifies the basic requirements and business questions related to the analysis.

1.1.Stakeholders

  1. Lily Moreno: The director of marketing. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program.

  2. Cyclistic marketing analytics team: a team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Cyclistic's marketing strategy.

  3. Cyclistic executive team: the detail-oriented executive team will decide whether to approve the recommended marketing program.

1.2.Bike-share program

  • More than 5800 bicycles

  • More than 600 docking stations

  • Types of bikes offered in addition to standard two-wheeled bikes:

    • Reclining bikes

    • Hand tricycles

    • Cargo bikes

  • Types of riders:

    • The majority of riders use standard two-wheeled bikes.

    • 8% of riders use assistive options.

  • Usage:

    • Riders are more likely to ride for leisure.

    • About 30% use the bikes to commute to work each day.

1.3.Current strategies

  • Building general awareness and appealing to broad consumer segments.
  • Pricing plans:
    • Single-ride passes (casual riders)
    • Full-day passes (casual riders)
    • Annual memberships (Cyclistic members)
  • Annual members are much more profitable for Cyclistic than casual riders.
  • Flexible pricing plans attract more customers.

1.4.Business goal

  • Design marketing strategies aimed at converting casual riders into annual members, to maximize the number of annual memberships.

1.5.Business tasks

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

2.PREPARE

To start the analysis, we use Cyclistic’s historical trip data to identify trends.

2.1.Data location

The historical trip data has been made available for analysis by Motivate International Inc. under a data license.

2.2.Data set

Data is organised monthly. This analysis uses 12 months of data, from June 2022 to May 2023. The data set meets the ROCCC criteria: Reliable, Original, Comprehensive, Current, and Cited. The data is provided under Divvy's data license agreement, which governs licensing, privacy, security, and accessibility (https://ride.divvybikes.com/data-license-agreement).

Please note that the following information is excluded because of data-privacy restrictions:

  • Personally identifiable information
  • Credit card numbers
  • Cyclistic service area (whether casual riders live within it)
  • Purchasing frequency (whether riders have bought multiple single passes)

2.3.Data credibility

Under the ROCCC standard (Reliable, Original, Comprehensive, Current, Cited), the trip data comes from a reliable, original source, is provided as CSV files, and covers a current period. We therefore consider the data credible for this analysis.

3.PROCESS

In the process phase, we carry out the following tasks:

  1. Install tools to import and clean the data.
  2. Collect the data and ensure its integrity in RStudio.
  3. Clean the data: check for errors, duplicates, data types, the number of unique values, etc.
  4. Verify that the data is clean and ready to analyze, for example that there are no duplicates and that every variable has an appropriate data type.
  5. Document the cleaning and manipulation of the data.

3.1 Install tool packages

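Install the packages required for data cleaning, manipulation, visualization, and documentation.

  • tidyverse: Data import and wrangling (includes dplyr and ggplot2)
  • dplyr: Fast data manipulation
  • lubridate: Date and time handling
  • here: Enables easy file referencing
  • janitor and skimr: Data cleaning and summary helpers
  • ggplot2: Data visualization
  • cowplot: Arranging and annotating ggplot2 plots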

Then, we load the packages.

#install.packages(c("tidyverse", "lubridate", "cowplot", "here", "janitor", "skimr")) #Run once if the packages are not installed yet.
library(tidyverse) #helps wrangle data
library(lubridate) #helps wrangle date attributes
library(ggplot2) #helps visualize data
library(cowplot) #helps visualize data

#For data cleaning, the following packages are loaded.
library(here)
library(janitor)
library(skimr)
library(dplyr)
getwd() #displays your working directory
## [1] "C:/Users/satos/Documents/project/case-1"

3.2.Collect the data and ensure its integrity

3.2.1.Collect data

Read the 12 monthly CSV files with the read_csv() function.

df2206 <- read_csv("202206-divvy-tripdata.csv")
df2207 <- read_csv("202207-divvy-tripdata.csv")
df2208 <- read_csv("202208-divvy-tripdata.csv")
df2209 <- read_csv("202209-divvy-tripdata.csv")
df2210 <- read_csv("202210-divvy-tripdata.csv")
df2211 <- read_csv("202211-divvy-tripdata.csv")
df2212 <- read_csv("202212-divvy-tripdata.csv")
df2301 <- read_csv("202301-divvy-tripdata.csv")
df2302 <- read_csv("202302-divvy-tripdata.csv")
df2303 <- read_csv("202303-divvy-tripdata.csv")
df2304 <- read_csv("202304-divvy-tripdata.csv")
df2305 <- read_csv("202305-divvy-tripdata.csv")
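The twelve read_csv() calls above follow one naming pattern, so the file list can also be generated instead of typed. The sketch below is a minimal base R illustration, assuming the files sit in the working directory under the names used above. The names `months`, `files`, and `trip_list` are hypothetical. The read step runs only if every file is present, and it calls readr::read_csv, so readr must be installed for that step.

```r
# Build the 12 monthly file names, June 2022 to May 2023 (assumed naming pattern from above)
months <- format(seq(as.Date("2022-06-01"), by = "month", length.out = 12), "%Y%m")
files <- paste0(months, "-divvy-tripdata.csv")

# Read the files only when all of them exist in the working directory
if (all(file.exists(files))) {
  trip_list <- lapply(files, readr::read_csv)  # one tibble per month
}

print(files)
```

This keeps the list of months in one place, so extending the period only means changing the start date or `length.out`.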
3.2.2 Ensure data integrity

Confirm that all files have the same column names.

colnames(df2206)
colnames(df2207)
colnames(df2208)
colnames(df2209)
colnames(df2210)
colnames(df2211)
colnames(df2212)
colnames(df2301)
colnames(df2302)
colnames(df2303)
colnames(df2304)
colnames(df2305)
############
#[1] "ride_id"            "rideable_type"      "started_at"        
#[4] "ended_at"           "start_station_name" "start_station_id"  
#[7] "end_station_name"   "end_station_id"     "start_lat"         
#[10] "start_lng"          "end_lat"            "end_lng"
#[13] "member_casual"    
###########

The column names are identical across all 12 files. The data also contains no personally identifiable information, such as birth year, gender, names, or credit card details.
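Comparing twelve printed outputs by eye is error-prone. Here is a minimal sketch of a programmatic check, assuming the data frames are collected in a list. The function `same_columns()` and the toy data frames are hypothetical illustrations, not part of the original workflow.

```r
# TRUE when every data frame in the list has identical column names (same names, same order)
same_columns <- function(dfs) {
  reference <- colnames(dfs[[1]])
  all(vapply(dfs, function(d) identical(colnames(d), reference), logical(1)))
}

# Hypothetical toy data frames standing in for the monthly files
a <- data.frame(ride_id = "x1", member_casual = "casual")
b <- data.frame(ride_id = "x2", member_casual = "member")
c_bad <- data.frame(ride_id = "x3", rider = "member")

print(same_columns(list(a, b)))         # TRUE
print(same_columns(list(a, b, c_bad)))  # FALSE
```

With the real data, the call would be `same_columns(list(df2206, df2207, df2208, df2209, df2210, df2211, df2212, df2301, df2302, df2303, df2304, df2305))`.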

3.2.3.Inspect the data frames

Using the str() (structure) function, we look for incongruities in each data frame.
str(df2206)
str(df2207)
str(df2208)
str(df2209)
str(df2210)
str(df2211)
str(df2212)
str(df2301)
str(df2302)
str(df2303)
str(df2304)
str(df2305)

### 
#  ..   ride_id = col_character(),
#  ..   rideable_type = col_character(),
#  ..   started_at = col_datetime(format = ""),
#  ..   ended_at = col_datetime(format = ""),
#  ..   start_station_name = col_character(),
#  ..   start_station_id = col_character(),
#  ..   end_station_name = col_character(),
#  ..   end_station_id = col_character(),
#  ..   start_lat = col_double(),
#  ..   start_lng = col_double(),
#  ..   end_lat = col_double(),
#  ..   end_lng = col_double(),
#  ..   member_casual = col_character()
###

All data sets have the same column data types. However, some columns contain NA values.
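Base R's is.na() and complete.cases() show how many NA values each column holds and how many rows a later drop_na() step would remove. The toy data below is hypothetical and only imitates missing station names.

```r
# Hypothetical toy data with missing station names
trips <- data.frame(
  ride_id = c("r1", "r2", "r3", "r4"),
  start_station_name = c("A", NA, "C", NA),
  end_station_name = c("B", "B", NA, NA)
)

# Number of NA values in each column
na_per_column <- colSums(is.na(trips))
print(na_per_column)

# Number of rows with at least one NA (the rows drop_na() without arguments would remove)
rows_with_na <- sum(!complete.cases(trips))
print(rows_with_na)
```

On the real data, `colSums(is.na(df2206))` gives the same per-column count for one month.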

3.3.Steps to clean data

To clean the data efficiently (for example, by removing NA values and duplicates), we first combine the 12 data sets into a single data frame:

3.3.1.Wrangle data and combine into a single file

Because all 12 files share the same column names and data types, we stack them with dplyr's bind_rows(). A join such as merge() is not needed, because we are appending rows rather than matching them on a key.

all_trips<- bind_rows(df2206, df2207, df2208, df2209, df2210, df2211, df2212, df2301, df2302, df2303, df2304, df2305)
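After stacking, a quick sanity check is that the combined row count equals the sum of the monthly row counts. The sketch below uses base R's do.call(rbind, ...) as a stand-in for bind_rows() on hypothetical toy data frames. rbind() stops with an error when column names differ, whereas bind_rows() fills missing columns with NA.

```r
# Hypothetical monthly data frames with identical columns
jun <- data.frame(ride_id = c("a", "b"), member_casual = c("casual", "member"))
jul <- data.frame(ride_id = c("c", "d", "e"), member_casual = c("member", "member", "casual"))
monthly <- list(jun, jul)

# Stack the rows (base R stand-in for dplyr::bind_rows)
combined <- do.call(rbind, monthly)

# The combined row count should equal the sum of the monthly row counts
expected_rows <- sum(vapply(monthly, nrow, integer(1)))
stopifnot(nrow(combined) == expected_rows)
print(nrow(combined))
```

For the real data, the same comparison sets `nrow(all_trips)` against the summed `nrow()` of the twelve monthly data frames.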
3.3.2.Check duplicates

Check if there are any duplicated data.

sum(duplicated(all_trips)) # 0

There are no duplicated rows in “all_trips”.
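duplicated() on the whole data frame only flags rows that are identical in every column. If “ride_id” is meant to identify each ride uniquely (an assumption based on its name), checking that column alone is a stricter test. Here is a minimal sketch on hypothetical toy data.

```r
# Hypothetical toy data: ride r2 appears twice with different end times
trips <- data.frame(
  ride_id = c("r1", "r2", "r2"),
  ended_at = c("10:05", "11:10", "11:12")
)

full_row_dups <- sum(duplicated(trips))       # rows identical in every column
id_dups <- sum(duplicated(trips$ride_id))     # repeated ride_id values
print(c(full_row_dups = full_row_dups, id_dups = id_dups))
```

On the real data, the equivalent check would be `sum(duplicated(all_trips$ride_id))`.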

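3.3.3.Drop rows with NA values

Next, we drop rows that contain NA values. distinct() stays in the pipeline as a safeguard against duplicates. Note that drop_na() without arguments removes every row with at least one missing value in any column, including rides with a missing station name.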

all_trips <- all_trips %>%
  distinct() %>%
  drop_na()
3.3.4.Add Year, Month, Day, and the day of the week.

In its current form, the data can only be aggregated at the ride level, which is too granular. We add columns such as date, month, day, year, day of the week, and start time, which give us more ways to aggregate the data.

all_trips$date <- as.Date(all_trips$started_at) #Default format is yyyy-mm-dd
all_trips$month <- format(all_trips$date, "%m") #Two-digit month, e.g. "06"
all_trips$day <- format(all_trips$date, "%d") #Two-digit day of the month
all_trips$year <- format(all_trips$date, "%Y") #Four-digit year
all_trips$day_of_week <- format(all_trips$date, "%A") #Full weekday name (locale-dependent)
all_trips$time <- format(all_trips$started_at, format = "%H:%M:%S") #Start time of day as text
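The format codes above can be checked on a single timestamp. This sketch uses the first start time shown in the str() output in 3.5 and assumes it is stored in UTC, which is read_csv()'s default. The variable names are illustrative. Because %A returns weekday names in the current locale, the sketch also shows the locale-independent weekday number from POSIXlt.

```r
# First start time from the str() output in 3.5, assumed to be UTC
start_time <- as.POSIXct("2022-06-09 22:28:32", tz = "UTC")
start_date <- as.Date(start_time)

parts <- c(
  year  = format(start_date, "%Y"),
  month = format(start_date, "%m"),
  day   = format(start_date, "%d"),
  time  = format(start_time, "%H:%M:%S")
)
print(parts)

# Locale-independent weekday number, where 0 = Sunday
wday_number <- as.POSIXlt(start_date)$wday
print(wday_number)  # 4, i.e. Thursday
```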
3.3.5.Add a “ride_length” column for the trip duration

We add a calculated field, “ride_length”, that holds the duration of each ride in seconds, so that all rides are measured in the same unit.

all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs") #Fix the unit to seconds; the default "auto" picks the unit from the data.
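Without a units argument, difftime() chooses the unit automatically, so the result can come out in seconds, minutes, or hours depending on the data. The sketch below uses the first ride from the str() output in 3.5 (22:28:32 to 22:52:17, shown there with a ride_length of 1425) to show the difference, assuming UTC timestamps as read_csv() produces.

```r
start <- as.POSIXct("2022-06-09 22:28:32", tz = "UTC")
end   <- as.POSIXct("2022-06-09 22:52:17", tz = "UTC")

# Without units, difftime() picks a unit automatically (minutes here)
auto <- difftime(end, start)
print(auto)               # Time difference of 23.75 mins

# With units = "secs", the value is always in seconds
secs <- difftime(end, start, units = "secs")
print(as.numeric(secs))   # 1425
```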

3.4.Verify that the data is clean

Check the cleaned data with the following functions:

head(all_trips) #View the first 6 rows
tail(all_trips) #View the last 6 rows
nrow(all_trips) #Number of rows
## [1] 4494681
dim(all_trips) #Dimensions of the data frame (rows, columns)
## [1] 4494681      20
n_unique(all_trips$member_casual) #Count distinct values (n_unique() comes from skimr)
## [1] 2
summary(all_trips) #Summarize each column
  • The type of each column is confirmed, and there are no duplicated rows (see the check in 3.3.2).

  • The summary() and head() results show that “ride_length” is stored as a difftime object, so it must be converted to numeric for calculations.

3.4.1 Convert “ride_length” to a numeric vector
is.factor(all_trips$ride_length) #Check whether it is categorical data.
## [1] FALSE
all_trips$ride_length <- as.numeric(all_trips$ride_length, units = "secs") #Convert the difftime values to numeric seconds.
is.numeric(all_trips$ride_length) #Confirm that it is numeric.
## [1] TRUE

“ride_length” is now numeric.

Check “ride_length” again.

summary(all_trips$ride_length)

We found negative values in “ride_length”. A ride cannot end before it starts, and a zero-length ride carries no information, so we remove both.

3.4.2.Clean the data of “ride_length”.

First, we count the negative and zero values, then sort the data by “ride_length” in ascending order to inspect them.

sum(all_trips$ride_length <= 0) #Count negative and zero values.
#339 rows out of 4,494,681 have a negative or zero ride_length (about 0.0075%).

#Inspect the other columns for the rows with the smallest (negative) ride_length values.
all_trips %>%
  arrange(ride_length)

We see no obvious pattern among the rows with negative “ride_length” values, so we drop every row with a negative or zero “ride_length” using subset().

#Filter out negative and zero values in the column of ride_length and create a new data frame.
all_trips1 <- subset(all_trips, ride_length > 0)
head(all_trips1)
sum(all_trips1$ride_length <= 0) #Verify that no negative or zero values remain in "ride_length".
summary(all_trips1$ride_length)
# Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
# 1.0     348.0     609.0     977.3    1090.0 1922127.0 

All negative, zero, and NA values have now been removed from “ride_length”.
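The filtering step and the share of removed rows can be checked with a few lines of base R. The toy ride lengths are hypothetical. The counts 339 and 4,494,681 come from the outputs above.

```r
# Hypothetical ride lengths in seconds, including zero and negative values
trips <- data.frame(ride_length = c(1425, 2, 0, -30, 563))

n_invalid <- sum(trips$ride_length <= 0)
cleaned <- subset(trips, ride_length > 0)

# Every removed row should be one of the invalid rows
stopifnot(nrow(trips) - nrow(cleaned) == n_invalid)

# Share of rows removed in the real data: 339 out of 4,494,681
removed_pct <- 339 / 4494681 * 100
print(round(removed_pct, 4))  # 0.0075
```

The remaining row count, 4,494,681 minus 339, is 4,494,342, which matches the length shown by summary(all_trips1).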

3.5.Document

Through steps 3.1 to 3.4, we cleaned and transformed the Divvy trip data from June 2022 to May 2023.

summary(all_trips1)
##    ride_id          rideable_type        started_at                    
##  Length:4494342     Length:4494342     Min.   :2022-06-01 00:00:04.00  
##  Class :character   Class :character   1st Qu.:2022-07-25 17:40:49.25  
##  Mode  :character   Mode  :character   Median :2022-09-21 14:53:43.50  
##                                        Mean   :2022-10-26 20:47:49.33  
##                                        3rd Qu.:2023-02-04 15:56:11.25  
##                                        Max.   :2023-05-31 23:59:49.00  
##     ended_at                      start_station_name start_station_id  
##  Min.   :2022-06-01 00:02:38.00   Length:4494342     Length:4494342    
##  1st Qu.:2022-07-25 17:56:54.00   Class :character   Class :character  
##  Median :2022-09-21 15:08:44.00   Mode  :character   Mode  :character  
##  Mean   :2022-10-26 21:04:06.61                                        
##  3rd Qu.:2023-02-04 16:08:30.75                                        
##  Max.   :2023-06-07 23:04:26.00                                        
##  end_station_name   end_station_id       start_lat       start_lng     
##  Length:4494342     Length:4494342     Min.   :41.65   Min.   :-87.84  
##  Class :character   Class :character   1st Qu.:41.88   1st Qu.:-87.66  
##  Mode  :character   Mode  :character   Median :41.90   Median :-87.64  
##                                        Mean   :41.90   Mean   :-87.65  
##                                        3rd Qu.:41.93   3rd Qu.:-87.63  
##                                        Max.   :42.06   Max.   :-87.53  
##     end_lat         end_lng       member_casual           date           
##  Min.   : 0.00   Min.   :-87.84   Length:4494342     Min.   :2022-06-01  
##  1st Qu.:41.88   1st Qu.:-87.66   Class :character   1st Qu.:2022-07-25  
##  Median :41.90   Median :-87.64   Mode  :character   Median :2022-09-21  
##  Mean   :41.90   Mean   :-87.65                      Mean   :2022-10-26  
##  3rd Qu.:41.93   3rd Qu.:-87.63                      3rd Qu.:2023-02-04  
##  Max.   :42.06   Max.   :  0.00                      Max.   :2023-05-31  
##     month               day                year           day_of_week       
##  Length:4494342     Length:4494342     Length:4494342     Length:4494342    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      time            ride_length       
##  Length:4494342     Min.   :      1.0  
##  Class :character   1st Qu.:    348.0  
##  Mode  :character   Median :    609.0  
##                     Mean   :    977.3  
##                     3rd Qu.:   1090.0  
##                     Max.   :1922127.0
str(all_trips1)
## tibble [4,494,342 × 20] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:4494342] "B12AD6565494C368" "BAD4CB075003A605" "76DAD9FC95774B53" "47DE68ACCA138C13" ...
##  $ rideable_type     : chr [1:4494342] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:4494342], format: "2022-06-09 22:28:32" "2022-06-19 17:08:23" ...
##  $ ended_at          : POSIXct[1:4494342], format: "2022-06-09 22:52:17" "2022-06-19 17:08:25" ...
##  $ start_station_name: chr [1:4494342] "California Ave & Milwaukee Ave" "California Ave & Milwaukee Ave" "Burnham Greenway & 105th St" "Wood St & Chicago Ave" ...
##  $ start_station_id  : chr [1:4494342] "13084" "13084" "20222" "637" ...
##  $ end_station_name  : chr [1:4494342] "California Ave & Milwaukee Ave" "California Ave & Milwaukee Ave" "Burnham Greenway & 105th St" "California Ave & Division St" ...
##  $ end_station_id    : chr [1:4494342] "13084" "13084" "20222" "13256" ...
##  $ start_lat         : num [1:4494342] 41.9 41.9 41.7 41.9 41.9 ...
##  $ start_lng         : num [1:4494342] -87.7 -87.7 -87.5 -87.7 -87.7 ...
##  $ end_lat           : num [1:4494342] 41.9 41.9 41.7 41.9 41.9 ...
##  $ end_lng           : num [1:4494342] -87.7 -87.7 -87.5 -87.7 -87.7 ...
##  $ member_casual     : chr [1:4494342] "casual" "casual" "casual" "casual" ...
##  $ date              : Date[1:4494342], format: "2022-06-09" "2022-06-19" ...
##  $ month             : chr [1:4494342] "06" "06" "06" "06" ...
##  $ day               : chr [1:4494342] "09" "19" "26" "27" ...
##  $ year              : chr [1:4494342] "2022" "2022" "2022" "2022" ...
##  $ day_of_week       : chr [1:4494342] "Thursday" "Sunday" "Sunday" "Monday" ...
##  $ time              : chr [1:4494342] "22:28:32" "17:08:23" "23:59:44" "11:40:53" ...
##  $ ride_length       : num [1:4494342] 1425 2 1542 563 2083 ...

4.ANALYZE

In this section, we identify patterns, draw conclusions, and make predictions and recommendations to answer the following business questions:

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

4.1.Identify patterns

4.1.1.Members and Casuals

First, let’s look at the number and share of rides taken by members and by casual riders.

#Count numbers of member and casuals
table(all_trips1$member_casual)
#casual  member 
#1747757 2746585 

#Calculate the ratio
member_casual_ratio <- all_trips1 %>%
  count(member_casual, name = "total") %>% #Count rides for "casual" and "member".
  mutate(total_ratio = total / sum(total), #Share of all rides for each user type
         labels = scales::percent(total_ratio)) #Label each share as a percentage with scales::percent().
head(member_casual_ratio)
#Visualize as a pie chart.
member_casual_ratio %>%
  ggplot(aes(x="",y=total_ratio, fill=member_casual)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+  #Convert the plot to polar coordinates
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  scale_fill_manual(values = c("#66CDAA","#ffd480")) +
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5, reverse = FALSE))+
  labs(title="User distribution", fill = "User type")

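Members account for 61% of rides and casual riders for 39%. The data is at the ride level, so these shares describe rides, not individual riders. Since this is the most recent data, the member share may reflect recent growth in membership, but this data alone cannot confirm that.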
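The percentages in the pie chart follow directly from the counts returned by table() in 4.1.1. A short check in base R:

```r
# Ride counts from table(all_trips1$member_casual) above
rides <- c(casual = 1747757, member = 2746585)

shares <- rides / sum(rides)
print(sum(rides))              # 4494342, the cleaned row count
print(round(100 * shares, 1))  # casual 38.9, member 61.1
```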

4.1.2.Average, Median, Max, and Min values

We calculate the mean, median, maximum, and minimum of “ride_length”, overall and by user type.

mean(all_trips1$ride_length) #straight average(total ride length/rides)
## [1] 977.2823
median(all_trips1$ride_length) #midpoint number in the ascending array of ride lengths
## [1] 609
max(all_trips1$ride_length) #longest ride
## [1] 1922127
min(all_trips1$ride_length) #shortest ride
## [1] 1
#Compare members and casual users
aggregate(all_trips1$ride_length~all_trips1$member_casual,FUN=mean)
##   all_trips1$member_casual all_trips1$ride_length
## 1                   casual              1363.2759
## 2                   member               731.6599
aggregate(all_trips1$ride_length~all_trips1$member_casual,FUN=median)
##   all_trips1$member_casual all_trips1$ride_length
## 1                   casual                    785
## 2                   member                    525
aggregate(all_trips1$ride_length~all_trips1$member_casual,FUN=max)
##   all_trips1$member_casual all_trips1$ride_length
## 1                   casual                1922127
## 2                   member                  89872
aggregate(all_trips1$ride_length~all_trips1$member_casual,FUN=min)
##   all_trips1$member_casual all_trips1$ride_length
## 1                   casual                      1
## 2                   member                      1
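The four aggregate() calls above can also be combined into one pass. Here is a minimal base R sketch using split() and a small function that returns all four statistics. The toy data is hypothetical.

```r
# Hypothetical ride lengths in seconds for the two user types
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  ride_length   = c(100, 300, 50, 70, 90)
)

# One row per user type, one column per statistic
ride_stats <- do.call(rbind, lapply(
  split(trips$ride_length, trips$member_casual),
  function(x) c(mean = mean(x), median = median(x), max = max(x), min = min(x))
))
print(ride_stats)
```

On the real data, replacing `trips` with `all_trips1` gives the same four statistics as the separate aggregate() calls, in a single table.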
4.1.3.Chronological analysis by month and day of the week

Average ride length by month for members vs casual riders:

aggregate(all_trips1$ride_length~all_trips1$member_casual+all_trips1$month,FUN=mean)
##    all_trips1$member_casual all_trips1$month all_trips1$ride_length
## 1                    casual               01               892.8232
## 2                    member               01               600.2738
## 3                    casual               02              1060.4027
## 4                    member               02               625.3519
## 5                    casual               03              1003.2793
## 6                    member               03               610.2839
## 7                    casual               04              1357.3867
## 8                    member               04               693.3902
## 9                    casual               05              1471.5850
## 10                   member               05               761.6897
## 11                   casual               06              1501.2135
## 12                   member               06               821.0720
## 13                   casual               07              1505.7013
## 14                   member               07               810.2364
## 15                   casual               08              1397.1418
## 16                   member               08               786.4180
## 17                   casual               09              1308.2871
## 18                   member               09               757.3497
## 19                   casual               10              1228.1667
## 20                   member               10               700.5410
## 21                   casual               11              1034.8040
## 22                   member               11               649.6634
## 23                   casual               12               890.5108
## 24                   member               12               612.0680

Average ride length by day of the week for members vs casual riders:

aggregate(all_trips1$ride_length~all_trips1$member_casual+all_trips1$day_of_week,FUN=mean)
##    all_trips1$member_casual all_trips1$day_of_week all_trips1$ride_length
## 1                    casual                 Friday              1308.4576
## 2                    member                 Friday               720.9742
## 3                    casual                 Monday              1369.1382
## 4                    member                 Monday               693.9649
## 5                    casual               Saturday              1526.4866
## 6                    member               Saturday               821.0278
## 7                    casual                 Sunday              1570.1538
## 8                    member                 Sunday               817.0922
## 9                    casual               Thursday              1210.4286
## 10                   member               Thursday               706.6981
## 11                   casual                Tuesday              1225.4594
## 12                   member                Tuesday               701.2078
## 13                   casual              Wednesday              1177.3245
## 14                   member              Wednesday               700.9543
4.1.4.Order the months and the days of the week

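The days of the week are listed alphabetically rather than chronologically, so we convert “day_of_week” to an ordered factor that starts on Sunday. The months need no reordering, because their two-digit codes ("01" to "12") already sort correctly.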

all_trips1$day_of_week<-ordered(all_trips1$day_of_week,levels=c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"))

#Confirm
aggregate(all_trips1$ride_length~all_trips1$member_casual+all_trips1$day_of_week,FUN=mean)
##    all_trips1$member_casual all_trips1$day_of_week all_trips1$ride_length
## 1                    casual                 Sunday              1570.1538
## 2                    member                 Sunday               817.0922
## 3                    casual                 Monday              1369.1382
## 4                    member                 Monday               693.9649
## 5                    casual                Tuesday              1225.4594
## 6                    member                Tuesday               701.2078
## 7                    casual              Wednesday              1177.3245
## 8                    member              Wednesday               700.9543
## 9                    casual               Thursday              1210.4286
## 10                   member               Thursday               706.6981
## 11                   casual                 Friday              1308.4576
## 12                   member                 Friday               720.9742
## 13                   casual               Saturday              1526.4866
## 14                   member               Saturday               821.0278
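A short vector shows why the conversion to an ordered factor matters. Character values sort alphabetically, while an ordered factor sorts by the level order we supply. A minimal base R illustration:

```r
days <- c("Saturday", "Monday", "Sunday", "Friday")

# Character vectors sort alphabetically
print(sort(days))  # "Friday" "Monday" "Saturday" "Sunday"

# An ordered factor sorts by the given level order, here starting on Sunday
week <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
day_factor <- factor(days, levels = week, ordered = TRUE)
print(as.character(sort(day_factor)))  # "Sunday" "Monday" "Friday" "Saturday"
```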

4.2.Trends by month, weekday, and hour

We analyze ridership by user type and month (4.2.1), by user type and weekday (4.2.2), and by user type and hour of the day (4.2.3).

4.2.1.Month
#Create a month field with month().
all_trips1 %>%
  mutate(month = month(started_at, label = TRUE)) %>%
  #Group by user type and month.
  group_by(member_casual, month) %>%
  #Calculate the number of rides and the average duration.
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  #Sort the data.
  arrange(member_casual, month)
## # A tibble: 24 × 4
## # Groups:   member_casual [2]
##    member_casual month number_of_rides average_duration
##    <chr>         <ord>           <int>            <dbl>
##  1 casual        Jan             29618             893.
##  2 casual        Feb             32774            1060.
##  3 casual        Mar             46786            1003.
##  4 casual        Apr            110526            1357.
##  5 casual        May            177025            1472.
##  6 casual        Jun            292053            1501.
##  7 casual        Jul            311649            1506.
##  8 casual        Aug            270074            1397.
##  9 casual        Sep            220905            1308.
## 10 casual        Oct            151312            1228.
## # ℹ 14 more rows
4.2.2.Weekday
#Create a weekday field with wday().
all_trips1 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  #Group by user type and weekday.
  group_by(member_casual, weekday) %>%
  #Calculate the number of rides and the average duration.
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  #Sort the data.
  arrange(member_casual, weekday)
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun              289531            1570.
##  2 casual        Mon              194960            1369.
##  3 casual        Tue              202311            1225.
##  4 casual        Wed              218102            1177.
##  5 casual        Thu              233749            1210.
##  6 casual        Fri              258161            1308.
##  7 casual        Sat              350943            1526.
##  8 member        Sun              303812             817.
##  9 member        Mon              374352             694.
## 10 member        Tue              438773             701.
## 11 member        Wed              456824             701.
## 12 member        Thu              442091             707.
## 13 member        Fri              387036             721.
## 14 member        Sat              343697             821.
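The pipelines in 4.2.1 and 4.2.2 produce one row per user type and period. A base R alternative, sketched here on hypothetical toy data, lays out the same numbers as a cross table. table() gives ride counts and tapply() gives average ride length.

```r
# Hypothetical rides with user type, weekday, and length in seconds
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  weekday       = c("Sat", "Sun", "Mon", "Mon", "Sat"),
  ride_length   = c(1500, 1600, 700, 650, 820)
)

# Number of rides for each user type and weekday combination
counts <- table(trips$member_casual, trips$weekday)
print(counts)

# Average ride length per combination (NA where a combination has no rides)
avg_length <- tapply(trips$ride_length, list(trips$member_casual, trips$weekday), mean)
print(avg_length)
```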
4.2.3.Daily hours
#First, convert the "time" text (H:M:S) to POSIXct. as.POSIXct() attaches the current date; only the hour is used below.
all_trips1$time <- as.POSIXct(all_trips1$time, format = "%H:%M:%S")

#Extract the hour with hour().
all_trips1 %>%
  mutate(hour_day = hour(time)) %>%
  #Group by user type and hour.
  group_by(member_casual, hour_day) %>%
  #Calculate the number of rides and the average duration.
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  #Sort the data.
  arrange(member_casual, hour_day)
## # A tibble: 48 × 4
## # Groups:   member_casual [2]
##    member_casual hour_day number_of_rides average_duration
##    <chr>            <int>           <int>            <dbl>
##  1 casual               0           32053            1217.
##  2 casual               1           20813            1304.
##  3 casual               2           12246            1251.
##  4 casual               3            6763            1186.
##  5 casual               4            4515            1078.
##  6 casual               5            8707             923.
##  7 casual               6           23390             893.
##  8 casual               7           40346             877.
##  9 casual               8           55066            1001.
## 10 casual               9           56190            1350.
## # ℹ 38 more rows
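The conversion in 4.2.3 turns the time text back into a POSIXct value, attached to the current date, only to read its hour. The sketch below compares that route with taking the hour directly from the start timestamp. lubridate's hour() applied to “started_at” would be the tidyverse equivalent. The timestamp is the first one from the str() output in 3.5 and is assumed to be in UTC.

```r
# Route used above: parse the time text, then read the hour
time_text <- "22:28:32"
parsed <- as.POSIXct(time_text, format = "%H:%M:%S")  # current date, local time zone
hour_from_text <- as.POSIXlt(parsed)$hour

# Direct route: read the hour from the start timestamp itself
started_at <- as.POSIXct("2022-06-09 22:28:32", tz = "UTC")
hour_direct <- as.POSIXlt(started_at)$hour

print(c(hour_from_text, hour_direct))  # 22 22
```

The direct route avoids the intermediate text column and the dependence on the current date.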

5.SHARE

5.1.Number of rides by monthly rider type

all_trips1 %>%
  mutate(month = month(started_at, label = TRUE)) %>%
  group_by(member_casual, month) %>%
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  arrange(member_casual, month) %>%
  ggplot(mapping = aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::number_format()) + #Display plain numbers instead of "1e+5".
  scale_fill_manual(values = c("#66CDAA", "#ffd480")) +
  labs(title = "Number of rides through the year", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Month", y = "Number of rides")

  • Casual riders use the bike-share service more from May to October.

  • Casual riders use the service less from November to April.

  • Members ride relatively more often from March to November.

  • Members ride less from December to February.

  • Casual ridership rises sharply in the warmer months.

5.2.Average duration through a year

all_trips1 %>%
  mutate(month = month(started_at, label = TRUE)) %>% #Use month() to get readable month labels.
  group_by(member_casual, month) %>%
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  arrange(member_casual, month) %>%
  ggplot(aes(x = month, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::number_format()) + #Display plain numbers instead of "1e+5".
  scale_fill_manual(values = c("#66CDAA", "#ffd480")) +
  labs(title = "Average ride length through the year", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Month", y = "Ride length in seconds")

  • In every month, the average ride of casual riders is longer than that of members.

  • The average ride length of members is relatively stable, between about 10 and 14 minutes (600 to 821 seconds).

5.3.Number of rides by weekly rider type.

all_trips1 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>% #Create a "weekday" column with weekday names via wday().
  group_by(member_casual, weekday) %>% #Group by user type and weekday.
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::number_format()) + #Display plain numbers instead of "1e+5".
  scale_fill_manual(values = c("#66CDAA", "#ffd480")) +
  labs(title = "Number of rides through the week", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Weekday", y = "Number of rides")

  • On weekends, casual riders and members record roughly the same number of rides.

  • Casual riders ride more often on weekends and less on weekdays.

  • Members ride more often on weekdays than on weekends.

  • Although casual riders ride less often than members on weekdays, they still record between about 195,000 (Monday) and 258,000 (Friday) rides for each weekday over the year.

5.4.Average duration through a week

all_trips1%>%
mutate(weekday=wday(started_at,label=TRUE))%>%
group_by(member_casual,weekday)%>%
summarise(number_of_rides=n(),average_duration=mean(ride_length))%>%
arrange(member_casual,weekday)%>%
ggplot(aes(x=weekday,y=average_duration,fill=member_casual))+geom_col(position="dodge")+
  scale_y_continuous(labels = scales::number_format())+ #To display as numbers to avoid "1e+5".
  scale_fill_manual(values = c("#66CDAA","#ffd480")) +
  labs(title = "Average riding length through a week", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Weekday", y= "Riding length in seconds")

  • Casual riders take longer rides on weekends, but their average ride still exceeds 15 minutes (900 seconds) on weekdays.

  • Members' average ride stays at around 10 minutes (600 seconds) throughout the week.

  • Because casual riders also ride on weekdays, some of them may be using the bike-share service to commute.

5.5.Number of rides through a day

all_trips1%>%
mutate(hour_day = hour(time))%>% #Create a new column "hour_day" holding the hour (0-23) extracted by hour().
group_by(member_casual,hour_day)%>% #Group by rider type and hour
summarise(number_of_rides=n(),average_duration=mean(ride_length))%>%
arrange(member_casual,hour_day)%>%
ggplot(aes(x=hour_day,y=number_of_rides,fill=member_casual))+geom_col(position="dodge")+
    scale_y_continuous(labels = scales::number_format())+ #To display as numbers to avoid "1e+5".
    scale_x_continuous(breaks = seq(0, 23))+
    scale_fill_manual(values = c("#66CDAA","#ffd480")) +
  labs(title = "Riding numbers through a day", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Hour of day", y= "Riding numbers")

  • Casual riders ride mainly in the late afternoon.

  • Member rides peak at 7:00-8:00 in the morning and 15:00-19:00 in the afternoon and evening.

  • Casual riders also ride in the morning, though at only about 30% of the member ride count.
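The "about 30%" figure is a ratio of hourly ride counts. Here is a minimal base R sketch of that calculation. It extracts the hour from `started_at` with `as.POSIXlt()$hour`, a base R alternative to lubridate's `hour()` used above (applying it to the start timestamp is an assumption). It then counts rides per hour and rider type and divides casual counts by member counts. The timestamps are hypothetical.

```r
# Hypothetical toy trips; times are treated as UTC (an assumption).
trips <- data.frame(
  member_casual = c("member", "member", "member", "casual", "member", "casual", "casual"),
  started_at = as.POSIXct(c(
    "2022-06-06 07:45:00", "2022-06-06 07:10:00", "2022-06-07 08:20:00",
    "2022-06-07 07:30:00", "2022-06-08 17:05:00", "2022-06-08 17:40:00",
    "2022-06-09 17:15:00"), tz = "UTC")
)

# Hour of day (0-23) taken from the start timestamp
trips$hour_day <- as.POSIXlt(trips$started_at)$hour

# Counts per hour and rider type; all 24 hours kept, even if empty
counts <- table(factor(trips$hour_day, levels = 0:23), trips$member_casual)

# Casual rides as a fraction of member rides in each hour
# (NaN where both are 0, Inf where only members are 0)
ratio <- counts[, "casual"] / counts[, "member"]
print(ratio[c("7", "8", "17")])
```

Keeping all 24 levels in the factor means hours with no rides show up explicitly rather than silently disappearing from the ratio.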

5.6.Average duration through a day

all_trips1%>%
mutate(hour_day=hour(time))%>%
group_by(member_casual,hour_day)%>%
summarise(number_of_rides=n(),average_duration=mean(ride_length))%>%
arrange(member_casual,hour_day)%>%
ggplot(aes(x=hour_day,y=average_duration,fill=member_casual))+geom_col(position="dodge")+
  scale_y_continuous(labels = scales::number_format())+ #To display as numbers to avoid "1e+5".
  scale_x_continuous(breaks = seq(0, 23))+
  scale_fill_manual(values = c("#66CDAA","#ffd480")) +
  labs(title = "Average riding length through a day", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Hour of day", y= "Riding length in seconds")

  • Casual riders' average ride is longer than members' at every hour of the day.

  • Casual riders' rides are especially long between 10:00 and 15:00.

  • Both rider types use the service even around midnight.

5.7.Number of rides by hour through a week

all_trips1%>%
mutate(hour_day = hour(time))%>% #Create a new column "hour_day" holding the hour (0-23) extracted by hour().
mutate(weekday=wday(started_at,label=TRUE))%>% #Create a new column "weekday" holding the weekday name.
group_by(member_casual,hour_day, weekday)%>% #Group by rider type, hour, and weekday
summarise(number_of_rides=n(),average_duration=mean(ride_length))%>%
arrange(member_casual,hour_day, weekday)%>%
ggplot(aes(x=hour_day,y=number_of_rides,fill=member_casual))+geom_col(position="dodge")+
    scale_y_continuous(labels = scales::number_format())+ #To display as numbers to avoid "1e+5".
    scale_fill_manual(values = c("#66CDAA","#ffd480")) +
    facet_grid(~weekday)+
  labs(title = "Riding numbers through a day by day of week", subtitle = "Member vs Casual", caption = "Data collected by Divvy Data", x = "Hour of day", y= "Riding numbers", fill = "Rider type")

  • Members ride mainly during daytime hours on weekdays.

  • On Saturday afternoons, casual riders take more rides than members.

  • Although casual riders are fewer than members overall, a certain share of them appears to ride on weekday mornings.
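To put a number on the weekday-morning pattern, the sketch below computes the share of each rider type's rides that start on a weekday between 07:00 and 09:59. That commute window is an assumption chosen for illustration, not a definition from the source data, and the trips are hypothetical.

```r
# Hypothetical toy trips; times are treated as UTC (an assumption).
trips <- data.frame(
  member_casual = c("casual", "casual", "casual", "casual",
                    "member", "member", "member", "member"),
  started_at = as.POSIXct(c(
    "2022-06-04 14:00:00",  # Saturday afternoon
    "2022-06-06 08:10:00",  # Monday morning
    "2022-06-07 07:30:00",  # Tuesday morning
    "2022-06-05 10:00:00",  # Sunday morning
    "2022-06-06 07:45:00",  # Monday morning
    "2022-06-07 17:20:00",  # Tuesday evening
    "2022-06-08 08:05:00",  # Wednesday morning
    "2022-06-09 09:15:00"   # Thursday morning
  ), tz = "UTC")
)

lt <- as.POSIXlt(trips$started_at)
is_weekday <- lt$wday %in% 1:5   # Monday to Friday
is_morning <- lt$hour %in% 7:9   # assumed commute window, 07:00-09:59
trips$weekday_morning <- is_weekday & is_morning

# Mean of a logical column = share of TRUE values per rider type
share <- tapply(trips$weekday_morning, trips$member_casual, mean)
print(share)
```

The window boundaries are a parameter of the analysis. Widening or narrowing `7:9` changes the share, so any figure reported from this kind of calculation should state the window used.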

5.8.Export summary file for further analysis

Create a CSV file that others can visualize in Excel, Tableau, or other presentation software.
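Here is a minimal base R sketch of this export step. It assumes a per-month summary with the same two measures as the `summarise(number_of_rides=n(),average_duration=mean(ride_length))` step used in the charts above. The toy data and the file name `rides_summary.csv` are hypothetical. The file is written to a temporary directory so the example runs anywhere.

```r
# Hypothetical stand-in for all_trips1 (ride_length assumed in seconds)
all_trips1 <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "casual", "member"),
  month         = c("Jun", "Jun", "Jun", "Jun", "Jul", "Jul"),
  ride_length   = c(1500, 1300, 650, 510, 1800, 620)
)

# Average duration per rider type and month
summary_tbl <- aggregate(ride_length ~ member_casual + month, data = all_trips1, FUN = mean)
names(summary_tbl)[names(summary_tbl) == "ride_length"] <- "average_duration"

# Ride counts; the same formula and data give the same group order
ride_counts <- aggregate(ride_length ~ member_casual + month, data = all_trips1, FUN = length)
summary_tbl$number_of_rides <- ride_counts$ride_length

# Write a plain CSV without row names, ready for Excel or Tableau
out_file <- file.path(tempdir(), "rides_summary.csv")  # hypothetical file name
write.csv(summary_tbl, out_file, row.names = FALSE)

# Read it back to confirm the export
print(read.csv(out_file))
```

Setting `row.names = FALSE` avoids an unnamed index column, which spreadsheet and BI tools would otherwise import as an extra field.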

6.ACT

Based on the business task and the analysis above, this section describes our recommended actions for Cyclistic's new marketing strategy.

6.1. Responses to business questions

  1. How do annual member and casual riders use Cyclistic bikes differently? According to the visualizations in 5.1-5.6, members ride more frequently than casual riders, but casual riders take longer rides.
  2. Why would casual riders buy Cyclistic annual memberships? According to the visualization in 5.7, members use the bike-share service frequently throughout the week, mostly during daytime. Casual riders who already ride frequently, for example to commute or to go shopping, are therefore good candidates for an annual membership.
  3. How can Cyclistic use digital media to influence casual riders to become members? Since frequent riders are the best candidates for membership, Cyclistic can run a digital media campaign aimed at casual riders who use the bike-share service 3 days a week or 10 days a month. Based on the visualization in 5.1, the campaign could be launched before April.

6.2. Recommendations

To maximize the number of Cyclistic annual members, we propose the following three recommendations for the new marketing strategy.

  1. Add stations in business, school, and shopping areas to help casual riders build a habit of frequent use.

  2. Launch digital promotional campaigns before April to convert casual riders into annual members. The visualization in 5.1 shows that many casual riders use the service during the warm months.

  3. Notify casual riders who use the service 10 or more days a month about the annual membership plan.
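Recommendation 3 requires counting riding days per person. The public Divvy trip data used in this analysis has no rider identifier (`ride_id` identifies a trip, not a person). The sketch below therefore assumes a hypothetical `rider_id` column, for example from Cyclistic's own account records. It counts distinct riding days per casual rider and month, then keeps riders with at least 10.

```r
# Hypothetical trips; rider_id does NOT exist in the public Divvy data.
trips <- data.frame(
  rider_id      = c(rep("A", 12), rep("B", 4)),
  member_casual = "casual",
  started_at    = as.POSIXct(c(
    sprintf("2022-06-%02d 08:00:00", 1:12),       # rider A: 12 different days
    "2022-06-03 09:00:00", "2022-06-03 18:00:00",  # rider B: two rides on one day
    "2022-06-10 09:00:00", "2022-06-11 12:00:00"
  ), tz = "UTC")
)

casual <- trips[trips$member_casual == "casual", ]
casual$month <- format(casual$started_at, "%Y-%m")
casual$day   <- as.Date(casual$started_at)

# Keep one row per rider, month and riding day, then count the days
rider_days <- unique(casual[, c("rider_id", "month", "day")])
days_per_month <- aggregate(day ~ rider_id + month, data = rider_days, FUN = length)
names(days_per_month)[names(days_per_month) == "day"] <- "riding_days"

# Casual riders who rode on at least 10 days in a month: membership targets
targets <- days_per_month[days_per_month$riding_days >= 10, ]
print(targets)
```

Counting distinct days rather than rides keeps a single busy day (rider B's two rides on 3 June) from inflating a rider's frequency.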

By implementing these recommendations, we hope Cyclistic achieves its business task as soon as possible and contributes to a more eco-friendly society.