Introduction

This project is a case study from the Google Data Analytics Professional Certificate course. For the analysis i have chosen to use R programming language and Rstudio IDE for its easy statisical analysis and data visualisations with large datasets.

Data analysis steps

  • Ask
  • Prepare
  • Process
  • Analyse
  • Share
  • Act

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualisations.

Ask

Three questions will guide the future marketing program:
1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

The director of marketing and your manager Lily Moreno has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

Business task

The main objective is build successful marketing strategies that will aim to turn casual bike riders into members by providing data insights into the differences in how the two groups use Cyclistic bike share differently.

Key stakeholders

Director of marketing (Lily Moreno), Cyclistic executive team and marketing analytics team.

Statement of work

Find and explore the differences in between casual riders and member riders.

Prepare

Cyclistic’s historical trip data will be used to analyse and identify trends This data has been made avialble as seen in these licenses Datasets are avialable here

For this analysis i have chosen the most recent data from 2022-03 to 2023-02, as its more current for the business task. All trips data is a comma-delimited (.CSV) format. Total of 13 columns including “ride_id”, “rideable_type”, “started_at”, “ended_at”, “start_station_name”, “start_station_id”, “end_station_id”, “start_lat”, “start_lng”, “end_lat”, “end_lng” and “member_casual” For the purpose of this case study the datasets are appropriate and will enable me to answer the business questions.

# install and load the necessary packages
library(tidyverse)
library(ggplot2)
library(lubridate)
library(readr)
library(dplyr)
library(geosphere)
library(tidyr)
library(rmarkdown)
#import data sets 
trip_2022_03 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202203-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_04 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202204-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_05 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202205-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_06 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202206-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_07 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202207-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_08 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202208-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_09 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202209-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_10 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202210-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_11 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202211-tripdata.csv", stringsAsFactors = FALSE)
trip_2022_12 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202212-tripdata.csv", stringsAsFactors = FALSE)
trip_2023_01 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202301-tripdata.csv", stringsAsFactors = FALSE)
trip_2023_02 <- read.csv("~/Rstudio/Cyclistic_Case_Study/202302-tripdata.csv", stringsAsFactors = FALSE)

Process

First we need to examine the datasets for consistency including column names and datetypes. It is important for all column names to be in the correct order before the merge the data frames successfully.

#check for consistency between data sets - matching column names 

colnames(trip_2022_03)
colnames(trip_2022_04)
colnames(trip_2022_05)
colnames(trip_2022_06)
colnames(trip_2022_07)
colnames(trip_2022_08)
colnames(trip_2022_09)
colnames(trip_2022_10)
colnames(trip_2022_11)
colnames(trip_2022_12)
colnames(trip_2023_01)
colnames(trip_2023_02)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
#check data types - chr, date etc 

str(trip_2022_03)
str(trip_2022_04)
str(trip_2022_05)
str(trip_2022_06)
str(trip_2022_07)
str(trip_2022_08)
str(trip_2022_09)
str(trip_2022_10)
str(trip_2022_11)
str(trip_2022_12)
str(trip_2023_01)
str(trip_2023_02)
## 'data.frame':    284042 obs. of  13 variables:
##  $ ride_id           : chr  "47EC0A7F82E65D52" "8494861979B0F477" "EFE527AF80B66109" "9F446FD9DEE3F389" ...
##  $ rideable_type     : chr  "classic_bike" "electric_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr  "2022-03-21 13:45:01" "2022-03-16 09:37:16" "2022-03-23 19:52:02" "2022-03-01 19:12:26" ...
##  $ ended_at          : chr  "2022-03-21 13:51:18" "2022-03-16 09:43:34" "2022-03-23 19:54:48" "2022-03-01 19:22:14" ...
##  $ start_station_name: chr  "Wabash Ave & Wacker Pl" "Michigan Ave & Oak St" "Broadway & Berwyn Ave" "Wabash Ave & Wacker Pl" ...
##  $ start_station_id  : chr  "TA1307000131" "13042" "13109" "TA1307000131" ...
##  $ end_station_name  : chr  "Kingsbury St & Kinzie St" "Orleans St & Chestnut St (NEXT Apts)" "Broadway & Ridge Ave" "Franklin St & Jackson Blvd" ...
##  $ end_station_id    : chr  "KA1503000043" "620" "15578" "TA1305000025" ...
##  $ start_lat         : num  41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num  41.9 41.9 42 41.9 41.9 ...
##  $ end_lng           : num  -87.6 -87.6 -87.7 -87.6 -87.7 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
#merge the individual monthly data frames into one large data frame  
yearly_trips <- bind_rows(trip_2022_03, trip_2022_04, trip_2022_05, trip_2022_06, trip_2022_07, trip_2022_08, 
                          trip_2022_09, trip_2022_10, trip_2022_11, trip_2022_12, trip_2023_01, trip_2023_02)

Process

Cleaning and Preparation of data for analysis

# checking merged data frame
colnames(yearly_trips) #List of column names 
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
head(yearly_trips) #See the first six row of data 
##            ride_id rideable_type          started_at            ended_at
## 1 47EC0A7F82E65D52  classic_bike 2022-03-21 13:45:01 2022-03-21 13:51:18
## 2 8494861979B0F477 electric_bike 2022-03-16 09:37:16 2022-03-16 09:43:34
## 3 EFE527AF80B66109  classic_bike 2022-03-23 19:52:02 2022-03-23 19:54:48
## 4 9F446FD9DEE3F389  classic_bike 2022-03-01 19:12:26 2022-03-01 19:22:14
## 5 431128AD9AFFEDC0  classic_bike 2022-03-21 18:37:01 2022-03-21 19:19:11
## 6 9AA8A13AF7A85325  classic_bike 2022-03-07 17:10:22 2022-03-07 17:15:04
##                   start_station_name start_station_id
## 1             Wabash Ave & Wacker Pl     TA1307000131
## 2              Michigan Ave & Oak St            13042
## 3              Broadway & Berwyn Ave            13109
## 4             Wabash Ave & Wacker Pl     TA1307000131
## 5 DuSable Lake Shore Dr & North Blvd           LF-005
## 6          Bissell St & Armitage Ave            13059
##                       end_station_name end_station_id start_lat start_lng
## 1             Kingsbury St & Kinzie St   KA1503000043  41.88688 -87.62603
## 2 Orleans St & Chestnut St (NEXT Apts)            620  41.90100 -87.62375
## 3                 Broadway & Ridge Ave          15578  41.97835 -87.65975
## 4           Franklin St & Jackson Blvd   TA1305000025  41.88688 -87.62603
## 5             Loomis St & Jackson Blvd          13206  41.91172 -87.62680
## 6         Southport Ave & Clybourn Ave   TA1309000030  41.91802 -87.65218
##    end_lat   end_lng member_casual
## 1 41.88918 -87.63851        member
## 2 41.89820 -87.63754        member
## 3 41.98404 -87.66027        member
## 4 41.87771 -87.63532        member
## 5 41.87794 -87.66201        member
## 6 41.92077 -87.66371        member
str(yearly_trips) #List of column names and data types
## 'data.frame':    5829084 obs. of  13 variables:
##  $ ride_id           : chr  "47EC0A7F82E65D52" "8494861979B0F477" "EFE527AF80B66109" "9F446FD9DEE3F389" ...
##  $ rideable_type     : chr  "classic_bike" "electric_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr  "2022-03-21 13:45:01" "2022-03-16 09:37:16" "2022-03-23 19:52:02" "2022-03-01 19:12:26" ...
##  $ ended_at          : chr  "2022-03-21 13:51:18" "2022-03-16 09:43:34" "2022-03-23 19:54:48" "2022-03-01 19:22:14" ...
##  $ start_station_name: chr  "Wabash Ave & Wacker Pl" "Michigan Ave & Oak St" "Broadway & Berwyn Ave" "Wabash Ave & Wacker Pl" ...
##  $ start_station_id  : chr  "TA1307000131" "13042" "13109" "TA1307000131" ...
##  $ end_station_name  : chr  "Kingsbury St & Kinzie St" "Orleans St & Chestnut St (NEXT Apts)" "Broadway & Ridge Ave" "Franklin St & Jackson Blvd" ...
##  $ end_station_id    : chr  "KA1503000043" "620" "15578" "TA1305000025" ...
##  $ start_lat         : num  41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num  41.9 41.9 42 41.9 41.9 ...
##  $ end_lng           : num  -87.6 -87.6 -87.7 -87.6 -87.7 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
#create separate columns for date, year, month and day 

yearly_trips$date <- as.Date(yearly_trips$started_at)
yearly_trips$year <- format(as.Date(yearly_trips$date), "%y") #extract year
yearly_trips$month <- format(as.Date(yearly_trips$date), "%m") # extract month
yearly_trips$day <- format(as.Date(yearly_trips$date), "%d") #extract day of month
yearly_trips$day_of_week <- wday(yearly_trips$date, label = TRUE) #extract day of the week column 

yearly_trips$start_time <- format(strptime(yearly_trips$started_at,"%Y-%m-%d %H:%M:%S"), "%H") #extract hour of the day 
# calculate the length of ride by finding the difference in end and start time (needed to convert start and ended at columns)
yearly_trips$started_at2 <- strptime(yearly_trips$started_at, "%Y-%m-%d %H:%M:%S") #convert chr date time 
yearly_trips$ended_at2 <- strptime (yearly_trips$ended_at, "%Y-%m-%d %H:%M:%S") #convert chr date time 


yearly_trips$ride_length <- difftime(yearly_trips$ended_at2, yearly_trips$started_at2, units ="mins") #calculate difference in times 
yearly_trips$ride_length2 <- as.numeric(yearly_trips$ride_length) # convert to number for calculations 
# calculate the ride_distance 
yearly_trips$ride_distance <- distGeo(matrix(c(yearly_trips$start_lng, yearly_trips$start_lat), ncol = 2), 
                                      matrix(c(yearly_trips$end_lng, yearly_trips$end_lat), ncol = 2))

yearly_trips$ride_distance <- yearly_trips$ride_distance/1000 #distance in km 
#clean data set 
yearly_trips2 <- yearly_trips [!(yearly_trips$ride_length2 <=0),] # remove values less than or equal to 0 

yearly_trips2 <- na.omit(yearly_trips2) # remove rows where with NA values

Analyse

Now all the required data is in one place. We are ready to perform descriptive analysis of the data to find patterns and explore the differences between casual customers and members.

#first lets looks at our cleaned data frame 
str(yearly_trips2)
## 'data.frame':    5822489 obs. of  24 variables:
##  $ ride_id           : chr  "47EC0A7F82E65D52" "8494861979B0F477" "EFE527AF80B66109" "9F446FD9DEE3F389" ...
##  $ rideable_type     : chr  "classic_bike" "electric_bike" "classic_bike" "classic_bike" ...
##  $ started_at        : chr  "2022-03-21 13:45:01" "2022-03-16 09:37:16" "2022-03-23 19:52:02" "2022-03-01 19:12:26" ...
##  $ ended_at          : chr  "2022-03-21 13:51:18" "2022-03-16 09:43:34" "2022-03-23 19:54:48" "2022-03-01 19:22:14" ...
##  $ start_station_name: chr  "Wabash Ave & Wacker Pl" "Michigan Ave & Oak St" "Broadway & Berwyn Ave" "Wabash Ave & Wacker Pl" ...
##  $ start_station_id  : chr  "TA1307000131" "13042" "13109" "TA1307000131" ...
##  $ end_station_name  : chr  "Kingsbury St & Kinzie St" "Orleans St & Chestnut St (NEXT Apts)" "Broadway & Ridge Ave" "Franklin St & Jackson Blvd" ...
##  $ end_station_id    : chr  "KA1503000043" "620" "15578" "TA1305000025" ...
##  $ start_lat         : num  41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num  -87.6 -87.6 -87.7 -87.6 -87.6 ...
##  $ end_lat           : num  41.9 41.9 42 41.9 41.9 ...
##  $ end_lng           : num  -87.6 -87.6 -87.7 -87.6 -87.7 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
##  $ date              : Date, format: "2022-03-21" "2022-03-16" ...
##  $ year              : chr  "22" "22" "22" "22" ...
##  $ month             : chr  "03" "03" "03" "03" ...
##  $ day               : chr  "21" "16" "23" "01" ...
##  $ day_of_week       : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 2 4 4 3 2 2 5 7 5 6 ...
##  $ start_time        : chr  "13" "09" "19" "19" ...
##  $ started_at2       : POSIXlt, format: "2022-03-21 13:45:01" "2022-03-16 09:37:16" ...
##  $ ended_at2         : POSIXlt, format: "2022-03-21 13:51:18" "2022-03-16 09:43:34" ...
##  $ ride_length       : 'difftime' num  6.28333333333333 6.3 2.76666666666667 9.8 ...
##   ..- attr(*, "units")= chr "mins"
##  $ ride_length2      : num  6.28 6.3 2.77 9.8 42.17 ...
##  $ ride_distance     : num  1.067 1.185 0.634 1.277 4.755 ...
##  - attr(*, "na.action")= 'omit' Named int [1:6057] 309 461 2686 2771 3160 5694 7114 8195 13771 13951 ...
##   ..- attr(*, "names")= chr [1:6057] "NA" "NA.1" "NA.2" "NA.3" ...
# good practice to look at basic descriptive statisitcs of our data 

yearly_trips2 %>% 
    summarise(average_ride_length = mean(ride_length2), median_ride_length = median(ride_length2), 
              max_ride_length = max(ride_length2), min_ride_length = min(ride_length2))
##   average_ride_length median_ride_length max_ride_length min_ride_length
## 1            16.11311           10.13333        34294.07      0.01666667

Visualise total rides comparision for casual riders and members

yearly_trips2 %>% 
  group_by(member_casual) %>% 
  summarise(ride_count = length(ride_id), ride_percentage = (length(ride_id) / 
nrow(yearly_trips2)) *100 )
## # A tibble: 2 × 3
##   member_casual ride_count ride_percentage
##   <chr>              <int>           <dbl>
## 1 casual           2359547            40.5
## 2 member           3462942            59.5
ggplot(yearly_trips2, aes(x = member_casual, fill = member_casual)) + geom_bar() +
  labs (title = "casual riders vs member riders", x="Casuals VS Members", y = "number of riders", x ="Casual vs Members distribution") + 
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))

  • From the above tables and chart we can see member riders have taken more rides during the year than casual riders.

Visualise total and average ride times for each day of the week

#total and average ride time for each day of the week 
yearly_trips2 %>% 
  group_by(member_casual, day_of_week) %>% 
  summarise(number_of_rides = n(), average_ride_length = mean(ride_length2)) %>% 
    arrange(member_casual, day_of_week)
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual day_of_week number_of_rides average_ride_length
##    <chr>         <ord>                 <int>               <dbl>
##  1 casual        Sun                  397503                24.9
##  2 casual        Mon                  282669                22.3
##  3 casual        Tue                  271722                19.3
##  4 casual        Wed                  279352                18.7
##  5 casual        Thu                  312994                19.4
##  6 casual        Fri                  338015                20.4
##  7 casual        Sat                  477292                24.5
##  8 member        Sun                  402676                13.6
##  9 member        Mon                  488499                11.9
## 10 member        Tue                  546468                11.6
## 11 member        Wed                  542308                11.7
## 12 member        Thu                  547254                11.9
## 13 member        Fri                  481057                12.1
## 14 member        Sat                  454680                13.7
#total casual vs member - day of the week 
yearly_trips2 %>% 
  group_by(member_casual, day_of_week) %>% 
  summarise(number_of_rides = n()) %>% 
  arrange(member_casual, day_of_week) %>% 
  ggplot(aes(x = day_of_week, y=number_of_rides, fill = member_casual)) + 
  labs(title = "Total rides by Memeber vs Casuals for each day of the week") +
  geom_col(width=0.5, position = position_dodge(width=0.5))+ 
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))

  • From table and chart above we can conclude that members consistently take more trips on weekdays and casual riders have increased activity on Saturday and Sunday
#average ride length - casual vs member - day of the week 
yearly_trips2 %>% 
  group_by(member_casual, day_of_week) %>% 
  summarise(average_ride_length = mean(ride_length2)) %>% 
  arrange(member_casual, day_of_week) %>% 
  ggplot(aes(x= day_of_week, y = average_ride_length, fill = member_casual)) +
  geom_col(width=0.5, position = position_dodge(width=0.5)) + 
  labs(title ="Average ride time by Members and Casual riders Vs. Day of the week")

  • From the chart above we can see that members have on average shorter ride lengths and are fairly consistent throughout the week.
  • We can also see that casual riders take consistently longer riders than members and this variance peaks on saturdays and sundays.

See total and average monthly ride times for casual and members

#first lets convert into a month abbreviation for ease of viewing 
yearly_trips2$month <- as.numeric(yearly_trips2$month) # convert character to number
yearly_trips2$month <- month.abb[yearly_trips2$month] # convert numeric month to month abbreviation 

Visualise total rides by type and month

yearly_trips2 %>% 
  group_by(member_casual, month) %>% 
  summarise(numbner_of_rides = n()) %>% 
  arrange(member_casual, month) %>% 
  print(n = Inf) %>% # print all rows
  ggplot(aes(x=month, y=numbner_of_rides, fill = member_casual)) + 
  labs (title = "Total monthly comparison - members vs casuals", x = "month", y = "Number of rides") +
  geom_col(width=0.5, position = position_dodge(width=0.5)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) + 
                       scale_x_discrete(limits = month.abb) # order the months 
## # A tibble: 24 × 3
## # Groups:   member_casual [2]
##    member_casual month numbner_of_rides
##    <chr>         <chr>            <int>
##  1 casual        Apr             126102
##  2 casual        Aug             358130
##  3 casual        Dec              44788
##  4 casual        Feb              42920
##  5 casual        Jan              39902
##  6 casual        Jul             405146
##  7 casual        Jun             368060
##  8 casual        Mar              89580
##  9 casual        May             279737
## 10 casual        Nov             100554
## 11 casual        Oct             208584
## 12 casual        Sep             296044
## 13 member        Apr             244799
## 14 member        Aug             426882
## 15 member        Dec             136876
## 16 member        Feb             147400
## 17 member        Jan             150264
## 18 member        Jul             417323
## 19 member        Jun             400023
## 20 member        Mar             194059
## 21 member        May             354351
## 22 member        Nov             236893
## 23 member        Oct             349561
## 24 member        Sep             404511

Visualise average ride times by type and month

#average rides vs month   
yearly_trips2 %>% 
  group_by(member_casual, month) %>% 
  summarise(average_ride_length = mean(ride_length2)) %>% 
  arrange(member_casual, month) %>% 
  print(n = Inf) %>% # print all rows
  ggplot(aes(x=month, y=average_ride_length, fill = member_casual)) + 
  labs (title = "Average monthly comparison - members vs casuals", x = "month", y = "Average ride length (mins)") +
  geom_col(width=0.5, position = position_dodge(width=0.5)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) + 
  scale_x_discrete(limits = month.abb) # order the months 
## # A tibble: 24 × 3
## # Groups:   member_casual [2]
##    member_casual month average_ride_length
##    <chr>         <chr>               <dbl>
##  1 casual        Apr                  23.4
##  2 casual        Aug                  21.4
##  3 casual        Dec                  13.4
##  4 casual        Feb                  15.9
##  5 casual        Jan                  13.6
##  6 casual        Jul                  23.3
##  7 casual        Jun                  23.4
##  8 casual        Mar                  25.7
##  9 casual        May                  25.5
## 10 casual        Nov                  15.5
## 11 casual        Oct                  18.5
## 12 casual        Sep                  20.0
## 13 member        Apr                  11.4
## 14 member        Aug                  13.1
## 15 member        Dec                  10.4
## 16 member        Feb                  10.5
## 17 member        Jan                  10.1
## 18 member        Jul                  13.4
## 19 member        Jun                  13.7
## 20 member        Mar                  11.7
## 21 member        May                  13.1
## 22 member        Nov                  10.9
## 23 member        Oct                  11.6
## 24 member        Sep                  12.6

  • June, July, August and September are the busiest times of the year for both casual riders and members. It is plausible this is due to weather and seasonal changes there is a drop of in bike usages as December, January and Febauary see the lowest number of riders for both types of customers. Casual riders also show greater drop off compared to members before and after the summer months.
  • Member riders show consistent average ride lengths between approximiately 10 - 14 mintutes throughout the year. While casual ride length is significantly longer compared to member riders and varies between approximately 13-24 minutes.

Comparison on ride distance between members and casual riders.

#comparison on ride distance 
yearly_trips2 %>% 
  group_by(member_casual) %>% 
  summarise(average_ride_distance = mean(ride_distance)) %>% 
  ggplot(aes(x=member_casual, y=average_ride_distance, fill = member_casual)) + 
  geom_col() + 
  labs(title = "Mean travel distance members vs casuals", y="Average distance in KM" )

  • From the above chart we can see than casual riders and members average ride length was very similar. Plausible that trips from casuals cover similar distance despite a longer ride time due to their liesurely nature compared to their brisker member counterparts.

Hourly comparision of bike demand for members and casual riders

#hourly comparison
yearly_trips2 %>%  
  ggplot(aes(x = start_time, fill = member_casual)) + 
  geom_bar()+
  labs(title = "Hourly demand comparison", x = "hour of the day")+
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))+
  theme(axis.text.x = element_text(angle = 30))+
  facet_wrap(~member_casual) 

  • From the above charts we can see that members use it more frequently between 7-11am and between 4-6pm. This supports that members typically use the bikes for commuting during the week however it needs to be checked on a daily basis. Members see demand around 4pm.

Daily and hourly comparison of bike demand for members and casual riders

#hourly demand by day of the week
yearly_trips2 %>% 
  ggplot(aes(x = start_time, fill = member_casual)) + 
  geom_bar() + 
  labs(title = "Hourly demand comparison", x = "hour of the day")+
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))+
  theme(axis.text.x = element_text(angle = 90))+
  facet_wrap(~day_of_week)

  • Weekdays and weekends vary greatly in the comparison. With members seeing increased volumes on weekdays between 7am - 8am and again at 5pm. This provides further support that members frequently use the bikes for their work commutes. Member demand increased greatly on the weekend, from this we can hypothesise that casuals use their bikes for leisure activities at the weekend.

Rideable type comparison for mebers and casual riders

#rideable type 
yearly_trips2 %>% 
  group_by(rideable_type) %>% 
  summarise(count = length(ride_id))
## # A tibble: 3 × 2
##   rideable_type   count
##   <chr>           <int>
## 1 classic_bike  2663497
## 2 docked_bike    176393
## 3 electric_bike 2982599
yearly_trips2 %>% 
  ggplot(aes(x=rideable_type, fill = member_casual)) + 
  labs(title = "Rideable type comparison - members vs casuals", x= "Rideable type") + 
  geom_bar(width=0.5, position = position_dodge(width=0.5))+ 
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE))

  • From the table and chart above we can see that members mostly use classic bikes followed by electric bikes. Docked bikes are only used by casuals

Share

Before we make our recommendation for the marketing department we will share our data visualisations with our stakeholders as they will provide insight into the the key differences between member and casual riders. This will be done via presentation and explain the analaysis laid out above as will as key insights from said analysis.

Differences between casual riders and members

  • Member riders account for ~10% more rides than casual riders
  • Member riders consitently take more trips on weekdays compared to casual riders who have increased activty on Saturday and Sunday
  • Members on average take shorter (time) rides than casual riders on every day of the week with the difference most visible on Saturday and Sunday
  • Demand for the bikes peaks in summer and dips in winter for both types. However the drop off is far more evident in casual riders compared to members
  • Members demand typically peaks around 8 am and 5 pm Monday - Friday compared to casual riders who only see increased volume in the evenings and on weekends
  • Members mostly use classic bikes followed by electric bikes. Docked bikes are only used by casuals

Key findings

  • Members likely use their bikes for commuting to and from work. This is supported by increased demand on weekdays ~9am and ~5pm as well as usage all year round including the colder months
  • There is a greater volume of bikers in the afternoon
  • For casual riders the biggest demand was at the weekend and during summer months

Act

For the last stage of the data analysis process, we will make three recommendations from our key analysis aimed at increasing our annual members.

Recomendations

1. Offer a weekend only membership targeting at the current casual riders at a different price point to the current annual members.
2. Launch the advertising campaign coming into summer when we see increased demand among casuals as they are more likely to sign up to a membership when frequently using the bike.
3. Offer coupons and discounts to sign up at the docking stations on weekends when casual customers are using the bikes