Context

The following data analysis project is the final part of the Google Data Analytics Professional Certificate. The objective is to put all the skills taught throughout the course into practice. The course itself is broken down into 6 key phases that represent the integral processes of data analysis.

  1. Ask
  2. Prepare
  3. Process
  4. Analyze
  5. Share
  6. Act

Before we can jump into these processes, we first need to answer some preliminary questions below.

Ask

The project brief outlines the questions we are asking of Cyclistic’s data and sets the scene in which they are asked:

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

  • How do annual members and casual riders use Cyclistic bikes differently?
  • Why would casual riders buy Cyclistic annual memberships?
  • How can Cyclistic use digital media to influence casual riders to become members?

Prepare

The objective of this part of the project is to outline how we get the data ready for analysis.

Download

The data was downloaded from Cyclistic (also known as Divvy Bikes) as zip files, one for each month in which the data was collected.

Store

The zip files were initially stored on my laptop’s hard drive. They were then unzipped, and the individual CSV files were transferred to my Google Drive and Kaggle.

Process

Here is where the fun really started. During this phase I removed inconsistencies, made sure the data was properly cleaned, and added some useful calculations. The first part of the processing and cleaning was done in Excel and the second part in R; the reason for this split, along with any assumptions made, is explained later on.

What did I do in Excel?

  • To make life easier for myself, I saved each .CSV file as a .XLSX file, because .CSV files do not retain certain types of formatting, such as tables.

  • After saving the files in .XLSX format, I put the data into tables so that I could read it as rows and columns more easily. This also proved useful when creating formulas.

  • Once the data was in table format, I added 4 new columns to the existing ones. These were:

    • trip_month: Based on the “started_at” column, used to calculate the month of the trip using the formula - =SWITCH(MONTH(started_at),1,"JANUARY",2,"FEBRUARY",3,"MARCH",4,"APRIL",5,"MAY",6,"JUNE",7,"JULY",8,"AUGUST",9,"SEPTEMBER",10,"OCTOBER",11,"NOVEMBER",12,"DECEMBER"), the result was formatted as a custom data type.
    • trip_day: Based on the “started_at” column, used to calculate the day of the trip using the formula - =SWITCH(WEEKDAY(started_at),1,"SUNDAY",2,"MONDAY",3,"TUESDAY",4,"WEDNESDAY",5,"THURSDAY",6,"FRIDAY",7,"SATURDAY"), the result was formatted as a custom data type.
    • trip_time_period: Based on the “started_at” column, used to calculate the period of the day of the trip using the formula - =IF(AND(HOUR(started_at)>0,HOUR(started_at)<12),"Morning",IF(AND(HOUR(started_at)>12,HOUR(started_at)<17),"Afternoon",IF(AND(HOUR(started_at)>17,HOUR(started_at)<20),"Evening","Night"))), the result was formatted as a custom data type.
    • trip_duration: Based on the “started_at” and “ended_at” columns, used to calculate the duration of the trip in minutes using the formula - =(ended_at-started_at)*1440 (Excel stores date differences in days, and a day has 1,440 minutes), the result was formatted as a numeric data type.

Quick note: I am quite adept at using Excel. It is a skill I developed before starting this project or pursuing the certification, hence my use of functions like SWITCH.

  • Once the new columns were added, I formatted the pre-existing columns where necessary.
    • ride_id: initially general, changed to text
    • rideable_type: initially general, changed to text
    • started_at: initially custom date/time, left unchanged
    • ended_at: initially custom date/time, left unchanged
    • start_station_name: initially general, changed to text
    • start_station_id: initially general, changed to text
    • end_station_name: initially general, changed to text
    • end_station_id: initially general, changed to text
    • start_lat: initially general, changed to numeric
    • start_lng: initially general, changed to numeric
    • end_lat: initially general, changed to numeric
    • end_lng: initially general, changed to numeric
    • member_casual: initially general, changed to text
  • Once all the formatting was completed, the files were saved. The rest of the process phase was handled in R, as documented below.
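For readers who would rather stay in R, the four derived columns from the Excel step can be sketched roughly as follows. This is only an illustration on made-up timestamps, not the code used for this project; the vector names and the `day_names` lookup are assumptions. The time-period logic mirrors the Excel formula as written, including its strict comparisons.

```r
library(lubridate)

# Made-up timestamps for illustration only (parsed as UTC).
started_at <- ymd_hms(c("2022-01-01 08:15:00", "2022-07-20 12:05:00", "2022-09-14 18:30:00"))
ended_at   <- ymd_hms(c("2022-01-01 08:45:30", "2022-07-20 12:20:00", "2022-09-14 19:00:00"))

# trip_month: month number mapped to an upper-case English month name.
trip_month <- toupper(month.name[month(started_at)])

# trip_day: wday() returns 1 for Sunday by default, like Excel's WEEKDAY().
day_names <- c("SUNDAY", "MONDAY", "TUESDAY", "WEDNESDAY",
               "THURSDAY", "FRIDAY", "SATURDAY")
trip_day <- day_names[wday(started_at)]

# trip_time_period: same strict comparisons as the Excel formula, so
# hours 0, 12 and 17 fall through to "Night" here as well.
start_hour <- hour(started_at)
trip_time_period <- ifelse(start_hour > 0 & start_hour < 12, "Morning",
                    ifelse(start_hour > 12 & start_hour < 17, "Afternoon",
                    ifelse(start_hour > 17 & start_hour < 20, "Evening", "Night")))

# trip_duration: difference between end and start, expressed in minutes.
trip_duration <- as.numeric(difftime(ended_at, started_at, units = "mins"))
```

If inclusive boundaries were intended (for example, noon counting as afternoon), the `>` comparisons would need to become `>=`.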

What did I do in R?

Loaded the necessary libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(readxl)
library(modeest)
## Registered S3 method overwritten by 'rmutil':
##   method         from
##   print.response httr
library(stringr)

Imported the partially cleaned Excel files into data frames

I took the time to do some preliminary data cleaning in Excel before bringing the data over to R. The reason for this is that there are things Excel is just naturally good at from a data cleaning perspective. These include fixing data types, performing basic functions and calculations, and highlighting errors. One major caveat is that the Excel files for this project were huge, in excess of 60MB or over 100,000 rows of data. While I do have a fairly competent laptop, it wouldn’t be efficient to try to do everything in Excel, especially when R can be brought in to handle some of the heavy lifting.

january_tripdata <- read_excel("Divvy-TripData/202201-divvy-tripdata.xlsx", 
    sheet = "202201-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

february_tripdata <- read_excel("Divvy-TripData/202202-divvy-tripdata.xlsx", 
    sheet = "202202-divvy-tripdata", col_types = c("text", 
        "text", "date", "date","text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

march_tripdata <- read_excel("Divvy-TripData/202203-divvy-tripdata.xlsx", 
    sheet = "202203-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

april_tripdata <- read_excel("Divvy-TripData/202204-divvy-tripdata.xlsx", 
    sheet = "202204-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

may_tripdata <- read_excel("Divvy-TripData/202205-divvy-tripdata.xlsx", 
    sheet = "202205-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

june_tripdata <- read_excel("Divvy-TripData/202206-divvy-tripdata.xlsx", 
    sheet = "202206-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

july_tripdata <- read_excel("Divvy-TripData/202207-divvy-tripdata.xlsx", 
    sheet = "202207-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

august_tripdata <- read_excel("Divvy-TripData/202208-divvy-tripdata.xlsx", 
    sheet = "202208-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

september_tripdata <- read_excel("Divvy-TripData/202209-divvy-tripdata.xlsx", 
    sheet = "202209-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

october_tripdata <- read_excel("Divvy-TripData/202210-divvy-tripdata.xlsx", 
    sheet = "202210-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

november_tripdata <- read_excel("Divvy-TripData/202211-divvy-tripdata.xlsx", 
    sheet = "202211-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))

december_tripdata <- read_excel("Divvy-TripData/202212-divvy-tripdata.xlsx", 
    sheet = "202212-divvy-tripdata", col_types = c("text", 
        "text", "date", "date", "text", "text", "text", 
        "numeric", "text", "text", "text", 
        "text", "numeric", "numeric", "numeric", 
        "numeric", "text"))
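Since the twelve calls above differ only in the month number, they could also be generated from a pattern. The sketch below is one possible way to do that, not the approach used in this project; `trip_col_types`, `read_month` and `all_trips_alt` are hypothetical names, and the read step only runs if the workbooks are actually present.

```r
# Column types shared by all twelve workbooks (same as in the calls above).
trip_col_types <- c("text", "text", "date", "date", "text", "text", "text",
                    "numeric", "text", "text", "text", "text",
                    "numeric", "numeric", "numeric", "numeric", "text")

# Sheet names "202201-divvy-tripdata" ... "202212-divvy-tripdata"
# and the matching file paths inside the Divvy-TripData folder.
sheet_names <- sprintf("2022%02d-divvy-tripdata", 1:12)
file_paths  <- file.path("Divvy-TripData", paste0(sheet_names, ".xlsx"))

# Hypothetical helper: read a single month's workbook.
read_month <- function(path, sheet) {
  readxl::read_excel(path, sheet = sheet, col_types = trip_col_types)
}

# Read and stack the months only when every file can be found.
if (all(file.exists(file_paths))) {
  monthly_trips <- Map(read_month, file_paths, sheet_names)
  all_trips_alt <- dplyr::bind_rows(monthly_trips)
}
```

The trade-off is that the individual monthly data frames are no longer available under their own names, which may or may not matter for later inspection.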

Bound all the data frames together into a single data frame

Now that we have all of our Excel files loaded into their respective data frames, let’s combine them into one big data frame to make the analysis more efficient.

all_trips <- bind_rows(
            january_tripdata,
            february_tripdata,
            march_tripdata,
            april_tripdata,
            may_tripdata,
            june_tripdata,
            july_tripdata,
            august_tripdata,
            september_tripdata,
            october_tripdata,
            november_tripdata,
            december_tripdata,
            )

Removed any blank rows

One of the assumptions I made was that any trip without a start or end station was an error and should therefore be removed from the dataset. In addition to the start and end stations, some rows had blank coordinates, so those needed to be removed as well.

all_trips_no_blanks <- drop_na(all_trips)
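To make the behaviour of `drop_na()` concrete, here is a small sketch on invented rows (the `toy_trips` data frame is made up for illustration). Called without column names, it drops every row that has a missing value in any column; naming columns restricts the check to those columns.

```r
library(tidyr)

# Four made-up trips: B2 lacks a start station, C3 lacks an end latitude.
toy_trips <- data.frame(
  ride_id            = c("A1", "B2", "C3", "D4"),
  start_station_name = c("Millennium Park", NA, "Broadway & Belmont Ave", "Wabash Ave & Grand Ave"),
  end_station_name   = c("Wabash Ave & Grand Ave", "Millennium Park", "Millennium Park", "Broadway & Belmont Ave"),
  end_lat            = c(41.9, 41.9, NA, 41.9),
  stringsAsFactors   = FALSE
)

# Drop rows with a blank in ANY column (the call used in this report).
toy_no_blanks <- drop_na(toy_trips)

# Alternative: only require the station names to be present.
toy_with_stations <- drop_na(toy_trips, start_station_name, end_station_name)
```

Here `toy_no_blanks` keeps rides A1 and D4, while `toy_with_stations` also keeps C3 because only its coordinate is missing.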

Removed unrealistic ride durations

Initially I considered excluding only trips with a duration below 0 minutes, but that doesn’t make real-world sense from an analysis perspective: a realistic bike ride can’t take 0 minutes, it must be longer. By the same reasoning, there is a limit to how much riding fits into 24 hours, so any ride longer than 24 hours is assumed to be an error. Therefore, I decided to filter out rides shorter than 1 minute and longer than 24 hours for a more meaningful analysis. Note that 24 hours is 1,440 minutes, whereas the upper bound of 34,560 in the code below corresponds to 24 days (24 × 1,440 minutes), so the results that follow still include trips of up to 34,560 minutes.

all_trips_no_negatives <- all_trips_no_blanks[(all_trips_no_blanks$trip_duration > 1 & all_trips_no_blanks$trip_duration < 34560),]
trips_summary <- all_trips_no_negatives
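For comparison, a filter that matches the stated rule (at least 1 minute, at most 24 hours) could look like the sketch below. It runs on made-up durations and is not the call that produced the results in this report, which used an upper bound of 34,560 minutes; `min_minutes`, `max_minutes` and the inclusive bounds are assumptions.

```r
library(dplyr)

# Assumed bounds, in minutes: 1 minute up to 24 hours (24 * 60 = 1440).
min_minutes <- 1
max_minutes <- 24 * 60

# Made-up trip durations in minutes, for illustration only.
toy_trips <- data.frame(
  ride_id       = c("A1", "B2", "C3", "D4", "E5"),
  trip_duration = c(-3.5, 0.4, 12.8, 1493.2, 34354.1)
)

# Keep only rides whose duration lies inside the assumed bounds.
toy_filtered <- toy_trips %>%
  filter(trip_duration >= min_minutes, trip_duration <= max_minutes)
```

With these bounds only ride C3 survives; rides D4 and E5 would both pass the 34,560-minute bound used above.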

Analyze and Share

I have combined these two phases because, as you read this, I am fulfilling both of them: I have analyzed the data, and I am sharing the results with you below.

Summarized the newly cleaned data

Much of the heavy lifting for cleaning the data was done in Excel, but due to the enormous file sizes of the data sets used in this analysis, it wouldn’t be practical to do all of the summarisation there. So we tackle that part here, since R is much more efficient.

summary(trips_summary)
##    ride_id          rideable_type        started_at                    
##  Length:4292473     Length:4292473     Min.   :2022-01-01 00:00:05.00  
##  Class :character   Class :character   1st Qu.:2022-05-29 05:01:35.00  
##  Mode  :character   Mode  :character   Median :2022-07-20 19:37:51.00  
##                                        Mean   :2022-07-19 11:37:12.23  
##                                        3rd Qu.:2022-09-14 17:45:46.00  
##                                        Max.   :2022-12-31 23:59:26.00  
##     ended_at                       trip_month          trip_day        
##  Min.   :2022-01-01 00:01:48.00   Length:4292473     Length:4292473    
##  1st Qu.:2022-05-29 05:40:24.00   Class :character   Class :character  
##  Median :2022-07-20 19:55:37.00   Mode  :character   Mode  :character  
##  Mean   :2022-07-19 11:54:35.89                                        
##  3rd Qu.:2022-09-14 18:01:26.00                                        
##  Max.   :2023-01-01 18:09:37.00                                        
##  trip_time_period   trip_duration      start_station_name start_station_id  
##  Length:4292473     Min.   :    1.00   Length:4292473     Length:4292473    
##  Class :character   1st Qu.:    6.25   Class :character   Class :character  
##  Mode  :character   Median :   10.80   Mode  :character   Mode  :character  
##                     Mean   :   17.39                                        
##                     3rd Qu.:   19.25                                        
##                     Max.   :34354.07                                        
##  end_station_name   end_station_id       start_lat       start_lng     
##  Length:4292473     Length:4292473     Min.   :41.65   Min.   :-87.83  
##  Class :character   Class :character   1st Qu.:41.88   1st Qu.:-87.66  
##  Mode  :character   Mode  :character   Median :41.90   Median :-87.64  
##                                        Mean   :41.90   Mean   :-87.64  
##                                        3rd Qu.:41.93   3rd Qu.:-87.63  
##                                        Max.   :45.64   Max.   :-73.80  
##     end_lat         end_lng       member_casual     
##  Min.   : 0.00   Min.   :-87.83   Length:4292473    
##  1st Qu.:41.88   1st Qu.:-87.66   Class :character  
##  Median :41.90   Median :-87.64   Mode  :character  
##  Mean   :41.90   Mean   :-87.64                     
##  3rd Qu.:41.93   3rd Qu.:-87.63                     
##  Max.   :42.06   Max.   :  0.00
str(trips_summary)
## tibble [4,292,473 × 17] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:4292473] "578BA30BA1348F18" "5EE2D7C533CCC17B" "5AA216F2E2138811" "81F3141973924C8C" ...
##  $ rideable_type     : chr [1:4292473] "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
##  $ started_at        : POSIXct[1:4292473], format: "2022-01-01 01:00:05" "2022-01-06 19:07:45" ...
##  $ ended_at          : POSIXct[1:4292473], format: "2022-01-21 08:51:11" "2022-01-25 14:30:33" ...
##  $ trip_month        : chr [1:4292473] "JANUARY" "JANUARY" "JANUARY" "JANUARY" ...
##  $ trip_day          : chr [1:4292473] "SATURDAY" "THURSDAY" "THURSDAY" "WEDNESDAY" ...
##  $ trip_time_period  : chr [1:4292473] "Morning" "Evening" "Night" "Afternoon" ...
##  $ trip_duration     : num [1:4292473] 29271 27083 14238 9839 8531 ...
##  $ start_station_name: chr [1:4292473] "Millennium Park" "Wabash Ave & Grand Ave" "Broadway & Belmont Ave" "Sedgwick St & Schiller St" ...
##  $ start_station_id  : chr [1:4292473] "13008" "TA1307000117" "13277" "TA1307000143" ...
##  $ end_station_name  : chr [1:4292473] "Fairfield Ave & Roosevelt Rd" "Base - 2132 W Hubbard Warehouse" "Avers Ave & Belmont Ave" "Larrabee St & Division St" ...
##  $ end_station_id    : chr [1:4292473] "KA1504000102" "Hubbard Bike-checking (LBS-WH-TEST)" "15640" "KA1504000079" ...
##  $ start_lat         : num [1:4292473] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:4292473] -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num [1:4292473] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:4292473] -87.7 -87.7 -87.7 -87.6 -87.7 ...
##  $ member_casual     : chr [1:4292473] "casual" "casual" "casual" "casual" ...

Dived Deeper into the analysis

What was the summary information for trip duration?

summary(trips_summary$trip_duration)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     1.00     6.25    10.80    17.39    19.25 34354.07

What was the total number of rides based on Membership Type?

members_trip_total <- data.frame(table(trips_summary$member_casual))
colnames(members_trip_total) <- c("Membership Type", "Number of Rides")
members_trip_total
##   Membership Type Number of Rides
## 1          casual         1731141
## 2          member         2561332

Quick Observation: Annual Members took almost 50% more rides than Casual Riders

What was the Total Duration of rides based on Membership Type?

members_trip_total_duration <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = sum)
colnames(members_trip_total_duration) <- c("Membership Type", "Trip Duration (Minutes)")
members_trip_total_duration
##   Membership Type Trip Duration (Minutes)
## 1          casual                42171615
## 2          member                32492675

Quick Observation: In total, Casual Riders rode for almost 30% longer than Annual Members

What was the average trip duration based on Membership Type?

members_trip_mean <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = mean)
colnames(members_trip_mean) <- c("Membership Type","Average Trip (minutes)")
members_trip_mean
##   Membership Type Average Trip (minutes)
## 1          casual               24.36059
## 2          member               12.68585

Quick Observation: On average, Casual Riders’ trips were almost twice as long as Annual Members’ trips (about 92% longer)
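The percentages behind these quick observations can be checked directly from the printed tables. The sketch below simply re-enters the reported totals by hand (the vector names are assumptions) and computes the relative differences.

```r
# Totals copied from the output tables above.
rides         <- c(casual = 1731141, member = 2561332)
total_minutes <- c(casual = 42171615, member = 32492675)

# How many more rides members took, relative to casual riders.
extra_member_rides <- rides[["member"]] / rides[["casual"]] - 1

# How much longer casual riders rode in total, relative to members.
extra_casual_minutes <- total_minutes[["casual"]] / total_minutes[["member"]] - 1

# Average trip length per group, and how the two averages compare.
mean_minutes <- total_minutes / rides
casual_vs_member_mean <- mean_minutes[["casual"]] / mean_minutes[["member"]]

round(extra_member_rides, 2)     # about 0.48
round(extra_casual_minutes, 2)   # about 0.30
round(casual_vs_member_mean, 2)  # about 1.92
```

Dividing total minutes by ride counts reproduces the averages reported by `aggregate()`, which is a useful consistency check between the tables.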

What were the longest and shortest trip durations based on Membership Type?

members_trip_max <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = max)
colnames(members_trip_max) <- c("Membership Type", "Longest Trip (minutes)")
members_trip_max
##   Membership Type Longest Trip (minutes)
## 1          casual              34354.067
## 2          member               1493.233
members_trip_min <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = min)
colnames(members_trip_min) <- c("Membership Type", "Shortest Trip (minutes)")
members_trip_min
##   Membership Type Shortest Trip (minutes)
## 1          casual                       1
## 2          member                       1

Repeated the entire analysis again, but using Bicycle Type as the focus

The Average Trip Duration by Bicycle Type

bicycle_trip_mean <- aggregate(trips_summary$trip_duration, list(trips_summary$rideable_type), FUN = mean)
colnames(bicycle_trip_mean) <- c("Bicycle Type","Average Trip (minutes)")
bicycle_trip_mean
##    Bicycle Type Average Trip (minutes)
## 1  classic_bike               17.32218
## 2   docked_bike               51.14780
## 3 electric_bike               13.76267

Longest Trip by Bicycle Type

bicycle_trip_max <- aggregate(trips_summary$trip_duration, list(trips_summary$rideable_type), FUN = max)
colnames(bicycle_trip_max) <- c("Bicycle Type", "Longest Trip (minutes)")
bicycle_trip_max
##    Bicycle Type Longest Trip (minutes)
## 1  classic_bike               1499.417
## 2   docked_bike              34354.067
## 3 electric_bike                480.000

Shortest Trip by Bicycle Type

bicycle_trip_min <- aggregate(trips_summary$trip_duration, list(trips_summary$rideable_type), FUN = min)
colnames(bicycle_trip_min) <- c("Bicycle Type", "Shortest Trip (minutes)")
bicycle_trip_min
##    Bicycle Type Shortest Trip (minutes)
## 1  classic_bike                       1
## 2   docked_bike                       1
## 3 electric_bike                       1
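The separate `aggregate()` calls for the mean, longest and shortest trip can also be folded into a single grouped summary. The following is a sketch with dplyr on made-up durations, offered as an alternative rather than as the method used above; the `toy_trips` data and the output column names are assumptions.

```r
library(dplyr)

# Made-up trips for illustration only (durations in minutes).
toy_trips <- data.frame(
  rideable_type = c("classic_bike", "classic_bike", "docked_bike",
                    "docked_bike", "electric_bike"),
  trip_duration = c(10, 20, 45, 60, 12)
)

# One row per bicycle type with all summary statistics side by side.
duration_by_bike <- toy_trips %>%
  group_by(rideable_type) %>%
  summarise(
    average_trip    = mean(trip_duration),
    longest_trip    = max(trip_duration),
    shortest_trip   = min(trip_duration),
    number_of_rides = n()
  )
```

Swapping `rideable_type` for `member_casual` would give the equivalent table by membership type.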

What was the most popular Bicycle Type overall?

members_trip_popular_bike <- data.frame(table(trips_summary$rideable_type))
colnames(members_trip_popular_bike) <- c("Bicycle Type", "Number of rides")
members_trip_popular_bike
##    Bicycle Type Number of rides
## 1  classic_bike         2558775
## 2   docked_bike          173342
## 3 electric_bike         1560356

Quick Observation: The most popular bicycle type overall was the Classic Bike

What was the most Popular Day by Bicycle Type?

bicycle_trip_popular_days <- data.frame(table(trips_summary$rideable_type, trips_summary$trip_day))
colnames(bicycle_trip_popular_days) <- c("Bicycle Type","Weekday","Number of rides")
bicycle_trip_popular_days
##     Bicycle Type   Weekday Number of rides
## 1   classic_bike    FRIDAY          349141
## 2    docked_bike    FRIDAY           22806
## 3  electric_bike    FRIDAY          226090
## 4   classic_bike    MONDAY          346180
## 5    docked_bike    MONDAY           21995
## 6  electric_bike    MONDAY          207582
## 7   classic_bike  SATURDAY          416944
## 8    docked_bike  SATURDAY           40026
## 9  electric_bike  SATURDAY          235920
## 10  classic_bike    SUNDAY          353180
## 11   docked_bike    SUNDAY           34882
## 12 electric_bike    SUNDAY          200137
## 13  classic_bike  THURSDAY          375396
## 14   docked_bike  THURSDAY           19304
## 15 electric_bike  THURSDAY          240034
## 16  classic_bike   TUESDAY          358874
## 17   docked_bike   TUESDAY           17366
## 18 electric_bike   TUESDAY          220986
## 19  classic_bike WEDNESDAY          359060
## 20   docked_bike WEDNESDAY           16963
## 21 electric_bike WEDNESDAY          229607

Visualized the results

ggplot(bicycle_trip_popular_days, aes(x = `Weekday`, y = `Number of rides`, fill = `Bicycle Type`)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = `Number of rides`), position = position_dodge(width = 0.9), vjust = -0.25)+
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Weekday", y = "Number of rides", fill = "Bicycle Type", title = "Most Popular Weekday by Bicycle Type") +
  theme_minimal()
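Because `table()` sorts the weekday labels alphabetically, the x axis of the chart runs FRIDAY, MONDAY, SATURDAY and so on. If a calendar order is preferred, the weekday column can be converted to a factor with explicit levels before plotting. The sketch below assumes a Monday-first week and uses three of the classic bike counts from the table above; `weekday_order` and `toy_counts` are hypothetical names.

```r
# Assumed display order: Monday first.
weekday_order <- c("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
                   "FRIDAY", "SATURDAY", "SUNDAY")

# Three classic_bike rows from the table above, in alphabetical order.
toy_counts <- data.frame(
  Weekday = c("FRIDAY", "MONDAY", "SATURDAY"),
  rides   = c(349141, 346180, 416944)
)

# Re-level the weekday so that plots and sorts follow calendar order.
toy_counts$Weekday <- factor(toy_counts$Weekday, levels = weekday_order)
toy_sorted <- toy_counts[order(toy_counts$Weekday), ]
```

ggplot2 draws discrete axes in factor-level order, so applying the same re-levelling to the plotted data frame would reorder the bars accordingly.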

Extracted the key results for each bicycle type

popular_classic_day <- bicycle_trip_popular_days[bicycle_trip_popular_days[, "Bicycle Type"] == "classic_bike", ]
popular_docked_day <- bicycle_trip_popular_days[bicycle_trip_popular_days[, "Bicycle Type"] == "docked_bike", ]
popular_electric_day <- bicycle_trip_popular_days[bicycle_trip_popular_days[, "Bicycle Type"] == "electric_bike", ]
classic_most_popular_row <- which.max(popular_classic_day[, "Number of rides"])
docked_most_popular_row <- which.max(popular_docked_day[, "Number of rides"])
electric_most_popular_row <- which.max(popular_electric_day[, "Number of rides"])
classic_most_popular_day <- popular_classic_day[classic_most_popular_row, c("Bicycle Type", "Weekday", "Number of rides")]
docked_most_popular_day <- popular_docked_day[docked_most_popular_row, c("Bicycle Type", "Weekday", "Number of rides")]
electric_most_popular_day <- popular_electric_day[electric_most_popular_row, c("Bicycle Type", "Weekday", "Number of rides")]
most_popular_bike_days <- rbind(classic_most_popular_day, docked_most_popular_day, electric_most_popular_day)
most_popular_bike_days
##     Bicycle Type  Weekday Number of rides
## 7   classic_bike SATURDAY          416944
## 8    docked_bike SATURDAY           40026
## 15 electric_bike THURSDAY          240034

Conclusion

Act

The last phase of the data analysis process is taking action based on the analysis you have done and shared. In this case, the action takes the form of recommendations to the Cyclistic team, based on the questions posed at the start of this project, which I have outlined below.

Questions and Recommendations

  • How do annual members and casual riders use Cyclistic bikes differently?

    • Based on the data above, we can see a few things regarding Annual Members and Casual Riders:

      1. On average, Casual Riders tend to take longer rides on Saturdays, whereas Annual Members prefer Thursdays.
      2. Casual Riders prefer to ride at night, whereas Annual Members prefer mornings.
      3. Both Casual Riders and Annual Members prefer classic bikes over the other bicycle types, but Casual Riders seem more inclined to give electric bikes a try, as the gap between the two is smaller than it is for Annual Members.
  • Why would casual riders buy Cyclistic annual memberships?

    • Personally, I don’t believe this question can be reasonably answered with the data provided, as the benefits and disadvantages would come down to pricing, which isn’t given in this project. However, if we assume that casual riders would enjoy certain benefits that are only available to Annual Members, then those benefits would be a good reason to change membership types. These benefits could be:

      • Unlimited rides of a certain duration within a 24-hour day.

      • Discounted ride rates based on peak periods (Months, Days, Period of the day)

  • How can Cyclistic use digital media to influence casual riders to become members?

    • There are a few ways Cyclistic can use digital media to influence casual riders to become annual members.

      1. Digital Surveys at the end of each ride. These can be used to gather qualitative data about why the rider chose the type of bicycle and the purpose of the trip. At the end of the survey, the casual rider could be shown what such trips would cost as an annual member and be given a coupon code that discounts their first year of annual membership if they sign up.

      2. Targeted Digital Media Campaigns. These could be used to great effect when combined with the popular month, day and time period data for casual riders revealed by the analysis. Offering discounted Annual Memberships for the first year to casual riders who sign up during their peak periods is a great way to catch their attention and incentivize them to switch.

      3. Targeted Digital Media referral campaigns. These campaigns would be aimed primarily at existing Annual Members but could be extended to newly converted Casual Riders. If a member convinces a friend who is a casual rider to switch to an Annual Membership:

        • The existing Annual Member (or newly converted Casual Rider) gets a small percentage discount on renewal for each casual rider converted.

        • The casual rider gets a discount on their new Annual Membership at sign-up.