Business Task

Through financial analysis, Cyclistic has concluded that annual members of their bike-share offering are much more profitable than casual riders. Due to this, the company’s goal is to maximize the number of annual members by marketing specifically to their existing customer pool of casual riders.

Additionally, I am working to identify the motivation behind a casual rider becoming an annual member and how digital media marketing would influence that decision to convert.

Specifically, I am analyzing Cyclistic’s historical bike trip data to answer three key questions and create a practical plan of action for an effective marketing strategy aimed at existing customers.


Key Questions To Answer

My insights will aid the Cyclistic executive team and my manager, Lily Moreno, in understanding how to create an attractive digital marketing campaign that connects with and motivates existing casual riders of the bike-share offering to convert into annual members.

The goal of this analysis is to answer the following questions.

  • How do annual members and casual riders use Cyclistic bikes differently?
  • Why would casual riders buy Cyclistic annual memberships?
  • How can Cyclistic use digital media to influence casual riders to become members?

Answering these questions will support Cyclistic’s need to maximize its annual memberships, a key element of its future growth as a company.


Dataset Description

This is public data provided by Motivate International Inc. It allows our analysis to explore the behaviors of the different customer types that use Cyclistic offerings. All data cleaning, organization, and analysis is done in RStudio Desktop, with graphs and charts as our visualization tools.

The data files were too large to analyze in Excel, Google Sheets, or BigQuery using SQL. Therefore, the most recent 12 months of data at the time of this analysis were imported into RStudio Desktop; the dates run from July 15, 2021 to June 3, 2022.

Each data set was saved in a specific folder and copied so that the original data is preserved from changes made during analysis. Each of the 12 monthly files was renamed to “tripdata_yearmonth” for easier use, so that the name correlates with the number of the month instead of the date when the file was originally uploaded.


Overview

Our collection, cleaning, and analysis can be separated into 4 main parts: collecting the data, cleaning the data, analyzing the data, and visualizing the data.


Collecting the Data

Load the required packages in RStudio (installing them first if needed): tidyverse, lubridate, and ggplot2.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)

Upload data sets

Import the last 12 months of data into RStudio Desktop; the dates run from July 15, 2021 to June 3, 2022. Each of the 12 monthly files is renamed to “tripdata_yearmonth” for easier use, so that the name correlates with the number of the month instead of the date when the file was originally uploaded.

tripdata_202206 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202206.csv")
tripdata_202205 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202205.csv")
tripdata_202204 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202204.csv")
tripdata_202203 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202203.csv")
tripdata_202202 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202202.csv")
tripdata_202201 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202201.csv")
tripdata_202112 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202112.csv")
tripdata_202111 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202111.csv")
tripdata_202110 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202110.csv")
tripdata_202109 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202109.csv")
tripdata_202108 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202108.csv")
tripdata_202107 <- read.csv("~/Desktop/Case_study_1/Copy _of_data/tripdata_202107.csv")
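The twelve read.csv() calls above could also be written as a loop over the month labels. The following is a minimal sketch, assuming the same “tripdata_yearmonth” naming; the read_months helper and the temporary demo folder are hypothetical stand-ins (not part of the original analysis), used only so the example runs without the real files.

```r
# Hypothetical helper: read one CSV per month label from a folder and
# return a named list of data frames (names follow "tripdata_yearmonth").
read_months <- function(folder, months) {
  paths <- file.path(folder, paste0("tripdata_", months, ".csv"))
  trips <- lapply(paths, read.csv)
  names(trips) <- paste0("tripdata_", months)
  trips
}

# Stand-in data: two tiny files in a temporary folder (not the real data).
demo_folder <- tempfile("case_study_")
dir.create(demo_folder)
write.csv(data.frame(ride_id = "A1", member_casual = "member"),
          file.path(demo_folder, "tripdata_202107.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B2", member_casual = "casual"),
          file.path(demo_folder, "tripdata_202108.csv"), row.names = FALSE)

monthly_trips <- read_months(demo_folder, c("202107", "202108"))
names(monthly_trips)
```

A named list like this can be passed directly to bind_rows(), which avoids typing each table name by hand.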

Use colnames function

Check the column names of each data set to ensure that they all match. None needed renaming, as the names were the same across every data set.

colnames(tripdata_202107)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202108)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202109)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202110)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202111)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202112)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202201)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202202)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202203)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202204)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202205)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(tripdata_202206)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
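Comparing twelve printouts by eye is error-prone, so the same check can be done programmatically. This is a small sketch on toy data frames; the same_columns helper and the toy tables are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: TRUE when every data frame in the list has exactly
# the same column names, in the same order, as the first one.
same_columns <- function(frames) {
  reference <- colnames(frames[[1]])
  all(vapply(frames, function(df) identical(colnames(df), reference), logical(1)))
}

# Toy data frames standing in for the monthly tables.
jul <- data.frame(ride_id = "A1", member_casual = "member")
aug <- data.frame(ride_id = "B2", member_casual = "casual")
odd <- data.frame(trip_id = "C3", usertype = "Customer")

same_columns(list(jul, aug))       # TRUE
same_columns(list(jul, aug, odd))  # FALSE
```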

Combine all data sets

Since all column names are the same, proceed with binding all data into one big data frame.

all_trips <- bind_rows(tripdata_202107,tripdata_202108,tripdata_202109,tripdata_202110,tripdata_202111,tripdata_202112,tripdata_202201,tripdata_202202,tripdata_202203,tripdata_202204,tripdata_202205,tripdata_202206)

Cleaning the Data

Inspect the new table with the following functions, in the order they appear in the bulleted list.

  • Find the list of all column names
  • Inspect how many rows are in the data frame
  • Find the dimensions of the data frame
  • Inspect the first and last rows of the data frame
  • See the list of columns and data types
  • See the statistical summary of the data

View(all_trips)

Find the list of all column names in the new table all_trips.

colnames(all_trips)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

Inspect how many rows are in the data frame and find the dimensions.

nrow(all_trips)
## [1] 5860776
dim(all_trips)
## [1] 5860776      13

Inspect the first and last rows of the data frame.

head(all_trips)
##            ride_id rideable_type          started_at            ended_at
## 1 99FEC93BA843FB20 electric_bike 2021-06-13 14:31:28 2021-06-13 14:34:11
## 2 06048DCFC8520CAF electric_bike 2021-06-04 11:18:02 2021-06-04 11:24:19
## 3 9598066F68045DF2 electric_bike 2021-06-04 09:49:35 2021-06-04 09:55:34
## 4 B03C0FE48C412214 electric_bike 2021-06-03 19:56:05 2021-06-03 20:21:55
## 5 B9EEA89F8FEE73B7 electric_bike 2021-06-04 14:05:51 2021-06-04 14:09:59
## 6 62B943CEAAA420BA electric_bike 2021-06-03 19:32:01 2021-06-03 19:38:46
##   start_station_name start_station_id end_station_name end_station_id start_lat
## 1                                                                         41.80
## 2                                                                         41.79
## 3                                                                         41.80
## 4                                                                         41.78
## 5                                                                         41.80
## 6                                                                         41.78
##   start_lng end_lat end_lng member_casual
## 1    -87.59   41.80  -87.60        member
## 2    -87.59   41.80  -87.60        member
## 3    -87.60   41.79  -87.59        member
## 4    -87.58   41.80  -87.60        member
## 5    -87.59   41.79  -87.59        member
## 6    -87.58   41.78  -87.58        member
tail(all_trips)
##                  ride_id rideable_type          started_at            ended_at
## 5860771 284843EC9F8C5663  classic_bike 2022-05-30 18:34:44 2022-05-31 19:34:35
## 5860772 8891BA0053ECEC4F electric_bike 2022-05-27 22:00:02 2022-05-27 22:07:01
## 5860773 47D8B5FBCADECFC1 electric_bike 2022-05-15 16:05:39 2022-05-15 16:44:12
## 5860774 AA8D16CF38B40703 electric_bike 2022-05-21 10:10:13 2022-05-21 10:26:09
## 5860775 897EBFD44F329E0A electric_bike 2022-05-12 07:53:58 2022-05-12 08:01:18
## 5860776 AAC23AB89E8A7733 electric_bike 2022-05-11 21:14:28 2022-05-11 21:18:16
##                       start_station_name start_station_id end_station_name
## 5860771            Ashland Ave & Lake St            13073                 
## 5860772            Clark St & Newport St              632                 
## 5860773            Clark St & Newport St              632                 
## 5860774 Francisco Ave & Bloomingdale Ave              429                 
## 5860775 Francisco Ave & Bloomingdale Ave              429                 
## 5860776            Clark St & Newport St              632                 
##         end_station_id start_lat start_lng end_lat end_lng member_casual
## 5860771                 41.88592 -87.66717      NA      NA        casual
## 5860772                 41.94456 -87.65483   41.92  -87.65        member
## 5860773                 41.94448 -87.65476   41.92  -87.76        member
## 5860774                 41.91000 -87.70000   41.92  -87.66        casual
## 5860775                 41.91000 -87.70000   41.90  -87.69        member
## 5860776                 41.94457 -87.65480   41.94  -87.65        member

See the list of columns and data types.

str(all_trips)
## 'data.frame':    5860776 obs. of  13 variables:
##  $ ride_id           : chr  "99FEC93BA843FB20" "06048DCFC8520CAF" "9598066F68045DF2" "B03C0FE48C412214" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-06-13 14:31:28" "2021-06-04 11:18:02" "2021-06-04 09:49:35" "2021-06-03 19:56:05" ...
##  $ ended_at          : chr  "2021-06-13 14:34:11" "2021-06-04 11:24:19" "2021-06-04 09:55:34" "2021-06-03 20:21:55" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.8 41.8 41.8 41.8 41.8 ...
##  $ start_lng         : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num  41.8 41.8 41.8 41.8 41.8 ...
##  $ end_lng           : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...

See the statistical summary of the data.

summary(all_trips)
##    ride_id          rideable_type       started_at          ended_at        
##  Length:5860776     Length:5860776     Length:5860776     Length:5860776    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_name start_station_id   end_station_name   end_station_id    
##  Length:5860776     Length:5860776     Length:5860776     Length:5860776    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    start_lat       start_lng         end_lat         end_lng      
##  Min.   :41.64   Min.   :-87.84   Min.   :41.39   Min.   :-88.97  
##  1st Qu.:41.88   1st Qu.:-87.66   1st Qu.:41.88   1st Qu.:-87.66  
##  Median :41.90   Median :-87.64   Median :41.90   Median :-87.64  
##  Mean   :41.90   Mean   :-87.65   Mean   :41.90   Mean   :-87.65  
##  3rd Qu.:41.93   3rd Qu.:-87.63   3rd Qu.:41.93   3rd Qu.:-87.63  
##  Max.   :45.64   Max.   :-73.80   Max.   :42.17   Max.   :-87.49  
##                                   NA's   :5036    NA's   :5036    
##  member_casual     
##  Length:5860776    
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

Solve 3 problems in the data

Continue cleaning the data and working toward our business task by fixing 3 problems found in the data.

  • Create additional columns of data: date, month, day, year, and day of week.
  • Add a calculated field for ride length to the entire data frame.
  • Delete rides with a negative ride_length, as those do not represent ride time but rather times when bikes were taken out of circulation for quality control reasons.

Solution to problem 1:

Create additional columns of data: date, month, day, year, and day of week. The first step is to check how many observations fall under each rider type in member_casual and confirm that only the two expected labels appear.

table(all_trips$member_casual)
## 
##  casual  member 
## 2559857 3300919

Add columns that list date, month, day, year, and day of week for each ride.

all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date),"%m")
all_trips$day <- format(as.Date(all_trips$date),"%d")
all_trips$year <- format(as.Date(all_trips$date),"%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date),"%A")
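To see what each format code returns, here is a short sketch using two timestamps taken from the first rows of all_trips. The weekday names produced by "%A" depend on the session locale, so the names shown in the comment assume an English locale.

```r
# Two timestamps taken from the first rows of all_trips.
started_at <- c("2021-06-13 14:31:28", "2021-06-04 11:18:02")

# as.Date() keeps the calendar date and drops the time of day.
ride_date <- as.Date(started_at)

format(ride_date, "%m")  # "06" "06"
format(ride_date, "%d")  # "13" "04"
format(ride_date, "%Y")  # "2021" "2021"
format(ride_date, "%A")  # weekday names, "Sunday" "Friday" in an English locale
```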

Solution to problem 2:

Add a calculated field for ride_length to the entire dataframe to determine difference between casual riders and members. Ride_length is calculated in seconds.

all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)
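By default, difftime() chooses its units automatically, so it can be safer to request seconds explicitly. The sketch below uses two rides from the first rows of all_trips; parsing the timestamps as UTC is an assumption made only for this example.

```r
started_at <- c("2021-06-13 14:31:28", "2021-06-03 19:56:05")
ended_at   <- c("2021-06-13 14:34:11", "2021-06-03 20:21:55")

# Parse in a fixed time zone (UTC is an assumption made for this example).
start_time <- as.POSIXct(started_at, tz = "UTC")
end_time   <- as.POSIXct(ended_at, tz = "UTC")

# Ask for seconds explicitly instead of relying on the automatic choice.
ride_length <- difftime(end_time, start_time, units = "secs")
as.numeric(ride_length)  # 163 1550
```

Calling as.numeric() on the difftime result returns plain numbers in the requested units, ready for mean(), median(), and similar functions.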

Inspect the structure of the columns.

str(all_trips)
## 'data.frame':    5860776 obs. of  19 variables:
##  $ ride_id           : chr  "99FEC93BA843FB20" "06048DCFC8520CAF" "9598066F68045DF2" "B03C0FE48C412214" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2021-06-13 14:31:28" "2021-06-04 11:18:02" "2021-06-04 09:49:35" "2021-06-03 19:56:05" ...
##  $ ended_at          : chr  "2021-06-13 14:34:11" "2021-06-04 11:24:19" "2021-06-04 09:55:34" "2021-06-03 20:21:55" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.8 41.8 41.8 41.8 41.8 ...
##  $ start_lng         : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num  41.8 41.8 41.8 41.8 41.8 ...
##  $ end_lng           : num  -87.6 -87.6 -87.6 -87.6 -87.6 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
##  $ date              : Date, format: "2021-06-13" "2021-06-04" ...
##  $ month             : chr  "06" "06" "06" "06" ...
##  $ day               : chr  "13" "04" "04" "03" ...
##  $ year              : chr  "2021" "2021" "2021" "2021" ...
##  $ day_of_week       : chr  "Sunday" "Friday" "Friday" "Thursday" ...
##  $ ride_length       : 'difftime' num  163 377 359 1550 ...
##   ..- attr(*, "units")= chr "secs"

The str() output shows that ride_length is stored as a difftime object, so it needs to be converted to numeric before we can run calculations on the data.

is.factor(all_trips$ride_length)
## [1] FALSE
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
## [1] TRUE

Solution to problem 3:

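Delete rides with a negative ride_length, as those do not represent ride time but rather times when bikes were taken out of circulation for quality control reasons. A new table, all_trips_v2, holds the data without negative values. The filter also checks the station fields with is.na(); because the str() output above shows blank stations stored as empty strings rather than NA, those checks appear to leave such rows in place, so in practice only the negative-length rides are removed.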

all_trips_v2 <- all_trips[!(is.na(all_trips$start_station_name) | is.na(all_trips$start_station_id) | is.na(all_trips$end_station_name) | is.na(all_trips$end_station_id) | all_trips$ride_length<0),]
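The difference between NA and an empty string matters for this filter. Below is a minimal sketch on a toy table (toy_trips and its values are made up for illustration). Whether rides with blank stations should be dropped is a judgment call; the sketch only shows how the two conditions behave.

```r
# Toy table: one normal ride, one negative-length ride, and one ride with a
# blank start station (stored as "" rather than NA).
toy_trips <- data.frame(
  ride_id            = c("A1", "B2", "C3"),
  start_station_name = c("Clark St & Newport St", "Ashland Ave & Lake St", ""),
  ride_length        = c(163, -20, 377)
)

# is.na() does not flag empty strings...
is.na(toy_trips$start_station_name)  # FALSE FALSE FALSE

# ...so blank stations need an explicit comparison if they should be dropped.
negative_ride <- toy_trips$ride_length < 0
blank_station <- toy_trips$start_station_name == ""

toy_trips[!negative_ride, "ride_id"]                    # "A1" "C3"
toy_trips[!(negative_ride | blank_station), "ride_id"]  # "A1"
```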

Check to ensure all_trips_v2 table is correct.

View(all_trips_v2)

Analyzing the Data

Now that the data has been cleaned, we move on to analyzing the ride_length of members and casual riders. Begin by finding the mean, the median, the longest ride, and the shortest ride (all in seconds).

mean(all_trips_v2$ride_length)
## [1] 1241.466
median(all_trips_v2$ride_length)
## [1] 681
max(all_trips_v2$ride_length)
## [1] 3356649
min(all_trips_v2$ride_length)
## [1] 0

summary() can also report all 4 values, along with the quartiles, in a single call.

summary(all_trips_v2$ride_length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0     382     681    1241    1236 3356649

Compare members and casual riders

Compare the mean, median, longest ride, and shortest ride of casual riders and members.

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                1833.0390
## 2                     member                 782.7016
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                      916
## 2                     member                      547
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                  3356649
## 2                     member                    89998
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                        0
## 2                     member                        0
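The four aggregate() calls can also be folded into one, and passing the table through the data argument gives shorter column names in the output. This is a sketch on made-up numbers (toy_trips is hypothetical), not a rerun of the analysis.

```r
# Toy data: two casual rides and two member rides, lengths in seconds.
toy_trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  ride_length   = c(1200, 2400, 600, 900)
)

# One row per rider type; FUN returns several statistics at once, which
# aggregate() stores as a matrix column named ride_length.
ride_stats <- aggregate(
  ride_length ~ member_casual,
  data = toy_trips,
  FUN  = function(x) c(mean = mean(x), median = median(x), max = max(x), min = min(x))
)
ride_stats

ride_stats$ride_length[, "mean"]  # 1800 750
```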

Find the average ride time for each day of the week for members and casual riders. By default, the days of the week in the output table are not in calendar order. Use ordered() to set the order first, then compute the average ride time.

all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Sunday                2121.0590
## 2                      member                   Sunday                 887.9617
## 3                      casual                   Monday                1831.6679
## 4                      member                   Monday                 758.7213
## 5                      casual                  Tuesday                1574.5123
## 6                      member                  Tuesday                 736.7175
## 7                      casual                Wednesday                1599.3681
## 8                      member                Wednesday                 738.5552
## 9                      casual                 Thursday                1662.4662
## 10                     member                 Thursday                 746.5689
## 11                     casual                   Friday                1726.7447
## 12                     member                   Friday                 766.9394
## 13                     casual                 Saturday                2010.4328
## 14                     member                 Saturday                 877.4190

Analyze ridership data by type and weekday.

all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>% 
  arrange(member_casual,weekday)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun              470157            2121.
##  2 casual        Mon              302073            1832.
##  3 casual        Tue              287018            1575.
##  4 casual        Wed              285769            1599.
##  5 casual        Thu              308614            1662.
##  6 casual        Fri              360007            1727.
##  7 casual        Sat              546158            2010.
##  8 member        Sun              394686             888.
##  9 member        Mon              466109             759.
## 10 member        Tue              524804             737.
## 11 member        Wed              512600             739.
## 12 member        Thu              501807             747.
## 13 member        Fri              459798             767.
## 14 member        Sat              441037             877.
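The grouping message printed by summarise() can be avoided by setting its .groups argument, and dividing by 60 reports the averages in minutes. A minimal sketch on a toy table follows; toy_trips and its ride lengths are made up, and only the timestamps echo the format of the real data.

```r
library(dplyr)
library(lubridate)

# Toy rides: two casual rides on a Sunday, two member rides on a Friday.
toy_trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  started_at    = c("2021-06-13 14:31:28", "2021-06-13 09:00:00",
                    "2021-06-04 11:18:02", "2021-06-04 09:49:35"),
  ride_length   = c(1800, 2400, 377, 359)
)

weekday_summary <- toy_trips %>%
  mutate(weekday = wday(as.Date(started_at), label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_minutes = mean(ride_length) / 60,  # seconds to minutes
    .groups = "drop"  # return an ungrouped tibble and silence the message
  ) %>%
  arrange(member_casual, weekday)

weekday_summary
```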

Visualizing Data

Use ggplot to visualize rider data and compare members vs. casual riders so we can answer the following questions:

  • How do annual members and casual riders use Cyclistic bikes differently?
  • Why would casual riders buy Cyclistic annual memberships?
  • How can Cyclistic use digital media to influence casual riders to become members?

Visual of the number of rides by rider type

At first glance, the data suggests that casual riders use the service more on weekends and less during the week, while members show the opposite pattern: they ride more in the middle of the week and less on weekends.

all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>% 
  arrange(member_casual,weekday) %>% 
  ggplot(aes(x=weekday,y=number_of_rides,fill=member_casual))+
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.


Visual of the average duration by rider type

We see at first glance that the data suggests that casual riders have just more than double the ride length, with the highest times on the weekend (Saturday and Sunday). Members seem to have a somewhat steady ride length throughout all 7 days of the week.

all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>% 
  arrange(member_casual,weekday) %>% 
  ggplot(aes(x=weekday,y=average_duration,fill=member_casual))+
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
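For a presentation-ready version of this chart, axis labels and a title can be added with labs(), and the durations can be shown in minutes. The sketch below rebuilds the plot from the rounded averages in the weekday summary table; the title and label wording are only suggestions.

```r
library(ggplot2)

# Average ride length in seconds, copied (rounded) from the weekday summary.
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
avg_duration <- data.frame(
  member_casual    = rep(c("casual", "member"), each = 7),
  weekday          = factor(rep(days, times = 2), levels = days),
  average_duration = c(2121, 1832, 1575, 1599, 1662, 1727, 2010,
                       888, 759, 737, 739, 747, 767, 877)
)

duration_plot <- ggplot(avg_duration,
                        aes(x = weekday, y = average_duration / 60,
                            fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average ride length by rider type and weekday",
    x     = "Day of week",
    y     = "Average ride length (minutes)",
    fill  = "Rider type"
  )

duration_plot
```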


Conclusion

Casual riders and members differ in several ways.

  • The average ride length of casual riders was 1,833.04 seconds, which translates to 30.55 minutes, while the average ride length of members was 782.70 seconds, or 13.05 minutes. This means that the average ride of a casual rider is more than double the length of the average member ride.

  • The longest ride for a casual rider was 3,356,649 seconds, which translates to 38.85 days. In comparison, the longest ride of a member was 89,998 seconds, or 1.04 days. This suggests that members do not use the bike-share service for very long rides.

  • The difference between the longest ride of a casual rider and that of a member is 37.81 days, an increase of roughly 3,630%. At least one casual ride lasted 30+ days longer than any member ride. It could be theorized that members use the service for commuting or cyclical routes, while a casual rider uses the offering for longer, scenic adventures like a vacation or long weekend trip. Perhaps they are using the bike-share instead of a car or cab to get around the city or country for the short time period they are in town.

  • When we compare ride lengths by day of the week, the casual riders’ average trip is more than double the members’ average on every day, weekdays and weekends alike.

  • Looking at the average ride length for casual riders, we find that Sunday, Saturday, and Monday have the longest rides. Sunday’s average is the highest at 2,121 seconds or 35.35 minutes, Saturday is second at 2,010 seconds or 33.5 minutes, and Monday is third at 1,832 seconds or 30.5 minutes.

  • Members’ ride lengths vary less: their daily averages stay between roughly 737 and 888 seconds (about 12.3 to 14.8 minutes) on all 7 days. This means that the average ride length for bike-share members on every day of the week is just under 15 minutes.
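The unit conversions behind these figures can be reproduced in a few lines. This sketch takes the means and maxima from the aggregate() output in the analysis section; the variable names are chosen here for illustration.

```r
# Means and maxima (in seconds) from the aggregate() output.
mean_secs <- c(casual = 1833.0390, member = 782.7016)
max_secs  <- c(casual = 3356649, member = 89998)

mean_minutes <- mean_secs / 60             # seconds to minutes
max_days     <- max_secs / (60 * 60 * 24)  # seconds to days

round(mean_minutes, 2)  # casual 30.55, member 13.05
round(max_days, 2)      # casual 38.85, member 1.04

# How many times longer the average casual ride is than the average member ride.
round(mean_secs[["casual"]] / mean_secs[["member"]], 2)  # 2.34

# Percent by which the longest casual ride exceeds the longest member ride.
round((max_secs[["casual"]] - max_secs[["member"]]) / max_secs[["member"]] * 100)  # 3630
```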