INTRODUCTION

This is one of the capstone projects of the Google Data Analytics Professional Certificate. In this scenario, I am working as a junior data analyst in the marketing analyst team at Cyclistic, a bike-share company in Chicago.

Cyclistic is a bike-share program that was launched in 2016 that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day. The pricing plans are: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. My team wants to understand how casual riders and annual members use Cyclistic bikes differently. We would use the insights gained to design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve our recommendations, so they must be backed up with compelling data insights and professional data visualizations. I have followed the steps of the data analysis process:ask, prepare, process, analyze, share, and act, to be able to achieve our goal.

1. ASK

How do Annual members and Casual riders use Cyclistic bikes differently?

Key stakeholders

1.Lily Moreno, the director of marketing and my manger

2.Cyclistic executive team.

*By identifying and analysing trends,my team will be able to answer the key business question and design a new marketing strategy to convert casual riders into annual members

2. PREPARE

Limitations

  • Data-privacy issues prohibit me from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes

3. PROCESS

  • I’m using R for this analysis because given the size of this data, it’s easier and faster.

  • As of data Integrity, its Accurate, Consistent and Trustworthy.

  • This data was first cleaned in excel by removing duplicates, checking for missing data and ensuring right data type formatting of the column.

#installing and loading necessary packages
library(tidyverse)  #helps wrangle data
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)  #helps wrangle date attributes
library(ggplot2)  #helps visualize data
library(tidyr)
library(dplyr)
#collect the data
#august_2022 <- read_csv("data/202208-202307-divvy-tripdata/202208-divvy-tripdata.csv")
#september_2022 <- read_csv("data/202208-202307-divvy-tripdata/202209-divvy-publictripdata.csv")
#october_2022 <- read_csv("data/202208-202307-divvy-tripdata/202210-divvy-tripdata.csv")
#november_2022 <- read_csv("data/202208-202307-divvy-tripdata/202211-divvy-tripdata.csv")
#december_2022 <- read_csv("data/202208-202307-divvy-tripdata/202212-divvy-tripdata.csv")
#january_2023 <- read_csv("data/202208-202307-divvy-tripdata/202301-divvy-tripdata.csv")
february_2023 <- read_csv("data/202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
march_2023 <- read_csv("data/202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
april_2023 <- read_csv("data/202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
may_2023 <- read_csv("data/202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
june_2023 <- read_csv("data//202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
july_2023 <- read_csv("data/202307-divvy-tripdata.csv")
## Rows: 767650 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#wrangle data and combine into a single file
#compare column names with each other to ensure consistency. so that they can be joined into one file
#colnames(august_2022)
#colnames(september_2022)
#colnames(october_2022)
#colnames(novemer_2022)
#colnames(december_2022)
#colnames(january_2023)
colnames(february_2023)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(march_2023)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(april_2023)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(may_2023)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(june_2023)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(july_2023)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
#now that it is confirm the column names are consistent, combine the dataset as one file
all_trips <- rbind(april_2023, may_2023, june_2023, july_2023)
str(all_trips)
## spc_tbl_ [2,518,685 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:2518685] "8FE8F7D9C10E88C7" "34E4ED3ADF1D821B" "5296BF07A2F77CB5" "40759916B76D5D52" ...
##  $ rideable_type     : chr [1:2518685] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:2518685], format: "2023-04-02 08:37:28" "2023-04-19 11:29:02" ...
##  $ ended_at          : POSIXct[1:2518685], format: "2023-04-02 08:41:37" "2023-04-19 11:52:12" ...
##  $ start_station_name: chr [1:2518685] NA NA NA NA ...
##  $ start_station_id  : chr [1:2518685] NA NA NA NA ...
##  $ end_station_name  : chr [1:2518685] NA NA NA NA ...
##  $ end_station_id    : chr [1:2518685] NA NA NA NA ...
##  $ start_lat         : num [1:2518685] 41.8 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:2518685] -87.6 -87.7 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num [1:2518685] 41.8 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:2518685] -87.6 -87.7 -87.7 -87.7 -87.6 ...
##  $ member_casual     : chr [1:2518685] "member" "member" "member" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
#remove data with missing values
all_trips <- drop_na(all_trips)

# Convert ride_id and rideable_type to character so that they can stack correctly
all_trips <-  mutate(all_trips, ride_id = as.character(ride_id))
# Remove columns that are not needed
all_trips <- all_trips %>%  select(-c(start_lat, start_lng, end_lat, end_lng))
#all_trips <- subset(all_trips, select = -c(start_lat, start_lng, end_lat, end_lng))

Cleaning

# Inspect the new table that has been created
colnames(all_trips)  #List of column names
## [1] "ride_id"            "rideable_type"      "started_at"        
## [4] "ended_at"           "start_station_name" "start_station_id"  
## [7] "end_station_name"   "end_station_id"     "member_casual"
nrow(all_trips)  #How many rows are in data frame
## [1] 1896140
dim(all_trips)  #Dimensions of the data frame
## [1] 1896140       9
head(all_trips)  #See the first 6 rows of data frame.  Also tail(qs_raw)
## # A tibble: 6 × 9
##   ride_id          rideable_type started_at          ended_at           
##   <chr>            <chr>         <dttm>              <dttm>             
## 1 5B6500E1E58655C0 classic_bike  2023-04-10 17:34:35 2023-04-10 18:02:36
## 2 AA65D25D69AF771F classic_bike  2023-04-12 12:29:46 2023-04-12 12:54:00
## 3 079FB2C196414482 electric_bike 2023-04-13 17:39:23 2023-04-13 17:40:57
## 4 599623864C871207 classic_bike  2023-04-29 20:57:10 2023-04-29 20:57:13
## 5 63ECC8A13D11A76A classic_bike  2023-04-20 17:03:11 2023-04-20 17:24:58
## 6 A396F54F4C1927D8 electric_bike 2023-04-14 22:29:12 2023-04-14 22:29:15
## # ℹ 5 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, member_casual <chr>
str(all_trips)  #See list of columns and data types (numeric, character, etc)
## tibble [1,896,140 × 9] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:1896140] "5B6500E1E58655C0" "AA65D25D69AF771F" "079FB2C196414482" "599623864C871207" ...
##  $ rideable_type     : chr [1:1896140] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : POSIXct[1:1896140], format: "2023-04-10 17:34:35" "2023-04-12 12:29:46" ...
##  $ ended_at          : POSIXct[1:1896140], format: "2023-04-10 18:02:36" "2023-04-12 12:54:00" ...
##  $ start_station_name: chr [1:1896140] "Avenue O & 134th St" "Cottage Grove Ave & 51st St" "Morgan Ave & 14th Pl" "Cottage Grove Ave & 51st St" ...
##  $ start_station_id  : chr [1:1896140] "20214" "TA1309000067" "TA1306000002" "TA1309000067" ...
##  $ end_station_name  : chr [1:1896140] "Avenue O & 134th St" "Cottage Grove Ave & 51st St" "Morgan Ave & 14th Pl" "Cottage Grove Ave & 51st St" ...
##  $ end_station_id    : chr [1:1896140] "20214" "TA1309000067" "TA1306000002" "TA1309000067" ...
##  $ member_casual     : chr [1:1896140] "member" "member" "member" "member" ...
summary(all_trips)  #Statistical summary of data. Mainly for numerics
##    ride_id          rideable_type        started_at                   
##  Length:1896140     Length:1896140     Min.   :2023-04-01 00:00:02.0  
##  Class :character   Class :character   1st Qu.:2023-05-11 20:42:45.5  
##  Mode  :character   Mode  :character   Median :2023-06-09 13:37:44.0  
##                                        Mean   :2023-06-07 20:49:18.5  
##                                        3rd Qu.:2023-07-07 13:17:26.5  
##                                        Max.   :2023-07-31 23:59:15.0  
##     ended_at                      start_station_name start_station_id  
##  Min.   :2023-04-01 00:03:10.00   Length:1896140     Length:1896140    
##  1st Qu.:2023-05-11 20:58:41.75   Class :character   Class :character  
##  Median :2023-06-09 13:55:39.00   Mode  :character   Mode  :character  
##  Mean   :2023-06-07 21:06:39.20                                        
##  3rd Qu.:2023-07-07 13:37:24.75                                        
##  Max.   :2023-08-01 20:40:50.00                                        
##  end_station_name   end_station_id     member_casual     
##  Length:1896140     Length:1896140     Length:1896140    
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 
# Add columns that list the date, month, day, and year of each ride
# This will allow us to aggregate ride data for each month, day, or year
all_trips$date <- as.Date(all_trips$started_at) #The default format for date in R is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

# Add a "ride_length" calculation to all_trips (in seconds)
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at)
# Inspect the structure of the columns
str(all_trips)
## tibble [1,896,140 × 15] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:1896140] "5B6500E1E58655C0" "AA65D25D69AF771F" "079FB2C196414482" "599623864C871207" ...
##  $ rideable_type     : chr [1:1896140] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : POSIXct[1:1896140], format: "2023-04-10 17:34:35" "2023-04-12 12:29:46" ...
##  $ ended_at          : POSIXct[1:1896140], format: "2023-04-10 18:02:36" "2023-04-12 12:54:00" ...
##  $ start_station_name: chr [1:1896140] "Avenue O & 134th St" "Cottage Grove Ave & 51st St" "Morgan Ave & 14th Pl" "Cottage Grove Ave & 51st St" ...
##  $ start_station_id  : chr [1:1896140] "20214" "TA1309000067" "TA1306000002" "TA1309000067" ...
##  $ end_station_name  : chr [1:1896140] "Avenue O & 134th St" "Cottage Grove Ave & 51st St" "Morgan Ave & 14th Pl" "Cottage Grove Ave & 51st St" ...
##  $ end_station_id    : chr [1:1896140] "20214" "TA1309000067" "TA1306000002" "TA1309000067" ...
##  $ member_casual     : chr [1:1896140] "member" "member" "member" "member" ...
##  $ date              : Date[1:1896140], format: "2023-04-10" "2023-04-12" ...
##  $ month             : chr [1:1896140] "04" "04" "04" "04" ...
##  $ day               : chr [1:1896140] "10" "12" "13" "29" ...
##  $ year              : chr [1:1896140] "2023" "2023" "2023" "2023" ...
##  $ day_of_week       : chr [1:1896140] "Monday" "Wednesday" "Thursday" "Saturday" ...
##  $ ride_length       : 'difftime' num [1:1896140] 1681 1454 94 3 ...
##   ..- attr(*, "units")= chr "secs"
# Convert "ride_length" from Factor to numeric so we can run calculations on the data
is.factor(all_trips$ride_length)
## [1] FALSE
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
## [1] TRUE
# Remove "bad" data
# The dataframe includes a few hundred entries when bikes were taken out of docks and checked for quality by Divvy or ride_length was negative
# new version of the data v2 will be;
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]

4. ANALYZE

#Conduct descriptive analysis
# Descriptive analysis on ride_length (all figures in seconds)
mean(all_trips_v2$ride_length) #straight average (total ride length / rides)
## [1] 1040.609
median(all_trips_v2$ride_length) #midpoint number in the ascending array of ride lengths
## [1] 640
max(all_trips_v2$ride_length) #longest ride
## [1] 728178
min(all_trips_v2$ride_length) #shortest ride
## [1] 0
#summary of the descriptive analysis
summary(all_trips_v2$ride_length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0     362     640    1041    1150  728178
# Compare members and casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                1460.3255
## 2                     member                 764.3405
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                      828
## 2                     member                      548
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                   728178
## 2                     member                    89165
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                        0
## 2                     member                        0
# See the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Friday                1405.3515
## 2                      member                   Friday                 747.9173
## 3                      casual                   Monday                1465.2394
## 4                      member                   Monday                 733.0519
## 5                      casual                 Saturday                1619.7595
## 6                      member                 Saturday                 857.7113
## 7                      casual                   Sunday                1660.4473
## 8                      member                   Sunday                 855.6031
## 9                      casual                 Thursday                1285.8628
## 10                     member                 Thursday                 731.8465
## 11                     casual                  Tuesday                1350.4401
## 12                     member                  Tuesday                 737.8307
## 13                     casual                Wednesday                1240.3217
## 14                     member                Wednesday                 719.3947
# Fixing the days of the week that are out of order 
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# To get the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Sunday                1660.4473
## 2                      member                   Sunday                 855.6031
## 3                      casual                   Monday                1465.2394
## 4                      member                   Monday                 733.0519
## 5                      casual                  Tuesday                1350.4401
## 6                      member                  Tuesday                 737.8307
## 7                      casual                Wednesday                1240.3217
## 8                      member                Wednesday                 719.3947
## 9                      casual                 Thursday                1285.8628
## 10                     member                 Thursday                 731.8465
## 11                     casual                   Friday                1405.3515
## 12                     member                   Friday                 747.9173
## 13                     casual                 Saturday                1619.7595
## 14                     member                 Saturday                 857.7113
# analyze ridership data by type and weekday
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  
  group_by(member_casual, weekday) %>%  
  summarise(number_of_rides = n()   
            ,average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday)   
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun              122944            1660.
##  2 casual        Mon               91537            1465.
##  3 casual        Tue               84486            1350.
##  4 casual        Wed               82016            1240.
##  5 casual        Thu               98037            1286.
##  6 casual        Fri              113100            1405.
##  7 casual        Sat              160539            1620.
##  8 member        Sun              128154             856.
##  9 member        Mon              162802             733.
## 10 member        Tue              170013             738.
## 11 member        Wed              173191             719.
## 12 member        Thu              186382             732.
## 13 member        Fri              167642             748.
## 14 member        Sat              155283             858.

5. SHARE

# visualize the number of rides by rider type
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") 
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

# Let's create a visualization for average duration
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

6. ACT

Conclusion

How do Annual members and Casual riders use Cyclistic bike share differently?

From the first visual which shows the number of riders (both casual and member) per day

  • It is evident that on weekdays (Monday through Friday),relatively the same number of casual riders use Cyclistic each day. Same holds true for the member riders but comparing both groups, there were significantly more member riders than casual riders utilizing Cyclistic.

  • Interestingly, on the weekends (Saturday and Sunday), the number of riders from both groups remains relatively close, with Saturday even witnessing a higher count of casual riders.

From the second visual which shows the average_duration of both casual and member riders per day

  • It is evident that casual riders tend to spend more time utilizing the service compared to member riders, both on weekdays and weekends.

Recommendation

  • Annual weekend plan can be included since casual riders tend to use the service on weekends just as frequently as annual members.
  • Run an infographic campaign to inform casual riders about the benefits of annual memberships. Highlight features like unlimited rides, cost savings, and exclusive member perks.