Scenario

Cyclistic, a bike-share compan=y launched a successful program in 2016. This program includes nearly 6,000 bicycles (ranging in style to accomadate a broad community) and nearly 700 docking stations for simple returns. The bikes can be broken down into three categories: Classic Bike, Docked Bike and Electric Bike.

Cyclistic currently has multiple options for rentals: A single-ride pass, a full-day pass, and an annual membership. The single-ride and full-day pass users are considered Casual users and those with annual memberships are called Members.

The director of marketing. Lily Moreno, would like to maximize the use of the annual memberships due the fact that those memberships are most profitable. The purpose of this case study is to determine how the different types of users interact with the different types of bikes so that the marketing plan can be updated to attract more memberships.

Ask

How do annual members and casual riders use Cyclistic differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence riders to become members?

Create a visual representation of the previous 12 months of data that will reveal a relationship between the time of year, type of bike rented and type of membership used.

Prepare

Raw data was uploaded to SQL where manipulation occurred. To start, when looking at the raw data, there are many NULL entries where the latitude and longitude for the rental and return locations. For this reason, bike distance traveled was not included in analysis.

One of the questions that seemed important initially was ‘How long are these bikes being rented for?’ To answer this, I calculated the trip_duration_minutes column by finding the difference between the started_at and ended_at columns. These columns were timestamps. The result was the change in minutes from rental to return. I noticed that there were negative results. These negative entries were not included in the analysis.

Data was collected from Divvy. “Divvy Trip Data.” Accessed [Access Date], from https://divvy-tripdata.s3.amazonaws.com/index.html.

Process

SQL Queuries were used to extract the following data for each of the previous 12 months

date the rental took place
the number of rentals that occurred that day
the type of membership used for the rental
the type of bike rented
the duration (minutes) of that rental.

These tables were then uploaded to R for further processing. To begin, the appropriate libraries were installed and loaded.

# Load the Libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(lubridate)
library(ggplot2)
library(knitr)
library(kableExtra)

## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(readr)

# Set Working Directory
knitr::opts_knit$set(root.dir = "/cloud/project/Data/Cyclistic_Case_Study/Data")

# Load the needed data

july_2022_clean <- read_csv("202207-cleaned.csv")

## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

aug_2022_clean <- read_csv("202208-cleaned.csv")

## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

sept_2022_clean <- read_csv("202209-cleaned.csv")

## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

oct_2022_clean <- read_csv("202210-cleaned.csv")

## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

nov_2022_clean <- read_csv("202211-cleaned.csv")

## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dec_2022_clean <- read_csv("202212-cleaned.csv")

## Rows: 154 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

jan_2023_clean <- read_csv("202301-cleaned.csv")

## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

feb_2023_clean <- read_csv("202302-cleaned.csv")

## Rows: 140 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

mar_2023_clean <- read_csv("202303-cleaned.csv")

## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

apr_2023_clean <- read_csv("202304-cleaned.csv")

## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

may_2023_clean <- read_csv("202305-cleaned.csv")

## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

june_2023_clean <- read_csv("202306-cleaned.csv")

## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): member_casual, bike_type
## dbl  (2): ride_count, trip_duration_minutes
## date (1): ride_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Combine data into one table

all_data <-bind_rows(july_2022_clean, aug_2022_clean, sept_2022_clean, oct_2022_clean, nov_2022_clean, dec_2022_clean, jan_2023_clean, feb_2023_clean, mar_2023_clean, apr_2023_clean, may_2023_clean, june_2023_clean)

Analyze

A table of averages were used to summarize the date.

# Load the Libraries
# SUMMARY OF THE DATA
result <- all_data %>%
  group_by(member_casual, bike_type) %>%
  summarise(mean_duration_minutes = mean(trip_duration_minutes), average_daily_usage = mean(ride_count)) %>%
  rename(Member_Casual = member_casual, Bike_Type = bike_type, Mean_Trip_Duration_Minutes = mean_duration_minutes, Mean_Daily_Rentals = average_daily_usage)

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

# Format the tibble as a table
formatted_table <- kable(result, format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))


# Display the formatted table
formatted_table

Member_Casual	Bike_Type	Mean_Trip_Duration_Minutes	Mean_Daily_Rentals
casual	classic_bike	2609.000	2203.7452
casual	docked_bike	11161.447	387.7671
casual	electric_bike	1489.901	3566.8956
member	classic_bike	2384.910	4632.7479
member	electric_bike	1510.822	5052.7096

To tell this story more efficiently, a Scatterplot was used as a visual.

ggplot(all_data, aes(x = ride_date, y= ride_count, color = trip_duration_minutes)) + geom_point(size = 1) + labs(x = "Date of Rental", y = "Number of Rentals", color = "Trip Duration (minutes)") + facet_grid(member_casual~ bike_type) + scale_color_gradient(low = "blue", high = "red") + theme(axis.text.x = element_text(angle = 45, hjust = 1))

Key Findings

Looking at the scatter plots, you will notice that the bike useage is seasonal. In the December-February months, there is a dip in the number of rentals.

Based on the data collected, docked bikes are only used by casual members and their trip duration are much higher. You notice that although their number of daily rentals are low, their trip durations are quite high. At imes, these trips will last a full month.

Act

Recommendations

There is a clear dip in rentals during the winter months. MArketing campaing could include incentives to increase ridership during these times

Marketing campaign should target docked-bike users. Design the campaign to entice docked-bike users to become members. Trip duration is much longer compared to other bike types. Using this information, design pricing to entice membership for riders who rent for multiple weeks at a time.

Cyclistic Bike-Share Analysis

Ross Harter

2023-07-23

Scenario

Ask

Prepare

Process

Analyze

Act

Recommendations