Cyclistic, a bike-share compan=y launched a successful program in 2016. This program includes nearly 6,000 bicycles (ranging in style to accomadate a broad community) and nearly 700 docking stations for simple returns. The bikes can be broken down into three categories: Classic Bike, Docked Bike and Electric Bike.
Cyclistic currently has multiple options for rentals: A single-ride pass, a full-day pass, and an annual membership. The single-ride and full-day pass users are considered Casual users and those with annual memberships are called Members.
The director of marketing. Lily Moreno, would like to maximize the use of the annual memberships due the fact that those memberships are most profitable. The purpose of this case study is to determine how the different types of users interact with the different types of bikes so that the marketing plan can be updated to attract more memberships.
Create a visual representation of the previous 12 months of data that will reveal a relationship between the time of year, type of bike rented and type of membership used.
Raw data was uploaded to SQL where manipulation occurred. To start, when looking at the raw data, there are many NULL entries where the latitude and longitude for the rental and return locations. For this reason, bike distance traveled was not included in analysis.
One of the questions that seemed important initially was ‘How long are these bikes being rented for?’ To answer this, I calculated the trip_duration_minutes column by finding the difference between the started_at and ended_at columns. These columns were timestamps. The result was the change in minutes from rental to return. I noticed that there were negative results. These negative entries were not included in the analysis.
Data was collected from Divvy. “Divvy Trip Data.” Accessed [Access Date], from https://divvy-tripdata.s3.amazonaws.com/index.html.
SQL Queuries were used to extract the following data for each of the previous 12 months
These tables were then uploaded to R for further processing. To begin, the appropriate libraries were installed and loaded.
# Load the Libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(ggplot2)
library(knitr)
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(readr)
# Set Working Directory
knitr::opts_knit$set(root.dir = "/cloud/project/Data/Cyclistic_Case_Study/Data")
# Load the needed data
july_2022_clean <- read_csv("202207-cleaned.csv")
## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
aug_2022_clean <- read_csv("202208-cleaned.csv")
## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sept_2022_clean <- read_csv("202209-cleaned.csv")
## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
oct_2022_clean <- read_csv("202210-cleaned.csv")
## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nov_2022_clean <- read_csv("202211-cleaned.csv")
## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dec_2022_clean <- read_csv("202212-cleaned.csv")
## Rows: 154 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
jan_2023_clean <- read_csv("202301-cleaned.csv")
## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
feb_2023_clean <- read_csv("202302-cleaned.csv")
## Rows: 140 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mar_2023_clean <- read_csv("202303-cleaned.csv")
## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
apr_2023_clean <- read_csv("202304-cleaned.csv")
## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
may_2023_clean <- read_csv("202305-cleaned.csv")
## Rows: 155 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
june_2023_clean <- read_csv("202306-cleaned.csv")
## Rows: 150 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): member_casual, bike_type
## dbl (2): ride_count, trip_duration_minutes
## date (1): ride_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Combine data into one table
all_data <-bind_rows(july_2022_clean, aug_2022_clean, sept_2022_clean, oct_2022_clean, nov_2022_clean, dec_2022_clean, jan_2023_clean, feb_2023_clean, mar_2023_clean, apr_2023_clean, may_2023_clean, june_2023_clean)
A table of averages were used to summarize the date.
# Load the Libraries
# SUMMARY OF THE DATA
result <- all_data %>%
group_by(member_casual, bike_type) %>%
summarise(mean_duration_minutes = mean(trip_duration_minutes), average_daily_usage = mean(ride_count)) %>%
rename(Member_Casual = member_casual, Bike_Type = bike_type, Mean_Trip_Duration_Minutes = mean_duration_minutes, Mean_Daily_Rentals = average_daily_usage)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
# Format the tibble as a table
formatted_table <- kable(result, format = "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
# Display the formatted table
formatted_table
| Member_Casual | Bike_Type | Mean_Trip_Duration_Minutes | Mean_Daily_Rentals |
|---|---|---|---|
| casual | classic_bike | 2609.000 | 2203.7452 |
| casual | docked_bike | 11161.447 | 387.7671 |
| casual | electric_bike | 1489.901 | 3566.8956 |
| member | classic_bike | 2384.910 | 4632.7479 |
| member | electric_bike | 1510.822 | 5052.7096 |
There is a clear dip in rentals during the winter months. MArketing campaing could include incentives to increase ridership during these times
Marketing campaign should target docked-bike users. Design the campaign to entice docked-bike users to become members. Trip duration is much longer compared to other bike types. Using this information, design pricing to entice membership for riders who rent for multiple weeks at a time.