This project is the capstone assignment for the Google Data Analytics Professional Certificate program. The program prepares participants for a career in data analytics with training focused on key analytical skills (data cleaning, analysis, and visualization) and tools (Excel, SQL, R Programming, Tableau).
In the project the steps of the data analysis process followed: Ask, Prepare, Process, Analyze, and Share, Act.
In this project publicly available data sets provided by the course, for a bike share program based in Chicago are analyzed.
The primary purpose is to provide information for the company to use data-driven making decisions
I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members.
Cyclistic is a successful bike-share program launched in 2016. It has since grown to a fleet of 5,824 bicycles that are geo-tracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Cyclistic offers a variety of pricing plans including: single-ride passes, full-day passes, and annual memberships. Customers who purchase a single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Three questions will guide the future marketing program:
The director of marketing has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
In this assignment, I will produce a report with the following deliverables:
The business objective of the case study is to identify trend, pattern & opportunities among casual riders and membership riders, and identify potential riders who can get benefit from annual membership. This will be done through analysis of bike trip data and understanding the user behavior and preferences. The ultimate goal is to increase profitability and drive future growth for the company.
Cyclistic Director of Marketing: Lily Moreno, responsible for implementing the marketing campaigns at Cyclistic.
Cyclistic marketing analytics team: Team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
Cyclistic executive team: Executive team will decide whether to approve the recommended marketing program.
Cyclistic casual riders: They are the target audience of the marketing campaign, and their behavior and preferences will be a key focus of the analysis.
Cyclistic annual members: They are the group that the marketing campaign is trying to convert casual riders into, and their behavior and preferences may be compared to those of casual riders in the analysis.
Cyclistic investors and shareholders: They have a financial interest in the company’s success and may be interested in the results of the analysis and any changes to the marketing strategy.
Data sources used: Cyclistic’s historical trip data downloaded from https://divvy-tripdata.s3.amazonaws.com/index.html
Data owner: Unknown.
Data License: Made available by Motivate International Inc. under the license https://www.divvybikes.com/data-license-agreement.
How is the Data Organized?
The data used consists of monthly .csv files for the past 12 months from January 2022 - December 2022. Each file contains 13 columns of information related to trip data.
Data ROCCC validation:
Reliable(R): The data is reliable, not biased, and can be used in the project.
Original(O): The originality of the project is certain the original public data can be located.
Comprehensive(C): The data includes trip duration, user type, start & destination station name, start & end time which are comprehensive for analyzing user trends.
Current(C): Data is not obsolete as the historical data records from 2013 to 2022.
Cited(C): Yes the data is cited with the proper reference.
Licensing, privacy, security, and accessibility
The data used for this analysis has had all identifying information removed in order to protect the privacy of users. This limitation on the data does restrict the scope of the possible analysis, as it is not possible to determine whether casual riders are repeat users or residents of the Chicago area.
Problem with data
The data consists of 12 monthly trips which are completely separate .csv files. We need to combine all the 12 files into one aggregate file for proper analysis. Unnecessary variables, wrong & duplicated data were removed.
Tools Used
Since the amount of data is large tools like Excel is not able to handle it, hence R programming language is used.For Visualization tableau was considered but due to large size of the data and some limitations, R is chosen for it.
Data Review
All 12 files were combined into one data frame after checking the column names. The combined data set is reviewed.The final data set consisted of 5667186 rows with 13 columns of character and numeric data.
The data contains the following columns:
Data processing and analyzing will occur in RStudio using the R programming language including visualizations.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.1
## Warning: package 'ggplot2' was built under R version 4.4.1
## Warning: package 'dplyr' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## âś” dplyr 1.1.4 âś” readr 2.1.5
## âś” forcats 1.0.0 âś” stringr 1.5.1
## âś” ggplot2 3.5.1 âś” tibble 3.2.1
## âś” lubridate 1.9.3 âś” tidyr 1.3.1
## âś” purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## âś– dplyr::filter() masks stats::filter()
## âś– dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(janitor)
## Warning: package 'janitor' was built under R version 4.4.1
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(readr)
library(psych)
## Warning: package 'psych' was built under R version 4.4.1
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
trip_202201 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202201-divvy-tripdata.csv")
trip_202202 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202202-divvy-tripdata.csv")
trip_202203 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202203-divvy-tripdata.csv")
trip_202204 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202204-divvy-tripdata.csv")
trip_202205 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202205-divvy-tripdata.csv")
trip_202206 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202206-divvy-tripdata.csv")
trip_202207 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202207-divvy-tripdata.csv")
trip_202208 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202208-divvy-tripdata.csv")
trip_202209 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202209-divvy-tripdata.csv")
trip_202210 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202210-divvy-tripdata.csv")
trip_202211 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202211-divvy-tripdata.csv")
trip_202212 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/202212-divvy-tripdata.csv")
colnames(trip_202201)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202202)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202203)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202204)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202205)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202206)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202207)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202208)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202209)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202210)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202211)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(trip_202212)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
str(trip_202201)
## 'data.frame': 103770 obs. of 13 variables:
## $ ride_id : chr "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-01-13 11:59:47" "2022-01-10 08:41:56" "2022-01-25 04:53:40" "2022-01-04 00:18:04" ...
## $ ended_at : chr "2022-01-13 12:02:44" "2022-01-10 08:46:17" "2022-01-25 04:58:01" "2022-01-04 00:33:00" ...
## $ start_station_name: chr "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
## $ start_station_id : chr "525" "525" "TA1306000016" "KA1504000151" ...
## $ end_station_name : chr "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
## $ end_station_id : chr "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
## $ start_lat : num 42 42 41.9 42 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num 42 42 41.9 42 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr "casual" "casual" "member" "casual" ...
str(trip_202202)
## 'data.frame': 115609 obs. of 13 variables:
## $ ride_id : chr "E1E065E7ED285C02" "1602DCDC5B30FFE3" "BE7DD2AF4B55C4AF" "A1789BDF844412BE" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-02-19 18:08:41" "2022-02-20 17:41:30" "2022-02-25 18:55:56" "2022-02-14 11:57:03" ...
## $ ended_at : chr "2022-02-19 18:23:56" "2022-02-20 17:45:56" "2022-02-25 19:09:34" "2022-02-14 12:04:00" ...
## $ start_station_name: chr "State St & Randolph St" "Halsted St & Wrightwood Ave" "State St & Randolph St" "Southport Ave & Waveland Ave" ...
## $ start_station_id : chr "TA1305000029" "TA1309000061" "TA1305000029" "13235" ...
## $ end_station_name : chr "Clark St & Lincoln Ave" "Southport Ave & Wrightwood Ave" "Canal St & Adams St" "Broadway & Sheridan Rd" ...
## $ end_station_id : chr "13179" "TA1307000113" "13011" "13323" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.6 -87.7 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 42 41.9 ...
## $ end_lng : num -87.6 -87.7 -87.6 -87.6 -87.6 ...
## $ member_casual : chr "member" "member" "member" "member" ...
str(trip_202203)
## 'data.frame': 284042 obs. of 13 variables:
## $ ride_id : chr "47EC0A7F82E65D52" "8494861979B0F477" "EFE527AF80B66109" "9F446FD9DEE3F389" ...
## $ rideable_type : chr "classic_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-03-21 13:45:01" "2022-03-16 09:37:16" "2022-03-23 19:52:02" "2022-03-01 19:12:26" ...
## $ ended_at : chr "2022-03-21 13:51:18" "2022-03-16 09:43:34" "2022-03-23 19:54:48" "2022-03-01 19:22:14" ...
## $ start_station_name: chr "Wabash Ave & Wacker Pl" "Michigan Ave & Oak St" "Broadway & Berwyn Ave" "Wabash Ave & Wacker Pl" ...
## $ start_station_id : chr "TA1307000131" "13042" "13109" "TA1307000131" ...
## $ end_station_name : chr "Kingsbury St & Kinzie St" "Orleans St & Chestnut St (NEXT Apts)" "Broadway & Ridge Ave" "Franklin St & Jackson Blvd" ...
## $ end_station_id : chr "KA1503000043" "620" "15578" "TA1305000025" ...
## $ start_lat : num 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 42 41.9 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.7 -87.6 -87.7 ...
## $ member_casual : chr "member" "member" "member" "member" ...
str(trip_202204)
## 'data.frame': 371249 obs. of 13 variables:
## $ ride_id : chr "3564070EEFD12711" "0B820C7FCF22F489" "89EEEE32293F07FF" "84D4751AEB31888D" ...
## $ rideable_type : chr "electric_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-04-06 17:42:48" "2022-04-24 19:23:07" "2022-04-20 19:29:08" "2022-04-22 21:14:06" ...
## $ ended_at : chr "2022-04-06 17:54:36" "2022-04-24 19:43:17" "2022-04-20 19:35:16" "2022-04-22 21:23:29" ...
## $ start_station_name: chr "Paulina St & Howard St" "Wentworth Ave & Cermak Rd" "Halsted St & Polk St" "Wentworth Ave & Cermak Rd" ...
## $ start_station_id : chr "515" "13075" "TA1307000121" "13075" ...
## $ end_station_name : chr "University Library (NU)" "Green St & Madison St" "Green St & Madison St" "Delano Ct & Roosevelt Rd" ...
## $ end_station_id : chr "605" "TA1307000120" "TA1307000120" "KA1706005007" ...
## $ start_lat : num 42 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.7 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num 42.1 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.7 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr "member" "member" "member" "casual" ...
str(trip_202205)
## 'data.frame': 634858 obs. of 13 variables:
## $ ride_id : chr "EC2DE40644C6B0F4" "1C31AD03897EE385" "1542FBEC830415CF" "6FF59852924528F8" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-05-23 23:06:58" "2022-05-11 08:53:28" "2022-05-26 18:36:28" "2022-05-10 07:30:07" ...
## $ ended_at : chr "2022-05-23 23:40:19" "2022-05-11 09:31:22" "2022-05-26 18:58:18" "2022-05-10 07:38:49" ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "DuSable Lake Shore Dr & Monroe St" "Clinton St & Madison St" "Clinton St & Madison St" ...
## $ start_station_id : chr "TA1307000117" "13300" "TA1305000032" "TA1305000032" ...
## $ end_station_name : chr "Halsted St & Roscoe St" "Field Blvd & South Water St" "Wood St & Milwaukee Ave" "Clark St & Randolph St" ...
## $ end_station_id : chr "TA1309000025" "15534" "13221" "TA1305000030" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.7 -87.6 -87.7 ...
## $ member_casual : chr "member" "member" "member" "member" ...
str(trip_202206)
## 'data.frame': 769204 obs. of 13 variables:
## $ ride_id : chr "600CFD130D0FD2A4" "F5E6B5C1682C6464" "B6EB6D27BAD771D2" "C9C320375DE1D5C6" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : chr "2022-06-30 17:27:53" "2022-06-30 18:39:52" "2022-06-30 11:49:25" "2022-06-30 11:15:25" ...
## $ ended_at : chr "2022-06-30 17:35:15" "2022-06-30 18:47:28" "2022-06-30 12:02:54" "2022-06-30 11:19:43" ...
## $ start_station_name: chr "" "" "" "" ...
## $ start_station_id : chr "" "" "" "" ...
## $ end_station_name : chr "" "" "" "" ...
## $ end_station_id : chr "" "" "" "" ...
## $ start_lat : num 41.9 41.9 41.9 41.8 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.7 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 41.8 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.6 -87.7 -87.6 ...
## $ member_casual : chr "casual" "casual" "casual" "casual" ...
str(trip_202207)
## 'data.frame': 823488 obs. of 13 variables:
## $ ride_id : chr "954144C2F67B1932" "292E027607D218B6" "57765852588AD6E0" "B5B6BE44314590E6" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-07-05 08:12:47" "2022-07-26 12:53:38" "2022-07-03 13:58:49" "2022-07-31 17:44:21" ...
## $ ended_at : chr "2022-07-05 08:24:32" "2022-07-26 12:55:31" "2022-07-03 14:06:32" "2022-07-31 18:42:50" ...
## $ start_station_name: chr "Ashland Ave & Blackhawk St" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" ...
## $ start_station_id : chr "13224" "15541" "15541" "15541" ...
## $ end_station_name : chr "Kingsbury St & Kinzie St" "Michigan Ave & 8th St" "Michigan Ave & 8th St" "Woodlawn Ave & 55th St" ...
## $ end_station_id : chr "KA1503000043" "623" "623" "TA1307000164" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.7 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 41.8 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.6 -87.6 -87.7 ...
## $ member_casual : chr "member" "casual" "casual" "casual" ...
str(trip_202208)
## 'data.frame': 785932 obs. of 13 variables:
## $ ride_id : chr "550CF7EFEAE0C618" "DAD198F405F9C5F5" "E6F2BC47B65CB7FD" "F597830181C2E13C" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : chr "2022-08-07 21:34:15" "2022-08-08 14:39:21" "2022-08-08 15:29:50" "2022-08-08 02:43:50" ...
## $ ended_at : chr "2022-08-07 21:41:46" "2022-08-08 14:53:23" "2022-08-08 15:40:34" "2022-08-08 02:58:53" ...
## $ start_station_name: chr "" "" "" "" ...
## $ start_station_id : chr "" "" "" "" ...
## $ end_station_name : chr "" "" "" "" ...
## $ end_station_id : chr "" "" "" "" ...
## $ start_lat : num 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.9 42 42 41.8 ...
## $ end_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr "casual" "casual" "casual" "casual" ...
str(trip_202209)
## 'data.frame': 701339 obs. of 13 variables:
## $ ride_id : chr "5156990AC19CA285" "E12D4A16BF51C274" "A02B53CD7DB72DD7" "C82E05FEE872DF11" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : chr "2022-09-01 08:36:22" "2022-09-01 17:11:29" "2022-09-01 17:15:50" "2022-09-01 09:00:28" ...
## $ ended_at : chr "2022-09-01 08:39:05" "2022-09-01 17:14:45" "2022-09-01 17:16:12" "2022-09-01 09:10:32" ...
## $ start_station_name: chr "" "" "" "" ...
## $ start_station_id : chr "" "" "" "" ...
## $ end_station_name : chr "California Ave & Milwaukee Ave" "" "" "" ...
## $ end_station_id : chr "13084" "" "" "" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.7 -87.6 -87.6 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.7 -87.6 -87.6 -87.7 -87.7 ...
## $ member_casual : chr "casual" "casual" "casual" "casual" ...
str(trip_202210)
## 'data.frame': 558685 obs. of 13 variables:
## $ ride_id : chr "A50255C1E17942AB" "DB692A70BD2DD4E3" "3C02727AAF60F873" "47E653FDC2D99236" ...
## $ rideable_type : chr "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : chr "2022-10-14 17:13:30" "2022-10-01 16:29:26" "2022-10-19 18:55:40" "2022-10-31 07:52:36" ...
## $ ended_at : chr "2022-10-14 17:19:39" "2022-10-01 16:49:06" "2022-10-19 19:03:30" "2022-10-31 07:58:49" ...
## $ start_station_name: chr "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
## $ start_station_id : chr "13290" "13288" "655" "KA1504000133" ...
## $ end_station_name : chr "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
## $ end_station_id : chr "KA1504000079" "13089" "TA1307000140" "620" ...
## $ start_lat : num 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 42 41.9 41.9 ...
## $ end_lng : num -87.6 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr "member" "casual" "member" "member" ...
str(trip_202211)
## 'data.frame': 337735 obs. of 13 variables:
## $ ride_id : chr "BCC66FC6FAB27CC7" "772AB67E902C180F" "585EAD07FDEC0152" "91C4E7ED3C262FF9" ...
## $ rideable_type : chr "electric_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-11-10 06:21:55" "2022-11-04 07:31:55" "2022-11-21 17:20:29" "2022-11-25 17:29:34" ...
## $ ended_at : chr "2022-11-10 06:31:27" "2022-11-04 07:46:25" "2022-11-21 17:34:36" "2022-11-25 17:45:15" ...
## $ start_station_name: chr "Canal St & Adams St" "Canal St & Adams St" "Indiana Ave & Roosevelt Rd" "Indiana Ave & Roosevelt Rd" ...
## $ start_station_id : chr "13011" "13011" "SL-005" "SL-005" ...
## $ end_station_name : chr "St. Clair St & Erie St" "St. Clair St & Erie St" "St. Clair St & Erie St" "St. Clair St & Erie St" ...
## $ end_station_id : chr "13016" "13016" "13016" "13016" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr "member" "member" "member" "member" ...
str(trip_202212)
## 'data.frame': 181806 obs. of 13 variables:
## $ ride_id : chr "65DBD2F447EC51C2" "0C201AA7EA0EA1AD" "E0B148CCB358A49D" "54C5775D2B7C9188" ...
## $ rideable_type : chr "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : chr "2022-12-05 10:47:18" "2022-12-18 06:42:33" "2022-12-13 08:47:45" "2022-12-13 18:50:47" ...
## $ ended_at : chr "2022-12-05 10:56:34" "2022-12-18 07:08:44" "2022-12-13 08:59:51" "2022-12-13 19:19:48" ...
## $ start_station_name: chr "Clifton Ave & Armitage Ave" "Broadway & Belmont Ave" "Sangamon St & Lake St" "Shields Ave & 31st St" ...
## $ start_station_id : chr "TA1307000163" "13277" "TA1306000015" "KA1503000038" ...
## $ end_station_name : chr "Sedgwick St & Webster Ave" "Sedgwick St & Webster Ave" "St. Clair St & Erie St" "Damen Ave & Madison St" ...
## $ end_station_id : chr "13191" "13191" "13016" "13134" ...
## $ start_lat : num 41.9 41.9 41.9 41.8 41.9 ...
## $ start_lng : num -87.7 -87.6 -87.7 -87.6 -87.7 ...
## $ end_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.6 -87.7 -87.7 ...
## $ member_casual : chr "member" "casual" "member" "member" ...
sum(nrow(trip_202201), nrow(trip_202202), nrow(trip_202203), nrow(trip_202204),
nrow(trip_202205), nrow(trip_202206), nrow(trip_202207), nrow(trip_202208),
nrow(trip_202209), nrow(trip_202210), nrow(trip_202211), nrow(trip_202212))
## [1] 5667717
trip_data_2022 <- rbind(trip_202201, trip_202202, trip_202203, trip_202204,
trip_202205, trip_202206, trip_202207, trip_202208,
trip_202209, trip_202210, trip_202211, trip_202212)
nrow(trip_data_2022)
## [1] 5667717
colnames(trip_data_2022)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
str(trip_data_2022)
## 'data.frame': 5667717 obs. of 13 variables:
## $ ride_id : chr "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-01-13 11:59:47" "2022-01-10 08:41:56" "2022-01-25 04:53:40" "2022-01-04 00:18:04" ...
## $ ended_at : chr "2022-01-13 12:02:44" "2022-01-10 08:46:17" "2022-01-25 04:58:01" "2022-01-04 00:33:00" ...
## $ start_station_name: chr "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
## $ start_station_id : chr "525" "525" "TA1306000016" "KA1504000151" ...
## $ end_station_name : chr "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
## $ end_station_id : chr "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
## $ start_lat : num 42 42 41.9 42 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num 42 42 41.9 42 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr "casual" "casual" "member" "casual" ...
glimpse(trip_data_2022)
## Rows: 5,667,717
## Columns: 13
## $ ride_id <chr> "C2F7DD78E82EC875", "A6CF8980A652D272", "BD0F91DFF7…
## $ rideable_type <chr> "electric_bike", "electric_bike", "classic_bike", "…
## $ started_at <chr> "2022-01-13 11:59:47", "2022-01-10 08:41:56", "2022…
## $ ended_at <chr> "2022-01-13 12:02:44", "2022-01-10 08:46:17", "2022…
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A…
## $ start_station_id <chr> "525", "525", "TA1306000016", "KA1504000151", "TA13…
## $ end_station_name <chr> "Clark St & Touhy Ave", "Clark St & Touhy Ave", "Gr…
## $ end_station_id <chr> "RP-007", "RP-007", "TA1307000001", "TA1309000021",…
## $ start_lat <dbl> 42.01280, 42.01276, 41.92560, 41.98359, 41.87785, 4…
## $ start_lng <dbl> -87.66591, -87.66597, -87.65371, -87.66915, -87.624…
## $ end_lat <dbl> 42.01256, 42.01256, 41.92533, 41.96151, 41.88462, 4…
## $ end_lng <dbl> -87.67437, -87.67437, -87.66580, -87.67139, -87.627…
## $ member_casual <chr> "casual", "casual", "member", "casual", "member", "…
trip_data_2022 <- trip_data_2022 %>%
select(-c(start_lat, start_lng, end_lat, end_lng))
trip_data_2022 <- trip_data_2022 %>%
filter(started_at < ended_at)
trip_data_2022 <- distinct(trip_data_2022)
trip_data_2022$date <- as.Date(trip_data_2022$started_at)
trip_data_2022$month <- format(as.Date(trip_data_2022$date), "%b_%y")
trip_data_2022$day <- format(as.Date(trip_data_2022$date), "%d")
trip_data_2022$week_day <- format(as.Date(trip_data_2022$date), "%A")
trip_data_2022$year <- format(as.Date(trip_data_2022$date), "%Y")
trip_data_2022$start_hour <- lubridate:: hour(trip_data_2022$started_at)
trip_data_2022$ride_length <- difftime(trip_data_2022$ended_at, trip_data_2022$started_at, units = "mins")
trip_data_2022 <- dplyr::rename(trip_data_2022, customer_type = member_casual, bike_type = rideable_type)
clean_trip_data_2022 <- trip_data_2022[!trip_data_2022$ride_length < 0,]
dim(clean_trip_data_2022)
## [1] 5667186 16
write.csv(clean_trip_data_2022, "D://Data Analysis/Case Study/trip_data_2022/clean_trip_data_2022.csv", row.names = FALSE)
clean_trip_final_2022 <- read.csv("D://Data Analysis/Case Study/trip_data_2022/clean_trip_data_2022.csv")
clean_trip_final_2022$month <- ordered(clean_trip_final_2022$month,
levels=c("Jan_22", "Feb_22", "Mar_22", "Apr_22",
"May_22", "Jun_22", "Jul_22", "Aug_22",
"Sep_22", "Oct_22", "Nov_22", "Dec_22"))
clean_trip_final_2022$week_day <- ordered(clean_trip_final_2022$week_day,
levels = c("Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday",
"Saturday"))
View(describe(clean_trip_final_2022$ride_length, fast=TRUE))
view(table(clean_trip_final_2022$customer_type))
View(setNames(aggregate(ride_length ~ customer_type, clean_trip_final_2022, sum),
c("customer_type", "total_ride_length(mins)")))
View(clean_trip_final_2022 %>%
group_by(customer_type) %>%
summarise(min_length_mins = min(ride_length),
max_length_mins = max(ride_length),
median_length_mins = median(ride_length),
mean_length_mins = mean(ride_length)))
View(clean_trip_final_2022 %>%
group_by(week_day) %>%
summarise(avg_length = mean(ride_length),
number_of_ride = n()))
view(clean_trip_final_2022 %>%
group_by(month) %>%
summarise(avg_length = mean(ride_length), number_of_ride = n()))
view(aggregate(clean_trip_final_2022$ride_length ~ clean_trip_final_2022$customer_type +
clean_trip_final_2022$week_day, FUN = mean))
view(aggregate(clean_trip_final_2022$ride_length ~ clean_trip_final_2022$customer_type +
clean_trip_final_2022$month, FUN = mean))
view(clean_trip_final_2022 %>%
group_by(customer_type, week_day) %>%
summarise(number_of_ride = n(), avg_duration = mean(ride_length),
median_duration = median(ride_length), max_duration = max(ride_length),
min_duration = min(ride_length)))
## `summarise()` has grouped output by 'customer_type'. You can override using the
## `.groups` argument.
view(clean_trip_final_2022 %>%
group_by(customer_type, month) %>%
summarise(number_of_ride = n(), avg_duration = mean(ride_length),
median_duration = median(ride_length),
max_duration = max(ride_length),
min_duration = min(ride_length)))
## `summarise()` has grouped output by 'customer_type'. You can override using the
## `.groups` argument.
write.csv(clean_trip_final_2022,
"D://Data Analysis/Case Study/trip_data_2022/clean_trip_final_2022_visual.csv",
row.names = FALSE)
Key Takeaways :
Recommendations :