The Scenario:
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members.
The data and the license to use it come from the following sources.
The license: https://divvybikes.com/data-license-agreement
The data source: https://divvy-tripdata.s3.amazonaws.com/index.html
The files 202201-divvy-tripdata through 202212-divvy-tripdata were used for this analysis.
Setting up my R environment by loading the ‘tidyverse’ and ‘scales’ packages that will be used for the analysis. Packages are collections of functions that other people have written to tell the computer to do certain tasks. I am loading the tidyverse because it bundles several packages together. Each of the packages used in this analysis is described below.
“dplyr” Functions that help with data manipulation
“forcats” Functions that provide tools for working with factors
“ggplot2” Functions used for creating data visuals
“lubridate” Functions to work with date-times and time spans
“purrr” Functions that make it easier to apply other functions to lists and vectors
“readr” Functions to read and import data
“stringr” Functions designed to make working with strings as easy as possible
“tibble” Functions that work with tibbles, a modern form of data frame
“tidyr” Functions to clean and tidy up data
Lastly, the “scales” package: functions that control how plot axes and legends are scaled and how their labels are formatted (for example as percentages or with thousands separators)
# The install.packages() function tells the computer to download and install a package so that it is available in RStudio
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/16473/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\16473\AppData\Local\Temp\RtmpeEGt9K\downloaded_packages
install.packages("scales", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/16473/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'scales' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\16473\AppData\Local\Temp\RtmpeEGt9K\downloaded_packages
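Running install.packages() every time the script is knitted downloads the packages again. A minimal sketch of one way to avoid that is shown here. The helper names missing_packages() and install_if_missing() are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: returns the names in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: installs only the packages that are missing.
install_if_missing <- function(pkgs, repos = "http://cran.us.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# Packages that ship with R are always present, so nothing is reported missing.
missing_packages(c("stats", "utils"))

# Usage for this analysis (left commented out so the sketch does not download anything):
# install_if_missing(c("tidyverse", "scales"))
```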
# The library() function tells the computer to load an installed package into the current R session
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
Importing all of the 2022 data to use for the analysis. The first part of the code, before the “<-”, tells the computer what to name the data table.
The second part of the code tells the computer to import the CSV file into RStudio.
This is done 12 times because there is one file for each month.
X202201_divvy_tripdata <- read_csv("202201-divvy-tripdata.csv")
X202202_divvy_tripdata <- read_csv("202202-divvy-tripdata.csv")
X202203_divvy_tripdata <- read_csv("202203-divvy-tripdata.csv")
X202204_divvy_tripdata <- read_csv("202204-divvy-tripdata.csv")
X202205_divvy_tripdata <- read_csv("202205-divvy-tripdata.csv")
X202206_divvy_tripdata <- read_csv("202206-divvy-tripdata.csv")
X202207_divvy_tripdata <- read_csv("202207-divvy-tripdata.csv")
X202208_divvy_tripdata <- read_csv("202208-divvy-tripdata.csv")
X202209_divvy_tripdata <- read_csv("202209-divvy-tripdata.csv")
X202210_divvy_tripdata <- read_csv("202210-divvy-tripdata.csv")
X202211_divvy_tripdata <- read_csv("202211-divvy-tripdata.csv")
X202212_divvy_tripdata <- read_csv("202212-divvy-tripdata.csv")
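The twelve imports above follow one naming pattern, so the file names can also be generated instead of typed out. The following is only a sketch of that idea. The helper read_all_months() is a hypothetical name, and it assumes the readr and dplyr packages (both part of the tidyverse) are installed and the CSV files sit in the working directory.

```r
# Build the 12 monthly file names from the shared pattern.
month_files <- sprintf("2022%02d-divvy-tripdata.csv", 1:12)

# Hypothetical helper: read every file and stack the rows into one table.
read_all_months <- function(files) {
  tables <- lapply(files, readr::read_csv)
  dplyr::bind_rows(tables)
}

# Only attempt the import when all of the files are actually present.
if (all(file.exists(month_files))) {
  all_trips <- read_all_months(month_files)
}
```

Keeping the monthly tables separate, as this analysis does, makes it easier to inspect each month on its own before combining them.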
Inspecting the column names of each table to see if there are any that need to be fixed
colnames(X202201_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202202_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202203_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202204_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202205_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202206_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202207_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202208_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202209_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202210_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202211_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
colnames(X202212_divvy_tripdata)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
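Comparing twelve printed lists by eye is easy to get wrong, so the same check can be done in code. This is a small sketch that uses toy tables in place of the monthly data. The helper same_columns() is a hypothetical name, not something from the tidyverse.

```r
# Hypothetical helper: TRUE when every table has exactly the same
# column names, in the same order, as the first table in the list.
same_columns <- function(tables) {
  reference <- colnames(tables[[1]])
  all(vapply(tables,
             function(tbl) identical(colnames(tbl), reference),
             logical(1)))
}

# Toy tables standing in for the monthly data sets.
jan <- data.frame(ride_id = "A1", member_casual = "casual")
feb <- data.frame(ride_id = "B2", member_casual = "member")
mar <- data.frame(ride_id = "C3", rider_type = "member")  # one column named differently

same_columns(list(jan, feb))       # TRUE
same_columns(list(jan, feb, mar))  # FALSE
```

With the real data, the list would hold the twelve imported tables, for example list(X202201_divvy_tripdata, X202202_divvy_tripdata) and so on.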
Checking that the columns in each data set are stored in the correct format. The str() function tells the computer to show the structure of a table, including the data type of every column and its first few values.
str(X202201_divvy_tripdata)
## spc_tbl_ [103,770 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:103770] "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
## $ rideable_type : chr [1:103770] "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:103770], format: "2022-01-13 11:59:47" "2022-01-10 08:41:56" ...
## $ ended_at : POSIXct[1:103770], format: "2022-01-13 12:02:44" "2022-01-10 08:46:17" ...
## $ start_station_name: chr [1:103770] "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
## $ start_station_id : chr [1:103770] "525" "525" "TA1306000016" "KA1504000151" ...
## $ end_station_name : chr [1:103770] "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
## $ end_station_id : chr [1:103770] "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
## $ start_lat : num [1:103770] 42 42 41.9 42 41.9 ...
## $ start_lng : num [1:103770] -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num [1:103770] 42 42 41.9 42 41.9 ...
## $ end_lng : num [1:103770] -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr [1:103770] "casual" "casual" "member" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202202_divvy_tripdata)
## spc_tbl_ [115,609 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:115609] "E1E065E7ED285C02" "1602DCDC5B30FFE3" "BE7DD2AF4B55C4AF" "A1789BDF844412BE" ...
## $ rideable_type : chr [1:115609] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:115609], format: "2022-02-19 18:08:41" "2022-02-20 17:41:30" ...
## $ ended_at : POSIXct[1:115609], format: "2022-02-19 18:23:56" "2022-02-20 17:45:56" ...
## $ start_station_name: chr [1:115609] "State St & Randolph St" "Halsted St & Wrightwood Ave" "State St & Randolph St" "Southport Ave & Waveland Ave" ...
## $ start_station_id : chr [1:115609] "TA1305000029" "TA1309000061" "TA1305000029" "13235" ...
## $ end_station_name : chr [1:115609] "Clark St & Lincoln Ave" "Southport Ave & Wrightwood Ave" "Canal St & Adams St" "Broadway & Sheridan Rd" ...
## $ end_station_id : chr [1:115609] "13179" "TA1307000113" "13011" "13323" ...
## $ start_lat : num [1:115609] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:115609] -87.6 -87.6 -87.6 -87.7 -87.6 ...
## $ end_lat : num [1:115609] 41.9 41.9 41.9 42 41.9 ...
## $ end_lng : num [1:115609] -87.6 -87.7 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:115609] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202203_divvy_tripdata)
## spc_tbl_ [284,042 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:284042] "47EC0A7F82E65D52" "8494861979B0F477" "EFE527AF80B66109" "9F446FD9DEE3F389" ...
## $ rideable_type : chr [1:284042] "classic_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:284042], format: "2022-03-21 13:45:01" "2022-03-16 09:37:16" ...
## $ ended_at : POSIXct[1:284042], format: "2022-03-21 13:51:18" "2022-03-16 09:43:34" ...
## $ start_station_name: chr [1:284042] "Wabash Ave & Wacker Pl" "Michigan Ave & Oak St" "Broadway & Berwyn Ave" "Wabash Ave & Wacker Pl" ...
## $ start_station_id : chr [1:284042] "TA1307000131" "13042" "13109" "TA1307000131" ...
## $ end_station_name : chr [1:284042] "Kingsbury St & Kinzie St" "Orleans St & Chestnut St (NEXT Apts)" "Broadway & Ridge Ave" "Franklin St & Jackson Blvd" ...
## $ end_station_id : chr [1:284042] "KA1503000043" "620" "15578" "TA1305000025" ...
## $ start_lat : num [1:284042] 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num [1:284042] -87.6 -87.6 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:284042] 41.9 41.9 42 41.9 41.9 ...
## $ end_lng : num [1:284042] -87.6 -87.6 -87.7 -87.6 -87.7 ...
## $ member_casual : chr [1:284042] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202204_divvy_tripdata)
## spc_tbl_ [371,249 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:371249] "3564070EEFD12711" "0B820C7FCF22F489" "89EEEE32293F07FF" "84D4751AEB31888D" ...
## $ rideable_type : chr [1:371249] "electric_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:371249], format: "2022-04-06 17:42:48" "2022-04-24 19:23:07" ...
## $ ended_at : POSIXct[1:371249], format: "2022-04-06 17:54:36" "2022-04-24 19:43:17" ...
## $ start_station_name: chr [1:371249] "Paulina St & Howard St" "Wentworth Ave & Cermak Rd" "Halsted St & Polk St" "Wentworth Ave & Cermak Rd" ...
## $ start_station_id : chr [1:371249] "515" "13075" "TA1307000121" "13075" ...
## $ end_station_name : chr [1:371249] "University Library (NU)" "Green St & Madison St" "Green St & Madison St" "Delano Ct & Roosevelt Rd" ...
## $ end_station_id : chr [1:371249] "605" "TA1307000120" "TA1307000120" "KA1706005007" ...
## $ start_lat : num [1:371249] 42 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:371249] -87.7 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:371249] 42.1 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:371249] -87.7 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:371249] "member" "member" "member" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202205_divvy_tripdata)
## spc_tbl_ [634,858 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:634858] "EC2DE40644C6B0F4" "1C31AD03897EE385" "1542FBEC830415CF" "6FF59852924528F8" ...
## $ rideable_type : chr [1:634858] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:634858], format: "2022-05-23 23:06:58" "2022-05-11 08:53:28" ...
## $ ended_at : POSIXct[1:634858], format: "2022-05-23 23:40:19" "2022-05-11 09:31:22" ...
## $ start_station_name: chr [1:634858] "Wabash Ave & Grand Ave" "DuSable Lake Shore Dr & Monroe St" "Clinton St & Madison St" "Clinton St & Madison St" ...
## $ start_station_id : chr [1:634858] "TA1307000117" "13300" "TA1305000032" "TA1305000032" ...
## $ end_station_name : chr [1:634858] "Halsted St & Roscoe St" "Field Blvd & South Water St" "Wood St & Milwaukee Ave" "Clark St & Randolph St" ...
## $ end_station_id : chr [1:634858] "TA1309000025" "15534" "13221" "TA1305000030" ...
## $ start_lat : num [1:634858] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:634858] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:634858] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:634858] -87.6 -87.6 -87.7 -87.6 -87.7 ...
## $ member_casual : chr [1:634858] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202206_divvy_tripdata)
## spc_tbl_ [769,204 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:769204] "600CFD130D0FD2A4" "F5E6B5C1682C6464" "B6EB6D27BAD771D2" "C9C320375DE1D5C6" ...
## $ rideable_type : chr [1:769204] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:769204], format: "2022-06-30 17:27:53" "2022-06-30 18:39:52" ...
## $ ended_at : POSIXct[1:769204], format: "2022-06-30 17:35:15" "2022-06-30 18:47:28" ...
## $ start_station_name: chr [1:769204] NA NA NA NA ...
## $ start_station_id : chr [1:769204] NA NA NA NA ...
## $ end_station_name : chr [1:769204] NA NA NA NA ...
## $ end_station_id : chr [1:769204] NA NA NA NA ...
## $ start_lat : num [1:769204] 41.9 41.9 41.9 41.8 41.9 ...
## $ start_lng : num [1:769204] -87.6 -87.6 -87.7 -87.7 -87.6 ...
## $ end_lat : num [1:769204] 41.9 41.9 41.9 41.8 41.9 ...
## $ end_lng : num [1:769204] -87.6 -87.6 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:769204] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202207_divvy_tripdata)
## spc_tbl_ [823,488 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:823488] "954144C2F67B1932" "292E027607D218B6" "57765852588AD6E0" "B5B6BE44314590E6" ...
## $ rideable_type : chr [1:823488] "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:823488], format: "2022-07-05 08:12:47" "2022-07-26 12:53:38" ...
## $ ended_at : POSIXct[1:823488], format: "2022-07-05 08:24:32" "2022-07-26 12:55:31" ...
## $ start_station_name: chr [1:823488] "Ashland Ave & Blackhawk St" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" "Buckingham Fountain (Temp)" ...
## $ start_station_id : chr [1:823488] "13224" "15541" "15541" "15541" ...
## $ end_station_name : chr [1:823488] "Kingsbury St & Kinzie St" "Michigan Ave & 8th St" "Michigan Ave & 8th St" "Woodlawn Ave & 55th St" ...
## $ end_station_id : chr [1:823488] "KA1503000043" "623" "623" "TA1307000164" ...
## $ start_lat : num [1:823488] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:823488] -87.7 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:823488] 41.9 41.9 41.9 41.8 41.9 ...
## $ end_lng : num [1:823488] -87.6 -87.6 -87.6 -87.6 -87.7 ...
## $ member_casual : chr [1:823488] "member" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202208_divvy_tripdata)
## spc_tbl_ [785,932 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:785932] "550CF7EFEAE0C618" "DAD198F405F9C5F5" "E6F2BC47B65CB7FD" "F597830181C2E13C" ...
## $ rideable_type : chr [1:785932] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:785932], format: "2022-08-07 21:34:15" "2022-08-08 14:39:21" ...
## $ ended_at : POSIXct[1:785932], format: "2022-08-07 21:41:46" "2022-08-08 14:53:23" ...
## $ start_station_name: chr [1:785932] NA NA NA NA ...
## $ start_station_id : chr [1:785932] NA NA NA NA ...
## $ end_station_name : chr [1:785932] NA NA NA NA ...
## $ end_station_id : chr [1:785932] NA NA NA NA ...
## $ start_lat : num [1:785932] 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num [1:785932] -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num [1:785932] 41.9 41.9 42 42 41.8 ...
## $ end_lng : num [1:785932] -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr [1:785932] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202209_divvy_tripdata)
## spc_tbl_ [701,339 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:701339] "5156990AC19CA285" "E12D4A16BF51C274" "A02B53CD7DB72DD7" "C82E05FEE872DF11" ...
## $ rideable_type : chr [1:701339] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:701339], format: "2022-09-01 08:36:22" "2022-09-01 17:11:29" ...
## $ ended_at : POSIXct[1:701339], format: "2022-09-01 08:39:05" "2022-09-01 17:14:45" ...
## $ start_station_name: chr [1:701339] NA NA NA NA ...
## $ start_station_id : chr [1:701339] NA NA NA NA ...
## $ end_station_name : chr [1:701339] "California Ave & Milwaukee Ave" NA NA NA ...
## $ end_station_id : chr [1:701339] "13084" NA NA NA ...
## $ start_lat : num [1:701339] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:701339] -87.7 -87.6 -87.6 -87.7 -87.7 ...
## $ end_lat : num [1:701339] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:701339] -87.7 -87.6 -87.6 -87.7 -87.7 ...
## $ member_casual : chr [1:701339] "casual" "casual" "casual" "casual" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202210_divvy_tripdata)
## spc_tbl_ [558,685 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:558685] "A50255C1E17942AB" "DB692A70BD2DD4E3" "3C02727AAF60F873" "47E653FDC2D99236" ...
## $ rideable_type : chr [1:558685] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:558685], format: "2022-10-14 17:13:30" "2022-10-01 16:29:26" ...
## $ ended_at : POSIXct[1:558685], format: "2022-10-14 17:19:39" "2022-10-01 16:49:06" ...
## $ start_station_name: chr [1:558685] "Noble St & Milwaukee Ave" "Damen Ave & Charleston St" "Hoyne Ave & Balmoral Ave" "Rush St & Cedar St" ...
## $ start_station_id : chr [1:558685] "13290" "13288" "655" "KA1504000133" ...
## $ end_station_name : chr [1:558685] "Larrabee St & Division St" "Damen Ave & Cullerton St" "Western Ave & Leland Ave" "Orleans St & Chestnut St (NEXT Apts)" ...
## $ end_station_id : chr [1:558685] "KA1504000079" "13089" "TA1307000140" "620" ...
## $ start_lat : num [1:558685] 41.9 41.9 42 41.9 41.9 ...
## $ start_lng : num [1:558685] -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ end_lat : num [1:558685] 41.9 41.9 42 41.9 41.9 ...
## $ end_lng : num [1:558685] -87.6 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr [1:558685] "member" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202211_divvy_tripdata)
## spc_tbl_ [337,735 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:337735] "BCC66FC6FAB27CC7" "772AB67E902C180F" "585EAD07FDEC0152" "91C4E7ED3C262FF9" ...
## $ rideable_type : chr [1:337735] "electric_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : POSIXct[1:337735], format: "2022-11-10 06:21:55" "2022-11-04 07:31:55" ...
## $ ended_at : POSIXct[1:337735], format: "2022-11-10 06:31:27" "2022-11-04 07:46:25" ...
## $ start_station_name: chr [1:337735] "Canal St & Adams St" "Canal St & Adams St" "Indiana Ave & Roosevelt Rd" "Indiana Ave & Roosevelt Rd" ...
## $ start_station_id : chr [1:337735] "13011" "13011" "SL-005" "SL-005" ...
## $ end_station_name : chr [1:337735] "St. Clair St & Erie St" "St. Clair St & Erie St" "St. Clair St & Erie St" "St. Clair St & Erie St" ...
## $ end_station_id : chr [1:337735] "13016" "13016" "13016" "13016" ...
## $ start_lat : num [1:337735] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:337735] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:337735] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:337735] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ member_casual : chr [1:337735] "member" "member" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(X202212_divvy_tripdata)
## spc_tbl_ [181,806 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:181806] "65DBD2F447EC51C2" "0C201AA7EA0EA1AD" "E0B148CCB358A49D" "54C5775D2B7C9188" ...
## $ rideable_type : chr [1:181806] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : POSIXct[1:181806], format: "2022-12-05 10:47:18" "2022-12-18 06:42:33" ...
## $ ended_at : POSIXct[1:181806], format: "2022-12-05 10:56:34" "2022-12-18 07:08:44" ...
## $ start_station_name: chr [1:181806] "Clifton Ave & Armitage Ave" "Broadway & Belmont Ave" "Sangamon St & Lake St" "Shields Ave & 31st St" ...
## $ start_station_id : chr [1:181806] "TA1307000163" "13277" "TA1306000015" "KA1503000038" ...
## $ end_station_name : chr [1:181806] "Sedgwick St & Webster Ave" "Sedgwick St & Webster Ave" "St. Clair St & Erie St" "Damen Ave & Madison St" ...
## $ end_station_id : chr [1:181806] "13191" "13191" "13016" "13134" ...
## $ start_lat : num [1:181806] 41.9 41.9 41.9 41.8 41.9 ...
## $ start_lng : num [1:181806] -87.7 -87.6 -87.7 -87.6 -87.7 ...
## $ end_lat : num [1:181806] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:181806] -87.6 -87.6 -87.6 -87.7 -87.7 ...
## $ member_casual : chr [1:181806] "member" "casual" "member" "member" ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_datetime(format = ""),
## .. ended_at = col_datetime(format = ""),
## .. start_station_name = col_character(),
## .. start_station_id = col_character(),
## .. end_station_name = col_character(),
## .. end_station_id = col_character(),
## .. start_lat = col_double(),
## .. start_lng = col_double(),
## .. end_lat = col_double(),
## .. end_lng = col_double(),
## .. member_casual = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
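The point of the str() checks is that each column has the same type in every table, because bind_rows() stops with an error when, for example, a column is character in one table and numeric in another. A rough sketch of doing that comparison in code, using toy tables and the hypothetical helpers column_types() and same_types():

```r
# Hypothetical helper: the first class of every column, as a named character vector.
column_types <- function(tbl) {
  vapply(tbl, function(col) class(col)[1], character(1))
}

# Hypothetical helper: TRUE when all tables share the same column names and types.
same_types <- function(tables) {
  reference <- column_types(tables[[1]])
  all(vapply(tables,
             function(tbl) identical(column_types(tbl), reference),
             logical(1)))
}

# Toy tables: in `mar` the station id was read as a number instead of text.
jan <- data.frame(start_station_id = "525",   start_lat = 42.0, stringsAsFactors = FALSE)
feb <- data.frame(start_station_id = "13235", start_lat = 41.9, stringsAsFactors = FALSE)
mar <- data.frame(start_station_id = 13109,   start_lat = 42.0)

same_types(list(jan, feb))       # TRUE
same_types(list(jan, feb, mar))  # FALSE
```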
Combining all 12 data sets into one single data set.
x2022_trips <- bind_rows(X202201_divvy_tripdata,
                         X202202_divvy_tripdata,
                         X202203_divvy_tripdata,
                         X202204_divvy_tripdata,
                         X202205_divvy_tripdata,
                         X202206_divvy_tripdata,
                         X202207_divvy_tripdata,
                         X202208_divvy_tripdata,
                         X202209_divvy_tripdata,
                         X202210_divvy_tripdata,
                         X202211_divvy_tripdata,
                         X202212_divvy_tripdata)
Quick inspection to confirm that everything was combined into a single data set
glimpse(x2022_trips)
## Rows: 5,667,717
## Columns: 13
## $ ride_id <chr> "C2F7DD78E82EC875", "A6CF8980A652D272", "BD0F91DFF7…
## $ rideable_type <chr> "electric_bike", "electric_bike", "classic_bike", "…
## $ started_at <dttm> 2022-01-13 11:59:47, 2022-01-10 08:41:56, 2022-01-…
## $ ended_at <dttm> 2022-01-13 12:02:44, 2022-01-10 08:46:17, 2022-01-…
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A…
## $ start_station_id <chr> "525", "525", "TA1306000016", "KA1504000151", "TA13…
## $ end_station_name <chr> "Clark St & Touhy Ave", "Clark St & Touhy Ave", "Gr…
## $ end_station_id <chr> "RP-007", "RP-007", "TA1307000001", "TA1309000021",…
## $ start_lat <dbl> 42.01280, 42.01276, 41.92560, 41.98359, 41.87785, 4…
## $ start_lng <dbl> -87.66591, -87.66597, -87.65371, -87.66915, -87.624…
## $ end_lat <dbl> 42.01256, 42.01256, 41.92533, 41.96151, 41.88462, 4…
## $ end_lng <dbl> -87.67437, -87.67437, -87.66580, -87.67139, -87.627…
## $ member_casual <chr> "casual", "casual", "member", "casual", "member", "…
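One way to confirm that no rows were lost when combining is to add up the row counts of the monthly tables and compare the total with the row count reported by glimpse(). The numbers below are copied from the str() outputs earlier in this document.

```r
# Row counts of the 12 monthly tables, January to December 2022,
# as reported by str() for each table.
monthly_rows <- c(103770, 115609, 284042, 371249, 634858, 769204,
                  823488, 785932, 701339, 558685, 337735, 181806)

# Total number of rides across the year.
total_rows <- sum(monthly_rows)
total_rows  # 5667717, the same as the "Rows:" value shown by glimpse()
```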
Removing columns that aren’t going to be useful for the analysis
x2022_trips <- x2022_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, start_station_id, end_station_id, ride_id))
Formatting and creating date columns to make the data easier to analyze.
x2022_trips$date <- as.Date(x2022_trips$started_at) # Creating a date column from the started_at column
x2022_trips$month <- format(as.Date(x2022_trips$date), "%b") # Creating a column for the month as an abbreviated month name
x2022_trips$day <- format(as.Date(x2022_trips$date), "%d") # Creating a column for the day of the month in "dd" format
x2022_trips$year <- format(as.Date(x2022_trips$date), "%Y") # Creating a column for the year in "yyyy" format
x2022_trips$day_of_week <- format(as.Date(x2022_trips$date), "%A") # Creating a column for the day of the week as a full weekday name
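The month and day_of_week columns created above are plain text, so sorting or plotting them puts the values in alphabetical order rather than calendar order. A small sketch of one possible fix, turning them into ordered factors, is shown here on toy values. It assumes the names are in English, since format() with "%b" and "%A" follows the locale of the computer.

```r
# Weekday names in calendar order (starting the week on Sunday is an assumption).
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

days <- c("Thursday", "Monday", "Tuesday", "Tuesday")

# Plain text sorts alphabetically.
sort(days)  # "Monday" "Thursday" "Tuesday" "Tuesday"

# An ordered factor sorts in calendar order instead.
days_ordered <- factor(days, levels = day_levels, ordered = TRUE)
as.character(sort(days_ordered))  # "Monday" "Tuesday" "Tuesday" "Thursday"

# The same idea for months, using R's built-in month.abb ("Jan", "Feb", ...).
months_ordered <- factor(c("Mar", "Jan", "Feb"), levels = month.abb, ordered = TRUE)
as.character(sort(months_ordered))  # "Jan" "Feb" "Mar"
```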
Checking if the columns have been added
glimpse(x2022_trips)
## Rows: 5,667,717
## Columns: 11
## $ rideable_type <chr> "electric_bike", "electric_bike", "classic_bike", "…
## $ started_at <dttm> 2022-01-13 11:59:47, 2022-01-10 08:41:56, 2022-01-…
## $ ended_at <dttm> 2022-01-13 12:02:44, 2022-01-10 08:46:17, 2022-01-…
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A…
## $ end_station_name <chr> "Clark St & Touhy Ave", "Clark St & Touhy Ave", "Gr…
## $ member_casual <chr> "casual", "casual", "member", "casual", "member", "…
## $ date <date> 2022-01-13, 2022-01-10, 2022-01-25, 2022-01-04, 20…
## $ month <chr> "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "J…
## $ day <chr> "13", "10", "25", "04", "20", "11", "30", "22", "17…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <chr> "Thursday", "Monday", "Tuesday", "Tuesday", "Thursd…
Creating a column for the calculated ride length in seconds. The code tells the computer to make a new column named “ride_length” by calculating the difference in time between the “ended_at” and “started_at” columns.
x2022_trips$ride_length <- difftime(x2022_trips$ended_at, x2022_trips$started_at, units = "secs")
Checking whether the values are numeric or a factor
is.factor(x2022_trips$ride_length)
## [1] FALSE
Converting “ride_length” to numeric, just in case it isn’t already.
x2022_trips$ride_length <- as.numeric(as.character(x2022_trips$ride_length))
is.numeric(x2022_trips$ride_length)
## [1] TRUE
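For context, difftime() returns a "difftime" object rather than a factor or a plain number, which is why the conversion step is needed. The sketch below uses the first ride from the glimpse() output; the variable names are only for illustration.

```r
# Start and end of the first ride shown in the glimpse() output
started_at <- as.POSIXct("2022-01-13 11:59:47", tz = "UTC")
ended_at   <- as.POSIXct("2022-01-13 12:02:44", tz = "UTC")

# Ask for seconds explicitly so the unit never depends on the data
ride_length <- difftime(ended_at, started_at, units = "secs")

class(ride_length)       # "difftime"
is.factor(ride_length)   # FALSE
is.numeric(ride_length)  # FALSE, a difftime is not reported as numeric

# Convert straight to a plain number of seconds
ride_seconds <- as.numeric(ride_length, units = "secs")
ride_seconds             # 177
```

Passing units to as.numeric() is one way to avoid the detour through as.character().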
Removing “bad data”: a few bikes have negative ride times because they were taken out of the docks for quality checks.
# Removing rows that have HQ QR in the station name, because that is the tag the company used for quality checks
x2022_tripsV2 <- x2022_trips[!(x2022_trips$start_station_name == "HQ QR" | x2022_trips$ride_length<0),]
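One thing to watch with this kind of filter: when start_station_name is NA, the comparison returns NA, and indexing a data frame with NA produces a row made entirely of NA values. The sketch below shows this on a made-up four-row table (toy and its values are hypothetical), along with one possible alternative that uses %in%, which treats NA as "no match".

```r
# Hypothetical data: one missing station name, one test ride, one negative ride time
toy <- data.frame(
  start_station_name = c("A", NA, "HQ QR", "B"),
  ride_length        = c(100, 200, 300, -5)
)

# Same pattern as above: the condition is NA wherever the station name is NA
keep <- !(toy$start_station_name == "HQ QR" | toy$ride_length < 0)
keep                      # TRUE NA FALSE FALSE

with_na_rows <- toy[keep, ]
with_na_rows              # 2 rows, and the second one is all NA

# Alternative: %in% returns FALSE for NA, so no NA rows are created
keep_safe <- !(toy$start_station_name %in% "HQ QR" | toy$ride_length < 0)
without_na_rows <- toy[keep_safe, ]
without_na_rows           # 2 rows: station "A" and the ride with a missing station name
```

Whether rides with a missing station name should be kept or dropped is a separate decision; the point here is only that they should not silently turn into blank rows.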
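Removing any NA values. Note that the result is only printed here and is not assigned back, so x2022_tripsV2 itself is unchanged (the later glimpse() still shows 5,667,617 rows).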
x2022_tripsV2 %>%
drop_na()
## # A tibble: 4,369,291 × 12
## rideable_type started_at ended_at start_station_name
## <chr> <dttm> <dttm> <chr>
## 1 electric_bike 2022-01-13 11:59:47 2022-01-13 12:02:44 Glenwood Ave & Touhy A…
## 2 electric_bike 2022-01-10 08:41:56 2022-01-10 08:46:17 Glenwood Ave & Touhy A…
## 3 classic_bike 2022-01-25 04:53:40 2022-01-25 04:58:01 Sheffield Ave & Fuller…
## 4 classic_bike 2022-01-04 00:18:04 2022-01-04 00:33:00 Clark St & Bryn Mawr A…
## 5 classic_bike 2022-01-20 01:31:10 2022-01-20 01:37:12 Michigan Ave & Jackson…
## 6 classic_bike 2022-01-11 18:48:09 2022-01-11 18:51:31 Wood St & Chicago Ave
## 7 classic_bike 2022-01-30 18:32:52 2022-01-30 18:49:26 Oakley Ave & Irving Pa…
## 8 classic_bike 2022-01-22 12:20:02 2022-01-22 12:32:06 Sheffield Ave & Fuller…
## 9 electric_bike 2022-01-17 07:34:41 2022-01-17 08:00:08 Racine Ave & 15th St
## 10 classic_bike 2022-01-28 15:27:53 2022-01-28 15:35:16 LaSalle St & Jackson B…
## # ℹ 4,369,281 more rows
## # ℹ 8 more variables: end_station_name <chr>, member_casual <chr>, date <date>,
## # month <chr>, day <chr>, year <chr>, day_of_week <chr>, ride_length <dbl>
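drop_na() returns a new data frame and leaves its input alone, so the cleaned result has to be assigned to a name to be used later. A minimal sketch on made-up data (toy and toy_clean are hypothetical names):

```r
library(tidyr)

# Hypothetical three-row table with missing values in two different columns
toy <- data.frame(
  start_station_name = c("A", NA, "B"),
  ride_length        = c(100, 200, NA)
)

drop_na(toy)              # prints the cleaned rows, but toy is not modified
nrow(toy)                 # still 3

toy_clean <- drop_na(toy) # assigning keeps the cleaned version
nrow(toy_clean)           # 1
```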
Now making sure that the days of the week are in the proper order, so the week runs from Monday to Sunday
x2022_tripsV2$day_of_week <- ordered(x2022_tripsV2$day_of_week,
levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
Now making sure that the months are in proper order before analyzing the data.
x2022_tripsV2$month <- ordered(x2022_tripsV2$month,
levels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
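To illustrate what ordered() changes, here is a small sketch on a made-up vector of weekday names. It also shows a caveat: any value that is not listed in levels becomes NA, which would happen if format(date, "%A") had produced non-English day names in a different locale.

```r
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# Hypothetical values, as they might come out of format(date, "%A")
days <- c("Thursday", "Monday", "Sunday", "Tuesday")

sort(days)                 # plain strings sort alphabetically

days_ord <- ordered(days, levels = week_levels)
sort(days_ord)             # now sorts Monday, Tuesday, Thursday, Sunday

# A value missing from the levels turns into NA
not_in_levels <- ordered("Donnerstag", levels = week_levels)
is.na(not_in_levels)       # TRUE
```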
One final check before analyzing the data
glimpse(x2022_tripsV2)
## Rows: 5,667,617
## Columns: 12
## $ rideable_type <chr> "electric_bike", "electric_bike", "classic_bike", "…
## $ started_at <dttm> 2022-01-13 11:59:47, 2022-01-10 08:41:56, 2022-01-…
## $ ended_at <dttm> 2022-01-13 12:02:44, 2022-01-10 08:46:17, 2022-01-…
## $ start_station_name <chr> "Glenwood Ave & Touhy Ave", "Glenwood Ave & Touhy A…
## $ end_station_name <chr> "Clark St & Touhy Ave", "Clark St & Touhy Ave", "Gr…
## $ member_casual <chr> "casual", "casual", "member", "casual", "member", "…
## $ date <date> 2022-01-13, 2022-01-10, 2022-01-25, 2022-01-04, 20…
## $ month <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ day <chr> "13", "10", "25", "04", "20", "11", "30", "22", "17…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <ord> Thursday, Monday, Tuesday, Tuesday, Thursday, Tuesd…
## $ ride_length <dbl> 177, 261, 261, 896, 362, 202, 994, 724, 1527, 443, …
Checking the min, median, mean, and max ride duration (in seconds) for members and casual riders on each day of the week.
x2022_tripsV2 %>%
aggregate(ride_length ~ member_casual+day_of_week, FUN = min)
## member_casual day_of_week ride_length
## 1 casual Monday 0
## 2 member Monday 0
## 3 casual Tuesday 0
## 4 member Tuesday 0
## 5 casual Wednesday 0
## 6 member Wednesday 0
## 7 casual Thursday 0
## 8 member Thursday 0
## 9 casual Friday 0
## 10 member Friday 0
## 11 casual Saturday 0
## 12 member Saturday 0
## 13 casual Sunday 0
## 14 member Sunday 0
x2022_tripsV2 %>%
aggregate(ride_length ~ member_casual+day_of_week, FUN = median)
## member_casual day_of_week ride_length
## 1 casual Monday 809
## 2 member Monday 513
## 3 casual Tuesday 717
## 4 member Tuesday 514
## 5 casual Wednesday 711
## 6 member Wednesday 521
## 7 casual Thursday 730
## 8 member Thursday 525
## 9 casual Friday 778
## 10 member Friday 528
## 11 casual Saturday 939
## 12 member Saturday 598
## 13 casual Sunday 946
## 14 member Sunday 584
x2022_tripsV2 %>%
aggregate(ride_length ~ member_casual+day_of_week, FUN = mean)
## member_casual day_of_week ride_length
## 1 casual Monday 1903.7340
## 2 member Monday 743.0153
## 3 casual Tuesday 1684.9592
## 4 member Tuesday 733.1064
## 5 casual Wednesday 1614.0556
## 6 member Wednesday 731.6995
## 7 casual Thursday 1665.2090
## 8 member Thursday 743.8793
## 9 casual Friday 1837.6679
## 10 member Friday 758.3058
## 11 casual Saturday 2117.0493
## 12 member Saturday 862.1894
## 13 casual Sunday 2216.0519
## 14 member Sunday 854.8404
x2022_tripsV2 %>%
aggregate(ride_length ~ member_casual+day_of_week, FUN = max)
## member_casual day_of_week ride_length
## 1 casual Monday 1922127
## 2 member Monday 89997
## 3 casual Tuesday 1865151
## 4 member Tuesday 89997
## 5 casual Wednesday 2149238
## 6 member Wednesday 89997
## 7 casual Thursday 1861410
## 8 member Thursday 89997
## 9 casual Friday 1944178
## 10 member Friday 89998
## 11 casual Saturday 2483235
## 12 member Saturday 93594
## 13 casual Sunday 2175468
## 14 member Sunday 89997
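The four aggregate() calls could also be collapsed into one grouped summary. This is a sketch using dplyr on made-up ride lengths (toy and the *_s column names are hypothetical), not a replacement for the outputs above.

```r
library(dplyr)

# Hypothetical ride lengths in seconds
toy <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  day_of_week   = c("Monday", "Monday", "Monday", "Monday"),
  ride_length   = c(600, 1800, 300, 500)
)

# One row per rider type and weekday, with all four statistics side by side
ride_stats <- toy %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    min_s    = min(ride_length),
    median_s = median(ride_length),
    mean_s   = mean(ride_length),
    max_s    = max(ride_length),
    .groups  = "drop"   # return an ungrouped tibble
  )

ride_stats
```

Having the statistics in one table makes it easier to compare, for example, the mean against the median for the same group.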
Counting the number of rides for each rider type by day of the week
x2022_tripsV2 %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_rides = n()) %>%
arrange(member_casual, day_of_week)
## # A tibble: 15 × 3
## # Groups: member_casual [3]
## member_casual day_of_week number_of_rides
## <chr> <ord> <int>
## 1 casual Monday 236453
## 2 casual Tuesday 222772
## 3 casual Wednesday 231211
## 4 casual Thursday 260663
## 5 casual Friday 281815
## 6 casual Saturday 407331
## 7 casual Sunday 334337
## 8 member Monday 407786
## 9 member Tuesday 447953
## 10 member Wednesday 451320
## 11 member Thursday 455968
## 12 member Friday 396498
## 13 member Saturday 373592
## 14 member Sunday 326877
## 15 <NA> <NA> 833041
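The last row of the output is a group where both member_casual and day_of_week are NA. One possible way to leave such rows out of the count is to filter them first; count() is a shorter equivalent of group_by() plus summarise(n()). The data below is made up for illustration.

```r
library(dplyr)

# Hypothetical rides, including one all-NA row
toy <- tibble(
  member_casual = c("casual", "casual", "member", NA),
  day_of_week   = c("Monday", "Monday", "Monday", NA)
)

ride_counts <- toy %>%
  filter(!is.na(member_casual)) %>%                        # drop rows with no rider type
  count(member_casual, day_of_week, name = "number_of_rides")

ride_counts
```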
Counting the number of rides for each rider type by month
x2022_tripsV2 %>%
group_by(member_casual, month) %>%
summarise(number_of_rides = n()) %>%
arrange(member_casual, month)
## # A tibble: 25 × 3
## # Groups: member_casual [3]
## member_casual month number_of_rides
## <chr> <ord> <int>
## 1 casual Jan 14626
## 2 casual Feb 17362
## 3 casual Mar 75367
## 4 casual Apr 103456
## 5 casual May 243132
## 6 casual Jun 322196
## 7 casual Jul 349402
## 8 casual Aug 305106
## 9 casual Sep 250423
## 10 casual Oct 173186
## # ℹ 15 more rows
Creating a bar graph to show the number of rides by day of the week
x2022_tripsV2 %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, day_of_week) %>%
ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
geom_col(position = "dodge") +
scale_fill_manual(values= c("#2990A6","#014A50")) +
scale_y_continuous(labels = label_comma()) +
labs(title = "Number of Rides", subtitle = "By the Days of the week" ) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
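The summarise() message above can be avoided by stating the grouping of the result explicitly with `.groups`. The sketch below assumes a small hand-built table (ride_counts) holding only the Saturday and Sunday counts from the earlier output, and it merges the two theme() calls into one; it is an illustration of the same chart style, not the exact chart above.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Weekend counts copied from the ride-count output earlier in this report
ride_counts <- tibble(
  member_casual   = c("casual", "casual", "member", "member"),
  day_of_week     = ordered(c("Saturday", "Sunday", "Saturday", "Sunday"),
                            levels = c("Saturday", "Sunday")),
  number_of_rides = c(407331, 334337, 373592, 326877)
)

p <- ride_counts %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = sum(number_of_rides), .groups = "drop") %>%  # no grouping message
  ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("#2990A6", "#014A50")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Number of Rides", subtitle = "By day of the week") +
  theme(plot.title    = element_text(hjust = 0.5),   # one theme() call for both settings
        plot.subtitle = element_text(hjust = 0.5))

p
```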
Creating a bar graph to show the Number of rides by Month
x2022_tripsV2 %>%
group_by(member_casual, month) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, month) %>%
ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) +
geom_col(position = "dodge") +
scale_fill_manual(values= c("#2990A6","#014A50")) +
scale_y_continuous(labels = label_comma()) +
labs(title = "Number of Rides", subtitle = "By Month" ) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
Creating a bar graph to show the average ride length by day of the week, in minutes
x2022_tripsV2 %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length/60)) %>%
arrange(member_casual, day_of_week) %>%
ggplot(aes(x = day_of_week, y = average_duration, fill = member_casual)) +
geom_col(position = "dodge") +
scale_fill_manual(values= c("#2990A6","#014A50")) +
scale_y_continuous(labels = label_comma()) +
labs(title = "Average Ride Length ", subtitle = "(Daily) In Minutes" ) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Warning: Removed 1 rows containing missing values (`geom_col()`).
Creating a bar graph to show the average ride length by month, in minutes
x2022_tripsV2 %>%
group_by(member_casual, month) %>%
summarise(number_of_rides = n() # Counting the rows in each group to get the number of rides
,average_duration = mean(ride_length/60)) %>% # Calculating the average ride length, converted into minutes
arrange(member_casual, month) %>% # Arranging the data by member_casual and by month
ggplot(aes(x = month, y = average_duration, fill = member_casual)) + # Plotting the data as a bar graph
geom_col(position = "dodge") +
scale_fill_manual(values= c("#2990A6","#014A50")) +
scale_y_continuous(labels = label_comma()) +
labs(title = "Average Ride Length ", subtitle = "(Monthly) In Minutes" ) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Warning: Removed 1 rows containing missing values (`geom_col()`).
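The "Removed 1 rows containing missing values" warning is consistent with one group having an average of NA, since mean() returns NA as soon as any value in the group is missing. A small sketch with made-up ride lengths:

```r
# Hypothetical ride lengths in seconds, one of them missing
ride_length <- c(600, 1800, NA)

avg_with_na    <- mean(ride_length / 60)                # NA, because of the missing value
avg_without_na <- mean(ride_length / 60, na.rm = TRUE)  # 20 minutes

avg_with_na
avg_without_na
```

Using na.rm = TRUE, or removing the NA rows before summarising, are two possible ways to keep such a group from being dropped from the plot.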