LOAD PACKAGES.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
##
## Attaching package: 'hms'
##
## The following object is masked from 'package:lubridate':
##
## hms
## here() starts at C:/Users/SWill/Documents/APR TO JUN CYCLISTIC BIKES
library(skimr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(conflicted)
library(gtsummary)
library(scales)
library(RColorBrewer)
library(ggthemes)
SCIENTIFIC NOTATION RUINING YOUR GGPLOT CHARTS? TRY THE LINE OF CODE
BELOW
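The line of code this note points to does not appear in the rendered output. One widely used option, offered here as an assumption rather than the author's actual line, is base R's `scipen` option, which makes R prefer fixed notation when numbers are printed or used as axis labels.

```r
# Assumption: the original line is not shown; options(scipen = ...) is one common choice.
# A large positive scipen penalizes scientific notation in printed numbers.
options(scipen = 999)

big_count <- 1e10
plain_label <- format(big_count)  # fixed notation rather than "1e+10"
print(plain_label)
```

To change a single chart rather than the whole session, `scale_y_continuous(labels = scales::label_comma())` from the scales package formats only that axis.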
USE ‘getwd()’ FUNCTION TO DISPLAY WORKING DIRECTORY.
## [1] "C:/Users/SWill/Documents/APR TO JUN CYCLISTIC BIKES"
USE ‘setwd()’ FUNCTION TO SET WORKING DIRECTORY TO SIMPLIFY CALLS TO
DATA.
setwd("C:/Users/SWill/Documents/APR TO JUN CYCLISTIC BIKES")
USE ‘spec_csv()’ FUNCTION TO CHECK THE DATA TYPES BEFORE READING THE
DATA.
NOTICE ‘started_at’ AND ‘ended_at’ COLUMNS ARE ‘datetime’ DATA
TYPE.
spec_csv("C:/Users/SWill/Desktop/CYCLISTIC BIKES/divvy-trip-data 01-12/202104-divvy-tripdata.csv")
## cols(
## ride_id = col_character(),
## rideable_type = col_character(),
## started_at = col_datetime(format = ""),
## ended_at = col_datetime(format = ""),
## start_station_name = col_character(),
## start_station_id = col_character(),
## end_station_name = col_character(),
## end_station_id = col_character(),
## start_lat = col_double(),
## start_lng = col_double(),
## end_lat = col_double(),
## end_lng = col_double(),
## member_casual = col_character()
## )
spec_csv("C:/Users/SWill/Desktop/CYCLISTIC BIKES/divvy-trip-data 01-12/202105-divvy-tripdata.csv")
## cols(
## ride_id = col_character(),
## rideable_type = col_character(),
## started_at = col_datetime(format = ""),
## ended_at = col_datetime(format = ""),
## start_station_name = col_character(),
## start_station_id = col_character(),
## end_station_name = col_character(),
## end_station_id = col_character(),
## start_lat = col_double(),
## start_lng = col_double(),
## end_lat = col_double(),
## end_lng = col_double(),
## member_casual = col_character()
## )
spec_csv("C:/Users/SWill/Desktop/CYCLISTIC BIKES/divvy-trip-data 01-12/202106-divvy-tripdata.csv")
## cols(
## ride_id = col_character(),
## rideable_type = col_character(),
## started_at = col_datetime(format = ""),
## ended_at = col_datetime(format = ""),
## start_station_name = col_character(),
## start_station_id = col_character(),
## end_station_name = col_character(),
## end_station_id = col_character(),
## start_lat = col_double(),
## start_lng = col_double(),
## end_lat = col_double(),
## end_lng = col_double(),
## member_casual = col_character()
## )
READ THE THREE divvy-tripdata.csv FILES INTO DATA FRAMES.
df_04 <- read.csv("C:/Users/SWill/Desktop/CYCLISTIC BIKES/divvy-trip-data 01-12/202104-divvy-tripdata.csv")
df_05 <- read.csv("C:/Users/SWill/Desktop/CYCLISTIC BIKES/divvy-trip-data 01-12/202105-divvy-tripdata.csv")
df_06 <- read.csv("C:/Users/SWill/Desktop/CYCLISTIC BIKES/divvy-trip-data 01-12/202106-divvy-tripdata.csv")
USE ‘bind_rows()’ FUNCTION TO STACK DATA FRAMES INTO ONE BIG DATA
FRAME.
apr_to_jun <- bind_rows(df_04,df_05,df_06)
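The three reads and the stacking step can also be written as one pass over a vector of file paths. The sketch below uses base R only and runs against tiny throwaway files; the `demo_dir` folder and the toy rows are hypothetical stand-ins, not the real trip data.

```r
# Hypothetical stand-in data: three tiny monthly files in a temporary folder.
months <- c("202104", "202105", "202106")
demo_dir <- file.path(tempdir(), "divvy-demo")
dir.create(demo_dir, showWarnings = FALSE)
paths <- file.path(demo_dir, paste0(months, "-divvy-tripdata.csv"))

for (i in seq_along(paths)) {
  toy <- data.frame(ride_id = paste0("RIDE", i),
                    member_casual = c("member", "casual")[(i %% 2) + 1])
  write.csv(toy, paths[i], row.names = FALSE)
}

# Read every file into a list, then stack the list into one data frame.
monthly <- lapply(paths, read.csv)
stacked <- do.call(rbind, monthly)
print(stacked)
```

With dplyr loaded, `bind_rows(monthly)` accepts the same list and does the stacking in one call.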
CHECK COLUMNS.
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
USE ‘glimpse()’ FUNCTION TO GET A BETTER UNDERSTANDING OF THE
DATA.
Rows: 1,598,458 Columns: 13
COLUMNS ‘started_at’ AND ‘ended_at’ ARE NOW ‘character’ DATA
TYPE BECAUSE ‘read.csv()’ DOES NOT PARSE DATETIMES.
COLUMNS ‘end_station_name’ AND ‘end_station_id’ HAVE BLANK VALUES THAT
NEED TO BE REMOVED.
## Rows: 1,598,458
## Columns: 13
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <chr> "classic_bike", "docked_bike", "docked_bike", "clas…
## $ started_at <chr> "2021-04-12 18:25:36", "2021-04-27 17:27:11", "2021…
## $ ended_at <chr> "2021-04-12 18:56:55", "2021-04-27 18:31:29", "2021…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <chr> "member", "casual", "casual", "member", "casual", "…
USE ‘str()’ FUNCTION TO SEE LIST OF COLUMNS AND DATA TYPES (NUMERIC,
CHARACTER, DATETIME, ETC.).
‘data.frame’: 1598458 obs. of 13 variables:
## 'data.frame': 1598458 obs. of 13 variables:
## $ ride_id : chr "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
## $ rideable_type : chr "classic_bike" "docked_bike" "docked_bike" "classic_bike" ...
## $ started_at : chr "2021-04-12 18:25:36" "2021-04-27 17:27:11" "2021-04-03 12:42:45" "2021-04-17 09:17:42" ...
## $ ended_at : chr "2021-04-12 18:56:55" "2021-04-27 18:31:29" "2021-04-07 11:40:24" "2021-04-17 09:42:48" ...
## $ start_station_name: chr "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
## $ start_station_id : chr "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
## $ end_station_name : chr "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
## $ end_station_id : chr "13235" "KA1503000069" "20121" "13235" ...
## $ start_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ end_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr "member" "casual" "casual" "member" ...
USE TIDYR TO SPLIT THE “started_at” COLUMN INTO TWO NEW COLUMNS CALLED
“start_date” AND “start_time”.
USE TIDYR TO SPLIT THE “ended_at” COLUMN INTO TWO NEW COLUMNS CALLED
“end_date” AND “end_time”.
apr_to_jun <- tidyr::separate(apr_to_jun, started_at, c("start_date", "start_time"), sep = " ", remove = FALSE)
apr_to_jun <- tidyr::separate(apr_to_jun, ended_at, c("end_date", "end_time"), sep = " ", remove = FALSE)
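Splitting on the space works because every timestamp has the form date, space, time. An alternative sketch, not the approach used in this report, parses the text first and then derives both pieces, so the date becomes a real `Date` rather than a character string. The two sample timestamps are taken from the output above.

```r
# Two sample timestamps copied from the str() output.
started_at <- c("2021-04-12 18:25:36", "2021-04-03 12:42:45")

# Parse once, then derive the date and the clock time from the parsed value.
parsed <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
start_date <- as.Date(parsed, tz = "UTC")   # Date class
start_time <- format(parsed, "%H:%M:%S")    # character, time of day only

print(data.frame(started_at, start_date, start_time))
```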
CHECK NEW COLUMNS.
## [1] "ride_id" "rideable_type" "started_at"
## [4] "start_date" "start_time" "ended_at"
## [7] "end_date" "end_time" "start_station_name"
## [10] "start_station_id" "end_station_name" "end_station_id"
## [13] "start_lat" "start_lng" "end_lat"
## [16] "end_lng" "member_casual"
‘data.frame’: 1598458 obs. of 17 variables:
## 'data.frame': 1598458 obs. of 17 variables:
## $ ride_id : chr "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
## $ rideable_type : chr "classic_bike" "docked_bike" "docked_bike" "classic_bike" ...
## $ started_at : chr "2021-04-12 18:25:36" "2021-04-27 17:27:11" "2021-04-03 12:42:45" "2021-04-17 09:17:42" ...
## $ start_date : chr "2021-04-12" "2021-04-27" "2021-04-03" "2021-04-17" ...
## $ start_time : chr "18:25:36" "17:27:11" "12:42:45" "09:17:42" ...
## $ ended_at : chr "2021-04-12 18:56:55" "2021-04-27 18:31:29" "2021-04-07 11:40:24" "2021-04-17 09:42:48" ...
## $ end_date : chr "2021-04-12" "2021-04-27" "2021-04-07" "2021-04-17" ...
## $ end_time : chr "18:56:55" "18:31:29" "11:40:24" "09:42:48" ...
## $ start_station_name: chr "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
## $ start_station_id : chr "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
## $ end_station_name : chr "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
## $ end_station_id : chr "13235" "KA1503000069" "20121" "13235" ...
## $ start_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ end_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr "member" "casual" "casual" "member" ...
COLUMN RIDEABLE TYPE.
EXPLORE…CHARACTER VARIABLE TYPE IN “rideable_type” COLUMN.
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$rideable_type)
## [1] "character"
USE ‘unique()’ FUNCTION TO FIND INDIVIDUAL VALUES IN COLUMN.
unique(apr_to_jun$rideable_type)
## [1] "classic_bike" "docked_bike" "electric_bike"
HOW MANY OBSERVATIONS FALL UNDER EACH RIDEABLE TYPE?
table(apr_to_jun$rideable_type)
##
## classic_bike docked_bike electric_bike
## 958732 119783 519943
sort(table(apr_to_jun$rideable_type), decreasing = TRUE)
##
## classic_bike electric_bike docked_bike
## 958732 519943 119783
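Raw counts are easier to compare as shares of the total. A minimal sketch using the three counts printed above; `prop.table()` divides each count by the sum.

```r
# Counts copied from the table() output above.
bike_counts <- c(classic_bike = 958732, docked_bike = 119783, electric_bike = 519943)

# Convert counts to proportions, then to percentages sorted from largest to smallest.
bike_share <- prop.table(bike_counts)
bike_pct <- round(100 * sort(bike_share, decreasing = TRUE), 1)
print(bike_pct)
```

On the full data frame the same idea would be `prop.table(table(apr_to_jun$rideable_type))`.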
BAR PLOT OF DATA DISTRIBUTION OF ‘rideable_type’ COLUMN.
barplot(sort(table(apr_to_jun$rideable_type), decreasing = TRUE))

CHANGE VARIABLE FROM CHARACTER TO FACTOR.
apr_to_jun$rideable_type <- as.factor(apr_to_jun$rideable_type)
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$rideable_type)
## [1] "factor"
USE ‘levels’ FUNCTION TO CHECK FACTOR.
levels(apr_to_jun$rideable_type)
## [1] "classic_bike" "docked_bike" "electric_bike"
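`as.factor()` orders the levels alphabetically, which is why `docked_bike` comes before `electric_bike` here. If plots and tables should follow frequency order instead, the levels can be set explicitly. A small sketch with a made-up vector (the six values are hypothetical):

```r
# Hypothetical sample; the real column has about 1.6 million values.
bike_type <- c("classic_bike", "docked_bike", "electric_bike",
               "classic_bike", "classic_bike", "electric_bike")

# Order the level names by how often each value occurs, most frequent first.
by_freq <- names(sort(table(bike_type), decreasing = TRUE))
bike_type_f <- factor(bike_type, levels = by_freq)
print(levels(bike_type_f))
```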
NOTE RIDEABLE TYPE IS NOW A FACTOR.
## Rows: 1,598,458
## Columns: 17
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <fct> classic_bike, docked_bike, docked_bike, classic_bik…
## $ started_at <chr> "2021-04-12 18:25:36", "2021-04-27 17:27:11", "2021…
## $ start_date <chr> "2021-04-12", "2021-04-27", "2021-04-03", "2021-04-…
## $ start_time <chr> "18:25:36", "17:27:11", "12:42:45", "09:17:42", "12…
## $ ended_at <chr> "2021-04-12 18:56:55", "2021-04-27 18:31:29", "2021…
## $ end_date <chr> "2021-04-12", "2021-04-27", "2021-04-07", "2021-04-…
## $ end_time <chr> "18:56:55", "18:31:29", "11:40:24", "09:42:48", "14…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <chr> "member", "casual", "casual", "member", "casual", "…
COLUMN STARTED_AT AND ENDED_AT.
EXPLORE…CHARACTER VARIABLE TYPE IN “started_at” AND “ended_at”
COLUMNS.
DATA TYPE IN COLUMNS “started_at” AND “ended_at” WAS DATETIME BEFORE
UPLOADING.
CONVERT “started_at” AND “ended_at” COLUMNS FROM CHARACTER TO
DATETIME.
apr_to_jun$started_at <- as.POSIXlt(apr_to_jun$started_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")
apr_to_jun$ended_at <- as.POSIXlt(apr_to_jun$ended_at, format="%Y-%m-%d %H:%M:%S", tz="UTC")
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$started_at)
## [1] "POSIXlt" "POSIXt"
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$ended_at)
## [1] "POSIXlt" "POSIXt"
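`as.POSIXlt()` stores each timestamp as a list of components (year, month, hour and so on), which can be awkward inside a data frame. `as.POSIXct()` stores a single number of seconds per value and is the more common choice for columns. The sketch below uses the same format string on two sample values; it is an alternative, not what this report ran.

```r
# Two sample timestamps copied from the output above.
started_chr <- c("2021-04-12 18:25:36", "2021-04-27 17:27:11")
started_ct <- as.POSIXct(started_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(class(started_ct))

# A value that does not match the format becomes NA instead of raising an error,
# so it is worth counting NAs after converting.
bad <- as.POSIXct("12/04/2021 18:25", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
n_failed <- sum(is.na(bad))
print(n_failed)
```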
USE ‘str()’ FUNCTION TO SEE LIST OF COLUMNS AND DATA TYPES (NUMERIC,
CHARACTER, DATETIME, ETC.).
‘started_at’ AND ‘ended_at’ ARE NOW POSIXlt INSTEAD OF CHARACTER.
‘data.frame’: 1598458 obs. of 17 variables:
## 'data.frame': 1598458 obs. of 17 variables:
## $ ride_id : chr "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
## $ rideable_type : Factor w/ 3 levels "classic_bike",..: 1 2 2 1 2 1 1 3 1 1 ...
## $ started_at : POSIXlt, format: "2021-04-12 18:25:36" "2021-04-27 17:27:11" ...
## $ start_date : POSIXlt, format: "2021-04-12" "2021-04-27" ...
## $ start_time : chr "18:25:36" "17:27:11" "12:42:45" "09:17:42" ...
## $ ended_at : POSIXlt, format: "2021-04-12 18:56:55" "2021-04-27 18:31:29" ...
## $ end_date : POSIXlt, format: "2021-04-12" "2021-04-27" ...
## $ end_time : chr "18:56:55" "18:31:29" "11:40:24" "09:42:48" ...
## $ start_station_name: chr "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
## $ start_station_id : chr "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
## $ end_station_name : chr "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
## $ end_station_id : chr "13235" "KA1503000069" "20121" "13235" ...
## $ start_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ end_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr "member" "casual" "casual" "member" ...
COLUMN START_STATION_NAME START_STATION_ID END_STATION_NAME AND
END_STATION_ID.
EXPLORE…CHARACTER VARIABLE TYPE IN “start_station_name” AND
“end_station_name”.
REPLACE ALL BLANK VALUES IN “start_station_name” COLUMN WITH NA
VALUES.
apr_to_jun$start_station_name[apr_to_jun$start_station_name==""] <- NA
REPLACE ALL BLANK VALUES IN “start_station_id” COLUMN WITH NA
VALUES.
apr_to_jun$start_station_id[apr_to_jun$start_station_id==""] <- NA
REPLACE ALL BLANK VALUES IN “end_station_name” COLUMN WITH NA
VALUES.
apr_to_jun$end_station_name[apr_to_jun$end_station_name==""] <- NA
REPLACE ALL BLANK VALUES IN “end_station_id” COLUMN WITH NA
VALUES.
apr_to_jun$end_station_id[apr_to_jun$end_station_id==""] <- NA
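The four replacements above follow the same pattern, so they can be applied in one pass over a vector of column names. A minimal base R sketch on a toy data frame (the two rows are hypothetical):

```r
# Hypothetical two-row sample with blank station fields.
rides <- data.frame(
  start_station_name = c("State St & Pearson St", ""),
  start_station_id   = c("TA1307000061", ""),
  end_station_name   = c("", "Loomis Blvd & 84th St"),
  end_station_id     = c("", "20121"),
  stringsAsFactors = FALSE
)

station_cols <- c("start_station_name", "start_station_id",
                  "end_station_name", "end_station_id")

# Replace empty strings with NA in each station column.
rides[station_cols] <- lapply(rides[station_cols], function(x) {
  x[x %in% ""] <- NA
  x
})

na_per_column <- colSums(is.na(rides))
print(na_per_column)
```

Counting the NAs per column afterwards shows how many rows a later `drop_na()` will remove.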
## Rows: 1,598,458
## Columns: 17
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <fct> classic_bike, docked_bike, docked_bike, classic_bik…
## $ started_at <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-…
## $ start_date <dttm> 2021-04-12, 2021-04-27, 2021-04-03, 2021-04-17, 20…
## $ start_time <chr> "18:25:36", "17:27:11", "12:42:45", "09:17:42", "12…
## $ ended_at <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-…
## $ end_date <dttm> 2021-04-12, 2021-04-27, 2021-04-07, 2021-04-17, 20…
## $ end_time <chr> "18:56:55", "18:31:29", "11:40:24", "09:42:48", "14…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <chr> "member", "casual", "casual", "member", "casual", "…
REMOVE ROWS WITH NA VALUES IN ANY COLUMN.
apr_to_jun <- apr_to_jun %>% drop_na()
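Called with no column arguments, `drop_na()` drops a row when any column is NA, including the coordinate columns. If only missing station fields should disqualify a row, the check can be limited to those columns. A base R sketch on a hypothetical three-row sample:

```r
# Hypothetical sample: row B lacks a station name, row C lacks a coordinate.
rides <- data.frame(
  ride_id = c("A", "B", "C"),
  end_station_name = c("Loomis Blvd & 84th St", NA, "Southport Ave & Waveland Ave"),
  end_lat = c(41.74149, 41.9, NA)
)

# Strict: keep rows with no NA anywhere (what drop_na() with no arguments does).
strict <- rides[complete.cases(rides), ]

# Targeted: keep rows whose station name is present, whatever the other columns hold.
station_only <- rides[complete.cases(rides["end_station_name"]), ]

print(nrow(strict))
print(nrow(station_only))
```

The tidyr equivalent of the targeted version is to name the columns, for example `drop_na(apr_to_jun, end_station_name, end_station_id)`.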
‘data.frame’: 1357979 obs. of 17 variables:
## 'data.frame': 1357979 obs. of 17 variables:
## $ ride_id : chr "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
## $ rideable_type : Factor w/ 3 levels "classic_bike",..: 1 2 2 1 2 1 1 3 1 1 ...
## $ started_at : POSIXlt, format: "2021-04-12 18:25:36" "2021-04-27 17:27:11" ...
## $ start_date : POSIXlt, format: "2021-04-12" "2021-04-27" ...
## $ start_time : chr "18:25:36" "17:27:11" "12:42:45" "09:17:42" ...
## $ ended_at : POSIXlt, format: "2021-04-12 18:56:55" "2021-04-27 18:31:29" ...
## $ end_date : POSIXlt, format: "2021-04-12" "2021-04-27" ...
## $ end_time : chr "18:56:55" "18:31:29" "11:40:24" "09:42:48" ...
## $ start_station_name: chr "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
## $ start_station_id : chr "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
## $ end_station_name : chr "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
## $ end_station_id : chr "13235" "KA1503000069" "20121" "13235" ...
## $ start_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ end_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : chr "member" "casual" "casual" "member" ...
COLUMN MEMBER_CASUAL.
EXPLORE…CHARACTER VARIABLE TYPE IN “member_casual” COLUMN.
USE ‘unique()’ FUNCTION TO FIND INDIVIDUAL VALUES IN COLUMN.
unique(apr_to_jun$member_casual)
## [1] "member" "casual"
HOW MANY OBSERVATIONS FALL UNDER EACH USER TYPE?
table(apr_to_jun$member_casual)
##
## casual member
## 641441 716538
sort(table(apr_to_jun$member_casual), decreasing = TRUE)
##
## member casual
## 716538 641441
BAR PLOT OF DATA DISTRIBUTION OF ‘member_casual’ COLUMN.
barplot(sort(table(apr_to_jun$member_casual), decreasing = TRUE))

CHANGE VARIABLE FROM CHARACTER TO FACTOR.
apr_to_jun$member_casual <- as.factor(apr_to_jun$member_casual)
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$member_casual)
## [1] "factor"
USE ‘levels’ FUNCTION TO CHECK FACTOR.
levels(apr_to_jun$member_casual)
## [1] "casual" "member"
NOTE MEMBER CASUAL IS NOW A FACTOR.
## Rows: 1,357,979
## Columns: 17
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <fct> classic_bike, docked_bike, docked_bike, classic_bik…
## $ started_at <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-…
## $ start_date <dttm> 2021-04-12, 2021-04-27, 2021-04-03, 2021-04-17, 20…
## $ start_time <chr> "18:25:36", "17:27:11", "12:42:45", "09:17:42", "12…
## $ ended_at <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-…
## $ end_date <dttm> 2021-04-12, 2021-04-27, 2021-04-07, 2021-04-17, 20…
## $ end_time <chr> "18:56:55", "18:31:29", "11:40:24", "09:42:48", "14…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <fct> member, casual, casual, member, casual, casual, cas…
ADD COLUMN FOR DAY OF WEEK.
NUMERIC DAY OF WEEK: SUNDAY = 1, MONDAY = 2, TUESDAY = 3, ETC.
apr_to_jun$weekday <- lubridate::wday(apr_to_jun$start_date)
CHARACTER DAY OF WEEK USING ABBREVIATED LABELS: MON, TUE, WED, ETC.
apr_to_jun$weekday. <- lubridate::wday(apr_to_jun$start_date, label = TRUE)
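CHANGE ‘weekday.’ DATA TYPE TO FACTOR (‘wday()’ WITH ‘label = TRUE’ ALREADY
RETURNS AN ORDERED FACTOR, SO THIS STEP CHANGES NOTHING).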
apr_to_jun$weekday. <- as.factor(apr_to_jun$weekday.)
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$weekday.)
## [1] "ordered" "factor"
USE ‘levels’ FUNCTION TO CHECK FACTOR.
levels(apr_to_jun$weekday.)
## [1] "Sun" "Mon" "Tue" "Wed" "Thu" "Fri" "Sat"
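For readers without lubridate, the same Sunday = 1 numbering and the ordered labels can be reproduced in base R. This is a sketch of an equivalent, not the code the report ran; the three dates are the first three start dates in the output.

```r
# First three start dates from the output above.
start_date <- as.Date(c("2021-04-12", "2021-04-27", "2021-04-03"))

# POSIXlt counts Sunday as 0, so add 1 to get Sunday = 1 ... Saturday = 7.
day_num <- as.POSIXlt(start_date)$wday + 1

# Fixed English labels avoid depending on the session locale.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day_fct <- factor(day_labels[day_num], levels = day_labels, ordered = TRUE)

print(day_num)   # 2 3 7
print(day_fct)   # Mon Tue Sat
```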
NOTE WEEKDAY. IS AN ORDERED FACTOR.
## Rows: 1,357,979
## Columns: 21
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <fct> classic_bike, docked_bike, docked_bike, classic_bik…
## $ started_at <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-…
## $ start_date <dttm> 2021-04-12, 2021-04-27, 2021-04-03, 2021-04-17, 20…
## $ start_time <chr> "18:25:36", "17:27:11", "12:42:45", "09:17:42", "12…
## $ ended_at <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-…
## $ end_date <dttm> 2021-04-12, 2021-04-27, 2021-04-07, 2021-04-17, 20…
## $ end_time <chr> "18:56:55", "18:31:29", "11:40:24", "09:42:48", "14…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <fct> member, casual, casual, member, casual, casual, cas…
## $ ride_length_secs <dbl> 1879, 3858, 341859, 1506, 5477, 41, 86, 1550, 3174,…
## $ ride_length_total <dbl> 31.3166667, 64.3000000, 5697.6500000, 25.1000000, 9…
## $ weekday <dbl> 2, 3, 7, 7, 7, 1, 7, 3, 2, 7, 7, 7, 3, 1, 3, 5, 3, …
## $ weekday. <ord> Mon, Tue, Sat, Sat, Sat, Sun, Sat, Tue, Mon, Sat, S…
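The output above also lists `ride_length_secs` and `ride_length_total`, but the code that created them is not shown. Judging by the values (1879 seconds next to 31.3166667), the second column looks like the ride length in minutes. The sketch below is one way such columns could be computed under that assumption; the name `ride_length_mins` is hypothetical.

```r
# Start and end times of the first and third rides in the output.
started_at <- as.POSIXct(c("2021-04-12 18:25:36", "2021-04-03 12:42:45"), tz = "UTC")
ended_at   <- as.POSIXct(c("2021-04-12 18:56:55", "2021-04-07 11:40:24"), tz = "UTC")

# Fixing units = "secs" keeps difftime from choosing minutes, hours or days on its own.
ride_length_secs <- as.numeric(difftime(ended_at, started_at, units = "secs"))
ride_length_mins <- ride_length_secs / 60   # assumed meaning of ride_length_total

print(ride_length_secs)   # 1879 341859
print(ride_length_mins)
```

The second ride lasts almost four days, which suggests ride length needs an outlier check before it is averaged.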
EXPLORE NUMERIC VARIABLE TYPE IN “weekday” COLUMN.
USE ‘class’ FUNCTION TO CHECK DATA TYPE IN COLUMN.
class(apr_to_jun$weekday)
## [1] "numeric"
USE ‘summary()’ FUNCTION TO SUMMARIZE VALUES IN DATA FRAME.
summary(apr_to_jun$weekday)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 4.000 4.046 6.000 7.000
A BOX PLOT IS A GRAPHICAL REPRESENTATION THAT SUMMARIZES DATA AND
HELPS IDENTIFY OUTLIERS.
boxplot(apr_to_jun$weekday, col = 'yellow')

HISTOGRAM TO VISUALIZE DISTRIBUTION OF VALUES IN WEEKDAY
COLUMN.
hist(apr_to_jun$weekday, col='pink')
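Because the weekday number is a day code rather than a measurement, a count per day is often easier to read than a histogram or box plot. A minimal sketch using the first ten weekday values from the `str()` output; the labels follow the Sunday = 1 convention used above.

```r
# First ten weekday codes from the output (Sunday = 1 ... Saturday = 7).
weekday <- c(2, 3, 7, 7, 7, 1, 7, 3, 2, 7)
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Listing all seven levels keeps days with zero rides in the table.
day_counts <- table(factor(weekday, levels = 1:7, labels = day_labels))
print(day_counts)

# Draw the bar chart only in an interactive session.
if (interactive()) barplot(day_counts)
```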

NOTE WEEKDAY IS NOW A ‘dbl’.
## Rows: 1,357,979
## Columns: 21
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <fct> classic_bike, docked_bike, docked_bike, classic_bik…
## $ started_at <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-…
## $ start_date <dttm> 2021-04-12, 2021-04-27, 2021-04-03, 2021-04-17, 20…
## $ start_time <chr> "18:25:36", "17:27:11", "12:42:45", "09:17:42", "12…
## $ ended_at <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-…
## $ end_date <dttm> 2021-04-12, 2021-04-27, 2021-04-07, 2021-04-17, 20…
## $ end_time <chr> "18:56:55", "18:31:29", "11:40:24", "09:42:48", "14…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <fct> member, casual, casual, member, casual, casual, cas…
## $ ride_length_secs <dbl> 1879, 3858, 341859, 1506, 5477, 41, 86, 1550, 3174,…
## $ ride_length_total <dbl> 31.3166667, 64.3000000, 5697.6500000, 25.1000000, 9…
## $ weekday <dbl> 2, 3, 7, 7, 7, 1, 7, 3, 2, 7, 7, 7, 3, 1, 3, 5, 3, …
## $ weekday. <ord> Mon, Tue, Sat, Sat, Sat, Sun, Sat, Tue, Mon, Sat, S…
NOTE WEEKDAY IS NOW NUMERIC.
## 'data.frame': 1357979 obs. of 21 variables:
## $ ride_id : chr "6C992BD37A98A63F" "1E0145613A209000" "E498E15508A80BAD" "1887262AD101C604" ...
## $ rideable_type : Factor w/ 3 levels "classic_bike",..: 1 2 2 1 2 1 1 3 1 1 ...
## $ started_at : POSIXlt, format: "2021-04-12 18:25:36" "2021-04-27 17:27:11" ...
## $ start_date : POSIXlt, format: "2021-04-12" "2021-04-27" ...
## $ start_time : chr "18:25:36" "17:27:11" "12:42:45" "09:17:42" ...
## $ ended_at : POSIXlt, format: "2021-04-12 18:56:55" "2021-04-27 18:31:29" ...
## $ end_date : POSIXlt, format: "2021-04-12" "2021-04-27" ...
## $ end_time : chr "18:56:55" "18:31:29" "11:40:24" "09:42:48" ...
## $ start_station_name: chr "State St & Pearson St" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Honore St & Division St" ...
## $ start_station_id : chr "TA1307000061" "KA1503000069" "20121" "TA1305000034" ...
## $ end_station_name : chr "Southport Ave & Waveland Ave" "Dorchester Ave & 49th St" "Loomis Blvd & 84th St" "Southport Ave & Waveland Ave" ...
## $ end_station_id : chr "13235" "KA1503000069" "20121" "13235" ...
## $ start_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ start_lng : num -87.6 -87.6 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.9 41.8 41.7 41.9 41.7 ...
## $ end_lng : num -87.7 -87.6 -87.7 -87.7 -87.7 ...
## $ member_casual : Factor w/ 2 levels "casual","member": 2 1 1 2 1 1 1 1 1 1 ...
## $ ride_length_secs : num 1879 3858 341859 1506 5477 ...
## $ ride_length_total : num 31.3 64.3 5697.6 25.1 91.3 ...
## $ weekday : num 2 3 7 7 7 1 7 3 2 7 ...
## $ weekday. : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 2 3 7 7 7 1 7 3 2 7 ...