Tidyverse: Lubridate

Required Libraries

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.2.2

## Warning: package 'ggplot2' was built under R version 4.2.2

library(lubridate)
library(httr)
library(jsonlite)

Tidyvserse Packages

Tidyverse contains many packages within it that allows users to work with strings, mutate and rearange dataframes and access data through APIs or websites. We can see a few of these packages listed below.

tidyverse_packages()

##  [1] "broom"         "cli"           "crayon"        "dbplyr"       
##  [5] "dplyr"         "dtplyr"        "forcats"       "ggplot2"      
##  [9] "googledrive"   "googlesheets4" "haven"         "hms"          
## [13] "httr"          "jsonlite"      "lubridate"     "magrittr"     
## [17] "modelr"        "pillar"        "purrr"         "readr"        
## [21] "readxl"        "reprex"        "rlang"         "rstudioapi"   
## [25] "rvest"         "stringr"       "tibble"        "tidyr"        
## [29] "xml2"          "tidyverse"

Import Data

To demonstrate the capabilities of the lubridate package within the Tidyverse, I will be using a data set on requested film permits in NYC. As per its documentation “Lubridate provides tools that make it easier to parse and manipulate dates.”

Description of Data: The Film Office issues permits to productions filming on location in the City of New York and provides free police assistance, free parking privileges and access to most exterior locations free of charge. Not all filming activity requires a permit. These permits are generally required when asserting the exclusive use of city property, like a sidewalk, a street, or a park.

Source: NYC Filming Permits

import_data <- as.data.frame(fromJSON('https://data.cityofnewyork.us/resource/tg4x-b46p.json', simplifyVector = TRUE))

Brief Glance of the Raw Dataset

After importing the data using jsonlite, we are left we nineteen (19) columns which includes the start and end dates within different zip codes in New York City. We can see it uses the ISO 8601 standard format of date and time separated from each other with the usage of the string literal “T”.

rmarkdown::paged_table(head(import_data, 5))

Convert ISO to POSIXct

Lubridate has fantastic documentation you can find here. To transform our date columns to be much more useful, I will start by using the ymd_hms() function. Here, it will transform the original datetime columns from ISO format to POSIXct.

update_df <- 
  import_data |> 
  mutate(start_date = ymd_hms(startdatetime),
         end_date = ymd_hms(enddatetime))

Create Day of Week and Hour Columns

Now that we have dates in a workable datetime format, I can extract the day of the week and the hour from the timestamps to perform some exploratory analysis. To keep it simple, the start date will only be used.

update_df <-
  update_df |> 
  mutate(start_day_of_week = wday(start_date, label = TRUE, abbr = TRUE),
         start_hour = hour(start_date))

Preview Columns of Interest

Lets reduce the number of columns to look at for our exploration. This will contain the start date, start time (hour), category, event type, and borough.

borough	category	eventtype	start_date	start_day_of_week	start_hour
Queens	Television	Shooting Permit	2023-01-20 06:00:00	Fri	6
Bronx	Television	Shooting Permit	2023-01-20 09:00:00	Fri	9
Manhattan	Television	Shooting Permit	2023-01-20 11:30:00	Fri	11
Brooklyn	Television	Shooting Permit	2023-01-20 02:30:00	Fri	2
Manhattan	Television	Shooting Permit	2023-01-20 13:00:00	Fri	13
Manhattan	Television	Shooting Permit	2023-01-20 06:00:00	Fri	6

Visualize Dates

Lastly, two plots will be created filtering out for only “shooting permits” where we can see the day of the week filming begins within each borough and what hour does filming begin based on the category such as television or commercial.

Day of Week Filming Began

borough	Sun	Mon	Tue	Wed	Thu	Fri	Sat
Bronx	1	4	6	5	5	6	1
Brooklyn	3	49	60	71	62	50	12
Manhattan	16	49	53	60	62	48	20
Queens	1	27	36	38	33	33	0
Staten Island	0	1	2	1	1	1	0

Hour Filming Began

start_hour	Commercial	Documentary	Film	Music Video	Still Photography	Student	Television	Theater	WEB
0	0	0	1	0	0	0	6	0	0
1	0	0	0	0	0	0	3	0	0
2	0	0	2	0	2	0	11	0	0
3	0	0	0	0	0	0	3	0	0
4	0	0	0	0	0	0	13	0	0
5	0	0	0	0	0	0	9	0	0
6	39	2	13	1	16	0	200	0	10
7	9	0	5	0	5	0	250	0	7
8	2	0	10	2	3	0	47	1	1
9	1	0	3	0	0	1	32	0	6
10	0	0	1	0	0	0	27	0	0
11	0	0	0	0	0	0	14	0	0
12	0	0	2	0	0	0	15	2	0
13	0	0	2	0	0	0	10	0	0
14	0	0	2	0	0	0	7	0	0
15	0	0	1	0	0	0	5	0	0
16	1	0	1	0	0	0	5	0	0
17	0	0	0	0	0	0	4	0	0
19	1	0	1	0	0	0	0	0	0

Changing to Date Data Type

Another functionality of the lubridate package is the ability to extract just the date from a timestamp, or to convert from a character type date to a date data type using as_date().

First, let’s use as_date() to extract the date from each row of startdatetime and create a new variable start_date:

import_data |>
  mutate(start_date = as_date(startdatetime)) |>
  select(borough, category, eventtype, startdatetime, start_date) |>
  head(5) |>
  knitr::kable()

borough	category	eventtype	startdatetime	start_date
Queens	Television	Shooting Permit	2023-01-20T06:00:00.000	2023-01-20
Bronx	Television	Shooting Permit	2023-01-20T09:00:00.000	2023-01-20
Manhattan	Television	Shooting Permit	2023-01-20T11:30:00.000	2023-01-20
Brooklyn	Television	Shooting Permit	2023-01-20T02:30:00.000	2023-01-20
Manhattan	Television	Shooting Permit	2023-01-20T13:00:00.000	2023-01-20

Now, let’s use as_date() to convert dates from a character data type to a date data type. If the character date is not in the form “yyyy-mm-dd”, we can use the format argument to tell the function how the date is formatted so that it knows how to convert it.

Let’s convert a few different dates from character to date data types. Note that you do not need to provide a 2 digit date, as the function will supply a starting zero for single digit months or days. You do, however, need to provide a four digit year.

# format = yyyy-mm-dd
dates1 <- c("2021-4-7", "2021-9-13", "2021-10-15", "2021-11-14")
as_date(dates1)

## [1] "2021-04-07" "2021-09-13" "2021-10-15" "2021-11-14"

# format = mm-dd-yyyy
dates2 <- c("4-7-2021", "9-13-2021", "10-15-2021", "11-14-2021")
as_date(dates2, format = '%m-%d-%Y')

## [1] "2021-04-07" "2021-09-13" "2021-10-15" "2021-11-14"

# format = mm/dd/yyyy
dates3 <- c("4/7/2021", "9/13/2021", "10/15/2021", "11/14/2021")
as_date(dates3, format = '%m/%d/%Y')

## [1] "2021-04-07" "2021-09-13" "2021-10-15" "2021-11-14"

If the date is improperly formatted, the function will coerce those values to NA.

dates4 <- c("12-1-2021", "1-5-2023", "2/14/2023")
as_date(dates4, format = '%m-%d-%Y')

## [1] "2021-12-01" "2023-01-05" NA

Additional documentation for lubridate package can be found here.