Required Libraries

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## Warning: package 'ggplot2' was built under R version 4.2.2
library(lubridate)
library(httr)
library(jsonlite)

Tidyvserse Packages

Tidyverse contains many packages within it that allows users to work with strings, mutate and rearange dataframes and access data through APIs or websites. We can see a few of these packages listed below.

tidyverse_packages()
##  [1] "broom"         "cli"           "crayon"        "dbplyr"       
##  [5] "dplyr"         "dtplyr"        "forcats"       "ggplot2"      
##  [9] "googledrive"   "googlesheets4" "haven"         "hms"          
## [13] "httr"          "jsonlite"      "lubridate"     "magrittr"     
## [17] "modelr"        "pillar"        "purrr"         "readr"        
## [21] "readxl"        "reprex"        "rlang"         "rstudioapi"   
## [25] "rvest"         "stringr"       "tibble"        "tidyr"        
## [29] "xml2"          "tidyverse"

Import Data

To demonstrate the capabilities of the lubridate package within the Tidyverse, I will be using a data set on requested film permits in NYC. As per its documentation “Lubridate provides tools that make it easier to parse and manipulate dates.”

Description of Data: The Film Office issues permits to productions filming on location in the City of New York and provides free police assistance, free parking privileges and access to most exterior locations free of charge. Not all filming activity requires a permit. These permits are generally required when asserting the exclusive use of city property, like a sidewalk, a street, or a park.

Source: NYC Filming Permits

import_data <- as.data.frame(fromJSON('https://data.cityofnewyork.us/resource/tg4x-b46p.json', simplifyVector = TRUE))

Brief Glance of the Raw Dataset

After importing the data using jsonlite, we are left we nineteen (19) columns which includes the start and end dates within different zip codes in New York City. We can see it uses the ISO 8601 standard format of date and time separated from each other with the usage of the string literal “T”.

rmarkdown::paged_table(head(import_data, 5))

Convert ISO to POSIXct

Lubridate has fantastic documentation you can find here. To transform our date columns to be much more useful, I will start by using the ymd_hms() function. Here, it will transform the original datetime columns from ISO format to POSIXct.

update_df <- 
  import_data |> 
  mutate(start_date = ymd_hms(startdatetime),
         end_date = ymd_hms(enddatetime))

Create Day of Week and Hour Columns

Now that we have dates in a workable datetime format, I can extract the day of the week and the hour from the timestamps to perform some exploratory analysis. To keep it simple, the start date will only be used.

update_df <-
  update_df |> 
  mutate(start_day_of_week = wday(start_date, label = TRUE, abbr = TRUE),
         start_hour = hour(start_date))

Preview Columns of Interest

Lets reduce the number of columns to look at for our exploration. This will contain the start date, start time (hour), category, event type, and borough.

borough category eventtype start_date start_day_of_week start_hour
Queens Television Shooting Permit 2023-01-20 06:00:00 Fri 6
Bronx Television Shooting Permit 2023-01-20 09:00:00 Fri 9
Manhattan Television Shooting Permit 2023-01-20 11:30:00 Fri 11
Brooklyn Television Shooting Permit 2023-01-20 02:30:00 Fri 2
Manhattan Television Shooting Permit 2023-01-20 13:00:00 Fri 13
Manhattan Television Shooting Permit 2023-01-20 06:00:00 Fri 6

Visualize Dates

Lastly, two plots will be created filtering out for only “shooting permits” where we can see the day of the week filming begins within each borough and what hour does filming begin based on the category such as television or commercial.

Day of Week Filming Began

borough Sun Mon Tue Wed Thu Fri Sat
Bronx 1 4 6 5 5 6 1
Brooklyn 3 49 60 71 62 50 12
Manhattan 16 49 53 60 62 48 20
Queens 1 27 36 38 33 33 0
Staten Island 0 1 2 1 1 1 0

Hour Filming Began

start_hour Commercial Documentary Film Music Video Still Photography Student Television Theater WEB
0 0 0 1 0 0 0 6 0 0
1 0 0 0 0 0 0 3 0 0
2 0 0 2 0 2 0 11 0 0
3 0 0 0 0 0 0 3 0 0
4 0 0 0 0 0 0 13 0 0
5 0 0 0 0 0 0 9 0 0
6 39 2 13 1 16 0 200 0 10
7 9 0 5 0 5 0 250 0 7
8 2 0 10 2 3 0 47 1 1
9 1 0 3 0 0 1 32 0 6
10 0 0 1 0 0 0 27 0 0
11 0 0 0 0 0 0 14 0 0
12 0 0 2 0 0 0 15 2 0
13 0 0 2 0 0 0 10 0 0
14 0 0 2 0 0 0 7 0 0
15 0 0 1 0 0 0 5 0 0
16 1 0 1 0 0 0 5 0 0
17 0 0 0 0 0 0 4 0 0
19 1 0 1 0 0 0 0 0 0

Changing to Date Data Type

Another functionality of the lubridate package is the ability to extract just the date from a timestamp, or to convert from a character type date to a date data type using as_date().

First, let’s use as_date() to extract the date from each row of startdatetime and create a new variable start_date:

import_data |>
  mutate(start_date = as_date(startdatetime)) |>
  select(borough, category, eventtype, startdatetime, start_date) |>
  head(5) |>
  knitr::kable()
borough category eventtype startdatetime start_date
Queens Television Shooting Permit 2023-01-20T06:00:00.000 2023-01-20
Bronx Television Shooting Permit 2023-01-20T09:00:00.000 2023-01-20
Manhattan Television Shooting Permit 2023-01-20T11:30:00.000 2023-01-20
Brooklyn Television Shooting Permit 2023-01-20T02:30:00.000 2023-01-20
Manhattan Television Shooting Permit 2023-01-20T13:00:00.000 2023-01-20

Now, let’s use as_date() to convert dates from a character data type to a date data type. If the character date is not in the form “yyyy-mm-dd”, we can use the format argument to tell the function how the date is formatted so that it knows how to convert it.

Let’s convert a few different dates from character to date data types. Note that you do not need to provide a 2 digit date, as the function will supply a starting zero for single digit months or days. You do, however, need to provide a four digit year.

# format = yyyy-mm-dd
dates1 <- c("2021-4-7", "2021-9-13", "2021-10-15", "2021-11-14")
as_date(dates1)
## [1] "2021-04-07" "2021-09-13" "2021-10-15" "2021-11-14"
# format = mm-dd-yyyy
dates2 <- c("4-7-2021", "9-13-2021", "10-15-2021", "11-14-2021")
as_date(dates2, format = '%m-%d-%Y')
## [1] "2021-04-07" "2021-09-13" "2021-10-15" "2021-11-14"
# format = mm/dd/yyyy
dates3 <- c("4/7/2021", "9/13/2021", "10/15/2021", "11/14/2021")
as_date(dates3, format = '%m/%d/%Y')
## [1] "2021-04-07" "2021-09-13" "2021-10-15" "2021-11-14"

If the date is improperly formatted, the function will coerce those values to NA.

dates4 <- c("12-1-2021", "1-5-2023", "2/14/2023")
as_date(dates4, format = '%m-%d-%Y')
## [1] "2021-12-01" "2023-01-05" NA

Additional documentation for lubridate package can be found here.