Manipulating Dates with lubridate in R

Learning Objectives

  • Coerce different character formats to a date

  • Create a date time variable using date time components

  • Extract date / time components

  • Calculate differences between date / times

First, let’s load the necessary packages.

library(tidyverse)
library(stringr)
library(lubridate)
library(ggpubr)
library(skimr)

Import the swiftSongs.csv file from the GitHub URL used in the code below.

# Variables to keep
keeps <- c("track_name", "album_name", "youtube_url", "youtube_title", "youtube_publish_date", "youtube_duration", "song_release_date_year", "song_release_date_month", "song_release_date_day")

# Importing CSV file
swiftSongs <- read_csv("https://raw.githubusercontent.com/dilernia/STA418-518/main/Data/swiftSongs.csv") %>% 
  dplyr::select(all_of(keeps))

Exploratory data analysis

Explore high-level characteristics of the data using the glimpse() and skim() functions.

glimpse(swiftSongs)
## Rows: 151
## Columns: 9
## $ track_name              <chr> "...Ready For It?", "‘tis the damn season", "a…
## $ album_name              <chr> "reputation", "evermore", "folklore", "folklor…
## $ youtube_url             <chr> "http://www.youtube.com/watch?v=wIft-t-MQuE", …
## $ youtube_title           <chr> "Taylor Swift - …Ready For It?", "Taylor Swift…
## $ youtube_publish_date    <dttm> 2017-10-27 04:00:03, 2020-12-11 05:00:05, 202…
## $ youtube_duration        <chr> "PT3M31S", "PT3M56S", "PT4M24S", "PT4M56S", "P…
## $ song_release_date_year  <dbl> 2017, 2020, 2020, 2020, 2020, 2020, 2020, 2020…
## $ song_release_date_month <dbl> 9, 12, 7, 7, 7, 12, 12, 12, 12, 12, 7, 12, 7, …
## $ song_release_date_day   <dbl> 3, 11, 24, 24, 24, 11, 11, 11, 11, 11, 24, 11,…
skim(swiftSongs)
Data summary
Name                    swiftSongs
Number of rows          151
Number of columns       9
_______________________
Column type frequency:
character               5
numeric                 3
POSIXct                 1
________________________
Group variables         None

Variable type: character

skim_variable    n_missing complete_rate min max empty n_unique whitespace
track_name               0             1   2  70     0      151          0
album_name               0             1   3  12     0       10          0
youtube_url              0             1  42  42     0      151          0
youtube_title            0             1   5  79     0      151          0
youtube_duration         0             1   4   7     0       92          0

Variable type: numeric

skim_variable           n_missing complete_rate    mean   sd   p0  p25  p50  p75 p100 hist
song_release_date_year          0             1 2014.95 5.17 2006 2010 2017 2020 2022 ▃▅▂▂▇
song_release_date_month         0             1    9.46 1.85    3    8   10   11   12 ▁▁▅▇▅
song_release_date_day           0             1   18.38 6.85    2   11   21   24   28 ▁▅▁▅▇

Variable type: POSIXct

skim_variable        n_missing complete_rate min                 max                 median              n_unique
youtube_publish_date         0             1 2009-06-16 21:42:34 2022-10-25 04:00:09 2018-11-21 08:34:37      134

Coercing strings to date / time values

ymd("1989-12-13")
## [1] "1989-12-13"
mdy("December 13th, 1989")
## [1] "1989-12-13"
dmy("13-Dec-1989")
## [1] "1989-12-13"
ymd(19891213)
## [1] "1989-12-13"
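
The same family of helpers also covers date-times, and a string that cannot be parsed becomes NA rather than stopping the script. The sketch below illustrates both behaviors; the input strings and the chosen time zone are made up for illustration.

```r
library(lubridate)

# Date-time helpers append _h, _hm, or _hms to the date order
release_time <- ymd_hms("2017-10-27 04:00:03")  # parsed as UTC by default
class(release_time)

# Pass tz to interpret the clock time in a specific time zone
ymd_hms("2017-10-27 04:00:03", tz = "America/New_York")

# Unparseable strings become NA (with a warning unless quiet = TRUE)
bad_date <- ymd("not a date", quiet = TRUE)
is.na(bad_date)
```

Checking for NA values after parsing is a quick way to catch strings that did not match the expected order of components.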

Create a date time variable using date time components

Create a new character variable song_release_date_char in the swiftSongs data set using the mutate() and str_c() functions.

# Creating char string column of song release date
swiftSongs <- swiftSongs |>
  mutate(song_release_date_char = str_c(song_release_date_year, "-",
                                        song_release_date_month, "-",
                                        song_release_date_day))

Create a new date / time variable song_release_date using the newly created song_release_date_char variable and the appropriate lubridate helper function.

# Converting the character column to a date column
swiftSongs <- swiftSongs |>
  mutate(song_release_date = ymd(song_release_date_char))
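
To confirm that the conversion behaved as expected, it can help to check the class of the new column and count any values that failed to parse. The sketch below does this on a small made-up tibble; toy_songs and its column names are hypothetical stand-ins, not part of the swiftSongs data.

```r
library(tidyverse)
library(lubridate)

# Hypothetical three-row stand-in for the swiftSongs date columns
toy_songs <- tibble(release_year = c(2017, 2020, 2020),
                    release_month = c(9, 12, 7),
                    release_day = c(3, 11, 24))

toy_songs <- toy_songs |>
  mutate(release_char = str_c(release_year, "-", release_month, "-", release_day),
         release_date = ymd(release_char))

# Months and days are not zero-padded ("2017-9-3"), but ymd() still parses them
class(toy_songs$release_date)

# Count rows that failed to parse
sum(is.na(toy_songs$release_date))
```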

Reproduce the scatter plot below showing the relationship between the release date of each of Taylor’s songs, and the release date of the corresponding YouTube video. To match the custom colors from Taylor’s albums, use the vector of colors c('#7f6070', '#964c32', '#bb9559', '#8c8c8c', '#eeadcf', '#7193ac', '#a81e47', '#0c0c0c', '#7d488e', '#01a7d9').

# Creating a scatter plot
swiftSongs |>
  ggplot(aes(x = song_release_date,
             y = youtube_publish_date,
             color = album_name)) +
  scale_color_manual(values = c('#7f6070', '#964c32', '#bb9559', '#8c8c8c', '#eeadcf', '#7193ac', '#a81e47', '#0c0c0c', '#7d488e',  '#01a7d9')) +
  geom_point() +
  labs(title = "Taylor Swift release dates",
       y = "Youtube video release date",
       x = "Song release date",
       caption = "Data source: Genius API and Youtube API",
       color = "Album") +
  theme_bw() +
  theme(legend.position = "bottom",
        title = element_text(face = "bold"))

Creating a date from individual components

Recreate the date / time variable song_release_date this time directly using the year, month, and day components with the make_datetime() function.

# Creating the song release date date / time variable
swiftSongs <- swiftSongs |>
  mutate(song_release_date = make_datetime(year = song_release_date_year,
                                           month = song_release_date_month,
                                           day = song_release_date_day))
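
Note that make_datetime() returns a date-time rather than a plain date. When no time of day is needed, lubridate also provides make_date(). The sketch below compares the two on a single example date.

```r
library(lubridate)

# make_datetime() returns a date-time (POSIXct) at midnight UTC
release_dttm <- make_datetime(year = 2017, month = 9, day = 3)
class(release_dttm)

# make_date() returns a plain Date, with no time-of-day component
release_date <- make_date(year = 2017, month = 9, day = 3)
class(release_date)

# Dropping the time from the date-time recovers the same calendar date
as_date(release_dttm) == release_date
```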

Extracting date / time components

Extract the year, month, day of the month, and day of the week of the release date of the YouTube videos using the youtube_publish_date variable to create new variables called youtube_publish_date_year, youtube_publish_date_month, youtube_publish_date_day, and youtube_publish_date_dayl, respectively.

# Extracting components of YouTube publish date
swiftSongs <- swiftSongs |>
  mutate(youtube_publish_date_year = year(youtube_publish_date),
         youtube_publish_date_month = month(youtube_publish_date),
         youtube_publish_date_day = day(youtube_publish_date),
         youtube_publish_date_dayl = wday(youtube_publish_date,
                                          label = TRUE,
                                          abbr = FALSE))
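
To see what each accessor returns, it can help to apply them to a single value first. The sketch below uses the first youtube_publish_date value shown in the glimpse() output; the object name publish_time is only for illustration.

```r
library(lubridate)

publish_time <- ymd_hms("2017-10-27 04:00:03")

year(publish_time)                 # 2017
month(publish_time)                # 10
month(publish_time, label = TRUE)  # ordered factor of month names
day(publish_time)                  # 27
hour(publish_time)                 # 4
wday(publish_time)                 # integer, 1 = Sunday by default
```

With label = TRUE, month() and wday() return ordered factors, which is why the days of the week appear in calendar order rather than alphabetical order when plotted.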

Bar chart

Reproduce the bar chart below showing the number of Taylor Swift YouTube videos released on each day of the week. The background image can be downloaded and imported into R using the code below.

Downloading background image for the bar chart

# Downloading and saving image
download.file(url = "https://github.com/dilernia/STA418-518/blob/main/lover-album.png?raw=true",
              destfile = "lover-album.png",
              mode = "wb")

# Importing image into R
backgImage <- png::readPNG("lover-album.png")

We can then include the image as a background for our plot using the background_image() function from the ggpubr package. To reproduce the coloring of the bars, use the hex codes "#fc94bc" and "#69b4dc".

swiftSongs |>
  ggplot(aes(x = youtube_publish_date_dayl)) +
  background_image(backgImage) +
  geom_bar(fill = "#69b4dc",
           color = "#fc94bc") +
  labs(title = "Taylor Swift Youtube videos",
       subtitle = "Day of release",
       x = "Release day",
       y = "Number of videos",
       caption = "Data source: Youtube API") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  scale_x_discrete(drop = FALSE) +
  coord_flip() +
  ggthemes::theme_few() +
  theme(title = element_text(face = "bold"))

Calculating difference between date / times

We can also calculate the amount of time between two date / time values in R. For example, we can calculate how old someone born on December 13th, 1989 is using the code below:

# Calculating someone's age in days
dob <- ymd(19891213)
ts_age <- today() - dob
ts_age
## Time difference of 12515 days
# Calculating in years
interval(dob, today()) / years(1)
## [1] 34.26503
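
Because today() changes from one run to the next, the printed values above depend on when the code was run. The sketch below repeats the calculation with a fixed reference date so the results are reproducible. The date 2024-03-19 is an assumption, chosen only because it reproduces the 12515-day difference shown above.

```r
library(lubridate)

dob <- ymd("1989-12-13")
ref_date <- ymd("2024-03-19")  # assumed fixed reference date, in place of today()

# Subtracting dates gives a difftime; convert it to a plain number of days
age_days <- as.numeric(ref_date - dob, units = "days")
age_days  # 12515

# time_length() converts an interval to the requested unit
age_years <- time_length(interval(dob, ref_date), unit = "years")

# Completed years only, as ages are usually reported
floor(age_years)  # 34
```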

Using the song_release_date variable, calculate how many days it has been since the most recent Taylor Swift song was released.

# Date of most recent song release
last_release <- swiftSongs |>
  arrange(desc(song_release_date)) |>
  #slice_head(n=1) |>
  slice_max(song_release_date, n = 1, with_ties = FALSE) |>
  pull(song_release_date)

# Calculating in days
interval(last_release, today()) / days(1)
## [1] 515
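
A shorter route to the same kind of result is to call max() on the date vector directly. The sketch below is one possible version under stated assumptions: release_dates is a hypothetical stand-in for the song_release_date column, and the fixed reference date 2024-03-19 replaces today() so the result is reproducible.

```r
library(lubridate)

# Hypothetical release dates standing in for swiftSongs$song_release_date
release_dates <- ymd(c("2017-09-03", "2020-12-11", "2022-10-21"))

last_release <- max(release_dates)
ref_date <- ymd("2024-03-19")  # assumed fixed reference date, in place of today()

# as_date() guards against mixing a date-time column with a Date
days_since <- as.numeric(ref_date - as_date(last_release), units = "days")
days_since  # 515
```

Converting with as_date() first keeps both sides of the subtraction in the same class, which matters when the column was built with make_datetime().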