Manipulating Dates with lubridate in R
Learning Objectives
Coerce different character formats to a date
Create a date time variable using date time components
Extract date / time components
Calculate differences between date / times
First, let’s load the necessary packages.
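A minimal sketch of the package-loading step. The exact set of packages is an assumption, inferred from the functions called later in this tutorial; ggthemes and png are called with `::` below, so they only need to be installed, not attached.

```r
# Assumed package set, inferred from the functions used in this tutorial
library(tidyverse)  # read_csv(), mutate(), str_c(), ggplot()
library(lubridate)  # ymd(), make_datetime(), year(), wday(), interval()
library(skimr)      # skim()
library(ggpubr)     # background_image()
```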
Import the swiftSongs.csv file from the URL used in the code below.
# Variables to keep
keeps <- c("track_name", "album_name", "youtube_url", "youtube_title", "youtube_publish_date", "youtube_duration", "song_release_date_year", "song_release_date_month", "song_release_date_day")
# Importing CSV file
swiftSongs <- read_csv("https://raw.githubusercontent.com/dilernia/STA418-518/main/Data/swiftSongs.csv") %>%
  dplyr::select(all_of(keeps))
Exploratory data analysis
Explore high-level characteristics of the data using the glimpse() and skim() functions.
## Rows: 151
## Columns: 9
## $ track_name <chr> "...Ready For It?", "‘tis the damn season", "a…
## $ album_name <chr> "reputation", "evermore", "folklore", "folklor…
## $ youtube_url <chr> "http://www.youtube.com/watch?v=wIft-t-MQuE", …
## $ youtube_title <chr> "Taylor Swift - …Ready For It?", "Taylor Swift…
## $ youtube_publish_date <dttm> 2017-10-27 04:00:03, 2020-12-11 05:00:05, 202…
## $ youtube_duration <chr> "PT3M31S", "PT3M56S", "PT4M24S", "PT4M56S", "P…
## $ song_release_date_year <dbl> 2017, 2020, 2020, 2020, 2020, 2020, 2020, 2020…
## $ song_release_date_month <dbl> 9, 12, 7, 7, 7, 12, 12, 12, 12, 12, 7, 12, 7, …
## $ song_release_date_day <dbl> 3, 11, 24, 24, 24, 11, 11, 11, 11, 11, 24, 11,…
Name | swiftSongs |
Number of rows | 151 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 3 |
POSIXct | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
track_name | 0 | 1 | 2 | 70 | 0 | 151 | 0 |
album_name | 0 | 1 | 3 | 12 | 0 | 10 | 0 |
youtube_url | 0 | 1 | 42 | 42 | 0 | 151 | 0 |
youtube_title | 0 | 1 | 5 | 79 | 0 | 151 | 0 |
youtube_duration | 0 | 1 | 4 | 7 | 0 | 92 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
song_release_date_year | 0 | 1 | 2014.95 | 5.17 | 2006 | 2010 | 2017 | 2020 | 2022 | ▃▅▂▂▇ |
song_release_date_month | 0 | 1 | 9.46 | 1.85 | 3 | 8 | 10 | 11 | 12 | ▁▁▅▇▅ |
song_release_date_day | 0 | 1 | 18.38 | 6.85 | 2 | 11 | 21 | 24 | 28 | ▁▅▁▅▇ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
youtube_publish_date | 0 | 1 | 2009-06-16 21:42:34 | 2022-10-25 04:00:09 | 2018-11-21 08:34:37 | 134 |
Coercing strings to date / time values
## [1] "1989-12-13"
## [1] "1989-12-13"
## [1] "1989-12-13"
## [1] "1989-12-13"
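As a minimal sketch of this kind of coercion, the lubridate helpers ymd(), mdy(), and dmy() each parse a differently ordered string into the same Date. The input strings below are illustrative assumptions, not necessarily the ones that produced the output above, and the month names assume an English locale.

```r
library(lubridate)

# Each helper is named after the order of year (y), month (m), and day (d)
# in its input. The input strings are illustrative assumptions.
d1 <- ymd("1989-12-13")
d2 <- mdy("December 13th, 1989")
d3 <- dmy("13-Dec-1989")
d4 <- ymd(19891213)  # unquoted numbers are accepted as well

print(c(d1, d2, d3, d4))
```

All four calls return a Date object, so the results can be compared or subtracted directly.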
Create a date time variable using date time components
Create a new character variable song_release_date_char in the swiftSongs data set using the mutate() and str_c() functions.
# Creating char string column of song release date
swiftSongs <- swiftSongs |>
  mutate(song_release_date_char = str_c(song_release_date_year, "-",
                                        song_release_date_month, "-",
                                        song_release_date_day))
Create a new date / time variable song_release_date using the newly created song_release_date_char variable and the appropriate lubridate helper function.
# Creating a song release date column from the character column
swiftSongs <- swiftSongs |>
  mutate(song_release_date = ymd(song_release_date_char))
Reproduce the scatter plot below showing the relationship between the release date of each of Taylor’s songs and the release date of the corresponding YouTube video. To match the custom colors from Taylor’s albums, use the vector of colors c('#7f6070', '#964c32', '#bb9559', '#8c8c8c', '#eeadcf', '#7193ac', '#a81e47', '#0c0c0c', '#7d488e', '#01a7d9').
# Creating a scatter plot
swiftSongs |>
  ggplot(aes(x = song_release_date,
             y = youtube_publish_date,
             color = album_name)) +
  scale_color_manual(values = c('#7f6070', '#964c32', '#bb9559', '#8c8c8c', '#eeadcf', '#7193ac', '#a81e47', '#0c0c0c', '#7d488e', '#01a7d9')) +
  geom_point() +
  labs(title = "Taylor Swift release dates",
       y = "Youtube video release date",
       x = "Song release date",
       caption = "Data source: Genius API and Youtube API",
       color = "Album") +
  theme_bw() +
  theme(legend.position = "bottom",
        title = element_text(face = "bold"))
Creating a date from individual components
Recreate the date / time variable song_release_date, this time directly from the year, month, and day components using the make_datetime() function.
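A minimal sketch of this step, using a hypothetical three-row tibble named songs as a stand-in for swiftSongs (its values are taken from the first rows of the glimpse() output above). The same mutate() call should apply to the full data set, assuming the component columns keep these names.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for swiftSongs with the same component columns
songs <- tibble(
  song_release_date_year  = c(2017, 2020, 2020),
  song_release_date_month = c(9, 12, 7),
  song_release_date_day   = c(3, 11, 24)
)

# Building a date-time directly from its year, month, and day components
songs <- songs |>
  mutate(song_release_date = make_datetime(year = song_release_date_year,
                                           month = song_release_date_month,
                                           day = song_release_date_day))

print(songs$song_release_date)
```

Note that make_datetime() returns a POSIXct date-time (midnight UTC by default), whereas make_date() takes the same components and returns a Date.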
Extracting date / time components
Extract the year, month, day, and day of the week of the release date of the YouTube videos using the youtube_publish_date variable to create new variables called youtube_publish_date_year, youtube_publish_date_month, youtube_publish_date_day, and youtube_publish_date_dayl, respectively.
# Extracting components of YouTube publish date
swiftSongs <- swiftSongs |>
  mutate(youtube_publish_date_year = year(youtube_publish_date),
         youtube_publish_date_month = month(youtube_publish_date),
         youtube_publish_date_day = day(youtube_publish_date),
         youtube_publish_date_dayl = wday(youtube_publish_date,
                                          label = TRUE,
                                          abbr = FALSE))
Bar chart
Reproduce the bar chart below showing the number of Taylor Swift YouTube videos released on each day of the week. The background image is hosted on GitHub and can be downloaded and imported into R using the code below.
Downloading background image for the bar chart
# Downloading and saving image
download.file(url = "https://github.com/dilernia/STA418-518/blob/main/lover-album.png?raw=true",
              destfile = "lover-album.png",
              mode = "wb")
# Importing image into R
backgImage <- png::readPNG("lover-album.png")
We can then include the image as a background for our plot using the background_image() function from the ggpubr package. To reproduce the coloring of the bars, use the hex codes "#fc94bc" and "#69b4dc".
swiftSongs |>
  ggplot(aes(x = youtube_publish_date_dayl)) +
  background_image(backgImage) +
  geom_bar(fill = "#69b4dc",
           color = "#fc94bc") +
  labs(title = "Taylor Swift Youtube videos",
       subtitle = "Day of release",
       x = "Release day",
       y = "Number of videos",
       caption = "Data source: Youtube API") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  scale_x_discrete(drop = FALSE) +
  coord_flip() +
  ggthemes::theme_few() +
  theme(title = element_text(face = "bold"))
Calculating differences between date / times
We can also calculate the amount of time between two date / time values in R. For example, we can calculate how old someone born on December 13th, 1989 is using the code below:
## Time difference of 12515 days
## [1] 34.26503
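The two results above are consistent with a calculation like the following sketch. It assumes a fixed reference date of March 19th, 2024 in place of today(), chosen because it lies 12515 days after the birth date; the variable names are illustrative.

```r
library(lubridate)

birth_date <- ymd("1989-12-13")
# Assumption: a fixed reference date stands in for today() so results do not change
reference_date <- ymd("2024-03-19")

# Subtracting two dates gives a difftime measured in days
age_days <- reference_date - birth_date
print(age_days)

# Dividing an interval by a one-year period gives the age in years
age_years <- interval(birth_date, reference_date) / years(1)
print(age_years)
```

Dividing by years(1) rather than by a fixed number of days lets lubridate account for leap years when converting the interval to years.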
Using the song_release_date variable, calculate how many days it has been since the most recent Taylor Swift song was released.
# Date of most recent song release
last_release <- swiftSongs |>
  slice_max(song_release_date, n = 1, with_ties = FALSE) |>
  pull(song_release_date)
# Calculating the number of days since that release
interval(last_release, today()) / days(1)
## [1] 515