Lubridate

Clever name aside, lubridate is a fantastic way (arguably the BEST way) to work with date and date-time objects in R. It lets you easily convert strings or numbers to date-times, deal with time zones, and do arithmetic with date-times.

Example

Start by loading the lubridate and tidyverse libraries along with your data.

# load the tidyverse and lubridate libraries
library(tidyverse)

## -- Attaching packages -------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0

## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

# load in the csv for analysis
csv <- "time_series_covid_19_confirmed.csv"
# read.csv() already returns a data frame, so no extra wrapper is needed
coronavirus_data <- read.csv(csv, header = TRUE, stringsAsFactors = FALSE)

Clean your data so that it follows a tidy format.

# clean the dataset: pivot to long form, strip the "X" prefix from the date
# column, sum cases per country and date into a Confirmed column,
# and eliminate duplicate rows
coronavirus_data <- coronavirus_data %>%
  pivot_longer(X1.22.20:X3.27.20,
    names_to = "Date",
    values_to = "Cases") %>%
  separate(Date, into = c("X", "Date"), sep = "X") %>%
  group_by(Country.Region, Date) %>%
  mutate(Confirmed = sum(Cases)) %>%
  distinct(Country.Region, Date, Confirmed)

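The first great feature of lubridate is its ability to convert strings to dates. The parsing functions are named for the order in which the year (y), month (m), and day (d) appear in the string: ymd(), mdy(), dmy(), and so on. The dates here are written month.day.year, so mdy() is the one to use.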

# convert Date column to a date object
coronavirus_data$Date <- mdy(coronavirus_data$Date)
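
As a small illustration of how the function name follows the component order, the sketch below parses one calendar day written three different ways. The strings are made up for the example (only the "3.27.20" shape comes from the dataset), and the variable names are hypothetical.

```r
library(lubridate)

# The same calendar day written three ways; each string is parsed by the
# function whose name matches the order of its components.
from_ymd <- ymd("2020-03-27")
from_mdy <- mdy("3.27.20")      # same shape as the dataset's date strings
from_dmy <- dmy("27-03-2020")

# check that all three parsed to the same date
all(from_ymd == from_mdy, from_mdy == from_dmy)

# the result is a Date object, not a character string
class(from_ymd)
```

If a string does not match the order implied by the function name, lubridate returns NA with a warning, which is usually the first sign that the wrong parser was picked.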

You can also create time intervals with the interval function. An interval is a span of time bounded by a start and an end date, and lubridate can check whether two intervals overlap, whether one is contained within another, and much more!

# find US time interval
us_cases <- coronavirus_data %>%
  filter(Country.Region == "US") %>%
  filter(Confirmed > 0)

us_start <- us_cases$Date[1]
us_end <- us_cases$Date[length(us_cases$Date)]
us_interval <- interval(us_start, us_end)

Create two more intervals.

# find Italy time interval
italy_cases <- coronavirus_data %>%
  filter(Country.Region == "Italy") %>%
  filter(Confirmed > 0)

italy_start <- italy_cases$Date[1]
italy_end <- italy_cases$Date[length(italy_cases$Date)]
italy_interval <- interval(italy_start, italy_end)

# find Spain time interval
spain_cases <- coronavirus_data %>%
  filter(Country.Region == "Spain") %>%
  filter(Confirmed > 0)

spain_start <- spain_cases$Date[1]
spain_end <- spain_cases$Date[length(spain_cases$Date)]
spain_interval <- interval(spain_start, spain_end)

You can check if any of the intervals overlap.

# check interval overlaps
int_overlaps(us_interval, italy_interval)

## [1] TRUE

int_overlaps(us_interval, spain_interval)

## [1] TRUE

int_overlaps(spain_interval, italy_interval)

## [1] TRUE
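
All three outbreak intervals overlap, so the output above never shows a FALSE. The sketch below uses made-up intervals (hypothetical dates and names, not taken from the dataset) to show both outcomes.

```r
library(lubridate)

# hypothetical intervals, chosen only to illustrate the two possible results
january     <- interval(ymd("2020-01-01"), ymd("2020-01-31"))
mid_jan_feb <- interval(ymd("2020-01-15"), ymd("2020-02-15"))
march       <- interval(ymd("2020-03-01"), ymd("2020-03-31"))

# these two share 15 January through 31 January
overlap_yes <- int_overlaps(january, mid_jan_feb)

# these two have no dates in common
overlap_no <- int_overlaps(january, march)

overlap_yes
overlap_no
```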

You can also examine the length of an interval, either as an exact duration in seconds or as a calendar period of months and days, and compare the lengths of two intervals.

# examine the length of an interval
as.duration(us_interval)

## [1] "5616000s (~9.29 weeks)"

as.period(us_interval)

## [1] "2m 5d 0H 0M 0S"

# how much longer is the US interval than Spain's? setdiff() returns the part
# of the US interval not covered by Spain's; both end on the same date
as.duration(setdiff(us_interval, spain_interval))

## [1] "864000s (~1.43 weeks)"
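
If you just want a plain number rather than a duration or period object, time_length() converts an interval to a chosen unit. This is a sketch under an assumption: the start and end dates below are inferred from the outputs above (a 65-day span ending on 3/27/2020), not read from the data, and the variable names are hypothetical.

```r
library(lubridate)

# assumed dates: inferred from the duration shown above, not read from the data
span <- interval(ymd("2020-01-22"), ymd("2020-03-27"))

# length of the interval as plain numbers
days_elapsed  <- time_length(span, unit = "day")
weeks_elapsed <- time_length(span, unit = "week")

days_elapsed
weeks_elapsed
```

A numeric result like this is convenient when the length feeds into further calculations, such as cases per day.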

You can also check whether one interval is contained entirely within another.

# check if an interval is contained within another
spain_interval %within% us_interval

## [1] TRUE

us_interval %within% spain_interval

## [1] FALSE
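
The %within% operator also accepts a single date on the left-hand side, which is handy for asking whether a given day falls inside an interval. A minimal sketch with a made-up interval and dates (all names here are hypothetical):

```r
library(lubridate)

# hypothetical interval covering February 2020
february <- interval(ymd("2020-02-01"), ymd("2020-02-29"))

# a date inside the interval
inside <- ymd("2020-02-14") %within% february

# a date after the interval ends
outside <- ymd("2020-03-01") %within% february

# the end points themselves count as inside
on_boundary <- ymd("2020-02-29") %within% february

c(inside, outside, on_boundary)
```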

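Lastly, lubridate supports date arithmetic. For example, the %m+% operator adds whole months to a date and, when the resulting day does not exist, rolls back to the last valid day of that month. Here, I found the last day of January, February, and March 2020 by adding months to December 31, 2019 (notice that lubridate accounts for 2020 being a leap year, so February ends on the 29th).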

# find the last day of each month using lubridate math
jan_end <- ymd("2019-12-31") %m+% months(1)
feb_end <- ymd("2019-12-31") %m+% months(2)
march_end <- ymd("2019-12-31") %m+% months(3)

# get cumulative confirmed cases at the end of each month
# (us_end stands in for March because the data stops before march_end)
us_by_month <- us_cases %>%
  ungroup() %>%
  mutate(Month = month(Date, label = TRUE)) %>%
  filter(Date %in% c(jan_end, feb_end, march_end, us_end))
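
To see why %m+% is used rather than a plain +, the sketch below adds one month to 31 January in both ways. The dates and variable names are chosen only for illustration.

```r
library(lubridate)

jan_31 <- ymd("2020-01-31")

# plain addition lands on 31 February, which does not exist, so the result is NA
plain_add <- jan_31 + months(1)

# %m+% rolls back to the last valid day of the month instead
rolled_2020 <- jan_31 %m+% months(1)             # leap year: 29 February
rolled_2019 <- ymd("2019-01-31") %m+% months(1)  # non-leap year: 28 February

plain_add
rolled_2020
rolled_2019
```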

Tidyverse CREATE

David Moste

3/27/2020
