Loading Libraries

library(lubridate)



In this lesson, we’ll explore the lubridate R package, by Garrett Grolemund and Hadley Wickham. According to the package authors, “lubridate has a consistent, memorable syntax, that makes working with dates fun instead of frustrating.” If you’ve ever worked with dates in R, that statement probably has your attention.

Dates and times are represented differently in different locales. To view our locale, we call Sys.getlocale("LC_TIME"):

Sys.getlocale("LC_TIME")

## [1] "en_CA.UTF-8"

lubridate contains many useful functions. We’ll only be covering the basics here. Type help(package = lubridate) to bring up an overview of the package, including the package DESCRIPTION, a list of available functions, and a link to the official package vignette.

The today() function returns today’s date. Let’s store the result in a new variable called this_day.


this_day <- today()
## [1] "2015-08-09"

Extract information from date

There are three components to this date. In order, they are year, month, and day. We can extract any of these components using the year(), month(), or day() function, respectively. We try them on this_day now.

y <- year(this_day)
m <- month(this_day)
d <- day(this_day)

y 2015
m 8
d 9

We can also get the day of the week from this_day using the wday() function. It will be represented as a number, such that 1 = Sunday, 2 = Monday, 3 = Tuesday, etc. Similarly, mday() and yday() extract the day of the month and the day of the year.

w <- wday(this_day)
m <- mday(this_day)
y <- yday(this_day)

w 1
m 9
y 221

We can add a second argument, label = TRUE, to display the name of the weekday (represented as an ordered factor).

wday(this_day, label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
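Because today() changes from day to day, the accessor functions are easier to check against a fixed date. The sketch below assumes the date shown in the output above (2015-08-09) and stores it in fixed_day, a name introduced here only for illustration. It also passes week_start = 7 explicitly so that Sunday is counted as day 1 regardless of local options.

```r
library(lubridate)

# A fixed date, so the results do not depend on when the code is run.
fixed_day <- ymd("2015-08-09")

parts <- c(
  year  = year(fixed_day),                  # 2015
  month = month(fixed_day),                 # 8
  mday  = mday(fixed_day),                  # 9 (day of the month)
  wday  = wday(fixed_day, week_start = 7),  # 1 (Sunday)
  yday  = yday(fixed_day)                   # 221 (day of the year)
)
print(parts)
```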


In addition to handling dates, lubridate is great for working with date and time combinations, referred to as date-times. The now() function returns the date-time representing this exact moment in time. We store the result in a variable called this_moment.

this_moment <- now()
## [1] "2015-08-09 11:19:08 EDT"

Just like with dates, we can extract the year, month, day, or day of week. However, we can also use hour(), minute(), and second() to extract specific time information.

hr <- hour(this_moment)
minu <- minute(this_moment)
sec <- second(this_moment)

hr 11.000000
minu 19.000000
sec 8.159136
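The same extraction can be tried on a fixed date-time instead of now(), which makes the values predictable. This is a small sketch; fixed_moment and the choice of the "America/New_York" time zone are assumptions made for the example.

```r
library(lubridate)

# A fixed date-time (whole seconds) in the New York time zone.
fixed_moment <- ymd_hms("2015-08-09 11:19:08", tz = "America/New_York")

hour(fixed_moment)    # 11
minute(fixed_moment)  # 19
second(fixed_moment)  # 8

# The date accessors work on date-times too.
yday(fixed_moment)    # 221
```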

Parsing dates

today() and now() provide neatly formatted date-time information. When working with dates and times ‘in the wild’, this won’t always (and perhaps rarely will) be the case.

Fortunately, lubridate offers a variety of functions for parsing date-times. These functions take the form of ymd(), dmy(), hms(), ymd_hms(), etc., where each letter in the name of the function stands for the location of years (y), months (m), days (d), hours (h), minutes (m), and/or seconds (s) in the date-time being read in.

To see how these functions work, we try ymd("1989-05-17"). We must surround the date with quotes.

my_date <- ymd("1989-05-17")
## [1] "1989-05-17 UTC"

It looks almost the same, except for the addition of a time zone, which we’ll discuss later in the lesson. Below the surface, there’s another important change that takes place when lubridate parses a date, which we can see by checking the class of my_date.

class(my_date)

## [1] "POSIXct" "POSIXt"

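So ymd() took a character string as input and returned an object of class POSIXct. It’s not necessary to understand what POSIXct is; just know that it is one way R stores date-time information internally. (This output reflects the lubridate version used here; recent releases return a Date object from ymd() unless a tz argument is supplied.)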
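Since the class returned by a parser can differ between lubridate versions, it may be worth checking it directly. The sketch below is only illustrative: with_zone and no_zone are made-up names, and the class printed for no_zone depends on the installed version.

```r
library(lubridate)

# Supplying tz asks for a date-time, which is stored as POSIXct.
with_zone <- ymd("1989-05-17", tz = "UTC")
class(with_zone)
is.POSIXct(with_zone)  # TRUE

# Without tz, the class depends on the installed lubridate version
# (recent releases return a Date rather than a POSIXct).
no_zone <- ymd("1989-05-17")
class(no_zone)
```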

“1989-05-17” is a fairly standard format, but lubridate is smart enough to figure out many different date-time formats. We use ymd() to parse “1989 May 17”.

ymd("1989 May 17")
## [1] "1989-05-17 UTC"

Despite being formatted differently, the last two dates had the year first, then the month, then the day. Hence, we used ymd() to parse them. Which function is appropriate for parsing “March 12, 1975”?

mdy("March 12, 1975")
## [1] "1975-03-12 UTC"
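The rule of thumb is that the function name mirrors the order of the components in the input. As a rough illustration, the three strings below (invented for this example) describe the same calendar day. They use numeric months because parsing month names depends on the locale.

```r
library(lubridate)

# Three spellings of the same calendar day, each read by the parser
# whose name matches the order of its components.
a <- ymd("1989-05-17")
b <- mdy("05/17/1989")
c_date <- dmy("17.05.1989")

a == b       # TRUE
a == c_date  # TRUE
```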

We can even throw something funky at it and lubridate will often know the right thing to do. Let’s parse 25081985, which is supposed to represent the 25th day of August 1985. Note that we are actually parsing a numeric value here, not a character string, so we leave off the quotes.

dmy(25081985)

## [1] "1985-08-25 UTC"

But be careful, it’s not magic. Let’s try ymd("192012") to see what happens when we give it something more ambiguous. We surround the number with quotes again, just to be consistent with the way most dates are represented (as character strings).

ymd("192012")

## Warning: All formats failed to parse. No formats found.
## [1] NA

We got a warning message because it was unclear what date we wanted. When in doubt, it’s best to be more explicit. Let’s repeat the same command, but add two dashes or two forward slashes to “192012” so that it’s clear we want January 2, 1920.

ymd("1920/1/2")  # or ymd("1920-1-2")
## [1] "1920-01-02 UTC"
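When parsing many values, it can help to collect the failures rather than read warnings one at a time. The following is a minimal sketch, assuming a made-up inputs vector whose last element is deliberately invalid; quiet = TRUE is an argument of the lubridate parsers that suppresses the warning.

```r
library(lubridate)

# The last element has no valid month or day.
inputs <- c("1920-1-2", "1920/1/2", "1920-13-45")

# quiet = TRUE suppresses the parsing warning; failures still come back as NA.
parsed <- ymd(inputs, quiet = TRUE)
parsed

# Report the inputs that could not be parsed.
inputs[is.na(parsed)]
```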

Parsing date-times

In addition to dates, we can parse date-times. Let’s create a character string called dt1 that holds a date-time and parse it with ymd_hms().

dt1 <- '2014-08-23 17:23:02'
ymd_hms(dt1)

## [1] "2014-08-23 17:23:02 UTC"

What if we have a time, but no date? Let’s use the appropriate lubridate function, hms(), to parse “03:22:14” (hh:mm:ss).

hms("03:22:14")

## [1] "3H 22M 14S"
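The value returned by hms() is a span of time rather than a point in time, so it behaves a little differently from the date-times parsed earlier. A short sketch, with elapsed and start as illustrative names:

```r
library(lubridate)

elapsed <- hms("03:22:14")
class(elapsed)    # a lubridate Period, not a POSIXct date-time

hour(elapsed)     # 3
minute(elapsed)   # 22
second(elapsed)   # 14

# A period can be added to a date-time.
start <- ymd_hms("2014-08-23 00:00:00")
start + elapsed   # 2014-08-23 03:22:14 UTC
```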

lubridate is also capable of handling vectors of dates, which is particularly helpful when you need to parse an entire column of data.

dt2 <- c('2014-05-14', '2014-09-22', '2014-07-11')
ymd(dt2)

## [1] "2014-05-14 UTC" "2014-09-22 UTC" "2014-07-11 UTC"
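For the column case mentioned above, one possible pattern looks like this. The data frame visits and its column when are invented for the example.

```r
library(lubridate)

# A small, made-up data frame with dates stored as character strings.
visits <- data.frame(
  when = c("2014-05-14", "2014-09-22", "2014-07-11"),
  stringsAsFactors = FALSE
)

# Parse the whole column at once, then derive a new column from it.
visits$when  <- ymd(visits$when)
visits$month <- month(visits$when)

# Rows sorted chronologically.
visits[order(visits$when), ]
```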

Update date-time

The update() function allows us to update one or more components of a date-time. For example, let’s say the current time is 08:34:55 (hh:mm:ss). We can change this_moment to that time by passing the new hours, minutes, and seconds to update().

update(this_moment, hours = 8, minutes = 34, seconds = 55)
## [1] "2015-08-09 08:34:55 EDT"

Note that update() does not alter this_moment itself. To keep a new time, we assign the result of update() back to this_moment.

this_moment <- update(this_moment, hours = 11, minutes = 55, seconds = 0)
## [1] "2015-08-09 11:55:00 EDT"
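The copy-versus-reassign behaviour is easy to confirm on a fixed date-time. In this sketch, before and after are illustrative names and the starting value is taken from the earlier output.

```r
library(lubridate)

before <- ymd_hms("2015-08-09 11:19:08", tz = "America/New_York")

# update() returns a modified copy; `before` itself is unchanged.
after <- update(before, hours = 8, minutes = 34, seconds = 55)

before  # 2015-08-09 11:19:08 EDT
after   # 2015-08-09 08:34:55 EDT
```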

DST and time zones and arithmetic operations

Now, let’s pretend you are in New York City and you are planning to visit a friend in Hong Kong. You seem to have misplaced your itinerary, but you know that your flight departs New York at 17:34 (5:34pm) the day after tomorrow. You also know that your flight is scheduled to arrive in Hong Kong exactly 15 hours and 50 minutes after departure.

Let’s reconstruct your itinerary from what you can remember, starting with the full date and time of your departure. We will approach this by finding the current date in New York, adding 2 full days, then setting the time to 17:34.

To find the current date in New York, we’ll use the now() function again. This time, however, we’ll specify the time zone that we want: "America/New_York". For a complete list of valid time zones for use with lubridate, check out the following Wikipedia page: http://en.wikipedia.org/wiki/List_of_tz_database_time_zones.

nyc <- now("America/New_York")
## [1] "2015-08-09 11:19:08 EDT"

Your flight is the day after tomorrow (in New York time), so we want to add two days to nyc. One nice aspect of lubridate is that it allows you to use arithmetic operators on dates and times. In this case, we’d like to add two days to nyc, so we can use the following expression: nyc + days(2)

depart <- nyc + days(2)
## [1] "2015-08-11 11:19:08 EDT"

So now depart contains the date of the day after tomorrow. We use update() to set the correct hours (17) and minutes (34), and reassign the result to depart.

depart <- update(depart, hours = 17, minutes = 34)
## [1] "2015-08-11 17:34:08 EDT"

Your friend wants to know what time she should pick you up from the airport in Hong Kong. Now that we have the exact date and time of your departure from New York, we can figure out the exact time of your arrival in Hong Kong.

The first step is to add 15 hours and 50 minutes to your departure time. Recall that nyc + days(2) added two days to the current time in New York. Use the same approach to add 15 hours and 50 minutes to the date-time stored in depart.

arrive <- depart + hours(15) + minutes(50)
## [1] "2015-08-12 09:24:08 EDT"

The arrive variable contains the time that it will be in New York when you arrive in Hong Kong. What we really want to know is what time it will be in Hong Kong when you arrive, so that your friend knows when to meet you.

The with_tz() function returns a date-time as it would appear in another time zone. We use with_tz() to convert arrive to the “Asia/Hong_Kong” time zone.

arrive <- with_tz(arrive, "Asia/Hong_Kong")
## [1] "2015-08-12 21:24:08 HKT"
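The whole itinerary calculation can be replayed from a fixed starting point. The sketch below substitutes a hard-coded date-time (taken from the output above) for now("America/New_York") so that the result is reproducible; arrive_ny and arrive_hk are names chosen for this example.

```r
library(lubridate)

# Stand-in for now("America/New_York"), fixed so the result is reproducible.
nyc <- ymd_hms("2015-08-09 11:19:08", tz = "America/New_York")

# Two days later, with the clock set to the 17:34 departure.
depart <- update(nyc + days(2), hours = 17, minutes = 34)

# Flight time of 15 hours 50 minutes, still expressed in New York time.
arrive_ny <- depart + hours(15) + minutes(50)

# The same instant, displayed in Hong Kong time.
arrive_hk <- with_tz(arrive_ny, "Asia/Hong_Kong")

arrive_ny  # 2015-08-12 09:24:08 EDT
arrive_hk  # 2015-08-12 21:24:08 HKT

# with_tz() changes only the displayed clock time, not the instant.
arrive_ny == arrive_hk  # TRUE
```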

Date-time interval

Fast forward to your arrival in Hong Kong. You and your friend have just met at the airport and you realize that the last time you were together was in Singapore on June 17, 2008. Naturally, you’d like to know exactly how long it has been. We only need to use the appropriate lubridate function to parse “June 17, 2008”. This time, however, we should specify an extra argument, tz = "Singapore".

last_time <- mdy("June 17, 2008", tz = "Singapore")
## [1] "2008-06-17 SGT"

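We can create an interval that spans from last_time to arrive. The lesson originally used new_interval() for this; recent lubridate releases provide interval() in its place.

how_long <- interval(last_time, arrive)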
## [1] 2008-06-17 SGT--2015-08-12 21:24:08 SGT

Now we use as.period() to see how long it’s been.

as.period(how_long)

## Warning in Ops.factor(left, right): '-' not meaningful for factors
## [1] "7y 1m 26d 21H 24M 8.27182102203369S"
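The elapsed time can be reproduced with fixed endpoints. This sketch assumes whole seconds for the arrival time and uses interval(), the interval constructor available in recent lubridate releases; elapsed is an illustrative name, and the month is written numerically to avoid locale-dependent parsing.

```r
library(lubridate)

last_time <- mdy("06/17/2008", tz = "Singapore")
arrive <- ymd_hms("2015-08-12 21:24:08", tz = "Asia/Hong_Kong")

# The span between the two instants, then the same span in calendar units.
how_long <- interval(last_time, arrive)
elapsed <- as.period(how_long)
elapsed  # 7y 1m 26d 21H 24M 8S
```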

This is where things get a little tricky. Because of things like leap years, leap seconds, and daylight saving time, the length of any given minute, day, month, week, or year is relative to when it occurs. In contrast, the length of a second is always the same, regardless of when it occurs.
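A small illustration of why the length of a day is relative: in the United States, daylight saving time began on March 8, 2015, so that calendar day was only 23 hours long in New York. The sketch below contrasts days(), which moves by calendar days, with ddays(), which moves by a fixed number of seconds. The start value is chosen purely for the example.

```r
library(lubridate)

# Noon on the day before US daylight saving time began in 2015.
start <- ymd_hms("2015-03-07 12:00:00", tz = "America/New_York")

# Period: same clock time on the next calendar day.
start + days(1)   # 2015-03-08 12:00:00 EDT

# Duration: exactly 86400 seconds later, which lands an hour "later" on the clock.
start + ddays(1)  # 2015-03-08 13:00:00 EDT
```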

To address these complexities, the authors of lubridate introduce four classes of time-related objects: instants, intervals, durations, and periods. These topics are beyond the scope of this lesson, but you can find a complete discussion in the 2011 Journal of Statistical Software paper titled ‘Dates and Times Made Easy with lubridate’.