Munging date-times with R

Los Angeles R User's Group Meet-up
November 14, 2013

Yasmin Lucero
Senior Statistician, Gravity.com
yasmin.lucero@gmail.com

Why do you need date-time classes?

1. Parsing

  • "Mon, Nov 5, 2013"
  • 1381505475000
  • 2/3/2011
  • 2013-01-31 12:00:15.45 UTC

2. Functions

  • lm(y ~ day)
  • plot(y ~ date)
  • time series models

3. Calculations

  • time1 < time2
  • time2 = time1 + lag

4. Extractions

  • julian = yday(time)
  • time$hour
  • weekdays(time)

Functions: plots with date-times

[Two example plots with date-time axes; figures not included]

Outline

  1. Core datetime classes, methods, concepts
  2. Parsing
  3. Calculations
  4. Extraction
  5. Special datetime classes from lubridate package
  6. timeDate package for holidays, daylight saving time, and more
  7. Brief mention of time series classes

Core datetime classes

?DateTimeClasses

Date vs. POSIX datetime

Sys.Date()
[1] "2013-11-14"
Sys.time()
[1] "2013-11-14 10:58:53 PST"

Best practices: if you only need a date, use a Date

Seconds since the Unix epoch

now = Sys.time()
now = as.numeric(now)
format(now, scientific=FALSE)
[1] "1384455533"
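
Going the other way, from a number back to a date-time, comes up a lot, especially with millisecond timestamps like the 1381505475000 example on the first slide. A minimal base R sketch, assuming that value is in milliseconds and should be read as UTC:

```r
# Millisecond epoch value from the parsing examples (units are an assumption)
ms <- 1381505475000

# as.POSIXct expects seconds since the epoch, so divide by 1000 first
t <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

stamp <- format(t, "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(stamp)  # "2013-10-11 15:31:15"
```

Setting tz explicitly keeps the printed result independent of the machine's local time zone.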

POSIXct vs POSIXlt (or POSIXt)

lt = as.POSIXlt(Sys.time())
attr(lt, 'names')
[1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"
lt$hour
[1] 10
lt$wday
[1] 4
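
To make the difference concrete, here is a small base R sketch on a fixed timestamp (the timestamp and the UTC zone are chosen for illustration). POSIXct is a single number, POSIXlt is a list of fields, and both inherit from POSIXt:

```r
ct <- as.POSIXct("2013-11-14 10:58:53", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct: seconds since the epoch, stored as one number
secs <- as.numeric(ct)

# POSIXlt: list fields, with some offsets to remember
year  <- lt$year + 1900   # year is stored as years since 1900
month <- lt$mon + 1       # months are 0-based (0 = January)
wd    <- lt$wday          # weekdays are 0-based (0 = Sunday)

# Both classes share the POSIXt parent class
both <- inherits(ct, "POSIXt") && inherits(lt, "POSIXt")
```

POSIXct is usually the better choice for data frame columns; POSIXlt is handy when you want the fields.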

A few ways to create datetime objects

ISOdatetime(2013, 10, 2, 8, 30, 15)
[1] "2013-10-02 08:30:15 PDT"
as.POSIXct('2013-10-02 08:30:15')
[1] "2013-10-02 08:30:15 PDT"
as.Date('10/2/13', format='%m/%d/%y')
[1] "2013-10-02"
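
The outputs above pick up the local time zone (PDT). A hedged sketch of the same constructors with the zone set explicitly, assuming the "America/Los_Angeles" zone name is available on your system:

```r
# Same wall-clock time, built two ways, with an explicit time zone
a <- ISOdatetime(2013, 10, 2, 8, 30, 15, tz = "America/Los_Angeles")
b <- as.POSIXct("2013-10-02 08:30:15", tz = "America/Los_Angeles")

# Both represent the same instant
same <- as.numeric(a) == as.numeric(b)

# The same instant displayed in UTC (PDT is 7 hours behind UTC)
utc <- format(a, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```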

Parsing

Datetimes come to you in many forms

as.Date('2013-10-02')
[1] "2013-10-02"
as.Date('02/10/13')
[1] "0002-10-13"
print(try(as.Date('October 2, 2013')))
[1] "Error in charToDate(x) : \n  character string is not in a standard unambiguous format\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in charToDate(x): character string is not in a standard unambiguous format>
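
In base R, the fix for these cases is to pass an explicit format string. A minimal sketch; which reading of '02/10/13' is right depends on where your data came from:

```r
# The same string means different dates under different conventions
us <- as.Date("02/10/13", format = "%m/%d/%y")  # month/day/year
eu <- as.Date("02/10/13", format = "%d/%m/%y")  # day/month/year

# A format that does not match the input gives NA rather than an error
bad <- as.Date("2013-10-02", format = "%m/%d/%Y")

# Month names are locale-dependent; this assumes an English locale
long <- as.Date("October 2, 2013", format = "%B %d, %Y")
```

Note that a mismatched format fails silently with NA, so it is worth checking for NAs after parsing.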

Use lubridate

The most important advice in the whole presentation.

require(lubridate)
mdy('10/2/13')
[1] "2013-10-02 UTC"

Reason #1: A family of convenient parsing functions:
ymd, dmy, dym, mdy, dmy_h, dmy_hms...
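
A short sketch of the family in use. The function name spells out the order of the fields, and separators are handled for you. The input strings are made up for illustration, and the class of the result (Date or POSIXct) may differ between lubridate versions:

```r
library(lubridate)

# Three spellings of October 2, 2013
d1 <- ymd("2013-10-02")
d2 <- mdy("10/2/13")
d3 <- dmy("02.10.2013")

# Variants with a time component take an optional tz argument
t1 <- ymd_hms("2013-01-31 12:00:15", tz = "UTC")
```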

Some handy lubridate features

origin
[1] "1970-01-01 UTC"
today()
[1] "2013-11-14"
now()
[1] "2013-11-14 10:58:53 PST"

Extraction functions

Base

weekdays(today())
[1] "Thursday"

Lubridate

wday(today())
[1] 5
yday(today())
mday(today())
month(today())
quarter(today())

am(now())
tz(now())
dst(now())
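
One thing to watch: weekday numbering differs between base R and lubridate. A small sketch on a fixed date (the day of the talk), assuming lubridate's default week start of Sunday:

```r
library(lubridate)

d <- as.Date("2013-11-14")      # a Thursday

base_wday <- as.POSIXlt(d)$wday # base R: 0-based, Sunday = 0
lub_wday  <- wday(d)            # lubridate default: 1-based, Sunday = 1

# The other extractors on the same date
parts <- c(yday = yday(d), mday = mday(d),
           month = month(d), quarter = quarter(d))
```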

More handy lubridate methods

days_in_month(today())
Nov 
 30 
decimal_date(now())
[1] 2013.8697
ceiling_date(now(), unit='hour')
[1] "2013-11-14 11:00:00 PST"
floor_date(now(), unit='day')
[1] "2013-11-14 PST"

Miscellanea: POSIXt methods

Base R has Date and POSIXt methods for seq, cut, trunc, and round

week = seq(from=today(), to=today() + 7, by='day')
print(week)
[1] "2013-11-14" "2013-11-15" "2013-11-16" "2013-11-17" "2013-11-18"
[6] "2013-11-19" "2013-11-20" "2013-11-21"
cut(week, breaks=2)
[1] 2013-11-14 2013-11-14 2013-11-14 2013-11-14 2013-11-18 2013-11-18
[7] 2013-11-18 2013-11-18
Levels: 2013-11-14 2013-11-18
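
The example above uses Dates. The same functions work on POSIXct values, as in this base R sketch with a made-up timestamp:

```r
start <- as.POSIXct("2013-11-14 10:58:53", tz = "UTC")

# A sequence of four timestamps, six hours apart
steps <- seq(start, by = "6 hours", length.out = 4)

# Truncate to the hour and round to the nearest minute
hr <- format(trunc(start, "hours"), "%H:%M:%S")
mn <- format(round(start, "mins"), "%H:%M:%S")

# Bin the timestamps by calendar day (the last one falls on the 15th)
bins <- cut(steps, breaks = "day")
```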

Miscellanea: Partial seconds

getOption('digits.secs')

The default value is zero; it can be set to any integer from 0 to 6.

Sys.time()
[1] "2013-11-14 10:58:53 PST"
options(digits.secs=2)
Sys.time()
[1] "2013-11-14 10:58:53.53 PST"
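
If you would rather not change a global option, the %OSn format code prints fractional seconds for a single call. A minimal sketch with a made-up timestamp:

```r
# Fractional seconds are kept when parsing
t <- as.POSIXct("2013-11-14 10:58:53.25", tz = "UTC")

# %OS2 prints seconds with two decimal places
frac <- format(t, "%H:%M:%OS2")

# If you do set the option, save and restore the old value
old <- options(digits.secs = 2)
shown <- format(t)
options(old)
```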

Calculations

The operators available in base R are addition, subtraction, and the comparison operators: ==, !=, <, >, <=, >=

now = Sys.time()
later = now + 100
later - now
Time difference of 1.6666667 mins
later > now
[1] TRUE

A gotcha in difftimes

difftime chooses its units based on the size of the time difference!

difftime(now, now + 150)
Time difference of -2.5 mins
difftime(now, now + 10)
Time difference of -10 secs

When you use the subtraction operator on two date-time objects, R calls difftime with automatically chosen units. As a best practice, don't use the subtraction operator in code; instead, call difftime and always set the units argument explicitly.
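
A minimal sketch of that practice, using a fixed timestamp so the result is reproducible:

```r
now <- as.POSIXct("2013-11-14 10:58:53", tz = "UTC")

# With explicit units, both results are in seconds regardless of size
d_long  <- difftime(now + 150, now, units = "secs")
d_short <- difftime(now + 10,  now, units = "secs")

# The two values are now directly comparable
ratio <- as.numeric(d_long) / as.numeric(d_short)
```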

More difftime trickiness

dt.test = difftime(now, later)
dt.test
Time difference of -1.6666667 mins
attributes(dt.test)
$units
[1] "mins"

$class
[1] "difftime"
as.numeric(dt.test)
[1] -1.6666667
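
The trap is that as.numeric silently drops the units. If you already have a difftime in hand, you can ask for specific units when converting. A base R sketch on fixed timestamps:

```r
now   <- as.POSIXct("2013-11-14 10:58:53", tz = "UTC")
later <- now + 100

dt <- difftime(later, now)      # units chosen automatically
auto_units <- units(dt)         # "mins" for a 100 second gap

# Request the units explicitly when converting to a number
secs <- as.numeric(dt, units = "secs")

# Or change the units of the object in place
units(dt) <- "hours"
hrs <- as.numeric(dt)
```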

timespan classes from lubridate

  • durations: exact elapsed time, measured in seconds
  • periods: spans in clock and calendar units (days, months, ...), so their exact length depends on when they occur
  • intervals: a span anchored to a specific start and end instant

now() - ddays(14) # duration
[1] "2013-10-31 11:58:53.72 PDT"
now() - days(14) # period
[1] "2013-10-31 10:58:53.72 PDT"
today() %within% interval(ymd('2013-10-01'), ymd('2013-12-01')) # interval (new_interval in older lubridate)
[1] TRUE
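
The one hour gap between the duration and period results comes from the end of daylight saving time on November 3, 2013. A reproducible sketch with a fixed timestamp, assuming the "America/Los_Angeles" zone is available:

```r
library(lubridate)

t <- ymd_hms("2013-11-14 10:58:53", tz = "America/Los_Angeles")

# Duration: exactly 14 * 86400 seconds earlier, so the clock reads 11:58
by_duration <- t - ddays(14)

# Period: same clock time 14 calendar days earlier, so the clock reads 10:58
by_period <- t - days(14)

# Interval: a span between two fixed dates, tested for membership
span   <- interval(ymd("2013-10-01"), ymd("2013-12-01"))
inside <- ymd("2013-11-14") %within% span
```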

More calculation: lubridate operators

today() %m+% months(3)
[1] "2014-02-14"
today() %m-% years(2)
[1] "2011-11-14"
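
The reason to use %m+% instead of plain + shows up at month ends. A short sketch with a made-up date:

```r
library(lubridate)

jan31 <- ymd("2013-01-31")

# Plain addition lands on February 31, which does not exist, so the result is NA
plain <- jan31 + months(1)

# %m+% rolls back to the last valid day of the target month
rolled <- jan31 %m+% months(1)
```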

The timeDate package: holidays!

require(timeDate)
USThanksgivingDay(year=2014)
GMT
[1] [2014-11-27]
CAThanksgivingDay(year=2014)
GMT
[1] [2014-10-13]

Some of my favorite function names…

And for some reason, they have skew and kurtosis methods for POSIX objects.

The timeDate package: more holidays!

Days when NY stock exchange is closed:

holidayNYSE(year=2014)
NewYork
[1] [2014-01-01] [2014-01-20] [2014-02-17] [2014-04-18] [2014-05-26]
[6] [2014-07-04] [2014-09-01] [2014-11-27] [2014-12-25]
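
A holiday calendar is mostly useful for business-day checks. A hedged sketch using timeDate's isBizday on the week of Thanksgiving 2014 (the date range is chosen for illustration):

```r
library(timeDate)

# Monday through Friday of Thanksgiving week, 2014
days <- timeSequence(from = "2014-11-24", to = "2014-11-28", by = "day")

# TRUE for weekdays that are not NYSE holidays
biz <- isBizday(days, holidays = holidayNYSE(2014))

# Count the trading days in that week
n_trading <- sum(biz)
```

Thursday the 27th is the only non-trading weekday in that range.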

timeDate: daylight saving time

DST rules for a large number of world cities:

head(Phoenix())
              Phoenix offSet isdst TimeZone     numeric
1 1901-12-14 20:45:52 -25200     0      MST -2147397248
2 1918-03-31 09:00:00 -21600     1      MDT -1633273200
3 1918-10-27 08:00:00 -25200     0      MST -1615132800
4 1919-03-30 09:00:00 -21600     1      MDT -1601823600
5 1919-10-26 08:00:00 -25200     0      MST -1583683200
6 1942-02-09 09:00:00 -21600     1      MWT  -880210800

timeDate: coerce to regular time series

alignQuarterly(today())
GMT
[1] [2013-12-31]
tC = timeCalendar()
align(tC, by='2w', offset='3d')
GMT
 [1] [2013-01-04] [2013-01-18] [2013-02-01] [2013-02-15] [2013-03-01]
 [6] [2013-03-15] [2013-03-29] [2013-04-12] [2013-04-26] [2013-05-10]
[11] [2013-05-24] [2013-06-07] [2013-06-21] [2013-07-05] [2013-07-19]
[16] [2013-08-02] [2013-08-16] [2013-08-30] [2013-09-13] [2013-09-27]
[21] [2013-10-11] [2013-10-25] [2013-11-08] [2013-11-22]

Beyond the scope: time-series

Base R comes with a time-series class, ts. A ts object is a vector (or matrix) whose index is a regularly spaced time axis, defined by a start, an end, and a frequency, instead of the usual integer positions. Because ts only handles regular series, several other time series classes facilitate more complex time series needs.

  • ts
  • its
  • fts
  • xts
  • zoo
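
For a quick feel of the base ts class, here is a minimal sketch with made-up monthly data:

```r
# 24 monthly observations starting January 2012
x <- ts(1:24, start = c(2012, 1), frequency = 12)

# The time index is described by start, end, and frequency
idx <- tsp(x)

# Subset by time instead of by position: January to March 2013
q1 <- window(x, start = c(2013, 1), end = c(2013, 3))
```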

See the CRAN Time Series task view.