Preamble

In these notes we discuss methods for handling date and time information in R. We use the base package together with the lubridate package, which belongs to the Tidyverse family of packages; to know more, please go through Introduction to Tidyverse.
Our goal is to get familiar with date-time objects in R and to learn the operations that can be performed on them.

lubridate

lubridate is an R package dedicated to date and time information. It is good at handling time-related peculiarities that do not arise with other types of data: clock time differs from region to region, some regions observe daylight saving time, and the number of days in a year changes in leap years. The functions available in lubridate for date-time handling are robust to time zones, leap days and daylight saving time. Before diving into lubridate's functions, let us get to know the date and time classes in R.

Date and time class in R

  • In R, a date is printed as "yyyy-mm-dd".

  • Dates are stored as the number of days since 1970-01-01, with negative values for earlier dates. This is called the internal form. They are always printed according to the current Gregorian calendar.

  • A date-time is printed as "yyyy-mm-dd hour:minute:second", followed by the time zone (the deviation from GMT).

  • By default, a week starts on Sunday.

  • Date variables usually have data type "double" and object class "Date".

  • Date and time together with a time zone are stored in R in the POSIX classes (POSIXct and POSIXlt).
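
The internal form described above can be inspected with base R alone. The following is a minimal sketch; the variable names are only illustrative.

```r
# A Date is stored as the number of days since 1970-01-01
d <- as.Date("2021-06-30")
days_since_origin <- unclass(d)     # 18808
date_type  <- typeof(d)             # "double"
date_class <- class(d)              # "Date"

# Dates before the origin are stored as negative numbers
before_origin <- unclass(as.Date("1969-12-31"))   # -1

# A POSIXct date-time is stored as seconds since 1970-01-01 00:00:00 UTC
p <- as.POSIXct("2021-06-30 12:00:00", tz = "UTC")
seconds_since_origin <- as.numeric(p)
datetime_class <- class(p)          # "POSIXct" "POSIXt"
```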

Loading packages and data needed

Let us load the package

library(lubridate)

We shall use flights data from "nycflights13" package

library(nycflights13)
data("flights")

Let us look at first five rows of the data

year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest air_time distance hour minute time_hour
2013 1 1 517 515 2 830 819 11 UA 1545 N14228 EWR IAH 227 1400 5 15 2013-01-01 05:00:00
2013 1 1 533 529 4 850 830 20 UA 1714 N24211 LGA IAH 227 1416 5 29 2013-01-01 05:00:00
2013 1 1 542 540 2 923 850 33 AA 1141 N619AA JFK MIA 160 1089 5 40 2013-01-01 05:00:00
2013 1 1 544 545 -1 1004 1022 -18 B6 725 N804JB JFK BQN 183 1576 5 45 2013-01-01 05:00:00
2013 1 1 554 600 -6 812 837 -25 DL 461 N668DN LGA ATL 116 762 6 0 2013-01-01 06:00:00

The following table gives an introduction to the variables in the data and their types

Variable          Description                              Type
year              year of departure                        Numeric
month             month of departure                       Numeric
day               day of departure                         Numeric
dep_time          actual departure time                    Numeric
sched_dep_time    scheduled departure time                 Numeric
dep_delay         departure delay in minutes               Numeric
arr_time          actual arrival time                      Numeric
sched_arr_time    scheduled arrival time                   Numeric
arr_delay         arrival delay in minutes                 Numeric
carrier           airline carrier abbreviation             string
flight            flight number                            Numeric
tailnum           plane tail number                        string
origin            flight origin                            string
dest              flight destination                       string
air_time          time spent in air in minutes             Numeric
distance          distance between airports in miles       Numeric
hour              scheduled departure hour                 Numeric
minute            scheduled departure minute               Numeric
time_hour         scheduled date and hour of the flight    POSIXct

We shall use some variables from this data to learn lubridate

Managing date and time objects in R

Current system date and time extraction

today() can be used to extract current date

today()
## [1] "2021-07-08"

now() can be used to obtain current system date and time

now()
## [1] "2021-07-08 16:38:59 IST"

Assignment of Date objects

Date objects can be assigned easily using functions ymd, mdy, dmy. These functions transform dates stored in character and numeric vectors to Date objects. These functions recognize arbitrary non-digit separators as well as no separator. As long as the order of formats is correct, these functions will parse dates correctly even when the input vectors contain differently formatted dates.

Let us form a new variable by joining the 'year', 'month' and 'day' variables. For this we use the unite() function from the tidyr package; to know more about this package, please go through Data structuring with tidyr.

library(tidyr)
data=unite(data=flights[c("year","month","day")],col=date,sep=" ",remove=FALSE)

Let us look at the first few rows of the data
date year month day
2013 1 1 2013 1 1
2013 1 1 2013 1 1
2013 1 1 2013 1 1
2013 1 1 2013 1 1
2013 1 1 2013 1 1
2013 1 1 2013 1 1

The function ymd() is used to convert a string or numeric object into a Date object.

date_converted=ymd(data$date)
##Let us look at the date
head(date_converted)
## [1] "2013-01-01" "2013-01-01" "2013-01-01" "2013-01-01" "2013-01-01"
## [6] "2013-01-01"
class(date_converted)
## [1] "Date"
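
As an alternative to pasting the columns into a string first, lubridate also provides make_date(), which builds Date objects directly from numeric components. The following is a minimal sketch; the small data frame parts is a made-up stand-in for the year, month and day columns of flights.

```r
library(lubridate)

# Hypothetical stand-in for the year/month/day columns of flights
parts <- data.frame(year  = c(2013L, 2013L),
                    month = c(1L, 12L),
                    day   = c(1L, 31L))

# Build Date objects directly from the numeric components
built <- make_date(year = parts$year, month = parts$month, day = parts$day)
built
class(built)
```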

Note: in rare cases, when no separator is used and the month or day is not written with two digits, the input is ambiguous and might not be parsed correctly.

Example:

# 2021 march 1
ymd('202131')
## [1] NA
# 2021 january 12
ymd(2021112)
## [1] "0202-11-12"

We can overcome this by adding a separator

ymd("2021/1/31")
## [1] "2021-01-31"
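
When the date has to be written without separators, another way to remove the ambiguity is to zero-pad the month and day so that every date has exactly eight digits. A minimal sketch using base R's sprintf() (the component values are chosen only for illustration):

```r
library(lubridate)

# Zero-pad month and day so every date has exactly eight digits
padded <- sprintf("%04d%02d%02d", 2021L, c(3L, 1L), c(1L, 31L))
padded                  # "20210301" "20210131"

# Eight-digit strings are unambiguous even without separators
parsed <- ymd(padded)
parsed
```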

Similarly, we can convert dates in the formats 'month day year' or 'day month year'. Let us see some examples.

mdy(c('febr 1 2021','jun 30 2021'))
## [1] "2021-02-01" "2021-06-30"
dmy(c('1 1 21',"4 2 21"))
## [1] "2021-01-01" "2021-02-04"

Assignment of Date Time objects

Date-time objects can be assigned using ymd_hms(), mdy_hms() and dmy_hms(). These functions transform dates stored as character or numeric vectors into POSIXct objects.

For example we shall use the following data

data2=unite(flights[c("day","month","year","hour","minute")],
col=dt,sep="-",remove = F)
#First few rows of data
head(data2)

The function dmy_hm() is used to convert a string or numeric object into a date-time (POSIXct) object. We can specify the time zone using the tz argument. Please note that the available time zones differ from system to system.

date_time=dmy_hm(data2$dt,tz="EST")
x
2013-01-01 05:15:00
2013-01-01 05:29:00
2013-01-01 05:40:00
2013-01-01 05:45:00
2013-01-01 06:00:00
2013-01-01 05:58:00

We can change date format as in the below examples

# month date year hour minute
mdy_hm(c("2/22/2021/12/5","6/29/2021/19/51"),tz="UTC")
## [1] "2021-02-22 12:05:00 UTC" "2021-06-29 19:51:00 UTC"
# month date year hour 
mdy_h("12-23-2021-10-pm")
## [1] "2021-12-23 22:00:00 UTC"
# month date year hour minute second
mdy_hms(c("2/22/2021/12/5/22","6/29/2021/19/51/59"),tz="UTC")
## [1] "2021-02-22 12:05:22 UTC" "2021-06-29 19:51:59 UTC"
# year date month hour minute second
ydm_hms("2021 30 5 3 35 22 pm",tz="EST")
## [1] "2021-05-30 15:35:22 EST"

The following may be used to get a quick view of the available options.

Ready reckoner for lubridate

Order of elements in date-time                  Parse function
year, month, day                                ymd()
year, day, month                                ydm()
month, day, year                                mdy()
day, month, year                                dmy()
hour, minute                                    hm()
hour, minute, second                            hms()
year, month, day, hour, minute, second          ymd_hms()
year, month, day, hour, minute                  ymd_hm()
year, month, day, hour                          ymd_h()
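
Note that hm() and hms() return Period objects (a length of time) rather than date-times, so they are typically combined with a date. A minimal sketch, with values chosen only for illustration:

```r
library(lubridate)

dep <- hm("5:15")                      # a Period of 5 hours 15 minutes
dep_is_period <- is.period(dep)        # TRUE
dep_seconds <- period_to_seconds(dep)  # 18900

# Adding the period to midnight gives the full date-time
midnight <- ymd_hms("2013-01-01 00:00:00")
dep_time <- midnight + dep
dep_time
```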

Merging data to form date

We can merge different columns of the data holding year, month and day into a date-time using ISOdate(). The year, month, day, hour, min, sec and tz arguments specify which column supplies which part of the date-time.

# Assign to a descriptive name; a variable called t would mask base R's t()
flight_time <- ISOdate(year  = flights$year,
                       month = flights$month,
                       day   = flights$day,
                       hour  = flights$hour,
                       min   = flights$minute)

The resulting data will look like this
x
2013-01-01 05:15:00
2013-01-01 05:29:00
2013-01-01 05:40:00
2013-01-01 05:45:00
2013-01-01 06:00:00
2013-01-01 05:58:00
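
One detail worth keeping in mind: when hour is not supplied, ISOdate() defaults to 12:00 (noon) in GMT rather than midnight. lubridate's make_datetime() is a similar helper that defaults to 00:00:00 UTC. A short sketch comparing the two:

```r
library(lubridate)

noon <- ISOdate(2013, 1, 1)              # hour defaults to 12
noon_text <- format(noon, "%Y-%m-%d %H:%M:%S")

midnight <- make_datetime(2013, 1, 1)    # hour, min and sec default to 0
midnight_text <- format(midnight, "%Y-%m-%d %H:%M:%S")

# Supplying hour and minute gives the same instant from both helpers
same <- ISOdate(2013, 1, 1, 5, 15) == make_datetime(2013, 1, 1, 5, 15)
```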

Number to date

Numbers in internal form (integers) can be converted into dates using as_date().
We specify the number of days from the origin and obtain the date.

# default origin
as_date(18808)
## [1] "2021-06-30"
# specific origin (can be given as string)
as_date(c(234,22),origin="2020-12-31")
## [1] "2021-08-22" "2021-01-22"

as_datetime() can be used to obtain time from origin using Number of seconds

as_datetime(60)
## [1] "1970-01-01 00:01:00 UTC"
as_datetime(60*60*24)
## [1] "1970-01-02 UTC"

Decimal form of date in a year

decimal_date() converts a date into a decimal number of years. The digits after the decimal point represent the position of the date within the year.
Example

decimal_date(dmy(30062021))
## [1] 2021.493

date_decimal() is used to convert decimal form to date

date_decimal(2021.493)
## [1] "2021-06-29 22:40:47 UTC"
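
The decimal part can be reproduced by hand as the fraction of the year that has already elapsed. The sketch below is only an illustration of that arithmetic for a plain date (midnight UTC); it is not meant to describe how lubridate computes the value internally.

```r
library(lubridate)

d <- ymd("2021-06-30")
elapsed_days <- yday(d) - 1                       # 180 full days before 30 June
days_in_year <- ifelse(leap_year(d), 366, 365)    # 365 for 2021

manual <- year(d) + elapsed_days / days_in_year   # approximately 2021.493
from_lubridate <- decimal_date(d)
```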

Parsing date time

parse_date_time() parses an input vector into a POSIXct date-time object. There are two advantages of using this function. First, the orders argument lets us specify the order in which the elements occur, without having to include separators or the % prefix. Second, it allows the user to specify several format orders to handle heterogeneous date-time character representations.
Example

x <- c("21-01-01", "07-01-02", "09-01-03")
parse_date_time(x,orders =  "ymd")
## [1] "2021-01-01 UTC" "2007-01-02 UTC" "2009-01-03 UTC"
# heterogeneous date times
x <- c("20-01-01", "240102", "15-01 03", "90-01-03 12:02")
parse_date_time(x,orders =  c("ymd", "ymd HM"))
## [1] "2020-01-01 00:00:00 UTC" "2024-01-02 00:00:00 UTC"
## [3] "2015-01-03 00:00:00 UTC" "1990-01-03 12:02:00 UTC"
#  truncated time-dates 
x <- c("2011-12-31 12:59:59", "2010-01-01 12:11",
       "2010-01-01 12", "2010-01-01")
parse_date_time(x, "Ymd HMS", truncated = 3)
## [1] "2011-12-31 12:59:59 UTC" "2010-01-01 12:11:00 UTC"
## [3] "2010-01-01 12:00:00 UTC" "2010-01-01 00:00:00 UTC"
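
To make the first advantage concrete, the sketch below parses the same strings once with base R, where the format string needs % prefixes and the exact separators, and once with parse_date_time(), where only the order of the elements is given.

```r
library(lubridate)

x <- c("21-01-01", "07-01-02", "09-01-03")

# Base R: the format string needs % prefixes and the exact separators
base_parsed <- as.POSIXct(x, format = "%y-%m-%d", tz = "UTC")

# lubridate: only the order of the elements is given
lub_parsed <- parse_date_time(x, orders = "ymd")

same_result <- all(base_parsed == lub_parsed)
```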

Extracting and modifying components of date time objects

We can either extract or modify various components of date time objects using following functions.
For this let us use the following data

datan=ymd_hms(c("2021 12 21 08 42 22 ",
                "2021 10 2 23 22 30 ",
                "2021 3 11 16 12 2 "))

datan
## [1] "2021-12-21 08:42:22 UTC" "2021-10-02 23:22:30 UTC"
## [3] "2021-03-11 16:12:02 UTC"

date() is used to extract the date component (without the time)

date(datan)
## [1] "2021-12-21" "2021-10-02" "2021-03-11"

month() is used to extract the month of the year. By default it is returned as a number.

month(datan)
## [1] 12 10  3

label argument is used to obtain month names.

month(datan,label=T)
## [1] Dec Oct Mar
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

By default these are abbreviated, to get full name we use abbr argument

month(datan,label=T,abbr=F)
## [1] December October  March   
## 12 Levels: January < February < March < April < May < June < ... < December

year() is used to extract the year from a date object. isoyear() returns years according to the ISO 8601 week calendar. epiyear() returns years according to the epidemiological week calendars.

year(datan)
## [1] 2021 2021 2021

day of the month can be extracted using day()

day(datan)
## [1] 21  2 11

We can find which day of the week given date is using wday()

wday(datan)
## [1] 3 7 5

label argument is used to obtain day names.

wday(datan,label=T)
## [1] Tue Sat Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

By default these are abbreviated, to get full name we use abbr argument

wday(datan,label=T,abbr=F)
## [1] Tuesday  Saturday Thursday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
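
The numbering above assumes that the week starts on Sunday. wday() also accepts a week_start argument (1 for Monday through 7 for Sunday), so the numbering can be changed when needed. A minimal sketch:

```r
library(lubridate)

d <- ymd("2021-12-21")                      # a Tuesday

sunday_start <- wday(d, week_start = 7)     # 3: Sunday is day 1
monday_start <- wday(d, week_start = 1)     # 2: Monday is day 1
```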

We can find the day of the quarter using qday()

qday(datan)
## [1] 82  2 70

To find out in which week of the year the date occurs, we use week()

week(datan)
## [1] 51 40 10

hour() is used to extract hour from date

hour(datan)
## [1]  8 23 16

minute() is used to extract minute component from the date

minute(datan)
## [1] 42 22 12

second() is used to extract second component from the date

second(datan)
## [1] 22 30  2

semester() is used to find which semester (half of the year) the date belongs to

semester(datan)
## [1] 2 2 1

We can also use these functions to assign or change a component of a date. One example is shown below.

data12=ymd_hms(c("2019 12 2 2 34 22","2020 05 14 9 8 7"))
data12
## [1] "2019-12-02 02:34:22 UTC" "2020-05-14 09:08:07 UTC"
# Changing hour component
hour(data12)=c(1,2)
data12
## [1] "2019-12-02 01:34:22 UTC" "2020-05-14 02:08:07 UTC"

In a similar way we can set the other components.
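
Several components can also be changed in a single call with update(), for which lubridate supplies a date-time method. A short sketch with illustrative values:

```r
library(lubridate)

x <- ymd_hms("2019-12-02 02:34:22")

# Replace a single component with an assignment
month(x) <- 1
x                                        # 2019-01-02 02:34:22 UTC

# Replace several components at once with update()
y <- update(x, year = 2020, hour = 5)
y                                        # 2020-01-02 05:34:22 UTC
```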

am() and pm() are used to check whether a time falls before or after noon

am(datan)
## [1]  TRUE FALSE FALSE
pm(datan)
## [1] FALSE  TRUE  TRUE

leap_year() can be used to check whether it is leap year or not

leap_year(datan)
## [1] FALSE FALSE FALSE

Approximating dates

Date time objects can be approximated in different ways as below

Floor

floor_date() rounds a date-time down to the nearest boundary of the given unit. The unit argument specifies which component is considered for rounding.

datan
## [1] "2021-12-21 08:42:22 UTC" "2021-10-02 23:22:30 UTC"
## [3] "2021-03-11 16:12:02 UTC"
floor_date(datan,unit='month')
## [1] "2021-12-01 UTC" "2021-10-01 UTC" "2021-03-01 UTC"

Round

round_date() rounds to the nearest boundary of the given unit

round_date(datan,unit="day")
## [1] "2021-12-21 UTC" "2021-10-03 UTC" "2021-03-12 UTC"

Ceiling

ceiling_date() rounds up to the nearest boundary of the given unit

ceiling_date(datan,unit="hour")
## [1] "2021-12-21 09:00:00 UTC" "2021-10-03 00:00:00 UTC"
## [3] "2021-03-11 17:00:00 UTC"
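
The unit argument of these three functions also accepts multiples of a unit, which is handy for grouping observations into fixed bins. A minimal sketch using 15-minute bins (the timestamp is chosen only for illustration):

```r
library(lubridate)

x <- ymd_hms("2021-12-21 08:42:22")

floored <- floor_date(x, unit = "15 minutes")     # 2021-12-21 08:30:00 UTC
rounded <- round_date(x, unit = "15 minutes")     # 2021-12-21 08:45:00 UTC
ceiled  <- ceiling_date(x, unit = "15 minutes")   # 2021-12-21 08:45:00 UTC
```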

Rollback

rollback() changes a date to the last day of the previous month

rollback(datan)
## [1] "2021-11-30 08:42:22 UTC" "2021-09-30 23:22:30 UTC"
## [3] "2021-02-28 16:12:02 UTC"

The roll_to_first argument is used to go to the first day of the current month instead. The preserve_hms argument is used to keep the time as it is.

rollback(datan,roll_to_first = T,preserve_hms = T)
## [1] "2021-12-01 08:42:22 UTC" "2021-10-01 23:22:30 UTC"
## [3] "2021-03-01 16:12:02 UTC"

Timezone handling

We can change time saved in one timezone to another using with_tz(). tzone is used to specify time zone.

# Original time
datan
## [1] "2021-12-21 08:42:22 UTC" "2021-10-02 23:22:30 UTC"
## [3] "2021-03-11 16:12:02 UTC"
# Converted time
with_tz(datan,tzone="EST")
## [1] "2021-12-21 03:42:22 EST" "2021-10-02 18:22:30 EST"
## [3] "2021-03-11 11:12:02 EST"

force_tz() changes the time zone without converting the clock time

force_tz(datan,tzone='EST')
## [1] "2021-12-21 08:42:22 EST" "2021-10-02 23:22:30 EST"
## [3] "2021-03-11 16:12:02 EST"
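
The difference between the two functions is easiest to see by comparing instants: with_tz() keeps the same moment in time and only changes how it is displayed, whereas force_tz() keeps the clock time and therefore refers to a different moment. A minimal sketch (it assumes the "EST" zone is available on the system):

```r
library(lubridate)

x <- ymd_hms("2021-12-21 08:42:22", tz = "UTC")

shown_in_est  <- with_tz(x, tzone = "EST")     # same instant, new clock time
forced_to_est <- force_tz(x, tzone = "EST")    # same clock time, new instant

same_instant <- shown_in_est == x                                    # TRUE
shift_hours  <- as.numeric(difftime(forced_to_est, x, units = "hours"))  # 5
```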

Stamp date times

stamp() creates a template from an example string, which we can then apply to new date-times

eg=stamp("The example is created for Thursday 1 june,2021 9:20")
eg(datan)
## [1] "The example is created for Tuesday 21 December,2021 08:42"
## [2] "The example is created for Saturday 02 October,2021 23:22"
## [3] "The example is created for Thursday 11 March,2021 16:12"

Daylight saving

Daylight saving is the practice of advancing clocks by one hour during the warmer months, typically starting in late winter or early spring, so that darkness falls at a later clock time, and setting clocks back by one hour in autumn.

Check for daylight saving

dst() returns a logical value: TRUE if DST is in force, FALSE if not, NA if unknown.

dst(datan)
## [1] FALSE FALSE FALSE
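
All three values are FALSE here because datan is stored in UTC, which does not observe daylight saving. The sketch below repeats the check in a zone that does; "America/New_York" is an assumed example zone and must be available on the system.

```r
library(lubridate)

# "America/New_York" is an assumed example zone that observes DST
winter <- ymd_hms("2018-01-15 12:00:00", tz = "America/New_York")
summer <- ymd_hms("2018-07-15 12:00:00", tz = "America/New_York")

winter_dst <- dst(winter)                  # FALSE
summer_dst <- dst(summer)                  # TRUE
utc_dst    <- dst(with_tz(summer, "UTC"))  # FALSE: UTC has no daylight saving
```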

Some mathematical operations with date time objects

We can perform some mathematical operations on date-time objects to obtain other dates. Remember that dates are saved in internal form as numbers of days, so we can add integers to or subtract integers from dates to obtain other dates.

For example, we can move forward by a number of days using '+'

data1=ymd(c('2021 3 2','2020 12 24'))
# 31 days from our dates
data11=data1+31

Difference between two dates can be found using '-'

data11-data1
## Time differences in days
## [1] 31 31

We can go back by a number of days using the '-' operator

data1-20
## [1] "2021-02-10" "2020-12-04"
# examples with time zones
x <- ymd_hms("2015-09-22 01:00:00", tz = "US/Eastern")
y <- ymd_hms("2015-09-22 01:00:00", tz = "US/Pacific")
z <- ymd_hms("2019-09-22 21:00:00", tz = "US/Pacific")

# moving forward by 3 days 19 hours (4 days minus 5 hours)
x+days(4)-hours(5)
## [1] "2015-09-25 20:00:00 EDT"
# Checking for equality
y == x
## [1] FALSE
# Difference between two different timezones
y - x
## Time difference of 3 hours
# same timezone
z-y
## Time difference of 1461.833 days
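
Subtracting two dates returns a difftime object whose unit is chosen automatically. When a plain number in a specific unit is needed, it can be extracted with as.numeric() and its units argument. A minimal base R sketch with illustrative dates:

```r
start <- as.Date("2021-03-02")
end   <- start + 31

gap <- end - start                              # a difftime, printed in days

gap_days  <- as.numeric(gap, units = "days")    # 31
gap_weeks <- as.numeric(gap, units = "weeks")   # 31 / 7
gap_hours <- as.numeric(gap, units = "hours")   # 744
```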

Sequence generation using hours

Creating sequences of dates or times is very similar to creating sequences of numbers. However, we need to make sure the endpoints are date or date-time objects. This is done with seq(); the by argument specifies the difference between consecutive elements of the sequence.

# Using days
seq(ymd("2015-1-1"), 
    ymd("2015-1-15"),
    by = "2 days")
## [1] "2015-01-01" "2015-01-03" "2015-01-05" "2015-01-07" "2015-01-09"
## [6] "2015-01-11" "2015-01-13" "2015-01-15"
# Using hours to generate sequence
seq(ymd_hms("2015 1 1 4 30 22",tz='EST'),
    ymd_hms("2015 1 1 7 31 00",tz='EST'),
    by= "hours")
## [1] "2015-01-01 04:30:22 EST" "2015-01-01 05:30:22 EST"
## [3] "2015-01-01 06:30:22 EST" "2015-01-01 07:30:22 EST"
# Using months 
seq(ymd("2025-1-1"), ymd("2025-12-15"), by = "4 months")
## [1] "2025-01-01" "2025-05-01" "2025-09-01"
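
Instead of an end point, seq() can also be given the number of elements through its length.out argument. A minimal base R sketch with illustrative start values:

```r
# Four weekly dates starting from 1 January 2015
weekly <- seq(as.Date("2015-01-01"), by = "week", length.out = 4)
weekly

# Four date-times, six hours apart
six_hourly <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
                  by = "6 hours", length.out = 4)
six_hourly
```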

Periods

Periods track changes in clock times and ignore timeline irregularities. We can define a period using the following functions, where x represents a number.

Function            Period created
years(x)            x years
months(x)           x months
weeks(x)            x weeks
days(x)             x days
hours(x)            x hours
minutes(x)          x minutes
seconds(x)          x seconds
milliseconds(x)     x milliseconds
microseconds(x)     x microseconds
nanoseconds(x)      x nanoseconds
picoseconds(x)      x picoseconds

For example

seconds(2)
## [1] "2S"
months(2)
## [1] "2m 0d 0H 0M 0S"
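
Because months differ in length, adding a month period to a date such as 31 January can land on a day that does not exist; lubridate returns NA in that case. The %m+% operator rolls such results back to the last valid day of the month. A brief sketch:

```r
library(lubridate)

jan31 <- ymd("2021-01-31")

plain_add  <- jan31 + months(1)       # NA, since 31 February does not exist
rolled_add <- jan31 %m+% months(1)    # 2021-02-28, rolled back to the month end
month_ends <- jan31 %m+% months(0:2)  # month ends for January to March
```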

Periods have some drawbacks, which arise because of daylight saving time and leap years. Consider the following dates.

A normal day

normal <- ymd_hms("2018-01-01 01:30:00",tz="US/Eastern")

The start of daylight savings (spring forward)

gap_day <- ymd_hms("2018-03-11 01:30:00",tz="US/Eastern")

The end of daylight savings (fall back)

lap_day <- ymd_hms("2018-11-04 00:30:00",tz="US/Eastern")

Leap years (a date that is followed by a leap day within the next year)

leap_day <- ymd("2023-03-01")

The following result is correct, since it is a normal day

normal + minutes(90)
## [1] "2018-01-01 03:00:00 EST"

On the day daylight saving starts, the clock skips one hour, and this needs to be taken into account.

gap_day
## [1] "2018-03-11 01:30:00 EST"
gap_day+minutes(90)
## [1] "2018-03-11 03:00:00 EDT"

We should be getting 04:00 EDT because of daylight saving in the region, but the period does not take this into account.

At the end of the daylight saving period we need to take into account that the clock is set back by one hour

lap_day
## [1] "2018-11-04 00:30:00 EDT"
lap_day+minutes(90)
## [1] "2018-11-04 02:00:00 EST"

And leap year calculation

leap_day+years(1)
## [1] "2024-03-01"

The above miscalculations can be handled by using 'durations' available in lubridate

Durations

Durations simply measure the time span between start and end dates. They track the passage of physical time, which deviates from clock time when irregularities occur. They can be obtained using the following functions.

Function             Duration
dyears(x)            31557600x seconds (365.25 days)
dmonths(x)           2629800x seconds
dweeks(x)            604800x seconds
ddays(x)             86400x seconds
dhours(x)            3600x seconds
dminutes(x)          60x seconds
dseconds(x)          x seconds
dmilliseconds(x)     x * 10^-3 seconds
dmicroseconds(x)     x * 10^-6 seconds
dnanoseconds(x)      x * 10^-9 seconds
dpicoseconds(x)      x * 10^-12 seconds

Let us apply durations to the dates defined earlier

normal + dminutes(90)
## [1] "2018-01-01 03:00:00 EST"
gap_day + dminutes(90)
## [1] "2018-03-11 04:00:00 EDT"
lap_day+dminutes(90)
## [1] "2018-11-04 01:00:00 EST"
leap_day + dyears(1)
## [1] "2024-02-29 06:00:00 UTC"

We can see that problems discussed with period have been solved.
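
One way to see the difference between the two is to measure how much physical time actually elapsed after each addition. The sketch below reuses the daylight saving start date from the examples above and assumes the "US/Eastern" zone is available on the system.

```r
library(lubridate)

gap_day <- ymd_hms("2018-03-11 01:30:00", tz = "US/Eastern")

with_period   <- gap_day + minutes(90)     # clock time moves by 90 minutes
with_duration <- gap_day + dminutes(90)    # 90 real minutes elapse

# Physical minutes that passed in each case
elapsed_period   <- as.numeric(difftime(with_period, gap_day, units = "mins"))
elapsed_duration <- as.numeric(difftime(with_duration, gap_day, units = "mins"))
```

With the outputs shown earlier, the period lands on 03:00 EDT, which is only 30 physical minutes after 01:30 EST, while the duration lands on 04:00 EDT, a full 90 minutes later.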

Intervals

They represent specific intervals of the timeline, bounded by start and end date-times. interval(a,b) is used to create interval from a to b

i=interval(normal, normal + minutes(90))
i
## [1] 2018-01-01 01:30:00 EST--2018-01-01 03:00:00 EST
j=interval(datan,leap_day)
j
## [1] 2021-12-21 08:42:22 UTC--2023-03-01 UTC
## [2] 2021-10-02 23:22:30 UTC--2023-03-01 UTC
## [3] 2021-03-11 16:12:02 UTC--2023-03-01 UTC

a %within% b is used to check whether the interval or date-time a falls within the interval b

i  %within%  j
## [1] FALSE FALSE FALSE
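
The result is FALSE for all three because i lies in 2018, before any of the intervals in j begins. Two other common operations on intervals are measuring their length and testing a single date-time against them. A minimal sketch with illustrative values in UTC:

```r
library(lubridate)

start <- ymd_hms("2018-01-01 01:30:00", tz = "UTC")
i <- interval(start, start + minutes(90))

len_seconds <- int_length(i)                     # 5400
len_minutes <- time_length(i, unit = "minute")   # 90

# A single date-time can also be tested against an interval
inside  <- ymd_hms("2018-01-01 02:00:00", tz = "UTC") %within% i   # TRUE
outside <- ymd_hms("2018-01-01 04:00:00", tz = "UTC") %within% i   # FALSE
```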

Final Note

We have seen how we can handle date and time objects in R using the base and lubridate packages. We have covered

  • Assignment
  • Conversion
  • Components of date time objects
  • Daylight saving handling
  • Approximations
  • Timezone handling
  • Using stamps

We have given an introduction to almost all the important tools for handling date-time components in R. We have used ISOdate() and seq() from the base package and unite() from the tidyr package; all other functions are from the lubridate package.