Preamble
In these notes we discuss methods for handling date and time information in R. We use base R together with the lubridate package, which is part of the tidyverse; to know more, please go through Introduction to Tidyverse.
Our goal is to get familiar with date-time objects in R and to learn the operations available for them.
lubridate
lubridate is an R package dedicated to date and time information, and it handles time-related peculiarities well. Clock time depends on the region, some regions observe daylight saving time, and the number of days in a year changes in leap years; peculiarities like these do not arise for most other types of data. The methods available in lubridate for date-time handling are robust to time zones, leap days and daylight saving time. Before diving into lubridate's functions, let us get to know the date and time classes in R.
Date and time class in R
In R, a date is represented as "yyyy-mm-dd".
Dates are stored as the number of days since 1970-01-01, with negative values for earlier dates. This is called the internal form. They are always printed according to the current Gregorian calendar.
A date-time is represented in R as "yyyy-mm-dd hour:minute:second", followed by the time zone.
By default, lubridate treats Sunday as the first day of the week.
Date variables mostly have data type "double" and object class "Date".
Date and time along with the time zone are stored in R in the POSIXct and POSIXlt classes. POSIXct stores the number of seconds since 1970-01-01 00:00:00 UTC.
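The internal form described above can be inspected directly. The following is a minimal sketch in base R; the variable names are only illustrative.

```r
# Inspect how base R stores a Date and a POSIXct value internally.
d <- as.Date("2021-06-30")
days_since_origin <- unclass(d)   # number of days since 1970-01-01
storage_type <- typeof(d)         # underlying data type
object_class <- class(d)          # object class

dt <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
seconds_since_origin <- as.numeric(dt)  # seconds since 1970-01-01 00:00:00 UTC

print(days_since_origin)                  # 18808
print(storage_type)                       # "double"
print(object_class)                       # "Date"
print(seconds_since_origin)               # 60
print(as.numeric(as.Date("1969-12-31")))  # -1, dates before the origin are negative
```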
Loading packages and data needed
Let us load the package
library(lubridate)
We shall use the flights data from the "nycflights13" package
library(nycflights13)
data("flights")
Let us look at the first five rows of the data
| year | month | day | dep_time | sched_dep_time | dep_delay | arr_time | sched_arr_time | arr_delay | carrier | flight | tailnum | origin | dest | air_time | distance | hour | minute | time_hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013 | 1 | 1 | 517 | 515 | 2 | 830 | 819 | 11 | UA | 1545 | N14228 | EWR | IAH | 227 | 1400 | 5 | 15 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 533 | 529 | 4 | 850 | 830 | 20 | UA | 1714 | N24211 | LGA | IAH | 227 | 1416 | 5 | 29 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 542 | 540 | 2 | 923 | 850 | 33 | AA | 1141 | N619AA | JFK | MIA | 160 | 1089 | 5 | 40 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 544 | 545 | -1 | 1004 | 1022 | -18 | B6 | 725 | N804JB | JFK | BQN | 183 | 1576 | 5 | 45 | 2013-01-01 05:00:00 |
| 2013 | 1 | 1 | 554 | 600 | -6 | 812 | 837 | -25 | DL | 461 | N668DN | LGA | ATL | 116 | 762 | 6 | 0 | 2013-01-01 06:00:00 |
The following table gives an introduction about variables in the data and their type
| Variable | Description | Type |
|---|---|---|
| year | year of departure | Numeric |
| month | month of departure | Numeric |
| day | day of departure | Numeric |
| dep_time | actual departure time | Numeric |
| sched_dep_time | scheduled departure time | Numeric |
| dep_delay | departure delay in minutes | Numeric |
| arr_time | actual arrival time | Numeric |
| sched_arr_time | scheduled arrival time | Numeric |
| arr_delay | arrival delay in minutes | Numeric |
| carrier | airline carrier abbreviation | string |
| flight | Flight number | Numeric |
| tailnum | Plane tail number | string |
| origin | Flight origin | string |
| dest | Flight destination | string |
| air_time | Time spent in air in minutes | Numeric |
| distance | distance between airports in miles | Numeric |
| hour | scheduled departure hour | Numeric |
| minute | scheduled departure minute | Numeric |
| time_hour | Scheduled date and hour of the flight as a POSIXct date | POSIXct |
We shall use some variables from this data to learn lubridate
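The time columns such as sched_dep_time hold clock times as plain numbers of the form HHMM, which is why 515 corresponds to hour 5 and minute 15 in the rows shown above. A minimal sketch of splitting such numbers with integer arithmetic, using the first five scheduled departure times typed in by hand (the variable names are illustrative):

```r
# sched_dep_time values copied from the first five rows of the flights data
sched_dep_time <- c(515, 529, 540, 545, 600)

hour_part <- sched_dep_time %/% 100    # integer division keeps the hours
minute_part <- sched_dep_time %% 100   # the remainder keeps the minutes

print(hour_part)    # 5 5 5 5 6
print(minute_part)  # 15 29 40 45 0
```

The results agree with the hour and minute columns of the same rows.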
Managing date and time objects in R
Current system date and time extraction
today() can be used to extract the current date
today()
## [1] "2021-07-08"
now() can be used to obtain the current system date and time
now()
## [1] "2021-07-08 16:38:59 IST"
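Both functions use the system time zone unless told otherwise. As a small sketch, the tzone argument can be used to ask for the current date or time in another zone; "UTC" is chosen here only as an example, and no output is shown because it depends on when the code is run.

```r
library(lubridate)

# Current date and time expressed in UTC instead of the system time zone
current_date <- today(tzone = "UTC")
current_time <- now(tzone = "UTC")

print(class(current_date))  # "Date"
print(class(current_time))  # "POSIXct" "POSIXt"
```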
Assignment of Date objects
Date objects can be assigned easily using the functions ymd(), mdy() and dmy(). These functions transform dates stored in character and numeric vectors to Date objects. They recognize arbitrary non-digit separators as well as no separator. As long as the order of the components (year, month, day) is correct, these functions will parse dates correctly even when the input vectors contain differently formatted dates.
Let us form a new variable by joining the 'year', 'month' and 'day' variables. For this we use the unite() function from the tidyr package; to know more about this package, please go through Data structuring with tidyr.
library(tidyr)
data=unite(data=flights[c("year","month","day")],col=date,sep=" ",remove=F)
| date | year | month | day |
|---|---|---|---|
| 2013 1 1 | 2013 | 1 | 1 |
| 2013 1 1 | 2013 | 1 | 1 |
| 2013 1 1 | 2013 | 1 | 1 |
| 2013 1 1 | 2013 | 1 | 1 |
| 2013 1 1 | 2013 | 1 | 1 |
| 2013 1 1 | 2013 | 1 | 1 |
The function ymd() is used to convert a string or numeric object into a Date object.
date_converted=ymd(data$date)
# Let us look at the dates
head(date_converted)
## [1] "2013-01-01" "2013-01-01" "2013-01-01" "2013-01-01" "2013-01-01"
## [6] "2013-01-01"
class(date_converted)
## [1] "Date"
Note: when no separator is used and some components are not written with two digits, the input is ambiguous and may not be parsed correctly.
Example:
# intended: 2021 March 1
ymd('202131')
## [1] NA
# intended: 2021 January 12
ymd(2021112)
## [1] "0202-11-12"
We can overcome this by adding a separator
ymd("2021/1/31")
## [1] "2021-01-31"
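Another way around the ambiguity, sketched below, is to zero-pad the month and day so that every component has a fixed width. The same sketch also illustrates that differently separated inputs in one vector are parsed consistently; the variable names are illustrative.

```r
library(lubridate)

# Zero-padded input without separators is unambiguous
padded_chr <- ymd("20210301")   # 2021 March 1
padded_num <- ymd(20210112)     # 2021 January 12, given as a number

# Different separators in the same vector
mixed <- ymd(c("2021-03-01", "2021/3/1", "2021 3 1"))

print(padded_chr)  # "2021-03-01"
print(padded_num)  # "2021-01-12"
print(mixed)       # "2021-03-01" "2021-03-01" "2021-03-01"
```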
Similarly, we can convert dates in the format 'month day year' or 'day month year'. Let us see some examples
mdy(c('febr 1 2021','jun 30 2021'))
## [1] "2021-02-01" "2021-06-30"
dmy(c('1 1 21',"4 2 21"))
## [1] "2021-01-01" "2021-02-04"
Assignment of Date Time objects
Date-time objects can be assigned using ymd_hms(), mdy_hms(), dmy_hms() and related functions. These functions transform dates stored as character or numeric vectors to POSIXct objects.
For example we shall use the following data
data2=unite(flights[c("day","month","year","hour","minute")],
col=dt,sep="-",remove = F)
# First few rows of data
head(data2)
The function dmy_hm() is used to convert a string or numeric object into a date-time (POSIXct) object. We can specify the time zone using the tz argument. Please note that the available time zones differ from system to system.
date_time=dmy_hm(data2$dt,tz="EST")
| x |
|---|
| 2013-01-01 05:15:00 |
| 2013-01-01 05:29:00 |
| 2013-01-01 05:40:00 |
| 2013-01-01 05:45:00 |
| 2013-01-01 06:00:00 |
| 2013-01-01 05:58:00 |
Other orders of the components are handled in the same way, as in the examples below
# month day year hour minute
mdy_hm(c("2/22/2021/12/5","6/29/2021/19/51"),tz="UTC")
## [1] "2021-02-22 12:05:00 UTC" "2021-06-29 19:51:00 UTC"
# month day year hour
mdy_h("12-23-2021-10-pm")
## [1] "2021-12-23 22:00:00 UTC"
# month day year hour minute second
mdy_hms(c("2/22/2021/12/5/22","6/29/2021/19/51/59"),tz="UTC")
## [1] "2021-02-22 12:05:22 UTC" "2021-06-29 19:51:59 UTC"
# year day month hour minute second
ydm_hms("2021 30 5 3 35 22 pm",tz="EST")
## [1] "2021-05-30 15:35:22 EST"
The following may be used to get a quick view of available options
| Order of elements in date-time | Parse function |
|---|---|
| year, month, day | ymd() |
| year, day, month | ydm() |
| month, day, year | mdy() |
| day, month, year | dmy() |
| hour, minute | hm() |
| hour, minute, second | hms() |
| year, month, day, hour, minute, second | ymd_hms() |
| year, month, day, hour, minute | ymd_hm() |
| year, month, day, hour | ymd_h() |
Merging data to form date
We can merge separate columns holding year, month and day into a date-time using ISOdate() from base R. The year, month, day, hour, min, sec and tz arguments specify which column supplies which part of the date-time.
merged_dt=ISOdate(year = flights$year,
month =flights$month,
day =flights$day,
hour=flights$hour,
min=flights$minute )
| x |
|---|
| 2013-01-01 05:15:00 |
| 2013-01-01 05:29:00 |
| 2013-01-01 05:40:00 |
| 2013-01-01 05:45:00 |
| 2013-01-01 06:00:00 |
| 2013-01-01 05:58:00 |
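lubridate offers make_datetime() for the same task. The sketch below builds three of the values above from hand-typed components rather than from the flights data, and also shows one detail of ISOdate() worth keeping in mind: when no hour is supplied it defaults to 12 (noon, GMT). The variable names are illustrative.

```r
library(lubridate)

# Components typed in by hand; in practice these would be data columns
merged <- make_datetime(year = c(2013, 2013, 2013),
                        month = c(1, 1, 1),
                        day = c(1, 1, 1),
                        hour = c(5, 5, 6),
                        min = c(15, 29, 0))
print(merged)
# "2013-01-01 05:15:00 UTC" "2013-01-01 05:29:00 UTC" "2013-01-01 06:00:00 UTC"

# ISOdate() without an hour falls back to 12:00
base_default <- ISOdate(2013, 1, 1)
print(base_default)  # "2013-01-01 12:00:00 GMT"
```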
Number to date
Numbers in internal form (the number of days since the origin) can be converted into dates using as_date().
We specify the number of days from the origin and obtain the date
# default origin
as_date(18808)
## [1] "2021-06-30"
# specific origin (can be given as string)
as_date(c(234,22),origin="2020-12-31")
## [1] "2021-08-22" "2021-01-22"
as_datetime() can be used to obtain a date-time from the number of seconds since the origin
as_datetime(60)
## [1] "1970-01-01 00:01:00 UTC"
as_datetime(60*60*24)
## [1] "1970-01-02 UTC"
Decimal form of date in a year
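decimal_date() is used to convert a date into a decimal number of years. The digits after the decimal point represent the position of the date within the year.
Example
decimal_date(dmy(30062021))
## [1] 2021.493
date_decimal() is used to convert the decimal form back to a date-time. Because the printed decimal is rounded to three places, the result is only approximately the original date.
date_decimal(2021.493)
## [1] "2021-06-29 22:40:47 UTC"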
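The decimal part can be reproduced by hand as the fraction of the year that has elapsed before the date. A minimal sketch for 30 June 2021, where 180 full days out of 365 have passed (the variable names are illustrative):

```r
library(lubridate)

d <- ymd("2021-06-30")
days_elapsed <- yday(d) - 1                       # full days before this date: 180
days_in_year <- if (leap_year(d)) 366 else 365    # 365 for 2021
by_hand <- year(d) + days_elapsed / days_in_year

print(round(by_hand, 3))          # 2021.493
print(round(decimal_date(d), 3))  # 2021.493
```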
Parsing date time
parse_date_time() parses an input vector into a POSIXct date-time object. There are two advantages of using this function. First, the orders argument lets us specify the order in which the components occur without including separators or the % prefix. Second, it allows the user to specify several format orders to handle heterogeneous date-time character representations.
Example
x <- c("21-01-01", "07-01-02", "09-01-03")
parse_date_time(x,orders = "ymd")
## [1] "2021-01-01 UTC" "2007-01-02 UTC" "2009-01-03 UTC"
# heterogeneous date times
x <- c("20-01-01", "240102", "15-01 03", "90-01-03 12:02")
parse_date_time(x,orders = c("ymd", "ymd HM"))
## [1] "2020-01-01 00:00:00 UTC" "2024-01-02 00:00:00 UTC"
## [3] "2015-01-03 00:00:00 UTC" "1990-01-03 12:02:00 UTC"
# truncated time-dates
x <- c("2011-12-31 12:59:59", "2010-01-01 12:11",
"2010-01-01 12", "2010-01-01")
parse_date_time(x, "Ymd HMS", truncated = 3)
## [1] "2011-12-31 12:59:59 UTC" "2010-01-01 12:11:00 UTC"
## [3] "2010-01-01 12:00:00 UTC" "2010-01-01 00:00:00 UTC"
Extracting and modifying components of date time objects
We can either extract or modify various components of date-time objects using the following functions.
For this let us use the following data
datan=ymd_hms(c("2021 12 21 08 42 22 ",
"2021 10 2 23 22 30 ",
"2021 3 11 16 12 2 "))
datan
## [1] "2021-12-21 08:42:22 UTC" "2021-10-02 23:22:30 UTC"
## [3] "2021-03-11 16:12:02 UTC"
date() is used to extract the date component, dropping the time
date(datan)
## [1] "2021-12-21" "2021-10-02" "2021-03-11"
month() is used to extract the month of the year. The default is a number.
month(datan)
## [1] 12 10 3
The label argument is used to obtain month names.
month(datan,label=T)
## [1] Dec Oct Mar
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
By default these are abbreviated; to get the full name we use the abbr argument
month(datan,label=T,abbr=F)
## [1] December October March
## 12 Levels: January < February < March < April < May < June < ... < December
year() is used to extract the year from a date object. isoyear() returns years according to the ISO 8601 week calendar. epiyear() returns years according to the epidemiological week calendars.
year(datan)
## [1] 2021 2021 2021
The day of the month can be extracted using day()
day(datan)
## [1] 21 2 11
We can find which day of the week a given date is using wday()
wday(datan)
## [1] 3 7 5
The label argument is used to obtain day names.
wday(datan,label=T)
## [1] Tue Sat Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
By default these are abbreviated; to get the full name we use the abbr argument
wday(datan,label=T,abbr=F)
## [1] Tuesday Saturday Thursday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
We can find the day of the quarter using qday()
qday(datan)
## [1] 82 2 70
To find out in which week of the year the date occurs, we use week()
week(datan)
## [1] 51 40 10
hour() is used to extract the hour component from the date-time
hour(datan)
## [1] 8 23 16
minute() is used to extract the minute component from the date-time
minute(datan)
## [1] 42 22 12
second() is used to extract the second component from the date-time
second(datan)
## [1] 22 30 2
semester() is used to find which half of the year the date belongs to
semester(datan)
## [1] 2 2 1
We can also use the component functions to assign or change a value in the date-time. One example is shown below
data12=ymd_hms(c("2019 12 2 2 34 22","2020 05 14 9 8 7"))
data12
## [1] "2019-12-02 02:34:22 UTC" "2020-05-14 09:08:07 UTC"
# Changing hour component
hour(data12)=c(1,2)
data12
## [1] "2019-12-02 01:34:22 UTC" "2020-05-14 02:08:07 UTC"
In a similar way we can set the other components as well.
am() and pm() are used to check whether a time falls before or after noon
am(datan)
## [1] TRUE FALSE FALSE
pm(datan)
## [1] FALSE TRUE TRUE
leap_year() can be used to check whether the year is a leap year or not
leap_year(datan)
## [1] FALSE FALSE FALSE
Approximating dates
Date time objects can be approximated in different ways as below
Floor
floor_date() can be used to round the date-time down to the nearest lower value. The unit argument is used to specify which component we consider for rounding
datan
## [1] "2021-12-21 08:42:22 UTC" "2021-10-02 23:22:30 UTC"
## [3] "2021-03-11 16:12:02 UTC"
floor_date(datan,unit='month')
## [1] "2021-12-01 UTC" "2021-10-01 UTC" "2021-03-01 UTC"
Round
round_date() is used to round to the nearest value of the given unit
round_date(datan,unit="day")
## [1] "2021-12-21 UTC" "2021-10-03 UTC" "2021-03-12 UTC"
Ceiling
ceiling_date() is used to round up to the nearest higher value of the given unit
ceiling_date(datan,unit="hour")
## [1] "2021-12-21 09:00:00 UTC" "2021-10-03 00:00:00 UTC"
## [3] "2021-03-11 17:00:00 UTC"
Rollback
rollback() is used to move a date back to the last day of the previous month
rollback(datan)
## [1] "2021-11-30 08:42:22 UTC" "2021-09-30 23:22:30 UTC"
## [3] "2021-02-28 16:12:02 UTC"
The roll_to_first argument rolls to the first day of the current month instead of the last day of the previous month. The preserve_hms argument is used to keep the time as it is
rollback(datan,roll_to_first = T,preserve_hms = T)
## [1] "2021-12-01 08:42:22 UTC" "2021-10-01 23:22:30 UTC"
## [3] "2021-03-01 16:12:02 UTC"
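To make the difference between the two rollback() variants concrete, here is a small sketch on a single date, compared with floor_date() at the month level. The variable names are illustrative.

```r
library(lubridate)

d <- ymd("2021-03-11")

last_of_previous <- rollback(d)                        # last day of February
first_of_current <- rollback(d, roll_to_first = TRUE)  # first day of March
floored <- floor_date(d, unit = "month")               # also first day of March

print(last_of_previous)  # "2021-02-28"
print(first_of_current)  # "2021-03-01"
print(floored)           # "2021-03-01"
```

For a plain date, rolling to the first day and flooring to the month give the same result; with date-times, rollback() keeps the clock time by default while floor_date() resets it to midnight.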
Timezone handling
We can convert a time saved in one time zone to another using with_tz(). The tzone argument is used to specify the target time zone.
# Original time
datan
## [1] "2021-12-21 08:42:22 UTC" "2021-10-02 23:22:30 UTC"
## [3] "2021-03-11 16:12:02 UTC"
# Converted time
with_tz(datan,tzone="EST")
## [1] "2021-12-21 03:42:22 EST" "2021-10-02 18:22:30 EST"
## [3] "2021-03-11 11:12:02 EST"
We can change the time zone while keeping the clock time unchanged using force_tz()
force_tz(datan,tzone='EST')
## [1] "2021-12-21 08:42:22 EST" "2021-10-02 23:22:30 EST"
## [3] "2021-03-11 16:12:02 EST"
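The practical difference is that with_tz() keeps the same instant and only changes how it is displayed, whereas force_tz() keeps the clock reading and therefore produces a different instant. A minimal sketch, assuming the "America/New_York" zone is available on the system (zone availability differs from system to system):

```r
library(lubridate)

x <- ymd_hms("2021-12-21 08:42:22", tz = "UTC")

same_instant <- with_tz(x, tzone = "America/New_York")   # 03:42:22 EST
relabelled <- force_tz(x, tzone = "America/New_York")    # 08:42:22 EST

print(x == same_instant)                       # TRUE: same moment in time
print(x == relabelled)                         # FALSE: a different moment
print(difftime(relabelled, x, units = "hours"))  # Time difference of 5 hours
```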
Stamp date times
stamp() can be used to create a template from an example string, which we can then apply to new date-times
eg=stamp("The example is created for Thursday 1 june,2021 9:20")
eg(datan)
## [1] "The example is created for Tuesday 21 December,2021 08:42"
## [2] "The example is created for Saturday 02 October,2021 23:22"
## [3] "The example is created for Thursday 11 March,2021 16:12"
Daylight saving
Daylight saving time is the practice of advancing clocks by one hour during the warmer months, typically starting in late winter or early spring, so that darkness falls at a later clock time. Clocks are set back by one hour in autumn.
Check for daylight saving
dst() returns a logical value: TRUE if DST is in force, FALSE if not, NA if unknown.
dst(datan)
## [1] FALSE FALSE FALSE
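The values in datan are in UTC, which never observes daylight saving, so the result is FALSE throughout. A short sketch with a zone that does observe it, assuming "America/New_York" is available on the system:

```r
library(lubridate)

winter <- ymd_hms("2021-01-15 12:00:00", tz = "America/New_York")
summer <- ymd_hms("2021-07-15 12:00:00", tz = "America/New_York")

print(dst(winter))                  # FALSE: standard time (EST)
print(dst(summer))                  # TRUE: daylight saving time (EDT)
print(dst(with_tz(summer, "UTC")))  # FALSE: same instant viewed in UTC
```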
Some mathematical operations with date time objects
We can perform some mathematical operations on date-time objects to obtain other dates. Remember that dates are stored in internal form as numbers of days, so we can add or subtract numbers to obtain other dates.
Example: we can move forward by a number of days using '+'
data1=ymd(c('2021 3 2','2020 12 24'))
# 31 days from our dates
data11=data1+31
The difference between two dates can be found using '-'
data11-data1
## Time differences in days
## [1] 31 31
We can go back by a number of days using the '-' operator
data1-20
## [1] "2021-02-10" "2020-12-04"
# examples with time zones
x <- ymd_hms("2015-09-22 01:00:00", tz = "US/Eastern")
y <- ymd_hms("2015-09-22 01:00:00", tz = "US/Pacific")
z <- ymd_hms("2019-09-22 21:00:00", tz = "US/Pacific")
# moving 3 days 19 hours
x+days(4)-hours(5)
## [1] "2015-09-25 20:00:00 EDT"
# Checking for equality
y == x
## [1] FALSE
# Difference between two different timezones
y - x
## Time difference of 3 hours
# same timezone
z-y
## Time difference of 1461.833 days
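Subtraction picks the unit of the result automatically (hours in one case above, days in the other). When a specific unit is needed, one option is to convert the difference explicitly, as in this small sketch with illustrative dates:

```r
library(lubridate)

gap <- ymd("2021-04-01") - ymd("2021-03-02")

in_days <- as.numeric(gap, units = "days")
in_hours <- as.numeric(gap, units = "hours")

print(gap)       # Time difference of 30 days
print(in_days)   # 30
print(in_hours)  # 720
```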
Sequence generation
Creating sequences of dates or times is very similar to creating sequences of numbers, but we need to make sure the end points are date or date-time objects. This is done with seq(), and the by argument is used to specify the difference between consecutive elements of the sequence.
# Using days
seq(ymd("2015-1-1"),
ymd("2015-1-15"),
by = "2 days")
## [1] "2015-01-01" "2015-01-03" "2015-01-05" "2015-01-07" "2015-01-09"
## [6] "2015-01-11" "2015-01-13" "2015-01-15"
# Using hours to generate sequence
seq(ymd_hms("2015 1 1 4 30 22",tz='EST'),
ymd_hms("2015 1 1 7 31 00",tz='EST'),
by= "hours")
## [1] "2015-01-01 04:30:22 EST" "2015-01-01 05:30:22 EST"
## [3] "2015-01-01 06:30:22 EST" "2015-01-01 07:30:22 EST"
# Using months
seq(ymd("2025-1-1"), ymd("2025-12-15"), by = "4 months")
## [1] "2025-01-01" "2025-05-01" "2025-09-01"
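Instead of an end point, seq() also accepts the number of elements wanted through the length.out argument. A minimal sketch generating four weekly dates:

```r
library(lubridate)

# Four dates, one week apart, starting on 2015-01-01
weekly <- seq(ymd("2015-01-01"), by = "week", length.out = 4)

print(weekly)
# "2015-01-01" "2015-01-08" "2015-01-15" "2015-01-22"
```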
Periods
Periods track changes in clock times and ignore timeline irregularities. We can define a period using the following functions
| Function | Period created |
|---|---|
| years(x ) | x years |
| months(x) | x months |
| weeks(x) | x weeks |
| days(x ) | x days |
| hours(x ) | x hours |
| minutes(x ) | x minutes |
| seconds(x ) | x seconds |
| milliseconds(x ) | x milliseconds |
| microseconds(x ) | x microseconds |
| nanoseconds(x ) | x nanoseconds |
| picoseconds(x ) | x picoseconds |
Here x represents a number.
For example
seconds(2)
## [1] "2S"
months(2)
## [1] "2m 0d 0H 0M 0S"
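One behaviour of month periods is worth knowing before using them in arithmetic: adding a month to a date whose day does not exist in the target month gives NA. The sketch below shows this for 31 January, together with lubridate's %m+% operator, which rolls back to the last valid day instead. The variable names are illustrative.

```r
library(lubridate)

jan31 <- ymd("2021-01-31")

plain <- jan31 + months(1)       # 31 February does not exist
rolled <- jan31 %m+% months(1)   # rolls back to the end of February

print(plain)   # NA
print(rolled)  # "2021-02-28"
```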
Periods have some drawbacks, which arise because of daylight saving time and leap years. Consider the following dates
A normal day
normal <- ymd_hms("2018-01-01 01:30:00",tz="US/Eastern")
The start of daylight savings (spring forward)
gap_day <- ymd_hms("2018-03-11 01:30:00",tz="US/Eastern")
The end of daylight savings (fall back)
lap_day <- ymd_hms("2018-11-04 00:30:00",tz="US/Eastern")
Leap years: a date whose following twelve months include 29 February 2024
leap_day <- ymd("2023-03-01")
The following example gives the expected result, since it is a normal day
normal + minutes(90)
## [1] "2018-01-01 03:00:00 EST"
On the day daylight saving starts, the clock skips one hour, and this needs to be taken into account.
gap_day
## [1] "2018-03-11 01:30:00 EST"
gap_day+minutes(90)
## [1] "2018-03-11 03:00:00 EDT"
Ninety minutes of elapsed time after 01:30 should take us to 4 am, because the clock jumps from 02:00 to 03:00 on this day, but the period does not take this into account.
At the end of the daylight saving period the clock is set back by one hour, and this also needs to be taken into account
lap_day
## [1] "2018-11-04 00:30:00 EDT"
lap_day+minutes(90)
## [1] "2018-11-04 02:00:00 EST"
And the leap year calculation
leap_day+years(1)
## [1] "2024-03-01"
These discrepancies can be handled by using the 'durations' available in lubridate
Durations
Durations simply measure the time span between start and end dates. They track the passage of physical time, which deviates from clock time when irregularities occur. They can be obtained using the following functions
| Function | Duration |
|---|---|
| dyears(x) | 31557600x seconds |
| dmonths(x) | 2629800x seconds |
| dweeks(x) | 604800x seconds |
| ddays(x) | 86400x seconds |
| dhours(x) | 3600x seconds |
| dminutes(x) | 60x seconds |
| dseconds(x) | x seconds |
| dmilliseconds(x) | x 10^-3 Seconds |
| dmicroseconds(x) | x 10^-6 Seconds |
| dnanoseconds(x) | x 10^-9 Seconds |
| dpicoseconds(x) | x 10^-12 Seconds |
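normal + dminutes(90)
## [1] "2018-01-01 03:00:00 EST"
gap_day + dminutes(90)
## [1] "2018-03-11 04:00:00 EDT"
lap_day+dminutes(90)
## [1] "2018-11-04 01:00:00 EST"
leap_day + dyears(1)
## [1] "2024-02-29 06:00:00 UTC"
We can see that the problems discussed with periods have been solved: each result is exactly the stated amount of physical time after the start. Note that dyears(1) counts 365.25 days, which is why the last result falls at 06:00 on 29 February 2024.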
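The contrast between the two can be checked by measuring how much physical time actually separates the start from each result. A minimal sketch for the spring-forward day, using "America/New_York" as an assumed equivalent of the "US/Eastern" zone used above:

```r
library(lubridate)

start <- ymd_hms("2018-03-11 01:30:00", tz = "America/New_York")

by_period <- start + minutes(90)     # clock time moves to 03:00 EDT
by_duration <- start + dminutes(90)  # 90 real minutes later: 04:00 EDT

elapsed_period <- as.numeric(difftime(by_period, start, units = "mins"))
elapsed_duration <- as.numeric(difftime(by_duration, start, units = "mins"))

print(elapsed_period)    # 30: only half an hour really passed
print(elapsed_duration)  # 90
```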
Intervals
Intervals represent specific spans of the timeline, bounded by start and end date-times. interval(a,b) is used to create an interval from a to b
i=interval(normal, normal + minutes(90))
i
## [1] 2018-01-01 01:30:00 EST--2018-01-01 03:00:00 EST
j=interval(datan,leap_day)
j
## [1] 2021-12-21 08:42:22 UTC--2023-03-01 UTC
## [2] 2021-10-02 23:22:30 UTC--2023-03-01 UTC
## [3] 2021-03-11 16:12:02 UTC--2023-03-01 UTC
a %within% b is used to check whether the interval or date-time a falls within the interval b
i %within% j
## [1] FALSE FALSE FALSE
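The result above is FALSE for every element because i lies in 2018, before any of the intervals in j begins. The following sketch, with illustrative dates, shows a case where %within% returns TRUE and how the length of an interval can be measured:

```r
library(lubridate)

year_2021 <- interval(ymd("2021-01-01"), ymd("2021-12-31"))

inside <- ymd("2021-06-30") %within% year_2021
outside <- ymd("2018-01-01") %within% year_2021

print(inside)   # TRUE
print(outside)  # FALSE

# Length of the interval in seconds and in days
print(int_length(year_2021))                   # 31449600
print(time_length(year_2021, unit = "days"))   # 364
```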
Final Note
We have seen how to handle date and time objects in R using base R and the lubridate package. We have covered
- Assignment
- Conversion
- Components of date time objects
- Daylight saving handling
- Approximations
- Timezone handling
- Using stamps
We have introduced most of the important tools for handling date-time data in R. ISOdate() and seq() come from base R, unite() comes from the tidyr package, and all other functions are from the lubridate package.