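library(tidyverse)  # also attaches dplyr (and, since tidyverse 2.0.0, lubridate)
library(lubridate)  # loaded explicitly so the code also works with older tidyverse versions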
Why is Handling Time Data Important in R?
A lot of data contains time information, such as financial market data, e-commerce transactions, weather data, and GPS and location tracking.
Proper management of time data underpins effective time series data visualization, time series analysis and trend forecasting.
It ensures consistent and meaningful integration of data from sources with different ways of expressing date and time.
Why Do We Use lubridate to Manage Time Data?
It offers easy-to-use functions for parsing, manipulating, and computing with time data.
It efficiently manages time intervals, enables straightforward conversion between time units, and simplifies time zone management.
Additionally, lubridate allows intuitive date arithmetic and ensures consistency and accuracy in time data processing.
How to create date-times from strings
A date-time is a date plus a time. There are several ways to create a date-time in R. Here, we only show how to create one from a string, as our focus is on time zone management.
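We can use the lubridate parsing functions whose names spell out the order of the components, such as ymd_hms(), ymd_hm(), ymd_h(), mdy_hms(), mdy_h(), and dmy_hms(), to parse date-times with year, month, day, hour, minute, and second components. The letters before the underscore give the order of the date (‘y’ for year, ‘m’ for month, ‘d’ for day), and the letters after it give the time (‘h’ for hour, ‘m’ for minute, ‘s’ for second), so the first ‘m’ means month and the second ‘m’ means minute.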
How do we decide which one to use? It depends on the structure of the string. For example, if the string is ‘2017/02/06 23:00’, which lists the year first, then the month, then the day, followed by the hour and minute, and has no seconds, we would use ymd_hm().
ymd_hms("2017/01/13 12:11:59")
## [1] "2017-01-13 12:11:59 UTC"
mdy_h("May-23-01 22")
## [1] "2001-05-23 22:00:00 UTC"
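To make the naming rule concrete, here is a small sketch that parses the ‘2017/02/06 23:00’ string from the paragraph above with ymd_hm(), and then parses the same moment written day-first with dmy_hms(), another function from the same lubridate family. The day-first string is an illustrative input, not taken from the original data.

```r
library(lubridate)

# Year, month, day, hour, minute (no seconds) -> ymd_hm()
a <- ymd_hm("2017/02/06 23:00")
a

# The same moment written day-first and with seconds -> dmy_hms()
b <- dmy_hms("06-02-2017 23:00:00")
b

# Both are parsed in UTC by default, so they represent the same moment
a == b
```

Only the function name changes; the result is the same POSIXct value, which is why picking the function that matches the string's component order is all that is needed.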
Let’s test how robust these functions are! Roughly speaking, if we can read the string unambiguously, the function can read it too.
ymd_hms("2017/01/31 02:11:59")
## [1] "2017-01-31 02:11:59 UTC"
ymd_hms("2017/01/31 2:11:59")
## [1] "2017-01-31 02:11:59 UTC"
ymd_hms("2017/01/3 12:11:59")
## [1] "2017-01-03 12:11:59 UTC"
ymd_hms("2017/01/0312:11:59")
## [1] "2017-01-03 12:11:59 UTC"
ymd_hms("2017/01/3102:11:59")
## [1] "2017-01-31 02:11:59 UTC"
ymd_hms("2017/01/312:11:59") # Fails (NA with a warning): it could be "2017-01-03 12:11:59 UTC" or "2017-01-31 02:11:59 UTC", and we can't tell either.
## Warning: All formats failed to parse. No formats found.
## [1] NA
Time spans can be expressed in several ways, including durations, periods, and intervals. Since our focus is time zone management, however, we only introduce the simplest one here: base R's difftime.
# Subtracting two date-times gives their time difference as a difftime object
ymd_hms("2017/01/31 02:11:59") - ymd_hms("2017/01/30 02:11:59")
## Time difference of 1 days
ymd_hms("2017/01/31 02:11:59") - ymd_hms("2017/01/31 03:11:59")
## Time difference of -1 hours
class(ymd_hms("2017/01/31 02:11:59") - ymd_hms("2017/01/30 02:11:59"))
## [1] "difftime"
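Plain subtraction chooses the units of a difftime automatically ("days" in the first example above). A minimal sketch of how to control the units instead, using base R's difftime() and as.numeric() with its units argument:

```r
library(lubridate)

t1 <- ymd_hms("2017/01/31 02:11:59")
t2 <- ymd_hms("2017/01/30 02:11:59")

# Subtraction picks the units for us ("days" here)
d <- t1 - t2
units(d)

# difftime() lets us fix the units explicitly
d_hours <- difftime(t1, t2, units = "hours")
d_hours

# as.numeric() with a units argument turns a difftime into a plain number
as.numeric(d, units = "hours")
as.numeric(d_hours)
```

Converting to a number with explicit units is usually safer before further arithmetic, because the automatically chosen units can differ from one subtraction to the next.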
# First, let's check our system's time zone with Sys.timezone()!
Sys.timezone()
## [1] "America/New_York"
As we can see, time zone names in R (and therefore in lubridate) follow the IANA/Olson convention {area}/{location}, typically in the form {continent}/{city} or {ocean}/{city}.
# OlsonNames() (base R) lists all time zone names that R knows about.
length(OlsonNames())
## [1] 597
head(OlsonNames())
## [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
## [4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
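With several hundred names in the list, it can help to search it rather than scroll through it. A small sketch using only base R, assuming a system with a standard time zone database (the misspelled "Asia/Taipeii" is a deliberately invalid example):

```r
# OlsonNames() is base R, so no extra package is needed here
zones <- OlsonNames()

# Search by city to find the full {area}/{location} name
grep("Taipei", zones, value = TRUE)

# Check that a name is valid before passing it as tz = "..."
"Asia/Taipei" %in% zones
"Asia/Taipeii" %in% zones  # a typo is simply not in the list
```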
This extends the “How to create date-times from strings” section: time zone information can also be specified when parsing, using the argument tz = "{timezone}" (the parsing functions default to "UTC").
x1 <- ymd_hms("2023-10-29 11:00:00", tz = "America/Toronto")
x1
## [1] "2023-10-29 11:00:00 EDT"
x2 <- ymd_hms("2023-10-29 23:00:00", tz = "Asia/Taipei")
x2
## [1] "2023-10-29 23:00:00 CST"
x3 <- ymd_hms("2023-10-30 04:00:00", tz = "Pacific/Auckland")
x3
## [1] "2023-10-30 04:00:00 NZDT"
x1, x2, and x3 represent the same physical moment in time: Toronto’s 2023-10-29 11:00:00 corresponds to Taipei’s 2023-10-29 23:00:00 and Auckland’s 2023-10-30 04:00:00. If we convert all three date-times to Coordinated Universal Time (UTC, essentially GMT), they all become 2023-10-29 15:00:00, so subtracting one from another gives zero.
x1 - x2
## Time difference of 0 secs
x1 - x3
## Time difference of 0 secs
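To see the shared moment directly, here is a sketch that re-expresses each value in UTC with lubridate's with_tz(), which changes only how a moment is displayed, and compares the underlying numbers (seconds since 1970-01-01 00:00:00 UTC). It re-creates x1, x2, and x3 so that it runs on its own.

```r
library(lubridate)

x1 <- ymd_hms("2023-10-29 11:00:00", tz = "America/Toronto")
x2 <- ymd_hms("2023-10-29 23:00:00", tz = "Asia/Taipei")
x3 <- ymd_hms("2023-10-30 04:00:00", tz = "Pacific/Auckland")

# Show each moment in UTC without changing the instant itself
utc <- lapply(list(x1, x2, x3), with_tz, tzone = "UTC")
utc

# Underneath, all three store the same number of seconds since the epoch
as.numeric(x1) == as.numeric(x2)
as.numeric(x1) == as.numeric(x3)
```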
When combining date-times with operations like c(), time zones are often dropped, and the combined date-times are displayed in a single time zone. For example, all date-times in x4 now show in EDT, the time zone of x1 (which in this session is also the local time zone); the underlying moments are unchanged.
x4 <- c(x1, x2, x3)
x4
## [1] "2023-10-29 11:00:00 EDT" "2023-10-29 11:00:00 EDT"
## [3] "2023-10-29 11:00:00 EDT"
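Because the display zone after c() depends on how the time zones were dropped, one safer pattern (a sketch, re-creating x1 to x3 so it runs on its own) is to choose the display zone explicitly after combining, using lubridate's with_tz(), which changes only how the moments are shown:

```r
library(lubridate)

x1 <- ymd_hms("2023-10-29 11:00:00", tz = "America/Toronto")
x2 <- ymd_hms("2023-10-29 23:00:00", tz = "Asia/Taipei")
x3 <- ymd_hms("2023-10-30 04:00:00", tz = "Pacific/Auckland")

x4 <- c(x1, x2, x3)

# Set the display zone explicitly instead of relying on what c() keeps
x4_taipei <- with_tz(x4, tzone = "Asia/Taipei")
x4_taipei
```

All three elements now display as 2023-10-29 23:00:00 CST, since they are the same moment.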
Which function should we use to change the time zone of a date-time? It depends on why the current value is wrong for you!
with_tz:
This function keeps the point in time (i.e., the actual moment) unchanged and only changes the display time zone.
If the physical moment that the date-time represents is right, but you want to express that moment in another time zone, use it!
For example, when you are essentially asking, “What is the time in Taipei when it is 11:00:00 on Oct 29, 2023, in New York?”, use the with_tz function to convert the time from “America/New_York” to “Asia/Taipei.”
dt <- ymd_hms("2023-10-29 11:00:00", tz = "America/New_York")
dt
## [1] "2023-10-29 11:00:00 EDT"
dt_withtz <- with_tz(dt, tzone = "Asia/Taipei")
dt_withtz
## [1] "2023-10-29 23:00:00 CST"
dt - dt_withtz
## Time difference of 0 secs
force_tz:
This function keeps the clock time (the date and time as written) and attaches the new time zone, which changes the underlying point in time.
If you have a date-time whose clock time is correct but which is associated with the wrong time zone, use it!
For example, when you are essentially saying, “This 11:00:00 happened in Taipei, not New York,” use the force_tz function to re-label the time from “America/New_York” to “Asia/Taipei.”
dt <- ymd_hms("2023-10-29 11:00:00", tz = "America/New_York")
dt
## [1] "2023-10-29 11:00:00 EDT"
dt_forcetz <- force_tz(dt, tzone = "Asia/Taipei")
dt_forcetz
## [1] "2023-10-29 11:00:00 CST"
dt - dt_forcetz
## Time difference of 12 hours
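The two functions are often used together. Here is a sketch of a common situation, with hypothetical log timestamps that were recorded on a clock in Taipei but parsed without tz, so ymd_hms() labeled them UTC by default. force_tz() fixes the label, and with_tz() then answers what those moments were in New York.

```r
library(lubridate)

# Hypothetical timestamps from a Taipei clock, parsed with the default tz = "UTC"
logged <- ymd_hms(c("2023-10-29 09:00:00", "2023-10-29 18:30:00"))

# The clock times are right, only the zone label is wrong -> force_tz()
fixed <- force_tz(logged, tzone = "Asia/Taipei")
fixed

# Same moments, shown in New York time -> with_tz()
ny <- with_tz(fixed, tzone = "America/New_York")
ny
```

Since New York was still on EDT on Oct 29, 2023, the New York times are 12 hours behind: 21:00 on Oct 28 and 06:30 on Oct 29.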
# In New York (as in most of the US), 2023 daylight saving time started on Mar 12 and ended on Nov 5.
dst <- ymd_hms("2023-06-01 10:00:00", tz = "America/New_York")
dst
## [1] "2023-06-01 10:00:00 EDT"
nodst <- ymd_hms("2023-12-01 10:00:00", tz = "America/New_York")
nodst
## [1] "2023-12-01 10:00:00 EST"
convertdst <- force_tz(dst, tzone = "Asia/Taipei")
convertdst
## [1] "2023-06-01 10:00:00 CST"
convertnodst <- force_tz(nodst, tzone = "Asia/Taipei")
convertnodst
## [1] "2023-12-01 10:00:00 CST"
dst - convertdst
## Time difference of 12 hours
nodst - convertnodst
## Time difference of 13 hours
We can see that lubridate does account for daylight saving time! New York observes DST but Taipei does not, so during DST the time difference between Taipei and New York is 12 hours, while outside DST it is 13 hours.
We can also check whether a date-time falls within DST with the function dst().
dst(ymd_hms("2023-06-01 10:00:00", tz = "America/New_York"))
## [1] TRUE
dst(ymd_hms("2023-12-01 10:00:00", tz = "America/New_York"))
## [1] FALSE
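Since dst() depends on the zone a moment is expressed in, one way to compare zones is to take a single instant, re-express it in each zone with with_tz(), and ask dst() for each. A minimal sketch; the instants and the list of zones are illustrative choices:

```r
library(lubridate)

zones <- c("America/New_York", "Asia/Taipei", "Pacific/Auckland")

# Is DST in effect in each zone at a given UTC moment?
dst_by_zone <- function(instant) {
  sapply(zones, function(z) dst(with_tz(instant, tzone = z)))
}

june <- dst_by_zone(ymd_hms("2023-06-01 14:00:00"))
december <- dst_by_zone(ymd_hms("2023-12-01 14:00:00"))
june
december
```

New York is on DST in June but not in December, Taipei never observes DST, and Auckland, in the southern hemisphere, is on DST in December instead.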