Datetimes in R

Jeff Walker
2013-08-06

Introduction

Dates and times can be one of the most frustrating aspects of data analysis, and is probably too often handled incorrectly. Fortunately, the lubridate package makes dealing with datetimes easier in R. But we still need to be careful when dealing with timezones as the conversion from UTC to local time zones is sometimes not intuitive.

Datetimes are best stored as a continuous numeric value (integer or float) representing the amount of time (usually seconds) since some datum (aka epoch). In R, datetimes are best stored using the POSIXct class, which represents datetimes as the number of seconds since the start of 1970 (aka the Unix epoch).

It is important to recognize that datetimes can be represented by two key formats:

Any given datetime is a unique value according to clock time, but can have different calendar time representations depending on the timezone.

Parsing Dates and Times

lubridate provides a series of shortcut parsing functions for convert timestamps from strings to POSIXct objects.

library(lubridate)
ymd(c("2000-01-01", "2000-01-02", "2000-01-03"))
## [1] "2000-01-01 UTC" "2000-01-02 UTC" "2000-01-03 UTC"
ymd_hm(c("2000-01-01 00:00", "2000-01-02 01:00", "2000-01-03 02:30"))
## [1] "2000-01-01 00:00:00 UTC" "2000-01-02 01:00:00 UTC"
## [3] "2000-01-03 02:30:00 UTC"

Assigning Timezones

By default, lubridate assumes timestamps are in UTC. We can specify a timezone using the tz argument to the parsing function.

ymd_hm(c("2000-01-01 00:00", "2000-01-02 01:00", "2000-01-03 02:30"), tz = "America/New_York")
## [1] "2000-01-01 00:00:00 EST" "2000-01-02 01:00:00 EST"
## [3] "2000-01-03 02:30:00 EST"

Note that lubridate has automatically accounted for daylight savings time (EST vs EDT).

ymd_hm(c("2000-01-01 06:00", "2000-07-01 06:00"), tz = "America/New_York")
## [1] "2000-01-01 06:00:00 EST" "2000-07-01 06:00:00 EDT"

Numeric Representation of Datetimes (Clock Time)

The POSIXct class stores dates and times as the signed number of seconds since the beginning of 1970 (i.e. the Unix epoch).

utc <- ymd_hm(c("1970-01-01 00:00", "1971-01-01 00:00", "1972-01-01 00:00"))
as.numeric(utc)  # number of seconds since 1970-01-01 00:00
## [1]        0 31536000 63072000
diff(as.numeric(utc))/86400  # number of days between each consecutive datetime
## [1] 365 365

Converting Timezones

After a datetime has been parsed, its timezone can be changed in two ways:
1. Conversion from one timezone to another using with_tz() changes the calendar time representation, but not the clock time
2. Forcing a specific timezone using force_tz() changes the clock time, but not the calendar time representation

Conversion

To convert from one timezone to another, use with_tz():

utc <- ymd_hm(c("2000-01-01 06:00", "2000-07-01 06:00"))
et <- with_tz(utc, tz = "America/New_York")

The calendar representations are now different:

print(utc)
## [1] "2000-01-01 06:00:00 UTC" "2000-07-01 06:00:00 UTC"
print(et)
## [1] "2000-01-01 01:00:00 EST" "2000-07-01 02:00:00 EDT"

The clock time representations are still the same:

rbind(utc = utc, et = et, diff = utc - et)
##           [,1]      [,2]
## utc  946706400 962431200
## et   946706400 962431200
## diff         0         0

The conversion to eastern timezone affects not only how the underlying numeric values are printed as calendar datetimes, but also the extraction of the datetime components (e.g. year(), month(), day(), yday(), hour()), which are based on the calendar representation:

hour(utc)
## [1] 6 6
hour(et)
## [1] 1 2

Forcing

If a timestamp string was parsed using the incorrect timezone, then the resulting POSIXct values can be forced to use a different timezone.

utc <- ymd_hm(c("2000-01-01 06:00", "2000-07-01 06:00"))
et <- force_tz(utc, tz = "America/New_York")
print(utc)
## [1] "2000-01-01 06:00:00 UTC" "2000-07-01 06:00:00 UTC"
print(et)
## [1] "2000-01-01 06:00:00 EST" "2000-07-01 06:00:00 EDT"

The POSIXct values now look identical, but are in fact different datetimes as indicated by their numeric values:

rbind(utc = utc, et = et, diff_sec = as.numeric(utc) - as.numeric(et), diff_hr = (as.numeric(utc) - 
    as.numeric(et))/3600)
##               [,1]      [,2]
## utc      946706400 962431200
## et       946724400 962445600
## diff_sec    -18000    -14400
## diff_hr         -5        -4

Summary

In summary, conversion of timezones using with_tz() will not change the underlying numeric value but will change the interpretation of this value in calendar format. Forcing a timezone using force_tz() will change the underlying numeric value but will not change the interpretation in calendar format.

Merging Datasets by Datetimes

Mis-matching timezones when merging data frames based on the timestamp can be a problem.

df.utc <- data.frame(DATETIME = ymd_hm(c("2000-01-01 06:00", "2000-07-01 07:00")))
df.utc$HR_UTC <- hour(df.utc$DATETIME)
print(df.utc)
##              DATETIME HR_UTC
## 1 2000-01-01 06:00:00      6
## 2 2000-07-01 07:00:00      7
df.et <- data.frame(DATETIME = ymd_hm(c("2000-01-01 06:00", "2000-07-01 07:00"), 
    tz = "America/New_York"))
df.et$HR_ET <- hour(df.et$DATETIME)
print(df.et)
##              DATETIME HR_ET
## 1 2000-01-01 06:00:00     6
## 2 2000-07-01 07:00:00     7

The two data frames appear to have the same datetimes, but when we try to merge them, the df.et datetimes are converted to UTC:

df.merge <- merge(df.utc, df.et, all = TRUE)
print(df.merge)
##              DATETIME HR_UTC HR_ET
## 1 2000-01-01 06:00:00      6    NA
## 2 2000-01-01 11:00:00     NA     6
## 3 2000-07-01 07:00:00      7    NA
## 4 2000-07-01 11:00:00     NA     7
print(tz(df.merge$DATETIME))
## [1] "UTC"

This is also a problem when joining a continuous timeseries (e.g. rainfall/streamflow) with a non-continuous timeseries (e.g. water quality).

df.utc <- data.frame(DATETIME = ymd_hm("2000-01-01 00:00") + dhours(0:23))
df.utc$HR_UTC <- hour(df.utc$DATETIME)
df.et <- data.frame(DATETIME = ymd_hm(c("2000-01-01 06:00", "2000-01-01 18:00"), 
    tz = "America/New_York"))
df.et$HR_ET <- hour(df.et$DATETIME)
print(df.et)
##              DATETIME HR_ET
## 1 2000-01-01 06:00:00     6
## 2 2000-01-01 18:00:00    18
df.merge <- merge(df.utc, df.et, all = TRUE)
print(df.merge)
##               DATETIME HR_UTC HR_ET
## 1  2000-01-01 00:00:00      0    NA
## 2  2000-01-01 01:00:00      1    NA
## 3  2000-01-01 02:00:00      2    NA
## 4  2000-01-01 03:00:00      3    NA
## 5  2000-01-01 04:00:00      4    NA
## 6  2000-01-01 05:00:00      5    NA
## 7  2000-01-01 06:00:00      6    NA
## 8  2000-01-01 07:00:00      7    NA
## 9  2000-01-01 08:00:00      8    NA
## 10 2000-01-01 09:00:00      9    NA
## 11 2000-01-01 10:00:00     10    NA
## 12 2000-01-01 11:00:00     11     6
## 13 2000-01-01 12:00:00     12    NA
## 14 2000-01-01 13:00:00     13    NA
## 15 2000-01-01 14:00:00     14    NA
## 16 2000-01-01 15:00:00     15    NA
## 17 2000-01-01 16:00:00     16    NA
## 18 2000-01-01 17:00:00     17    NA
## 19 2000-01-01 18:00:00     18    NA
## 20 2000-01-01 19:00:00     19    NA
## 21 2000-01-01 20:00:00     20    NA
## 22 2000-01-01 21:00:00     21    NA
## 23 2000-01-01 22:00:00     22    NA
## 24 2000-01-01 23:00:00     23    18

We get the correct date if the datetimes in df.et are actually the same UTC, but converted to EST/EDT, which changes their calendar representation.

df.utc <- data.frame(DATETIME = ymd_hm("2000-01-01 00:00") + dhours(0:23))
df.utc$HR_UTC <- hour(df.utc$DATETIME)
df.et <- data.frame(DATETIME = ymd_hm(c("2000-01-01 06:00", "2000-01-01 18:00")))
df.et$DATETIME <- with_tz(df.et$DATETIME, "America/New_York")
df.et$HR_ET <- hour(df.et$DATETIME)
print(df.et)
##              DATETIME HR_ET
## 1 2000-01-01 01:00:00     1
## 2 2000-01-01 13:00:00    13
df.merge <- merge(df.utc, df.et, all = TRUE)
print(df.merge)
##               DATETIME HR_UTC HR_ET
## 1  2000-01-01 00:00:00      0    NA
## 2  2000-01-01 01:00:00      1    NA
## 3  2000-01-01 02:00:00      2    NA
## 4  2000-01-01 03:00:00      3    NA
## 5  2000-01-01 04:00:00      4    NA
## 6  2000-01-01 05:00:00      5    NA
## 7  2000-01-01 06:00:00      6     1
## 8  2000-01-01 07:00:00      7    NA
## 9  2000-01-01 08:00:00      8    NA
## 10 2000-01-01 09:00:00      9    NA
## 11 2000-01-01 10:00:00     10    NA
## 12 2000-01-01 11:00:00     11    NA
## 13 2000-01-01 12:00:00     12    NA
## 14 2000-01-01 13:00:00     13    NA
## 15 2000-01-01 14:00:00     14    NA
## 16 2000-01-01 15:00:00     15    NA
## 17 2000-01-01 16:00:00     16    NA
## 18 2000-01-01 17:00:00     17    NA
## 19 2000-01-01 18:00:00     18    13
## 20 2000-01-01 19:00:00     19    NA
## 21 2000-01-01 20:00:00     20    NA
## 22 2000-01-01 21:00:00     21    NA
## 23 2000-01-01 22:00:00     22    NA
## 24 2000-01-01 23:00:00     23    NA

Alternatively, if the original timestamp strings in df.et are the local representation, then we can assign the timezone in the parsing function.

df.utc <- data.frame(DATETIME = ymd_hm("2000-01-01 00:00") + dhours(0:23))
df.utc$HR_UTC <- hour(df.utc$DATETIME)
df.et <- data.frame(DATETIME = ymd_hm(c("2000-01-01 01:00", "2000-01-01 13:00"), 
    tz = "America/New_York"))
df.et$HR_ET <- hour(df.et$DATETIME)
print(df.et)
##              DATETIME HR_ET
## 1 2000-01-01 01:00:00     1
## 2 2000-01-01 13:00:00    13
df.merge <- merge(df.utc, df.et, all = TRUE)
print(df.merge)
##               DATETIME HR_UTC HR_ET
## 1  2000-01-01 00:00:00      0    NA
## 2  2000-01-01 01:00:00      1    NA
## 3  2000-01-01 02:00:00      2    NA
## 4  2000-01-01 03:00:00      3    NA
## 5  2000-01-01 04:00:00      4    NA
## 6  2000-01-01 05:00:00      5    NA
## 7  2000-01-01 06:00:00      6     1
## 8  2000-01-01 07:00:00      7    NA
## 9  2000-01-01 08:00:00      8    NA
## 10 2000-01-01 09:00:00      9    NA
## 11 2000-01-01 10:00:00     10    NA
## 12 2000-01-01 11:00:00     11    NA
## 13 2000-01-01 12:00:00     12    NA
## 14 2000-01-01 13:00:00     13    NA
## 15 2000-01-01 14:00:00     14    NA
## 16 2000-01-01 15:00:00     15    NA
## 17 2000-01-01 16:00:00     16    NA
## 18 2000-01-01 17:00:00     17    NA
## 19 2000-01-01 18:00:00     18    13
## 20 2000-01-01 19:00:00     19    NA
## 21 2000-01-01 20:00:00     20    NA
## 22 2000-01-01 21:00:00     21    NA
## 23 2000-01-01 22:00:00     22    NA
## 24 2000-01-01 23:00:00     23    NA

Or if both data frames are in local time zone.

df.x <- data.frame(DATETIME = ymd_hm("2000-01-01 00:00", tz = "America/New_York") + 
    dhours(0:23))
df.x$HR_X <- hour(df.x$DATETIME)
df.y <- data.frame(DATETIME = ymd_hm(c("2000-01-01 06:00", "2000-01-01 18:00"), 
    tz = "America/New_York"))
df.y$HR_Y <- hour(df.y$DATETIME)
print(df.y)
##              DATETIME HR_Y
## 1 2000-01-01 06:00:00    6
## 2 2000-01-01 18:00:00   18
df.merge <- merge(df.x, df.y, all = TRUE)
print(df.merge)
##               DATETIME HR_X HR_Y
## 1  2000-01-01 00:00:00    0   NA
## 2  2000-01-01 01:00:00    1   NA
## 3  2000-01-01 02:00:00    2   NA
## 4  2000-01-01 03:00:00    3   NA
## 5  2000-01-01 04:00:00    4   NA
## 6  2000-01-01 05:00:00    5   NA
## 7  2000-01-01 06:00:00    6    6
## 8  2000-01-01 07:00:00    7   NA
## 9  2000-01-01 08:00:00    8   NA
## 10 2000-01-01 09:00:00    9   NA
## 11 2000-01-01 10:00:00   10   NA
## 12 2000-01-01 11:00:00   11   NA
## 13 2000-01-01 12:00:00   12   NA
## 14 2000-01-01 13:00:00   13   NA
## 15 2000-01-01 14:00:00   14   NA
## 16 2000-01-01 15:00:00   15   NA
## 17 2000-01-01 16:00:00   16   NA
## 18 2000-01-01 17:00:00   17   NA
## 19 2000-01-01 18:00:00   18   18
## 20 2000-01-01 19:00:00   19   NA
## 21 2000-01-01 20:00:00   20   NA
## 22 2000-01-01 21:00:00   21   NA
## 23 2000-01-01 22:00:00   22   NA
## 24 2000-01-01 23:00:00   23   NA

Note that the DATETIME column is no shown in local time zone, not UTC.

tz(df.merge$DATETIME)
## [1] "America/New_York"

Daylight Savings Time

So what happens to the timestamp around daylight savings time. In 2008, DST began on March 9 at 2 am (local time). For America/New_York, this corresponds to the UTC time 2008-03-09 07:00. Notice the skip from 1 am to 3 am:

dt.utc <- ymd_hm(c("2008-03-09 00:00")) + dhours(0:12)
data.frame(UTC = dt.utc, ET = with_tz(dt.utc, "America/New_York"))
##                    UTC                  ET
## 1  2008-03-09 00:00:00 2008-03-08 19:00:00
## 2  2008-03-09 01:00:00 2008-03-08 20:00:00
## 3  2008-03-09 02:00:00 2008-03-08 21:00:00
## 4  2008-03-09 03:00:00 2008-03-08 22:00:00
## 5  2008-03-09 04:00:00 2008-03-08 23:00:00
## 6  2008-03-09 05:00:00 2008-03-09 00:00:00
## 7  2008-03-09 06:00:00 2008-03-09 01:00:00
## 8  2008-03-09 07:00:00 2008-03-09 03:00:00
## 9  2008-03-09 08:00:00 2008-03-09 04:00:00
## 10 2008-03-09 09:00:00 2008-03-09 05:00:00
## 11 2008-03-09 10:00:00 2008-03-09 06:00:00
## 12 2008-03-09 11:00:00 2008-03-09 07:00:00
## 13 2008-03-09 12:00:00 2008-03-09 08:00:00

When DST ends on Nov 2 at 2 am, then we loose an hour and 1 am is repeated:

dt.utc <- ymd_hm(c("2008-11-02 00:00")) + dhours(0:12)
data.frame(UTC = dt.utc, ET = with_tz(dt.utc, "America/New_York"))
##                    UTC                  ET
## 1  2008-11-02 00:00:00 2008-11-01 20:00:00
## 2  2008-11-02 01:00:00 2008-11-01 21:00:00
## 3  2008-11-02 02:00:00 2008-11-01 22:00:00
## 4  2008-11-02 03:00:00 2008-11-01 23:00:00
## 5  2008-11-02 04:00:00 2008-11-02 00:00:00
## 6  2008-11-02 05:00:00 2008-11-02 01:00:00
## 7  2008-11-02 06:00:00 2008-11-02 01:00:00
## 8  2008-11-02 07:00:00 2008-11-02 02:00:00
## 9  2008-11-02 08:00:00 2008-11-02 03:00:00
## 10 2008-11-02 09:00:00 2008-11-02 04:00:00
## 11 2008-11-02 10:00:00 2008-11-02 05:00:00
## 12 2008-11-02 11:00:00 2008-11-02 06:00:00
## 13 2008-11-02 12:00:00 2008-11-02 07:00:00

So what happens if you get a sub-hourly time series with the timestamps in the local timezone?

with_tz(ymd_hm(c("2008-11-02 01:30", "2008-11-02 02:00", "2008-11-02 02:30"), 
    tz = "America/New_York"), "UTC")
## [1] "2008-11-02 05:30:00 UTC" "2008-11-02 07:00:00 UTC"
## [3] "2008-11-02 07:30:00 UTC"

Then one of the hours gets skipped. And when DST ends, there is no unambiguous way to interpret the time during the repeated hour, and lubridate gives us an error:

with_tz(ymd_hm(c("2008-03-09 01:30", "2008-03-09 02:00", "2008-03-09 02:30", 
    "2008-03-09 03:00"), tz = "America/New_York"), "UTC")
## Warning: 2 failed to parse.
## [1] "2008-03-09 06:30:00 UTC" NA                       
## [3] NA                        "2008-03-09 07:00:00 UTC"

USGS Streamflow Data Example

Here is an example of streamflow data from USGS at the beginning and end of daylight savings time. The dataset includes a column containing the corresponding timezone (EDT/EST) for each timestamp. Note how the dataset repeats some values during the end of DST with alternating EDT/EST.

usgs <- read.table("usgs.txt", sep = "", col.names = c("SOURCE", "STATION", 
    "DATE", "TIME", "TIMEZONE", "VALUE", "FLAG"))
usgs$TIMESTAMP <- paste(usgs$DATE, usgs$TIME, sep = " ")
usgs <- usgs[order(usgs$TIMESTAMP), ]
usgs <- subset(usgs, select = c("TIMESTAMP", "TIMEZONE", "VALUE"))
print(usgs)
##           TIMESTAMP TIMEZONE VALUE
## 18 2007-11-04 00:00      EDT    68
## 19 2007-11-04 00:15      EDT    68
## 20 2007-11-04 00:30      EDT    67
## 21 2007-11-04 00:45      EDT    67
## 22 2007-11-04 01:00      EDT    65
## 23 2007-11-04 01:00      EST    63
## 24 2007-11-04 01:15      EDT    65
## 25 2007-11-04 01:15      EST    63
## 26 2007-11-04 01:30      EDT    64
## 27 2007-11-04 01:30      EST    63
## 28 2007-11-04 01:45      EDT    64
## 29 2007-11-04 01:45      EST    63
## 30 2007-11-04 02:00      EST    61
## 31 2007-11-04 02:15      EST    61
## 32 2007-11-04 02:30      EST    61
## 33 2007-11-04 02:45      EST    61
## 34 2007-11-04 03:00      EST    60
## 35 2007-11-04 03:15      EST    60
## 36 2007-11-04 03:30      EST    60
## 37 2007-11-04 03:45      EST    60
## 1  2008-03-09 00:00      EST   194
## 2  2008-03-09 00:15      EST   196
## 3  2008-03-09 00:30      EST   196
## 4  2008-03-09 00:45      EST   199
## 5  2008-03-09 01:00      EST   201
## 6  2008-03-09 01:15      EST   201
## 7  2008-03-09 01:30      EST   203
## 8  2008-03-09 01:45      EST   205
## 9  2008-03-09 03:00      EDT   205
## 10 2008-03-09 03:15      EDT   205
## 11 2008-03-09 03:30      EDT   207
## 12 2008-03-09 03:45      EDT   210
## 13 2008-03-09 04:00      EDT   210
## 14 2008-03-09 04:15      EDT   212
## 15 2008-03-09 04:30      EDT   214
## 16 2008-03-09 04:45      EDT   214
## 17 2008-03-09 05:00      EDT   214

We could incorrectly parse the timestamp without consideration for the time zone, which would result in the datetime values in UTC. As a result, the time series would have duplicated timestamps when daylight savings time begins, and a gap when it ends.

usgs$DATETIME_UTC <- ymd_hm(usgs$TIMESTAMP)
print(usgs)
##           TIMESTAMP TIMEZONE VALUE        DATETIME_UTC
## 18 2007-11-04 00:00      EDT    68 2007-11-04 00:00:00
## 19 2007-11-04 00:15      EDT    68 2007-11-04 00:15:00
## 20 2007-11-04 00:30      EDT    67 2007-11-04 00:30:00
## 21 2007-11-04 00:45      EDT    67 2007-11-04 00:45:00
## 22 2007-11-04 01:00      EDT    65 2007-11-04 01:00:00
## 23 2007-11-04 01:00      EST    63 2007-11-04 01:00:00
## 24 2007-11-04 01:15      EDT    65 2007-11-04 01:15:00
## 25 2007-11-04 01:15      EST    63 2007-11-04 01:15:00
## 26 2007-11-04 01:30      EDT    64 2007-11-04 01:30:00
## 27 2007-11-04 01:30      EST    63 2007-11-04 01:30:00
## 28 2007-11-04 01:45      EDT    64 2007-11-04 01:45:00
## 29 2007-11-04 01:45      EST    63 2007-11-04 01:45:00
## 30 2007-11-04 02:00      EST    61 2007-11-04 02:00:00
## 31 2007-11-04 02:15      EST    61 2007-11-04 02:15:00
## 32 2007-11-04 02:30      EST    61 2007-11-04 02:30:00
## 33 2007-11-04 02:45      EST    61 2007-11-04 02:45:00
## 34 2007-11-04 03:00      EST    60 2007-11-04 03:00:00
## 35 2007-11-04 03:15      EST    60 2007-11-04 03:15:00
## 36 2007-11-04 03:30      EST    60 2007-11-04 03:30:00
## 37 2007-11-04 03:45      EST    60 2007-11-04 03:45:00
## 1  2008-03-09 00:00      EST   194 2008-03-09 00:00:00
## 2  2008-03-09 00:15      EST   196 2008-03-09 00:15:00
## 3  2008-03-09 00:30      EST   196 2008-03-09 00:30:00
## 4  2008-03-09 00:45      EST   199 2008-03-09 00:45:00
## 5  2008-03-09 01:00      EST   201 2008-03-09 01:00:00
## 6  2008-03-09 01:15      EST   201 2008-03-09 01:15:00
## 7  2008-03-09 01:30      EST   203 2008-03-09 01:30:00
## 8  2008-03-09 01:45      EST   205 2008-03-09 01:45:00
## 9  2008-03-09 03:00      EDT   205 2008-03-09 03:00:00
## 10 2008-03-09 03:15      EDT   205 2008-03-09 03:15:00
## 11 2008-03-09 03:30      EDT   207 2008-03-09 03:30:00
## 12 2008-03-09 03:45      EDT   210 2008-03-09 03:45:00
## 13 2008-03-09 04:00      EDT   210 2008-03-09 04:00:00
## 14 2008-03-09 04:15      EDT   212 2008-03-09 04:15:00
## 15 2008-03-09 04:30      EDT   214 2008-03-09 04:30:00
## 16 2008-03-09 04:45      EDT   214 2008-03-09 04:45:00
## 17 2008-03-09 05:00      EDT   214 2008-03-09 05:00:00

Unfortunately, we cannot simply assign the TIMEZONE column to the tz argument of the parsing function, as this needs to be a single string:

usgs$DATETIME_UTC <- ymd_hm(usgs$TIMESTAMP, tz = usgs$TIMEZONE)
## Error: invalid 'tz' value

Furthermore, EDT and EST are imprecise timezone designations (from a global perspective) so there is no automatic way to integrate this information.

We could manually assign the local time zone, and then convert to UTC. This results in a continuous timeseries when DST ends but we sill have duplicated values at the end of DST.

usgs$DATETIME_LOCAL <- ymd_hm(usgs$TIMESTAMP, tz = "America/New_York")
usgs$DATETIME_UTC <- with_tz(usgs$DATETIME_LOCAL, "UTC")
print(usgs)
##           TIMESTAMP TIMEZONE VALUE        DATETIME_UTC      DATETIME_LOCAL
## 18 2007-11-04 00:00      EDT    68 2007-11-04 04:00:00 2007-11-04 00:00:00
## 19 2007-11-04 00:15      EDT    68 2007-11-04 04:15:00 2007-11-04 00:15:00
## 20 2007-11-04 00:30      EDT    67 2007-11-04 04:30:00 2007-11-04 00:30:00
## 21 2007-11-04 00:45      EDT    67 2007-11-04 04:45:00 2007-11-04 00:45:00
## 22 2007-11-04 01:00      EDT    65 2007-11-04 05:00:00 2007-11-04 01:00:00
## 23 2007-11-04 01:00      EST    63 2007-11-04 05:00:00 2007-11-04 01:00:00
## 24 2007-11-04 01:15      EDT    65 2007-11-04 06:15:00 2007-11-04 01:15:00
## 25 2007-11-04 01:15      EST    63 2007-11-04 06:15:00 2007-11-04 01:15:00
## 26 2007-11-04 01:30      EDT    64 2007-11-04 06:30:00 2007-11-04 01:30:00
## 27 2007-11-04 01:30      EST    63 2007-11-04 06:30:00 2007-11-04 01:30:00
## 28 2007-11-04 01:45      EDT    64 2007-11-04 06:45:00 2007-11-04 01:45:00
## 29 2007-11-04 01:45      EST    63 2007-11-04 06:45:00 2007-11-04 01:45:00
## 30 2007-11-04 02:00      EST    61 2007-11-04 07:00:00 2007-11-04 02:00:00
## 31 2007-11-04 02:15      EST    61 2007-11-04 07:15:00 2007-11-04 02:15:00
## 32 2007-11-04 02:30      EST    61 2007-11-04 07:30:00 2007-11-04 02:30:00
## 33 2007-11-04 02:45      EST    61 2007-11-04 07:45:00 2007-11-04 02:45:00
## 34 2007-11-04 03:00      EST    60 2007-11-04 08:00:00 2007-11-04 03:00:00
## 35 2007-11-04 03:15      EST    60 2007-11-04 08:15:00 2007-11-04 03:15:00
## 36 2007-11-04 03:30      EST    60 2007-11-04 08:30:00 2007-11-04 03:30:00
## 37 2007-11-04 03:45      EST    60 2007-11-04 08:45:00 2007-11-04 03:45:00
## 1  2008-03-09 00:00      EST   194 2008-03-09 05:00:00 2008-03-09 00:00:00
## 2  2008-03-09 00:15      EST   196 2008-03-09 05:15:00 2008-03-09 00:15:00
## 3  2008-03-09 00:30      EST   196 2008-03-09 05:30:00 2008-03-09 00:30:00
## 4  2008-03-09 00:45      EST   199 2008-03-09 05:45:00 2008-03-09 00:45:00
## 5  2008-03-09 01:00      EST   201 2008-03-09 06:00:00 2008-03-09 01:00:00
## 6  2008-03-09 01:15      EST   201 2008-03-09 06:15:00 2008-03-09 01:15:00
## 7  2008-03-09 01:30      EST   203 2008-03-09 06:30:00 2008-03-09 01:30:00
## 8  2008-03-09 01:45      EST   205 2008-03-09 06:45:00 2008-03-09 01:45:00
## 9  2008-03-09 03:00      EDT   205 2008-03-09 07:00:00 2008-03-09 03:00:00
## 10 2008-03-09 03:15      EDT   205 2008-03-09 07:15:00 2008-03-09 03:15:00
## 11 2008-03-09 03:30      EDT   207 2008-03-09 07:30:00 2008-03-09 03:30:00
## 12 2008-03-09 03:45      EDT   210 2008-03-09 07:45:00 2008-03-09 03:45:00
## 13 2008-03-09 04:00      EDT   210 2008-03-09 08:00:00 2008-03-09 04:00:00
## 14 2008-03-09 04:15      EDT   212 2008-03-09 08:15:00 2008-03-09 04:15:00
## 15 2008-03-09 04:30      EDT   214 2008-03-09 08:30:00 2008-03-09 04:30:00
## 16 2008-03-09 04:45      EDT   214 2008-03-09 08:45:00 2008-03-09 04:45:00
## 17 2008-03-09 05:00      EDT   214 2008-03-09 09:00:00 2008-03-09 05:00:00

The only correct way to deal with this format is to manually assign hourly offsets based on the value of TIMEZONE.

usgs <- merge(usgs, data.frame(TIMEZONE = c("EDT", "EST"), OFFSET = c(-4, -5)))
usgs$DATETIME_UTC <- ymd_hm(usgs$TIMESTAMP, tz = "UTC") - dhours(usgs$OFFSET)
usgs$DATETIME_LOCAL <- with_tz(usgs$DATETIME_UTC, "America/New_York")
usgs <- usgs[order(usgs$DATETIME_UTC), ]
print(usgs[, -6])
##    TIMEZONE        TIMESTAMP VALUE        DATETIME_UTC      DATETIME_LOCAL
## 1       EDT 2007-11-04 00:00    68 2007-11-04 04:00:00 2007-11-04 00:00:00
## 2       EDT 2007-11-04 00:15    68 2007-11-04 04:15:00 2007-11-04 00:15:00
## 3       EDT 2007-11-04 00:30    67 2007-11-04 04:30:00 2007-11-04 00:30:00
## 4       EDT 2007-11-04 00:45    67 2007-11-04 04:45:00 2007-11-04 00:45:00
## 5       EDT 2007-11-04 01:00    65 2007-11-04 05:00:00 2007-11-04 01:00:00
## 7       EDT 2007-11-04 01:15    65 2007-11-04 05:15:00 2007-11-04 01:15:00
## 9       EDT 2007-11-04 01:30    64 2007-11-04 05:30:00 2007-11-04 01:30:00
## 11      EDT 2007-11-04 01:45    64 2007-11-04 05:45:00 2007-11-04 01:45:00
## 20      EST 2007-11-04 01:00    63 2007-11-04 06:00:00 2007-11-04 01:00:00
## 21      EST 2007-11-04 01:15    63 2007-11-04 06:15:00 2007-11-04 01:15:00
## 23      EST 2007-11-04 01:30    63 2007-11-04 06:30:00 2007-11-04 01:30:00
## 24      EST 2007-11-04 01:45    63 2007-11-04 06:45:00 2007-11-04 01:45:00
## 19      EST 2007-11-04 02:00    61 2007-11-04 07:00:00 2007-11-04 02:00:00
## 22      EST 2007-11-04 02:15    61 2007-11-04 07:15:00 2007-11-04 02:15:00
## 18      EST 2007-11-04 02:30    61 2007-11-04 07:30:00 2007-11-04 02:30:00
## 29      EST 2007-11-04 02:45    61 2007-11-04 07:45:00 2007-11-04 02:45:00
## 30      EST 2007-11-04 03:00    60 2007-11-04 08:00:00 2007-11-04 03:00:00
## 31      EST 2007-11-04 03:15    60 2007-11-04 08:15:00 2007-11-04 03:15:00
## 32      EST 2007-11-04 03:30    60 2007-11-04 08:30:00 2007-11-04 03:30:00
## 33      EST 2007-11-04 03:45    60 2007-11-04 08:45:00 2007-11-04 03:45:00
## 34      EST 2008-03-09 00:00   194 2008-03-09 05:00:00 2008-03-09 00:00:00
## 35      EST 2008-03-09 00:15   196 2008-03-09 05:15:00 2008-03-09 00:15:00
## 36      EST 2008-03-09 00:30   196 2008-03-09 05:30:00 2008-03-09 00:30:00
## 37      EST 2008-03-09 00:45   199 2008-03-09 05:45:00 2008-03-09 00:45:00
## 25      EST 2008-03-09 01:00   201 2008-03-09 06:00:00 2008-03-09 01:00:00
## 26      EST 2008-03-09 01:15   201 2008-03-09 06:15:00 2008-03-09 01:15:00
## 27      EST 2008-03-09 01:30   203 2008-03-09 06:30:00 2008-03-09 01:30:00
## 28      EST 2008-03-09 01:45   205 2008-03-09 06:45:00 2008-03-09 01:45:00
## 12      EDT 2008-03-09 03:00   205 2008-03-09 07:00:00 2008-03-09 03:00:00
## 13      EDT 2008-03-09 03:15   205 2008-03-09 07:15:00 2008-03-09 03:15:00
## 14      EDT 2008-03-09 03:30   207 2008-03-09 07:30:00 2008-03-09 03:30:00
## 6       EDT 2008-03-09 03:45   210 2008-03-09 07:45:00 2008-03-09 03:45:00
## 15      EDT 2008-03-09 04:00   210 2008-03-09 08:00:00 2008-03-09 04:00:00
## 8       EDT 2008-03-09 04:15   212 2008-03-09 08:15:00 2008-03-09 04:15:00
## 16      EDT 2008-03-09 04:30   214 2008-03-09 08:30:00 2008-03-09 04:30:00
## 10      EDT 2008-03-09 04:45   214 2008-03-09 08:45:00 2008-03-09 04:45:00
## 17      EDT 2008-03-09 05:00   214 2008-03-09 09:00:00 2008-03-09 05:00:00

Now we see that the timeseries is continuous around the DST transitions, and that the local datetime (POSIXct) matches the timestamp (character).

Summary

When handling sub-daily time series, it is critical to pay close attention to the timezone information. All datetimes should be represented using the POSIXct class with the underlying clock time (numeric value) in UTC. For display purposes and extracting the calendar representation components (hour, day), the datetimes should be assigned the correct time zone.