I took these notes while working through the Manipulating Time Series Data in R with xts & zoo course at DataCamp.
`xts` extends the `zoo` class. `xts` objects are essentially a matrix of observations indexed by a time object. Create an `xts` object with `xts(x, order.by)`, where `x` is the data and `order.by` is a vector of dates/times to index the data. You can also add metadata to the `xts` object by declaring name-value pairs such as `born = as.POSIXct("1899-05-08")`.
The code chunk below creates an `xts` object from a vector of data indexed by a vector of dates. A name-value pair adds metadata.
library(xts)
dat <- xts(x = rnorm(n = 5),
order.by = seq(as.Date("2016-01-01"),
length = 5,
by = "days"),
hayek_bday = as.POSIXct("1899-05-08"))
# another (cool) way to create the sequence of dates:
# as.Date("2016-01-01") + 0:4
print(dat)
## [,1]
## 2016-01-01 -0.5173702
## 2016-01-02 -0.7518931
## 2016-01-03 -0.1048107
## 2016-01-04 0.0401562
## 2016-01-05 -0.9661578
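As an aside (not from the course), the name-value metadata attached at creation can be retrieved with `xtsAttributes()`:
xtsAttributes(dat) # returns the user-supplied attributes, e.g. hayek_bday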
If you need to separate your time series into its core data and index attributes for additional analysis and manipulation, use the `coredata()` and `index()` functions inherited from the `zoo` class.
class(coredata(dat))
## [1] "matrix"
class(index(dat))
## [1] "Date"
You can index an `xts` object with any time-based class (`POSIXct`, `Date`, etc.). The following `xts` creation methods are equivalent.
x <- rnorm(n = 5)
dat1 <- xts(x = x,
order.by = as.Date("2016-01-01") + 0:4)
print(dat1)
## [,1]
## 2016-01-01 1.4568050
## 2016-01-02 1.4416505
## 2016-01-03 -1.1200310
## 2016-01-04 0.6519961
## 2016-01-05 -1.3620953
dat2 <- xts(x = x,
order.by = as.POSIXct("2016-01-01") + (0:4)*60*60*24)
print(dat2)
## [,1]
## 2016-01-01 1.4568050
## 2016-01-02 1.4416505
## 2016-01-03 -1.1200310
## 2016-01-04 0.6519961
## 2016-01-05 -1.3620953
If you need to convert between classes because a function is not time-series aware, use the `as.*` functions provided in R (`as.POSIXct()`, `as.data.frame()`, or `as.matrix()`). Convert `ts` objects or data frames to `xts` with `as.xts()`.
class(as.xts(austres)) # austres is a ts object
## [1] "xts" "zoo"
class(as.ts(as.xts(austres))) # convert it back!
## [1] "ts"
The `zoo` package includes its own method for reading files, although it is not strictly necessary. The code chunk below loads a CSV file of time series data with base R `read.csv()` and then the equivalent way with `read.zoo()`.
file <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1127/datasets/tmp_file.csv"
x <- read.csv(file = file)
dat1 <- xts(x = x,
order.by = as.Date(rownames(x), "%m/%d/%Y"))
print(dat1)
## a b
## 2015-01-02 1 3
## 2015-02-03 2 4
dat2 <- read.zoo(file = file,
index.column = 0,
sep = ",",
format = "%m/%d/%Y")
dat2 <- as.xts(dat2)
print(dat2)
## a b
## 2015-01-02 1 3
## 2015-02-03 2 4
If you need an `xts` object to persist across sessions, save it with `saveRDS(x, "file")` and later re-open it with `readRDS()`. If you need to feed your analysis into another process, write the `xts` object back to a file with `write.zoo(x, sep, file)`.
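A minimal sketch of the `saveRDS()`/`readRDS()` round trip (the file name is just an illustration):
saveRDS(dat, file = "dat.rds") # persist the xts object, index and all
dat_restored <- readRDS("dat.rds") # re-open it in a later session
identical(dat, dat_restored) # should be TRUE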
file <- "./sunspots.txt"
sunspots_xts <- as.xts(sunspots)
write.zoo(x = sunspots_xts,
sep = ",",
file = file)
# FUN = as.yearmon converts strings such as Jan 1749 into a proper time class
sun <- as.xts(read.zoo(file = file,
sep = ",",
FUN = as.yearmon))
Extract and replace values in `xts` objects by expressing dates and times in the ISO-8601 standard (YYYY-MM-DDTHH:MM:SS; the dashes can be removed). Use a forward slash "/" to create an interval.
x <- rnorm(n = 1000)
dates <- seq(as.Date("2015-01-01"),
length = 1000,
by = "days")
dat <- xts(x = x, order.by = dates)
nrow(dat["2015"]) # zoo way!
## [1] 365
The last line in the code chunk above is the `xts` equivalent of the following conventional matrix-style method.
nrow(dat[as.Date("2015-01-01") + 0:364]) # conventional way
## [1] 365
Here is an interval from 2016-01-01 to 2016-03-22.
nrow(dat["2016/2016-03-22"])
## [1] 82
Here is an example of extracting the time interval 5AM - 7AM for 2019-06-05.
x <- rnorm(n = 60*24*10) # 10 days of data by minute
datetimes <- as.POSIXct("2019-06-01") + (1:(60*24*10))*60
dat <- xts(x = x,
order.by = datetimes)
head(dat["2019-06-05T05:00/2019-06-05T07:00"])
## [,1]
## 2019-06-05 05:00:00 -0.73936257
## 2019-06-05 05:01:00 -0.17298425
## 2019-06-05 05:02:00 0.71084222
## 2019-06-05 05:03:00 1.27913715
## 2019-06-05 05:04:00 -0.03052699
## 2019-06-05 05:05:00 -0.22565641
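The same ISO-8601 strings work on the left-hand side for replacement; a small sketch (done on a copy so the original values are kept):
dat_copy <- dat
dat_copy["2019-06-05T05:00/2019-06-05T07:00"] <- NA # blank out the 5AM-7AM window
sum(is.na(dat_copy)) # number of minutes that were replaced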
The `xts` functions `first()` and `last()` are time-aware analogs of `head()` and `tail()`.
x <- rnorm(n = 16) # 16 days of data by day
datetimes <- as.Date("2019-06-01") + (0:15)
dat <- xts(x = x,
order.by = datetimes)
(lastweek <- last(dat, "1 week")) # week defined as M:Su
## [,1]
## 2019-06-10 -0.846810380
## 2019-06-11 -0.404188702
## 2019-06-12 -0.598280156
## 2019-06-13 2.272130984
## 2019-06-14 0.077227531
## 2019-06-15 0.005792591
## 2019-06-16 0.595099206
last(lastweek, n = 2) # last 2 obs of lastweek
## [,1]
## 2019-06-15 0.005792591
## 2019-06-16 0.595099206
first(lastweek, "-2 days") # all but first 2 obs of lastweek
## [,1]
## 2019-06-12 -0.598280156
## 2019-06-13 2.272130984
## 2019-06-14 0.077227531
## 2019-06-15 0.005792591
## 2019-06-16 0.595099206
# First three days of second week of dat
first(last(first(dat, "2 weeks"), "1 week"), "3 days")
## [,1]
## 2019-06-03 -0.03043761
## 2019-06-04 0.03587669
## 2019-06-05 -0.21708247
By design, when you perform a binary operation on two `xts` objects, the objects are first aligned on the intersection of their indexes. This preserves the point-in-time nature of the data and prevents accidental look-ahead (or look-behind) bias from entering calculations. If you need to preserve the original dimensions instead, use `coredata()` or `as.numeric()` to work with a plain matrix or vector, manually shift index values with `lag()`, or reindex the data. Here is an example of the vector approach.
a <- xts(x = rep(x = 3, times = 3),
order.by = seq(as.Date("2019-06-06"),
length = 3,
by = "days"))
b <- xts(x = 1,
order.by = as.Date("2019-06-07"))
# adding xts objects returns only intersecting dates.
a + b
## e1
## 2019-06-07 4
# Adding a numeric to an xts object is more intuitive.
a + as.numeric(b)
## [,1]
## 2019-06-06 4
## 2019-06-07 4
## 2019-06-08 4
Here is an example of reindexing.
# Merge b onto the index of a, filling missing rows with 0
a + merge(b, index(a), fill = 0)
## b
## 2019-06-06 3
## 2019-06-07 4
## 2019-06-08 3
# Merge b onto the index of a, filling NAs with the last observation carried forward
a + merge(b, index(a), fill = na.locf)
## b
## 2019-06-06 NA
## 2019-06-07 4
## 2019-06-08 4
`merge()` joins an `xts` object to another `xts` object on the index, or joins a vector of dates to an `xts` object (useful for filling gaps in an `xts` object).1 `merge()` takes three arguments: `...`, an arbitrary number of objects to combine; `join = c("inner", "left", "right", "outer")`, the type of join; and `fill`, which specifies what to do with NAs.
a <- xts(x = rnorm(n = 4),
order.by = as.Date(c("2019-06-05",
"2019-06-08",
"2019-06-09",
"2019-06-13")))
b <- xts(x = rnorm(n = 4),
order.by = as.Date(c("2019-06-05",
"2019-06-06",
"2019-06-08",
"2019-06-09")))
# inner join
merge(a, b, join = "inner")
## a b
## 2019-06-05 1.3118157 0.4076763
## 2019-06-08 -1.8079752 -1.7854470
## 2019-06-09 0.3988541 -0.6858090
# left join, NAs = 0
merge(a, b, join = "left", fill = 0)
## a b
## 2019-06-05 1.3118157 0.4076763
## 2019-06-08 -1.8079752 -1.7854470
## 2019-06-09 0.3988541 -0.6858090
## 2019-06-13 1.4912271 0.0000000
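For completeness, an outer join keeps every index value from both objects; combined with `fill = na.locf` it carries values forward into the gaps (a sketch, output omitted):
# outer join, NAs filled by the last observation carried forward
merge(a, b, join = "outer", fill = na.locf)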
Bind rows to an `xts` object with `rbind()`. The resulting `xts` object is ordered by the index, so argument order does not matter.
rbind(a, b)
## [,1]
## 2019-06-05 1.3118157
## 2019-06-05 0.4076763
## 2019-06-06 -1.4411524
## 2019-06-08 -1.8079752
## 2019-06-08 -1.7854470
## 2019-06-09 0.3988541
## 2019-06-09 -0.6858090
## 2019-06-13 1.4912271
There are several options for handling NAs in an `xts` object (brief sketches of the less common ones follow the interpolation example below):
- `na.locf()` carries the last observation forward (or the next one backward with `fromLast = TRUE`)
- `na.fill(object, fill)` replaces NAs with a supplied value
- `na.trim()` removes NAs at the beginning and end of the series; `na.omit()` removes them everywhere
- `na.approx()` replaces NAs by linear interpolation
temps <- xts(x = data.frame(Temp.Max = c(74, 78, 79, 80, 90,
89, 87, 89, 81, 83,
93, 89, 86, 89, 75, 79),
Temp.Mean = c(69, NA, NA, 76, 79,
79, 78, 80, 73, 71,
81, 82, 78, 80, 72, 69),
Temp.Min = c(60, 56, 59, 69, 68,
70, 72, 72, 67, 64,
69, 77, 68, 68, 60, 60)),
order.by = as.Date("2019-06-06") + 0:15)
temps[1:5,]
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 NA 56
## 2019-06-08 79 NA 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
# Replace NAs with last value carried forward
na.locf(temps[1:5,])
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 69 56
## 2019-06-08 79 69 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
# Replace NAs with next value carried backward
na.locf(temps[1:5,], fromLast = TRUE)
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 76 56
## 2019-06-08 79 76 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
# Replace NAs with linear interpolation
na.approx(temps[1:5,])
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69.00000 60
## 2019-06-07 78 71.33333 56
## 2019-06-08 79 73.66667 59
## 2019-06-09 80 76.00000 69
## 2019-06-10 90 79.00000 68
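The remaining helpers from the list above work similarly. A brief sketch (the toy series `z` is my own, not from the course):
# na.fill() replaces every NA with a supplied value
na.fill(temps[1:5, "Temp.Mean"], fill = 0)
# na.trim() drops NAs only at the start/end of a series; na.omit() drops them everywhere
z <- xts(c(NA, 1, NA, 3, NA), order.by = as.Date("2019-06-06") + 0:4)
na.trim(z) # keeps the interior NA, drops the leading and trailing ones
na.omit(z) # keeps only the non-NA observations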
Create a k-period lag \(L^k X_t = X_{t-k}\) with `lag()`. Both `zoo` and `xts` implement `lag()`, but their sign conventions differ: in `zoo`, a negative value of `k` indicates a lag and a positive value indicates a lead; in `xts` it is the opposite, so `lag(x, k = 1)` returns the prior value.
cbind(lag(temps[1:5, "Temp.Max"], k = 1),
temps[1:5, "Temp.Max"],
lag(temps[1:5, "Temp.Max"], k = -1))
## Temp.Max Temp.Max.1 Temp.Max.2
## 2019-06-06 NA 74 78
## 2019-06-07 74 78 79
## 2019-06-08 78 79 80
## 2019-06-09 79 80 90
## 2019-06-10 80 90 NA
Take the differences between observations with the `diff()` function. Specify `lag` for the number of periods to go back and `differences` for the order of differencing.
cbind(temps[1:5, "Temp.Min"],
diff(temps[1:5, "Temp.Min"], lag = 1, differences = 1))
## Temp.Min Temp.Min.1
## 2019-06-06 60 NA
## 2019-06-07 56 -4
## 2019-06-08 59 3
## 2019-06-09 69 10
## 2019-06-10 68 -1
The `endpoints()` function returns the locations of the last observation in each interval specified by the parameter `on` (e.g. `"years"`, `"quarters"`, `"months"`, `"weeks"`, `"days"`, `"hours"`, `"minutes"`). To find the location of the last observation of every kth period, also specify `k`.
Below, the data ranges from 6/6 to 6/21. `endpoints()` returns the index of each Sunday (6/9 and 6/16) and the last day of the remaining partial week (Friday, 6/21).
temps
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 66 56
## 2019-06-08 79 68 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
## 2019-06-11 89 79 70
## 2019-06-12 87 78 72
## 2019-06-13 89 80 72
## 2019-06-14 81 73 67
## 2019-06-15 83 71 64
## 2019-06-16 93 81 69
## 2019-06-17 89 82 77
## 2019-06-18 86 78 68
## 2019-06-19 89 80 68
## 2019-06-20 75 72 60
## 2019-06-21 79 69 60
(ep <- endpoints(temps, on = "weeks"))
## [1] 0 4 11 16
temps[ep]
## Temp.Max Temp.Mean Temp.Min
## 2019-06-09 80 76 69
## 2019-06-16 93 81 69
## 2019-06-21 79 69 60
Broken into bi-weekly increments, the endpoints include Jun 9 (the end of the first bi-week) and Jun 21 (the last observation of the bi-week ending Jun 23).
(ep <- endpoints(temps, on = "weeks", k = 2))
## [1] 0 4 16
temps[ep]
## Temp.Max Temp.Mean Temp.Min
## 2019-06-09 80 76 69
## 2019-06-21 79 69 60
You are more likely to use `endpoints()` as an input to the `period.apply()` function, which applies function `FUN` to the periods defined by the endpoints in `INDEX`. Here is the weekly mean of `temps`.
# temps prior to calculating the averages
temps
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 66 56
## 2019-06-08 79 68 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
## 2019-06-11 89 79 70
## 2019-06-12 87 78 72
## 2019-06-13 89 80 72
## 2019-06-14 81 73 67
## 2019-06-15 83 71 64
## 2019-06-16 93 81 69
## 2019-06-17 89 82 77
## 2019-06-18 86 78 68
## 2019-06-19 89 80 68
## 2019-06-20 75 72 60
## 2019-06-21 79 69 60
# calculate the weekly average
(x <- period.apply(temps[, "Temp.Max"],
INDEX = endpoints(temps,
on = "weeks",
k = 1),
FUN = mean))
## Temp.Max
## 2019-06-09 77.75000
## 2019-06-16 87.42857
## 2019-06-21 83.60000
# merge the result back into temps
temps2 <- merge(temps[,"Temp.Max"], x)
colnames(temps2)[colnames(temps2)=="Temp.Max.1"] <- "PdAvg"
temps2
## Temp.Max PdAvg
## 2019-06-06 74 NA
## 2019-06-07 78 NA
## 2019-06-08 79 NA
## 2019-06-09 80 77.75000
## 2019-06-10 90 NA
## 2019-06-11 89 NA
## 2019-06-12 87 NA
## 2019-06-13 89 NA
## 2019-06-14 81 NA
## 2019-06-15 83 NA
## 2019-06-16 93 87.42857
## 2019-06-17 89 NA
## 2019-06-18 86 NA
## 2019-06-19 89 NA
## 2019-06-20 75 NA
## 2019-06-21 79 83.60000
# carry the weekly average backward so every day of the week gets its value
na.locf(temps2, fromLast = TRUE)
## Temp.Max PdAvg
## 2019-06-06 74 77.75000
## 2019-06-07 78 77.75000
## 2019-06-08 79 77.75000
## 2019-06-09 80 77.75000
## 2019-06-10 90 87.42857
## 2019-06-11 89 87.42857
## 2019-06-12 87 87.42857
## 2019-06-13 89 87.42857
## 2019-06-14 81 87.42857
## 2019-06-15 83 87.42857
## 2019-06-16 93 87.42857
## 2019-06-17 89 83.60000
## 2019-06-18 86 83.60000
## 2019-06-19 89 83.60000
## 2019-06-20 75 83.60000
## 2019-06-21 79 83.60000
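As a side note, `period.apply()` with weekly endpoints is equivalent to the convenience wrapper `apply.weekly()`; `apply.daily()`, `apply.monthly()`, `apply.quarterly()`, and `apply.yearly()` follow the same pattern.
apply.weekly(temps[, "Temp.Max"], FUN = mean) # same result as the period.apply() call above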
To split the data into chunks, use `split()`. Whereas `endpoints()` locates the index of the period endpoints, `split()` returns a list of `xts` objects. Here is a list of weekly means.
lapply(X = split(temps, f = "weeks"),
FUN = mean)
## [[1]]
## [1] 69.5
##
## [[2]]
## [1] 77.85714
##
## [[3]]
## [1] 75.46667
In financial series it is common to find Open-High-Low-Close (OHLC) range bars calculated over a repeating interval. `to.period()` converts an `xts` object into a lower periodicity (e.g., days converted to months or years) specified by `period`, returning the OHLC statistics.
Below, `to.period()` creates yearly OHLC statistics for the daily US dollar to euro exchange rate.
usd_eur <- read.csv(file = "https://assets.datacamp.com/production/course_1127/datasets/USDEUR.csv")
to.period(usd_eur, period = "years", OHLC = TRUE)
## usd_eur.Open usd_eur.High usd_eur.Low usd_eur.Close
## 1999-12-31 1.1812 1.1812 1.0016 1.0070
## 2000-12-29 1.0155 1.0335 0.8270 0.9388
## 2001-12-31 0.9465 0.9535 0.8370 0.8901
## 2002-12-31 0.9031 1.0485 0.8594 1.0485
## 2003-12-31 1.0361 1.2597 1.0361 1.2597
## 2004-12-31 1.2592 1.3625 1.1801 1.3538
## 2005-12-30 1.3476 1.3476 1.1667 1.1842
## 2006-12-29 1.1980 1.3327 1.1860 1.3197
## 2007-12-31 1.3286 1.4862 1.2904 1.4603
## 2008-12-31 1.4738 1.6010 1.2446 1.3919
## 2009-12-31 1.3946 1.5100 1.2547 1.4332
## 2010-12-30 1.4419 1.4536 1.1959 1.3269
## 2011-12-30 1.3371 1.4875 1.2926 1.2973
## 2012-12-31 1.3061 1.3463 1.2062 1.3186
## 2013-12-31 1.3195 1.3816 1.2774 1.3779
## 2014-12-31 1.3670 1.3927 1.2101 1.2101
## 2015-12-31 1.2015 1.2015 1.0524 1.0859
## 2016-08-05 1.0803 1.1516 1.0743 1.1080
Calculate a cumulative sum (`cumsum()`), cumulative max (`cummax()`), or cumulative min (`cummin()`) by time grouping: split the data set into sublists with `split()`, apply the function to each sublist, then rebind the lists into a single object with `do.call(rbind, myList)`.
library(lubridate)
usd_eur_xts <- xts(usd_eur, order.by = ymd(row.names(usd_eur)))
# split into xts lists by year
usd_eur_xts_yr <- split(usd_eur_xts, f = "years")
# calculate cumsum for yearly list
usd_eur_xts_yr_ytd <- lapply(usd_eur_xts_yr, FUN = cumsum)
# bind the xts lists into one object
usd_eur_xts_yr_ytd_b <- do.call(rbind, usd_eur_xts_yr_ytd)
head(usd_eur_xts_yr_ytd_b)
## DEXUSEU
## 1999-01-04 1.1812
## 1999-01-05 2.3572
## 1999-01-06 3.5208
## 1999-01-07 4.6880
## 1999-01-08 5.8434
## 1999-01-11 6.9968
To apply a function `FUN` to a rolling window of size `width`, use `rollapply()`.
# rolling 3 period sd
usd_eur_xts_sd3 <- rollapply(usd_eur_xts, width = 3, FUN = sd)
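The same pattern works with any summary function; for example, a rolling 5-period mean (my own illustration):
# rolling 5-period mean of the exchange rate
usd_eur_xts_ma5 <- rollapply(usd_eur_xts, width = 5, FUN = mean)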
`xts` object functionality is sensitive to the settings of its internal attributes.
The `indexClass()` function (aka `tclass()`) retrieves and replaces the class of the index of an `xts` object.
tclass(usd_eur_xts)
## [1] "Date"
indexClass(usd_eur_xts)
## [1] "Date"
`indexFormat(x) <- "%b %d, %Y"` changes how the index is displayed.
head(usd_eur_xts, n = 3)
## DEXUSEU
## 1999-01-04 1.1812
## 1999-01-05 1.1760
## 1999-01-06 1.1636
indexFormat(usd_eur_xts) <- "%b %d, %Y"
head(usd_eur_xts, 3)
## DEXUSEU
## Jan 04, 1999 1.1812
## Jan 05, 1999 1.1760
## Jan 06, 1999 1.1636
Some internal operating system functions require a time zone to do date math. If a time zone isn't explicitly set, one is chosen for you, so be careful to always set a time zone in your environment. The `indexTZ()` function (aka `tzone()`) extracts and replaces the time zone. To explicitly set the time zone, use `tzone(x) <- "America/New_York"`. See `help(OlsonNames)` for the R documentation on time zone names.
tzone(usd_eur_xts)
## [1] "UTC"
tzone(usd_eur_xts) <- "America/New_York"
tzone(usd_eur_xts) <- "UTC"
tzone(usd_eur_xts)
## TZ
## "UTC"
The `periodicity()` function reports the periodicity (daily, monthly, etc.) of a time series. Convert the series to a specified periodicity with the `to.*()` functions (here, `to.yearly()`).
periodicity(usd_eur_xts)
## Daily periodicity from 1999-01-04 to 2016-08-05
usd_eur_xts_yr <- to.yearly(usd_eur_xts)
head(usd_eur_xts_yr)
## usd_eur_xts.Open usd_eur_xts.High usd_eur_xts.Low
## Dec 31, 1999 1.1812 1.1812 1.0016
## Dec 29, 2000 1.0155 1.0335 0.8270
## Dec 31, 2001 0.9465 0.9535 0.8370
## Dec 31, 2002 0.9031 1.0485 0.8594
## Dec 31, 2003 1.0361 1.2597 1.0361
## Dec 31, 2004 1.2592 1.3625 1.1801
## usd_eur_xts.Close
## Dec 31, 1999 1.0070
## Dec 29, 2000 0.9388
## Dec 31, 2001 0.8901
## Dec 31, 2002 1.0485
## Dec 31, 2003 1.2597
## Dec 31, 2004 1.3538
Count the number of periods with `ndays()`, `nmonths()`, `nquarters()`, `nyears()`, etc.
nyears(usd_eur_xts)
## [1] 18
nmonths(usd_eur_xts)
## [1] 212
The `.index*()` functions extract date-time components (weekday, month, year, etc.) from the index.
# indexes of Fridays and Saturdays (.indexwday(): 0 = Sunday, ..., 6 = Saturday)
index <- which(.indexwday(usd_eur_xts) == 5 | .indexwday(usd_eur_xts) == 6)
head(usd_eur_xts[index])
## DEXUSEU
## Jan 08, 1999 1.1554
## Jan 15, 1999 1.1591
## Jan 22, 1999 1.1582
## Jan 29, 1999 1.1371
## Feb 05, 1999 1.1283
## Feb 12, 1999 1.1282
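Other extractors follow the same pattern; for example, `.indexmon()` returns a zero-based month (January = 0), as in this sketch:
# observations falling in January of any year
head(usd_eur_xts[.indexmon(usd_eur_xts) == 0], n = 3)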
If you have observations with identical timestamps, you can perturb or remove them to create uniqueness. The `make.index.unique()` function perturbs values with `eps = #` and drops duplicates with `drop = TRUE`. The `align.time()` function rounds timestamps up to the next multiple of a given period.
# Make usd_eur_xts have unique timestamps
z_unique <- make.index.unique(usd_eur_xts, eps = 1e-4)
# Remove duplicate times
z_dup <- make.index.unique(usd_eur_xts, drop = TRUE)
# Round timestamps in usd_eur_xts up to the next hour (n is in seconds)
z_round <- align.time(usd_eur_xts, n = 3600)
Get help for this function with `?merge.xts`.↩