I took these notes while working through the Manipulating Time Series Data in R with xts & zoo course at DataCamp.
`xts` extends the `zoo` class. `xts` objects are essentially a matrix of observations indexed by a time object. Create an `xts` object with `xts(x, order.by)`, where `x` is the data and `order.by` is a vector of dates/times to index the data. You can also add metadata to the `xts` object by declaring name-value pairs such as `born = as.POSIXct("1899-05-08")`.
The code chunk below creates an `xts` object from a vector of data indexed by a vector of dates. A name-value pair adds metadata.
library(xts)
dat <- xts(x = rnorm(n = 5),
order.by = seq(as.Date("2016-01-01"),
length = 5,
by = "days"),
hayek_bday = as.POSIXct("1899-05-08"))
# another (cool) way to create the sequence of dates:
# as.Date("2016-01-01") + 0:4
print(dat)
## [,1]
## 2016-01-01 -0.5173702
## 2016-01-02 -0.7518931
## 2016-01-03 -0.1048107
## 2016-01-04 0.0401562
## 2016-01-05 -0.9661578
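As an aside (not from the course), the name-value metadata attached at creation can be retrieved with `xtsAttributes()`:
xtsAttributes(dat) # returns the user-supplied attributes, e.g. hayek_bday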
If you need to separate your time series into its core data and index attributes for additional analysis and manipulation, use the `coredata()` and `index()` functions inherited from the `zoo` class.
class(coredata(dat))
## [1] "matrix"
class(index(dat))
## [1] "Date"
You can index an `xts` object with any time-based class (`POSIXct`, `Date`, etc.). The following `xts` creation methods are equivalent.
x <- rnorm(n = 5)
dat1 <- xts(x = x,
order.by = as.Date("2016-01-01") + 0:4)
print(dat1)
## [,1]
## 2016-01-01 1.4568050
## 2016-01-02 1.4416505
## 2016-01-03 -1.1200310
## 2016-01-04 0.6519961
## 2016-01-05 -1.3620953
dat2 <- xts(x = x,
order.by = as.POSIXct("2016-01-01") + (0:4)*60*60*24)
print(dat2)
## [,1]
## 2016-01-01 1.4568050
## 2016-01-02 1.4416505
## 2016-01-03 -1.1200310
## 2016-01-04 0.6519961
## 2016-01-05 -1.3620953
If you need to convert between classes because a function is not time-series aware, use the `as.*` functions provided in R (`as.POSIXct()`, `as.data.frame()`, or `as.matrix()`). Convert `ts` objects or data frames to `xts` with `as.xts()`.
class(as.xts(austres)) # austres is a ts object
## [1] "xts" "zoo"
class(as.ts(as.xts(austres))) # convert it back!
## [1] "ts"
The `zoo` package includes its own method for reading files, although it is not strictly necessary. The code chunk below loads a CSV file of time series data with base R `read.csv()` and then the equivalent way with `read.zoo()`.
file <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_1127/datasets/tmp_file.csv"
x <- read.csv(file = file)
dat1 <- xts(x = x,
order.by = as.Date(rownames(x), "%m/%d/%Y"))
print(dat1)
## a b
## 2015-01-02 1 3
## 2015-02-03 2 4
dat2 <- read.zoo(file = file,
index.column = 0,
sep = ",",
format = "%m/%d/%Y")
dat2 <- as.xts(dat2)
print(dat2)
## a b
## 2015-01-02 1 3
## 2015-02-03 2 4
If you need an `xts` object to persist across sessions, save it with `saveRDS(x, "file")` and later re-open it with `readRDS()`. If you need to feed your analysis into another process, write the `xts` object back to a file with `write.zoo(x, sep, file)`.
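A minimal sketch of the `saveRDS()`/`readRDS()` round trip (the file name is just an illustration):
saveRDS(dat, file = "dat.rds") # persist the xts object, index and all
dat_restored <- readRDS("dat.rds") # re-open it in a later session
identical(dat, dat_restored) # should be TRUE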
file <- "./sunspots.txt"
sunspots_xts <- as.xts(sunspots)
write.zoo(x = sunspots_xts,
sep = ",",
file = file)
# FUN = as.yearmon converts strings such as Jan 1749 into a proper time class
sun <- as.xts(read.zoo(file = file,
sep = ",",
FUN = as.yearmon))
Extract and replace values in `xts` objects by expressing dates and times in the ISO-8601 standard (YYYY-MM-DDTHH:MM:SS; the dashes can be removed). Use a forward slash "/" to create an interval.
x <- rnorm(n = 1000)
dates <- seq(as.Date("2015-01-01"),
length = 1000,
by = "days")
dat <- xts(x = x, order.by = dates)
nrow(dat["2015"]) # zoo way!
## [1] 365
The last line in the code chunk above is the `xts` equivalent of the following conventional matrix-style method.
nrow(dat[as.Date("2015-01-01") + 0:364]) # conventional way
## [1] 365
Here is an interval from 2016-01-01 to 2016-03-22.
nrow(dat["2016/2016-03-22"])
## [1] 82
Here is an example of extracting the time interval 5AM - 7AM for 2019-06-05.
x <- rnorm(n = 60*24*10) # 10 days of data by minute
datetimes <- as.POSIXct("2019-06-01") + (1:(60*24*10))*60
dat <- xts(x = x,
order.by = datetimes)
head(dat["2019-06-05T05:00/2019-06-05T07:00"])
## [,1]
## 2019-06-05 05:00:00 -0.73936257
## 2019-06-05 05:01:00 -0.17298425
## 2019-06-05 05:02:00 0.71084222
## 2019-06-05 05:03:00 1.27913715
## 2019-06-05 05:04:00 -0.03052699
## 2019-06-05 05:05:00 -0.22565641
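The same ISO-8601 strings work on the left-hand side for replacement; a small sketch (done on a copy so the original values are kept):
dat_copy <- dat
dat_copy["2019-06-05T05:00/2019-06-05T07:00"] <- NA # blank out the 5AM-7AM window
sum(is.na(dat_copy)) # number of minutes that were replaced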
The `xts` functions `first()` and `last()` are time-aware analogs of `head()` and `tail()`.
x <- rnorm(n = 16) # 16 days of data by day
datetimes <- as.Date("2019-06-01") + (0:15)
dat <- xts(x = x,
order.by = datetimes)
(lastweek <- last(dat, "1 week")) # week defined as M:Su
## [,1]
## 2019-06-10 -0.846810380
## 2019-06-11 -0.404188702
## 2019-06-12 -0.598280156
## 2019-06-13 2.272130984
## 2019-06-14 0.077227531
## 2019-06-15 0.005792591
## 2019-06-16 0.595099206
last(lastweek, n = 2) # last 2 obs of lastweek
## [,1]
## 2019-06-15 0.005792591
## 2019-06-16 0.595099206
first(lastweek, "-2 days") # all but first 2 obs of lastweek
## [,1]
## 2019-06-12 -0.598280156
## 2019-06-13 2.272130984
## 2019-06-14 0.077227531
## 2019-06-15 0.005792591
## 2019-06-16 0.595099206
# First three days of second week of dat
first(last(first(dat, "2 weeks"), "1 week"), "3 days")
## [,1]
## 2019-06-03 -0.03043761
## 2019-06-04 0.03587669
## 2019-06-05 -0.21708247
By design, when you perform a binary operation on two `xts` objects, the objects are first aligned on the intersection of their indexes. This preserves the point-in-time nature of the data and prevents accidental look-ahead (or look-behind) bias from entering calculations. If you need to preserve the original dimensions instead, use `coredata()` or `as.numeric()` to work with a plain matrix or vector, manually shift index values with `lag()`, or reindex the data. Here is an example of the vector approach.
a <- xts(x = rep(x = 3, times = 3),
order.by = seq(as.Date("2019-06-06"),
length = 3,
by = "days"))
b <- xts(x = 1,
order.by = as.Date("2019-06-07"))
# adding xts objects returns only intersecting dates.
a + b
## e1
## 2019-06-07 4
# Adding a numeric to an xts object is more intuitive.
a + as.numeric(b)
## [,1]
## 2019-06-06 4
## 2019-06-07 4
## 2019-06-08 4
Here is an example of reindexing.
# Merge b onto the index of a, filling missing rows with 0
a + merge(b, index(a), fill = 0)
## b
## 2019-06-06 3
## 2019-06-07 4
## 2019-06-08 3
# Merge b onto the index of a, filling NAs with the last observation carried forward
a + merge(b, index(a), fill = na.locf)
## b
## 2019-06-06 NA
## 2019-06-07 4
## 2019-06-08 4
`merge()` joins an `xts` object to another `xts` object on the index, or joins a vector of dates to an `xts` object (useful for filling gaps in an `xts` object).1 `merge()` takes three arguments: `...`, an arbitrary number of objects to combine; `join = c("inner", "left", "right", "outer")`, the type of join; and `fill`, which specifies what to do with NAs.
a <- xts(x = rnorm(n = 4),
order.by = as.Date(c("2019-06-05",
"2019-06-08",
"2019-06-09",
"2019-06-13")))
b <- xts(x = rnorm(n = 4),
order.by = as.Date(c("2019-06-05",
"2019-06-06",
"2019-06-08",
"2019-06-09")))
# inner join
merge(a, b, join = "inner")
## a b
## 2019-06-05 1.3118157 0.4076763
## 2019-06-08 -1.8079752 -1.7854470
## 2019-06-09 0.3988541 -0.6858090
# left join, NAs = 0
merge(a, b, join = "left", fill = 0)
## a b
## 2019-06-05 1.3118157 0.4076763
## 2019-06-08 -1.8079752 -1.7854470
## 2019-06-09 0.3988541 -0.6858090
## 2019-06-13 1.4912271 0.0000000
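For completeness, an outer join keeps every index value from both objects; combined with `fill = na.locf` it carries values forward into the gaps (a sketch, output omitted):
# outer join, NAs filled by the last observation carried forward
merge(a, b, join = "outer", fill = na.locf)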
Bind rows to an `xts` object with `rbind()`. The resulting `xts` object is ordered by the index, so argument order does not matter.
rbind(a, b)
## [,1]
## 2019-06-05 1.3118157
## 2019-06-05 0.4076763
## 2019-06-06 -1.4411524
## 2019-06-08 -1.8079752
## 2019-06-08 -1.7854470
## 2019-06-09 0.3988541
## 2019-06-09 -0.6858090
## 2019-06-13 1.4912271
There are several options for handling NAs in an `xts` object (brief sketches of the less common ones follow the interpolation example below):
- `na.locf()` carries the last observation forward (or the next one backward with `fromLast = TRUE`)
- `na.fill(object, fill)` replaces NAs with a supplied value
- `na.trim()` removes NAs at the beginning and end of the series; `na.omit()` removes them everywhere
- `na.approx()` replaces NAs by linear interpolation
temps <- xts(x = data.frame(Temp.Max = c(74, 78, 79, 80, 90,
89, 87, 89, 81, 83,
93, 89, 86, 89, 75, 79),
Temp.Mean = c(69, NA, NA, 76, 79,
79, 78, 80, 73, 71,
81, 82, 78, 80, 72, 69),
Temp.Min = c(60, 56, 59, 69, 68,
70, 72, 72, 67, 64,
69, 77, 68, 68, 60, 60)),
order.by = as.Date("2019-06-06") + 0:15)
temps[1:5,]
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 NA 56
## 2019-06-08 79 NA 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
# Replace NAs with last value carried forward
na.locf(temps[1:5,])
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 69 56
## 2019-06-08 79 69 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
# Replace NAs with next value carried backward
na.locf(temps[1:5,], fromLast = TRUE)
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 76 56
## 2019-06-08 79 76 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
# Replace NAs with linear interpolation
na.approx(temps[1:5,])
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69.00000 60
## 2019-06-07 78 71.33333 56
## 2019-06-08 79 73.66667 59
## 2019-06-09 80 76.00000 69
## 2019-06-10 90 79.00000 68
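The remaining helpers from the list above work similarly. A brief sketch (the toy series `z` is my own, not from the course):
# na.fill() replaces every NA with a supplied value
na.fill(temps[1:5, "Temp.Mean"], fill = 0)
# na.trim() drops NAs only at the start/end of a series; na.omit() drops them everywhere
z <- xts(c(NA, 1, NA, 3, NA), order.by = as.Date("2019-06-06") + 0:4)
na.trim(z) # keeps the interior NA, drops the leading and trailing ones
na.omit(z) # keeps only the non-NA observations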
Create a k-period lag \(L^k X_t = X_{t-k}\) with `lag()`. Both `zoo` and `xts` implement `lag()`, but their sign conventions differ: in `zoo`, a negative value of `k` indicates a lag and a positive value indicates a lead; in `xts` it is the opposite, so `lag(x, k = 1)` returns the prior value.
cbind(lag(temps[1:5, "Temp.Max"], k = 1),
temps[1:5, "Temp.Max"],
lag(temps[1:5, "Temp.Max"], k = -1))
## Temp.Max Temp.Max.1 Temp.Max.2
## 2019-06-06 NA 74 78
## 2019-06-07 74 78 79
## 2019-06-08 78 79 80
## 2019-06-09 79 80 90
## 2019-06-10 80 90 NA
Take the differences between observations with the `diff()` function. Specify `lag` for the number of periods to go back and `differences` for the order of differencing.
cbind(temps[1:5, "Temp.Min"],
diff(temps[1:5, "Temp.Min"], lag = 1, differences = 1))
## Temp.Min Temp.Min.1
## 2019-06-06 60 NA
## 2019-06-07 56 -4
## 2019-06-08 59 3
## 2019-06-09 69 10
## 2019-06-10 68 -1
The `endpoints()` function returns the locations of the last observation in each interval specified by the parameter `on` (e.g. `"years"`, `"quarters"`, `"months"`, `"weeks"`, `"days"`, `"hours"`, `"minutes"`). To find the location of the last observation of every kth period, also specify `k`.
Below, the data ranges from 6/6 to 6/21. `endpoints()` returns the index of each Sunday (6/9 and 6/16) and the last day of the remaining partial week (Friday, 6/21).
temps
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 66 56
## 2019-06-08 79 68 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
## 2019-06-11 89 79 70
## 2019-06-12 87 78 72
## 2019-06-13 89 80 72
## 2019-06-14 81 73 67
## 2019-06-15 83 71 64
## 2019-06-16 93 81 69
## 2019-06-17 89 82 77
## 2019-06-18 86 78 68
## 2019-06-19 89 80 68
## 2019-06-20 75 72 60
## 2019-06-21 79 69 60
(ep <- endpoints(temps, on = "weeks"))
## [1] 0 4 11 16
temps[ep]
## Temp.Max Temp.Mean Temp.Min
## 2019-06-09 80 76 69
## 2019-06-16 93 81 69
## 2019-06-21 79 69 60
Broken into bi-weekly increments, the endpoints include Jun 9 (the end of the first bi-week) and Jun 21 (the last observation of the bi-week ending Jun 23).
(ep <- endpoints(temps, on = "weeks", k = 2))
## [1] 0 4 16
temps[ep]
## Temp.Max Temp.Mean Temp.Min
## 2019-06-09 80 76 69
## 2019-06-21 79 69 60
You are more likely to use `endpoints()` as an input to the `period.apply()` function, which applies function `FUN` to the periods defined by the endpoints in `INDEX`. Here is the weekly mean of `temps`.
# temps prior to calculating the averages
temps
## Temp.Max Temp.Mean Temp.Min
## 2019-06-06 74 69 60
## 2019-06-07 78 66 56
## 2019-06-08 79 68 59
## 2019-06-09 80 76 69
## 2019-06-10 90 79 68
## 2019-06-11 89 79 70
## 2019-06-12 87 78 72
## 2019-06-13 89 80 72
## 2019-06-14 81 73 67
## 2019-06-15 83 71 64
## 2019-06-16 93 81 69
## 2019-06-17 89 82 77
## 2019-06-18 86 78 68
## 2019-06-19 89 80 68
## 2019-06-20 75 72 60
## 2019-06-21 79 69 60
# calculate the weekly average
(x <- period.apply(temps[, "Temp.Max"],
INDEX = endpoints(temps,
on = "weeks",
k = 1),
FUN = mean))
## Temp.Max
## 2019-06-09 77.75000
## 2019-06-16 87.42857
## 2019-06-21 83.60000
# merge the result back into temps
temps2 <- merge(temps[,"Temp.Max"], x)
colnames(temps2)[colnames(temps2)=="Temp.Max.1"] <- "PdAvg"
temps2
## Temp.Max PdAvg
## 2019-06-06 74 NA
## 2019-06-07 78 NA
## 2019-06-08 79 NA
## 2019-06-09 80 77.75000
## 2019-06-10 90 NA
## 2019-06-11 89 NA
## 2019-06-12 87 NA
## 2019-06-13 89 NA
## 2019-06-14 81 NA
## 2019-06-15 83 NA
## 2019-06-16 93 87.42857
## 2019-06-17 89 NA
## 2019-06-18 86 NA
## 2019-06-19 89 NA
## 2019-06-20 75 NA
## 2019-06-21 79 83.60000
# carry the weekly average backward so every day of the week gets its value
na.locf(temps2, fromLast = TRUE)
## Temp.Max PdAvg
## 2019-06-06 74 77.75000
## 2019-06-07 78 77.75000
## 2019-06-08 79 77.75000
## 2019-06-09 80 77.75000
## 2019-06-10 90 87.42857
## 2019-06-11 89 87.42857
## 2019-06-12 87 87.42857
## 2019-06-13 89 87.42857
## 2019-06-14 81 87.42857
## 2019-06-15 83 87.42857
## 2019-06-16 93 87.42857
## 2019-06-17 89 83.60000
## 2019-06-18 86 83.60000
## 2019-06-19 89 83.60000
## 2019-06-20 75 83.60000
## 2019-06-21 79 83.60000
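As a side note, `period.apply()` with weekly endpoints is equivalent to the convenience wrapper `apply.weekly()`; `apply.daily()`, `apply.monthly()`, `apply.quarterly()`, and `apply.yearly()` follow the same pattern.
apply.weekly(temps[, "Temp.Max"], FUN = mean) # same result as the period.apply() call above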
To split the data into chunks, use `split()`. Whereas `endpoints()` locates the index of the period endpoints, `split()` returns a list of `xts` objects. Here is a list of weekly means.
lapply(X = split(temps, f = "weeks"),
FUN = mean)
## [[1]]
## [1] 69.5
##
## [[2]]
## [1] 77.85714
##
## [[3]]
## [1] 75.46667
In financial series it is common to find Open-High-Low-Close (OHLC) range bars calculated over a repeating interval. `to.period()` converts an `xts` object into a lower periodicity (e.g., days converted to months or years) specified by `period`, returning the OHLC statistics.
Below, `to.period()` creates yearly OHLC statistics for the daily US dollar to euro exchange rate.
usd_eur <- read.csv(file = "https://assets.datacamp.com/production/course_1127/datasets/USDEUR.csv")
to.period(usd_eur, period = "years", OHLC = TRUE)
## usd_eur.Open usd_eur.High usd_eur.Low usd_eur.Close
## 1999-12-31 1.1812 1.1812 1.0016 1.0070
## 2000-12-29 1.0155 1.0335 0.8270 0.9388
## 2001-12-31 0.9465 0.9535 0.8370 0.8901
## 2002-12-31 0.9031 1.0485 0.8594 1.0485
## 2003-12-31 1.0361 1.2597 1.0361 1.2597
## 2004-12-31 1.2592 1.3625 1.1801 1.3538
## 2005-12-30 1.3476 1.3476 1.1667 1.1842
## 2006-12-29 1.1980 1.3327 1.1860 1.3197
## 2007-12-31 1.3286 1.4862 1.2904 1.4603
## 2008-12-31 1.4738 1.6010 1.2446 1.3919
## 2009-12-31 1.3946 1.5100 1.2547 1.4332
## 2010-12-30 1.4419 1.4536 1.1959 1.3269
## 2011-12-30 1.3371 1.4875 1.2926 1.2973
## 2012-12-31 1.3061 1.3463 1.2062 1.3186
## 2013-12-31 1.3195 1.3816 1.2774 1.3779
## 2014-12-31 1.3670 1.3927 1.2101 1.2101
## 2015-12-31 1.2015 1.2015 1.0524 1.0859
## 2016-08-05 1.0803 1.1516 1.0743 1.1080
Calculate a cumulative sum (`cumsum()`), cumulative max (`cummax()`), or cumulative min (`cummin()`) by time grouping: split the data set into sublists with `split()`, apply the function to each sublist, then rebind the lists into a single object with `do.call(rbind, myList)`.
library(lubridate)
usd_eur_xts <- xts(usd_eur, order.by = ymd(row.names(usd_eur)))
# split into xts lists by year
usd_eur_xts_yr <- split(usd_eur_xts, f = "years")
# calculate cumsum for yearly list
usd_eur_xts_yr_ytd <- lapply(usd_eur_xts_yr, FUN = cumsum)
# bind the xts lists into one object
usd_eur_xts_yr_ytd_b <- do.call(rbind, usd_eur_xts_yr_ytd)
head(usd_eur_xts_yr_ytd_b)
## DEXUSEU
## 1999-01-04 1.1812
## 1999-01-05 2.3572
## 1999-01-06 3.5208
## 1999-01-07 4.6880
## 1999-01-08 5.8434
## 1999-01-11 6.9968
To apply a function `FUN` to a rolling window of size `width`, use `rollapply()`.
# rolling 3 period sd
usd_eur_xts_sd3 <- rollapply(usd_eur_xts, width = 3, FUN = sd)
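The same pattern works with any summary function; for example, a rolling 5-period mean (my own illustration):
# rolling 5-period mean of the exchange rate
usd_eur_xts_ma5 <- rollapply(usd_eur_xts, width = 5, FUN = mean)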
`xts` object functionality is sensitive to the settings of its internal attributes.
The `indexClass()` function (aka `tclass()`) retrieves and replaces the class of the index of an `xts` object.
tclass(usd_eur_xts)
## [1] "Date"
indexClass(usd_eur_xts)
## [1] "Date"
`indexFormat(x) <- "%b %d, %Y"` changes how the index is displayed.
head(usd_eur_xts, n = 3)
## DEXUSEU
## 1999-01-04 1.1812
## 1999-01-05 1.1760
## 1999-01-06 1.1636
indexFormat(usd_eur_xts) <- "%b %d, %Y"
head(usd_eur_xts, 3)
## DEXUSEU
## Jan 04, 1999 1.1812
## Jan 05, 1999 1.1760
## Jan 06, 1999 1.1636
Some internal operating system functions require a time zone to do date math. If a time zone isn't explicitly set, one is chosen for you, so be careful to always set a time zone in your environment. The `indexTZ()` function (aka `tzone()`) extracts and replaces the time zone. To explicitly set the time zone, use `tzone(x) <- "America/New_York"`. See `help(OlsonNames)` for the R documentation on time zone names.
tzone(usd_eur_xts)
## [1] "UTC"
tzone(usd_eur_xts) <- "America/New_York"
tzone(usd_eur_xts) <- "UTC"
tzone(usd_eur_xts)
## TZ
## "UTC"
The `periodicity()` function reports the periodicity (daily, monthly, etc.) of a time series. Convert the series to a specified periodicity with the `to.*()` functions (here, `to.yearly()`).
periodicity(usd_eur_xts)
## Daily periodicity from 1999-01-04 to 2016-08-05
usd_eur_xts_yr <- to.yearly(usd_eur_xts)
head(usd_eur_xts_yr)
## usd_eur_xts.Open usd_eur_xts.High usd_eur_xts.Low
## Dec 31, 1999 1.1812 1.1812 1.0016
## Dec 29, 2000 1.0155 1.0335 0.8270
## Dec 31, 2001 0.9465 0.9535 0.8370
## Dec 31, 2002 0.9031 1.0485 0.8594
## Dec 31, 2003 1.0361 1.2597 1.0361
## Dec 31, 2004 1.2592 1.3625 1.1801
## usd_eur_xts.Close
## Dec 31, 1999 1.0070
## Dec 29, 2000 0.9388
## Dec 31, 2001 0.8901
## Dec 31, 2002 1.0485
## Dec 31, 2003 1.2597
## Dec 31, 2004 1.3538
Count the number of periods with `ndays()`, `nmonths()`, `nquarters()`, `nyears()`, etc.
nyears(usd_eur_xts)
## [1] 18
nmonths(usd_eur_xts)
## [1] 212
The `.index*()` functions extract date-time components (weekday, month, year, etc.) from the index.
# indexes of Fridays and Saturdays (.indexwday(): 0 = Sunday, ..., 6 = Saturday)
index <- which(.indexwday(usd_eur_xts) == 5 | .indexwday(usd_eur_xts) == 6)
head(usd_eur_xts[index])
## DEXUSEU
## Jan 08, 1999 1.1554
## Jan 15, 1999 1.1591
## Jan 22, 1999 1.1582
## Jan 29, 1999 1.1371
## Feb 05, 1999 1.1283
## Feb 12, 1999 1.1282
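Other extractors follow the same pattern; for example, `.indexmon()` returns a zero-based month (January = 0), as in this sketch:
# observations falling in January of any year
head(usd_eur_xts[.indexmon(usd_eur_xts) == 0], n = 3)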
If you have observations with identical timestamps, you can perturb or remove them to create uniqueness. The `make.index.unique()` function perturbs values with `eps = #` and drops duplicates with `drop = TRUE`. The `align.time()` function rounds timestamps up to the next multiple of a given period.
# Make usd_eur_xts have unique timestamps
z_unique <- make.index.unique(usd_eur_xts, eps = 1e-4)
# Remove duplicate times
z_dup <- make.index.unique(usd_eur_xts, drop = TRUE)
# Round timestamps in usd_eur_xts up to the next hour (n is in seconds)
z_round <- align.time(usd_eur_xts, n = 3600)
Get help for this function with `?merge.xts`.↩