There are many different types of objects in R. With a variety of different features, each has a unique purpose. Some classes inherit behavior from their parents, allowing for custom extensions to existing and well-understood R objects. This makes it easy to adapt existing code to new functionality.
It is for this reason that xts extends the popular zoo class.
- xts objects are matrix objects internally.
- xts objects are indexed by a formal time object.
- Most zoo methods work for xts.
xts objects are very versatile and useful for working with time-based data.
It is best to think of xts objects as normal R matrices, but with special powers. These powers let you manipulate your data as a function of time, as your data is now self-aware of when it exists in time. Before we can start to exploit these powers, it will be helpful to see how xts objects relate to their base-R relatives.
# Load xts
library(xts)
# View the structure of ex_matrix
str(ex_matrix)
An ‘xts’ object on 2016-06-01/2016-06-03 containing:
Data: num [1:3, 1:2] 1 1 1 2 2 2
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 1
$ createdOn: POSIXct[1:1], format: "2020-06-29 16:16:16"
# Extract the 3rd observation of the 2nd column of ex_matrix
ex_matrix[3, 2]
[,1]
2016-06-03 2
# Extract the 3rd observation of the 2nd column of core
core[3, 2]
[1] 2
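The text does not show how ex_matrix and core were built. The following is a minimal sketch that would reproduce the output above, assuming core is a plain 3 x 2 matrix and ex_matrix is the same data indexed by three consecutive dates. The names idx, xts_cell and mat_cell are introduced here for illustration only.

```r
library(xts)

# Plain base-R matrix holding the observations (assumed construction)
core <- matrix(c(1, 1, 1, 2, 2, 2), nrow = 3, ncol = 2)

# Hypothetical index: three consecutive days starting 2016-06-01
idx <- as.Date("2016-06-01") + 0:2

# Same data, now indexed by time
ex_matrix <- xts(core, order.by = idx)

# Subsetting the xts object keeps the time index and the matrix shape;
# subsetting the plain matrix drops to a bare number
xts_cell <- ex_matrix[3, 2]
mat_cell <- core[3, 2]
```

The difference in the two results is the point of the exercise: the xts subset still knows its date, the matrix subset does not.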
xts objects are simple. Think of them as a matrix of observations combined with an index of corresponding dates and times.
xts = matrix + times
The main xts constructor takes a number of arguments, but the two most important are x for the data and order.by for the index. x must be a vector or matrix. order.by is a vector which must be the same length or number of rows as x, be a proper time or date object (very important!), and be in increasing order.
xts also allows you to bind arbitrary key-value attributes to your data. This lets you keep metadata about your object inside your object. To add these at creation, you simply pass additional name = value arguments to the xts() function.
Since we are focusing here on the mechanics, we’ll use random numbers as our data so we can focus on creating the object rather than worry about its contents.
# Create the object data using 5 random numbers
data <- rnorm(5)
# Create dates as a Date class object starting from 2016-01-01
dates <- seq(as.Date("2016-01-01"), length = 5, by = "days")
# Use xts() to create smith
smith <- xts(x = data, order.by = dates)
# Create bday (1899-05-08) using a POSIXct date class object
bday <- as.POSIXct("1899-05-08")
# Create hayek and add a new attribute called born
hayek <- xts(x = data, order.by = dates, born = bday)
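To check that the extra name = value argument really ended up inside the object, the attributes can be read back. This is a small sketch that repeats the construction above and then uses xtsAttributes() from xts and attr() from base R; the names meta and born are illustrative.

```r
library(xts)

# Same construction as above
data <- rnorm(5)
dates <- seq(as.Date("2016-01-01"), length.out = 5, by = "days")
bday <- as.POSIXct("1899-05-08")
hayek <- xts(x = data, order.by = dates, born = bday)

# List the user-defined attributes stored on the object
meta <- xtsAttributes(hayek)

# Or read a single attribute by name with base R
born <- attr(hayek, "born")
```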
Now that you can create xts objects, your next task is to examine an xts object from the inside.
At the core of both xts and zoo is a simple R matrix with a few additional attributes. The most important of these attributes is the index. The index holds all the information we need for xts to treat our data as a time series.
When working with time series, it will sometimes be necessary to separate your time series into its core data and index attributes for additional analysis and manipulation. The core data is the matrix portion of xts. You can separate this from the xts object using coredata(). The index portion of the xts object is available using the index() function. Note that both of these functions are methods from the zoo class, which xts extends.
# Extract the core data of hayek
hayek_core <- coredata(hayek)
# View the class of hayek_core
class(hayek_core)
[1] "matrix" "array"
# Extract the index of hayek
hayek_index <- index(hayek)
# View the class of hayek_index
class(hayek_index)
[1] "Date"
xts objects get their power from the index attribute that holds the time dimension. One major difference between xts and most other time series objects in R is the ability to use any one of various classes that are used to represent time. Whether POSIXct, Date, or some other class, xts will convert this into an internal form to make subsetting as natural to the user as possible.
a <- xts(x = 1:2, as.Date("2012-01-01") + 0:1)
a[index(a)]
[,1]
2012-01-01 1
2012-01-02 2
We can use date objects directly to select the matching rows of a time series. This is effectively like matching the row names you see when the object is printed, and it works as anticipated for time objects because those row names really are dates!
# Create dates
dates <- as.Date("2016-01-01") + 0:4
# Create ts_a
ts_a <- xts(x = c(1:5), order.by = dates)
# Create ts_b
ts_b <- xts(x = c(1:5), order.by = as.POSIXct(dates))
# Extract the rows of ts_a using the index of ts_b
ts_a[index(ts_b)]
[,1]
2016-01-01 1
2016-01-02 2
2016-01-03 3
2016-01-04 4
2016-01-05 5
# Extract the rows of ts_b using the index of ts_a
ts_b[index(ts_a)]
[,1]
2016-01-01 1
2016-01-02 2
2016-01-03 3
2016-01-04 4
2016-01-05 5
The versatile structure of xts objects makes subsetting them very intuitive.
It is often necessary to convert between classes when working with time series data in R. Conversion can be required for many reasons, but typically you’ll be looking to use a function that may not be time series aware or you may want to use a particular aspect of xts with something that doesn’t necessarily need to be a full time series.
Luckily, it is quite easy to convert back and forth using the standard as.* style functionality provided in R (for example, as.POSIXct() or as.matrix()).
xts provides methods to convert all of the major objects you are likely to come across. Suitable native R types like matrix, data.frame, and ts are supported, as well as contributed ones such as timeSeries, fts and of course zoo. as.xts() is the workhorse function to do the conversions to xts, and similar functions will provide the reverse behavior.
To get a feel for moving data between classes, let’s try a few examples using the Australian population ts object from R named austres.
# Convert austres to an xts object called au
au <- as.xts(austres)
# Then convert your xts object (au) into a matrix am
am <- as.matrix(au)
# Inspect the head of am
head(am)
au
1971 Q2 13067.3
1971 Q3 13130.5
1971 Q4 13198.4
1972 Q1 13254.2
1972 Q2 13303.7
1972 Q3 13353.9
# Convert the original austres into a matrix am2
am2 <- as.matrix(austres)
# Inspect the head of am2
head(am2)
[,1]
[1,] 13067.3
[2,] 13130.5
[3,] 13198.4
[4,] 13254.2
[5,] 13303.7
[6,] 13353.9
Converting objects to and from the xts class is critical for manipulating time series data and couldn’t be easier than with the as.xts() command.
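Data often arrives as a data frame with an explicit date column rather than as a ts object. The sketch below shows one way to move such data into xts and back out again; the data frame df and its column names are hypothetical, and the xts() constructor is used for the inbound step because the dates live in a column rather than in the row names.

```r
library(xts)

# Hypothetical data frame with an explicit date column
df <- data.frame(
  date  = as.Date("2016-01-01") + 0:2,
  price = c(10.5, 11.0, 10.8),
  qty   = c(100, 150, 120)
)

# Data frame -> xts: numeric columns become the data, the date column the index
x <- xts(df[, c("price", "qty")], order.by = df$date)

# xts -> plain matrix
m <- as.matrix(x)

# xts -> data frame with the index restored as a regular column
df2 <- data.frame(date = index(x), coredata(x))
```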
We can now convert data to xts using as.xts(). However, in most real world applications we will often need to read raw data from files on disk or the web. This can be challenging without knowing the right commands.
First, we will start by reading a csv file from disk using the base-R read.csv. After we read the data, the next step is to convert it to xts. Here we will be required to use the xts() constructor as well as deal with converting non-standard dates into something that xts understands.
Second, we will read the same data into a zoo object using read.zoo and then convert the zoo object into an xts object.
The data in this exercise are quite simple, but will require some effort to properly import and clean. The full name of the file you will be working with has been saved as the value of tmp_file.
tmp_file <- "https://s3.amazonaws.com/assets.datacamp.com/production/course_1127/datasets/tmp_file.csv"
# Create dat by reading tmp_file
dat = read.csv(tmp_file)
# Convert dat into xts
xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y"))
a b
2015-01-02 1 3
2015-02-03 2 4
# Read tmp_file using read.zoo
dat_zoo <- read.zoo(tmp_file, index.column = 0, sep = ",", format = "%m/%d/%Y")
# Convert dat_zoo to xts
dat_xts <- as.xts(dat_zoo)
We just successfully generated xts objects from raw data.
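The same two routes can be tried without network access. The sketch below writes a small stand-in file to a temporary location and imports it both ways. The file layout (dates in an unnamed first column, in month/day/year format) is an assumption inferred from the calls and output above, and csv_path, dat_xts1 and dat_xts2 are illustrative names.

```r
library(xts)

# Hypothetical stand-in for the remote file: dates in the first, unnamed column
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1/02/2015,1,3", "2/03/2015,2,4"), csv_path)

# Route 1: read.csv() turns the unnamed first column into row names,
# which are then parsed into Dates for the xts index
dat <- read.csv(csv_path)
dat_xts1 <- xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y"))

# Route 2: read.zoo() parses the index itself; index.column = 0 uses the row names
dat_zoo <- read.zoo(csv_path, index.column = 0, sep = ",", format = "%m/%d/%Y")
dat_xts2 <- as.xts(dat_zoo)
```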
Now that we can read raw data into xts and zoo objects, it is only natural that we learn how to reverse the process.
There are two main use cases for exporting xts objects. First, you may require an object to persist across sessions for use in later analysis. In this case, it is almost always best to use saveRDS() and readRDS() to serialize single R objects.
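A minimal sketch of the first use case, using a small made-up series and a temporary file (x, path and y are illustrative names):

```r
library(xts)

# Small illustrative series
x <- xts(c(1.5, 2.5, 3.5), order.by = as.Date("2016-01-01") + 0:2)

# Serialize the object to disk, then restore it
path <- tempfile(fileext = ".rds")
saveRDS(x, path)
y <- readRDS(path)

# The restored object is still an xts object with the same index and data
same <- isTRUE(all.equal(x, y))
```

Because the whole R object is serialized, the index class and any attributes travel with the data, which a plain text export cannot guarantee.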
Alternatively, you may find yourself needing to share the results of your analysis with others, often expecting the data to be consumed by processes unaware of both R and xts. Most of us would prefer not to think of this horrible fate for our data, but the real world mandates that we at least understand how this works.
One of the best ways to write an xts object from R is to use the zoo function write.zoo().
# Convert sunspots to xts using as.xts().
sunspots_xts <- as.xts(sunspots)
# Get the temporary file name
tmp <- tempfile()
# Write the xts object using zoo to tmp
write.zoo(sunspots_xts, sep = ",", file = tmp)
# Read the tmp file. FUN = as.yearmon converts strings such as Jan 1749 into a proper time class
sun <- read.zoo(tmp, sep = ",", FUN = as.yearmon)
# Convert sun into xts. Save this as sun_xts
sun_xts <- as.xts(sun)
The ISO-8601 standard is the internationally recognized and accepted way to represent dates and times. The standard allows for a common format to not only describe dates, but also to represent ranges and repeating intervals.
xts makes use of this standard for all extract and replace operations. This makes code both easy to write and easy to maintain. It also makes for very concise expression of date ranges and intervals.
For xts to work correctly, it is very important to follow the standard exactly. Details can be found in the xts subsetting documentation (?`[.xts`) and in ?.parseISO8601.
One of the most powerful aspects of working with time series in xts is the ability to quickly and efficiently specify dates and time ranges for subsetting.
Date ranges can be extracted from xts objects by simply specifying the period(s) you want using special character strings in your subset.
A["20090825"] ## Aug 25, 2009
A["201203/201212"] ## Mar to Dec 2012
A["/201601"] ## Up to and including January 2016
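The object A above is not defined in the text. To experiment with these strings, a hypothetical daily series covering 2015 and 2016 is enough; the sketch below builds one and tries several range forms, using the hyphenated ISO-8601 spelling.

```r
library(xts)

# Hypothetical daily series covering all of 2015 and 2016
idx <- seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day")
A <- xts(seq_along(idx), order.by = idx)

whole_year   <- A["2016"]                   # every day in 2016
one_month    <- A["2016-02"]                # February 2016 only
closed_range <- A["2015-12-30/2016-01-02"]  # both end points included
open_start   <- A["/2015-01-10"]            # start of the data up to Jan 10, 2015
open_end     <- A["2016-12-25/"]            # Dec 25, 2016 to the end of the data
```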
We will extract a range of dates using the ISO-8601 feature of xts. After successfully extracting a full year, you will then create a subset of your new object with specific start and end dates using this same notation.
x <- readRDS("x.rds")
# Select all of 2016 from x
x_2016 <- x["2016"]
# Select January 1, 2016 to March 22, 2016
jan_march <- x["2016/2016-03-22"]
# Verify that jan_march contains 82 rows
82 == length(jan_march)
[1] TRUE
Subsetting a range is a useful way to get a snapshot of a larger time series object.
The most common time series data “in the wild” is daily. On occasion, you may find yourself working with intraday data, which contains both dates and times. In this case it is sometimes necessary to view only a subset of time for each day over multiple days. Using xts, you can slice days easily by using special notation in the i = argument to the single bracket extraction (i.e. [i, j]). The trick to this is to not specify explicit dates, but rather to use the special T/T notation designed for intraday repeating intervals.
# Intraday times for all days
NYSE["T09:30/T16:00"]
We will extract recurring morning hours from the time series irreg, which holds irregular data from the month of January 2010.
irreg <- readRDS("irreg.rds")
# Extract all data from irreg between 8AM and 10AM
morn_2010 <- irreg["T08:00/T10:00"]
'indexTZ' is deprecated.
Use 'tzone' instead.
See help("Deprecated") and help("xts-deprecated").
# Extract the observations in morn_2010 for January 13th, 2010
morn_2010["2010-01-13"]
[,1]
2010-01-13 08:07:00 41
2010-01-13 09:28:00 42
Precision subsetting is a useful tool for exploring time series data.
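The file irreg.rds is not available with the text, so here is a self-contained sketch of the same T/T technique on made-up data: one observation per hour, at half past the hour, over two days. The UTC time zone and the names intraday, morning and morning_day2 are assumptions for the example.

```r
library(xts)

# Hypothetical intraday series: hourly observations at half past, for two days
idx <- seq(as.POSIXct("2010-01-01 00:30:00", tz = "UTC"),
           by = "hour", length.out = 48)
intraday <- xts(seq_along(idx), order.by = idx)

# Keep only the 08:00-10:00 window of every day
morning <- intraday["T08:00/T10:00"]

# Narrow the result to a single day with an ordinary ISO-8601 date string
morning_day2 <- morning["2010-01-02"]
```

Chaining the two subsets, first by time of day and then by date, mirrors the two steps of the exercise above.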
Often you may need to subset an existing time series with a set of Dates, or time-based objects. These might be from as.Date(), as.POSIXct(), or a variety of other classes. In this exercise you’ll explore how, given an xts object x, it is possible to extract relevant observations using a vector of dates in brackets.
# Subset x using the vector dates
x[dates]
[,1]
2016-06-04 3
2016-06-08 7
# Subset x using dates as POSIXct
x[as.POSIXct(dates)]
[,1]
2016-06-04 3
2016-06-08 7
Replacing values in xts objects is just as easy as extracting them. You can use ISO-8601 strings, date objects, logicals, or integers to locate the rows you want to replace. One reason you may want to do this is to replace known intervals or observations with NA, say due to a malfunctioning sensor on a particular day or a set of outliers around a holiday.
For individual observations located sporadically throughout your data, dates, integers, or logical vectors are a great choice. For continuous blocks of time, ISO-8601 is the preferred method.
# Replace the values in x contained in the dates vector with NA
x[dates] <- NA
# Replace all values in x for dates starting June 9, 2016 with 0
x["20160609/"] <- 0
# Verify that the value in x for June 11, 2016 is now indeed 0
x["20160611"]
[,1]
2016-06-11 0
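Neither x nor dates is defined in the text. The sketch below assumes values chosen to be consistent with the printed output (the integers 1 to 10 on June 2 through June 11, 2016) and walks through extraction followed by both kinds of replacement.

```r
library(xts)

# Hypothetical data consistent with the output above
x <- xts(1:10, order.by = as.Date("2016-06-02") + 0:9)
dates <- as.Date(c("2016-06-04", "2016-06-08"))

# Extract by a vector of Date objects
picked <- x[dates]

# Replace scattered observations with NA (located by Date objects) ...
x[dates] <- NA

# ... and a continuous block with 0 (located by an ISO-8601 range)
x["2016-06-09/"] <- 0
```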
Sometimes you need to locate data by relative time, which is easier said than put into code. This is equivalent to requesting the head or tail of a series, but instead of using an absolute offset, you describe a relative position in time. A simple example would be something like the last 3 weeks of a series, or the first day of the current month.
Without a time aware object, this gets quite complicated very quickly. Luckily, xts has the necessary prerequisites built in for you to use with very little learning required. Using the first() and last() functions it is actually quite easy!
# Create lastweek using the last 1 week of temps
lastweek <- last(temps, "1 week")
# Print the last 2 observations in lastweek
last(lastweek, 2)
Temp.Max Temp.Mean Temp.Min
2016-07-15 75 72 60
2016-07-16 79 69 60
# Extract all but the first two days of lastweek
first(lastweek, "-2 days")
Temp.Max Temp.Mean Temp.Min
2016-07-13 86 78 68
2016-07-14 89 80 68
2016-07-15 75 72 60
2016-07-16 79 69 60
Relative subsetting with first() and last() can be a valuable way to look at the most recent data in your xts object.
Now that you have seen how to extract the first or last chunk of a time series using natural looking language, it is only a matter of time before you need to get a bit more complex.
We’ll extract a very specific subset of observations by linking together multiple calls to first() and last().
# Last 3 days of first week
last(first(Temps, '1 week'), '3 days')
We will reconfigure the example above using the temps data from the previous exercise. The trick to using such a complex command is to work from the inside function, out.
# Extract the first three days of the second week of temps
first(last(first(temps, "2 weeks"), "1 week"), "3 days")
Temp.Max Temp.Mean Temp.Min
2016-07-04 80 76 69
2016-07-05 90 79 68
2016-07-06 89 79 70
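To try the relative subsetting without the temps file, a stand-in series over the same dates (July 1 to July 16, 2016) is sufficient. The sketch below uses made-up values and the illustrative name temps_demo, then repeats the calls from the exercises, reading the nested call from the inside out.

```r
library(xts)

# Hypothetical stand-in for temps: one value per day, July 1-16, 2016
days <- seq(as.Date("2016-07-01"), as.Date("2016-07-16"), by = "day")
temps_demo <- xts(seq_along(days), order.by = days)

# The last calendar week present in the data
lastweek <- last(temps_demo, "1 week")

# Everything in that week except its first two days
trimmed <- first(lastweek, "-2 days")

# Inside out: first two weeks -> the later of those weeks -> its first three days
second_week_start <- first(last(first(temps_demo, "2 weeks"), "1 week"), "3 days")
```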
xts objects respect time. By design, when we perform any binary operation using two xts objects, these objects are first aligned using the intersection of the indexes. This may be surprising when first encountered.
The reason for this is that we want to preserve the point-in-time aspect of our data, assuring that we don’t introduce accidental look ahead (or look behind!) bias into our calculations.
What this means in practice is that we will sometimes need to work around this behavior if we want to preserve the dimensions of our data.
Our options include:
- Use coredata() or as.numeric() to drop one operand to a plain matrix or vector.
- Manually shift the index values, for example with lag().
- Re-index the data before or after the calculation.
# Add a and b
a + b
a
2015-01-24 3
# Add a with the numeric value of b
a + as.numeric(b)
a
2015-01-24 3
2015-01-25 3
2015-01-26 3
As we can see, adding two xts objects returns only the dates common to both. Adding a numeric to an xts object is a bit more intuitive.
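The objects a and b are not defined in the text. A minimal sketch with values chosen to match the printed results, assuming a holds three days of 1s and b holds a single 2 on the first of those days:

```r
library(xts)

# Hypothetical series consistent with the output above
a <- xts(c(1, 1, 1), order.by = as.Date("2015-01-24") + 0:2)
b <- xts(2, order.by = as.Date("2015-01-24"))

# Two xts operands: aligned on the one shared date only
both <- a + b

# One xts operand and one plain number: the number is recycled over every row
recycled <- a + as.numeric(b)
```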
At this point we are aware that xts respects time and will only return the intersection of times when doing various mathematical operations.
We alluded to another way to handle this behavior. Namely, re-indexing your data before an operation. This makes it possible to preserve the dimensions of your data by leveraging the same mechanism that xts uses internally in its own Ops method (the code dispatched when you call + or similar).
The third way involves modifying the two series you want by assuring you have some union of dates - the dates you require in your final output.
merge(b, index(a))
# Add a to b, and fill all missing rows of b with 0
a + merge(b, index(a), fill = 0)
a
2015-01-24 3
2015-01-25 1
2015-01-26 1
# Add a to b and fill NAs with the last observation
a + merge(b, index(a), fill = na.locf)
a
2015-01-24 3
2015-01-25 3
2015-01-26 3
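The re-indexing step can be inspected on its own before it is used in the sum. The sketch below again assumes a holds three days of 1s and b a single 2 on the first day (values chosen to match the output above); b_zero and b_locf are illustrative names.

```r
library(xts)

# Hypothetical series consistent with the output above
a <- xts(c(1, 1, 1), order.by = as.Date("2015-01-24") + 0:2)
b <- xts(2, order.by = as.Date("2015-01-24"))

# Stretch b onto the index of a, filling the new rows with 0
b_zero <- merge(b, index(a), fill = 0)

# Stretch b onto the index of a, carrying the last observation forward
b_locf <- merge(b, index(a), fill = na.locf)

# Both operands now share the same three dates, so nothing is dropped
sum_zero <- a + b_zero
sum_locf <- a + b_locf
```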
xts makes it easy to join data by column and row using a few different functions. All results will be correctly ordered in time, regardless of original frequencies or date class. One of the most important functions to accomplish this is merge(). It takes one or more series and joins them by column. It’s also possible to combine a series with a vector of dates. This is especially useful for normalizing observations to a fixed calendar.
merge() takes three key arguments which we will emphasize here. First is the …, which lets you pass in an arbitrary number of objects to combine. The second argument is join, which specifies how to join the series - accepting arguments such as inner or left. This is similar to a relational database join, only here, the index is what we join on. The final argument for this exercise is fill. This keyword specifies what to do with the new values in a series if there is missingness introduced as a result of the merge.
# Basic argument use
merge(a, b, join = "right", fill = 9999)
# Perform an inner join of a and b
merge(a, b, join = "inner")
a b
2016-06-05 -1.2070657 0.4291247
2016-06-08 0.2774292 -0.5747400
2016-06-09 1.0844412 -0.5466319
# Perform a left-join of a and b, fill missing values with 0
merge(a, b, join = "left", fill = 0)
a b
2016-06-05 -1.2070657 0.4291247
2016-06-08 0.2774292 -0.5747400
2016-06-09 1.0844412 -0.5466319
2016-06-13 -2.3456977 0.0000000
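The effect of join is easiest to see by counting rows. The sketch below uses two small made-up series that share three dates and have one extra date each; the values are arbitrary and only the dates matter.

```r
library(xts)

# Hypothetical series: three shared dates plus one date unique to each
a <- xts(c(1, 2, 3, 4),
         order.by = as.Date(c("2016-06-05", "2016-06-08", "2016-06-09", "2016-06-13")))
b <- xts(c(10, 20, 30, 40),
         order.by = as.Date(c("2016-06-05", "2016-06-08", "2016-06-09", "2016-06-10")))

inner <- merge(a, b, join = "inner")             # only dates present in both
left  <- merge(a, b, join = "left", fill = 0)    # all dates of a, gaps in b set to 0
right <- merge(a, b, join = "right")             # all dates of b, gaps in a are NA
outer <- merge(a, b)                             # default: the union of all dates
```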
xts provides its own S3 method to the base rbind() generic function. The xts rbind function is much simpler than merge(). The only argument that matters is …, which takes an arbitrary number of objects to bind. What is different is that rbind requires a time series, since we need to have timestamps for R to know where to insert new data.
# Row bind temps_june30 to temps, assign this to temps2
temps2 <- rbind(temps, temps_june30)
# Row bind temps_july17 and temps_july18 to temps2, call this temps3
temps3 <- rbind(temps2, temps_july17, temps_july18)
temps2
Temp.Max Temp.Mean Temp.Min
2016-06-30 75 73 63
2016-07-01 74 69 60
2016-07-02 78 66 56
2016-07-03 79 68 59
2016-07-04 80 76 69
2016-07-05 90 79 68
2016-07-06 89 79 70
2016-07-07 87 78 72
2016-07-08 89 80 72
2016-07-09 81 73 67
2016-07-10 83 72 64
2016-07-11 93 81 69
2016-07-12 89 82 77
2016-07-13 86 78 68
2016-07-14 89 80 68
2016-07-15 75 72 60
2016-07-16 79 69 60
temps3
Temp.Max Temp.Mean Temp.Min
2016-06-30 75 73 63
2016-07-01 74 69 60
2016-07-02 78 66 56
2016-07-03 79 68 59
2016-07-04 80 76 69
2016-07-05 90 79 68
2016-07-06 89 79 70
2016-07-07 87 78 72
2016-07-08 89 80 72
2016-07-09 81 73 67
2016-07-10 83 72 64
2016-07-11 93 81 69
2016-07-12 89 82 77
2016-07-13 86 78 68
2016-07-14 89 80 68
2016-07-15 75 72 60
2016-07-16 79 69 60
2016-07-17 79 70 68
2016-07-18 75 70 65
Because xts objects are ordered by their time index, the order of arguments in xts’s rbind() command is unimportant.
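A short sketch of that last point, with two made-up series bound in both orders (x1, x2, r1 and r2 are illustrative names):

```r
library(xts)

# Hypothetical series: x2 is earlier in time than x1
x1 <- xts(c(1, 2), order.by = as.Date("2016-07-01") + 0:1)
x2 <- xts(3, order.by = as.Date("2016-06-30"))

# Bind in both argument orders
r1 <- rbind(x1, x2)
r2 <- rbind(x2, x1)

# Either way the result is sorted by time, with the June 30 row first
same_index <- all(index(r1) == index(r2))
same_data  <- all(as.numeric(r1) == as.numeric(r2))
```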
As you’ve encountered already, it’s not uncommon to find yourself with missing values (i.e. NAs) in your time series. This may be the result of a data omission or some mathematical or merge operation you do on your data.
The xts package leverages the power of zoo for help with this. zoo provides a variety of missing data handling functions which are usable by xts.
In this example you will use the most basic of these, na.locf(). This function takes the last observation carried forward approach. In many circumstances this is a sensible default: it preserves the last known value and prevents any look-ahead bias from entering into the data.
You can also apply next observation carried backward by setting fromLast = TRUE.
# Last obs. carried forward
na.locf(x)
# Next obs. carried backward
na.locf(x, fromLast = TRUE)
# Fill missing values in temps using the last observation
temps_last <- na.locf(temps)
# Fill missing values in temps using the next observation
temps_next <- na.locf(temps, fromLast = TRUE)
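The difference between the two directions shows up at the edges of a gap. A small sketch on made-up data with NAs in the middle and at the end:

```r
library(xts)

# Hypothetical series with a two-day gap and a missing final value
x <- xts(c(1, NA, NA, 4, NA), order.by = as.Date("2016-01-01") + 0:4)

# Last observation carried forward: gaps take the previous known value
filled_forward <- na.locf(x)

# Next observation carried backward: gaps take the following known value;
# the final NA has nothing after it, so it stays NA
filled_backward <- na.locf(x, fromLast = TRUE)
```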
On occasion, a simple carry forward approach to missingness isn’t appropriate. It may be that a series is missing an observation due to a higher frequency sampling than the generating process. You might also encounter an observation that is in error, yet expected to be somewhere between the values of its neighboring observations.
These are scenarios where interpolation is useful. zoo provides a powerful tool to do this, na.approx(), which is based on simple linear interpolation between points. The missing data points are approximated using the distance between the index values. In other words, the estimated value is linear in time.
For this example, you’ll use a smaller xts version of the Box and Jenkins AirPassengers data set that ships with R. We’ve removed a few months of data to illustrate various fill techniques.
One takeaway, aside from getting a feel for the functions, is to see how various fill techniques impact your data, and especially how it will impact your understanding of it.
AirPass <- readRDS("airpass.rds")
AirPass
missing original
1960-01-01 417 417
1960-02-01 391 391
1960-03-01 NA 419
1960-04-01 NA 461
1960-05-01 NA 472
1960-06-01 535 535
1960-07-01 622 622
1960-08-01 606 606
1960-09-01 508 508
1960-10-01 461 461
1960-11-01 390 390
1960-12-01 432 432
# Interpolate NAs using linear approximation
na.approx(AirPass)
missing original
1960-01-01 417.0000 417
1960-02-01 391.0000 391
1960-03-01 425.5124 419
1960-04-01 462.4050 461
1960-05-01 498.1074 472
1960-06-01 535.0000 535
1960-07-01 622.0000 622
1960-08-01 606.0000 606
1960-09-01 508.0000 508
1960-10-01 461.0000 461
1960-11-01 390.0000 390
1960-12-01 432.0000 432
Linear interpolation is a straightforward way to account for missingness, although it is up to you to determine its applicability.
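The file airpass.rds is not included with the text, but the object can be rebuilt from the values printed above. The sketch below does that and repeats the interpolation. It also illustrates what "linear in time" means: the March estimate is not one quarter of the way from 391 to 535, because February 1960 has 29 days while the whole gap spans 121 days.

```r
library(xts)

# Rebuild AirPass from the printed table (monthly, 1960)
dates <- seq(as.Date("1960-01-01"), by = "month", length.out = 12)
original <- c(417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432)
missing <- original
missing[3:5] <- NA
AirPass <- xts(cbind(missing, original), order.by = dates)

# Interpolate NAs using linear approximation over the time index
filled <- na.approx(AirPass)

# The same March value computed by hand from the day counts
march_by_hand <- 391 + (535 - 391) * 29 / 121
```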
Another common modification for time series is the ability to lag a series. Also known as a backshift operation, it’s typically shown in literature using \(L^{k}\) notation, indicating a transformation in time \(L^{k}X_{t} = X_{t-k}\). This lets you see observations like yesterday’s value in the context of today.
Both zoo and xts implement this behavior, and in fact extend it from the ts original in R. There are two major differences between xts and zoo implementations that you need to be aware of. One is the direction of the lag for a given k. The second is how missingness is handled afterwards.
For historical reasons in R, zoo uses a convention for the sign of k in which negative values indicate lags and positive values indicate leads. That is, in zoo lag(x, k = 1) will shift future values one step back in time. This is inconsistent with the vast majority of the time series literature, but is consistent with behavior in base R. xts implements the exact opposite, namely for a positive k, the series will shift the last value in time one period forward; this is consistent with intuition, but quite different than zoo.
In this example, you will construct a single xts object with three columns. The first column is the data one day ahead, the second column is the original data, and the third column is the data one day behind, all using xts.
# Your final object
cbind(lead_x, x, lag_x)
# Create a leading object called lead_x
lead_x <- lag(x, k = -1)
# Create a lagging object called lag_x
lag_x <- lag(x, k = 1)
# Merge your three series together and assign to z
z <- cbind(lead_x, x, lag_x)
z
lead_x x lag_x
2016-06-02 2 1 NA
2016-06-03 NA 2 1
2016-06-04 4 NA 2
2016-06-05 5 4 NA
2016-06-06 6 5 4
2016-06-07 NA 6 5
2016-06-08 0 NA 6
2016-06-09 0 0 NA
2016-06-10 0 0 0
2016-06-11 NA 0 0
Generating leads and lags can help you visualize trends in your time series data over time.
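The sign convention is easier to verify on a series without missing values. A minimal sketch on three made-up observations, using the xts method for lag():

```r
library(xts)

# Hypothetical three-day series
x <- xts(c(10, 20, 30), order.by = as.Date("2016-06-01") + 0:2)

# Positive k in xts: each value moves one period later (a lag)
lag_x <- lag(x, k = 1)

# Negative k in xts: each value moves one period earlier (a lead)
lead_x <- lag(x, k = -1)

# Side by side: tomorrow's value, today's value, yesterday's value
z <- merge(lead_x, x, lag_x)
```

In each row of z, the first column holds the following day's observation and the third column the previous day's, with NA where no such observation exists.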
Another common operation on time series, typically on those that are non-stationary, is to take a difference of the series. The number of differences to take of a series is an application of recursively calling the difference function n times.
A simple way to view a single (or “first order”) difference is to see it as x(t) - x(t-k) where k is the number of lags to go back. Higher order differences are simply the reapplication of a difference to each prior result.
In R, the difference operator for xts is made available using the diff() command. This function takes two arguments of note. The first is lag, which is the number of periods to go back, and the second is differences, which is the order of the difference (i.e. how many times diff() is applied).
# These are the same
diff(x, differences = 2)
diff(diff(x))
AirPass2 <- AirPass[,2]
# Calculate the first difference of AirPass2 by hand and assign to diff_by_hand
diff_by_hand <- AirPass2 - lag(AirPass2, k = 1)
# Use merge to compare the first parts of diff_by_hand and diff(AirPass2)
merge(head(diff_by_hand), head(diff(AirPass2)))
original original.1
1960-01-01 NA NA
1960-02-01 -26 -26
1960-03-01 28 28
1960-04-01 42 42
1960-05-01 11 11
1960-06-01 63 63
# Calculate the first order 12 month difference of AirPass2
diff(AirPass2, lag = 12, differences = 1)
original
1960-01-01 NA
1960-02-01 NA
1960-03-01 NA
1960-04-01 NA
1960-05-01 NA
1960-06-01 NA
1960-07-01 NA
1960-08-01 NA
1960-09-01 NA
1960-10-01 NA
1960-11-01 NA
1960-12-01 NA
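As you can see, differencing your series is only one step more complex than generating lags and leads. The 12 month difference is all NA here only because this reduced data set has 12 observations, so no row has a value 12 periods earlier to subtract.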
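The two arguments of diff() can be checked against their by-hand equivalents on a short made-up series (the squares 1, 4, 9, 16, 25):

```r
library(xts)

# Hypothetical series of squares
x <- xts(c(1, 4, 9, 16, 25), order.by = as.Date("2016-01-01") + 0:4)

# Second-order difference, in one call and as two nested first differences
d2_direct <- diff(x, differences = 2)
d2_nested <- diff(diff(x))

# First-order difference at lag 2, in one call and by hand
d_lag2 <- diff(x, lag = 2)
d_lag2_by_hand <- x - lag(x, k = 2)
```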
One of the benefits to working with time series objects is how easy it is to apply functions by time.
The main function in xts to facilitate this is endpoints(). It takes a time series (or a vector of times) and returns the locations of the last observations in each interval.
For example, the code below locates the last observation of each year for the AirPass data set.
endpoints(AirPass, on = "years")
[1] 0 12
The argument on supports a variety of periods, including “years”, “quarters”, “months”, as well as intraday intervals such as “hours”, and “minutes”. What is returned is a vector starting with 0 and ending with the extent (last row) of your data.
In addition to each period, you can find every \(k^{\text{th}}\) period by using the k argument. For example, setting the arguments of your endpoints() call to on = “weeks”, k = 2, would return the final day of every other week in your data. Note that the last value returned will always be the length of your input data, even if it doesn’t correspond to a skipped interval.
In this example you’ll use endpoints() to find two sets of endpoints for the daily temps data.
# Locate the weeks
endpoints(temps, on = "weeks")
[1] 0 3 10 16
# Locate every two weeks
endpoints(temps, on = "weeks", k = 2)
[1] 0 10 16
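Only the dates matter to endpoints(), so the weekly result above can be reproduced with a stand-in series over the same period. The sketch below uses made-up values on July 1 to July 16, 2016 and shows how the returned positions map back to rows.

```r
library(xts)

# Hypothetical stand-in for temps: one value per day, July 1-16, 2016
days <- seq(as.Date("2016-07-01"), as.Date("2016-07-16"), by = "day")
x <- xts(seq_along(days), order.by = days)

# Row positions of the last observation in each week and in each month
ep_weeks  <- endpoints(x, on = "weeks")
ep_months <- endpoints(x, on = "months")

# Using the positions as a row index returns the last observation of each week
# (the leading 0 selects nothing)
week_ends <- x[ep_weeks]
```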
At this point you know how to locate the end of periods using endpoints(). You may be wondering what it is you do with these values.
In the simplest case you can subset your object to get the last values. In certain cases this may be useful, for example to identify the last known value of a sensor during each hour or to get the value of the USD/JPY exchange rate at the end of each day. For most series, you will want to apply a function to the values between endpoints. In essence, this is like the base function apply(), but applied over windows of time.
To do this easily, xts provides the period.apply() command, which takes a time series, an index of endpoints, and a function.
period.apply(x, INDEX, FUN, ...)
In this example we’ll practice using period.apply() by taking the weekly mean of your temps data.
# Calculate the weekly endpoints
ep <- endpoints(temps, on = "weeks")
# Now calculate the weekly mean and display the results
period.apply(temps[, "Temp.Mean"], INDEX = ep, FUN = mean)
Temp.Mean
2016-07-03 67.66667
2016-07-10 76.71429
2016-07-16 77.00000
The period.apply() command allows you to easily calculate complex qualities of your time series data.
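The weekly means above can be reproduced without the temps file by typing in the Temp.Mean column, which is printed in full elsewhere in this text. A sketch, with temp_mean as an illustrative name:

```r
library(xts)

# Temp.Mean for July 1-16, 2016, copied from the printed temps table
days <- seq(as.Date("2016-07-01"), as.Date("2016-07-16"), by = "day")
temp_mean <- xts(c(69, 66, 68, 76, 79, 79, 78, 80, 73, 72, 81, 82, 78, 80, 72, 69),
                 order.by = days)

# Weekly endpoints, then the mean of the observations between them
ep <- endpoints(temp_mean, on = "weeks")
weekly_mean <- period.apply(temp_mean, INDEX = ep, FUN = mean)
```

Each row of the result is stamped with the last date of its window, which is why the output is itself a time series.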
Along the same lines as the previous exercise, xts gives you an additional mechanism to dive into periods of your data. Often it is useful to physically split your data into disjoint chunks by time and perform some calculation on these periods.
For this exercise you’ll make use of the xts split() command to chunk your data by time. The split() function creates a list containing an element for each split. The f argument in split() is a character string describing the period to split by (i.e. “months”, “years”, etc.).
Here you will follow the same process you followed in the previous exercise. However, this time you will manually split your data first, and then apply the mean() function to each chunk. The function lapply() applies a function to every element of the list and returns the results as a list. In cases where you don’t want to return a time series, this proves to be very intuitive and effective.
# Split temps by week
temps_weekly <- split(temps, f = "weeks")
# Create a list of weekly means, temps_avg, and print this list
temps_avg <- lapply(X = temps_weekly, FUN = mean)
temps_avg
[[1]]
[1] 67.66667
[[2]]
[1] 77.04762
[[3]]
[1] 76.38889
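As you can see, period.apply() is similar to using a combination of split() and lapply(). The second and third values differ from the previous exercise only because mean() was applied here to all three columns of each weekly chunk rather than to Temp.Mean alone.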
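On a single-column series the two approaches give the same numbers, which can be checked directly. The sketch below uses made-up values (1 to 16) on the same dates as temps; the main difference is the shape of the result, a plain list versus an xts object.

```r
library(xts)

# Hypothetical single-column series, July 1-16, 2016
days <- seq(as.Date("2016-07-01"), as.Date("2016-07-16"), by = "day")
x <- xts(as.numeric(seq_along(days)), order.by = days)

# Approach 1: split into weekly chunks, then average each chunk (returns a list)
chunks <- split(x, f = "weeks")
means_list <- lapply(chunks, mean)

# Approach 2: endpoints plus period.apply (returns an xts object)
means_xts <- period.apply(x, INDEX = endpoints(x, on = "weeks"), FUN = mean)
```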
By now you have seen that even in xts there is more than one way to accomplish a task. In this exercise we’ll highlight this explicitly by tackling the same challenge using two different methods. When you are on your own, you will likely find situations where one or the other will be more intuitive, but for now you should make sure you are able to do both.
Starting with the same daily series temps, the challenge will be to find the last observation in each week.
Note that these functions will always find the dates that are in the closed interval [start of period, end of period] even if there is no observation at the exact start or end. xts represents irregular time series, so it is perfectly valid to have holes in the data where one might expect an observation.
# Use the proper combination of split, lapply and rbind
temps_1 <- do.call(rbind, lapply(split(temps, "weeks"), function(w) last(w, n = "1 day")))
# Create last_day_of_weeks using endpoints()
last_day_of_weeks <- endpoints(temps, on = "weeks")
# Subset temps using last_day_of_weeks
temps_2 <- temps[last_day_of_weeks]
temps_1
Temp.Max Temp.Mean Temp.Min
2016-07-03 79 68 59
2016-07-10 83 72 64
2016-07-16 79 69 60
temps_2
Temp.Max Temp.Mean Temp.Min
2016-07-03 79 68 59
2016-07-10 83 72 64
2016-07-16 79 69 60
It never hurts to know multiple methods for selecting certain points in your time series.
Aggregating time series can be a frustrating task. For example, in financial series it is common to find Open-High-Low-Close data (or OHLC) calculated over some repeating and regular interval.
Also known as range bars, aggregating a series based on some regular window can make analysis easier amongst series that have varying frequencies. A weekly economic series and a daily stock series can be compared more easily if the daily is converted to weekly.
In this exercise, you’ll convert from a univariate series into OHLC series, and then convert your final OHLC series back into a univariate series using the xts function to.period(). This function takes a time-series, x, and a string for the period (i.e. months, days, etc.), in addition to a number of other optional arguments.
to.period(x,
period = "months",
k = 1,
indexAt,
name=NULL,
OHLC = TRUE,
...)
You will use a new data set for this exercise, usd_eur, a daily USD/EUR exchange rate from 1999 to August 2016.
dat <- read.csv("USDEUR.csv")
usd_eur <- xts(dat, order.by = as.Date(rownames(dat)))
# Convert usd_eur to weekly and assign to usd_eur_weekly
usd_eur_weekly <- to.period(usd_eur, period = "weeks")
# Convert usd_eur to monthly and assign to usd_eur_monthly
usd_eur_monthly <- to.period(usd_eur, period = "months")
# Convert usd_eur to yearly univariate and assign to usd_eur_yearly
usd_eur_yearly <- to.period(usd_eur, period = "years", OHLC = FALSE)
head(usd_eur_weekly)
usd_eur.Open usd_eur.High usd_eur.Low usd_eur.Close
1999-01-08 1.1812 1.1812 1.1554 1.1554
1999-01-15 1.1534 1.1698 1.1534 1.1591
1999-01-22 1.1610 1.1610 1.1575 1.1582
1999-01-29 1.1566 1.1577 1.1371 1.1371
1999-02-05 1.1303 1.1339 1.1283 1.1283
1999-02-12 1.1296 1.1331 1.1282 1.1282
head(usd_eur_monthly)
usd_eur.Open usd_eur.High usd_eur.Low usd_eur.Close
1999-01-29 1.1812 1.1812 1.1371 1.1371
1999-02-26 1.1303 1.1339 1.0972 1.0995
1999-03-31 1.0891 1.1015 1.0716 1.0808
1999-04-30 1.0782 1.0842 1.0564 1.0564
1999-05-28 1.0571 1.0787 1.0422 1.0422
1999-06-30 1.0449 1.0516 1.0296 1.0310
head(usd_eur_yearly)
DEXUSEU
1999-12-31 1.0070
2000-12-29 0.9388
2001-12-31 0.8901
2002-12-31 1.0485
2003-12-31 1.2597
2004-12-31 1.3538
Aggregating over time and converting from univariate to OHLC (and vice-versa) are useful skills for time series analysis, especially with financial data.
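The mechanics are easier to check by hand on a tiny series. The sketch below uses ten made-up daily values over two business weeks (this data is hypothetical, not the usd_eur series) and converts them to weekly bars, once as OHLC and once as a univariate series that keeps the last observation of each week.

```r
library(xts)

# Hypothetical daily values for two business weeks
# (2024-01-01 is a Monday; the weekend is skipped)
x <- xts(c(5, 7, 3, 6, 4, 8, 9, 2, 6, 5),
         order.by = as.Date("2024-01-01") + c(0:4, 7:11))

# Weekly OHLC bars: first, max, min and last value of each week
weekly_ohlc <- to.period(x, period = "weeks")

# Univariate weekly series: only the last value of each week
weekly_last <- to.period(x, period = "weeks", OHLC = FALSE)

weekly_ohlc
weekly_last
```

In the first week the values are 5, 7, 3, 6, 4, so the bar opens at 5, peaks at 7, bottoms at 3 and closes at 4; the univariate version keeps just the 4.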
Besides converting univariate time series to OHLC series, to.period() also lets you convert OHLC data to a lower, regularized frequency - something like subsampling your data.
Depending on the chosen frequency, the index class of your data may be coerced to something more appropriate to the new data. For example, when using the shortcut function to.quarterly(), xts will convert your index to the yearqtr class to make periods more obvious.
We can override this behavior by using the indexAt argument. Specifically, setting indexAt = "firstof" gives you a time from the beginning of the period rather than the end. In addition, you can change the base name of each column by supplying a string to the argument name.
For this example we’ll introduce a new dataset, the edhec hedge fund index data from the PerformanceAnalytics package.
We will use the Equity Market Neutral time series from the edhec data, which we’ve assigned to eq_mkt.
library(PerformanceAnalytics)
eq_mkt <- edhec[,5]
# Convert eq_mkt to quarterly OHLC
mkt_quarterly <- to.period(eq_mkt, period = "quarters")
# Convert eq_mkt to quarterly using shortcut function
mkt_quarterly2 <- to.quarterly(eq_mkt, name = "edhec_equity" , indexAt = "firstof")
head(mkt_quarterly)
eq_mkt.Open eq_mkt.High eq_mkt.Low eq_mkt.Close
1997-03-31 0.0189 0.0189 0.0016 0.0016
1997-06-30 0.0119 0.0189 0.0119 0.0165
1997-09-30 0.0247 0.0247 0.0017 0.0202
1997-12-31 0.0095 0.0095 0.0041 0.0066
1998-03-31 0.0060 0.0179 0.0060 0.0179
1998-06-30 0.0067 0.0108 0.0067 0.0108
head(mkt_quarterly2)
edhec_equity.Open edhec_equity.High edhec_equity.Low edhec_equity.Close
1997-03-01 0.0189 0.0189 0.0016 0.0016
1997-06-01 0.0119 0.0189 0.0119 0.0165
1997-09-01 0.0247 0.0247 0.0017 0.0202
1997-12-01 0.0095 0.0095 0.0041 0.0066
1998-03-01 0.0060 0.0179 0.0060 0.0179
1998-06-01 0.0067 0.0108 0.0067 0.0108
Commands such as to.quarterly() provide a convenient shortcut for precisely converting your time series to a lower frequency.
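The effect of indexAt and name can be inspected directly on the result. The following sketch uses a hypothetical monthly series for 2023 (made-up values on month-end dates) and compares the default index class with the one produced by indexAt = "firstof".

```r
library(xts)

# Hypothetical monthly series: month-end dates for 2023
month_ends <- seq(as.Date("2023-02-01"), by = "month", length.out = 12) - 1
x <- xts(1:12, order.by = month_ends)

# Default: the index is coerced to the yearqtr class
q_default <- to.quarterly(x)

# Override the index and the column base name
q_first <- to.quarterly(x, indexAt = "firstof", name = "demo")

tclass(q_default)
index(q_first)
colnames(q_first)
```

Printing index(q_first) is a quick way to confirm which date within each quarter the "firstof" option actually stamps on the rows.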
One common aggregation you may want to apply involves doing a calculation within the context of a period, but returning the interim results for each observation of the period.
For example, you may want to calculate a running month-to-date cumulative sum of a series. This would be relevant when looking at monthly performance of a mutual fund you are interested in investing in.
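For this example, you’ll calculate the cumulative annual return using the edhec fund data from the last exercise. To do this, you’ll follow the split()-lapply()-rbind() pattern demonstrated below (shown here with cummax as the function applied to each monthly chunk; any cumulative function fits the same pattern):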
x_split <- split(x, f = "months")
x_list <- lapply(x_split, cummax)
x_list_rbind <- do.call(rbind, x_list)
Note the last call uses R’s somewhat strange do.call(rbind, …) syntax, which allows you to pass a list to rbind instead of passing each object one at a time. This is a handy shortcut for your R toolkit.
# Split edhec into years
edhec_years <- split(edhec , f = "years")
# Use lapply to calculate the cumsum for each year in edhec_years
edhec_ytd <- lapply(edhec_years, FUN = cumsum)
# Use do.call to rbind the results
edhec_xts <- do.call(rbind, edhec_ytd)
head(edhec_xts)
Convertible Arbitrage CTA Global Distressed Securities Emerging Markets
1997-01-31 0.0119 0.0393 0.0178 0.0791
1997-02-28 0.0242 0.0691 0.0300 0.1316
1997-03-31 0.0320 0.0670 0.0288 0.1196
1997-04-30 0.0406 0.0500 0.0318 0.1315
1997-05-31 0.0562 0.0485 0.0551 0.1630
1997-06-30 0.0774 0.0570 0.0768 0.2211
Equity Market Neutral Event Driven Fixed Income Arbitrage Global Macro
1997-01-31 0.0189 0.0213 0.0191 0.0573
1997-02-28 0.0290 0.0297 0.0313 0.0748
1997-03-31 0.0306 0.0274 0.0422 0.0629
1997-04-30 0.0425 0.0269 0.0552 0.0801
1997-05-31 0.0614 0.0615 0.0670 0.0909
1997-06-30 0.0779 0.0873 0.0778 0.1127
Long/Short Equity Merger Arbitrage Relative Value Short Selling
1997-01-31 0.0281 0.0150 0.0180 -0.0166
1997-02-28 0.0275 0.0184 0.0298 0.0260
1997-03-31 0.0191 0.0244 0.0308 0.1038
1997-04-30 0.0275 0.0243 0.0430 0.0909
1997-05-31 0.0669 0.0440 0.0603 0.0172
1997-06-30 0.0892 0.0671 0.0801 0.0107
Funds of Funds
1997-01-31 0.0317
1997-02-28 0.0423
1997-03-31 0.0346
1997-04-30 0.0355
1997-05-31 0.0630
1997-06-30 0.0855
The split-lapply-rbind syntax may seem complicated, but it is a powerful way to manipulate your time series data.
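A short worked case makes the reset at each period boundary visible. The sketch below applies the same three steps to a hypothetical six-month series (made-up values) that straddles a year end, so the running sum restarts in January.

```r
library(xts)

# Hypothetical monthly values from October 2022 to March 2023
month_ends <- seq(as.Date("2022-11-01"), by = "month", length.out = 6) - 1
x <- xts(c(1, 2, 3, 4, 5, 6), order.by = month_ends)

# 1. Split into a list with one xts object per calendar year
x_years <- split(x, f = "years")

# 2. Apply the cumulative function within each year
x_ytd_list <- lapply(x_years, cumsum)

# 3. Bind the pieces back into a single xts object
x_ytd <- do.call(rbind, x_ytd_list)

x_ytd
```

The 2022 chunk accumulates to 1, 3, 6, and the 2023 chunk starts again at 4 and continues to 9 and 15.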
Another common requirement when working with time series data is to apply a function on a rolling window of data. xts provides this facility through the intuitively named zoo function rollapply().
This function takes a time series object x, a window size width, and a function FUN to apply to each rolling period. The width argument can be tricky; a number supplied to the width argument specifies the number of observations in a window. For instance, to take the rolling 10-day max of a series, you would type the following:
rollapply(x, width = 10, FUN = max, na.rm = TRUE)
Note that the above would only take the 10-day max of a series with daily observations. If the series had monthly observations, it would take the 10-month max. Also note that you can pass additional arguments (e.g. na.rm to the max function) just like you would with apply().
# Use rollapply to calculate the rolling 3 period sd of eq_mkt
eq_sd <- rollapply(eq_mkt, width = 3, FUN = sd)
Rolling values are a useful metric in time series data.
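Because width counts observations rather than calendar time, a small example is a useful sanity check. The sketch below runs a 3-observation rolling max over five made-up daily values; it assumes the xts method's default behavior of aligning each window to its last observation.

```r
library(xts)

# Hypothetical daily series with five observations
x <- xts(c(1, 3, 2, 5, 4), order.by = as.Date("2024-01-01") + 0:4)

# Rolling max over windows of 3 observations
roll_max <- rollapply(x, width = 3, FUN = max)

roll_max
```

The last three rows are the maxima of (1, 3, 2), (3, 2, 5) and (2, 5, 4). Rows that do not yet have a full window behind them cannot hold a complete result, which is worth remembering before feeding the output into further calculations.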
# Look up the time zone names that R recognizes
help(OlsonNames)
xts objects are somewhat tricky when it comes to time. Internally, we have now seen that the index attribute is really a vector of numeric values corresponding to the seconds since the UNIX epoch (1970-01-01).
How these values are displayed on printing and how they are returned to the user when using the index() function is dependent on a few key internal attributes.
The information that controls this behavior can be viewed and even changed through a set of accessor functions detailed here:
# View the first three indexes of temps
index(temps)[1:3]
[1] "2016-07-01 UTC" "2016-07-02 UTC" "2016-07-03 UTC"
# Get the index class of temps
tclass(temps)
[1] "POSIXct" "POSIXt"
# Get the timezone of temps
tzone(temps)
[1] ""
# Change the format of the time display
tformat(temps) <- "%b-%d-%Y"
# View the new format
head(temps)
Temp.Max Temp.Mean Temp.Min
Jul-01-2016 74 69 60
Jul-02-2016 78 66 56
Jul-03-2016 79 68 59
Jul-04-2016 80 76 69
Jul-05-2016 90 79 68
Jul-06-2016 89 79 70
These commands allow you to quickly and easily modify the internal characteristics of your xts object.
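One way to convince yourself that these accessors only touch display metadata is to compare the raw index before and after a change. The sketch below does that on a hypothetical three-row POSIXct-indexed series (made-up values), assuming a version of xts that provides tclass(), tzone() and tformat() as used above.

```r
library(xts)

# Hypothetical series indexed by POSIXct times in UTC
times <- as.POSIXct("2016-07-01", tz = "UTC") + 86400 * 0:2
x <- xts(1:3, order.by = times)

tclass(x)   # class used when index() hands the times back
tzone(x)    # time zone used for display

# Raw seconds since the epoch, before changing the display format
raw_before <- as.numeric(.index(x))

tformat(x) <- "%b-%d-%Y"

# The stored times are untouched; only the formatting attribute changed
raw_after <- as.numeric(.index(x))
identical(raw_before, raw_after)
```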
One of the trickiest parts to working with time series in general is dealing with time zones. xts provides a simple way to leverage time zones on a per-series basis. While R provides time zone support in native classes POSIXct and POSIXlt, xts extends this power to the entire object, allowing you to have multiple time zones across various objects.
Some internal operating system functions require a time zone to do date math. If a time zone isn’t explicitly set, one is chosen for you! Be careful to always set a time zone in your environment to prevent errors when working with dates and times.
xts provides the function tzone(), which allows you to extract or set time zones.
tzone(x) <- "Time_Zone"
# Parse the timestamps as POSIXct; as.Date() would drop the time of day, and tzone is ignored for Date indexes
times <- as.POSIXct(c("2020-07-18 15:11:35", "2020-07-18 15:13:15", "2020-07-18 15:14:55", "2020-07-18 15:16:35", "2020-07-18 15:18:15", "2020-07-18 15:19:55", "2020-07-18 15:21:35", "2020-07-18 15:23:15", "2020-07-18 15:24:55", "2020-07-18 15:26:35"), tz = "GMT")
# Construct times_xts with tzone set to America/Chicago
times_xts <- xts(1:10, order.by = times, tzone = "America/Chicago")
# Change the time zone of times_xts to Asia/Hong_Kong
tzone(times_xts) <- "Asia/Hong_Kong"
# Extract the current time zone of times_xts
tzone(times_xts)
[1] "Asia/Hong_Kong"
Manipulating time zones in xts is relatively straightforward.
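It may help to see that changing the time zone changes how the times are displayed, not the instants they represent. The sketch below uses three hypothetical timestamps and checks that the raw index is the same before and after the change.

```r
library(xts)

# Hypothetical timestamps, parsed as GMT and spaced 100 seconds apart
times <- as.POSIXct("2020-07-18 15:11:35", tz = "GMT") + 100 * 0:2
x <- xts(1:3, order.by = times, tzone = "America/Chicago")

raw_before <- as.numeric(.index(x))

# Switch the display time zone
tzone(x) <- "Asia/Hong_Kong"

raw_after <- as.numeric(.index(x))

# Same instants, different wall-clock representation
identical(raw_before, raw_after)
format(index(x)[1], "%H:%M:%S")
```

Hong Kong is eight hours ahead of GMT, so the first timestamp, 15:11:35 GMT, is shown as 23:11:35 after the change.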
The idea of periodicity is pretty simple: With what regularity does your data repeat? For stock market data, you might have hourly prices or maybe daily open-high-low-close bars. For macroeconomic series, it might be monthly or weekly survey numbers.
xts provides a handy tool to discover this regularity in your data by estimating the frequency of the observations - what we are referring to as periodicity - using the periodicity() command.
In this exercise, you’ll try this out on a few sample data sets. In real life you might find yourself doing this as a first step to understanding your data before diving in for further analysis.
# Calculate the periodicity of temps
periodicity(temps)
Daily periodicity from 2016-07-01 to 2016-07-16
# Calculate the periodicity of edhec
periodicity(edhec)
Monthly periodicity from 1997-01-31 to 2019-11-30
# Convert edhec to yearly
edhec_yearly <- to.yearly(edhec)
# Calculate the periodicity of edhec_yearly
periodicity(edhec_yearly)
Yearly periodicity from 1997-12-31 to 2019-11-30
The periodicity() command combined with the to.period() set of commands gives you a simple way to manipulate your time series data.
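The object returned by periodicity() can also be inspected programmatically rather than just printed. The sketch below, on a hypothetical 60-day series with made-up values, reads the estimated scale before and after a weekly conversion; the scale element is assumed to be part of the returned list, as it is in the xts versions this was written against.

```r
library(xts)

# Hypothetical daily series: 60 consecutive days from 2024-01-01
daily <- xts(1:60, order.by = as.Date("2024-01-01") + 0:59)

# Convert to a weekly univariate series
weekly <- to.period(daily, period = "weeks", OHLC = FALSE)

# periodicity() estimates the frequency from the spacing of the index
p_daily <- periodicity(daily)
p_weekly <- periodicity(weekly)

p_daily$scale
p_weekly$scale
```

Because the estimate comes from the typical gap between observations, a short final week does not change the result.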
Often it is handy to know not just the range of your time series index, but also how many discrete irregular periods your time series data covers. You shouldn’t be surprised to learn that xts provides a set of functions to do just that!
If you have a time series, it is now easy to see how many days, weeks or years your data contains. To do so, simply use the function ndays() and its shortcut functions nmonths(), nquarters(), and so forth, making counting irregular periods easy.
# Count the months
nmonths(edhec)
[1] 275
# Count the quarters
nquarters(edhec)
[1] 92
# Count the years
nyears(edhec)
[1] 23
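These functions count the periods that actually contain observations, which is not the same as the calendar span of the index. The sketch below shows this on a hypothetical, deliberately sparse series of seven made-up dates.

```r
library(xts)

# Hypothetical irregular series: seven observations across 2023-2024
dates <- as.Date(c("2023-01-15", "2023-01-20", "2023-03-02", "2023-07-04",
                   "2023-12-31", "2024-01-01", "2024-02-10"))
x <- xts(seq_along(dates), order.by = dates)

n_days <- ndays(x)         # distinct days with data
n_months <- nmonths(x)     # distinct months with data
n_quarters <- nquarters(x) # distinct quarters with data
n_years <- nyears(x)       # distinct years with data

c(days = n_days, months = n_months, quarters = n_quarters, years = n_years)
```

The index spans about 13 calendar months, but only six of them contain data, so nmonths() reports 6.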
xts uses a very special attribute called index to provide time support to your objects. For performance and design reasons, the index is stored in a special way. This means that regardless of the class of your index (e.g. Date or yearmon) everything internally looks the same to xts. The raw index is actually a simple vector of fractional seconds since the UNIX epoch.
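Normally you want to access the times you stored. index() does this for you by converting the raw values back to the class recorded for your index (the value reported by tclass(), called indexClass in older versions of xts). To get to the raw vector of the index, you can use .index(). Note the critical dot before the function name.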
More useful than extracting raw seconds is the ability to extract time components similar to the POSIXlt class, which closely mirrors the underlying POSIX internal compiled structure tm. This functionality is provided by a handful of commands such as .indexday(), .indexmon(), .indexyear(), and more.
In this example, you’ll take a look at the weekend weather in our pre-loaded temps data using the .indexwday() command. Note that the values range from 0-6, with Sunday equal to 0. Recall that you can use a logical vector to extract elements of an xts object.
# Explore underlying units of temps in two commands: .index() and .indexwday()
.index(temps)
[1] 1467331200 1467417600 1467504000 1467590400 1467676800 1467763200 1467849600
[8] 1467936000 1468022400 1468108800 1468195200 1468281600 1468368000 1468454400
[15] 1468540800 1468627200
attr(,"tzone")
[1] ""
attr(,"tclass")
[1] "POSIXct" "POSIXt"
attr(,"tformat")
[1] "%b-%d-%Y"
.indexwday(temps)
[1] 5 6 0 1 2 3 4 5 6 0 1 2 3 4 5 6
# Create a vector of weekend row positions using which()
# (named weekend_idx to avoid masking the index() function)
weekend_idx <- which(.indexwday(temps) == 0 | .indexwday(temps) == 6)
# Select the weekend rows
temps[weekend_idx]
Temp.Max Temp.Mean Temp.Min
Jul-02-2016 78 66 56
Jul-03-2016 79 68 59
Jul-09-2016 81 73 67
Jul-10-2016 83 72 64
Jul-16-2016 79 69 60
As you can see, these index commands have a variety of applications when it comes to subsetting your time series data.
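The same selection can be written with a logical vector instead of which(). The sketch below uses a hypothetical stand-in for temps (16 daily observations starting 2016-07-01, made-up values) and picks out both the weekend rows and their complement.

```r
library(xts)

# Hypothetical daily series covering 2016-07-01 to 2016-07-16
x <- xts(1:16, order.by = as.Date("2016-07-01") + 0:15)

# TRUE for Sunday (0) and Saturday (6)
is_weekend <- .indexwday(x) %in% c(0, 6)

weekends <- x[is_weekend]
weekdays_only <- x[!is_weekend]

index(weekends)
nrow(weekdays_only)
```

Keeping the logical vector around makes it straightforward to select the complement with !, which is less convenient with integer positions.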
Most time series we’ve seen have been daily or lower frequency. Depending on your field, you might encounter higher frequency data - think intraday trading intervals, or sensor data from medical equipment.
In these situations, there are two functions in xts that are handy to know.
If you find that you have observations with identical timestamps, it might be useful to perturb or remove these times to allow for uniqueness. xts provides the function make.index.unique() for just this purpose. The eps argument, short for epsilon or small change, controls how much identical times should be perturbed, and drop = TRUE lets you just remove duplicate observations entirely.
On other occasions you might find your timestamps a bit too precise. In these instances it might be better to round up to some fixed interval. For example, an observation may occur at any point in an hour, but you want to record the latest as of the beginning of the next hour. For this situation, the align.time() command will do what you need, setting the n argument to the number of seconds you’d like to round to.
make.index.unique(x, eps = 1e-4) # Perturb
make.index.unique(x, drop = TRUE) # Drop duplicates
align.time(x, n = 60) # Round to the minute
z <- rbind(structure(c(-1.53995004190371, -0.928567034713538, -0.928567034713538,
-0.29472044679056, -0.00576717274753696, 2.40465338885795, 0.76359346114046,
-0.799009248989368, -0.799009248989368, -1.14765700923635, -0.289461573688223,
-0.299215117897316), class = c("xts", "zoo"), .Dim = c(12L, 1L
), index = structure(c(1595194275, 1595226140, 1595226140, 1595321751,
1595425493, 1595540866, 1595566225, 1595712816, 1595712816, 1595803214,
1595865088, 1595948750), tclass = c("POSIXct", "POSIXt"), tzone = "GMT"), .indexCLASS = c("POSIXct",
"POSIXt"), .indexTZ = "GMT", tclass = c("POSIXct", "POSIXt"), tzone = "GMT"))
We’ll try the three use cases on an xts object called z.
# Make z have unique timestamps
z_unique <- make.index.unique(z, eps = 1e-4)
index value is unique but will be replaced; it is less than the cumulative epsilon for the preceding duplicate index values
# Remove duplicate times in z
z_dup <- make.index.unique(z, drop = TRUE)
# Round observations in z to the next hour
z_round <- align.time(z, n = 3600)
head(z_unique)
timezone of object (GMT) is different than current timezone ().
[,1]
2020-07-19 21:31:15 -1.539950042
2020-07-20 06:22:20 -0.928567035
2020-07-20 06:22:20 -0.928567035
2020-07-21 08:55:51 -0.294720447
2020-07-22 13:44:53 -0.005767173
2020-07-23 21:47:46 2.404653389
head(z_dup)
timezone of object (GMT) is different than current timezone ().
[,1]
2020-07-19 21:31:15 -1.539950042
2020-07-20 06:22:20 -0.928567035
2020-07-21 08:55:51 -0.294720447
2020-07-22 13:44:53 -0.005767173
2020-07-23 21:47:46 2.404653389
2020-07-24 04:50:25 0.763593461
head(z_round)
timezone of object (GMT) is different than current timezone ().
[,1]
2020-07-19 22:00:00 -1.539950042
2020-07-20 07:00:00 -0.928567035
2020-07-20 07:00:00 -0.928567035
2020-07-21 09:00:00 -0.294720447
2020-07-22 14:00:00 -0.005767173
2020-07-23 22:00:00 2.404653389
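Since printed timestamps hide sub-second differences, the effect of make.index.unique() is easier to verify on the raw index. The sketch below uses a hypothetical three-row series with one duplicated timestamp (made-up values) and checks each of the three operations.

```r
library(xts)

# Hypothetical series: the first two observations share a timestamp
t0 <- as.POSIXct("2020-07-20 06:22:20", tz = "GMT")
x <- xts(1:3, order.by = t0 + c(0, 0, 45))

# Perturb duplicates by a small epsilon so every timestamp is unique
u <- make.index.unique(x, eps = 1e-4)
any_dups_left <- anyDuplicated(.index(u)) > 0

# Drop duplicates instead, keeping one row per timestamp
d <- make.index.unique(x, drop = TRUE)

# Round every timestamp up to the next whole minute
r <- align.time(x, n = 60)

any_dups_left
as.numeric(d)
format(index(r), "%H:%M:%S")
```

Note that rounding can reintroduce identical timestamps, as it does here for the first two rows, so the order in which these two functions are applied matters.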