Time series are a little different from other types of data. Time series data often has long-term trends or periodic patterns that traditional summary statistics don’t capture. To find these patterns, you need to use different types of analyses.
One important property of a time series is the autocorrelation function. You can estimate the autocorrelation function for a time series using R’s acf function:
acf(x, lag.max = NULL,
type = c("correlation", "covariance", "partial"),
plot = TRUE, na.action = na.fail, demean = TRUE, ...)
The function pacf is an alias for acf, except with the default type of “partial”:
pacf(x, lag.max, plot, na.action, ...)
By default, this function plots the results. As an example, let’s show the autocorrelation function of the turkey price data:
# library(nutshell)
data(turkey.price.ts)
acf(turkey.price.ts)
acf(turkey.price.ts,plot=FALSE)
##
## Autocorrelations of series 'turkey.price.ts', by lag
##
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
## 1.000 0.465 -0.019 -0.165 -0.145 -0.219 -0.215 -0.122 -0.136 -0.200
## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833
## -0.016 0.368 0.723 0.403 -0.013 -0.187 -0.141 -0.180 -0.226 -0.130
pacf(turkey.price.ts)
pacf(turkey.price.ts,plot=FALSE)
##
## Partial autocorrelations of series 'turkey.price.ts', by lag
##
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
## 0.465 -0.300 -0.020 -0.060 -0.218 -0.054 -0.061 -0.211 -0.180 0.098
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833
## 0.299 0.571 -0.122 -0.077 -0.075 0.119 0.064 -0.149 -0.061
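To make output like this easier to interpret, here is a minimal sketch that recomputes one autocorrelation by hand and then flags the lags that fall outside the approximate 95% bounds drawn on the acf plot. It uses the built-in ldeaths series from the datasets package instead of the turkey price data (an assumption, so that the example runs without the nutshell package), and the variable names are illustrative.

```r
# Built-in monthly series (assumption: stands in for turkey.price.ts)
x <- ldeaths
v <- as.numeric(x)
n <- length(v)
xbar <- mean(v)

# Lag-1 autocorrelation by hand: c_1 / c_0, both computed with divisor n
k <- 1
c0 <- sum((v - xbar)^2) / n
ck <- sum((v[1:(n - k)] - xbar) * (v[(1 + k):n] - xbar)) / n
r.manual <- ck / c0

# The same quantity from acf(); element 1 is lag 0, element k + 1 is lag k
a <- acf(x, lag.max = 24, plot = FALSE)
r <- drop(a$acf)
all.equal(r.manual, r[k + 1])

# Approximate 95% bounds under a white-noise assumption
bound <- qnorm(0.975) / sqrt(a$n.used)
lags <- drop(a$lag)                 # in years, because the series is monthly
lags[abs(r) > bound & lags > 0]     # lags whose autocorrelation exceeds the bounds
```

Lags are reported in units of the series' time axis, which is why a monthly series shows lags such as 0.0833 (one month) and 1.0000 (one year).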
The function ccf plots the cross-correlation function for two time series:
ccf(x, y, lag.max = NULL, type = c("correlation", "covariance"),
plot = TRUE, na.action = na.fail, ...)
By default, this function will plot the results. You can suppress the plot (and just print the estimated values) with the argument plot=FALSE.
As an example of cross-correlations, we can use average ham prices in the United States. These are included in the nutshell package as ham.price.ts:
# library(nutshell)
data(ham.price.ts)
ccf(turkey.price.ts, ham.price.ts, plot=FALSE)
##
## Autocorrelations of series 'X', by lag
##
## -1.0833 -1.0000 -0.9167 -0.8333 -0.7500 -0.6667 -0.5833 -0.5000 -0.4167
## 0.147 0.168 -0.188 -0.259 -0.234 -0.098 -0.004 0.010 0.231
## -0.3333 -0.2500 -0.1667 -0.0833 0.0000 0.0833 0.1667 0.2500 0.3333
## 0.228 0.059 -0.038 0.379 0.124 -0.207 -0.315 -0.160 -0.084
## 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 0.9167 1.0000 1.0833
## -0.047 -0.005 0.229 0.223 -0.056 -0.099 0.189 0.039 -0.108
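A common follow-up is to ask at which lag two series are most strongly related. The sketch below does this for the built-in mdeaths and fdeaths series (an assumption; they stand in for the price data so the example runs without the nutshell package). According to the ccf documentation, the value at lag k estimates the correlation between x[t+k] and y[t].

```r
# Built-in monthly series of male and female deaths (assumption: stand-ins for the price data)
cc <- ccf(mdeaths, fdeaths, lag.max = 12, plot = FALSE)

r <- drop(cc$acf)      # cross-correlations as a plain vector
lags <- drop(cc$lag)   # matching lags, in years for monthly data

# Lag with the largest absolute cross-correlation
best <- which.max(abs(r))
c(lag = lags[best], correlation = r[best])
```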
Time series models are a little different from other models that we’ve seen in R. With most other models, the goal is to predict a value (the response variable) from a set of other variables (the predictor variables). Usually, we explicitly assume that there is no autocorrelation: that the sequence of observations does not matter.
With time series, we assume the opposite: we assume that previous observations help predict future observations.
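This assumption can be written down explicitly. As a sketch, using the notation of R's documentation for ar (the symbols are not used elsewhere in this text), an autoregressive model of order \(p\) expresses each observation as a linear combination of the \(p\) previous observations plus noise:

\[ x_t - \mu = a_1 (x_{t-1} - \mu) + \cdots + a_p (x_{t-p} - \mu) + \varepsilon_t \]

Here \(\mu\) is the mean of the series, \(a_1, \ldots, a_p\) are the coefficients to be estimated, and \(\varepsilon_t\) is the innovation (noise) term.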
To fit an autoregressive model to a time series, use the function ar:
ar(x, aic = TRUE, order.max = NULL,
method=c("yule-walker", "burg", "ols", "mle", "yw"),
na.action, series, ...)
Here is a description of the arguments to ar.
Argument | Description | Default |
---|---|---|
x | A time series. | |
aic | A logical value that specifies whether the Akaike information criterion is used to choose the order of the model. | TRUE |
order.max | A numeric value specifying the maximum order of the model to fit. | NULL |
method | A character value that specifies the method to use for fitting the model. Specify method=“yw” (or method=“yule-walker”) for the Yule-Walker method, method=“burg” for the Burg method, method=“ols” for ordinary least squares, or method=“mle” for maximum likelihood estimation. | c(“yule-walker”, “burg”, “ols”, “mle”, “yw”) |
na.action | A function that specifies how to handle missing values. | |
series | A character vector of names for the series. | |
demean | A logical value specifying if a mean should be estimated during fitting. | |
var.method | Specifies the method used to estimate the innovations variance when method=“burg”. | |
... | Additional arguments, depending on method. | |
The ar function actually calls one of four other functions, depending on the fit method chosen: ar.yw, ar.burg, ar.ols, or ar.mle. As an example, let’s fit an autoregressive model to the turkey price data:
# library(nutshell)
data(turkey.price.ts)
turkey.price.ts.ar <- ar(turkey.price.ts)
turkey.price.ts.ar
##
## Call:
## ar(x = turkey.price.ts)
##
## Coefficients:
## 1 2 3 4 5 6 7 8
## 0.3353 -0.1868 -0.0024 0.0571 -0.1554 -0.0208 0.0914 -0.0658
## 9 10 11 12
## -0.0952 0.0649 0.0099 0.5714
##
## Order selected 12 sigma^2 estimated as 0.05182
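The aic, order.max, and method arguments interact, so a small sketch may help. The example below uses the built-in lh series (an assumption; it replaces the turkey price data so the code runs without the nutshell package) and compares an automatically selected order with a forced AR(2) fit.

```r
# lh is a built-in series (assumption: used in place of turkey.price.ts)
fit.auto <- ar(lh)          # order chosen by AIC, Yule-Walker estimation
fit.auto$order              # the selected order
fit.auto$aic                # AIC differences relative to the best order (best = 0)

# Force an AR(2) model and estimate it by maximum likelihood
fit.ar2 <- ar(lh, aic = FALSE, order.max = 2, method = "mle")
fit.ar2$ar                  # the two estimated coefficients
fit.ar2$var.pred            # estimated innovations variance
```

With aic = FALSE, the model of order order.max is fitted regardless of its AIC.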
You can use the model to predict future values. To do this, use the predict function. Here is the method for ar objects:
predict(object, newdata, n.ahead = 1, se.fit = TRUE, ...)
The argument object specifies the model object to use for prediction. You can use newdata to specify new data to use for prediction, or n.ahead to specify a number of periods ahead to predict. The argument se.fit specifies whether to return standard errors of the prediction error.
Here is a forecast for the next 12 months for turkey prices:
predict(turkey.price.ts.ar,n.ahead=12)
## $pred
## Jan Feb Mar Apr May Jun Jul
## 2008 1.8827277 1.7209182 1.7715016
## 2009 1.5439290 1.6971933 1.5849406 1.7800358
## Aug Sep Oct Nov Dec
## 2008 1.9416776 1.7791961 1.4822070 0.9894343 1.1588863
## 2009
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 2008 0.2276439 0.2400967 0.2406938
## 2009 0.2450732 0.2470678 0.2470864 0.2480176
## Aug Sep Oct Nov Dec
## 2008 0.2415644 0.2417360 0.2429339 0.2444610 0.2449850
## 2009
To take a look at a forecast from an autoregressive model, you can use the function ts.plot. This function plots multiple time series on a single chart, even if the times are not overlapping. You can specify colors, line types, or other characteristics of each series as vectors; the \(i^{th}\) element of the vector determines the property for the \(i^{th}\) series.
Here is how to plot the turkey price time series as a solid line, and a projection 24 months into the future as a dashed line:
ts.plot(turkey.price.ts, predict(turkey.price.ts.ar,n.ahead=24)$pred, lty=c(1:2))
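The standard errors returned by predict can be turned into an approximate prediction band and drawn on the same chart. This is a minimal sketch using the built-in lh series (an assumption, in place of the turkey price data); the 95% level and the normal approximation are illustrative choices, not something the text prescribes.

```r
# Fit and forecast on a built-in series (assumption: stands in for turkey.price.ts)
fit <- ar(lh)
fc <- predict(fit, n.ahead = 12)

# Approximate 95% band: forecast plus or minus about 1.96 standard errors
z <- qnorm(0.975)
upper <- fc$pred + z * fc$se
lower <- fc$pred - z * fc$se

# Solid line for the data, dashed for the forecast, dotted for the band
ts.plot(lh, fc$pred, upper, lower, lty = c(1, 2, 3, 3))
```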
You can also fit autoregressive integrated moving average (ARIMA) models in R using the arima function:
arima(x, order = c(0, 0, 0),
seasonal = list(order = c(0, 0, 0), period = NA),
xreg = NULL, include.mean = TRUE,
transform.pars = TRUE,
fixed = NULL, init = NULL,
method = c("CSS-ML", "ML", "CSS"),
n.cond, optim.method = "BFGS",
optim.control = list(), kappa = 1e6)
Here is a description of the arguments to arima.
Argument | Description | Default |
---|---|---|
x | A time series. | |
order | A numeric vector (p, d, q), where p is the AR order, d is the degree of differencing, and q is the MA order. | c(0, 0, 0) |
seasonal | A list specifying the seasonal part of the model. The list contains two parts: the order and the period. | list(order = c(0, 0, 0), period = NA) |
xreg | An (optional) vector or matrix of external regressors (with the same number of rows as x). | NULL |
include.mean | A logical value specifying whether the model should include a mean/intercept term. | TRUE |
transform.pars | A logical value specifying whether the AR parameters should be transformed to ensure that they remain in the region of stationarity. | TRUE |
fixed | An optional numeric vector specifying fixed values for parameters. (Only NA values are varied.) | NULL |
init | A numeric vector of initial parameter values. | NULL |
method | A character value specifying the fitting method to use. The default setting, method=“CSS-ML”, uses conditional sum of squares to find starting values, then maximum likelihood. Specify method=“ML” for maximum likelihood only, or method=“CSS” for conditional sum of squares only. | c(“CSS-ML”, “ML”, “CSS”) |
n.cond | A numeric value indicating the number of initial values to ignore (only used for conditional sum of squares). | |
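As an illustration of the order and seasonal arguments, the sketch below fits a seasonal ARIMA model to the built-in USAccDeaths series (an assumption; this data set and model order come from the examples in R's arima documentation, not from this text) and forecasts one year ahead.

```r
# USAccDeaths is a built-in monthly series (assumption: not part of the nutshell data)
# order = c(0, 1, 1): no AR terms, one difference, one MA term
# seasonal order c(0, 1, 1): the same structure at the seasonal lag;
# the period defaults to frequency(x), which is 12 here
fit <- arima(USAccDeaths, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1)))
coef(fit)                       # estimated ma1 and sma1 coefficients

# Forecast the next 12 months, with standard errors
fc <- predict(fit, n.ahead = 12)
fc$pred
fc$se
```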
When you want an R data structure that can represent time series data, you can use the zoo and xts packages. They define a data structure for time series, and they contain many useful functions for working with time series data. These representations assume you have two vectors: a vector of observations (data) and a vector of dates or times of those observations. The zoo function combines them into a zoo object:
# library(zoo)
ts <- zoo(x, dt)
The xts function is similar, returning an xts object:
# library(xts)
ts <- xts(x, dt)
The data, x, should be numeric. The vector of dates or datetimes, dt, is called the index. Legal indices vary between the packages:
zoo
The index can be any ordered values, such as Date objects, POSIXct objects, integers, or even floating-point values.
xts
The index must be a supported date or time class. This includes Date, POSIXct, and chron objects. Those should be sufficient for most applications, but you can also use yearmon, yearqtr, and timeDate objects. The xts package is more restrictive than zoo because it implements powerful operations that require a time-based index.
Convert between representations of the time series data by using as.zoo and as.xts:
as.zoo(ts) # Converts ts to a zoo object
as.xts(ts) # Converts ts to an xts object
The following example creates a zoo object that contains the price of IBM stock for the first five trading days of 2010; it uses Date objects for the index:
prices <- c(132.45, 130.85, 130.00, 129.55, 130.85)
dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07","2010-01-08"))
ibm.daily <- zoo(prices, dates)
print(ibm.daily)
## 2010-01-04 2010-01-05 2010-01-06 2010-01-07 2010-01-08
## 132.45 130.85 130.00 129.55 130.85
In contrast, the next example captures the price of IBM stock at one-second intervals. It represents time by the number of hours past midnight starting at 9:30 a.m. (1 second \(\approx\) 0.00027778 hours):
prices <- c(131.18, 131.20, 131.17, 131.15, 131.17)
seconds <- c(9.5, 9.500278, 9.500556, 9.500833, 9.501111)
ibm.sec <- zoo(prices, seconds)
print(ibm.sec)
## 9.5 9.5003 9.5006 9.5008 9.5011
## 131.18 131.20 131.17 131.15 131.17
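Fractional hours work, but a POSIXct index is often easier to read and can also be used with xts. Here is a sketch of the same one-second series with a POSIXct index; the trading date (2010-01-04) and the UTC time zone are assumptions made only for illustration.

```r
library(zoo)

prices <- c(131.18, 131.20, 131.17, 131.15, 131.17)

# Five timestamps one second apart, starting at 9:30 a.m.
# (the date and time zone are assumptions, not taken from the text)
times <- as.POSIXct("2010-01-04 09:30:00", tz = "UTC") + 0:4

ibm.sec.ct <- zoo(prices, times)
print(ibm.sec.ct)
```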
Those two examples used a single time series, where the data came from a vector. Both zoo and xts can also handle multiple, parallel time series. For this, capture the several time series in a matrix or data frame and then create a multivariate time series by calling the zoo (or xts) function:
ts <- zoo(dfrm, dt) # OR: ts <- xts(dfrm, dt)
For example:
m = matrix(1:20,5,4)
dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07","2010-01-08"))
zoo(m, dates)
##
## 2010-01-04 1 6 11 16
## 2010-01-05 2 7 12 17
## 2010-01-06 3 8 13 18
## 2010-01-07 4 9 14 19
## 2010-01-08 5 10 15 20
dfrm = data.frame(a = c(1:5),b = c(6:10))
zoo(dfrm, dates)
## a b
## 2010-01-04 1 6
## 2010-01-05 2 7
## 2010-01-06 3 8
## 2010-01-07 4 9
## 2010-01-08 5 10
The second argument is a vector of dates (or datetimes) for each observation. There is only one vector of dates for all the time series; in other words, all observations in each row of the data frame must have the same date.
Once the data is captured inside a zoo or xts object, you can extract the pure data via coredata, which returns a simple vector (or matrix):
coredata(ibm.daily)
## [1] 132.45 130.85 130.00 129.55 130.85
You can extract the date or time portion via index:
index(ibm.daily)
## [1] "2010-01-04" "2010-01-05" "2010-01-06" "2010-01-07" "2010-01-08"
The xts package is very similar to zoo. It is optimized for speed and so is especially well suited for processing large volumes of data. It is also clever about converting to and from other time series representations.
One big advantage of capturing data inside a zoo or xts object is that special-purpose functions become available for printing, plotting, differencing, merging, periodic sampling, applying rolling functions, and other useful operations. There is even a function, read.zoo, dedicated to reading time series data from ASCII files.
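As a brief illustration of two of these operations, the sketch below differences a small daily price series and applies a rolling mean. It rebuilds the five-day IBM series so that it runs on its own; the window width of 3 is an arbitrary choice for the example.

```r
library(zoo)

prices <- c(132.45, 130.85, 130.00, 129.55, 130.85)
dates <- as.Date("2010-01-04") + 0:4
ibm.daily <- zoo(prices, dates)

diff(ibm.daily)               # day-over-day price changes (4 values)
diff(log(ibm.daily))          # daily log returns
rollmean(ibm.daily, k = 3)    # centered 3-day moving average (3 values)
```

Each result is itself a zoo object, so the dates stay attached to the values.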
Remember that the xts package can do everything that the zoo package can do, so everywhere that this chapter talks about zoo objects you can also use xts objects.
You can use plot(x), which works for zoo objects and xts objects containing either single or multiple time series.
For a simple vector v of time series observations, you can use either plot(v,type="l") or plot.ts(v).
Suppose we have a zoo object dcp:
# library(datasets)
data(EuStockMarkets)
idx = seq(from = as.Date("1991-05-10"), by = "day", length.out = 1860)
dcp = zoo(coredata(EuStockMarkets), idx)
head(dcp)
## DAX SMI CAC FTSE
## 1991-05-10 1628.75 1678.1 1772.8 2443.6
## 1991-05-11 1613.63 1688.5 1750.5 2460.2
## 1991-05-12 1606.51 1678.6 1718.0 2448.2
## 1991-05-13 1621.04 1684.1 1708.1 2470.4
## 1991-05-14 1618.16 1686.6 1723.1 2484.7
## 1991-05-15 1610.61 1671.6 1714.3 2466.8
If you plot the object with screens=1, R will plot all four time series together in one plot:
plot(dcp, screens=1)
If you specify screens=c(1,2,3,4), however, then R will plot each of the four time series in its own panel:
plot(dcp, screens=c(1,2,3,4))
The plot function provides a default label for the x-axis and y-axis, but they are not very informative. It does not provide a default title. So you may need to supply your own xlab, ylab, and main (title) parameters.
xlab="Date"
ylab="Price"
main="Daily Closing Prices of Major European Stock Indices, 1991-1998"
lty=c("solid", "dashed", "dotted", "dotdash")
ylim=range(coredata(dcp))
# Plot the four time series in four separate panels
plot(dcp, screens=c(1,2,3,4), lty=lty, main=main, xlab=xlab, ylab=ylab, ylim=ylim)
# Plot the four time series in one plot
plot(dcp, screens=1, lty=lty, main=main, xlab=xlab, ylab=ylab, col = c("red", "blue", "green", "orange"))
# Add a legend
legend(as.Date("1991-05-10"), 8000, c("DAX", "SMI", "CAC", "FTSE"), lty=lty, col = c("red", "blue", "green", "orange"))
You can use head to view the oldest observations:
head(ts)
And use tail to view the newest observations:
tail(ts)
The head and tail functions are generic, so they will work whether your data is stored in a simple vector, a zoo object, or an xts object. For example:
head(dcp)
## DAX SMI CAC FTSE
## 1991-05-10 1628.75 1678.1 1772.8 2443.6
## 1991-05-11 1613.63 1688.5 1750.5 2460.2
## 1991-05-12 1606.51 1678.6 1718.0 2448.2
## 1991-05-13 1621.04 1684.1 1708.1 2470.4
## 1991-05-14 1618.16 1686.6 1723.1 2484.7
## 1991-05-15 1610.61 1671.6 1714.3 2466.8
tail(dcp)
## DAX SMI CAC FTSE
## 1996-06-06 5598.32 7952.9 4041.9 5680.4
## 1996-06-07 5460.43 7721.3 3939.5 5587.6
## 1996-06-08 5285.78 7447.9 3846.0 5432.8
## 1996-06-09 5386.94 7607.5 3945.7 5462.2
## 1996-06-10 5355.03 7552.6 3951.7 5399.5
## 1996-06-11 5473.72 7676.3 3995.0 5455.0
By default, head and tail show (respectively) the six oldest and six newest observations. You can see more observations by providing a second argument, for example:
tail(dcp, 20)
## DAX SMI CAC FTSE
## 1996-05-23 6186.09 8400.8 4368.9 6179.0
## 1996-05-24 6184.10 8412.0 4322.1 6132.7
## 1996-05-25 6081.11 8340.7 4220.1 5989.6
## 1996-05-26 6043.82 8229.2 4235.9 5976.2
## 1996-05-27 6040.58 8205.7 4205.4 5892.3
## 1996-05-28 5854.35 7998.7 4139.5 5836.1
## 1996-05-29 5867.52 8093.0 4122.4 5835.8
## 1996-05-30 5828.74 8102.7 4139.2 5844.1
## 1996-05-31 5906.33 8205.5 4197.6 5910.7
## 1996-06-01 5861.19 8239.5 4177.3 5837.0
## 1996-06-02 5774.38 8139.2 4095.0 5809.7
## 1996-06-03 5718.70 8170.2 4047.9 5736.1
## 1996-06-04 5614.77 7943.2 3976.4 5632.5
## 1996-06-05 5528.12 7846.2 3968.6 5594.1
## 1996-06-06 5598.32 7952.9 4041.9 5680.4
## 1996-06-07 5460.43 7721.3 3939.5 5587.6
## 1996-06-08 5285.78 7447.9 3846.0 5432.8
## 1996-06-09 5386.94 7607.5 3945.7 5462.2
## 1996-06-10 5355.03 7552.6 3951.7 5399.5
## 1996-06-11 5473.72 7676.3 3995.0 5455.0
The xts package also includes first and last functions, which use calendar periods instead of number of observations. We can use first and last to select data by number of days, weeks, months, or even years:
first(as.xts(dcp), "3 weeks")
## DAX SMI CAC FTSE
## 1991-05-10 1628.75 1678.1 1772.8 2443.6
## 1991-05-11 1613.63 1688.5 1750.5 2460.2
## 1991-05-12 1606.51 1678.6 1718.0 2448.2
## 1991-05-13 1621.04 1684.1 1708.1 2470.4
## 1991-05-14 1618.16 1686.6 1723.1 2484.7
## 1991-05-15 1610.61 1671.6 1714.3 2466.8
## 1991-05-16 1630.75 1682.9 1734.5 2487.9
## 1991-05-17 1640.17 1703.6 1757.4 2508.4
## 1991-05-18 1635.47 1697.5 1754.0 2510.5
## 1991-05-19 1645.89 1716.3 1754.3 2497.4
## 1991-05-20 1647.84 1723.8 1759.8 2532.5
## 1991-05-21 1638.35 1730.5 1755.5 2556.8
## 1991-05-22 1629.93 1727.4 1758.1 2561.0
## 1991-05-23 1621.49 1733.3 1757.5 2547.3
## 1991-05-24 1624.74 1734.0 1763.5 2541.5
## 1991-05-25 1627.63 1728.3 1762.8 2558.5
## 1991-05-26 1631.99 1737.1 1768.9 2587.9
last(as.xts(dcp), "10 days")
## DAX SMI CAC FTSE
## 1996-06-02 5774.38 8139.2 4095.0 5809.7
## 1996-06-03 5718.70 8170.2 4047.9 5736.1
## 1996-06-04 5614.77 7943.2 3976.4 5632.5
## 1996-06-05 5528.12 7846.2 3968.6 5594.1
## 1996-06-06 5598.32 7952.9 4041.9 5680.4
## 1996-06-07 5460.43 7721.3 3939.5 5587.6
## 1996-06-08 5285.78 7447.9 3846.0 5432.8
## 1996-06-09 5386.94 7607.5 3945.7 5462.2
## 1996-06-10 5355.03 7552.6 3951.7 5399.5
## 1996-06-11 5473.72 7676.3 3995.0 5455.0
Notice that I converted the zoo object to an xts object by calling as.xts(dcp), forcing R to use the xts functions (methods). In addition, a “week” is defined by the calendar, not just by any seven consecutive days, which is why “3 weeks” returned 17 observations rather than 21.
If you want to select one or more elements from a time series, you can index a zoo or xts object by position. Use one or two subscripts, depending upon whether the object contains one time series or multiple time series:
ts[i]
Selects the ith observation from a single time series
ts[i,j]
Selects the ith observation of the jth time series of multiple time series
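For a multivariate object, rows are observations and columns are series, just as in a matrix. The following sketch shows two-subscript indexing on a small zoo object; the column names a through d are hypothetical labels added for the example.

```r
library(zoo)

dates <- as.Date("2010-01-04") + 0:4
# Four parallel series of five observations each (column names are illustrative)
z <- zoo(matrix(1:20, 5, 4, dimnames = list(NULL, c("a", "b", "c", "d"))), dates)

z[2, 3]      # second observation of the third series
z[2, ]       # second observation of every series
z[, "c"]     # the entire third series, selected by column name
```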
You can index the time series by a date object. Use the same type of object as the index of your time series. This example assumes that the index contains Date objects:
ts[as.Date("yyyy-mm-dd")]
You can index it by a sequence of dates:
dates <- seq(startdate, enddate, increment)
ts[dates]
The window function can select a range by start and end date:
window(ts, start=startdate, end=enddate)
Recall our small sample of IBM stock prices:
ibm.daily
## 2010-01-04 2010-01-05 2010-01-06 2010-01-07 2010-01-08
## 132.45 130.85 130.00 129.55 130.85
We can select an observation by position, just like selecting elements from a vector:
ibm.daily[2]
## 2010-01-05
## 130.85
ibm.daily[2:4]
## 2010-01-05 2010-01-06 2010-01-07
## 130.85 130.00 129.55
Sometimes it’s more useful to select by date. In this case, our index is built from Date objects, so we subscript the time series using a Date object (if the index were built from POSIXct objects, we’d use a POSIXct object):
ibm.daily[as.Date('2010-01-05')]
## 2010-01-05
## 130.85
We can select by a vector of Date objects:
dates <- seq(as.Date('2010-01-04'), as.Date('2010-01-08'), by=2)
ibm.daily[dates]
## 2010-01-04 2010-01-06 2010-01-08
## 132.45 130.00 130.85
The window function is easier for selecting a range of consecutive dates:
window(ibm.daily, start=as.Date('2010-01-05'), end=as.Date('2010-01-07'))
## 2010-01-05 2010-01-06 2010-01-07
## 130.85 130.00 129.55
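If the data is held in an xts object, a date range can also be written as a character string in ISO 8601 style, which is often shorter than calling window. This is a sketch that assumes the xts package is installed; it rebuilds the five-day IBM series so that it runs on its own.

```r
library(xts)

prices <- c(132.45, 130.85, 130.00, 129.55, 130.85)
dates <- as.Date("2010-01-04") + 0:4
ibm.xts <- xts(prices, dates)

ibm.xts["2010-01-05/2010-01-07"]   # from January 5 through January 7, inclusive
ibm.xts["2010-01-06/"]             # from January 6 to the end of the series
ibm.xts["2010-01"]                 # every observation in January 2010
```

String subscripts of this kind are an xts feature; a plain zoo object with a Date index still needs Date objects or window.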
Suppose that you have two or more time series and you want to merge them into a single time series object. In this case, you can use zoo objects to represent the time series and then use the merge function to combine them:
merge(ts1, ts2)
Merging two time series is an incredible headache when the two series have differing timestamps.
daily_idx = seq(from = as.Date("2000-01-02"), by = "day", length.out = 30)
weekly_idx = seq(from = as.Date("2000-01-01"), by = "week", length.out = 6)
ts1 = zoo(rnorm(30), daily_idx)
ts1
## 2000-01-02 2000-01-03 2000-01-04 2000-01-05 2000-01-06 2000-01-07
## 0.9737556 -0.8728359 0.2784130 1.3754153 1.6987765 -2.3975531
## 2000-01-08 2000-01-09 2000-01-10 2000-01-11 2000-01-12 2000-01-13
## 2.6821324 -0.8695387 -1.3853879 0.1380131 -1.5153835 1.1372844
## 2000-01-14 2000-01-15 2000-01-16 2000-01-17 2000-01-18 2000-01-19
## 1.4571492 -0.5150068 0.6403659 0.4043159 -0.5491450 2.0633122
## 2000-01-20 2000-01-21 2000-01-22 2000-01-23 2000-01-24 2000-01-25
## -0.2536504 -1.0916630 1.5104725 0.5727101 0.2553154 -0.2445711
## 2000-01-26 2000-01-27 2000-01-28 2000-01-29 2000-01-30 2000-01-31
## 0.9808319 0.7601182 0.6755955 1.5439757 1.8410443 -0.8875722
ts2 = zoo(rnorm(6), weekly_idx)
ts2
## 2000-01-01 2000-01-08 2000-01-15 2000-01-22 2000-01-29 2000-02-05
## 0.03448087 0.36390343 -1.42906502 -1.40853638 -0.28024273 -0.75505235
Obviously, the two time series have different timestamps because one is daily data and the other is weekly data.
Thank goodness for the merge function, which handles the messy details of reconciling the different dates:
merge(ts1, ts2)
## ts1 ts2
## 2000-01-01 NA 0.03448087
## 2000-01-02 0.9737556 NA
## 2000-01-03 -0.8728359 NA
## 2000-01-04 0.2784130 NA
## 2000-01-05 1.3754153 NA
## 2000-01-06 1.6987765 NA
## 2000-01-07 -2.3975531 NA
## 2000-01-08 2.6821324 0.36390343
## 2000-01-09 -0.8695387 NA
## 2000-01-10 -1.3853879 NA
## 2000-01-11 0.1380131 NA
## 2000-01-12 -1.5153835 NA
## 2000-01-13 1.1372844 NA
## 2000-01-14 1.4571492 NA
## 2000-01-15 -0.5150068 -1.42906502
## 2000-01-16 0.6403659 NA
## 2000-01-17 0.4043159 NA
## 2000-01-18 -0.5491450 NA
## 2000-01-19 2.0633122 NA
## 2000-01-20 -0.2536504 NA
## 2000-01-21 -1.0916630 NA
## 2000-01-22 1.5104725 -1.40853638
## 2000-01-23 0.5727101 NA
## 2000-01-24 0.2553154 NA
## 2000-01-25 -0.2445711 NA
## 2000-01-26 0.9808319 NA
## 2000-01-27 0.7601182 NA
## 2000-01-28 0.6755955 NA
## 2000-01-29 1.5439757 -0.28024273
## 2000-01-30 1.8410443 NA
## 2000-01-31 -0.8875722 NA
## 2000-02-05 NA -0.75505235
By default, merge finds the union of all dates: the output contains all dates from both inputs, and missing observations are filled with NA values. You can replace those NA values with the most recent observation by using the na.locf function from the zoo package:
na.locf(merge(ts1, ts2))
## ts1 ts2
## 2000-01-01 NA 0.03448087
## 2000-01-02 0.9737556 0.03448087
## 2000-01-03 -0.8728359 0.03448087
## 2000-01-04 0.2784130 0.03448087
## 2000-01-05 1.3754153 0.03448087
## 2000-01-06 1.6987765 0.03448087
## 2000-01-07 -2.3975531 0.03448087
## 2000-01-08 2.6821324 0.36390343
## 2000-01-09 -0.8695387 0.36390343
## 2000-01-10 -1.3853879 0.36390343
## 2000-01-11 0.1380131 0.36390343
## 2000-01-12 -1.5153835 0.36390343
## 2000-01-13 1.1372844 0.36390343
## 2000-01-14 1.4571492 0.36390343
## 2000-01-15 -0.5150068 -1.42906502
## 2000-01-16 0.6403659 -1.42906502
## 2000-01-17 0.4043159 -1.42906502
## 2000-01-18 -0.5491450 -1.42906502
## 2000-01-19 2.0633122 -1.42906502
## 2000-01-20 -0.2536504 -1.42906502
## 2000-01-21 -1.0916630 -1.42906502
## 2000-01-22 1.5104725 -1.40853638
## 2000-01-23 0.5727101 -1.40853638
## 2000-01-24 0.2553154 -1.40853638
## 2000-01-25 -0.2445711 -1.40853638
## 2000-01-26 0.9808319 -1.40853638
## 2000-01-27 0.7601182 -1.40853638
## 2000-01-28 0.6755955 -1.40853638
## 2000-01-29 1.5439757 -0.28024273
## 2000-01-30 1.8410443 -0.28024273
## 2000-01-31 -0.8875722 -0.28024273
## 2000-02-05 -0.8875722 -0.75505235
(Here locf stands for ‘last observation carried forward’.) Observe that the NAs were replaced, except in the first observation (2000-01-01), where ts1 has no earlier value to carry forward. You can get the intersection of all dates by setting all=FALSE:
merge(ts1, ts2, all=FALSE)
## ts1 ts2
## 2000-01-08 2.6821324 0.3639034
## 2000-01-15 -0.5150068 -1.4290650
## 2000-01-22 1.5104725 -1.4085364
## 2000-01-29 1.5439757 -0.2802427
Now the output is limited to observations that are common to both series.
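Because the series above were generated with rnorm, their values change from run to run. The following sketch repeats the three operations (union merge, carrying observations forward, and intersection merge) on small fixed series so that the results are reproducible; the series a and b and their values are made up for illustration.

```r
library(zoo)

# Two short series with partly overlapping dates (values are illustrative)
a <- zoo(c(1, 2, 3, 4), as.Date("2000-01-01") + 0:3)
b <- zoo(c(10, 20), as.Date(c("2000-01-02", "2000-01-05")))

u <- merge(a, b)                  # union of dates: 5 rows, NA where a series has no value
f <- na.locf(u, na.rm = FALSE)    # carry the last observation forward, keeping leading NAs
i <- merge(a, b, all = FALSE)     # intersection of dates: only 2000-01-02

u
f
i
```

Setting na.rm = FALSE makes it explicit that rows with a leading NA are kept rather than dropped.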
When your time series data has missing observations, you can use the merge function to fill or pad the data with the missing dates/times. First, create a zero-width (dataless) zoo object with the missing dates/times. Then merge your data with the zero-width object, taking the union of all dates:
empty <- zoo(,dates) # 'dates' is vector of the missing dates
merge(ts, empty, all=TRUE)
The zoo package includes a handy feature in the constructor for zoo objects: you can omit the data and build a zero-width object. The object contains no data, just dates. We can use these ‘Frankenstein’ objects to perform such operations as filling and padding on other time series objects.
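As a concrete sketch of padding, suppose a series has observations on only some days and you want one row per calendar day. The series z and its values below are hypothetical; the grid of dates is built with seq from the first to the last index value.

```r
library(zoo)

# A series with gaps: no observations on January 5 or January 7 (values are illustrative)
z <- zoo(c(1.5, 2.5, 4.0),
         as.Date(c("2010-01-04", "2010-01-06", "2010-01-08")))

# Zero-width object covering every day from the first to the last observation
grid <- seq(start(z), end(z), by = "day")
empty <- zoo(, grid)

# Union merge: the missing days appear with NA values
padded <- merge(z, empty, all = TRUE)
padded

# Optionally fill the new NAs with the most recent observation
na.locf(padded)
```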
The statistical details of time series can be found here.