Time series are a little different from other types of data. Time series data often has long-term trends or periodic patterns that traditional summary statistics don’t capture. To find these patterns, you need to use different types of analyses.

1 Autocorrelation Functions

One important property of a time series is the autocorrelation function. You can estimate the autocorrelation function for a time series using R’s acf function:

acf(x, lag.max = NULL,
    type = c("correlation", "covariance", "partial"),
    plot = TRUE, na.action = na.fail, demean = TRUE, ...)

The function pacf is an alias for acf, except with the default type of “partial”:

pacf(x, lag.max, plot, na.action, ...)

By default, this function plots the results. As an example, let’s show the autocorrelation function of the turkey price data:

# library(nutshell)
data(turkey.price.ts)
acf(turkey.price.ts)

acf(turkey.price.ts,plot=FALSE)
## 
## Autocorrelations of series 'turkey.price.ts', by lag
## 
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 
##  1.000  0.465 -0.019 -0.165 -0.145 -0.219 -0.215 -0.122 -0.136 -0.200 
## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 
## -0.016  0.368  0.723  0.403 -0.013 -0.187 -0.141 -0.180 -0.226 -0.130
pacf(turkey.price.ts)

pacf(turkey.price.ts,plot=FALSE)
## 
## Partial autocorrelations of series 'turkey.price.ts', by lag
## 
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 
##  0.465 -0.300 -0.020 -0.060 -0.218 -0.054 -0.061 -0.211 -0.180  0.098 
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 
##  0.299  0.571 -0.122 -0.077 -0.075  0.119  0.064 -0.149 -0.061
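
As a rough check on what acf reports, the sketch below recomputes the lag-1 autocorrelation by hand. It uses the built-in monthly series nottem instead of the turkey price data (an assumption, so that the example runs without the nutshell package). For a monthly series the lags are reported in years, which is why the output above lists a one-month lag as 0.0833.

```r
# Illustrative check using the built-in 'nottem' series (monthly temperatures),
# not the turkey price data from the text.
a <- acf(nottem, plot = FALSE)

# Lags are expressed in the time units of the series (years here),
# so the second entry corresponds to a one-month lag.
one_month_lag <- a$lag[2]

# Manual lag-1 autocorrelation: c_1 / c_0, using deviations from the mean.
x  <- as.numeric(nottem)
n  <- length(x)
xm <- x - mean(x)
r1 <- sum(xm[-1] * xm[-n]) / sum(xm^2)

# Compare the manual value with the one returned by acf.
isTRUE(all.equal(r1, a$acf[2]))
```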

The function ccf plots the cross-correlation function for two time series:

ccf(x, y, lag.max = NULL, type = c("correlation", "covariance"),
    plot = TRUE, na.action = na.fail, ...)

By default, this function will plot the results. You can suppress the plot (to just view the function) with the argument plot=FALSE.

As an example of cross-correlations, we can use average ham prices in the United States. These are included in the nutshell package as ham.price.ts:

# library(nutshell)
data(ham.price.ts)
ccf(turkey.price.ts, ham.price.ts, plot=FALSE)
## 
## Autocorrelations of series 'X', by lag
## 
## -1.0833 -1.0000 -0.9167 -0.8333 -0.7500 -0.6667 -0.5833 -0.5000 -0.4167 
##   0.147   0.168  -0.188  -0.259  -0.234  -0.098  -0.004   0.010   0.231 
## -0.3333 -0.2500 -0.1667 -0.0833  0.0000  0.0833  0.1667  0.2500  0.3333 
##   0.228   0.059  -0.038   0.379   0.124  -0.207  -0.315  -0.160  -0.084 
##  0.4167  0.5000  0.5833  0.6667  0.7500  0.8333  0.9167  1.0000  1.0833 
##  -0.047  -0.005   0.229   0.223  -0.056  -0.099   0.189   0.039  -0.108
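
To make the ccf output easier to read, here is a small sketch using the built-in mdeaths and fdeaths series as stand-ins for the turkey and ham prices (an assumption, chosen only because they ship with R). The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t], so the lag-0 entry should match the ordinary correlation of the two series.

```r
# Built-in monthly series used only as stand-ins for the price data.
cc <- ccf(mdeaths, fdeaths, plot = FALSE)

# Locate the lag-0 entry; lags run from negative to positive.
i0 <- which.min(abs(cc$lag))
r0 <- cc$acf[i0]

# At lag 0 the cross-correlation is the ordinary Pearson correlation.
isTRUE(all.equal(r0, cor(as.numeric(mdeaths), as.numeric(fdeaths))))
```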

2 Time Series Models

Time series models are a little different from other models that we’ve seen in R. With most other models, the goal is to predict a value (the response variable) from a set of other variables (the predictor variables). Usually, we explicitly assume that there is no autocorrelation: that the sequence of observations does not matter.

With time series, we assume the opposite: we assume that previous observations help predict future observations.

To fit an autoregressive model to a time series, use the function ar:

ar(x, aic = TRUE, order.max = NULL, 
   method=c("yule-walker", "burg", "ols", "mle", "yw"), 
   na.action, series, ...)

Here is a description of the arguments to ar.

Argument | Description | Default
--- | --- | ---
x | A time series. |
aic | A logical value that specifies whether the Akaike information criterion is used to choose the order of the model. | TRUE
order.max | A numeric value specifying the maximum order of the model to fit. | NULL
method | A character value that specifies the method to use for fitting the model. Specify method="yw" (or method="yule-walker") for the Yule-Walker method, method="burg" for the Burg method, method="ols" for ordinary least squares, or method="mle" for maximum likelihood estimation. | c("yule-walker", "burg", "ols", "mle", "yw")
na.action | A function that specifies how to handle missing values. |
series | A character vector of names for the series. |
demean | A logical value specifying if a mean should be estimated during fitting. |
var.method | Specifies the method used to estimate the innovations variance when method="burg". |
... | Additional arguments, depending on method. |

The ar function actually calls one of four other functions, depending on the fit method chosen: ar.yw, ar.burg, ar.ols, or ar.mle. As an example, let’s fit an autoregressive model to the turkey price data:

# library(nutshell)
data(turkey.price.ts)
turkey.price.ts.ar <- ar(turkey.price.ts)
turkey.price.ts.ar
## 
## Call:
## ar(x = turkey.price.ts)
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  0.3353  -0.1868  -0.0024   0.0571  -0.1554  -0.0208   0.0914  -0.0658  
##       9       10       11       12  
## -0.0952   0.0649   0.0099   0.5714  
## 
## Order selected 12  sigma^2 estimated as  0.05182
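
One way to build intuition for ar is to fit it to data whose structure is known. The sketch below simulates a first-order autoregressive series with arima.sim (the coefficient 0.7, the sample size, and the seed are arbitrary choices for illustration) and checks that the fitted coefficient lands near the true value.

```r
set.seed(42)  # arbitrary seed, for reproducibility
sim <- arima.sim(model = list(ar = 0.7), n = 500)

# Let AIC choose the order (the default), up to a maximum of 5.
fit_aic <- ar(sim, order.max = 5)
fit_aic$order

# Force a first-order fit and compare the estimate with the true value 0.7.
fit_ar1 <- ar(sim, aic = FALSE, order.max = 1)
fit_ar1$ar
```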

You can use the model to predict future values. To do this, use the predict function. Here is the method for ar objects:

predict(object, newdata, n.ahead = 1, se.fit = TRUE, ...)

The argument object specifies the model object to use for prediction. You can use newdata to specify new data to use for prediction, or n.ahead to specify the number of periods ahead to predict. The argument se.fit specifies whether to return the standard errors of the predictions.

Here is a forecast for the next 12 months for turkey prices:

predict(turkey.price.ts.ar,n.ahead=12)
## $pred
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2008                                         1.8827277 1.7209182 1.7715016
## 2009 1.5439290 1.6971933 1.5849406 1.7800358                              
##            Aug       Sep       Oct       Nov       Dec
## 2008 1.9416776 1.7791961 1.4822070 0.9894343 1.1588863
## 2009                                                  
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2008                                         0.2276439 0.2400967 0.2406938
## 2009 0.2450732 0.2470678 0.2470864 0.2480176                              
##            Aug       Sep       Oct       Nov       Dec
## 2008 0.2415644 0.2417360 0.2429339 0.2444610 0.2449850
## 2009
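
The $pred and $se components can be combined into rough prediction intervals. The sketch below assumes approximately normal forecast errors (hence the 1.96 multiplier) and uses the small built-in lh series in place of the turkey price data.

```r
# 'lh' is a small built-in series; it stands in for the turkey price data.
fit <- ar(lh)
fc  <- predict(fit, n.ahead = 5)

# Approximate 95% interval: point forecast +/- 1.96 standard errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
cbind(forecast = fc$pred, lower = lower, upper = upper)
```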

To take a look at a forecast from an autoregressive model, you can use the function ts.plot. This function plots multiple time series on a single chart, even if the times are not overlapping. You can specify colors, line types, or other characteristics of each series as vectors; the ith element of each vector sets the property for the ith series.

Here is how to plot the turkey price time series as a solid line, and a projection 24 months into the future as a dashed line:

ts.plot(turkey.price.ts, predict(turkey.price.ts.ar,n.ahead=24)$pred, lty=c(1:2))

You can also fit autoregressive integrated moving average (ARIMA) models in R using the arima function:

arima(x, order = c(0, 0, 0), 
      seasonal = list(order = c(0, 0, 0), period = NA),
      xreg = NULL, include.mean = TRUE,
      transform.pars = TRUE, 
      fixed = NULL, init = NULL, 
      method = c("CSS-ML", "ML", "CSS"), 
      n.cond, optim.method = "BFGS", 
      optim.control = list(), kappa = 1e6)

Here is a description of the arguments to arima.

Argument | Description | Default
--- | --- | ---
x | A time series. |
order | A numeric vector (p, d, q), where p is the AR order, d is the degree of differencing, and q is the MA order. | c(0, 0, 0)
seasonal | A list specifying the seasonal part of the model. The list contains two parts: the order and the period. | list(order = c(0, 0, 0), period = NA)
xreg | An (optional) vector or matrix of external regressors (with the same number of rows as x). | NULL
include.mean | A logical value specifying whether the model should include a mean/intercept term. | TRUE
transform.pars | A logical value specifying whether the AR parameters should be transformed to ensure that they remain in the region of stationarity. | TRUE
fixed | An optional numeric vector specifying fixed values for parameters. (Only NA values are varied.) | NULL
init | A numeric vector of initial parameter values. | NULL
method | A character value specifying the fitting method to use. The default setting, method="CSS-ML", uses conditional sum of squares to find starting values, then maximum likelihood. Specify method="ML" for maximum likelihood only, or method="CSS" for conditional sum of squares only. | c("CSS-ML", "ML", "CSS")
n.cond | A numeric value indicating the number of initial values to ignore (only used for conditional sum of squares). |

The arima function uses the optim function to fit models. To smooth a time series with a fitted model, see the tsSmooth function; its help file describes which fitted model classes it accepts.
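
As a sketch of how the order and seasonal arguments fit together, the example below fits a seasonal model to the built-in AirPassengers series. The particular specification (one MA term and one seasonal MA term after regular and seasonal differencing, on the log scale) is an illustrative choice, not a recommendation for other data.

```r
# Seasonal ARIMA on the built-in AirPassengers series (log scale):
# first and seasonal differencing, with one MA and one seasonal MA term.
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
coef(fit)

# Forecast 12 months ahead on the log scale, then back-transform.
fc <- predict(fit, n.ahead = 12)
exp(fc$pred)
```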

2.1 Representing Time Series Data

When you want an R data structure that can represent time series data, you can use the zoo and xts packages. They define a data structure for time series, and they contain many useful functions for working with time series data. These representations assume you have two vectors: a vector of observations (data) and a vector of dates or times of those observations. The zoo function combines them into a zoo object:

# library(zoo)
ts <- zoo(x, dt)

The xts function is similar, returning an xts object:

# library(xts)
ts <- xts(x, dt)

The data, x, should be numeric. The vector of dates or datetimes, dt, is called the index. Legal indices vary between the packages:

  • zoo

    The index can be any ordered values, such as Date objects, POSIXct objects, integers, or even floating-point values.
  • xts

    The index must be a supported date or time class. This includes Date, POSIXct, and chron objects. Those should be sufficient for most applications, but you can also use yearmon, yearqtr, and timeDate objects. The xts package is more restrictive than zoo because it implements powerful operations that require a time-based index.

Convert between representations of the time series data by using as.zoo and as.xts:

as.zoo(ts) # Converts ts to a zoo object
as.xts(ts) # Converts ts to an xts object
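
A minimal round trip between the two representations might look like the following. It assumes the zoo and xts packages are installed; the three prices and dates are sample values.

```r
library(zoo)
library(xts)

prices <- c(132.45, 130.85, 130.00)
dates  <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06"))

z    <- zoo(prices, dates)   # zoo object
x    <- as.xts(z)            # same data as an xts object
back <- as.zoo(x)            # and back to zoo

class(x)
as.numeric(coredata(back))   # the observations survive the round trip
```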

The following example creates a zoo object that contains the price of IBM stock for the first five days of 2010; it uses Date objects for the index:

prices <- c(132.45, 130.85, 130.00, 129.55, 130.85)
dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07","2010-01-08"))
ibm.daily <- zoo(prices, dates)
print(ibm.daily)
## 2010-01-04 2010-01-05 2010-01-06 2010-01-07 2010-01-08 
##     132.45     130.85     130.00     129.55     130.85

In contrast, the next example captures the price of IBM stock at one-second intervals. It represents time by the number of hours past midnight starting at 9:30 a.m. (1 second \(\approx\) 0.00027778 hours):

prices <- c(131.18, 131.20, 131.17, 131.15, 131.17)
seconds <- c(9.5, 9.500278, 9.500556, 9.500833, 9.501111)
ibm.sec <- zoo(prices, seconds)
print(ibm.sec)
##    9.5 9.5003 9.5006 9.5008 9.5011 
## 131.18 131.20 131.17 131.15 131.17

Those two examples used a single time series, where the data came from a vector. Both zoo and xts can also handle multiple, parallel time series. For this, capture the several time series in a data frame and then create a multivariate time series by calling the zoo (or xts) function:

ts <- zoo(dfrm, dt) # OR: ts <- xts(dfrm, dt)

For example:

m = matrix(1:20,5,4)
dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07","2010-01-08"))
zoo(m, dates)
##                      
## 2010-01-04 1  6 11 16
## 2010-01-05 2  7 12 17
## 2010-01-06 3  8 13 18
## 2010-01-07 4  9 14 19
## 2010-01-08 5 10 15 20
dfrm = data.frame(a = c(1:5),b = c(6:10))
zoo(dfrm, dates)
##            a  b
## 2010-01-04 1  6
## 2010-01-05 2  7
## 2010-01-06 3  8
## 2010-01-07 4  9
## 2010-01-08 5 10

The second argument is a vector of dates (or datetimes) for each observation. There is only one vector of dates for all the time series; in other words, all observations in each row of the data frame must have the same date.

Once the data is captured inside a zoo or xts object, you can extract the pure data via coredata, which returns a simple vector (or matrix):

coredata(ibm.daily)
## [1] 132.45 130.85 130.00 129.55 130.85

You can extract the date or time portion via index:

index(ibm.daily)
## [1] "2010-01-04" "2010-01-05" "2010-01-06" "2010-01-07" "2010-01-08"

The xts package is very similar to zoo. It is optimized for speed and so is especially well suited for processing large volumes of data. It is also clever about converting to and from other time series representations.

One big advantage of capturing data inside a zoo or xts object is that special-purpose functions become available for printing, plotting, differencing, merging, periodic sampling, applying rolling functions, and other useful operations. There is even a function, read.zoo, dedicated to reading time series data from ASCII files.
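
As a small illustration of two of those operations, the sketch below applies diff (differencing) and rollmean (a rolling function from zoo) to the five IBM prices used earlier. It assumes the zoo package is installed; the window width of 3 is an arbitrary choice.

```r
library(zoo)

prices <- c(132.45, 130.85, 130.00, 129.55, 130.85)
dates  <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06",
                    "2010-01-07", "2010-01-08"))
ibm.daily <- zoo(prices, dates)

# Day-over-day price changes; the result starts at the second date.
changes <- diff(ibm.daily)
changes

# Centered 3-day rolling mean; the first and last dates are dropped.
smooth3 <- rollmean(ibm.daily, k = 3)
smooth3
```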

Remember that the xts package can do everything that the zoo package can do, so everywhere that this chapter talks about zoo objects you can also use xts objects.

2.2 Plotting Time Series Data

You can use plot(x), which works for zoo objects and xts objects containing either single or multiple time series.

For a simple vector v of time series observations, you can use either plot(v,type="l") or plot.ts(v).

Suppose we have a zoo object dcp:

# library(datasets)
data(EuStockMarkets)
idx = seq(from = as.Date("1991-05-10"), by = "day", length.out = 1860)
dcp = zoo(coredata(EuStockMarkets), idx)
head(dcp)
##                DAX    SMI    CAC   FTSE
## 1991-05-10 1628.75 1678.1 1772.8 2443.6
## 1991-05-11 1613.63 1688.5 1750.5 2460.2
## 1991-05-12 1606.51 1678.6 1718.0 2448.2
## 1991-05-13 1621.04 1684.1 1708.1 2470.4
## 1991-05-14 1618.16 1686.6 1723.1 2484.7
## 1991-05-15 1610.61 1671.6 1714.3 2466.8

If you plot the object with screens=1, R will plot all four time series together in one plot:

plot(dcp, screens=1)

If you specify screens=c(1,2,3,4), however, then R will plot each of the four time series separately in its own panel:

plot(dcp, screens=c(1,2,3,4))

The plot function provides default labels for the x-axis and y-axis, but they are not very informative. It does not provide a default title. So you may need to supply your own xlab, ylab, and main (title) parameters.

xlab="Date"
ylab="Price"
main="Daily Closing Prices of Major European Stock Indices, 1991-1998"
lty=c("solid", "dashed", "dotted", "dotdash")
ylim=range(coredata(dcp))
# Plot the four time series in four separate panels
plot(dcp, screens=c(1,2,3,4), lty=lty, main=main, xlab=xlab, ylab=ylab, ylim=ylim)

# Plot the four time series in one plot
plot(dcp, screens=1, lty=lty, main=main, xlab=xlab, ylab=ylab, col = c("red", "blue", "green", "orange"))
# Add a legend
legend(as.Date("1991-05-10"), 8000, c("DAX", "SMI", "CAC", "FTSE"), lty=lty, col = c("red", "blue", "green", "orange"))

2.3 Extracting the Oldest or Newest Observations

You can use head to view the oldest observations:

head(ts)

And use tail to view the newest observations:

tail(ts)

The head and tail functions are generic, so they will work whether your data is stored in a simple vector, a zoo object, or an xts object. For example:

head(dcp)
##                DAX    SMI    CAC   FTSE
## 1991-05-10 1628.75 1678.1 1772.8 2443.6
## 1991-05-11 1613.63 1688.5 1750.5 2460.2
## 1991-05-12 1606.51 1678.6 1718.0 2448.2
## 1991-05-13 1621.04 1684.1 1708.1 2470.4
## 1991-05-14 1618.16 1686.6 1723.1 2484.7
## 1991-05-15 1610.61 1671.6 1714.3 2466.8
tail(dcp)
##                DAX    SMI    CAC   FTSE
## 1996-06-06 5598.32 7952.9 4041.9 5680.4
## 1996-06-07 5460.43 7721.3 3939.5 5587.6
## 1996-06-08 5285.78 7447.9 3846.0 5432.8
## 1996-06-09 5386.94 7607.5 3945.7 5462.2
## 1996-06-10 5355.03 7552.6 3951.7 5399.5
## 1996-06-11 5473.72 7676.3 3995.0 5455.0

By default, head and tail show (respectively) the six oldest and six newest observations. You can see more observations by providing a second argument, for example:

tail(dcp, 20)
##                DAX    SMI    CAC   FTSE
## 1996-05-23 6186.09 8400.8 4368.9 6179.0
## 1996-05-24 6184.10 8412.0 4322.1 6132.7
## 1996-05-25 6081.11 8340.7 4220.1 5989.6
## 1996-05-26 6043.82 8229.2 4235.9 5976.2
## 1996-05-27 6040.58 8205.7 4205.4 5892.3
## 1996-05-28 5854.35 7998.7 4139.5 5836.1
## 1996-05-29 5867.52 8093.0 4122.4 5835.8
## 1996-05-30 5828.74 8102.7 4139.2 5844.1
## 1996-05-31 5906.33 8205.5 4197.6 5910.7
## 1996-06-01 5861.19 8239.5 4177.3 5837.0
## 1996-06-02 5774.38 8139.2 4095.0 5809.7
## 1996-06-03 5718.70 8170.2 4047.9 5736.1
## 1996-06-04 5614.77 7943.2 3976.4 5632.5
## 1996-06-05 5528.12 7846.2 3968.6 5594.1
## 1996-06-06 5598.32 7952.9 4041.9 5680.4
## 1996-06-07 5460.43 7721.3 3939.5 5587.6
## 1996-06-08 5285.78 7447.9 3846.0 5432.8
## 1996-06-09 5386.94 7607.5 3945.7 5462.2
## 1996-06-10 5355.03 7552.6 3951.7 5399.5
## 1996-06-11 5473.72 7676.3 3995.0 5455.0

The xts package also includes first and last functions, which use calendar periods instead of number of observations. We can use first and last to select data by number of days, weeks, months, or even years:

first(as.xts(dcp), "3 weeks")
##                DAX    SMI    CAC   FTSE
## 1991-05-10 1628.75 1678.1 1772.8 2443.6
## 1991-05-11 1613.63 1688.5 1750.5 2460.2
## 1991-05-12 1606.51 1678.6 1718.0 2448.2
## 1991-05-13 1621.04 1684.1 1708.1 2470.4
## 1991-05-14 1618.16 1686.6 1723.1 2484.7
## 1991-05-15 1610.61 1671.6 1714.3 2466.8
## 1991-05-16 1630.75 1682.9 1734.5 2487.9
## 1991-05-17 1640.17 1703.6 1757.4 2508.4
## 1991-05-18 1635.47 1697.5 1754.0 2510.5
## 1991-05-19 1645.89 1716.3 1754.3 2497.4
## 1991-05-20 1647.84 1723.8 1759.8 2532.5
## 1991-05-21 1638.35 1730.5 1755.5 2556.8
## 1991-05-22 1629.93 1727.4 1758.1 2561.0
## 1991-05-23 1621.49 1733.3 1757.5 2547.3
## 1991-05-24 1624.74 1734.0 1763.5 2541.5
## 1991-05-25 1627.63 1728.3 1762.8 2558.5
## 1991-05-26 1631.99 1737.1 1768.9 2587.9
last(as.xts(dcp), "10 days")
##                DAX    SMI    CAC   FTSE
## 1996-06-02 5774.38 8139.2 4095.0 5809.7
## 1996-06-03 5718.70 8170.2 4047.9 5736.1
## 1996-06-04 5614.77 7943.2 3976.4 5632.5
## 1996-06-05 5528.12 7846.2 3968.6 5594.1
## 1996-06-06 5598.32 7952.9 4041.9 5680.4
## 1996-06-07 5460.43 7721.3 3939.5 5587.6
## 1996-06-08 5285.78 7447.9 3846.0 5432.8
## 1996-06-09 5386.94 7607.5 3945.7 5462.2
## 1996-06-10 5355.03 7552.6 3951.7 5399.5
## 1996-06-11 5473.72 7676.3 3995.0 5455.0

Notice that I converted the zoo object to an xts object by calling as.xts(dcp), forcing R to use the xts functions (methods). In addition, a “week” is defined by the calendar, not just by any seven consecutive days.

2.4 Subsetting a Time Series

If you want to select one or more elements from a time series, you can index a zoo or xts object by position. Use one or two subscripts, depending upon whether the object contains one time series or multiple time series:

  • ts[i]

    Selects the ith observation from a single time series

  • ts[i,j]

    Selects the ith observation of the jth time series of multiple time series
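
For a multi-series zoo object, rows are observations and columns are series, so the row subscript comes first. The sketch below shows this on a small matrix-backed object (assuming the zoo package is installed; the values 1 to 20 are placeholders).

```r
library(zoo)

dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06",
                   "2010-01-07", "2010-01-08"))
z <- zoo(matrix(1:20, 5, 4), dates)   # 5 observations of 4 series

z[2, 3]            # second observation of the third series
z[, 3]             # every observation of the third series
z[2:3, c(1, 3)]    # observations 2 and 3 of the first and third series
```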

You can index the time series by a date object. Use the same type of object as the index of your time series. This example assumes that the index contains Date objects:

ts[as.Date("yyyy-mm-dd")]

You can index it by a sequence of dates:

dates <- seq(startdate, enddate, increment)
ts[dates]

The window function can select a range by start and end date:

window(ts, start=startdate, end=enddate)

Recall our small sample of IBM stock prices:

ibm.daily
## 2010-01-04 2010-01-05 2010-01-06 2010-01-07 2010-01-08 
##     132.45     130.85     130.00     129.55     130.85

We can select an observation by position, just like selecting elements from a vector:

ibm.daily[2]
## 2010-01-05 
##     130.85
ibm.daily[2:4]
## 2010-01-05 2010-01-06 2010-01-07 
##     130.85     130.00     129.55

Sometimes it’s more useful to select by date. In this case, our index is built from Date objects, so we subscript the time series using a Date object (if the index were built from POSIXct objects, we’d use a POSIXct object):

ibm.daily[as.Date('2010-01-05')]
## 2010-01-05 
##     130.85

We can select by a vector of Date objects:

dates <- seq(as.Date('2010-01-04'), as.Date('2010-01-08'), by=2)
ibm.daily[dates]
## 2010-01-04 2010-01-06 2010-01-08 
##     132.45     130.00     130.85

The window function is easier for selecting a range of consecutive dates:

window(ibm.daily, start=as.Date('2010-01-05'), end=as.Date('2010-01-07'))
## 2010-01-05 2010-01-06 2010-01-07 
##     130.85     130.00     129.55

2.5 Merging Several Time Series

Suppose that you have two or more time series and you want to merge them into a single time series object. In this case, you can represent each time series as a zoo object and then use the merge function to combine them:

merge(ts1, ts2)

Merging two time series is an incredible headache when the two series have differing timestamps. Consider these two series:

daily_idx = seq(from = as.Date("2000-01-02"), by = "day", length.out = 30)
weekly_idx = seq(from = as.Date("2000-01-01"), by = "week", length.out = 6)
ts1 = zoo(rnorm(30), daily_idx)
ts1
## 2000-01-02 2000-01-03 2000-01-04 2000-01-05 2000-01-06 2000-01-07 
##  0.9737556 -0.8728359  0.2784130  1.3754153  1.6987765 -2.3975531 
## 2000-01-08 2000-01-09 2000-01-10 2000-01-11 2000-01-12 2000-01-13 
##  2.6821324 -0.8695387 -1.3853879  0.1380131 -1.5153835  1.1372844 
## 2000-01-14 2000-01-15 2000-01-16 2000-01-17 2000-01-18 2000-01-19 
##  1.4571492 -0.5150068  0.6403659  0.4043159 -0.5491450  2.0633122 
## 2000-01-20 2000-01-21 2000-01-22 2000-01-23 2000-01-24 2000-01-25 
## -0.2536504 -1.0916630  1.5104725  0.5727101  0.2553154 -0.2445711 
## 2000-01-26 2000-01-27 2000-01-28 2000-01-29 2000-01-30 2000-01-31 
##  0.9808319  0.7601182  0.6755955  1.5439757  1.8410443 -0.8875722
ts2 = zoo(rnorm(6), weekly_idx) 
ts2
##  2000-01-01  2000-01-08  2000-01-15  2000-01-22  2000-01-29  2000-02-05 
##  0.03448087  0.36390343 -1.42906502 -1.40853638 -0.28024273 -0.75505235

Obviously, the two time series have different timestamps because one is daily data and the other is weekly data.

Thank goodness for the merge function, which handles the messy details of reconciling the different dates:

merge(ts1, ts2)
##                   ts1         ts2
## 2000-01-01         NA  0.03448087
## 2000-01-02  0.9737556          NA
## 2000-01-03 -0.8728359          NA
## 2000-01-04  0.2784130          NA
## 2000-01-05  1.3754153          NA
## 2000-01-06  1.6987765          NA
## 2000-01-07 -2.3975531          NA
## 2000-01-08  2.6821324  0.36390343
## 2000-01-09 -0.8695387          NA
## 2000-01-10 -1.3853879          NA
## 2000-01-11  0.1380131          NA
## 2000-01-12 -1.5153835          NA
## 2000-01-13  1.1372844          NA
## 2000-01-14  1.4571492          NA
## 2000-01-15 -0.5150068 -1.42906502
## 2000-01-16  0.6403659          NA
## 2000-01-17  0.4043159          NA
## 2000-01-18 -0.5491450          NA
## 2000-01-19  2.0633122          NA
## 2000-01-20 -0.2536504          NA
## 2000-01-21 -1.0916630          NA
## 2000-01-22  1.5104725 -1.40853638
## 2000-01-23  0.5727101          NA
## 2000-01-24  0.2553154          NA
## 2000-01-25 -0.2445711          NA
## 2000-01-26  0.9808319          NA
## 2000-01-27  0.7601182          NA
## 2000-01-28  0.6755955          NA
## 2000-01-29  1.5439757 -0.28024273
## 2000-01-30  1.8410443          NA
## 2000-01-31 -0.8875722          NA
## 2000-02-05         NA -0.75505235

By default, merge finds the union of all dates: the output contains all dates from both inputs, and missing observations are filled with NA values. You can replace those NA values with the most recent observation by using the na.locf function from the zoo package:

na.locf(merge(ts1, ts2))
##                   ts1         ts2
## 2000-01-01         NA  0.03448087
## 2000-01-02  0.9737556  0.03448087
## 2000-01-03 -0.8728359  0.03448087
## 2000-01-04  0.2784130  0.03448087
## 2000-01-05  1.3754153  0.03448087
## 2000-01-06  1.6987765  0.03448087
## 2000-01-07 -2.3975531  0.03448087
## 2000-01-08  2.6821324  0.36390343
## 2000-01-09 -0.8695387  0.36390343
## 2000-01-10 -1.3853879  0.36390343
## 2000-01-11  0.1380131  0.36390343
## 2000-01-12 -1.5153835  0.36390343
## 2000-01-13  1.1372844  0.36390343
## 2000-01-14  1.4571492  0.36390343
## 2000-01-15 -0.5150068 -1.42906502
## 2000-01-16  0.6403659 -1.42906502
## 2000-01-17  0.4043159 -1.42906502
## 2000-01-18 -0.5491450 -1.42906502
## 2000-01-19  2.0633122 -1.42906502
## 2000-01-20 -0.2536504 -1.42906502
## 2000-01-21 -1.0916630 -1.42906502
## 2000-01-22  1.5104725 -1.40853638
## 2000-01-23  0.5727101 -1.40853638
## 2000-01-24  0.2553154 -1.40853638
## 2000-01-25 -0.2445711 -1.40853638
## 2000-01-26  0.9808319 -1.40853638
## 2000-01-27  0.7601182 -1.40853638
## 2000-01-28  0.6755955 -1.40853638
## 2000-01-29  1.5439757 -0.28024273
## 2000-01-30  1.8410443 -0.28024273
## 2000-01-31 -0.8875722 -0.28024273
## 2000-02-05 -0.8875722 -0.75505235

(Here locf stands for ‘last observation carried forward’.) Observe that every NA was replaced except the ts1 value in the first row (2000-01-01), which has no earlier observation to carry forward. You can get the intersection of all dates by setting all=FALSE:

merge(ts1, ts2, all=FALSE)
##                   ts1        ts2
## 2000-01-08  2.6821324  0.3639034
## 2000-01-15 -0.5150068 -1.4290650
## 2000-01-22  1.5104725 -1.4085364
## 2000-01-29  1.5439757 -0.2802427

Now the output is limited to observations that are common to both series.

2.6 Filling or Padding a Time Series

When your time series data has missing observations, you can use the merge function to fill or pad the data with the missing dates/times. First, create a zero-width (dataless) zoo object with the missing dates/times. Then merge your data with the zero-width object, taking the union of all dates:

empty <- zoo(,dates) # 'dates' is vector of the missing dates
merge(ts, empty, all=TRUE)

The zoo package includes a handy feature in the constructor for zoo objects: you can omit the data and build a zero-width object. The object contains no data, just dates. We can use these ‘Frankenstein’ objects to perform such operations as filling and padding on other time series objects.
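
The padding recipe can be sketched end to end as follows. The example assumes the zoo package is installed and uses three sample prices with a two-day gap; it builds the zero-width object from the full daily sequence, which has the same effect as listing only the missing dates because merge takes the union.

```r
library(zoo)

# Observations exist for only three of the five days.
obs_dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-08"))
z <- zoo(c(132.45, 130.85, 130.85), obs_dates)

# Zero-width object covering every day from the first to the last observation.
all_dates <- seq(from = start(z), to = end(z), by = "day")
empty <- zoo(, all_dates)

# Pad: the two missing days appear with NA values.
padded <- merge(z, empty, all = TRUE)
padded

# Optionally fill the gaps by carrying the last observation forward.
filled <- na.locf(padded)
filled
```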

3 The statistical details

The statistical details of time series can be found here.