Motivation

There is a plethora of packages that work with time series and it can be difficult to determine which package might be best for a given situation. I have decided to take a historical approach to learning about the various packages, in order of ts, zoo, and xts. The base ts, can only handle strictly regular time series and cannot accept date vectors of more than 2 characters. Hence, the zoo package was developed to handle regular and irregular time series and different date formats. The xts package is an extension of zoo and works with a variety of date formats and conveniently stores the time series in matrix form. An introduction to time series plots will provide the reader with some qualities to investigate in a time series graph. Decomposition will be discussed but the method will not be shown. Skills with each package will be shown, including plots, subsetting and aggregation of data.

Time Series Introduction

A time series is a series of data points in chronological order, where time is the independent variable. In a strictly regular time series, the chronological data points will be evenly spaced. A regular time series could be unevenly spaced but possess some type of underlying regularity. An irregular time series will be too unequally spaced to determine any type of regularity. These individual definitions seem to vary, but this seems to be what the zoo package in R is using.

A time series is seasonal if a strict cyclical pattern occurs over a constant time interval. The frequency of a time series indicates the number of observations during that a time period. Stocks, climate, and production data are examples that lend itself well to time series analysis, because they are often sampled regularly. Some sample time series plots are shown below. Three of the graphs are plots of the monthly recreational visitors to a certain National Park. The US Air Passengers is in the datasets package, and the sheep plot is in the fma package. The humidity graph contains 1-4 samples per hour of humidity data for Minneapolis, MN from January to March 2017.

The usual statistical observations can be gleaned from a time series plot. For example, the Everglades plot appears to have a couple of outliers, years where the monthly attendance spiked. The Glacier plot appears to have a stationary minimum during the winter months. The humidity graph is noisy, its amount of fluctuations and data make it difficult to pick out any trends or seasonality. There appears to be a strict cyclical pattern in each of the national parks plots, and hence they are seasonal. The sheep plot may have some slight cyclical behavior.

Is the level of visitors increasing for the amount of recreational visitors to Glacier NP? How about the levels US Air Passengers? The trend at Glacier appears to be stationary, neither increasing nor decreasing. The sheep plot has a decreasing trend. The moving average trend for a couple of the series is shown below (there are resources on moving averages in the Summary and Additional Resources section).

A time series plot can also tell you something about the relationship of the trend to the seasonal pattern. For the Air Passengers plot, the relationship is multiplicative. That is, the height of the seasonal pattern is increasing (it could have also been decreasing to be multiplicative). In the Glacier graph, the relationship appears to be additive, determined because the height of the seasonal pattern is relatively constant.

The Redwood plot appears to be an increasing trend but not linear, and may be multiplicative. There are functions in R that will show these relationships separately, stl and decompose. Below is an example of the decompose function for the Redwood National Park. The last plot is that of error, what is left after trend and seasonality have been accounted for in the model. In the Summary and Additional Resources section, I have provided links on how these are calculated.

Now that you have some tools to recognize patterns from the plots of a time series, let’s work on forming the plots using ts, zoo and xts.

Base R: ts

In order to form a ts object, it is a requirement to know the start and frequency of the time series, and end is optional. Forming a ts object can be counter intuitive at first, since the time vector from your original data won’t be used. The form of the data should be in vector or matrix form, but R will also do its best to coerce it. Let’s take a look at how to form a ts object using the monthly recreational visitation data for Everglades National Park.

everglades <- read.csv("Everglades_Monthly_Data.csv", header = TRUE)
head(everglades)
##   Year   JAN    FEB    MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV
## 1 1979 94405 145610  75386 67932 38994 26589 42045 59536 20699 34949 41571
## 2 1980 92517  96585 101797 80500 53627 37721 45665 47624 34889 47387 37498
## 3 1981 73845  75996  93947 63246 37226 23647 25168 28262 21493 29005 32084
## 4 1982 69165  81665  81796 56259 31248 18985 23111 29049 25022 33944 34009
## 5 1983 72105  79968  84433 59209 44570 27576 30348 26189 23056 28780 40683
## 6 1984 78179  92958  97997 71594 36831 22900 26928 28283 26494 33688 47202
##     DEC  Total
## 1 70386 718102
## 2 68434 744244
## 3 60802 564721
## 4 65915 550168
## 5 60522 577439
## 6 65604 628658
class(everglades)
## [1] "data.frame"

To use ts, our data must be expressed in a vector format. We will have to unwrap our data into a vector form (there’s probably a better way to do this, but it helps to illustrate the form that ts prefers). I took the transpose of the matrix and then converted it into a vector. I also choose to transform the population data into the 1000’s. The structure of the time series will show the dates and beginning of the data set.

everglades.vec <- as.vector(t(as.matrix(everglades[,2:13])))/1000 #unwrap
head(everglades.vec)
## [1]  94.405 145.610  75.386  67.932  38.994  26.589

Now, we are ready to form our time series, and check the structure, start, end and frequency.

everglades.ts <- ts(everglades.vec, c(1979,1), end = c(2017,12), frequency = 12) 
str(everglades.ts)
##  Time-Series [1:468] from 1979 to 2018: 94.4 145.6 75.4 67.9 39 ...
start(everglades.ts);end(everglades.ts); frequency(everglades.ts)
## [1] 1979    1
## [1] 2017   12
## [1] 12

Plotting the time series is relatively easy with the plot function.

plot(everglades.ts, xlab = "Time (years)", ylab = "Visitors (1000's)",col = "coral3")
title("Monthly Recreational Visitors-Everglades NP")

To view a subset of the data set, the function window can be used. Here, I used it to view a portion of the plot but it can also be used to form a ts object.

plot(window(everglades.ts, start = c(2000,1), end = c(2010,1), frequency = 12), ylab = "Visitors (1000's)", xlab = "Time (years)",col = "coral3", lwd = 2)
title("Monthly Recreational Visitors-Everglades NP")

There are a few ways to plot multiple time series plots on one graph. Here, the plot command will be shown. The plot.type can be set to multiple or single. Multiple will stack the individual time series while single will plot the time series on the same plot. It is important to note that in this example the input is a time series with 3 time series, formed by a cbind command. The aggregrate.ts function is also introduced in order to show the time series by total annual visitors. Another function,ts.plot can plot multiple time series without forming a multiple series ts object.

par(mfrow = c(1,1))
parks.ts <- cbind(everglades.ts, glacier.ts, redwood.ts)
plot(aggregate.ts(parks.ts, FUN=sum), xlab = "Time (years)", ylab = "Total Yearly Visitors (1000's)", plot.type = "single", col = c(1:3), lwd = 2)
title("Total Annual Recreational Visitors")
legend("topleft", c("Everglades", "Glacier", "Redwood"), lty = c(1,1,1), col = c(1:3), lwd = 2)

Zoo Objects

Unlike ts, the zoo package can detect whether a time series is regular, strictly regular, or irregular. The zoo pacakge can handle a variety of date formats. Although zoo objects seem to handle missing values and irregular time series well, it is important to pre-process the data and uncover any anomalies that could be fixed. Instead of a start and end parameter, the zoo parameter is order.by. For the zoo time series object I will be using weather data for a couple months from Minneapolis, Minnesota during 2017.

mpls <- read.csv("minneapolis_2months2017.csv", header = TRUE)
str(mpls)
## 'data.frame':    1854 obs. of  22 variables:
##  $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ date        : Factor w/ 1854 levels "2017-01-01 00:53:00",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ temp        : num  27 26.1 26.1 25 24.1 23 21.9 21 21 21 ...
##  $ dew_pt      : num  16 16 16 16 16 15.1 15.1 15.1 15.1 15.1 ...
##  $ hum         : int  63 66 66 69 71 72 75 78 78 78 ...
##  $ wind_spd    : num  16.1 11.5 10.4 9.2 6.9 6.9 5.8 4.6 5.8 5.8 ...
##  $ wind_gust   : num  24.2 NA NA NA NA NA NA NA NA NA ...
##  $ dir         : Factor w/ 17 levels "East","ENE","ESE",..: 13 13 13 17 17 15 17 13 12 12 ...
##  $ vis         : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ pressure    : num  29.8 29.8 29.8 29.9 29.9 ...
##  $ wind_chill  : num  14.7 15.6 16.2 15.5 16 14.7 14.5 14.6 13.4 13.4 ...
##  $ heat_index  : logi  NA NA NA NA NA NA ...
##  $ precip      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ precip_rate : logi  NA NA NA NA NA NA ...
##  $ precip_total: logi  NA NA NA NA NA NA ...
##  $ cond        : Factor w/ 14 levels "Clear","Fog",..: 1 1 1 1 1 1 9 9 13 9 ...
##  $ fog         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ rain        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ snow        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ hail        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ thunder     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ tornado     : int  0 0 0 0 0 0 0 0 0 0 ...

From the structure of the data frame, date is a factor, not a date character. I formed a new date object, but it is possible to do this in-place on the time column of the data frame.

mpls.date <- as.POSIXct(as.character(mpls$date, format="%Y-%m-%d %H:%M:%S")) #form date object
mplstemp.zoo <- zoo(mpls$temp, order.by = mpls.date) 

Let’s take a look at the regularity of our time series and its structure.

str(mplstemp.zoo)
## 'zoo' series from 2017-01-01 00:53:00 to 2017-03-01 23:53:00
##   Data: num [1:1854] 27 26.1 26.1 25 24.1 23 21.9 21 21 21 ...
##   Index:  POSIXct[1:1854], format: "2017-01-01 00:53:00" "2017-01-01 01:53:00" ...
is.regular(mplstemp.zoo, strict = TRUE);is.regular(mplstemp.zoo, strict = FALSE)
## [1] FALSE
## [1] TRUE

Let’s get an idea of where our missing values might be.

sum(is.na(mplstemp.zoo))
## [1] 28
mplstemp.zoo[is.na(mplstemp.zoo)==TRUE]
## 2017-02-20 18:53:00 2017-02-20 19:03:00 2017-02-20 19:20:00 
##                  NA                  NA                  NA 
## 2017-02-20 19:53:00 2017-02-20 20:13:00 2017-02-20 20:53:00 
##                  NA                  NA                  NA 
## 2017-02-20 21:53:00 2017-02-20 22:53:00 2017-02-20 23:53:00 
##                  NA                  NA                  NA 
## 2017-02-21 00:53:00 2017-02-21 01:00:00 2017-02-21 01:53:00 
##                  NA                  NA                  NA 
## 2017-02-21 02:53:00 2017-02-21 03:53:00 2017-02-21 04:33:00 
##                  NA                  NA                  NA 
## 2017-02-21 04:40:00 2017-02-21 04:51:00 2017-02-21 04:53:00 
##                  NA                  NA                  NA 
## 2017-02-21 05:06:00 2017-02-21 05:18:00 2017-02-21 05:21:00 
##                  NA                  NA                  NA 
## 2017-02-21 05:53:00 2017-02-21 06:53:00 2017-02-21 07:53:00 
##                  NA                  NA                  NA 
## 2017-02-21 08:53:00 2017-02-21 09:18:00 2017-02-21 09:53:00 
##                  NA                  NA                  NA 
## 2017-02-21 10:01:00 
##                  NA

Since I was only missing a night’s worth of data, I felt comfortable using a linear interpolation method, na.approx to fill in missing values. It will still return a zoo object.

mpls.zoo2 <- na.approx(mplstemp.zoo)
class(mpls.zoo2)
## [1] "zoo"

Let’s see if that worked. It looks like this thermometer might have broken from shock at 66 degrees in February!

na.filled <- window(mpls.zoo2, start = as.POSIXct("2017-02-20 17:53:00"), end = as.POSIXct("2017-02-21 10:01:00 "))
na.filled
## 2017-02-20 17:53:00 2017-02-20 18:53:00 2017-02-20 19:03:00 
##            66.90000            65.61176            65.39706 
## 2017-02-20 19:20:00 2017-02-20 19:53:00 2017-02-20 20:13:00 
##            65.03206            64.32353            63.89412 
## 2017-02-20 20:53:00 2017-02-20 21:53:00 2017-02-20 22:53:00 
##            63.03529            61.74706            60.45882 
## 2017-02-20 23:53:00 2017-02-21 00:53:00 2017-02-21 01:00:00 
##            59.17059            57.88235            57.73206 
## 2017-02-21 01:53:00 2017-02-21 02:53:00 2017-02-21 03:53:00 
##            56.59412            55.30588            54.01765 
## 2017-02-21 04:33:00 2017-02-21 04:40:00 2017-02-21 04:51:00 
##            53.15882            53.00853            52.77235 
## 2017-02-21 04:53:00 2017-02-21 05:06:00 2017-02-21 05:18:00 
##            52.72941            52.45029            52.19265 
## 2017-02-21 05:21:00 2017-02-21 05:53:00 2017-02-21 06:53:00 
##            52.12824            51.44118            50.15294 
## 2017-02-21 07:53:00 2017-02-21 08:53:00 2017-02-21 09:18:00 
##            48.86471            47.57647            47.03971 
## 2017-02-21 09:53:00 2017-02-21 10:01:00 
##            46.28824            46.11647

Plotting a zoo object works with both plot.zoo and plot.

plot.zoo(mpls.zoo2, xlab = "Month", ylab = "Temp (F)", col = "mediumblue", lwd = 2) 
title("MPLS Jan-March Temperature for 2017")

xts Objects

The ‘x’ in xts stands for extensible, named as such because it is an extension of zoo. An xts object is formed just as a zoo object is. However, an xts object is in matrix form and thus is easier to subset through.

First, let’s coerce our zoo object into an xts object.

library(xts)
mpls.temp.xts <- as.xts(mpls.zoo2)
head(mpls.temp.xts)
##                     [,1]
## 2017-01-01 00:53:00 27.0
## 2017-01-01 01:53:00 26.1
## 2017-01-01 02:53:00 26.1
## 2017-01-01 03:53:00 25.0
## 2017-01-01 04:53:00 24.1
## 2017-01-01 05:53:00 23.0
dim(mpls.temp.xts) #not NULL!
## [1] 1854    1

Sub-setting is convenient in xts as you can do it with its dates and with a / for a range of dates.

mpls.temp.xts["2017-01-01 00:53:00"] #temperature for new years
##                     [,1]
## 2017-01-01 00:53:00   27
mpls.temp.xts["2017-01-01 00:15:00/2017-01-01 03:15:00"] #first few hours of new years
##                     [,1]
## 2017-01-01 00:53:00 27.0
## 2017-01-01 01:53:00 26.1
## 2017-01-01 02:53:00 26.1

xts has several tools for converting to different periods. Here we will use to.hourly. This provides, the first, min, max, and last of the data.

mplstemp.h <- to.hourly(mpls.temp.xts) 
mplstemp.h["2017-01-03 08:53:00"]
##                     mpls.temp.xts.Open mpls.temp.xts.High
## 2017-01-03 08:53:00               15.1               15.1
##                     mpls.temp.xts.Low mpls.temp.xts.Close
## 2017-01-03 08:53:00                14                  14

Here’s the plot of the daily high temperatures. The plot command with an xts object provides a TON of features. This makes it fairly easy to customize your plots.

mplstemp.d <- to.daily(mpls.temp.xts)[,2]
plot(mplstemp.d, col = "darkblue", grid.ticks.on = "days", major.ticks = "days", grid.col = "lightgrey", main = "MPLS daily temp JAN-MAR 2017")

A Brief ggplot example

You may also want to consider plotting a time series by making a scatterplot and then connecting the lines using base or ggplot. However, it seems that ggplot won’t deal with any time series classes, so it is necessary to have your data stored in another form.

library(ggplot2)
ggplot(glacier, aes(glacier$Year, glacier$TOTAL))+ geom_point() + geom_line(colour = "blue")+xlab("Year")+ ylab("Total Rec Visitors")+ggtitle("Glacier Annual Recreational Visitors")

Summary and Additional Resources

Hopefully, you have the tools you need to form time series objects and their plots. At first, I found it difficult to work with because I did not understand the structure of the objects that I needed to create. Here’s a review of each method and some pros/cons to their use:

In addition, coercion between objects is relatively easy, with commands such as as.xts. One should be careful if coercing to ts, since it can only handle strictly regular time series and may add NaNs. Note that I didn’t change any of the default axis to save space when coding, but some of the plots would have looked better with a different axis scale. I would also recommend that one still save their data in a data frame or matrix as it may be needed for additional analysis, such as regression. Also, I showed aggregation, but the aggregate command is slow for large datasets. One would want to choose a better package such as dplyr for that. I didn’t set a timezone in xts, but it’s recommended to set that.

The next topic of study would be the decomposition function and understanding how it’s created. R’s help,?decompose is also very informative. Date objects in R can be a pain! I have included some additional resources for both dates and decomposition.

References

NPS Stats and Visitor Use Statistics [https://irma.nps.gov/Stats/]

Weather Underground [https://www.wunderground.com/]

Package zoo [https://cran.r-project.org/web/packages/zoo/zoo.pdf]

Package xts [https://cran.r-project.org/web/packages/xts/xts.pdf]