There is a plethora of packages that work with time series and it can be difficult to determine which package might be best for a given situation. I have decided to take a historical approach to learning about the various packages, in order of ts
, zoo
, and xts
. The base ts
, can only handle strictly regular time series and cannot accept date vectors of more than 2 characters. Hence, the zoo
package was developed to handle regular and irregular time series and different date formats. The xts
package is an extension of zoo
and works with a variety of date formats and conveniently stores the time series in matrix form. An introduction to time series plots will provide the reader with some qualities to investigate in a time series graph. Decomposition will be discussed but the method will not be shown. Skills with each package will be shown, including plots, subsetting and aggregation of data.
A time series is a series of data points in chronological order, where time is the independent variable. In a strictly regular time series, the chronological data points will be evenly spaced. A regular time series could be unevenly spaced but possess some type of underlying regularity. An irregular time series will be too unequally spaced to determine any type of regularity. These individual definitions seem to vary, but this seems to be what the zoo
package in R
is using.
A time series is seasonal if a strict cyclical pattern occurs over a constant time interval. The frequency of a time series indicates the number of observations during that a time period. Stocks, climate, and production data are examples that lend itself well to time series analysis, because they are often sampled regularly. Some sample time series plots are shown below. Three of the graphs are plots of the monthly recreational visitors to a certain National Park. The US Air Passengers is in the datasets
package, and the sheep plot is in the fma
package. The humidity graph contains 1-4 samples per hour of humidity data for Minneapolis, MN from January to March 2017.
The usual statistical observations can be gleaned from a time series plot. For example, the Everglades plot appears to have a couple of outliers, years where the monthly attendance spiked. The Glacier plot appears to have a stationary minimum during the winter months. The humidity graph is noisy, its amount of fluctuations and data make it difficult to pick out any trends or seasonality. There appears to be a strict cyclical pattern in each of the national parks plots, and hence they are seasonal. The sheep plot may have some slight cyclical behavior.
Is the level of visitors increasing for the amount of recreational visitors to Glacier NP? How about the levels US Air Passengers? The trend at Glacier appears to be stationary, neither increasing nor decreasing. The sheep plot has a decreasing trend. The moving average trend for a couple of the series is shown below (there are resources on moving averages in the Summary and Additional Resources section).
A time series plot can also tell you something about the relationship of the trend to the seasonal pattern. For the Air Passengers plot, the relationship is multiplicative. That is, the height of the seasonal pattern is increasing (it could have also been decreasing to be multiplicative). In the Glacier graph, the relationship appears to be additive, determined because the height of the seasonal pattern is relatively constant.
The Redwood plot appears to be an increasing trend but not linear, and may be multiplicative. There are functions in R
that will show these relationships separately, stl
and decompose
. Below is an example of the decompose
function for the Redwood National Park. The last plot is that of error, what is left after trend and seasonality have been accounted for in the model. In the Summary and Additional Resources section, I have provided links on how these are calculated.
Now that you have some tools to recognize patterns from the plots of a time series, let’s work on forming the plots using ts
, zoo
and xts
.
In order to form a ts
object, it is a requirement to know the start
and frequency
of the time series, and end
is optional. Forming a ts
object can be counter intuitive at first, since the time vector from your original data won’t be used. The form of the data should be in vector or matrix form, but R
will also do its best to coerce it. Let’s take a look at how to form a ts
object using the monthly recreational visitation data for Everglades National Park.
everglades <- read.csv("Everglades_Monthly_Data.csv", header = TRUE)
head(everglades)
## Year JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV
## 1 1979 94405 145610 75386 67932 38994 26589 42045 59536 20699 34949 41571
## 2 1980 92517 96585 101797 80500 53627 37721 45665 47624 34889 47387 37498
## 3 1981 73845 75996 93947 63246 37226 23647 25168 28262 21493 29005 32084
## 4 1982 69165 81665 81796 56259 31248 18985 23111 29049 25022 33944 34009
## 5 1983 72105 79968 84433 59209 44570 27576 30348 26189 23056 28780 40683
## 6 1984 78179 92958 97997 71594 36831 22900 26928 28283 26494 33688 47202
## DEC Total
## 1 70386 718102
## 2 68434 744244
## 3 60802 564721
## 4 65915 550168
## 5 60522 577439
## 6 65604 628658
class(everglades)
## [1] "data.frame"
To use ts
, our data must be expressed in a vector format. We will have to unwrap our data into a vector form (there’s probably a better way to do this, but it helps to illustrate the form that ts
prefers). I took the transpose of the matrix and then converted it into a vector. I also choose to transform the population data into the 1000’s. The structure of the time series will show the dates and beginning of the data set.
everglades.vec <- as.vector(t(as.matrix(everglades[,2:13])))/1000 #unwrap
head(everglades.vec)
## [1] 94.405 145.610 75.386 67.932 38.994 26.589
Now, we are ready to form our time series, and check the structure, start, end and frequency.
everglades.ts <- ts(everglades.vec, c(1979,1), end = c(2017,12), frequency = 12)
str(everglades.ts)
## Time-Series [1:468] from 1979 to 2018: 94.4 145.6 75.4 67.9 39 ...
start(everglades.ts);end(everglades.ts); frequency(everglades.ts)
## [1] 1979 1
## [1] 2017 12
## [1] 12
Plotting the time series is relatively easy with the plot
function.
plot(everglades.ts, xlab = "Time (years)", ylab = "Visitors (1000's)",col = "coral3")
title("Monthly Recreational Visitors-Everglades NP")
To view a subset of the data set, the function window
can be used. Here, I used it to view a portion of the plot but it can also be used to form a ts
object.
plot(window(everglades.ts, start = c(2000,1), end = c(2010,1), frequency = 12), ylab = "Visitors (1000's)", xlab = "Time (years)",col = "coral3", lwd = 2)
title("Monthly Recreational Visitors-Everglades NP")
There are a few ways to plot multiple time series plots on one graph. Here, the plot
command will be shown. The plot.type
can be set to multiple
or single
. Multiple will stack the individual time series while single
will plot the time series on the same plot. It is important to note that in this example the input is a time series with 3 time series, formed by a cbind
command. The aggregrate.ts
function is also introduced in order to show the time series by total annual visitors. Another function,ts.plot
can plot multiple time series without forming a multiple series ts
object.
par(mfrow = c(1,1))
parks.ts <- cbind(everglades.ts, glacier.ts, redwood.ts)
plot(aggregate.ts(parks.ts, FUN=sum), xlab = "Time (years)", ylab = "Total Yearly Visitors (1000's)", plot.type = "single", col = c(1:3), lwd = 2)
title("Total Annual Recreational Visitors")
legend("topleft", c("Everglades", "Glacier", "Redwood"), lty = c(1,1,1), col = c(1:3), lwd = 2)
Unlike ts
, the zoo
package can detect whether a time series is regular, strictly regular, or irregular. The zoo
pacakge can handle a variety of date formats. Although zoo
objects seem to handle missing values and irregular time series well, it is important to pre-process the data and uncover any anomalies that could be fixed. Instead of a start
and end
parameter, the zoo parameter is order.by
. For the zoo
time series object I will be using weather data for a couple months from Minneapolis, Minnesota during 2017.
mpls <- read.csv("minneapolis_2months2017.csv", header = TRUE)
str(mpls)
## 'data.frame': 1854 obs. of 22 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ date : Factor w/ 1854 levels "2017-01-01 00:53:00",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ temp : num 27 26.1 26.1 25 24.1 23 21.9 21 21 21 ...
## $ dew_pt : num 16 16 16 16 16 15.1 15.1 15.1 15.1 15.1 ...
## $ hum : int 63 66 66 69 71 72 75 78 78 78 ...
## $ wind_spd : num 16.1 11.5 10.4 9.2 6.9 6.9 5.8 4.6 5.8 5.8 ...
## $ wind_gust : num 24.2 NA NA NA NA NA NA NA NA NA ...
## $ dir : Factor w/ 17 levels "East","ENE","ESE",..: 13 13 13 17 17 15 17 13 12 12 ...
## $ vis : num 10 10 10 10 10 10 10 10 10 10 ...
## $ pressure : num 29.8 29.8 29.8 29.9 29.9 ...
## $ wind_chill : num 14.7 15.6 16.2 15.5 16 14.7 14.5 14.6 13.4 13.4 ...
## $ heat_index : logi NA NA NA NA NA NA ...
## $ precip : num NA NA NA NA NA NA NA NA NA NA ...
## $ precip_rate : logi NA NA NA NA NA NA ...
## $ precip_total: logi NA NA NA NA NA NA ...
## $ cond : Factor w/ 14 levels "Clear","Fog",..: 1 1 1 1 1 1 9 9 13 9 ...
## $ fog : int 0 0 0 0 0 0 0 0 0 0 ...
## $ rain : int 0 0 0 0 0 0 0 0 0 0 ...
## $ snow : int 0 0 0 0 0 0 0 0 0 0 ...
## $ hail : int 0 0 0 0 0 0 0 0 0 0 ...
## $ thunder : int 0 0 0 0 0 0 0 0 0 0 ...
## $ tornado : int 0 0 0 0 0 0 0 0 0 0 ...
From the structure of the data frame, date is a factor, not a date character. I formed a new date object, but it is possible to do this in-place on the time column of the data frame.
mpls.date <- as.POSIXct(as.character(mpls$date, format="%Y-%m-%d %H:%M:%S")) #form date object
mplstemp.zoo <- zoo(mpls$temp, order.by = mpls.date)
Let’s take a look at the regularity of our time series and its structure.
str(mplstemp.zoo)
## 'zoo' series from 2017-01-01 00:53:00 to 2017-03-01 23:53:00
## Data: num [1:1854] 27 26.1 26.1 25 24.1 23 21.9 21 21 21 ...
## Index: POSIXct[1:1854], format: "2017-01-01 00:53:00" "2017-01-01 01:53:00" ...
is.regular(mplstemp.zoo, strict = TRUE);is.regular(mplstemp.zoo, strict = FALSE)
## [1] FALSE
## [1] TRUE
Let’s get an idea of where our missing values might be.
sum(is.na(mplstemp.zoo))
## [1] 28
mplstemp.zoo[is.na(mplstemp.zoo)==TRUE]
## 2017-02-20 18:53:00 2017-02-20 19:03:00 2017-02-20 19:20:00
## NA NA NA
## 2017-02-20 19:53:00 2017-02-20 20:13:00 2017-02-20 20:53:00
## NA NA NA
## 2017-02-20 21:53:00 2017-02-20 22:53:00 2017-02-20 23:53:00
## NA NA NA
## 2017-02-21 00:53:00 2017-02-21 01:00:00 2017-02-21 01:53:00
## NA NA NA
## 2017-02-21 02:53:00 2017-02-21 03:53:00 2017-02-21 04:33:00
## NA NA NA
## 2017-02-21 04:40:00 2017-02-21 04:51:00 2017-02-21 04:53:00
## NA NA NA
## 2017-02-21 05:06:00 2017-02-21 05:18:00 2017-02-21 05:21:00
## NA NA NA
## 2017-02-21 05:53:00 2017-02-21 06:53:00 2017-02-21 07:53:00
## NA NA NA
## 2017-02-21 08:53:00 2017-02-21 09:18:00 2017-02-21 09:53:00
## NA NA NA
## 2017-02-21 10:01:00
## NA
Since I was only missing a night’s worth of data, I felt comfortable using a linear interpolation method, na.approx
to fill in missing values. It will still return a zoo
object.
mpls.zoo2 <- na.approx(mplstemp.zoo)
class(mpls.zoo2)
## [1] "zoo"
Let’s see if that worked. It looks like this thermometer might have broken from shock at 66 degrees in February!
na.filled <- window(mpls.zoo2, start = as.POSIXct("2017-02-20 17:53:00"), end = as.POSIXct("2017-02-21 10:01:00 "))
na.filled
## 2017-02-20 17:53:00 2017-02-20 18:53:00 2017-02-20 19:03:00
## 66.90000 65.61176 65.39706
## 2017-02-20 19:20:00 2017-02-20 19:53:00 2017-02-20 20:13:00
## 65.03206 64.32353 63.89412
## 2017-02-20 20:53:00 2017-02-20 21:53:00 2017-02-20 22:53:00
## 63.03529 61.74706 60.45882
## 2017-02-20 23:53:00 2017-02-21 00:53:00 2017-02-21 01:00:00
## 59.17059 57.88235 57.73206
## 2017-02-21 01:53:00 2017-02-21 02:53:00 2017-02-21 03:53:00
## 56.59412 55.30588 54.01765
## 2017-02-21 04:33:00 2017-02-21 04:40:00 2017-02-21 04:51:00
## 53.15882 53.00853 52.77235
## 2017-02-21 04:53:00 2017-02-21 05:06:00 2017-02-21 05:18:00
## 52.72941 52.45029 52.19265
## 2017-02-21 05:21:00 2017-02-21 05:53:00 2017-02-21 06:53:00
## 52.12824 51.44118 50.15294
## 2017-02-21 07:53:00 2017-02-21 08:53:00 2017-02-21 09:18:00
## 48.86471 47.57647 47.03971
## 2017-02-21 09:53:00 2017-02-21 10:01:00
## 46.28824 46.11647
Plotting a zoo object works with both plot.zoo
and plot
.
plot.zoo(mpls.zoo2, xlab = "Month", ylab = "Temp (F)", col = "mediumblue", lwd = 2)
title("MPLS Jan-March Temperature for 2017")
The ‘x’ in xts
stands for extensible, named as such because it is an extension of zoo
. An xts
object is formed just as a zoo
object is. However, an xts
object is in matrix form and thus is easier to subset through.
First, let’s coerce our zoo
object into an xts
object.
library(xts)
mpls.temp.xts <- as.xts(mpls.zoo2)
head(mpls.temp.xts)
## [,1]
## 2017-01-01 00:53:00 27.0
## 2017-01-01 01:53:00 26.1
## 2017-01-01 02:53:00 26.1
## 2017-01-01 03:53:00 25.0
## 2017-01-01 04:53:00 24.1
## 2017-01-01 05:53:00 23.0
dim(mpls.temp.xts) #not NULL!
## [1] 1854 1
Sub-setting is convenient in xts
as you can do it with its dates and with a /
for a range of dates.
mpls.temp.xts["2017-01-01 00:53:00"] #temperature for new years
## [,1]
## 2017-01-01 00:53:00 27
mpls.temp.xts["2017-01-01 00:15:00/2017-01-01 03:15:00"] #first few hours of new years
## [,1]
## 2017-01-01 00:53:00 27.0
## 2017-01-01 01:53:00 26.1
## 2017-01-01 02:53:00 26.1
xts
has several tools for converting to different periods. Here we will use to.hourly
. This provides, the first, min, max, and last of the data.
mplstemp.h <- to.hourly(mpls.temp.xts)
mplstemp.h["2017-01-03 08:53:00"]
## mpls.temp.xts.Open mpls.temp.xts.High
## 2017-01-03 08:53:00 15.1 15.1
## mpls.temp.xts.Low mpls.temp.xts.Close
## 2017-01-03 08:53:00 14 14
Here’s the plot of the daily high temperatures. The plot
command with an xts
object provides a TON of features. This makes it fairly easy to customize your plots.
mplstemp.d <- to.daily(mpls.temp.xts)[,2]
plot(mplstemp.d, col = "darkblue", grid.ticks.on = "days", major.ticks = "days", grid.col = "lightgrey", main = "MPLS daily temp JAN-MAR 2017")
You may also want to consider plotting a time series by making a scatterplot and then connecting the lines using base or ggplot
. However, it seems that ggplot
won’t deal with any time series classes, so it is necessary to have your data stored in another form.
library(ggplot2)
ggplot(glacier, aes(glacier$Year, glacier$TOTAL))+ geom_point() + geom_line(colour = "blue")+xlab("Year")+ ylab("Total Rec Visitors")+ggtitle("Glacier Annual Recreational Visitors")
Hopefully, you have the tools you need to form time series objects and their plots. At first, I found it difficult to work with because I did not understand the structure of the objects that I needed to create. Here’s a review of each method and some pros/cons to their use:
yearmon
and yearqtr
. Easy to plot and will work with many different functions in R
. It can only handle strictly regular time series.ts
, but can handle dates of varying formats, such as POSIX
.zoo
, but with the addition of putting your data into matrix form with time as the index. The built-in plot capabilities are quite exhaustive.ggplot
is a required package for some other time series packages, such as ggseas
.In addition, coercion between objects is relatively easy, with commands such as as.xts
. One should be careful if coercing to ts
, since it can only handle strictly regular time series and may add NaNs
. Note that I didn’t change any of the default axis to save space when coding, but some of the plots would have looked better with a different axis scale. I would also recommend that one still save their data in a data frame
or matrix
as it may be needed for additional analysis, such as regression. Also, I showed aggregation, but the aggregate
command is slow for large datasets. One would want to choose a better package such as dplyr
for that. I didn’t set a timezone in xts
, but it’s recommended to set that.
The next topic of study would be the decomposition
function and understanding how it’s created. R’s help,?decompose
is also very informative. Date objects in R can be a pain! I have included some additional resources for both dates and decomposition.
Date Objects +[https://www.r-bloggers.com/date-formats-in-r/] +[https://www.stat.berkeley.edu/~s133/dates.html]
fma
and fpp
package has a lot of strictly regular time series. You could pick one, plot it and try out the ‘decompose’ function.NPS Stats and Visitor Use Statistics [https://irma.nps.gov/Stats/]
Weather Underground [https://www.wunderground.com/]
Package zoo [https://cran.r-project.org/web/packages/zoo/zoo.pdf]
Package xts [https://cran.r-project.org/web/packages/xts/xts.pdf]