For self-study

Check out the Little Book of R for Time Series at http://bit.ly/1bqP1tr . This resource will cover much more than we are covering this time. So, you should find it to be more than helpful for this, and the ARIMA sections of the course.

Also, here are a few potential data sets that you may want to try out on your own:
Age of Death of Successive Kings of England

kings <- scan("http://robjhyndman.com/tsdldata/misc/kings.dat",skip=3)
kings.ts <- ts(kings)

Number of births per month in New York city, from January 1946 to December 1959

births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
births.ts <- ts(births, frequency=12, start=c(1946,1))

Monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, for January 1987-December 1993

  souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
  souv.ts <- ts(souvenir, frequency=12, start=c(1987,1))

All of the above data sets come from: http://bit.ly/ZA6QZp That page is also a good source of information in plotting and analyzing time series data. Enjoy

Some Simple Time Series Procedures

Getting started

For today’s practicum, start out by importing the time series data.

The data can be found on the course website. Look in Documents, Data, and Class Materials for: “US Monthly Industrial Production 1947-1993.csv”.

Also install the “TTR” package:

install.packages("TTR", dependencies=TRUE)

Now, read in the data and take a look at it.

ip <- read.csv(file.choose(), header=TRUE)
head(ip)
ip <- read.csv("US Monthly Industrial Production 1947-1993.csv", header=TRUE)
head(ip)
##   year month IndProd manufacturing manufDur manufNondur mining utilities
## 1   47     1    21.4          20.4     19.1        21.6   54.8        10
## 2   47     2    21.5          20.4     19.4        21.5   55.4      10.1
## 3   47     3    21.7          20.5     19.6        21.5   56.4      10.2
## 4   47     4    21.5          20.6     19.8        21.3   51.7      10.4
## 5   47     5    21.6          20.4     19.8        21.1   55.9      10.6
## 6   47     6    21.6          20.4     19.9        21.0   55.5      10.6
##   products materials
## 1     21.0      21.7
## 2     21.0      21.8
## 3     21.1      22.6
## 4     21.1      22.0
## 5     21.2      22.1
## 6     21.1      21.9

Select out a few variables to convert to time series data

Notice that the first two columns in the data are “year” and “month.” This is how we will identify the data as time series. For each of the variables we are inerested in investigating, we’ll have to tell R that they are time series data. To do so, we’ll use the ts function to convert the variable to time series data. The function needs to know:

  • which variable to convert
  • the frequency (number of observations per unit of time)
  • the start date for the series

We are using monthly data, so we’ll be using a frequency of 12. Also, the data go back as far as 1947, so we’ll start there.

For more options use the help section for ts: ?ts.

manu <- ts(ip[,4], frequency=12, start=c(1947, 1))
dura <- ts(ip[,5], frequency=12, start=c(1947, 1))
nondu <- ts(ip[,6], frequency=12, start=c(1947, 1))
mining <- ts(ip[,7], frequency=12, start=c(1947, 1))
util <- ts(ip[,8], frequency=12, start=c(1947, 1))
prods <- ts(ip[,9], frequency=12, start=c(1947, 1))

Take a look at the time series variable

Below, are a series of functions to evaluate the variables without having to plot them. Use these to assess the data.

class(manu)
start(manu)  # First observation
end(manu)  # Last observation
frequency(manu) # Tells frequency (If you are using someone else's data.)
summary(manu) # Descriptive measuress
cycle(manu)  # Use this to check that the data imported correctly
aggregate(manu, FUN=sum) # Sum the manufacturing values by year
aggregate(manu, FUN=mean) # Give annual average monthly manufacturing values
boxplot(manu~cycle(manu)) # Use boxplots to examine which years may have greater variance than others


Plotting the time series

Start with manufacturing output. This is still using base R functions. In this case, use plot.ts() to plot the time series.

plot.ts(manu, col="orange", ylab="Manufacturing") 

You may also plot the aggregate annual measures if you like. This will produce a less nuanced plot. But, it may be necessary if you are comparing the series to another annual series. You’ll also notice that the scale on the y-axis is much greater, since you are dealing with aggregate measures.

plot(aggregate(manu),  col="orange", ylab="Manufacturing (annual measures)")


Interruptions and interventions

We are a policy school, so we should add a policy interruption. For this example, I’ll just be arbitrary and add a vertical line at 1981 (when Regan was elected President). To do this, we use the abline() function. v= tells the function to add a vertical line, and h= tells it to add a horizontal line. If, for example, you wanted to instead add a horizontal line at 60, the command would be abline(h=60).

plot.ts(manu, col="orange", ylab="Manufacturing") 
abline(v=1981)

You may add more if necessary. (This is an optional - and fancy - way to do the same thing.)

plot.ts(manu, col="orange", ylab="Manufacturing") 
abline(v=c(1953, 1961, 1963, 1969, 1974, 1977, 1981, 1989), 
    col=c("red", "blue", "blue", "red", "red", "blue", "red", "red"))
text(c(1953, 1961, 1963, 1969, 1974, 1977, 1981, 1989),
    c(95, 90, 95, 90, 95, 90, 95, 90),
    labels=c("Eisenhower","Kennedy", "Johnson", "Nixon", "Ford", "Carter", "Reagan", 
    "Bush"), cex=.8,
    col=c("red", "blue", "blue", "red", "red", "blue", "red", "red"))

The text function adds text to a plot window and places it wherever you want it. The reference points correspond with the points on the x (time) and y (manufacturing levels) axes. For example, the first text element (Eisenhower), is located at 1953 on the x axis, and I placed it at 95 on the y axis (mostly to keep the chart readable). The colors (col=) can also be manipulated fairly easily. Here, they are coded according to political party.


Now add an comparison time series
par(new=TRUE) # Lets you add more stuff to the plot

plot.ts(mining, col="brown",  xlab='', ylab='', axes=FALSE) 

text(c(1950, 1960), c(70, 50), labels=c("Mining", "Manufacturing"), 
    col=c("brown", "orange"))

par(new=FALSE) # After this point, new plots will be on a clean surface.

That should give you an idea of what is possible for graphing time series.

Some Smoothing and Trend Removal Options

Now move on to diagnosing the components of a time series

First, we can check to see if the data are stationary using acf() and pacf().

To decompose the processes at work in your time series, try the decompose() function. (Note: this is only useful for data seasonal data.)

decomp <- decompose(manu)


First, take a look at what this produces:
ls(decomp)
## [1] "figure"   "random"   "seasonal" "trend"    "type"     "x"

To get a look at what this produced:

plot(decomp)

If you like, you may remove one or more of the components above from the variable. The example below removes the random noise. In the example, I named the new version of the manufacturing time series “quiet.manu”. You may name it whatever you like.

Plot it to get a look at what this produces.

quiet.manu <- manu - decomp$random
plot.ts(quiet.manu)

Or, better yet, compare the two versions of the manufacturing series in a plot to see the differences.

quiet.manu <- manu - decomp$random
plot.ts(quiet.manu)
par(new=TRUE) # Lets you add more stuff to the plot
plot.ts(manu)
par(new=FALSE)


Options for quieting “noise”

You can still “smooth” the plotted data. This is a widely accepted method for quieting some of the noise in the graph and help expose more salient trends in the series. Here, we will smooth the line using the moving averages method. The SMA function in the “TTR” package will average observations over the past n observations (periods).

library(TTR)
## Warning: package 'TTR' was built under R version 3.4.3
manuMA3 <- SMA(manu, n=3) # calculate the moving average over 3 preceding periods
plot.ts(manuMA3)

Another approach is to use exponential smoothing.

Exponential smoothing is essentially the same as the moving average method above, but uses a weighted moving average, where the weights decrease as they get further from the timepoint of interest. Here, we will use the HoltWinters function in the R base package to smooth the series. The “beta” command controls exponential smoothing (if beta=FALSE, then the data will be smoothed. The gamma= command controls seasonality in your data. (If gamma=FALSE, then the data will be treated as non-seasonal.)

For more details: ?HoltWinters

manuEXS <- HoltWinters(manu, beta=FALSE, gamma=TRUE)
manuEXS
plot.ts(manuEXS$fitted)
# or 
plot(manuEXS)