Exercise 2.1

For each of the following series (from the fma package), make a graph of the data. If transforming seems appropriate, do so and describe the effect.

if (!require("fma")) install.packages("fma")

Hints: data(package="fma") will give a list of the available data. To plot a transformed data set, use plot(BoxCox(x, 0.5)) where x is the name of the data set and 0.5 is the Box-Cox parameter.

library(fma)
find_dataset <- function(string, pkg) {
  sets <- data(package=pkg)$results[, c('Item', 'Title')]
  row <- grep(string, sets[, 'Title'])
  df <- data.frame("Dataset"=sets[row, 'Item'],
    "Description"=sets[row, 'Title'], row.names=NULL)
  return(df)
}
plot_TimeSeries <- function(ts, i=1:12, l=month.abb, x, y, desc) {
  par(mfrow=c(2, 2), oma=c(0, 0, 1, 0))
  plot(ts, ylab=y, xlab=x, main="Time Plot")
  seasonplot(ts, ylab=y, xlab=x, main="Seasonal Plot", col=1:20, pch=19)
  monthplot(ts, ylab=y, xlab=x, xaxt="n", main="Seasonal Subseries Plot")
  axis(1, at=i, labels=l, cex=0.8)
  title(desc, outer=TRUE)
  lambda <- BoxCox.lambda(ts)
  title <- paste0("Box-Cox Transformation (", round(lambda, 3), ")")
  plot(BoxCox(ts, lambda), ylab="", main=title)
  plot(decompose(ts))
}

Exercise 2.1.a

Monthly total of people on unemployed benefits in Australia (January 1956-July 1992).

Approach

First the function named find_dataset() created above searches for datasets in a specified package by keyword. Then the function named plot_TimeSeries() created above takes the dataset, periods, period labels, and data description as inputs. The function then outputs a Time Plot, Seasonal Plot, Seasonal Subseries Plot, Box-Cox transformed Time Plot, and a Decomposition Plot. The Time Plot is a line graph that plots each observed value against the time of the observation, with a single line connecting each observation across the entire period. The Seasonal Plot is also a line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given year across seasons. The Seasonal Subseries Plot is another line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given season across years–plus a line in each season representing the mean for each season. The Box-Cox transformation which is used to normalize data and improve forecasting is implemented using a \(\lambda\) parameter calculated by the BoxCox() function and then plotted using a Time Plot. The Decomposition Plot decomposes and plots the observed values, the underlying trend, seasonality, and randomness of the time series data.

Results

find_dataset("Australia", "fma")
##   Dataset                        Description
## 1    dole Unemployment benefits in Australia
plot_TimeSeries(dole / 1000, 1:12, month.abb, "Month", "Claims (Thousands)",
  "People on Unemployment in Australia (January 1956-July 1992)")

Interpretation

The Time Plot shows what could roughly be described as exponential growth with significant increases every ten years starting in 1975, and a large decrease around 1990. The Seasonal Plot shows the year-over-year increase, but no seasonal pattern. The Seasonal Subseries Plot further negates seasonal fluctuation by showing a mean that is relatively stable across seasons. The Box-Cox transformation computed by the BoxCox() function in R returns a value of \(\lambda=0.3290922\). This value substantiates exponential growth. The exponent calculated is approximately a cube root and applying it to the data makes the data look a bit more linear. The Decomposition Plot shows the data have an upward trend, monthly seasonality, and non-random residuals. It is worth noting that the data were divided by 1,000 to improve y-axis labeling.

Exercise 2.1.b

Monthly total of accidental deaths in the United States (January 1973-December 1978).

Approach

First the function named find_dataset() created above searches for datasets in a specified package by keyword. Then the function named plot_TimeSeries() created above takes the dataset, periods, period labels, and data description as inputs. The function then outputs a Time Plot, Seasonal Plot, Seasonal Subseries Plot, Box-Cox transformed Time Plot, and a Decomposition Plot. The Time Plot is a line graph that plots each observed value against the time of the observation, with a single line connecting each observation across the entire period. The Seasonal Plot is also a line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given year across seasons. The Seasonal Subseries Plot is another line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given season across years–plus a line in each season representing the mean for each season. The Box-Cox transformation which is used to normalize data and improve forecasting is implemented using a \(\lambda\) parameter calculated by the BoxCox() function and then plotted using a Time Plot. The Decomposition Plot decomposes and plots the observed values, the underlying trend, seasonality, and randomness of the time series data.

Results

find_dataset("death", "fma")
##    Dataset                       Description
## 1 ukdeaths Total deaths and serious injuries
## 2 usdeaths          Accidental deaths in USA
plot_TimeSeries(usdeaths, 1:12, month.abb, "Month", "Accidental Deaths",
  "Accidental Deaths in U.S. (January 1973-December 1978)")

Interpretation

The Time Plot shows a slight quadratic trend and a clear pattern of oscillations throughout every year with increases in the summer months. The Seasonal Plot shows a clear seasonal pattern with spikes in July and a large number of accidents across all of 1973 relative to the other years. The Seasonal Subseries Plot shows the seasonal pattern as well, but is a lot better at showing the underlying quadratic trend. The Box-Cox transformation computed by the BoxCox() function in R returns a value of \(\lambda=-0.0336377\). This value is very near zero. The exponent calculated suggests a logarithmic transformation and does very little to change the data. The Decomposition Plot shows the data have a quadratic trend, annual seasonality, and fairly random residuals.

Exercise 2.1.c

Quarterly production of bricks (in millions of units) at Portland, Australia (March 1956-September 1994).

Approach

First the function named find_dataset() created above searches for datasets in a specified package by keyword. Then the function named plot_TimeSeries() created above takes the dataset, periods, period labels, and data description as inputs. The function then outputs a Time Plot, Seasonal Plot, Seasonal Subseries Plot, Box-Cox transformed Time Plot, and a Decomposition Plot. The Time Plot is a line graph that plots each observed value against the time of the observation, with a single line connecting each observation across the entire period. The Seasonal Plot is also a line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given year across seasons. The Seasonal Subseries Plot is another line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given season across years–plus a line in each season representing the mean for each season. The Box-Cox transformation which is used to normalize data and improve forecasting is implemented using a \(\lambda\) parameter calculated by the BoxCox() function and then plotted using a Time Plot. The Decomposition Plot decomposes and plots the observed values, the underlying trend, seasonality, and randomness of the time series data.

Results

find_dataset("brick", "fma")
##   Dataset                     Description
## 1 bricksq Quarterly clay brick production
plot_TimeSeries(bricksq, 1:4, c('Q1','Q2','Q3','Q4'), "Month", "Bricks (Millions)",
  "Bricks Produced in Portland, Australia (March 1956-September 1994)")

Interpretation

The Time Plot shows an upward trend, no clear pattern of seasonality, and two large dips around 1975 and 1985. The Seasonal Plot shows a very slight seasonal pattern with fewer sales in the first quarter of each year. The Seasonal Subseries Plot shows the trend and seasonal pattern as well, but is a little easier to read due to the mean-lines. The Box-Cox transformation computed by the BoxCox() function in R returns a value of \(\lambda=0.2548929\). This value shows some exponential growth. The exponent is approximately a quartic root and does very little to change the data. The Decomposition Plot shows the data have an upward trend, monthly seasonality, and fairly random residuals.

Exercise 2.3

Consider the daily closing IBM stock prices (data set ibmclose).

ibmclose
## Time Series:
## Start = 1 
## End = 369 
## Frequency = 1 
##   [1] 460 457 452 459 462 459 463 479 493 490 492 498 499 497 496 490 489
##  [18] 478 487 491 487 482 479 478 479 477 479 475 479 476 476 478 479 477
##  [35] 476 475 475 473 474 474 474 465 466 467 471 471 467 473 481 488 490
##  [52] 489 489 485 491 492 494 499 498 500 497 494 495 500 504 513 511 514
##  [69] 510 509 515 519 523 519 523 531 547 551 547 541 545 549 545 549 547
##  [86] 543 540 539 532 517 527 540 542 538 541 541 547 553 559 557 557 560
## [103] 571 571 569 575 580 584 585 590 599 603 599 596 585 587 585 581 583
## [120] 592 592 596 596 595 598 598 595 595 592 588 582 576 578 589 585 580
## [137] 579 584 581 581 577 577 578 580 586 583 581 576 571 575 575 573 577
## [154] 582 584 579 572 577 571 560 549 556 557 563 564 567 561 559 553 553
## [171] 553 547 550 544 541 532 525 542 555 558 551 551 552 553 557 557 548
## [188] 547 545 545 539 539 535 537 535 536 537 543 548 546 547 548 549 553
## [205] 553 552 551 550 553 554 551 551 545 547 547 537 539 538 533 525 513
## [222] 510 521 521 521 523 516 511 518 517 520 519 519 519 518 513 499 485
## [239] 454 462 473 482 486 475 459 451 453 446 455 452 457 449 450 435 415
## [256] 398 399 361 383 393 385 360 364 365 370 374 359 335 323 306 333 330
## [273] 336 328 316 320 332 320 333 344 339 350 351 350 345 350 359 375 379
## [290] 376 382 370 365 367 372 373 363 371 369 376 387 387 376 385 385 380
## [307] 373 382 377 376 379 386 387 386 389 394 393 409 411 409 408 393 391
## [324] 388 396 387 383 388 382 384 382 383 383 388 395 392 386 383 377 364
## [341] 369 355 350 353 340 350 349 358 360 360 366 359 356 355 367 357 361
## [358] 355 348 343 330 340 339 331 345 352 346 352 357

Exercise 2.3.a

Produce some plots of the data in order to become familiar with it.

Approach

A Lagged Scatterplot Matrix helps in assessing the extent to which it might be possible to forecast a time series from its past values by showing the correlation between lagged values (autocorrelation) of a time series. The Time Plot is a line graph that plots each observed value against the time of the observation, with a single line connecting each observation across the entire period. The Autocorrelation Function (ACF) graph, also known as a correlogram, plots the autocorrelation coefficients associated with the lagged values of a time series. Randomness exists when over 95% of the values fall inside the boundaries marked by blue dashed lines.

Results

lag.plot(ibmclose, 25)

par(mfrow=c(1,2), oma=c(0,0,1,0))
plot(ibmclose, ylab="Price", xlab="Day", main="Time Plot")
acf(ibmclose, lag.max = 100, type = "correlation", plot = T, main = "Residuals")

Interpretation

The Lagged Scatterplot Matrix extended all the way out to lag 36 shows a significant amount of autocorrelation. This much autocorrelation is generally good for forecasting. The Time Plot shows relatively stable positive and negative momentum between trading days with a large price drop around trading day 250. The ACF shows how prevalent the autocorrelation is in the data. The autocorrelation exists at a decreasing rate all the way out to around lag 90.

Exercise 2.3.b

Split the data into a training set of 300 observations and a test set of 69 observations.

Approach

For iid data, simple random sampling is employed to create a training and test set. For time series data however, SRS will break down the serial dependence. Therefore, a random slice (window) with \(n\) days of data are used as samples instead.

Results

ibm_train <- window(ibmclose, end=300)
ibm_test <- window(ibmclose, start=301)

Interpretation

Duplicating the textbook example, a window of the first \(n=300\) days are used for the training set and a window of the following \(n=69\) days are used for the test set.

Exercise 2.3.c

Try various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?

Approach

The benchmark methods from the text are the Mean, Naïve, Seasonal Naïve, and Random Walk. The Mean Method implemented with meanf() is an iid model forecast used for comparison that does not serve time series well due to the trends and other patterns in time series. The Naive Method implemented with naive() or rwf() is a random walk model forecast which is equivalent to an ARIMA(0,1,0) model. The Seasonal Naive Method implemented with snaive()is similar to the Naïve except that it is more useful for highly seasonal data. The Drift Method implemented with rwf() and a drift parameter is a random walk model with drift forecast which is equivalent to an ARIMA(0,1,0) model with a drift coefficient. The results are implemented and then the prediction results are plotted against the original data for comparison. The metrics in the accuracy() function provide a more objective comparison.

Results

# MAKE FORECASTS
method <- c("Mean Method", "Naive Method", "Seasonal Naive Method", "Drift Method")
ibmfit1 <- meanf(ibm_train, h=69)
ibmfit2 <- naive(ibm_train, h=69) # Equivalent to rwf(ibm_train, h=69)
ibmfit3 <- snaive(ibm_train, h=69)
ibmfit4 <- rwf(ibm_train, h=69, drift=T)
# PLOT PREDICTIONS
plot(ibm_train, main="IBM Price", ylab="", xlab="Day", xlim=c(2, 369))
lines(ibmfit1$mean, col=5)
lines(ibmfit2$mean-0.5, col=4) # nudged to differentiate on plot
lines(ibmfit3$mean+0.5, col=3) # nudged to differentiate on plot
lines(ibmfit4$mean, col=2)
legend("topleft", lty=1, col=5:2, legend=method)
lines(ibmclose)

# EVALUATE RESULTS
data.frame(row.names=method, rbind(
  t(accuracy(ibmfit1, ibm_test)[2, c(2, 3, 5, 6)]),
  t(accuracy(ibmfit2, ibm_test)[2, c(2, 3, 5, 6)]),
  t(accuracy(ibmfit3, ibm_test)[2, c(2, 3, 5, 6)]),
  t(accuracy(ibmfit4, ibm_test)[2, c(2, 3, 5, 6)])))
##                            RMSE       MAE      MAPE      MASE
## Mean Method           132.12557 130.61797 35.478819 25.626492
## Naive Method           20.24810  17.02899  4.668186  3.340989
## Seasonal Naive Method  20.24810  17.02899  4.668186  3.340989
## Drift Method           17.06696  13.97475  3.707888  2.741765

Interpretation

Immediately it can be seen with the plot that the Mean Method is a poor forecasting tool. The Naive Method and the Seasonal Naive Method perform the same on these data because the data do not have a seasonal component. To make both naïve prediction results visible on the plot, they were each nudged \(\pm 0.5\). Both naïve prediction do outperform the simple mean prediction, but the Drift Method produces the best predictions on these data. These results are further evidenced in the metrics which show the Drift Method with the best Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Scaled Error (MASE).

References

https://www.otexts.org/fpp/

http://appliedpredictivemodeling.com/

https://www.rdocumentation.org/packages

https://robjhyndman.com/uwafiles/fpp-notes.pdf

https://web.stanford.edu/~hastie/Papers/ESLII.pdf

https://cran.r-project.org/web/packages/fma/fma.pdf

http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html