Forecasting Principles and Practice

Load required packages

packages <- c("tidyverse", "fpp2", "forecast", "kableExtra", "broom", "ggplot2", "caret", "e1071", "knitr", "GGally", "VIM", "mlbench", "car", "corrplot", "mice", "seasonal", "fma", "latex2exp","gridExtra")
pacman::p_load(char = packages)

2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

  1. Use autoplot() to plot each of these in separate plots.

  2. What is the frequency of each series? Hint: apply the frequency() function.

  3. Use which.max() to spot the outlier in the gold series. Which observation was it?

Answer:

gold, woolyrnq and gas are datasets contained within the forecast package.

The gold dataset contains time series data of daily morning gold prices in US dollars from 1/1/1985 to 3/31/1989.

The woolyrnq dataset includes quarterly woollen yarn production in Australia from 1Q 1965 to 3Q 1994.

The gas dataset consists of time series data of Australian monthly gas production from 1956 to 1995.
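
The descriptions above come from the help pages. As a minimal sketch, assuming the forecast package is installed, the pages can be opened and the documented ranges cross-checked against each series' own time attributes:

```r
library(forecast)

# Open the documentation page for each series
help("gold", package = "forecast")
help("woolyrnq", package = "forecast")
help("gas", package = "forecast")

# tsp() returns the start time, end time and frequency of a ts object
series_info <- rbind(
  gold     = tsp(gold),
  woolyrnq = tsp(woolyrnq),
  gas      = tsp(gas)
)
colnames(series_info) <- c("start", "end", "frequency")
print(series_info)
```

The name series_info is only an illustrative variable; start() and end() give the same information in (period, position) form.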

Ans a)

Plot for gold

autoplot(gold)+
  ggtitle("Daily morning gold prices in US dollars from 1/1/1985 to 3/31/1989")+
  xlab("Days since 1/1/1985")+
  ylab("US dollars")

Plot for woolyrnq

autoplot(woolyrnq) +
  ggtitle("Quarterly woollen yarn production in Australia") +
  xlab("Year") +
  ylab("Tonnes")

Plot for gas

autoplot(gas) +
  ggtitle("Australian monthly gas production") +
  xlab("Year") +
  ylab("Unknown units")

Ans b)

frequency(gold)
## [1] 1

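The frequency of gold is 1. This value does not encode "daily": a frequency of 1 means no seasonal period is set, so the observations are simply indexed 1, 2, 3 and so on. The series is daily according to its help page.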

frequency(woolyrnq)
## [1] 4

Frequency of woolyrnq is 4 i.e. quarterly

frequency(gas)
## [1] 12

Frequency of gas is 12 i.e. monthly
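
frequency() reports the number of observations per seasonal period that was stored when the series was created; it does not measure the sampling interval. A minimal sketch on a made-up series (the values and start dates are illustrative only), using base R alone:

```r
# Made-up values; only the time attributes matter here
x <- 1:24

# No frequency supplied: ts() defaults to 1 observation per period
freq_default <- frequency(ts(x))

# Same values declared as quarterly and as monthly data
freq_quarterly <- frequency(ts(x, start = c(2000, 1), frequency = 4))
freq_monthly   <- frequency(ts(x, start = c(2000, 1), frequency = 12))

print(c(default = freq_default, quarterly = freq_quarterly, monthly = freq_monthly))
```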

Ans c)

which.max(gold)
## [1] 770

Observation 770 holds the maximum of the gold series, which is the outlier. Its value is:

gold[770]
## [1] 593.7
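
To judge whether the maximum really is an outlier, it helps to look at the values around it. A minimal sketch, assuming the forecast package is installed; the window of three observations either side is an arbitrary choice:

```r
library(forecast)

# which.max() skips missing values and returns the index of the largest one
peak <- which.max(gold)
window_idx <- (peak - 3):(peak + 3)

# Values either side of the peak, labelled by observation number
neighbours <- setNames(as.numeric(gold[window_idx]), window_idx)
print(neighbours)

# How far the peak sits above the median of the series (gold contains NAs)
excess <- as.numeric(gold[peak]) - median(gold, na.rm = TRUE)
print(excess)
```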

2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

Answer 2)

  1. Read the data into R with the following script:
tute1 <- read_csv("https://otexts.com/fpp2/extrafiles/tute1.csv")
#View(tute1)
  2. Convert the data to time series
tute_series <- ts(tute1[,-1], start=1981, frequency=4)
  3. Construct time series plots of each of the three series
autoplot(tute_series, facets=TRUE)

Check what happens when you don’t include facets=TRUE

autoplot(tute_series)

  • When we don’t include facets=TRUE, the three series are drawn together in a single panel on a shared vertical axis instead of in separate panels.
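
The effect of facets is easiest to see when the series live on very different scales. A minimal sketch with a made-up two-column quarterly series (the names small and large and all values are illustrative), assuming forecast and ggplot2 are installed:

```r
library(forecast)
library(ggplot2)

set.seed(1)
# Two made-up quarterly series on very different scales
demo <- ts(
  cbind(small = rnorm(40, mean = 10, sd = 1),
        large = rnorm(40, mean = 1000, sd = 50)),
  start = 1981, frequency = 4
)

# One panel: both series share a single vertical axis
p_shared <- autoplot(demo)

# One panel per series
p_facets <- autoplot(demo, facets = TRUE)

print(p_shared)
print(p_facets)
```

In the shared panel the small series is squashed into a near-flat line by the scale of the large one, which is the main reason to prefer facets for series such as Sales, AdBudget and GDP.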

2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

Answer 3)

  1. Read the data into R with the following script:
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
#View(retaildata)
  2. Select one of the time series as follows
new_series <- ts(retaildata[,"A3349873A"],
  frequency=12, start=c(1982,4))
  3. Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
autoplot(new_series)

In the plot above, we see an increasing trend together with a clear seasonal pattern of fixed, known period (12 months) whose amplitude grows as the level of the series rises.

ggseasonplot(new_series)

The seasonal plot shows, for example, that in 2013 the highest monthly sales occur in December, probably because of the sales and offers season, and the lowest in February, probably because that season has closed.

ggsubseriesplot(new_series)

In the seasonal subseries plot above, the horizontal blue lines indicate the mean for each month.

This form of plot enables the underlying seasonal pattern to be seen clearly, and also changes in seasonality over time.

In some cases, this is the most useful way of viewing seasonal changes over time.
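
The per-month means that a subseries plot draws can also be computed directly. A minimal sketch in base R, using the built-in AirPassengers series as a stand-in because the retail file may not be at hand:

```r
# AirPassengers ships with base R and stands in for the retail series here
x <- AirPassengers

# cycle() gives the position within the year (1 = Jan, ..., 12 = Dec)
monthly_means <- tapply(x, cycle(x), mean)
names(monthly_means) <- month.abb

print(round(monthly_means, 1))
```

Replacing x with new_series should give the values behind the blue lines for the retail data, assuming the series was built with frequency=12 as above.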

gglagplot(new_series)

In the lag plot above, each panel shows y_t plotted against y_{t−k} for a different value of k.

Here the colours indicate the month of the variable on the vertical axis. The lines connect points in chronological order. The relationship is strongly positive at lag 12, reflecting the strong seasonality in the data.

ggAcf(new_series)

In the autocorrelation plot above, the autocorrelations at lags 1 and 12 are higher than at the other lags. The lag-12 peak is due to the seasonal pattern in the data: the peaks tend to be one year apart.

The dashed blue lines indicate whether the correlations are significantly different from zero.

Taken together, the plots agree on a trend of increasing sales over time, with seasonal variation that is likely tied to holiday shopping.
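
The autocorrelations and the significance bounds can also be inspected numerically. A minimal sketch in base R on a made-up monthly series with a trend and a 12-month seasonal pattern (all values are illustrative); the bounds are taken as ±qnorm(0.975)/sqrt(T), the usual 95% limits for white noise:

```r
set.seed(42)
n <- 120
t_idx <- seq_len(n)

# Made-up monthly series: linear trend + 12-month seasonal wave + noise
y <- ts(0.5 * t_idx + 20 * sin(2 * pi * t_idx / 12) + rnorm(n), frequency = 12)

# Sample autocorrelations for lags 1..24 (the first element is lag 0, so drop it)
r <- acf(y, lag.max = 24, plot = FALSE)$acf[-1]

# Approximate 95% bounds for white noise
bound <- qnorm(0.975) / sqrt(n)

# Lags whose autocorrelation falls outside the bounds
significant <- which(abs(r) > bound)

print(round(r[c(1, 6, 12)], 2))
print(significant)
```

On this series the lag-12 autocorrelation exceeds the lag-6 one, the same qualitative picture as a seasonal peak in ggAcf().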

2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

a) hsales : monthly sales of new one-family houses sold in the USA since 1973

autoplot(hsales)

ggseasonplot(hsales)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales)

Inference on US Home Sales :

Seasonality, Cyclicality, Trend

  • Seasonality: annual 12 month period, with peak activity in the spring and summer months

  • Cyclicality: 7 to 10 year period corresponding to the economic cycle

  • Trend: none apparent, although a long-term upward trend corresponding to population growth would be expected

  • Strong autocorrelation

What do we learn about the series:

From the seasonal plot, home sales seem to pick up after February and slow down after September, which the other plots confirm.

b) usdeaths : monthly accidental deaths in the USA

autoplot(usdeaths)

ggseasonplot(usdeaths)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths)

Inference on US Deaths :

Seasonality, Cyclicality, Trend

  • Seasonality: annual 12 month period, with peak deaths in summer months and trough in winter months

  • Cyclicality: none apparent

  • Trend: none apparent over this time frame, although a long-term upward trend corresponding to population growth would be expected

  • Strong autocorrelation

What do we learn about the series:

From the seasonal plot, more accidental deaths happen during the summer, peaking in July.

c) bricksq : Australian quarterly clay brick production: 1956-1994

autoplot(bricksq)

ggseasonplot(bricksq)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq)

Inference on Bricks:

Seasonality, Cyclicality, Trend

  • Seasonality: annual period of 4 quarters (the data are quarterly), with production lowest in Q1 and higher in the remaining quarters

  • Cyclicality: 7 to 10 year period corresponding to the economic cycle

  • Trend: upward from the 1950s to the mid-1970s, with no clear trend afterwards

  • Strong autocorrelation

What do we learn about the series:

From the time series plot, there is an upward trend from the 1950s to the mid-1970s, after which the series exhibits a cyclic pattern.

The seasonal plot shows production picking up after Q1 and flattening from around Q2, a pattern that holds in most years.

d) sunspotarea : annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun, 1875-2015

autoplot(sunspotarea)

ggseasonplot(sunspotarea)

Error in ggseasonplot(sunspotarea) : Data are not seasonal

ggsubseriesplot(sunspotarea)

Error in ggsubseriesplot(sunspotarea) : Data are not seasonal

gglagplot(sunspotarea)

ggAcf(sunspotarea)

Inference on Sunspotarea :

Seasonality, Cyclicality, Trend

  • Seasonality: none. The data are annual, so there is no within-year pattern to observe; the approximately 10-year solar period is not fixed exactly and is better considered a cyclical pattern

  • Cyclicality: the approximately 10-year solar cycle is evident; there may also be a longer cyclical pattern over >140 years, but we would need more data to determine this, and it could equally be considered a long-term trend

  • Trend: none apparent; a longer time series would be needed to determine any trend, given the long-term nature of the observations

  • Strong autocorrelation

What do we learn about the series:

From the time series plot, the series shows a cyclic pattern throughout its time period, with a few pronounced peaks, and the data are not seasonal.
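
The "Data are not seasonal" errors above arise because an annual series has one observation per period. A minimal sketch of a guard that checks this before a seasonal plot is attempted; has_seasonal_period is a hypothetical helper, not part of any package, and both series are made up:

```r
# Hypothetical helper: TRUE when a ts has more than one observation per period
has_seasonal_period <- function(x) frequency(x) > 1

# Made-up annual and monthly series
annual  <- ts(c(5, 9, 14, 8, 3, 6, 11, 15, 7, 2), start = 1875)
monthly <- ts(seq_len(36), start = c(2000, 1), frequency = 12)

annual_ok  <- has_seasonal_period(annual)
monthly_ok <- has_seasonal_period(monthly)

print(c(annual = annual_ok, monthly = monthly_ok))
```

With forecast loaded, wrapping the call as if (has_seasonal_period(x)) ggseasonplot(x) would skip the seasonal plots for series such as sunspotarea instead of stopping with an error.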

e) gasoline : US finished motor gasoline product supplied, weekly data beginning 2 February 1991, ending 20 January 2017; units of million barrels per day

autoplot(gasoline)

ggseasonplot(gasoline)

ggsubseriesplot(gasoline)

Error in ggsubseriesplot(gasoline) : Each season requires at least 2 observations. This may be caused from specifying a time-series with non-integer frequency.

gglagplot(gasoline)

ggAcf(gasoline)

Inference on gasoline :

Seasonality, Cyclicality, Trend

  • Seasonality: annual period of roughly 52 weeks, with peaks in the summer months and troughs in the winter months

  • Cyclicality: 7 to 10 year period correlating with economic cycle

  • Trend: upward, corresponding to long-term growth in the economy or population; however, the long-term trend line may have had a shallower slope recently because of alternative fuels (e.g., electric cars)

  • Strong autocorrelation

What do we learn about the series:

From the time series plot, the series trends upward from the 1990s to 2005 and then flattens, with gasoline usage going down a bit. The seasonal pattern shows a marginal uptick during summer and a slight decline during winter.
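
The ggsubseriesplot() error for gasoline mentions a non-integer frequency. A minimal sketch of that situation in base R, assuming (not confirmed here) that weekly data of this kind are stored with 365.25/7 observations per year; the series below is made up:

```r
set.seed(1)

# Made-up weekly series with 365.25 / 7 (about 52.18) observations per year
weekly <- ts(rnorm(520), start = 1991, frequency = 365.25 / 7)

f <- frequency(weekly)

# A whole-number frequency is needed for every "season" to line up across years
is_integer_freq <- abs(f - round(f)) < 1e-8

print(f)
print(is_integer_freq)
```

Running frequency(gasoline) shows whether the real series is stored the same way.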