Forecasting: Principles and Practice.

Instructions

From the book Forecasting: Principles and Practice by Hyndman, R. & Athanasopoulos, G.

Please submit exercises 2.1, 2.2, 2.3 and 2.6 from the Hyndman online Forecasting book. Please submit your Rpubs link and attach the .Rmd file with your code.

Exercises

2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

First, we need to load the fpp2 package. This automatically loads all the data sets used in the book.

library(fpp2)

The results below come from the help() function in R.

help(gold)

gold R Documentation

Daily morning gold prices

Description

Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.

Usage

gold

Format

Time series data

Examples

tsdisplay(gold)


The results below come from the help() function in R.

help(woolyrnq)

woolyrnq R Documentation

Quarterly production of woollen yarn in Australia

Description

Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.

Usage

woolyrnq

Format

Time series data

Source

Time Series Data Library. http://bit.ly/1BqwTa0

Examples

tsdisplay(woolyrnq)


The results below come from the help() function in R.

help(gas)

gas R Documentation

Australian monthly gas production

Description

Australian monthly gas production: 1956–1995.

Usage

gas

Format

Time series data

Source

Australian Bureau of Statistics.

Examples

plot(gas)
seasonplot(gas)
tsdisplay(gas)

a)

Use autoplot() to plot each of these in separate plots.

autoplot(gold)

autoplot(woolyrnq)

autoplot(gas)

b)

What is the frequency of each series? Hint: apply the frequency() function.

Answer:

The “frequency” is the number of observations before the seasonal pattern repeats. Typical values are:

Data        Frequency
Annual      1
Quarterly   4
Monthly     12
Weekly      52

  • frequency(gold) = 1. The series is daily, but it is stored without a seasonal period, so R reports a frequency of 1, the same value used for annual data.

  • frequency(woolyrnq) = 4. That is, the series is quarterly.

  • frequency(gas) = 12. That is, the series is monthly.
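As a minimal sketch of how frequency() behaves, the following uses base R only, so it runs without fpp2. The three series and their values are hypothetical stand-ins, not the book's data. It illustrates that a ts object created without a frequency argument reports 1, whatever the real sampling interval is.

```r
# Hypothetical stand-in series built with base R; all values are made up.
no_period_ts <- ts(c(10, 12, 11, 13, 12, 14))                 # no frequency given
quarterly_ts <- ts(1:8,  start = c(1965, 1), frequency = 4)   # 4 observations per year
monthly_ts   <- ts(1:24, start = c(1956, 1), frequency = 12)  # 12 observations per year

# Collect the stored seasonal periods in one named vector
freqs <- c(no_period = frequency(no_period_ts),
           quarterly = frequency(quarterly_ts),
           monthly   = frequency(monthly_ts))
print(freqs)
```

The value returned is whatever was stored when the series was created, so it describes the declared seasonal period rather than the sampling interval itself.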

c)

Use which.max() to spot the outlier in the gold series. Which observation was it?

  • which.max(gold) = 770. In other words, the maximum is the 770th observation in the series, with a value of $593.70.
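The same lookup can be sketched on a small made-up series. The name prices and its values are hypothetical. The sketch shows how which.max() returns a position, which can then be used to read both the value and its time index.

```r
# Hypothetical series with one obvious spike at position 5; values are made up.
prices <- ts(c(300, 310, 305, 320, 590, 315, 308), start = 1)

idx  <- which.max(prices)    # position of the largest observation
peak <- prices[idx]          # the value at that position
when <- time(prices)[idx]    # the time index attached to that position

cat("position:", idx, " value:", peak, " time:", when, "\n")
```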

2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a)

You can read the data into R with the following script:

tute1 <- read.csv("tute1.csv", header=TRUE)
View(tute1)

b)

Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)

(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
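For readers without tute1.csv to hand, the conversion can be sketched on a small data frame of the same shape. The data frame demo and all of its values are hypothetical. The label column is dropped with [, -1] and the remaining columns become one multivariate quarterly series.

```r
# Hypothetical data frame shaped like tute1.csv: one label column, three numeric columns.
demo <- data.frame(
  Quarter  = c("Q1-81", "Q2-81", "Q3-81", "Q4-81", "Q1-82", "Q2-82"),
  Sales    = c(100, 90, 80, 110, 105, 95),
  AdBudget = c(60, 55, 50, 65, 62, 58),
  GDP      = c(250, 252, 255, 257, 260, 262)
)

# Drop the label column and declare a quarterly series starting in 1981 Q1
demo_ts <- ts(demo[, -1], start = c(1981, 1), frequency = 4)

print(colnames(demo_ts))
print(window(demo_ts, start = c(1982, 1)))   # only the 1982 quarters
```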

c)

Construct time series plots of each of the three series

autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facets=TRUE.

The difference is that the facets option defines faceting groups: with facets=TRUE, each series is drawn in its own panel with its own vertical scale. Without the option, all the series are drawn in a single panel and share one vertical axis, so series with very different magnitudes are harder to read.

2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

a)

You can read the data into R with the following script:

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

The second argument (skip=1) is required because the Excel sheet has two header rows.
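The effect of skipping a header row can be illustrated without the Excel file. The sketch below uses base R's read.csv() on an in-memory string rather than readxl. The two header rows, the series ID A0000001X and the values are all hypothetical.

```r
# Hypothetical two-header layout: a description row, then a series-ID row, then data.
raw_text <- paste(
  "Series description,Turnover ; Total",
  "Series ID,A0000001X",
  "Apr-1982,303.1",
  "May-1982,297.8",
  sep = "\n"
)

# skip = 1 drops the description row, so the series-ID row becomes the header
demo_retail <- read.csv(text = raw_text, skip = 1, header = TRUE)
print(names(demo_retail))
print(demo_retail)
```

Without skip = 1, the description row would be used for the column names and the series IDs would be read as the first data row.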

b)

Select one of the time series as follows (but replace the column name with your own chosen column):

myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))

My selected time series is A3349627V, which represents the turnover of liquor retailing in New South Wales.

c)

Explore your chosen retail time series using the following functions:

autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

autoplot(myts)

Here, there is a clear and increasing trend. There is also a strong seasonal pattern that increases in size as the level of the series increases. The sudden drop at the start of each year needs to be investigated to find out what causes this effect around the end of the calendar year. Any forecasts of this series would need to capture the seasonal pattern, and the fact that the trend is changing slowly.

ggseasonplot(myts)

A seasonal plot allows the underlying seasonal pattern to be seen more clearly, and is especially useful in identifying years in which the pattern changes. In this case, it is clear that there is a large jump in sales in December each year. Actually, these are probably sales in late December as customers stockpile before the end of the calendar year. The graph also shows that there was an unusually high number of sales in November 2011, 2012 and 2013. The data also show a considerable increase in sales for 2013. Overall, the graph also shows an increasing trend starting in June 2012.

ggsubseriesplot(myts)

The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.
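The monthly means that a subseries plot draws as horizontal lines can be computed directly. The following is a rough sketch in base R on a made-up monthly series with a December bump. The series sales and its values are hypothetical, not the retail data.

```r
# Hypothetical monthly series: a level that rises each year plus a December bump.
base_level <- rep(c(100, 110, 120), each = 12)       # three years, rising level
dec_bump   <- rep(c(rep(0, 11), 50), times = 3)      # +50 every December
sales      <- ts(base_level + dec_bump, start = c(2011, 1), frequency = 12)

# cycle() gives the month position (1..12) of each observation;
# averaging within each position gives one mean per month.
monthly_means <- tapply(as.numeric(sales), as.integer(cycle(sales)), mean)
names(monthly_means) <- month.abb
print(monthly_means)
```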

gglagplot(myts)

Here the colors indicate the month of the variable on the vertical axis. The lines connect points in chronological order. The relationship is strongly positive at lag 12, reflecting the strong seasonality in the data.
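Each panel of a lag plot is a scatter of the series against itself shifted by some number of periods. As a small sketch, the helper lag_cor() below (a hypothetical name) computes the correlation of those pairs in base R. The series is made up and purely periodic, so the lag 12 relationship is perfect by construction.

```r
# Hypothetical helper: correlation between x[t] and x[t + k]
lag_cor <- function(x, k) {
  x <- as.numeric(x)
  cor(head(x, -k), tail(x, -k))   # drop the last k and the first k values
}

# Made-up monthly pattern that repeats exactly every 12 observations
pattern  <- c(5, 9, 10, 9, 5, 0, -5, -9, -10, -9, -5, 0) + 20
seasonal <- ts(rep(pattern, times = 4), frequency = 12)

cor_lag6  <- lag_cor(seasonal, 6)    # half a cycle apart
cor_lag12 <- lag_cor(seasonal, 12)   # a full cycle apart
cat("lag 6:", cor_lag6, " lag 12:", cor_lag12, "\n")
```

Real data will not give such extreme values, but the contrast between the seasonal lag and the half-cycle lag is the pattern to look for.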

ggAcf(myts)

The autocorrelation values are as follows.

lag Acf
0 1.0000000
1 0.9013042
2 0.8635392
3 0.8577749
4 0.8422045
5 0.8266995
6 0.8134934
7 0.8153092
8 0.8174508
9 0.8190400
10 0.8081430
11 0.8331827
12 0.9010247

Looking at the correlogram, all the autocorrelations lie above the blue lines, which indicates that they are significantly different from zero.

The slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in size. So the ACF of a trended time series tends to have positive values that slowly decrease as the lags increase.

When data are seasonal, the autocorrelations will be larger for the seasonal lags (at multiples of the seasonal frequency) than for other lags.

When data are both trended and seasonal, you see a combination of these effects.
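The combined effect can be sketched with base R's acf() on a synthetic series. The series below is hypothetical: a linear trend plus a 12-period sine wave, with coefficients chosen only for illustration. With plot = FALSE the function returns the autocorrelations instead of drawing them.

```r
# Hypothetical series: linear trend plus a 12-period seasonal wave (values are made up)
n        <- 120
t_idx    <- seq_len(n)
trended  <- 0.5 * t_idx
seasonal <- 10 * sin(2 * pi * t_idx / 12)
x        <- ts(trended + seasonal, frequency = 12)

# Autocorrelations for lags 0..24; element 1 is lag 0, element k + 1 is lag k
r <- as.numeric(acf(x, lag.max = 24, plot = FALSE)$acf)

cat("lag 1: ", round(r[2], 3),  "\n")
cat("lag 6: ", round(r[7], 3),  "\n")
cat("lag 12:", round(r[13], 3), "\n")
```

In this construction the lag 12 value sits above the lag 6 value because of the seasonal component, while the trend keeps the early lags large and positive.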

2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

hsales

Description: Monthly sales of new one-family houses sold in the USA since 1973.

autoplot(hsales)

ggseasonplot(hsales)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales)

a)

Can you spot any seasonality, cyclicity and trend?

From the above graphs, we can identify increasing and decreasing trends. Seasonality is also present, and some cyclicity can be seen, with a cycle of about 6 to 10 years.

b)

What do you learn about the series?

The plots suggest the following:

  • The monthly sales of new one-family houses sold in the USA in later years have decreased compared to prior years.

  • There seems to be a high number of monthly sales of new one-family houses sold in the months of March, April and May of 1987.

  • The month of December seems to represent the lowest number of new one-family houses sold.

  • Spring seems to have the highest numbers of new one-family houses sold; while winter seems to represent the lowest with only a few exceptions.


usdeaths

Description: Monthly accidental deaths in USA from 1973 to 1978.

autoplot(usdeaths)

ggseasonplot(usdeaths)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths)

a)

Can you spot any seasonality, cyclicity and trend?

From the above graphs, we can identify increasing and decreasing trends, and a seasonal pattern is clearly present. Because the series covers only six years (1973 to 1978), a multi-year cycle cannot be reliably identified from these data.

b)

What do you learn about the series?

The plots suggest the following:

  • The monthly accidental deaths in the USA seem to have an upward trend in later years.

  • There seems to be a high number of monthly accidental deaths in the month of July, with the highest in July of 1973.

  • The month of February seems to have the lowest number of monthly accidental deaths (it is also the shortest month of the year).

  • Summer seems to have the highest numbers of monthly accidental deaths, while winter seems to have the lowest.


bricksq

Description: Australian quarterly clay brick production: 1956-1994.

autoplot(bricksq)

ggseasonplot(bricksq)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq)

a)

Can you spot any seasonality, cyclicity and trend?

From the above graphs, we can identify increasing and decreasing trends. Seasonality is also present, and some cyclicity can be seen, with a cycle of about 4 to 5 years.

b)

What do you learn about the series?

The plots suggest the following:

  • Australian quarterly clay brick production seems to have a “volatile” trend.

  • Production seems to be low in the first quarter and higher in the other three quarters.

  • Production seems to be lower in later years than in earlier years.

  • The fourth quarter of 1981 seems to show a very deep drop in production compared to other years.


sunspotarea

Description: Annual average sunspot area (1875-2015).

Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.

autoplot(sunspotarea)

ggseasonplot(sunspotarea)

In this case, the data are annual and therefore not seasonal.

ggsubseriesplot(sunspotarea)

In this case, the data are annual and therefore not seasonal.
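Whether a series has a declared seasonal period can be checked before calling the seasonal plotting functions. The following is a minimal sketch in base R. The helper has_seasonal_period() and the series values are hypothetical.

```r
# Hypothetical helper: TRUE when the series declares more than one observation per period
has_seasonal_period <- function(x) frequency(x) > 1

# Made-up annual series (frequency 1) and made-up monthly series (frequency 12)
annual_like  <- ts(c(5, 9, 14, 9, 4, 6, 11), start = 1875)
monthly_like <- ts(1:24, start = c(1973, 1), frequency = 12)

annual_ok  <- has_seasonal_period(annual_like)
monthly_ok <- has_seasonal_period(monthly_like)
cat("annual:", annual_ok, " monthly:", monthly_ok, "\n")
```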

gglagplot(sunspotarea)

ggAcf(sunspotarea)

a)

Can you spot any seasonality, cyclicity and trend?

From the above graphs, we can identify increasing and decreasing trends. There is no apparent seasonality, but some cyclicity can be seen, with a cycle of about 8 to 10 years.

b)

What do you learn about the series?

The plots suggest the following:

  • The annual average sunspot area seems to alternate between upward and downward trends.

  • There seems to be a sharp increase in the annual average sunspot area in 1957, with a downward trend afterwards.

  • The annual average sunspot area seems to be lower in later years than in earlier years.


gasoline

Description: US finished motor gasoline product supplied.

Weekly data beginning 2 February 1991, ending 20 January 2017. Units are “million barrels per day”.

autoplot(gasoline)

ggseasonplot(gasoline)

In this case, the data are not seasonal.

ggsubseriesplot(gasoline)

In this case, the data are not seasonal.

gglagplot(gasoline)

ggAcf(gasoline)

a)

Can you spot any seasonality, cyclicity and trend?

From the above graphs, we can identify increasing and decreasing trends. There is no apparent seasonality, but some cyclicity can be seen, with a cycle of about 8 to 10 years.

b)

What do you learn about the series?

The plots suggest the following:

  • The US finished motor gasoline product supplied seems to alternate between upward and downward trends.

  • The product supplied seems to decrease in various years.

  • The product supplied seems to have had a downward trend in earlier years and an upward trend in later years.

References

Hyndman, R. & Athanasopoulos, G. 2019. Forecasting: Principles and Practice. Australia: Monash University. https://otexts.com/fpp2/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.