Forecasting: Principles and Practice.
Instructions
From the book Forecasting: Principles and Practice by Hyndman, R. & Athanasopoulos, G.
Please submit exercises 2.1, 2.2, 2.3 and 2.6 from the Hyndman online Forecasting book. Please submit your RPubs link and also attach the .rmd file with your code.
Exercises
2.1
Use the help function to explore what the series gold, woolyrnq and gas represent.
First, we need to load the fpp2 package. This automatically loads all the data used in the book.
library(fpp2)
The results below are from R's help() function.
help(gold)
gold: R Documentation
Daily morning gold prices
Description
Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.
Usage
gold
Format
Time series data
Examples
tsdisplay(gold)
The results below are from R's help() function.
help(woolyrnq)
woolyrnq: R Documentation
Quarterly production of woollen yarn in Australia
Description
Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
Usage
woolyrnq
Format
Time series data
Source
Time Series Data Library. http://bit.ly/1BqwTa0
Examples
tsdisplay(woolyrnq)
The results below are from R's help() function.
help(gas)
gas: R Documentation
Australian monthly gas production
Description
Australian monthly gas production: 1956–1995.
Usage
gas
Format
Time series data
Source
Australian Bureau of Statistics.
Examples
plot(gas)
seasonplot(gas)
tsdisplay(gas)
a)
Use autoplot() to plot each of these in separate plots.
autoplot(gold)
autoplot(woolyrnq)
autoplot(gas)
b)
What is the frequency of each series? Hint: apply the frequency() function.
Answer:
The frequency is the number of observations before the seasonal pattern repeats. The usual values are:
Data | frequency |
---|---|
Annual | 1 |
Quarterly | 4 |
Monthly | 12 |
Weekly | 52 |
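frequency(gold) = 1. The gold prices are daily observations, but the series is stored without a seasonal period, so its frequency is 1. This does not mean the data are annual.
frequency(woolyrnq) = 4. That is, the data are quarterly.
frequency(gas) = 12. That is, the data are monthly.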
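The following minimal base-R sketch uses made-up values, not the fpp2 data. It shows that frequency() simply reports the value set when the series was created with ts(). If no frequency is given, ts() defaults to 1. Nothing is inferred from the data.

```r
# Toy series with made-up values, only to show where frequency() comes from
no_season <- ts(c(300, 301, 299, 305))                     # no frequency given, so it defaults to 1
quarterly <- ts(1:8, start = c(1965, 1), frequency = 4)    # four observations per year
monthly   <- ts(1:24, start = c(1956, 1), frequency = 12)  # twelve observations per year

frequency(no_season)  # 1
frequency(quarterly)  # 4
frequency(monthly)    # 12
```

This is consistent with gold reporting 1 even though its observations are daily.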
c)
Use which.max() to spot the outlier in the gold series. Which observation was it?
which.max(gold) = 770. In other words, the outlier is the 770th observation in the series, and its value is $593.70.
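which.max() returns the position of the maximum, not the maximum itself. The value is obtained by indexing the series with that position. The following is a small sketch on made-up prices, not the gold data. According to R's documentation, which.max() discards missing values.

```r
# Made-up prices with one missing value and one obvious outlier
prices <- ts(c(410.2, 412.5, NA, 593.7, 415.0))

idx <- which.max(prices)  # index of the largest value; NA is skipped
idx                       # 4
prices[idx]               # 593.7
```

Applied to gold, the same two steps give the position and the value reported above.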
2.2
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
a)
You can read the data into R with the following script:
tute1 <- read.csv("tute1.csv", header=TRUE)
View(tute1)
b)
Convert the data to time series
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
(The [,-1] removes the first column, which contains the quarters, as we don't need them now.)
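The effect of [,-1] can be checked on a small hypothetical data frame laid out like tute1. The column names are copied from the exercise, but all values are made up. The sketch drops the label column and builds a quarterly multivariate series starting in 1981.

```r
# Hypothetical data frame with the same layout as tute1 (values are made up)
df <- data.frame(
  Quarter  = c("Q1-81", "Q2-81", "Q3-81", "Q4-81"),
  Sales    = c(100, 110, 120, 130),
  AdBudget = c(50, 55, 60, 65),
  GDP      = c(200, 201, 202, 203)
)

# [, -1] keeps every row and drops column 1 (the quarter labels)
mts <- ts(df[, -1], start = 1981, frequency = 4)

colnames(mts)  # "Sales" "AdBudget" "GDP"
start(mts)     # 1981 1
```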
c)
Construct time series plots of each of the three series
autoplot(mytimeseries, facets=TRUE)
Check what happens when you don't include facets=TRUE.
The difference is that facets=TRUE draws each series in its own panel (facet), each with its own vertical scale. Without the option, all three series are drawn in a single panel with a common vertical axis and are distinguished by color. The faceted version makes the pattern of each individual series easier to read. The single panel makes it easier to compare the levels of the series directly, but the lines can overlap.
2.3
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
a)
You can read the data into R with the following script:
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
The second argument (skip=1) is required because the Excel sheet has two header rows.
b)
Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
My selected time series is A3349627V, which represents liquor retailing turnover in New South Wales.
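In the ts() call above, start=c(1982,4) together with frequency=12 means "the 4th month of 1982", i.e. April 1982. The following is a toy check with made-up values, where only the time stamps matter.

```r
# Made-up monthly values; only the time stamps matter here
x <- ts(1:12, frequency = 12, start = c(1982, 4))

start(x)       # 1982 4  (April 1982)
end(x)         # 1983 3  (March 1983)
cycle(x)[1:3]  # 4 5 6   (month number of the first three observations)
```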
c)
Explore your chosen retail time series using the following functions:
autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
autoplot(myts)
Here, there is a clear and increasing trend. There is also a strong seasonal pattern that increases in size as the level of the series increases. The sudden drop at the start of each year, right after the end-of-year peak, needs to be investigated to find its cause. Any forecasts of this series would need to capture the seasonal pattern and the fact that the trend is changing slowly.
ggseasonplot(myts)
A seasonal plot allows the underlying seasonal pattern to be seen more clearly, and is especially useful in identifying years in which the pattern changes. In this case, there is clearly a large jump in sales in December each year. These are probably sales in late December, as customers stock up before the end of the calendar year. The graph also shows that sales were unusually high in November 2011, 2012 and 2013. The data also show a considerable increase in sales in 2013. Overall, the graph shows an increasing trend starting in June 2012.
ggsubseriesplot(myts)
The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.
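The horizontal lines in a subseries plot are monthly means. They can be computed directly by grouping the series by cycle(), which gives the month number of each observation. The sketch below uses a made-up series, not the retail data. Base R's stats::monthplot() draws a similar plot without ggplot2.

```r
# Made-up monthly series: flat months with a December bump, rising by 1 each year
x <- ts(rep(c(rep(100, 11), 150), times = 3) + rep(0:2, each = 12),
        start = c(2011, 1), frequency = 12)

# Average of each calendar month across years: the horizontal lines in a subseries plot
month_means <- tapply(x, cycle(x), mean)
month_means  # months 1 to 11 average 101, month 12 (December) averages 151
```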
gglagplot(myts)
Here the colors indicate the month of the variable on the vertical axis. The lines connect points in chronological order. The relationship is strongly positive at lag 12, reflecting the strong seasonality in the data.
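Each panel of a lag plot pairs every observation with the one k periods earlier. The strength of the relationship in a panel can be summarized by the correlation of those pairs. The sketch below uses a hypothetical helper, lag_cor(), on a made-up seasonal series. It compares lag 12 (same month, previous year) with lag 6 (opposite half of the year). Base R's stats::lag.plot() draws the corresponding scatterplots.

```r
# Made-up monthly series: the same 12-month pattern each year plus a small trend
pattern <- c(5, 3, 4, 4, 5, 4, 5, 6, 5, 6, 8, 12)
x <- ts(rep(pattern, times = 5) + 0.1 * (1:60), start = c(2000, 1), frequency = 12)

# Hypothetical helper: correlation between each value and the value k periods earlier
lag_cor <- function(series, k) {
  n <- length(series)
  cor(series[(k + 1):n], series[1:(n - k)])
}

lag_cor(x, 12)  # essentially 1: each month mirrors the same month a year earlier
lag_cor(x, 6)   # much weaker: opposite halves of the year are being compared
```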
ggAcf(myts)
The autocorrelation values are:
lag | Acf |
---|---|
0 | 1.0000000 |
1 | 0.9013042 |
2 | 0.8635392 |
3 | 0.8577749 |
4 | 0.8422045 |
5 | 0.8266995 |
6 | 0.8134934 |
7 | 0.8153092 |
8 | 0.8174508 |
9 | 0.8190400 |
10 | 0.8081430 |
11 | 0.8331827 |
12 | 0.9010247 |
The correlogram shows that all autocorrelations lie above the blue dashed lines, which indicates that they are significantly different from zero.
The slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.
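The book gives a rule of thumb for the blue dashed lines. For white noise, about 95% of sample autocorrelations should lie within ±2/√T, where T is the length of the series. The sketch below uses the more precise 1.96 multiplier. The helper name acf_bound() is hypothetical, and the autocorrelations in the example are made up.

```r
# Approximate 95% critical value for sample autocorrelations of white noise
acf_bound <- function(n_obs, level = 0.95) {
  qnorm((1 + level) / 2) / sqrt(n_obs)
}

acf_bound(100)  # about 0.196
acf_bound(400)  # about 0.098

# An autocorrelation whose absolute value exceeds the bound is flagged as significant
r <- c(0.90, 0.86, 0.05)  # made-up autocorrelations
abs(r) > acf_bound(400)   # TRUE TRUE FALSE
```

Longer series give narrower bounds. This is why large autocorrelations, like those in the table above, clear the lines easily.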
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in size. So the ACF of a trended time series tends to have positive values that slowly decrease as the lags increase.
When data are seasonal, the autocorrelations will be larger for the seasonal lags (at multiples of the seasonal frequency) than for other lags.
When data are both trended and seasonal, you see a combination of these effects.
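The combined effect can be reproduced on a deterministic toy series, a linear trend plus a 12-month sine wave with no noise. This series is an assumption chosen for illustration, not the retail data. The trend keeps the small-lag autocorrelations high. The seasonality makes lags 12 and 24 stand out above lags 6 and 18, where the seasonal component is in opposite phase.

```r
# Deterministic toy series: a linear trend plus a 12-month sine wave (no noise)
time_index <- 1:120
x <- ts(0.5 * time_index + 20 * sin(2 * pi * time_index / 12), frequency = 12)

# acf()$acf is a lags x 1 x 1 array; element k + 1 holds the lag-k autocorrelation
r <- acf(x, lag.max = 24, plot = FALSE)$acf[, 1, 1]

round(r[c(1, 6, 12, 18, 24) + 1], 2)  # lags 1, 6, 12, 18 and 24
```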
2.6
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
hsales
Description: Monthly sales of new one-family houses sold in the USA since 1973.
autoplot(hsales)
ggseasonplot(hsales)
ggsubseriesplot(hsales)
gglagplot(hsales)
ggAcf(hsales)
a)
Can you spot any seasonality, cyclicity and trend?
From the above graphs, we can identify both increasing and decreasing trends. There is also seasonality, and some cyclicity can be seen, with cycles of about 6 to 10 years.
b)
What do you learn about the series?
From these plots:
Monthly sales of new one-family houses in the USA are lower in later years than in earlier years.
Monthly sales appear to be high in March, April and May of 1987.
December appears to have the lowest number of new one-family houses sold.
Spring appears to have the highest sales, while winter has the lowest, with only a few exceptions.
usdeaths
Description: Monthly accidental deaths in USA from 1973 to 1978.
autoplot(usdeaths)
ggseasonplot(usdeaths)
ggsubseriesplot(usdeaths)
gglagplot(usdeaths)
ggAcf(usdeaths)
a)
Can you spot any seasonality, cyclicity and trend?
From the above graphs, we can identify both increasing and decreasing trends, and strong seasonality. However, the series covers only six years (1973–1978), which is too short to reveal a cycle of 8 to 12 years.
b)
What do you learn about the series?
From these plots:
Monthly accidental deaths in the USA seem to trend upward in the later years.
Monthly accidental deaths tend to be highest in July, with the highest value in July 1973.
February appears to have the lowest number of monthly accidental deaths (it is also the shortest month of the year).
Summer appears to have the highest numbers of monthly accidental deaths, while winter has the lowest.
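Because February is the shortest month, its low total may partly reflect having fewer days. One way to check is to compare counts per day instead of per month. The sketch below computes days per month from calendar dates and uses made-up counts, not the usdeaths data. The forecast package also provides monthdays() for the day counts.

```r
# Number of days in each month of 1973 (gaps between consecutive month starts)
month_starts <- seq(as.Date("1973-01-01"), by = "month", length.out = 13)
days <- as.numeric(diff(month_starts))
days  # 31 28 31 30 31 30 31 31 30 31 30 31

# Hypothetical monthly counts (made-up numbers, not the usdeaths data)
counts  <- c(9000, 8300, 9300, 9100, 9800, 10000, 11000, 10200, 9300, 9500, 9000, 9400)
per_day <- counts / days

which.min(counts)   # 2: February has the lowest monthly total
which.min(per_day)  # 1: per day, January is lower than February here
```

In this made-up example the lowest month changes once the month length is taken into account. The same check on the real series would show whether February's low total is mainly a calendar effect.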
bricksq
Description: Australian quarterly clay brick production: 1956-1994.
autoplot(bricksq)
ggseasonplot(bricksq)
ggsubseriesplot(bricksq)
gglagplot(bricksq)
ggAcf(bricksq)
a)
Can you spot any seasonality, cyclicity and trend?
From the above graphs, we can identify both increasing and decreasing trends. There is also seasonality, and some cyclicity can be seen, with a cycle of about 4 to 5 years.
b)
What do you learn about the series?
From these plots:
Australian quarterly clay brick production shows a “volatile” trend.
Production tends to be lowest in the first quarter and higher in the other three quarters.
Production appears to be lower in later years than in earlier years.
The fourth quarter of 1981 shows a very deep drop in production compared with other years.
sunspotarea
Description: Annual average sunspot area (1875-2015).
Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.
autoplot(sunspotarea)
ggseasonplot(sunspotarea)
In this case the data are not seasonal: the series is annual (frequency 1), so there is no seasonal period to plot.
ggsubseriesplot(sunspotarea)
As above, the data are not seasonal: with annual data (frequency 1) there are no seasons to compare.
gglagplot(sunspotarea)
ggAcf(sunspotarea)
a)
Can you spot any seasonality, cyclicity and trend?
From the above graphs, we can identify repeated rises and falls in the level. There is no apparent seasonality (the data are annual). There is clear cyclicity, with a cycle of roughly 11 years, consistent with the well-known solar cycle.
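One rough way to estimate a cycle length from data is to find the lag of the largest ACF value after the first few lags. This heuristic is our assumption, not a method from the exercise, and it gives only a rough guide because real cycles vary in length. The sketch below applies it to a perfectly regular 11-year sine wave. It does not use the sunspotarea data.

```r
# Toy annual series: a perfectly regular 11-year sine wave (real cycles are irregular)
x <- ts(sin(2 * pi * (1:200) / 11), start = 1800)

r    <- acf(x, lag.max = 20, plot = FALSE)$acf[, 1, 1]  # r[k + 1] is the lag-k value
lags <- 0:20

# Skip lags 0 and 1, which are high for any smooth series, then take the largest ACF value
candidates <- lags[lags >= 2]
peak_lag   <- candidates[which.max(r[candidates + 1])]
peak_lag   # 11
```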
b)
What do you learn about the series?
From these plots:
The annual average sunspot area repeatedly rises and falls.
There is a sharp peak in the annual average sunspot area in 1957, with a downward trend afterwards.
The annual average sunspot area appears to be lower in later years than in earlier years.
gasoline
Description: US finished motor gasoline product supplied.
Weekly data beginning 2 February 1991, ending 20 January 2017. Units are “million barrels per day”.
autoplot(gasoline)
ggseasonplot(gasoline)
In this particular case, it is noted that the data is not seasonal.
ggsubseriesplot(gasoline)
In this particular case, it is noted that the data is not seasonal.
gglagplot(gasoline)
ggAcf(gasoline)
a)
Can you spot any seasonality, cyclicity and trend?
From the above graphs, we can identify both increasing and decreasing trends. There is no apparent seasonality, and some cyclicity can be seen, with a cycle of roughly 8 to 10 years.
b)
What do you learn about the series?
From these plots:
US finished motor gasoline product supplied shows both upward and downward movements.
The amount supplied decreases in several years.
The amount supplied appears to have trended downward in earlier years and upward in later years.
References
Hyndman, R. & Athanasopoulos, G. 2019. Forecasting: Principles and Practice. Australia: Monash University. https://otexts.com/fpp2/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.