library(fpp2)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------------------------------ fpp2 2.4 --
## v ggplot2   3.1.0     v fma       2.4  
## v forecast  8.12      v expsmooth 2.3
## 

(2.1)

Use the help function to explore what the series gold, woolyrnq and gas represent.

Daily morning gold prices (US Dollars) 1 January 1985 - 31 March 1989.

#help(gold)
tsdisplay(gold)

Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 - Sep 1994.

#help(woolyrnq)
tsdisplay(woolyrnq)

Australian monthly gas production 1956-1995

#help(gas)
tsdisplay(gas)

Use autoplot() to plot each of these in separate plots.

autoplot(gold) +
  ggtitle("Daily morning gold prices") +
  xlab("Day") +
  ylab("US Dollars")

autoplot(woolyrnq) + 
  ggtitle("Quarterly production of woollen yarn in Australia") +
  xlab("Mar 1965 - Sep 1994") +
  ylab("Tonnes")

autoplot(gas)  + 
  ggtitle("Australian monthly gas production") +
  xlab("1956-1995") +
  ylab("")

What is the frequency of each series? Hint: apply the frequency() function.

For time series gold, we know that the data are daily. The frequency is 1, which means no seasonal period has been set for this series: each observation is treated as its own period.

frequency(gold)
## [1] 1

For time series woolyrnq, we know that the data are quarterly. The frequency is 4, which means 4 observations (quarters) per year.

frequency(woolyrnq)
## [1] 4

For time series gas, we know that the data are monthly. The frequency is 12, which means 12 observations (months) per year.

frequency(gas)
## [1] 12
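
To check my reading of frequency() as "observations per seasonal period", here is a small sketch on made-up series. The names daily_like, quarterly_like and monthly_like are my own stand-ins, not the fpp2 data, and I am assuming that a series created without a frequency argument behaves like gold does above.

```r
# Synthetic stand-ins (NOT the fpp2 data); only base R's stats functions are used.
set.seed(1)
daily_like     <- ts(rnorm(100))                                     # no frequency given, so it defaults to 1
quarterly_like <- ts(rnorm(40), start = c(1965, 1), frequency = 4)   # 4 observations per year
monthly_like   <- ts(rnorm(48), start = c(1956, 1), frequency = 12)  # 12 observations per year

freqs <- c(daily     = frequency(daily_like),
           quarterly = frequency(quarterly_like),
           monthly   = frequency(monthly_like))
print(freqs)

# cycle() labels each observation with its position inside the seasonal period,
# so for quarterly data it repeats 1, 2, 3, 4.
first_two_years <- as.integer(cycle(quarterly_like))[1:8]
print(first_two_years)
```

So a frequency of 1 does not say "daily"; it only says that no seasonal period was declared when the series was built.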

Use which.max() to spot the outlier in the gold series. Which observation was it?

which.max() returns the position of the largest value, so the outlier is observation 770 (day 770 of the series).

which.max(gold)
## [1] 770

The morning gold price on day 770 is $593.70.

gold[770]
## [1] 593.7
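
A minimal sketch of the same lookup on a made-up series (prices is hypothetical, not the gold data). It also pulls the time stamp with time(), which I think is a handy companion to which.max() when the index and the time axis differ.

```r
# Made-up prices with one obvious spike; NOT the gold series.
prices <- ts(c(310, 325, 318, 594, 330, 321), start = 1)

i          <- which.max(prices)   # position of the (first) maximum
peak_value <- prices[i]           # value at that position
peak_time  <- time(prices)[i]     # time stamp of that position

print(c(index = i, value = peak_value, time = peak_time))
```

Note that which.max() only reports the first maximum, so ties would need a different check, for example which(prices == max(prices)).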

(2.2)

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- read.csv("tute1.csv", header=TRUE)
View(tute1)

Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
#(The [,-1] removes the first column which contains the quarters as we don't need them now.)
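
To see what the ts() call does without needing tute1.csv, here is a sketch on a tiny made-up data frame. The values and the name demo are invented; only the column layout (a quarter label followed by Sales, AdBudget and GDP) follows the description above.

```r
# Invented numbers laid out like tute1.csv: a label column plus three series.
quarters <- paste0(rep(1981:1982, each = 4), "-Q", 1:4)
demo <- data.frame(
  Quarter  = quarters,
  Sales    = c(1020, 889, 795, 1003, 1057, 944, 778, 932),
  AdBudget = c(659, 589, 512, 614, 647, 602, 530, 608),
  GDP      = c(251, 290, 290, 292, 279, 254, 295, 271)
)

# Drop the label column and declare quarterly data starting in 1981 Q1.
demo_ts <- ts(demo[, -1], start = 1981, frequency = 4)

print(start(demo_ts))   # first year and quarter
print(end(demo_ts))     # last year and quarter

# window() then selects rows by date instead of by row number.
year_1982 <- window(demo_ts, start = c(1982, 1))
print(year_1982)
```

The date information now lives in the ts attributes, which is why the quarter column is no longer needed.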

Construct time series plots of each of the three series

autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facets=TRUE.

Below is the plot output when facets=TRUE is removed from the call: all three series are drawn in a single panel on a shared y-axis instead of in separate panels.

autoplot(mytimeseries)


(2.3)

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

You can read the data into R with the following script:

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
#The second argument (skip=1) is required because the Excel sheet has two header rows.
head(retaildata)
## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # ... with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, ...

Select one of the time series as follows (but replace the column name with your own chosen column):

myts <- ts(retaildata[,"A3349335T"],
  frequency=12, start=c(1982,4))
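
The start=c(1982,4) argument means "period 4 of 1982", which with frequency=12 is April 1982. A quick sketch on a dummy series (demo_monthly is a made-up name with made-up values) to confirm how R stores that:

```r
# Dummy monthly series: 24 observations starting in period 4 of 1982.
demo_monthly <- ts(seq_len(24), frequency = 12, start = c(1982, 4))

first_obs   <- start(demo_monthly)                 # year and period of the first observation
last_obs    <- end(demo_monthly)                   # year and period of the last observation
first_month <- month.abb[cycle(demo_monthly)[1]]   # period number mapped to a month name
first_time  <- time(demo_monthly)[1]               # same date as a decimal year

print(first_obs)
print(last_obs)
print(first_month)
print(first_time)
```

This matches the first row of the Excel data (1982-04-01), so the ts dates line up with the Series ID column.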

Explore your chosen retail time series using the following functions:

autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

autoplot(myts)

ggseasonplot(myts)

ggsubseriesplot(myts)

gglagplot(myts)

ggAcf(myts)

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

The autoplot shows an increasing trend. There is seasonality as well: sales rise around December/January and March and fall around February. That seasonality is harder to see in the ACF, where I don’t see obvious spikes at the December lags. The autocorrelations are all positive, and they decrease slowly as the lag increases, which the book says is due to the trend.
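
To convince myself of how trend and seasonality each show up in an ACF, here is a sketch with two artificial series, one with only a trend and one with only a repeating 12-month pattern. Both series and the pattern values are invented, and I use base R's acf() with plot=FALSE rather than ggAcf() so the numbers can be inspected directly.

```r
# Two artificial series: a pure linear trend and a pure repeating monthly pattern.
trend_only    <- ts(1:100)
pattern       <- c(5, 3, 6, 4, 4, 3, 5, 4, 4, 5, 6, 9)   # invented "monthly" profile
seasonal_only <- ts(rep(pattern, times = 10), frequency = 12)

# acf()$acf is an array; element 1 is lag 0, element k + 1 is lag k.
r_trend <- acf(trend_only,    lag.max = 24, plot = FALSE)$acf[, 1, 1]
r_seas  <- acf(seasonal_only, lag.max = 24, plot = FALSE)$acf[, 1, 1]

# Trend: large positive autocorrelations that shrink slowly as the lag grows.
print(round(r_trend[2:7], 3))

# Seasonality: the autocorrelation peaks again at multiples of the period.
print(round(r_seas[c(7, 13, 19, 25)], 3))   # lags 6, 12, 18, 24
```

When a series has both features, as the retail series seems to, the slow decay from the trend can dominate and make the seasonal peaks harder to spot.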


(2.6)

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

For hsales, sales appear to be highest in March. The seasonal subseries plot shows a higher overall mean for March, April and May. I don’t really see any obvious trend here. The ACF shows negative autocorrelation at some lags, which means observations that far apart tend to fall on opposite sides of the mean, as happens when peaks line up with troughs.

#help(hsales)
autoplot(hsales) + 
  ggtitle("Sales of one-family houses") +
  xlab("Year") +
  ylab("")

ggseasonplot(hsales)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales)

For usdeaths, accidental deaths appear to peak in July: they rise heading into summer and fall heading into winter. The ACF shows both positive and negative autocorrelations, and many of the spikes lie outside the white-noise bounds. This is consistent with some months having more accidents and other months having fewer.

#help(usdeaths)
autoplot(usdeaths) + 
  ggtitle("Accidental deaths in USA") +
  xlab("Year") +
  ylab("")

ggseasonplot(usdeaths)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths)

For bricksq, there is an overall increasing trend in clay brick production. The seasonal subseries plot shows that Q3 has the highest production on average. The ACF shows all positive autocorrelations, which means observations close together in time tend to sit on the same side of the mean. The slow decrease in the ACF as the lag increases reflects the trend.

#help(bricksq)
autoplot(bricksq) + 
  ggtitle("Quarterly clay brick production") +
  xlab("Year") +
  ylab("")

ggseasonplot(bricksq)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq)

For sunspotarea, the series looks cyclic: stretches of increase are followed by stretches of decrease. The data are annual, so there is no seasonal pattern to plot.

#help(sunspotarea)
autoplot(sunspotarea) + 
  ggtitle("Annual average sunspot area ") +
  xlab("Year") +
  ylab("")

#ggseasonplot(sunspotarea) --> Error in ggseasonplot(sunspotarea) : Data are not seasonal
#ggsubseriesplot(sunspotarea) --> Error in ggsubseriesplot(sunspotarea) : Data are not seasonal
gglagplot(sunspotarea)

ggAcf(sunspotarea)
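
The "Data are not seasonal" errors above presumably come from the series having a frequency of 1. A small guard like the one below could skip the seasonal plots for such series. The helper has_seasonal_period() is my own hypothetical function, not part of forecast, and both series are made up.

```r
# Hypothetical helper (not from the forecast package): TRUE when the series
# declares more than one observation per seasonal period.
has_seasonal_period <- function(x) frequency(x) > 1

annual_like  <- ts(c(12, 48, 95, 130, 88, 40, 9, 15, 60, 110), start = 1875)   # made-up annual values
monthly_like <- ts(rep(c(3, 5, 8, 6), times = 6), start = c(2000, 1), frequency = 12)

for (name in c("annual_like", "monthly_like")) {
  x <- get(name)
  if (has_seasonal_period(x)) {
    message(name, ": seasonal plots make sense (frequency = ", frequency(x), ")")
  } else {
    message(name, ": skip seasonal plots (frequency = 1)")
  }
}
```

Cycles like the sunspot one can still be studied with the lag plot and the ACF, which do not need a declared seasonal period.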

For gasoline, there appears to be an overall upward trend. The ACF also suggests seasonality in the data, since it has the “scalloped” shape.

#help(gasoline)
autoplot(gasoline) + 
  ggtitle("US finished motor gasoline product supplied") +
  xlab("Year") +
  ylab("Million barrels per day")

ggseasonplot(gasoline) 

#ggsubseriesplot(gasoline) --> "Error in ggsubseriesplot(gasoline) : Each season requires at least 2 observations. This may be caused from specifying a time-series with non-integer frequency."
gglagplot(gasoline)

ggAcf(gasoline)
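
The ggsubseriesplot() error mentions a non-integer frequency. My understanding is that weekly series are often stored with a frequency of about 365.25/7, since a year does not contain a whole number of weeks, and I am assuming gasoline is built that way. A sketch on a made-up weekly series (weekly_like is hypothetical):

```r
# Made-up weekly series using the average number of weeks in a year as frequency.
set.seed(42)
weekly_like <- ts(rnorm(156), start = 1991, frequency = 365.25 / 7)

f <- frequency(weekly_like)
is_integer_freq <- isTRUE(all.equal(f, round(f)))

print(f)                 # roughly 52.18 observations per year
print(is_integer_freq)   # FALSE: "week 1", "week 2", ... do not line up year to year
```

With a fractional period there is no fixed set of "seasons" to group by, which would explain why the subseries plot refuses to draw while the season plot and ACF still work.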