Data 624: HW 1

library(fpp2)      # loads forecast, ggplot2, fma and the example series used below

library(seasonal)  # provides seas() for the X11 decompositions in 6.2

2.1 Use the help function to explore what the series gold, woolyrnq and gas represent.

?gold

?woolyrnq

?gas

a) Use autoplot() to plot each of these in separate plots.

autoplot(gold)

autoplot(woolyrnq)

autoplot(gas)

b) What is the frequency of each series? Hint: apply the frequency() function.

frequency(gold)

## [1] 1

frequency(woolyrnq)

## [1] 4

frequency(gas)

## [1] 12

c) Use which.max() to spot the outlier in the gold series. Which observation was it?

?which.max

which.max(gold)

## [1] 770
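
The outlier is therefore observation 770, the maximum of the series. As a minimal sketch (assuming the forecast package, which supplies gold, is installed), the index can be used to pull out the value and its neighbours to see how far the point sits from the surrounding observations. The name idx is introduced here only for illustration.

```r
library(forecast)  # supplies the gold series

# Position of the largest observation (NA values are ignored by which.max)
idx <- which.max(gold)

# The outlying value itself
gold[idx]

# Two observations on either side, for comparison with the spike
gold[(idx - 2):(idx + 2)]
```

Comparing the middle value with its neighbours shows whether the point is an isolated spike rather than part of a gradual rise.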

2.3 Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

a) You can read the data into R with the following script:

retaildata <- readxl::read_excel("retail.xlsx", skip=1)

head(retaildata)

## # A tibble: 6 × 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>, A3349397X <dbl>,
## #   A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>, A3349790V <dbl>,
## #   A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>, A3349873A <dbl>,
## #   A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>, A3349789K <dbl>,
## #   A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>, A3349799R <dbl>,
## #   A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>, A3349416V <dbl>,
## #   A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>, A3349727C <dbl>, …

b) Select one of the time series as follows (but replace the column name with your own chosen column):

myts <- ts(retaildata[,"A3349335T"],
  frequency=12, start=c(1982,4))

c) Explore your chosen retail time series using the following functions:

autoplot(myts)

ggseasonplot(myts)

ggsubseriesplot(myts)

gglagplot(myts)

ggAcf(myts)

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

There does appear to be seasonality, which becomes more pronounced after 2005. For instance, values tend to drop in February, increase in October, and peak in December. I do not see clear signs of cyclicity. There is a consistent upward trend, which begins to accelerate after 2000.

6.2 The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

a) Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

?plastics

autoplot(plastics)

Yes, there is clear seasonality, with sales peaking just past the middle of each year. There also appears to be a consistent upward trend.

b) Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

plastics %>% decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") +
  ggtitle("Classical multiplicative decomposition
    of Sales of plastic product")
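
The plot shows the components, but the question also asks for the numbers themselves. A minimal sketch, assuming the fma package (which supplies plastics) is installed: the object returned by decompose() stores the trend-cycle in its trend element and the twelve seasonal indices in its figure element. The name fit_classical is introduced here only for illustration.

```r
library(fma)  # supplies the plastics series

# Classical multiplicative decomposition (decompose() is in base R's stats package)
fit_classical <- decompose(plastics, type = "multiplicative")

# Trend-cycle estimate: a centred moving average, so the first and
# last six months are NA
fit_classical$trend

# Seasonal indices, one per month; values above 1 mark months with
# above-trend sales, values below 1 mark months with below-trend sales
fit_classical$figure
```

For a multiplicative decomposition the indices are scaled to average 1, so they can be read directly as monthly multipliers on the trend.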

c) Do the results support the graphical interpretation from part a?

Yes, my estimate of seasonality and trend was generally correct, though I did not clearly see the upward trend leveling off towards the end of the series.

d) Compute and plot the seasonally adjusted data.

plastics_ts <- ts(plastics, frequency=frequency(plastics), start=c(2017,1))

plastics_ts %>% seas(x11="") -> fit

autoplot(seasadj(fit), series="Data") +
  ggtitle("Seasonally Adjusted Sales of Plastic Product")
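
The code above uses an X11 decomposition from the seasonal package. As an alternative sketch that stays with the classical decomposition from part b, seasadj() from the forecast package also accepts the output of decompose(); for a multiplicative fit it divides the data by the seasonal component. This is not the method used above, only a comparison point, and the names fit_classical and adj_classical are illustrative.

```r
library(forecast)  # seasadj(), autoplot(), autolayer()
library(ggplot2)   # xlab(), ylab(), ggtitle()
library(fma)       # supplies the plastics series

# Classical multiplicative decomposition, as in part b
fit_classical <- decompose(plastics, type = "multiplicative")

# Seasonally adjusted series: data divided by the seasonal component
adj_classical <- seasadj(fit_classical)

# Overlay the adjusted series on the original data
autoplot(plastics, series = "Data") +
  autolayer(adj_classical, series = "Seasonally Adjusted") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Seasonally adjusted sales (classical decomposition)")
```

Because the classical method repeats the same seasonal indices every year, its adjusted series may differ from the X11 result, which allows the seasonal pattern to change over time.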

e) Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

plastics_cp2 <- plastics
plastics_cp2[14] <- plastics_cp2[14] + 600
plastics_out2_ts <- ts(plastics_cp2, frequency=frequency(plastics_cp2), start=c(2017,1))

plastics_out2_ts %>% seas(x11="") -> fit2

autoplot(plastics_out2_ts, series="Data") +
  autolayer(trendcycle(fit2), series="Trend") +
  autolayer(seasadj(fit2), series="Seasonally Adjusted") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Sales of plastic product") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

plastics_cp2 %>% decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") +
  ggtitle("Classical multiplicative decomposition
    of Sales of plastic product")

The outlier, which occurs in the trough of a seasonal cycle, does appear to change the shape of the seasonal cycle, though the majority of its effect is captured in the remainder.

f) Does it make any difference if the outlier is near the end rather than in the middle of the time series?

plastics_cp3 <- plastics
plastics_cp3[52] <- plastics_cp3[52] + 600
plastics_out3_ts <- ts(plastics_cp3, frequency=frequency(plastics_cp3), start=c(2017,1))

plastics_out3_ts %>% seas(x11="") -> fit3

autoplot(plastics_out3_ts, series="Data") +
  autolayer(trendcycle(fit3), series="Trend") +
  autolayer(seasadj(fit3), series="Seasonally Adjusted") +
  xlab("Year") + ylab("Sales") +
  ggtitle("Sales of plastic product") +
  scale_colour_manual(values=c("gray","blue","red"),
             breaks=c("Data","Seasonally Adjusted","Trend"))

plastics_cp3 %>% decompose(type="multiplicative") %>%
  autoplot() + xlab("Month") +
  ggtitle("Classical multiplicative decomposition
    of Sales of plastic product")

The outlier near the end of the series appears to have a smaller impact on the trend and seasonality than the one in the middle. It is unclear whether this is because adding 600 is proportionally smaller at the end of the series, where values are higher, or because the model is more influenced by earlier data points. Again, the majority of the outlier’s effect is captured in the remainder.
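
One way to probe the two explanations is to put numbers on them. The sketch below uses the classical multiplicative decomposition rather than X11, so its results may not match the X11 plots exactly. It assumes the fma package is installed, and seasonal_shift is a hypothetical helper written for this illustration. It measures the largest change in any seasonal index after bumping one observation, and also reports how large the bump of 600 is relative to each original value.

```r
library(fma)  # supplies the plastics series

# Hypothetical helper: largest absolute change in the classical seasonal
# indices after adding `bump` to observation `idx` of series `x`
seasonal_shift <- function(x, idx, bump) {
  base <- decompose(x, type = "multiplicative")$figure
  x[idx] <- x[idx] + bump
  shifted <- decompose(x, type = "multiplicative")$figure
  max(abs(shifted - base))
}

# Outlier in the middle of the series (part e) versus near the end (part f)
shift_mid <- seasonal_shift(plastics, 14, 600)
shift_end <- seasonal_shift(plastics, 52, 600)
c(middle = shift_mid, end = shift_end)

# Size of the bump relative to the original values at those two points
relative_bump <- 600 / as.numeric(plastics[c(14, 52)])
relative_bump
```

If the relative bump at observation 52 is much smaller than at observation 14, the proportional-size explanation is at least plausible. Repeating the comparison with a bump scaled to each observation's level would help separate the two effects.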