Questions

Question 2.1

  1. Use the help function to explore what the series gold, woolyrnq and gas represent.
     a. Use autoplot() to plot each of these in separate plots.
     b. What is the frequency of each series?
     c. Use which.max() to spot the outlier in the gold series. Which observation was it?

gold

This data represents the daily morning gold prices in US dollars from 1 January 1985 to 31 March 1989.

The observations are daily, but the series is stored with frequency 1 (frequency(gold) returns 1). The time axis therefore counts observations rather than calendar dates.
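frequency() reports the frequency attribute stored in a ts object. It does not infer the spacing from how often the data were actually recorded. Below is a minimal base-R sketch that uses synthetic placeholder values rather than the fpp2 series:

```r
# Synthetic placeholder values; only the ts metadata matters here
quarterly <- ts(1:8, start = c(1965, 1), frequency = 4)
monthly   <- ts(1:24, start = c(1956, 1), frequency = 12)
indexed   <- ts(1:10)  # no frequency given, so it defaults to 1

frequency(quarterly)  # 4
frequency(monthly)    # 12
frequency(indexed)    # 1
```

A daily series created without an explicit frequency will therefore report 1, just like `indexed`.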

The outlier in the gold series is the 770th observation. The time axis counts observations, not calendar days, so this is not necessarily 770 days after 1/1/1985. The value is 593.7.

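library(fpp2)  # attaches forecast, ggplot2 and the data packages used below

data(gold)
autoplot(gold) +
  ggtitle('Daily Morning Gold Prices in US dollars', subtitle='1/1/1985 to 31/3/1989') +
  labs(x='Days', y='Price') +
  # mark the maximum in red; data.frame() replaces the deprecated data_frame()
  geom_point(data=data.frame(x=which.max(gold), y=gold[which.max(gold)]),
             aes(x, y), color='red')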

which.max(gold)
## [1] 770
gold[which.max(gold)]
## [1] 593.7
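which.max() returns a position, not a date. time() maps that position onto the series' time index. The sketch below uses synthetic values (not the gold data) and shows two things: which.max() skips missing values, and the position becomes a fractional year when the frequency is greater than 1.

```r
# Synthetic series with a missing value (not the gold data)
x <- ts(c(2, NA, 9, 4), start = 1, frequency = 1)

idx <- which.max(x)  # NA is skipped, so this is 3
x[idx]               # 9
time(x)[idx]         # 3: with start = 1 and frequency = 1, time equals position

# For a quarterly series the same idea yields a fractional year
q <- ts(c(5, 8, 6, 4), start = c(1965, 1), frequency = 4)
time(q)[which.max(q)]  # 1965.25, i.e. 1965 Q2
```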

woolyrnq

The data represents the quarterly production of woollen yarn in Australia, measured in tonnes, from March 1965 to September 1994.

The frequency of this series is quarterly (frequency(woolyrnq) returns 4). That is, consecutive observations are one quarter apart.

data(woolyrnq)
autoplot(woolyrnq) +
  ggtitle('Quarterly Woollen Yarn Production in Australia', subtitle='3/1965 to 9/1994') +
  labs(x='Year', y='Tonnes')
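start() and end() return (year, period) pairs, which makes them a quick way to check a stated date range. Below is a sketch with a hypothetical quarterly series of 119 placeholder values starting in 1965 Q1:

```r
# Hypothetical quarterly series: 119 placeholder values starting in 1965 Q1
wool_like <- ts(seq_len(119), start = c(1965, 1), frequency = 4)

start(wool_like)  # 1965 1  (year, quarter)
end(wool_like)    # 1994 3
```

Running the same two calls on woolyrnq should confirm the range quoted above.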

gas

The data represents the monthly gas production of Australia from 1956 to 1995.

The frequency of this series is monthly (or 12). That is, each entry’s value is separated by 1 month.

data(gas)
autoplot(gas) +
  ggtitle('Monthly Gas Production in Australia', subtitle='1956 to 1995') +
  labs(x='Year', y='Production')

Question 2.2

  1. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
     a. You can read the data into R with the following script:
     b. Convert the data to time series:
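library(readr)
library(dplyr)
library(knitr)
library(kableExtra)
# read the CSV, drop the unnamed first (quarter label) column by position,
# and store the remaining columns as a quarterly multivariate ts
mytimeseries <- read_csv('./tute1.csv') %>%
  select(-1) %>%
  ts(start=1981, frequency=4)
mytimeseries %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))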
Sales AdBudget GDP
1020.2 659.2 251.8
889.2 589.0 290.9
795.0 512.5 290.8
1003.9 614.1 292.4
1057.7 647.2 279.1
944.4 602.0 254.0
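Dropping the label column by position avoids depending on the name the CSV reader gives an unnamed first column. Depending on the readr version, that name is X1 or ...1. Below is a base-R sketch on a hypothetical stand-in data frame (placeholder values, not the tute1 data):

```r
# Hypothetical stand-in for the CSV contents: a label column plus numeric series
tute_like <- data.frame(
  Quarter  = c("Mar-81", "Jun-81", "Sep-81", "Dec-81"),
  Sales    = c(10, 11, 12, 13),
  AdBudget = c(5, 6, 7, 8)
)

# [, -1] drops the first column whatever it is called; ts() converts the
# remaining data frame to a numeric matrix and keeps the column names
mts_like <- ts(tute_like[, -1], start = 1981, frequency = 4)

colnames(mts_like)   # "Sales" "AdBudget"
frequency(mts_like)  # 4
```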
     c. Construct time series plots of each of the three series. Check what happens when you don’t include facets=TRUE.
mytimeseries %>%
  autoplot(facets=TRUE)

mytimeseries %>%
  autoplot()

Facets separate the series into three panels, each with its own y-axis scale. This makes it easier to compare how each series changes over time even though their levels differ. The second plot, without facets, may be advantageous if we are interested in the absolute differences between the series rather than their relative changes over time.

Question 2.3

  1. Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
     a. You can read the data into R with the following script:
retaildata <- readxl::read_excel('./retail.xlsx', skip=1)
retaildata %>%
  select(1:5) %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Series ID A3349335T A3349627V A3349338X A3349398A
1982-04-01 303.1 41.7 63.9 408.7
1982-05-01 297.8 43.1 64.0 404.9
1982-06-01 298.0 40.3 62.7 401.0
1982-07-01 307.9 40.9 65.6 414.4
1982-08-01 299.2 42.1 62.6 403.8
1982-09-01 305.4 42.0 64.4 411.8
     b. Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- retaildata %>%
  select(A3349335T) %>%
  ts(frequency=12, start=c(1982, 4))
myts %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
A3349335T
303.1
297.8
298.0
307.9
299.2
305.4
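select() keeps a one-column data frame, so the resulting ts is a one-column matrix. Extracting the column with [[ ]] instead gives a plain univariate series. The sketch below contrasts the two on a hypothetical stand-in with placeholder values:

```r
# Hypothetical stand-in for two retail columns (placeholder values)
retail_like <- data.frame(A = 1:6, B = 7:12)

as_matrix <- ts(retail_like["A"], frequency = 12, start = c(1982, 4))    # one-column matrix
as_vector <- ts(retail_like[["A"]], frequency = 12, start = c(1982, 4))  # plain vector

dim(as_matrix)    # 6 1
dim(as_vector)    # NULL
start(as_vector)  # 1982 4
```

Both forms plot the same way. The univariate form is simply the less surprising input for functions that expect a single series.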
     c. Explore your chosen retail time series using the following functions:
myts %>%
  autoplot()

This plot appears to show a general upward trend over time.

myts %>%
  ggseasonplot()

This plot, combined with the previous one, appears to show seasonality in the data. Specifically, the data appears to fall in February (in recent years) and rise in December.
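One way to check a seasonal-plot impression numerically is to average each calendar month using cycle() and tapply(). Below is a sketch on a synthetic monthly series (hypothetical values, not the retail data) with a mild trend, a February dip and a December peak:

```r
# Synthetic monthly series: mild upward trend plus a February dip and December peak
pattern <- c(0, -5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10)
x <- ts(rep(pattern, 3) + 0.1 * (1:36), start = c(2000, 1), frequency = 12)

# cycle() gives each observation's month (1-12); average within each month
month_means <- tapply(x, cycle(x), mean)

which.max(month_means)  # 12 (December)
which.min(month_means)  # 2 (February)
```

With a strong trend, the raw monthly means lean toward later months. Detrending first, for example with decompose(), would be a reasonable refinement.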

myts %>%
  ggsubseriesplot()

myts %>%
  gglagplot()

The previous two plots appear to show that the data is highly correlated year after year. That is, while sales appear to be rising, the seasonal pattern stays much the same. Notice that the most highly correlated lag is lag 12. This indicates that while all months are similar to each other, the same month in consecutive years is even more similar.
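The lag-plot observation can be put in numbers by comparing the lag-1 and lag-12 autocorrelations from acf(). Below is a sketch on a synthetic monthly pattern that repeats exactly each year (hypothetical values, not the retail data):

```r
# Synthetic monthly pattern with a December spike, repeated for 5 years
pattern <- c(2, 1, 3, 2, 2, 3, 4, 3, 2, 3, 4, 9)
x <- ts(rep(pattern, 5), start = c(2000, 1), frequency = 12)

r <- drop(acf(x, lag.max = 24, plot = FALSE)$acf)  # r[k + 1] is the lag-k autocorrelation
r[2]   # lag 1: small
r[13]  # lag 12: 0.8, since every value recurs exactly 12 months later
```

The lag-12 value is 0.8 rather than 1 because acf() sums over only the 48 overlapping pairs but divides by the variation of all 60 observations.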

myts %>%
  ggAcf()

This plot appears to support the previous assertion that the data is highly correlated month after month. The slow decrease in the autocorrelations as the lag grows is characteristic of a trend.
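A trended series produces large positive autocorrelations at short lags that die away slowly as the lag increases. Below is a minimal sketch on a synthetic series that is a pure linear trend (no seasonality, not the retail data):

```r
# Synthetic series: a pure linear trend with no seasonality
x <- ts(1:48, start = c(2000, 1), frequency = 12)

r <- drop(acf(x, lag.max = 10, plot = FALSE)$acf)  # lags 0 to 10
round(r[2], 4)    # lag 1: 0.9375
all(diff(r) < 0)  # TRUE: correlations shrink steadily as the lag grows
```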

In conclusion, the data appears to be trending upwards and is highly correlated month over month and year over year, with a seasonal peak in December. There do not appear to be any signs of cyclicity.

Question 2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

  • Can you spot any seasonality, cyclicity and trend?
  • What do you learn about the series?

This question requires the same handful of plots for each time series. To simplify this process I wrote two functions to display the plots. The upside of these functions is that they greatly simplify the code. The downside is that I cannot customize each plot, so these plots should only be used for EDA. I also wrote a third function that displays the first few observations of each series for reference.

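# Print one plot. Some plots cannot be built for every series (for example,
# ggseasonplot() and ggsubseriesplot() need seasonal data, so they fail on
# annual series such as sunspotarea); report the error and move on.
SINGLE.PLOT <- function(plot){
  tryCatch({
    print(plot)
  },
  error=function(cond){
    message(cond)
  })
}

# Draw the five EDA plots used in this question for a single series.
# Each plot expression is only evaluated inside SINGLE.PLOT's tryCatch().
PLOT.GENERATOR <- function(data){
  SINGLE.PLOT(autoplot(data))
  SINGLE.PLOT(ggseasonplot(data))
  SINGLE.PLOT(ggsubseriesplot(data))
  SINGLE.PLOT(ggAcf(data))
  SINGLE.PLOT(gglagplot(data))
}

# Show the first few observations of a series as a styled table.
DISPLAY.SERIES <- function(data){
  data %>%
    head() %>%
    kable() %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
}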
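SINGLE.PLOT can catch a failing plot because R evaluates function arguments lazily. The plot expression is only evaluated when print() forces it, and that happens inside tryCatch(). Below is a minimal base-R sketch of the same pattern, using a hypothetical helper called safe_call():

```r
# Hypothetical helper name; mirrors the tryCatch pattern in SINGLE.PLOT
safe_call <- function(expr) {
  tryCatch(expr, error = function(cond) {
    message("Skipped: ", conditionMessage(cond))
    NULL
  })
}

# The argument is a promise, so the error is raised inside tryCatch and caught
safe_call(stop("example error"))  # messages "Skipped: example error", returns NULL
safe_call(1 + 1)                  # 2
```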

hsales

overview

This data represents the monthly sales of new one-family houses sold in the USA since 1973.

data(hsales)
hsales %>%
  DISPLAY.SERIES()
x
55
60
68
63
65
61

plots

hsales %>%
  PLOT.GENERATOR()

conclusion

hsales appears to show cyclicity of about 10 years. Notice the troughs around 1975, 1982 and 1992, with peaks in between. This possibly reflects recession and recovery cycles. There is also strong seasonality, with sales highest in the spring. It makes sense that sales would be higher when the weather is nicer and people are coming out of winter. Finally, there is no apparent trend in the data. House sales rise and fall but stay within a fairly stable range of values.

usdeaths

overview

This data represents monthly accidental deaths in the USA.

data(usdeaths)
usdeaths %>%
  DISPLAY.SERIES()
x
9007
8106
8928
9137
10017
10826

plots

usdeaths %>%
  PLOT.GENERATOR()

conclusion

usdeaths shows strong seasonality. This appears to reflect the fact that when the weather is nice, more people leave their homes to take part in activities, and these activities lead to higher death rates. The first of every month appears to have unusually high rates, but this is likely due to reporting lag and the fact that mass reporting occurs on the first of the month. The yearly cyclicity is just an alternative interpretation of the strong seasonality. There appears to be no longer-term cyclicity or trend in the data.

bricksq

overview

This data represents Australian quarterly clay brick production from 1956 to 1994.

data(bricksq)
bricksq %>%
  DISPLAY.SERIES()
x
189
204
208
197
187
214

plots

bricksq %>%
  PLOT.GENERATOR()

conclusion

bricksq shows a strong upward trend that eventually levels off around 1975. This indicates a growing need for clay brick in Australia until the market was saturated in the mid-1970s. Afterwards there appears to be cyclicity that likely aligns with general recessions, which would result in fewer buildings being built (and thus fewer bricks needed). These downturns recur roughly every 10 years, each followed by a rebound. There is also strong seasonality, with Q1 having much lower production than the other quarters. This likely reflects the fact that fewer buildings are built during this time of the year.

sunspotarea

overview

This data represents the annual averages of sunspot areas.

data(sunspotarea)
sunspotarea %>%
  DISPLAY.SERIES()
x
213.13333
109.28333
92.85833
22.21667
36.33333
446.75000

plots

sunspotarea %>%
  PLOT.GENERATOR()

conclusion

sunspotarea shows strong cyclicity with a period of roughly a decade. This seems to indicate that sunspots grow and fade in intensity over a period of many years. Within that, however, there is much noise. The lag plots show that year-over-year changes are highly inconsistent. Because the data is yearly, there is no seasonality. There also does not appear to be any trend, although given the long period of the cycles, we may need a longer record to see one.
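One way to put a number on the roughly decade-long cycle is to find the lag at which the autocorrelation peaks again. Below is a sketch on a synthetic annual series with an exact 11-year cycle (not the sunspotarea data). A real, noisier series would give a broader and less clear-cut peak.

```r
# Synthetic annual series with an exact 11-year cycle (not the sunspotarea data)
x <- ts(sin(2 * pi * (1:110) / 11), frequency = 1)

r <- drop(acf(x, lag.max = 20, plot = FALSE)$acf)  # r[k + 1] is the lag-k autocorrelation
lags <- 5:20                                       # skip the short lags near lag 0
peak_lag <- lags[which.max(r[lags + 1])]
peak_lag  # 11
```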

gasoline

overview

This data represents weekly US motor gasoline supplies from 1991 to 2017.

data(gasoline)
gasoline %>%
  DISPLAY.SERIES()
x
6.621
6.433
6.582
7.224
6.875
6.947

plots

gasoline %>%
  PLOT.GENERATOR()

conclusion

gasoline shows a strong upward trend throughout the 1990s until it levels off in the 2000s. I am not confident calling this a trend, as it also happens to correlate highly with the economy. It may be that this data is more cyclical, with supply rising and falling with the economy. In general we would expect people to travel less and order fewer things when the economy is poor. There is strong seasonality showing that people tend to travel more during the summer months. This is unsurprising, as many people travel on vacation in July and August.