Week 2 - Time Series - Homework

C. Rosemond 09.06.20


library(fpp2)
library(readxl)


2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

#?gold # daily morning gold prices in US dollars (01.01.1985 - 03.31.1989)
#?woolyrnq # quarterly production of woollen yarn in Australia in tons (03.1965 - 09.1994)
#?gas # Australian monthly gas production (1956 - 1995)


a. Use autoplot() to plot each of these in separate plots.

autoplot(gold) # no declared seasonal period (frequency 1); clear outlier(s) (day ~770)

autoplot(woolyrnq) #seasonality (quarter)

autoplot(gas) #seasonality (month) and trend (starting 1970)


b. What is the frequency of each series? Hint: apply the frequency() function.

lapply(list(gold,woolyrnq,gas), frequency) # apply frequency() to each series
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 12

The frequency of the gold series is 1, meaning the daily observations are stored without a declared seasonal period; the frequency of the woolyrnq series is 4 (quarterly), and the frequency of the gas series is 12 (monthly).
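
As a rough illustration, the sketch below builds three placeholder series in base R and reads their frequencies back. The values and object names are made up for illustration; only the frequency arguments mirror the series above.

```r
# Placeholder data only: arbitrary values standing in for the real series.
daily_like     <- ts(seq_len(30))                                    # default frequency = 1
quarterly_like <- ts(seq_len(30), start = c(1965, 1), frequency = 4)
monthly_like   <- ts(seq_len(30), start = c(1956, 1), frequency = 12)

# frequency() reports whatever was declared when the ts object was created.
freqs <- sapply(list(daily_like, quarterly_like, monthly_like), frequency)
print(freqs)
```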


c. Use which.max() to spot the outlier in the gold series. Which observation was it?

which.max(gold) # day 770 - plot observation confirmed
## [1] 770

The outlier in the gold series is day 770, which confirms my initial plot observation.
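
A minimal sketch of the same lookup on a made-up series, assuming one wants the value and time stamp as well as the position. The series `x` and its planted spike are hypothetical and are not the gold data.

```r
# Hypothetical series: a smooth ramp with one planted spike at position 770.
x <- ts(seq(300, 400, length.out = 1000))
x[770] <- 600
x[5] <- NA                          # which.max() skips missing values

peak_index <- which.max(x)          # position of the largest value
peak_value <- x[peak_index]         # the value at that position
peak_time  <- time(x)[peak_index]   # the time stamp for that position
print(c(index = peak_index, value = peak_value, time = peak_time))
```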



2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. You can read the data into R with the following script:

tute1 <- read.csv("tute1.csv", header=TRUE) # 100 x 4
View(tute1)


b. Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4) # remove first column and create time series object starting in 1981 with a frequency of 4 (quarterly)
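
The conversion can be checked on a stand-in data frame. The sketch below assumes a 100-row table shaped like tute1; the data frame `fake` and its random values are hypothetical.

```r
set.seed(42)
# Hypothetical stand-in for tute1: 100 quarterly rows, first column is a label.
fake <- data.frame(
  X        = seq_len(100),
  Sales    = rnorm(100, mean = 1000, sd = 50),
  AdBudget = rnorm(100, mean = 500, sd = 20),
  GDP      = rnorm(100, mean = 280, sd = 5)
)

# Drop the label column, then declare a quarterly series starting in 1981.
fake_ts <- ts(fake[, -1], start = 1981, frequency = 4)

first_period <- start(fake_ts)   # year and quarter of the first row
last_period  <- end(fake_ts)     # year and quarter of the last row
print(first_period)
print(last_period)
```

With 100 quarterly rows starting in the first quarter of 1981, the last row falls in the fourth quarter of 2005, which matches the 1981-2005 span given in the exercise.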


c. Construct time series plots of each of the three series

autoplot(mytimeseries, facets=TRUE)

autoplot(mytimeseries, facets=FALSE)

When TRUE, facets results in three separate time series plots, each with its own data-specific y-axis but a shared x-axis. When FALSE, the three time series are combined in a single plot with shared y- and x-axes.



2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

a. You can read the data into R with the following script:

retaildata <- read_excel("retail.xlsx", skip=1) # 381 x 190


b. Select one of the time series as follows (but replace the column name with your own chosen column):

myts <- ts(retaildata[,4], frequency=12, start=c(1982,4)) # 4th column ("A3349338X")
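
To see what start=c(1982,4) implies, here is a small sketch using a placeholder vector. Only the length (381 rows) and the ts() arguments come from the text above; the zeros are not retail data.

```r
# Placeholder values: 381 zeros with the same ts() arguments as myts.
placeholder <- ts(numeric(381), frequency = 12, start = c(1982, 4))

first_period <- start(placeholder)   # year 1982, period 4 (April)
last_period  <- end(placeholder)     # 380 months after the first observation
print(first_period)
print(last_period)
```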


c. Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

autoplot(myts)

ggseasonplot(myts)

ggsubseriesplot(myts)

gglagplot(myts)

ggAcf(myts)

autoplot()

The time series is trending upward from the start and shows monthly seasonality. Starting in late 1994, there is increasing variation that continues to grow over time. The series also shows a notable dip around the start of the year 2000.

ggseasonplot()

The seasonal plot reflects the initial plot in that more recent years show greater sales. There is variation across years, but more recently, there appear to be common dips in sales in February, June, and November. Notably, irrespective of year, sales show a clear increase from November to December. The holiday shopping season could be one driver of that increase.

ggsubseriesplot()

The seasonal subseries plot confirms the notably higher sales in December and relatively lower sales in February and June. Regardless of month, recent sales have decreased from a relative peak a few years earlier (5-10 years based on prior plots).

gglagplot()

The lag plots are faceted and somewhat difficult to interpret. All of the plots show strong positive relationships, which indicate strong autocorrelation at every lag shown, as expected for a trending series; on their own they do not isolate seasonality. There also appears to be increasing variation as k increases, which reflects the general change in the time series over time.
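
Each lag-plot panel is a scatter of the series against itself k steps earlier, so its strength can be summarised with a plain correlation. The sketch below uses AirPassengers from base R as a stand-in for the retail series; `lag_cor` is a hypothetical helper, not an fpp2 function.

```r
# AirPassengers ships with base R; it is a trending, seasonal monthly series.
x <- as.numeric(AirPassengers)

# Correlation between y_t and y_{t-k}: the relationship one lag-plot panel shows.
lag_cor <- function(y, k) cor(head(y, -k), tail(y, -k))

lag_cors <- sapply(1:12, function(k) lag_cor(x, k))
print(round(lag_cors, 2))
```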

ggAcf()

The autocorrelation plot shows a trend in that it shows large positive values that decrease as k increases. The slightly higher values for lags of 12 and 24--multiples of the monthly seasonal frequency--suggest seasonality.
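
The same ACF signature can be reproduced on simulated data. This is a sketch under stated assumptions: the series `sim` is synthetic (a linear trend plus a 12-month sine wave plus noise) and is not the retail data.

```r
set.seed(1)
n <- 240                                  # 20 years of monthly values
t_idx <- seq_len(n)
sim <- ts(0.05 * t_idx + 2 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 0.5),
          frequency = 12)

# acf() returns lag 0 first, so r[k + 1] is the autocorrelation at lag k.
r <- acf(sim, lag.max = 24, plot = FALSE)$acf[, 1, 1]

seasonal_lags <- r[c(13, 25)]   # lags 12 and 24
half_lags     <- r[c(7, 19)]    # lags 6 and 18
print(round(rbind(seasonal_lags, half_lags), 2))
```

The trend keeps every autocorrelation positive and slowly decaying, while the seasonal component lifts the values at lags 12 and 24 above those at lags 6 and 18.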



2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend?
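What do you learn about the series?

# Helper to save space: draw all five exploratory plots for one series.
plot_function <- function(series) {
  print(autoplot(series))         # time plot
  print(ggseasonplot(series))     # one line per year, by season
  print(ggsubseriesplot(series))  # one mini-series per season
  print(gglagplot(series, do.lines = FALSE)) # points only; the line version is hard to read
  print(ggAcf(series))            # autocorrelation function
}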


hsales

#?hsales # monthly sales of new single family homes in US (1973 -)
plot_function(hsales)

autoplot()

Monthly single family home sales show seasonality (monthly). There is no clear overall trend, but there does appear to be cyclicity, of roughly seven to ten years, from trough to peak and back (e.g. 1975 to 1982, 1982 to 1992).

ggseasonplot()

The seasonal plot shows some patterns: notably, home sales tend to increase into early spring, and they tend to decrease into winter.

ggsubseriesplot()

The seasonal subseries plot confirms the seasonal plot in that spring months tend to show higher sales than do late fall and early winter months.

gglagplot()

The lag plots show a somewhat positive relationship regardless of k, though the relationship appears weaker as k increases.

ggAcf()

The autocorrelation plot suggests strong seasonality given the relative peaks at k = 1, 12, and 24. The autocorrelations shrink as k increases, though it is unclear whether that decline indicates a trend. Also, around k = 18, the autocorrelations fall within the significance bounds, meaning they are not significantly different from zero.


usdeaths

#?usdeaths # monthly accidental deaths in USA (1973 - 1978)
plot_function(usdeaths)

autoplot()

Monthly accidental deaths--in the USA, from 1973 to 1978--show clear seasonality (monthly) but no apparent trend or cyclicity. There are defined peaks during the summer months and troughs during the winter months.

ggseasonplot()

The seasonal plot shows, irrespective of year, the number of deaths increasing through the spring months to a peak in July, and decreasing through the fall and winter to a trough in February.

ggsubseriesplot()

Likewise, the seasonal subseries plot shows the July peaks and February troughs, though there is some within-month variation in deaths over the course of the time series.

gglagplot()

The faceted lag plots show seasonality, with a positive relationship for k = 1 and a clearer, stronger positive one for k = 12. There are some possibly negative relationships for k = 5 through k = 7.

ggAcf()

The autocorrelation plot resembles a cosine wave, with clear seasonality given the relative peaks at k = 1, 12, and 24 (and relative troughs at k = 6 and 18). Between those peaks and troughs, the positive autocorrelations are higher (in magnitude) and more significant than the negative ones.
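
The peaks and troughs can also be read off numerically. The sketch below uses USAccDeaths from base R, which is assumed here to be the same series as usdeaths, and an approximate 95% limit of qnorm(0.975)/sqrt(n), which is an assumption about how the plotted bounds are computed.

```r
# USAccDeaths: monthly accidental deaths in the USA, 1973 to 1978 (base R).
# Assumption: it matches fpp2's usdeaths.
r <- acf(USAccDeaths, lag.max = 24, plot = FALSE)$acf[, 1, 1]  # r[k + 1] is lag k
bound <- qnorm(0.975) / sqrt(length(USAccDeaths))              # approximate 95% limit

lags <- c(6, 12, 18, 24)
acf_table <- data.frame(
  lag         = lags,
  acf         = round(r[lags + 1], 2),
  significant = abs(r[lags + 1]) > bound
)
print(acf_table)
```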


bricksq

#?bricksq # quarterly clay brick production (1956 - 1994)
plot_function(bricksq)

autoplot()

Quarterly clay brick production shows seasonality (quarterly) and a trend from 1956 to roughly 1975, when production growth began to flatten. There are several large decreases (~1975, 1983, and 1991), and also possible cycles of roughly seven years (1975 to 1982, 1983 to 1990).

ggseasonplot()

The seasonal plot appears to show slightly higher brick production during quarters 2 and 3, with relatively lower production during quarter 1 in particular. There is clear growth in production over time, to a point, which reflects the prior time series plot.

ggsubseriesplot()

The seasonal subseries plot reveals similar production patterns--upward trend to plateau with greater variation--across quarters, though at different levels. Here again, quarters 2 and 3 have the highest average production, and quarter 1 has the lowest.

gglagplot()

Regardless of value of k and quarter, the faceted lag plots show positive relationships with greater variation (somewhat weaker correlations) as lagged values increase.

ggAcf()

The autocorrelation plot shows seasonality given the relative peaks where lags are multiples of four, as well as the overall trend in production given that the large positive autocorrelations decrease as the number of lags increases.


sunspotarea

#?sunspotarea # annual average sunspot area (1875 - 2015)
autoplot(sunspotarea)

gglagplot(sunspotarea, do.lines = FALSE)

ggAcf(sunspotarea)

autoplot()

The annual averages rise and fall in roughly decade-long patterns from trough to peak and back. Because the data are annual, these are best described as cycles rather than seasonality. The peaks trend upward from 1875 to around 1960.

ggseasonplot()

The function returns an error given the data are annual and not seasonal.

ggsubseriesplot()

The function returns an error given the data are annual and not seasonal.
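
One way to avoid these errors is to check the declared frequency before calling the seasonal plotting functions. The helper `has_seasonal_period` below is hypothetical, and the two series are placeholders rather than the book's data.

```r
# Hypothetical helper: TRUE when a series declares a seasonal period.
has_seasonal_period <- function(series) frequency(series) > 1

annual  <- ts(seq_len(20), start = 1875)                          # frequency 1, like annual data
monthly <- ts(seq_len(24), start = c(1973, 1), frequency = 12)    # frequency 12

print(has_seasonal_period(annual))
print(has_seasonal_period(monthly))
```

A guard like this could wrap the seasonal and subseries plot calls so that annual series skip them instead of stopping with an error.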

gglagplot()

The faceted lag plots do not depict particularly strong relationships. The plot for k = 1 is somewhat more positive, while the plots for k = 4, 5, and 6 are more negative.

ggAcf()

The autocorrelation plot reflects the patterns apparent in the time series plot. There are relatively strong peaks at k = 1, 10/11, and 21, and relatively strong troughs at k = 5 and 16.
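
The lag of the first cyclic peak can be located programmatically. The sketch below uses sunspot.year from base R, which holds yearly sunspot numbers rather than areas; it is a related but different series, used only as a stand-in, and the search window of lags 8 to 14 is an assumption.

```r
# sunspot.year (base R): yearly sunspot numbers, not sunspot areas.
r <- acf(sunspot.year, lag.max = 25, plot = FALSE)$acf[, 1, 1]  # r[k + 1] is lag k

# Look for the largest autocorrelation in a window around one decade.
candidate_lags <- 8:14
cycle_lag <- candidate_lags[which.max(r[candidate_lags + 1])]
print(cycle_lag)
```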


gasoline

#?gasoline # US finished motor gasoline product supplied (02.02.1991 - 01.20.2017)
autoplot(gasoline)

ggseasonplot(gasoline)

gglagplot(gasoline, do.lines = FALSE)

ggAcf(gasoline)

autoplot()

US gasoline supply trends upward from 1991 before leveling off in 2007 and decreasing slightly through around 2014. Seasonality is somewhat apparent in the plot, though it is difficult to discern that the data are in fact seasonal by week. There could be cyclicity given the slight years-long decrease then increase from 2007 through 2017.

ggseasonplot()

Given the number of years in the time series, the season plot is cluttered and tough to interpret. Generally, there appears to be slightly greater relative supply during the middle of the year.

ggsubseriesplot()

The function returns an error. Regardless, given previous subseries plots, I am unsure how a weekly subseries plot could be visualized clearly.
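
One plausible cause, offered as an assumption rather than a confirmed diagnosis, is that weekly series are often stored with a non-integer frequency (365.25 / 7, about 52.18 weeks per year), which a subseries plot cannot split into whole seasons. The sketch below checks for that on a hypothetical weekly series; it does not use the gasoline data.

```r
set.seed(7)
# Hypothetical weekly series declared with 365.25 / 7 observations per year.
weekly <- ts(rnorm(520), start = 1991, frequency = 365.25 / 7)

f <- frequency(weekly)
is_integer_freq <- abs(f - round(f)) < 1e-8   # FALSE for a fractional frequency
print(f)
print(is_integer_freq)
```

Running frequency(gasoline) would show whether the same situation applies to the series in this exercise.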

gglagplot()

The faceted lag plots reveal relatively strong positive autocorrelations regardless of value of k or, perhaps, week of the year. There is not clear evidence of negative autocorrelation.

ggAcf()

The autocorrelation plot suggests the time series features both a trend--large positive autocorrelations decrease in size as k increases--and seasonality (relative peaks on the year).