Chapter 2 Time series graphics

Problem 2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features of the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

a. hsales

This set contains data on the monthly sales of new one-family houses in the USA since 1973.

The housing sales data show seasonality: sales peak around April. The seasonality also appears in the ACF plot as a strong positive autocorrelation at the 12-month lag. The main autoplot also shows cyclic behavior, with what looks like a boom/bust cycle in sales lasting roughly 10 years. There does not seem to be an upward or downward trend in sales.
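A minimal sketch of the calls behind this reading, assuming the fpp2 package is installed (it attaches forecast, ggplot2 and the fma data sets, which include hsales). The last step reads the 12-month autocorrelation as a number instead of off the plot; the object names are illustrative only.

```r
library(fpp2)  # attaches forecast, ggplot2 and fma (source of hsales)

p_time   <- autoplot(hsales) + ggtitle("hsales")
p_season <- ggseasonplot(hsales, year.labels = TRUE)
p_acf    <- ggAcf(hsales, lag.max = 48)   # four years of monthly lags

# Numeric check of the seasonal autocorrelation.
# acf() stores lag 0 first, so element 13 is the 12-month lag.
r <- acf(hsales, lag.max = 24, plot = FALSE)
r$acf[13]

print(p_time); print(p_season); print(p_acf)
```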

b. usdeaths

Monthly accidental deaths in the USA.

Summer months coincide with an increase in the number of accidental deaths. The seasonality is visible in the ggseasonplot, ggsubseriesplot, gglagplot and ggAcf plots; all of them show the effect of the 12-month cycle.

There may be a hint of a multi-year downward trend in the number of deaths. However, this trend seems to taper off towards the end of the series.
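The four plots mentioned above can be produced with a sketch like the following, assuming fpp2 is installed (the fma package it attaches supplies usdeaths). The lag range is set explicitly rather than relying on the function's default.

```r
library(fpp2)  # fma supplies usdeaths

p_season <- ggseasonplot(usdeaths, year.labels = TRUE)  # one line per year
p_sub    <- ggsubseriesplot(usdeaths)   # one panel per month, line = monthly mean
p_lag    <- gglagplot(usdeaths, lags = 16)  # y_t against y_{t-k}, k = 1..16
p_acf    <- ggAcf(usdeaths)

print(p_season); print(p_sub); print(p_lag); print(p_acf)
```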

c. bricksq

Australian quarterly clay brick production: 1956–1994.

There is a strong positive trend at the beginning of the series that later levels off. There is also a mild seasonal effect that repeats every 4 lags (once per year): one quarter, the 3rd, consistently shows higher brick production than the others.

Within the range of this series, we cannot discern a multi-year cycle in effect.
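One quick numeric way to check which quarter is strongest is to average the series by quarter. This is a sketch assuming fpp2 is installed; `cycle()` returns the quarter (1 to 4) of each observation. If the reading above is right, the third entry of `quarter_means` should be the largest.

```r
library(fpp2)  # fma supplies bricksq

# Average production for each quarter across all years
quarter_means <- tapply(bricksq, cycle(bricksq), mean, na.rm = TRUE)
round(quarter_means, 1)
which.max(quarter_means)   # quarter with the highest average

p_sub <- ggsubseriesplot(bricksq)   # the same means appear as horizontal lines
print(p_sub)
```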

d. sunspotarea

Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.

We can observe a cycle of roughly 10 years in which the sunspot area rises to a maximum and then falls. Because the data are annual and the length of the solar cycle is not fixed to the calendar, this is cyclic rather than seasonal behavior. There is no apparent trend in the growth or reduction of sunspot areas. However, there also appears to be a longer, multi-decade cycle that the span of this data set does not fully capture.
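A sketch, assuming fpp2 is installed. Because sunspotarea is annual (frequency 1), the season-based plots do not apply, so the ACF is the natural place to look for the cycle: recurring peaks at multiples of roughly 10 years would be consistent with the pattern described above.

```r
library(fpp2)  # sunspotarea is bundled with fpp2

frequency(sunspotarea)        # 1: annual data, so no seasonal period
p_time <- autoplot(sunspotarea)
p_acf  <- ggAcf(sunspotarea, lag.max = 50)  # lags are in years
print(p_time); print(p_acf)
```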

e. gasoline

US finished motor gasoline product supplied: Weekly data beginning 2 February 1991, ending 20 January 2017. Units are “million barrels per day”.

There is an overall positive trend in supply between 1991 and 2005. After 2005 the series declines before rising again until the end of the record. The series is also seasonal: supply peaks near week 34, which makes sense because this is the height of summer, when most people travel and consume gasoline. The pattern repeats every year, which explains the ACF peak at a lag of about 52.18 weeks, the average number of weeks in a year (365.25/7).
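The non-integer seasonal period can be checked directly. This sketch assumes fpp2 is installed; the ACF is drawn over about three years of weekly lags so that several annual peaks are visible.

```r
library(fpp2)  # gasoline is bundled with fpp2

freq <- frequency(gasoline)
freq          # non-integer weekly frequency
365.25 / 7    # average number of weeks per year, for comparison

p_time <- autoplot(gasoline)
p_acf  <- ggAcf(gasoline, lag.max = 160)   # about three years of weekly lags
print(p_time); print(p_acf)
```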

Problem 2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

a. Use autoplot() to plot each of these in separate plots.

The first series shows the daily morning price of gold over a multi-year span. It does not appear to show pronounced seasonality. However, there may be a long-term cyclic pattern that the series does not fully capture, which would account for the downward trend that begins around observation 700 (the series starts on 1 January 1985).

The second series shows the quarterly production of wool in Australia. It appears to contain both seasonal effects (a dip always follows a peak) and cyclic behavior spanning decades.

The last series is monthly gas production in Australia. We can observe an increasing trend over the range of the series, as well as a seasonal high/low pattern within each year.
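A minimal sketch of the three separate plots, assuming fpp2 is installed (the forecast package it attaches supplies gold, woolyrnq and gas). The titles are descriptive labels added for readability; autoplot() leaves the missing values in gold as gaps.

```r
library(fpp2)  # forecast supplies gold, woolyrnq and gas

p_gold <- autoplot(gold)     + ggtitle("Daily morning gold price (US$)")
p_wool <- autoplot(woolyrnq) + ggtitle("Quarterly wool production, Australia")
p_gas  <- autoplot(gas)      + ggtitle("Monthly gas production, Australia")
print(p_gold); print(p_wool); print(p_gas)
```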

b. What is the frequency of each series?

The frequency of the gold series is 1. No seasonal period is defined, so the frequency simply equals the base reporting unit: one day.

The frequency of the wool series is 4: there are four quarters in a year before the seasonal pattern repeats.

The frequency of the gas series is 12: there are 12 months before the seasonal pattern repeats.
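All three frequencies can be read in one call with `frequency()`. A sketch, assuming fpp2 is installed:

```r
library(fpp2)  # forecast supplies gold, woolyrnq and gas

# Observations per seasonal period for each series
freqs <- sapply(list(gold = gold, woolyrnq = woolyrnq, gas = gas), frequency)
freqs
```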

c. Use which.max() to spot the outlier in the gold series. Which observation was it?

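The highest value of gold occurs at observation 770, where the price reaches 593.7 dollars. Counting 770 calendar days from the start on 1 January 1985 would place it at 1987-02-10, but the series holds only 1,108 observations between 1 January 1985 and 31 March 1989, far fewer than the number of calendar days, so the index does not map directly to a calendar date.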
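A sketch of the lookup, assuming fpp2 is installed. `which.max()` ignores the missing values in gold, and `length()` shows how many observations the series actually holds, which is why the index should not be read as a count of calendar days.

```r
library(fpp2)  # forecast supplies gold

i <- which.max(gold)   # index of the largest non-missing value
i
gold[i]                # the outlying price
length(gold)           # number of observations in the series
```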

Problem 2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

The first few rows of the company data:

##          Sales AdBudget   GDP
## 1981 Q1 1020.2    659.2 251.8
## 1981 Q2  889.2    589.0 290.9
## 1981 Q3  795.0    512.5 290.8
## 1981 Q4 1003.9    614.1 292.4
## 1982 Q1 1057.7    647.2 279.1
## 1982 Q2  944.4    602.0 254.0

Construct time series plots of each of the three series

With facets=TRUE, autoplot() draws all three series at once, each in its own panel with an independent y-axis and a shared x-axis.

Check what happens when you don’t include facets=TRUE.

Without facets=TRUE, autoplot() draws all three series in a single panel and assigns each series its own color. This arguably improves the style of the plot, at the cost of a common y-axis for series on different scales.
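A self-contained sketch of these steps, assuming fpp2 is installed. To run without the download it reads only the six rows printed earlier through `read.csv(text = ...)`; "Quarter" is a hypothetical header for the unlabeled first column, and with the real file you would call `read.csv("tute1.csv")` instead.

```r
library(fpp2)  # attaches forecast and ggplot2, which provide autoplot() for ts objects

# First six rows of tute1.csv as printed above ("Quarter" is a hypothetical header)
csv_lines <- c(
  "Quarter,Sales,AdBudget,GDP",
  "1981 Q1,1020.2,659.2,251.8",
  "1981 Q2,889.2,589.0,290.9",
  "1981 Q3,795.0,512.5,290.8",
  "1981 Q4,1003.9,614.1,292.4",
  "1982 Q1,1057.7,647.2,279.1",
  "1982 Q2,944.4,602.0,254.0"
)
tute1 <- read.csv(text = csv_lines)

# Drop the label column and declare a quarterly series starting in 1981 Q1
mytimeseries <- ts(tute1[, -1], start = 1981, frequency = 4)

p_faceted <- autoplot(mytimeseries, facets = TRUE)  # one panel per series
p_single  <- autoplot(mytimeseries)                 # one panel, colored lines
print(p_faceted); print(p_single)
```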

Problem 2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

Select one of the time series as follows (but replace the column name with your own chosen column):

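retaildata <- readxl::read_excel("retail.xlsx", skip = 1)  # assumes the file was saved as retail.xlsx
myts <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))

Explore your chosen retail time series using the following functions: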

autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

a. Using tsdisplay()

This function combines a time plot of the series with its autocorrelation (ACF) and partial autocorrelation (PACF) plots in a single figure. It is a quick way to see the most important features of the data.
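The retail workbook is not bundled with any package, so this sketch swaps in `a10` (monthly anti-diabetic drug sales data bundled with fpp2) as a stand-in for `myts`; replace `x` with your own retail series. It assumes fpp2 is installed.

```r
library(fpp2)

x <- a10   # stand-in for myts; swap in your retail series here

tsdisplay(x)     # time plot, ACF and PACF in one base-graphics figure
ggtsdisplay(x)   # the same three panels drawn with ggplot2
head(x)
```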

##       Apr  May  Jun  Jul  Aug  Sep
## 1982 41.7 43.1 40.3 40.9 42.1 42.0

b. Using autoplot()

autoplot() gets straight to the point: it only plots the series values against their time index on the x-axis.

c. Using ggseasonplot()

ggseasonplot() breaks up the series into parallel seasons. In this case, the length of one season is one year and we have folded every season into a circle using the “polar = TRUE” option.
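A sketch of both layouts, again using `a10` from fpp2 as a stand-in for `myts` (an assumption, since the retail file is not bundled). `polar = TRUE` wraps each year around a circle; the second call keeps the flat layout and labels each year's line at both ends.

```r
library(fpp2)

x <- a10   # stand-in for myts
p_polar <- ggseasonplot(x, polar = TRUE) + ggtitle("Polar seasonal plot")
p_flat  <- ggseasonplot(x, year.labels = TRUE, year.labels.left = TRUE)
print(p_polar); print(p_flat)
```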

d. Using ggsubseriesplot()

ggsubseriesplot() breaks up the time series into seasons and collects them together. In our plot, the data for each month across all years are grouped in their own panel, and the mean for each month is drawn as a blue horizontal line.
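The horizontal lines correspond to the per-month means, which can be computed directly with `tapply()` and `cycle()`. A sketch using `a10` from fpp2 as a stand-in for `myts` (an assumption):

```r
library(fpp2)

x <- a10   # stand-in for myts
p_sub <- ggsubseriesplot(x)

# Mean of each calendar month across all years (cycle() gives 1 to 12)
month_means <- tapply(x, cycle(x), mean)
round(month_means, 2)
print(p_sub)
```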

e. Using gglagplot()

gglagplot() plots each observation against an observation that occurred some time earlier. In our example, the lags (previous times) chosen run from 1 to 16 months. The 12-month lag shows the strongest correlation of the data with itself, which makes sense because the pattern follows an annual (12-month) season.
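Each panel of the lag plot can be summarized by the correlation between the series and its k-step lagged copy. This sketch uses `a10` from fpp2 as a stand-in for `myts` (an assumption) and computes a Pearson correlation over the overlapping pairs, so the values differ slightly from the ACF, which uses the overall mean and a common denominator.

```r
library(fpp2)

x <- a10   # stand-in for myts
p_lag <- gglagplot(x, lags = 16)   # scatter plots of y_t against y_{t-k}, k = 1..16

# Pearson correlation between the series and its k-step lagged copy
xv <- as.numeric(x)
n  <- length(xv)
lag_cor <- sapply(1:16, function(k) cor(xv[(k + 1):n], xv[1:(n - k)]))
names(lag_cor) <- paste0("lag", 1:16)
round(lag_cor, 3)
names(which.max(lag_cor))   # lag with the strongest linear relationship
print(p_lag)
```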

f. Using ggAcf()

ggAcf() plots the autocorrelation coefficients for a range of lag values. The strong positive trend produces large positive autocorrelations at short lags. The ACF values also peak at the seasonal 12-month lag that we already saw in gglagplot().
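The coefficients behind the plot are available from base R's `acf()`. A sketch using `a10` from fpp2 as a stand-in for `myts` (an assumption); the selected rows show lags 1, 6, 12 and 24.

```r
library(fpp2)

x <- a10   # stand-in for myts
p_acf <- ggAcf(x, lag.max = 36)   # three years of monthly lags

r <- acf(x, lag.max = 36, plot = FALSE)
# r$acf holds lags 0..36, so lag k sits at position k + 1
acf_table <- data.frame(lag = 0:36, acf = round(as.numeric(r$acf), 3))
acf_table[c(2, 7, 13, 25), ]
print(p_acf)
```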

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

The time series clearly shows a strong positive trend (it keeps increasing overall) and strong seasonality (sales always peak in December). Within the available data there does not seem to be a cycle; the only features in play are the upward trend and the seasonality.

*https://afit-r.github.io/ts_exploration*