Use the help function to explore what the series gold, woolyrnq and gas represent.
The gold time series provides daily morning gold prices in US dollars from 1 January 1985 to 31 March 1989. woolyrnq provides quarterly production of woollen yarn in Australia, in tonnes, from March 1965 to September 1994. gas provides Australian monthly gas production from 1956 to 1995.
library(fpp2)  # loads the data sets used below plus the forecast and ggplot2 packages
#help(gold)
#help(woolyrnq)
#help(gas)
Use autoplot() to plot each of these in separate plots.
autoplot(gold)+
ggtitle("Gold prices time series")
autoplot(woolyrnq)+
ggtitle("Wool production time series")
autoplot(gas)+
ggtitle("Gas production time series")
What is the frequency of each series? Hint: apply the frequency() function.
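The frequencies of the three series are 1, 4 and 12 respectively. gold has frequency 1, so no seasonal period is defined for it: its daily observations are simply indexed one after another. For woolyrnq (quarterly) and gas (monthly), the frequency is the number of observations per year.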
frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
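To see what the frequency attribute controls, here is a minimal base-R sketch on a made-up quarterly series (illustrative values, not woolyrnq): frequency() reports observations per year, cycle() gives each observation's position within the year, and time() gives the decimal time index.

```r
# Made-up quarterly series starting in Q1 1965 (illustrative values only).
x <- ts(c(10, 12, 14, 11, 13, 15, 17, 14), start = c(1965, 1), frequency = 4)

frequency(x)  # 4 observations per year
cycle(x)      # position within the year: 1 2 3 4 1 2 3 4
time(x)       # decimal time: 1965.00, 1965.25, 1965.50, ...

# With frequency 1 (as for gold) there is no within-year position,
# so every observation sits at cycle position 1.
y <- ts(c(5, 7, 6), frequency = 1)
cycle(y)      # 1 1 1
```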
Use which.max() to spot the outlier in the gold series. Which observation was it?
Observation 770 is the outlier: it holds the largest value in the series.
which.max(gold)
## [1] 770
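which.max() only returns a position. This minimal sketch on a made-up daily series (not the real gold data) shows how to pull out the value at that position as well. Note that which.max() skips missing values, which matters for a series with gaps such as gold.

```r
# Made-up daily prices with one missing value and one suspicious spike.
prices <- ts(c(300.5, 301.2, NA, 299.8, 450.0, 302.1))

i <- which.max(prices)  # NA values are ignored; returns the index 5
prices[i]               # value at that index: 450

# Typical level of the remaining observations, for comparison with the spike.
median(prices[-i], na.rm = TRUE)  # 300.85

# Same pattern on the real data (needs the forecast package): gold[which.max(gold)]
```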
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labeled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
#You can read the data into R with the following script:
tute1 <- read.csv("tute1.csv", header=TRUE)
head(tute1)
## X Sales AdBudget GDP
## 1 Mar-81 1020.2 659.2 251.8
## 2 Jun-81 889.2 589.0 290.9
## 3 Sep-81 795.0 512.5 290.8
## 4 Dec-81 1003.9 614.1 292.4
## 5 Mar-82 1057.7 647.2 279.1
## 6 Jun-82 944.4 602.0 254.0
# Convert the data to time series
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
#(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
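As a quick check of what ts() does with a data frame, this sketch rebuilds only the six rows printed by head(tute1) above (values copied from that output) and converts them the same way. The result is a multiple time series whose columns keep their original names; the name head_ts is just for illustration.

```r
# First six rows of tute1, copied from the head(tute1) output above.
tute1_head <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

# Drop the quarter labels and declare quarterly data starting in Q1 1981.
head_ts <- ts(tute1_head[, -1], start = 1981, frequency = 4)

colnames(head_ts)    # "Sales" "AdBudget" "GDP"
end(head_ts)         # 1982 2, i.e. Q2 1982 (the "Jun-82" row)
head_ts[, "Sales"]   # a single quarterly series can be pulled out by name
```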
# Construct time series plots of each of the three series
autoplot(mytimeseries, facets=TRUE)
autoplot(mytimeseries)
#Check what happens when you don’t include facets=TRUE.
# Without facets=TRUE, all three series are drawn on the same set of axes, sharing one y-axis scale, instead of in separate panels (small multiples).
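The same choice exists in base R's plot.ts(), used here only as a swapped-in analogue of autoplot() so the sketch runs without ggplot2: plot.type = "multiple" gives one panel per series (like facets = TRUE), while plot.type = "single" puts everything on shared axes. With made-up series on very different scales, the shared y-axis flattens the smaller one, which is the main reason to prefer separate panels here.

```r
# Two made-up quarterly series on very different scales.
z <- ts(cbind(small = c(1, 3, 2, 4, 3, 5),
              large = c(100, 300, 200, 400, 300, 500)),
        start = 1981, frequency = 4)

pdf(NULL)                        # draw to a null device so this runs non-interactively
plot(z, plot.type = "multiple")  # one panel per series, each with its own y-axis
plot(z, plot.type = "single")    # both on one set of axes; "small" looks nearly flat
invisible(dev.off())
```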
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
#You can read the data into R with the following script:
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
#The second argument (skip=1) is required because the Excel sheet has two header rows.
#Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349903C"],
frequency=12, start=c(1982,4))
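In start = c(1982, 4), the second number is the period within the year, so with frequency = 12 the series begins in April 1982. A sketch with 24 made-up values (a hypothetical stand-in for one retail column) confirms this:

```r
# 24 made-up monthly values standing in for one retail column.
demo_ts <- ts(seq(100, 123), frequency = 12, start = c(1982, 4))

start(demo_ts)    # 1982 4 -> April 1982
time(demo_ts)[1]  # 1982.25: April starts 3/12 of the way through the year
end(demo_ts)      # 1984 3 -> March 1984, 24 months later
```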
#Explore your chosen retail time series using the following functions:
autoplot(myts)
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts)
ggAcf(myts)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
There is a clear upward trend over time, although its rate changes: sales rise steadily during the 1980s, level off in the early-to-mid 1990s, then increase sharply from the late 1990s through about 2005. The series keeps increasing after that, but more slowly. Seasonality is a major factor: values peak in December, a second, smaller bump usually occurs in May, and February is consistently the lowest month. Sales show high serial correlation, driven mainly by the underlying trend.
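The link between a trend and high serial correlation can be checked numerically. This sketch uses a made-up monthly series with a linear trend and a December bump (not the retail data) and computes autocorrelations with base R's acf(), the same quantities ggAcf() plots (ggAcf() leaves out lag 0).

```r
# Made-up monthly series: linear trend plus a 30-unit bump every December.
x <- ts(100 + 2 * (1:120), start = 1982, frequency = 12)
x <- x + 30 * (cycle(x) == 12)

r <- acf(x, lag.max = 24, plot = FALSE)
ac <- r$acf[-1]   # drop lag 0

round(ac[1], 2)   # lag 1: close to 1 because neighbouring values share the trend
all(ac > 0)       # autocorrelations stay positive and decay slowly
```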
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
hsales: There is no long-term trend over the period covered. Seasonality is a major factor: home sales are highest in March, decrease through the rest of the calendar year, then pick up again in January and February. An argument could be made for cycles, with noticeable troughs roughly every 7 to 9 years (1975, 1984, 1991), but a longer series would be needed to confirm this. The potential cycles also show up as slow oscillations in the ACF plot, which requests up to 300 monthly lags (25 years).
autoplot(hsales)
ggseasonplot(hsales, polar = TRUE)
ggsubseriesplot(hsales)
gglagplot(hsales, set.lags = 1:12, do.lines = FALSE)
ggAcf(hsales, lag.max = 300)
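Two details behind the ACF plot can be checked with a made-up series. First, a regular cycle shows up as oscillating autocorrelations: positive one full cycle apart and negative half a cycle apart. Second, base R's acf() caps lag.max at one less than the series length, so asking for 300 lags from a shorter series returns fewer. This sketch calls stats::acf() directly; ggAcf() is assumed to behave the same way, since the forecast package computes it with acf().

```r
# Made-up series of 100 observations with a regular 8-observation cycle.
x <- ts(sin(2 * pi * (1:100) / 8))

r <- acf(x, lag.max = 300, plot = FALSE)
dim(r$acf)[1] - 1   # lags actually computed: 99, not 300

ac <- r$acf[-1]     # drop lag 0
round(ac[8], 2)     # one full cycle apart: strongly positive
round(ac[4], 2)     # half a cycle apart: strongly negative
```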
usdeaths: The series covers only a few years, so little can be said about trend or cycles, but strong seasonality is clearly present in accidental deaths in the US: the summer months show much higher levels than the cold-weather months.
autoplot(usdeaths)
ggseasonplot(usdeaths)
ggsubseriesplot(usdeaths)
gglagplot(usdeaths, lags = 12, do.lines = FALSE)
ggAcf(usdeaths, lag.max = 12)
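A lag plot for lag k graphs each value against the value k observations earlier, so the correlation in each panel can be computed directly with base R's embed(). This sketch uses a made-up monthly series with a July peak and a small drift (not the real usdeaths data): values 12 months apart line up almost perfectly, while values 6 months apart move in opposite directions.

```r
# Made-up monthly series: seasonal pattern peaking in July plus a small linear drift.
month <- rep(1:12, times = 6)
y <- ts(100 + 20 * cos(2 * pi * (month - 7) / 12) + 0.1 * seq_along(month),
        start = c(1973, 1), frequency = 12)

# Each row of embed(v, 13) is v[t], v[t-1], ..., v[t-12], so column k + 1 holds lag k.
E <- embed(as.numeric(y), 13)
cor(E[, 1], E[, 13])  # lag 12: same month one year apart, close to 1
cor(E[, 1], E[, 7])   # lag 6: opposite season, strongly negative
```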
bricksq: There is an upward trend in the data, but cycles appear to be more important. Seasonality is not pronounced, except that Q1 is typically lower than the other quarters. There is a large amount of autocorrelation.
autoplot(bricksq)
ggseasonplot(bricksq, polar = TRUE)
ggsubseriesplot(bricksq)
gglagplot(bricksq, do.lines = FALSE)
ggAcf(bricksq, lag.max = 300)
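The horizontal lines that ggsubseriesplot() draws are the means of each season across years. Computing them with tapply() and cycle() gives a numeric check of a claim such as "Q1 is typically lower"; this sketch uses a made-up quarterly series rather than bricksq.

```r
# Made-up quarterly series: upward drift, with Q1 set 25 units below the other quarters.
q <- ts(200 + 1:40 - 25 * (rep(1:4, 10) == 1), start = c(1956, 1), frequency = 4)

# Mean of each quarter across all years. The drift nudges later quarters up slightly,
# which is why Q2 to Q4 differ by one unit each.
qmeans <- tapply(q, cycle(q), mean)
qmeans             # 194 220 221 222
which.min(qmeans)  # quarter 1
```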
sunspotarea: The data are cyclic, with peaks and troughs recurring approximately every 10 to 11 years, as the lag plot shows. The series is annual, so seasonality does not apply.
autoplot(sunspotarea)
# ggseasonplot() and ggsubseriesplot() need seasonal data (frequency > 1),
# so they are skipped for this annual series.
gglagplot(sunspotarea, set.lags = 5:15, do.lines = FALSE)
ggAcf(sunspotarea, lag.max=100)
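The ACF can also give a rough estimate of a cycle length: look for the lag with the largest autocorrelation in a plausible range. This sketch uses a made-up annual series with an exact 11-year cycle (not the real sunspotarea data), so the answer is known in advance. It also shows why seasonal plots do not apply: an annual series has frequency 1.

```r
# Made-up annual series with an exact 11-year cycle.
yrs <- 1875:2015
x <- ts(1000 + 800 * sin(2 * pi * (yrs - 1875) / 11), start = 1875)

frequency(x)   # 1: annual data, so there is no season to plot

ac <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]  # lags 1 to 20
candidate_lags <- 5:15
candidate_lags[which.max(ac[candidate_lags])]     # 11
```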
The gas time series shows an increasing trend, with production ramping up sharply from about 1970 to about 1985. There is strong seasonality, with July the highest month of production; since gas records Australian gas production, this peak corresponds to the Southern Hemisphere winter. There is also strong serial correlation, driven by the rapidly increasing long-term trend. Note that the exercise lists gasoline (weekly US finished motor gasoline product supplied), which is a different series; the plots below use gas.
autoplot(gas)
ggseasonplot(gas, polar = TRUE)
ggsubseriesplot(gas)
gglagplot(gas)
ggAcf(gas, lag.max = 100)