library(fpp2)
library(zoo)
library(plotly)
Use the help function to explore what the series gold, woolyrnq and gas represent.
gold: Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.
woolyrnq: Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
gas: Australian monthly gas production: 1956–1995.
Use autoplot() to plot each of these in separate plots.
data(gold)
ggplotly(autoplot(gold))
data(woolyrnq)
ggplotly(autoplot(woolyrnq))
data(gas)
ggplotly(autoplot(gas))
What is the frequency of each series? Hint: apply the frequency() function.
frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
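The frequency of a ts object is the number of observations per seasonal period, and it is fixed when the object is created. The following is a minimal sketch using placeholder values (not the fpp2 data) to show how frequencies of 1, 4 and 12 arise:

```r
# Placeholder values; only the frequency argument matters here.
no_season <- ts(rnorm(20))                                    # default frequency = 1
quarterly <- ts(rnorm(20), start = c(1965, 1), frequency = 4)
monthly   <- ts(rnorm(24), start = c(1956, 1), frequency = 12)

freqs <- c(frequency(no_season), frequency(quarterly), frequency(monthly))
print(freqs)
```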
Use which.max() to spot the outlier in the gold series. Which observation was it?
#position of the outlier
which.max(gold)
## [1] 770
#value of the outlier
gold[which.max(gold)]
## [1] 593.7
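The same idea can be extended to recover when the largest value occurred, not only its position. This is a small sketch on a made-up quarterly series with one injected spike; the variable names are illustrative only:

```r
# Synthetic series with one injected spike (hypothetical data).
y <- ts(c(10, 12, 11, 13, 50, 12, 11, 14), start = c(2000, 1), frequency = 4)

idx       <- which.max(y)    # position of the largest observation
peak      <- y[idx]          # its value
peak_when <- time(y)[idx]    # its time index (year plus fraction of a year)

print(c(index = idx, value = peak, time = peak_when))
```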
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
You can read the data into R with the following script:
tute1 <- read.csv("http://otexts.com/fpp2/extrafiles/tute1.csv", header=TRUE)
head(tute1)
## X Sales AdBudget GDP
## 1 Mar-81 1020.2 659.2 251.8
## 2 Jun-81 889.2 589.0 290.9
## 3 Sep-81 795.0 512.5 290.8
## 4 Dec-81 1003.9 614.1 292.4
## 5 Mar-82 1057.7 647.2 279.1
## 6 Jun-82 944.4 602.0 254.0
Convert the data to time series
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
#(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
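To see what the conversion does without downloading the file, here is a sketch that rebuilds the first six rows printed above (Sales and AdBudget only) and converts them the same way. The data frame `df` is a stand-in for tute1, and window() is used to subset by time rather than by row number:

```r
# Stand-in for tute1: the label column plus two of the numeric columns.
df <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0)
)

# Drop the label column; start and frequency define the time index instead.
qts <- ts(df[, -1], start = 1981, frequency = 4)

# Keep only the observations from 1982 Q1 onwards.
y1982 <- window(qts, start = c(1982, 1))
print(y1982)
```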
Construct time series plots of each of the three series
ggplotly(autoplot(mytimeseries, facets=TRUE))
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
You can read the data into R with the following script:
temp = tempfile(fileext = ".xlsx")
dataURL <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
download.file(dataURL, destfile=temp, mode='wb')
retaildata <- readxl::read_excel(temp, skip=1)
#The second argument (skip=1) is required because the Excel sheet has two header rows.
head(retaildata[,"A3349396W"])
## # A tibble: 6 x 1
## A3349396W
## <dbl>
## 1 3396.
## 2 3498.
## 3 3358.
## 4 3487.
## 5 3356.
## 6 3454.
Select one of the time series as follows (but replace the column name with your own chosen column):
myts <- ts(retaildata[,"A3349396W"], frequency=12, start=c(1982,4))
Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
The time plot shows annual seasonality and an upwards trend:
theme_set(theme_light(base_size = 12))
p <- autoplot(myts) + ggtitle("Retail Data A3349396W time plot")
ggplotly(p)
The seasonal plot confirms the seasonality and upwards trend found in the time plot. It also shows that the upwards trend has been accelerating.
ggseasonplot(myts, polar = TRUE) + ggtitle("Retail Data A3349396W season plot")
This subseries plot confirms the upwards trend, the peak in December, and the trough in February. Beyond this, the subseries plot does not tell us much about this time series.
ggsubseriesplot(myts) + ggtitle("Retail Data A3349396W season subseries plot")
The lagplot has the strongest linear relationship at lag 12, confirming that the data has annual seasonality. However, it also shows a positive linear relationship for most of the lag plots. This is because the time series is nearly always increasing.
gglagplot(myts) + ggtitle("Retail Data A3349396W lag plot")
The autocorrelation plot has r12 slightly higher than the other lags, due to the annual seasonal pattern in the data. All lags are higher than the dashed blue lines, which indicates the correlations are significantly different from 0.
ggAcf(myts) + ggtitle("Retail Data A3349396W autocorrelation plot")
Each of these plots has confirmed that the series has an annual seasonality and an upwards trend.
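The combination of a trend and annual seasonality can be reproduced on simulated data. The sketch below builds a hypothetical monthly series (linear trend plus a 12-month cycle plus noise, not the retail data) and inspects its autocorrelations with the base acf() function. The bound used is the usual large-sample approximation, which should be close to the dashed lines drawn by ggAcf():

```r
set.seed(1)
n  <- 120
tt <- seq_len(n)
# Hypothetical monthly series: linear trend + annual cycle + noise.
y <- ts(0.5 * tt + 10 * sin(2 * pi * tt / 12) + rnorm(n),
        start = c(1982, 4), frequency = 12)

r     <- acf(y, lag.max = 24, plot = FALSE)$acf[-1]  # drop lag 0, so r[k] is lag k
bound <- qnorm(0.975) / sqrt(n)                      # approximate 95% bound

print(round(r[c(1, 6, 12)], 2))   # lag 12 sits above lag 6 because of seasonality
print(all(r[1:12] > bound))       # the trend keeps every early lag above the bound
```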
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
Monthly sales of new one-family houses sold in the USA since 1973.
These plots have indicated that the series has an annual seasonality and is strongly dependent on the previous month of data. There is also some evidence of cyclical behavior every 6-9 years.
The time plot shows:
theme_set(theme_light(base_size = 12))
p <- autoplot(hsales) + ggtitle("hsales time plot")
ggplotly(p)
The seasonal plot confirms the seasonality. It also reveals a peak in March and trough in December.
ggseasonplot(hsales, polar = FALSE) + ggtitle("hsales season plot")
This subseries plot shows the seasonal behavior of the time series (increasing from January - March, decreasing from March - December).
ggsubseriesplot(hsales) + ggtitle("hsales season subseries plot")
The lag plot has the strongest linear relationship at lag 1. The relationship weakens as the lag increases, until lag 12, where it becomes somewhat linear again. This reflects that the time series has some annual seasonality, but mainly depends on the previous month of data.
gglagplot(hsales) + ggtitle("hsales lag plot")
The autocorrelation plot has r12 higher than the other lags, due to the annual seasonal pattern in the data. Even higher is r1, which confirms that the previous month in the time series is indicative of the next. All lags except 16 - 22 are higher than the dashed blue lines, which indicates the correlations are significantly different from 0.
ggAcf(hsales) + ggtitle("hsales autocorrelation plot")
Monthly accidental deaths in USA.
These plots have indicated that the series has an annual seasonality with a peak in July and a trough in February. There is no significant evidence of trend or cyclic behavior.
The time plot shows:
theme_set(theme_light(base_size = 12))
p <- autoplot(usdeaths) + ggtitle("usdeaths time plot")
ggplotly(p)
The seasonal plot confirms the annual seasonality. It also hints at a downward trend, but there is not enough data to confirm.
ggseasonplot(usdeaths, polar = TRUE) + ggtitle("usdeaths season plot")
This subseries plot shows the seasonal behavior of the time series (decreasing from July - February, increasing from February - July).
ggsubseriesplot(usdeaths) + ggtitle("usdeaths season subseries plot")
The lagplot has the strongest linear relationship at lag 12, confirming the annual seasonality present in the time series.
gglagplot(usdeaths) + ggtitle("usdeaths lag plot")
The autocorrelation plot has r12 higher than the other lags, due to the annual seasonal pattern in the data. Even higher is r1, which shows that the previous month in the time series is indicative of the next. r6 and r18 show strong negative correlations, which further confirm the annual seasonality.
ggAcf(usdeaths) + ggtitle("usdeaths autocorrelation plot")
Australian quarterly clay brick production: 1956–1994.
These plots have indicated that the series has an annual seasonality and an upwards trend. There is also evidence of cyclic behavior every 8 years.
The time plot shows:
theme_set(theme_light(base_size = 12))
p <- autoplot(bricksq) + ggtitle("bricksq time plot")
ggplotly(p)
The seasonal plot confirms the seasonality. It also confirms the trough in Q1 and peak in Q3, as well as the upwards trend.
ggseasonplot(bricksq, polar = TRUE) + ggtitle("bricksq season plot")
This subseries plot doesn’t tell us too much about seasonality, but it does show that the rises and falls in the time series are consistent across quarters.
ggsubseriesplot(bricksq) + ggtitle("bricksq season subseries plot")
The lagplot has the strongest linear relationship at lag 1, but all the lag plots are somewhat linear (with increased variability at higher values).
gglagplot(bricksq) + ggtitle("bricksq lag plot")
The autocorrelation plot has peaks at r4, r8, r12 due to the annual seasonal pattern in the data. Even higher is r1, which confirms that the previous quarter in the time series is indicative of the next. All lags are higher than the dashed blue lines, which indicates the correlations are positive and significantly different from 0.
ggAcf(bricksq) + ggtitle("bricksq autocorrelation plot")
Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.
These plots have indicated that the series only showcases cyclic behavior every ~10-12 years.
The time plot shows:
theme_set(theme_light(base_size = 12))
p <- autoplot(sunspotarea) + ggtitle("sunspotarea time plot")
ggplotly(p)
The seasonal plot and the subseries plot are omitted: the data are annual, so they are not seasonal.
The lagplot has the strongest linear relationship at lag 10 - this is further confirmation that the data is cyclic (on a 10 - 12 year period).
gglagplot(sunspotarea, lag = 12) + ggtitle("sunspotarea lag plot")
The autocorrelation plot has positive correlation peaks at r1, r10, r21, and r33. Negative correlation peaks occur at r5, r16, and r17. This confirms that each cycle is 10 - 12 years.
ggAcf(sunspotarea, lag.max = 36) + ggtitle("sunspotarea autocorrelation plot")
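Reading a cycle length off the autocorrelations can also be done numerically. The sketch below uses a simulated annual series with a built-in 11-year cycle (an assumption chosen for illustration, not the sunspotarea data) and looks for the first positive peak in the autocorrelations after the short lags:

```r
set.seed(42)
n  <- 140
yr <- seq_len(n)
# Hypothetical annual series with an 11-year cycle plus noise.
y <- ts(100 + 50 * sin(2 * pi * yr / 11) + rnorm(n, sd = 10), start = 1875)

r <- acf(y, lag.max = 36, plot = FALSE)$acf[-1]   # r[k] is the lag-k autocorrelation

# Search lags 5 to 16 for the highest autocorrelation (the first full cycle).
cycle <- (5:16)[which.max(r[5:16])]
print(cycle)
```

Because the series has frequency 1, this repeating pattern is a cycle rather than seasonality, even though the autocorrelation plot looks similar to that of a seasonal series.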
These plots have indicated that the series has an annual seasonality and has a mostly upwards trend. No evidence of cyclic behavior was seen.
The time plot shows:
theme_set(theme_light(base_size = 12))
p <- autoplot(gasoline) + ggtitle("gasoline time plot")
ggplotly(p)
The seasonal plot confirms the annual seasonality in this weekly series, and the upwards trend. It also reveals a peak in weeks 30 - 39 and a trough during weeks 5 - 11.
ggseasonplot(gasoline, polar = TRUE) + ggtitle("gasoline season plot")
Since there are not exactly 52 weeks in a year, the frequency is non-integer and the plot produces an error.
#ggsubseriesplot(gasoline, frequency = 52) + ggtitle("gasoline season subseries plot")
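The non-integer frequency can be checked directly. This sketch assumes a weekly series stored with 365.25 / 7 observations per year, the average number of weeks in a year. The values are random placeholders, not the gasoline data:

```r
# Hypothetical weekly series with a non-integer seasonal period.
weekly <- ts(rnorm(200), start = 1991.1, frequency = 365.25 / 7)

f <- frequency(weekly)
print(f)                                 # about 52.18
print(isTRUE(all.equal(f, round(f))))    # FALSE: not a whole number of weeks
```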
The lagplot has the strongest linear relationship at lag 1 and the linear relationship continues to weaken as the lag continues. This reflects that this time series mainly depends on the previous week of data.
gglagplot(gasoline) + ggtitle("gasoline lag plot")
The autocorrelation plot has ~r52 and ~r102 higher than the other lags, indicating an annual seasonal pattern in the data. There are troughs at ~r26 and ~r78 - this further confirms annual seasonality.
ggAcf(gasoline) + ggtitle("gasoline autocorrelation plot")
The predictability of an event or quantity depends on several factors:
Forecasting methods depend on what data are available and the predictability of the quantity to be forecast. Methods include:
1. Naive method - Using the most recent observation as a forecast.
2. Judgemental forecasting - For when historical data are not available. More accurate when the forecaster has important domain knowledge and more up-to-date data.
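The naive method is simple enough to write by hand. In this sketch, `naive_forecast` is a hypothetical helper defined here for illustration, and the series values are made up:

```r
# Naive forecast: repeat the last observed value for each of the next h periods.
naive_forecast <- function(y, h = 1) {
  rep(as.numeric(tail(y, 1)), h)
}

y  <- ts(c(112, 118, 132, 129, 121), start = 2001)  # made-up annual values
fc <- naive_forecast(y, h = 3)
print(fc)
```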
Forecasting: Predicting the future as accurately as possible, using historical data and knowledge of future events. May be short-term (e.g. scheduling, demand), medium-term (e.g. resource requirements), or long-term (e.g. strategic planning).
Goals: Events that we would like to happen.
Planning: A response to forecasts and goals.
ts objects
Time series can be stored as a ts object in R.
Use autoplot() in R to visualize a time series.
Seasonal plots show data plotted against individual seasons to help identify underlying seasonal patterns and see where the pattern changes. Option - set polar = TRUE to see the plot on polar coordinates.
Subseries plots collect the data for each season in separate mini time plots.
Scatterplots are used to see relationships between predictor variables.
* Facet by a feature
* Find correlation between two time series
* Correlation matrix
Note - The kth lag is the time period that happened k time periods before time i. Helpful in identifying seasonality.
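A lag plot pairs each observation with the one k periods before it. The pairing can be sketched in base R as follows, using made-up values and k = 2:

```r
y <- c(5, 7, 6, 9, 8, 11, 10, 13)   # made-up values
k <- 2

current <- y[(k + 1):length(y)]     # y_t for t = k+1, ..., T
lagged  <- y[1:(length(y) - k)]     # y_{t-k}, the value k periods earlier

lag_pairs <- cbind(current, lagged) # the points a lag-k plot would draw
print(lag_pairs)
print(cor(current, lagged))
```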
Autocorrelation measures the linear relationship between lagged values of a time series.
* When data have a trend, autocorrelations for small lags tend to be large and positive
* When data are seasonal, autocorrelations will be larger at multiples of the seasonal frequency
* When data are both trended and seasonal, autocorrelations will show a combination of these effects
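The lag-k autocorrelation is usually computed as the sum of products of deviations from the mean at lag k, divided by the total sum of squared deviations. The sketch below writes that out by hand and compares it with the base acf() function. `autocorr` is a hypothetical helper and the data are made up:

```r
# r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
autocorr <- function(y, k) {
  n    <- length(y)
  ybar <- mean(y)
  num  <- sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar))
  den  <- sum((y - ybar)^2)
  num / den
}

y <- c(5, 7, 6, 9, 8, 11, 10, 13, 12, 15)   # made-up values with a trend
by_hand  <- sapply(1:3, function(k) autocorr(y, k))
built_in <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]   # element 1 is lag 0

print(round(by_hand, 3))
print(isTRUE(all.equal(by_hand, built_in)))
```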
White noise - time series that show no autocorrelation.
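For white noise, roughly 95% of the sample autocorrelations are expected to fall within about ±2/√T, where T is the length of the series. A quick simulated check, under the assumption of independent normal noise:

```r
set.seed(123)
n  <- 500
wn <- ts(rnorm(n))                  # simulated white noise

r      <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1]   # lags 1 to 20
bound  <- 2 / sqrt(n)               # approximate 95% limits
inside <- mean(abs(r) < bound)      # share of lags within the limits
print(inside)
```

If clearly more than about 5% of the spikes fall outside these limits, the series is probably not white noise.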