The data below represent the monthly sales (in thousands) of product A for a plastics manufacturer for years 1 through 5 (data set plastics
).
if (!require("fma")) install.packages("fma")
library(fma)
plastics
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 742 697 776 898 1030 1107 1165 1216 1208 1131 971 783
## 2 741 700 774 932 1099 1223 1290 1349 1341 1296 1066 901
## 3 896 793 885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4 951 861 938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013
Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend?
The Time Plot is a line graph that plots each observed value against the time of the observation, with a single line connecting each observation across the entire period. This plot is useful for graphically interpreting time series data.
plot(plastics, ylab="Sales (Thousands)", xlab="Month", main="Plastics Time Plot")
This Time Plot shows very clearly that the data have an overall upward trend and seasonal fluctuations. The seasonality results in decreased sales at the beginning of each year and increased sales toward the end of each year.
Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
The Decomposition Plot graphs the observed values, the underlying trend, seasonality, and randomness of the data. The method the decompose
function that produces the plot uses for extracting components is Seasonal and Trend decomposition using Loess. This method is can handle any tape of seasonality and is robust to outliers.
plot(decompose(plastics, type="multiplicative"))
Plotting the trend-cycle and seasonal indices (\(m\)) computed by the multiplicative decomposition decompose
function shows that the data have an upward trend, seasonal fluctuations, and fairly random residuals.
Do the results support the graphical interpretation from part (a)?
Comparison of results of 6.2.a and 6.2.b can be furthered by looking at a Seasonal Plot. A Seasonal Plot is a line graph that plots each observed value against the time of the observation, but with seasons in the x-axis and separate lines connecting the observations for a given year across seasons.
seasonplot(plastics, ylab="Sales (Thousands)", xlab="Month",
year.labels.left=TRUE, main="Plastics Seasonal Plot", col=1:20, pch=19)
The results of the classical multiplicative decomposition output are in accordance with the graphical interpretation. There was an upward trend and seasonal fluctuation. This is further evidenced in the Seasonal Plot. In this time series, those components were relatively easy to see.
Compute and plot the seasonally adjusted data.
Seasonally Adjusted data are derived by removing the seasonal component from the original data. This is achieved by first doing a Seasonal and Trend decomposition using Loess (STL) then applying the seasadj
function. STL is a very versatile and robust method for decomposing a time series. The function stl
decomposes a time series into seasonal, trend, and irregular components using loess. The trend window parameter (t.window
) controls wiggliness of trend component and the seasonal window parameter (s.window
) controls variation on seasonal component.
plot(plastics, col="grey", ylab="Sales (Thousands)", xlab="Month", main="Seasonally Adjusted Plastics")
lines(seasadj(stl(plastics, s.window="periodic")), col="red", ylab="Seasonally Adjusted")
Seasonal adjusting the data is relatively easy once the time series is decomposed using STL. The stl
function s.window
parameter is set to periodic
so that the span (in lags) of the loess window for seasonal extraction is calculated using defaults. The resulting plot shows the seasonally adjusted data represented by a red line. This seasonally adjusted data still contains the trend-cycle and remainder components which make it go upward in a jagged pattern.
Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
The observation to be changed is chosen at random. The R
function sample()
takes a sample of the specified size from the elements of \(x\). Using this function, a randomly selecti element from plastics
to increase by 500 is selected. A Time Plot comparison is then made of the original and modified seasonally adjusted data.
plastics2 <- plastics
set.seed(2018)
index <- sample(1:length(plastics), 1)
obs <- plastics2[index]
data.frame(index, obs, outlier=obs+500)
## index obs outlier
## 1 21 1341 1841
plastics2[index] <- obs + 500
par(mfrow=c(1, 2))
plot(plastics, col="grey", ylab="Sales (Thousands)", xlab="Month", main="Seasonally Adjusted")
lines(seasadj(stl(plastics, s.window="periodic")), col="red", ylab="Seasonally Adjusted")
plot(plastics2, col="grey", ylab="Sales (Thousands)", xlab="Month", main="Random Outlier")
lines(seasadj(stl(plastics2, s.window="periodic")), col="red", ylab="Seasonally Adjusted")
First the seed is set to facilitate reproducible research. After duplicating the dataset, a random index number is selected and its corresponding value is increased by 500. This value is the outlier. Once the original seasonally adjusted data and modified seasonally adjusted data are plotted side by side, it can be seen that the outlier creates a spike in the Seasonally Adjusted data but does little to increase the volatility of the seasonally adjusted data.
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
The observation to be changed is final element. It will be increased by 500 and a Time Plot comparison will then be made of the original and modified seasonally adjusted data.
plastics3 <- plastics
plastics2[length(plastics)] <- obs + 500
par(mfrow=c(1, 2))
plot(plastics, col="grey", ylab="Sales (Thousands)", xlab="Month", main="Seasonally Adjusted")
lines(seasadj(stl(plastics, s.window="periodic")), col="red", ylab="Seasonally Adjusted")
plot(plastics3, col="grey", ylab="Sales (Thousands)", xlab="Month", main="Trailing Outlier")
lines(seasadj(stl(plastics3, s.window="periodic")), col="red", ylab="Seasonally Adjusted")
After duplicating the dataset, the final element in the dataset increased by 500. This value is the outlier. Once the original seasonally adjusted data and modified seasonally adjusted data are plotted side by side, it can be seen that the outlier at the end of the time series has no effected on the Seasonally Adjusted data.
Use a random walk with drift to produce forecasts of the seasonally adjusted data.
Seasonal adjusting the data is relatively easy once the time series is decomposed using STL. The stl
function t.window
parameter is set to NULL
and s.window
parameter set to periodic
so that the span (in lags) of the loess window for trend/seasonal extraction is calculated using defaults. A random walk model with drift forecast, equivalent to an ARIMA(0,1,0) model with a drift coefficient, is then implemented using the rwf
function with the drift
parameter set to TRUE
. The data are then plotted with a Time Plot.
fit <- stl(plastics, t.window=NULL, s.window="periodic", robust=TRUE)
plot(rwf(seasadj(fit), drift=T), ylab="Sales (Thousands)", xlab="Month",
main="Naive Random Walk Forecast of Seasonally Adjusted Data")
The forecast reflects the trend in its drift, but ignores seasonality. Seasonality is potentially captured in the prediction intervals, but the accuracy of predictions will be too low to be of technically use. Overall, the forecast and its prediction intervals are of minimal use.
Reseasonalize the results to give forecasts on the original scale.
Forecasts are “reseasonalized” by adding a seasonal component to the forecast. This is achieved by first doing a Seasonal and Trend decomposition using Loess (STL), then forecasting without seasonal adjustment. For STL, the t.window
parameter is set to NULL
and s.window
parameter set to periodic
so that the span (in lags) of the loess window for trend/seasonal extraction is calculated using defaults. A random walk model with drift forecast, equivalent to an ARIMA(0,1,0) model with a drift coefficient, is then implemented using the forecast
function with the method
parameter set to rwdrift
. The data are then plotted with a Time Plot.
fit <- stl(plastics, t.window=NULL, s.window="periodic", robust=TRUE)
fcast <- forecast(fit, method="rwdrift")
plot(fcast, ylab="Sales (Thousands)", xlab="Month",
main="Naive Random Walk Forecasts of Seasonally Adjusted Data")
The reseasonalized random walk with drift forecast produces a very useful forecast. The forecast reflect both trend and seasonality. Both the prediction and the production intervals continue in the trend and seasonal pattern of the original data. Using this forecast would yeild better results than a non-reseasonalized forecast.