From the time series, a quadratic trend model without seasonality looks like the best fit: there is an obvious dip in the series (hence the quadratic term), and no seasonal pattern is apparent.
library(readr)
library(forecast)
library(ggplot2)
# Read the data and convert the sales column into a quarterly time series
Sales <- read_csv('DeptStoreSales.csv')
Sales_TS <- ts(Sales[[2]], frequency = 4)  # Sales[[2]] extracts the column as a numeric vector
In order to fit a regression-based exponential trend model, the natural logarithm of Sales must be taken; setting lambda = 0 in tslm applies this log (Box-Cox with \(\lambda = 0\)) transformation and back-transforms the fitted values to the original scale.
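Concretely, the exponential trend model is linear on the log scale, so a constant change per period in \(\log(Y_t)\) corresponds to a constant percentage change in \(Y_t\): \(\log(Y_t) = \beta_0 + \beta_1 t + \varepsilon_t\), i.e. \(Y_t = e^{\beta_0} e^{\beta_1 t} e^{\varepsilon_t}\).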
nValid <- 4  # hold out the last four quarters (one year) for validation
nTrain <- length(Sales_TS) - nValid
train_TS <- window(Sales_TS, end = c(1, nTrain))
valid_TS <- window(Sales_TS, start = c(1, nTrain + 1))
# Exponential trend with additive seasonality on the log scale
Exp_Snl <- tslm(train_TS ~ trend + season, lambda = 0)
# The same model with a quadratic trend term added (summarized below)
Exp_Snl <- tslm(train_TS ~ trend + I(trend^2) + season, lambda = 0)
Plot1 <- autoplot(train_TS, series = 'Training Data') + autolayer(Exp_Snl$fitted.values, series = 'Fitted Values')
Plot1
summary(Exp_Snl)
##
## Call:
## tslm(formula = train_TS ~ trend + I(trend^2) + season, lambda = 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.027792 -0.014381 -0.001153 0.014211 0.021609
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.8100603 0.0149412 723.505 < 2e-16 ***
## trend -0.0057993 0.0030151 -1.923 0.0750 .
## I(trend^2) 0.0008041 0.0001393 5.772 4.83e-05 ***
## season2 0.0265642 0.0116937 2.272 0.0394 *
## season3 0.1669514 0.0117618 14.194 1.05e-09 ***
## season4 0.4337455 0.0118710 36.538 2.73e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01845 on 14 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.001e+12 on 5 and 14 DF, p-value: < 2.2e-16
From the summary above we can see that average Q2 sales are slightly higher than Q1 sales: the season2 coefficient of 0.0266 on the log scale corresponds to roughly \(e^{0.0266} - 1 \approx 2.7\%\) higher sales than Q1, a difference that is significant at the 5% level (p = 0.0394) but not at the 1% level.
# Forecast the next two quarters (quarters 21 and 22)
FC_2122 <- forecast(Exp_Snl, h = 2)
autoplot(FC_2122)
Plot1 + xlab('Quarter') + ylab('Sales')
autoplot(train_TS - Exp_Snl$fitted.values, xlab = 'Year', ylab = 'Residual', main = 'Residual Plot')
## Now plot just the residuals for the first and second quarter of each year
plot((train_TS - Exp_Snl$fitted.values)[rep(c(TRUE, TRUE, FALSE, FALSE),5)],
xaxt = 'n', xlab = 'Quarter', ylab = 'Residual', type = 'b',
main = 'Plot of First and Second Quarter Residuals for Each Year')
axis(1, at = 1:10, labels = c(1,2,5,6,9,10,13,14,17,18))
abline(h = 0, col = 'red')
It is hard to tell from the plots above; the model does not appear to consistently over- or under-forecast, so I would expect the forecasts to be reasonably close to the actual values.
There does not seem to be any seasonal pattern in the residuals (indicating that the regression model captured the seasonality well), but there is an overall U-shaped pattern, indicating that the model did not capture the trend well.
Adding a quadratic component to the exponential trend and seasonality model would be the better approach.
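To check that claim directly, one could compare the validation-period accuracy of the two specifications (a sketch; Exp_NoQuad and Exp_Quad are names introduced here, not from the analysis above):
# Compare validation accuracy with and without the quadratic trend term;
# lower validation RMSE/MAPE indicates the better specification
Exp_NoQuad <- tslm(train_TS ~ trend + season, lambda = 0)
Exp_Quad <- tslm(train_TS ~ trend + I(trend^2) + season, lambda = 0)
accuracy(forecast(Exp_NoQuad, h = nValid), valid_TS)
accuracy(forecast(Exp_Quad, h = nValid), valid_TS)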
SouvSales <- read_csv('SouvenirSales.csv')
SouvSales_TS <- ts(SouvSales$Sales, start = c(1995, 1), frequency = 12)
nValid <- 12  # hold out the last 12 months for validation
nTrain <- length(SouvSales_TS) - nValid
trainSS_TS <- window(SouvSales_TS, end = c(1995, nTrain))
validSS_TS <- window(SouvSales_TS, start = c(1995, nTrain + 1))
From looking at the time series (fig. 6.14 of the textbook) there appears to be both trend and seasonality, so the regression should include predictors for both in addition to the intercept: 1 coefficient for the intercept, 1 for trend, and 11 (12 months − 1) seasonal dummies, for a total of 13 coefficients in the regression model (a quick check of this count appears after the model summary below).
Model_A <- tslm(trainSS_TS ~ trend + season)
Plot <- autoplot(trainSS_TS, series = 'Training Data', ylab = 'Sales (Aus$)') +
autolayer(Model_A$fitted.values, series = 'Model A')
Plot
summary(Model_A)
##
## Call:
## tslm(formula = trainSS_TS ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12592 -2359 -411 1940 33651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3065.55 2640.26 -1.161 0.25029
## trend 245.36 34.08 7.199 1.24e-09 ***
## season2 1119.38 3422.06 0.327 0.74474
## season3 4408.84 3422.56 1.288 0.20272
## season4 1462.57 3423.41 0.427 0.67077
## season5 1446.19 3424.60 0.422 0.67434
## season6 1867.98 3426.13 0.545 0.58766
## season7 2988.56 3427.99 0.872 0.38684
## season8 3227.58 3430.19 0.941 0.35058
## season9 3955.56 3432.73 1.152 0.25384
## season10 4821.66 3435.61 1.403 0.16573
## season11 11524.64 3438.82 3.351 0.00141 **
## season12 32469.55 3442.36 9.432 2.19e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared: 0.7903, Adjusted R-squared: 0.7476
## F-statistic: 18.53 on 12 and 59 DF, p-value: 9.435e-16
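As a quick sanity check on the coefficient count claimed above:
length(coef(Model_A))  # 13: intercept + trend + 11 seasonal dummies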
November (coefficient = 11524.64) and December (coefficient = 32469.55) have the highest average sales relative to the January baseline; this makes sense because these months fall in the Australian tourist season.
The trend coefficient of 245.36 means that souvenir sales are predicted to increase by about 245.36 Australian dollars per month, on average, holding the seasonal component fixed.
Model_B <- tslm(trainSS_TS ~ trend + season, lambda = 0)  # fit on the log scale: exponential trend
Plot <- Plot + autolayer(Model_B$fitted.values, series = 'Model B')
Plot
Model B fits an exponential trend, since the regression is estimated on the natural logarithm of Sales.
The estimated trend coefficient of \(0.02112\) indicates that (ignoring seasonality) souvenir sales are estimated to grow by a factor of \(e^{0.02112}\) per month (one time unit), i.e. by \((e^{0.02112} - 1) \times 100\% \approx 2.1\%\) per month.
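Working that out (my arithmetic, for illustration): \(e^{0.02112} \approx 1.0213\), about 2.1% growth per month, which compounds to \(e^{12 \times 0.02112} \approx 1.289\), roughly 29% growth per year.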
FC_B <- forecast(Model_B, h = 12)
# The training period ends in December 2000, so the second forecast in the
# 12-month horizon is February 2001; a forecast for February 2002 would
# require h = 14 and FC_B$mean[14] (or refitting on the full series)
February <- FC_B$mean[2]
print(paste('Forecast for souvenir sales in February 2001 is:',
            as.numeric(February),
            'Australian dollars.'))
## [1] "Forecast for souvenir sales in February 2001 is: 13243.0947639253 Australian dollars."
library(pander)
FC_A <- forecast(Model_A, h = 12)
pandoc.table(accuracy(FC_A, validSS_TS),
caption = 'Forecast Accuracy Metrics for Model A (12 month horizon)',
split.table = Inf)
##
## ---------------------------------------------------------------------------------------------
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## ------------------ ------------ ------- ------- ------- ------- -------- -------- -----------
## **Training set** -5.684e-14 5365 3205 6.968 36.75 0.8559 0.4048 NA
##
## **Test set** 8252 17452 10055 10.53 26.67 2.685 0.3206 0.9076
## ---------------------------------------------------------------------------------------------
##
## Table: Forecast Accuracy Metrics for Model A (12 month horizon)
pandoc.table(accuracy(FC_B, validSS_TS),
caption = 'Forecast Accuracy Metrics for Model B (12 month horizon)',
split.table = Inf)
##
## ---------------------------------------------------------------------------------------
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## ------------------ ------- ------ ------ -------- ------- -------- -------- -----------
## **Training set** 197.5 2865 1671 -1.473 13.94 0.4463 0.4381 NA
##
## **Test set** 4824 7101 5192 12.36 15.52 1.386 0.4245 0.461
## ---------------------------------------------------------------------------------------
##
## Table: Forecast Accuracy Metrics for Model B (12 month horizon)
# Compare validation-period forecast errors of the two models by month
plot(as.numeric(validSS_TS - FC_A$mean),
type = 'p', col = 'darkgreen', pch = 16,
xaxt = 'n',
ylab = 'Residual (Aus$)', xlab = 'Month',
main = '2002 Forecast Residuals')
points(as.numeric(validSS_TS - FC_B$mean),
col = 'blue', pch = 18, cex = .95)
abline(h = 0, col = 'red')
axis(1, at = 1:12, labels = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'))
legend('topleft',
legend = c('Model A', 'Model B'),
pch = c(16,18),
col = c('darkgreen', 'blue'))
Model B is preferable to Model A: its validation-period RMSE, MAPE, and MASE are all lower than Model A's. The residual plot also shows that Model A performs especially poorly in November and December. This matters because those are tourist-season months, the most important sales months of the year, so accurate forecasts for them are essential.
If the goal were to understand the different components of sales, I would have used Error-Trend-Seasonality (ETS) smoothing methods. These allow more combinations of trend and seasonality, such as additive trend with multiplicative seasonality, whereas the regression models used here only capture additive trend with additive seasonality or multiplicative trend with multiplicative seasonality. Additionally, the smoothing parameters in an ETS model would indicate how strongly Sales depends on recent versus older values. Finally, a quick and easy way to look at the different components of Sales is the decompose function in R, as shown below.
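A minimal sketch of that decomposition (the multiplicative type is my assumption, since the seasonal swings grow with the level of the series):
# Classical decomposition into trend, seasonal, and remainder components
plot(decompose(SouvSales_TS, type = 'multiplicative'))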