Question 2

From the time series it looks like a quadratic trend model without seasonality would work best: there is an obvious dip in the series (hence the quadratic term), and there does not appear to be any seasonal pattern.
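
As a rough sketch (the series name Q2_TS is hypothetical, standing in for whatever series the figure shows), such a model could be fit with tslm from the forecast package:

# Quadratic trend, no seasonal dummies (sketch; Q2_TS is a placeholder ts object)
Quad_Trend <- tslm(Q2_TS ~ trend + I(trend^2))
summary(Quad_Trend)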



Question 4

library(readr)
library(forecast)
library(ggplot2)

# Read the department store sales data and convert the sales column
# (column 2) into a quarterly time series
Sales <- read_csv('DeptStoreSales.csv')
Sales_TS <- ts(Sales[,2], frequency = 4)


A

In order to fit a regression-based exponential trend model, the natural logarithm of Sales must be taken (in tslm this is done by setting lambda = 0).
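
As a minimal sketch of the idea (illustration only, using the full Sales_TS rather than the training window defined in part B): an exponential trend in Sales is a linear trend in log(Sales), and tslm applies the log transform when lambda = 0 is supplied.

# Exponential trend via tslm; the series is log-transformed internally and
# fitted values/forecasts are back-transformed to the sales scale
via_lambda <- tslm(Sales_TS ~ trend, lambda = 0)

# Hand-rolled equivalent on the log scale (illustration only)
y_log    <- log(as.numeric(Sales_TS))
time_idx <- seq_along(y_log)
via_lm   <- lm(y_log ~ time_idx)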


B

# Hold out the last 4 quarters for validation
nValid <- 4
nTrain <- length(Sales_TS) - nValid

train_TS <- window(Sales_TS, end = c(1, nTrain))
valid_TS <- window(Sales_TS, start = c(1, nTrain + 1))

# Exponential trend + seasonality (lambda = 0 fits the model on the log scale)
Exp_Snl <- tslm(train_TS ~ trend + season, lambda = 0)

# Add a quadratic trend term; this overwrites the fit above and is the model
# summarized and forecast in the parts that follow
Exp_Snl <- tslm(train_TS ~ trend + I(trend^2) + season, lambda = 0)

Plot1 <- autoplot(train_TS, series = 'Training Data') +
  autolayer(Exp_Snl$fitted.values, series = 'Fitted Values')
Plot1


C

summary(Exp_Snl)
## 
## Call:
## tslm(formula = train_TS ~ trend + I(trend^2) + season, lambda = 0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.027792 -0.014381 -0.001153  0.014211  0.021609 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.8100603  0.0149412 723.505  < 2e-16 ***
## trend       -0.0057993  0.0030151  -1.923   0.0750 .  
## I(trend^2)   0.0008041  0.0001393   5.772 4.83e-05 ***
## season2      0.0265642  0.0116937   2.272   0.0394 *  
## season3      0.1669514  0.0117618  14.194 1.05e-09 ***
## season4      0.4337455  0.0118710  36.538 2.73e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01845 on 14 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.001e+12 on 5 and 14 DF,  p-value: < 2.2e-16

From the summary above we can see that average Q2 sales are slightly higher than average Q1 sales (season2 coefficient ≈ 0.027 on the log scale), though the difference is small and only borderline significant (p ≈ 0.04).
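
Because the model is fit on the log scale, each seasonal coefficient is a multiplicative effect relative to Q1. As a quick check (sketch):

# Q2 effect relative to Q1 on the original sales scale (sketch)
exp(coef(Exp_Snl)['season2'])   # about 1.027, i.e. roughly 2.7% above Q1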


D

# Forecast the next two quarters (quarters 21 and 22)
FC_2122 <- forecast(Exp_Snl, h = 2)

autoplot(FC_2122)
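
The point forecasts can also be pulled directly from the forecast object; since the model was fit with lambda = 0, they are already back-transformed to the original sales scale (sketch):

# Point forecasts for the two quarters ahead
FC_2122$mean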


E

Plot1 + xlab('Quarter') + ylab('Sales')


autoplot(train_TS - Exp_Snl$fitted.values, xlab = 'Year', ylab = 'Residual', main = 'Residual Plot')


# Now plot just the residuals for the first and second quarter of each year
plot((train_TS - Exp_Snl$fitted.values)[rep(c(TRUE, TRUE, FALSE, FALSE), 5)],
     xaxt = 'n', xlab = 'Quarter', ylab = 'Residual', type = 'b',
     main = 'Plot of First and Second Quarter Residuals for Each Year')
axis(1, at = 1:10, labels = c(1, 2, 5, 6, 9, 10, 13, 14, 17, 18))
abline(h = 0, col = 'red')

It is hard to tell from the plots above; the model does not appear to consistently over- or under-forecast. I would expect the forecasts to be reasonably close to the actual values.
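
As a quick numeric check of that claim (sketch), the average residual, overall and for the first two quarters of each year, should be near zero if there is no systematic over- or under-forecasting:

# Mean training residual, overall and for Q1/Q2 only (sketch)
resid_train <- train_TS - Exp_Snl$fitted.values
mean(resid_train)
mean(resid_train[cycle(train_TS) %in% c(1, 2)])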


F

There doesn’t seem to be any seasonal pattern in the residuals (indicating that the regression model captured the seasonality well), but there is an overall U-shaped pattern, which indicates that the model did not capture the trend well.


G

Adding a quadratic component to the exponential trend and seasonality model would be the better approach.
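
A sketch of that comparison using the objects from part B (the quadratic fit is the one already stored in Exp_Snl; the plain exponential-plus-seasonal model is refit here for the comparison):

# Compare validation-period accuracy of the two candidate models (sketch)
Exp_Plain <- tslm(train_TS ~ trend + season, lambda = 0)
accuracy(forecast(Exp_Plain, h = nValid), valid_TS)
accuracy(forecast(Exp_Snl, h = nValid), valid_TS)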



Question 5

# Read the souvenir sales data and convert it to a monthly time series
# starting January 1995; hold out the last 12 months for validation
SouvSales <- read_csv('SouvenirSales.csv')
SouvSales_TS <- ts(SouvSales$Sales, start = c(1995, 1), frequency = 12)
nValid <- 12
nTrain <- length(SouvSales_TS) - nValid
trainSS_TS <- window(SouvSales_TS, end = c(1995, nTrain))
validSS_TS <- window(SouvSales_TS, start = c(1995, nTrain + 1))


A

From looking at the time series (fig. 6.14 of the textbook) there appears to be both trend and seasonality, so the model should include predictors for both in addition to the intercept: 1 intercept term, 1 trend predictor, and 11 (12 months − 1) monthly dummy predictors. In total the regression model estimates 13 coefficients (12 predictors plus the intercept).
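
A quick check of that count (sketch): the design matrix of the trend-plus-season model fit in part B has 13 columns, one intercept, one trend term, and 11 monthly dummies.

# Number of columns in the design matrix (sketch); expect 13
ncol(model.matrix(tslm(trainSS_TS ~ trend + season)))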


B

# Model A: linear trend with additive seasonality
Model_A <- tslm(trainSS_TS ~ trend + season)
Plot <- autoplot(trainSS_TS, series = 'Training Data', ylab = 'Sales (Aus$)') + 
  autolayer(Model_A$fitted.values, series = 'Model A')

Plot


i)

summary(Model_A)
## 
## Call:
## tslm(formula = trainSS_TS ~ trend + season)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12592  -2359   -411   1940  33651 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3065.55    2640.26  -1.161  0.25029    
## trend         245.36      34.08   7.199 1.24e-09 ***
## season2      1119.38    3422.06   0.327  0.74474    
## season3      4408.84    3422.56   1.288  0.20272    
## season4      1462.57    3423.41   0.427  0.67077    
## season5      1446.19    3424.60   0.422  0.67434    
## season6      1867.98    3426.13   0.545  0.58766    
## season7      2988.56    3427.99   0.872  0.38684    
## season8      3227.58    3430.19   0.941  0.35058    
## season9      3955.56    3432.73   1.152  0.25384    
## season10     4821.66    3435.61   1.403  0.16573    
## season11    11524.64    3438.82   3.351  0.00141 ** 
## season12    32469.55    3442.36   9.432 2.19e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared:  0.7903, Adjusted R-squared:  0.7476 
## F-statistic: 18.53 on 12 and 59 DF,  p-value: 9.435e-16

November (coefficient = 11524.64) and December (coefficient = 32469.55) tend to have the highest average sales; this makes sense because these months fall in the Australian tourist season.


ii)

The trend coefficient of 245.36 means that souvenir sales are predicted to increase by A$245.36 per month, on average, holding the seasonal component fixed.
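
As a quick illustration (sketch), the trend growth implied over a full year:

# Implied trend increase over 12 months, ignoring seasonality (sketch)
coef(Model_A)['trend'] * 12   # roughly A$2,944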


C

# Model B: exponential trend with multiplicative seasonality (fit on the log scale via lambda = 0)
Model_B <- tslm(trainSS_TS ~ trend + season, lambda = 0)

Plot <- Plot + autolayer(Model_B$fitted.values, series = 'Model B')

Plot


i)

Exponential trend


ii)

The estimated trend coefficient of \(0.02112\) indicates that (ignoring seasonality) souvenir sales are estimated to increase by a factor of \(e^{0.02112} \approx 1.0213\), i.e. by about \((e^{0.02112} - 1) \times 100\% \approx 2.1\%\), per month (one time unit).
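
A quick check of that figure (sketch):

# Multiplicative monthly trend effect on the original sales scale (sketch)
exp(coef(Model_B)['trend'])   # about 1.021, i.e. roughly 2.1% growth per month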


iii)

FC_B <- forecast(Model_B, h = 12)
February <- FC_B$mean[2]
print(paste('Forecast for souvenir sales in February 2002 is:',
            as.numeric(February),
            'Australian dollars.'))
## [1] "Forecast for souvenir sales in February 2002 is: 13243.0947639253 Australian dollars."


D

library(pander)

# Forecast Model A over the 12-month validation period and compute accuracy measures
FC_A <- forecast(Model_A, h = 12)
accuracy(FC_A, validSS_TS)
##                         ME      RMSE       MAE       MPE     MAPE     MASE
## Training set -5.684342e-14  5365.199  3205.089  6.967778 36.75088 0.855877
## Test set      8.251513e+03 17451.547 10055.276 10.533974 26.66568 2.685130
##                   ACF1 Theil's U
## Training set 0.4048039        NA
## Test set     0.3206228 0.9075924

pandoc.table(accuracy(FC_A, validSS_TS),
             caption = 'Forecast Accuracy Metrics for Model A (12 month horizon)',
             split.table = Inf)
## 
## ---------------------------------------------------------------------------------------------
##       &nbsp;            ME       RMSE     MAE     MPE    MAPE     MASE     ACF1    Theil's U 
## ------------------ ------------ ------- ------- ------- ------- -------- -------- -----------
##  **Training set**   -5.684e-14   5365    3205    6.968   36.75   0.8559   0.4048      NA     
## 
##    **Test set**        8252      17452   10055   10.53   26.67   2.685    0.3206    0.9076   
## ---------------------------------------------------------------------------------------------
## 
## Table: Forecast Accuracy Metrics for Model A (12 month horizon)

pandoc.table(accuracy(FC_B, validSS_TS),
             caption = 'Forecast Accuracy Metrics for Model B (12 month horizon)',
             split.table = Inf)
## 
## ---------------------------------------------------------------------------------------
##       &nbsp;         ME     RMSE   MAE     MPE     MAPE     MASE     ACF1    Theil's U 
## ------------------ ------- ------ ------ -------- ------- -------- -------- -----------
##  **Training set**   197.5   2865   1671   -1.473   13.94   0.4463   0.4381      NA     
## 
##    **Test set**     4824    7101   5192   12.36    15.52   1.386    0.4245     0.461   
## ---------------------------------------------------------------------------------------
## 
## Table: Forecast Accuracy Metrics for Model B (12 month horizon)

# Validation-period forecast errors (actual minus forecast) for both models
plot(as.numeric(validSS_TS - FC_A$mean),
     type = 'p', col = 'darkgreen', pch = 16, 
     xaxt = 'n', 
     ylab = 'Residual (Aus$)', xlab = 'Month', 
     main = 'Validation Period Forecast Residuals')
points(as.numeric(validSS_TS - FC_B$mean), 
       col = 'blue', pch = 18, cex = .95)
abline(h = 0, col = 'red')
axis(1, at = 1:12, labels = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'))
legend('topleft',
       legend = c('Model A', 'Model B'),
       pch = c(16,18),
       col = c('darkgreen', 'blue'))

Model B is preferable to Model A: its RMSE, MAPE, and MASE for the validation period are all lower than Model A's. The residual plot also shows that Model A performs especially poorly in November and December. This matters because those months fall in the tourist season and are the most important sales months of the year, so accurate forecasts for them are essential.


E

If the goal were to understand the different components of sales, I would have used Error, Trend, Seasonality (ETS) exponential smoothing methods (the ets function in the forecast package). This way we could consider more combinations of trend and seasonality, such as additive trend with multiplicative seasonality; the regression models we used only allowed additive trend with additive seasonality or multiplicative trend with multiplicative seasonality. Additionally, the smoothing (weighting) parameters in the ETS models would tell us how strongly Sales depends on recent versus older values. Finally, a quick and easy way to look at the different components of Sales would be to use the decompose function in R, as shown below.
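
As a minimal sketch of the ETS suggestion (assuming trainSS_TS from above; forecast::ets chooses the error, trend, and seasonal forms automatically and reports the fitted smoothing parameters):

# Automatically selected ETS model for the training period (sketch)
ETS_Fit <- ets(trainSS_TS)
summary(ETS_Fit)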