Question 2

Linear Trend Model

can_ts <- ts(can_df$HoursPerWeek, start=c(1966), frequency=1)
can_lm <- tslm(can_ts ~ trend)
summary(can_lm)
## 
## Call:
## tslm(formula = can_ts ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20457 -0.28761  0.04779  0.30210  1.23190 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.416134   0.175818 212.811  < 2e-16 ***
## trend       -0.061373   0.008518  -7.205 2.93e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.509 on 33 degrees of freedom
## Multiple R-squared:  0.6113, Adjusted R-squared:  0.5996 
## F-statistic: 51.91 on 1 and 33 DF,  p-value: 2.928e-08

Linear Trend Model With Seasonality

After review, it is not possible to model seasonality using annual data.

Quadratic Trend Model

can_qm <- tslm(can_ts ~ trend + I(trend^2))
summary(can_qm)
## 
## Call:
## tslm(formula = can_ts ~ trend + I(trend^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.94503 -0.20964 -0.01652  0.31862  0.60160 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.1644156  0.2176599 175.340  < 2e-16 ***
## trend       -0.1827154  0.0278786  -6.554 2.21e-07 ***
## I(trend^2)   0.0033706  0.0007512   4.487 8.76e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4049 on 32 degrees of freedom
## Multiple R-squared:  0.7614, Adjusted R-squared:  0.7465 
## F-statistic: 51.07 on 2 and 32 DF,  p-value: 1.1e-10

Quadratic Trend Model With Seasonality

Again it’s found after review, it is not possible to model seasonality using annual data.

Question 4

Part A

Take a logarithm of sales

An exponential trend model should begin with an exploration of the logarithmic output (or y) variable

Part B

## 
## Call:
## tslm(formula = dept_train ~ trend + season, lambda = 0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.053524 -0.013199 -0.004527  0.014387  0.062681 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.748945   0.018725 574.057  < 2e-16 ***
## trend        0.011088   0.001295   8.561 3.70e-07 ***
## season2      0.024956   0.020764   1.202    0.248    
## season3      0.165343   0.020884   7.917 9.79e-07 ***
## season4      0.433746   0.021084  20.572 2.10e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03277 on 15 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.63e+11 on 4 and 15 DF,  p-value: < 2.2e-16

Part C

Although visually it would appear that quarter two generally results in higher sales figures, the second quarter (season2) output is insignificant (.248) when valued statistically (p < .05).

Part D

dept_f <- forecast(dept_exp, h=2)
kable(dept_f)
6 Q1 58793.71 55790.19 61958.92 54090.84 63905.46
6 Q2 60951.51 57837.76 64232.89 56076.04 66250.87

Part E

i

plot(dept_train, type = "o")
lines(dept_exp$fitted, col = "blue")

plot(dept_train - dept_exp$fitted.values, type ="o")

ii

The U-Shape of the residuals shows that the extremes of the series (beginning/end) may be under-valued. As such, the estimates for quarters 21 and 22 may be slightly undervalued when compared to their actual values. We would prefer for these residuals to appear as white noise without a discernable pattern visible within the data.

Part F

The trend in the data is not captured well by the model

As described above, the U-Shape evident within the residual plot reveals that that the trend may not be captured adequately within the model. Seasonality would have been revealed through more oscillating trends in the residuals.

Part G

Based on the U-Shape mentioned previously, a quadratic trend model to the residuals (with Quarter and Quarter^2) would be an adequate and parsimoniou solution for improving the model’s fit.

Question 5

Part A

The time-series appears to contain both trend and seasonality. The upward trend is initially masked visibly by the strong seasonality. From what we can see in the plots, there seems to be monthly seasonality (11 dummy variables) and a linear trend (1 variable) for a total of 12 variables within the series.

Part B

sales_df <- read.csv("SouvenirSales.csv",stringsAsFactors = FALSE)
sales_ts <- ts(sales_df$Sales,start = c(1995, 1), frequency = 12)
test_len <- 12
train_len <- length(sales_ts) - test_len
sales_train <- window(sales_ts, end = c(1995, train_len))
sales_test <- window(sales_ts, start = c(1995, train_len + 1))
a <- tslm(sales_train ~ trend + season)
summary(a)
## 
## Call:
## tslm(formula = sales_train ~ trend + season)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12592  -2359   -411   1940  33651 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3065.55    2640.26  -1.161  0.25029    
## trend         245.36      34.08   7.199 1.24e-09 ***
## season2      1119.38    3422.06   0.327  0.74474    
## season3      4408.84    3422.56   1.288  0.20272    
## season4      1462.57    3423.41   0.427  0.67077    
## season5      1446.19    3424.60   0.422  0.67434    
## season6      1867.98    3426.13   0.545  0.58766    
## season7      2988.56    3427.99   0.872  0.38684    
## season8      3227.58    3430.19   0.941  0.35058    
## season9      3955.56    3432.73   1.152  0.25384    
## season10     4821.66    3435.61   1.403  0.16573    
## season11    11524.64    3438.82   3.351  0.00141 ** 
## season12    32469.55    3442.36   9.432 2.19e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared:  0.7903, Adjusted R-squared:  0.7476 
## F-statistic: 18.53 on 12 and 59 DF,  p-value: 9.435e-16
plot(a, main = "Model A")

i

The highest coefficients (highest positive variation from month (season) 1) are in months 11 and 12. At first this would seem related to Christmas sales, but on further reflection it is most likely due to Australia’s location in the Southern hemisphere. November and December would mark the beginning of the summer season, which would most likely result in a large uptick in sales.

ii

The trend coefficient of 245.36 is the slope (m) within a linear model (y = mx + b). This means that although their may be seasonal variation, the model overall maintains a regressive slope value of 245.36 over ever x (month) variable.

Part C

i

b <- tslm(log(sales_train) ~ trend + season)
summary(b)
## 
## Call:
## tslm(formula = log(sales_train) ~ trend + season)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4529 -0.1163  0.0001  0.1005  0.3438 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.646363   0.084120  90.898  < 2e-16 ***
## trend       0.021120   0.001086  19.449  < 2e-16 ***
## season2     0.282015   0.109028   2.587 0.012178 *  
## season3     0.694998   0.109044   6.374 3.08e-08 ***
## season4     0.373873   0.109071   3.428 0.001115 ** 
## season5     0.421710   0.109109   3.865 0.000279 ***
## season6     0.447046   0.109158   4.095 0.000130 ***
## season7     0.583380   0.109217   5.341 1.55e-06 ***
## season8     0.546897   0.109287   5.004 5.37e-06 ***
## season9     0.635565   0.109368   5.811 2.65e-07 ***
## season10    0.729490   0.109460   6.664 9.98e-09 ***
## season11    1.200954   0.109562  10.961 7.38e-16 ***
## season12    1.952202   0.109675  17.800  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared:  0.9424, Adjusted R-squared:  0.9306 
## F-statistic:  80.4 on 12 and 59 DF,  p-value: < 2.2e-16
plot(b, main = "Model B")

Fitting the logarithmic value of sales with a linear trend is the same as fitting an exponential trend to a model of Sales.

ii

Much like with the trend coefficient in the previous problem, this coefficient represents the slope of the model. Rather than being in terms of dollars however, this coefficient (as a log value) represents a % growth value over every (x) period, in this case the growth is slightly larger than 2% per period. This of course will be varied by seasonal variations.

iii

b_f <- forecast(b, h = 14)
b_f
##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 2001       9.188097  8.917220  9.458974  8.769890  9.606304
## Feb 2001       9.491232  9.220354  9.762109  9.073024  9.909439
## Mar 2001       9.925335  9.654457 10.196212  9.507127 10.343542
## Apr 2001       9.625329  9.354452  9.896207  9.207122 10.043537
## May 2001       9.694286  9.423408  9.965163  9.276078 10.112493
## Jun 2001       9.740741  9.469864 10.011619  9.322534 10.158949
## Jul 2001       9.898195  9.627318 10.169072  9.479988 10.316402
## Aug 2001       9.882831  9.611954 10.153708  9.464624 10.301038
## Sep 2001       9.992619  9.721742 10.263496  9.574412 10.410826
## Oct 2001      10.107664  9.836787 10.378542  9.689457 10.525872
## Nov 2001      10.600248 10.329370 10.871125 10.182040 11.018455
## Dec 2001      11.372615 11.101738 11.643493 10.954408 11.790823
## Jan 2002       9.441533  9.166476  9.716590  9.016873  9.866193
## Feb 2002       9.744667  9.469610 10.019724  9.320007 10.169327
feb <- 9.744667
exp(9.744667)
## [1] 17062.99

Part D

af <- forecast(a, h = test_len)
bf <- forecast(b, h = test_len)
accuracy(af$mean, sales_test)
##                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil's U
## Test set 8251.513 17451.55 10055.28 10.53397 26.66568 0.3206228 0.9075924
accuracy(exp(bf$mean), sales_test)
##                ME     RMSE      MAE      MPE    MAPE      ACF1 Theil's U
## Test set 4824.494 7101.444 5191.669 12.35943 15.5191 0.4245018 0.4610253

It would appear that both the RMSE and MAPE are lower for the second model (B), these are both valid measures of a model’s accuracy and with an MAPE almost 50% of the value of a, b would seem to be a better fit overall. We can also plot the models against the validation set to visually decide on the best fit.

plot(sales_test)

lines(af$mean, col="red")
lines(exp(bf$mean), col="blue")

The blue line seems to be the best fit to the actual validation set (black), and lo and behold, its our model B, the RMSE and MAPE proved to be good predictors all along.

Part E

If we’re looking into descriptive rather than prescriptive analysis, we can concentrate our efforts on the heavy hitting months each year. Instead of generating foreccast patterns we can consolidate all data to really define the trends and seasonality differences that have become clear over time. Also, we may want to consider the presence of a multiplicative trend/seasonality emerging in the last few periods of the model based on the previous estimates.