library(fpp)
library(fma)
library(forecast)
library(ggplot2)
Work <- read.csv("CanadianWorkHours.csv", stringsAsFactors = FALSE)
WorkTS <- ts(Work$HoursPerWeek, start=c(1966), frequency=1)
yrange = range(WorkTS)
plot(c(1966, 2000), yrange, type="n", xlab="Year", ylab="Average Canadian Work Hours per Week", bty="l", xaxt="n", yaxt="n")
lines(WorkTS, bty="l")
axis(1, at=seq(1966,2000,1), labels=format(seq(1966,2000,1)))
axis(2, at=seq(30,40,1), labels=format(seq(30,40,1)), las=2)
LinearTrend <- tslm(WorkTS ~ trend)
summary(LinearTrend)
##
## Call:
## tslm(formula = WorkTS ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20457 -0.28761 0.04779 0.30210 1.23190
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.416134 0.175818 212.811 < 2e-16 ***
## trend -0.061373 0.008518 -7.205 2.93e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.509 on 33 degrees of freedom
## Multiple R-squared: 0.6113, Adjusted R-squared: 0.5996
## F-statistic: 51.91 on 1 and 33 DF, p-value: 2.928e-08
The series has an adjusted R-squared of 0.6, which means that the linear trend explains 60% of the variation in hours worked.
LinearTrendSeason <- tslm(WorkTS ~ trend + season)
summary(LinearTrendSeason)
This series does not have explicit seasonality. Because of that linear trend with seasonality is not an option.
QuadTrend <- tslm(WorkTS ~ trend + I(trend^2))
summary(QuadTrend)
##
## Call:
## tslm(formula = WorkTS ~ trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.94503 -0.20964 -0.01652 0.31862 0.60160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.1644156 0.2176599 175.340 < 2e-16 ***
## trend -0.1827154 0.0278786 -6.554 2.21e-07 ***
## I(trend^2) 0.0033706 0.0007512 4.487 8.76e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4049 on 32 degrees of freedom
## Multiple R-squared: 0.7614, Adjusted R-squared: 0.7465
## F-statistic: 51.07 on 2 and 32 DF, p-value: 1.1e-10
This is a much better fit as compared to linear trend, as illustrated by the adjusted R-squared value of 0.75.This means that the quadratic trend explains 75% of the variation in hours worked.
QuadTrendSeason <- tslm(WorkTS ~ trend + I(trend^2) + season)
summary(QuadTrendSeason)
This series does not have explicit seasonality. Because of that quadratic trend with seasonality is not an option.
plot(c(1966, 2000), yrange, type="n", xlab="Year", ylab="Average Canadian Work Hours per Week", bty="l", xaxt="n", yaxt="n")
lines(WorkTS, bty="l")
axis(1, at=seq(1966,2000,1), labels=format(seq(1966,2000,1)))
axis(2, at=seq(30,40,1), labels=format(seq(30,40,1)), las=2)
lines(LinearTrend$fitted, col="red")
lines(QuadTrend$fitted, col="green")
legend(1966,36, c("Average Work Hours", "Linear Trend", "Quadratic Trend"), lty=c(1,1,1), col=c("black", "red", "green"), bty="n")
Based on the chart above, quadratic trend model fits the series best as it is able to capture not only the downward trend, but also the upward trend that starts in the late 1980s. Looking at the Adjusted R-squared values for each model, we also see that the model that used a quadratic trend had the highest explanation of variation in the average work hours.
No
Yes, this is used to fit an exponential trend. In R this is done by setting lambda to equal zero in the tslm function.
No
No
DeptSales <- read.csv("DeptStoreSales.csv")
SalesTS <- ts(DeptSales$Sales, start=c(1,1), freq=4)
ValidLength <- 4
TrainLength <- length(SalesTS) - ValidLength
TrainTS <- window(SalesTS, start= c(1,1), end= c(1, TrainLength))
ValidTS <- window(SalesTS, start= c(1, TrainLength+1), end= c(1, TrainLength+ValidLength))
ExpoModel <- tslm(TrainTS ~ trend + season, lambda=0)
summary(ExpoModel)
##
## Call:
## tslm(formula = TrainTS ~ trend + season, lambda = 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.053524 -0.013199 -0.004527 0.014387 0.062681
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.748945 0.018725 574.057 < 2e-16 ***
## trend 0.011088 0.001295 8.561 3.70e-07 ***
## season2 0.024956 0.020764 1.202 0.248
## season3 0.165343 0.020884 7.917 9.79e-07 ***
## season4 0.433746 0.021084 20.572 2.10e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03277 on 15 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 7.63e+11 on 4 and 15 DF, p-value: < 2.2e-16
Even though the Season 2 coefficient is 0.024, which appears to be slightly higher than that shown in Season 1. However, the P-value illustrates the fact that Season 2 is not statistically different from Season 1. Which means that the average sales in Season 2 are about the same as in Season 1.
validPeriodForecasts <- forecast(ExpoModel, h=ValidLength)
validPeriodForecasts
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 6 Q1 58793.71 55790.19 61958.92 54090.84 63905.46
## 6 Q2 60951.51 57837.76 64232.89 56076.04 66250.87
## 6 Q3 70920.09 67297.09 74738.13 65247.24 77086.15
## 6 Q4 93788.66 88997.41 98837.86 86286.57 101943.01
yrange = range(SalesTS)
plot(c(1,7), yrange, type="n", xlab="Year", ylab="Sales Dollars in Thousands", bty="l", xaxt="n", yaxt="n")
lines(SalesTS, bty="l")
axis(1, at=seq(1,24,1), labels=format(seq(1,24,1)))
axis(2, at=seq(45000,105000,10000), labels=format(seq(45,105,10)), las=1)
lines(ExpoModel$fitted,col="red")
lines(validPeriodForecasts$mean,col="blue",lty=2)
legend(1,100000, c("Actual Sales", "Exponential Trend+Season", "Forecasts"), lty=c(1,1,2), col=c("black", "red", "blue"), bty="n")
plot(ExpoModel$residuals, type="o", bty="l", main="Residuals for Training Period - Log-Transformed Units")
plot(TrainTS - ExpoModel$fitted, type="o", bty="l")
title(main = "Residuals - Original Scale")
ErrorValid <- ValidTS - validPeriodForecasts$mean
plot(ErrorValid, type="o", bty="l", main="Residuals in Validation Period")
The forecast for Q21 and Q22 are likely to be lower as compared to the actual sales.
The first chart indicates a relatively small under-forecasting, as compared to actual sales. This is confirmed by the residuals plot: the parabolic shape indicates under-forecasting in the beginning and the end of the series. Additionally, residuals in the plot showing validation period confirm under-forecasting.
FALSE. The chart above shows seasonality.
TRUE. The model works well with the seasonality and trend, as confirmed by the charts above.
FALSE. However, it is not a perfect fit as illustrated by the parabolic shape of the residuals plot.
NO.
Yes, to address the parabolic shape of residuals.
Based on the plots, it is clear that the forecasting model for this series should include both trend and seasonality predictors. Similar to the example in the book, the quadratic trend and monthly seasonality implies 13 predictors: 11 dummy varilables for each month after January and t and t-squared for trend.
SouvenirData <- read.csv("SouvenirSales.csv")
SouvenirTS <- ts(SouvenirData$Sales, start=c(1995, 1), frequency=12)
yrange = range(SouvenirTS)
ValidLength1 <- 12
TrainLength1 <- length(SouvenirTS) - ValidLength1
SouvenirTrain <- window(SouvenirTS, end=c(1995, TrainLength1))
SouvenirValid <- window(SouvenirTS, start=c(1995, TrainLength1+1))
ModelA <- tslm(SouvenirTrain ~ trend + season)
summary(ModelA)
##
## Call:
## tslm(formula = SouvenirTrain ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12592 -2359 -411 1940 33651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3065.55 2640.26 -1.161 0.25029
## trend 245.36 34.08 7.199 1.24e-09 ***
## season2 1119.38 3422.06 0.327 0.74474
## season3 4408.84 3422.56 1.288 0.20272
## season4 1462.57 3423.41 0.427 0.67077
## season5 1446.19 3424.60 0.422 0.67434
## season6 1867.98 3426.13 0.545 0.58766
## season7 2988.56 3427.99 0.872 0.38684
## season8 3227.58 3430.19 0.941 0.35058
## season9 3955.56 3432.73 1.152 0.25384
## season10 4821.66 3435.61 1.403 0.16573
## season11 11524.64 3438.82 3.351 0.00141 **
## season12 32469.55 3442.36 9.432 2.19e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5927 on 59 degrees of freedom
## Multiple R-squared: 0.7903, Adjusted R-squared: 0.7476
## F-statistic: 18.53 on 12 and 59 DF, p-value: 9.435e-16
As illustrated by the table, December has the highest average sales of the year, with November being the second highest. This can be explained by the holiday season as well as by increased number of tourists coming from the Northern hemispere.
The trend coeffient of the model A is 245.36, which represents the increase per period in the model’s trend. What it means is we can expect a monthly increase of roughly AU $245 from the prior month before considering the seasonality impact of each period.
ModelB <- tslm(log(SouvenirTrain) ~ trend + season)
summary(ModelB)
##
## Call:
## tslm(formula = log(SouvenirTrain) ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4529 -0.1163 0.0001 0.1005 0.3438
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.646363 0.084120 90.898 < 2e-16 ***
## trend 0.021120 0.001086 19.449 < 2e-16 ***
## season2 0.282015 0.109028 2.587 0.012178 *
## season3 0.694998 0.109044 6.374 3.08e-08 ***
## season4 0.373873 0.109071 3.428 0.001115 **
## season5 0.421710 0.109109 3.865 0.000279 ***
## season6 0.447046 0.109158 4.095 0.000130 ***
## season7 0.583380 0.109217 5.341 1.55e-06 ***
## season8 0.546897 0.109287 5.004 5.37e-06 ***
## season9 0.635565 0.109368 5.811 2.65e-07 ***
## season10 0.729490 0.109460 6.664 9.98e-09 ***
## season11 1.200954 0.109562 10.961 7.38e-16 ***
## season12 1.952202 0.109675 17.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1888 on 59 degrees of freedom
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9306
## F-statistic: 80.4 on 12 and 59 DF, p-value: < 2.2e-16
It is equivalent to fitting an exponential trend model to Sales.
The trend coefficient estimates the percentage increase of each period compared to the previous period, before applying seasonality. In our example, it would be a 2.1% increase each month.
Feb2002Forecast <- ModelB$coefficients["(Intercept)"] + ModelB$coefficients["trend"]*86 + ModelB$coefficients["season2"]
exp(Feb2002Forecast)
## (Intercept)
## 17062.99
ModelAForecast <- forecast(ModelA, h = ValidLength1)
accuracy(ModelAForecast$mean, SouvenirValid)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 8251.513 17451.55 10055.28 10.53397 26.66568 0.3206228 0.9075924
ModelBForecast <- forecast(ModelB, h = ValidLength1)
accuracy(exp(ModelBForecast$mean), SouvenirValid)
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set 4824.494 7101.444 5191.669 12.35943 15.5191 0.4245018 0.4610253
plot(SouvenirValid, bty="l", xaxt="n", xlab="Year: 2001", yaxt="n", ylab="Sales (A$)")
axis(1, at=seq(2001,2001+11/12,1/12), labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
axis(2, at=seq(0,100000,10000), labels=format(seq(0,100,10)), las=2)
lines(ModelAForecast$mean, col="blue")
lines(exp(ModelBForecast$mean), col="red")
legend(2001,85000, c("Actuals", "Model A (Linear & Season)", "Model B (log(y) Linear + Season)"), lty=c(1,1,1), col=c("black", "blue", "red"), bty="n")
Model B is preferred because of the following reasons:
If the goal was to understand the different components of sales in the souvenir shop, I would model the data based on the classification of the souvenirs sold as opposed to the total dollar amount.Taking it one step further, I would also recommend collecting demographic information of the customers and the items they purchase to develop marketing campaigns depending on features of the most profitable cluster of clients.
That said, the information that is being collected now is incredibly useful in terms of informing the owners of their inventory needs as well as their staffing neeeds. Knowing that November and December are the most profitable months of the year - assuming because of the holidays - can be used to capitalize on holiday cheer by offering more holiday souvenirs, decorations and perhaps even costumes.