This page briefly describes some approaches to forecasting Canadian working hours with regression models. Specifically, three trend shapes are tried: linear, exponential and polynomial (quadratic).

# read the data

library(forecast)  # provides tslm() and forecast(), used below
raw <- read.csv("CanadianWorkHours.csv", header = TRUE)

This first plot is really simple. It shows the data and is part of the human ‘eyeballing the data’ step.

The next steps create some charts with different trend lines, so we can see which shape fits the data best.

Let’s convert the data to a time series object. This is useful for plotting and forecasting. Recreate the time plot and overlay different types of trend lines in order to determine a suitable trend shape.

CanadianWorkHours.ts <- ts(raw[, 2], start = 1966, end = 2000, frequency = 1)
plot(CanadianWorkHours.ts)

For the first step, let’s create a linear trend line and add it to the plot. The tslm function, from the forecast package, fits a linear model to a time series (the name presumably stands for Time Series Linear Model).

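# Fit a straight-line trend; `trend` is generated by tslm() as 1, 2, 3, ...
CanadianWorkHours.ts.lm <- tslm(CanadianWorkHours.ts ~ trend)
plot(CanadianWorkHours.ts)
lines(fitted(CanadianWorkHours.ts.lm), lwd = 2)  # overlay the fitted line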

This chart shows how the linear model fits the data. For the first part it fits pretty well, but after 1980 it misses the fall and sits above the data (over-forecasting). Towards the end of the 1980s those hard-working Canadians start to spend more time at work, but the model completely misses the upturn, and by the late 1990s it is under-predicting.
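
One way to make this over- and under-fitting visible is to look at the residuals (actual minus fitted) of a straight-line trend. The sketch below is a minimal base-R illustration: the series `y` is a made-up U-shaped stand-in, not the Canadian data, and `lm()` on a time index is used in place of `tslm()` so the snippet needs no extra packages.

```r
# Made-up U-shaped annual series (NOT the real work-hours data)
t <- 1:35                                  # time index, like tslm's `trend`
y <- 35 + 0.01 * (t - 20)^2

lin <- lm(y ~ t)                           # straight-line trend
res <- ts(as.numeric(residuals(lin)), start = 1966, frequency = 1)

# Positive residual: the line sits below the data (under-fitting);
# negative residual: the line sits above the data (over-fitting).
plot(res, type = "h", ylab = "Residual (actual - fitted)")
abline(h = 0, lty = 2)
```

Long runs of residuals with the same sign, as in this toy series, suggest the trend shape is wrong rather than the data being noisy.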

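So, how about an exponential trend? To do this in R you can use the same code as above, but set the lambda value to zero. This applies a log (Box-Cox) transformation to the series before fitting, so a straight line in the logs becomes an exponential curve on the original scale.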

# lambda = 0 fits the trend to log(hours); fitted values come back on the original scale
CanadianWorkHours.ts.exp <- tslm(CanadianWorkHours.ts ~ trend, lambda = 0)
plot(CanadianWorkHours.ts)
lines(fitted(CanadianWorkHours.ts.exp), lwd = 2)

This didn’t make much of a difference.
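
To see why the two trend lines can look so alike, it may help to write the exponential trend out by hand: fit a straight line to log(y), then back-transform with exp(). The base-R sketch below does that on a made-up, gently declining series (an assumption for illustration, not the real data) and measures the largest gap between the linear and exponential fits.

```r
# Made-up gently declining series (NOT the real work-hours data)
t <- 1:35
y <- 37 - 0.05 * t + 0.2 * sin(t / 3)

lin.fit <- fitted(lm(y ~ t))               # linear trend
exp.fit <- exp(fitted(lm(log(y) ~ t)))     # exponential trend: fit log(y), back-transform

# Largest gap between the two trend lines, relative to the level of the series
max(abs(lin.fit - exp.fit)) / mean(y)
```

When a series moves only modestly relative to its level, as in this toy example, a log is close to linear over that range, so the two trends nearly coincide.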

The final type of regression model I would use here is polynomial. The model below adds a squared trend term, so it is a second-order (quadratic) polynomial; higher-order polynomials are not explored due to time constraints.

# Quadratic trend: I() makes ^ mean arithmetic squaring inside the formula
CanadianWorkHours.ts.poly <- tslm(CanadianWorkHours.ts ~ trend + I(trend^2))
plot(CanadianWorkHours.ts)
lines(fitted(CanadianWorkHours.ts.poly), lwd = 2)

This chart shows that we’re doing a bit better at capturing the later increase. The model captures the overall shape, but under-forecasts at the start and over-forecasts in the 1980s. In the very late 1990s, the model under-forecasts again.
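
For reference, the `trend + I(trend^2)` formula is an ordinary regression on the time index and its square. The base-R sketch below, using a made-up U-shaped series (an assumption for illustration, not the real data), compares the straight-line and quadratic fits by their residual sum of squares.

```r
# Made-up U-shaped series with a small deterministic wobble (NOT the real data)
t <- 1:35
y <- 35 + 0.01 * (t - 22)^2 + 0.2 * sin(t)

lin  <- lm(y ~ t)                 # linear trend
quad <- lm(y ~ t + I(t^2))        # quadratic trend

# Residual sum of squares: smaller means a closer in-sample fit
rss <- c(linear = sum(residuals(lin)^2), quadratic = sum(residuals(quad)^2))
rss
coef(quad)                        # intercept, slope and curvature terms
```

Adding a term can never make the in-sample fit worse, which is one reason to judge the model on held-back data rather than on the fitted line alone.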

The next step is to fit a regression model to the training period with the trend shape that seems most suitable. This was fairly obviously the polynomial (quadratic) trend.

Partition the data into a training and validation period. This is so we can compare the accuracy of our results. We’re supposed to hold back the last three years as a validation period.

In this section of code I use the shorthand “CWH” to mean “Canadian Working Hours”.

cwh.train <- window(CanadianWorkHours.ts, start=1966, end=1997)
cwh.valid <- window(CanadianWorkHours.ts, start=1998, end=2000)
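
A quick sanity check on the partition is to confirm the lengths and date ranges of the two windows. The sketch uses a placeholder series covering the same span (1966 to 2000) so that it runs without the CSV file; the values are arbitrary and the names `x`, `train` and `valid` are illustrative only.

```r
# Placeholder annual series, 1966-2000; values are arbitrary (NOT the real data)
x <- ts(seq(37, 35, length.out = 35), start = 1966, frequency = 1)

train <- window(x, start = 1966, end = 1997)   # first 32 years
valid <- window(x, start = 1998, end = 2000)   # last 3 years

length(train)        # 32
length(valid)        # 3
range(time(valid))   # first and last year of the validation window
```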

Let’s fit the model to the training period, generate forecasts for the validation period, and overlay the actual validation values on the forecast plot.

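fit <- tslm(cwh.train ~ trend + I(trend^2))  # quadratic trend on the training period only
cwh.forecast <- forecast(fit, h = 3)         # forecast the 3 validation years (1998-2000)
plot(cwh.forecast)
lines(cwh.valid)                             # overlay the actual validation values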

So the model we created, although it fitted the data better overall, still did a fairly poor job of forecasting the late 1990s due to a further upturn which the model didn’t adequately detect.
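
To put a number on the validation performance, the forecast errors can be summarised with measures such as RMSE and MAE. With the forecast package, `accuracy(cwh.forecast, cwh.valid)` should report these for both the training and validation periods. The base-R sketch below does the same calculation by hand on a made-up series whose last three values turn upward (an assumption for illustration, not the real data).

```r
# Made-up series: smooth quadratic, plus an extra upturn in the last 3 points (NOT the real data)
t <- 1:35
y <- 37 - 0.12 * t + 0.002 * t^2 + ifelse(t > 32, 0.3 * (t - 32), 0)

train <- data.frame(t = t[1:32],  y = y[1:32])    # stands in for 1966-1997
valid <- data.frame(t = t[33:35], y = y[33:35])   # stands in for 1998-2000

fit  <- lm(y ~ t + I(t^2), data = train)          # quadratic trend on training data
pred <- predict(fit, newdata = valid)             # forecasts for the 3 held-back points

err <- valid$y - pred                             # forecast errors (actual minus forecast)
c(RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)), ME = mean(err))
```

A positive mean error (ME) means the forecasts sit below the actual values, which is the pattern described above for the late 1990s.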