indigo_data <- read.csv("~/Dropbox (Personal)/ECON4210/Assignment Two/Question One/indigo_data.csv")
summary(indigo_data)
## year quarter sales
## Min. :2002 Min. :1.000 Min. :135.0
## 1st Qu.:2006 1st Qu.:1.500 1st Qu.:181.5
## Median :2009 Median :2.000 Median :194.0
## Mean :2009 Mean :2.476 Mean :222.8
## 3rd Qu.:2013 3rd Qu.:3.000 3rd Qu.:251.0
## Max. :2017 Max. :4.000 Max. :400.0
indigo_data = ts(indigo_data, start=2002, frequency=4)
data = indigo_data[,3]
plot(indigo_data[,3],main="Indigo Quarterly Sales", xlab="Year",ylab="Millions")
train <- window(data,start=c(2002, 1),end=c(2014, 2))
test <- window(data, start=c(2014,2),end=c(2017, 3))
both <- window(data,start=c(2002, 1))
h=length(test)
Indigofit1 <- meanf(train, h=h)
plot(Indigofit1,main="Indigo Quarterly Sales Average Forecast", xlab="Year",ylab="Millions")
Indigofit2 <- naive(train, h=h)
plot(Indigofit2,main="Indigo Quarterly Sales Naive Forecast ", xlab="Year",ylab="Millions")
Indigofit3 <- snaive(train, h=h)
plot(Indigofit3,main="Indigo Quarterly Sales Seasonal Naive Forecast", xlab="Year",ylab="Millions")
Indigofit4 <- tslm(data ~ trend)
f <- forecast(Indigofit4, h=6,level=c(80,95))
plot(f,main="Indigo Quarterly Sales Linear Trend", ylab="Millions", xlab="Year")
lines(fitted(Indigofit4),col="blue")
a1Indigofit1 = accuracy(Indigofit1, test)
a2Indigofit2 = accuracy(Indigofit2, test)
a3Indigofit3 = accuracy(Indigofit3, test)
a.table<-rbind(a1Indigofit1,a2Indigofit2, a3Indigofit3)
row.names(a.table)<-c('Mean training','Mean test', 'Naive training', 'Naive test',
'S. Naive training', 'S. Naive test' )
#Order the table according to MASE
a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MASE),]
a.table
## ME RMSE MAE MPE MAPE
## S. Naive training 2.891304e+00 12.11449 9.934783 1.217238 4.837041
## S. Naive test 2.023077e+01 28.44833 21.000000 7.895729 8.164698
## Mean training -5.680872e-15 59.58758 47.872000 -6.431668 21.411814
## Mean test 3.090769e+01 85.70588 60.261538 4.626308 20.342311
## Naive test 6.830769e+01 105.14825 68.307692 21.109562 21.109562
## Naive training 8.979592e-01 88.40722 68.326531 -5.869436 29.343977
## MASE ACF1 Theil's U
## S. Naive training 1.000000 0.4453930 NA
## S. Naive test 2.113786 0.5470580 0.2896050
## Mean training 4.818626 -0.1011995 NA
## Mean test 6.065713 -0.1526504 0.8400265
## Naive test 6.875610 -0.1526504 1.0194622
## Naive training 6.877506 -0.3814527 NA
kable(a.table, caption="Forecast accuracy",digits = 4 )
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
---|---|---|---|---|---|---|---|---|
S. Naive training | 2.8913 | 12.1145 | 9.9348 | 1.2172 | 4.8370 | 1.0000 | 0.4454 | NA |
S. Naive test | 20.2308 | 28.4483 | 21.0000 | 7.8957 | 8.1647 | 2.1138 | 0.5471 | 0.2896 |
Mean training | 0.0000 | 59.5876 | 47.8720 | -6.4317 | 21.4118 | 4.8186 | -0.1012 | NA |
Mean test | 30.9077 | 85.7059 | 60.2615 | 4.6263 | 20.3423 | 6.0657 | -0.1527 | 0.8400 |
Naive test | 68.3077 | 105.1482 | 68.3077 | 21.1096 | 21.1096 | 6.8756 | -0.1527 | 1.0195 |
Naive training | 0.8980 | 88.4072 | 68.3265 | -5.8694 | 29.3440 | 6.8775 | -0.3815 | NA |
When determining how accurate the naïve, seasonal naïve and mean model is, I will be looking at the following: RSME, MAE, MAPE and MASE. It is evident that in all of those, the seasonal naïve model produces the least amount of error. This is evident because:
In every single scenario, the season naive forecast has the lowest number which indicates the smallest error. Therefore, using these accuracies, the season naive forecast is the most accurate.
However, now we must take into consideration another model, the linear trend model.
par(mfrow=c(1,2))
res1 <- ts(resid(Indigofit4), s = 2002, f = 4)
plot.ts(res1, main="Linear Trend Forecast Residuals", ylab = "res(Indigo Sales)")
abline(0, 0)
Acf(res1, main="Autocorrelation of Linear Trend Residuals")
summary(Indigofit4)
##
## Call:
## tslm(formula = data ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -67.84 -42.30 -28.77 39.25 146.12
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 190.6687 16.3084 11.691 <2e-16 ***
## trend 1.0034 0.4431 2.265 0.0271 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 63.95 on 61 degrees of freedom
## Multiple R-squared: 0.07755, Adjusted R-squared: 0.06243
## F-statistic: 5.128 on 1 and 61 DF, p-value: 0.02711
When we analyze the residuals and the ACFs of the residuals, it is apparent that there are large spikes on lag 4, 8, 12 and 16 going upwards, and large spikes going downward at 2, 6, 10, and 14. This shows that there is a high degree of both pattern (which appears to be seasonal) and autocorrelation of the residuals. Therefore, the estimated model violates the assumption of no autocorrelation in the errors and our forecasts may be inefficient.
If we analyze this further (to determine if seasonality is the better forecast) we can consider the linear trend forecasts summary. When assessing how well the model fit the data, we will first look at the residual statistics which indicate (like our plots) that the distribution of the residuals does not appear to be strongly symmetrical. Taking into consideration that the linear trend forecast shows a high degree of error, and the seasonal naïve forecast shows a small degree of error, I would say that the best forecast is the seasonal naïve approach. With that conclusion, the forecasts for the next 6 periods would be:
as_2 <- read.csv("~/Dropbox (Personal)/ECON4210/Assignment Two/as_2.csv")
summary(as_2)
## date MGA GDP
## 2000Q1 : 1 Min. :2354 Min. :10031
## 2000Q2 : 1 1st Qu.:4623 1st Qu.:12181
## 2000Q3 : 1 Median :6050 Median :14566
## 2000Q4 : 1 Mean :6014 Mean :14406
## 2001Q1 : 1 3rd Qu.:7727 3rd Qu.:16297
## 2001Q2 : 1 Max. :9443 Max. :19058
## (Other):63
as_2 = ts(as_2, start=2000, frequency=4)
plot(jitter(MGA) ~ jitter(GDP),main= "Scatter plot of Magna Sales versus GDP", xlab="GDP (millions)",ylab="MGA (millions)",data=as_2)
fit <- lm(MGA ~ GDP, data=as_2)
abline(fit)
trainData <- window(as_2,start=c(2001, 1),end=c(2014, 4))
testData <- window(as_2, start=c(2015,1),end=c(2017, 1))
both <- window(as_2,start=c(2002, 1))
h=length(testData)
fit1 <- lm(MGA ~ GDP, data=trainData)
plot(jitter(MGA) ~ jitter(GDP), main= "Scatter plot of Magna Sales versus GDP", xlab="GDP (millions)",ylab="Magna Sales (millions)",data=trainData)
abline(fit)
res <- residuals(fit1)
plot(jitter(res)~jitter(GDP), main= "Scatter plot of Residuals for Magna Sales versus GDP", ylab="Residuals", xlab="GDP", data=trainData)
abline(0,0)
This linear regression model fits well. This is evident because the straight line that goes through the plot shows that majority of the points land on or around it, beside one or two outliers. To further verify that this model fits well we must analyze the scatterplot of the residuals against the predictor variable. The residual plot shows that there is a high degree of randomness and no clear pattern, as the residuals are both positive and negative, spread out, and are relatively symmetrically distributed. This indicates that a simple linear regression is appropriate for this data.
x = as_2[,3]
fit2 = rwf(trainData[,3],h=9,drift=T,level=c(80,95),fan=FALSE,lambda=NULL)
plot(fit2,main="GDP Forecast using Random Walk with Drift", xlab="Year",ylab="Millions")
h=length(testData)
head(fit2)
## $method
## [1] "Random walk with drift"
##
## $level
## [1] 80 95
##
## $x
## Qtr1 Qtr2 Qtr3 Qtr4
## 2001 10508.1 10638.4 10639.5 10701.3
## 2002 10834.4 10934.8 11037.1 11103.8
## 2003 11230.1 11370.7 11625.1 11816.8
## 2004 11988.4 12181.4 12367.7 12562.2
## 2005 12813.7 12974.1 13205.4 13381.6
## 2006 13648.9 13799.8 13908.5 14066.4
## 2007 14233.2 14422.3 14569.7 14685.3
## 2008 14668.4 14813.0 14843.0 14549.9
## 2009 14383.9 14340.4 14384.1 14566.5
## 2010 14681.1 14888.6 15057.7 15230.2
## 2011 15238.4 15460.9 15587.1 15785.3
## 2012 15973.9 16121.9 16227.9 16297.3
## 2013 16475.4 16541.4 16749.3 16999.9
## 2014 17031.3 17320.9 17622.3 17735.9
##
## $mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 2015 17867.31 17998.73 18130.14 18261.56
## 2016 18392.97 18524.39 18655.80 18787.22
## 2017 18918.63
##
## $lower
## 80% 95%
## 2015 Q1 17733.42 17662.54
## 2015 Q2 17807.66 17706.52
## 2015 Q3 17894.06 17769.08
## 2015 Q4 17986.57 17840.99
## 2016 Q1 18082.88 17918.73
## 2016 Q2 18181.83 18000.50
## 2016 Q3 18282.73 18085.24
## 2016 Q4 18385.13 18172.28
## 2017 Q1 18488.73 18261.15
##
## $upper
## 80% 95%
## 2015 Q1 18001.21 18072.09
## 2015 Q2 18189.80 18290.94
## 2015 Q3 18366.23 18491.21
## 2015 Q4 18536.55 18682.12
## 2016 Q1 18703.06 18867.21
## 2016 Q2 18866.94 19048.28
## 2016 Q3 19028.87 19226.36
## 2016 Q4 19189.30 19402.15
## 2017 Q1 19348.53 19576.11
fcast <- forecast(fit1, newdata=data.frame(GDP=c(17867.31,17998.73,18130.14,18261.56,18392.97,18524.39, 18655.80,18787.22,18918.63)))
plot(fcast, ylab="% change in Magna Sales", xlab="% change in GDP")
prediction <- predict(fit1, newdata = data.frame(testData))
head(fcast)
## $model
##
## Call:
## lm(formula = MGA ~ GDP, data = trainData)
##
## Coefficients:
## (Intercept) GDP
## -5762.9113 0.8251
##
##
## $mean
## 1 2 3 4 5 6 7 8
## 8979.725 9088.162 9196.591 9305.028 9413.457 9521.894 9630.323 9738.760
## 9
## 9847.188
##
## $lower
## [,1] [,2]
## 1 7964.209 7410.474
## 2 8070.606 7515.759
## 3 8176.931 7620.937
## 4 8283.199 7726.023
## 5 8389.396 7831.002
## 6 8495.537 7935.891
## 7 8601.607 8040.675
## 8 8707.622 8145.370
## 9 8813.567 8249.960
##
## $upper
## [,1] [,2]
## 1 9995.242 10548.98
## 2 10105.719 10660.57
## 3 10216.251 10772.25
## 4 10326.857 10884.03
## 5 10437.518 10995.91
## 6 10548.251 11107.90
## 7 10659.038 11219.97
## 8 10769.897 11332.15
## 9 10880.810 11444.42
##
## $level
## [1] 80 95
##
## $x
## [1] 2863 2817 2517 2829 3121 2896 2962 3443 3496 3660 3566 4623 5103 5113
## [15] 4784 5653 5718 5858 5381 5854 6019 6369 5424 6368 6423 6731 6077 6836
## [29] 6622 6713 5533 4836 3574 3705 4669 5419 5512 6050 5942 6598 7189 7338
## [43] 6970 7251 7666 7727 7411 8033 8361 8962 8338 9174 8455 8911 8247 8790
Therefore, the forecasts for Magna sales over the test period are as follows:
Forcasting Magna sales using a Linear Trend
data = trainData[,2]
fit3 <- tslm(data ~ trend)
f <- forecast(fit3, h=9,level=c(80,95))
plot(f,main="Magna Sales using Linear trend Forecast", ylab="Millions", xlab="Year")
lines(fitted(fit3),col="blue")
head(f)
## $model
##
## Call:
## tslm(formula = data ~ trend)
##
## Coefficients:
## (Intercept) trend
## 2974.0 100.2
##
##
## $mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 2015 8686.675 8786.896 8887.118 8987.340
## 2016 9087.561 9187.783 9288.005 9388.226
## 2017 9488.448
##
## $lower
## [,1] [,2]
## 2015 Q1 7477.222 6817.739
## 2015 Q2 7575.213 6914.513
## 2015 Q3 7673.130 7011.174
## 2015 Q4 7770.975 7107.724
## 2016 Q1 7868.749 7204.162
## 2016 Q2 7966.451 7300.491
## 2016 Q3 8064.082 7396.710
## 2016 Q4 8161.643 7492.820
## 2017 Q1 8259.134 7588.821
##
## $upper
## [,1] [,2]
## 2015 Q1 9896.127 10555.61
## 2015 Q2 9998.580 10659.28
## 2015 Q3 10101.106 10763.06
## 2015 Q4 10203.704 10866.96
## 2016 Q1 10306.374 10970.96
## 2016 Q2 10409.115 11075.08
## 2016 Q3 10511.927 11179.30
## 2016 Q4 10614.809 11283.63
## 2017 Q1 10717.762 11388.07
##
## $level
## [1] 80 95
##
## $x
## Qtr1 Qtr2 Qtr3 Qtr4
## 2001 2863 2817 2517 2829
## 2002 3121 2896 2962 3443
## 2003 3496 3660 3566 4623
## 2004 5103 5113 4784 5653
## 2005 5718 5858 5381 5854
## 2006 6019 6369 5424 6368
## 2007 6423 6731 6077 6836
## 2008 6622 6713 5533 4836
## 2009 3574 3705 4669 5419
## 2010 5512 6050 5942 6598
## 2011 7189 7338 6970 7251
## 2012 7666 7727 7411 8033
## 2013 8361 8962 8338 9174
## 2014 8455 8911 8247 8790
Therefore, the forecasts for Magna sales over the test period are as follows: ######2015,1: 8686.675 ######2015,2: 8786.896 ######2015,3: 8887.118 ######2015,4: 8987.340 ######2016,1: 9087.561 ######2016,2: 9187.783 ######2016,3: 9288.005 ######2016,4: 9388.226 ######2017,1: 9488.448
Comparing the accuracy of the forecasting models
par(mfrow=c(2,2))
#linear regression
res2 <- ts(resid(fcast),s=2000,f=4)
plot.ts(res2,main="Linear Regression Forecast Residuals", ylab="res (Magna Sales)")
abline(0,0)
Acf(res2, main="Autocorrelation of Linear Regression Residuals")
#linear trend
res1 <- ts(resid(fit3),s=2000,f=4)
plot.ts(res1, main="Linear Trend Forecast Residuals", ylab = "res (Magna Sales)")
abline(0,0)
Acf(res1, main="Autocorrelation of Linear Trend Residuals")
To compare the accuracy of the two forecasting models, we examine the residuals and the ACFs of the residuals. Although both ACF’s show significant spikes both up and down near the lower and high lags, the autocorrelation of the linear trend model residuals show that more lines extend beyond the blue lines, meaning there is more autocorrelation in the results for that model. Therefore, the linear trend estimated model violates the assumption of no autocorrelation in the errors more than the linear regression model, and its forecasts are more inefficient. The linear regression model utilizes more information and there has better forecasts. The linear trend forecast has a larger prediction interval than the linear regression forecast.
Furthermore, when examining the residuals graphs, the linear regression residual points (plot) lie closer to and on zero (abline) than the linear trend residuals, meaning the former forecast has less error.
Therefore, the best forecast is the linear regression forecast.