Seasonal fluctuations in the data match the anecdotal description. There is a large peak in sales in December each year for Christmas and a smaller peak for the March festival starting in 1988. Additionally, sales drop considerably in January. However, in 1992 and 1993, there is an unexpected change: the months outside of March and December are no longer flat (with, perhaps a small increasing trend). Instead, there is an upward slope showing sales increasing each month, with the December peaks becoming even larger.
library(fpp)
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.3.2
## Loading required package: fma
## Warning: package 'fma' was built under R version 3.3.2
## Loading required package: expsmooth
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 3.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
## Warning: package 'tseries' was built under R version 3.3.2
data(fancy)
myts = ts(fancy, start=1987, frequency = 12)
plot(myts)
This data shows variation that increases with the level of the series. To straighten the trend so that it can be fitted by a linear model, log transformation is necessary. Looking at my first time series plot, the graph seems to show exponential growth but looking at the new plot with log transformed data, the graph shows a more linear trend, stabilizing the variance.
log.myts = ts(log(fancy), start=1987, frequency = 12)
plot(log.myts)
log.sales = log(fancy)
dum_fest = rep(0, length(fancy))
dum_fest[seq_along(dum_fest)%%12 == 3] = 1
dum_fest[3] = 0
dum_fest = ts(dum_fest, freq = 12, start=c(1987,1))
fancy.new = data.frame(log.sales, dum_fest)
fit <- tslm(log.sales ~ trend + seasonaldummy(log.myts) + dum_fest, data=fancy.new)
summary(fit)
##
## Call:
## tslm(formula = log.sales ~ trend + seasonaldummy(log.myts) +
## dum_fest, data = fancy.new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.5819082 0.0784330 122.167 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## seasonaldummy(log.myts)Jan -1.9622412 0.0961066 -20.417 < 2e-16 ***
## seasonaldummy(log.myts)Feb -1.7108244 0.0960319 -17.815 < 2e-16 ***
## seasonaldummy(log.myts)Mar -1.6961584 0.1949340 -8.701 9.35e-13 ***
## seasonaldummy(log.myts)Apr -1.5781877 0.0959037 -16.456 < 2e-16 ***
## seasonaldummy(log.myts)May -1.5527543 0.0958503 -16.200 < 2e-16 ***
## seasonaldummy(log.myts)Jun -1.5134129 0.0958039 -15.797 < 2e-16 ***
## seasonaldummy(log.myts)Jul -1.3517867 0.0957647 -14.116 < 2e-16 ***
## seasonaldummy(log.myts)Aug -1.3742768 0.0957325 -14.355 < 2e-16 ***
## seasonaldummy(log.myts)Sep -1.2929114 0.0957075 -13.509 < 2e-16 ***
## seasonaldummy(log.myts)Oct -1.2148493 0.0956897 -12.696 < 2e-16 ***
## seasonaldummy(log.myts)Nov -0.7554933 0.0956790 -7.896 2.84e-11 ***
## dum_fest 0.5015151 0.1964273 2.553 0.0129 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
Residuals against time shows a slight upward trend from 1987 to 1989, downward from 1989 to 1991, and upward from 1991 to 1994. So, a nonlinear transformation might be needed to better fit the model. Residuals against fitted doesn’t show a pattern and the variation in the residuals does not seem to change with the size of the fitted value. Thus, the residuals are unbiased.
plot(residuals(fit), xlab="Predicted scores", ylab="Time", type="p")
plot(as.numeric(fitted(fit)), residuals(fit), xlab="Predicted scores", ylab="Residuals", type="p")
## 1e There is greater variability in August, September, and October. Given that this is a shop in a beach resort town, perhaps there is a weather component in effect during these months that the model does not account for.
boxplot(residuals(fit) ~ cycle(residuals(fit)))
There is an upward trend of 2.20% per month. On average, January has sales 196% lower than December, February has sales 171% lower than December, March has sales 169% lower than December, April has sales 157% lower than December, May has sales 155% lower than December, June has sales 151% lower than December, July has sales 135% lower than December, August has sales 137% lower than December, September has sales 129% lower than December, October has sales 121% lower than December, November has sales 75.5% lower than December. On average, months with the surfing festival have sales 50.2% higher than months without the surfing festival. I was surprised that the March sales were so much lower than December sales compared to the other months but I think the dummy variable for festival is capturing that peak.
summary(fit)
##
## Call:
## tslm(formula = log.sales ~ trend + seasonaldummy(log.myts) +
## dum_fest, data = fancy.new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.5819082 0.0784330 122.167 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## seasonaldummy(log.myts)Jan -1.9622412 0.0961066 -20.417 < 2e-16 ***
## seasonaldummy(log.myts)Feb -1.7108244 0.0960319 -17.815 < 2e-16 ***
## seasonaldummy(log.myts)Mar -1.6961584 0.1949340 -8.701 9.35e-13 ***
## seasonaldummy(log.myts)Apr -1.5781877 0.0959037 -16.456 < 2e-16 ***
## seasonaldummy(log.myts)May -1.5527543 0.0958503 -16.200 < 2e-16 ***
## seasonaldummy(log.myts)Jun -1.5134129 0.0958039 -15.797 < 2e-16 ***
## seasonaldummy(log.myts)Jul -1.3517867 0.0957647 -14.116 < 2e-16 ***
## seasonaldummy(log.myts)Aug -1.3742768 0.0957325 -14.355 < 2e-16 ***
## seasonaldummy(log.myts)Sep -1.2929114 0.0957075 -13.509 < 2e-16 ***
## seasonaldummy(log.myts)Oct -1.2148493 0.0956897 -12.696 < 2e-16 ***
## seasonaldummy(log.myts)Nov -0.7554933 0.0956790 -7.896 2.84e-11 ***
## dum_fest 0.5015151 0.1964273 2.553 0.0129 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
As the p-value is very small, we reject the null hypothesis that here is no lag one autocorrelation in the residuals. Since there is autocorrelation in the residuals, there is information remaining in the residuals that can be exploited to obtain better forecasts.
dwtest(fit, alt="two.sided")
##
## Durbin-Watson test
##
## data: fit
## DW = 0.88889, p-value = 1.956e-07
## alternative hypothesis: true autocorrelation is not 0
predicted = forecast(fit, h=36, newdata=fancy.new)
predicted
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.567228 10.310518 10.823938 10.171489 10.96297
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.831466 10.571566 11.091365 10.430810 11.23212
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
## Jan 1997 10.284066 10.020703 10.547429 9.878072 10.69006
## Feb 1997 10.557503 10.294140 10.820866 10.151508 10.96350
## Mar 1997 11.095704 10.832028 11.359379 10.689227 11.50218
## Apr 1997 10.734179 10.470816 10.997542 10.328185 11.14017
## May 1997 10.781633 10.518270 11.044995 10.375638 11.18763
## Jun 1997 10.842994 10.579631 11.106357 10.436999 11.24899
## Jul 1997 11.026640 10.763277 11.290003 10.620645 11.43263
## Aug 1997 11.026169 10.762807 11.289532 10.620175 11.43216
## Sep 1997 11.129555 10.866192 11.392918 10.723560 11.53555
## Oct 1997 11.229637 10.966274 11.493000 10.823642 11.63563
## Nov 1997 11.711012 11.447650 11.974375 11.305018 12.11701
## Dec 1997 12.488526 12.225163 12.751888 12.082531 12.89452
## Jan 1998 10.548304 10.280291 10.816318 10.135140 10.96147
## Feb 1998 10.821741 10.553727 11.089754 10.408577 11.23490
## Mar 1998 11.359942 11.091928 11.627955 10.946778 11.77311
## Apr 1998 10.998417 10.730404 11.266431 10.585253 11.41158
## May 1998 11.045870 10.777857 11.313884 10.632707 11.45903
## Jun 1998 11.107232 10.839218 11.375245 10.694068 11.52040
## Jul 1998 11.290878 11.022864 11.558891 10.877714 11.70404
## Aug 1998 11.290407 11.022394 11.558421 10.877244 11.70357
## Sep 1998 11.393793 11.125779 11.661806 10.980629 11.80696
## Oct 1998 11.493875 11.225861 11.761888 11.080711 11.90704
## Nov 1998 11.975250 11.707237 12.243264 11.562086 12.38841
## Dec 1998 12.752764 12.484750 13.020777 12.339600 13.16593
## Jan 1999 10.812542 10.539354 11.085731 10.391400 11.23368
## Feb 1999 11.085979 10.812790 11.359167 10.664837 11.50712
## Mar 1999 11.624180 11.351293 11.897067 11.203503 12.04486
## Apr 1999 11.262655 10.989467 11.535844 10.841513 11.68380
## May 1999 11.310108 11.036920 11.583297 10.888967 11.73125
## Jun 1999 11.371470 11.098281 11.644658 10.950328 11.79261
## Jul 1999 11.555116 11.281927 11.828304 11.133974 11.97626
## Aug 1999 11.554645 11.281457 11.827834 11.133504 11.97579
## Sep 1999 11.658031 11.384842 11.931219 11.236889 12.07917
## Oct 1999 11.758113 11.484924 12.031301 11.336971 12.17925
## Nov 1999 12.239488 11.966300 12.512677 11.818347 12.66063
## Dec 1999 13.017001 12.743813 13.290190 12.595860 13.43814
## Jan 2000 11.076780 10.797921 11.355639 10.646897 11.50666
## Feb 2000 11.350217 11.071358 11.629076 10.920334 11.78010
## Mar 2000 11.888418 11.610150 12.166685 11.459447 12.31739
## Apr 2000 11.526893 11.248034 11.805752 11.097010 11.95678
## May 2000 11.574346 11.295487 11.853205 11.144463 12.00423
## Jun 2000 11.635708 11.356849 11.914566 11.205825 12.06559
## Jul 2000 11.819354 11.540495 12.098212 11.389471 12.24924
## Aug 2000 11.818883 11.540024 12.097742 11.389000 12.24877
## Sep 2000 11.922269 11.643410 12.201127 11.492386 12.35215
## Oct 2000 12.022350 11.743492 12.301209 11.592468 12.45223
## Nov 2000 12.503726 12.224867 12.782585 12.073843 12.93361
## Dec 2000 13.281239 13.002381 13.560098 12.851357 13.71112
rawdata = as.data.frame(predicted)
rawdata = exp(rawdata)
rawdata[1:36,]
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 13244.70 10285.82 17054.73 8969.583 19557.43
## Feb 1994 17409.81 13520.45 22418.00 11790.284 25707.73
## Mar 1994 18060.36 12860.02 25363.61 10699.594 30484.96
## Apr 1994 20774.16 16133.21 26750.16 14068.696 30675.62
## May 1994 21783.73 16917.24 28050.15 14752.395 32166.37
## Jun 1994 23162.27 17987.81 29825.24 15685.969 34201.95
## Jul 1994 27831.56 21613.98 35837.72 18848.111 41096.73
## Aug 1994 27818.48 21603.82 35820.87 18839.249 41077.41
## Sep 1994 30848.42 23956.87 39722.43 20891.193 45551.50
## Oct 1994 34095.57 26478.61 43903.67 23090.230 50346.32
## Nov 1994 55176.84 42850.31 71049.28 37366.903 81475.41
## Dec 1994 120067.79 93244.59 154607.08 81312.400 177294.90
## Jan 1995 17250.40 13357.65 22277.59 11629.938 25587.08
## Feb 1995 22675.20 17558.28 29283.31 15287.252 33633.55
## Mar 1995 38840.85 30046.98 50208.44 26146.972 57697.39
## Apr 1995 27057.06 20951.33 34942.16 18241.435 40133.06
## May 1995 28371.96 21969.51 36640.25 19127.918 42083.42
## Jun 1995 30167.42 23359.80 38958.95 20338.387 44746.58
## Jul 1995 36248.88 28068.91 46812.70 24438.412 53767.06
## Aug 1995 36231.84 28055.72 46790.69 24426.922 53741.78
## Sep 1995 40178.16 31111.50 51887.06 27087.467 59595.26
## Oct 1995 44407.37 34386.35 57348.77 29938.733 65868.34
## Nov 1995 71864.42 55647.40 92807.48 48449.831 106594.69
## Dec 1995 156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996 22467.57 17336.40 29117.46 15065.329 33506.86
## Feb 1996 29533.04 22788.25 38274.14 19802.984 44043.89
## Mar 1996 50587.81 39009.73 65602.25 33887.802 75517.62
## Apr 1996 35240.15 27191.96 45670.42 23629.808 52555.15
## May 1996 36952.72 28513.41 47889.88 24778.151 55109.18
## Jun 1996 39291.20 30317.82 50920.48 26346.183 58596.65
## Jul 1996 47211.93 36429.60 61185.57 31657.322 70409.18
## Aug 1996 47189.73 36412.48 61156.80 31642.439 70376.07
## Sep 1996 52329.57 40378.47 67817.91 35088.887 78041.33
## Oct 1996 57837.85 44628.77 74956.52 38782.394 86256.08
## Nov 1996 93598.96 72222.70 121302.09 62761.521 139588.15
## Dec 1996 203676.38 157160.50 263959.89 136572.460 303751.35
These predictions could be improved by determining if there are additional predictors that have been left out of the model. For instance, any changes that the store makes in the way that they conduct their business due to continued increased sales growth such as an increase in the store’s marketing, a change in the product selection, or store hours could all affect future sales. Additionally, if there are any weather effects on sales that could explain some of the variability in ranges for some of the later months of the year, that information would be useful in the model.
data(texasgas)
plot(texasgas$price, texasgas$consumption)
fit.model1 = lm(consumption ~ exp(price), texasgas)
summary(fit.model1)
##
## Call:
## lm(formula = consumption ~ exp(price), data = texasgas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.86 -25.09 -13.86 20.64 65.14
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.086e+01 7.670e+00 9.238 2.98e-08 ***
## exp(price) -1.642e-43 1.711e-43 -0.959 0.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.19 on 18 degrees of freedom
## Multiple R-squared: 0.04864, Adjusted R-squared: -0.004214
## F-statistic: 0.9203 on 1 and 18 DF, p-value: 0.3501
(summary(fit.model1)$sigma)^2
## [1] 1101.359
fit.model3= lm(consumption ~ poly(price,2), texasgas)
summary(fit.model3)
##
## Call:
## lm(formula = consumption ~ poly(price, 2), data = texasgas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5601 -5.4693 0.7502 11.0252 25.6619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 69.000 3.213 21.472 9.35e-14 ***
## poly(price, 2)1 -114.043 14.371 -7.936 4.07e-07 ***
## poly(price, 2)2 65.737 14.371 4.574 0.000269 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.37 on 17 degrees of freedom
## Multiple R-squared: 0.8315, Adjusted R-squared: 0.8117
## F-statistic: 41.95 on 2 and 17 DF, p-value: 2.666e-07
(summary(fit.model3)$sigma)^2
## [1] 206.5276