The data below (data set fancy) concern the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
In the time plot we see the expected seasonality: a large spike at Christmas and a smaller spike in March for the surfing festival. Over time, sales volume increases, which makes sense given that the shop has expanded.
Logarithms of the data should be taken so that the seasonality can be seen clearly without the increasing variation in the data over time.
Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
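A sketch of how such a model could be fit is below. The object names (fancy_log, festival_dummy, fancy_data, fit_fancy) are chosen to match the output that follows; the fancy data set is assumed to come with the fpp/fma packages.

```r
library(fpp)   # loads forecast and, via fma, the 'fancy' data set (assumption)

# Log-transform the monthly sales
fancy_log <- log(fancy)

# Surfing-festival dummy: 1 every March from 1988 onwards, 0 otherwise
festival_dummy <- rep(0, length(fancy_log))
festival_dummy[seq(3, length(fancy_log), by = 12)] <- 1
festival_dummy[3] <- 0                      # no festival in March 1987
festival_dummy <- ts(festival_dummy, frequency = 12, start = c(1987, 1))

fancy_data <- data.frame(fancy_log, festival_dummy)

# Linear trend + monthly seasonal dummies + festival dummy
fit_fancy <- tslm(fancy_log ~ trend + season + festival_dummy, data = fancy_data)
summary(fit_fancy)
```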
##
## Call:
## tslm(formula = fancy_log ~ trend + season + festival_dummy, data = fancy_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## festival_dummy 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
Both plots show the residuals to be random and do not point to any problems with the model.
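For reference, the two plots could be produced along these lines (assuming the fit_fancy object from the sketch above):

```r
res <- residuals(fit_fancy)

plot(res, xlab = "Year", ylab = "Residuals")        # residuals against time
plot(fitted(fit_fancy), res,
     xlab = "Fitted values", ylab = "Residuals")    # residuals against fitted values
```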
The boxplots show some wider variance towards the end of the summer and start of the fall. This could point to the model missing out on expressing some seasonality at this time.
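A sketch of the monthly boxplots, assuming the residuals retain the monthly ts attributes of the response (otherwise cycle() of the original series could be used to group them):

```r
# Boxplots of residuals by month of the year, using res from the sketch above
boxplot(res ~ cycle(res), xlab = "Month", ylab = "Residuals")
```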
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02202 0.39041 0.54474 1.12051 0.72788 7.61967
Looking at the coefficients we can see how much each variable contributes to the model. All are highly significant except for season 3, which corresponds to March, when the surfing festival takes place. Because that same period is also captured by the festival dummy variable, season 3 carries less weight in the model.
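Whether any autocorrelation remains in the residuals can be checked with a Durbin-Watson test, for example via the lmtest package (a sketch):

```r
library(lmtest)

# Two-sided Durbin-Watson test on the regression residuals
dwtest(fit_fancy, alternative = "two.sided")
```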
##
## Durbin-Watson test
##
## data: fit_fancy
## DW = 0.88889, p-value = 1.956e-07
## alternative hypothesis: true autocorrelation is not 0
The Durbin-Watson test rejects the null hypothesis, indicating that there is still some autocorrelation remaining in the residuals that could be exploited in order to obtain a better forecast.
Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
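A sketch of how the forecasts could be produced: forecast() needs the festival dummy supplied for the 36 future months as well (1 every March, 0 otherwise), and the column name in newdata must match the regressor name used in the model.

```r
# Festival dummy for Jan 1994 - Dec 1996
future_festival <- rep(c(0, 0, 1, rep(0, 9)), 3)

fc <- forecast(fit_fancy, newdata = data.frame(festival_dummy = future_festival))
fc
```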
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.065713 9.722498 10.408928 9.536620 10.59481
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.329951 9.982679 10.677222 9.794605 10.86530
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
Transform your predictions and intervals to obtain predictions and intervals for the raw data.
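Since the model was fitted to the logarithms, the point forecasts and interval limits can be back-transformed with exp() (a sketch, using the fc object from above):

```r
# Back-transform the log-scale forecasts and intervals to the original scale
fc_raw <- cbind(point = exp(fc$mean),
                lo80  = exp(fc$lower[, "80%"]), hi80 = exp(fc$upper[, "80%"]),
                lo95  = exp(fc$lower[, "95%"]), hi95 = exp(fc$upper[, "95%"]))
fc_raw
```

Note that back-transformed point forecasts are medians, not means, on the original scale.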
As the book notes, the Durbin-Watson test shows that some autocorrelation remains in the residuals, and therefore information remains that could be exploited to obtain better forecasts. A dynamic-regression model might therefore suit this data better: the forecasts from the current model are unbiased, but their prediction intervals are wider than they need to be.
The data below (data set texasgas) show the demand for natural gas and the price of natural gas for 20 towns in Texas in 1969.
## price consumption
## 1 30 134
## 2 31 112
## 3 37 136
## 4 42 109
## 5 43 105
## 6 45 87
Do a scatterplot of consumption against price. The data are clearly not linear. Three possible nonlinear models for the data are given. The second model divides the data into two sections, depending on whether the price is above or below 60 cents per 1,000 cubic feet.
The scatterplot shows a clear negative relationship between price (P) and consumption (C), but it is not linear: consumption falls steeply at lower prices and flattens out at higher prices, so the slope of a fitted line changes with P.
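A scatterplot sketch (the texasgas data set is assumed to come from the fpp package):

```r
library(fpp)   # texasgas data set (assumption)

plot(texasgas$price, texasgas$consumption,
     xlab = "Price (cents per 1,000 cubic feet)", ylab = "Consumption")
```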
For the second model, the parameters \(a_1\), \(a_2\), \(b_1\), \(b_2\) can be estimated by simply fitting a regression with four regressors but no constant: (i) a dummy taking value 1 when \(P \le 60\) and 0 otherwise; (ii) \(P_1 = P\) when \(P \le 60\) and 0 otherwise; (iii) a dummy taking value 0 when \(P \le 60\) and 1 otherwise; (iv) \(P_2 = P\) when \(P > 60\) and 0 otherwise.
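A sketch of how the three models could be set up. The regressor names mirror the lm() calls shown in the output below; their definitions are inferred from the fitted coefficients (the piecewise model was fitted with an intercept rather than the no-constant parameterisation described above), and exp_price is assumed to be exp(price), which would explain the extremely small coefficient in the first fit.

```r
P <- texasgas$price

exp_price <- exp(P)                 # model 1: exponential regressor (assumption)
dummy_i   <- as.numeric(P <= 60)    # indicator: price <= 60
dummy_ii  <- ifelse(P <= 60, P, 0)  # price when <= 60, else 0
dummy_iii <- ifelse(P > 60, P, 0)   # price when > 60, else 0
sq_price  <- P^2                    # model 3: quadratic term

fit1 <- lm(texasgas$consumption ~ exp_price)
fit2 <- lm(texasgas$consumption ~ dummy_i + dummy_ii + dummy_iii)
fit3 <- lm(texasgas$consumption ~ texasgas$price + sq_price)
```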
##
## Call:
## lm(formula = texasgas$consumption ~ exp_price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.86 -25.09 -13.86 20.64 65.14
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.086e+01 7.670e+00 9.238 2.98e-08 ***
## exp_price -1.642e-43 1.711e-43 -0.959 0.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.19 on 18 degrees of freedom
## Multiple R-squared: 0.04864, Adjusted R-squared: -0.004214
## F-statistic: 0.9203 on 1 and 18 DF, p-value: 0.3501
## [1] 1101.359
##
## Call:
## lm(formula = texasgas$consumption ~ dummy_i + dummy_ii + dummy_iii)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.987 -6.421 2.823 9.324 22.617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 84.7861 51.8428 1.635 0.1215
## dummy_i 136.1068 54.9412 2.477 0.0248 *
## dummy_ii -2.9057 0.3738 -7.773 8.05e-07 ***
## dummy_iii -0.4470 0.5634 -0.793 0.4392
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.49 on 16 degrees of freedom
## Multiple R-squared: 0.8602, Adjusted R-squared: 0.834
## F-statistic: 32.81 on 3 and 16 DF, p-value: 4.565e-07
## [1] 182.1078
##
## Call:
## lm(formula = texasgas$consumption ~ texasgas$price + sq_price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5601 -5.4693 0.7502 11.0252 25.6619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 273.930628 31.031614 8.827 9.32e-08 ***
## texasgas$price -5.675863 1.009086 -5.625 3.03e-05 ***
## sq_price 0.033904 0.007412 4.574 0.000269 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.37 on 17 degrees of freedom
## Multiple R-squared: 0.8315, Adjusted R-squared: 0.8117
## F-statistic: 41.95 on 2 and 17 DF, p-value: 2.666e-07
## [1] 206.5276
## [1] -0.004214286
## [1] 200.7363
## [1] 0.811689
## [1] 168.1158
Looking at the graphs it is clear that Model 1 differs drastically from the other two and is likely not a very good model, as its residual variance is much larger. Models 2 and 3 both show more acceptable residual variance. Comparing R-squared and AIC, Model 2 has a higher R-squared and a smaller AIC, so it is likely the best model of the three.
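A compact side-by-side comparison, assuming the fit objects from the sketch above:

```r
# Adjusted R-squared and AIC for the three candidate models
sapply(list(model1 = fit1, model2 = fit2, model3 = fit3),
       function(m) c(adj.R.squared = summary(m)$adj.r.squared, AIC = AIC(m)))
```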
For prices 40, 60, 80, 100, and 120 cents per 1,000 cubic feet, compute the forecasted per capita demand using the best model of the three above.
## 1 2 3 4 5 6 7 8
## 133.72290 130.81724 113.38323 98.85490 95.94923 90.13790 75.60956 63.98689
## 9 10 11 12 13 14 15 16
## 63.98689 55.26989 52.36423 52.36423 46.55289 52.15787 45.45344 45.00647
## 17 18 19 20
## 43.66559 41.43078 40.08989 39.19597
These predictions come out for the 20 original observations instead of the 5 requested prices. This happens because of how the variables were named when the model was fitted; a possible fix is sketched below.
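When predict() cannot find the model's regressors in newdata, it falls back on the full-length vectors used at fitting time, which is why 20 values appear. A sketch of a fix for the piecewise model (the best of the three): supply a newdata frame whose column names match the regressor names used in the formula.

```r
p <- c(40, 60, 80, 100, 120)

new_prices <- data.frame(dummy_i   = as.numeric(p <= 60),
                         dummy_ii  = ifelse(p <= 60, p, 0),
                         dummy_iii = ifelse(p > 60, p, 0))

# Forecasted per capita demand with prediction intervals at the five prices
predict(fit2, newdata = new_prices, interval = "prediction")
```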
## fit lwr upr
## 1 133.72290 100.916974 166.52883
## 2 130.81724 98.340621 163.29385
## 3 113.38323 82.526833 144.23964
## 4 98.85490 68.835752 128.87405
## 5 95.94923 66.037298 125.86117
## 6 90.13790 60.378172 119.89762
## 7 75.60956 45.862015 105.35711
## 8 63.98689 33.871341 94.10245
## 9 63.98689 33.871341 94.10245
## 10 55.26989 24.665025 85.87476
## 11 52.36423 21.557183 83.17127
## 12 52.36423 21.557183 83.17127
## 13 46.55289 15.285111 77.82067
## 14 52.15787 14.378306 89.93743
## 15 45.45344 14.574639 76.33223
## 16 45.00647 14.269892 75.74306
## 17 43.66559 13.078544 74.25263
## 18 41.43078 10.168295 72.69326
## 19 40.08989 7.892941 72.28684
## 20 39.19597 6.174121 72.21781
The same naming issue as before prevents the graph from being produced (a plotting sketch is given below). If the prediction intervals on such a graph were narrow, we would conclude the model is predicting accurately; if they were wide, the model would not be predicting very accurately.
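A sketch of how the intervals could be plotted against the data, using the five-price predictions from the fix above:

```r
pred <- predict(fit2, newdata = new_prices, interval = "prediction")

plot(texasgas$price, texasgas$consumption,
     xlab = "Price (cents per 1,000 cubic feet)", ylab = "Consumption",
     ylim = range(pred, texasgas$consumption))
lines(p, pred[, "fit"])              # point forecasts
lines(p, pred[, "lwr"], lty = 2)     # lower prediction limits
lines(p, pred[, "upr"], lty = 2)     # upper prediction limits
```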
## [1] 0.9904481
The correlation between \(P\) and \(P^2\) is about 0.99. Because a regressor and its square are so highly correlated, they carry nearly the same information, which is a general problem in polynomial regressions: the individual coefficients become unstable and hard to interpret, even though the fitted curve may still describe how consumption changes as price increases.
Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
\[
\begin{aligned}
3\times 5\text{ MA} &= \tfrac{1}{3}\left[\tfrac{1}{5}(y_1+y_2+y_3+y_4+y_5) + \tfrac{1}{5}(y_2+y_3+y_4+y_5+y_6) + \tfrac{1}{5}(y_3+y_4+y_5+y_6+y_7)\right] \\
&= \tfrac{1}{3}\left(\tfrac{1}{5}y_1 + \tfrac{2}{5}y_2 + \tfrac{3}{5}y_3 + \tfrac{3}{5}y_4 + \tfrac{3}{5}y_5 + \tfrac{2}{5}y_6 + \tfrac{1}{5}y_7\right) \\
&= \tfrac{1}{15}y_1 + \tfrac{2}{15}y_2 + \tfrac{3}{15}y_3 + \tfrac{3}{15}y_4 + \tfrac{3}{15}y_5 + \tfrac{2}{15}y_6 + \tfrac{1}{15}y_7 \\
&\approx 0.067y_1 + 0.133y_2 + 0.200y_3 + 0.200y_4 + 0.200y_5 + 0.133y_6 + 0.067y_7
\end{aligned}
\]
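The equivalence can also be checked numerically (a small sketch):

```r
set.seed(1)
y <- rnorm(7)

ma5 <- sapply(1:3, function(i) mean(y[i:(i + 4)]))   # the three 5-term MAs
mean(ma5)                                            # 3x5 MA at the centre point
sum(c(1, 2, 3, 3, 3, 2, 1) / 15 * y)                 # 7-term weighted MA: same value
```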
The data below represent the monthly sales (in thousands) of product A for a plastics manufacturer for years 1 through 5 (data set plastics).
## Jan Feb Mar Apr May Jun
## 1 742 697 776 898 1030 1107
It appears that there is a seasonal fluctuation with a peak towards the end of summer each year. There is a positive trend over time.
Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
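A sketch of the decomposition (the plastics data set is assumed to come with the fma/fpp packages); the output below shows summary() of the decomposition, the trend-cycle and the seasonal indices:

```r
library(fpp)   # plastics data set (assumption)

fit_dec <- decompose(plastics, type = "multiplicative")
summary(fit_dec)     # components of the decomposition
fit_dec$trend        # trend-cycle
fit_dec$seasonal     # seasonal indices
```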
## Length Class Mode
## x 60 ts numeric
## seasonal 60 ts numeric
## trend 60 ts numeric
## random 60 ts numeric
## figure 12 -none- numeric
## type 1 -none- character
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
## Jan Feb Mar Apr May Jun Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## Aug Sep Oct Nov Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
Yes, the results support the graphical interpretation that there is a peak during the summer: May through October have the highest seasonal indices.
Compute and plot the seasonally adjusted data.
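For a multiplicative decomposition, the seasonally adjusted series is the data divided by the seasonal component (a sketch):

```r
plastics_sa <- plastics / fit_dec$seasonal
plot(plastics_sa, ylab = "Seasonally adjusted sales (thousands)")
```

The forecast package's seasadj() helper should give the same series.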
The outlier causes a spike and drop during the summer.
If the outlier is towards the end of the time series, it has less of an impact on the earlier time points and more of an impact on the later ones.
Use a random walk with drift to produce forecasts of the seasonally adjusted data.
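A sketch of the drift forecasts of the seasonally adjusted series (the 10-month horizon matches the output below):

```r
# Random walk with drift on the seasonally adjusted data
fc_sa <- rwf(plastics_sa, drift = TRUE, h = 10)
fc_sa
```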
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 6 1220.179 1167.802 1272.555 1140.0757 1300.281
## Feb 6 1224.392 1149.706 1299.079 1110.1697 1338.615
## Mar 6 1228.606 1136.388 1320.825 1087.5706 1369.642
## Apr 6 1232.820 1125.480 1340.160 1068.6581 1396.982
## May 6 1237.034 1116.076 1357.992 1052.0443 1422.024
## Jun 6 1241.248 1107.714 1374.782 1037.0248 1445.471
## Jul 6 1245.462 1100.123 1390.800 1023.1853 1467.738
## Aug 6 1249.676 1093.129 1406.222 1010.2586 1489.093
## Sep 6 1253.889 1086.612 1421.166 998.0614 1509.718
## Oct 6 1258.103 1080.486 1435.721 986.4612 1529.745
Reseasonalise the results to give forecasts on the original scale.
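A sketch of one way to reseasonalise: multiply the drift forecasts (and their interval limits) by the corresponding monthly seasonal indices. The two-year horizon and object names are assumptions.

```r
fc_sa24  <- rwf(plastics_sa, drift = TRUE, h = 24)
seas_idx <- rep(fit_dec$figure, 2)    # Jan-Dec seasonal indices, repeated for two years

cbind(point = fc_sa24$mean * seas_idx,
      lo80  = fc_sa24$lower[, "80%"] * seas_idx,
      hi80  = fc_sa24$upper[, "80%"] * seas_idx,
      lo95  = fc_sa24$lower[, "95%"] * seas_idx,
      hi95  = fc_sa24$upper[, "95%"] * seas_idx)
```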
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 6 936.2531 883.8662 988.6400 856.1342 1016.3720
## Feb 6 863.6074 789.5211 937.6937 750.3022 976.9126
## Mar 6 942.2625 851.5257 1032.9993 803.4925 1081.0325
## Apr 6 1095.6329 990.8590 1200.4067 935.3951 1255.8707
## May 6 1234.3115 1117.1708 1351.4523 1055.1602 1413.4628
## Jun 6 1344.5774 1216.2562 1472.8987 1148.3270 1540.8278
## Jul 6 1390.0138 1251.4111 1528.6166 1178.0392 1601.9885
## Aug 6 1447.9805 1299.8079 1596.1531 1221.3700 1674.5910
## Sep 6 1463.6068 1306.4460 1620.7676 1223.2501 1703.9635
## Oct 6 1415.5186 1249.8565 1581.1806 1162.1604 1668.8768
## Nov 6 1159.9252 986.1774 1333.6730 894.2009 1425.6495
## Dec 6 1013.0000 831.5264 1194.4736 735.4600 1290.5400
## Jan 7 936.2531 747.3693 1125.1369 647.3803 1225.1259
## Feb 7 863.6074 667.5935 1059.6214 563.8300 1163.3849
## Mar 7 942.2625 739.3688 1145.1562 631.9633 1252.5616
## Apr 7 1095.6329 886.0852 1305.1806 775.1573 1416.1085
## May 7 1234.3115 1018.3147 1450.3084 903.9729 1564.6502
## Jun 7 1344.5774 1122.3185 1566.8363 1004.6617 1684.4931
## Jul 7 1390.0138 1161.6645 1618.3632 1040.7837 1739.2440
## Aug 7 1447.9805 1213.6990 1682.2620 1089.6779 1806.2831
## Sep 7 1463.6068 1223.5397 1703.6739 1096.4559 1830.7577
## Oct 7 1415.5186 1169.8021 1661.2350 1039.7276 1791.3095
## Nov 7 1159.9252 908.6863 1411.1641 775.6885 1544.1620
## Dec 7 1013.0000 756.3575 1269.6425 620.4992 1405.5008