This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
5.8 Exercises
library(fma)
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.4.2
summary(fancy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1665 5884 8772 14316 16889 104661
plot(fancy)
b) It is necessary to take the log of this data before fitting a model in order to make it normal, if we fit it before taking the log the model would be off since it is exponential
log.fancy=log(fancy)
dummy=rep(0,length(fancy))
dummy[seq_along(dummy)%%12==3]=1
dummy[3]=0
dummy=ts(dummy,frequency=12, start=c(1987,1))
data.fancy=data.frame(log.fancy, dummy)
fit=tslm(log.fancy~trend++season+dummy,data=data.fancy)
summary(fit)
##
## Call:
## tslm(formula = log.fancy ~ trend + +season + dummy, data = data.fancy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## dummy 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
plot(residuals(fit), type='p')
The residuals show no pattern and therefore are unbiased and have no problem
boxplot(resid(fit)~cycle(resid(fit)))
There is a lot of variance in this boxplot, meaning that there may be an issue with our model
The coefficient values show how much each month affects the y value of the overall model. As the year goes along the coefficients get higher because the amounts are higher in the later months of the year.
library(lmtest)
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
dwtest(fit)
##
## Durbin-Watson test
##
## data: fit
## DW = 0.88889, p-value = 9.78e-08
## alternative hypothesis: true autocorrelation is greater than 0
This test shows that there is autocorrelation in the residuals. We should create a better model to get rid of this.
fancy.1=data.frame(dummy=rep(0,36))
forecast=forecast(fit,newdata=fancy.1)
forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.065713 9.722498 10.408928 9.536620 10.59481
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.329951 9.982679 10.677222 9.794605 10.86530
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
raw.forecast=as.data.frame(forecast)
raw.forecast=exp(raw.forecast)
raw.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 13244.70 10285.82 17054.73 8969.583 19557.43
## Feb 1994 17409.81 13520.45 22418.00 11790.284 25707.73
## Mar 1994 18060.36 12860.02 25363.61 10699.594 30484.96
## Apr 1994 20774.16 16133.21 26750.16 14068.696 30675.62
## May 1994 21783.73 16917.24 28050.15 14752.395 32166.37
## Jun 1994 23162.27 17987.81 29825.24 15685.969 34201.95
## Jul 1994 27831.56 21613.98 35837.72 18848.111 41096.73
## Aug 1994 27818.48 21603.82 35820.87 18839.249 41077.41
## Sep 1994 30848.42 23956.87 39722.43 20891.193 45551.50
## Oct 1994 34095.57 26478.61 43903.67 23090.230 50346.32
## Nov 1994 55176.84 42850.31 71049.28 37366.903 81475.41
## Dec 1994 120067.79 93244.59 154607.08 81312.400 177294.90
## Jan 1995 17250.40 13357.65 22277.59 11629.938 25587.08
## Feb 1995 22675.20 17558.28 29283.31 15287.252 33633.55
## Mar 1995 23522.50 16688.88 33154.31 13858.024 39926.92
## Apr 1995 27057.06 20951.33 34942.16 18241.435 40133.06
## May 1995 28371.96 21969.51 36640.25 19127.918 42083.42
## Jun 1995 30167.42 23359.80 38958.95 20338.387 44746.58
## Jul 1995 36248.88 28068.91 46812.70 24438.412 53767.06
## Aug 1995 36231.84 28055.72 46790.69 24426.922 53741.78
## Sep 1995 40178.16 31111.50 51887.06 27087.467 59595.26
## Oct 1995 44407.37 34386.35 57348.77 29938.733 65868.34
## Nov 1995 71864.42 55647.40 92807.48 48449.831 106594.69
## Dec 1995 156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996 22467.57 17336.40 29117.46 15065.329 33506.86
## Feb 1996 29533.04 22788.25 38274.14 19802.984 44043.89
## Mar 1996 30636.60 21648.24 43356.94 17936.708 52328.52
## Apr 1996 35240.15 27191.96 45670.42 23629.808 52555.15
## May 1996 36952.72 28513.41 47889.88 24778.151 55109.18
## Jun 1996 39291.20 30317.82 50920.48 26346.183 58596.65
## Jul 1996 47211.93 36429.60 61185.57 31657.322 70409.18
## Aug 1996 47189.73 36412.48 61156.80 31642.439 70376.07
## Sep 1996 52329.57 40378.47 67817.91 35088.887 78041.33
## Oct 1996 57837.85 44628.77 74956.52 38782.394 86256.08
## Nov 1996 93598.96 72222.70 121302.09 62761.521 139588.15
## Dec 1996 203676.38 157160.50 263959.89 136572.460 303751.35
We could improve these predictions by modifying the model to get rid of the autocorrelation in the residuals by doing more transformations to begin with or by messing around with different types of regression models.
texasgas
## price consumption
## 1 30 134
## 2 31 112
## 3 37 136
## 4 42 109
## 5 43 105
## 6 45 87
## 7 50 56
## 8 54 43
## 9 54 77
## 10 57 35
## 11 58 65
## 12 58 56
## 13 60 58
## 14 73 55
## 15 88 49
## 16 89 39
## 17 92 36
## 18 97 46
## 19 100 40
## 20 102 42
names(texasgas)
## [1] "price" "consumption"
attach(texasgas)
plot(price,consumption)
P and C look to be inversely related so the slope of the line should change with P
price.exp=exp(texasgas$price)
mylm=lm(texasgas$consumption~price.exp)
summary(mylm)
##
## Call:
## lm(formula = texasgas$consumption ~ price.exp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.86 -25.09 -13.86 20.64 65.14
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.086e+01 7.670e+00 9.238 2.98e-08 ***
## price.exp -1.642e-43 1.711e-43 -0.959 0.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.19 on 18 degrees of freedom
## Multiple R-squared: 0.04864, Adjusted R-squared: -0.004214
## F-statistic: 0.9203 on 1 and 18 DF, p-value: 0.3501
(summary(mylm)$sigma)^2
## [1] 1101.359
dummy1=ifelse(texasgas$price<=60, 1, 0)
dummy2=ifelse(texasgas$price<=60, texasgas$price, 0)
dummy3=ifelse(texasgas$price>60, texasgas$price, 0)
mylm2=lm(texasgas$consumption~dummy1+dummy2+dummy3)
summary(mylm2)
##
## Call:
## lm(formula = texasgas$consumption ~ dummy1 + dummy2 + dummy3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.987 -6.421 2.823 9.324 22.617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 84.7861 51.8428 1.635 0.1215
## dummy1 136.1068 54.9412 2.477 0.0248 *
## dummy2 -2.9057 0.3738 -7.773 8.05e-07 ***
## dummy3 -0.4470 0.5634 -0.793 0.4392
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.49 on 16 degrees of freedom
## Multiple R-squared: 0.8602, Adjusted R-squared: 0.834
## F-statistic: 32.81 on 3 and 16 DF, p-value: 4.565e-07
(summary(mylm2)$sigma)^2
## [1] 182.1078
price.squared=texasgas$price^2
mylm3=lm(texasgas$consumption~texasgas$price+price.squared)
summary(mylm3)
##
## Call:
## lm(formula = texasgas$consumption ~ texasgas$price + price.squared)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5601 -5.4693 0.7502 11.0252 25.6619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 273.930628 31.031614 8.827 9.32e-08 ***
## texasgas$price -5.675863 1.009086 -5.625 3.03e-05 ***
## price.squared 0.033904 0.007412 4.574 0.000269 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.37 on 17 degrees of freedom
## Multiple R-squared: 0.8315, Adjusted R-squared: 0.8117
## F-statistic: 41.95 on 2 and 17 DF, p-value: 2.666e-07
(summary(mylm3)$sigma)^2
## [1] 206.5276
summary(mylm)$adj.r.squared
## [1] -0.004214286
AIC(mylm)
## [1] 200.7363
summary(mylm2)$adj.r.squared
## [1] 0.8339549
AIC(mylm2)
## [1] 166.3866
summary(mylm3)$adj.r.squared
## [1] 0.811689
AIC(mylm3)
## [1] 168.1158
plot(residuals(mylm))
plot(residuals(mylm2))
plot(residuals(mylm3))
The first model looks like it isn’t as good as the other two since there isn’t much residual variance.
prediction.data=data.frame(price=c(40,60,80,100,120))
predict(mylm2, prediction.data)
## Warning: 'newdata' had 5 rows but variables found have 20 rows
## 1 2 3 4 5 6 7
## 133.72290 130.81724 113.38323 98.85490 95.94923 90.13790 75.60956
## 8 9 10 11 12 13 14
## 63.98689 63.98689 55.26989 52.36423 52.36423 46.55289 52.15787
## 15 16 17 18 19 20
## 45.45344 45.00647 43.66559 41.43078 40.08989 39.19597
pred.intervals=predict(mylm2,prediction.data,interval="predict")
## Warning: 'newdata' had 5 rows but variables found have 20 rows
pred.intervals
## fit lwr upr
## 1 133.72290 100.916974 166.52883
## 2 130.81724 98.340621 163.29385
## 3 113.38323 82.526833 144.23964
## 4 98.85490 68.835752 128.87405
## 5 95.94923 66.037298 125.86117
## 6 90.13790 60.378172 119.89762
## 7 75.60956 45.862015 105.35711
## 8 63.98689 33.871341 94.10245
## 9 63.98689 33.871341 94.10245
## 10 55.26989 24.665025 85.87476
## 11 52.36423 21.557183 83.17127
## 12 52.36423 21.557183 83.17127
## 13 46.55289 15.285111 77.82067
## 14 52.15787 14.378306 89.93743
## 15 45.45344 14.574639 76.33223
## 16 45.00647 14.269892 75.74306
## 17 43.66559 13.078544 74.25263
## 18 41.43078 10.168295 72.69326
## 19 40.08989 7.892941 72.28684
## 20 39.19597 6.174121 72.21781
The interpretation is that 95% of actual values should be between the predictions with 95% confidence interval
cor(price,price.squared)
## [1] 0.9904481
This could be problematic when dealing with polynomials because the relation with consumption might differ
Chapter 6
3 x 5 MA = 1/3(Yt-1 + Yt + Yt+1) Yt=1/5(yt-2 + yt-1 + yt + yt+1 + yt+2) which gives y3x5 = 1/15(yt-3 + 2yt-2 + 3yt-1 + 3yt + 3yt+1 + 2yt+2 + 1yt+3) Which gives us the 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
plot(plastics)
Looks to be a seasonal fluctuation in the middle/end of each year with a lowpoint at the begging of each year
p.fit=decompose(plastics, type="multiplicative")
summary(p.fit)
## Length Class Mode
## x 60 ts numeric
## seasonal 60 ts numeric
## trend 60 ts numeric
## random 60 ts numeric
## figure 12 -none- numeric
## type 1 -none- character
trend=p.fit$trend
trend
## Jan Feb Mar Apr May Jun Jul
## 1 NA NA NA NA NA NA 976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500 NA
## Aug Sep Oct Nov Dec
## 1 977.0417 977.0833 978.4167 982.7083 990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5 NA NA NA NA NA
seasonal=p.fit$seasonal
seasonal
## Jan Feb Mar Apr May Jun Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## Aug Sep Oct Nov Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
Yes, they support the fact that there is a peak during the middle of each year.
p.fit.adj=seasadj(p.fit)
plot(p.fit.adj)
plastics2=plastics
plastics2[20]=plastics2[20]+500
p.fit2=decompose(plastics2, type="multiplicative")
p.fit2.adj=seasadj(p.fit2)
plot(p.fit2.adj)
The outlier causes a rise and fall during the middle of the year
plastics2=plastics
plastics2[50]=plastics2[50]+500
p.fit2=decompose(plastics2, type="multiplicative")
p.fit2.adj=seasadj(p.fit2)
plot(p.fit2.adj)
It changes the beginning to less up and downs and the end to more
p.fit3=rwf(p.fit.adj,drift=TRUE)
p.fit3=stl(plastics,t.window=15,s.window="periodic",robust=TRUE)
p.fit4=forecast(p.fit3,method="naive")
p.fit4
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 6 936.2531 883.8662 988.6400 856.1342 1016.3720
## Feb 6 863.6074 789.5211 937.6937 750.3022 976.9126
## Mar 6 942.2625 851.5257 1032.9993 803.4925 1081.0325
## Apr 6 1095.6329 990.8590 1200.4067 935.3951 1255.8707
## May 6 1234.3115 1117.1708 1351.4523 1055.1602 1413.4628
## Jun 6 1344.5774 1216.2562 1472.8987 1148.3270 1540.8278
## Jul 6 1390.0138 1251.4111 1528.6166 1178.0392 1601.9885
## Aug 6 1447.9805 1299.8079 1596.1531 1221.3700 1674.5910
## Sep 6 1463.6068 1306.4460 1620.7676 1223.2501 1703.9635
## Oct 6 1415.5186 1249.8565 1581.1806 1162.1604 1668.8768
## Nov 6 1159.9252 986.1774 1333.6730 894.2009 1425.6495
## Dec 6 1013.0000 831.5264 1194.4736 735.4600 1290.5400
## Jan 7 936.2531 747.3693 1125.1369 647.3803 1225.1259
## Feb 7 863.6074 667.5935 1059.6214 563.8300 1163.3849
## Mar 7 942.2625 739.3688 1145.1562 631.9633 1252.5616
## Apr 7 1095.6329 886.0852 1305.1806 775.1573 1416.1085
## May 7 1234.3115 1018.3147 1450.3084 903.9729 1564.6502
## Jun 7 1344.5774 1122.3185 1566.8363 1004.6617 1684.4931
## Jul 7 1390.0138 1161.6645 1618.3632 1040.7837 1739.2440
## Aug 7 1447.9805 1213.6990 1682.2620 1089.6779 1806.2831
## Sep 7 1463.6068 1223.5397 1703.6739 1096.4559 1830.7577
## Oct 7 1415.5186 1169.8021 1661.2350 1039.7276 1791.3095
## Nov 7 1159.9252 908.6863 1411.1641 775.6885 1544.1620
## Dec 7 1013.0000 756.3575 1269.6425 620.4992 1405.5008
The adjustment removes the trend component from the seasonal component to better see trends over time. With this adjustment we will see an upwards trend even though there are fluctuations in the early 90s. There are a few months with huge fluctuations which show large changes during these months.
Yes, the recession is very visible and can be seen in the large dips in the graph during the early 90s.