R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

5.8 Exercises

  1. The plot is rising at an exponential level with large positive fluctuations at the end of every year
library(fma)
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.4.2
summary(fancy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1665    5884    8772   14316   16889  104661
plot(fancy)

b) It is necessary to take the log of this data before fitting a model in order to make it normal, if we fit it before taking the log the model would be off since it is exponential

log.fancy=log(fancy)
dummy=rep(0,length(fancy))
dummy[seq_along(dummy)%%12==3]=1
dummy[3]=0
dummy=ts(dummy,frequency=12, start=c(1987,1))

data.fancy=data.frame(log.fancy, dummy)
fit=tslm(log.fancy~trend++season+dummy,data=data.fancy)
summary(fit)
## 
## Call:
## tslm(formula = log.fancy ~ trend + +season + dummy, data = data.fancy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.6196670  0.0742471 102.626  < 2e-16 ***
## trend       0.0220198  0.0008268  26.634  < 2e-16 ***
## season2     0.2514168  0.0956790   2.628 0.010555 *  
## season3     0.2660828  0.1934044   1.376 0.173275    
## season4     0.3840535  0.0957075   4.013 0.000148 ***
## season5     0.4094870  0.0957325   4.277 5.88e-05 ***
## season6     0.4488283  0.0957647   4.687 1.33e-05 ***
## season7     0.6104545  0.0958039   6.372 1.71e-08 ***
## season8     0.5879644  0.0958503   6.134 4.53e-08 ***
## season9     0.6693299  0.0959037   6.979 1.36e-09 ***
## season10    0.7473919  0.0959643   7.788 4.48e-11 ***
## season11    1.2067479  0.0960319  12.566  < 2e-16 ***
## season12    1.9622412  0.0961066  20.417  < 2e-16 ***
## dummy       0.5015151  0.1964273   2.553 0.012856 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16
plot(residuals(fit), type='p')

The residuals show no pattern and therefore are unbiased and have no problem

boxplot(resid(fit)~cycle(resid(fit)))

There is a lot of variance in this boxplot, meaning that there may be an issue with our model

  1. The coefficient values show how much each month affects the y value of the overall model. As the year goes along the coefficients get higher because the amounts are higher in the later months of the year.

library(lmtest)
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(fit)
## 
##  Durbin-Watson test
## 
## data:  fit
## DW = 0.88889, p-value = 9.78e-08
## alternative hypothesis: true autocorrelation is greater than 0

This test shows that there is autocorrelation in the residuals. We should create a better model to get rid of this.

fancy.1=data.frame(dummy=rep(0,36))
forecast=forecast(fit,newdata=fancy.1)
forecast
##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 1994       9.491352  9.238522  9.744183  9.101594  9.88111
## Feb 1994       9.764789  9.511959 10.017620  9.375031 10.15455
## Mar 1994       9.801475  9.461879 10.141071  9.277961 10.32499
## Apr 1994       9.941465  9.688635 10.194296  9.551707 10.33122
## May 1994       9.988919  9.736088 10.241749  9.599161 10.37868
## Jun 1994      10.050280  9.797449 10.303110  9.660522 10.44004
## Jul 1994      10.233926  9.981095 10.486756  9.844168 10.62368
## Aug 1994      10.233456  9.980625 10.486286  9.843698 10.62321
## Sep 1994      10.336841 10.084010 10.589671  9.947083 10.72660
## Oct 1994      10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994      10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994      11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995       9.755590  9.499844 10.011336  9.361338 10.14984
## Feb 1995      10.029027  9.773281 10.284773  9.634775 10.42328
## Mar 1995      10.065713  9.722498 10.408928  9.536620 10.59481
## Apr 1995      10.205703  9.949957 10.461449  9.811451 10.59996
## May 1995      10.253157  9.997411 10.508903  9.858904 10.64741
## Jun 1995      10.314518 10.058772 10.570264  9.920265 10.70877
## Jul 1995      10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995      10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995      10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995      10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995      11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995      11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996      10.019828  9.760564 10.279093  9.620151 10.41951
## Feb 1996      10.293265 10.034000 10.552530  9.893588 10.69294
## Mar 1996      10.329951  9.982679 10.677222  9.794605 10.86530
## Apr 1996      10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996      10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996      10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996      10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996      10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996      10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996      10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996      11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996      12.224288 11.965023 12.483552 11.824611 12.62396
raw.forecast=as.data.frame(forecast)
raw.forecast=exp(raw.forecast)
raw.forecast
##          Point Forecast     Lo 80     Hi 80      Lo 95     Hi 95
## Jan 1994       13244.70  10285.82  17054.73   8969.583  19557.43
## Feb 1994       17409.81  13520.45  22418.00  11790.284  25707.73
## Mar 1994       18060.36  12860.02  25363.61  10699.594  30484.96
## Apr 1994       20774.16  16133.21  26750.16  14068.696  30675.62
## May 1994       21783.73  16917.24  28050.15  14752.395  32166.37
## Jun 1994       23162.27  17987.81  29825.24  15685.969  34201.95
## Jul 1994       27831.56  21613.98  35837.72  18848.111  41096.73
## Aug 1994       27818.48  21603.82  35820.87  18839.249  41077.41
## Sep 1994       30848.42  23956.87  39722.43  20891.193  45551.50
## Oct 1994       34095.57  26478.61  43903.67  23090.230  50346.32
## Nov 1994       55176.84  42850.31  71049.28  37366.903  81475.41
## Dec 1994      120067.79  93244.59 154607.08  81312.400 177294.90
## Jan 1995       17250.40  13357.65  22277.59  11629.938  25587.08
## Feb 1995       22675.20  17558.28  29283.31  15287.252  33633.55
## Mar 1995       23522.50  16688.88  33154.31  13858.024  39926.92
## Apr 1995       27057.06  20951.33  34942.16  18241.435  40133.06
## May 1995       28371.96  21969.51  36640.25  19127.918  42083.42
## Jun 1995       30167.42  23359.80  38958.95  20338.387  44746.58
## Jul 1995       36248.88  28068.91  46812.70  24438.412  53767.06
## Aug 1995       36231.84  28055.72  46790.69  24426.922  53741.78
## Sep 1995       40178.16  31111.50  51887.06  27087.467  59595.26
## Oct 1995       44407.37  34386.35  57348.77  29938.733  65868.34
## Nov 1995       71864.42  55647.40  92807.48  48449.831 106594.69
## Dec 1995      156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996       22467.57  17336.40  29117.46  15065.329  33506.86
## Feb 1996       29533.04  22788.25  38274.14  19802.984  44043.89
## Mar 1996       30636.60  21648.24  43356.94  17936.708  52328.52
## Apr 1996       35240.15  27191.96  45670.42  23629.808  52555.15
## May 1996       36952.72  28513.41  47889.88  24778.151  55109.18
## Jun 1996       39291.20  30317.82  50920.48  26346.183  58596.65
## Jul 1996       47211.93  36429.60  61185.57  31657.322  70409.18
## Aug 1996       47189.73  36412.48  61156.80  31642.439  70376.07
## Sep 1996       52329.57  40378.47  67817.91  35088.887  78041.33
## Oct 1996       57837.85  44628.77  74956.52  38782.394  86256.08
## Nov 1996       93598.96  72222.70 121302.09  62761.521 139588.15
## Dec 1996      203676.38 157160.50 263959.89 136572.460 303751.35

We could improve these predictions by modifying the model to get rid of the autocorrelation in the residuals by doing more transformations to begin with or by messing around with different types of regression models.

texasgas
##    price consumption
## 1     30         134
## 2     31         112
## 3     37         136
## 4     42         109
## 5     43         105
## 6     45          87
## 7     50          56
## 8     54          43
## 9     54          77
## 10    57          35
## 11    58          65
## 12    58          56
## 13    60          58
## 14    73          55
## 15    88          49
## 16    89          39
## 17    92          36
## 18    97          46
## 19   100          40
## 20   102          42
names(texasgas)
## [1] "price"       "consumption"
attach(texasgas)
plot(price,consumption)

  1. P and C look to be inversely related so the slope of the line should change with P

price.exp=exp(texasgas$price)
mylm=lm(texasgas$consumption~price.exp)
summary(mylm)
## 
## Call:
## lm(formula = texasgas$consumption ~ price.exp)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -35.86 -25.09 -13.86  20.64  65.14 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.086e+01  7.670e+00   9.238 2.98e-08 ***
## price.exp   -1.642e-43  1.711e-43  -0.959     0.35    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.19 on 18 degrees of freedom
## Multiple R-squared:  0.04864,    Adjusted R-squared:  -0.004214 
## F-statistic: 0.9203 on 1 and 18 DF,  p-value: 0.3501
(summary(mylm)$sigma)^2
## [1] 1101.359
dummy1=ifelse(texasgas$price<=60, 1, 0)
dummy2=ifelse(texasgas$price<=60, texasgas$price, 0)
dummy3=ifelse(texasgas$price>60, texasgas$price, 0)

mylm2=lm(texasgas$consumption~dummy1+dummy2+dummy3)
summary(mylm2)
## 
## Call:
## lm(formula = texasgas$consumption ~ dummy1 + dummy2 + dummy3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.987  -6.421   2.823   9.324  22.617 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  84.7861    51.8428   1.635   0.1215    
## dummy1      136.1068    54.9412   2.477   0.0248 *  
## dummy2       -2.9057     0.3738  -7.773 8.05e-07 ***
## dummy3       -0.4470     0.5634  -0.793   0.4392    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.49 on 16 degrees of freedom
## Multiple R-squared:  0.8602, Adjusted R-squared:  0.834 
## F-statistic: 32.81 on 3 and 16 DF,  p-value: 4.565e-07
(summary(mylm2)$sigma)^2
## [1] 182.1078
price.squared=texasgas$price^2
mylm3=lm(texasgas$consumption~texasgas$price+price.squared)
summary(mylm3)
## 
## Call:
## lm(formula = texasgas$consumption ~ texasgas$price + price.squared)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5601  -5.4693   0.7502  11.0252  25.6619 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    273.930628  31.031614   8.827 9.32e-08 ***
## texasgas$price  -5.675863   1.009086  -5.625 3.03e-05 ***
## price.squared    0.033904   0.007412   4.574 0.000269 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.37 on 17 degrees of freedom
## Multiple R-squared:  0.8315, Adjusted R-squared:  0.8117 
## F-statistic: 41.95 on 2 and 17 DF,  p-value: 2.666e-07
(summary(mylm3)$sigma)^2
## [1] 206.5276
summary(mylm)$adj.r.squared
## [1] -0.004214286
AIC(mylm)
## [1] 200.7363
summary(mylm2)$adj.r.squared
## [1] 0.8339549
AIC(mylm2)
## [1] 166.3866
summary(mylm3)$adj.r.squared
## [1] 0.811689
AIC(mylm3)
## [1] 168.1158
plot(residuals(mylm))

plot(residuals(mylm2))

plot(residuals(mylm3))

The first model looks like it isn’t as good as the other two since there isn’t much residual variance.

prediction.data=data.frame(price=c(40,60,80,100,120))
predict(mylm2, prediction.data)
## Warning: 'newdata' had 5 rows but variables found have 20 rows
##         1         2         3         4         5         6         7 
## 133.72290 130.81724 113.38323  98.85490  95.94923  90.13790  75.60956 
##         8         9        10        11        12        13        14 
##  63.98689  63.98689  55.26989  52.36423  52.36423  46.55289  52.15787 
##        15        16        17        18        19        20 
##  45.45344  45.00647  43.66559  41.43078  40.08989  39.19597
pred.intervals=predict(mylm2,prediction.data,interval="predict")
## Warning: 'newdata' had 5 rows but variables found have 20 rows
pred.intervals
##          fit        lwr       upr
## 1  133.72290 100.916974 166.52883
## 2  130.81724  98.340621 163.29385
## 3  113.38323  82.526833 144.23964
## 4   98.85490  68.835752 128.87405
## 5   95.94923  66.037298 125.86117
## 6   90.13790  60.378172 119.89762
## 7   75.60956  45.862015 105.35711
## 8   63.98689  33.871341  94.10245
## 9   63.98689  33.871341  94.10245
## 10  55.26989  24.665025  85.87476
## 11  52.36423  21.557183  83.17127
## 12  52.36423  21.557183  83.17127
## 13  46.55289  15.285111  77.82067
## 14  52.15787  14.378306  89.93743
## 15  45.45344  14.574639  76.33223
## 16  45.00647  14.269892  75.74306
## 17  43.66559  13.078544  74.25263
## 18  41.43078  10.168295  72.69326
## 19  40.08989   7.892941  72.28684
## 20  39.19597   6.174121  72.21781

The interpretation is that 95% of actual values should be between the predictions with 95% confidence interval

cor(price,price.squared)
## [1] 0.9904481

This could be problematic when dealing with polynomials because the relation with consumption might differ

Chapter 6

  1. 3 x 5 MA = 1/3(Yt-1 + Yt + Yt+1) Yt=1/5(yt-2 + yt-1 + yt + yt+1 + yt+2) which gives y3x5 = 1/15(yt-3 + 2yt-2 + 3yt-1 + 3yt + 3yt+1 + 2yt+2 + 1yt+3) Which gives us the 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

plot(plastics)

Looks to be a seasonal fluctuation in the middle/end of each year with a lowpoint at the begging of each year

p.fit=decompose(plastics, type="multiplicative")
summary(p.fit)
##          Length Class  Mode     
## x        60     ts     numeric  
## seasonal 60     ts     numeric  
## trend    60     ts     numeric  
## random   60     ts     numeric  
## figure   12     -none- numeric  
## type      1     -none- character
trend=p.fit$trend
trend
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1        NA        NA        NA        NA        NA        NA  976.9583
## 2 1000.4583 1011.2083 1022.2917 1034.7083 1045.5417 1054.4167 1065.7917
## 3 1117.3750 1121.5417 1130.6667 1142.7083 1153.5833 1163.0000 1170.3750
## 4 1208.7083 1221.2917 1231.7083 1243.2917 1259.1250 1276.5833 1287.6250
## 5 1374.7917 1382.2083 1381.2500 1370.5833 1351.2500 1331.2500        NA
##         Aug       Sep       Oct       Nov       Dec
## 1  977.0417  977.0833  978.4167  982.7083  990.4167
## 2 1076.1250 1084.6250 1094.3750 1103.8750 1112.5417
## 3 1175.5000 1180.5417 1185.0000 1190.1667 1197.0833
## 4 1298.0417 1313.0000 1328.1667 1343.5833 1360.6250
## 5        NA        NA        NA        NA        NA
seasonal=p.fit$seasonal
seasonal
##         Jan       Feb       Mar       Apr       May       Jun       Jul
## 1 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 2 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 3 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 4 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
## 5 0.7670466 0.7103357 0.7765294 0.9103112 1.0447386 1.1570026 1.1636317
##         Aug       Sep       Oct       Nov       Dec
## 1 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 2 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 3 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 4 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
## 5 1.2252952 1.2313635 1.1887444 0.9919176 0.8330834
  1. Yes, they support the fact that there is a peak during the middle of each year.

p.fit.adj=seasadj(p.fit)
plot(p.fit.adj)

plastics2=plastics
plastics2[20]=plastics2[20]+500
p.fit2=decompose(plastics2, type="multiplicative")
p.fit2.adj=seasadj(p.fit2)
plot(p.fit2.adj)

The outlier causes a rise and fall during the middle of the year

plastics2=plastics
plastics2[50]=plastics2[50]+500
p.fit2=decompose(plastics2, type="multiplicative")
p.fit2.adj=seasadj(p.fit2)
plot(p.fit2.adj)

It changes the beginning to less up and downs and the end to more

p.fit3=rwf(p.fit.adj,drift=TRUE)
p.fit3=stl(plastics,t.window=15,s.window="periodic",robust=TRUE)
p.fit4=forecast(p.fit3,method="naive")
p.fit4
##       Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 6       936.2531  883.8662  988.6400  856.1342 1016.3720
## Feb 6       863.6074  789.5211  937.6937  750.3022  976.9126
## Mar 6       942.2625  851.5257 1032.9993  803.4925 1081.0325
## Apr 6      1095.6329  990.8590 1200.4067  935.3951 1255.8707
## May 6      1234.3115 1117.1708 1351.4523 1055.1602 1413.4628
## Jun 6      1344.5774 1216.2562 1472.8987 1148.3270 1540.8278
## Jul 6      1390.0138 1251.4111 1528.6166 1178.0392 1601.9885
## Aug 6      1447.9805 1299.8079 1596.1531 1221.3700 1674.5910
## Sep 6      1463.6068 1306.4460 1620.7676 1223.2501 1703.9635
## Oct 6      1415.5186 1249.8565 1581.1806 1162.1604 1668.8768
## Nov 6      1159.9252  986.1774 1333.6730  894.2009 1425.6495
## Dec 6      1013.0000  831.5264 1194.4736  735.4600 1290.5400
## Jan 7       936.2531  747.3693 1125.1369  647.3803 1225.1259
## Feb 7       863.6074  667.5935 1059.6214  563.8300 1163.3849
## Mar 7       942.2625  739.3688 1145.1562  631.9633 1252.5616
## Apr 7      1095.6329  886.0852 1305.1806  775.1573 1416.1085
## May 7      1234.3115 1018.3147 1450.3084  903.9729 1564.6502
## Jun 7      1344.5774 1122.3185 1566.8363 1004.6617 1684.4931
## Jul 7      1390.0138 1161.6645 1618.3632 1040.7837 1739.2440
## Aug 7      1447.9805 1213.6990 1682.2620 1089.6779 1806.2831
## Sep 7      1463.6068 1223.5397 1703.6739 1096.4559 1830.7577
## Oct 7      1415.5186 1169.8021 1661.2350 1039.7276 1791.3095
## Nov 7      1159.9252  908.6863 1411.1641  775.6885 1544.1620
## Dec 7      1013.0000  756.3575 1269.6425  620.4992 1405.5008
  1. The adjustment removes the trend component from the seasonal component to better see trends over time. With this adjustment we will see an upwards trend even though there are fluctuations in the early 90s. There are a few months with huge fluctuations which show large changes during these months.

  2. Yes, the recession is very visible and can be seen in the large dips in the graph during the early 90s.