Homework 1

2.10

5.

library(fpp)

## Loading required package: forecast

## Warning: package 'forecast' was built under R version 3.4.4

## Loading required package: fma

## Loading required package: expsmooth

## Loading required package: lmtest

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: tseries

## Warning: package 'tseries' was built under R version 3.4.4

writing

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1968  562.674  599.000  668.516  597.798  579.889  668.233  499.232
## 1969  634.712  639.283  712.182  621.557  621.000  675.989  501.322
## 1970  646.783  658.442  712.906  687.714  723.916  707.183  629.000
## 1971  676.155  748.183  810.681  729.363  701.108  790.079  594.621
## 1972  747.636  773.392  813.788  766.713  728.875  749.197  680.954
## 1973  795.337  788.421  889.968  797.393  751.000  821.255  691.605
## 1974  843.038  847.000  941.952  804.309  840.307  871.528  656.330
## 1975  778.139  856.075  938.833  813.023  783.417  828.110  657.311
## 1976  895.217  856.075  893.268  875.000  835.088  934.595  832.500
## 1977  875.024  992.968  976.804  968.697  871.675 1006.852  832.037
##           Aug      Sep      Oct      Nov      Dec
## 1968  215.187  555.813  586.935  546.136  571.111
## 1969  220.286  560.727  602.530  626.379  605.508
## 1970  237.530  613.296  730.444  734.925  651.812
## 1971  230.716  617.189  691.389  701.067  705.777
## 1972  241.424  680.234  708.326  694.238  772.071
## 1973  290.655  727.147  868.355  812.390  799.556
## 1974  370.508  742.000  847.152  731.675  898.527
## 1975  310.032  780.000  860.000  780.000  807.993
## 1976  300.000  791.443  900.000  781.729  880.000
## 1977  345.587  849.528  913.871  868.746  993.733

?writing

ggseasonplot(writing)

ggsubseriesplot(writing)

The seasonal plots show the sales for printing and writing paper in France dropped dramatically during the month of August between the years 1968 and 1977.

There is no year that looks particulary unusual in this time series.

fancy

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1987   1664.81   2397.53   2840.71   3547.29   3752.96   3714.74   4349.61
## 1988   2499.81   5198.24   7225.14   4806.03   5900.88   4951.34   6179.12
## 1989   4717.02   5702.63   9957.58   5304.78   6492.43   6630.80   7349.62
## 1990   5921.10   5814.58  12421.25   6369.77   7609.12   7224.75   8121.22
## 1991   4826.64   6470.23   9638.77   8821.17   8722.37  10209.48  11276.55
## 1992   7615.03   9849.69  14558.40  11587.33   9332.56  13082.09  16732.78
## 1993  10243.24  11266.88  21826.84  17357.33  15997.79  18601.53  26155.15
##            Aug       Sep       Oct       Nov       Dec
## 1987   3566.34   5021.82   6423.48   7600.60  19756.21
## 1988   4752.15   5496.43   5835.10  12600.08  28541.72
## 1989   8176.62   8573.17   9690.50  15151.84  34061.01
## 1990   7979.25   8093.06   8476.70  17914.66  30114.41
## 1991  12552.22  11637.39  13606.89  21822.11  45060.69
## 1992  19888.61  23933.38  25391.35  36024.80  80721.71
## 1993  28586.52  30505.41  30821.33  46634.38 104660.67

?fancy

ggseasonplot(fancy)

ggsubseriesplot(fancy)

The seasonal plots for montly sales at a souvenir shop in Queensland, Australia show a large increase in sales in the month of November, and an even larger increase in December every year from 1987 to 1993. Most years after 1987, also show a small spike in sales in the month of March. In most years, sales remain relatively constant for most of the year with low variance between months.

Souvenir sales in 1992 and 1993 experienced larger increases between May and October than previous years. December sales in these years were also much higher than sales in previous years. From 1987-1991, December sales increased steadily from $19k in ’87 to $45k in ’91. However, December sales jumped to $80k in ’92 and $104k in ’93.

These plots display the seasonality of tourism in Queensland,Australia, and the growth in popularity of the location, especially in the years 1992 and 1993.

a10

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1991                                                              3.526591
## 1992  5.088335  2.814520  2.985811  3.204780  3.127578  3.270523  3.737851
## 1993  6.192068  3.450857  3.772307  3.734303  3.905399  4.049687  4.315566
## 1994  6.731473  3.841278  4.394076  4.075341  4.540645  4.645615  4.752607
## 1995  6.749484  4.216067  4.949349  4.823045  5.194754  5.170787  5.256742
## 1996  8.329452  5.069796  5.262557  5.597126  6.110296  5.689161  6.486849
## 1997  8.524471  5.277918  5.714303  6.214529  6.411929  6.667716  7.050831
## 1998  8.798513  5.918261  6.534493  6.675736  7.064201  7.383381  7.813496
## 1999 10.391416  6.421535  8.062619  7.297739  7.936916  8.165323  8.717420
## 2000 12.511462  7.457199  8.591191  8.474000  9.386803  9.560399 10.834295
## 2001 14.497581  8.049275 10.312891  9.753358 10.850382  9.961719 11.443601
## 2002 16.300269  9.053485 10.002449 10.788750 12.106705 10.954101 12.844566
## 2003 16.828350  9.800215 10.816994 10.654223 12.512323 12.161210 12.998046
## 2004 18.003768 11.938030 12.997900 12.882645 13.943447 13.989472 15.339097
## 2005 20.778723 12.154552 13.402392 14.459239 14.795102 15.705248 15.829550
## 2006 23.486694 12.536987 15.467018 14.233539 17.783058 16.291602 16.980282
## 2007 28.038383 16.763869 19.792754 16.427305 21.000742 20.681002 21.834890
## 2008 29.665356 21.654285 18.264945 23.107677 22.912510 19.431740          
##            Aug       Sep       Oct       Nov       Dec
## 1991  3.180891  3.252221  3.611003  3.565869  4.306371
## 1992  3.558776  3.777202  3.924490  4.386531  5.810549
## 1993  4.562185  4.608662  4.667851  5.093841  7.179962
## 1994  5.350605  5.204455  5.301651  5.773742  6.204593
## 1995  5.855277  5.490729  6.115293  6.088473  7.416598
## 1996  6.300569  6.467476  6.828629  6.649078  8.606937
## 1997  6.704919  7.250988  7.819733  7.398101 10.096233
## 1998  7.431892  8.275117  8.260441  8.596156 10.558939
## 1999  9.070964  9.177113  9.251887  9.933136 11.532974
## 2000 10.643751  9.908162 11.710041 11.340151 12.079132
## 2001 11.659239 10.647060 12.652134 13.674466 12.965735
## 2002 12.196500 12.854748 13.542004 13.287640 15.134918
## 2003 12.517276 13.268658 14.733622 13.669382 16.503966
## 2004 15.370764 16.142005 16.685754 17.636728 18.869325
## 2005 17.554701 18.100864 17.496668 19.347265 20.031291
## 2006 18.612189 16.623343 21.430241 23.575517 23.334206
## 2007 23.930204 22.930357 23.263340 25.250030 25.806090
## 2008

?a10

ggseasonplot(a10)

ggsubseriesplot(a10)

The season plots for monthly anti-diabetic drug sales in Australia from 1992 to 2008 show a general drop in sales from January to Febuary. There also appears to be an increase in variance between months in later years compared to earlier years in the time series.

h02

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1991                                                             0.4297950
## 1992 0.6601190 0.3362200 0.3513480 0.3798080 0.3618010 0.4105340 0.4833887
## 1993 0.7515028 0.3875543 0.4272832 0.4138902 0.4288588 0.4701264 0.5092097
## 1994 0.8193253 0.4376698 0.5061213 0.4704912 0.5106963 0.5405138 0.5581189
## 1995 0.8031126 0.4752582 0.5525723 0.5271078 0.5612498 0.5889776 0.6231336
## 1996 0.9372759 0.5287616 0.5593399 0.5778717 0.6149274 0.5941888 0.7077584
## 1997 0.8468335 0.4638225 0.4852732 0.5280586 0.5623365 0.5885704 0.6694804
## 1998 0.8005444 0.4905572 0.5244080 0.5366495 0.5520905 0.6033656 0.6812454
## 1999 0.8930815 0.5126960 0.6529959 0.5739764 0.6392384 0.7038719 0.7706482
## 2000 0.9696557 0.5732915 0.6185068 0.6189957 0.6652092 0.7265201 0.8558649
## 2001 1.0438053 0.5106472 0.6725690 0.6484701 0.7041147 0.6994307 0.8519259
## 2002 1.1458676 0.5755844 0.6411646 0.6798621 0.7679384 0.7520959 0.9180636
## 2003 1.0781449 0.5782962 0.6433333 0.6633674 0.7505160 0.8007456 0.9163610
## 2004 1.1301252 0.6679887 0.7490143 0.7399860 0.7951286 0.8568028 1.0015932
## 2005 1.1706900 0.5976390 0.6525900 0.6705050 0.6952480 0.8422630 0.8743360
## 2006 1.2306910 0.5871350 0.7069590 0.6396410 0.8074050 0.7979700 0.8843120
## 2007 1.2233190 0.5977530 0.7043980 0.5617600 0.7452580 0.8379340 0.9541440
## 2008 1.2199410 0.7618220 0.6494350 0.8278870 0.8162550 0.7621370          
##            Aug       Sep       Oct       Nov       Dec
## 1991 0.4009060 0.4321590 0.4925430 0.5023690 0.6026520
## 1992 0.4754634 0.5347610 0.5686061 0.5952233 0.7712578
## 1993 0.5584430 0.6015141 0.6329471 0.6996054 0.9630805
## 1994 0.6728521 0.6858974 0.6896920 0.7413036 0.8133076
## 1995 0.7408372 0.7253718 0.8158030 0.8140095 0.9266531
## 1996 0.7195020 0.7443237 0.8048551 0.7885423 0.9710894
## 1997 0.6779937 0.7629955 0.7997237 0.7705219 0.9943893
## 1998 0.6780753 0.7948926 0.7846239 0.8130087 0.9777323
## 1999 0.8461859 0.8927289 0.8978999 0.9472807 1.0507073
## 2000 0.8659843 0.8252488 0.9554210 0.9385960 1.0130244
## 2001 0.9077052 0.8674445 1.0242928 1.1095902 1.0123132
## 2002 0.9243675 1.0131977 1.0269761 1.0067960 1.1027757
## 2003 0.9168868 1.0846589 1.1506482 1.0508382 1.2232345
## 2004 0.9948643 1.1344320 1.1810110 1.2160370 1.2572380
## 2005 1.0064970 1.0947360 1.0270430 1.1492320 1.1607120
## 2006 1.0496480 0.9957090 1.1682530 1.1080380 1.1200530
## 2007 1.0782195 1.1109816 1.1099791 1.1635343 1.1765890
## 2008

?h02

ggseasonplot(h02)

ggsubseriesplot(h02)

Simalarly to the last seasonal plots, these plots display a large drop in drug sales from January to Febuary. The variance in sales between months is again much greater in later years. Starting around 2000, the monthly sales have become much more random, to the point that some years show decrease sales in a month, that the previous year showed an increase in the same month. The sales seem much less predictable in the later years.

6.

hsales

##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1973  55  60  68  63  65  61  54  52  46  42  37  30
## 1974  37  44  55  53  58  50  48  45  41  34  30  24
## 1975  29  34  44  54  57  51  51  53  46  46  46  39
## 1976  41  53  55  62  55  56  57  59  58  55  49  47
## 1977  57  68  84  81  78  74  64  74  71  63  55  51
## 1978  57  63  75  85  80  77  68  72  68  70  53  50
## 1979  53  58  73  72  68  63  64  68  60  54  41  35
## 1980  43  44  44  36  44  50  55  61  50  46  39  33
## 1981  37  40  49  44  45  38  36  34  28  29  27  29
## 1982  28  29  36  32  36  34  31  36  39  40  39  33
## 1983  44  46  57  59  64  59  51  50  48  51  45  48
## 1984  52  58  63  61  59  58  52  48  53  55  42  38
## 1985  48  55  67  60  65  65  63  61  54  52  51  47
## 1986  55  59  89  84  75  66  57  52  60  54  48  49
## 1987  53  59  73  72  62  58  55  56  52  52  43  37
## 1988  43  55  68  68  64  65  57  59  54  57  43  42
## 1989  52  51  58  60  61  58  62  61  49  51  47  40
## 1990  45  50  58  52  50  50  46  46  38  37  34  29
## 1991  30  40  46  46  47  47  43  46  37  41  39  36
## 1992  48  55  56  53  52  53  52  56  51  48  42  42
## 1993  44  50  60  66  58  59  55  57  57  56  53  51
## 1994  45  58  74  65  65  55  52  59  54  57  45  40
## 1995  47  47  60  58  63  64  64  63  55  54  44

?hsales

autoplot(hsales)

ggseasonplot(hsales)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales)

In viewing the plots, there appears to be seaonality in the sales of new one-family houses in the USA from 1973-1995. The sales tend to increase during spring months each year, whith the largest spike typically being in March. There does not appear to be a trend with this time series, as sales have not generally increased or decreased over the years. There does appear to by cyclicity every 5-10 years in the sales of one-family houses in the USA during this time period.

The plots show that there is cyclicity and seasonality in the sales of one-family houses in the USA. There are rises and falls in the market every 5-10 years. Also, there is seasonality within each year, where more houses are sold in the spring months compared to the rest of the year.

?usdeaths
autoplot(usdeaths)

ggseasonplot(usdeaths)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths)

In viewing the plots, there is definitly seasonality in monthly accidental deaths in the USA from 1973 to 1978. There are far more deaths, typically, in the summer months, especially July. There are also far less accidental deaths every year during the winter months, especially February. The plots do not show any trend as the number of deaths does not seem to rise or fall throughout the years. I also do not notice any cyclicity in this time series.

The plots reveal certain seasonality in this time series. This says to me that there is something during the summer months (maybe more drivers) that causes more deaths during these months. The plots also reveal that there is no real increase or decrease of accidental deaths over the years, and no cyclicity either.

?bricksq
autoplot(bricksq)

ggseasonplot(bricksq)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq)

The plots for quarterly clay brick production in Australia from 1956 to 1994 show possible seasonality. Although brick produciton remains relatively constant most years, there does appear to be a seasonal trend of an steady increase in brick produciton from Q1-Q3 and then a drop after Q3 into Q4. There also appears to be an upward trend in the time series, as brick production has generally increased from 1956 to 1994. However, there is a major decrease in 1984 and a decrease from the late ’80s into the early and mid ’90s, although it does appear to trend up again towards the end. There is no cyclicity in the time series.

The plots reveal to me that there is a seasonal trend most years for brick sales in Australia, with Q3 typically being the best quarter. It also reveals that something must have happened in 1984 to cause such a large decrease in brick sales.

ss.ts<- ts(data = sunspotarea, start = 1875, end = 2011, frequency = 12, ts.eps = getOption("ts.eps"), class ="ts")

autoplot(ss.ts)

ggseasonplot(ss.ts)

ggsubseriesplot(ss.ts)

gglagplot(ss.ts)

ggAcf(ss.ts)

plot(decompose(ss.ts))

This time series appears to be much more random than the other. It is tough to read the plots with so many years of observations, however I do believe there is seasonality in the time series. There appears to be an increase in sunspot area during the summer and into fall months. There is no trend in the data but there does appear to be cyclicity about every 10 years.

The plots for this time series reveal to me an increase in sun spot area during summer and fall months. This makes sense considering we typically get more sun during these months. I think it is also worth noting that there is a major spike about every 10 years where the averages of sunspot area jumps way higher than normal.

(“Gasoline” could not be found)

8.

Time Plot A -> ACF B Time Plot B -> ACF A Time Plot C -> ACF D Time Plot D -> ACF C

10.

?dj
ddj <- diff(dj)
plot(ddj)

ggAcf(ddj)

Yes, the ddj time series looks similar to white noise. Looking at the ACF plot, each autocorrelation is close to zero, indicating white noise.

3.7

8.

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,186], c(1982, 4), frequency = 12)

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

fc <- snaive(myts.train)
fc

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2011         2601.3 2464.343 2738.257 2391.842 2810.758
## Feb 2011         2426.4 2289.443 2563.357 2216.942 2635.858
## Mar 2011         2709.9 2572.943 2846.857 2500.442 2919.358
## Apr 2011         2549.6 2412.643 2686.557 2340.142 2759.058
## May 2011         2633.7 2496.743 2770.657 2424.242 2843.158
## Jun 2011         2604.6 2467.643 2741.557 2395.142 2814.058
## Jul 2011         2765.1 2628.143 2902.057 2555.642 2974.558
## Aug 2011         2803.4 2666.443 2940.357 2593.942 3012.858
## Sep 2011         2809.9 2672.943 2946.857 2600.442 3019.358
## Oct 2011         2854.1 2717.143 2991.057 2644.642 3063.558
## Nov 2011         2998.3 2861.343 3135.257 2788.842 3207.758
## Dec 2011         3947.6 3810.643 4084.557 3738.142 4157.058
## Jan 2012         2601.3 2407.614 2794.986 2305.082 2897.518
## Feb 2012         2426.4 2232.714 2620.086 2130.182 2722.618
## Mar 2012         2709.9 2516.214 2903.586 2413.682 3006.118
## Apr 2012         2549.6 2355.914 2743.286 2253.382 2845.818
## May 2012         2633.7 2440.014 2827.386 2337.482 2929.918
## Jun 2012         2604.6 2410.914 2798.286 2308.382 2900.818
## Jul 2012         2765.1 2571.414 2958.786 2468.882 3061.318
## Aug 2012         2803.4 2609.714 2997.086 2507.182 3099.618
## Sep 2012         2809.9 2616.214 3003.586 2513.682 3106.118
## Oct 2012         2854.1 2660.414 3047.786 2557.882 3150.318
## Nov 2012         2998.3 2804.614 3191.986 2702.082 3294.518
## Dec 2012         3947.6 3753.914 4141.286 3651.382 4243.818

accuracy(fc,myts.test)

##                     ME     RMSE       MAE      MPE     MAPE     MASE
## Training set  82.63694 106.8681  87.78769 5.845747 6.146052 1.000000
## Test set     239.16667 256.2178 239.16667 7.835325 7.835325 2.724376
##                   ACF1 Theil's U
## Training set 0.6795773        NA
## Test set     0.2784120 0.7753287

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 635.4, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The residuals appear uncorrelated and normal

The accuracy metrics appear to be much better for the training set (possible overfitting?)

5.10

5

autoplot(fancy)

#####A) The time series plot shows a trend over time of an increase in sales from 1988-1994. There also appears to be large spikes around december every year (due to the influx of tourists during Christmas time.) There is also a large increase in sales generally over the last couple of years.

B)

It is important to take a log of this data because it is has exponetial growth, therefore it would normalize the data.

C)

log.fancy <- log(fancy)
dummy.surf = rep(0, length(fancy))
dummy.surf[seq_along(dummy.surf)%%12 == 3] <- 1
dummy.surf[3] <- 0
dummy.surf <- ts(dummy.surf, freq = 12, start=c(1987,1))
log.fancy.df <- data.frame(log.fancy,  dummy.surf)

fit <- tslm(log.fancy ~ trend + season + dummy.surf, data=log.fancy.df)

future.surf <- data.frame(dummy.surf = rep(0, 12))
future.surf[3,] <- 1
forecast(fit, newdata=future.surf)

##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 1994       9.491352  9.238522  9.744183  9.101594  9.88111
## Feb 1994       9.764789  9.511959 10.017620  9.375031 10.15455
## Mar 1994      10.302990 10.048860 10.557120  9.911228 10.69475
## Apr 1994       9.941465  9.688635 10.194296  9.551707 10.33122
## May 1994       9.988919  9.736088 10.241749  9.599161 10.37868
## Jun 1994      10.050280  9.797449 10.303110  9.660522 10.44004
## Jul 1994      10.233926  9.981095 10.486756  9.844168 10.62368
## Aug 1994      10.233456  9.980625 10.486286  9.843698 10.62321
## Sep 1994      10.336841 10.084010 10.589671  9.947083 10.72660
## Oct 1994      10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994      10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994      11.695812 11.442981 11.948642 11.306054 12.08557

D)

plot(residuals(fit), type='p')
abline(h = 0, col="grey")

There appears to be potential correlation between the residuals and nonlinearity. From 1990 to 1993 the residuals are mostly negative while in 1994 they are mostly positive.

plot(as.numeric(fitted(fit)), residuals(fit), type='p')

The residuals plotted against the fitted values show no pattern or correlation.This means they should be unbiased and homoscedastic

E)

boxplot(resid(fit) ~ cycle(resid(fit)))

Residuals for the fall months seem to have more variance than the other months, which may mean there is some factor from these months that the model is not capturing. #####F)

summary(fit)

## 
## Call:
## tslm(formula = log.fancy ~ trend + season + dummy.surf, data = log.fancy.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.6196670  0.0742471 102.626  < 2e-16 ***
## trend       0.0220198  0.0008268  26.634  < 2e-16 ***
## season2     0.2514168  0.0956790   2.628 0.010555 *  
## season3     0.2660828  0.1934044   1.376 0.173275    
## season4     0.3840535  0.0957075   4.013 0.000148 ***
## season5     0.4094870  0.0957325   4.277 5.88e-05 ***
## season6     0.4488283  0.0957647   4.687 1.33e-05 ***
## season7     0.6104545  0.0958039   6.372 1.71e-08 ***
## season8     0.5879644  0.0958503   6.134 4.53e-08 ***
## season9     0.6693299  0.0959037   6.979 1.36e-09 ***
## season10    0.7473919  0.0959643   7.788 4.48e-11 ***
## season11    1.2067479  0.0960319  12.566  < 2e-16 ***
## season12    1.9622412  0.0961066  20.417  < 2e-16 ***
## dummy.surf  0.5015151  0.1964273   2.553 0.012856 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16

The season coefficients show typically how sales increase or decrease for each month of the year. All of the months are positive and are statistically significant, except for March.

The trend variable shows that there is a positive trend in the data and that trend is significant. Lastly, the dummy.surf variable shows that the surfing festival is statistically significant in this model and it causes a .5 increase in sales, generally.

G)

library(lmtest)
bgtest(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  fit
## LM test = 25.031, df = 1, p-value = 5.642e-07

The Breusch-Godfrey test shows that there is still some autocorrelation in the residuals.This means there is still some factors not being accounted for that could improve the model.

H)

future.surf <- data.frame(dummy.surf = rep(0, 36))
preds <- forecast(fit, newdata=future.surf)
preds

##          Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## Jan 1994       9.491352  9.238522  9.744183  9.101594  9.88111
## Feb 1994       9.764789  9.511959 10.017620  9.375031 10.15455
## Mar 1994       9.801475  9.461879 10.141071  9.277961 10.32499
## Apr 1994       9.941465  9.688635 10.194296  9.551707 10.33122
## May 1994       9.988919  9.736088 10.241749  9.599161 10.37868
## Jun 1994      10.050280  9.797449 10.303110  9.660522 10.44004
## Jul 1994      10.233926  9.981095 10.486756  9.844168 10.62368
## Aug 1994      10.233456  9.980625 10.486286  9.843698 10.62321
## Sep 1994      10.336841 10.084010 10.589671  9.947083 10.72660
## Oct 1994      10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994      10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994      11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995       9.755590  9.499844 10.011336  9.361338 10.14984
## Feb 1995      10.029027  9.773281 10.284773  9.634775 10.42328
## Mar 1995      10.065713  9.722498 10.408928  9.536620 10.59481
## Apr 1995      10.205703  9.949957 10.461449  9.811451 10.59996
## May 1995      10.253157  9.997411 10.508903  9.858904 10.64741
## Jun 1995      10.314518 10.058772 10.570264  9.920265 10.70877
## Jul 1995      10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995      10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995      10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995      10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995      11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995      11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996      10.019828  9.760564 10.279093  9.620151 10.41951
## Feb 1996      10.293265 10.034000 10.552530  9.893588 10.69294
## Mar 1996      10.329951  9.982679 10.677222  9.794605 10.86530
## Apr 1996      10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996      10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996      10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996      10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996      10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996      10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996      10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996      11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996      12.224288 11.965023 12.483552 11.824611 12.62396

I)

preds.tr <- exp(data.frame(preds))
preds.tr

##          Point.Forecast     Lo.80     Hi.80      Lo.95     Hi.95
## Jan 1994       13244.70  10285.82  17054.73   8969.583  19557.43
## Feb 1994       17409.81  13520.45  22418.00  11790.284  25707.73
## Mar 1994       18060.36  12860.02  25363.61  10699.594  30484.96
## Apr 1994       20774.16  16133.21  26750.16  14068.696  30675.62
## May 1994       21783.73  16917.24  28050.15  14752.395  32166.37
## Jun 1994       23162.27  17987.81  29825.24  15685.969  34201.95
## Jul 1994       27831.56  21613.98  35837.72  18848.111  41096.73
## Aug 1994       27818.48  21603.82  35820.87  18839.249  41077.41
## Sep 1994       30848.42  23956.87  39722.43  20891.193  45551.50
## Oct 1994       34095.57  26478.61  43903.67  23090.230  50346.32
## Nov 1994       55176.84  42850.31  71049.28  37366.903  81475.41
## Dec 1994      120067.79  93244.59 154607.08  81312.400 177294.90
## Jan 1995       17250.40  13357.65  22277.59  11629.938  25587.08
## Feb 1995       22675.20  17558.28  29283.31  15287.252  33633.55
## Mar 1995       23522.50  16688.88  33154.31  13858.024  39926.92
## Apr 1995       27057.06  20951.33  34942.16  18241.435  40133.06
## May 1995       28371.96  21969.51  36640.25  19127.918  42083.42
## Jun 1995       30167.42  23359.80  38958.95  20338.387  44746.58
## Jul 1995       36248.88  28068.91  46812.70  24438.412  53767.06
## Aug 1995       36231.84  28055.72  46790.69  24426.922  53741.78
## Sep 1995       40178.16  31111.50  51887.06  27087.467  59595.26
## Oct 1995       44407.37  34386.35  57348.77  29938.733  65868.34
## Nov 1995       71864.42  55647.40  92807.48  48449.831 106594.69
## Dec 1995      156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996       22467.57  17336.40  29117.46  15065.329  33506.86
## Feb 1996       29533.04  22788.25  38274.14  19802.984  44043.89
## Mar 1996       30636.60  21648.24  43356.94  17936.708  52328.52
## Apr 1996       35240.15  27191.96  45670.42  23629.808  52555.15
## May 1996       36952.72  28513.41  47889.88  24778.151  55109.18
## Jun 1996       39291.20  30317.82  50920.48  26346.183  58596.65
## Jul 1996       47211.93  36429.60  61185.57  31657.322  70409.18
## Aug 1996       47189.73  36412.48  61156.80  31642.439  70376.07
## Sep 1996       52329.57  40378.47  67817.91  35088.887  78041.33
## Oct 1996       57837.85  44628.77  74956.52  38782.394  86256.08
## Nov 1996       93598.96  72222.70 121302.09  62761.521 139588.15
## Dec 1996      203676.38 157160.50 263959.89 136572.460 303751.35

J)

I think there may be overfitting in the model, and missing some factors that may be important in forecasting and predictions.

6.

A)

library(fpp2)

## Warning: package 'fpp2' was built under R version 3.4.4

## Loading required package: ggplot2

## 
## Attaching package: 'fpp2'

## The following objects are masked from 'package:fpp':
## 
##     ausair, ausbeer, austa, austourists, debitcards, departures,
##     elecequip, euretail, guinearice, oil, sunspotarea, usmelec

gas1<-window(gasoline,end=2005)
length(gas1)

## [1] 726

fourier.gas1 <- tslm(gas1 ~ trend + fourier(gas1, K=26))
fourier.gas2 <- tslm(gas1 ~ trend + fourier(gas1, K=9))
fourier.gas3 <- tslm(gas1 ~ trend + fourier(gas1, K=4))
summary(fourier.gas1)

## 
## Call:
## tslm(formula = gas1 ~ trend + fourier(gas1, K = 26))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77873 -0.16653 -0.00024  0.17675  0.94911 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  7.093e+00  1.951e-02 363.468  < 2e-16 ***
## trend                        2.799e-03  4.653e-05  60.155  < 2e-16 ***
## fourier(gas1, K = 26)S1-52   5.843e-03  1.375e-02   0.425 0.670921    
## fourier(gas1, K = 26)C1-52  -2.732e-01  1.381e-02 -19.787  < 2e-16 ***
## fourier(gas1, K = 26)S2-52  -4.222e-02  1.374e-02  -3.072 0.002216 ** 
## fourier(gas1, K = 26)C2-52  -1.959e-02  1.379e-02  -1.421 0.155867    
## fourier(gas1, K = 26)S3-52   2.373e-02  1.376e-02   1.725 0.084994 .  
## fourier(gas1, K = 26)C3-52  -9.979e-02  1.378e-02  -7.244 1.20e-12 ***
## fourier(gas1, K = 26)S4-52   1.672e-02  1.377e-02   1.215 0.224973    
## fourier(gas1, K = 26)C4-52  -2.852e-02  1.376e-02  -2.072 0.038624 *  
## fourier(gas1, K = 26)S5-52   3.354e-03  1.377e-02   0.244 0.807615    
## fourier(gas1, K = 26)C5-52  -5.144e-02  1.376e-02  -3.738 0.000202 ***
## fourier(gas1, K = 26)S6-52   6.221e-02  1.376e-02   4.520 7.32e-06 ***
## fourier(gas1, K = 26)C6-52  -5.180e-02  1.377e-02  -3.763 0.000183 ***
## fourier(gas1, K = 26)S7-52   3.127e-02  1.376e-02   2.273 0.023358 *  
## fourier(gas1, K = 26)C7-52   4.325e-02  1.377e-02   3.140 0.001763 ** 
## fourier(gas1, K = 26)S8-52   2.083e-02  1.376e-02   1.514 0.130417    
## fourier(gas1, K = 26)C8-52   1.802e-02  1.378e-02   1.308 0.191282    
## fourier(gas1, K = 26)S9-52  -1.834e-02  1.376e-02  -1.333 0.182848    
## fourier(gas1, K = 26)C9-52   1.358e-02  1.378e-02   0.986 0.324499    
## fourier(gas1, K = 26)S10-52 -1.772e-02  1.376e-02  -1.287 0.198386    
## fourier(gas1, K = 26)C10-52  3.054e-02  1.377e-02   2.218 0.026893 *  
## fourier(gas1, K = 26)S11-52  9.772e-04  1.377e-02   0.071 0.943428    
## fourier(gas1, K = 26)C11-52 -2.765e-02  1.377e-02  -2.008 0.044999 *  
## fourier(gas1, K = 26)S12-52 -2.950e-02  1.377e-02  -2.143 0.032469 *  
## fourier(gas1, K = 26)C12-52 -2.556e-02  1.377e-02  -1.857 0.063810 .  
## fourier(gas1, K = 26)S13-52  3.307e-03  1.376e-02   0.240 0.810173    
## fourier(gas1, K = 26)C13-52 -8.001e-03  1.377e-02  -0.581 0.561452    
## fourier(gas1, K = 26)S14-52 -8.868e-03  1.376e-02  -0.645 0.519395    
## fourier(gas1, K = 26)C14-52 -1.977e-02  1.378e-02  -1.435 0.151646    
## fourier(gas1, K = 26)S15-52 -1.760e-04  1.376e-02  -0.013 0.989797    
## fourier(gas1, K = 26)C15-52  1.335e-03  1.378e-02   0.097 0.922854    
## fourier(gas1, K = 26)S16-52 -2.143e-02  1.376e-02  -1.558 0.119770    
## fourier(gas1, K = 26)C16-52 -5.377e-04  1.377e-02  -0.039 0.968869    
## fourier(gas1, K = 26)S17-52 -1.664e-02  1.376e-02  -1.209 0.227033    
## fourier(gas1, K = 26)C17-52 -1.620e-02  1.377e-02  -1.177 0.239732    
## fourier(gas1, K = 26)S18-52  4.396e-02  1.377e-02   3.193 0.001473 ** 
## fourier(gas1, K = 26)C18-52 -1.674e-03  1.377e-02  -0.122 0.903269    
## fourier(gas1, K = 26)S19-52  5.762e-03  1.376e-02   0.419 0.675646    
## fourier(gas1, K = 26)C19-52  1.024e-02  1.377e-02   0.743 0.457458    
## fourier(gas1, K = 26)S20-52  9.333e-03  1.376e-02   0.678 0.497751    
## fourier(gas1, K = 26)C20-52  2.978e-03  1.377e-02   0.216 0.828910    
## fourier(gas1, K = 26)S21-52 -8.704e-03  1.375e-02  -0.633 0.526996    
## fourier(gas1, K = 26)C21-52 -2.435e-03  1.378e-02  -0.177 0.859782    
## fourier(gas1, K = 26)S22-52  1.058e-03  1.375e-02   0.077 0.938701    
## fourier(gas1, K = 26)C22-52  9.891e-03  1.378e-02   0.718 0.473123    
## fourier(gas1, K = 26)S23-52  1.011e-02  1.376e-02   0.735 0.462775    
## fourier(gas1, K = 26)C23-52 -7.423e-03  1.377e-02  -0.539 0.590007    
## fourier(gas1, K = 26)S24-52 -1.775e-02  1.378e-02  -1.288 0.198172    
## fourier(gas1, K = 26)C24-52 -4.251e-03  1.375e-02  -0.309 0.757365    
## fourier(gas1, K = 26)S25-52  1.774e-02  1.380e-02   1.286 0.198865    
## fourier(gas1, K = 26)C25-52 -7.411e-04  1.374e-02  -0.054 0.957007    
## fourier(gas1, K = 26)S26-52  1.308e-03  1.392e-02   0.094 0.925169    
## fourier(gas1, K = 26)C26-52  2.732e-03  1.385e-02   0.197 0.843658    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2622 on 672 degrees of freedom
## Multiple R-squared:  0.8634, Adjusted R-squared:  0.8526 
## F-statistic: 80.13 on 53 and 672 DF,  p-value: < 2.2e-16

The models all give similar results, however I think the model with K=26 is the best as it uses more parameters.

B)

CV(fourier.gas1)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.426757e-02 -1.889694e+03 -1.880500e+03 -1.637379e+03  8.526098e-01

CV(fourier.gas2)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.161451e-02 -1.912292e+03 -1.910980e+03 -1.815954e+03  8.506543e-01

CV(fourier.gas3)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.648608e-02 -1.864112e+03 -1.863742e+03 -1.813649e+03  8.382404e-01

Based on AIC and CV metrics, the third model, using K=9 is actually the best model.

C)

checkresiduals(fourier.gas2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 156.67, df = 104, p-value = 0.0006505

The residuals appear normal, however there may be some autocorrelation present.

D)

fc <- forecast(fourier.gas2, newdata=data.frame(fourier(gas1, 9, 52)))
fc

##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.777225 8.433379 9.121071 8.250947  9.303502
## 2005.033       8.633547 8.289604 8.977489 8.107121  9.159972
## 2005.052       8.597550 8.253577 8.941522 8.071079  9.124021
## 2005.071       8.646957 8.303008 8.990907 8.120521  9.173393
## 2005.090       8.729791 8.385918 9.073664 8.203472  9.256110
## 2005.110       8.806443 8.462643 9.150242 8.280237  9.332649
## 2005.129       8.866663 8.522877 9.210449 8.340477  9.392848
## 2005.148       8.917757 8.573961 9.261552 8.391557  9.443957
## 2005.167       8.965209 8.621424 9.308994 8.439025  9.491393
## 2005.186       9.005903 8.662131 9.349676 8.479739  9.532068
## 2005.205       9.034890 8.691120 9.378660 8.508729  9.561051
## 2005.225       9.052825 8.709052 9.396598 8.526660  9.578991
## 2005.244       9.065201 8.721422 9.408980 8.539026  9.591375
## 2005.263       9.077132 8.733352 9.420912 8.550956  9.603308
## 2005.282       9.091893 8.748119 9.435668 8.565726  9.618061
## 2005.301       9.113713 8.769943 9.457484 8.587552  9.639875
## 2005.320       9.147594 8.803823 9.491365 8.621431  9.673756
## 2005.340       9.192685 8.848910 9.536459 8.666517  9.718853
## 2005.359       9.236845 8.893067 9.580623 8.710672  9.763018
## 2005.378       9.263132 8.919356 9.606908 8.736961  9.789302
## 2005.397       9.266926 8.923154 9.610698 8.740762  9.793090
## 2005.416       9.267083 8.923313 9.610854 8.740921  9.793245
## 2005.435       9.295168 8.951395 9.638940 8.769003  9.821332
## 2005.455       9.367169 9.023393 9.710944 8.840999  9.893339
## 2005.474       9.462974 9.119197 9.806751 8.936803  9.989145
## 2005.493       9.536563 9.192789 9.880337 9.010396 10.062730
## 2005.512       9.552560 9.208789 9.896331 9.026398 10.078723
## 2005.531       9.517666 9.173895 9.861437 8.991504 10.043829
## 2005.550       9.476771 9.132997 9.820545 8.950604 10.002938
## 2005.570       9.474840 9.131063 9.818617 8.948669 10.001012
## 2005.589       9.517911 9.174136 9.861687 8.991742 10.044081
## 2005.608       9.568096 9.224324 9.911869 9.041932 10.094261
## 2005.627       9.576123 9.232352 9.919893 9.049961 10.102285
## 2005.646       9.520957 9.177185 9.864730 8.994793 10.047122
## 2005.665       9.422126 9.078350 9.765902 8.895955  9.948296
## 2005.685       9.318736 8.974959 9.662513 8.792564  9.844908
## 2005.704       9.240131 8.896357 9.583905 8.713964  9.766298
## 2005.723       9.195783 8.852012 9.539554 8.669620  9.721945
## 2005.742       9.186562 8.842791 9.530333 8.660399  9.712724
## 2005.761       9.215402 8.871627 9.559176 8.689234  9.741570
## 2005.780       9.279492 8.935714 9.623271 8.753318  9.805667
## 2005.800       9.353158 9.009381 9.696935 8.826986  9.879330
## 2005.819       9.388600 9.044828 9.732372 8.862436  9.914765
## 2005.838       9.347153 9.003383 9.690923 8.820992  9.873314
## 2005.857       9.238738 8.894966 9.582511 8.712573  9.764903
## 2005.876       9.129196 8.785416 9.472976 8.603020  9.655373
## 2005.895       9.097817 8.754033 9.441602 8.571635  9.624000
## 2005.915       9.173478 8.829701 9.517256 8.647306  9.699651
## 2005.934       9.303287 8.959514 9.647060 8.777122  9.829452
## 2005.953       9.384871 9.041099 9.728643 8.858707  9.911035
## 2005.972       9.340037 8.996246 9.683827 8.813844  9.866229
## 2005.991       9.171444 8.827563 9.515325 8.645113  9.697775

E)

gas2<-window(gasoline,start=2005,end=2006)
autoplot(gas2, series="Data") +  autolayer(fc$mean, series="Fitted")

The forecasted data predicts the trend of the time series pretty well. However, it does not predict the large rises and falls, which is most likely due to missing factors.

6.9

5.

A)

?cangas
autoplot(cangas)

ggseasonplot(cangas)

ggsubseriesplot(cangas)

There is change in the seasonality over time in gas production in Canada from 1960 to 2005. There seems to be an increase in variability throughout the year. I am not sure why this is, but maybe it has to do with the increased knowlege of gas producers, giving them the ability to change production as they see fit, whereas in early years they may have had less knowlege of the market and therefore produced at a steadier rate.

B)

plot(stl(cangas,t.window=46, s.window="periodic", robust=TRUE))

plot(stl(cangas,t.window=46, s.window=3, robust=TRUE))

plot(stl(cangas,t.window=46, s.window=12, robust=TRUE))

plot(stl(cangas,t.window=46, s.window=100, robust=TRUE))

Using larger numbers for the window parameter, creates a more normal season plot. This is because it uses more of the recent years to find seasonality. Using a smaller number creates irregular season plots because it only uses a few of the recent years.

C)

library(seasonal)
autoplot(seas(cangas,x11=""))

For some reason I was unable to produce the SEATS plot, however the X11 plot gives a good remainder plot for detecting outliers. Everything is similar to the original plot though.

6.

A)

plot(stl(bricksq,t.window=39, s.window="periodic", robust=TRUE))

plot(stl(bricksq,t.window=39, s.window=8, robust=TRUE))

plot(stl(bricksq,t.window=39, s.window=6, robust=TRUE))

There is an increasing trend and seasonality in this time series. There does not appear to be cyclicity.

B)

plot(seasadj((stl(bricksq,t.window=39, s.window="periodic", robust=TRUE))))

#####C)

fit<-stl(bricksq,t.window=39, s.window="periodic", robust=TRUE)
plot(naive(seasadj(fit)))

#####D)

fcast <- stlf(bricksq, method='naive')
forecast(fcast,h=8)

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1       424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2       483.9926 439.7456 528.2395 416.3227 551.6625
## 1995 Q3       494.0000 442.9080 545.0920 415.8615 572.1385
## 1995 Q4       465.7338 408.6112 522.8564 378.3723 553.0952
## 1996 Q1       424.6739 362.0992 487.2486 328.9742 520.3736
## 1996 Q2       483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3       494.0000 421.7450 566.2550 383.4955 604.5045

E)

checkresiduals(fcast)

There seems to be some possible autocorrelation in the residuals #####F)

fit <- stl(bricksq, t.window=39, s.window="periodic", robust=TRUE)
plot(naive(seasadj(fit)))

#####G

bricktrain<-window(bricksq, end=c(1991,4))
bricktest<-window(bricksq, start=c(1992,1), end=c(1993,4))

fit.naive<-snaive(bricktrain)
fit.stlf<-stlf(bricktrain)
fcast.naive<-forecast(fit.naive,h=8)
fcast.stlf<-forecast(fit.stlf,h=8)
names(fcast.naive)

##  [1] "method"    "model"     "level"     "mean"      "lower"    
##  [6] "upper"     "x"         "series"    "fitted"    "residuals"

fcast.naive$mean

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1992  373  436  424  430
## 1993  373  436  424  430

res.naive<-as.numeric(bricktest)-fcast.naive$mean
res.stlf<-as.numeric(bricktest)-fcast.stlf$mean
autoplot(res.naive) + autolayer(res.stlf)

The stiff forecast performs better than the seasonal naive forecast.

8.

plot(fancy)

Should log transform

fcast.fancy.naive <- stlf(writing, method='naive')
fcast.fancy.drift <- stlf(writing, method='rwdrift')
fcast.fancy.drift2 <- stlf(writing, method='rwdrift', lambda=0)
res.fancy.naive<-as.numeric(residuals(fcast.fancy.naive))
res.fancy.drift<-as.numeric(residuals(fcast.fancy.drift))
res.fancy.drift2<-as.numeric(residuals(fcast.fancy.drift2, type="response"))
res.fancy.naive<-na.omit(res.fancy.naive)
res.fancy.drift<-na.omit(res.fancy.drift)
res.fancy.drift2<-na.omit(res.fancy.drift2)

Compare models with RMSE

RMSE.fancy.naive<-sqrt(sum((res.fancy.naive)^2)/119)
RMSE.fancy.drift<-sqrt(sum((res.fancy.drift)^2)/119)
RMSE.fancy.drift2<-sqrt(sum((res.fancy.drift2)^2)/119)
RMSE.fancy.naive

## [1] 46.6923

RMSE.fancy.drift

## [1] 46.55813

RMSE.fancy.drift2

## [1] 45.38429

The log transformed model has the lowest RMSE and is therefore the best model

Untitled

Brendan O’Brien

11/8/2018

Homework 1

2.10

5.

6.

8.

10.

3.7

8.

5.10

5

B)

C)

D)

E)

G)

H)

I)

J)

6.

A)

B)

C)

D)

E)

6.9

5.

A)

B)

C)

6.

A)

B)

E)

8.