library(fpp)
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 3.4.4
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: tseries
## Warning: package 'tseries' was built under R version 3.4.4
writing
## Jan Feb Mar Apr May Jun Jul
## 1968 562.674 599.000 668.516 597.798 579.889 668.233 499.232
## 1969 634.712 639.283 712.182 621.557 621.000 675.989 501.322
## 1970 646.783 658.442 712.906 687.714 723.916 707.183 629.000
## 1971 676.155 748.183 810.681 729.363 701.108 790.079 594.621
## 1972 747.636 773.392 813.788 766.713 728.875 749.197 680.954
## 1973 795.337 788.421 889.968 797.393 751.000 821.255 691.605
## 1974 843.038 847.000 941.952 804.309 840.307 871.528 656.330
## 1975 778.139 856.075 938.833 813.023 783.417 828.110 657.311
## 1976 895.217 856.075 893.268 875.000 835.088 934.595 832.500
## 1977 875.024 992.968 976.804 968.697 871.675 1006.852 832.037
## Aug Sep Oct Nov Dec
## 1968 215.187 555.813 586.935 546.136 571.111
## 1969 220.286 560.727 602.530 626.379 605.508
## 1970 237.530 613.296 730.444 734.925 651.812
## 1971 230.716 617.189 691.389 701.067 705.777
## 1972 241.424 680.234 708.326 694.238 772.071
## 1973 290.655 727.147 868.355 812.390 799.556
## 1974 370.508 742.000 847.152 731.675 898.527
## 1975 310.032 780.000 860.000 780.000 807.993
## 1976 300.000 791.443 900.000 781.729 880.000
## 1977 345.587 849.528 913.871 868.746 993.733
?writing
ggseasonplot(writing)
ggsubseriesplot(writing)
The seasonal plots show the sales for printing and writing paper in France dropped dramatically during the month of August between the years 1968 and 1977.
There is no year that looks particulary unusual in this time series.
fancy
## Jan Feb Mar Apr May Jun Jul
## 1987 1664.81 2397.53 2840.71 3547.29 3752.96 3714.74 4349.61
## 1988 2499.81 5198.24 7225.14 4806.03 5900.88 4951.34 6179.12
## 1989 4717.02 5702.63 9957.58 5304.78 6492.43 6630.80 7349.62
## 1990 5921.10 5814.58 12421.25 6369.77 7609.12 7224.75 8121.22
## 1991 4826.64 6470.23 9638.77 8821.17 8722.37 10209.48 11276.55
## 1992 7615.03 9849.69 14558.40 11587.33 9332.56 13082.09 16732.78
## 1993 10243.24 11266.88 21826.84 17357.33 15997.79 18601.53 26155.15
## Aug Sep Oct Nov Dec
## 1987 3566.34 5021.82 6423.48 7600.60 19756.21
## 1988 4752.15 5496.43 5835.10 12600.08 28541.72
## 1989 8176.62 8573.17 9690.50 15151.84 34061.01
## 1990 7979.25 8093.06 8476.70 17914.66 30114.41
## 1991 12552.22 11637.39 13606.89 21822.11 45060.69
## 1992 19888.61 23933.38 25391.35 36024.80 80721.71
## 1993 28586.52 30505.41 30821.33 46634.38 104660.67
?fancy
ggseasonplot(fancy)
ggsubseriesplot(fancy)
The seasonal plots for montly sales at a souvenir shop in Queensland, Australia show a large increase in sales in the month of November, and an even larger increase in December every year from 1987 to 1993. Most years after 1987, also show a small spike in sales in the month of March. In most years, sales remain relatively constant for most of the year with low variance between months.
Souvenir sales in 1992 and 1993 experienced larger increases between May and October than previous years. December sales in these years were also much higher than sales in previous years. From 1987-1991, December sales increased steadily from $19k in ’87 to $45k in ’91. However, December sales jumped to $80k in ’92 and $104k in ’93.
These plots display the seasonality of tourism in Queensland,Australia, and the growth in popularity of the location, especially in the years 1992 and 1993.
a10
## Jan Feb Mar Apr May Jun Jul
## 1991 3.526591
## 1992 5.088335 2.814520 2.985811 3.204780 3.127578 3.270523 3.737851
## 1993 6.192068 3.450857 3.772307 3.734303 3.905399 4.049687 4.315566
## 1994 6.731473 3.841278 4.394076 4.075341 4.540645 4.645615 4.752607
## 1995 6.749484 4.216067 4.949349 4.823045 5.194754 5.170787 5.256742
## 1996 8.329452 5.069796 5.262557 5.597126 6.110296 5.689161 6.486849
## 1997 8.524471 5.277918 5.714303 6.214529 6.411929 6.667716 7.050831
## 1998 8.798513 5.918261 6.534493 6.675736 7.064201 7.383381 7.813496
## 1999 10.391416 6.421535 8.062619 7.297739 7.936916 8.165323 8.717420
## 2000 12.511462 7.457199 8.591191 8.474000 9.386803 9.560399 10.834295
## 2001 14.497581 8.049275 10.312891 9.753358 10.850382 9.961719 11.443601
## 2002 16.300269 9.053485 10.002449 10.788750 12.106705 10.954101 12.844566
## 2003 16.828350 9.800215 10.816994 10.654223 12.512323 12.161210 12.998046
## 2004 18.003768 11.938030 12.997900 12.882645 13.943447 13.989472 15.339097
## 2005 20.778723 12.154552 13.402392 14.459239 14.795102 15.705248 15.829550
## 2006 23.486694 12.536987 15.467018 14.233539 17.783058 16.291602 16.980282
## 2007 28.038383 16.763869 19.792754 16.427305 21.000742 20.681002 21.834890
## 2008 29.665356 21.654285 18.264945 23.107677 22.912510 19.431740
## Aug Sep Oct Nov Dec
## 1991 3.180891 3.252221 3.611003 3.565869 4.306371
## 1992 3.558776 3.777202 3.924490 4.386531 5.810549
## 1993 4.562185 4.608662 4.667851 5.093841 7.179962
## 1994 5.350605 5.204455 5.301651 5.773742 6.204593
## 1995 5.855277 5.490729 6.115293 6.088473 7.416598
## 1996 6.300569 6.467476 6.828629 6.649078 8.606937
## 1997 6.704919 7.250988 7.819733 7.398101 10.096233
## 1998 7.431892 8.275117 8.260441 8.596156 10.558939
## 1999 9.070964 9.177113 9.251887 9.933136 11.532974
## 2000 10.643751 9.908162 11.710041 11.340151 12.079132
## 2001 11.659239 10.647060 12.652134 13.674466 12.965735
## 2002 12.196500 12.854748 13.542004 13.287640 15.134918
## 2003 12.517276 13.268658 14.733622 13.669382 16.503966
## 2004 15.370764 16.142005 16.685754 17.636728 18.869325
## 2005 17.554701 18.100864 17.496668 19.347265 20.031291
## 2006 18.612189 16.623343 21.430241 23.575517 23.334206
## 2007 23.930204 22.930357 23.263340 25.250030 25.806090
## 2008
?a10
ggseasonplot(a10)
ggsubseriesplot(a10)
The season plots for monthly anti-diabetic drug sales in Australia from 1992 to 2008 show a general drop in sales from January to Febuary. There also appears to be an increase in variance between months in later years compared to earlier years in the time series.
h02
## Jan Feb Mar Apr May Jun Jul
## 1991 0.4297950
## 1992 0.6601190 0.3362200 0.3513480 0.3798080 0.3618010 0.4105340 0.4833887
## 1993 0.7515028 0.3875543 0.4272832 0.4138902 0.4288588 0.4701264 0.5092097
## 1994 0.8193253 0.4376698 0.5061213 0.4704912 0.5106963 0.5405138 0.5581189
## 1995 0.8031126 0.4752582 0.5525723 0.5271078 0.5612498 0.5889776 0.6231336
## 1996 0.9372759 0.5287616 0.5593399 0.5778717 0.6149274 0.5941888 0.7077584
## 1997 0.8468335 0.4638225 0.4852732 0.5280586 0.5623365 0.5885704 0.6694804
## 1998 0.8005444 0.4905572 0.5244080 0.5366495 0.5520905 0.6033656 0.6812454
## 1999 0.8930815 0.5126960 0.6529959 0.5739764 0.6392384 0.7038719 0.7706482
## 2000 0.9696557 0.5732915 0.6185068 0.6189957 0.6652092 0.7265201 0.8558649
## 2001 1.0438053 0.5106472 0.6725690 0.6484701 0.7041147 0.6994307 0.8519259
## 2002 1.1458676 0.5755844 0.6411646 0.6798621 0.7679384 0.7520959 0.9180636
## 2003 1.0781449 0.5782962 0.6433333 0.6633674 0.7505160 0.8007456 0.9163610
## 2004 1.1301252 0.6679887 0.7490143 0.7399860 0.7951286 0.8568028 1.0015932
## 2005 1.1706900 0.5976390 0.6525900 0.6705050 0.6952480 0.8422630 0.8743360
## 2006 1.2306910 0.5871350 0.7069590 0.6396410 0.8074050 0.7979700 0.8843120
## 2007 1.2233190 0.5977530 0.7043980 0.5617600 0.7452580 0.8379340 0.9541440
## 2008 1.2199410 0.7618220 0.6494350 0.8278870 0.8162550 0.7621370
## Aug Sep Oct Nov Dec
## 1991 0.4009060 0.4321590 0.4925430 0.5023690 0.6026520
## 1992 0.4754634 0.5347610 0.5686061 0.5952233 0.7712578
## 1993 0.5584430 0.6015141 0.6329471 0.6996054 0.9630805
## 1994 0.6728521 0.6858974 0.6896920 0.7413036 0.8133076
## 1995 0.7408372 0.7253718 0.8158030 0.8140095 0.9266531
## 1996 0.7195020 0.7443237 0.8048551 0.7885423 0.9710894
## 1997 0.6779937 0.7629955 0.7997237 0.7705219 0.9943893
## 1998 0.6780753 0.7948926 0.7846239 0.8130087 0.9777323
## 1999 0.8461859 0.8927289 0.8978999 0.9472807 1.0507073
## 2000 0.8659843 0.8252488 0.9554210 0.9385960 1.0130244
## 2001 0.9077052 0.8674445 1.0242928 1.1095902 1.0123132
## 2002 0.9243675 1.0131977 1.0269761 1.0067960 1.1027757
## 2003 0.9168868 1.0846589 1.1506482 1.0508382 1.2232345
## 2004 0.9948643 1.1344320 1.1810110 1.2160370 1.2572380
## 2005 1.0064970 1.0947360 1.0270430 1.1492320 1.1607120
## 2006 1.0496480 0.9957090 1.1682530 1.1080380 1.1200530
## 2007 1.0782195 1.1109816 1.1099791 1.1635343 1.1765890
## 2008
?h02
ggseasonplot(h02)
ggsubseriesplot(h02)
Simalarly to the last seasonal plots, these plots display a large drop in drug sales from January to Febuary. The variance in sales between months is again much greater in later years. Starting around 2000, the monthly sales have become much more random, to the point that some years show decrease sales in a month, that the previous year showed an increase in the same month. The sales seem much less predictable in the later years.
hsales
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1973 55 60 68 63 65 61 54 52 46 42 37 30
## 1974 37 44 55 53 58 50 48 45 41 34 30 24
## 1975 29 34 44 54 57 51 51 53 46 46 46 39
## 1976 41 53 55 62 55 56 57 59 58 55 49 47
## 1977 57 68 84 81 78 74 64 74 71 63 55 51
## 1978 57 63 75 85 80 77 68 72 68 70 53 50
## 1979 53 58 73 72 68 63 64 68 60 54 41 35
## 1980 43 44 44 36 44 50 55 61 50 46 39 33
## 1981 37 40 49 44 45 38 36 34 28 29 27 29
## 1982 28 29 36 32 36 34 31 36 39 40 39 33
## 1983 44 46 57 59 64 59 51 50 48 51 45 48
## 1984 52 58 63 61 59 58 52 48 53 55 42 38
## 1985 48 55 67 60 65 65 63 61 54 52 51 47
## 1986 55 59 89 84 75 66 57 52 60 54 48 49
## 1987 53 59 73 72 62 58 55 56 52 52 43 37
## 1988 43 55 68 68 64 65 57 59 54 57 43 42
## 1989 52 51 58 60 61 58 62 61 49 51 47 40
## 1990 45 50 58 52 50 50 46 46 38 37 34 29
## 1991 30 40 46 46 47 47 43 46 37 41 39 36
## 1992 48 55 56 53 52 53 52 56 51 48 42 42
## 1993 44 50 60 66 58 59 55 57 57 56 53 51
## 1994 45 58 74 65 65 55 52 59 54 57 45 40
## 1995 47 47 60 58 63 64 64 63 55 54 44
?hsales
autoplot(hsales)
ggseasonplot(hsales)
ggsubseriesplot(hsales)
gglagplot(hsales)
ggAcf(hsales)
In viewing the plots, there appears to be seaonality in the sales of new one-family houses in the USA from 1973-1995. The sales tend to increase during spring months each year, whith the largest spike typically being in March. There does not appear to be a trend with this time series, as sales have not generally increased or decreased over the years. There does appear to by cyclicity every 5-10 years in the sales of one-family houses in the USA during this time period.
The plots show that there is cyclicity and seasonality in the sales of one-family houses in the USA. There are rises and falls in the market every 5-10 years. Also, there is seasonality within each year, where more houses are sold in the spring months compared to the rest of the year.
?usdeaths
autoplot(usdeaths)
ggseasonplot(usdeaths)
ggsubseriesplot(usdeaths)
gglagplot(usdeaths)
ggAcf(usdeaths)
In viewing the plots, there is definitly seasonality in monthly accidental deaths in the USA from 1973 to 1978. There are far more deaths, typically, in the summer months, especially July. There are also far less accidental deaths every year during the winter months, especially February. The plots do not show any trend as the number of deaths does not seem to rise or fall throughout the years. I also do not notice any cyclicity in this time series.
The plots reveal certain seasonality in this time series. This says to me that there is something during the summer months (maybe more drivers) that causes more deaths during these months. The plots also reveal that there is no real increase or decrease of accidental deaths over the years, and no cyclicity either.
?bricksq
autoplot(bricksq)
ggseasonplot(bricksq)
ggsubseriesplot(bricksq)
gglagplot(bricksq)
ggAcf(bricksq)
The plots for quarterly clay brick production in Australia from 1956 to 1994 show possible seasonality. Although brick produciton remains relatively constant most years, there does appear to be a seasonal trend of an steady increase in brick produciton from Q1-Q3 and then a drop after Q3 into Q4. There also appears to be an upward trend in the time series, as brick production has generally increased from 1956 to 1994. However, there is a major decrease in 1984 and a decrease from the late ’80s into the early and mid ’90s, although it does appear to trend up again towards the end. There is no cyclicity in the time series.
The plots reveal to me that there is a seasonal trend most years for brick sales in Australia, with Q3 typically being the best quarter. It also reveals that something must have happened in 1984 to cause such a large decrease in brick sales.
ss.ts<- ts(data = sunspotarea, start = 1875, end = 2011, frequency = 12, ts.eps = getOption("ts.eps"), class ="ts")
autoplot(ss.ts)
ggseasonplot(ss.ts)
ggsubseriesplot(ss.ts)
gglagplot(ss.ts)
ggAcf(ss.ts)
plot(decompose(ss.ts))
This time series appears to be much more random than the other. It is tough to read the plots with so many years of observations, however I do believe there is seasonality in the time series. There appears to be an increase in sunspot area during the summer and into fall months. There is no trend in the data but there does appear to be cyclicity about every 10 years.
The plots for this time series reveal to me an increase in sun spot area during summer and fall months. This makes sense considering we typically get more sun during these months. I think it is also worth noting that there is a major spike about every 10 years where the averages of sunspot area jumps way higher than normal.
(“Gasoline” could not be found)
Time Plot A -> ACF B Time Plot B -> ACF A Time Plot C -> ACF D Time Plot D -> ACF C
?dj
ddj <- diff(dj)
plot(ddj)
ggAcf(ddj)
Yes, the ddj time series looks similar to white noise. Looking at the ACF plot, each autocorrelation is close to zero, indicating white noise.
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,186], c(1982, 4), frequency = 12)
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2011 2601.3 2464.343 2738.257 2391.842 2810.758
## Feb 2011 2426.4 2289.443 2563.357 2216.942 2635.858
## Mar 2011 2709.9 2572.943 2846.857 2500.442 2919.358
## Apr 2011 2549.6 2412.643 2686.557 2340.142 2759.058
## May 2011 2633.7 2496.743 2770.657 2424.242 2843.158
## Jun 2011 2604.6 2467.643 2741.557 2395.142 2814.058
## Jul 2011 2765.1 2628.143 2902.057 2555.642 2974.558
## Aug 2011 2803.4 2666.443 2940.357 2593.942 3012.858
## Sep 2011 2809.9 2672.943 2946.857 2600.442 3019.358
## Oct 2011 2854.1 2717.143 2991.057 2644.642 3063.558
## Nov 2011 2998.3 2861.343 3135.257 2788.842 3207.758
## Dec 2011 3947.6 3810.643 4084.557 3738.142 4157.058
## Jan 2012 2601.3 2407.614 2794.986 2305.082 2897.518
## Feb 2012 2426.4 2232.714 2620.086 2130.182 2722.618
## Mar 2012 2709.9 2516.214 2903.586 2413.682 3006.118
## Apr 2012 2549.6 2355.914 2743.286 2253.382 2845.818
## May 2012 2633.7 2440.014 2827.386 2337.482 2929.918
## Jun 2012 2604.6 2410.914 2798.286 2308.382 2900.818
## Jul 2012 2765.1 2571.414 2958.786 2468.882 3061.318
## Aug 2012 2803.4 2609.714 2997.086 2507.182 3099.618
## Sep 2012 2809.9 2616.214 3003.586 2513.682 3106.118
## Oct 2012 2854.1 2660.414 3047.786 2557.882 3150.318
## Nov 2012 2998.3 2804.614 3191.986 2702.082 3294.518
## Dec 2012 3947.6 3753.914 4141.286 3651.382 4243.818
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 82.63694 106.8681 87.78769 5.845747 6.146052 1.000000
## Test set 239.16667 256.2178 239.16667 7.835325 7.835325 2.724376
## ACF1 Theil's U
## Training set 0.6795773 NA
## Test set 0.2784120 0.7753287
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 635.4, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
The residuals appear uncorrelated and normal
The accuracy metrics appear to be much better for the training set (possible overfitting?)
autoplot(fancy)
#####A) The time series plot shows a trend over time of an increase in sales from 1988-1994. There also appears to be large spikes around december every year (due to the influx of tourists during Christmas time.) There is also a large increase in sales generally over the last couple of years.
It is important to take a log of this data because it is has exponetial growth, therefore it would normalize the data.
log.fancy <- log(fancy)
dummy.surf = rep(0, length(fancy))
dummy.surf[seq_along(dummy.surf)%%12 == 3] <- 1
dummy.surf[3] <- 0
dummy.surf <- ts(dummy.surf, freq = 12, start=c(1987,1))
log.fancy.df <- data.frame(log.fancy, dummy.surf)
fit <- tslm(log.fancy ~ trend + season + dummy.surf, data=log.fancy.df)
future.surf <- data.frame(dummy.surf = rep(0, 12))
future.surf[3,] <- 1
forecast(fit, newdata=future.surf)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 10.302990 10.048860 10.557120 9.911228 10.69475
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
plot(residuals(fit), type='p')
abline(h = 0, col="grey")
There appears to be potential correlation between the residuals and nonlinearity. From 1990 to 1993 the residuals are mostly negative while in 1994 they are mostly positive.
plot(as.numeric(fitted(fit)), residuals(fit), type='p')
The residuals plotted against the fitted values show no pattern or correlation.This means they should be unbiased and homoscedastic
boxplot(resid(fit) ~ cycle(resid(fit)))
Residuals for the fall months seem to have more variance than the other months, which may mean there is some factor from these months that the model is not capturing. #####F)
summary(fit)
##
## Call:
## tslm(formula = log.fancy ~ trend + season + dummy.surf, data = log.fancy.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## dummy.surf 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
The season coefficients show typically how sales increase or decrease for each month of the year. All of the months are positive and are statistically significant, except for March.
The trend variable shows that there is a positive trend in the data and that trend is significant. Lastly, the dummy.surf variable shows that the surfing festival is statistically significant in this model and it causes a .5 increase in sales, generally.
library(lmtest)
bgtest(fit)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: fit
## LM test = 25.031, df = 1, p-value = 5.642e-07
The Breusch-Godfrey test shows that there is still some autocorrelation in the residuals.This means there is still some factors not being accounted for that could improve the model.
future.surf <- data.frame(dummy.surf = rep(0, 36))
preds <- forecast(fit, newdata=future.surf)
preds
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.491352 9.238522 9.744183 9.101594 9.88111
## Feb 1994 9.764789 9.511959 10.017620 9.375031 10.15455
## Mar 1994 9.801475 9.461879 10.141071 9.277961 10.32499
## Apr 1994 9.941465 9.688635 10.194296 9.551707 10.33122
## May 1994 9.988919 9.736088 10.241749 9.599161 10.37868
## Jun 1994 10.050280 9.797449 10.303110 9.660522 10.44004
## Jul 1994 10.233926 9.981095 10.486756 9.844168 10.62368
## Aug 1994 10.233456 9.980625 10.486286 9.843698 10.62321
## Sep 1994 10.336841 10.084010 10.589671 9.947083 10.72660
## Oct 1994 10.436923 10.184092 10.689753 10.047165 10.82668
## Nov 1994 10.918299 10.665468 11.171129 10.528541 11.30806
## Dec 1994 11.695812 11.442981 11.948642 11.306054 12.08557
## Jan 1995 9.755590 9.499844 10.011336 9.361338 10.14984
## Feb 1995 10.029027 9.773281 10.284773 9.634775 10.42328
## Mar 1995 10.065713 9.722498 10.408928 9.536620 10.59481
## Apr 1995 10.205703 9.949957 10.461449 9.811451 10.59996
## May 1995 10.253157 9.997411 10.508903 9.858904 10.64741
## Jun 1995 10.314518 10.058772 10.570264 9.920265 10.70877
## Jul 1995 10.498164 10.242418 10.753910 10.103911 10.89242
## Aug 1995 10.497694 10.241948 10.753440 10.103441 10.89195
## Sep 1995 10.601079 10.345333 10.856825 10.206826 10.99533
## Oct 1995 10.701161 10.445415 10.956907 10.306908 11.09541
## Nov 1995 11.182537 10.926791 11.438282 10.788284 11.57679
## Dec 1995 11.960050 11.704304 12.215796 11.565797 12.35430
## Jan 1996 10.019828 9.760564 10.279093 9.620151 10.41951
## Feb 1996 10.293265 10.034000 10.552530 9.893588 10.69294
## Mar 1996 10.329951 9.982679 10.677222 9.794605 10.86530
## Apr 1996 10.469941 10.210677 10.729206 10.070264 10.86962
## May 1996 10.517395 10.258130 10.776659 10.117718 10.91707
## Jun 1996 10.578756 10.319491 10.838021 10.179079 10.97843
## Jul 1996 10.762402 10.503137 11.021667 10.362725 11.16208
## Aug 1996 10.761932 10.502667 11.021196 10.362254 11.16161
## Sep 1996 10.865317 10.606052 11.124582 10.465640 11.26499
## Oct 1996 10.965399 10.706134 11.224664 10.565722 11.36508
## Nov 1996 11.446774 11.187510 11.706039 11.047097 11.84645
## Dec 1996 12.224288 11.965023 12.483552 11.824611 12.62396
preds.tr <- exp(data.frame(preds))
preds.tr
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## Jan 1994 13244.70 10285.82 17054.73 8969.583 19557.43
## Feb 1994 17409.81 13520.45 22418.00 11790.284 25707.73
## Mar 1994 18060.36 12860.02 25363.61 10699.594 30484.96
## Apr 1994 20774.16 16133.21 26750.16 14068.696 30675.62
## May 1994 21783.73 16917.24 28050.15 14752.395 32166.37
## Jun 1994 23162.27 17987.81 29825.24 15685.969 34201.95
## Jul 1994 27831.56 21613.98 35837.72 18848.111 41096.73
## Aug 1994 27818.48 21603.82 35820.87 18839.249 41077.41
## Sep 1994 30848.42 23956.87 39722.43 20891.193 45551.50
## Oct 1994 34095.57 26478.61 43903.67 23090.230 50346.32
## Nov 1994 55176.84 42850.31 71049.28 37366.903 81475.41
## Dec 1994 120067.79 93244.59 154607.08 81312.400 177294.90
## Jan 1995 17250.40 13357.65 22277.59 11629.938 25587.08
## Feb 1995 22675.20 17558.28 29283.31 15287.252 33633.55
## Mar 1995 23522.50 16688.88 33154.31 13858.024 39926.92
## Apr 1995 27057.06 20951.33 34942.16 18241.435 40133.06
## May 1995 28371.96 21969.51 36640.25 19127.918 42083.42
## Jun 1995 30167.42 23359.80 38958.95 20338.387 44746.58
## Jul 1995 36248.88 28068.91 46812.70 24438.412 53767.06
## Aug 1995 36231.84 28055.72 46790.69 24426.922 53741.78
## Sep 1995 40178.16 31111.50 51887.06 27087.467 59595.26
## Oct 1995 44407.37 34386.35 57348.77 29938.733 65868.34
## Nov 1995 71864.42 55647.40 92807.48 48449.831 106594.69
## Dec 1995 156380.86 121091.75 201954.08 105429.448 231955.81
## Jan 1996 22467.57 17336.40 29117.46 15065.329 33506.86
## Feb 1996 29533.04 22788.25 38274.14 19802.984 44043.89
## Mar 1996 30636.60 21648.24 43356.94 17936.708 52328.52
## Apr 1996 35240.15 27191.96 45670.42 23629.808 52555.15
## May 1996 36952.72 28513.41 47889.88 24778.151 55109.18
## Jun 1996 39291.20 30317.82 50920.48 26346.183 58596.65
## Jul 1996 47211.93 36429.60 61185.57 31657.322 70409.18
## Aug 1996 47189.73 36412.48 61156.80 31642.439 70376.07
## Sep 1996 52329.57 40378.47 67817.91 35088.887 78041.33
## Oct 1996 57837.85 44628.77 74956.52 38782.394 86256.08
## Nov 1996 93598.96 72222.70 121302.09 62761.521 139588.15
## Dec 1996 203676.38 157160.50 263959.89 136572.460 303751.35
I think there may be overfitting in the model, and missing some factors that may be important in forecasting and predictions.
library(fpp2)
## Warning: package 'fpp2' was built under R version 3.4.4
## Loading required package: ggplot2
##
## Attaching package: 'fpp2'
## The following objects are masked from 'package:fpp':
##
## ausair, ausbeer, austa, austourists, debitcards, departures,
## elecequip, euretail, guinearice, oil, sunspotarea, usmelec
gas1<-window(gasoline,end=2005)
length(gas1)
## [1] 726
fourier.gas1 <- tslm(gas1 ~ trend + fourier(gas1, K=26))
fourier.gas2 <- tslm(gas1 ~ trend + fourier(gas1, K=9))
fourier.gas3 <- tslm(gas1 ~ trend + fourier(gas1, K=4))
summary(fourier.gas1)
##
## Call:
## tslm(formula = gas1 ~ trend + fourier(gas1, K = 26))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77873 -0.16653 -0.00024 0.17675 0.94911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.093e+00 1.951e-02 363.468 < 2e-16 ***
## trend 2.799e-03 4.653e-05 60.155 < 2e-16 ***
## fourier(gas1, K = 26)S1-52 5.843e-03 1.375e-02 0.425 0.670921
## fourier(gas1, K = 26)C1-52 -2.732e-01 1.381e-02 -19.787 < 2e-16 ***
## fourier(gas1, K = 26)S2-52 -4.222e-02 1.374e-02 -3.072 0.002216 **
## fourier(gas1, K = 26)C2-52 -1.959e-02 1.379e-02 -1.421 0.155867
## fourier(gas1, K = 26)S3-52 2.373e-02 1.376e-02 1.725 0.084994 .
## fourier(gas1, K = 26)C3-52 -9.979e-02 1.378e-02 -7.244 1.20e-12 ***
## fourier(gas1, K = 26)S4-52 1.672e-02 1.377e-02 1.215 0.224973
## fourier(gas1, K = 26)C4-52 -2.852e-02 1.376e-02 -2.072 0.038624 *
## fourier(gas1, K = 26)S5-52 3.354e-03 1.377e-02 0.244 0.807615
## fourier(gas1, K = 26)C5-52 -5.144e-02 1.376e-02 -3.738 0.000202 ***
## fourier(gas1, K = 26)S6-52 6.221e-02 1.376e-02 4.520 7.32e-06 ***
## fourier(gas1, K = 26)C6-52 -5.180e-02 1.377e-02 -3.763 0.000183 ***
## fourier(gas1, K = 26)S7-52 3.127e-02 1.376e-02 2.273 0.023358 *
## fourier(gas1, K = 26)C7-52 4.325e-02 1.377e-02 3.140 0.001763 **
## fourier(gas1, K = 26)S8-52 2.083e-02 1.376e-02 1.514 0.130417
## fourier(gas1, K = 26)C8-52 1.802e-02 1.378e-02 1.308 0.191282
## fourier(gas1, K = 26)S9-52 -1.834e-02 1.376e-02 -1.333 0.182848
## fourier(gas1, K = 26)C9-52 1.358e-02 1.378e-02 0.986 0.324499
## fourier(gas1, K = 26)S10-52 -1.772e-02 1.376e-02 -1.287 0.198386
## fourier(gas1, K = 26)C10-52 3.054e-02 1.377e-02 2.218 0.026893 *
## fourier(gas1, K = 26)S11-52 9.772e-04 1.377e-02 0.071 0.943428
## fourier(gas1, K = 26)C11-52 -2.765e-02 1.377e-02 -2.008 0.044999 *
## fourier(gas1, K = 26)S12-52 -2.950e-02 1.377e-02 -2.143 0.032469 *
## fourier(gas1, K = 26)C12-52 -2.556e-02 1.377e-02 -1.857 0.063810 .
## fourier(gas1, K = 26)S13-52 3.307e-03 1.376e-02 0.240 0.810173
## fourier(gas1, K = 26)C13-52 -8.001e-03 1.377e-02 -0.581 0.561452
## fourier(gas1, K = 26)S14-52 -8.868e-03 1.376e-02 -0.645 0.519395
## fourier(gas1, K = 26)C14-52 -1.977e-02 1.378e-02 -1.435 0.151646
## fourier(gas1, K = 26)S15-52 -1.760e-04 1.376e-02 -0.013 0.989797
## fourier(gas1, K = 26)C15-52 1.335e-03 1.378e-02 0.097 0.922854
## fourier(gas1, K = 26)S16-52 -2.143e-02 1.376e-02 -1.558 0.119770
## fourier(gas1, K = 26)C16-52 -5.377e-04 1.377e-02 -0.039 0.968869
## fourier(gas1, K = 26)S17-52 -1.664e-02 1.376e-02 -1.209 0.227033
## fourier(gas1, K = 26)C17-52 -1.620e-02 1.377e-02 -1.177 0.239732
## fourier(gas1, K = 26)S18-52 4.396e-02 1.377e-02 3.193 0.001473 **
## fourier(gas1, K = 26)C18-52 -1.674e-03 1.377e-02 -0.122 0.903269
## fourier(gas1, K = 26)S19-52 5.762e-03 1.376e-02 0.419 0.675646
## fourier(gas1, K = 26)C19-52 1.024e-02 1.377e-02 0.743 0.457458
## fourier(gas1, K = 26)S20-52 9.333e-03 1.376e-02 0.678 0.497751
## fourier(gas1, K = 26)C20-52 2.978e-03 1.377e-02 0.216 0.828910
## fourier(gas1, K = 26)S21-52 -8.704e-03 1.375e-02 -0.633 0.526996
## fourier(gas1, K = 26)C21-52 -2.435e-03 1.378e-02 -0.177 0.859782
## fourier(gas1, K = 26)S22-52 1.058e-03 1.375e-02 0.077 0.938701
## fourier(gas1, K = 26)C22-52 9.891e-03 1.378e-02 0.718 0.473123
## fourier(gas1, K = 26)S23-52 1.011e-02 1.376e-02 0.735 0.462775
## fourier(gas1, K = 26)C23-52 -7.423e-03 1.377e-02 -0.539 0.590007
## fourier(gas1, K = 26)S24-52 -1.775e-02 1.378e-02 -1.288 0.198172
## fourier(gas1, K = 26)C24-52 -4.251e-03 1.375e-02 -0.309 0.757365
## fourier(gas1, K = 26)S25-52 1.774e-02 1.380e-02 1.286 0.198865
## fourier(gas1, K = 26)C25-52 -7.411e-04 1.374e-02 -0.054 0.957007
## fourier(gas1, K = 26)S26-52 1.308e-03 1.392e-02 0.094 0.925169
## fourier(gas1, K = 26)C26-52 2.732e-03 1.385e-02 0.197 0.843658
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2622 on 672 degrees of freedom
## Multiple R-squared: 0.8634, Adjusted R-squared: 0.8526
## F-statistic: 80.13 on 53 and 672 DF, p-value: < 2.2e-16
The models all give similar results, however I think the model with K=26 is the best as it uses more parameters.
CV(fourier.gas1)
## CV AIC AICc BIC AdjR2
## 7.426757e-02 -1.889694e+03 -1.880500e+03 -1.637379e+03 8.526098e-01
CV(fourier.gas2)
## CV AIC AICc BIC AdjR2
## 7.161451e-02 -1.912292e+03 -1.910980e+03 -1.815954e+03 8.506543e-01
CV(fourier.gas3)
## CV AIC AICc BIC AdjR2
## 7.648608e-02 -1.864112e+03 -1.863742e+03 -1.813649e+03 8.382404e-01
Based on AIC and CV metrics, the third model, using K=9 is actually the best model.
checkresiduals(fourier.gas2)
##
## Breusch-Godfrey test for serial correlation of order up to 104
##
## data: Residuals from Linear regression model
## LM test = 156.67, df = 104, p-value = 0.0006505
The residuals appear normal, however there may be some autocorrelation present.
fc <- forecast(fourier.gas2, newdata=data.frame(fourier(gas1, 9, 52)))
fc
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005.014 8.777225 8.433379 9.121071 8.250947 9.303502
## 2005.033 8.633547 8.289604 8.977489 8.107121 9.159972
## 2005.052 8.597550 8.253577 8.941522 8.071079 9.124021
## 2005.071 8.646957 8.303008 8.990907 8.120521 9.173393
## 2005.090 8.729791 8.385918 9.073664 8.203472 9.256110
## 2005.110 8.806443 8.462643 9.150242 8.280237 9.332649
## 2005.129 8.866663 8.522877 9.210449 8.340477 9.392848
## 2005.148 8.917757 8.573961 9.261552 8.391557 9.443957
## 2005.167 8.965209 8.621424 9.308994 8.439025 9.491393
## 2005.186 9.005903 8.662131 9.349676 8.479739 9.532068
## 2005.205 9.034890 8.691120 9.378660 8.508729 9.561051
## 2005.225 9.052825 8.709052 9.396598 8.526660 9.578991
## 2005.244 9.065201 8.721422 9.408980 8.539026 9.591375
## 2005.263 9.077132 8.733352 9.420912 8.550956 9.603308
## 2005.282 9.091893 8.748119 9.435668 8.565726 9.618061
## 2005.301 9.113713 8.769943 9.457484 8.587552 9.639875
## 2005.320 9.147594 8.803823 9.491365 8.621431 9.673756
## 2005.340 9.192685 8.848910 9.536459 8.666517 9.718853
## 2005.359 9.236845 8.893067 9.580623 8.710672 9.763018
## 2005.378 9.263132 8.919356 9.606908 8.736961 9.789302
## 2005.397 9.266926 8.923154 9.610698 8.740762 9.793090
## 2005.416 9.267083 8.923313 9.610854 8.740921 9.793245
## 2005.435 9.295168 8.951395 9.638940 8.769003 9.821332
## 2005.455 9.367169 9.023393 9.710944 8.840999 9.893339
## 2005.474 9.462974 9.119197 9.806751 8.936803 9.989145
## 2005.493 9.536563 9.192789 9.880337 9.010396 10.062730
## 2005.512 9.552560 9.208789 9.896331 9.026398 10.078723
## 2005.531 9.517666 9.173895 9.861437 8.991504 10.043829
## 2005.550 9.476771 9.132997 9.820545 8.950604 10.002938
## 2005.570 9.474840 9.131063 9.818617 8.948669 10.001012
## 2005.589 9.517911 9.174136 9.861687 8.991742 10.044081
## 2005.608 9.568096 9.224324 9.911869 9.041932 10.094261
## 2005.627 9.576123 9.232352 9.919893 9.049961 10.102285
## 2005.646 9.520957 9.177185 9.864730 8.994793 10.047122
## 2005.665 9.422126 9.078350 9.765902 8.895955 9.948296
## 2005.685 9.318736 8.974959 9.662513 8.792564 9.844908
## 2005.704 9.240131 8.896357 9.583905 8.713964 9.766298
## 2005.723 9.195783 8.852012 9.539554 8.669620 9.721945
## 2005.742 9.186562 8.842791 9.530333 8.660399 9.712724
## 2005.761 9.215402 8.871627 9.559176 8.689234 9.741570
## 2005.780 9.279492 8.935714 9.623271 8.753318 9.805667
## 2005.800 9.353158 9.009381 9.696935 8.826986 9.879330
## 2005.819 9.388600 9.044828 9.732372 8.862436 9.914765
## 2005.838 9.347153 9.003383 9.690923 8.820992 9.873314
## 2005.857 9.238738 8.894966 9.582511 8.712573 9.764903
## 2005.876 9.129196 8.785416 9.472976 8.603020 9.655373
## 2005.895 9.097817 8.754033 9.441602 8.571635 9.624000
## 2005.915 9.173478 8.829701 9.517256 8.647306 9.699651
## 2005.934 9.303287 8.959514 9.647060 8.777122 9.829452
## 2005.953 9.384871 9.041099 9.728643 8.858707 9.911035
## 2005.972 9.340037 8.996246 9.683827 8.813844 9.866229
## 2005.991 9.171444 8.827563 9.515325 8.645113 9.697775
gas2<-window(gasoline,start=2005,end=2006)
autoplot(gas2, series="Data") + autolayer(fc$mean, series="Fitted")
The forecasted data predicts the trend of the time series pretty well. However, it does not predict the large rises and falls, which is most likely due to missing factors.
?cangas
autoplot(cangas)
ggseasonplot(cangas)
ggsubseriesplot(cangas)
There is change in the seasonality over time in gas production in Canada from 1960 to 2005. There seems to be an increase in variability throughout the year. I am not sure why this is, but maybe it has to do with the increased knowlege of gas producers, giving them the ability to change production as they see fit, whereas in early years they may have had less knowlege of the market and therefore produced at a steadier rate.
plot(stl(cangas,t.window=46, s.window="periodic", robust=TRUE))
plot(stl(cangas,t.window=46, s.window=3, robust=TRUE))
plot(stl(cangas,t.window=46, s.window=12, robust=TRUE))
plot(stl(cangas,t.window=46, s.window=100, robust=TRUE))
Using larger numbers for the window parameter, creates a more normal season plot. This is because it uses more of the recent years to find seasonality. Using a smaller number creates irregular season plots because it only uses a few of the recent years.
library(seasonal)
autoplot(seas(cangas,x11=""))
For some reason I was unable to produce the SEATS plot, however the X11 plot gives a good remainder plot for detecting outliers. Everything is similar to the original plot though.
plot(stl(bricksq,t.window=39, s.window="periodic", robust=TRUE))
plot(stl(bricksq,t.window=39, s.window=8, robust=TRUE))
plot(stl(bricksq,t.window=39, s.window=6, robust=TRUE))
There is an increasing trend and seasonality in this time series. There does not appear to be cyclicity.
plot(seasadj((stl(bricksq,t.window=39, s.window="periodic", robust=TRUE))))
#####C)
fit<-stl(bricksq,t.window=39, s.window="periodic", robust=TRUE)
plot(naive(seasadj(fit)))
#####D)
fcast <- stlf(bricksq, method='naive')
forecast(fcast,h=8)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1994 Q4 465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1 424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2 483.9926 439.7456 528.2395 416.3227 551.6625
## 1995 Q3 494.0000 442.9080 545.0920 415.8615 572.1385
## 1995 Q4 465.7338 408.6112 522.8564 378.3723 553.0952
## 1996 Q1 424.6739 362.0992 487.2486 328.9742 520.3736
## 1996 Q2 483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3 494.0000 421.7450 566.2550 383.4955 604.5045
checkresiduals(fcast)
There seems to be some possible autocorrelation in the residuals #####F)
fit <- stl(bricksq, t.window=39, s.window="periodic", robust=TRUE)
plot(naive(seasadj(fit)))
#####G
bricktrain<-window(bricksq, end=c(1991,4))
bricktest<-window(bricksq, start=c(1992,1), end=c(1993,4))
fit.naive<-snaive(bricktrain)
fit.stlf<-stlf(bricktrain)
fcast.naive<-forecast(fit.naive,h=8)
fcast.stlf<-forecast(fit.stlf,h=8)
names(fcast.naive)
## [1] "method" "model" "level" "mean" "lower"
## [6] "upper" "x" "series" "fitted" "residuals"
fcast.naive$mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 1992 373 436 424 430
## 1993 373 436 424 430
res.naive<-as.numeric(bricktest)-fcast.naive$mean
res.stlf<-as.numeric(bricktest)-fcast.stlf$mean
autoplot(res.naive) + autolayer(res.stlf)
The stiff forecast performs better than the seasonal naive forecast.
plot(fancy)
Should log transform
fcast.fancy.naive <- stlf(writing, method='naive')
fcast.fancy.drift <- stlf(writing, method='rwdrift')
fcast.fancy.drift2 <- stlf(writing, method='rwdrift', lambda=0)
res.fancy.naive<-as.numeric(residuals(fcast.fancy.naive))
res.fancy.drift<-as.numeric(residuals(fcast.fancy.drift))
res.fancy.drift2<-as.numeric(residuals(fcast.fancy.drift2, type="response"))
res.fancy.naive<-na.omit(res.fancy.naive)
res.fancy.drift<-na.omit(res.fancy.drift)
res.fancy.drift2<-na.omit(res.fancy.drift2)
Compare models with RMSE
RMSE.fancy.naive<-sqrt(sum((res.fancy.naive)^2)/119)
RMSE.fancy.drift<-sqrt(sum((res.fancy.drift)^2)/119)
RMSE.fancy.drift2<-sqrt(sum((res.fancy.drift2)^2)/119)
RMSE.fancy.naive
## [1] 46.6923
RMSE.fancy.drift
## [1] 46.55813
RMSE.fancy.drift2
## [1] 45.38429
The log transformed model has the lowest RMSE and is therefore the best model