knitr::opts_chunk$set(echo = TRUE)
library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.5
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## -- Attaching packages ---------------------------------------------- fpp2 2.4 --
## v ggplot2 3.3.3 v fma 2.4
## v forecast 8.14 v expsmooth 2.3
## Warning: package 'fma' was built under R version 4.0.5
## Warning: package 'expsmooth' was built under R version 4.0.5
##
Daily electricity demand for Victoria, Australia, during 2014 is contained in elecdaily. The data for the first 20 days can be obtained as follows.
daily20 <- head(elecdaily,20)
daily20 <- ts(daily20, frequency = 1)
Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?
autoplot(daily20, main="Daily Electricity Demand for Victoria, Australia \n First 20 Days of 2014",
xlab = "Days", ylab = "Demand (MW), Temperature (C)") +
geom_point() +
xlim(1, 20) +
theme(plot.title = element_text(hjust = 0.5))
lmq1a <- lm(Demand~Temperature, data=daily20)
summary(lmq1a)
##
## Call:
## lm(formula = Demand ~ Temperature, data = daily20)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.060 -7.117 -1.437 17.484 27.102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.2117 17.9915 2.179 0.0428 *
## Temperature 6.7572 0.6114 11.052 1.88e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared: 0.8716, Adjusted R-squared: 0.8644
## F-statistic: 122.1 on 1 and 18 DF, p-value: 1.876e-09
cor(daily20)
## Demand WorkDay Temperature
## Demand 1.0000000 0.5380072 0.9335758
## WorkDay 0.5380072 1.0000000 0.4537745
## Temperature 0.9335758 0.4537745 1.0000000
It’s important to recognize that, as Australia is in the Southern Hemisphere, Victoria would be experiencing summer temperatures in January (i.e. the first 20 days of 2014). This helps explain the strong positive relationship between electricity demand and temperature (indicated by the positive r value of 0.933): as temperature increases, use of cooling appliances increases, which is especially common during summer months. If this analysis were conducted on winter data, the inverse would likely be true due to the need for electricity to power heating devices.
Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?
plot(lmq1a)
checkresiduals(lmq1a)
##
## Breusch-Godfrey test for serial correlation of order up to 5
##
## data: Residuals
## LM test = 3.8079, df = 5, p-value = 0.5774
The model is reasonably adequate. In the residual plots, the normality assumption generally holds and there is no obvious pattern in the residuals (the residual distribution is slightly left-skewed, and there is a mild uptick in autocorrelation at lag 4, though neither is extreme). However, observations 1, 5, and 9 appear to be possible outliers. Given the nature of the data (summer temperatures in Australia and electricity demand), we would expect some outliers, possibly reflecting the irregular nature of summer weather: some days are more humid than others, some days have more wind, and so on.
Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15 C, and compare it with the forecast if the maximum temperature was 35 C. Do you believe these forecasts?
temp.forecasts <- data.frame(Temperature = c(15,35))
lmq1.forecast <- forecast(lmq1a, newdata = temp.forecasts)
lmq1.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1 140.5701 108.6810 172.4591 90.21166 190.9285
## 2 275.7146 245.2278 306.2014 227.57056 323.8586
autoplot(lmq1.forecast)
I generally believe these forecasts. The high R^2 of the model and the strong positive correlation between temperature and electricity demand suggest that the explanatory power of the linear model is good. The forecasts fall in line with the fitted model and sit well within the range of the observed data.
Give prediction intervals for your forecasts. The following R code will get you started:
lmq1.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1 140.5701 108.6810 172.4591 90.21166 190.9285
## 2 275.7146 245.2278 306.2014 227.57056 323.8586
autoplot(lmq1.forecast)
Plot Demand vs Temperature for all of the available data in elecdaily. What does this say about your model?
plot(Demand ~ Temperature, data=elecdaily, main="Electricity Demand vs. Temperature for Victoria in 2014")
I think this plot of all the 2014 data supports some of the previous discussion, along with part of our model. As discussed before, electricity demand tends to increase in warm months (for cooling), and based on this plot it also increases when temperatures are cooler (for heating), which makes sense. Moderate temperatures correspond to the lowest electricity demand. My model fits the data it was given, but it is only useful for summer months generally, as it was only “trained” on summer-month data. This explains why our 15 C demand forecast is low: in reality demand may be higher because 15 C might be cool enough to justify heating, and my model does not account for that “electricity demand to heat” effect. A model reflecting seasonality, or a quadratic rather than linear relationship with temperature, may better capture the entire dataset rather than just the first 20 days; a sketch of the quadratic idea follows.
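Since the 20-day model only sees summer temperatures, a quadratic temperature term fitted to the full elecdaily data is one way to capture both effects. A hedged sketch (lmq1b is just an illustrative name):
# Hedged sketch: a quadratic temperature term lets demand rise at both temperature
# extremes (heating on cold days, cooling on hot days).
lmq1b <- tslm(Demand ~ Temperature + I(Temperature^2), data = elecdaily)
summary(lmq1b)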
Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.
Plot the winning time against the year. Describe the main features of the plot.
autoplot(mens400, main="Winning times for Men's 400 meter Olympic Games Final \n 1896-2016", ylab="Time (seconds)", xlab="Year") +
theme(plot.title = element_text(hjust = 0.5))
As shown by the plot, men’s 400 m winning times are trending down, though at a decreasing rate, which makes sense as technology and human performance approach an asymptote. There are also periods of missing data, reflecting the years the Olympics were not held due to WWI and WWII.
Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?
autoplot(mens400, main="Winning times for Men's 400 meter Olympic Games Final \n 1896-2016", ylab="Time (seconds)", xlab="Year") + geom_smooth(method="lm", se=FALSE) + theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
time.year <- time(mens400)
lmq2b <- tslm(mens400 ~ time.year, data=mens400)
summary(lmq2b)
##
## Call:
## tslm(formula = mens400 ~ time.year, data = mens400)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6002 -0.5747 -0.2858 0.5751 4.1505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 172.481477 11.487522 15.02 2.52e-14 ***
## time.year -0.064574 0.005865 -11.01 2.75e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.136 on 26 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.8234, Adjusted R-squared: 0.8166
## F-statistic: 121.2 on 1 and 26 DF, p-value: 2.752e-11
Our basic linear model suggests that, for each additional year, the winning time for the men’s 400 m decreases by about 0.065 seconds on average. Every 4 years (i.e. every Olympic Games), the winning time therefore decreases by roughly 0.26 seconds, as checked below.
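A quick hedged check of the per-Games figure from the fitted coefficient:
# Average improvement per 4-year Olympic cycle, from the fitted slope.
4 * abs(coef(lmq2b)["time.year"])  # roughly 0.26 seconds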
Plot the residuals against the year. What does this indicate about the suitability of the fitted line?
cbind(Time = time.year, Residuals = lmq2b$residuals) %>%
as.data.frame() %>%
ggplot(aes(x = Time, y = Residuals)) +
geom_point() +
xlab("Year") +
ggtitle("Residuals vs Year") +
theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 3 rows containing missing values (geom_point).
The residuals vs. year plot generally shows no pattern in the residuals, though there is an uptrend in the residuals starting around 1980. The 1896 observation is clearly an outlier, and the uptrend from 1980 onwards may indicate that the linear model is becoming less accurate for recent observations.
Predict the winning time for the men’s 400 meters final in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?
mens400.forecast <- forecast(lmq2b, newdata = data.frame(time.year = 2020))
autoplot(mens400) + autolayer(mens400.forecast) + geom_point() +
xlab("Year") + ylab("Time (seconds)") +
ggtitle("Winning times for Men's 400 meter Olympic Games Final \n 1896-2016, with 2020 Prediction") +
theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 3 rows containing missing values (geom_point).
mens400.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 42.04231 40.44975 43.63487 39.55286 44.53176
In terms of assumptions made in these calculations, I have made all the assumptions underlying OLS modeling, including that the linear downward trend continues through 2020. For constructing the 80% and 95% prediction intervals specifically, the assumption of normally distributed errors is essential, since the intervals are built from the standard errors; if this assumption does not hold, the intervals are essentially meaningless.
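As a hedged cross-check, the same 95% interval can be obtained directly from predict.lm(), since tslm() fits an ordinary linear model under the hood:
# Hedged sketch: 95% prediction interval for 2020 from the underlying lm fit.
predict(lmq2b, newdata = data.frame(time.year = 2020), interval = "prediction", level = 0.95)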
Type easter(ausbeer) and interpret what you see.
easter(ausbeer)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00
The output shows, for each year from 1956 to 2010 (the same quarters as the ausbeer series), how the Easter holiday falls across the quarters. It is essentially a dummy variable, but it sometimes takes fractional values, so it is not strictly binary. The fractional cases arise when the Easter holiday spans the Q1/Q2 (March/April) boundary, with each quarter’s value giving the proportion of the holiday falling in that quarter (hence the 0.33/0.67 entries).
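In practice, easter() is meant to be used as a regressor. A hedged sketch of such a use (beer.fit is an illustrative name):
# Hedged sketch: include the Easter dummy alongside trend and seasonal dummies.
beer.fit <- tslm(ausbeer ~ trend + season + easter(ausbeer))
summary(beer.fit)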
Express y as a function of x and show that the coefficient β1 is the elasticity coefficient.
Start: log(y) = β0 + β1*log(x) + e. Exponentiating both sides expresses y as a function of x: y = exp(β0 + e) * x^β1.
Differentiating log(y) = β0 + β1*log(x) + e with respect to x (using d/dx log(x) = 1/x) gives (1/y)(dy/dx) = β1*(1/x), i.e. dy/y = β1*(dx/x).
Rearranging, β1 = (dy/dx)*(x/y) = (dy/y)/(dx/x), the proportional (percent) change in y per proportional (percent) change in x, which is the elasticity coefficient.
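The same steps in display form:
$$
\log y = \beta_0 + \beta_1 \log x + \varepsilon
\;\Rightarrow\;
y = e^{\beta_0 + \varepsilon}\, x^{\beta_1},
\qquad
\frac{dy}{dx} = \beta_1 e^{\beta_0 + \varepsilon} x^{\beta_1 - 1} = \beta_1 \frac{y}{x}
\;\Rightarrow\;
\beta_1 = \frac{dy/y}{dx/x}.
$$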
The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
autoplot(fancy, main="Sales Volume 1987-1994", ylab="Sales Volume ($)", xlab = "Year") +
theme(plot.title = element_text(hjust = 0.5))
head(fancy)
## Jan Feb Mar Apr May Jun
## 1987 1664.81 2397.53 2840.71 3547.29 3752.96 3714.74
There are no obvious “unusual” fluctuations in the time series. As you can see, it is very seasonal, and the peaks coincide with Christmas and the March festival. In this case the trend is more interesting: while the seasonality has stayed relatively consistent, the size of the peaks has grown along with the general upward trend in sales volume. It would seem Queensland became significantly more popular from 1992 onwards.
Explain why it is necessary to take logarithms of these data before fitting a model.
It’s necessary to take logarithms of these data before fitting a model because the seasonal fluctuations grow with the level of the series. Fitting a linear model to the raw data would therefore produce heteroskedastic residuals (rather than the homoskedastic residuals OLS assumes). A log transformation makes the seasonal pattern roughly additive and stabilises the variance, upholding the necessary assumptions, as illustrated below.
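A hedged illustration: plotting the raw and logged series side by side shows the seasonal swings becoming roughly constant on the log scale.
# Compare raw sales with log(sales); the log scale stabilises the growing seasonal swings.
autoplot(cbind(Sales = fancy, LogSales = log(fancy)), facets = TRUE)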
Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
time.fancy <- time(fancy)
surfFest <- c()
for(i in 1:length(time.fancy)){
month <- round(12*(time.fancy[i] - floor(time.fancy[i]))) + 1
year <- floor(time.fancy[i])
if(year >= 1988 & month == 3)
{
surfFest[i] <- 1
} else {
surfFest[i] <- 0
}
}
lmq5c <- tslm(log(fancy) ~ trend + season + surfFest)
summary(lmq5c)
##
## Call:
## tslm(formula = log(fancy) ~ trend + season + surfFest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33673 -0.12757 0.00257 0.10911 0.37671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6196670 0.0742471 102.626 < 2e-16 ***
## trend 0.0220198 0.0008268 26.634 < 2e-16 ***
## season2 0.2514168 0.0956790 2.628 0.010555 *
## season3 0.2660828 0.1934044 1.376 0.173275
## season4 0.3840535 0.0957075 4.013 0.000148 ***
## season5 0.4094870 0.0957325 4.277 5.88e-05 ***
## season6 0.4488283 0.0957647 4.687 1.33e-05 ***
## season7 0.6104545 0.0958039 6.372 1.71e-08 ***
## season8 0.5879644 0.0958503 6.134 4.53e-08 ***
## season9 0.6693299 0.0959037 6.979 1.36e-09 ***
## season10 0.7473919 0.0959643 7.788 4.48e-11 ***
## season11 1.2067479 0.0960319 12.566 < 2e-16 ***
## season12 1.9622412 0.0961066 20.417 < 2e-16 ***
## surfFest 0.5015151 0.1964273 2.553 0.012856 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567, Adjusted R-squared: 0.9487
## F-statistic: 119 on 13 and 70 DF, p-value: < 2.2e-16
Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
autoplot(lmq5c$residuals)
cbind(Residuals = lmq5c$residuals,
FittedValues = lmq5c$fitted.values)%>%
as.data.frame%>%
ggplot(aes(x=FittedValues, y=Residuals))+geom_point()
From the plots, there doesn’t appear to be any obvious pattern in the residuals after accounting for trend, seasonality, and the March festival.
Do boxplots of the residuals for each month. Does this reveal any problems with the model?
cbind.data.frame(
Month = factor(
month.abb[round(12*(time.fancy - floor(time.fancy)) + 1)],
labels = month.abb,
ordered = TRUE
),
Residuals = lmq5c$residuals
) %>%
ggplot(aes(x = Month,
y = Residuals)) +
geom_boxplot()
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
Some of the months (like Feb, Nov, Dec) have a larger spread/range of residuals, which may warrant further investigation. It’s not necessarily a cause for concern, but it does indicate that the residuals are not truly homoskedastic, as they have increased variance in certain months, and less variance in others.
What do the values of the coefficients tell you about each variable?
lmq5c$coefficients
## (Intercept) trend season2 season3 season4 season5
## 7.61966701 0.02201983 0.25141682 0.26608280 0.38405351 0.40948697
## season6 season7 season8 season9 season10 season11
## 0.44882828 0.61045453 0.58796443 0.66932985 0.74739195 1.20674790
## season12 surfFest
## 1.96224123 0.50151509
surfFest has a coefficient of 0.5015 (on the log scale), indicating an increase in sales during the March festival relative to the baseline. The positive trend coefficient captures the general upward trend in sales. The coefficients on the seasonal dummies show the large increase in sales around Christmas (season12), while the March dummy (season3) does not show a corresponding uptick because the festival effect is absorbed by surfFest. One hedged way to read these magnitudes is sketched below.
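Because the response is log(sales), a hedged way to read the magnitudes is to exponentiate the coefficients, giving approximate proportional effects on sales:
# exp(coef) - 1 approximates the percentage effect of each term;
# e.g. exp(0.5015) - 1 is roughly a 65% sales lift during the surfing festival.
round(exp(lmq5c$coefficients) - 1, 3)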
What does the Breusch-Godfrey test tell you about your model?
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bgtest(lmq5c)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: lmq5c
## LM test = 25.031, df = 1, p-value = 5.642e-07
checkresiduals(lmq5c)
##
## Breusch-Godfrey test for serial correlation of order up to 17
##
## data: Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494
The Breusch-Godfrey test checks for serial correlation in the residuals of a regression model. The low p-value means we can reject the null hypothesis of no autocorrelation, so there is remaining autocorrelation in the residuals. This is backed up by the ACF plot above: the small lags (1 and 2) show significant autocorrelation, as do some of the more distant lags.
Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
threeyr <- ts(data=time.fancy, start=1994, end=c(1996,12), frequency=12)
threeyr.forecast <- forecast(lmq5c, threeyr)
## Warning in forecast.lm(lmq5c, threeyr): newdata column names not specified,
## defaulting to first variable required.
threeyr.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 1006.002 501.0539 1510.950 227.5849 1784.419
## Feb 1994 1006.317 501.3479 1511.286 227.8675 1784.767
## Mar 1994 1006.396 501.6231 1511.168 228.2491 1784.542
## Apr 1994 1006.577 501.5658 1511.589 228.0624 1785.092
## May 1994 1006.667 501.6339 1511.699 228.1190 1785.214
## Jun 1994 1006.770 501.7159 1511.824 228.1896 1785.350
## Jul 1994 1006.995 501.9201 1512.070 228.3823 1785.608
## Aug 1994 1007.036 501.9403 1512.133 228.3910 1785.682
## Sep 1994 1007.182 502.0643 1512.299 228.5036 1785.860
## Oct 1994 1007.324 502.1850 1512.462 228.6128 1786.034
## Nov 1994 1007.847 502.6870 1513.006 229.1033 1786.590
## Dec 1994 1008.666 503.4851 1513.847 229.8900 1787.442
## Jan 1995 1006.768 501.5678 1511.967 227.9624 1785.573
## Feb 1995 1007.083 501.8618 1512.304 228.2450 1785.921
## Mar 1995 1007.161 502.1369 1512.186 228.6266 1785.696
## Apr 1995 1007.343 502.0797 1512.606 228.4399 1786.246
## May 1995 1007.432 502.1478 1512.717 228.4965 1786.368
## Jun 1995 1007.535 502.2298 1512.841 228.5670 1786.504
## Jul 1995 1007.761 502.4340 1513.088 228.7598 1786.762
## Aug 1995 1007.802 502.4542 1513.150 228.7685 1786.836
## Sep 1995 1007.947 502.5782 1513.317 228.8810 1787.014
## Oct 1995 1008.089 502.6989 1513.480 228.9903 1787.188
## Nov 1995 1008.612 503.2009 1514.024 229.4808 1787.744
## Dec 1995 1009.432 503.9990 1514.865 230.2674 1788.596
## Jan 1996 1007.533 502.0817 1512.985 228.3399 1786.727
## Feb 1996 1007.849 502.3757 1513.321 228.6225 1787.075
## Mar 1996 1007.927 502.6508 1513.203 229.0041 1786.850
## Apr 1996 1008.109 502.5936 1513.624 228.8174 1787.400
## May 1996 1008.198 502.6617 1513.734 228.8740 1787.522
## Jun 1996 1008.301 502.7437 1513.859 228.9445 1787.658
## Jul 1996 1008.527 502.9479 1514.105 229.1373 1787.916
## Aug 1996 1008.568 502.9681 1514.168 229.1460 1787.990
## Sep 1996 1008.713 503.0921 1514.334 229.2585 1788.168
## Oct 1996 1008.855 503.2128 1514.497 229.3678 1788.342
## Nov 1996 1009.378 503.7148 1515.042 229.8583 1788.898
## Dec 1996 1010.198 504.5129 1515.882 230.6449 1789.750
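The warning above notes that the newdata column names were not specified. A hedged sketch of how the future regressor values could be supplied instead (future.surfFest and fancy.fc are illustrative names, assuming the festival dummy equals 1 in each March of 1994–1996):
# forecast.lm() needs future values of every regressor other than trend and season;
# here that is the surfing-festival dummy for the 36 future months.
future.surfFest <- rep(c(0, 0, 1, rep(0, 9)), times = 3)  # Jan-Dec for 1994, 1995, 1996
fancy.fc <- forecast(lmq5c, newdata = data.frame(surfFest = future.surfFest))
autoplot(fancy.fc)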
Transform your predictions and intervals to obtain predictions and intervals for the raw data.
threeyr.trans <- (BoxCox(fancy, 0))
forecast(threeyr.trans)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.674641 9.464046 9.885235 9.352565 9.996717
## Feb 1994 9.958925 9.733371 10.184479 9.613970 10.303880
## Mar 1994 10.439733 10.200143 10.679323 10.073312 10.806155
## Apr 1994 10.121042 9.868185 10.373898 9.734331 10.507752
## May 1994 10.157291 9.891822 10.422759 9.751292 10.563289
## Jun 1994 10.227196 9.949682 10.504711 9.802774 10.651618
## Jul 1994 10.403289 10.114223 10.692356 9.961200 10.845378
## Aug 1994 10.382661 10.082480 10.682842 9.923574 10.841748
## Sep 1994 10.518849 10.207944 10.829754 10.043361 10.994337
## Oct 1994 10.612257 10.290980 10.933534 10.120906 11.103608
## Nov 1994 11.108401 10.777070 11.439732 10.601674 11.615129
## Dec 1994 11.898748 11.557653 12.239843 11.377088 12.420408
## Jan 1995 9.985303 9.634703 10.335903 9.449107 10.521499
## Feb 1995 10.269587 9.909735 10.629440 9.719240 10.819934
## Mar 1995 10.750396 10.381517 11.119274 10.186244 11.314547
## Apr 1995 10.431704 10.054009 10.809399 9.854070 11.009338
## May 1995 10.467953 10.081638 10.854268 9.877135 11.058770
## Jun 1995 10.537858 10.143107 10.932610 9.934137 11.141579
## Jul 1995 10.713952 10.310934 11.116969 10.097590 11.330314
## Aug 1995 10.693323 10.282201 11.104445 10.064567 11.322080
## Sep 1995 10.829511 10.410437 11.248586 10.188592 11.470430
## Oct 1995 10.922919 10.496035 11.349803 10.270057 11.575781
## Nov 1995 11.419063 10.984506 11.853621 10.754465 12.083661
## Dec 1995 12.209410 11.767308 12.651512 11.533273 12.885547
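Since lmq5c models log(fancy), a hedged alternative is to exponentiate the forecasts and interval bounds from the regression model itself (using the illustrative fancy.fc from the sketch above) to return to the dollar scale:
# Back-transform the log-scale point forecasts and 95% bounds with exp().
fancy.fc.raw <- data.frame(Point = exp(fancy.fc$mean),
                           Lo95 = exp(fancy.fc$lower[, 2]),
                           Hi95 = exp(fancy.fc$upper[, 2]))
head(fancy.fc.raw)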
How could you improve these predictions by modifying the model?
The current model does not capture the growth in the trend very well; the sales growth is nearly exponential. Using a different power transformation could possibly help account for this, and we could also consider adding higher-order trend terms to the model or fitting an exponential trend.
The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day.” Consider only the data to the end of 2004.
Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.
gas <- window(gasoline,end=2005)
autoplot(gas)
lmq6a1 <- tslm(gas~trend + fourier(gas, K=2))
lmq6a2 <- tslm(gas~trend + fourier(gas, K=5))
lmq6a3 <- tslm(gas~trend + fourier(gas, K=8))
lmq6a4 <- tslm(gas~trend + fourier(gas, K=11))
autoplot(gas) + autolayer(fitted(lmq6a1), series = "K=2") + autolayer(fitted(lmq6a2), series = "K=5") + autolayer(fitted(lmq6a3), series = "K=8") + autolayer(fitted(lmq6a4), series = "K=11")
plot.ts(gas, lmq6a2$fitted.values)
The fitted v. observed value plot shows that the model is pretty decent, as the fitted and observed values tend to move together (i.e. positive relationship). Moreover, the spread of the data in this plot doesn’t indicate anything problematic.
Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.
gas <- window(gasoline,end=2005)
for(num in c(1:15)){
var_name <- paste("tslm",
as.character(num),
"gas",
sep = "")
assign(var_name,tslm(gas ~ trend + fourier(
gas,
K = num)))
}
for(i in c(1:15)){
tslm_gas.name <- paste(
"tslm", as.character(i), "gas",
sep = ""
)
writeLines(
paste(
"\n", tslm_gas.name, "\n"
)
)
print(CV(get(tslm_gas.name)))
}
##
## tslm1gas
##
## CV AIC AICc BIC AdjR2
## 8.201921e-02 -1.813292e+03 -1.813208e+03 -1.790354e+03 8.250858e-01
##
## tslm2gas
##
## CV AIC AICc BIC AdjR2
## 8.136854e-02 -1.819113e+03 -1.818957e+03 -1.787001e+03 8.269569e-01
##
## tslm3gas
##
## CV AIC AICc BIC AdjR2
## 7.658846e-02 -1.863085e+03 -1.862834e+03 -1.821797e+03 8.375702e-01
##
## tslm4gas
##
## CV AIC AICc BIC AdjR2
## 7.648608e-02 -1.864112e+03 -1.863742e+03 -1.813649e+03 8.382404e-01
##
## tslm5gas
##
## CV AIC AICc BIC AdjR2
## 7.553646e-02 -1.873234e+03 -1.872723e+03 -1.813596e+03 8.406928e-01
##
## tslm6gas
##
## CV AIC AICc BIC AdjR2
## 0.0725333 -1902.7534284 -1902.0773721 -1833.9401782 0.8474536
##
## tslm7gas
##
## CV AIC AICc BIC AdjR2
## 0.0714734 -1913.5362459 -1912.6718391 -1835.5478956 0.8501073
##
## tslm8gas
##
## CV AIC AICc BIC AdjR2
## 7.147431e-02 -1.913619e+03 -1.912542e+03 -1.826455e+03 8.505267e-01
##
## tslm9gas
##
## CV AIC AICc BIC AdjR2
## 7.161451e-02 -1.912292e+03 -1.910980e+03 -1.815954e+03 8.506543e-01
##
## tslm10gas
##
## CV AIC AICc BIC AdjR2
## 7.135754e-02 -1.915014e+03 -1.913441e+03 -1.809500e+03 8.516102e-01
##
## tslm11gas
##
## CV AIC AICc BIC AdjR2
## 7.136488e-02 -1.915085e+03 -1.913228e+03 -1.800396e+03 8.520196e-01
##
## tslm12gas
##
## CV AIC AICc BIC AdjR2
## 7.096945e-02 -1.919276e+03 -1.917110e+03 -1.795412e+03 8.532618e-01
##
## tslm13gas
##
## CV AIC AICc BIC AdjR2
## 7.133807e-02 -1.915672e+03 -1.913172e+03 -1.782633e+03 8.529214e-01
##
## tslm14gas
##
## CV AIC AICc BIC AdjR2
## 7.149714e-02 -1.914218e+03 -1.911359e+03 -1.772004e+03 8.530152e-01
##
## tslm15gas
##
## CV AIC AICc BIC AdjR2
## 7.190862e-02 -1.910231e+03 -1.906988e+03 -1.758841e+03 8.525942e-01
The CV and AICc (and AIC) are all minimised at K = 12, so twelve Fourier pairs are selected. A more compact version of the same search is sketched below.
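A hedged, more compact version of the same search (cv.stats is an illustrative name):
# Collect the CV statistics for K = 1..15 and pick the K minimising CV or AICc.
cv.stats <- matrix(NA, nrow = 15, ncol = 5,
                   dimnames = list(paste0("K=", 1:15), c("CV", "AIC", "AICc", "BIC", "AdjR2")))
for (K in 1:15) {
  cv.stats[K, ] <- CV(tslm(gas ~ trend + fourier(gas, K = K)))
}
which.min(cv.stats[, "CV"])
which.min(cv.stats[, "AICc"])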
Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)
checkresiduals(tslm12gas)
##
## Breusch-Godfrey test for serial correlation of order up to 104
##
## data: Residuals from Linear regression model
## LM test = 154.2, df = 104, p-value = 0.001021
To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.
fc <- forecast(fit, newdata=data.frame(fourier(x,K,h)))
where fit is the fitted model using tslm(), K is the number of Fourier terms used in creating fit, and h is the forecast horizon required.
Forecast the next year of data.
gas.forecast <- forecast(tslm12gas, newdata=data.frame(fourier(gas,12,52)))
gas.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005.014 8.713386 8.371087 9.055684 8.189474 9.237298
## 2005.033 8.642131 8.299720 8.984542 8.118047 9.166215
## 2005.052 8.656942 8.314540 8.999345 8.132871 9.181013
## 2005.071 8.661292 8.318899 9.003684 8.137236 9.185347
## 2005.090 8.686370 8.344109 9.028631 8.162515 9.210225
## 2005.110 8.784129 8.441982 9.126276 8.260448 9.307809
## 2005.129 8.895967 8.553823 9.238111 8.372291 9.419643
## 2005.148 8.937884 8.595756 9.280012 8.414232 9.461535
## 2005.167 8.940875 8.598761 9.282988 8.417246 9.464503
## 2005.186 8.988596 8.646477 9.330716 8.464958 9.512235
## 2005.205 9.062316 8.720191 9.404442 8.538669 9.585963
## 2005.225 9.073794 8.731673 9.415915 8.550153 9.597435
## 2005.244 9.031913 8.689800 9.374027 8.508284 9.555542
## 2005.263 9.041458 8.699342 9.383575 8.517824 9.565092
## 2005.282 9.122877 8.780756 9.464999 8.599236 9.646519
## 2005.301 9.169383 8.827263 9.511503 8.645744 9.693022
## 2005.320 9.132601 8.790487 9.474715 8.608970 9.656231
## 2005.340 9.120782 8.778667 9.462897 8.597150 9.644414
## 2005.359 9.221551 8.879431 9.563671 8.697912 9.745190
## 2005.378 9.335425 8.993306 9.677545 8.811787 9.859064
## 2005.397 9.316637 8.974522 9.658752 8.793006 9.840269
## 2005.416 9.214109 8.871994 9.556224 8.690478 9.737740
## 2005.435 9.218957 8.876838 9.561076 8.695320 9.742595
## 2005.455 9.383861 9.041741 9.725980 8.860222 9.907499
## 2005.474 9.545162 9.203046 9.887278 9.021530 10.068795
## 2005.493 9.559667 9.217553 9.901782 9.036037 10.083298
## 2005.512 9.487053 9.144935 9.829171 8.963416 10.010689
## 2005.531 9.464953 9.122834 9.807073 8.941315 9.988592
## 2005.550 9.508378 9.166261 9.850494 8.984744 10.032011
## 2005.570 9.534922 9.192808 9.877037 9.011292 10.058553
## 2005.589 9.521986 9.179869 9.864104 8.998351 10.045621
## 2005.608 9.522899 9.180779 9.865019 8.999260 10.046538
## 2005.627 9.548443 9.206326 9.890560 9.024808 10.072078
## 2005.646 9.536705 9.194591 9.878819 9.013075 10.060335
## 2005.665 9.451014 9.108897 9.793131 8.927380 9.974649
## 2005.685 9.330630 8.988509 9.672750 8.806990 9.854269
## 2005.704 9.229844 8.887726 9.571962 8.706208 9.753480
## 2005.723 9.170862 8.828748 9.512976 8.647232 9.694492
## 2005.742 9.168084 8.825968 9.510200 8.644451 9.691717
## 2005.761 9.230990 8.888869 9.573110 8.707349 9.754630
## 2005.780 9.320372 8.978252 9.662492 8.796734 9.844010
## 2005.800 9.363916 9.021801 9.706030 8.840285 9.887546
## 2005.819 9.342664 9.000548 9.684779 8.819032 9.866296
## 2005.838 9.304159 8.962036 9.646281 8.780516 9.827801
## 2005.857 9.267511 8.925389 9.609633 8.743869 9.791153
## 2005.876 9.194408 8.852292 9.536523 8.670776 9.718040
## 2005.895 9.100701 8.758587 9.442816 8.577070 9.624332
## 2005.915 9.104667 8.762540 9.446794 8.581017 9.628317
## 2005.934 9.265935 8.923806 9.608064 8.742282 9.789587
## 2005.953 9.436660 9.094538 9.778782 8.913018 9.960302
## 2005.972 9.400410 9.058293 9.742527 8.876776 9.924044
## 2005.991 9.147441 8.805233 9.489649 8.623668 9.671215
Plot the forecasts along with the actual data for 2005. What do you find?
gas <- window(gasoline,end=2006)
autoplot(gas.forecast) +autolayer(window(gas))
autoplot(gas)
autoplot(gas.forecast)
gas.brief <- window(gasoline, start=2005, end=2006)
autoplot(gas.brief, series="Observed", color = "brown") + autolayer(gas.forecast$mean, series="Forecast", color="blue")
autoplot(gas.brief, series="Observed", color = "brown") + autolayer(gas.forecast, series="Forecast", color="blue")
The model predicts fairly well against the real data. For the most part, the actual values stay within the bounds of the prediction intervals. However, at one point approaching the middle of 2005, the actual data fall briefly outside the 95% prediction interval, though not by much. Given the definition of a 95% prediction interval, this is acceptable from a performance standpoint. A numerical check against the observed 2005 values is sketched below.
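A hedged numerical check against the observed 2005 data:
# Test-set accuracy of the harmonic regression forecasts over 2005.
accuracy(gas.forecast, gas.brief)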
Data set huron gives the water level of Lake Huron in feet from 1875 to 1972.
Plot the data and comment on its features.
autoplot(huron, main="Mean Water Level of Lake Huron 1875-1972", xlab="Year", ylab ="Feet" ) +
theme(plot.title = element_text(hjust = 0.5))
Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.
lmq7b <- tslm(huron~trend, data=huron)
summary(lmq7b)
##
## Call:
## tslm(formula = huron ~ trend, data = huron)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.50997 -0.72726 0.00083 0.74402 2.53565
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.202037 0.230111 44.335 < 2e-16 ***
## trend -0.024201 0.004036 -5.996 3.55e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared: 0.2725, Adjusted R-squared: 0.2649
## F-statistic: 35.95 on 1 and 96 DF, p-value: 3.545e-08
time.huron <- time(huron)
h <- 10
knot.huron <- ts(pmax(0, time.huron-1915), start = 1875)
piecewise.model <- tslm(huron~time.huron+knot.huron)
summary(piecewise.model)
##
## Call:
## tslm(formula = huron ~ time.huron + knot.huron)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.49626 -0.66240 -0.07139 0.85163 2.39222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 132.90870 19.97687 6.653 1.82e-09 ***
## time.huron -0.06498 0.01051 -6.181 1.58e-08 ***
## knot.huron 0.06486 0.01563 4.150 7.26e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared: 0.3841, Adjusted R-squared: 0.3711
## F-statistic: 29.62 on 2 and 95 DF, p-value: 1.004e-10
Generate forecasts from these two models for the period up to 1980 and comment on these.
lmq7b.forecast <- forecast(lmq7b)
lmq7b.forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1973 7.806127 6.317648 9.294605 5.516501 10.095752
## 1974 7.781926 6.292536 9.271315 5.490899 10.072952
## 1975 7.757724 6.267406 9.248043 5.465269 10.050179
## 1976 7.733523 6.242259 9.224788 5.439613 10.027434
## 1977 7.709322 6.217094 9.201550 5.413929 10.004715
## 1978 7.685121 6.191912 9.178331 5.388219 9.982024
## 1979 7.660920 6.166712 9.155128 5.362481 9.959359
## 1980 7.636719 6.141494 9.131943 5.336717 9.936721
## 1981 7.612518 6.116259 9.108776 5.310925 9.914110
## 1982 7.588317 6.091007 9.085626 5.285107 9.891526
h <- 10
t.new <- time.huron[length(time.huron)] + seq(h)
knot.new <- knot.huron[length(knot.huron)] + seq(h)
newdata <- cbind(time.huron=t.new, knot.huron=knot.new)%>%
as.data.frame()
pc.fore <- forecast(piecewise.model, newdata = newdata)
pc.fore
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1973 8.455119 7.063583 9.846655 6.314483 10.59575
## 1974 8.454992 7.061518 9.848467 6.311374 10.59861
## 1975 8.454866 7.059398 9.850333 6.308182 10.60155
## 1976 8.454739 7.057225 9.852253 6.304906 10.60457
## 1977 8.454612 7.054998 9.854227 6.301549 10.60768
## 1978 8.454486 7.052717 9.856254 6.298109 10.61086
## 1979 8.454359 7.050384 9.858334 6.294587 10.61413
## 1980 8.454232 7.047997 9.860467 6.290984 10.61748
## 1981 8.454106 7.045558 9.862653 6.287300 10.62091
## 1982 8.453979 7.043067 9.864891 6.283535 10.62442
autoplot(huron) + autolayer(fitted(piecewise.model), series = "Piecewise") + autolayer(fitted(lmq7b), series = "Linear") + autolayer(pc.fore, series="Piecewise", PI=FALSE) + autolayer(lmq7b.forecast, series="Linear", PI=FALSE) +
xlab("Year") + ylab("Feet") + ggtitle("Mean Water Level of Lake Huron 1875-1972") +
theme(plot.title = element_text(hjust = 0.5))
From an R^2 point of view, the piecewise model outperforms the simple linear model. This makes sense, given the clear shift in trend (a plateau) around 1915. The basic linear model would suggest the lake will eventually dry up, so it really isn’t useful in comparison to the piecewise model.
Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
3×5 MA = [ (Y1+Y2+Y3+Y4+Y5)/5 + (Y2+Y3+Y4+Y5+Y6)/5 + (Y3+Y4+Y5+Y6+Y7)/5 ] / 3
= (1/15)Y1 + (2/15)Y2 + (3/15)Y3 + (3/15)Y4 + (3/15)Y5 + (2/15)Y6 + (1/15)Y7
≈ 0.067Y1 + 0.133Y2 + 0.200Y3 + 0.200Y4 + 0.200Y5 + 0.133Y6 + 0.067Y7
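A hedged numerical check of the equivalence (y, ma3x5, and wma7 are illustrative names):
# A 3x5 MA (a 5-MA followed by a 3-MA) matches the 7-term weighted MA.
set.seed(1)
y <- ts(rnorm(30))
ma3x5 <- ma(ma(y, order = 5), order = 3)
wma7 <- stats::filter(y, c(1, 2, 3, 3, 3, 2, 1) / 15)
max(abs(ma3x5 - wma7), na.rm = TRUE)  # effectively zero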
Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?
autoplot(plastics, main="Monthly Sales of Product A", xlab="Year", ylab="Sales (thousands of $)") +
theme(plot.title = element_text(hjust = 0.5))
There seems to be clear seasonality, with peaks just past the midpoint of each year and minimums at the end/very beginning of each year, while there is a general upward trend.
Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.
multdecomp.salesA <- decompose(plastics, type = "multiplicative")
autoplot(multdecomp.salesA)
Do the results support the graphical interpretation from part a?
Yes, we clearly see the mid-year peak/end year minimum seasonality trend in the seasonal component, and we can see the general upward trend represented in the trend component.
Compute and plot the seasonally adjusted data.
autoplot(plastics, series="Observed") +
autolayer(trendcycle(multdecomp.salesA), series="Trend") +
autolayer(seasadj(multdecomp.salesA), series="Seasonally Adjusted")
## Warning: Removed 12 row(s) containing missing values (geom_path).
Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
plastics1 = plastics
plastics1[51] = plastics1[51] + 500
multdecomp.outlier <- decompose(plastics1, type= "multiplicative")
autoplot(plastics1, series="Observed") +
autolayer(trendcycle(multdecomp.outlier), series="Trend") +
autolayer(seasadj(multdecomp.outlier), series="Seasonally Adjusted")
## Warning: Removed 12 row(s) containing missing values (geom_path).
The outlier seems to have a very localized effect. Generally, the trend and seasonally adjusted data do not change much except at points in close proximity to the outlier; a numerical comparison of the seasonal indices is sketched below.
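A hedged way to see this numerically is to compare the estimated seasonal indices with and without the outlier:
# Seasonal indices from decompose() before and after adding the outlier.
cbind(original = multdecomp.salesA$figure, with.outlier = multdecomp.outlier$figure)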
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
plastics2 <- plastics
plastics2[30] = plastics2[30] + 500
multdecomp.outlier1 <- decompose(plastics2, type= "multiplicative")
autoplot(plastics2, series="Observed") +
autolayer(trendcycle(multdecomp.outlier1), series="Trend") +
autolayer(seasadj(multdecomp.outlier1), series="Seasonally Adjusted")
## Warning: Removed 12 row(s) containing missing values (geom_path).
While the outliers tend to have significant local impacts, they do not seem to substantially impact the seasonal and trend data as a whole, regardless of the location of the outlier.
Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?
retail <- read.csv("tute1.csv")
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.0.5
time.retail <- ts(retail[,"Sales"], frequency=12, start=c(1982,4))
retail.x11 <- seas(time.retail, x11="")
autoplot(retail.x11, main="X11 Decomposition", xlab="Year")
There are some significant outliers around 1987 and towards the end of 1989. There is also a significant jump/plateau in the trend at this second outlier, which seems less like a “trend” and more like a one-off occurrence that has been partly absorbed into the trend component and partly into the remainder. Combining those components, this outlier/occurrence is quite significant.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
This decomposition is quite good, though that could be a consequence of the data. The trend component is a very smooth uptrend, growing from roughly 6,500 to 9,000 (likely thousands of persons) over 20 years. The seasonal component shows a yearly cycle in employment (which makes sense given fiscal-year/quarter-related layoffs), ranging from about -100 to 100 on the same scale; this yearly seasonality is also supported by the monthly subseries plot. However, there is a large remainder of about -300 in the main decomposition plot, coinciding with 1991/1992.
Is the recession of 1991/1992 visible in the estimated components?
The 1991/1992 recession is very visible in the remainder component of the decomposition. The seasonal and trend components appear quite “outlier robust”, as the remainder component embodies nearly all of this outlier event, which is what you want in your decomposition.
This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).
Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?
autoplot(cangas)
ggsubseriesplot(cangas)
ggseasonplot(cangas)
The changing seasonality could be the result of environmental factors. Changes in weather patterns in terms of seasons in the year could very much change the demand for gas at different times. The move towards other energy sources (possibly renewables) could also contribute to the changing seasonality. A move to solar, wind, or hydro, which perform better in certain seasons, could impact the seasonality.
Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.
stl.decomp.cangas <- stl(cangas, s.window = 12)
autoplot(stl.decomp.cangas)
Compare the results with those obtained using SEATS and X11. How are they different?
seas.decomp.cangas <- seas(cangas)
autoplot(seas.decomp.cangas)
x11.decomp.cangas <- seas(cangas, x11="")
autoplot(x11.decomp.cangas)
The SEATS and X11 decompositions generally show a smaller seasonal component in magnitude, whereas the STL decomposition shows a large seasonal contribution from roughly 1975 to 1990. Likewise, the largest seasonal components under the SEATS/X11 decompositions occur at the beginning of the series, whereas the STL decomposition suggests that period has a small seasonal contribution, with the largest contribution in the middle. An overlay of the three seasonal components is sketched below.
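A hedged overlay of the three seasonal components makes the comparison more concrete:
# Extract and overlay the seasonal components from each decomposition method.
autoplot(seasonal(stl.decomp.cangas), series = "STL") +
  autolayer(seasonal(seas.decomp.cangas), series = "SEATS") +
  autolayer(seasonal(x11.decomp.cangas), series = "X11")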
We will use the bricksq data (Australian quarterly clay brick production. 1956–1994) for this exercise.
Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)
stl.fixed <- stl(bricksq, s.window = "periodic", robust = TRUE)
autoplot(stl.fixed)
stl.changing = stl(bricksq, s.window = 13)
autoplot(stl.changing)
Compute and plot the seasonally adjusted data.
autoplot(seasadj(stl.changing))
Use a naïve method to produce forecasts of the seasonally adjusted data.
autoplot(seasadj(stl.changing)) +
autolayer(naive(seasadj(stl.changing)), series="Naïve", PI=FALSE)
Use stlf() to reseasonalise the results, giving forecasts for the original data.
brickstlf = stlf(bricksq)
autoplot(brickstlf)
Do the residuals look uncorrelated?
checkresiduals(brickstlf)
## Warning in checkresiduals(brickstlf): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 41.128, df = 6, p-value = 2.733e-07
##
## Model df: 2. Total lags used: 8
The residuals look roughly normal, but the Ljung-Box test (p ≈ 2.7e-07) and the spikes at lags 1 and 8 in the ACF indicate some remaining autocorrelation. It does not look severe, so the impact on the forecasts is probably modest, but the assumption of uncorrelated residuals does not fully hold.
Repeat with a robust STL decomposition. Does it make much difference?
stlf.robust = stlf(bricksq, robust = TRUE)
autoplot(stlf.robust)
checkresiduals(stlf.robust)
## Warning in checkresiduals(stlf.robust): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
##
## Model df: 2. Total lags used: 8
There’s no significant change in the residuals. They are still normal, though the autocorrelation around lags 5-8 seems more convincing than before from the ACF plot.
Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?
train <- subset(bricksq, end=length(bricksq) - 9)
test <- subset(bricksq, start=length(bricksq) - 8)
snaive.brick <- snaive(train)
stlf.brick <- stlf(train, robust=TRUE)
autoplot(snaive.brick)
snb.for <- forecast(snaive.brick, newdata = test)
snb.for
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1992 Q3 424 360.1325 487.8675 326.3231 521.6769
## 1992 Q4 430 366.1325 493.8675 332.3231 527.6769
## 1993 Q1 387 323.1325 450.8675 289.3231 484.6769
## 1993 Q2 413 349.1325 476.8675 315.3231 510.6769
## 1993 Q3 424 333.6777 514.3223 285.8640 562.1360
## 1993 Q4 430 339.6777 520.3223 291.8640 568.1360
## 1994 Q1 387 296.6777 477.3223 248.8640 525.1360
## 1994 Q2 413 322.6777 503.3223 274.8640 551.1360
autoplot(bricksq) + autolayer(snb.for, PI=FALSE, series = "Snaive")
autoplot(stlf.brick)
stlfb.for <- forecast(stlf.brick, newdata = test)
stlfb.for
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1992 Q3 420.8168 395.8276 445.8059 382.5991 459.0344
## 1992 Q4 394.8295 359.4696 430.1894 340.7512 448.9078
## 1993 Q1 346.8243 303.4916 390.1570 280.5527 413.0959
## 1993 Q2 413.0040 362.9375 463.0705 336.4339 489.5741
## 1993 Q3 420.8168 364.8067 476.8269 335.1567 506.4768
## 1993 Q4 394.8295 333.4361 456.2229 300.9363 488.7226
## 1994 Q1 346.8243 280.4713 413.1773 245.3461 448.3025
## 1994 Q2 413.0040 342.0262 483.9818 304.4529 521.5552
autoplot(bricksq) + autolayer(stlfb.for, PI=FALSE, series = "Stlf")
The snaive approach tracks the actual data more closely over this forecasting interval than the stlf approach, as can be seen visually above; a numerical check is sketched below.
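A hedged numerical comparison on the held-out data:
# Test-set accuracy of the seasonal naive and STLF forecasts.
accuracy(snb.for, test)
accuracy(stlfb.for, test)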
Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.
Given the consistent upward trend of the data with strong seasonality, I’ve gone with a random walk with drift method to best capture the “drift” term, without a Box-Cox transformation.
rwdrift.writing <- stlf(writing, method = 'rwdrift')
autoplot(rwdrift.writing)
Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.
Given the consistent upward trend of the data with strong seasonality, I’ve gone with a random walk with drift method to best capture the “drift” term. With that said, given the extremely seasonal nature of the data and its roughly exponential growth trend, a Box-Cox transformation is likely useful here.
rwdrift.fancy <- stlf(fancy, method='rwdrift', robust=TRUE, lambda = BoxCox.lambda(fancy))
autoplot(rwdrift.fancy)