Assignment1_Ch. 5 and 6 Esercises

##Chapter 5

Daily electricity demand for Victoria, Australia, during 2014 is contained in elecdaily. The data for the first 20 days can be obtained as follows.

library(fpp2)

## Loading required package: ggplot2

## Loading required package: forecast

## Loading required package: fma

## Loading required package: expsmooth

head(elecdaily,20)

## Time Series:
## Start = c(1, 1) 
## End = c(3, 6) 
## Frequency = 7 
##            Demand WorkDay Temperature
## 1.000000 174.8963       0        26.0
## 1.142857 188.5909       1        23.0
## 1.285714 188.9169       1        22.2
## 1.428571 173.8142       0        20.3
## 1.571429 169.5152       0        26.1
## 1.714286 195.7288       1        19.6
## 1.857143 199.9029       1        20.0
## 2.000000 205.3375       1        27.4
## 2.142857 228.0782       1        32.4
## 2.285714 258.5984       1        34.0
## 2.428571 201.7970       0        22.4
## 2.571429 187.6298       0        22.5
## 2.714286 254.6636       1        30.0
## 2.857143 322.2323       1        42.4
## 3.000000 343.9934       1        41.5
## 3.142857 347.6376       1        43.2
## 3.285714 332.9455       1        43.1
## 3.428571 219.7517       0        23.7
## 3.571429 186.9816       0        22.3
## 3.714286 228.4876       1        24.0

a.Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?

daily<-head(elecdaily,20)


dailyts<-ts(daily)

autoplot(dailyts)

tslmdaily<- tslm(Demand~Temperature, data=dailyts)
plot(Demand~Temperature, data=dailyts, main= "Electrical Demand vs. Temperature")

# There is a positive relationship between Demand and temperature. As temperature increases so does electrical demand. Likely due to increased air conditioning usage.

Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

checkresiduals(tslmdaily$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

# Residuals do not appear to be correlated so the model should be accurate. Outlier is seen at one point.

c.Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15∘ and compare it with the forecast if the with maximum temperature was 35∘.Do you believe these forecasts?

forecastdemand15<-forecast(tslmdaily,newdata=data.frame(Temperature=15))
forecastdemand15

##    Point Forecast   Lo 80    Hi 80    Lo 95    Hi 95
## 21       140.5701 108.681 172.4591 90.21166 190.9285

forecastdemand35<-forecast(tslmdaily,newdata=data.frame(Temperature=35))
forecastdemand35

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 21       275.7146 245.2278 306.2014 227.5706 323.8586

# The forecast seems reasonable since the forcasts are ot too far out of the reange of the data.

Give prediction intervals for your forecasts. The following R code will get you started:

autoplot(daily, facets=TRUE)

daily %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

fit <- tslm(Demand ~ Temperature, data=daily)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

forecast(fit, newdata=data.frame(Temperature=c(15,35)))

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 3.857143       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.000000       275.7146 245.2278 306.2014 227.57056 323.8586

# Using the provided code the forecast would be as follows: 15 degree -> (108.68, 172.45) at 80%. 35 degree -> (245.2278, 306.2014) at 80%. 15 degree -> (90.21,190.92) at 95% and 35 degree -> (227.57, 323.85) at 95%.

Plot Demand vs Temperature for all of the available data in elecdaily. What does this say about your model?

plot(Demand~Temperature, data=elecdaily, main= "Demand vs. Temperature for All Data in elecdaily")

# The model does not account for all data. Specifically when temperature is very low. Does not take into acount increased demand for electricity when cold outside( i.e. usage of heat).

Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.

Plot the winning time against the year. Describe the main features of the plot.

autoplot(mens400, xlab="Year", ylab="Winning Time", main="Winning Time for 400 mater but Year")

b. Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?

menstime<-time(mens400)
tslmmens<-(tslm(mens400~menstime, data=mens400))
tslmmens

## 
## Call:
## tslm(formula = mens400 ~ menstime, data = mens400)
## 
## Coefficients:
## (Intercept)     menstime  
##   172.48148     -0.06457

autoplot(mens400, ylab="Time", xlab="Year",main= "AB Line Fit to Time by Year")+
  geom_abline(slope = tslmmens$coefficients[2],
              intercept = tslmmens$coefficients[1])

# Winning times have been decreasing at a rate of 0.06457 seconds per year.

c.Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

checkresiduals(tslmmens$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

# The residuals plot seems to fit the model fairly well with a few exceptions.

Predict the winning time for the men’s 400 meters final in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

mens2020<- forecast(tslmmens,newdata=data.frame(menstime=2020))
mens2020

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2020       42.04231 40.44975 43.63487 39.55286 44.53176

mens2020$upper

## Time Series:
## Start = 2020 
## End = 2020 
## Frequency = 0.25 
##      Series 1 Series 2
## 2020 43.63487 44.53176

mens2020$lower

## Time Series:
## Start = 2020 
## End = 2020 
## Frequency = 0.25 
##      Series 1 Series 2
## 2020 40.44975 39.55286

# The projected winning time is 42.04. The 80% interval is 40.44 to 43.48 and the 95% interval is 49.55 to 44.53.

Type easter(ausbeer) and interpret what you see

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

# ausbeer shows austailian beer production. the easter function shows the quarter easter occurs in in a specific year as a vector of either 0, 1, or a fraction. A fraction is seen if the easter period of Good friday Easter sunday spans from March to April.

An elasticity coefficient is the ratio of the percentage change in the forecast variable (y) to the percentage change in the predictor variable (x). Mathematically, the elasticity is defined as (dy/dx)×(x/y).Consider the log-log model, logy=β0+β1logx+ε. Express y as a function of x and show that the coefficient β1 is the elasticity coefficient.

Since elasticity is equal to β1=(dy/dx)*(x/y) taking the inverse gives you β1(dx/x)=(dy/y).You can then take the integral to express in terms of y.

5.The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

autoplot(fancy, xlab= "Monthly Sales", ylab="Year")

head(fancy,30)

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1987  1664.81  2397.53  2840.71  3547.29  3752.96  3714.74  4349.61
## 1988  2499.81  5198.24  7225.14  4806.03  5900.88  4951.34  6179.12
## 1989  4717.02  5702.63  9957.58  5304.78  6492.43  6630.80         
##           Aug      Sep      Oct      Nov      Dec
## 1987  3566.34  5021.82  6423.48  7600.60 19756.21
## 1988  4752.15  5496.43  5835.10 12600.08 28541.72
## 1989

# The time series seems to show a standard seasonall fluctuation untill 1993. In 1993 and 1994 the seasonal spikes seem to be very high in the christmas season. Almost more than double what had been previously seen.

Explain why it is necessary to take logarithms of these data before fitting a model.

Due to the seasonal veriations it is necessary to take logarithms of the data. This will mitigate the exponential increase that we see in yeart after 1992.

Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

tslmfancy<-tslm(BoxCox(fancy,0)~season+trend)
tslmfancy

## 
## Call:
## tslm(formula = BoxCox(fancy, 0) ~ season + trend)
## 
## Coefficients:
## (Intercept)      season2      season3      season4      season5  
##     7.60586      0.25104      0.69521      0.38293      0.40799  
##     season6      season7      season8      season9     season10  
##     0.44696      0.60822      0.58535      0.66634      0.74403  
##    season11     season12        trend  
##     1.20302      1.95814      0.02239

Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

autoplot(tslmfancy$residuals)

# Residuals show a pattern over time based on the plot. This means the residuals are correlated with time. This is not ideal.

Do boxplots of the residuals for each month. Does this reveal any problems with the model?

Time<-time(fancy)
cbind.data.frame(
    Month = factor(
      month.abb[round(12*(Time - floor(Time)) + 1)],
      labels = month.abb,
      ordered = TRUE
    ),
    Residuals = tslmfancy$residuals
    ) %>%
  ggplot(aes(x = Month,
             y = Residuals)) +
    geom_boxplot()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

f. What do the values of the coefficients tell you about each variable?

tslmfancy$coefficients

## (Intercept)     season2     season3     season4     season5     season6 
##  7.60586042  0.25104367  0.69520658  0.38293405  0.40799437  0.44696253 
##     season7     season8     season9    season10    season11    season12 
##  0.60821562  0.58535238  0.66634464  0.74403359  1.20301639  1.95813657 
##       trend 
##  0.02239298

# The coefficients show a positive trend showing increasing sales over time. All seasons have positive coefficients. This indicates sales are lower earler in the ear and increasing as the year goes on.

What does the Breusch-Godfrey test tell you about your model?

checkresiduals(tslmfancy)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 37.152, df = 17, p-value = 0.003209

# The residuals are correlated with each other since the pvalue is less than 0.05.

Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

future_fancy <- rep(0, 36)
for(i in 1:36){
  if(i %% 12 == 3){
    future_fancy[i] <- 1
  }
}
# make future data as time series.
future_fancy <- ts(data = future_fancy,
                   start = 1994,
                   end = c(1996, 12),
                   frequency = 12)
# forecast
fctslmfancy <- forecast(tslmfancy,newdata = data.frame(Time = time(future_fancy)))
autoplot(fctslmfancy)

fctslmfancy$upper

##               [,1]      [,2]
## Jan 1994  9.771532  9.913525
## Feb 1994 10.044969 10.186962
## Mar 1994 10.511525 10.653518
## Apr 1994 10.221645 10.363638
## May 1994 10.269098 10.411091
## Jun 1994 10.330459 10.472452
## Jul 1994 10.514105 10.656099
## Aug 1994 10.513635 10.655628
## Sep 1994 10.617020 10.759014
## Oct 1994 10.717102 10.859095
## Nov 1994 11.198478 11.340471
## Dec 1994 11.975991 12.117984
## Jan 1995 10.043182 10.186763
## Feb 1995 10.316618 10.460200
## Mar 1995 10.783174 10.926756
## Apr 1995 10.493295 10.636876
## May 1995 10.540748 10.684330
## Jun 1995 10.602109 10.745691
## Jul 1995 10.785755 10.929337
## Aug 1995 10.785285 10.928867
## Sep 1995 10.888670 11.032252
## Oct 1995 10.988752 11.132334
## Nov 1995 11.470128 11.613710
## Dec 1995 12.247641 12.391223
## Jan 1996 10.315440 10.460940
## Feb 1996 10.588877 10.734376
## Mar 1996 11.055433 11.200932
## Apr 1996 10.765553 10.911053
## May 1996 10.813007 10.958506
## Jun 1996 10.874368 11.019867
## Jul 1996 11.058014 11.203513
## Aug 1996 11.057543 11.203043
## Sep 1996 11.160929 11.306428
## Oct 1996 11.261011 11.406510
## Nov 1996 11.742386 11.887886
## Dec 1996 12.519900 12.665399

fctslmfancy$lower

##               [,1]      [,2]
## Jan 1994  9.246995  9.105002
## Feb 1994  9.520432  9.378439
## Mar 1994  9.986988  9.844995
## Apr 1994  9.697108  9.555115
## May 1994  9.744562  9.602569
## Jun 1994  9.805923  9.663930
## Jul 1994  9.989569  9.847576
## Aug 1994  9.989099  9.847106
## Sep 1994 10.092484  9.950491
## Oct 1994 10.192566 10.050573
## Nov 1994 10.673942 10.531949
## Dec 1994 11.451455 11.309462
## Jan 1995  9.512777  9.369195
## Feb 1995  9.786214  9.642632
## Mar 1995 10.252770 10.109188
## Apr 1995  9.962890  9.819308
## May 1995 10.010343  9.866762
## Jun 1995 10.071704  9.928123
## Jul 1995 10.255351 10.111769
## Aug 1995 10.254880 10.111299
## Sep 1995 10.358266 10.214684
## Oct 1995 10.458347 10.314766
## Nov 1995 10.939723 10.796142
## Dec 1995 11.717236 11.573655
## Jan 1996  9.777950  9.632451
## Feb 1996 10.051387  9.905887
## Mar 1996 10.517943 10.372443
## Apr 1996 10.228063 10.082564
## May 1996 10.275516 10.130017
## Jun 1996 10.336878 10.191378
## Jul 1996 10.520524 10.375024
## Aug 1996 10.520053 10.374554
## Sep 1996 10.623439 10.477939
## Oct 1996 10.723521 10.578021
## Nov 1996 11.204896 11.059397
## Dec 1996 11.982409 11.836910

How could you improve these predictions by modifying the model?

You should be able to improve the predictions by using transfomations to beter handle to exponential growth in the trend.

6.The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.

a.Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

gas2004<-window(gasoline,end=2005)
autoplot(gas2004, ylab = "Gas Supply (Weekly)")

fourier.gas1 <- tslm(gas2004 ~ trend + fourier(gas2004, K=7))
fourier.gas2 <- tslm(gas2004 ~ trend + fourier(gas2004, K=12))
fourier.gas3 <- tslm(gas2004 ~ trend + fourier(gas2004, K=20))

autoplot(gas2004, ylab = "Gas Supply (Weekly)",main= "Fourier Transformation") +  autolayer(fitted(fourier.gas1))+
autolayer(fitted(fourier.gas2))+autolayer(fitted(fourier.gas3))

b.Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

CV(fourier.gas1)

##            CV           AIC          AICc           BIC         AdjR2 
##     0.0714734 -1913.5362459 -1912.6718391 -1835.5478956     0.8501073

CV(fourier.gas2)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.096945e-02 -1.919276e+03 -1.917110e+03 -1.795412e+03  8.532618e-01

CV(fourier.gas3)

##            CV           AIC          AICc           BIC         AdjR2 
##  7.223834e-02 -1.908017e+03 -1.902469e+03 -1.710753e+03  8.540588e-01

# Of the 3 different Harmonic regression models we tried our second model wher K=12 appears to be the best. While our 3rs model where K=20 has the highesr Adj R2 at 0.854 moded 2 is very close with an R2 of 0.853. Model 2 also has the largest AIC and CV compared to the other models.

c.Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)

checkresiduals(fourier.gas2$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

# We do see correlation of the residuals. There is a very small p-value as well.

d.To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.

fcgas <- forecast(fourier.gas2, newdata=data.frame(fourier(gas2004,12,52)))
fcgas

##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.713386 8.371087 9.055684 8.189474  9.237298
## 2005.033       8.642131 8.299720 8.984542 8.118047  9.166215
## 2005.052       8.656942 8.314540 8.999345 8.132871  9.181013
## 2005.071       8.661292 8.318899 9.003684 8.137236  9.185347
## 2005.090       8.686370 8.344109 9.028631 8.162515  9.210225
## 2005.110       8.784129 8.441982 9.126276 8.260448  9.307809
## 2005.129       8.895967 8.553823 9.238111 8.372291  9.419643
## 2005.148       8.937884 8.595756 9.280012 8.414232  9.461535
## 2005.167       8.940875 8.598761 9.282988 8.417246  9.464503
## 2005.186       8.988596 8.646477 9.330716 8.464958  9.512235
## 2005.205       9.062316 8.720191 9.404442 8.538669  9.585963
## 2005.225       9.073794 8.731673 9.415915 8.550153  9.597435
## 2005.244       9.031913 8.689800 9.374027 8.508284  9.555542
## 2005.263       9.041458 8.699342 9.383575 8.517824  9.565092
## 2005.282       9.122877 8.780756 9.464999 8.599236  9.646519
## 2005.301       9.169383 8.827263 9.511503 8.645744  9.693022
## 2005.320       9.132601 8.790487 9.474715 8.608970  9.656231
## 2005.340       9.120782 8.778667 9.462897 8.597150  9.644414
## 2005.359       9.221551 8.879431 9.563671 8.697912  9.745190
## 2005.378       9.335425 8.993306 9.677545 8.811787  9.859064
## 2005.397       9.316637 8.974522 9.658752 8.793006  9.840269
## 2005.416       9.214109 8.871994 9.556224 8.690478  9.737740
## 2005.435       9.218957 8.876838 9.561076 8.695320  9.742595
## 2005.455       9.383861 9.041741 9.725980 8.860222  9.907499
## 2005.474       9.545162 9.203046 9.887278 9.021530 10.068795
## 2005.493       9.559667 9.217553 9.901782 9.036037 10.083298
## 2005.512       9.487053 9.144935 9.829171 8.963416 10.010689
## 2005.531       9.464953 9.122834 9.807073 8.941315  9.988592
## 2005.550       9.508378 9.166261 9.850494 8.984744 10.032011
## 2005.570       9.534922 9.192808 9.877037 9.011292 10.058553
## 2005.589       9.521986 9.179869 9.864104 8.998351 10.045621
## 2005.608       9.522899 9.180779 9.865019 8.999260 10.046538
## 2005.627       9.548443 9.206326 9.890560 9.024808 10.072078
## 2005.646       9.536705 9.194591 9.878819 9.013075 10.060335
## 2005.665       9.451014 9.108897 9.793131 8.927380  9.974649
## 2005.685       9.330630 8.988509 9.672750 8.806990  9.854269
## 2005.704       9.229844 8.887726 9.571962 8.706208  9.753480
## 2005.723       9.170862 8.828748 9.512976 8.647232  9.694492
## 2005.742       9.168084 8.825968 9.510200 8.644451  9.691717
## 2005.761       9.230990 8.888869 9.573110 8.707349  9.754630
## 2005.780       9.320372 8.978252 9.662492 8.796734  9.844010
## 2005.800       9.363916 9.021801 9.706030 8.840285  9.887546
## 2005.819       9.342664 9.000548 9.684779 8.819032  9.866296
## 2005.838       9.304159 8.962036 9.646281 8.780516  9.827801
## 2005.857       9.267511 8.925389 9.609633 8.743869  9.791153
## 2005.876       9.194408 8.852292 9.536523 8.670776  9.718040
## 2005.895       9.100701 8.758587 9.442816 8.577070  9.624332
## 2005.915       9.104667 8.762540 9.446794 8.581017  9.628317
## 2005.934       9.265935 8.923806 9.608064 8.742282  9.789587
## 2005.953       9.436660 9.094538 9.778782 8.913018  9.960302
## 2005.972       9.400410 9.058293 9.742527 8.876776  9.924044
## 2005.991       9.147441 8.805233 9.489649 8.623668  9.671215

e.Plot the forecasts along with the actual data for 2005. What do you find?

gas2005<-window(gasoline,start=2005, end=2006)

autoplot(gas2005, series = "Actual")+ autolayer(fcgas$mean,series = "Forecast", ylab="Gas Supply (Weekly)", main="Actual vs. Forecast of Weekly US Gas Supply")

## Warning: Ignoring unknown parameters: ylab, main

# We see that the model roughtly follows the actual data except in times of outliers. We see a much sharper dip in the actual data than we do in the fitted forecast during times where there are large dips in supply.

Data set huron gives the water level of Lake Huron in feet from 1875 to 1972.

a.Plot the data and comment on its features.

autoplot(huron, xlab = "Lake Level", main="Lake Level of Lake Huron 1875-1972")

# We see that there appears to be some seasonal schanges in the lake level. This could be due to seasonal changes such as cold and iceing in the winter or an increase during the spring/summer rainy seasons. Overall we see a slight negative trend in the lake level over the full timespan of the data.

Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.

lmhuron<-tslm(huron~trend)

summary(lmhuron)

## 
## Call:
## tslm(formula = huron ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.202037   0.230111  44.335  < 2e-16 ***
## trend       -0.024201   0.004036  -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08

Time<-time(huron)
t2<-ts(pmax(0,Time-1915),start = 1875)
pwhuron<-tslm(huron~Time+t2)
summary(pwhuron)

## 
## Call:
## tslm(formula = huron ~ Time + t2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 132.90870   19.97687   6.653 1.82e-09 ***
## Time         -0.06498    0.01051  -6.181 1.58e-08 ***
## t2            0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

# We see a smaller residual standard error and a larger adjusted R2 with the piecewise model than we do with the other linear model.

##CHAPTER 6

Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

3x5 MA is taking the average for the first five terms and then getting the moving average of each three observation set. Since a 3X5 MA will lead to a 1/15(y + 2y + 3y + 3y + 3y + 2y + y). So the 7th term would be 1/15 or 0.067.

The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics)

# Plot shows a seasonal trend in plastics sales with a slightly upward trend over time.

Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

decompose_plastics<-decompose(plastics, type= "multiplicative")
autoplot(decompose_plastics)

c. Do the results support the graphical interpretation from part a?

The results from part b do support the interpritation from part a.

Compute and plot the seasonally adjusted data.

plasticseasonal<-decompose_plastics$x/decompose_plastics$seasonal
autoplot(plasticseasonal)

Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

plasticoutlier <- plastics
plasticoutlier[20] <- plasticoutlier[20] + 500
decomposeoutlier <- decompose(plasticoutlier, "multiplicative")
plasticoutlieradj <- decomposeoutlier$x / decomposeoutlier$seasonal
plot(plasticoutlieradj)

# You can see the point at which the outlier was added as it appears as a large peak in the data.

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

The placement of the outlier should not matter since the seasonal componet is divided out in the adjusted plot.

Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

library(seasonal)
retaildata<-readxl::read_excel("retail.xlsx", skip=1)
head(retaildata)

## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, …

retailts <- ts(retaildata[,"A3349627V"],
  frequency=12, start=c(1982,4))

X11decomp<-seas(retailts, x11 = "")

autoplot(X11decomp)

# The decomposition does reveal outliers whcij can be most clearly seen in the remainder graph. We see these outliers pop up towards the last month of 1988 and 1989.

Figures 6.16 and 6.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Figures 6.16 an 6.17 show small amounts of seasonality in the civilan labor force in Australia and an overall upward trend over time. We can see that during the early ’90’s this trend seemed to flatten for a period of time. We can also see a significan dip in the plot of the data during this timeframe indicating there was a drop of the overall amount of workers and not just a stagnation in growth. The remainder plot also shows this same point in time as having a decrease with the large negative remainder. A decrase of this size would likely be due to a large event like a recession in the country.

Is the recession of 1991/1992 visible in the estimated components?

You can see the recession in the seasonally adjusted components of the data as a whole as well as when broken into the seasonal components as most months have a large dip and then recovery.

This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

autoplot(cangas)

ggsubseriesplot(cangas)

ggseasonplot(cangas)

# Production appears to decrease during the summer months. This could be due to the less demand for heaters in the summer time and lack of AC need for the Canadian climate so gas production was not as necessary.The increase overtime could be due to the increase in the useage of electricity over time with the growth of modern technology.

Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

cangasstl<-stl(cangas, s.window = 12)

autoplot(cangasstl)

# The overall increase in gas production can be seen. The highest leverl of seasonality can be observed in the 1980's.

Compare the results with those obtained using SEATS and X11. How are they different?

cangasts <- ts(cangas,frequency=12, start=c(1960,45))

X11cangas<-seas(cangasts, x11 = "")

autoplot(X11cangas)

# With SEATS and X11 we see more variation in the seasonality than was observed when using STL. The higher levels of seasonality before 1980 were not observed in STL. We also do not see thevariation in the remainders in the STL during the pre 1980 time period that is observed in SEATS.

6.We will use the bricksq data (Australian quarterly clay brick production. 1956–1994) for this exercise.

a.Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

brickstl1<-stl(bricksq, s.window = "periodic", t.window = 12)
autoplot(brickstl1)

brickstl2<-stl(bricksq,s.window=40, t.window= 12)
autoplot(brickstl2)

brickstl3<-stl(bricksq,s.window=12)
autoplot(brickstl3)

b. Compute and plot the seasonally adjusted data.

bricksea<-seasadj(brickstl1)
autoplot(bricksea, ylab = "Brick Production", xlab = "Time",main = "Seasonal Adjusted Brick Production")

c.Use a naïve method to produce forecasts of the seasonally adjusted data.

naivebrick<-naive(bricksea)
autoplot(naivebrick)

Use stlf() to reseasonalise the results, giving forecasts for the original data.

brickstlf<-stlf(bricksq,method="naive")
brickstlf_fc<-forecast(brickstlf)
brickstlf_fc

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7338 440.1878 491.2798 426.6645 504.8030
## 1995 Q1       424.6739 388.5464 460.8014 369.4217 479.9261
## 1995 Q2       483.9926 439.7456 528.2395 416.3227 551.6624
## 1995 Q3       494.0000 442.9080 545.0920 415.8616 572.1384
## 1995 Q4       465.7338 408.6112 522.8563 378.3723 553.0952
## 1996 Q1       424.6739 362.0993 487.2485 328.9742 520.3736
## 1996 Q2       483.9926 416.4042 551.5809 380.6251 587.3600
## 1996 Q3       494.0000 421.7450 566.2550 383.4956 604.5044

Do the residuals look uncorrelated?

checkresiduals(brickstlf_fc$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

# Slight correlation shown in residuals on ACF at 4 and 8. Slightly able to see in residuals graph as well. This should not be an issue for model though.

Repeat with a robust STL decomposition. Does it make much difference?

brickstl<-stl(bricksq,t.window=12, s.window="periodic", robust=TRUE)
brickstl1<-seasadj(brickstl)
brickstl2<-naive(brickstl1)
brickstl_fc<-forecast(brickstl2)
brickstl_fc

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       471.7973 442.1625 501.4322 426.4748 517.1199
## 1995 Q1       471.7973 429.8874 513.7073 407.7015 535.8931
## 1995 Q2       471.7973 420.4683 523.1264 393.2963 550.2983
## 1995 Q3       471.7973 412.5277 531.0670 381.1522 562.4425
## 1995 Q4       471.7973 405.5318 538.0628 370.4530 573.1417
## 1996 Q1       471.7973 399.2071 544.3876 360.7802 582.8145
## 1996 Q2       471.7973 393.3909 550.2037 351.8851 591.7096
## 1996 Q3       471.7973 387.9774 555.6173 343.6058 599.9889
## 1996 Q4       471.7973 382.8928 560.7018 335.8296 607.7650
## 1997 Q1       471.7973 378.0838 565.5109 328.4748 615.1199

checkresiduals(brickstl_fc$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

# The results are fairly similar whether forecasting using robust to when not using robust.

Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

bricksq

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  189  204  208  197
## 1957  187  214  227  223
## 1958  199  229  249  234
## 1959  208  253  267  255
## 1960  242  268  290  277
## 1961  241  253  265  236
## 1962  229  265  275  258
## 1963  231  263  308  313
## 1964  293  328  349  340
## 1965  309  349  366  340
## 1966  302  350  362  337
## 1967  326  358  359  357
## 1968  341  380  404  409
## 1969  383  417  454  428
## 1970  386  428  434  417
## 1971  385  433  453  436
## 1972  399  461  476  477
## 1973  452  461  534  516
## 1974  478  526  518  417
## 1975  340  437  459  449
## 1976  424  501  540  533
## 1977  457  513  522  478
## 1978  421  487  470  482
## 1979  458  526  573  563
## 1980  513  551  589  564
## 1981  519  581  581  578
## 1982  500  560  512  412
## 1983  303  409  420  413
## 1984  400  469  482  484
## 1985  447  507  533  503
## 1986  443  503  505  443
## 1987  415  485  495  458
## 1988  427  519  555  539
## 1989  511  572  570  526
## 1990  472  524  497  460
## 1991  373  436  424  430
## 1992  387  413  451  420
## 1993  394  462  476  443
## 1994  421  472  494

bricksq.train<-window(bricksq, end=c(1992,3))
bricksq.test<-window(bricksq, start=c(1992,4), end=c(1994,3))
bricksq.train

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  189  204  208  197
## 1957  187  214  227  223
## 1958  199  229  249  234
## 1959  208  253  267  255
## 1960  242  268  290  277
## 1961  241  253  265  236
## 1962  229  265  275  258
## 1963  231  263  308  313
## 1964  293  328  349  340
## 1965  309  349  366  340
## 1966  302  350  362  337
## 1967  326  358  359  357
## 1968  341  380  404  409
## 1969  383  417  454  428
## 1970  386  428  434  417
## 1971  385  433  453  436
## 1972  399  461  476  477
## 1973  452  461  534  516
## 1974  478  526  518  417
## 1975  340  437  459  449
## 1976  424  501  540  533
## 1977  457  513  522  478
## 1978  421  487  470  482
## 1979  458  526  573  563
## 1980  513  551  589  564
## 1981  519  581  581  578
## 1982  500  560  512  412
## 1983  303  409  420  413
## 1984  400  469  482  484
## 1985  447  507  533  503
## 1986  443  503  505  443
## 1987  415  485  495  458
## 1988  427  519  555  539
## 1989  511  572  570  526
## 1990  472  524  497  460
## 1991  373  436  424  430
## 1992  387  413  451

bricksq.test

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1992                 420
## 1993  394  462  476  443
## 1994  421  472  494

bricksq_naive<-snaive(bricksq.train)
bricksq_stlf<-stlf(bricksq.train)
bricksq.fc_naive<-forecast(bricksq_naive,h=8)
bricksq.fc_stlf<-forecast(bricksq_stlf,h=8)
bricksq.fc_stlf

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q4       423.9034 397.7584 450.0484 383.9181 463.8887
## 1993 Q1       378.9095 341.9154 415.9036 322.3319 435.4871
## 1993 Q2       441.9958 396.6620 487.3296 372.6637 511.3279
## 1993 Q3       450.9971 398.6202 503.3739 370.8936 531.1006
## 1993 Q4       423.9034 365.3107 482.4961 334.2936 513.5132
## 1994 Q1       378.9095 314.6874 443.1316 280.6903 477.1287
## 1994 Q2       441.9958 372.5879 511.4036 335.8457 548.1459
## 1994 Q3       450.9971 376.7541 525.2401 337.4523 564.5419

bricksq.fc_naive

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1992 Q4            430 366.2905 493.7095 332.5647 527.4353
## 1993 Q1            387 323.2905 450.7095 289.5647 484.4353
## 1993 Q2            413 349.2905 476.7095 315.5647 510.4353
## 1993 Q3            451 387.2905 514.7095 353.5647 548.4353
## 1993 Q4            430 339.9011 520.0989 292.2056 567.7944
## 1994 Q1            387 296.9011 477.0989 249.2056 524.7944
## 1994 Q2            413 322.9011 503.0989 275.2056 550.7944
## 1994 Q3            451 360.9011 541.0989 313.2056 588.7944

# The STLF forecast seems to be a bit more accurate especually when looking at the 80 and 95 % intervals.

Use stlf() to produce forecasts of the writing series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(writing)

hist(writing)

stlfwriting <- stlf(writing, s.window = "periodic", robust = TRUE,method = "naive")

autoplot(stlfwriting)

8. Use stlf() to produce forecasts of the fancy series with either method=“naive” or method=“rwdrift”, whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(fancy)

hist(fancy)

stlffancy<-stlf(fancy, s.window = "period", ,lambda = BoxCox.lambda(fancy),method= "rwdrift", robust = TRUE)

autoplot(stlffancy)

Assignment1_Ch. 5 and 6 Esercises

Charles Baumgartner

11/10/2019