Assignment 1. Chapter 5 (P1 ~ P7)

library(fpp2)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## ── Attaching packages ──────────────────────────────────────────────────────────────────── fpp2 2.4 ──

## ✓ ggplot2   3.3.2     ✓ fma       2.4  
## ✓ forecast  8.13      ✓ expsmooth 2.3

##

daily20 = head(elecdaily, 20)
daily20

## Time Series:
## Start = c(1, 4) 
## End = c(4, 2) 
## Frequency = 7 
##            Demand WorkDay Temperature
## 1.428571 174.8963       0        26.0
## 1.571429 188.5909       1        23.0
## 1.714286 188.9169       1        22.2
## 1.857143 173.8142       0        20.3
## 2.000000 169.5152       0        26.1
## 2.142857 195.7288       1        19.6
## 2.285714 199.9029       1        20.0
## 2.428571 205.3375       1        27.4
## 2.571429 228.0782       1        32.4
## 2.714286 258.5984       1        34.0
## 2.857143 201.7970       0        22.4
## 3.000000 187.6298       0        22.5
## 3.142857 254.6636       1        30.0
## 3.285714 322.2323       1        42.4
## 3.428571 343.9934       1        41.5
## 3.571429 347.6376       1        43.2
## 3.714286 332.9455       1        43.1
## 3.857143 219.7517       0        23.7
## 4.000000 186.9816       0        22.3
## 4.142857 228.4876       1        24.0

1a.Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?

autoplot(daily20)

regmodel = tslm(Demand ~ Temperature, data = daily20)

summary(regmodel)

## 
## Call:
## tslm(formula = Demand ~ Temperature, data = daily20)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.060  -7.117  -1.437  17.484  27.102 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.2117    17.9915   2.179   0.0428 *  
## Temperature   6.7572     0.6114  11.052 1.88e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22 on 18 degrees of freedom
## Multiple R-squared:  0.8716, Adjusted R-squared:  0.8644 
## F-statistic: 122.1 on 1 and 18 DF,  p-value: 1.876e-09

Since there is a positive coefficient before temperature as indicated int he regression model, we can concluded that the temperature is positvely realted to the demand.

1b.Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

residual = resid(regmodel)
plot(fitted(regmodel, residual))

qqnorm(residual)
qqline(residual)

From the plot,we can see that the model is adequate, and there are outliers shown fromt the qqplot with best fitted line.

1c. Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was
15∘and compare it with the forecast if the with maximum temperature was 35∘. Do you believe these forecasts?

forecastmodel = forecast(regmodel, newdata = data.frame(Temperature = c(15,35)))

forecastmodel

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

1d.Give prediction intervals for your forecasts. The following R code will get you started:

autoplot(daily20, facets=TRUE)

daily20 %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

fit <- tslm(Demand ~ Temperature, data=daily20)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

forecast(fit, newdata=data.frame(Temperature=c(15,35)))

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 4.285714       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.428571       275.7146 245.2278 306.2014 227.57056 323.8586

forecastmodel$lower[,1]

## Time Series:
## Start = c(4, 3) 
## End = c(4, 4) 
## Frequency = 7 
## [1] 108.6810 245.2278

forecastmodel$upper[,1]

## Time Series:
## Start = c(4, 3) 
## End = c(4, 4) 
## Frequency = 7 
## [1] 172.4591 306.2014

forecastmodel$lower[,2]

## Time Series:
## Start = c(4, 3) 
## End = c(4, 4) 
## Frequency = 7 
## [1]  90.21166 227.57056

forecastmodel$upper[,2]

## Time Series:
## Start = c(4, 3) 
## End = c(4, 4) 
## Frequency = 7 
## [1] 190.9285 323.8586

1e.Plot Demand vs Temperature for all of the available data in elecdaily. What does this say about your model?

elecdaily %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  ylab("Electricity Demand") +
  xlab("Temperature") +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

Form thisplot, we could tell that the datd set of the model might be collected partially form the total data, which points out that the model might not be accurate or convincing enough to represent the total data group,since we can see a quite dispersive distribution of data in the plot.

Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.

2a.Plot the winning time against the year. Describe the main features of the plot.

autoplot(mens400)

The main feature of this plot is that there are difeinitly missing values and overall, there is a declining or decreasing trend presented

2b.Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?

time_mens400 = time(mens400)
lm_mens400 = tslm(mens400 ~ time_mens400, data = mens400)
coefficients(lm_mens400)

##  (Intercept) time_mens400 
## 172.48147736  -0.06457385

print(" The winning times have been decreasing at 0.06457385 average rate per year ")

## [1] " The winning times have been decreasing at 0.06457385 average rate per year "

2c.Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

cbind(Time = time_mens400, Residuals = lm_mens400$residuals) %>%
  as.data.frame() %>%
  ggplot(aes(x = Time, y = Residuals)) +
  geom_point() +
   geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (stat_smooth).

## Warning: Removed 3 rows containing missing values (geom_point).

  ylab("Residuals of Regression Line(Unit:s)")

## $y
## [1] "Residuals of Regression Line(Unit:s)"
## 
## attr(,"class")
## [1] "labels"

The residual plost shows that the suitability of the fitted line is good, meaning that the regression model fits the model pretty well.

2d.Predict the winning time for the men’s 400 meters final in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

lm2_mens400 <- lm(mens400 ~ time_mens400,data = mens400,na.action = na.exclude)
forecastmens400 <- forecast(lm2_mens400,newdata = data.frame(time_mens400 = 2020))
autoplot(mens400) +
  autolayer(forecastmens400, PI = TRUE)

forecastmens400$upper

##          [,1]     [,2]
## [1,] 43.63487 44.53176

forecastmens400$lower

##          [,1]     [,2]
## [1,] 40.44975 39.55286

3.Type easter(ausbeer) and interpret what you see.

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

str(ausbeer)

##  Time-Series [1:218] from 1956 to 2010: 284 213 227 308 262 228 236 320 272 233 ...

The interpretation from “help” says that “easter” means easter holidays in each season ,and “ausber” means quarterly Australian Beer Production and it starts from 1st qurter of 1956 till the 2nd quarter of 2020, with a total of 218 data points. Also the reurns of “easter” function is a vector of 0’s or 1’s or other fractional values if Easter is observed in time period in between March and April.

4.An elasticity coefficient is the ratio of the percentage change in the forecast variable ( y) to the percentage change in the predictor variable (x). Mathematically, the elasticity is defined as
(dy/dx)×(x/y) . Consider the log-log model,logy=β0+β1logx+ε. Express y as a function of x and show that the coefficient β1 is the elasticity coefficient.

Since elasticity is equal to β1=(dy/dx)(x/y), then β1(dx/x)=(dy/y), so that y =(1/β1)(dydx / x), there for we can concluded that the coeddicient β1 is the elasticity of coefficient.

5.The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

a.Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

autoplot(fancy, xlab= "Year", ylab="Monthly sales")

fancy

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1987   1664.81   2397.53   2840.71   3547.29   3752.96   3714.74   4349.61
## 1988   2499.81   5198.24   7225.14   4806.03   5900.88   4951.34   6179.12
## 1989   4717.02   5702.63   9957.58   5304.78   6492.43   6630.80   7349.62
## 1990   5921.10   5814.58  12421.25   6369.77   7609.12   7224.75   8121.22
## 1991   4826.64   6470.23   9638.77   8821.17   8722.37  10209.48  11276.55
## 1992   7615.03   9849.69  14558.40  11587.33   9332.56  13082.09  16732.78
## 1993  10243.24  11266.88  21826.84  17357.33  15997.79  18601.53  26155.15
##            Aug       Sep       Oct       Nov       Dec
## 1987   3566.34   5021.82   6423.48   7600.60  19756.21
## 1988   4752.15   5496.43   5835.10  12600.08  28541.72
## 1989   8176.62   8573.17   9690.50  15151.84  34061.01
## 1990   7979.25   8093.06   8476.70  17914.66  30114.41
## 1991  12552.22  11637.39  13606.89  21822.11  45060.69
## 1992  19888.61  23933.38  25391.35  36024.80  80721.71
## 1993  28586.52  30505.41  30821.33  46634.38 104660.67

We notice that there is a seasonal spike from 1993 to 1994 during the usually christmas season, comparing to the amount of monthly sales before 1993, which overally presents a standard seasnal fluctuation. Even though we also notice that, the monthly sales usually sharpenly increases from January to December, during the spike, the amount of monthly sales is almost double times greater than usual.

5b.Explain why it is necessary to take logarithms of these data before fitting a model.

It is necessary to take logrithms of these data, so that we could keep the same seasonal pattern size since the seasonal pattern varies in a certain proportion and presents exponential increase after 1992.

5c.Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

time_fancy <- time(fancy)
surfing_festival <- c()
for(i in 1:length(time_fancy)){
  month <- round(12*(time_fancy[i] - floor(time_fancy[i]))) + 1
  year <- floor(time_fancy[i])
  if(year >= 1988 & month == 3){
    surfing_festival[i] <- 1
  } else {
    surfing_festival[i] <- 0
  }
}

tslm_logfancy = tslm(BoxCox(fancy, 0) ~ trend + season + surfing_festival)

5d.Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

autoplot(tslm_logfancy$residuals)

We see from the plot that the redisuals have patterns over time, meaning that the residuals is correlated with time

5e.Do boxplots of the residuals for each month. Does this reveal any problems with the model?

Timefancy<-time(fancy)
cbind.data.frame(
    Month = factor(
      month.abb[round(12*(Timefancy - floor(Timefancy)) + 1)],
      labels = month.abb,
      ordered = TRUE
    ),
    Residuals = tslm_logfancy$residuals
    ) %>%
  ggplot(aes(x = Month,
             y = Residuals)) +
    geom_boxplot()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

5f.What do the values of the coefficients tell you about each variable?

summary(tslm_logfancy)

## 
## Call:
## tslm(formula = BoxCox(fancy, 0) ~ trend + season + surfing_festival)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.6196670  0.0742471 102.626  < 2e-16 ***
## trend            0.0220198  0.0008268  26.634  < 2e-16 ***
## season2          0.2514168  0.0956790   2.628 0.010555 *  
## season3          0.2660828  0.1934044   1.376 0.173275    
## season4          0.3840535  0.0957075   4.013 0.000148 ***
## season5          0.4094870  0.0957325   4.277 5.88e-05 ***
## season6          0.4488283  0.0957647   4.687 1.33e-05 ***
## season7          0.6104545  0.0958039   6.372 1.71e-08 ***
## season8          0.5879644  0.0958503   6.134 4.53e-08 ***
## season9          0.6693299  0.0959037   6.979 1.36e-09 ***
## season10         0.7473919  0.0959643   7.788 4.48e-11 ***
## season11         1.2067479  0.0960319  12.566  < 2e-16 ***
## season12         1.9622412  0.0961066  20.417  < 2e-16 ***
## surfing_festival 0.5015151  0.1964273   2.553 0.012856 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16

coefficients(tslm_logfancy)

##      (Intercept)            trend          season2          season3 
##       7.61966701       0.02201983       0.25141682       0.26608280 
##          season4          season5          season6          season7 
##       0.38405351       0.40948697       0.44882828       0.61045453 
##          season8          season9         season10         season11 
##       0.58796443       0.66932985       0.74739195       1.20674790 
##         season12 surfing_festival 
##       1.96224123       0.50151509

We see that all variables have positive corrlation with time, and the trend says that the sale increases with time; the surfing _festival also positivly influences the sales. Also, we could see that sales of every January is greater that previous January.

5g.What does the Breusch-Godfrey test tell you about your model?

checkresiduals(tslm_logfancy)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 37.954, df = 17, p-value = 0.002494

A pvalue of 0.002494, is much smaller than 0.05, meaning that the model suffers from siginificant corrolation in residuals; the residuals are corrolated with each other.

5h.Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts. 5i.Transform your predictions and intervals to obtain predictions and intervals for the raw data.

#5h and 5i are answered together here

Next3yrs = rep(0, 36)
for(i in 1:36){
  if(i %% 12 == 3){
    Next3yrs[i] <- 1
  }
}


Next3yrs = ts(data = Next3yrs,
                   start = 1994, end = c(1996, 12),
                   frequency = 12) # Create Time series from 1994 to 1996


froecasttslm_logfancy = forecast(tslm_logfancy,
                             newdata = data.frame(Time = time(Next3yrs),
                                                  surfing_festival = Next3yrs))

autoplot(froecasttslm_logfancy)

froecasttslm_logfancy$upper

##               [,1]     [,2]
## Jan 1994  9.744183  9.88111
## Feb 1994 10.017620 10.15455
## Mar 1994 10.557120 10.69475
## Apr 1994 10.194296 10.33122
## May 1994 10.241749 10.37868
## Jun 1994 10.303110 10.44004
## Jul 1994 10.486756 10.62368
## Aug 1994 10.486286 10.62321
## Sep 1994 10.589671 10.72660
## Oct 1994 10.689753 10.82668
## Nov 1994 11.171129 11.30806
## Dec 1994 11.948642 12.08557
## Jan 1995 10.011336 10.14984
## Feb 1995 10.284773 10.42328
## Mar 1995 10.823938 10.96297
## Apr 1995 10.461449 10.59996
## May 1995 10.508903 10.64741
## Jun 1995 10.570264 10.70877
## Jul 1995 10.753910 10.89242
## Aug 1995 10.753440 10.89195
## Sep 1995 10.856825 10.99533
## Oct 1995 10.956907 11.09541
## Nov 1995 11.438282 11.57679
## Dec 1995 12.215796 12.35430
## Jan 1996 10.279093 10.41951
## Feb 1996 10.552530 10.69294
## Mar 1996 11.091365 11.23212
## Apr 1996 10.729206 10.86962
## May 1996 10.776659 10.91707
## Jun 1996 10.838021 10.97843
## Jul 1996 11.021667 11.16208
## Aug 1996 11.021196 11.16161
## Sep 1996 11.124582 11.26499
## Oct 1996 11.224664 11.36508
## Nov 1996 11.706039 11.84645
## Dec 1996 12.483552 12.62396

froecasttslm_logfancy$lower

##               [,1]      [,2]
## Jan 1994  9.238522  9.101594
## Feb 1994  9.511959  9.375031
## Mar 1994 10.048860  9.911228
## Apr 1994  9.688635  9.551707
## May 1994  9.736088  9.599161
## Jun 1994  9.797449  9.660522
## Jul 1994  9.981095  9.844168
## Aug 1994  9.980625  9.843698
## Sep 1994 10.084010  9.947083
## Oct 1994 10.184092 10.047165
## Nov 1994 10.665468 10.528541
## Dec 1994 11.442981 11.306054
## Jan 1995  9.499844  9.361338
## Feb 1995  9.773281  9.634775
## Mar 1995 10.310518 10.171489
## Apr 1995  9.949957  9.811451
## May 1995  9.997411  9.858904
## Jun 1995 10.058772  9.920265
## Jul 1995 10.242418 10.103911
## Aug 1995 10.241948 10.103441
## Sep 1995 10.345333 10.206826
## Oct 1995 10.445415 10.306908
## Nov 1995 10.926791 10.788284
## Dec 1995 11.704304 11.565797
## Jan 1996  9.760564  9.620151
## Feb 1996 10.034000  9.893588
## Mar 1996 10.571566 10.430810
## Apr 1996 10.210677 10.070264
## May 1996 10.258130 10.117718
## Jun 1996 10.319491 10.179079
## Jul 1996 10.503137 10.362725
## Aug 1996 10.502667 10.362254
## Sep 1996 10.606052 10.465640
## Oct 1996 10.706134 10.565722
## Nov 1996 11.187510 11.047097
## Dec 1996 11.965023 11.824611

5j.How could you improve these predictions by modifying the model? We could improve the precisions by applying data transformations such as exponential transformation to better handle the responding exponential growth over time.

6.The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.

6a.Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

head(gasoline)

## Time Series:
## Start = 1991.1 
## End = 1991.19582477755 
## Frequency = 52.1785714285714 
## [1] 6.621 6.433 6.582 7.224 6.875 6.947

gasoline2004 <- window(gasoline, end = 2005)
autoplot(gasoline2004, xlab = "Year", ylab = "Gas Supply (Weekly)")

fourier.gas1 <- tslm(gasoline2004 ~ trend + fourier(gasoline2004, K=8))
fourier.gas2 <- tslm(gasoline2004 ~ trend + fourier(gasoline2004, K=12))
fourier.gas3 <- tslm(gasoline2004 ~ trend + fourier(gasoline2004, K=20))
fourier.gas4 <- tslm(gasoline2004 ~ trend + fourier(gasoline2004, K=22))
fourier.gas5 <- tslm(gasoline2004 ~ trend + fourier(gasoline2004, K=24))

autoplot(gasoline2004,xlab = "Year", ylab = "Gas Supply (Weekly)",main= "Fourier Transformation") +  autolayer(fitted(fourier.gas1))+
autolayer(fitted(fourier.gas2))+autolayer(fitted(fourier.gas3))+autolayer(fitted(fourier.gas4))+autolayer(fitted(fourier.gas5))

From the plot, the fitted line becomes more similar to the original data as more Fourier pairs are used

6b.Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

CV(fourier.gas1)#8

##            CV           AIC          AICc           BIC         AdjR2 
##  7.147431e-02 -1.913619e+03 -1.912542e+03 -1.826455e+03  8.505267e-01

CV(fourier.gas2)#12

##            CV           AIC          AICc           BIC         AdjR2 
##  7.096945e-02 -1.919276e+03 -1.917110e+03 -1.795412e+03  8.532618e-01

CV(fourier.gas3)#20

##            CV           AIC          AICc           BIC         AdjR2 
##  7.223834e-02 -1.908017e+03 -1.902469e+03 -1.710753e+03  8.540588e-01

CV(fourier.gas4)#22

##            CV           AIC          AICc           BIC         AdjR2 
##  7.298625e-02 -1.901053e+03 -1.894398e+03 -1.685438e+03  8.534097e-01

CV(fourier.gas5)#24

##            CV           AIC          AICc           BIC         AdjR2 
##  7.356848e-02 -1.895870e+03 -1.888001e+03 -1.661905e+03  8.531134e-01

I will select Fourier 2nd model, since it has a relactily largest CV and AIC comparing to other models.

6c.Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)

checkresiduals(fourier.gas2$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

6d.To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.

fc <- forecast(fit, newdata=data.frame(fourier(x,K,h))) where fit is the fitted model using tslm(), K is the number of Fourier terms used in creating fit, and h is the forecast horizon required.

Forecast the next year of data.

fc_gasoline2005 <- forecast(fourier.gas2, newdata=data.frame(fourier(gasoline2004,12,52)))
fc_gasoline2005

##          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2005.014       8.713386 8.371087 9.055684 8.189474  9.237298
## 2005.033       8.642131 8.299720 8.984542 8.118047  9.166215
## 2005.052       8.656942 8.314540 8.999345 8.132871  9.181013
## 2005.071       8.661292 8.318899 9.003684 8.137236  9.185347
## 2005.090       8.686370 8.344109 9.028631 8.162515  9.210225
## 2005.110       8.784129 8.441982 9.126276 8.260448  9.307809
## 2005.129       8.895967 8.553823 9.238111 8.372291  9.419643
## 2005.148       8.937884 8.595756 9.280012 8.414232  9.461535
## 2005.167       8.940875 8.598761 9.282988 8.417246  9.464503
## 2005.186       8.988596 8.646477 9.330716 8.464958  9.512235
## 2005.205       9.062316 8.720191 9.404442 8.538669  9.585963
## 2005.225       9.073794 8.731673 9.415915 8.550153  9.597435
## 2005.244       9.031913 8.689800 9.374027 8.508284  9.555542
## 2005.263       9.041458 8.699342 9.383575 8.517824  9.565092
## 2005.282       9.122877 8.780756 9.464999 8.599236  9.646519
## 2005.301       9.169383 8.827263 9.511503 8.645744  9.693022
## 2005.320       9.132601 8.790487 9.474715 8.608970  9.656231
## 2005.340       9.120782 8.778667 9.462897 8.597150  9.644414
## 2005.359       9.221551 8.879431 9.563671 8.697912  9.745190
## 2005.378       9.335425 8.993306 9.677545 8.811787  9.859064
## 2005.397       9.316637 8.974522 9.658752 8.793006  9.840269
## 2005.416       9.214109 8.871994 9.556224 8.690478  9.737740
## 2005.435       9.218957 8.876838 9.561076 8.695320  9.742595
## 2005.455       9.383861 9.041741 9.725980 8.860222  9.907499
## 2005.474       9.545162 9.203046 9.887278 9.021530 10.068795
## 2005.493       9.559667 9.217553 9.901782 9.036037 10.083298
## 2005.512       9.487053 9.144935 9.829171 8.963416 10.010689
## 2005.531       9.464953 9.122834 9.807073 8.941315  9.988592
## 2005.550       9.508378 9.166261 9.850494 8.984744 10.032011
## 2005.570       9.534922 9.192808 9.877037 9.011292 10.058553
## 2005.589       9.521986 9.179869 9.864104 8.998351 10.045621
## 2005.608       9.522899 9.180779 9.865019 8.999260 10.046538
## 2005.627       9.548443 9.206326 9.890560 9.024808 10.072078
## 2005.646       9.536705 9.194591 9.878819 9.013075 10.060335
## 2005.665       9.451014 9.108897 9.793131 8.927380  9.974649
## 2005.685       9.330630 8.988509 9.672750 8.806990  9.854269
## 2005.704       9.229844 8.887726 9.571962 8.706208  9.753480
## 2005.723       9.170862 8.828748 9.512976 8.647232  9.694492
## 2005.742       9.168084 8.825968 9.510200 8.644451  9.691717
## 2005.761       9.230990 8.888869 9.573110 8.707349  9.754630
## 2005.780       9.320372 8.978252 9.662492 8.796734  9.844010
## 2005.800       9.363916 9.021801 9.706030 8.840285  9.887546
## 2005.819       9.342664 9.000548 9.684779 8.819032  9.866296
## 2005.838       9.304159 8.962036 9.646281 8.780516  9.827801
## 2005.857       9.267511 8.925389 9.609633 8.743869  9.791153
## 2005.876       9.194408 8.852292 9.536523 8.670776  9.718040
## 2005.895       9.100701 8.758587 9.442816 8.577070  9.624332
## 2005.915       9.104667 8.762540 9.446794 8.581017  9.628317
## 2005.934       9.265935 8.923806 9.608064 8.742282  9.789587
## 2005.953       9.436660 9.094538 9.778782 8.913018  9.960302
## 2005.972       9.400410 9.058293 9.742527 8.876776  9.924044
## 2005.991       9.147441 8.805233 9.489649 8.623668  9.671215

6e.Plot the forecasts along with the actual data for 2005. What do you find?

autoplot(fc_gasoline2005) +
  autolayer(window(
    gasoline,
    start = 2004,
    end = 2006
  )
  ) +
  scale_x_continuous(limits = c(2004, 2006))

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

## Warning: Removed 674 row(s) containing missing values (geom_path).

We found that the model is not able to predict sharpen drop.

7.Data set huron gives the water level of Lake Huron in feet from 1875 to 1972.

7a.Plot the data and comment on its features.

autoplot(huron, xlab = "Water Level", main="Water Level of Lake Huron(1875~1972)")

head(huron)

## Time Series:
## Start = 1875 
## End = 1880 
## Frequency = 1 
## [1] 10.38 11.86 10.97 10.80  9.79 10.39

We can see that seasonal changes may effect the water level of Lake Huron, but the seasonal pattern is not super obvious. We do notice a overall declined trend between 1880 to 1920, and in the entir time frame, there is also a slight downward trend in the water level.

7b.Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.

lm_huron = tslm(huron~trend)

summary(lm_huron)

## 
## Call:
## tslm(formula = huron ~ trend)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50997 -0.72726  0.00083  0.74402  2.53565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.202037   0.230111  44.335  < 2e-16 ***
## trend       -0.024201   0.004036  -5.996 3.55e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.13 on 96 degrees of freedom
## Multiple R-squared:  0.2725, Adjusted R-squared:  0.2649 
## F-statistic: 35.95 on 1 and 96 DF,  p-value: 3.545e-08

t <- time(huron)
t.break <- 1915
t_piece <- ts(pmax(0,t-t.break), start=1875)
pw_huron <- tslm(huron ~ t + t_piece)
summary(pw_huron)

## 
## Call:
## tslm(formula = huron ~ t + t_piece)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 132.90870   19.97687   6.653 1.82e-09 ***
## t            -0.06498    0.01051  -6.181 1.58e-08 ***
## t_piece       0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

7c.Generate forecasts from these two models for the period up to 1980 and comment on these.

h=8
fc_tslm_huron <- forecast(lm_huron, h=h)

t.new <- t[length(t)] + seq(h)
t_piece_new <- t_piece[length(t_piece)]+seq(h)
newdata <- cbind(t=t.new,
                 t_piece=t_piece_new) %>%
  as.data.frame()
fc_pw_huron <- forecast(pw_huron,newdata = newdata)

autoplot(huron) +
  autolayer(fitted(lm_huron), series = "Linear") +
  autolayer(fc_tslm_huron, series = "Linear")

autoplot(huron) +
  autolayer(fitted(pw_huron), series = "Piecewise") +
  autolayer(fc_pw_huron, series="Piecewise")

Bwteen 1920 to 1980, we cqn see that the piecewise model highlights the trend changes while tslm model only presents an overal downward trend.

Assignment 1. Chapter 5 (P1 ~ P7)

Yuanye Gao

11/6/2020