Load packages and data

library(fpp3)

Questions

Question 1

vic_jan <- vic_elec %>%
  filter(year(Date) < 2015, year(Date) > 2013, month(Date) < 02) %>%
  index_by(Date = as_date(Time)) %>%
  summarise(
    Demand = sum(Demand),
    Temperature = mean(Temperature)
  )
vic_jan

## # A tsibble: 31 x 3 [1D]
##    Date        Demand Temperature
##    <date>       <dbl>       <dbl>
##  1 2014-01-01 175185.        20.9
##  2 2014-01-02 188351.        18.2
##  3 2014-01-03 189086.        17.9
##  4 2014-01-04 173798.        16.8
##  5 2014-01-05 169733.        17.6
##  6 2014-01-06 195241.        16.0
##  7 2014-01-07 199770.        16.3
##  8 2014-01-08 205339.        18.5
##  9 2014-01-09 227334.        23.2
## 10 2014-01-10 258111.        27.8
## # … with 21 more rows
## # ℹ Use `print(n = ...)` to see more rows

a)

autoplot(vic_jan) +
  labs(title = "Electricity Demand Through Time", x = "Date")

## Plot variable not specified, automatically selected `.vars = Demand`

vic_jan %>%
  ggplot(aes(x = Demand, y = Temperature)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

## `geom_smooth()` using formula 'y ~ x'

  labs(title = "Temperature and Demand vs Electricity", y = "Temperature in Celcius", x = "Demand for Electricity")

## $y
## [1] "Temperature in Celcius"
## 
## $x
## [1] "Demand for Electricity"
## 
## $title
## [1] "Temperature and Demand vs Electricity"
## 
## attr(,"class")
## [1] "labels"

There is a positive linear relationship between Temperature and Demand for Electricity. This is because the largest source of energy usage in homes is air conditioning. When it gets hotter, people use their air conditioning more which increases their demand for electricity.

b)

vic_jan %>%
  gg_tsdisplay()

## Plot variable not specified, automatically selected `y = Demand`

I do not think this model is adequate. When looking at gg_tsresiduals it is clear that the residuals do not resemble white-noise. There are influential observations between January 13th through the January 17th. These observations are outside of the trend and have very high residuals.

c)

vic_15 <- vic_jan %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(new_data(vic_jan, 1) %>%
  mutate(Temperature = 15)) %>%
  autoplot(vic_jan)
vic_15

vic_35 <- vic_jan %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(new_data(vic_jan, 1) %>%
  mutate(Temperature = 35)) %>%
  autoplot(vic_jan)
vic_35

I do not believe either of these forecasts. Both 15 and 35 degrees are outside of our observations. This makes these forecasts very unlikely to come true. Also following the trend of the demand for electricity it is likely to be centered between these two forecasts not at either extreme (15 or 35 degrees).

d)

vic_15_hilo <- vic_jan %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(new_data(vic_jan, 1) %>%
  mutate(Temperature = 15)) %>%
  hilo()
vic_15_hilo

## # A tsibble: 1 x 7 [1D]
## # Key:       .model [1]
##   .model     Date                   Demand  .mean Tempe…¹                  `80%`
##   <chr>      <date>                 <dist>  <dbl>   <dbl>                 <hilo>
## 1 TSLM(Dema… 2014-02-01 N(167527, 4.7e+08) 1.68e5      15 [139649.7, 195404.5]80
## # … with 1 more variable: `95%` <hilo>, and abbreviated variable name
## #   ¹Temperature
## # ℹ Use `colnames()` to see all variable names

vic_35_hilo <- vic_jan %>%
  model(TSLM(Demand ~ Temperature)) %>%
  forecast(new_data(vic_jan, 1) %>%
  mutate(Temperature = 35)) %>%
  hilo()
vic_35_hilo

## # A tsibble: 1 x 7 [1D]
## # Key:       .model [1]
##   .model       Date                   Demand  .mean Tempe…¹                `80%`
##   <chr>        <date>                 <dist>  <dbl>   <dbl>               <hilo>
## 1 TSLM(Demand… 2014-02-01 N(347385, 5.3e+08) 3.47e5      35 [317814.6, 376955]80
## # … with 1 more variable: `95%` <hilo>, and abbreviated variable name
## #   ¹Temperature
## # ℹ Use `colnames()` to see all variable names

e)

plot(Demand ~ Temperature, data = vic_elec, main = "Comparing Demand and Temperature in all Data")

After plotting all observations, I can see high nonlinearity in our dataset. Our model needs to account for this nonlinearity. It is likely due to the air conditioner and heat usage. When it is very cold, people use the heat a lot and when it is very hot, people use the air conditioning a lot. This would show as a seasonal trend as the temperature changes with the seasons.

Question 2

a)

autoplot(olympic_running) +
  facet_wrap(~Length + Sex, scales = "free") + 
  theme(legend.position = "none")

## Plot variable not specified, automatically selected `.vars = Time`

Through almost all of the races for each sex there is a decreasing trend. This is likely due to increasing technology and health for the athletes. The 1500 women’s race is the only race that does not see this decreasing trend. Many of hte races have gaps from where the Olympics did not occur. This is visible in the teens of the nineteenth century for WWI and the forties for WWII.

b)

olympic_regression <- lm(Time ~ ., olympic_running)
olympic_regression

## 
## Call:
## lm(formula = Time ~ ., data = olympic_running)
## 
## Coefficients:
## (Intercept)         Year       Length     Sexwomen  
##    734.9863      -0.3905       0.1766      32.9732

Times have been decreasing at an average rate of 0.3905 seconds per year.

c)

library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## 
## Attaching package: 'forecast'

## The following objects are masked from 'package:fabletools':
## 
##     accuracy, forecast

checkresiduals(olympic_regression)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 179.8, df = 10, p-value < 2.2e-16

unloadNamespace("forecast")

Our fitted line is not suitable to forecast the given dataset. The residuals are not even close to being white-noise. The residuals do not appear to be normally distributed and there is high autocorrelation through most of the lags up to 25.

d)

olympic_running %>%
  filter(Sex == "men") %>%
  filter(Length == "100") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Men's 100)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "men") %>%
  filter(Length == "200") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Men's 200)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "men") %>%
  filter(Length == "400") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Men's 400)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "men") %>%
  filter(Length == "800") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Men's 800)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "men") %>%
  filter(Length == "1500") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Men's 1500)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "men") %>%
  filter(Length == "5000") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Men's 5000)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "100") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 100)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "200") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 200)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "400") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 400)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "800") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 800)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "1500") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 1500)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "5000") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 5000)") +
  theme(plot.title = element_text(hjust = 0.5))

olympic_running %>%
  filter(Sex == "women") %>%
  filter(Length == "10000") %>%
  model(TSLM(Time ~ Year)) %>%
  forecast(new_data(olympic_running, 1) %>%
  mutate(Year = 2020)) %>%
  autoplot(olympic_running) + 
  labs(title = "Forecasted Winning Time (Women's 10000)") +
  theme(plot.title = element_text(hjust = 0.5))

Given these predictions for the winning time I have made the assumptions that the trend will continue after 2016 and that the model we used above is good enough to forecast this data. The assumption that the trend should continue is valid while the other assumption is not. Given the distribution and pattern seen in c of the residuals, the model is not a good enough predictor of the results.

Question 3

a)

plot(souvenirs)

autoplot(souvenirs) +
  labs(title = "Australian Souvenirs Sales", x = "Month")

## Plot variable not specified, automatically selected `.vars = Sales`

The number of souvenirs is increasing through time. The data is not homoskedastic. There is a clear yearly season. The number of souvenirs sold also spikes every December. This is likely due to lost of travel for Christmas as well as people coming to Australia for warm weather when it is cold everywhere else.

b)

souvenir_num <- souvenirs
souvenir_sales <- as.numeric(souvenirs$Sales)
hist(souvenir_sales)

souvenir_log <- souvenir_num
souvenir_log_sales <- log(souvenir_num$Sales)
hist(souvenir_log_sales)

autoplot(souvenir_log) +
  labs(title = "Australian Souvenirs Sales", x = "Month")

## Plot variable not specified, automatically selected `.vars = Sales`

souvenirs_log2 <- souvenirs %>%
  mutate(Sales = log(Sales))

autoplot(souvenirs_log2) +
  labs(title = "Australian Souvenirs Sales", x = "Month")

## Plot variable not specified, automatically selected `.vars = Sales`

It is necessary to take the log of the data to see the variation in the data before 1990. Because the number of sales and variation in sales is so much higher in the latter year, you cannot see much of the variation in the earlier years. Taking the log also makes the variation in sales to be more similar through time. This helps with the heteroskedasticity problem mentioned above. The log also further exemplifies the spikes every December.

c)

souvenir1 <- ts(souvenirs$Sales,start = c(1987,1), end = c(1993,6), frequency=12)
souvenir_log2 <- log(souvenir1)
dummy_surf_fest <- rep(0, length(souvenir1))
dummy_surf_fest[seq_along(dummy_surf_fest)%%12 == 3] = 1
dummy_surf_fest[3] = 0
dummy_surf_fest <- ts(dummy_surf_fest, freq = 12, start = c(1987,1))
souvenir_final <- data.frame(souvenir_log2, dummy_surf_fest)

library(forecast)

## 
## Attaching package: 'forecast'

## The following objects are masked from 'package:fabletools':
## 
##     accuracy, forecast

souvenir_fit <- tslm(souvenir_log2 ~ trend + season + dummy_surf_fest, data = souvenir_final)

souvenir_forecast <- data.frame(dummy_surf_fest = rep(0, 12))
souvenir_forecast[3,] = 1
forecast(souvenir_fit, newdata = souvenir_forecast)

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jul 1993       9.883136  9.637329 10.128943  9.503919 10.262353
## Aug 1993       9.867772  9.621965 10.113579  9.488555 10.246989
## Sep 1993      10.531942 10.191069 10.872815 10.006062 11.057822
## Oct 1993      10.092605  9.846798 10.338412  9.713388 10.471822
## Nov 1993      10.585189 10.339382 10.830995 10.205971 10.964406
## Dec 1993      11.357556 11.111749 11.603363 10.978339 11.736773
## Jan 1994       9.430933  9.186093  9.675774  9.053207  9.808659
## Feb 1994       9.704370  9.459530  9.949210  9.326644 10.082096
## Mar 1994       9.695742  9.365756 10.025727  9.186658 10.204825
## Apr 1994       9.881046  9.636206 10.125887  9.503320 10.258772
## May 1994       9.928500  9.683659 10.173340  9.550774 10.306226
## Jun 1994       9.989861  9.745020 10.234701  9.612135 10.367587

d)

plot(souvenir_fit$residuals)

By plotting the residuals I can see that they appear to be white-noise. There does not appear to be a trend nor seasonality in our residuals which is ideal. They apear to be completely random.

e)

boxplot(residuals(souvenir_fit)~cycle(residuals(souvenir_fit)))

This boxplot supports my belief that the data suffers from heteroskedasticity. There is much less variation in the middle months of the year than there is in the beginning and final months of the year. There are also two outliers in the data.

f)

library(forecast)
souvenir_fit$coefficients

##     (Intercept)           trend         season2         season3         season4 
##       7.6662400       0.0207611       0.2526756       0.2232860       0.3878297 
##         season5         season6         season7         season8         season9 
##       0.4145219       0.4551219       0.5767690       0.5406444       0.6296713 
##        season10        season11        season12 dummy_surf_fest 
##       0.7239552       1.1957774       1.9473841       0.5543818

Each coefficient tells you the estimated effect it being that month has on souvenir sales. These coefficients further support my point of heteroskedasticity in the dataset.

g)

checkresiduals(souvenir_fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 17
## 
## data:  Residuals from Linear regression model
## LM test = 31.535, df = 17, p-value = 0.01718

The Ljung-Box test tells me that my residuals are statistically different from white-noise. This means our model is not good enough because there is a trend and/or seasonality in our residuals.

h)

souvenir_forecast2 <- data.frame(dummy_surf_fest = rep(0, 36))
souvenir_forecast3 <- forecast(souvenir_fit, newdata = souvenir_forecast2)
souvenir_forecast3

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jul 1993       9.883136  9.637329 10.128943  9.503919 10.262353
## Aug 1993       9.867772  9.621965 10.113579  9.488555 10.246989
## Sep 1993       9.977560  9.731753 10.223367  9.598343 10.356777
## Oct 1993      10.092605  9.846798 10.338412  9.713388 10.471822
## Nov 1993      10.585189 10.339382 10.830995 10.205971 10.964406
## Dec 1993      11.357556 11.111749 11.603363 10.978339 11.736773
## Jan 1994       9.430933  9.186093  9.675774  9.053207  9.808659
## Feb 1994       9.704370  9.459530  9.949210  9.326644 10.082096
## Mar 1994       9.695742  9.365756 10.025727  9.186658 10.204825
## Apr 1994       9.881046  9.636206 10.125887  9.503320 10.258772
## May 1994       9.928500  9.683659 10.173340  9.550774 10.306226
## Jun 1994       9.989861  9.745020 10.234701  9.612135 10.367587
## Jul 1994      10.132269  9.883394 10.381144  9.748319 10.516219
## Aug 1994      10.116905  9.868031 10.365780  9.732955 10.500856
## Sep 1994      10.226693  9.977819 10.475568  9.842743 10.610644
## Oct 1994      10.341738 10.092864 10.590613  9.957788 10.725689
## Nov 1994      10.834322 10.585447 11.083197 10.450372 11.218272
## Dec 1994      11.606690 11.357815 11.855564 11.222739 11.990640
## Jan 1995       9.680067  9.431764  9.928369  9.296999 10.063134
## Feb 1995       9.953503  9.705201 10.201806  9.570436 10.336570
## Mar 1995       9.944875  9.610605 10.279144  9.429182 10.460567
## Apr 1995      10.130180  9.881877 10.378482  9.747112 10.513247
## May 1995      10.177633  9.929330 10.425935  9.794566 10.560700
## Jun 1995      10.238994  9.990691 10.487296  9.855927 10.622061
## Jul 1995      10.381402 10.128745 10.634059  9.991617 10.771188
## Aug 1995      10.366039 10.113381 10.618696  9.976253 10.755824
## Sep 1995      10.475827 10.223169 10.728484 10.086041 10.865612
## Oct 1995      10.590872 10.338214 10.843529 10.201086 10.980657
## Nov 1995      11.083455 10.830798 11.336112 10.693669 11.473240
## Dec 1995      11.855823 11.603165 12.108480 11.466037 12.245608
## Jan 1996       9.929200  9.676730 10.181669  9.539704 10.318696
## Feb 1996      10.202636  9.950167 10.455106  9.813141 10.592132
## Mar 1996      10.194008  9.854949 10.533067  9.670926 10.717090
## Apr 1996      10.379313 10.126843 10.631782  9.989817 10.768809
## May 1996      10.426766 10.174296 10.679236 10.037270 10.816262
## Jun 1996      10.488127 10.235658 10.740597 10.098631 10.877623

autoplot(souvenir_forecast3)

unloadNamespace("forecast")

i)

I believe a Box-Cox transformation could be used to improve the forecast model. This is because it would normally distribute the variance in your data. I also believe you could redefine your variables to align with the dataset more properly. One example of this could be to use a squared term to account for the nonlinearity in the data.

Question 4

us_gas <- us_gasoline %>%
  filter(year(Week) < "2005")

a)

fourier_gas1 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 1)))
report(fourier_gas1)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.969489 -0.197166 -0.002252  0.200869  0.975792 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.092e+00  2.131e-02 332.782  < 2e-16 ***
## trend()              2.807e-03  5.081e-05  55.237  < 2e-16 ***
## fourier(K = 1)C1_52 -1.238e-01  1.505e-02  -8.226 9.01e-16 ***
## fourier(K = 1)S1_52 -2.383e-01  1.505e-02 -15.832  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2865 on 722 degrees of freedom
## Multiple R-squared: 0.8248,  Adjusted R-squared: 0.8241
## F-statistic:  1133 on 3 and 722 DF, p-value: < 2.22e-16

augment(fourier_gas1) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 1)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas1 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 1)")

fourier_gas2 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 2)))
report(fourier_gas2)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.9375162 -0.1897569 -0.0006692  0.2058275  1.0016928 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.094e+00  2.121e-02 334.493  < 2e-16 ***
## trend()              2.802e-03  5.057e-05  55.420  < 2e-16 ***
## fourier(K = 2)C1_52 -1.237e-01  1.497e-02  -8.265 6.71e-16 ***
## fourier(K = 2)S1_52 -2.383e-01  1.497e-02 -15.917  < 2e-16 ***
## fourier(K = 2)C2_52  4.493e-02  1.495e-02   3.006  0.00274 ** 
## fourier(K = 2)S2_52  1.054e-02  1.498e-02   0.704  0.48193    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.285 on 720 degrees of freedom
## Multiple R-squared: 0.8271,  Adjusted R-squared: 0.8259
## F-statistic: 688.8 on 5 and 720 DF, p-value: < 2.22e-16

augment(fourier_gas2) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 2)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas2 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 2)")

fourier_gas5 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 5)))
report(fourier_gas5)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.8568361 -0.1759670 -0.0002609  0.1916623  0.9380241 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.095e+00  2.043e-02 347.198  < 2e-16 ***
## trend()              2.798e-03  4.873e-05  57.428  < 2e-16 ***
## fourier(K = 5)C1_52 -1.242e-01  1.442e-02  -8.610  < 2e-16 ***
## fourier(K = 5)S1_52 -2.390e-01  1.442e-02 -16.570  < 2e-16 ***
## fourier(K = 5)C2_52  4.517e-02  1.440e-02   3.137  0.00178 ** 
## fourier(K = 5)S2_52  9.760e-03  1.443e-02   0.676  0.49898    
## fourier(K = 5)C3_52  9.586e-02  1.442e-02   6.646    6e-11 ***
## fourier(K = 5)S3_52  2.543e-04  1.440e-02   0.018  0.98592    
## fourier(K = 5)C4_52  2.854e-02  1.442e-02   1.979  0.04821 *  
## fourier(K = 5)S4_52  2.861e-02  1.440e-02   1.987  0.04733 *  
## fourier(K = 5)C5_52 -3.364e-02  1.440e-02  -2.336  0.01974 *  
## fourier(K = 5)S5_52  3.123e-02  1.443e-02   2.165  0.03073 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2746 on 714 degrees of freedom
## Multiple R-squared: 0.8409,  Adjusted R-squared: 0.8384
## F-statistic:   343 on 11 and 714 DF, p-value: < 2.22e-16

augment(fourier_gas5) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 5)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas5 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 5)")

fourier_gas10 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 10)))
report(fourier_gas10)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.8625657 -0.1755251  0.0003646  0.1817594  0.9806206 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            7.094e+00  2.002e-02 354.363  < 2e-16 ***
## trend()                2.799e-03  4.774e-05  58.625  < 2e-16 ***
## fourier(K = 10)C1_52  -1.245e-01  1.413e-02  -8.814  < 2e-16 ***
## fourier(K = 10)S1_52  -2.395e-01  1.413e-02 -16.945  < 2e-16 ***
## fourier(K = 10)C2_52   4.529e-02  1.411e-02   3.210  0.00139 ** 
## fourier(K = 10)S2_52   9.203e-03  1.414e-02   0.651  0.51524    
## fourier(K = 10)C3_52   9.636e-02  1.413e-02   6.819 1.98e-11 ***
## fourier(K = 10)S3_52  -4.035e-06  1.411e-02   0.000  0.99977    
## fourier(K = 10)C4_52   2.905e-02  1.413e-02   2.056  0.04015 *  
## fourier(K = 10)S4_52   2.884e-02  1.411e-02   2.044  0.04134 *  
## fourier(K = 10)C5_52  -3.349e-02  1.410e-02  -2.375  0.01783 *  
## fourier(K = 10)S5_52   3.176e-02  1.413e-02   2.247  0.02494 *  
## fourier(K = 10)C6_52  -6.569e-02  1.412e-02  -4.653 3.91e-06 ***
## fourier(K = 10)S6_52   2.815e-02  1.412e-02   1.993  0.04660 *  
## fourier(K = 10)C7_52  -2.240e-02  1.413e-02  -1.585  0.11340    
## fourier(K = 10)S7_52   3.279e-02  1.411e-02   2.324  0.02038 *  
## fourier(K = 10)C8_52  -1.671e-02  1.411e-02  -1.184  0.23676    
## fourier(K = 10)S8_52  -1.432e-03  1.412e-02  -0.101  0.91926    
## fourier(K = 10)C9_52  -1.768e-02  1.411e-02  -1.253  0.21066    
## fourier(K = 10)S9_52  -6.335e-04  1.413e-02  -0.045  0.96424    
## fourier(K = 10)C10_52  1.274e-02  1.412e-02   0.902  0.36751    
## fourier(K = 10)S10_52 -2.368e-02  1.411e-02  -1.678  0.09387 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.269 on 704 degrees of freedom
## Multiple R-squared: 0.8494,  Adjusted R-squared: 0.8449
## F-statistic: 189.1 on 21 and 704 DF, p-value: < 2.22e-16

augment(fourier_gas10) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 10)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas10 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 10)")

fourier_gas20 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 20)))
report(fourier_gas20)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.8455538 -0.1703796  0.0009278  0.1724455  0.9995200 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            7.095e+00  2.001e-02 354.550  < 2e-16 ***
## trend()                2.798e-03  4.772e-05  58.649  < 2e-16 ***
## fourier(K = 20)C1_52  -1.245e-01  1.412e-02  -8.814  < 2e-16 ***
## fourier(K = 20)S1_52  -2.394e-01  1.413e-02 -16.946  < 2e-16 ***
## fourier(K = 20)C2_52   4.525e-02  1.410e-02   3.209  0.00139 ** 
## fourier(K = 20)S2_52   9.317e-03  1.413e-02   0.659  0.50988    
## fourier(K = 20)C3_52   9.624e-02  1.413e-02   6.814 2.09e-11 ***
## fourier(K = 20)S3_52   3.202e-05  1.410e-02   0.002  0.99819    
## fourier(K = 20)C4_52   2.895e-02  1.412e-02   2.050  0.04072 *  
## fourier(K = 20)S4_52   2.877e-02  1.410e-02   2.040  0.04175 *  
## fourier(K = 20)C5_52  -3.349e-02  1.410e-02  -2.376  0.01779 *  
## fourier(K = 20)S5_52   3.165e-02  1.413e-02   2.240  0.02541 *  
## fourier(K = 20)C6_52  -6.559e-02  1.411e-02  -4.649 4.01e-06 ***
## fourier(K = 20)S6_52   2.808e-02  1.411e-02   1.990  0.04701 *  
## fourier(K = 20)C7_52  -2.229e-02  1.413e-02  -1.578  0.11498    
## fourier(K = 20)S7_52   3.283e-02  1.410e-02   2.329  0.02018 *  
## fourier(K = 20)C8_52  -1.668e-02  1.411e-02  -1.183  0.23735    
## fourier(K = 20)S8_52  -1.323e-03  1.412e-02  -0.094  0.92536    
## fourier(K = 20)C9_52  -1.775e-02  1.410e-02  -1.259  0.20862    
## fourier(K = 20)S9_52  -5.484e-04  1.412e-02  -0.039  0.96903    
## fourier(K = 20)C10_52  1.263e-02  1.412e-02   0.894  0.37140    
## fourier(K = 20)S10_52 -2.368e-02  1.411e-02  -1.679  0.09360 .  
## fourier(K = 20)C11_52 -2.835e-02  1.411e-02  -2.008  0.04500 *  
## fourier(K = 20)S11_52  2.542e-02  1.411e-02   1.801  0.07208 .  
## fourier(K = 20)C12_52 -1.166e-02  1.411e-02  -0.827  0.40869    
## fourier(K = 20)S12_52 -2.505e-02  1.411e-02  -1.775  0.07633 .  
## fourier(K = 20)C13_52  6.269e-03  1.411e-02   0.444  0.65703    
## fourier(K = 20)S13_52  1.421e-02  1.411e-02   1.007  0.31442    
## fourier(K = 20)C14_52 -3.088e-03  1.411e-02  -0.219  0.82685    
## fourier(K = 20)S14_52  1.816e-02  1.411e-02   1.287  0.19865    
## fourier(K = 20)C15_52 -1.403e-03  1.411e-02  -0.099  0.92086    
## fourier(K = 20)S15_52  1.743e-02  1.411e-02   1.235  0.21708    
## fourier(K = 20)C16_52 -1.224e-02  1.412e-02  -0.867  0.38624    
## fourier(K = 20)S16_52 -1.567e-03  1.411e-02  -0.111  0.91160    
## fourier(K = 20)C17_52  7.639e-03  1.410e-02   0.542  0.58827    
## fourier(K = 20)S17_52 -2.197e-02  1.412e-02  -1.556  0.12021    
## fourier(K = 20)C18_52  2.355e-03  1.411e-02   0.167  0.86749    
## fourier(K = 20)S18_52  1.760e-02  1.412e-02   1.247  0.21299    
## fourier(K = 20)C19_52 -2.098e-03  1.412e-02  -0.149  0.88197    
## fourier(K = 20)S19_52 -7.173e-05  1.410e-02  -0.005  0.99594    
## fourier(K = 20)C20_52 -2.844e-03  1.411e-02  -0.202  0.84034    
## fourier(K = 20)S20_52  7.698e-04  1.411e-02   0.055  0.95652    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2688 on 684 degrees of freedom
## Multiple R-squared: 0.8538,  Adjusted R-squared: 0.8451
## F-statistic: 97.46 on 41 and 684 DF, p-value: < 2.22e-16

augment(fourier_gas20) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 20)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas20 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 20)")

fourier_gas26 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 26)))
report(fourier_gas26)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.834099 -0.172836  0.001292  0.167365  1.034800 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            7.095e+00  1.994e-02 355.756  < 2e-16 ***
## trend()                2.799e-03  4.755e-05  58.858  < 2e-16 ***
## fourier(K = 26)C1_52  -1.244e-01  1.407e-02  -8.843  < 2e-16 ***
## fourier(K = 26)S1_52  -2.394e-01  1.408e-02 -17.004  < 2e-16 ***
## fourier(K = 26)C2_52   4.527e-02  1.405e-02   3.222  0.00134 ** 
## fourier(K = 26)S2_52   9.340e-03  1.408e-02   0.663  0.50740    
## fourier(K = 26)C3_52   9.624e-02  1.408e-02   6.837 1.81e-11 ***
## fourier(K = 26)S3_52   7.369e-05  1.405e-02   0.005  0.99582    
## fourier(K = 26)C4_52   2.892e-02  1.407e-02   2.055  0.04031 *  
## fourier(K = 26)S4_52   2.880e-02  1.405e-02   2.050  0.04079 *  
## fourier(K = 26)C5_52  -3.356e-02  1.405e-02  -2.389  0.01719 *  
## fourier(K = 26)S5_52   3.164e-02  1.408e-02   2.247  0.02494 *  
## fourier(K = 26)C6_52  -6.564e-02  1.406e-02  -4.668 3.67e-06 ***
## fourier(K = 26)S6_52   2.802e-02  1.407e-02   1.992  0.04676 *  
## fourier(K = 26)C7_52  -2.227e-02  1.408e-02  -1.582  0.11409    
## fourier(K = 26)S7_52   3.274e-02  1.405e-02   2.330  0.02009 *  
## fourier(K = 26)C8_52  -1.659e-02  1.406e-02  -1.180  0.23839    
## fourier(K = 26)S8_52  -1.368e-03  1.407e-02  -0.097  0.92254    
## fourier(K = 26)C9_52  -1.765e-02  1.406e-02  -1.255  0.20977    
## fourier(K = 26)S9_52  -4.998e-04  1.407e-02  -0.036  0.97168    
## fourier(K = 26)C10_52  1.266e-02  1.407e-02   0.900  0.36859    
## fourier(K = 26)S10_52 -2.356e-02  1.406e-02  -1.676  0.09419 .  
## fourier(K = 26)C11_52 -2.843e-02  1.407e-02  -2.021  0.04366 *  
## fourier(K = 26)S11_52  2.553e-02  1.406e-02   1.815  0.06991 .  
## fourier(K = 26)C12_52 -1.181e-02  1.406e-02  -0.840  0.40118    
## fourier(K = 26)S12_52 -2.505e-02  1.407e-02  -1.781  0.07539 .  
## fourier(K = 26)C13_52  6.166e-03  1.406e-02   0.438  0.66119    
## fourier(K = 26)S13_52  1.409e-02  1.406e-02   1.002  0.31687    
## fourier(K = 26)C14_52 -3.056e-03  1.406e-02  -0.217  0.82802    
## fourier(K = 26)S14_52  1.800e-02  1.407e-02   1.279  0.20120    
## fourier(K = 26)C15_52 -1.247e-03  1.407e-02  -0.089  0.92937    
## fourier(K = 26)S15_52  1.735e-02  1.406e-02   1.234  0.21766    
## fourier(K = 26)C16_52 -1.207e-02  1.407e-02  -0.858  0.39117    
## fourier(K = 26)S16_52 -1.492e-03  1.406e-02  -0.106  0.91550    
## fourier(K = 26)C17_52  7.686e-03  1.406e-02   0.547  0.58466    
## fourier(K = 26)S17_52 -2.178e-02  1.407e-02  -1.548  0.12210    
## fourier(K = 26)C18_52  2.236e-03  1.406e-02   0.159  0.87369    
## fourier(K = 26)S18_52  1.775e-02  1.407e-02   1.262  0.20739    
## fourier(K = 26)C19_52 -2.301e-03  1.408e-02  -0.163  0.87021    
## fourier(K = 26)S19_52 -6.677e-05  1.405e-02  -0.005  0.99621    
## fourier(K = 26)C20_52 -2.977e-03  1.406e-02  -0.212  0.83240    
## fourier(K = 26)S20_52  6.098e-04  1.407e-02   0.043  0.96543    
## fourier(K = 26)C21_52  6.089e-03  1.405e-02   0.433  0.66483    
## fourier(K = 26)S21_52  3.927e-03  1.408e-02   0.279  0.78039    
## fourier(K = 26)C22_52 -8.038e-04  1.407e-02  -0.057  0.95447    
## fourier(K = 26)S22_52 -1.126e-02  1.405e-02  -0.801  0.42323    
## fourier(K = 26)C23_52 -2.159e-02  1.408e-02  -1.534  0.12551    
## fourier(K = 26)S23_52  1.192e-02  1.405e-02   0.848  0.39656    
## fourier(K = 26)C24_52  7.003e-04  1.405e-02   0.050  0.96025    
## fourier(K = 26)S24_52  1.837e-02  1.408e-02   1.305  0.19241    
## fourier(K = 26)C25_52  4.907e-03  1.406e-02   0.349  0.72717    
## fourier(K = 26)S25_52  7.445e-03  1.407e-02   0.529  0.59686    
## fourier(K = 26)C26_52 -3.084e-02  9.944e-03  -3.101  0.00201 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2679 on 673 degrees of freedom
## Multiple R-squared: 0.8572,  Adjusted R-squared: 0.8461
## F-statistic: 77.67 on 52 and 673 DF, p-value: < 2.22e-16

augment(fourier_gas26) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 26)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas26 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 26)")

I see a clear seasonality and trend in the data. The barrels increases every summer of the year. The trend is also increasing through time.

b)

gas_fit <- us_gas %>%
  model(K1 = TSLM(Barrels ~ trend() + fourier(K = 1)),
        K2 = TSLM(Barrels ~ trend() + fourier(K = 2)),
        K5 = TSLM(Barrels ~ trend() + fourier(K = 5)),
        K10 = TSLM(Barrels ~ trend() + fourier(K = 10)),
        K20 = TSLM(Barrels ~ trend() + fourier(K = 20)),
        K26 = TSLM(Barrels ~ trend() + fourier(K = 26)))
glance(gas_fit) %>% select(.model, r_squared, adj_r_squared, AICc)

## # A tibble: 6 × 4
##   .model r_squared adj_r_squared   AICc
##   <chr>      <dbl>         <dbl>  <dbl>
## 1 K1         0.825         0.824 -1809.
## 2 K2         0.827         0.826 -1814.
## 3 K5         0.841         0.838 -1862.
## 4 K10        0.849         0.845 -1881.
## 5 K20        0.854         0.845 -1859.
## 6 K26        0.857         0.846 -1851.

gas_fit <- us_gas %>%
  model(K11 = TSLM(Barrels ~ trend() + fourier(K = 11)),
        K12 = TSLM(Barrels ~ trend() + fourier(K = 12)),
        K13 = TSLM(Barrels ~ trend() + fourier(K = 13)),
        K14 = TSLM(Barrels ~ trend() + fourier(K = 14)),
        K15 = TSLM(Barrels ~ trend() + fourier(K = 15)),
        K16 = TSLM(Barrels ~ trend() + fourier(K = 16)),
        K17 = TSLM(Barrels ~ trend() + fourier(K = 17)),
        K18 = TSLM(Barrels ~ trend() + fourier(K = 18)),
        K19 = TSLM(Barrels ~ trend() + fourier(K = 19)))
glance(gas_fit) %>% select(.model, r_squared, adj_r_squared, AICc)

## # A tibble: 9 × 4
##   .model r_squared adj_r_squared   AICc
##   <chr>      <dbl>         <dbl>  <dbl>
## 1 K11        0.851         0.846 -1885.
## 2 K12        0.852         0.847 -1884.
## 3 K13        0.852         0.846 -1881.
## 4 K14        0.852         0.846 -1879.
## 5 K15        0.853         0.846 -1876.
## 6 K16        0.853         0.846 -1872.
## 7 K17        0.853         0.846 -1871.
## 8 K18        0.854         0.846 -1868.
## 9 K19        0.854         0.846 -1864.

fourier_gas11 <- us_gas %>%
  model(TSLM(Barrels ~ trend() + fourier(K = 11)))
report(fourier_gas11)

## Series: Barrels 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.896208 -0.177835  0.003786  0.173355  1.014511 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            7.095e+00  1.994e-02 355.717  < 2e-16 ***
## trend()                2.799e-03  4.756e-05  58.844  < 2e-16 ***
## fourier(K = 11)C1_52  -1.245e-01  1.407e-02  -8.845  < 2e-16 ***
## fourier(K = 11)S1_52  -2.394e-01  1.408e-02 -17.007  < 2e-16 ***
## fourier(K = 11)C2_52   4.529e-02  1.405e-02   3.223  0.00133 ** 
## fourier(K = 11)S2_52   9.256e-03  1.408e-02   0.657  0.51125    
## fourier(K = 11)C3_52   9.632e-02  1.408e-02   6.842 1.70e-11 ***
## fourier(K = 11)S3_52   3.762e-05  1.405e-02   0.003  0.99787    
## fourier(K = 11)C4_52   2.899e-02  1.408e-02   2.060  0.03980 *  
## fourier(K = 11)S4_52   2.884e-02  1.406e-02   2.052  0.04054 *  
## fourier(K = 11)C5_52  -3.354e-02  1.405e-02  -2.387  0.01725 *  
## fourier(K = 11)S5_52   3.172e-02  1.408e-02   2.253  0.02458 *  
## fourier(K = 11)C6_52  -6.569e-02  1.406e-02  -4.671 3.59e-06 ***
## fourier(K = 11)S6_52   2.808e-02  1.407e-02   1.996  0.04628 *  
## fourier(K = 11)C7_52  -2.235e-02  1.408e-02  -1.588  0.11283    
## fourier(K = 11)S7_52   3.274e-02  1.405e-02   2.330  0.02010 *  
## fourier(K = 11)C8_52  -1.664e-02  1.406e-02  -1.183  0.23704    
## fourier(K = 11)S8_52  -1.428e-03  1.407e-02  -0.102  0.91916    
## fourier(K = 11)C9_52  -1.763e-02  1.406e-02  -1.254  0.21019    
## fourier(K = 11)S9_52  -5.731e-04  1.407e-02  -0.041  0.96753    
## fourier(K = 11)C10_52  1.272e-02  1.407e-02   0.904  0.36621    
## fourier(K = 11)S10_52 -2.359e-02  1.406e-02  -1.678  0.09375 .  
## fourier(K = 11)C11_52 -2.837e-02  1.407e-02  -2.016  0.04414 *  
## fourier(K = 11)S11_52  2.555e-02  1.406e-02   1.817  0.06962 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.268 on 702 degrees of freedom
## Multiple R-squared: 0.851,   Adjusted R-squared: 0.8461
## F-statistic: 174.3 on 23 and 702 DF, p-value: < 2.22e-16

augment(fourier_gas11) %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y = Barrels, color = "Data")) +
  geom_line(aes(y = .fitted, color = "Fitted")) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression (K = 11)") +
  scale_color_manual(values = c(Data = "black", Fitted = "red"))

fourier_gas11 %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(y = "Thousand of Barrels Per Day", title = "US Gas Harmonic Regression Forecast (K = 11)")

c)

fourier_gas11 %>%
  gg_tsresiduals()

augment(fourier_gas11) %>%
  features(.resid, ljung_box, lag = 104, dof = 0)

## # A tibble: 1 × 3
##   .model                                    lb_stat lb_pvalue
##   <chr>                                       <dbl>     <dbl>
## 1 TSLM(Barrels ~ trend() + fourier(K = 11))    149.   0.00268

d)

gas_plot <- us_gas %>%
  model(
    Mean = MEAN(Barrels),
    Naive = NAIVE(Barrels),
    Seasonal_naive = SNAIVE(Barrels),
    Drift = RW(Barrels ~ drift())
  )
gas_plot %>%
  forecast(h = 52) %>%
  autoplot(us_gas) +
  labs(title = "Forecast of 2005 Gas Usage")

fourier_gas11 %>%
  forecast(h = 624) %>%
  autoplot(us_gasoline) +
  labs(title = "Forecast of 2005 Gas Usage using Linear Regression")

I believe the fourier forecast using K = 11 is the best forecast of the ones I checked. This forecast accounts for the increasing trend as well as changing and short seasonalities in the data. The other forecasting methods do not take these into account as well as the fourier, “harmonic”, forecast does.

Question 5

a)

afghan <- global_economy %>%
  filter(Country == "Afghanistan") %>%
  mutate(Population_mil = Population/1000000)

autoplot(afghan, Population_mil) +
  labs(title = "Afghanistan Population Through Time", y = "Population in Millions", x = "Year")

The Soviet-Afghan War had a great effect on Afghanistan’s population through the 1980s. The war caused the population to decrease well below the trend of continuing population. This is likely due to people dying from the war as well as people having less kids during wartime. After the war, the population growth was much higher than before. This could be due to people catching up on the kids they wish they could have had during the war.

b)

ols_afghan <- afghan %>% model(TSLM(Population_mil ~ Year))
report(ols_afghan)

## Series: Population_mil 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7945 -2.5826  0.7448  2.2592  6.0363 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -829.29253   49.73087  -16.68   <2e-16 ***
## Year           0.42577    0.02501   17.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.188 on 56 degrees of freedom
## Multiple R-squared: 0.8381,  Adjusted R-squared: 0.8352
## F-statistic: 289.9 on 1 and 56 DF, p-value: < 2.22e-16

ols_afghan %>%
  gg_tsresiduals()

pw_afghan <- afghan %>%
  model(piecewise = TSLM(Population_mil ~ trend(knots = c(1980,1989))))
report(pw_afghan)

## Series: Population_mil 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57759 -0.17420 -0.01678  0.18723  0.67995 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           8.697573   0.131122   66.33   <2e-16 ***
## trend(knots = c(1980, 1989))trend     0.224372   0.009623   23.32   <2e-16 ***
## trend(knots = c(1980, 1989))trend_21 -0.456803   0.024498  -18.65   <2e-16 ***
## trend(knots = c(1980, 1989))trend_30  1.082782   0.021418   50.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3009 on 54 degrees of freedom
## Multiple R-squared: 0.9986,  Adjusted R-squared: 0.9985
## F-statistic: 1.293e+04 on 3 and 54 DF, p-value: < 2.22e-16

c)

afghan_drift <- afghan %>%
  model(RW(Population_mil ~ drift())) %>%
  forecast(h = "5 years") %>%
  autoplot(afghan) +
  labs(title = "Drift Forecast of Afghanistan Population", y = "Population in Millions")
afghan_drift

afghan_naive <- afghan %>%
  model(NAIVE(Population_mil)) %>%
  forecast(h = "5 years") %>%
  autoplot(afghan) +
  labs(title = "Naive Forecast of Afghanistan Population", y = "Population in Millions")
afghan_naive

afghan_mean <- afghan %>%
  model(MEAN(Population_mil)) %>%
  forecast(h = "5 years") %>%
  autoplot(afghan) +
  labs(title = "Mean Forecast of Afghanistan Population", y = "Population in Millions")
afghan_mean

I believe the drift forecast is the best for this data. The drift accounts for the population growth rate (trend) that is likely to change much in the next five years, ceteris paribus. The mean puts too much emphasis on past population amounts affecting the future population which I believe does not happen. The naive is okay but doesn’t account for the trend of an increasing population since 1990.

Question 6

a)

Assumptions: 1- Linear, Correctly Specified, addative error term: This assumption says that our variables are linear, not exponential, our model has the necessary variables with no omitted variables, and we include a proper error term to account for variation in our dependent variable we cannot explain.

2- No Perfect Multicollinearity: This assumption says that none of our prediction variables are a perfect function of another prediction function. This will prevent us from running our regression. We also prefer our variables to not be too multicollinear, but if the are multicollinear but not perfectly it is not a big deal.

3- Sum of the error terms is 0: This assumption says that the population error mean is 0. This allows for our error term to be random (white-noise) and normally distributed around 0. Otherwise the forecast will be biased.

4- Error terms have constant variance: This assumption states that there is no heteroskedasticity. If there is homoskedasticity it can give biased standard errors for our coefficient making our variables look more statistically significant than they actually are.

5- No autocorrelation: This assumption states that there is no serial correlation. If there is serial correlation than our error terms are correlated to error terms in previous time periods. This means we have more variation in our dependent variable that can be explained.

6- Errors are uncorrelated to predictor variables: This assumption states that no predictor variables have a relationship or are correlated to our error term. If this assumption is broken then we have an omitted variable and there is more variation in our dependent variable to be explained by the omitted variables.

7- Errors follow a normal distribution: This assumption states that the errors are not biased. If the errors (remainders) are not normally distributed then there is an omitted variable that can help explain more variation in our dependent variables.

b)

When our model is an unbiased estimator it means that our model has the most accurate coefficients of the predictor variables possible. When the model is completely unbiased then the estimated coefficients (beta) equal the true coefficients. Being an unbiased estimator is not directly influenced by an increase in the sample size.

When our model is a consistent estimator it means that our model continues to get closer to the true coefficients for our prediction variables as our sample size increases. When our model is consistent we always want more and more observations to add to our sample.

c)

R squared is the amount of variation in the dependent variable that is explained by the predictor variables. R squared can never decrease mathematically. This gives an incentive to include as many variables as humanly possible for your model even when the variables make no sense in the model.

Adjusted R squaredis the amount of variation in the dependent variable that is explained by the predictor variables after accounting for degrees of freedom. This measure is more important than R squared because it can decrease. When the addition of a variable is not enough to outweigh the loss of a degree of freedom the adjusted R squared decreases. To maximize adjusted R squared you need to have all necessary variables to explain variation in the dependent variable.

Homework 4

Colin Adams

10/07/2022