Forecasting: Principles and Practice Chapter 7 Exercises

Half-hourly electricity demand for Victoria, Australia is contained in vic_elec. Extract the January 2014 electricity demand, and aggregate this data to daily with daily total demands and maximum temperatures.

jan14_vic_elec <- vic_elec |>
  filter(yearmonth(Time) == yearmonth("2014 Jan")) |>
  index_by(Date = as_date(Time)) |>
  summarise(
    Demand = sum(Demand),
    Temperature = max(Temperature)
  )

Plot the data and find the regression model for Demand with temperature as a predictor variable. Why is there a positive relationship?

jan14_vic_elec |> 
  pivot_longer(2:3, names_to="key", values_to="value") |>
  autoplot(.vars = value) +
  facet_grid(vars(key), scales = "free_y")

There is a positive linear relationship between the demand and the temperature. Inferring that with the increase in temperature, this causes more people to turn on their AC.

fit <- jan14_vic_elec |> 
  model(TSLM(Demand ~ Temperature))

fit |> report()

## Series: Demand 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -49978.2 -10218.9   -121.3  18533.2  35440.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59083.9    17424.8   3.391  0.00203 ** 
## Temperature   6154.3      601.3  10.235 3.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24540 on 29 degrees of freedom
## Multiple R-squared: 0.7832,  Adjusted R-squared: 0.7757
## F-statistic: 104.7 on 1 and 29 DF, p-value: 3.8897e-11

Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

fit |> gg_tsresiduals()

The distributions of the residuals seems normal but there is a massive increase in the afc around lag 12. This and the lags from 1-12 suggest there is a seasonal pattern in the data.

Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was 15∘C and compare it with the forecast if the with maximum temperature was 35∘C. Do you believe these forecasts? The following R code will get you started:

demand35 <- jan14_vic_elec |>
  model(TSLM(Demand ~ Temperature)) |>
  forecast(
    new_data(jan14_vic_elec, 1) |>
      mutate(Temperature = 35)
  ) |>
  autoplot(jan14_vic_elec)

demand35

## Warning: Computation failed in `stat_interval()`.
## Caused by error in `trans$transform()`:
## ! `transform_date()` works with objects of class <Date> only

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

## Warning in min(x, na.rm = na.rm): no non-missing arguments to min; returning
## Inf

## Warning in max(x, na.rm = na.rm): no non-missing arguments to max; returning
## -Inf

I do believe this forecast is realistic however, the forecast for the initial 15c looked too low so it could be not as accurate for lower temperatures.

Give prediction intervals for your forecasts.

# fit <- lm(Demand ~ Temperature, data=jan14_vic_elec)

# forecast(fit, newdata=data.frame(Temperature=c(15,35)))

E. Plot Demand vs Temperature for all of the available data in vic_elec aggregated to daily total demand and maximum temperature. What does this say about your model?

plot(Demand~Temperature, data=vic_elec, main="Demand vs. Temperature")

This shows that the model picks up well on the temperature increases and the associated demand. However, when it does to lower temperatures, it has the potential to be less accurate.

Data set olympic_running contains the winning times (in seconds) in each Olympic Games sprint, middle-distance and long-distance track events from 1896 to 2016.

olympic_running

## # A tsibble: 312 x 4 [4Y]
## # Key:       Length, Sex [14]
##     Year Length Sex    Time
##    <int>  <int> <chr> <dbl>
##  1  1896    100 men    12  
##  2  1900    100 men    11  
##  3  1904    100 men    11  
##  4  1908    100 men    10.8
##  5  1912    100 men    10.8
##  6  1916    100 men    NA  
##  7  1920    100 men    10.8
##  8  1924    100 men    10.6
##  9  1928    100 men    10.8
## 10  1932    100 men    10.3
## # ℹ 302 more rows

ggplot(olympic_running, aes(x = Year, y = Time, colour = Sex)) +
  geom_line() +
  geom_point(size = 1) +
  facet_wrap(~Length, scales = "free_y", nrow =2) +
  theme_minimal() +
  scale_color_brewer(palette = "Dark2") +
  theme(legend.position = "bottom", legend.title = element_blank())

## Warning: Removed 31 rows containing missing values or values outside the scale range
## (`geom_point()`).

Fit a regression line to the data for each event. Obviously the winning times have been decreasing, but at what average rate per year?

fit <- olympic_running |>
  model(TSLM(Time ~ trend()))

tidy(fit)

## # A tibble: 28 × 8
##    Length Sex   .model               term  estimate std.error statistic  p.value
##     <int> <chr> <chr>                <chr>    <dbl>     <dbl>     <dbl>    <dbl>
##  1    100 men   TSLM(Time ~ trend()) (Int…  11.1      0.0909     123.   1.86e-37
##  2    100 men   TSLM(Time ~ trend()) tren…  -0.0504   0.00479    -10.5  7.24e-11
##  3    100 women TSLM(Time ~ trend()) (Int…  12.4      0.163       76.0  4.58e-25
##  4    100 women TSLM(Time ~ trend()) tren…  -0.0567   0.00749     -7.58 3.72e- 7
##  5    200 men   TSLM(Time ~ trend()) (Int…  22.3      0.140      159.   4.06e-39
##  6    200 men   TSLM(Time ~ trend()) tren…  -0.0995   0.00725    -13.7  3.80e-13
##  7    200 women TSLM(Time ~ trend()) (Int…  25.5      0.525       48.6  8.34e-19
##  8    200 women TSLM(Time ~ trend()) tren…  -0.135    0.0227      -5.92 2.17e- 5
##  9    400 men   TSLM(Time ~ trend()) (Int…  50.3      0.445      113.   1.53e-36
## 10    400 men   TSLM(Time ~ trend()) tren…  -0.258    0.0235     -11.0  2.75e-11
## # ℹ 18 more rows

The data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

souvenirs

## # A tsibble: 84 x 2 [1M]
##       Month Sales
##       <mth> <dbl>
##  1 1987 Jan 1665.
##  2 1987 Feb 2398.
##  3 1987 Mar 2841.
##  4 1987 Apr 3547.
##  5 1987 May 3753.
##  6 1987 Jun 3715.
##  7 1987 Jul 4350.
##  8 1987 Aug 3566.
##  9 1987 Sep 5022.
## 10 1987 Oct 6423.
## # ℹ 74 more rows

souvenirs |> autoplot()

## Plot variable not specified, automatically selected `.vars = Sales`

souvenirs |> acf()

We can see a clear seasonal inrease in sales due to the influx of people around Christmas or Q4. there is a slight uptick also during the summer months. The current model has a positive trend over the amount of observations.