DSC4812 Assessment 1 540690

The purpose of this document is to provide the answers to assignment 1.

Question 1: The use and application of nilometer forecasts in Pharaonic Egypt

The use of nilometers in Pharaonic Egypt was to measure the water levels of Nile River during its flooding season (2020, Mingren). The flooding season of the Nile River occurs between July and October (2016, Romeo).

Pharaonic nilometers were in temples where the temples were ordained over by a priest. Nilometers in temples were wells connected to the Nile River using a tunnel. These wells had stairs leading down to water levels so that the priests could make their readings. To read the water levels, the well walls had markings. These markings symbolised one of three water levels: highest, lowest or average level of the Nile River. Based on the water level readings during the flood season, farmers and prefects were able to make forecasts on key agricultural and therefore economic measures.

If water levels reached 19 cubits of water, then a flood was forecasted. If 12 cubits of water was measured, then famine was forecasted. Based on these measures, it was then forecasted that a water level of approximately 16 cubits, would lead to ideal crop output. Based on the water level readings from the nilometers, priests were able to forecast crop output. This was due to fertile silts that the flood waters would leave behind as the water receded. If the water levels were low, silts were inadequate, and famine was forecasted. If the water levels were too high, infrastructure damage would result in crop damage and therefore impacted crop output. They were also able to forecast a cycle i.e. every 5 years water levels would either be peaks or troughs.

Given the forecasted crop outputs, revenues were then able to be forecasted. If they forecasted crop output to be low, then taxes would be low and therefore, revenue would be forecasted to be very low. For adequate water levels, taxes would be high and therefore adequate crop output would result in high revenue to be forecasted.

Having seen the uses and applications of forecasting in ancient Egypt, it can now be applied to South Africa’s GDP between 1960 and 2024.

Question 2: GDP at market prices (constant) in South Africa between 1960 and 2024.

Describing the data

The best way to describe the data is to consider the trend, seasonality and cyclicness of the data.The quarterly South African GDP data between 1960 and 2024 shows a strong increasing trend. Furthermore, the GDP data shows strong seasonality within each year as the GDP in quarter 4 is the highest for most years. This is often due to festive season spending and manufacturing. The data is cyclic as there are periods where the GDP will increase and/or decrease at random periods of time. Generally associated to economic conditions. For example, there is a decrease in GDP in 2020 for all quarters due to the COVID-19 pandemic.

Loading the data

# Reading in a .csv file
library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

gdp <- readr::read_csv("SA_GDP.csv")

## Rows: 259 Columns: 2

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (1): GDP
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

gdp

## # A tibble: 259 × 2
##    Date        GDP
##    <chr>     <dbl>
##  1 2024 Q3 4652690
##  2 2024 Q2 4659509
##  3 2024 Q1 4645542
##  4 2023 Q4 4642859
##  5 2023 Q3 4626804
##  6 2023 Q2 4645950
##  7 2023 Q1 4614117
##  8 2022 Q4 4585212
##  9 2022 Q3 4650164
## 10 2022 Q2 4561323
## # ℹ 249 more rows

Converting to tsibble object

gdp  <- gdp |> mutate(Quarter = yearquarter(Date), GDP = GDP/1000)|>
  select(Quarter, GDP, -Date)|>
  as_tsibble(index = Quarter)
gdp

## # A tsibble: 259 x 2 [1Q]
##    Quarter   GDP
##      <qtr> <dbl>
##  1 1960 Q1  817.
##  2 1960 Q2  831.
##  3 1960 Q3  846.
##  4 1960 Q4  854.
##  5 1961 Q1  861.
##  6 1961 Q2  857.
##  7 1961 Q3  876.
##  8 1961 Q4  884.
##  9 1962 Q1  907.
## 10 1962 Q2  913.
## # ℹ 249 more rows

Plotting the time series

# Time plot
autoplot(gdp,GDP) +
  labs(title = "GDP at constant market prices per quarter from 1960-2024",y = "GDP in Rands ('000)")

The time plot confirms a strong increasing trend. There is some seasonality shown in the data i.e. Q4 GDP being the highest for most years. The data also has some cyclic features. For example, GDP decreased between Q2 1992 and Q3 1993.

# Seasonal Plot
gdp |> gg_season(GDP) +
  labs(title = "GDP at constant market prices per quarter from 1960-2024",y = "GDP in Rands ('000)")

The seasonal plot illustrates that GDP increases each quarter with Q4 recording the highest GDP for the year.

# Seasonal sub-series plot
gdp |> gg_subseries(GDP) +
  labs(title = "GDP at constant market prices per quarter from 1960-2024",y = "GDP in Rands ('000)")

The sub-series plot does not show anything revealing other than that the average GDP in Q3 is slightly higher than that of Q4.

Scatter Plot

To plot the Scatter Plot of the GDP time series is pointless as there are no other time series in the data set whose relationship can be explored relative to the GDP time series.

# Lag Plot
gdp |> 
  gg_lag(GDP,geom = "point") +
  labs(x = "lag(GDP,k)", y = "GDP ('000)")

The lag plots shows that the relationship is very strong for all lags. Therefore, there is strong seasonality in the data.

Question 3: Monthly Australian Retail data

set.seed(59047623)
aus_retail1 <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
aus_retail1

## # A tsibble: 441 x 5 [1M]
## # Key:       State, Industry [1]
##    State           Industry                        `Series ID`    Month Turnover
##    <chr>           <chr>                           <chr>          <mth>    <dbl>
##  1 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Apr      7.2
##  2 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 May      7.5
##  3 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Jun      7.5
##  4 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Jul      7.9
##  5 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Aug      8.3
##  6 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Sep      7.8
##  7 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Oct      8.4
##  8 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Nov      8.8
##  9 South Australia Pharmaceutical, cosmetic and t… A3349500K   1982 Dec     11.1
## 10 South Australia Pharmaceutical, cosmetic and t… A3349500K   1983 Jan      8.1
## # ℹ 431 more rows

The time series selected is for state of South Australia, the pharmaceutical, cosmetic and toiletries industries with the series ID A3349500K.

Now to plot the time series that was selected.

# Time Plot
autoplot(aus_retail1,Turnover) +
  labs(title = "South Australia sales for pharmaceutical, cosmetic and toiletries industry", y = "Turnover")

The time plot illustrates a strong increasing trend with strong seasonality due to increase in sales during the December months.

# Seasonal Plot
aus_retail1 |> gg_season(Turnover) +
  labs(title = "South Australia sales for pharmaceutical, cosmetic and toiletries industry", y = "Turnover")

Sales in this industry for South Australia is at its highest during the December month. This possibly due to festive period shopping.

#Seasonal sub-series plots
aus_retail1 |> gg_subseries(Turnover) +
  labs(title = "South Australia sales for pharmaceutical, cosmetic and toiletries industry", y = "Turnover")

The sub-series plot illustrates strong seasonality during the December period.

# Lag Plots
aus_retail1 |> gg_lag(Turnover,geom = "point") +
  labs( y = "Pharmaceutical, cosmetic and toiletries", x = "lags(Pharmaceutical, cosmetic and toiletries,k)")

The lag plot proves that there is strong seasonality present in the time series due to the relationshoip being strong for all lags.

# Correlogram
aus_retail1 |> ACF(Turnover, lag_max = 36) |>
  autoplot() +
  labs(title = "South Australia sales for pharmaceutical, cosmetic and toiletries industry", y = "ACF")

The correlogram confirms that the time series has trend as the smaller lags are large and positive and it slowly decreases as the lags increase. The time series is also seasonal because the ACF is at its largest on the December lags i.e. 12, 24 and 36. Therefore. the time series has both trend and seasonality.

Question 4: Australian Tourism

Pairwise Plots for South Australia

The state chosen to perform time series feature analysis on is South Australia. Below is the features of the time series for South Australia.

library(feasts)
library(urca)
library(fracdiff)

tourism_feats <- tourism |>
  features(Trips, feature_set(pkgs = "feasts"))|>
  filter(State == "South Australia")
tourism_feats

## # A tibble: 48 × 51
##    Region State Purpose trend_strength seasonal_strength_year seasonal_peak_year
##    <chr>  <chr> <chr>            <dbl>                  <dbl>              <dbl>
##  1 Adela… Sout… Busine…          0.464                  0.407                  3
##  2 Adela… Sout… Holiday          0.554                  0.619                  1
##  3 Adela… Sout… Other            0.746                  0.202                  2
##  4 Adela… Sout… Visiti…          0.435                  0.452                  1
##  5 Adela… Sout… Busine…          0.464                  0.179                  3
##  6 Adela… Sout… Holiday          0.528                  0.296                  2
##  7 Adela… Sout… Other            0.593                  0.404                  2
##  8 Adela… Sout… Visiti…          0.488                  0.254                  0
##  9 Baros… Sout… Busine…          0.490                  0.158                  0
## 10 Baros… Sout… Holiday          0.266                  0.194                  2
## # ℹ 38 more rows
## # ℹ 45 more variables: seasonal_trough_year <dbl>, spikiness <dbl>,
## #   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>,
## #   acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>, diff1_acf10 <dbl>,
## #   diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>, pacf5 <dbl>,
## #   diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
## #   zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>, …

To show all the features that involve seasonality, a pairwise plot is presented below.

library(glue)
library(GGally)

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

tourism_feats |>
  select_at(vars(contains("season"), Purpose)) |>
  mutate(
    seasonal_peak_year = seasonal_peak_year +
      4*(seasonal_peak_year==0),
    seasonal_trough_year = seasonal_trough_year +
      4*(seasonal_trough_year==0),
    seasonal_peak_year = glue("Q{seasonal_peak_year}"),
    seasonal_trough_year = glue("Q{seasonal_trough_year}"),
  ) |>
  GGally::ggpairs(mapping = aes(colour = Purpose),
          diag = list(discrete = wrap("barDiag", position = "stack")),
          upper = list(continuous = wrap("cor", size = 3)),
          lower = list(continuous = wrap("points", alpha = 0.6, size = 1.5))
         )

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The principal component analysis for the state of South Australia is provided below and yields some interesting result.

library(broom)
pcs <- tourism_feats |>
  select(-State, -Region, -Purpose) |>
  prcomp(scale = TRUE) |>
  augment(tourism_feats)
pcs |>
  ggplot(aes(x = .fittedPC1, y = .fittedPC2, col = Purpose)) +
  geom_point() +
  theme(aspect.ratio = 1)

All the time series are in the first half of the plot. Furthermore, the holiday time series appears to be more to the left than compared to the rest of the other time series.

To determine if there is any seasonality or trend in the Adelaide region in South Australia, a seasonal and trend strength analysis has to be performed.

tourism |>
  features(Trips, feat_stl) |>
  filter(State == "South Australia", Region == "Adelaide") |>
  ggplot(aes(x = trend_strength, y = seasonal_strength_year,
             col = Purpose)) +
  geom_point() +
  facet_wrap(vars(Region))

The holiday time series has strong seasonality.

Question 5: Australian Tourism

Using the seasonal naive method to forecast electricity production in Australia

# Selecting electricity production data
electricity <- aus_production |> 
  select(Electricity)
electricity

## # A tsibble: 218 x 2 [1Q]
##    Electricity Quarter
##          <dbl>   <qtr>
##  1        3923 1956 Q1
##  2        4436 1956 Q2
##  3        4806 1956 Q3
##  4        4418 1956 Q4
##  5        4339 1957 Q1
##  6        4811 1957 Q2
##  7        5259 1957 Q3
##  8        4735 1957 Q4
##  9        4608 1958 Q1
## 10        5196 1958 Q2
## # ℹ 208 more rows

# Fit a seasonal naive model to quarterly electricity production data in Australia
electricity_model <- electricity |>  model(SNAIVE(Electricity ~ lag("year")))
electricity_model

## # A mable: 1 x 1
##   `SNAIVE(Electricity ~ lag("year"))`
##                               <model>
## 1                            <SNAIVE>

# Forecast electricity production in Australia for 1 year (4 quarters)
electricity_forecast <- electricity_model |> forecast(h = "1 year")
electricity_forecast

## # A fable: 4 x 4 [1Q]
## # Key:     .model [1]
##   .model                                Quarter
##   <chr>                                   <qtr>
## 1 "SNAIVE(Electricity ~ lag(\"year\"))" 2010 Q3
## 2 "SNAIVE(Electricity ~ lag(\"year\"))" 2010 Q4
## 3 "SNAIVE(Electricity ~ lag(\"year\"))" 2011 Q1
## 4 "SNAIVE(Electricity ~ lag(\"year\"))" 2011 Q2
## # ℹ 2 more variables: Electricity <dist>, .mean <dbl>

# Plotting actual versus forecast
electricity_forecast |>
  autoplot(aus_production, level = NULL) +
  labs(title = "Electricity Production in Australia", y = "Gigawatt hours")

# Checking if the residuals look like white noise
# Graphical check
electricity_model |> gg_tsresiduals()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Using the Ljung-Box test
augment_electricity <- augment(electricity_model) |>
  features(.innov,ljung_box,lag = 8)
augment_electricity

## # A tibble: 1 × 3
##   .model                                lb_stat lb_pvalue
##   <chr>                                   <dbl>     <dbl>
## 1 "SNAIVE(Electricity ~ lag(\"year\"))"    86.2  2.78e-15

The ACF of the residuals illustrates that there is no white noise since some autocorrelations are not close to 0. This is confirmed by performing the Ljung-Box test, where the p-value is extremely small (less than 0.05) suggesting that there is no white noise, and therefore there is information in the residuals that should be included in the model.

Question 7: Libya Economy

# Selecting a country's population to model and forecast
libya <- global_economy |>
  filter(Country == "Libya") |>
  mutate(Population = Population/1000) |>
  select(Year, Population)
libya

## # A tsibble: 58 x 2 [1Y]
##     Year Population
##    <dbl>      <dbl>
##  1  1960      1448.
##  2  1961      1498.
##  3  1962      1551.
##  4  1963      1607.
##  5  1964      1668.
##  6  1965      1733.
##  7  1966      1804.
##  8  1967      1879.
##  9  1968      1959.
## 10  1969      2044.
## # ℹ 48 more rows

# Visualising the data
autoplot(libya,Population) +
  labs(title = "Population of Libya per year", y = "Population ('000)")

# The population of the Libya shows strong linear trend. There is no seasonality as the data is annual data.

# ACF features of the data
libya |> features(Population,feat_acf)

## # A tibble: 1 × 6
##    acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10
##   <dbl> <dbl>      <dbl>       <dbl>      <dbl>       <dbl>
## 1 0.959  5.83      0.950        2.76      0.829        1.27

# The first lag, acf1, of 0.95 suggests that the population of Libya is trended time series since the first lag is large and positive. This is a feature of time series with a trend.

# Applying a linear time series model to the Population of Libya and generating the forecast for 5 years
library(fable)
libya_model <- libya |> model(trend_model = TSLM(Population ~ trend()))
libya_model

## # A mable: 1 x 1
##   trend_model
##       <model>
## 1      <TSLM>

libya_forecast <- libya_model |> forecast(h = "5 years")
libya_forecast

## # A fable: 5 x 4 [1Y]
## # Key:     .model [1]
##   .model  Year     Population
##   <chr>  <dbl>         <dist>
## 1 trend…  2018 N(6942, 35485)
## 2 trend…  2019 N(7039, 35607)
## 3 trend…  2020 N(7135, 35734)
## 4 trend…  2021 N(7232, 35864)
## 5 trend…  2022 N(7329, 35999)
## # ℹ 1 more variable: .mean <dbl>

# Current available data for Libya's population between 2018 and 2022 are:
# Year  Population ('000)
# 2018  6849
# 2019  6951
# 2020  7045
# 2021  7135
# 2022  7224

# The forecasted population for Libya relative to the actual population for the same period of the forecast, shows that the forecast is quite accurate. For 2018, the difference between the forecasted and actual population is 93,000.

Question 8: Libya Population Forecast using more recent data

libya_current <- readr::read_csv2("libya_current.csv")

## ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.

## Rows: 6 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## dbl (2): Year, Population
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

libya_current

## # A tibble: 6 × 2
##    Year Population
##   <dbl>      <dbl>
## 1  2018    6849055
## 2  2019    6951033
## 3  2020    7045399
## 4  2021    7135175
## 5  2022    7223805
## 6  2023    7305659

# Converting current data into a tsibble
libya_current <- libya_current |>
  mutate(Population = Population/1000) |>
  as_tsibble(index = Year, )
libya_current

## # A tsibble: 6 x 2 [1Y]
##    Year Population
##   <dbl>      <dbl>
## 1  2018      6849.
## 2  2019      6951.
## 3  2020      7045.
## 4  2021      7135.
## 5  2022      7224.
## 6  2023      7306.

# Merging existng and current data
libya_current_population <- libya |>
  bind_rows(libya_current)
libya_current_population

## # A tsibble: 64 x 2 [1Y]
##     Year Population
##    <dbl>      <dbl>
##  1  1960      1448.
##  2  1961      1498.
##  3  1962      1551.
##  4  1963      1607.
##  5  1964      1668.
##  6  1965      1733.
##  7  1966      1804.
##  8  1967      1879.
##  9  1968      1959.
## 10  1969      2044.
## # ℹ 54 more rows

# Producing a forecast using mean and naive methods
libya_model_current <- libya_current_population |>
  model(
    Mean = MEAN(Population),
    `Naïve` = NAIVE(Population)
  )
libya_model_current

## # A mable: 1 x 2
##      Mean   Naïve
##   <model> <model>
## 1  <MEAN> <NAIVE>

# Generate forecasts for 2 years
libya_forecast_current <- libya_model_current |>
  forecast(h = "2 years")
libya_forecast_current

## # A fable: 4 x 4 [1Y]
## # Key:     .model [2]
##   .model  Year       Population
##   <chr>  <dbl>           <dist>
## 1 Mean    2024 N(4372, 3265851)
## 2 Mean    2025 N(4372, 3265851)
## 3 Naïve   2024   N(7306, 11882)
## 4 Naïve   2025   N(7306, 23763)
## # ℹ 1 more variable: .mean <dbl>

# Plot the forecasts
libya_forecast_current |>
  autoplot(libya_current_population) +
  labs(title = "Libya's Population Forecast for 2 Years" , y = "Population ('000)")

# Table Predictor Intervals
libya_predictor <- libya_forecast_current |>
  hilo() |>
  unpack_hilo(c(`80%`, `95%`))
libya_predictor

## # A tsibble: 4 x 8 [1Y]
## # Key:       .model [2]
##   .model  Year       Population
##   <chr>  <dbl>           <dist>
## 1 Mean    2024 N(4372, 3265851)
## 2 Mean    2025 N(4372, 3265851)
## 3 Naïve   2024   N(7306, 11882)
## 4 Naïve   2025   N(7306, 23763)
## # ℹ 5 more variables: .mean <dbl>, `80%_lower` <dbl>, `80%_upper` <dbl>,
## #   `95%_lower` <dbl>, `95%_upper` <dbl>

print(libya_predictor, n = Inf, width = Inf)

## # A tsibble: 4 x 8 [1Y]
## # Key:       .model [2]
##   .model  Year       Population .mean `80%_lower` `80%_upper` `95%_lower`
##   <chr>  <dbl>           <dist> <dbl>       <dbl>       <dbl>       <dbl>
## 1 Mean    2024 N(4372, 3265851) 4372.       2056.       6687.        830.
## 2 Mean    2025 N(4372, 3265851) 4372.       2056.       6687.        830.
## 3 Naïve   2024   N(7306, 11882) 7306.       7166.       7445.       7092.
## 4 Naïve   2025   N(7306, 23763) 7306.       7108.       7503.       7004.
##   `95%_upper`
##         <dbl>
## 1       7913.
## 2       7913.
## 3       7519.
## 4       7608.

Question 9: Electricity Demand for Vistoria, Australia

# Extracting 2012 data and aggregating demand and temperatures
daily_vic_elec <- vic_elec |>
  filter(year(Time) == 2012) |>
  index_by(Date = as_date(Time)) |>
  summarise(
    Demand = sum(Demand),
    Average_Temperature = mean(Temperature)
  )
daily_vic_elec

## # A tsibble: 366 x 3 [1D]
##    Date        Demand Average_Temperature
##    <date>       <dbl>               <dbl>
##  1 2012-01-01 222438.                25.3
##  2 2012-01-02 257965.                30.7
##  3 2012-01-03 267099.                26.5
##  4 2012-01-04 222742.                21.0
##  5 2012-01-05 210585.                17.5
##  6 2012-01-06 210247.                18.7
##  7 2012-01-07 202526.                22.8
##  8 2012-01-08 193413.                22.3
##  9 2012-01-09 213804.                19.3
## 10 2012-01-10 215020.                17.1
## # ℹ 356 more rows

# Plotting the relationship using a scatter plot and a fitted regression line
daily_vic_elec |> ggplot(aes(x = Average_Temperature, y= Demand)) +
  geom_point() +
  labs(title = "2012 Electricity Demand vs. Average Temperature", x = "Average Temperature") +
  geom_smooth(method = "lm", se = FALSE)

## `geom_smooth()` using formula = 'y ~ x'

# Finding a linear regression
daily_elect_model <- daily_vic_elec |>
  model(TSLM(Demand ~ Average_Temperature)) |>
  report()

## Series: Demand 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -65753.6 -13799.4   -501.8  19007.6  70883.5 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         232130.9     4432.0  52.376   <2e-16 ***
## Average_Temperature   -300.3      266.4  -1.127     0.26    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24080 on 364 degrees of freedom
## Multiple R-squared: 0.00348, Adjusted R-squared: 0.0007423
## F-statistic: 1.271 on 1 and 364 DF, p-value: 0.26029

# Producing a residual plot
augment(daily_elect_model) |>
  ggplot(aes(x = .fitted, y = .innov)) +
  geom_point() + labs(x = "Fitted", y = "Residuals")

# The innovation residuals seems to have a curve. This suggests that the relationship is non-linear.