class 4 - may 16th

R Markdown

For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button a document will be generated..

step 1: grab the file

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.2

## Warning: package 'tibble' was built under R version 4.3.3

## Warning: package 'dplyr' was built under R version 4.3.3

## Warning: package 'tidyr' was built under R version 4.3.3

## Warning: package 'lubridate' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

us_change

## # A tsibble: 198 x 6 [1Q]
##    Quarter Consumption Income Production Savings Unemployment
##      <qtr>       <dbl>  <dbl>      <dbl>   <dbl>        <dbl>
##  1 1970 Q1       0.619  1.04      -2.45    5.30         0.9  
##  2 1970 Q2       0.452  1.23      -0.551   7.79         0.5  
##  3 1970 Q3       0.873  1.59      -0.359   7.40         0.5  
##  4 1970 Q4      -0.272 -0.240     -2.19    1.17         0.700
##  5 1971 Q1       1.90   1.98       1.91    3.54        -0.100
##  6 1971 Q2       0.915  1.45       0.902   5.87        -0.100
##  7 1971 Q3       0.794  0.521      0.308  -0.406        0.100
##  8 1971 Q4       1.65   1.16       2.29   -1.49         0    
##  9 1972 Q1       1.31   0.457      4.15   -4.29        -0.200
## 10 1972 Q2       1.89   1.03       1.89   -4.69        -0.100
## # ℹ 188 more rows

tail(us_change)

## # A tsibble: 6 x 6 [1Q]
##   Quarter Consumption Income Production Savings Unemployment
##     <qtr>       <dbl>  <dbl>      <dbl>   <dbl>        <dbl>
## 1 2018 Q1       0.417  1.67       0.564 17.4          -0.100
## 2 2018 Q2       0.983  0.662      1.12  -2.72          0    
## 3 2018 Q3       0.853  0.806      1.26  -0.0857       -0.300
## 4 2018 Q4       0.357  0.695      0.948  5.03          0.200
## 5 2019 Q1       0.283  1.10      -0.488  9.76         -0.100
## 6 2019 Q2       1.11   0.593     -0.540 -4.26         -0.100

# ?us_change

step 2: Create a plot between the independent variable and the predictor consumption-y and income-x

us_change |>
  ggplot(aes(x = Income, y = Consumption)) +
  labs(y = "Consumption (quarterly % change)",
       x = "Income (quarterly % change)") +
geom_point() + 
# plot the regression line
geom_smooth(method = "lm")

## `geom_smooth()` using formula = 'y ~ x'

step 3: model with one predictor

us_change |>
  model(TSLM(Consumption ~ Income)) |>
  report()

## Series: Consumption 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58236 -0.27777  0.01862  0.32330  1.42229 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.54454    0.05403  10.079  < 2e-16 ***
## Income       0.27183    0.04673   5.817  2.4e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5905 on 196 degrees of freedom
## Multiple R-squared: 0.1472,  Adjusted R-squared: 0.1429
## F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08

from above we see that the p-values for intercept and Income are very small and therefore significant so there is a linear relationship. In statistics, adjusted R-squared should be as close to 100% as possible and the F-statistic should be as high as possible

step 4: model TSLM with all predictors TSLM is used to fit linear models to time series data and incorporates trend and seasonality

us_change |>
  model(TSLM(Consumption ~ Income + Production + Savings + Unemployment)) |>
  report()

## Series: Consumption 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90555 -0.15821 -0.03608  0.13618  1.15471 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.253105   0.034470   7.343 5.71e-12 ***
## Income        0.740583   0.040115  18.461  < 2e-16 ***
## Production    0.047173   0.023142   2.038   0.0429 *  
## Savings      -0.052890   0.002924 -18.088  < 2e-16 ***
## Unemployment -0.174685   0.095511  -1.829   0.0689 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3102 on 193 degrees of freedom
## Multiple R-squared: 0.7683,  Adjusted R-squared: 0.7635
## F-statistic:   160 on 4 and 193 DF, p-value: < 2.22e-16

from above we see that the first 4 are significant. However, Production is significant only at 95%

step 5: different way of coding step above -save model in object

fit_consMR <- us_change |>
  model(tslm = TSLM(Consumption ~ Income + Production + 
                      Savings + Unemployment))
report(fit_consMR)

## Series: Consumption 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90555 -0.15821 -0.03608  0.13618  1.15471 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.253105   0.034470   7.343 5.71e-12 ***
## Income        0.740583   0.040115  18.461  < 2e-16 ***
## Production    0.047173   0.023142   2.038   0.0429 *  
## Savings      -0.052890   0.002924 -18.088  < 2e-16 ***
## Unemployment -0.174685   0.095511  -1.829   0.0689 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3102 on 193 degrees of freedom
## Multiple R-squared: 0.7683,  Adjusted R-squared: 0.7635
## F-statistic:   160 on 4 and 193 DF, p-value: < 2.22e-16

step 6: Augment function shows details of model

augment(fit_consMR) |> head()

## # A tsibble: 6 x 6 [1Q]
## # Key:       .model [1]
##   .model Quarter Consumption .fitted  .resid  .innov
##   <chr>    <qtr>       <dbl>   <dbl>   <dbl>   <dbl>
## 1 tslm   1970 Q1       0.619   0.474  0.145   0.145 
## 2 tslm   1970 Q2       0.452   0.635 -0.183  -0.183 
## 3 tslm   1970 Q3       0.873   0.931 -0.0583 -0.0583
## 4 tslm   1970 Q4      -0.272  -0.212 -0.0603 -0.0603
## 5 tslm   1971 Q1       1.90    1.64   0.264   0.264 
## 6 tslm   1971 Q2       0.915   1.07  -0.158  -0.158

step 7: a) do residual plots, b) do against predictors, c) against fitted value

answer 1st plot - residuals over time 2nd plot - ACF and is pretty good - only one over limit so there is no strong correlation to residuals 3rd plot - residuals close to normal distribution and centred in middle

fit_consMR |>
  gg_tsresiduals()

step 7b: plot residuals against each predictor we see that the residuals focus most near zero - which is good and they seem random and no strong correlations y is the residual

us_change |>
  # join the residual to the original data
  left_join(residuals(fit_consMR), by = "Quarter") |> 
  pivot_longer(Income:Unemployment,
               names_to = "regressor", values_to = "x") |> 
  ggplot(aes(x = x, y = .resid)) +
  geom_point() + 
  facet_wrap(. ~ regressor, scales = "free_x") +
  labs(y = "Residuals", x = "")

step 7c: plot residuals against fitted value using augment function it seems like a random plot which is what we want

augment(fit_consMR) |>
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() + 
  labs(x = "Fitted", y = "Residuals")

measures of predictive accuracy are: 1 Adjusted R=Squared 2 CV - cross-validation…compute MSE from errors (higher better) 3 AIC - Akaike’s information criterion (higher better) 4 AICc - corrected AIC 5 BIC - Schwarz’s Bayesian Information Criterion

Adj Rsq is 76% which is good

Cross validation fits the model using a subset (-1 obs) of the full dataset, it is repeated over and over and we collect the errors MSE is computed using the errors -> CV

AIC - goodness of fit

use either AIC, AICc or CV

glance(fit_consMR) |>
  select(adj_r_squared, CV, AIC, AICc, BIC)

## # A tibble: 1 × 5
##   adj_r_squared    CV   AIC  AICc   BIC
##           <dbl> <dbl> <dbl> <dbl> <dbl>
## 1         0.763 0.104 -457. -456. -437.

Show a scatterplot matrix of the correlations of the ‘x’ values with each other

library(ggplot2)
library(GGally)

## Warning: package 'GGally' was built under R version 4.3.3

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

ggpairs(us_change, columns = 2:6)

forecast types: ex-ante: predictors must be forecasted first to predict outcomes ex-post: predictor future values are known. Good for evaluating model perform.. scenario analysis: a form of ex-ante forecasting

class 4 - may 16th

2025-05-16

R Markdown