For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button a document will be generated..
step 1: grab the file
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 3.5.2
## Warning: package 'tibble' was built under R version 4.3.3
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'tidyr' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'tsibbledata' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
us_change
## # A tsibble: 198 x 6 [1Q]
## Quarter Consumption Income Production Savings Unemployment
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1970 Q1 0.619 1.04 -2.45 5.30 0.9
## 2 1970 Q2 0.452 1.23 -0.551 7.79 0.5
## 3 1970 Q3 0.873 1.59 -0.359 7.40 0.5
## 4 1970 Q4 -0.272 -0.240 -2.19 1.17 0.700
## 5 1971 Q1 1.90 1.98 1.91 3.54 -0.100
## 6 1971 Q2 0.915 1.45 0.902 5.87 -0.100
## 7 1971 Q3 0.794 0.521 0.308 -0.406 0.100
## 8 1971 Q4 1.65 1.16 2.29 -1.49 0
## 9 1972 Q1 1.31 0.457 4.15 -4.29 -0.200
## 10 1972 Q2 1.89 1.03 1.89 -4.69 -0.100
## # ℹ 188 more rows
tail(us_change)
## # A tsibble: 6 x 6 [1Q]
## Quarter Consumption Income Production Savings Unemployment
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2018 Q1 0.417 1.67 0.564 17.4 -0.100
## 2 2018 Q2 0.983 0.662 1.12 -2.72 0
## 3 2018 Q3 0.853 0.806 1.26 -0.0857 -0.300
## 4 2018 Q4 0.357 0.695 0.948 5.03 0.200
## 5 2019 Q1 0.283 1.10 -0.488 9.76 -0.100
## 6 2019 Q2 1.11 0.593 -0.540 -4.26 -0.100
# ?us_change
step 2: Create a plot between the independent variable and the predictor consumption-y and income-x
us_change |>
ggplot(aes(x = Income, y = Consumption)) +
labs(y = "Consumption (quarterly % change)",
x = "Income (quarterly % change)") +
geom_point() +
# plot the regression line
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
step 3: model with one predictor
us_change |>
model(TSLM(Consumption ~ Income)) |>
report()
## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.58236 -0.27777 0.01862 0.32330 1.42229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.54454 0.05403 10.079 < 2e-16 ***
## Income 0.27183 0.04673 5.817 2.4e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5905 on 196 degrees of freedom
## Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
## F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08
from above we see that the p-values for intercept and Income are very small and therefore significant so there is a linear relationship. In statistics, adjusted R-squared should be as close to 100% as possible and the F-statistic should be as high as possible
step 4: model TSLM with all predictors TSLM is used to fit linear models to time series data and incorporates trend and seasonality
us_change |>
model(TSLM(Consumption ~ Income + Production + Savings + Unemployment)) |>
report()
## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90555 -0.15821 -0.03608 0.13618 1.15471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.253105 0.034470 7.343 5.71e-12 ***
## Income 0.740583 0.040115 18.461 < 2e-16 ***
## Production 0.047173 0.023142 2.038 0.0429 *
## Savings -0.052890 0.002924 -18.088 < 2e-16 ***
## Unemployment -0.174685 0.095511 -1.829 0.0689 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3102 on 193 degrees of freedom
## Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
## F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
from above we see that the first 4 are significant. However, Production is significant only at 95%
step 5: different way of coding step above -save model in object
fit_consMR <- us_change |>
model(tslm = TSLM(Consumption ~ Income + Production +
Savings + Unemployment))
report(fit_consMR)
## Series: Consumption
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90555 -0.15821 -0.03608 0.13618 1.15471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.253105 0.034470 7.343 5.71e-12 ***
## Income 0.740583 0.040115 18.461 < 2e-16 ***
## Production 0.047173 0.023142 2.038 0.0429 *
## Savings -0.052890 0.002924 -18.088 < 2e-16 ***
## Unemployment -0.174685 0.095511 -1.829 0.0689 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3102 on 193 degrees of freedom
## Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
## F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
step 6: Augment function shows details of model
augment(fit_consMR) |> head()
## # A tsibble: 6 x 6 [1Q]
## # Key: .model [1]
## .model Quarter Consumption .fitted .resid .innov
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl>
## 1 tslm 1970 Q1 0.619 0.474 0.145 0.145
## 2 tslm 1970 Q2 0.452 0.635 -0.183 -0.183
## 3 tslm 1970 Q3 0.873 0.931 -0.0583 -0.0583
## 4 tslm 1970 Q4 -0.272 -0.212 -0.0603 -0.0603
## 5 tslm 1971 Q1 1.90 1.64 0.264 0.264
## 6 tslm 1971 Q2 0.915 1.07 -0.158 -0.158
step 7: a) do residual plots, b) do against predictors, c) against fitted value
fit_consMR |>
gg_tsresiduals()
step 7b: plot residuals against each predictor we see that the residuals focus most near zero - which is good and they seem random and no strong correlations y is the residual
us_change |>
# join the residual to the original data
left_join(residuals(fit_consMR), by = "Quarter") |>
pivot_longer(Income:Unemployment,
names_to = "regressor", values_to = "x") |>
ggplot(aes(x = x, y = .resid)) +
geom_point() +
facet_wrap(. ~ regressor, scales = "free_x") +
labs(y = "Residuals", x = "")
step 7c: plot residuals against fitted value using augment function it seems like a random plot which is what we want
augment(fit_consMR) |>
ggplot(aes(x = .fitted, y = .resid)) +
geom_point() +
labs(x = "Fitted", y = "Residuals")
measures of predictive accuracy are: 1 Adjusted R=Squared 2 CV -
cross-validation…compute MSE from errors (higher better) 3 AIC -
Akaike’s information criterion (higher better) 4 AICc - corrected AIC 5
BIC - Schwarz’s Bayesian Information Criterion
Adj Rsq is 76% which is good
Cross validation fits the model using a subset (-1 obs) of the full dataset, it is repeated over and over and we collect the errors MSE is computed using the errors -> CV
AIC - goodness of fit
use either AIC, AICc or CV
glance(fit_consMR) |>
select(adj_r_squared, CV, AIC, AICc, BIC)
## # A tibble: 1 × 5
## adj_r_squared CV AIC AICc BIC
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.763 0.104 -457. -456. -437.
Show a scatterplot matrix of the correlations of the ‘x’ values with each other
library(ggplot2)
library(GGally)
## Warning: package 'GGally' was built under R version 4.3.3
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
ggpairs(us_change, columns = 2:6)
forecast types: ex-ante: predictors must be forecasted first to predict outcomes ex-post: predictor future values are known. Good for evaluating model perform.. scenario analysis: a form of ex-ante forecasting