Fable Modelling & Accuracy Metrics

Author

AS

Core Modelling Pipeline

The core modeling pipeline in the fable framework follows four tidy steps:

  1. Declare the tsibble
  2. Fit models with model()
  3. Generate forecasts with forecast()
  4. Visualize with autoplot()

This workflow is designed to align with the tidyverse philosophy — making it intuitive, scalable, and easy to integrate with tools like ggplot2, dplyr, and patchwork.
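
As a minimal sketch of these four steps, the example below runs the whole pipeline on a small synthetic monthly series. The data, the object names (toy, fit, fc), and the choice of NAIVE() are illustrative assumptions and are not part of the analysis in this document.

```r
library(tsibble)   # yearmonth(), as_tsibble()
library(fable)     # NAIVE(); also attaches fabletools for model() and forecast()
library(ggplot2)   # autoplot() generic

set.seed(1)

# Synthetic monthly series (illustrative only, not the FEDFUNDS data)
toy <- tibble::tibble(
  Month = yearmonth("2015 Jan") + 0:59,
  value = 10 + cumsum(rnorm(60))
) |>
  as_tsibble(index = Month)                 # 1. declare the tsibble

fit <- toy |> model(naive = NAIVE(value))   # 2. fit a model
fc  <- fit |> forecast(h = 12)              # 3. forecast 12 months ahead
autoplot(fc, toy)                           # 4. plot forecasts over the history
```

Swapping NAIVE() for another model function, or adding more named models inside model(), leaves the other three steps unchanged.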

Setup

Load Libraries

  • When you run the code, hide the startup output and warnings for the packages below (for cleaner presentation). Do read the warnings first, though, and take any action they call for.

    • Make sure you read up on the syntax of each package.
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
✔ tibble      3.3.0     ✔ tsibble     1.1.6
✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.1     ✔ feasts      0.4.2
✔ lubridate   1.9.4     ✔ fable       0.4.1
✔ ggplot2     3.5.2     
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats 1.0.0     ✔ readr   2.1.5
✔ purrr   1.1.0     ✔ stringr 1.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()     masks stats::filter()
✖ tsibble::interval() masks lubridate::interval()
✖ dplyr::lag()        masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

Import Data

remove(list=ls())

fredr_set_key("YOUR_FRED_API_KEY") # placeholder: replace with your own FRED API key

?fredr
myts <- fredr("FEDFUNDS",
              observation_start = as.Date("1980-01-01"),
              observation_end   = as.Date("2024-12-01")
              ) |>
  transmute(Month = yearmonth(date), value) |>
  as_tsibble(index = Month)

glimpse(myts)
Rows: 540
Columns: 2
$ Month <mth> 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 1980 May, 1980 Jun, 1980…
$ value <dbl> 13.82, 14.13, 17.19, 17.61, 10.98, 9.47, 9.03, 9.61, 10.87, 12.8…
str(myts)
tbl_ts [540 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
 $ Month: mth [1:540] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 1980 May, 1980 Jun...
 $ value: num [1:540] 13.8 14.1 17.2 17.6 11 ...
 - attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  ..$ .rows: list<int> [1:1] 
  .. ..$ : int [1:540] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..@ ptype: int(0) 
 - attr(*, "index")= chr "Month"
  ..- attr(*, "ordered")= logi TRUE
 - attr(*, "index2")= chr "Month"
 - attr(*, "interval")= interval [1:1] 1M
  ..@ .regular: logi TRUE
  • Try experimenting with the frequency and aggregation_method arguments of fredr().
Tip

Month is the tsibble index.

Split Data

Why Split Data into Train and Test Sets?

  1. To Evaluate Out-of-Sample Performance (Generalization Ability):

    • The training set is used to fit the model — it learns patterns from this data.

    • The test set is held out to simulate unseen, future data.

      This helps assess whether the model truly learns meaningful patterns or is just overfitting to noise in the training data.

  2. To Mimic Real-World Forecasting Scenarios:

    • In practice, we forecast the future based on the past.

    • A time-based split (like 80% train / 20% test) replicates this real-world condition by ensuring the test data occurs after the training data in time — which is especially crucial in time series forecasting.

There are always multiple ways to do the same thing.

Method 1

train <- myts |> filter_index(~"2015 Dec")
test  <- myts |> filter_index("2016 Jan"~.)

Method 2

# Calculate split index (80% of 540 = 432)
split_index <- floor(0.8 * nrow(myts))  # = 432

# Create training and test sets
train <- myts[1:split_index, ]
test  <- myts[(split_index + 1):nrow(myts), ]
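
The two methods give the same split here because 432 months starting in January 1980 end in December 2015. The sketch below checks that arithmetic with base R only; the names n_total, months, last_train, first_test, and n_test are illustrative and do not appear in the code above.

```r
# Base R only: check that an 80% split of 540 monthly rows ends at 2015 Dec
n_total     <- 540                      # rows in myts, as reported by glimpse()
split_index <- floor(0.8 * n_total)     # 432 training rows

# Month starts from 1980 Jan, one per row
months <- seq(as.Date("1980-01-01"), by = "month", length.out = n_total)

last_train <- format(months[split_index], "%Y-%m")       # "2015-12"
first_test <- format(months[split_index + 1], "%Y-%m")   # "2016-01"
n_test     <- n_total - split_index                      # 108 test rows
```

With a different start date or series length the two methods would generally not coincide, so it is worth checking the boundary month whenever the proportion-based split is used.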

Either way, describe your train and test data.

head(train, 1)
# A tsibble: 1 x 2 [1M]
     Month value
     <mth> <dbl>
1 1980 Jan  13.8
tail(x = train, n = 1) # Show last month of training set
# A tsibble: 1 x 2 [1M]
     Month value
     <mth> <dbl>
1 2015 Dec  0.24
head(x = test, n = 1) # Show first month of test set
# A tsibble: 1 x 2 [1M]
     Month value
     <mth> <dbl>
1 2016 Jan  0.34
tail(test, 1)
# A tsibble: 1 x 2 [1M]
     Month value
     <mth> <dbl>
1 2024 Dec  4.48

For example: the training data contains monthly observations from January 1980 to December 2015, while the test data contains monthly observations from January 2016 to December 2024.

Forecasting

We will use the fable framework for forecasting.

Package      Purpose
tsibble      A tidy data structure for time series (like data.frame, but with time-awareness)
feasts       Feature extraction and visualization (seasonality, autocorrelation, etc.)
fable        Forecasting models and methods (e.g., ARIMA, ETS, NAIVE, etc.)
fabletools   Common tools and infrastructure for building forecasting models

tsibble

A tsibble (short for tidy temporal tibble) is a time series-aware data frame from the tsibble package, which underpins the fable forecasting framework. It builds on the tibble structure but adds time-related features that make working with time series more natural and powerful in the tidyverse style.

Key Features of a tsibble

Feature       Description
Index         A time variable (e.g., Month, Date, Year) that defines time order
Key           One or more variables identifying multiple series (e.g., by country)
Regularity    Can detect whether data is regular (monthly, quarterly, etc.)
Tidy Format   Fully compatible with dplyr, ggplot2, and other tidyverse tools

Common tsibble Functions

Function       Use
as_tsibble()   Convert data to tsibble format
index_by()     Aggregate to a different time unit
fill_gaps()    Fill missing time periods
has_gaps()     Check for missing intervals
interval()     Report frequency (e.g., monthly)
Important

Before modeling, your time series object train must be declared as a tsibble (the tidy temporal data structure used by fable).
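
The following sketch shows what declaring a tsibble and checking its regularity can look like. The data frame raw and its values are hypothetical, with April 2020 left out on purpose so that the gap functions have something to report.

```r
library(tsibble)

# Hypothetical monthly data with 2020 Apr deliberately missing
raw <- tibble::tibble(
  Month = yearmonth(c("2020 Jan", "2020 Feb", "2020 Mar", "2020 May", "2020 Jun")),
  value = c(1.2, 1.4, 1.1, 1.6, 1.5)
)

# Declare the tsibble: Month is the index, and there is no key (a single series)
ts_raw <- as_tsibble(raw, index = Month)

has_gaps(ts_raw)               # one-row tibble whose .gaps column flags the missing month
ts_full <- fill_gaps(ts_raw)   # inserts 2020 Apr with value = NA
interval(ts_full)              # reports the monthly interval
```

Filling gaps explicitly matters because many model functions expect a regular series with no implicit missing periods.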

feasts

The feasts package (Feature Extraction And Statistics for Time Series) is part of the fpp3 framework and provides tools for exploratory data analysis (EDA) and feature extraction for time series models.

It works seamlessly with tsibble objects and focuses on visualizing, decomposing, and extracting features from time series data. These tools are especially useful before modeling, helping you understand structure, seasonality, trends, outliers, and autocorrelation.

Key Features of feasts

Feature                              Description
Time series decomposition            STL() for seasonal-trend decomposition
Visual diagnostics                   Built-in autoplot() support for ACF, decompositions, and more
Statistical summaries                Extract summaries like trend strength, seasonal strength, entropy
Fourier terms                        Complex seasonality can be modeled with the fourier() special, which is used inside fable model formulas rather than provided by feasts
Spectral and autocorrelation tools   Easily compute and visualize ACF, PACF, and periodograms

Common feasts Functions

Function         Purpose
STL()            Seasonal-Trend decomposition using Loess
ACF() / PACF()   Autocorrelation and partial autocorrelation functions
features()       Extracts numeric summaries/statistics for modeling or clustering
fourier()        Adds seasonal Fourier terms as predictors (a special used inside fable model formulas such as TSLM(), listed here for reference; it is not a feasts function)
gg_season()      Seasonal plots using ggplot2
gg_subseries()   Subseries plots (seasonal slices over time)

When to Use feasts

Use feasts when:

  • You want to visualize patterns before modeling

  • You want to diagnose structure (seasonality, trend, irregularity)

  • You’re preparing a dataset for model comparison or ML feature selection

feasts is focused on feature discovery and diagnostic plotting, complementing the modeling tools provided by fable.
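
As a small illustration of feature extraction and diagnostic plotting, the sketch below computes STL-based strength measures and an ACF plot for a synthetic series with a known trend and yearly cycle. The series toy and its parameters are assumptions made for the example.

```r
library(tsibble)
library(feasts)    # feat_stl(), ACF(); also attaches fabletools for features()
library(ggplot2)   # autoplot() generic

set.seed(42)
k <- 0:119   # 120 months

# Synthetic series: linear trend + yearly sine cycle + noise (illustrative only)
toy <- tibble::tibble(
  Month = yearmonth("2010 Jan") + k,
  value = 0.05 * k + 2 * sin(2 * pi * k / 12) + rnorm(120, sd = 0.3)
) |>
  as_tsibble(index = Month)

# Numeric summaries derived from an STL decomposition
feats <- toy |> features(value, feat_stl)
feats[, c("trend_strength", "seasonal_strength_year")]

# Autocorrelation plot for the first 24 lags
toy |> ACF(value, lag_max = 24) |> autoplot()
```

Strength values close to 1 indicate a pronounced trend or seasonal pattern, which can guide the choice of model terms later.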

Decomposition

?decomposition_model # fabletools
?STL # feasts
?ETS # fable


mods <- train |>
  model(
    additive = decomposition_model(
      STL(formula = value ~ season(window="periodic")),
      ETS(formula = season_adjust ~ error("A") + trend("Ad") + season("N"))
    ),
    multiplicative = decomposition_model(
      dcmp = STL(formula = log(value) ~ season(window = "periodic")),
      ETS(formula = season_adjust)
    )
  )

mods
# A mable: 1 x 2
                   additive            multiplicative
                    <model>                   <model>
1 <STL decomposition model> <STL decomposition model>
typeof(mods)
[1] "list"
glimpse(mods)
Rows: 1
Columns: 2
$ additive       <model> [STL decomposition model]
$ multiplicative <model> [STL decomposition model]

If you’re using model pipelines (e.g., with fable or caret), many outputs are stored as nested lists. See Section 7.1 for more information. Here’s how you might extract model components:

Goal                  Code
Full model            mods$multiplicative[[1]]
Internal ETS model    mods$multiplicative[[1]]$fit$e1$fit
Parameter estimates   mods$multiplicative[[1]]$fit$e1$fit$par
Fitted + residuals    mods$multiplicative[[1]]$fit$e1$fit$est
State estimates       mods$multiplicative[[1]]$fit$e1$fit$states
Accuracy measures     mods$multiplicative[[1]]$fit$e1$fit$fit
Model specification   mods$multiplicative[[1]]$fit$e1$fit$spec
str(mods) # skim
mdl_df [1 × 2] (S3: mdl_df/tbl_df/tbl/data.frame)
 $ additive      : lst_mdl [1:1] 
  ..$ :List of 5
  .. ..$ fit           :List of 2
  .. .. ..$ e1:List of 5
  .. .. .. ..$ fit           :List of 5
  .. .. .. .. ..$ par   : tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ term    : chr [1:5] "alpha" "beta" "phi" "l" ...
  .. .. .. .. .. ..$ estimate: num [1:5] 1 0.702 0.8 14.831 0.425
  .. .. .. .. ..$ est   : tbl_ts [432 × 4] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ Month        : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, ...
  .. .. .. .. .. ..$ season_adjust: num [1:432] 13.8 14.2 17.1 17.5 11 ...
  .. .. .. .. .. ..$ .fitted      : num [1:432] 15.2 13.3 14.3 18.8 18.1 ...
  .. .. .. .. .. ..$ .resid       : num [1:432] -1.355 0.849 2.873 -1.318 -7.145 ...
  .. .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. .. ..$ fit   : tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ sigma2 : num 0.334
  .. .. .. .. .. ..$ log_lik: num -1071
  .. .. .. .. .. ..$ AIC    : num 2155
  .. .. .. .. .. ..$ AICc   : num 2155
  .. .. .. .. .. ..$ BIC    : num 2179
  .. .. .. .. .. ..$ MSE    : num 0.33
  .. .. .. .. .. ..$ AMSE   : num 1.38
  .. .. .. .. .. ..$ MAE    : num 0.24
  .. .. .. .. ..$ states: tbl_ts [433 × 3] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ Month: mth [1:433] 1979 Dec, 1980 Jan, 1980 Feb, 1980 Mar, ...
  .. .. .. .. .. ..$ l    : num [1:433] 14.8 13.8 14.2 17.1 17.5 ...
  .. .. .. .. .. ..$ b    : num [1:433] 0.425 -0.61 0.108 2.103 0.757 ...
  .. .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. .. ..$ : int [1:433] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. .. ..$ spec  : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ errortype : chr "A"
  .. .. .. .. .. ..$ trendtype : chr "A"
  .. .. .. .. .. ..$ seasontype: chr "N"
  .. .. .. .. .. ..$ damped    : logi TRUE
  .. .. .. .. .. ..$ period    : num 12
  .. .. .. .. .. ..- attr(*, "out.attrs")=List of 2
  .. .. .. .. .. .. ..$ dim     : Named int [1:3] 1 1 1
  .. .. .. .. .. .. .. ..- attr(*, "names")= chr [1:3] "errortype" "trendtype" "seasontype"
  .. .. .. .. .. .. ..$ dimnames:List of 3
  .. .. .. .. .. .. .. ..$ errortype : chr "errortype=A"
  .. .. .. .. .. .. .. ..$ trendtype : chr "trendtype=Ad"
  .. .. .. .. .. .. .. ..$ seasontype: chr "seasontype=N"
  .. .. .. .. ..- attr(*, "class")= chr "ETS"
  .. .. .. ..$ model         :Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: formula
    initialize: function (formula, ..., .env) 
    model: ETS
    origin: 1980 Jan
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, specials, opt_crit, nmse, bounds, ic, restrict = TRUE,  
  .. .. .. ..$ data          : tbl_ts [432 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. ..$ Month        : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 198...
  .. .. .. .. ..$ season_adjust: num [1:432] 13.8 14.2 17.1 17.5 11 ...
  .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. ..$ response      :List of 1
  .. .. .. .. ..$ : symbol season_adjust
  .. .. .. ..$ transformation:List of 1
  .. .. .. .. ..$ :function (season_adjust)  
  .. .. .. .. .. ..- attr(*, "class")= chr "transformation"
  .. .. .. .. .. ..- attr(*, "inverse")=function (season_adjust)  
  .. .. .. ..- attr(*, "class")= chr "mdl_ts"
  .. .. ..$ e2:List of 5
  .. .. .. ..$ fit           :List of 8
  .. .. .. .. ..$ b      : num(0) 
  .. .. .. .. ..$ b.se   : num(0) 
  .. .. .. .. ..$ lag    : num 12
  .. .. .. .. ..$ sigma2 : num 0
  .. .. .. .. ..$ .fitted: num [1:432] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. ..$ .resid : num [1:432] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. ..$ time   :List of 2
  .. .. .. .. .. ..$ start   : mth [1:1] 1980 Jan
  .. .. .. .. .. ..$ interval: interval [1:1] 1M
  .. .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. .. ..$ future : num [1:12] 0.00372 -0.04728 0.05339 0.10973 0.01885 ...
  .. .. .. .. ..- attr(*, "class")= chr "RW"
  .. .. .. ..$ model         :Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: formula
    initialize: function (formula, ..., .env) 
    model: RW
    origin: 1980 Jan
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, specials, ...)  
  .. .. .. ..$ data          : tbl_ts [432 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. ..$ Month      : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 198...
  .. .. .. .. ..$ season_year: num [1:432] 0.00372 -0.04728 0.05339 0.10973 0.01885 ...
  .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. ..$ response      :List of 1
  .. .. .. .. ..$ : symbol season_year
  .. .. .. ..$ transformation:List of 1
  .. .. .. .. ..$ :function (season_year)  
  .. .. .. .. .. ..- attr(*, "class")= chr "transformation"
  .. .. .. .. .. ..- attr(*, "inverse")=function (season_year)  
  .. .. .. ..- attr(*, "class")= chr "mdl_ts"
  .. .. ..- attr(*, "combination")= language e1 + e2
  .. .. ..- attr(*, "class")= chr [1:2] "decomposition_model" "model_combination"
  .. .. ..- attr(*, "dcmp_method")= chr "STL"
  .. ..$ model         :Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: formula
    initialize: function (formula, ..., .env) 
    model: dcmp_mdl
    origin: 1980 Jan
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, specials, ..., dcmp)  
  .. ..$ data          : tbl_ts [432 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. ..$ Month: mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 1980 May,...
  .. .. ..$ value: num [1:432] 13.8 14.1 17.2 17.6 11 ...
  .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. ..$ .rows:List of 1
  .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. ..- attr(*, "index2")= chr "Month"
  .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. ..@ .regular: logi TRUE
  .. ..$ response      :List of 1
  .. .. ..$ : symbol value
  .. ..$ transformation:List of 1
  .. .. ..$ :function (value)  
  .. .. .. ..- attr(*, "class")= chr "transformation"
  .. .. .. ..- attr(*, "inverse")=function (value)  
  .. ..- attr(*, "class")= chr "mdl_ts"
 $ multiplicative: lst_mdl [1:1] 
  ..$ :List of 5
  .. ..$ fit           :List of 2
  .. .. ..$ e1:List of 5
  .. .. .. ..$ fit           :List of 5
  .. .. .. .. ..$ par   : tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ term    : chr [1:5] "alpha" "beta" "phi" "l" ...
  .. .. .. .. .. ..$ estimate: num [1:5] 1 0.215 0.802 3.247 -0.814
  .. .. .. .. ..$ est   : tbl_ts [432 × 4] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ Month        : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, ...
  .. .. .. .. .. ..$ season_adjust: num [1:432] 2.67 2.66 2.84 2.85 2.38 ...
  .. .. .. .. .. ..$ .fitted      : num [1:432] 2.59 2.16 2.34 2.67 2.75 ...
  .. .. .. .. .. ..$ .resid       : num [1:432] 0.0724 0.5082 0.4964 0.1869 -0.3643 ...
  .. .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. .. ..$ fit   : tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ sigma2 : num 0.0128
  .. .. .. .. .. ..$ log_lik: num -367
  .. .. .. .. .. ..$ AIC    : num 745
  .. .. .. .. .. ..$ AICc   : num 745
  .. .. .. .. .. ..$ BIC    : num 769
  .. .. .. .. .. ..$ MSE    : num 0.0126
  .. .. .. .. .. ..$ AMSE   : num 0.0387
  .. .. .. .. .. ..$ MAE    : num 0.0606
  .. .. .. .. ..$ states: tbl_ts [433 × 3] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ Month: mth [1:433] 1979 Dec, 1980 Jan, 1980 Feb, 1980 Mar, ...
  .. .. .. .. .. ..$ l    : num [1:433] 3.25 2.67 2.66 2.84 2.85 ...
  .. .. .. .. .. ..$ b    : num [1:433] -0.814 -0.637 -0.401 -0.215 -0.132 ...
  .. .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. .. ..$ : int [1:433] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. .. ..$ spec  : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ errortype : chr "A"
  .. .. .. .. .. ..$ trendtype : chr "A"
  .. .. .. .. .. ..$ seasontype: chr "N"
  .. .. .. .. .. ..$ damped    : logi TRUE
  .. .. .. .. .. ..$ period    : num 12
  .. .. .. .. .. ..- attr(*, "out.attrs")=List of 2
  .. .. .. .. .. .. ..$ dim     : Named int [1:3] 2 3 3
  .. .. .. .. .. .. .. ..- attr(*, "names")= chr [1:3] "errortype" "trendtype" "seasontype"
  .. .. .. .. .. .. ..$ dimnames:List of 3
  .. .. .. .. .. .. .. ..$ errortype : chr [1:2] "errortype=A" "errortype=M"
  .. .. .. .. .. .. .. ..$ trendtype : chr [1:3] "trendtype=N" "trendtype=A" "trendtype=Ad"
  .. .. .. .. .. .. .. ..$ seasontype: chr [1:3] "seasontype=N" "seasontype=A" "seasontype=M"
  .. .. .. .. ..- attr(*, "class")= chr "ETS"
  .. .. .. ..$ model         :Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: name
    initialize: function (formula, ..., .env) 
    model: ETS
    origin: 1980 Jan
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, specials, opt_crit, nmse, bounds, ic, restrict = TRUE,  
  .. .. .. ..$ data          : tbl_ts [432 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. ..$ Month        : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 198...
  .. .. .. .. ..$ season_adjust: num [1:432] 2.67 2.66 2.84 2.85 2.38 ...
  .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. ..$ response      :List of 1
  .. .. .. .. ..$ : symbol season_adjust
  .. .. .. ..$ transformation:List of 1
  .. .. .. .. ..$ :function (season_adjust)  
  .. .. .. .. .. ..- attr(*, "class")= chr "transformation"
  .. .. .. .. .. ..- attr(*, "inverse")=function (season_adjust)  
  .. .. .. ..- attr(*, "class")= chr "mdl_ts"
  .. .. ..$ e2:List of 5
  .. .. .. ..$ fit           :List of 8
  .. .. .. .. ..$ b      : num(0) 
  .. .. .. .. ..$ b.se   : num(0) 
  .. .. .. .. ..$ lag    : num 12
  .. .. .. .. ..$ sigma2 : num 0
  .. .. .. .. ..$ .fitted: num [1:432] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. ..$ .resid : num [1:432] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. ..$ time   :List of 2
  .. .. .. .. .. ..$ start   : mth [1:1] 1980 Jan
  .. .. .. .. .. ..$ interval: interval [1:1] 1M
  .. .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. .. ..$ future : num [1:12] -0.04041 -0.01576 0.00575 0.01549 0.01342 ...
  .. .. .. .. ..- attr(*, "class")= chr "RW"
  .. .. .. ..$ model         :Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: formula
    initialize: function (formula, ..., .env) 
    model: RW
    origin: 1980 Jan
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, specials, ...)  
  .. .. .. ..$ data          : tbl_ts [432 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. .. .. ..$ Month      : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 198...
  .. .. .. .. ..$ season_year: num [1:432] -0.04041 -0.01576 0.00575 0.01549 0.01342 ...
  .. .. .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. .. .. ..$ .rows: list<int> [1:1] 
  .. .. .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. .. ..@ ptype: int(0) 
  .. .. .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. .. .. ..- attr(*, "index2")= chr "Month"
  .. .. .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. .. .. ..@ .regular: logi TRUE
  .. .. .. ..$ response      :List of 1
  .. .. .. .. ..$ : symbol season_year
  .. .. .. ..$ transformation:List of 1
  .. .. .. .. ..$ :function (season_year)  
  .. .. .. .. .. ..- attr(*, "class")= chr "transformation"
  .. .. .. .. .. ..- attr(*, "inverse")=function (season_year)  
  .. .. .. ..- attr(*, "class")= chr "mdl_ts"
  .. .. ..- attr(*, "combination")= language e1 + e2
  .. .. ..- attr(*, "class")= chr [1:2] "decomposition_model" "model_combination"
  .. .. ..- attr(*, "dcmp_method")= chr "STL"
  .. ..$ model         :Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: formula
    initialize: function (formula, ..., .env) 
    model: dcmp_mdl
    origin: 1980 Jan
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, specials, ..., dcmp)  
  .. ..$ data          : tbl_ts [432 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
  .. .. ..$ Month     : mth [1:432] 1980 Jan, 1980 Feb, 1980 Mar, 1980 Apr, 1980 May,...
  .. .. ..$ log(value): num [1:432] 2.63 2.65 2.84 2.87 2.4 ...
  .. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. .. ..$ .rows:List of 1
  .. .. .. .. ..$ : int [1:432] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..- attr(*, "index")= chr "Month"
  .. .. .. ..- attr(*, "ordered")= logi TRUE
  .. .. ..- attr(*, "index2")= chr "Month"
  .. .. ..- attr(*, "interval")= interval [1:1] 1M
  .. .. .. ..@ .regular: logi TRUE
  .. ..$ response      :List of 1
  .. .. ..$ : symbol value
  .. ..$ transformation:List of 1
  .. .. ..$ :function (value)  
  .. .. .. ..- attr(*, "class")= chr "transformation"
  .. .. .. ..- attr(*, "inverse")=function (value)  
  .. ..- attr(*, "class")= chr "mdl_ts"
 - attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  ..$ .rows: list<int> [1:1] 
  .. ..$ : int 1
  .. ..@ ptype: int(0) 
 - attr(*, "model")= chr [1:2] "additive" "multiplicative"
 - attr(*, "response")= chr "value"
# ----- Plots -----
p1 <- train |>
  model(STL(value ~ season(window="periodic"))) |>
  components() |> autoplot() + ggtitle("Additive STL decomposition")

p2 <- train |>
  model(STL(log(value) ~ season(window="periodic"))) |>
  components() |> autoplot() + ggtitle("Multiplicative STL decomposition (log)")
 
p1/p2

fable

The fable package is the core forecasting engine in the fpp3 framework. It provides a tidy, model-based grammar for time series forecasting — similar in spirit to how ggplot2 provides a grammar of graphics. It is built to work seamlessly with tsibble, dplyr, and other tidyverse tools.

“Fable” stands for Forecasting Tables, and it allows users to fit, evaluate, and forecast time series models in a consistent and scalable way.

Key Features of fable

Feature                                    Description
Model grammar                              Define models using model() and a pipeable grammar (e.g., ARIMA(), ETS())
Tidy forecasts                             Forecasts are stored in a tidy forecast table (a fable, which extends the tsibble) and are easily visualized
Multiple models                            Fit many models at once, either across groups or with multiple specifications
Integration                                Works seamlessly with ggplot2, dplyr, tsibble, and other tidy tools
Support for ensembles and decompositions   Use hybrid workflows like STL + ETS models with decomposition_model()

Common fable Functions

Function     Purpose
model()      Fit one or more models to a tsibble object
forecast()   Generate forecasts for future time periods
report()     Print model summaries and parameter estimates
glance()     Return one-row summaries of model metrics (e.g., AIC)
augment()    Add fitted values and residuals to historical data
accuracy()   Compare model performance based on accuracy metrics
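
The sketch below strings several of these functions together on a synthetic series, ending with out-of-sample accuracy metrics. The data, the split date, and the two model specifications are assumptions chosen to keep the example small.

```r
library(tsibble)
library(fable)   # NAIVE(), ETS(); also attaches fabletools

set.seed(123)

# Synthetic monthly series, 2012 Jan to 2019 Dec (illustrative only)
toy <- tibble::tibble(
  Month = yearmonth("2012 Jan") + 0:95,
  value = 20 + cumsum(rnorm(96, sd = 0.5))
) |>
  as_tsibble(index = Month)

# Hold out the final 12 months for evaluation
toy_train <- toy |> filter_index(~ "2018 Dec")

fit <- toy_train |>
  model(
    naive = NAIVE(value),
    ets   = ETS(value ~ error("A") + trend("N") + season("N"))
  )

glance(fit)     # one-row summary per model
augment(fit)    # fitted values and residuals on the training data

fc  <- forecast(fit, h = 12)
acc <- accuracy(fc, toy)   # compares forecasts with the held-out observations
acc[, c(".model", "RMSE", "MAE", "MAPE")]
```

Passing the full series to accuracy() lets it match each forecast to the observation for the same month; lower RMSE and MAE on the held-out year indicate the better benchmark for this particular series.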

fabletools

The fabletools package provides the modeling infrastructure used by fable and other forecasting packages in the fpp3 ecosystem. It defines the internal structure for model fitting, forecasting, and diagnostics.

You don’t typically use fabletools directly unless you are creating custom models or building extensions, but it is automatically loaded when you use fable.

Key Features of fabletools

Feature                    Description
Model definition grammar   Enables packages to define and register new model types
Decomposition support      Powers tools like decomposition_model() for combining STL + ETS/ARIMA
Tidy infrastructure        Provides glance(), augment(), report() for consistent model diagnostics
Modular architecture       Supports extension packages such as fable.prophet and fasster

Workflow

The core modeling pipeline: declare the tsibble → fit models → generate forecasts → visualize.

# Fit five benchmark forecasting models
models <- train %>%
  model(
    NAIVE  = NAIVE(formula = value),
    MEAN   = MEAN(formula = value),
    SNAIVE = SNAIVE(formula = value),
    AVG    = MEAN(formula = value),           # simple average = mean (identical to MEAN)
    WAVG   = TSLM(formula = value ~ trend())  # using trend to simulate a weighted linear average
  )

# Forecast over the test period; use h = 12 to forecast only the next 12 months
fc <- models %>%
  forecast(h = nrow(test))

# Plot all forecasts
fc %>%
  autoplot(train) +
  labs(
    title = "Five Benchmark Forecasts: Naive, Mean, SNaive, Average, Weighted",
    y = "Value",
    x = "Month"
  ) +
  facet_wrap(~ .model, ncol = 2)

report(models)
Warning in report.mdl_df(models): Model reporting is only supported for
individual models, so a glance will be shown. To see the report for a specific
model, use `select()` and `filter()` to identify a single model.
# A tibble: 5 × 15
  .model sigma2 r_squared adj_r_squared statistic    p_value    df log_lik   AIC
  <chr>   <dbl>     <dbl>         <dbl>     <dbl>      <dbl> <int>   <dbl> <dbl>
1 NAIVE   0.333    NA            NA           NA  NA            NA     NA    NA 
2 MEAN   16.8      NA            NA           NA  NA            NA     NA    NA 
3 SNAIVE  4.11     NA            NA           NA  NA            NA     NA    NA 
4 AVG    16.8      NA            NA           NA  NA            NA     NA    NA 
5 WAVG    4.21      0.750         0.750     1291.  1.43e-131     2   -922.  625.
# ℹ 6 more variables: AICc <dbl>, BIC <dbl>, CV <dbl>, deviance <dbl>,
#   df.residual <int>, rank <int>

Why report(models) Shows Mostly NA

When using the fabletools::report() function in the fpp3 framework, it’s important to understand that not all models return detailed statistical summaries.

Why You’re Seeing NA in report(models)

The output of report(models) shows many NA values for certain models because:

  • NAIVE(), MEAN(), SNAIVE(), and the custom AVG (which is just a second MEAN() model) are simple benchmark methods, not regressions.
  • These models do not fit regression coefficients, so their summary row contains only the residual variance (sigma2) and none of the following:
    • R-squared
    • Adjusted R-squared
    • p-values
    • t-statistics

Since no regression is involved, these statistics do not apply, and the stacked summary fills those columns with NA.

Why WAVG (Weighted Average) Works

We defined WAVG as:

WAVG = TSLM(formula = value ~ trend())

This is a regression model (TSLM), where:

  • trend() is the predictor

  • value is the response

Because it’s a linear model, it produces:

  • Coefficients

  • R-squared and adjusted R-squared

  • t-statistics

  • p-values

That’s why we get a full report for this model and not for the others.

If you want detailed statistical summaries (e.g., for interpretation), use parametric models such as:

  • TSLM() (Time Series Linear Models)

  • ETS() (Exponential Smoothing)

These models estimate parameters, so report() on a single fitted model prints its estimates and fit statistics.
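The warning printed by report(models) suggests selecting a single model first. The following is a minimal sketch of that step, assuming the fpp3 packages are installed. The `toy` series is simulated for illustration and is not the document's `train` data.

```r
library(fpp3)

# Hypothetical monthly series with a linear trend (illustrative only)
set.seed(1)
toy <- tsibble(
  Month = yearmonth("2010 Jan") + 0:59,
  value = 10 + 0.2 * (1:60) + rnorm(60),
  index = Month
)

fit <- toy |>
  model(
    MEAN = MEAN(value),
    WAVG = TSLM(value ~ trend())
  )

# report() on one model prints its full summary instead of a glance
fit |> select(WAVG) |> report()

# tidy() returns the estimated coefficients as a tibble
coefs <- fit |> select(WAVG) |> tidy()

# glance() on the whole mable stacks one row per model; columns that a
# model does not supply (e.g. r_squared for MEAN) are filled with NA
gl <- glance(fit)
gl
```

The same select() call should work on the `models` object defined earlier, for example `models %>% select(WAVG) %>% report()`.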

preds <- forecast(object = mods, 
                  new_data = test) |>
  mutate(.mean = ifelse(test = .model=="multiplicative", 
                        yes = exp(.mean),
                        no = .mean)
         )

p3 <- autoplot(myts) +
  autolayer(filter(preds, .model=="additive"), colour="blue") +
  autolayer(filter(preds, .model=="multiplicative"), colour="red") +
  ggtitle("Additive (blue) vs Multiplicative (red) forecasts")
Plot variable not specified, automatically selected `.vars = value`
Scale for fill_ramp is already present.
Adding another scale for fill_ramp, which will replace the existing scale.
p4 <- autoplot(myts |> filter_index("2015 Jan"~.)) +
  autolayer(filter(preds, .model=="additive"), colour="blue") +
  autolayer(filter(preds, .model=="multiplicative"), colour="red") +
  ggtitle("Forecasts vs Actuals (2015+)")
Plot variable not specified, automatically selected `.vars = value`
Scale for fill_ramp is already present.
Adding another scale for fill_ramp, which will replace the existing scale.
# ----- 4-panel layout -----
# (p1 | p2) / (p3 | p4)

# ----- 2-panel layout -----
(p3 / p4)

Accuracy Metrics

Accuracy Metrics: Definitions and Interpretations

| Metric | Formula | Interpretation |
|---|---|---|
| ME (Mean Error) | \(\displaystyle \frac{1}{n} \sum (y_t - \hat{y}_t)\) | Average forecast error. Indicates bias. Positive: underprediction; negative: overprediction. |
| MPE (Mean Percentage Error) | \(\displaystyle \frac{1}{n} \sum \left( \frac{y_t - \hat{y}_t}{y_t} \right) \times 100\%\) | Average percentage error. Can be misleading when actuals are near zero. |
| RMSE (Root Mean Squared Error) | \(\displaystyle \sqrt{ \frac{1}{n} \sum (y_t - \hat{y}_t)^2 }\) | Penalizes large errors more than MAE. Sensitive to outliers. |
| MAE (Mean Absolute Error) | \(\displaystyle \frac{1}{n} \sum \lvert y_t - \hat{y}_t \rvert\) | Average absolute forecast error. Easy to interpret in original units. |
| MAPE (Mean Absolute Percentage Error) | \(\displaystyle \frac{1}{n} \sum \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert \times 100\%\) | Scale-independent, but undefined if any actual \(y_t = 0\) and unstable when actuals are near zero. |
| MASE (Mean Absolute Scaled Error) | \(\displaystyle \frac{\text{MAE}}{\text{in-sample MAE of naive model}}\) | <1 = better than naive; >1 = worse. The scaling uses the training data. Allows comparison across series. |
| RMSSE (Root Mean Squared Scaled Error) | \(\displaystyle \frac{\text{RMSE}}{\text{in-sample RMSE of naive model}}\) | Like MASE, but penalizes larger errors more heavily. |
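To make the formulas concrete, here is a small base R sketch. The helper `accuracy_metrics()` and its arguments are hypothetical names introduced for illustration; they are not part of fabletools. The scaled metrics assume the naive benchmark is evaluated on the training series, with `m` as the seasonal period.

```r
# Hypothetical helper (not part of fabletools): point-forecast accuracy metrics.
# `train` is the training series used for scaling; `m` is the seasonal period
# (m = 1 scales by the naive method, m = 12 by the seasonal naive method
# for monthly data).
accuracy_metrics <- function(actual, forecast, train, m = 1) {
  e <- actual - forecast               # forecast errors
  naive_err <- diff(train, lag = m)    # in-sample (seasonal) naive errors
  c(
    ME    = mean(e),
    MPE   = mean(e / actual) * 100,
    RMSE  = sqrt(mean(e^2)),
    MAE   = mean(abs(e)),
    MAPE  = mean(abs(e / actual)) * 100,
    MASE  = mean(abs(e)) / mean(abs(naive_err)),
    RMSSE = sqrt(mean(e^2) / mean(naive_err^2))
  )
}

# Toy numbers chosen so the arithmetic is easy to check by hand
train_toy <- c(10, 12, 11, 13, 12)
res <- accuracy_metrics(
  actual   = c(14, 13),
  forecast = c(12, 12),
  train    = train_toy
)
round(res, 3)
```

In this toy case the errors are 2 and 1, so ME and MAE are both 1.5. The in-sample naive errors also average 1.5 in absolute value, which gives a MASE of exactly 1.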

#str(fc)
#str(test)

?accuracy
# ----- Accuracy -----
acc <- accuracy(object = fc, 
                data = test
                ) |>
  select(.model, ME, MPE, RMSE, MAE, MAPE #, MASE, RMSSE
         )


# Print accuracy table separately
acc
# A tibble: 5 × 6
  .model    ME     MPE  RMSE   MAE   MAPE
  <chr>  <dbl>   <dbl> <dbl> <dbl>  <dbl>
1 AVG    -3.07 -1632.   3.60  3.15 1633. 
2 MEAN   -3.07 -1632.   3.60  3.15 1633. 
3 NAIVE   1.73    17.5  2.55  1.79  107. 
4 SNAIVE  1.83    54.9  2.62  1.86   84.6
5 WAVG    4.60  1025.   5.26  4.60 1025. 
?kable
# Print with knitr::kable
kable(acc, caption = "Forecast Accuracy Metrics") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Forecast Accuracy Metrics

| .model | ME | MPE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|
| AVG | -3.072963 | -1632.08484 | 3.601883 | 3.147387 | 1633.48465 |
| MEAN | -3.072963 | -1632.08484 | 3.601883 | 3.147387 | 1633.48465 |
| NAIVE | 1.725926 | 17.50158 | 2.551329 | 1.794630 | 106.61975 |
| SNAIVE | 1.833426 | 54.90794 | 2.624286 | 1.856019 | 84.57798 |
| WAVG | 4.603637 | 1024.83600 | 5.264253 | 4.603637 | 1024.83600 |
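The very large MPE and MAPE values for MEAN, AVG, and WAVG are likely driven by test-set actuals that sit close to zero. A rough base R illustration follows. It uses the first three actuals and the MEAN forecast as they appear in the exported comparison table later in this document; the variable names are illustrative.

```r
# First three test actuals and the constant MEAN forecast, copied from the
# exported comparison table (2016-01 to 2016-03)
actual  <- c(0.34, 0.38, 0.36)
fc_mean <- rep(5.04, 3)

err <- actual - fc_mean            # errors of a few units
ape <- abs(err / actual) * 100     # absolute percentage errors

round(err, 2)
round(ape)
```

Dividing an error of a few units by an actual value below 1 yields percentage errors above 1000%, which is why RMSE and MAE are the more informative columns for this series.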

Replicating the values

Extract Forecast Values from SNAIVE

# Forecast using the snaive model only
fc_snaive <- models %>%
  select(SNAIVE) %>%
  forecast(h = nrow(test))

Join Forecasts to Actuals

# Step 1: Fit SNAIVE model
snaive_model <- train %>%
  model(SNAIVE = SNAIVE(value))

# Step 2: Forecast same length as test
fc_snaive <- snaive_model %>%
  forecast(h = nrow(test))

# Step 3: Join forecasts with test data
# test$value is the actual column — make sure it exists
compare <- fc_snaive %>%
  left_join(test %>% rename(actual = value), by = "Month") %>%
  rename(forecast = .mean) %>%
  select(Month, actual, forecast)

# Step 4: Compute metrics
actual <- compare$actual
forecast <- compare$forecast

ME    <- mean(actual - forecast)                        # Mean Error (bias)
MPE   <- mean((actual - forecast) / actual) * 100       # Mean Percentage Error
RMSE  <- sqrt(mean((actual - forecast)^2))              # Root Mean Squared Error
MAE   <- mean(abs(actual - forecast))                   # Mean Absolute Error
MAPE  <- mean(abs((actual - forecast) / actual)) * 100  # Mean Absolute Percentage Error
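#MASE  <- MAE / mean(abs(diff(train$value, lag = 12)))      # Mean Absolute Scaled Error: scaled by the in-sample seasonal naive MAE of the training data (lag = 12 assumes monthly data)
#RMSSE <- RMSE / sqrt(mean(diff(train$value, lag = 12)^2))  # Root Mean Squared Scaled Error, scaled the same way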


# Step 5: Show results
tibble::tibble(
  Model = "SNAIVE",
  ME = ME,
  MPE = MPE,
  RMSE = RMSE,
  MAE = MAE,
  MAPE = MAPE
#  MASE = MASE,
#  RMSSE = RMSSE
)
# A tibble: 1 × 6
  Model     ME   MPE  RMSE   MAE  MAPE
  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 SNAIVE  1.83  54.9  2.62  1.86  84.6
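MASE and RMSSE are left out above because they need the training data for scaling. One way to obtain them, sketched here under the assumption that the fpp3 packages are installed, is to pass accuracy() a tsibble that contains both the training and the test periods. The `toy` series below is simulated for illustration and is not the document's data.

```r
library(fpp3)

# Hypothetical monthly series: 36 months for training, 12 for testing
set.seed(42)
toy <- tsibble(
  Month = yearmonth("2012 Jan") + 0:47,
  value = 5 + rnorm(48),
  index = Month
)
train_toy <- toy |> filter_index(~ "2014 Dec")

fc_toy <- train_toy |>
  model(SNAIVE = SNAIVE(value)) |>
  forecast(h = 12)

# Passing the full series lets accuracy() find the training observations
# it needs to scale the errors
acc_full <- accuracy(fc_toy, toy) |>
  select(.model, MAE, MASE, RMSSE)
acc_full

# Manual check: this should closely match acc_full$MASE, assuming
# fabletools' default seasonal scaling (lag 12 for monthly data)
mase_manual <- acc_full$MAE / mean(abs(diff(train_toy$value, lag = 12)))
mase_manual
```

With the document's objects, the analogous call would pass the combined train and test data as the second argument instead of `test` alone.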

Export

?model
# Step 1: Fit multiple models on training data
models <- train %>% model( NAIVE  = NAIVE(formula = value), 
                           MEAN   = MEAN(formula = value), 
                           SNAIVE = SNAIVE(formula = value), 
                           AVG    = MEAN(formula = value), # simple average = mean 
                           WAVG   = TSLM(formula = value ~ trend()) # linear trend regression, labelled WAVG as a stand-in for a weighted average 
                           ) 
?forecast
# Step 2: Forecast from all models for test set horizon
fc_all <- models %>% fabletools::forecast(h = nrow(test))

# Step 3: Reshape forecasts into wide format
fc_wide <- fc_all %>%
  as_tibble() %>%  # Converts fable to a normal tibble
  select(Month, .model, .mean) %>%
  pivot_wider(names_from = .model, values_from = .mean)

# Step 4: Add actual test values for comparison
df_forecast_all <- test %>%
  rename(actual = value) %>%
  left_join(fc_wide, by = "Month")  %>%
  mutate(Month = as.Date(Month))

# Step 5: Preview the final dataframe
print(df_forecast_all)
# A tsibble: 108 x 7 [1D]
   Month      actual NAIVE  MEAN SNAIVE   AVG  WAVG
   <date>      <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
 1 2016-01-01   0.34  0.24  5.04   0.11  5.04 -1.12
 2 2016-02-01   0.38  0.24  5.04   0.11  5.04 -1.15
 3 2016-03-01   0.36  0.24  5.04   0.11  5.04 -1.17
 4 2016-04-01   0.37  0.24  5.04   0.12  5.04 -1.20
 5 2016-05-01   0.37  0.24  5.04   0.12  5.04 -1.23
 6 2016-06-01   0.38  0.24  5.04   0.13  5.04 -1.26
 7 2016-07-01   0.39  0.24  5.04   0.13  5.04 -1.29
 8 2016-08-01   0.4   0.24  5.04   0.14  5.04 -1.32
 9 2016-09-01   0.4   0.24  5.04   0.14  5.04 -1.34
10 2016-10-01   0.4   0.24  5.04   0.12  5.04 -1.37
# ℹ 98 more rows
# Write to Excel file (write_xlsx() comes from the writexl package)
writexl::write_xlsx(df_forecast_all, "accuracy_metrics.xlsx")

Appendix

Subsetting and Indexing Basics in R: Lists

In R, lists are flexible data structures that can hold elements of different types — such as numbers, characters, vectors, data frames, functions, or even other lists. Because of this, lists are extremely useful in modeling pipelines, complex data wrangling, and nested object storage.

Creating a List

You can create a list using the list() function. Here’s an example of a nested list:

my_list <- list( name = "Arvind", 
                 scores = c(90, 85, 88), 
                 passed = TRUE, 
                 details = list(age = 32,
                                city = "Boston"
                                )
                 )

This creates a named list with 4 elements.

Subsetting a List in R

There are three main ways to access elements from a list in R:

| Operator | Returns | Description |
|---|---|---|
| $ | Single element | Access by name (no quotes needed; chain it for nested elements) |
| [[ ]] | Single element | Access by name or index, returns the object itself |
| [ ] | Sublist | Returns a list (even if length 1) |

Using $ — Direct Access by Name

my_list$scores
[1] 90 85 88
  • $ is a shorthand for accessing named elements.

  • Cannot be used for dynamic indexing: the name must be typed literally, not stored in a variable. For nested structures, chain it (my_list$details$city).

Using [[ ]] — Element Extraction

my_list[["scores"]]
[1] 90 85 88
my_list[[2]]
[1] 90 85 88
  • Extracts the value itself (not wrapped in a list).

  • Can use a name or an index.

  • Supports dynamic referencing:

    element_name <- "scores"
    my_list[[element_name]]
    [1] 90 85 88

Using [ ] — Sublist Extraction

my_list["scores"]
$scores
[1] 90 85 88
my_list[2]
$scores
[1] 90 85 88
typeof(my_list[2])
[1] "list"
  • Returns a list of length 1.

  • Useful when you want to keep the list structure intact.

str(my_list["scores"])
List of 1
 $ scores: num [1:3] 90 85 88
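Because [ ] always returns a list, it can also select or drop several elements at once. A short base R sketch using the same `my_list` (redefined here so the block runs on its own; `picked` and `rest` are illustrative names):

```r
my_list <- list(
  name    = "Arvind",
  scores  = c(90, 85, 88),
  passed  = TRUE,
  details = list(age = 32, city = "Boston")
)

# Select several elements by name; the result is still a list
picked <- my_list[c("name", "passed")]
names(picked)

# Drop the second element with a negative index
rest <- my_list[-2]
names(rest)
```

This is the main practical reason to prefer [ ] over [[ ]] when the result is passed on to functions that expect a list, such as lapply().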

Accessing Nested Lists

You can chain access using $ or multiple [[ ]] calls:

my_list$details$city
[1] "Boston"
my_list[["details"]][["city"]]
[1] "Boston"

This allows you to drill down into hierarchical data structures.
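The same drill-down can be written with a character vector as a path, since [[ ]] indexes recursively when given a vector on a list. The sketch below redefines `my_list` so it runs on its own; `path` and `city_before` are illustrative names.

```r
my_list <- list(
  name    = "Arvind",
  scores  = c(90, 85, 88),
  passed  = TRUE,
  details = list(age = 32, city = "Boston")
)

# A vector inside [[ ]] is applied one level at a time
path <- c("details", "city")
city_before <- my_list[[path]]
city_before

# Nested elements can be replaced in place with the same chained syntax
my_list[["details"]][["city"]] <- "Chicago"
my_list$details$city
```

Storing the path in a variable is useful when the location of the element is only known at run time, which $ cannot handle.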

| Goal | Code | Output Type |
|---|---|---|
| Get vector of scores | my_list[["scores"]] | numeric vector |
| Get scores sublist | my_list["scores"] | list |
| Dynamic name access | my_list[[varname]] | element |
| Nested city value | my_list$details$city | character string |

Footnotes

  1. A tibble is a more user-friendly and consistent type of data frame used in modern R programming. It behaves like a data.frame, but with improvements that make it easier to use, especially for data analysis and visualization.

    | Feature | data.frame | tibble |
    |---|---|---|
    | Printing behavior | Prints all rows and columns (up to the max.print limit) | Prints only the first 10 rows and as many columns as fit on screen |
    | Row names | Uses row names by default | Does not use row names |
    | Partial matching | Allows partial matching of column names (df$val works) | Does not allow partial matching, which avoids accidental bugs |
    | String handling | Keeps strings as character since R 4.0.0; earlier versions converted them to factors by default | Keeps character vectors as character (no surprise factors) |
    | Subsetting behavior | df[, x] may return a vector or a data.frame | tibble[, x] always returns a tibble |
    | Integration | Works well in base R | Fully integrated with tidyverse tools (dplyr, ggplot2) |
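A few of these differences can be checked directly. The sketch below assumes the tibble package is installed and R 4.0.0 or later; `df` and `tb` are illustrative objects.

```r
library(tibble)

df <- data.frame(value = 1:3, label = c("a", "b", "c"))
tb <- as_tibble(df)

# Partial matching: data.frame resolves `val` to `value`, tibble does not
partial_df <- df$val
partial_tb <- suppressWarnings(tb$val)  # NULL, with a warning from tibble

# Single-column subsetting: data.frame drops to a vector, tibble does not
col_df <- df[, "value"]
col_tb <- tb[, "value"]

class(col_df)
class(col_tb)

# Strings stay character in both (assuming R >= 4.0.0 for data.frame)
is.character(df$label)
```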