Module 1 Discussion 2

Author

Robert Jenkins

1 Discussion Part I

1.1 What is the fpp3 package about? How many datasets does it have? What packages does it load?

#install.packages("fpp3")
library(fpp3)
Warning: package 'fpp3' was built under R version 4.5.2
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
✔ tibble      3.3.0     ✔ tsibble     1.1.6
✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.1     ✔ feasts      0.4.2
✔ lubridate   1.9.4     ✔ fable       0.5.0
✔ ggplot2     3.5.2     
Warning: package 'tsibble' was built under R version 4.5.2
Warning: package 'tsibbledata' was built under R version 4.5.2
Warning: package 'feasts' was built under R version 4.5.2
Warning: package 'fabletools' was built under R version 4.5.2
Warning: package 'fable' was built under R version 4.5.2
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
?fpp3
starting httpd help server ...
 done
data(package = "fpp3")

When I first tried to run the code above, I got an error message. It turned out I did not have the fpp3 package installed yet, but that was easy to fix with the install.packages() line above.

I loaded fpp3 to see what it actually does in R. library(fpp3) brings in the forecasting workflow as a bundle, so I don’t have to manually load all the time series packages. When I ran ?fpp3, the help page confirmed it’s meant to support the FPP3 forecasting approach and ties together the main forecasting packages as explained in our book. It was created by Rob Hyndman. The package is meant to give you the tools you need to analyze and manipulate time series data. The packages it attached are tibble, tsibble, dplyr, tsibbledata, tidyr, feasts, lubridate, fable, and ggplot2. Then data(package = "fpp3") showed me the data sets that come directly with the package in a separate tab (it lists them, it doesn’t automatically load them). I counted the data sets it listed by hand (I’m sure there is a quicker way to do this, I am just not that sound in R yet) and there are 25 data sets included in the package. I also noticed that a lot of the example data sets people use aren’t stored inside fpp3 itself; they often come from companion packages like tsibbledata.
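The hand count above can probably be replaced by a one-liner. This is a minimal sketch, assuming only that data(package = ...) returns its listing as a matrix in the $results element, which is how base R documents it. The helper name count_datasets is made up for illustration.

```r
# Hypothetical helper: count the data sets a package ships.
# data(package = ...) returns an object whose $results element is a
# character matrix with one row per data set.
count_datasets <- function(pkg) {
  listing <- utils::data(package = pkg)
  nrow(listing$results)
}

# Only query fpp3 if it is actually installed on this machine.
if (requireNamespace("fpp3", quietly = TRUE)) {
  print(count_datasets("fpp3"))
}

# The same helper works for any installed package, e.g. base R's datasets.
print(count_datasets("datasets"))
```

The names of the data sets are in the "Item" column of the same matrix, so utils::data(package = "fpp3")$results[, "Item"] should list them without opening a separate tab.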

1.2 What is a tsibble?

?tsibble

A tsibble is basically a time series version of a tibble in R. What makes it different from a regular data frame is that it forces you to be explicit about time by setting an index column, which is the variable that holds the time stamps. The spacing of the index defines the interval of the data, such as daily, monthly, quarterly, or yearly. On top of that, you can also define a key or even multiple keys to label which observations belong to which series, like separating data by store, region, or product line. Once you do that, the tsibble can hold multiple time series in one tidy table, and the number of unique key combinations is the number of separate series you are tracking. Then you have the measured variables, which are the values you actually care about analyzing and forecasting over time.
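To make the index and key idea concrete, here is a small sketch with made-up data. The store names and unit counts are hypothetical and are not taken from any fpp3 data set.

```r
library(tsibble)

# Hypothetical toy data: two stores, each observed for three months.
sales <- tibble::tibble(
  month = yearmonth(rep(c("2024 Jan", "2024 Feb", "2024 Mar"), times = 2)),
  store = rep(c("A", "B"), each = 3),
  units = c(10, 12, 11, 20, 19, 23)
)

# index = the time variable, key = the variable that identifies each series.
sales_ts <- as_tsibble(sales, index = month, key = store)

index_var(sales_ts)  # name of the index column
key_vars(sales_ts)   # name(s) of the key column(s)
n_keys(sales_ts)     # number of separate series held in the table
```

Here units is the measured variable, and each (store, month) pair has to be unique or as_tsibble() will refuse to build the object.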

1.3 What is feasts package about?

?feasts

The feasts package is basically the part of the fpp3 ecosystem that helps you understand what your time series is doing before you jump straight into forecasting. It focuses on exploration and diagnostics, so you can visualize patterns, check seasonality and trend behavior, and look at common time series relationships like autocorrelation. It also includes tools for breaking a series into components using decomposition methods, which makes it easier to explain what portion of the movement is trend, what is seasonal, and what is just noise. In practice, feasts feels like the “figure out what’s going on” toolkit that supports a tsibble workflow, so you can describe the series clearly and validate assumptions before you start fitting models.
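As a rough illustration of the decomposition idea, the sketch below runs an STL decomposition on a simulated monthly series. The data are invented (a linear trend plus a repeating 12-month pattern plus noise), so the output only shows the mechanics, not anything about a real series.

```r
library(tsibble)
library(feasts)  # also attaches fabletools, which provides model() and components()

set.seed(1)
n <- 60  # five years of simulated monthly data

toy <- tsibble(
  month = yearmonth(seq(as.Date("2015-01-01"), by = "month", length.out = n)),
  value = 0.1 * seq_len(n) +                                  # trend
    rep(c(2, 1, 0, -1, -2, -1, 0, 1, 2, 1, 0, -1), n / 12) +  # seasonal pattern
    rnorm(n, sd = 0.3),                                       # noise
  index = month
)

# Split the series into trend, seasonal, and remainder components.
dcmp <- toy |>
  model(stl = STL(value)) |>
  components()

dcmp
```

Calling autoplot(dcmp) on the result should draw the original series and each component as stacked panels.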

1.4 What is fable package about?

?fable

The fable package is basically the modeling and forecasting engine in the fpp3 ecosystem. It is the piece that lets you take a tsibble, fit forecasting models in a clean, consistent way, and then generate forecasts for future time periods. What stood out to me is that it is built to handle multiple time series naturally using the key, so you can fit the same model across many groups without manually splitting your data into a bunch of separate objects. It also makes it straightforward to compare models, check results, and produce forecast outputs that plug right back into plotting and analysis. Overall, fable is the part you use when you are done exploring the data and you are ready to actually build a forecast and evaluate how well it performs.
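The point about the key can be sketched with two simple benchmark models fitted to two made-up series at once. The data and the choice of NAIVE() and MEAN() are only for illustration and are not a recommendation for any real series.

```r
library(tsibble)
library(fable)  # also attaches fabletools, which provides model() and forecast()

# Hypothetical toy data: two stores with twelve months each.
months <- seq(as.Date("2023-01-01"), by = "month", length.out = 12)
sales_ts <- tsibble(
  month = yearmonth(rep(months, times = 2)),
  store = rep(c("A", "B"), each = 12),
  units = c(seq(10, 21), seq(30, 19)),
  index = month,
  key = store
)

# One call fits every model to every series identified by the key.
fits <- sales_ts |>
  model(
    naive = NAIVE(units),  # forecast = last observed value
    mean  = MEAN(units)    # forecast = historical average
  )

# Forecast three months ahead for each store and each model.
fc <- fits |> forecast(h = 3)
fc
```

The fitted object has one row per store and one column per model, so no manual splitting of the data is needed.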

2 Discussion Part II

2.1 Load Package

Before creating any plots I first had to install the fredr package and load it with library(fredr). Then I went to the FRED site and made an account to get an API key. I had to follow some online help, but I was able to store my key in a .Renviron file. I had some trouble with that at first because I tried to save the .Renviron file while I had RStudio open, so it kept adding a .sh to the end of the file name and R wasn’t reading the key correctly, but I eventually figured it out. After that I used the fredr_has_key() function to check that the key was loaded (it should return TRUE).
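For anyone who hits the same problem, the check can be sketched in base R. This assumes the key is stored under the environment variable name FRED_API_KEY, which is the name fredr looks for. The helper has_fred_key is made up for illustration, and "your_key_here" is a placeholder.

```r
# The .Renviron file is plain text with one NAME=value pair per line, e.g.
#   FRED_API_KEY=your_key_here
# The file name must be exactly ".Renviron" with no extra extension.

# Hypothetical helper: TRUE if the variable is set to a non-empty string.
has_fred_key <- function() {
  nzchar(Sys.getenv("FRED_API_KEY"))
}

# Re-read the file without restarting R (only if it exists).
renviron_path <- file.path(Sys.getenv("HOME"), ".Renviron")
if (file.exists(renviron_path)) {
  readRenviron(renviron_path)
}

has_fred_key()
```

If the helper returns FALSE after reloading, the file name or the variable name is the first thing to check. fredr_set_key() can also set the key for the current session only.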

#install.packages('fredr')
library(fredr)
Warning: package 'fredr' was built under R version 4.5.2
fredr_has_key()
[1] TRUE

2.2 Load Data

I chose the Connecticut unemployment rate series from FRED because I live in CT and I figured it might be interesting to look at. The ID of this series is “CTUR” and it goes back to 1976. I did have to manipulate it a little because fredr imports it into R as a tibble. Further, the frequency is monthly, but it showed up as daily once I converted it to a tsibble because each month is stamped with its first day. I used mutate() with the yearmonth() function to create a monthly index. I also used select() to drop the realtime_start and realtime_end columns because they were just yesterday’s date and not useful. As the second output below shows, the original date column still rides along next to Month and value.

# Connecticut unemployment rate (ct_ur) imported from FRED, where the series ID is "CTUR". This brings in a tibble, so convert it to a tsibble with as_tsibble().

ct_ur <- fredr(series_id = "CTUR") |> as_tsibble(index = date)
ct_ur
# A tsibble: 600 x 5 [1D]
   date       series_id value realtime_start realtime_end
   <date>     <chr>     <dbl> <date>         <date>      
 1 1976-01-01 CTUR        9.7 2026-01-28     2026-01-28  
 2 1976-02-01 CTUR        9.7 2026-01-28     2026-01-28  
 3 1976-03-01 CTUR        9.7 2026-01-28     2026-01-28  
 4 1976-04-01 CTUR        9.6 2026-01-28     2026-01-28  
 5 1976-05-01 CTUR        9.4 2026-01-28     2026-01-28  
 6 1976-06-01 CTUR        9.3 2026-01-28     2026-01-28  
 7 1976-07-01 CTUR        9.1 2026-01-28     2026-01-28  
 8 1976-08-01 CTUR        9   2026-01-28     2026-01-28  
 9 1976-09-01 CTUR        8.8 2026-01-28     2026-01-28  
10 1976-10-01 CTUR        8.7 2026-01-28     2026-01-28  
# ℹ 590 more rows
# This created a 600 x 5 [1D] tsibble, but the dates are the first day of each month and the true frequency is monthly, so the index has to be converted to months. The realtime_start and realtime_end columns both hold yesterday's date in every row, so they can be removed.

ct_ur_m <- ct_ur |>
  mutate(Month = tsibble::yearmonth(date)) |>
  select(Month, value) |>
  as_tsibble(index = Month)

ct_ur_m
# A tsibble: 600 x 3 [1M]
      Month value date      
      <mth> <dbl> <date>    
 1 1976 Jan   9.7 1976-01-01
 2 1976 Feb   9.7 1976-02-01
 3 1976 Mar   9.7 1976-03-01
 4 1976 Apr   9.6 1976-04-01
 5 1976 May   9.4 1976-05-01
 6 1976 Jun   9.3 1976-06-01
 7 1976 Jul   9.1 1976-07-01
 8 1976 Aug   9   1976-08-01
 9 1976 Sep   8.8 1976-09-01
10 1976 Oct   8.7 1976-10-01
# ℹ 590 more rows
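One possible way to end up with only Month and value is to do the tidying while the data is still a plain tibble and convert to a tsibble last. The sketch below uses a small mock tibble that copies the column layout and the first four rows shown above, so it runs without a FRED key. The object names raw and ct_small_m are hypothetical.

```r
library(dplyr)
library(tsibble)

# Mock of the fredr() output: same columns, first four rows from the output above.
raw <- tibble::tibble(
  date           = as.Date(c("1976-01-01", "1976-02-01", "1976-03-01", "1976-04-01")),
  series_id      = "CTUR",
  value          = c(9.7, 9.7, 9.7, 9.6),
  realtime_start = as.Date("2026-01-28"),
  realtime_end   = as.Date("2026-01-28")
)

ct_small_m <- raw |>
  mutate(Month = yearmonth(date)) |>  # monthly index built from the daily date
  select(Month, value) |>             # on a plain tibble this drops every other column
  as_tsibble(index = Month)           # convert last, with Month as the index

ct_small_m
```

Because select() runs before the data becomes a tsibble, there is no existing index for it to hold on to, which is my guess at why the date column survives in the original pipeline.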

2.3 Time Series Visualization

2.4 Plot 1: Connecticut Unemployment Rate Over Time

I started with a simple time plot because it is the fastest way to get a baseline understanding of what the series is doing before jumping into anything more technical. Plotting the Connecticut unemployment rate over time lets you see the overall level, the major up and down cycles, and whether there are any obvious shocks or structural breaks that would affect how you interpret the data. What I found right away is that the series is dominated by broader economic cycles rather than short, random swings. You can clearly see periods where unemployment rises and then gradually falls over multiple years, and the spike around 2020 stands out as an extreme event compared to the rest of the history. This plot also helped confirm that the data are behaving like monthly unemployment data should, and it gave me a clear reason to follow up with seasonality and persistence diagnostics in the next plots.

# Use autoplot() to graph the Connecticut unemployment rate (in percent) since January 1976. Add the axis labels and title. This is the starting point of the visualization.

ct_ur_m |>
  autoplot(value) +
  labs(title = "Connecticut Unemployment Rate (CTUR)",
       x = "Month", y = "Percent")

2.5 Plot 2: CT Unemployment Seasonal Subseries Plot

I made the seasonal subseries plot next because after seeing the overall ups and downs in the time plot, I wanted to check whether there is any consistent month-of-year pattern in the data. This plot splits the unemployment rate into twelve panels, one for each month, so you can compare January across all years, February across all years, and so on. What I found is that the monthly averages are pretty similar from panel to panel, which suggests the CT unemployment rate does not have a strong seasonal signature where certain months are reliably higher or lower every year. One caveat is that FRED publishes CTUR as a seasonally adjusted series, so a flat seasonal profile is what this plot should show. Instead, the big movements show up in every month, which lines up with the idea that this series is mainly driven by broader economic cycles and shocks rather than predictable seasonality. Using the 2000–2025 window also keeps the plot readable while still capturing multiple cycles, including the major disruption around 2020.

# See if CT unemployment has any real month-to-month seasonality, or if it’s mostly just bigger economic cycles. Break the series into 12 panels (Jan–Dec) to compare the same month across years. If the monthly averages look pretty similar, then seasonality isn’t a big driver.

ct_ur_m |>
  filter_index("2000 Jan" ~ "2025 Dec") |>
  gg_subseries(value) +
  labs(
    title = "Seasonal Subseries: CT Unemployment Rate (CTUR), 2000–2025",
    x = "Month of year",
    y = "Percent"
  )
Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
ℹ Please use `ggtime::gg_subseries()` instead.
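The "monthly averages look similar" reading can also be checked with numbers. The sketch below uses base R on a simulated series with no seasonal pattern, not the real CTUR data, so the values are only illustrative.

```r
# Simulated monthly series: a slow random walk with no seasonal pattern.
set.seed(42)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 120)
value <- 5 + cumsum(rnorm(120, sd = 0.1))

# Label each observation with its month of the year (Jan..Dec, in calendar order).
month_of_year <- factor(
  month.abb[as.integer(format(dates, "%m"))],
  levels = month.abb
)

# Average of the series within each calendar month, the numeric analogue
# of the horizontal mean lines in a subseries plot.
monthly_means <- tapply(value, month_of_year, mean)
round(monthly_means, 2)

# Gap between the highest and lowest monthly average. A small gap relative
# to the overall range of the series suggests little seasonality.
diff(range(monthly_means))
diff(range(value))
```

With the real data, the same comparison could be run on the value column of ct_ur_m.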

2.6 Plot 3: CT Unemployment Autocorrelation Function (ACF) Plot

I made the ACF plot next because after looking at the series over time and checking for seasonality, I wanted to understand how much the unemployment rate depends on its own recent history. The ACF shows the correlation between the current value and past values at different lags, which is a quick way to see whether the series has short memory or if it carries momentum over time. What I found is that the autocorrelations start very high at the first few lags and then decline slowly, staying well above the significance bands for a long stretch. That pattern tells me the CT unemployment rate is highly persistent, meaning it tends to evolve gradually rather than bouncing around randomly from month to month. It also supports what the time plot suggested, which is that unemployment moves in longer business cycle waves, so past months contain a lot of information about where the series is likely headed in the near term.

# Evaluate how persistent the Connecticut unemployment rate is by plotting the ACF. This shows how strongly the current value is correlated with prior months at different lags, which helps assess whether the series evolves gradually over time and whether there is any recurring structure such as seasonality (for example, a pattern around lag 12 in monthly data).

ct_ur_m |>
  ACF(value) |>
  autoplot() +
  labs(
    title = "ACF: Connecticut Unemployment Rate (CTUR)",
    x = "Lag",
    y = "Autocorrelation"
  )
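To show what "highly persistent" looks like against a series with no memory, the sketch below compares two simulated series using base R's acf() rather than feasts::ACF(). The AR coefficient of 0.95 is an arbitrary choice for illustration and is not an estimate from the CTUR data.

```r
set.seed(123)
n <- 600  # same length as the monthly series above

# Persistent series: each value is 0.95 times the previous one plus noise.
persistent <- as.numeric(arima.sim(model = list(ar = 0.95), n = n))

# White noise: no dependence on the past at all.
noise <- rnorm(n)

# Sample autocorrelations. Element 1 of $acf is lag 0, so element 2 is lag 1.
acf_persistent <- acf(persistent, lag.max = 24, plot = FALSE)$acf
acf_noise      <- acf(noise, lag.max = 24, plot = FALSE)$acf

persistent_lag1 <- acf_persistent[2]
noise_lag1      <- acf_noise[2]

# Approximate 95% significance band for a white-noise series.
band <- 1.96 / sqrt(n)

c(persistent = persistent_lag1, noise = noise_lag1, band = band)
```

The persistent series has a lag-1 autocorrelation close to 1 that fades slowly over later lags, which is the same shape described for the unemployment rate, while the white-noise values stay near the band.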