About the Dataset

I used the quantmod and tidyquant packages in R to download price data for four stocks (Apple, Microsoft, Johnson & Johnson, and Pfizer) drawn from two sectors (Tech and Healthcare).

# Choose 4 stocks from 2 sectors
# Tech: Apple (AAPL), Microsoft (MSFT)
# Healthcare: Johnson & Johnson (JNJ), Pfizer (PFE)

tickers <- c("AAPL", "MSFT", "JNJ", "PFE")

# Download 5 years of monthly adjusted closing prices
stock_data <- tq_get(tickers,
                     from = "2019-01-01",
                     to = "2024-01-01",
                     periodicity = "monthly") %>%
  dplyr::select(symbol, date, adjusted)

# Add sector information
stock_data <- stock_data %>%
  mutate(
    sector = case_when(
      symbol %in% c("AAPL", "MSFT") ~ "Tech",
      symbol %in% c("JNJ", "PFE") ~ "Healthcare"
    ),
    Months = yearmonth(date)
  )

# Convert to tsibble and define hierarchy
stock_ts <- stock_data %>%
  as_tsibble(key = c(sector, symbol), index = Months)

stock_hier <- stock_ts %>%
  aggregate_key(sector / symbol, adjusted = sum(adjusted))
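As a quick check (illustrative, not part of the original code), the aggregated object should contain seven series: the four stocks, the two sector totals, and the grand total.

# Inspect the hierarchy's keys: 4 stocks + 2 sector aggregates + 1 total
stock_hier %>%
  tsibble::key_data()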

Create Training and Test Sets

Training and test sets were created so that models fit on the training data could be evaluated by forecasting the held-out test period. The training set spans 2019-2022, while the test set covers 2023.

# Filter 4 years of observations into a training set (2019-2022)
train_agg <- stock_hier %>%
  filter(year(Months) <= 2022)

# Sector-level series (sector totals plus the top-level aggregate)
train_sec <- train_agg %>%
  filter(is_aggregated(sector) | (!is_aggregated(sector) & is_aggregated(symbol)))

# Only Healthcare stocks
train_health <- train_agg %>%
  filter(sector == "Healthcare" & !is_aggregated(symbol))

# Only Pfizer
train_pf <- train_agg %>%
  filter(symbol == "PFE")

# Filter 1 year of observations into a test set (2023)
test_agg <- stock_hier %>%
  filter(year(Months) == 2023)

# Sector-level series (sector totals plus the top-level aggregate)
test_sec <- test_agg %>%
  filter(is_aggregated(sector) | (!is_aggregated(sector) & is_aggregated(symbol)))

# Only Healthcare stocks
test_health <- test_agg %>%
  filter(sector == "Healthcare" & !is_aggregated(symbol))

# Only Pfizer
test_pf <- test_agg %>%
  filter(symbol == "PFE")

Initial Visualization of the Training Data

Time Plot

# Create a time plot to get an idea of what the data look like
time_plot <- autoplot(train_agg, adjusted) +
  labs(title = "Monthly Adjusted Closing Stock Prices from 2019-2022",
       y = "Adjusted Closing Price ($)")

time_plot

This plot shows the overall positive trend of the aggregated series for both Healthcare and Tech. The individual stocks show a milder positive trend.

Seasonal Plot

# Create a seasonal plot of the years by month
seasonal_plot_yr <- train_agg %>% 
  gg_season(adjusted, period = "year") +
  labs(y="Adjusted Closing Price ($)", 
       title="Seasonal Plot: Monthly Adjusted Closing Stock Prices from 2019-2022")

seasonal_plot_yr

This plot shows very little, if any, seasonality.

Seasonal Subseries Plot

# Create seasonal subseries plot to further look at seasonality
seasonal_sub <- train_agg %>%
  gg_subseries(adjusted) +
  labs(
    y = "Adjusted Closing Price ($)",
    title = "Seasonal Subseries Plot: Monthly Adjusted Closing Stock Prices from 2019-2022"
  )

seasonal_sub

This plot shows the overall positive trend year-to-year, with little to no seasonality.

ACF Graph

# Check ACF features for stationarity
acf_feat <- train_agg %>% 
  features(adjusted, feat_acf)

acf_feat %>%
  gt() %>%
  tab_header(title = "Time Series ACF Features")
Time Series ACF Features

| sector | symbol | acf1 | acf10 | diff1_acf1 | diff1_acf10 | diff2_acf1 | diff2_acf10 | season_acf1 |
|---|---|---|---|---|---|---|---|---|
| Healthcare | JNJ | 0.8884008 | 4.386748 | -0.12171752 | 0.2648661 | -0.3845112 | 0.3690739 | 0.2850879 |
| Healthcare | PFE | 0.8912692 | 4.239582 | 0.02987153 | 0.4089485 | -0.2588351 | 0.4865717 | 0.2094104 |
| Healthcare | <aggregated> | 0.8981406 | 4.733011 | -0.08900705 | 0.3784886 | -0.3357047 | 0.4630159 | 0.2924399 |
| Tech | AAPL | 0.9414469 | 4.956790 | -0.01778920 | 0.3510019 | -0.3544577 | 0.5530134 | 0.2818692 |
| Tech | MSFT | 0.9400606 | 4.870145 | -0.16099135 | 0.2611368 | -0.5952398 | 0.7092865 | 0.2332648 |
| Tech | <aggregated> | 0.9434890 | 5.028080 | -0.10624807 | 0.2709501 | -0.4943826 | 0.5898882 | 0.2641420 |
| <aggregated> | <aggregated> | 0.9432973 | 5.197928 | -0.13126368 | 0.4327273 | -0.4580610 | 0.6883308 | 0.2806735 |

# Note: acf() here pools every series in the hierarchy into a single vector,
# so this plot is a rough overall view rather than a per-series ACF
acf_plot <- acf(train_agg$adjusted, main = "ACF of Adjusted Closing Stock Prices")

Across all series, the ACF values are very high at lag 1 (ranging from 0.89 to 0.94) and decay slowly across subsequent lags. This slow decline indicates strong autocorrelation and non-stationarity: current values are heavily influenced by past values. The seasonal ACF values (around 0.21–0.29) also suggest a mild recurring seasonal component, consistent across both Healthcare and Tech. Overall, the ACF results indicate the persistent dependence typical of financial and market-based data, with clear evidence of trend and mild seasonality.
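To corroborate the non-stationarity suggested by the slowly decaying ACF, a unit-root check can be added. This is a minimal sketch, assuming the feasts package is loaded alongside fable; it was not part of the original analysis.

# KPSS test: small p-values reject stationarity of the level series
train_agg %>%
  features(adjusted, unitroot_kpss)

# Estimate how many first differences each series needs
train_agg %>%
  features(adjusted, unitroot_ndiffs)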

PACF Graph

# Check PACF features
pacf_feat <- train_agg %>% 
  features(adjusted, feat_pacf)

pacf_feat %>%
  gt() %>%
  tab_header(title = "Time Series PACF Features")
Time Series PACF Features

| sector | symbol | pacf5 | diff1_pacf5 | diff2_pacf5 | season_pacf |
|---|---|---|---|---|---|
| Healthcare | JNJ | 0.8566794 | 0.3600345 | 0.8844844 | -0.017868545 |
| Healthcare | PFE | 0.9376585 | 0.3935611 | 0.8276600 | -0.061801044 |
| Healthcare | <aggregated> | 0.9056050 | 0.3522061 | 0.9034579 | -0.072004025 |
| Tech | AAPL | 0.9496576 | 0.2365421 | 0.6653708 | -0.040420100 |
| Tech | MSFT | 0.9451634 | 0.2607424 | 0.8928380 | -0.009512541 |
| Tech | <aggregated> | 0.9541167 | 0.2367485 | 0.8358586 | -0.024404681 |
| <aggregated> | <aggregated> | 0.9674245 | 0.3599529 | 0.9417707 | -0.010407561 |

# As with acf() above, this pools all series into one vector
pacf_plot <- pacf(train_agg$adjusted, main = "PACF of Adjusted Closing Stock Prices")

The PACF features show strong short-term dependence across all series (pacf5 ≈ 0.86–0.97, where pacf5 sums the squared partial autocorrelations over the first five lags), indicating that each month's value is strongly influenced by the immediately preceding months. After differencing, the partial autocorrelations decrease, suggesting that differencing successfully reduces this serial dependence.

Small but consistent seasonal PACF values (around –0.01 to –0.07) align with the ACF findings, pointing to repeating seasonal effects that are weaker but still noticeable. Together, the ACF and PACF results show that both sectors’ data contain strong short-term correlations and subtle seasonal patterns, which should be accounted for in any forecasting model through appropriate differencing or seasonal components.
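As a brief illustration of that differencing step (a sketch under the same package assumptions, not from the original write-up):

# First-difference each series within its key group
train_diff <- train_agg %>%
  group_by_key() %>%
  mutate(diff_adjusted = difference(adjusted)) %>%
  ungroup()

# The ACF of the differenced series should drop off much faster
train_diff %>%
  ACF(diff_adjusted) %>%
  autoplot()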

Decompose the Training Set

STL Decomposition

Additive Decomposition

# Additive decomp (no transformation needed)
add_dcmp <- train_sec %>%
  model(stl = STL(adjusted))
components(add_dcmp) %>%
  autoplot() + # Plot the decomp
  labs(title = "Additive STL Decomposition of Adjusted Closing Stock Prices")

Multiplicative Decomposition

To perform a multiplicative decomposition using STL, I took the log transform of the data. Because STL assumes an additive structure, taking logs converts a multiplicative relationship into an additive one.
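In symbols, if the series decomposes as y_t = T_t × S_t × R_t, then log(y_t) = log(T_t) + log(S_t) + log(R_t), so an additive STL fit on the log scale corresponds to a multiplicative decomposition on the original scale.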

# Mult decomp
mult_dcmp <- train_sec %>%
  model(stl = STL(log(adjusted))) # Log transform
components(mult_dcmp) %>%
  autoplot() + # Plot the decomp
  labs(title = "Multiplicative STL Decomposition of Adjusted Closing Stock Prices")

Which is the best?

I used accuracy() to compare the two STL decompositions because it provides in-sample error metrics (RMSE, MAE, and MAPE) computed from each model's residuals, quantifying how much of the series each decomposition leaves unexplained and allowing an objective choice between them. STL decomposition is more flexible and robust than its classical counterpart, which is why only the STL decompositions were compared.
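For reference, with residuals e_t = y_t − ŷ_t over n observations: RMSE = sqrt((1/n) Σ e_t²), MAE = (1/n) Σ |e_t|, and MAPE = (100/n) Σ |e_t / y_t|. RMSE penalizes large misses most heavily, while MAPE expresses error as a percentage of the actual values.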

# Compare accuracy
decomp <- train_sec %>%
  model(add = STL(adjusted),
        mult = STL(log(adjusted)))
acc_decomp <- accuracy(decomp)

# Filter out important accuracy measures with important models
acc_summary_tbl <- acc_decomp %>%
  dplyr::select(Model = .model, Sector = sector, RMSE, MAE, MAPE, ACF1)

# Make the table presentable
acc_summary_tbl %>%
  gt() %>%
  tab_header(title = "Additive vs. Multiplicative STL Decomposition")
Additive vs. Multiplicative STL Decomposition

| Model | Sector | RMSE | MAE | MAPE | ACF1 |
|---|---|---|---|---|---|
| add | Healthcare | 5.121699 | 4.189995 | 2.568294 | 0.1362190 |
| mult | Healthcare | 5.090425 | 4.209498 | 2.575936 | 0.1271357 |
| add | Tech | 16.235163 | 12.815175 | 4.254711 | 0.4309832 |
| mult | Tech | 16.405484 | 11.791138 | 3.587634 | 0.4320932 |
| add | <aggregated> | 18.794889 | 14.799587 | 3.149181 | 0.3600727 |
| mult | <aggregated> | 18.803393 | 14.222678 | 2.934752 | 0.3591313 |

Across both sectors, the multiplicative STL decomposition performed slightly better than the additive model. In the Healthcare sector, the multiplicative model showed marginally lower RMSE (5.09 vs. 5.12) and ACF1 values, indicating less residual autocorrelation and a slightly more stable fit. The improvement was more pronounced in the Tech sector, where the multiplicative model reduced MAE (11.79 vs. 12.82) and MAPE (3.59 vs. 4.25), suggesting it captured proportional seasonal effects more effectively. At the aggregated level, the multiplicative model again achieved lower MAE and MAPE, consistent with the pattern seen in individual sectors. Overall, these results suggest that a multiplicative decomposition better represents seasonal fluctuations that scale with the trend, yielding cleaner residuals and a more accurate representation of the underlying structure. For forecasting, I would therefore select the multiplicative (log-transformed) model to ensure seasonality and trend interactions are modeled realistically.

Models

Hierarchical Time Series (HTS) Forecasting

Bottom-Up Forecast

# Create basic ETS model
models <- train_agg %>%
  model(ets = ETS(log(adjusted)))

# Build bottom-up model
bu_fit <- models %>%
  reconcile(bu = bottom_up(ets))  
  
bu_fc <- bu_fit %>%
  forecast(new_data = test_agg)

bu_plot <- autoplot(stock_hier, adjusted) +
  autolayer(bu_fc, level = c(80,95)) +
  labs(title = "US Monthly Stock Prices: 2019-2023 \nwith 1-year Bottom-Up ETS Forecast",
       y = "Adjusted Price ($)") +
  theme(legend.position = "none")

bu_plot

Top-Down Forecast

# Build top-down model
td_fit <- models %>%
  reconcile(td = top_down(ets))

td_fc <- td_fit %>%
  forecast(new_data = test_agg)

td_plot <- autoplot(stock_hier, adjusted) +
  autolayer(td_fc, level = c(80,95)) +
  labs(title = "US Monthly Stock Prices: 2019-2023 \nwith 1-year Top-Down ETS Forecast",
       y = "Adjusted Price ($)") +
  theme(legend.position = "none")

td_plot

Middle-Out Forecast

# Build middle-out model
mo_fit <- models %>%
  reconcile(mo = middle_out(ets))

mo_fc <- mo_fit %>%
  forecast(new_data = test_agg)

mo_plot <- autoplot(stock_hier, adjusted) +
  autolayer(mo_fc, level = c(80,95)) +
  labs(title = "US Monthly Stock Prices: 2019-2023 \nwith 1-year Middle-Out ETS Forecast",
       y = "Adjusted Price ($)") +
  theme(legend.position = "none")

mo_plot

Evaluate Model Performance on the Test Set

# Put models together to compare accuracy
hts_fc <- models %>%
  reconcile(bu = bottom_up(ets),
            td = top_down(ets),
            mo = middle_out(ets)) %>%
  forecast(new_data = test_agg)

# Use accuracy() to compare the models
hts_compare <- accuracy(hts_fc, test_agg)

# Filter out important accuracy measures with important models
hts_summary_tbl <- hts_compare %>%
  filter(symbol == "<aggregated>", sector == "<aggregated>") %>%
  dplyr::select(Model = .model, RMSE, MAE, MAPE, ACF1)

# Make the table presentable
hts_summary_tbl %>%
  gt() %>%
  tab_header(title = "Bottom-Up, Top-Down, and Middle-Out Models")
Bottom-Up, Top-Down, and Middle-Out Models

| Model | RMSE | MAE | MAPE | ACF1 |
|---|---|---|---|---|
| bu | 97.45936 | 85.59408 | 12.33196 | 0.6147651 |
| ets | 104.73996 | 92.11542 | 13.27184 | 0.6230081 |
| mo | 87.53002 | 77.21803 | 11.14404 | 0.6044444 |
| td | 104.73996 | 92.11542 | 13.27184 | 0.6230081 |

Across the three reconciliation approaches, the middle-out (MO) model performed best at the top of the hierarchy (RMSE = 87.53, MAPE ≈ 11.1%), followed by bottom-up (BU: RMSE = 97.46, MAPE ≈ 12.3%). The top-down (TD) reconciliation matched the unreconciled base ETS model exactly (RMSE = 104.74, MAPE ≈ 13.3%), which is expected: top-down leaves the top-level forecast unchanged and only disaggregates it downward. BU improved on the base model at the aggregate level, suggesting that summing stock-level ETS forecasts captured the total somewhat better than forecasting the total directly, while MO, which models the sector level and aggregates up while disaggregating down, offered the best balance between sector-specific dynamics and aggregate accuracy. All methods left substantial residual autocorrelation (ACF1 ≈ 0.60–0.62), so the point forecasts still miss structure in the 2023 test year.
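The table above only scores the top of the hierarchy. A quick way to see how the rankings change lower down (a sketch reusing the objects above, not in the original analysis) is to filter the same accuracy results to the sector level:

# Sector-level accuracy for the same reconciled forecasts
hts_compare %>%
  filter(!is_aggregated(sector), is_aggregated(symbol)) %>%
  dplyr::select(Model = .model, Sector = sector, RMSE, MAE, MAPE)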

VAR Model

# Convert to wide format to separate the two stocks
train_health_wide <- train_health %>%
  dplyr::select(Months, symbol, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted)

test_health_wide <- test_health %>%
  dplyr::select(Months, symbol, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted)

Check for Co-Integration to Determine Model Type

# Check co-integration to determine model type
jtest <- urca::ca.jo(train_health_wide[, c("JNJ", "PFE")], type = "trace", ecdet = "const", K = 2)
summary(jtest)
## 
## ###################### 
## # Johansen-Procedure # 
## ###################### 
## 
## Test type: trace statistic , without linear trend and constant in cointegration 
## 
## Eigenvalues (lambda):
## [1] 1.573080e-01 5.976589e-02 1.110223e-16
## 
## Values of teststatistic and critical values of test:
## 
##           test 10pct  5pct  1pct
## r <= 1 |  2.83  7.52  9.24 12.97
## r = 0  | 10.71 17.85 19.96 24.60
## 
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
## 
##              JNJ.l2      PFE.l2    constant
## JNJ.l2     1.000000    1.000000    1.000000
## PFE.l2    -2.485427    2.758852    9.566284
## constant -52.549754 -267.104972 -385.946202
## 
## Weights W:
## (This is the loading matrix)
## 
##            JNJ.l2       PFE.l2      constant
## JNJ.d -0.05321759 -0.028664086  6.315514e-17
## PFE.d  0.07950063 -0.007740222 -1.674184e-17

The Johansen trace test indicates that the JNJ and PFE series are not cointegrated: both test statistics (r = 0: 10.71; r ≤ 1: 2.83) fall below their 5% critical values (19.96 and 9.24, respectively), so the null hypothesis of no cointegration cannot be rejected. Because the series are not cointegrated, a vector error correction model (VECM) is not warranted, and a VAR model was fit instead.
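Because the levels are non-stationary, a defensible alternative (a hedged sketch, not the approach taken here) is to fit the VAR on first differences; the resulting forecasts of changes would then need to be cumulated back to price levels.

# Fit a VAR on first-differenced prices (dropping the initial NA row)
var_diff_fit <- train_health_wide %>%
  mutate(dJNJ = difference(JNJ), dPFE = difference(PFE)) %>%
  filter(!is.na(dJNJ)) %>%
  model(var_diff = fable::VAR(vars(dJNJ, dPFE)))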

Create Model

# Build VAR Model
var_fit <- train_health_wide %>%
  model(
    aicc = fable::VAR(vars(JNJ, PFE)),
    bic = fable::VAR(vars(JNJ, PFE), ic = "bic")
  )
var_fit
## # A mable: 1 x 2
##       aicc      bic
##    <model>  <model>
## 1 <VAR(3)> <VAR(3)>
var_fc <- var_fit %>%
  forecast(new_data = test_health_wide)

var_plot <- stock_hier %>%
  filter(symbol == "JNJ" | symbol == "PFE") %>%
  autoplot(adjusted) +
  autolayer(var_fc, level = c(80,95)) +
  labs(title = "US Monthly Healthcare Stock Prices: 2019-2023 \nwith 1-year VAR Forecast",
       y = "Adjusted Price ($)") +
  theme(legend.position = "none")

var_plot

Evaluate Model Performance on the Test Set

# Use accuracy() to compare the models
var_compare <- accuracy(var_fc, test_health_wide)


# Filter out important accuracy measures with important models
var_summary_tbl <- var_compare %>%
  dplyr::select(Model = .model, Stock = .response, RMSE, MAE, MAPE, ACF1)

# Make the table presentable
var_summary_tbl %>%
  gt() %>%
  tab_header(title = "AICc & BIC VAR Models")
AICc & BIC VAR Models

| Model | Stock | RMSE | MAE | MAPE | ACF1 |
|---|---|---|---|---|---|
| aicc | JNJ | 22.05782 | 21.09173 | 14.43915 | 0.4071320 |
| aicc | PFE | 12.35087 | 11.71012 | 39.12898 | 0.7090312 |
| bic | JNJ | 22.05782 | 21.09173 | 14.43915 | 0.4071320 |
| bic | PFE | 12.35087 | 11.71012 | 39.12898 | 0.7090312 |

The VAR models selected by AICc and BIC performed identically because both criteria chose the same VAR(3) specification, as the mable above shows. The model produced moderate forecast errors for JNJ (RMSE ≈ 22.06, MAPE ≈ 14.4%) and lower absolute but higher percentage errors for PFE (RMSE ≈ 12.35, MAPE ≈ 39.1%); PFE's higher MAPE reflects its lower price level, which makes percentage errors more sensitive to the same absolute miss. The residual autocorrelation (ACF1 = 0.41 for JNJ and 0.71 for PFE) indicates remaining temporal dependence, particularly for PFE, which might benefit from differencing or additional lags. Overall, the VAR captured short-term interdependencies between the two healthcare stocks but left room for refinement in the residual structure.

Dynamic Regression

The Pfizer stock was chosen for the dynamic regression forecast. No differencing was applied manually: the deterministic specification below fixes d = 0 and relies on a constant plus stationary ARIMA errors, while the stochastic specification sets d = 1 so that the differenced errors absorb the trend.

Prepare Trading Days

# Create a calendar of NYSE holidays
nyse_holidays <- as.Date(holidayNYSE(2019:2024))

# Helper: count weekdays in a given month, excluding NYSE holidays
count_trading_days <- function(x) {
  start_date <- as.Date(as.yearmon(x))
  end_date <- as.Date(as.yearmon(x) + 1/12) - 1
  days <- seq(start_date, end_date, by = "day")
  sum(!weekdays(days) %in% c("Saturday", "Sunday") & !(days %in% nyse_holidays))
}

train_pf <- train_pf %>%
  mutate(trading_days = sapply(Months, count_trading_days))

test_pf <- test_pf %>%
  mutate(trading_days = sapply(Months, count_trading_days))
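As a quick sanity check (illustrative only), monthly trading-day counts should land roughly between 19 and 23:

# Distribution of monthly trading-day counts in the training set
summary(train_pf$trading_days)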

Deterministic Dynamic Regression Model

# Build deterministic model
fit_determ <- train_pf %>%
  model(deterministic = ARIMA(log(adjusted) ~ 1 + trading_days +
                                pdq(d = 0)))
report(fit_determ)
## Series: adjusted 
## Model: LM w/ ARIMA(1,0,0) errors 
## Transformation: log(adjusted) 
## 
## Coefficients:
##          ar1  trading_days  intercept
##       0.9265        0.0175     3.1239
## s.e.  0.0503        0.0129     0.3120
## 
## sigma^2 estimated as 0.00613:  log likelihood=54.73
## AIC=-101.47   AICc=-100.54   BIC=-93.98

Stochastic Dynamic Regression Model

# Build stochastic model
fit_stochas <- train_pf %>%
  model(stochastic = ARIMA(log(adjusted) ~ trading_days + pdq(d = 1)))

report(fit_stochas)
## Series: adjusted 
## Model: LM w/ ARIMA(0,1,0) errors 
## Transformation: log(adjusted) 
## 
## Coefficients:
##       trading_days
##             0.0178
## s.e.        0.0127
## 
## sigma^2 estimated as 0.006087:  log likelihood=53.7
## AIC=-103.41   AICc=-103.13   BIC=-99.71

Both Models

# Combine the models to plot the forecasts
dr_fit <- train_pf %>%
  model(stochastic = ARIMA(log(adjusted) ~ trading_days + pdq(d = 1)),
        deterministic = ARIMA(log(adjusted) ~ 1 + trading_days +
                                pdq(d = 0)))

dr_fc <- dr_fit %>%
  forecast(new_data = test_pf)

dr_plot <- stock_hier %>%
  filter(symbol == "PFE") %>%
  autoplot(adjusted) +
  autolayer(dr_fc, level = c(80,95)) +
  labs(title = "US Monthly Healthcare Stock Prices: 2019-2023 \nwith 1-year Dynamic Regression Forecast",
       y = "Adjusted Price ($)") +
  theme(legend.position = "none")

dr_plot

Evaluate Model Performance on the Test Set

# Use accuracy() to compare the models
dr_compare <- accuracy(dr_fc, test_pf)


# Filter out important accuracy measures with important models
dr_summary_tbl <- dr_compare %>%
  dplyr::select(Model = .model, RMSE, MAE, MAPE, ACF1)

# Make the table presentable
dr_summary_tbl %>%
  gt() %>%
  tab_header(title = "Stochastic and Deterministic Dynamic Regression Models")
Stochastic and Deterministic Dynamic Regression Models

| Model | RMSE | MAE | MAPE | ACF1 |
|---|---|---|---|---|
| deterministic | 9.00771 | 8.751579 | 28.89515 | 0.5824608 |
| stochastic | 13.88592 | 13.273787 | 44.18143 | 0.6783641 |

Between the stochastic and deterministic dynamic regression models, the deterministic model provided a notably better fit. It achieved lower error values (RMSE = 9.01, MAE = 8.75, MAPE ≈ 28.9%) than the stochastic model (RMSE = 13.89, MAE = 13.27, MAPE ≈ 44.2%), indicating more accurate forecasts. Its smaller residual autocorrelation (ACF1 = 0.58 vs. 0.68) further suggests that it captured the underlying structure of the series more effectively, leaving less unexplained temporal dependence. In contrast, the stochastic model treats the trend as a random walk, which propagates more uncertainty into the forecasts and, here, produced larger errors without improving on the deterministic fit. Overall, the deterministic dynamic regression model offered a more stable representation of the data, making it the preferable choice for forecasting Pfizer stock.
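A natural follow-up (a sketch, assuming feasts is loaded; not shown in the original write-up) is to inspect the preferred deterministic model's residuals directly:

# Residual time plot, ACF, and histogram for the deterministic model
fit_determ %>%
  gg_tsresiduals()

# Ljung-Box test on the innovation residuals (lag 12 for monthly data)
fit_determ %>%
  augment() %>%
  features(.innov, ljung_box, lag = 12)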

Univariate Modeling with Fourier Terms

Dynamic Harmonic Regression

Dynamic harmonic regression assumes the seasonal pattern is fixed over time, which is reasonable for this dataset. With monthly data (m = 12), up to m/2 = 6 Fourier harmonic pairs can be estimated, so K ranges from 1 to 6.

# Build DHR model with Fourier transforms (K = 1-6)
dhr_fit <- model(train_pf,
  `K = 1` = ARIMA(log(adjusted) ~ fourier(K=1) + PDQ(0,0,0)),
  `K = 2` = ARIMA(log(adjusted) ~ fourier(K=2) + PDQ(0,0,0)),
  `K = 3` = ARIMA(log(adjusted) ~ fourier(K=3) + PDQ(0,0,0)),
  `K = 4` = ARIMA(log(adjusted) ~ fourier(K=4) + PDQ(0,0,0)),
  `K = 5` = ARIMA(log(adjusted) ~ fourier(K=5) + PDQ(0,0,0)),
  `K = 6` = ARIMA(log(adjusted) ~ fourier(K=6) + PDQ(0,0,0))
)

glance(dhr_fit)
## # A tibble: 6 × 10
##   sector     symbol .model  sigma2 log_lik   AIC   AICc   BIC ar_roots 
##   <chr*>     <chr*> <chr>    <dbl>   <dbl> <dbl>  <dbl> <dbl> <list>   
## 1 Healthcare PFE    K = 1  0.00542    57.8 -106. -104.  -96.4 <cpl [2]>
## 2 Healthcare PFE    K = 2  0.00376    68.4 -117. -111.  -98.4 <cpl [3]>
## 3 Healthcare PFE    K = 3  0.00473    62.9 -110. -106.  -94.9 <cpl [0]>
## 4 Healthcare PFE    K = 4  0.00518    61.4 -105.  -99.9 -88.1 <cpl [0]>
## 5 Healthcare PFE    K = 5  0.00541    61.6 -101.  -93.6 -80.8 <cpl [0]>
## 6 Healthcare PFE    K = 6  0.00546    62.0 -100.  -90.9 -77.9 <cpl [0]>
## # ℹ 1 more variable: ma_roots <list>
dhr_fc <- forecast(dhr_fit, new_data = test_pf)

Plot Forecasts

# Plot the model forecasts
dhr_plot <- stock_hier %>%
  filter(symbol == "PFE") %>%
  autoplot(adjusted) +
  autolayer(dhr_fc, level = c(80,95)) +
  facet_wrap(vars(.model), ncol = 2) +
  guides(colour = "none", fill = "none", level = "none") +
  geom_label(
    aes(x = yearmonth("2019 Jan"), y = 60,
        label = paste0("AICc = ", format(AICc))),
    data = glance(dhr_fit)
  ) +
  labs(title= "Pfizer Stocks: Univariate Forecast",
       y="Adjusted Price ($)")

dhr_plot

Evaluate Model Performance on the Test Set

# Use accuracy() to compare the models
dhr_compare <- accuracy(dhr_fc, test_pf) 

# Filter out important accuracy measures with important models
dhr_summary_tbl <- dhr_compare %>%
  dplyr::select(Model = .model, RMSE, MAE, MAPE, ACF1)

# Make the table presentable
dhr_summary_tbl %>%
  gt() %>%
  tab_header(title = "Dynamic Harmonic Models")
Dynamic Harmonic Models

| Model | RMSE | MAE | MAPE | ACF1 |
|---|---|---|---|---|
| K = 1 | 11.561004 | 10.921570 | 36.54511 | 0.6986366 |
| K = 2 | 9.817238 | 8.397526 | 28.85777 | 0.7041040 |
| K = 3 | 10.135439 | 8.985263 | 30.57567 | 0.6607559 |
| K = 4 | 10.545352 | 9.370088 | 31.86904 | 0.6606780 |
| K = 5 | 10.425817 | 9.203805 | 31.35879 | 0.6545512 |
| K = 6 | 10.614651 | 9.442722 | 32.10404 | 0.6615266 |

Among the dynamic harmonic models tested, the model with K = 2 Fourier terms achieved the best overall performance. It yielded the lowest error values (RMSE = 9.82, MAE = 8.40, MAPE ≈ 28.9%) and the smallest information criterion values (AICc = –111), indicating an optimal balance between model fit and complexity. Increasing the number of harmonics beyond K = 2 did not lead to further improvement; in fact, models with higher K values showed slightly higher RMSE and AICc, suggesting mild overfitting and diminishing returns from additional seasonal components. The residual autocorrelation remained moderate across models (ACF1 ≈ 0.65–0.70), showing that while the seasonal structure was captured effectively, some short-term dependence persisted. Overall, the K = 2 dynamic harmonic model provided the most accurate representation of the seasonal dynamics in the PFE time series, effectively capturing the dominant periodic pattern without unnecessary model complexity.
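Given the moderate residual autocorrelation noted above, a Ljung-Box check on the chosen K = 2 model would quantify it (a sketch reusing dhr_fit; not part of the original analysis):

# Test the K = 2 model's residuals for remaining autocorrelation
dhr_fit %>%
  dplyr::select(`K = 2`) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)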

Evaluate Best Models Performance on the Test Set

When comparing univariate and multivariate approaches, the univariate models generally outperformed the multivariate VAR model in terms of overall accuracy and residual independence. The deterministic dynamic regression model achieved the lowest forecast errors (RMSE = 9.01, MAE = 8.75, MAPE ≈ 28.9%, ACF1 = 0.58), outperforming both the stochastic dynamic regression (RMSE = 13.89, MAPE ≈ 44.2%, ACF1 = 0.68) and the VAR models for JNJ and PFE (RMSE = 22.06 and 12.35, MAPE ≈ 14.4% and 39.1%, respectively). Among the univariate models, the dynamic harmonic model with K = 2 Fourier terms provided the best seasonal fit (RMSE = 9.82, MAPE ≈ 28.9%, ACF1 = 0.70, AICc = –111), effectively balancing complexity and accuracy. These results indicate that univariate methods, especially those incorporating deterministic trends or Fourier seasonality, captured the dominant temporal patterns more efficiently than the multivariate VAR framework, which left stronger residual autocorrelation (ACF1 up to 0.71).

When evaluating reconciled versus unreconciled hierarchical forecasts, the middle-out (MO) reconciliation performed best at the aggregate level (RMSE = 87.53, MAPE ≈ 11.1%), followed by bottom-up (BU: RMSE = 97.46, MAPE ≈ 12.3%). Top-down (TD) reconciliation was weakest, matching the unreconciled base ETS model exactly (RMSE = 104.74, MAPE ≈ 13.3%) because top-down leaves the top-level forecast unchanged and only disaggregates it. Both MO and BU improved on the unreconciled forecast of total portfolio value, showing that incorporating information from lower levels of the hierarchy, particularly the sector level, improves coherence and aggregate predictive accuracy.

Taken together, these findings show that the univariate models, especially the deterministic dynamic regression and the low-order (K = 2) harmonic specification, produced the most accurate and interpretable forecasts for Pfizer, while middle-out reconciliation worked best for the hierarchy as a whole. Multivariate models such as VAR provided useful insight into interdependencies between series but did not outperform the simpler univariate approaches in predictive accuracy or residual independence for this dataset.