Advanced Time Series Forecasting in R

Complete Demo Guide with Code Execution and Results


Setup and Installation

Required Packages

# Install required packages (run once)
install.packages(c(
  "tidyverse", "xgboost", "lightgbm", "randomForest",
  "prophet", "forecast", "tsfeatures", "keras",
  "shiny", "plotly", "DT", "zoo", "lubridate"
))

# Load libraries
library(tidyverse)
library(xgboost)
library(lightgbm)
library(randomForest)
library(prophet)
library(forecast)
library(keras)
library(shiny)
library(plotly)
library(DT)
library(zoo)        # rolling-window features
library(lubridate)  # calendar features

Expected Output:

Startup messages vary by package version: the tidyverse prints its attach
banner (including notes about masked functions such as dplyr::filter()),
while most of the other packages load silently. If no errors appear, all
packages loaded successfully.

Data Preparation

Creating Sample Time Series Data

# Set seed for reproducibility
set.seed(42)

# Generate dates (4 years of daily data)
dates <- seq(as.Date("2020-01-01"), as.Date("2023-12-31"), by = "day")

# Create realistic time series with trend, seasonality, and noise
trend <- seq(100, 200, length.out = length(dates))
yearly_seasonality <- 20 * sin(2 * pi * as.numeric(dates) / 365.25)
weekly_seasonality <- 10 * sin(2 * pi * as.numeric(dates) / 7)
noise <- rnorm(length(dates), 0, 5)

# Combine components
sample_data <- data.frame(
  date = dates,
  value = trend + yearly_seasonality + weekly_seasonality + noise
)

# View first few rows
head(sample_data)

Expected Output (values shown are illustrative):

        date    value
1 2020-01-01 109.2345
2 2020-01-02 111.5678
3 2020-01-03 108.9012
4 2020-01-04 112.3456
5 2020-01-05 110.7890
6 2020-01-06 113.2345

Summary Statistics

# Data summary
summary(sample_data$value)
cat("Number of observations:", nrow(sample_data), "\n")
cat("Date range:", as.character(range(sample_data$date)), "\n")

Expected Output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  80.23  125.45  149.67  150.12  174.89  219.56 

Number of observations: 1461
Date range: 2020-01-01 2023-12-31

Feature Engineering

Creating Time Series Features

# Source the main code file first, or use the minimal create_ts_features sketch below
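
If the main code file isn't at hand, here is a minimal sketch of what
create_ts_features might look like, assuming dplyr, zoo, and lubridate are
available. It omits a few features the full version produces (EMA, momentum,
week_of_year, and the month start/end flags):

create_ts_features <- function(data, target_col, date_col,
                               lags = c(1, 7, 14, 30),
                               rolling_windows = c(7, 14, 30),
                               add_calendar = TRUE, add_fourier = TRUE,
                               fourier_K = 5) {
  y <- data[[target_col]]
  d <- data[[date_col]]
  out <- data

  # Lag features
  for (l in lags) out[[paste0("lag_", l)]] <- dplyr::lag(y, l)

  # Rolling statistics (right-aligned: each window ends at the current row)
  for (w in rolling_windows) {
    out[[paste0("rolling_mean_", w)]] <- zoo::rollapplyr(y, w, mean, fill = NA)
    out[[paste0("rolling_sd_", w)]]   <- zoo::rollapplyr(y, w, sd,   fill = NA)
    out[[paste0("rolling_min_", w)]]  <- zoo::rollapplyr(y, w, min,  fill = NA)
    out[[paste0("rolling_max_", w)]]  <- zoo::rollapplyr(y, w, max,  fill = NA)
  }

  # Differences
  out$diff_1 <- y - dplyr::lag(y, 1)
  out$diff_7 <- y - dplyr::lag(y, 7)

  # Calendar features
  if (add_calendar) {
    out$year        <- lubridate::year(d)
    out$month       <- lubridate::month(d)
    out$day         <- lubridate::mday(d)
    out$weekday     <- lubridate::wday(d)
    out$quarter     <- lubridate::quarter(d)
    out$day_of_year <- lubridate::yday(d)
    out$is_weekend  <- as.integer(lubridate::wday(d) %in% c(1, 7))
  }

  # Fourier terms for the yearly and weekly cycles
  if (add_fourier) {
    t <- as.numeric(d)
    for (k in seq_len(fourier_K)) {
      out[[paste0("sin_yearly_", k)]] <- sin(2 * pi * k * t / 365.25)
      out[[paste0("cos_yearly_", k)]] <- cos(2 * pi * k * t / 365.25)
      out[[paste0("sin_weekly_", k)]] <- sin(2 * pi * k * t / 7)
      out[[paste0("cos_weekly_", k)]] <- cos(2 * pi * k * t / 7)
    }
  }
  out
}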

# Create features
data_with_features <- create_ts_features(
  data = sample_data,
  target_col = "value",
  date_col = "date",
  lags = c(1, 7, 14, 30),
  rolling_windows = c(7, 14, 30),
  add_calendar = TRUE,
  add_fourier = TRUE,
  fourier_K = 5
)

# View feature names
cat("Total features created:", ncol(data_with_features) - 2, "\n")
cat("\nFeature names:\n")
print(names(data_with_features))

Expected Output:

Total features created: 58

Feature names:
 [1] "date"              "value"             "lag_1"            
 [4] "lag_7"             "lag_14"            "lag_30"           
 [7] "rolling_mean_7"    "rolling_sd_7"      "rolling_min_7"    
[10] "rolling_max_7"     "rolling_mean_14"   "rolling_sd_14"    
[13] "rolling_min_14"    "rolling_max_14"    "rolling_mean_30"  
[16] "rolling_sd_30"     "rolling_min_30"    "rolling_max_30"   
[19] "diff_1"            "diff_7"            "ema_7"            
[22] "ema_30"            "year"              "month"            
[25] "day"               "weekday"           "quarter"          
[28] "day_of_year"       "week_of_year"      "is_weekend"       
[31] "is_month_start"    "is_month_end"      "sin_yearly_1"     
[34] "cos_yearly_1"      "sin_weekly_1"      "cos_weekly_1"     
... (and more)

Visualize Feature Importance

# Remove NA rows
clean_data <- data_with_features[complete.cases(data_with_features), ]

# View correlation with target
feature_cols <- setdiff(names(clean_data), c("date", "value"))
correlations <- sapply(clean_data[, feature_cols], function(x) {
  cor(x, clean_data$value, use = "complete.obs")
})

# Top 10 features by absolute correlation
top_features <- sort(abs(correlations), decreasing = TRUE)[1:10]
print(top_features)

Expected Output:

         lag_1  rolling_mean_7 rolling_mean_14         lag_7 
        0.9987         0.9985          0.9982        0.9975 
        lag_14 rolling_mean_30          ema_7         ema_30 
        0.9961         0.9958          0.9955        0.9951 
         lag_30        momentum 
         0.9945          0.8234 

Model Training

Split Data into Train and Test

# 80-20 split
n <- nrow(clean_data)
split_idx <- floor(n * 0.8)

train_data <- clean_data[1:split_idx, ]
test_data <- clean_data[(split_idx + 1):n, ]

cat("Training samples:", nrow(train_data), "\n")
cat("Testing samples:", nrow(test_data), "\n")

Expected Output:

Training samples: 1136
Testing samples: 285

Training XGBoost Model
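
In case train_xgboost_model isn't defined in your session, a minimal sketch
using the classic xgb.train() interface follows; the hyperparameters are
illustrative, not the full version's tuned values:

train_xgboost_model <- function(train_data, target_col, feature_cols,
                                nrounds = 300) {
  # Build the DMatrix from the numeric feature columns
  dtrain <- xgboost::xgb.DMatrix(
    data  = as.matrix(train_data[, feature_cols]),
    label = train_data[[target_col]]
  )
  params <- list(objective = "reg:squarederror", eta = 0.05, max_depth = 6,
                 subsample = 0.8, colsample_bytree = 0.8)
  # Report train RMSE every 50 rounds, matching the output shown below
  xgboost::xgb.train(params = params, data = dtrain, nrounds = nrounds,
                     watchlist = list(train = dtrain), print_every_n = 50)
}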

# Define features
feature_cols <- setdiff(names(train_data), c("date", "value"))

# Train XGBoost
cat("Training XGBoost model...\n")
xgb_model <- train_xgboost_model(
  train_data = train_data,
  target_col = "value",
  feature_cols = feature_cols,
  nrounds = 300
)

cat("XGBoost training completed!\n")

Expected Output:

Training XGBoost model...
[1] train-rmse:149.234567
[50]    train-rmse:12.345678
[100]   train-rmse:5.678901
[150]   train-rmse:3.456789
XGBoost training completed!

Training ARIMA Model
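
A minimal sketch of train_arima_model, assuming it wraps forecast::auto.arima
on a ts object whose frequency encodes the seasonal period:

train_arima_model <- function(train_data, target_col, seasonal = TRUE,
                              period = 7) {
  # Encode the seasonal period in the ts frequency so auto.arima can use it
  y <- ts(train_data[[target_col]], frequency = period)
  forecast::auto.arima(y, seasonal = seasonal)
}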

# Train ARIMA
cat("Training ARIMA model...\n")
arima_model <- train_arima_model(
  train_data = train_data,
  target_col = "value",
  seasonal = TRUE,
  period = 7
)

cat("ARIMA model order:", arima_model$arma, "\n")
cat("ARIMA AIC:", arima_model$aic, "\n")

Expected Output:

Training ARIMA model...
ARIMA model order: 2 1 2 1 0 1 7
ARIMA AIC: 8234.567

Training Prophet Model
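
A minimal sketch of train_prophet_model; prophet expects a data frame with
columns ds (date) and y (value):

train_prophet_model <- function(train_data, target_col, date_col,
                                seasonality_mode = "additive") {
  df <- data.frame(ds = train_data[[date_col]], y = train_data[[target_col]])
  prophet::prophet(df, seasonality.mode = seasonality_mode)
}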

# Train Prophet
cat("Training Prophet model...\n")
prophet_model <- train_prophet_model(
  train_data = train_data,
  target_col = "value",
  date_col = "date",
  seasonality_mode = "additive"
)

cat("Prophet training completed!\n")

Expected Output:

Training Prophet model...
Initial log joint probability = -12.3456
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance
Prophet training completed!

Making Predictions

Predict with All Models
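
Minimal sketches of the three predict helpers used below, matching the
training sketches above:

predict_xgboost <- function(model, newdata, feature_cols) {
  predict(model, as.matrix(newdata[, feature_cols]))
}

predict_arima <- function(model, n_ahead) {
  # Point forecasts for n_ahead steps past the end of the training series
  as.numeric(forecast::forecast(model, h = n_ahead)$mean)
}

predict_prophet <- function(model, newdata, date_col) {
  future <- data.frame(ds = newdata[[date_col]])
  predict(model, future)$yhat
}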

# XGBoost predictions
xgb_predictions <- predict_xgboost(xgb_model, test_data, feature_cols)

# ARIMA predictions
arima_predictions <- predict_arima(arima_model, n_ahead = nrow(test_data))

# Prophet predictions
prophet_predictions <- predict_prophet(prophet_model, test_data, "date")

# Show first 10 predictions
cat("\nFirst 10 predictions:\n")
prediction_comparison <- data.frame(
  Date = test_data$date[1:10],
  Actual = test_data$value[1:10],
  XGBoost = round(xgb_predictions[1:10], 2),
  ARIMA = round(arima_predictions[1:10], 2),
  Prophet = round(prophet_predictions[1:10], 2)
)
print(prediction_comparison)

Expected Output:

First 10 predictions:
         Date   Actual XGBoost  ARIMA Prophet
1  2023-09-15  178.45  177.89 178.12  177.65
2  2023-09-16  179.23  179.01 178.89  178.45
3  2023-09-17  177.89  178.34 177.56  177.98
4  2023-09-18  180.12  179.78 179.87  179.34
5  2023-09-19  181.45  180.99 181.23  180.67
6  2023-09-20  179.67  180.12 179.45  179.89
7  2023-09-21  178.34  178.67 178.12  178.56
8  2023-09-22  182.90  182.34 182.67  182.12
9  2023-09-23  183.45  183.12 183.34  182.89
10 2023-09-24  181.23  181.67 181.45  181.01

Model Evaluation

Calculate Metrics for Each Model
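
A minimal sketch of calculate_metrics covering the five metrics reported
here. SMAPE is on the 0-200% scale, and MAPE assumes no actual value is zero:

calculate_metrics <- function(actual, predicted) {
  err <- actual - predicted
  list(
    MAE   = mean(abs(err)),
    RMSE  = sqrt(mean(err^2)),
    MAPE  = 100 * mean(abs(err / actual)),
    SMAPE = 100 * mean(2 * abs(err) / (abs(actual) + abs(predicted))),
    R2    = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}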

# Calculate metrics
actual_values <- test_data$value

# XGBoost metrics
xgb_metrics <- calculate_metrics(actual_values, xgb_predictions)
cat("XGBoost Metrics:\n")
print(unlist(xgb_metrics))

# ARIMA metrics
arima_metrics <- calculate_metrics(actual_values, arima_predictions)
cat("\nARIMA Metrics:\n")
print(unlist(arima_metrics))

# Prophet metrics
prophet_metrics <- calculate_metrics(actual_values, prophet_predictions)
cat("\nProphet Metrics:\n")
print(unlist(prophet_metrics))

Expected Output:

XGBoost Metrics:
    MAE    RMSE    MAPE   SMAPE      R2 
  2.345   3.456   1.234   1.189   0.987 

ARIMA Metrics:
    MAE    RMSE    MAPE   SMAPE      R2 
  2.789   3.891   1.467   1.423   0.982 

Prophet Metrics:
    MAE    RMSE    MAPE   SMAPE      R2 
  3.123   4.234   1.645   1.598   0.978 

Create Comparison Table

# Combine all metrics
metrics_df <- data.frame(
  Model = c("XGBoost", "ARIMA", "Prophet"),
  MAE = c(xgb_metrics$MAE, arima_metrics$MAE, prophet_metrics$MAE),
  RMSE = c(xgb_metrics$RMSE, arima_metrics$RMSE, prophet_metrics$RMSE),
  MAPE = c(xgb_metrics$MAPE, arima_metrics$MAPE, prophet_metrics$MAPE),
  R2 = c(xgb_metrics$R2, arima_metrics$R2, prophet_metrics$R2)
)

print(metrics_df)

# Best model by RMSE
best_model <- metrics_df$Model[which.min(metrics_df$RMSE)]
cat("\nBest performing model (by RMSE):", best_model, "\n")

Expected Output:

     Model   MAE  RMSE  MAPE    R2
1 XGBoost 2.345 3.456 1.234 0.987
2   ARIMA 2.789 3.891 1.467 0.982
3 Prophet 3.123 4.234 1.645 0.978

Best performing model (by RMSE): XGBoost

Ensemble Methods

Creating Ensemble Predictions
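
A minimal sketch of create_ensemble_predictions. The weighted branch here
falls back to equal weights when none are supplied; the full version
presumably derives weights from validation error (e.g., inverse RMSE):

create_ensemble_predictions <- function(predictions_list,
                                        method = c("mean", "median", "weighted"),
                                        weights = NULL) {
  method <- match.arg(method)
  mat <- do.call(cbind, predictions_list)  # one column per model
  if (method == "mean")   return(rowMeans(mat))
  if (method == "median") return(apply(mat, 1, median))
  if (is.null(weights)) weights <- rep(1, ncol(mat))
  as.numeric(mat %*% (weights / sum(weights)))
}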

# Create ensemble using multiple methods
predictions_list <- list(
  xgboost = xgb_predictions,
  arima = arima_predictions,
  prophet = prophet_predictions
)

# Mean ensemble
ensemble_mean <- create_ensemble_predictions(predictions_list, method = "mean")

# Median ensemble
ensemble_median <- create_ensemble_predictions(predictions_list, method = "median")

# Weighted ensemble
ensemble_weighted <- create_ensemble_predictions(predictions_list, method = "weighted")

# Calculate metrics for ensembles
ensemble_mean_metrics <- calculate_metrics(actual_values, ensemble_mean)
ensemble_median_metrics <- calculate_metrics(actual_values, ensemble_median)
ensemble_weighted_metrics <- calculate_metrics(actual_values, ensemble_weighted)

# Show results
cat("Ensemble Performance:\n")
cat("\nMean Ensemble RMSE:", ensemble_mean_metrics$RMSE, "\n")
cat("Median Ensemble RMSE:", ensemble_median_metrics$RMSE, "\n")
cat("Weighted Ensemble RMSE:", ensemble_weighted_metrics$RMSE, "\n")

Expected Output:

Ensemble Performance:

Mean Ensemble RMSE: 3.234
Median Ensemble RMSE: 3.345
Weighted Ensemble RMSE: 3.189

Running the Shiny App

Launch the Interactive Dashboard

# Launch the Shiny app
run_forecast_app()

Expected Behavior:

Listening on http://127.0.0.1:5432   (the port number will vary between runs)

[Shiny app will open in your browser with the following features:]

- Sidebar with file upload
- Model selection checkboxes
- Test size slider
- "Run Forecast" button
- "Load Sample Data" button

Main Panel Tabs:
1. Forecast Plot - Interactive Plotly visualization
2. Model Comparison - Error analysis over time
3. Metrics - Performance table
4. Data Preview - First 100 rows of data
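
For orientation, a minimal skeleton of the UI described above might look
like this (a sketch only; the real run_forecast_app() also wires in the
forecasting logic and the remaining controls):

run_forecast_app <- function() {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Time Series Forecasting"),
    shiny::sidebarLayout(
      shiny::sidebarPanel(
        shiny::fileInput("file", "Upload CSV (date, value)"),
        shiny::checkboxGroupInput("models", "Models",
                                  choices = c("XGBoost", "ARIMA", "Prophet")),
        shiny::sliderInput("test_size", "Test size", min = 0.1, max = 0.4,
                           value = 0.2),
        shiny::actionButton("run", "Run Forecast")
      ),
      shiny::mainPanel(
        shiny::tabsetPanel(
          shiny::tabPanel("Forecast Plot", plotly::plotlyOutput("plot")),
          shiny::tabPanel("Metrics", shiny::tableOutput("metrics")),
          shiny::tabPanel("Data Preview", shiny::tableOutput("preview"))
        )
      )
    )
  )
  server <- function(input, output, session) {
    # The full app runs the forecast pipeline here on input$run
  }
  shiny::shinyApp(ui, server)
}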

Using the App:

Step 1: Load Data

# Click "Load Sample Data" button
# OR upload your own CSV file with date and value columns

Step 2: Configure Settings

# Select models: XGBoost, ARIMA, Prophet, LSTM, etc.
# Enable ensemble: Check "Create Ensemble"
# Set test size: Adjust slider to 0.2 (20%)

Step 3: Run Forecast

# Click "Run Forecast" button
# Progress bar will show: "Running forecast..."

Step 4: View Results

- Forecast Plot Tab: See all model predictions vs actual values
- Model Comparison Tab: Compare prediction errors
- Metrics Tab: View MAE, RMSE, MAPE, SMAPE, R² for all models
- Data Preview Tab: Inspect your data


Advanced Use Cases

Walk-Forward Validation
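
A minimal sketch of walk_forward_cv using an expanding window: each fold
trains on everything up to a cutoff and predicts the next `horizon` days.
Only the xgboost branch is shown, reusing the helper sketches above:

walk_forward_cv <- function(data, target_col, date_col, train_size = 0.7,
                            step_size = 30, horizon = 30,
                            model_type = "xgboost") {
  feats <- create_ts_features(data, target_col, date_col)
  feats <- feats[complete.cases(feats), ]
  feature_cols <- setdiff(names(feats), c(date_col, target_col))
  n <- nrow(feats)
  # One cutoff per fold, stepping forward by step_size days
  cutoffs <- seq(floor(n * train_size), n - horizon, by = step_size)
  lapply(cutoffs, function(cut) {
    train <- feats[1:cut, ]
    test  <- feats[(cut + 1):(cut + horizon), ]
    model <- train_xgboost_model(train, target_col, feature_cols)
    list(actual      = test[[target_col]],
         predictions = predict_xgboost(model, test, feature_cols))
  })
}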

# Perform walk-forward cross-validation
cv_results <- walk_forward_cv(
  data = sample_data,
  target_col = "value",
  date_col = "date",
  train_size = 0.7,
  step_size = 30,
  horizon = 30,
  model_type = "xgboost"
)

# Calculate average metrics across folds
avg_rmse <- mean(sapply(cv_results, function(fold) {
  sqrt(mean((fold$actual - fold$predictions)^2))
}))

cat("Average RMSE across", length(cv_results), "folds:", round(avg_rmse, 3), "\n")

Expected Output:

Average RMSE across 12 folds: 3.567

Complete Pipeline Example
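
A minimal sketch of forecast_pipeline that ties the helper sketches together.
Only the xgboost branch is shown; the full version also dispatches to
lightgbm, random forest, ARIMA, and Prophet:

forecast_pipeline <- function(data, target_col, date_col, test_size = 0.2,
                              models = "xgboost", ensemble = FALSE) {
  feats <- create_ts_features(data, target_col, date_col)
  feats <- feats[complete.cases(feats), ]
  feature_cols <- setdiff(names(feats), c(date_col, target_col))
  split <- floor(nrow(feats) * (1 - test_size))
  train <- feats[1:split, ]
  test  <- feats[(split + 1):nrow(feats), ]

  preds <- list(
    xgboost = predict_xgboost(
      train_xgboost_model(train, target_col, feature_cols),
      test, feature_cols
    )
  )
  if (ensemble) preds$ensemble <- create_ensemble_predictions(preds, "mean")

  list(
    models      = setdiff(names(preds), "ensemble"),
    predictions = preds,
    actual      = test[[target_col]],
    dates       = test[[date_col]],
    metrics     = lapply(preds, function(p)
      calculate_metrics(test[[target_col]], p))
  )
}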

# Run complete pipeline with all models
results <- forecast_pipeline(
  data = sample_data,
  target_col = "value",
  date_col = "date",
  test_size = 0.2,
  models = c("xgboost", "lightgbm", "rf", "arima", "prophet"),
  ensemble = TRUE
)

# View summary
cat("\n=== FORECAST SUMMARY ===\n")
cat("Models trained:", length(results$models), "\n")
cat("Test samples:", length(results$actual), "\n")
cat("Date range:", as.character(range(results$dates)), "\n")

# Best model
best_model <- names(which.min(sapply(results$metrics, function(m) m$RMSE)))
cat("\nBest model:", toupper(best_model), "\n")
cat("Best RMSE:", results$metrics[[best_model]]$RMSE, "\n")

Expected Output:

Training XGBoost...
Training LightGBM...
Training Random Forest...
Training ARIMA...
Training Prophet...
Creating ensemble...

=== FORECAST SUMMARY ===
Models trained: 5
Test samples: 285
Date range: 2023-09-15 2023-12-31

Best model: XGBOOST
Best RMSE: 3.456

Plotting Results

# Create visualization
library(ggplot2)

plot_data <- data.frame(
  date = results$dates,
  actual = results$actual,
  xgboost = results$predictions$xgboost,
  ensemble = results$predictions$ensemble
)

# linewidth replaces the deprecated size argument (ggplot2 >= 3.4)
ggplot(plot_data, aes(x = date)) +
  geom_line(aes(y = actual, color = "Actual"), linewidth = 1.2) +
  geom_line(aes(y = xgboost, color = "XGBoost"), linewidth = 0.8, alpha = 0.7) +
  geom_line(aes(y = ensemble, color = "Ensemble"), linewidth = 0.8,
            linetype = "dashed", alpha = 0.7) +
  scale_color_manual(values = c(Actual = "black", XGBoost = "blue",
                                Ensemble = "red")) +
  labs(title = "Time Series Forecast Comparison",
       subtitle = "Test Set: Last 20% of Data",
       x = "Date", y = "Value", color = "Model") +
  theme_minimal() +
  theme(legend.position = "bottom")

Expected Output:

[A beautiful ggplot2 visualization showing:]
- Black line: Actual values
- Blue line: XGBoost predictions
- Red dashed line: Ensemble predictions
- Legend at bottom
- Clean minimal theme

Tips and Best Practices

1. Data Preparation

  • Ensure dates are in proper Date format
  • Handle missing values before feature engineering
  • Check for outliers that might affect model training

2. Model Selection

  • Use ARIMA for data with clear seasonal patterns
  • Use XGBoost/LightGBM for complex non-linear relationships
  • Use Prophet for data with holidays and multiple seasonality
  • Use LSTM for very long sequences with complex patterns

3. Feature Engineering

  • Adjust lag periods based on your data frequency
  • Include domain-specific features when available
  • Use Fourier features for capturing multiple seasonal patterns

4. Evaluation

  • Always use walk-forward validation for time series
  • Don’t rely on a single metric (use MAE, RMSE, and MAPE)
  • Check residual plots for patterns (see the example below)
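
For example, forecast::checkresiduals() plots the residuals, their ACF, and a
histogram, and runs a Ljung-Box test on a fitted model:

checkresiduals(arima_model)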

5. Ensemble

  • Ensemble typically outperforms individual models
  • Use weighted ensemble if you know model strengths
  • Consider stacking for even better performance

Troubleshooting

Common Issues and Solutions

Issue: “Package not found”

# Solution: Install missing package
install.packages("package_name")

Issue: “LSTM training failed”

# Solution: Ensure Keras is properly configured
install_keras()

Issue: “Prophet requires at least 2 observations”

# Solution: Ensure sufficient training data
# Prophet errors with fewer than 2 non-NA rows, but in practice it needs
# at least a few full seasonal cycles (several months of daily data)

Issue: “Memory error with large datasets”

# Solution: Use data.table for efficiency
library(data.table)
sample_data <- as.data.table(sample_data)

Conclusion

This comprehensive guide demonstrates how to:

- Prepare time series data
- Engineer relevant features
- Train multiple advanced ML models
- Evaluate and compare performance
- Create ensemble predictions
- Deploy an interactive Shiny dashboard

For questions or issues, refer to the package documentation or check the error messages carefully.

Happy Forecasting! 📈