Inflation Forecasting Competition by The Economic Growth & Forecasting Lab at IBA-SESS
This is part of the Inflation Forecasting Competition organized by the Economic Growth & Forecasting Lab at IBA-SESS. The objective is to develop accurate models for predicting inflation using historical economic indicators and related variables. The competition promotes the use of various forecasting techniques to enhance predictive accuracy, with a focus on regression-based models.
We utilize a linear regression model as the primary method for forecasting inflation. The model is trained on historical data, where the predictors (independent variables) include factors such as consumer confidence, credit to the private sector, interest rates, oil prices, and real exchange rates. The forecasting process involves the following steps:
# Load necessary libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Importing and saving the dataset
data <- read.csv("C:/Users/Abdul Qudoos/Desktop/kaggle-EGF.csv", skip = 2, header = TRUE)
# Checking the structure of the data
str(data)
## 'data.frame': 142 obs. of 25 variables:
## $ Date: chr "1/1/2012" "2/1/2012" "3/1/2012" "4/1/2012" ...
## $ V1 : num 10.1 11.2 10.8 11.3 12.3 ...
## $ V2 : num 3492649 3567035 3606844 3602125 3626809 ...
## $ V3 : num 15722 24172 23286 21337 23713 ...
## $ V4 : num 29.5 29.9 30.4 29.2 28 ...
## $ V5 : num 10.9 11.5 11.3 11.3 10.8 ...
## $ V6 : num 136 140 140 140 134 ...
## $ V7 : num 79.8 80.8 81.6 82.4 83.8 ...
## $ V8 : num 3017343 2973088 2959743 2966976 2924351 ...
## $ V9 : num 5682167 5716810 5920093 5935537 6021033 ...
## $ V10 : num 13.2 13.1 12.8 12.8 12.9 ...
## $ V11 : num 90.3 90.7 90.8 90.7 91.5 ...
## $ V12 : num 71.8 95.1 67.6 81.3 91.3 ...
## $ V13 : num 602 603 603 604 605 ...
## $ V14 : num 11875 12878 13762 13990 13787 ...
## $ V15 : num 11.8 11.7 11.8 11.8 11.9 ...
## $ V16 : num 11.6 11.6 11.7 11.8 11.8 ...
## $ V17 : num 8226 8947 9650 9813 9714 ...
## $ V18 : num 5.88 5.85 5.74 5.76 5.88 5.82 5.88 5.87 5.75 5.64 ...
## $ V19 : num 107 113 118 114 104 ...
## $ V20 : num 12 12 12 12 12 12 12 10.5 10.5 10 ...
## $ V21 : num 3402132 3496691 3534391 3602893 3691756 ...
## $ V22 : num 87.5 87.6 87.7 87.9 88.1 ...
## $ V23 : num 105 103 104 106 107 ...
## $ V24 : num 11 11.6 11.3 11.3 11.1 ...
head(data)
## Date V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1 1/1/2012 10.11 3492649 15722 29.48 10.90 136.29 79.84 3017343 5682167 13.18
## 2 2/1/2012 11.16 3567035 24172 29.94 11.52 140.04 80.80 2973088 5716810 13.14
## 3 3/1/2012 10.84 3606844 23286 30.39 11.27 140.11 81.58 2959743 5920093 12.80
## 4 4/1/2012 11.26 3602125 21337 29.18 11.33 139.73 82.43 2966976 5935537 12.83
## 5 5/1/2012 12.29 3626809 23713 27.96 10.85 134.00 83.78 2924351 6021033 12.94
## 6 6/1/2012 11.22 3739942 27917 30.07 11.10 131.24 83.76 2921997 6402735 13.13
## V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
## 1 90.27 71.78 601.96 11874.89 11.75 11.65 8225.97 5.88 106.71 12 3402132 87.50
## 2 90.74 95.06 602.73 12877.88 11.74 11.65 8946.64 5.85 112.71 12 3496691 87.63
## 3 90.79 67.61 603.49 13761.76 11.81 11.70 9649.95 5.74 117.83 12 3534391 87.74
## 4 90.71 81.32 604.26 13990.38 11.85 11.75 9812.71 5.76 113.76 12 3602893 87.88
## 5 91.48 91.31 605.02 13786.62 11.87 11.76 9714.33 5.88 104.34 12 3691756 88.10
## 6 94.26 97.20 605.78 13801.41 11.88 11.78 9708.31 5.82 90.91 12 3799917 88.35
## V23 V24
## 1 104.74 10.98
## 2 102.94 11.65
## 3 104.09 11.26
## 4 105.76 11.31
## 5 107.39 11.11
## 6 105.47 11.48
sum(is.na(data))
## [1] 0
We removed the first two rows during import because they contain only metadata and redundant headers. The columns therefore arrived with generic names like 'V1', 'V2', etc., so we now rename them to more meaningful, interpretable names based on the skipped header row. This ensures that each column has a clear, useful name for analysis and modeling.
colnames(data) <- c("Date", "Inflation", "ADV", "AUTOS_F", "CCI", "CMR_E", "COMM",
"CPI", "CPS", "DEP", "DR", "E", "EPU2", "FERTS_F", "K100", "K1Y",
"K6M", "KALL", "LR", "OIL", "PR", "PSB", "QIM", "REER", "WAONR_A")
Additionally, since there are no missing values in the dataset, we can proceed without any data imputation. However, the 'Date' column is currently stored as a string, which isn't suitable for time-series forecasting. To handle time-based data properly, we convert the 'Date' column to R's Date class. This conversion lets us treat 'Date' as a time index, enabling accurate modeling and forecasting.
data$Date <- as.Date(data$Date, format="%m/%d/%Y")
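Note that as.Date() silently returns NA when the format string does not match the file's layout, so a quick sanity check is worthwhile (a minimal sketch using only base R):
# Verifying the conversion: no NA dates should remain, and the range should match the file
stopifnot(!any(is.na(data$Date)))
range(data$Date)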
Inflation often shows seasonal patterns throughout the year, which can affect forecasts. To capture these patterns, we convert months into dummy variables, representing each month as a binary indicator (1 or 0). This helps the model understand month-specific effects on inflation, making predictions more accurate by accounting for recurring trends.
# Adding seasonal dummies for months
data$Month <- as.factor(format(data$Date, "%m")) #Converting month to factor variable
#Creating dummy variables for each month
month_dummies <- model.matrix(~ Month - 1, data = data) %>% as.data.frame()
#Combining month dummies with original data (excluding 'Inflation' and 'Date')
X <- cbind(data %>% select(-Inflation, -Date, -Month), month_dummies)
# Including these dummies helps the model account for month-specific variation in the target variable (inflation), capturing seasonal trends that might affect the predictions
y <- data$Inflation
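As a quick check that the dummy construction behaved as expected (a minimal sketch), each row should activate exactly one month indicator:
# Each observation should have exactly one active month dummy
stopifnot(all(rowSums(month_dummies) == 1))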
To begin the forecasting process, we fit an initial linear regression using all the variables available in the dataset. The primary goal is to identify which variables significantly affect the inflation rate. The model summary reports a p-value for each variable, and we treat p < 0.05 as statistically significant. Variables that meet this criterion will be used to build a refined model, ensuring that we include only relevant predictors and keeping the model both efficient and interpretable.
lm_model_initial <- lm(y ~ ., data = X)
#displaying the summary of the model
summary(lm_model_initial)
##
## Call:
## lm(formula = y ~ ., data = X)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4033 -0.7168 -0.0509 0.6682 5.5302
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.006e+00 6.490e+01 -0.139 0.889893
## ADV -1.274e-06 1.304e-06 -0.977 0.330917
## AUTOS_F -8.808e-05 4.513e-05 -1.952 0.053583 .
## CCI -2.090e-01 6.913e-02 -3.023 0.003134 **
## CMR_E -2.981e-01 6.298e-01 -0.473 0.636963
## COMM -5.061e-02 3.500e-02 -1.446 0.151042
## CPI 1.575e-01 8.207e-02 1.919 0.057685 .
## CPS 4.219e-06 1.899e-06 2.222 0.028398 *
## DEP -8.097e-07 5.374e-07 -1.507 0.134851
## DR 1.240e+00 4.894e-01 2.534 0.012722 *
## E 1.254e-01 5.790e-02 2.166 0.032531 *
## EPU2 -3.072e-03 4.792e-03 -0.641 0.522845
## FERTS_F -3.893e-02 1.020e-01 -0.382 0.703415
## K100 2.475e-04 2.444e-04 1.013 0.313525
## K1Y -1.953e+00 1.860e+00 -1.050 0.296142
## K6M 5.269e-01 2.106e+00 0.250 0.802926
## KALL -5.075e-04 3.511e-04 -1.445 0.151275
## LR -4.493e+00 7.205e-01 -6.236 9.14e-09 ***
## OIL 7.813e-02 2.277e-02 3.431 0.000855 ***
## PR 2.595e+00 7.706e-01 3.368 0.001054 **
## PSB -4.206e-07 5.891e-07 -0.714 0.476808
## QIM 2.252e-02 5.599e-02 0.402 0.688339
## REER 2.083e-01 8.616e-02 2.418 0.017292 *
## WAONR_A 1.426e-01 8.750e-01 0.163 0.870837
## Month01 3.819e-01 6.991e-01 0.546 0.586067
## Month02 2.017e-01 7.343e-01 0.275 0.784117
## Month03 3.125e-01 7.490e-01 0.417 0.677331
## Month04 -3.489e-01 7.620e-01 -0.458 0.648010
## Month05 1.975e-01 7.761e-01 0.254 0.799623
## Month06 5.903e-02 7.635e-01 0.077 0.938521
## Month07 -3.736e-01 7.853e-01 -0.476 0.635190
## Month08 3.598e-01 7.851e-01 0.458 0.647683
## Month09 3.531e-01 7.832e-01 0.451 0.653036
## Month10 -2.383e-01 8.018e-01 -0.297 0.766911
## Month11 -2.294e-01 7.414e-01 -0.309 0.757605
## Month12 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.476 on 107 degrees of freedom
## Multiple R-squared: 0.9718, Adjusted R-squared: 0.9629
## F-statistic: 108.5 on 34 and 107 DF, p-value: < 2.2e-16
Note that the Month12 coefficient is reported as NA: all twelve month dummies plus the intercept are perfectly collinear, so R drops one dummy automatically (the "1 not defined because of singularities" message above). For the refined model, we keep only the variables with p-values < 0.05 from the initial fit, since they are statistically significant. Focusing on these key predictors of inflation keeps the model parsimonious and interpretable.
significant_vars <- c("CCI", "CPS", "DR", "E", "LR", "OIL", "PR", "REER")
# Filtering the data down to these variables; this applies to the independent variables only
X_refined <- X %>% select(all_of(significant_vars))
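Because macroeconomic predictors are often correlated, an optional collinearity diagnostic on the refined set can be informative. The sketch below assumes the 'car' package is installed; variance inflation factors (VIF) far above 10 would indicate that the surviving predictors still overlap heavily:
# Optional diagnostic (assumes the 'car' package is available)
library(car)
vif(lm(y ~ ., data = X_refined))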
To prepare the data for training and testing, we split it into two sets based on specified date ranges:
#Creating indices for train and test sets based on the date range
train_indices <- which(data$Date >= "2011-01-01" & data$Date <= "2022-10-01")
test_indices <- which(data$Date >= "2022-11-01" & data$Date <= "2023-10-01")
#filtering the refined data and target variable for training and testing
X_train <- X_refined[train_indices, ]
y_train <- y[train_indices]
X_test <- X_refined[test_indices, ]
y_test <- y[test_indices]
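Before fitting, it is worth confirming that the split has the expected sizes (a quick check; with monthly data the test range above should yield exactly 12 observations):
# Sanity check on the split sizes
length(train_indices) # Number of training months
length(test_indices)  # Should be 12 (Nov 2022 through Oct 2023)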
After selecting the significant variables, we now train a linear regression model using only these predictors. The refined model aims to predict inflation better by focusing on the most relevant factors.
lm_model_refined <- lm(y_train ~ ., data = X_train)
summary(lm_model_refined)
##
## Call:
## lm(formula = y_train ~ ., data = X_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3080 -0.5734 0.1358 0.7200 3.5277
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.981e+01 3.699e+00 -8.060 6.17e-13 ***
## CCI -2.332e-01 4.005e-02 -5.823 4.85e-08 ***
## CPS -6.601e-07 3.881e-07 -1.701 0.091516 .
## DR 2.085e-01 4.135e-01 0.504 0.615003
## E 1.436e-01 1.960e-02 7.330 2.84e-11 ***
## LR -1.846e+00 5.948e-01 -3.104 0.002379 **
## OIL 3.987e-02 8.098e-03 4.923 2.73e-06 ***
## PR 1.341e+00 3.623e-01 3.701 0.000325 ***
## REER 2.297e-01 2.414e-02 9.513 2.33e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.304 on 121 degrees of freedom
## Multiple R-squared: 0.9224, Adjusted R-squared: 0.9173
## F-statistic: 179.9 on 8 and 121 DF, p-value: < 2.2e-16
After training the refined model, we use it to predict inflation on the test dataset. The accuracy of these predictions is measured using the Root Mean Square Error (RMSE), which provides a sense of how well the model forecasts inflation out-of-sample.
The comparison between the actual inflation values and the predicted inflation values is displayed in a table for better visualization.
# Making predictions on the test data using the refined model
y_pred_refined <- predict(lm_model_refined, X_test)
# Calculate RMSE for the out-of-sample predictions
rmse_refined <- sqrt(mean((y_test - y_pred_refined)^2))
print(paste("Refined Out-of-Sample RMSE is:", rmse_refined))
## [1] "Refined Out-of-Sample RMSE is: 4.96129962204243"
# Creating a comparison table of original vs. predicted inflation
comparison_table <- data.frame(
Date = data$Date[test_indices],
Actual_Inflation = y_test,
Predicted_Inflation = y_pred_refined)
# Displaying the comparison table
print(comparison_table)
## Date Actual_Inflation Predicted_Inflation
## 131 2022-11-01 23.99 24.10274
## 132 2022-12-01 24.60 23.62223
## 133 2023-01-01 27.55 25.14890
## 134 2023-02-01 31.59 29.10232
## 135 2023-03-01 35.18 34.72483
## 136 2023-04-01 36.30 36.05355
## 137 2023-05-01 38.12 34.68895
## 138 2023-06-01 29.28 33.90944
## 139 2023-07-01 28.16 34.20883
## 140 2023-08-01 27.25 36.33808
## 141 2023-09-01 31.49 38.61699
## 142 2023-10-01 26.86 35.74842
The following plot compares the actual inflation values with the predicted values from the refined model over the test period.
# Plot actual vs. predicted inflation for the refined model
par(mar = c(5, 5, 4, 2) + 0.1) # Adjusting margins for better visualization
plot(data$Date[test_indices], y_test, type = "o", col = "blue",
xlab = "Date", ylab = "Inflation",
main = "Final Results: Actual vs. Predicted Inflation",
ylim = range(c(y_test, y_pred_refined), na.rm = TRUE))
# Adding the line for predicted inflation
lines(data$Date[test_indices], y_pred_refined, type = "o", col = "red")
# Adding a legend to the plot to distinguish between actual and predicted values
legend("topright", legend = c("Actual", "Predicted"),
col = c("blue", "red"), lty = 1, bty = "n")
In this section, we apply a fixed rolling window method to forecast inflation using a linear regression model. A fixed window size of 100 observations is used, meaning that the model trains on the most recent 100 observations at each step and then predicts the next 12 months.
# Setting the window size for the fixed rolling window approach
window_size <- 100
fixed_rmse <- c() #This will store the RMSEs for each step
fixed_forecasts <- rep(NA, length(y_test)) # To store the 12-month forecasts
#Loop to perform fixed rolling window forecasts
for (i in seq_len(nrow(X_test) - 11)) { # seq_len() avoids R's 1:0 trap; with 12 test months the loop runs exactly once
  # Defining the current window of data (100 observations)
  X_window <- X_train[i:(i + window_size - 1), ]
  y_window <- y_train[i:(i + window_size - 1)]
  # Training the linear model using the current window of data
  lm_model_window <- lm(y_window ~ ., data = X_window)
  # Making 12-month-ahead predictions using the trained model
  X_12 <- X_test[i:(i + 11), ]
  y_pred_12 <- predict(lm_model_window, X_12)
  # Storing the 12-month forecasts to compare with actual values later
  fixed_forecasts[i:(i + 11)] <- y_pred_12
  # Calculating the RMSE for the current window and storing it
  fixed_rmse[i] <- sqrt(mean((y_test[i:(i + 11)] - y_pred_12)^2))
}
# Calculate average RMSE for the fixed rolling window
avg_fixed_rmse <- mean(fixed_rmse, na.rm = TRUE)
print(paste("Average RMSE for Fixed Rolling Window is:", avg_fixed_rmse))
## [1] "Average RMSE for Fixed Rolling Window is: 6.90804822347697"
Here we plot the actual inflation values against the 12-month forecasts generated using the fixed rolling window approach. This plot shows how well the model's forecasts align with the actual inflation data over time.
# Adjusting the margins for better visualization
par(mar = c(5, 5, 4, 2) + 0.1)
# Plotting the actual vs. 12-month forecasts from the fixed rolling window
plot(data$Date[test_indices], y_test, type = "o", col = "blue", xlab = "Date", ylab = "Inflation",
main = "Actual vs. Fixed Rolling Window 12-Month Forecasts", ylim = range(c(y_test, fixed_forecasts), na.rm = TRUE))
# Adding the line for 12-month forecasts
lines(data$Date[test_indices], fixed_forecasts, type = "o", col = "red")
# Adding a legend to the plot to distinguish between actual values and 12-month forecasts
legend("topright", legend = c("Actual", "12-Month Forecast"), col = c("blue", "red"), lty = 1, bty = "n")
Now, we implement an expanding rolling window approach to predict inflation. Unlike the fixed window, this method expands the training data with each iteration, adding one observation at a time. This allows the model to use more historical data as it makes predictions, potentially improving accuracy over time.
#Initializing variables for expanding rolling window
expanding_rmse <- c() #Storing RMSE for each window
expanding_forecasts <- rep(NA, length(y_test)) #Store 12-month forecasts
# Loop for expanding rolling window forecast
for (i in seq_len(nrow(X_test) - 11)) { # Same guard as above: seq_len() avoids the 1:0 trap
  # Defining the expanding window: all observations up to window_size + i - 1
  X_window <- X_train[1:(window_size + i - 1), ]
  y_window <- y_train[1:(window_size + i - 1)]
  # Training the model on the expanding window
  lm_model_expanding <- lm(y_window ~ ., data = X_window)
  # Making 12-month-ahead predictions using the trained model
  X_12 <- X_test[i:(i + 11), ]
  y_pred_12 <- predict(lm_model_expanding, X_12)
  # Storing the 12-month forecasts to compare with actual values later
  expanding_forecasts[i:(i + 11)] <- y_pred_12
  # Calculating the RMSE for the current window and storing it
  expanding_rmse[i] <- sqrt(mean((y_test[i:(i + 11)] - y_pred_12)^2))
}
# Calculate average RMSE for the expanding rolling window
avg_expanding_rmse <- mean(expanding_rmse, na.rm = TRUE)
print(paste("Average RMSE for Expanding Rolling Window:", avg_expanding_rmse))
## [1] "Average RMSE for Expanding Rolling Window: 6.90804822347697"
The plot below compares actual inflation with the 12-month forecasts generated using the expanding rolling window method. Note that the average RMSE is identical to the fixed-window result: with only 12 test observations, the loop performs a single effective iteration (i = 1), for which the expanding window (observations 1 through 100) coincides exactly with the fixed window.
# Adjusting the margins for better visualization (must be set before plotting)
par(mar = c(5, 5, 4, 2) + 0.1)
# Plot actual vs. 12-month forecasts from the expanding rolling window
plot(data$Date[test_indices], y_test, type = "o", col = "blue", xlab = "Date", ylab = "Inflation",
main = "Actual vs. Expanding Rolling Window 12-Month Forecasts",
ylim = range(c(y_test, expanding_forecasts), na.rm = TRUE))
# Adding the line for 12-month forecasts
lines(data$Date[test_indices], expanding_forecasts, type = "o", col = "red")
# Adding a legend to the plot to distinguish between actual values and 12-month forecasts
legend("topright", legend = c("Actual", "12-Month Forecast"), col = c("blue", "red"), lty = 1, bty = "n")
### Final Results Summary
Based on the three different forecasting approaches, here are the final results:
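The three out-of-sample RMSE figures computed above can be gathered into a single comparison table (a minimal sketch using only the variables already defined in this document):
# Collecting the out-of-sample RMSEs from the three approaches
results_summary <- data.frame(
Method = c("Refined Linear Model", "Fixed Rolling Window", "Expanding Rolling Window"),
RMSE = c(rmse_refined, avg_fixed_rmse, avg_expanding_rmse))
print(results_summary)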