Purchase Price Index (PPI) Analysis & Forecasting using Artificial Neural Networking

Team Introduction

Logan Ramanatham (22085735)
Lai Siau Hung (17207453)
Vinod Kunhi Krishnan Annukaran (22079942)
Lee Yin Tung (22083362)
Guo Xin Rui (S2166997)
Kok Jing Xian (22091447)

Introduction

Within economic analysis, the Purchase Price Index (PPI) acts as a linchpin, providing a comprehensive snapshot of pricing dynamics prevalent at the producer level. When fused with the computational prowess of Artificial Neural Networks (ANN), this amalgamation holds the promise of ushering in a new era in predictive analytics, offering a deeper understanding of the multifaceted patterns inherent in producer price fluctuations. The core essence of this study lies in untangling the intricate web of the Purchase Price Index through the lens of ANN’s adaptive learning mechanisms. By leveraging the innate ability of ANN to decipher non-linear relationships within intricate datasets, the objective is to construct predictive models that not only decipher historical trends embedded in the PPI but also possess the acumen to anticipate and forecast future fluctuations in producer prices with unprecedented accuracy. At its heart, this introduction serves as a gateway to an immersive journey into the symbiotic fusion of the Purchase Price Index and Artificial Neural Networks. This exploration aims to shed light on the transformative potential of this amalgamation, illuminating its ability to redefine the landscape of economic analysis and predictive forecasting, particularly within the realm of deciphering and predicting producer price dynamics.

Objective

To analyze and forecast the Purchase Price Index (PPI) using Artificial Neural Networking.

Libraries

Following libraries were chosen to provide a comprehensive toolkit for data manipulation, analysis, modelling, and visualization in the context of PPI forecasting using Artificial Neural Networking (ANN).

#Load libraries
library("neuralnet")
library("Metrics")
library("MLmetrics")
library("ggplot2")
library("fitdistrplus")
library("arrow")
library("WDI")
library("plotly")
library("dplyr")
library("lubridate")

Functions

In the following code segment, we create several functions for data preprocessing, preparing time-series data for neural network training and subsequent prediction tasks.

#Functions
splitDataRates <-  function(data, steps) {
  m <- matrix(ncol = steps+1)
  if (steps == 2) {
    colnames(m) <- c("input1", "input2", "output")
  } else if (steps == 3) {
    colnames(m) <- c("input1", "input2", "input3", "output")
  } else if (steps == 4) {
    colnames(m) <- c("input1", "input2", "input3", "input4", "output")
  } else if (steps == 5) {
    colnames(m) <- c("input1", "input2", "input3", "input4", "input5", "output")
  } 
  
  for (i in 1:(length(data)-(steps+1))) {
    v <- c(data[i:(i+steps)])
    m <- rbind(m, v)
  }
  return(m[-1,])
}

#Function for normalizing data
normalise <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

#Function to undo normalization
unnormalise <- function(x, min, max) { 
  return( (max - min)*x + min )
}

‘splitDataRates’ function is designed to transform a univariate time series data into a format suitable for neural network training. It is used to prepare the dataset for the neural network, ensuring that the network has the necessary lagged values to learn the temporal patterns in the data.

‘normalise’ function is designed to normalise the data, scaling the input values to a range between 0 and 1. It is applied to input features before feeding them into the model to ensure consistent scale across different variables.

‘unnormalise’ function is designed to reverse the normalization process, converting the normalized data back to their original scale.

Data Preparation

# Interest Rates
#Base Rate : Average rate across commercial banks during the month
interest_rate<-arrow::read_parquet("https://storage.data.gov.my/finsector/interestrates.parquet")
interest_rate_df <- as.data.frame(interest_rate)
baserate <- interest_rate_df %>% dplyr::select(1, 13);names(baserate) <- c("Date", "Base Rate")
baserate$Date <- as.Date(baserate$Date)
filtered_data <- baserate %>% 
  filter(Date >= as.Date("2010-01-01"))
baserate_monthly <- as.data.frame(filtered_data)

In this step, we load interest rate data from a Parquet file hosted online using the ‘arrow’ package. It is then converted into a data frame for data manipulation and analysis. Subsequently, other key steps in data preparation, including selecting relevant columns, converting date, and filtering data were conducted. Finally, the filtered data is stored in a new data frame named ‘baserate_monthly’ for further analysis.

# CPI by Group
#National CPI for 12 main groups of goods and services, consistent with the United Nations "Classification of Individual Consumption According to Purpose (COICOP)".
CPIgroup<-arrow::read_parquet("https://storage.dosm.gov.my/cpi/cpi_headline.parquet")
CPIgroup_df <- as.data.frame(CPIgroup)
CPI <- CPIgroup_df %>% dplyr::select(1, 2);names(CPI) <- c("Date", "CPI %")
CPI$Date <- as.Date(CPI$Date)
filtered_data <- CPI %>% 
  filter(Date >= as.Date("2010-01-01"))
CPI_monthly <- as.data.frame(filtered_data)

We extract, clean, and organize the CPI data for analysis. First, we load and read the CPI data from an online source. after loading, the CPI data is converted into a data frame to make it easier for manipulating and analyzing. Then, we select only the columns that represent the date and the CPI percentage, format the date, and filter the data to included data starting January 1, 2010. Finally, the filtered data is stored in a new data frame named ‘CPI_monthly’ for further analysis.

# MYR/USD Daily & Monthly PPI Rates
exchange_rate<-read_parquet("https://storage.data.gov.my/finsector/exchangerates.parquet")
exchange_rate_df <- as.data.frame(exchange_rate)
myrusd <- exchange_rate_df[, 1:2];names(myrusd) <- c("Date", "MYR/USD")
rates <- exchange_rate_df[,2]
myrusd$Date <- as.Date(myrusd$Date)
filtered_data <- myrusd %>% 
  filter(Date >= as.Date("2010-01-01") & Date<= as.Date(max(CPI_monthly$Date)))
myrusd_monthly_avg <- as.data.frame(filtered_data %>%
                                      mutate(Date = floor_date(Date, "month")) %>%
                                      group_by(Date) %>%
                                      summarize(Avg_MYR_USD = round(mean(`MYR/USD`, na.rm = TRUE),5)))

In this step, we focus on extracting, transforming, and preparing data related to the exchange rate between the Malaysian Ringgit (MYR) and the US Dollar (USD), specifically for the purpose of analyzing daily and monthly exchange rates. Like previous steps, we extract data from a Parquet file, convert it into a data frame, select relevant column, format data, and filter data to included records from January 1, 2010 up to the most recent date available in the ‘CPI_monthly’ data frame, which aligns the exchange rate data with the CPI data timeframe. Finally, we calculate the monthly average of the MYR/USD exchange rate and round it up to five decimal places for accuracy.

# Inflation Rate Calculation
CPIgroup <- arrow::read_parquet("https://storage.dosm.gov.my/cpi/cpi_headline.parquet")
CPIgroup_df <- as.data.frame(CPIgroup)
CPI <- CPIgroup_df %>% select(1, 2)
names(CPI) <- c("Date", "CPI")
CPI$Date <- as.Date(CPI$Date)
filtered_data <- CPI %>% 
  filter(Date >= as.Date("2010-01-01"))
CPI_monthly <- filtered_data %>%
  arrange(Date) %>%
  mutate(
    Monthly_Inflation_Rate = (CPI / lag(CPI) - 1) * 100
  )
CPI_monthly <- CPI_monthly %>% 
  mutate(Monthly_Inflation_Rate = ifelse(is.na(Monthly_Inflation_Rate), 0, Monthly_Inflation_Rate))
inflation_rate_monthly<- CPI_monthly %>% dplyr::select(1, 3);names(CPI) <- c("Date", "Inflation %")

This part is to calculate the monthly inflation rate based on CPI data. Inflation rate is a key economic indicator and is directly relevant to the analysis of PPI. Similar to previous steps, the code begins by loading CPI data from a Parquet file online. The loaded data is converted into a data frame for ease of manipulation. Followed by date formatting and data filtering. Then, the monthly inflation rate is calculated using the formula (CPI / lag(CPI) - 1) * 100. This formula computes the percentage change in CPI from one month to the next, which is a standard method of calculating inflation rates. The resulting inflation rates are then added to the data frame as a new column ‘Monthly_Inflation_Rate’. Before creating a new data frame (inflation_rate_monthly) that contain only the ‘date’ and the calculated ‘Monthly_Inflation_Rate’, we handle NA values with 0 to ensure the integrity of the dataset.

# Econometric Table for PPI
combined_df <- CPI_monthly %>%
  left_join(baserate_monthly, by = "Date") %>%
  left_join(inflation_rate_monthly, by = "Date")
combined_df <- combined_df[, -ncol(combined_df)]
economicmetric <- combined_df %>%
  mutate(
    CPI = ifelse(is.na(CPI), mean(CPI, na.rm = TRUE), CPI),
    `Base Rate` = ifelse(is.na(`Base Rate`), mean(`Base Rate`, na.rm = TRUE), `Base Rate`),
    Monthly_Inflation_Rate.x =ifelse(is.na(Monthly_Inflation_Rate.x), mean(Monthly_Inflation_Rate.x, na.rm = TRUE), Monthly_Inflation_Rate.x)
  )
economicmetric

##           Date   CPI Monthly_Inflation_Rate.x Base Rate
## 1   2010-01-01  99.4               0.00000000 0.4122404
## 2   2010-02-01  99.4               0.00000000 0.4122404
## 3   2010-03-01  99.4               0.00000000 0.4122404
## 4   2010-04-01  99.4               0.00000000 0.4122404
## 5   2010-05-01  99.6               0.20120724 0.4122404
## 6   2010-06-01  99.7               0.10040161 0.4122404
## 7   2010-07-01 100.0               0.30090271 0.4122404
## 8   2010-08-01 100.3               0.30000000 0.4122404
## 9   2010-09-01 100.4               0.09970090 0.4122404
## 10  2010-10-01 100.6               0.19920319 0.4122404
## 11  2010-11-01 100.8               0.19880716 0.4122404
## 12  2010-12-01 101.2               0.39682540 0.4122404
## 13  2011-01-01 101.8               0.59288538 0.4122404
## 14  2011-02-01 102.3               0.49115914 0.4122404
## 15  2011-03-01 102.4               0.09775171 0.4122404
## 16  2011-04-01 102.6               0.19531250 0.4122404
## 17  2011-05-01 102.9               0.29239766 0.4122404
## 18  2011-06-01 103.2               0.29154519 0.4122404
## 19  2011-07-01 103.4               0.19379845 0.4122404
## 20  2011-08-01 103.6               0.19342360 0.4122404
## 21  2011-09-01 103.8               0.19305019 0.4122404
## 22  2011-10-01 104.0               0.19267823 0.4122404
## 23  2011-11-01 104.1               0.09615385 0.4122404
## 24  2011-12-01 104.2               0.09606148 0.4122404
## 25  2012-01-01 104.5               0.28790787 0.4122404
## 26  2012-02-01 104.5               0.00000000 0.4122404
## 27  2012-03-01 104.5               0.00000000 0.4122404
## 28  2012-04-01 104.5               0.00000000 0.4122404
## 29  2012-05-01 104.7               0.19138756 0.4122404
## 30  2012-06-01 104.8               0.09551098 0.4122404
## 31  2012-07-01 104.8               0.00000000 0.4122404
## 32  2012-08-01 105.0               0.19083969 0.4122404
## 33  2012-09-01 105.2               0.19047619 0.4122404
## 34  2012-10-01 105.4               0.19011407 0.4122404
## 35  2012-11-01 105.5               0.09487666 0.4122404
## 36  2012-12-01 105.5               0.00000000 0.4122404
## 37  2013-01-01 105.9               0.37914692 0.4122404
## 38  2013-02-01 106.1               0.18885741 0.4122404
## 39  2013-03-01 106.2               0.09425071 0.4122404
## 40  2013-04-01 106.3               0.09416196 0.4122404
## 41  2013-05-01 106.6               0.28222013 0.4122404
## 42  2013-06-01 106.7               0.09380863 0.4122404
## 43  2013-07-01 106.9               0.18744142 0.4122404
## 44  2013-08-01 107.0               0.09354537 0.4122404
## 45  2013-09-01 107.9               0.84112150 0.4122404
## 46  2013-10-01 108.3               0.37071362 0.4122404
## 47  2013-11-01 108.6               0.27700831 0.4122404
## 48  2013-12-01 108.9               0.27624309 0.4122404
## 49  2014-01-01 109.5               0.55096419 0.4122404
## 50  2014-02-01 109.8               0.27397260 0.4122404
## 51  2014-03-01 109.9               0.09107468 0.4122404
## 52  2014-04-01 109.9               0.00000000 0.4122404
## 53  2014-05-01 110.0               0.09099181 0.4122404
## 54  2014-06-01 110.2               0.18181818 0.4122404
## 55  2014-07-01 110.3               0.09074410 0.4122404
## 56  2014-08-01 110.5               0.18132366 0.4122404
## 57  2014-09-01 110.7               0.18099548 0.4122404
## 58  2014-10-01 111.3               0.54200542 0.4122404
## 59  2014-11-01 111.9               0.53908356 0.4122404
## 60  2014-12-01 111.8              -0.08936550 0.4122404
## 61  2015-01-01 110.6              -1.07334526 0.4122404
## 62  2015-02-01 109.9              -0.63291139 0.4122404
## 63  2015-03-01 110.9               0.90991811 0.4122404
## 64  2015-04-01 111.9               0.90171326 0.4122404
## 65  2015-05-01 112.3               0.35746202 0.4122404
## 66  2015-06-01 113.0               0.62333037 0.4122404
## 67  2015-07-01 113.9               0.79646018 0.4122404
## 68  2015-08-01 113.9               0.00000000 0.4122404
## 69  2015-09-01 113.6              -0.26338894 0.4122404
## 70  2015-10-01 114.1               0.44014085 0.4122404
## 71  2015-11-01 114.8               0.61349693 0.4122404
## 72  2015-12-01 114.8               0.00000000 0.4122404
## 73  2016-01-01 114.5              -0.26132404 0.4122404
## 74  2016-02-01 114.5               0.00000000 0.4122404
## 75  2016-03-01 113.8              -0.61135371 0.4122404
## 76  2016-04-01 114.3               0.43936731 0.4122404
## 77  2016-05-01 114.6               0.26246719 0.4122404
## 78  2016-06-01 114.8               0.17452007 0.4122404
## 79  2016-07-01 115.1               0.26132404 0.4122404
## 80  2016-08-01 115.6               0.43440487 0.4122404
## 81  2016-09-01 115.3              -0.25951557 0.4122404
## 82  2016-10-01 115.7               0.34692108 0.4122404
## 83  2016-11-01 116.8               0.95073466 0.4122404
## 84  2016-12-01 116.8               0.00000000 0.4122404
## 85  2017-01-01 118.1               1.11301370 0.4122404
## 86  2017-02-01 119.6               1.27011008 0.4122404
## 87  2017-03-01 119.4              -0.16722408 0.4122404
## 88  2017-04-01 119.2              -0.16750419 0.4122404
## 89  2017-05-01 119.0              -0.16778523 0.4122404
## 90  2017-06-01 118.7              -0.25210084 0.4122404
## 91  2017-07-01 118.7               0.00000000 0.4122404
## 92  2017-08-01 119.8               0.92670598 0.4122404
## 93  2017-09-01 120.1               0.25041736 0.4122404
## 94  2017-10-01 120.0              -0.08326395 0.4122404
## 95  2017-11-01 120.8               0.66666667 0.4122404
## 96  2017-12-01 120.9               0.08278146 0.4122404
## 97  2018-01-01 121.3               0.33085194 0.4122404
## 98  2018-02-01 121.3               0.00000000 0.4122404
## 99  2018-03-01 120.9              -0.32976092 0.4122404
## 100 2018-04-01 120.9               0.00000000 0.4122404
## 101 2018-05-01 121.1               0.16542597 0.4122404
## 102 2018-06-01 119.6              -1.23864575 0.4122404
## 103 2018-07-01 119.8               0.16722408 0.4122404
## 104 2018-08-01 120.0               0.16694491 0.4122404
## 105 2018-09-01 120.5               0.41666667 0.4122404
## 106 2018-10-01 120.7               0.16597510 0.4122404
## 107 2018-11-01 121.0               0.24855012 0.4122404
## 108 2018-12-01 121.1               0.08264463 0.4122404
## 109 2019-01-01 120.5              -0.49545830 0.4122404
## 110 2019-02-01 120.8               0.24896266 0.4122404
## 111 2019-03-01 121.1               0.24834437 0.4122404
## 112 2019-04-01 121.1               0.00000000 0.4122404
## 113 2019-05-01 121.4               0.24772915 0.4122404
## 114 2019-06-01 121.4               0.00000000 0.4122404
## 115 2019-07-01 121.5               0.08237232 0.4122404
## 116 2019-08-01 121.8               0.24691358 0.4122404
## 117 2019-09-01 121.8               0.00000000 0.4122404
## 118 2019-10-01 122.0               0.16420361 0.4122404
## 119 2019-11-01 122.1               0.08196721 0.4122404
## 120 2019-12-01 122.3               0.16380016 0.4122404
## 121 2020-01-01 122.4               0.08176615 0.4122404
## 122 2020-02-01 122.4               0.00000000 0.4122404
## 123 2020-03-01 120.9              -1.22549020 0.4122404
## 124 2020-04-01 117.6              -2.72952854 0.4122404
## 125 2020-05-01 117.9               0.25510204 0.4122404
## 126 2020-06-01 119.1               1.01781170 0.4122404
## 127 2020-07-01 119.9               0.67170445 0.4122404
## 128 2020-08-01 120.1               0.16680567 0.4122404
## 129 2020-09-01 120.1               0.00000000 0.4122404
## 130 2020-10-01 120.2               0.08326395 0.4122404
## 131 2020-11-01 120.0              -0.16638935 0.4122404
## 132 2020-12-01 120.6               0.50000000 0.4122404
## 133 2021-01-01 122.1               1.24378109 0.4122404
## 134 2021-02-01 122.5               0.32760033 0.4122404
## 135 2021-03-01 122.9               0.32653061 0.4122404
## 136 2021-04-01 123.1               0.16273393 0.4122404
## 137 2021-05-01 123.1               0.00000000 0.4122404
## 138 2021-06-01 123.2               0.08123477 0.4122404
## 139 2021-07-01 122.5              -0.56818182 0.4122404
## 140 2021-08-01 122.5               0.00000000 0.4122404
## 141 2021-09-01 122.8               0.24489796 0.4122404
## 142 2021-10-01 123.7               0.73289902 0.4122404
## 143 2021-11-01 124.0               0.24252223 0.4122404
## 144 2021-12-01 124.5               0.40322581 0.4122404
## 145 2022-01-01 124.9               0.32128514 0.4122404
## 146 2022-02-01 125.2               0.24019215 0.4122404
## 147 2022-03-01 125.6               0.31948882 0.4122404
## 148 2022-04-01 125.9               0.23885350 0.4122404
## 149 2022-05-01 126.6               0.55599682 0.4122404
## 150 2022-06-01 127.4               0.63191153 0.4122404
## 151 2022-07-01 127.9               0.39246468 0.3731386
## 152 2022-08-01 128.2               0.23455825 0.3801607
## 153 2022-09-01 128.3               0.07800312 0.4003582
## 154 2022-10-01 128.6               0.23382697 0.3971271
## 155 2022-11-01 129.0               0.31104199 0.4138794
## 156 2022-12-01 129.2               0.15503876 0.4188929
## 157 2023-01-01 129.5               0.23219814 0.4166762
## 158 2023-02-01 129.8               0.23166023 0.4137872
## 159 2023-03-01 129.9               0.07704160 0.4137396
## 160 2023-04-01 130.0               0.07698229 0.4118372
## 161 2023-05-01 130.2               0.15384615 0.4279426
## 162 2023-06-01 130.4               0.15360983 0.4326895
## 163 2023-07-01 130.5               0.07668712 0.4353648
## 164 2023-08-01 130.8               0.22988506 0.4357718

This code segment is focused on creating an integrated dataset by combining various economic indicators, including CPI (CPI_monthly), base interest rates (baserate_monthly), and monthly inflation rates (inflation_rate_monthly). To achieve this, following steps is conducted.

First, we use the ‘left_join’ function to merge ‘CPI_monthly’, ‘baserate_monthly’, and ‘inflation_rate_monthly’ datasets. The merge is based on the Date column, which is common across these datasets. This step creates a comprehensive dataset (combined_df) that brings together key economic indicators. We then remove redundant columns resulting from the merging. Furthermore, we handle missing values by mean imputation. Finally, the resulting dataset, named ‘economicmetric’, now contains merged and cleaned data, ready for analysis.

# Derived Purchase Price Index
economicmetric <- economicmetric %>%
  mutate(
    PPI = CPI * (1 + Monthly_Inflation_Rate.x/100) * (1 + `Base Rate`/100)
  )
rates<-economicmetric$PPI

In this step, we aim to create a derived version of PPI using the ‘economicmetric’ dataset. The derived PPI is calculated by integrating several economic indicators, including CPI, monthly inflation rate, and base interest rate. This derived PPI is then stored in a separate vector named ‘rates’.

# Input/Output Matrices & Normalization 
# --- m = 2
training.dat<-round(0.8*length(rates))
testing.dat.start <- training.dat+1
splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
testing.dat.end <- round(nrow(splitRates_2))
splitRatesNormalised_2 <- as.data.frame(lapply(splitRates_2, normalise))
splitRates_2_train <- splitRates_2[1:training.dat,]
splitRates_2_test <- splitRates_2[testing.dat.start:testing.dat.end,]
splitRatesNormalised_2_train <- splitRatesNormalised_2[1:training.dat,]
splitRatesNormalised_2_test <- splitRatesNormalised_2[testing.dat.start:testing.dat.end,]
#--- m = 3
splitRates_3 <- as.data.frame(splitDataRates(rates, 3))
testing.dat.end <- round(nrow(splitRates_3))
splitRatesNormalised_3 <- as.data.frame(lapply(splitRates_3, normalise))
splitRates_3_train <- splitRates_3[1:training.dat,] 
splitRates_3_test <- splitRates_3[testing.dat.start:testing.dat.end,]
splitRatesNormalised_3_train <- splitRatesNormalised_3[1:training.dat,]
splitRatesNormalised_3_test <- splitRatesNormalised_3[testing.dat.start:testing.dat.end,]
#--- m = 4
splitRates_4 <- as.data.frame(splitDataRates(rates, 4))
testing.dat.end <- round(nrow(splitRates_4))
splitRatesNormalised_4 <- as.data.frame(lapply(splitRates_4, normalise))
splitRates_4_train <- splitRates_4[1:training.dat,] 
splitRates_4_test <- splitRates_4[testing.dat.start:testing.dat.end,]
splitRatesNormalised_4_train <- splitRatesNormalised_4[1:training.dat,]
splitRatesNormalised_4_test <- splitRatesNormalised_4[testing.dat.start:testing.dat.end,]
#--- m = 5
splitRates_5 <- as.data.frame(splitDataRates(rates, 5))
testing.dat.end <- round(nrow(splitRates_5))
splitRatesNormalised_5 <- as.data.frame(lapply(splitRates_5, normalise))
splitRates_5_train <- splitRates_5[1:training.dat,] 
splitRates_5_test <- splitRates_5[testing.dat.start:testing.dat.end,]
splitRatesNormalised_5_train <- splitRatesNormalised_5[1:training.dat,]
splitRatesNormalised_5_test <- splitRatesNormalised_5[testing.dat.start:testing.dat.end,]

This part focuses on preparing the data for training ANN for PPI forecasting. We first split the dataset into 80% training set and 20% testing set. We then use the ‘splitDataRates()’ function to construct datasets with varying numbers of lagged values (2 to 5), creating different input configurations for the ANN models. Each of these datasets undergoes normalization, a crucial step where data values are scaled to a uniform range, enhancing the efficiency and accuracy of the neural network training. The datasets are subsequently divided into corresponding training and testing subsets. This preparation ensures that the neural networks have optimal data structures for effective learning and robust performance evaluation.

Modelling

In the following steps, multiple Artificial Neural Network (ANN) models are constructed, trained, and evaluated for forecasting the Purchase Price Index (PPI). The process begins with setting a seed for reproducibility, followed by building and training ANNs with varying configurations of input nodes (2 or 3) and hidden layers (1, 3, or 5). Each model, such as PPIModel_2_1 with 2 inputs and 1 hidden layer, is trained on a corresponding normalized dataset, like splitRatesNormalised_2_train, that includes lagged PPI values. In addition, we compute predictions on a normalized test dataset and compare them to the actual PPI values, with the correlation coefficient to quantify the prediction accuracy. To interpret the results in a real-world context, we de-normalize the predictions, converting them back to their original scale. The performance of these models is then evaluated on test datasets through metrics like RMSE, MAE, MAPE, and Forecast Accuracy. Furthermore, we perform a distribution analysis of the predicted PPI values, employing bootstrap techniques to estimate the distribution’s summary statistics. This analysis provides an in-depth look at the predictive distribution’s behavior, including its central tendency and variability. Finally, visualizations are created to compare actual versus predicted PPI values, offering a visual assessment of each model’s predictive accuracy.

# NN (2 inputs & 1 hidden layer)
set.seed(69)

# Building and training the neural network model
PPIModel_2_1 <- neuralnet(output ~ input1 + input2,
                               data = splitRatesNormalised_2_train,
                               hidden = c(1),
                               stepmax = 1e6) 

# Plotting the neural network
plot(PPIModel_2_1, rep = "best")

Neural Network Model Construction and Training:

The neural network model is a model with 2 input neurons, 1 hidden layer, and 1 output neuron. The neuralnet function from the neuralnet package was used to build the model, and it was trained using the splitRatesNormalised_2_train dataset. The hidden layer contains 1 neuron, and you can adjust the number of neurons in the hidden layer as needed. The stepmax parameter is set to 1e6, which is the maximum number of iterations for training.

# Computing model results and correlation
PPIResults_2_1 <- neuralnet::compute(PPIModel_2_1, splitRatesNormalised_2_test[1:2])
correlation <- cor(PPIResults_2_1$net.result, splitRatesNormalised_2_test$output) 
cat("Correlation: ", correlation, "\n")

## Correlation:  0.9798174

Computing Model Results and Correlation:

The compute function was used to calculate the model’s predictions on the test dataset splitRatesNormalised_2_test. The correlation between the predicted results and the test dataset outputs was calculated, resulting in a high correlation of 0.9798.

# De-normalizing the predictions
testRates_2_min <- min(splitRates_2_train$output)
testRates_2_max <- max(splitRates_2_train$output)
PPIPrediction_2_1 <- unnormalise(PPIResults_2_1$net.result,
                                      testRates_2_min,
                                      testRates_2_max)

# Calculating accuracy metrics
rmse_2_1 <- rmse(splitRates_2_test$output, PPIPrediction_2_1)
mae_2_1 <- mae(splitRates_2_test$output, PPIPrediction_2_1)
mape_2_1 <- mape(splitRates_2_test$output, PPIPrediction_2_1)
fa_2_1 <- (1-RMSPE(splitRates_2_test$output, PPIPrediction_2_1))*100
ForecastAccuracy_2_1<- matrix(data=c(rmse_2_1,mae_2_1,mape_2_1,fa_2_1),ncol = 4)
colnames(ForecastAccuracy_2_1) <- c("RMSE", "MAE", "MAPE","FA%")
ForecastAccuracy_2_1

##          RMSE      MAE       MAPE      FA%
## [1,] 8.923797 8.674132 0.06785149 92.49366

De-normalizing the Predictions involved using the unnormalise function to transform normalized predictions back to their original scale. Calculating Accuracy Metrics included computing Root Mean Squared Error (RMSE) at 8.923797, Mean Absolute Error (MAE) at 8.674132, Mean Absolute Percentage Error (MAPE) at 0.06785149, and Forecast Accuracy (FA%) at 92.49366%.

PPIPrediction_2_1_vector <- as.numeric(PPIPrediction_2_1[, 1])
descdist(PPIPrediction_2_1_vector, boot = 5000)

## summary statistics
## ------
## min:  116.9801   max:  119.6054 
## median:  118.564 
## mean:  118.525 
## estimated sd:  0.8572958 
## estimated skewness:  -0.2429534 
## estimated kurtosis:  1.510801

# Plotting real vs. predicted values
plot(splitRates_2_test$output, PPIPrediction_2_1, col='red', main='Real vs Predicted NN', pch=18, cex=0.4)
text(splitRates_2_test$output, PPIPrediction_2_1, labels = round(PPIPrediction_2_1, 2), pos = 4, col = "blue", cex = 0.7)
lines(PPIPrediction_2_1, lty=2, lwd=2, col="red")
abline(0,1,lwd=1)
legend('bottomright', legend='Forecast', pch=18, col='red', bty='n')

plot(splitRates_2_test$output, type = "l", col="black", main = "Real vs Predicted")
lines(PPIPrediction_2_1, lty=2, lwd=2, col="red")
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

# NN (2 inputs & 3 hidden layer)
set.seed(69)

# Building and training the neural network model
PPIModel_2_3 <- neuralnet(output ~ input1 + input2,
                               data = splitRatesNormalised_2_train,
                               hidden = c(3),
                               stepmax = 1e6) 

# Plotting the neural network
plot(PPIModel_2_3, rep = "best")

Neural Network Model Construction and Training:

This model comprises 2 input neurons, 3 hidden layers, and 1 output neuron. The model was constructed using the neuralnet function from the neuralnet package and trained using the splitRatesNormalised_2_train dataset. The hidden layer consists of 3 neurons, introducing increased complexity compared to the previous model.

# Computing model results and correlation
PPIResults_2_3 <- neuralnet::compute(PPIModel_2_3, splitRatesNormalised_2_test[1:2])
correlation <- cor(PPIResults_2_3$net.result, splitRatesNormalised_2_test$output) 
cat("Correlation: ", correlation, "\n")

## Correlation:  0.9844616

Computing Model Results and Correlation:

Model predictions on the test dataset (splitRatesNormalised_2_test) are calculated using the compute function. The correlation between the predicted results and the actual test dataset outputs is determined, resulting in a high correlation value of approximately 0.9845.

# De-normalizing the predictions
testRates_2_min <- min(splitRates_2_train$output)
testRates_2_max <- max(splitRates_2_train$output)
PPIPrediction_2_3 <- unnormalise(PPIResults_2_3$net.result,
                                      testRates_2_min,
                                      testRates_2_max)

# Calculating accuracy metrics
rmse_2_3 <- rmse(splitRates_2_test$output, PPIPrediction_2_3)
mae_2_3 <- mae(splitRates_2_test$output, PPIPrediction_2_3)
mape_2_3 <- mape(splitRates_2_test$output, PPIPrediction_2_3)
fa_2_3 <- (1-RMSPE(splitRates_2_test$output, PPIPrediction_2_3))*100
ForecastAccuracy_2_3<- matrix(data=c(rmse_2_3,mae_2_3,mape_2_3,fa_2_3),ncol = 4)
colnames(ForecastAccuracy_2_3) <- c("RMSE", "MAE", "MAPE","FA%")
ForecastAccuracy_2_3

##          RMSE      MAE       MAPE   FA%
## [1,] 7.846771 7.702579 0.06032019 93.46

Denormalizing Predictions:

The model’s predictions, previously normalized, have been reverted to their original scale using the minimum (testRates_2_min) and maximum (testRates_2_max) values from the training dataset.

Accuracy Metrics:

The Root Mean Squared Error (RMSE) is 7.846771, the Mean Absolute Error (MAE) is 7.702579, the Mean Absolute Percentage Error (MAPE) is 0.06032019, and the Forecast Accuracy (FA%) is 93.46%.

PPIPrediction_2_3_vector <- as.numeric(PPIPrediction_2_3[, 1])
descdist(PPIPrediction_2_3_vector, boot = 5000)

## summary statistics
## ------
## min:  117.1155   max:  121.5489 
## median:  119.4037 
## mean:  119.4966 
## estimated sd:  1.487747 
## estimated skewness:  -0.07054787 
## estimated kurtosis:  1.406443

# Plotting real vs. predicted values
plot(splitRates_2_test$output, PPIPrediction_2_3 ,col='red',main='Real vs predicted NN',pch=18,cex=0.4)
abline(0,1,lwd=1)
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

plot(splitRates_2_test$output, type = "l", col="black", main = "Real vs Predicted")
lines(PPIPrediction_2_3, lty=2, lwd=2, col="red")
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

# NN (2 inputs & 5 hidden layer)
set.seed(69)

# Building and training the neural network model
PPIModel_2_5 <- neuralnet(output ~ input1 + input2,
                               data = splitRatesNormalised_2_train,
                               hidden = c(5),
                               stepmax = 1e6) 

# Plotting the neural network
plot(PPIModel_2_5, rep = "best")

Neural Network Model Construction and Training:

This model consists of 2 input neurons, 5 hidden layers, and 1 output neuron. The neuralnet function from the neuralnet package was used to build the model. The hidden layer contains 5 neurons, making the model more complex compared to the previous ones.

# Computing model results and correlation
PPIResults_2_5 <- neuralnet::compute(PPIModel_2_5, splitRatesNormalised_2_test[1:2])
correlation <- cor(PPIResults_2_5$net.result, splitRatesNormalised_2_test$output) 
cat("Correlation: ", correlation, "\n")

## Correlation:  0.9843718

Computing Model Results and Correlation:

The neural network model with 2 input neurons and 5 hidden layers has been evaluated, resulting in a high correlation coefficient of approximately 0.9844 between the model’s predictions and the actual Purchase Price Index (PPI) values. This indicates a strong positive linear relationship, suggesting that the model’s predictions closely align with the observed PPI values. The high correlation suggests that the model with the additional complexity of 5 hidden layers is performing well in capturing the underlying patterns in the data.

# De-normalizing the predictions
testRates_2_min <- min(splitRates_2_train$output)
testRates_2_max <- max(splitRates_2_train$output)
PPIPrediction_2_5 <- unnormalise(PPIResults_2_5$net.result,
                                      testRates_2_min,
                                      testRates_2_max)

# Calculating accuracy metrics
rmse_2_5 <- rmse(splitRates_2_test$output, PPIPrediction_2_5)
mae_2_5 <- mae(splitRates_2_test$output, PPIPrediction_2_5)
mape_2_5 <- mape(splitRates_2_test$output, PPIPrediction_2_5)
fa_2_5 <- (1-RMSPE(splitRates_2_test$output, PPIPrediction_2_5))*100
ForecastAccuracy_2_5<- matrix(data=c(rmse_2_5,mae_2_5,mape_2_5,fa_2_5),ncol = 4)
colnames(ForecastAccuracy_2_5) <- c("RMSE", "MAE", "MAPE","FA%")
ForecastAccuracy_2_5

##          RMSE      MAE       MAPE      FA%
## [1,] 7.090499 6.999637 0.05486164 94.12411

Accuracy Metrics for predicting the Purchase Price Index (PPI) in denormalized terms show Root Mean Squared Error (RMSE) at 7.090499, Mean Absolute Error (MAE) at 6.999637, Mean Absolute Percentage Error (MAPE) at 0.05486164, and Forecast Accuracy (FA%) at 94.12411%. These metrics demonstrate that the model’s predictions are closer to the actual values, indicated by the lower RMSE and MAE, suggesting improved accuracy compared to previous models. The MAPE and FA% values also reflect relatively low prediction errors.

PPIPrediction_2_5_vector <- as.numeric(PPIPrediction_2_5[, 1])
descdist(PPIPrediction_2_5_vector, boot = 5000)

## summary statistics
## ------
## min:  117.4608   max:  122.9111 
## median:  120.0046 
## mean:  120.1995 
## estimated sd:  1.893125 
## estimated skewness:  0.01104616 
## estimated kurtosis:  1.378583

These summary statistics provide valuable insights into the distribution and characteristics of the denormalized predictions. The relatively small standard deviation indicates that the predicted values cluster closely around the mean, suggesting consistency in the model’s forecasts. The skewness close to zero indicates a nearly symmetric distribution of predictions, and the kurtosis value suggests a distribution slightly less heavy-tailed than a normal distribution.

# Plotting real vs. predicted values
plot(splitRates_2_test$output, PPIPrediction_2_5 ,col='red',main='Real vs predicted NN',pch=18,cex=0.4)
abline(0,1,lwd=1)
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

plot(splitRates_2_test$output, type = "l", col="black", main = "Real vs Predicted")
lines(PPIPrediction_2_5, lty=2, lwd=2, col="red")
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

# NN (3 inputs & 1 hidden layer)
set.seed(69)

# Building and training the neural network model
PPIModel_3_1 <- neuralnet(output ~ input1 + input2 + input3,
                               data = splitRatesNormalised_3_train,
                               hidden = c(1),
                               stepmax = 1e6)

# Plotting the neural network
plot(PPIModel_3_1, rep = "best")

Neural Network Model Construction and Training:

For this neural network model, it comprises 3 input neurons, 1 hidden layer, and 1 output neuron. The construction of the model employed the neuralnet function from the neuralnet package. In this configuration, the hidden layer is comprised of a single neuron, rendering the model simpler in structure compared to models with multiple hidden layers.

# Computing model results and correlation
PPIResults_3_1 <- neuralnet::compute(PPIModel_3_1, splitRatesNormalised_3_test[1:3])
correlation <- cor(PPIResults_3_1$net.result, splitRatesNormalised_3_test$output)
cat("Correlation: ", correlation, "\n")

## Correlation:  0.9831341

Computing Model Results and Correlation:

Upon computing the results of the neural network model with 3 input neurons and 1 hidden layer, a correlation coefficient of approximately 0.9831 was obtained when comparing the model’s predictions with the actual Purchase Price Index (PPI) values. This high correlation indicates a strong positive linear relationship, signifying that the model’s predictions closely align with the observed PPI values. The correlation coefficient serves as an indicator of the model’s capability to capture and predict the underlying patterns in the data accurately.

# De-normalizing the predictions
testRates_3_min <- min(splitRates_3_train$output)
testRates_3_max <- max(splitRates_3_train$output)
PPIPrediction_3_1 <- unnormalise(PPIResults_3_1$net.result,
                                      testRates_3_min,
                                      testRates_3_max)
# Calculating accuracy metrics
rmse_3_1 <- rmse(splitRates_3_test$output, PPIPrediction_3_1)
mae_3_1 <- mae(splitRates_3_test$output, PPIPrediction_3_1)
mape_3_1 <- mape(splitRates_3_test$output, PPIPrediction_3_1)
fa_3_1 <- (1-RMSPE(splitRates_3_test$output, PPIPrediction_3_1))*100
ForecastAccuracy_3_1<- matrix(data=c(rmse_3_1,mae_3_1,mape_3_1,fa_3_1),ncol = 4)
colnames(ForecastAccuracy_3_1) <- c("RMSE", "MAE", "MAPE","FA%")
ForecastAccuracy_3_1

##          RMSE      MAE       MAPE      FA%
## [1,] 7.842889 7.705588 0.06029131 93.46925

Denormalizing Predictions:

Following the neural network model’s predictions, which were initially normalized, the results have been transformed back to their original scale. This denormalization process utilized the minimum (testRates_2_min) and maximum (testRates_2_max) values obtained from the training dataset to ensure the predictions are expressed in their native units.

Accuracy Metrics:

These metrics collectively offer a comprehensive evaluation of the model’s forecasting performance. A lower RMSE and MAE signify increased precision, while a smaller MAPE suggests reduced percentage error in the predictions. The FA%, on the other hand, represents the overall forecasting accuracy, with higher values indicating superior performance.

PPIPrediction_3_1_vector <- as.numeric(PPIPrediction_3_1[, 1])
descdist(PPIPrediction_3_1_vector, boot = 5000)

## summary statistics
## ------
## min:  117.1941   max:  121.6289 
## median:  119.5284 
## mean:  119.6243 
## estimated sd:  1.491617 
## estimated skewness:  -0.1338361 
## estimated kurtosis:  1.435918

# Plotting real vs. predicted values
plot(splitRates_3_test$output, PPIPrediction_3_1 ,col='red',main='Real vs predicted NN',pch=18,cex=0.4)
abline(0,1,lwd=1)
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

plot(splitRates_3_test$output, type = "l", col="black", main = "Real vs Predicted")
lines(PPIPrediction_3_1, lty=2, lwd=2, col="red")
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

# NN (3 inputs & 3 hidden layer)
set.seed(69)

# Building and training the neural network model
PPIModel_3_3 <- neuralnet(output ~ input1 + input2 + input3,
                               data = splitRatesNormalised_3_train,
                               hidden = c(3),
                               stepmax = 1e6)


# Plotting the neural network
plot(PPIModel_3_3, rep = "best")

Neural Network Model Construction and Training:

This neural network model was developed with 3 input neurons, 3 hidden layers, and 1 output neuron. The neuralnet function from the neuralnet package was utilized to build and train the model. The presence of 3 hidden neurons in the model’s architecture introduces a moderate level of complexity.

# Computing model results and correlation
PPIResults_3_3 <- neuralnet::compute(PPIModel_3_3, splitRatesNormalised_3_test[1:3])
correlation <- cor(PPIResults_3_3$net.result, splitRatesNormalised_3_test$output) 
cat("Correlation: ", correlation, "\n")

## Correlation:  0.9839146

Computing Model Results and Correlation:

After constructing and training the neural network, computations were carried out to obtain its results. Subsequently, the correlation between the model’s predictions and the actual output values was computed. The resulting correlation coefficient, which measures the linear relationship between predictions and actual outcomes, is 0.9839146.

# De-normalizing the predictions
testRates_3_min <- min(splitRates_3_train$output)
testRates_3_max <- max(splitRates_3_train$output)
PPIPrediction_3_3 <- unnormalise(PPIResults_3_3$net.result,
                                      testRates_3_min,
                                      testRates_3_max)

# Calculating accuracy metrics
rmse_3_3 <- rmse(splitRates_3_test$output, PPIPrediction_3_3)
mae_3_3 <- mae(splitRates_3_test$output, PPIPrediction_3_3)
mape_3_3 <- mape(splitRates_3_test$output, PPIPrediction_3_3)
fa_3_3 <- (1-RMSPE(splitRates_3_test$output, PPIPrediction_3_3))*100
ForecastAccuracy_3_3<- matrix(data=c(rmse_3_3,mae_3_3,mape_3_3,fa_3_3),ncol = 4)
colnames(ForecastAccuracy_3_3) <- c("RMSE", "MAE", "MAPE","FA%")
ForecastAccuracy_3_3

##          RMSE     MAE       MAPE      FA%
## [1,] 7.832381 7.69873 0.06024098 93.47836

PPIPrediction_3_3_vector <- as.numeric(PPIPrediction_3_3[, 1])
descdist(PPIPrediction_3_3_vector, boot = 5000)

## summary statistics
## ------
## min:  117.3181   max:  121.6902 
## median:  119.533 
## mean:  119.6312 
## estimated sd:  1.511325 
## estimated skewness:  -0.1002512 
## estimated kurtosis:  1.404186

# Plotting real vs. predicted values
plot(splitRates_3_test$output, PPIPrediction_3_3 ,col='red',main='Real vs predicted NN',pch=18,cex=0.4)
abline(0,1,lwd=1)
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

plot(splitRates_3_test$output, type = "l", col="black", main = "Real vs Predicted")
lines(PPIPrediction_3_3, lty=2, lwd=2, col="red")
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

# NN (3 inputs & 5 hidden layer)
set.seed(69)

# Building and training the neural network model
PPIModel_3_5 <- neuralnet(output ~ input1 + input2 + input3,
                               data = splitRatesNormalised_3_train,
                               hidden = c(5),
                               stepmax = 1e6)

# Plotting the neural network
plot(PPIModel_3_5, rep = "best")

Neural Network Model Construction and Training:

This neural network model was designed with 3 input neurons, 5 hidden layers, and 1 output neuron. The construction and training of the model were carried out using the neuralnet function from the neuralnet package. The presence of 5 hidden neurons adds complexity to the model, potentially allowing it to capture intricate patterns in the data.

# Computing model results and correlation
PPIResults_3_5 <- neuralnet::compute(PPIModel_3_5, splitRatesNormalised_3_test[1:3])
correlation <- cor(PPIResults_3_5$net.result, splitRatesNormalised_3_test$output) 
cat("Correlation: ", correlation, "\n")

## Correlation:  0.982179

Computing Model Results and Correlation:

Upon constructing the neural network model, computations were performed to obtain its results, followed by the calculation of the correlation between the model’s predictions and the actual output values. The computed correlation coefficient, representing the degree of linear association, is 0.982179.

# De-normalizing the predictions
testRates_3_min <- min(splitRates_3_train$output)
testRates_3_max <- max(splitRates_3_train$output)
PPIPrediction_3_5 <- unnormalise(PPIResults_3_3$net.result,
                                      testRates_3_min,
                                      testRates_3_max)

# Calculating accuracy metrics
rmse_3_5 <- rmse(splitRates_3_test$output, PPIPrediction_3_5)
mae_3_5 <- mae(splitRates_3_test$output, PPIPrediction_3_5)
mape_3_5 <- mape(splitRates_3_test$output, PPIPrediction_3_5)
fa_3_5 <- (1-RMSPE(splitRates_3_test$output, PPIPrediction_3_5))*100
ForecastAccuracy_3_5<- matrix(data=c(rmse_3_5,mae_3_5,mape_3_5,fa_3_5),ncol = 4)
colnames(ForecastAccuracy_3_5) <- c("RMSE", "MAE", "MAPE","FA%")
ForecastAccuracy_3_5

##          RMSE     MAE       MAPE      FA%
## [1,] 7.832381 7.69873 0.06024098 93.47836

PPIPrediction_3_5_vector <- as.numeric(PPIPrediction_3_5[, 1])
descdist(PPIPrediction_3_5_vector, boot = 5000)

## summary statistics
## ------
## min:  117.3181   max:  121.6902 
## median:  119.533 
## mean:  119.6312 
## estimated sd:  1.511325 
## estimated skewness:  -0.1002512 
## estimated kurtosis:  1.404186

# Plotting real vs. predicted values
plot(splitRates_3_test$output, PPIPrediction_3_5 ,col='red',main='Real vs predicted NN',pch=18,cex=0.4)
abline(0,1,lwd=1)
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

plot(splitRates_3_test$output, type = "l", col="black", main = "Real vs Predicted")
lines(PPIPrediction_3_5, lty=2, lwd=2, col="red")
legend('bottomright',legend='Forecast', pch=18,col='red', bty='n')

The most accurate ANN model for PPI forecasting

In this part, we aim to select the most accurate ANN model for predicting the PPI. The process starts by aggregating the forecast accuracy metrics from various ANN configurations into a vector and identifying the model with the highest accuracy. This model is then employed to predict PPI rates for the next 30 months. The forecasting incorporates simulations that include randomized noise, mirroring real-world data variability, followed by normalization of inputs, prediction using the chosen ANN model, and then rescaling to the original scale. The results are compiled into a data frame and visualized using ‘ggplot2’ and ‘plotly’, offering an interactive and clear graphical representation of the PPI trend forecast. Additionally, the code adapts its output format based on the operating system, displaying model information in a dialog box on Windows or in the console for other systems. An error check is also in place to handle cases where the best model is not identified, ensuring robustness in the code.

# Recommender
fa <- c(fa_2_1, fa_2_3, fa_2_5,fa_3_1, fa_3_3,fa_3_5)
max_accuracy_fa <- max(fa)
max_accuracy_index <- which(max_accuracy_fa == fa)
accuracy_names <- c("NN (2 inputs & 1 hidden layer)", "NN (2 inputs & 3 hidden layer)", "NN (2 inputs & 5 hidden layer)", 
                    "NN (3 inputs & 1 hidden layer)", "NN (3 inputs & 3 hidden layer)", "NN (3 inputs & 5 hidden layer)")
highest_accuracy_name <- accuracy_names[max_accuracy_index]

if (highest_accuracy_name == "NN (2 inputs & 1 hidden layer)") {
  num_simulations <- 5000
  prediction30 <- matrix(nrow = 30, ncol = num_simulations)
  last_known_data <- tail(rates, 5)
  training.dat <- round(0.8 * length(rates))
  splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
  testRates_2_min <- min(splitRates_2$output[1:training.dat])
  testRates_2_max <- max(splitRates_2$output[1:training.dat])
  for (month in 1:30) {
    month_predictions <- numeric(num_simulations)
    for (sim in 1:num_simulations) {
      noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
      normalized_vector <- normalise(noisy_input)
      normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
      prediction <- neuralnet::compute(PPIModel_2_1, normalized_input)$net.result
      prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
      month_predictions[sim] <- prediction_unnormalized
    }
    prediction30[month, ] <- month_predictions
  }
  average_predictions <- apply(prediction30, 1, mean)
  dates_forecast <- seq(from = max(economicmetric$Date), by = "month", length.out = 30)
  forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
  p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
    geom_line(color = "green", size = 0.5) +
    geom_point(color = "red", size = 1) +
    geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
    labs(title = "30-month PPI Rate Forecast", x = "Date", y = "Forecasted PPI") +
    theme_minimal()
  plotly::ggplotly(p, tooltip = c("x", "y"))
} 
if (highest_accuracy_name == "NN (2 inputs & 3 hidden layer)") {
  num_simulations <- 5000
  prediction30 <- matrix(nrow = 30, ncol = num_simulations)
  last_known_data <- tail(rates, 5)
  training.dat <- round(0.8 * length(rates))
  splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
  testRates_2_min <- min(splitRates_2$output[1:training.dat])
  testRates_2_max <- max(splitRates_2$output[1:training.dat])
  for (month in 1:30) {
    month_predictions <- numeric(num_simulations)
    for (sim in 1:num_simulations) {
      noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
      normalized_vector <- normalise(noisy_input)
      normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
      prediction <- neuralnet::compute(PPIModel_2_3, normalized_input)$net.result
      prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
      month_predictions[sim] <- prediction_unnormalized
    }
    prediction30[month, ] <- month_predictions
  }
  average_predictions <- apply(prediction30, 1, mean)
  dates_forecast <- seq(from = max(economicmetric$Date), by = "month", length.out = 30)
  forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
  p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
    geom_line(color = "green", size = 0.5) +
    geom_point(color = "red", size = 1) +
    geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
    labs(title = "30-month PPI Rate Forecast", x = "Date", y = "Forecasted PPI") +
    theme_minimal()
  plotly::ggplotly(p, tooltip = c("x", "y"))}
if (highest_accuracy_name == "NN (2 inputs & 5 hidden layer)") {num_simulations <- 5000
prediction30 <- matrix(nrow = 30, ncol = num_simulations)
last_known_data <- tail(rates, 5)
training.dat <- round(0.8 * length(rates))
splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
testRates_2_min <- min(splitRates_2$output[1:training.dat])
testRates_2_max <- max(splitRates_2$output[1:training.dat])
for (month in 1:30) {
  month_predictions <- numeric(num_simulations)
  for (sim in 1:num_simulations) {
    noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
    normalized_vector <- normalise(noisy_input)
    normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
    prediction <- neuralnet::compute(PPIModel_2_5, normalized_input)$net.result
    prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
    month_predictions[sim] <- prediction_unnormalized
  }
  prediction30[month, ] <- month_predictions
}
average_predictions <- apply(prediction30, 1, mean)
dates_forecast <- seq(from = max(economicmetric$Date), by = "month", length.out = 30)
forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
  geom_line(color = "green", size = 0.5) +
  geom_point(color = "red", size = 1) +
  geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
  labs(title = "30-month PPI Rate Forecast", x = "Date", y = "Forecasted PPI") +
  theme_minimal()
plotly::ggplotly(p, tooltip = c("x", "y"))}

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `geom_smooth()` using formula = 'y ~ x'

if (highest_accuracy_name == "NN (3 inputs & 1 hidden layer)") {
  num_simulations <- 5000
  prediction30 <- matrix(nrow = 30, ncol = num_simulations)
  last_known_data <- tail(rates, 5)
  training.dat <- round(0.8 * length(rates))
  splitRates_3 <- as.data.frame(splitDataRates(rates, 2))
  testRates_3_min <- min(splitRates_3$output[1:training.dat])
  testRates_3_max <- max(splitRates_3$output[1:training.dat])
  
  for (month in 1:30) {
    month_predictions <- numeric(num_simulations)
    for (sim in 1:num_simulations) {
      noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
      normalized_vector <- normalise(noisy_input)
      normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2], input3 = normalized_vector[3])
      prediction <- neuralnet::compute(PPIModel_3_1, normalized_input)$net.result
      prediction_unnormalized <- unnormalise(prediction, testRates_3_min, testRates_3_max)
      month_predictions[sim] <- prediction_unnormalized
    }
    prediction30[month, ] <- month_predictions
  }
  average_predictions <- apply(prediction30, 1, mean)
  dates_forecast <- seq(from = max(economicmetric$Date), by = "month", length.out = 30)
  forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
  p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
    geom_line(color = "green", size = 0.5) +
    geom_point(color = "red", size = 1) +
    geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
    labs(title = "30-month PPI Rate Forecast", x = "Date", y = "Forecasted PPI") +
    theme_minimal()
  plotly::ggplotly(p, tooltip = c("x", "y"))} 
if (highest_accuracy_name == "NN (3 inputs & 3 hidden layer)") {
  num_simulations <- 5000
  prediction30 <- matrix(nrow = 30, ncol = num_simulations)
  last_known_data <- tail(rates, 5)
  training.dat <- round(0.8 * length(rates))
  splitRates_3 <- as.data.frame(splitDataRates(rates, 2))
  testRates_3_min <- min(splitRates_3$output[1:training.dat])
  testRates_3_max <- max(splitRates_3$output[1:training.dat])
  
  for (month in 1:30) {
    month_predictions <- numeric(num_simulations)
    for (sim in 1:num_simulations) {
      noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
      normalized_vector <- normalise(noisy_input)
      normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2], input3 = normalized_vector[3])
      prediction <- neuralnet::compute(PPIModel_3_3, normalized_input)$net.result
      prediction_unnormalized <- unnormalise(prediction, testRates_3_min, testRates_3_max)
      month_predictions[sim] <- prediction_unnormalized
    }
    prediction30[month, ] <- month_predictions
  }
  average_predictions <- apply(prediction30, 1, mean)
  dates_forecast <- seq(from = max(economicmetric$Date), by = "month", length.out = 30)
  forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
  p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
    geom_line(color = "green", size = 0.5) +
    geom_point(color = "red", size = 1) +
    geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
    labs(title = "30-month PPI Rate Forecast", x = "Date", y = "Forecasted PPI") +
    theme_minimal()
  plotly::ggplotly(p, tooltip = c("x", "y"))
} 
if (highest_accuracy_name == "NN (3 inputs & 5 hidden layer)") {
  num_simulations <- 5000
  prediction30 <- matrix(nrow = 30, ncol = num_simulations)
  last_known_data <- tail(rates, 5)
  training.dat <- round(0.8 * length(rates))
  splitRates_3 <- as.data.frame(splitDataRates(rates, 2))
  testRates_3_min <- min(splitRates_3$output[1:training.dat])
  testRates_3_max <- max(splitRates_3$output[1:training.dat])
  
  for (month in 1:30) {
    month_predictions <- numeric(num_simulations)
    for (sim in 1:num_simulations) {
      noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
      normalized_vector <- normalise(noisy_input)
      normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2], input3 = normalized_vector[3])
      prediction <- neuralnet::compute(PPIModel_3_5, normalized_input)$net.result
      prediction_unnormalized <- unnormalise(prediction, testRates_3_min, testRates_3_max)
      month_predictions[sim] <- prediction_unnormalized
    }
    prediction30[month, ] <- month_predictions
  }
  average_predictions <- apply(prediction30, 1, mean)
  dates_forecast <- seq(from = max(economicmetric$Date), by = "month", length.out = 30)
  forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
  p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
    geom_line(color = "green", size = 0.5) +
    geom_point(color = "red", size = 1) +
    geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
    labs(title = "30-month PPI Rate Forecast", x = "Date", y = "Forecasted PPI") +
    theme_minimal()
  plotly::ggplotly(p, tooltip = c("x", "y"))
} 
if (is.na(highest_accuracy_name)==T){cat("Error 404 : Information Not Found")}

# cat("The model with the highest forecast accuracy metrics is", highest_accuracy_name, "\n")

Summary of findings

In our project paper, ANN-based models were employed for evaluating and predicting the purchase price index (PPI). We have employed 6 neural network models with different input configurations and hidden layers.

Model	Correlation
NN (2 inputs & 1 hidden layer)	0.9798174
NN (2 inputs & 3 hidden layer)	0.9844616
NN (2 inputs & 5 hidden layer)	0.9843718
NN (3 inputs & 1 hidden layer)	0.9831341
NN (3 inputs & 3 hidden layer)	0.9839146
NN (3 inputs & 5 hidden layer)	0.982179

All models showed consistently high correlation values, ranging from 0.9798 to 0.9845, suggest strong linear relationships between the predicted and actual values across all models. This indicates that the neural network models effectively capture the underlying patterns in the data.

In the evaluation of model performance, the metrics employed include Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Forecast Accuracy (FA%). These metrics serve distinct purposes in assessing the effectiveness of the model predictions.

Model	RMSE	MAE	MAPE	FA%
NN (2 inputs & 1 hidden layer)	8.923797	8.674132	0.067851	92.249366
NN (2 inputs & 3 hidden layer)	7.846771	7.702579	0.060320	93.46
NN (2 inputs & 5 hidden layer)	7.090499	6.999637	0.054861	94.12411
NN (3 inputs & 1 hidden layer)	7.842889	7.705588	0.060291	93.46925
NN (3 inputs & 3 hidden layer)	7.832381	7.69873	0.0602409	93.47836
NN (3 inputs & 5 hidden layer)	7.832381	7.69873	0.0602409	93.47836

According to the statistical indicators, the neural network configuration featuring 2 inputs and 5 hidden layers stands out as the optimal model, showcasing the lowest error metrics and highest forecast accuracy. The results highlight the importance of carefully balancing model complexity and accuracy, with the chosen architecture providing a favorable trade-off. Furthermore, models with 3 inputs and 3 hidden layers, and 3 inputs and 5 hidden layers, demonstrate identical performance, indicating that the inclusion of an extra hidden layer may not significantly enhance predictive capabilities in this context.

The summary statistics provide insights into the central tendency, variability, and shape of the predicted values. It helps us in understanding the characteristics and reliability of each model’s predictions.

Model	Min	Max	Median	mean	Est sd	Est skewness	Est kurtosis
NN (2 inputs & 1 hidden layer)	116.9801	119.6054	118.564	118.525	0.8572958	-0.2429534	1.510801
NN (2 inputs & 3 hidden layer)	117.1155	121.5489	119.4037	119.4966	1.487747	-0.0705478	1.406443
NN (2 inputs & 5 hidden layer)	117.4608	122.9111	120.0046	120.1995	1.893125	0.01104616	1.378583
NN (3 inputs & 1 hidden layer)	117.1941	121.6289	119.5284	119.6243	1.491617	-0.1338361	1.435918
NN (3 inputs & 3 hidden layer)	117.3181	121.6902	119.6312	119.6312	1.511325	-0.1002512	1.404186
NN (3 inputs & 5 hidden layer)	117.3181	121.6902	119.533	119.6312	1.511325	-0.1002512	1.404186

Based on the above results, we observe consistent increases in both mean and median values as the number of hidden layers increases. This demonstrates that the central tendency is slightly higher in more complex models. Additionally, the estimated standard deviation grows, reflecting increased variability in predictions. Skewness values close to zero indicate that the distribution is relatively symmetric, while kurtosis values above one indicate a mildly skewed distortion of the distribution.

The correlation of all the models are close to 1 and RMSE values are minimized, thus showing the robust performance of the developed models.

In the final section,we aggregate accuracy metrics to identify the most accurate ANN configuration for forecasting the PPI over the next 30 months. The forecasts show a stable trend with minor variations, peaking at 110.0672 in June 2024. Despite some volatility, the PPI remains within a narrow range, reaching 109.9019 in April 2025. Overall, these stable forecasts suggest resilience to external factors, emphasizing the need for ongoing economic monitoring to detect potential changes.

Conclusion

This study underscores the potential of artificial neural network (ANN)s for accurate Purchase Price Index (PPI) forecasting, offering valuable insights for economic analysis and decision-making. Selecting the optimal ANN configuration based on performance metrics helps maximize predictive accuracy. Stable future forecasts highlight the potential of ANNs for reliable short-term economic insights.

Strengths: 1. Comprehensive data preparation: Integrating various economic indicators for a holistic analysis. 2. Multiple ANN models: Exploring different configurations to identify the optimal model. 3. Detailed performance evaluation: Utilizing various metrics to assess model accuracy. 4. Stable future forecasts: Providing valuable insights for economic planning and preparedness.

Possible future directions: 1. Analyze the impact of additional economic indicators on model performance. 2. Explore alternative ANN architectures for potential improvements. 3. Monitor economic trends and update forecasts to ensure continued accuracy.

Overall, this project demonstrates a successful application of ANNs for PPI forecasting, delivering valuable insights and laying the foundation for further research and development in this area.

Purchase Price Index (PPI) Analysis & Forecasting using Artificial Neural Networking

Group 8: The Economist

2024-01-02

Team Introduction

Introduction

Objective

Libraries

Functions

Data Preparation

Modelling

The most accurate ANN model for PPI forecasting

Summary of findings

Conclusion