Demand-Driven Stock Planning with Predictive Forecasting

Team Introduction

LOGAN RAMANATHAM (22085735)
PRIYADARSHINI NAIR MUNIANDY (22062712)
MUHAMMAD DANISH BIN ZULKAFLI (22065632)
LEE JIE YENG (S2168543)
VAISHNAVI A/P YUDEYELJODI (22121712)

Introduction

Effective stock planning is crucial for running a successful retail business in today’s dynamic market. With rapidly changing customer preferences and fierce competition, businesses increasingly rely on demand-driven stock planning and predictive forecasting tools to optimize inventory levels and meet customer expectations.Accurate demand forecasting is vital for demand-driven stock planning. By integrating historical sales data, seasonal patterns, and predictive analytics, businesses can enhance their demand estimates (Vu, 2006). Modern analytical tools like machine learning and AI help detect data trends, enabling more accurate future sales predictions (Abolghasemi et al., 2020).

Strong customer and supplier relationships are essential for effective stock planning. Collaborating with suppliers ensures agile restocking based on predicted sales, reducing lead times and managing inventory levels. Assessing supplier lead times in relation to sales estimates allows precise order timing, resulting in efficient inventory turnover. Market trend analysis and customer behavior tracking help predict demand shifts and make proactive inventory adjustments (Heczková et al., 2010).Understanding product life cycles is critical for optimizing stock planning. Adjusting inventory strategies for different life cycle stages ensures optimal stocking of new items while efficiently clearing expired ones (Madenas et al., 2014). Historical life cycle analysis aids in forecasting new product performance and improving inventory management. Maximizing sales of products nearing expiry reduces waste and increases profitability.

Dynamic pricing strategies, based on predictive analytics, enable firms to adjust prices dynamically, boosting revenue and preventing excess inventory. Balancing safety stock levels is crucial to avoid stockouts while minimizing excess inventory, protecting against demand unpredictability (Kayikci et al., 2022). Stock Keeping Unit (SKU) rationalization, guided by sales data, helps businesses optimize their stock assortment by focusing on high-performing items and discontinuing underperforming SKUs.

Objective

Efficient inventory management is vital for maintaining competitiveness and operational excellence in today’s market. An effective system optimizes stock levels, improves cash flow, prevents shortages, and enhances supply chain efficiency. This research aims to reduce holding costs related to storage and expiration, thereby improving financial sustainability and resource allocation. Additionally, it seeks to enhance consumer satisfaction by reducing stock shortages and backorders, ensuring products are available when needed, which promotes loyalty and brand reputation. By integrating advanced analytics and machine learning, our project aims to transform inventory management, boosting operational efficiency and profitability across various industries.

Libraries

Following libraries were chosen to provide a comprehensive toolkit for data manipulation, analysis, modelling, and visualization.

#Load libraries
library(neuralnet)
library(Metrics)
library(MLmetrics)
library(ggplot2)
library(fitdistrplus)
library(arrow)
library(WDI)
library(plotly)
library(dplyr)
library(lubridate)
library(readxl)
library(forecast)
library(timeSeries)
library(imputeTS)
library(PerformanceAnalytics)
library(xgboost)
library(caret)
library(TSstudio)
library(tidyverse)
library(tsibble)
library(fable)
library(e1071)
library(microbenchmark)
library(pryr)
require(stats)

Functions

In the following code segment, we create several functions for data preprocessing, preparing time-series data for neural network training and subsequent prediction tasks.

#Functions for Combining All-Files in Directory
combining_file <- function(file_path) {
  df <- readxl::read_excel(file_path)
  return(df)
}

splitDataRates <-  function(data, steps) {
  m <- matrix(ncol = steps+1)
  if (steps == 2) {
    colnames(m) <- c("input1", "input2", "output")
  } else if (steps == 3) {
    colnames(m) <- c("input1", "input2", "input3", "output")
  } else if (steps == 4) {
    colnames(m) <- c("input1", "input2", "input3", "input4", "output")
  } else if (steps == 5) {
    colnames(m) <- c("input1", "input2", "input3", "input4", "input5", "output")
  } 
  
  for (i in 1:(length(data)-(steps+1))) {
    v <- c(data[i:(i+steps)])
    m <- rbind(m, v)
  }
  return(m[-1,])
}

#Function for normalizing data
normalise <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

#Function to undo normalization
unnormalise <- function(x, min, max) { 
  return( (max - min)*x + min )
}

‘combining_file’ function is designed to combine all files of equal structure in specified project path directory.

‘splitDataRates’ function is designed to transform a univariate time series data into a format suitable for neural network training. It is used to prepare the dataset for the neural network, ensuring that the network has the necessary lagged values to learn the temporal patterns in the data.

‘normalise’ function is designed to normalise the data, scaling the input values to a range between 0 and 1. It is applied to input features before feeding them into the model to ensure consistent scale across different variables.

‘unnormalise’ function is designed to reverse the normalization process, converting the normalized data back to their original scale.

Data Preparation

dataset_path<-"C:/Users/logan/OneDrive/Master's/Unilever ML Model Development/Project Dataset Phase 1"
dataset_files <- sort(list.files(path = dataset_path,full.names = TRUE),decreasing = FALSE)
dataset <- lapply(dataset_files, combining_file) %>% bind_rows()
projectData<-dataset[,1:33];attach(projectData);TS <- ts(projectData)
RDD <- timeSequence(from=as.Date(head(projectData$`Requested deliv.date`,1)),
                    to=as.Date(tail(projectData$`Requested deliv.date`,1)),
                    length.out = length(projectData$`Requested deliv.date`)
                       ,by="day")

In the pre-processing pipeline, importing and combining the dataset files is the initial step. All of the files in the dataset directory are enumerated, sorted, processed and combined into one large data frame through the combining_file function is used independently by files. We extract the first 33 columns of the combined dataset to concentrate on the pertinent attributes for our research. In order to facilitate time series analysis, the data frame is then transformed into a time series object with the requested delivery date as the primary timeline.

projectData <- projectData %>%
  mutate(
    `Order Status` = case_when(
      `Reason for rejection` == "R1" ~ "Cancelled",
      `CCFOT reason` != "" | (`Requested deliv.date` < today() & `Delivery` == "") ~ "Dropped",
      `Billed Quantity` < `Order Quantity` & `Billing Document` != "" ~ "Partial Delivery",
      `Billing Document` != "" ~ "Billed",
      `Delivery` != "" & `Requested deliv.date` >= today() ~ "Pending Delivery",
      `Delivery` != "" & `Requested deliv.date` < today() ~ "Past Due Pending Billing",
      TRUE ~ "Open Order"  # Default case
    )
  )

The Order Status measure was developed depending on the understanding of each order’s status through the designation of specific documents to classify whether its cancelled, dropped, partially delivered, billed, or pending delivery. This stage allows for specialized investigation of each category and is essential for differentiating between various order outcomes.

projectData <- projectData %>%
  mutate(
    `Total Sales Quantity` = case_when(
      `Order Status` == "Billed" ~ `Billed Quantity`,
      `Order Status` == "Open Order" ~ `Initial order quantity`,
      `Order Status` == "Partial Delivery" ~ `Delivery quantity`,
      `Order Status` == "Past Due Pending Billing" ~ `Cumul.confirmed qty`,
      `Order Status` == "Pending Delivery" ~ `Delivery quantity`,
      TRUE ~ 0  # Default case
    )
  )

Orders are categorized and added to form “Total Sales Quantity”, determined using the order status as base. The total sales quantity is the addition of specific quantities by order status ande guarantees each order status’s sales quantity are appropriately reflected in the dataset.

TotalSalesQuantityByRDD <- aggregate(projectData$`Total Sales Quantity` ~ as.Date(projectData$`Requested deliv.date`), data = projectData, FUN = sum)
TotalSalesQuantityByRDD <- TotalSalesQuantityByRDD %>%
  mutate(
    Year = year(TotalSalesQuantityByRDD$`as.Date(projectData$\`Requested deliv.date\`)`),
    Month = month(TotalSalesQuantityByRDD$`as.Date(projectData$\`Requested deliv.date\`)`),
    Week = week(TotalSalesQuantityByRDD$`as.Date(projectData$\`Requested deliv.date\`)`)
  )
projectDataset<-as.data.frame(aggregate(TotalSalesQuantityByRDD$`projectData$\`Total Sales Quantity\``~TotalSalesQuantityByRDD$Year+TotalSalesQuantityByRDD$Week, data = TotalSalesQuantityByRDD, FUN = sum))
colnames(projectDataset) <-c("Year","Week","Total Sales Quantity")
projectDataset <- projectDataset[order(projectDataset$Year,projectDataset$Week), ]
projectDataset <- subset(projectDataset, Year != 2022)
projectDataset <- subset(projectDataset, Week != 53)

Next, aggregating total sales quantity by the requested delivery date facilitates further grouping and analysis, through the creation of Year, Month, and Week columns, aiding data summarization and long-term trend identification.For accuracy, the dataset is sorted by week and year with week 53 and year 2022 are filtered away as it’s out-of-scope of project. The dataset was transformed into time-series object to facilitate time-series analysis, subsequently to a tibble.By plotting QQ-plot and ts-plot to evaluate the distribution of sales amounts, we analyse the target dataset for trends and outliers.

TS <- ts(projectDataset)
Date = as.Date(paste(projectDataset$Year,projectDataset$Week, 1, sep = "-"), "%Y-%U-%u")
total_sales_tbl <- tibble(Date = Date, Sales = TS[, 3])
total_sales_ts <- as_tsibble(total_sales_tbl, index = Date)
plot.ts (total_sales_ts$Sales  , main =" Total Sales Quantity. "  ,ylab ="Total Sales Quantity by Month ",xlab =" Date ")

qqnorm (total_sales_ts$Sales - mean (total_sales_ts$Sales), col = "blue",main = 'Total Sales Quantity by Month')
qqline (total_sales_ts$Sales - mean (total_sales_ts$Sales), col = "red",lwd=1)

Finally, the project dataset’s descriptive statistics are computed and produced. The distribution and variability of the sales volumes are shown by these statistics, which include metrics like skewness, kurtosis, and the skewness-kurtosis ratio. The distribution is then examined using a bootstrap analysis, and any patterns of correlation over time are presented using the sales values’ autocorrelation function.

ProjectData_Description <- as.data.frame(t(c(summary(projectDataset$`Total Sales Quantity`), 
                             Skewness = skewness(projectDataset$`Total Sales Quantity`),
                             Kurtosis = kurtosis(projectDataset$`Total Sales Quantity`), 
                             SkewnessKurtosisRatio = SkewnessKurtosisRatio(projectDataset$`Total Sales Quantity`),
                             SD = sd(projectDataset$`Total Sales Quantity`,na.rm = TRUE))))
print(ProjectData_Description)

##   Min. 1st Qu.   Median     Mean  3rd Qu.     Max.   Skewness  Kurtosis
## 1    0  373380 530582.6 445790.1 622445.8 907050.7 -0.7073131 -0.817708
##   SkewnessKurtosisRatio       SD
## 1            -0.3216119 265721.1

descdist(projectDataset$`Total Sales Quantity`, boot = 5000)

## summary statistics
## ------
## min:  0   max:  907050.7 
## median:  530582.6 
## mean:  445790.1 
## estimated sd:  265721.1 
## estimated skewness:  -0.7411701 
## estimated kurtosis:  2.288355

acf(abs(total_sales_ts$Sales) , main ="Total Sales Quantity")

Modelling - ARIMA (AutoRegressive Integrated Moving Average)

In this section, we focus on the application of ARIMA (AutoRegressive Integrated Moving Average) modeling for time series forecasting. We aim to split our sales data into training and testing sets, develop an ARIMA model, and evaluate its performance using various accuracy metrics.

split_ts <- total_sales_ts %>% 
  dplyr::mutate(row_num = row_number()) %>%
  dplyr::mutate(set = if_else(row_num <= n() - 5, "train", "test")) %>%
  dplyr::group_by(set) %>%
  dplyr::group_split()
test <- split_ts[[1]]$Sales
train <- split_ts[[2]]$Sales
forecastedArima <- forecast(auto.arima(train, lambda = BoxCox.lambda(train)), h = 5)
method_arima <- "Arima"
rmse_arima <- rmse(test,forecastedArima$mean)
mae_arima<- mae(test,forecastedArima$mean)
mape_arima <- mape(test,forecastedArima$mean)
fa_arima <- (1-RMSPE(test,forecastedArima$mean))*100
training_time_arima <- microbenchmark(
  forecastedArima <- forecast(auto.arima(train, lambda = BoxCox.lambda(train)), h = 5),
  times = 10
)
memory_usage_arima <- object_size(forecastedArima)
ForecastAccuracy_arima<- matrix(data=c(method_arima,rmse_arima,mae_arima,mape_arima,fa_arima,(mean(training_time_arima$time) / 1e6),memory_usage_arima/1000), ncol=7)
colnames(ForecastAccuracy_arima) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_arima

##      Method  RMSE               MAE                MAPE  FA%               
## [1,] "Arima" "602268.861803512" "497606.690226635" "Inf" "44.6551621743664"
##      Training Time(s) Memory Usage(kB)
## [1,] "71.0147109"     "15.92"

Modelling - Artificial Neural Network (ANN)

This part focuses on preparing the data for training ANN for sales forecasting. We first split the dataset into 80% training set and 20% testing set. We then use the ‘splitDataRates()’ function to construct datasets with varying numbers of lagged values (2 to 5), creating different input configurations for the ANN models. Each of these datasets undergoes normalization, a crucial step where data values are scaled to a uniform range, enhancing the efficiency and accuracy of the neural network training. The datasets are subsequently divided into corresponding training and testing subsets. This preparation ensures that the neural networks have optimal data structures for effective learning and robust performance evaluation.

rates<-projectDataset[,3]
# Input/Output Matrices & Normalization 
# --- m = 2
training.dat<-round(0.8*length(rates))
testing.dat.start <- training.dat+1
splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
testing.dat.end <- round(nrow(splitRates_2))
splitRatesNormalised_2 <- as.data.frame(lapply(splitRates_2, normalise))
splitRates_2_train <- splitRates_2[1:training.dat,]
splitRates_2_test <- splitRates_2[testing.dat.start:testing.dat.end,]
splitRatesNormalised_2_train <- splitRatesNormalised_2[1:training.dat,]
splitRatesNormalised_2_test <- splitRatesNormalised_2[testing.dat.start:testing.dat.end,]
#--- m = 3
splitRates_3 <- as.data.frame(splitDataRates(rates, 3))
testing.dat.end <- round(nrow(splitRates_3))
splitRatesNormalised_3 <- as.data.frame(lapply(splitRates_3, normalise))
splitRates_3_train <- splitRates_3[1:training.dat,] 
splitRates_3_test <- splitRates_3[testing.dat.start:testing.dat.end,]
splitRatesNormalised_3_train <- splitRatesNormalised_3[1:training.dat,]
splitRatesNormalised_3_test <- splitRatesNormalised_3[testing.dat.start:testing.dat.end,]
#--- m = 4
splitRates_4 <- as.data.frame(splitDataRates(rates, 4))
testing.dat.end <- round(nrow(splitRates_4))
splitRatesNormalised_4 <- as.data.frame(lapply(splitRates_4, normalise))
splitRates_4_train <- splitRates_4[1:training.dat,] 
splitRates_4_test <- splitRates_4[testing.dat.start:testing.dat.end,]
splitRatesNormalised_4_train <- splitRatesNormalised_4[1:training.dat,]
splitRatesNormalised_4_test <- splitRatesNormalised_4[testing.dat.start:testing.dat.end,]
#--- m = 5
splitRates_5 <- as.data.frame(splitDataRates(rates, 5))
testing.dat.end <- round(nrow(splitRates_5))
splitRatesNormalised_5 <- as.data.frame(lapply(splitRates_5, normalise))
splitRates_5_train <- splitRates_5[1:training.dat,] 
splitRates_5_test <- splitRates_5[testing.dat.start:testing.dat.end,]
splitRatesNormalised_5_train <- splitRatesNormalised_5[1:training.dat,]
splitRatesNormalised_5_test <- splitRatesNormalised_5[testing.dat.start:testing.dat.end,]

In the following steps, multiple Artificial Neural Network (ANN) models are constructed, trained, and evaluated for forecasting the total sales by week and year. The process begins with setting a seed for reproducibility, followed by building and training ANNs with varying configurations of input nodes (2 or 3) and hidden layers (1, 3, or 5). Each model, such as PPIModel_2_1 with 2 inputs and 1 hidden layer, is trained on a corresponding normalized dataset, like splitRatesNormalised_2_train, that includes lagged sales values. In addition, we compute predictions on a normalized test dataset and compare them to the actual sales values, with the correlation coefficient to quantify the prediction accuracy. To interpret the results in a real-world context, we de-normalize the predictions, converting them back to their original scale. The performance of these models is then evaluated on test datasets through metrics like RMSE, MAE, MAPE, and Forecast Accuracy. Furthermore, we perform a distribution analysis of the predicted sales values, employing bootstrap techniques to estimate the distribution’s summary statistics. This analysis provides an in-depth look at the predictive distribution’s behavior, including its central tendency and variability. Finally, visualizations are created to compare actual versus predicted sales values, offering a visual assessment of each model’s predictive accuracy.

set.seed(69)
PPIModel_2_1 <- neuralnet(output ~ input1 + input2,
                          data = splitRatesNormalised_2_train,
                          hidden = c(1),
                          stepmax = 1e6) 
plot(PPIModel_2_1,rep = "best")

Neural Network Model Construction and Training:

The neural network model is a model with 2 input neurons, 1 hidden layer, and 1 output neuron. The neuralnet function from the neuralnet package was used to build the model, and it was trained using the splitRatesNormalised_2_train dataset. The hidden layer contains 1 neuron, and you can adjust the number of neurons in the hidden layer as needed. The stepmax parameter is set to 1e6, which is the maximum number of iterations for training.

PPIResults_2_1 <- neuralnet::compute(PPIModel_2_1, splitRatesNormalised_2_test[1:2])
testRates_2_min <- min(splitRates_2_train$output)
testRates_2_max <- max(splitRates_2_train$output)
PPIPrediction_2_1 <- unnormalise(PPIResults_2_1$net.result,
                                 testRates_2_min,
                                 testRates_2_max)

method_2_1 <- "NN (2 inputs & 1 hidden layer)"
rmse_2_1 <- rmse(splitRates_2_test$output, PPIPrediction_2_1)
mae_2_1 <- mae(splitRates_2_test$output, PPIPrediction_2_1)
mape_2_1 <- mape(splitRates_2_test$output, PPIPrediction_2_1)
fa_2_1 <- (1-RMSPE(splitRates_2_test$output, PPIPrediction_2_1))*100
training_time_2_1 <- microbenchmark(
  PPIResults_2_1 <- neuralnet::compute(PPIModel_2_1, splitRatesNormalised_2_test[1:2]),
  times = 10
)
memory_usage_2_1 <- object_size(PPIResults_2_1)
ForecastAccuracy_2_1<- matrix(data=c(method_2_1,rmse_2_1,mae_2_1,mape_2_1,fa_2_1,(mean(training_time_2_1$time) / 1e6),memory_usage_2_1/1000), ncol=7)
colnames(ForecastAccuracy_2_1) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_2_1

##      Method                           RMSE               MAE               
## [1,] "NN (2 inputs & 1 hidden layer)" "116188.192671719" "98795.8443003266"
##      MAPE                FA%              Training Time(s) Memory Usage(kB)
## [1,] "0.203829580068449" "80.01577815494" "0.3154209"      "3.648"

De-normalizing the Predictions involved using the un-normalise function to transform normalized predictions back to their original scale.

PPIPrediction_2_1_vector <- as.numeric(PPIPrediction_2_1[, 1])

The aforementioned steps were repeated with a combination of {1,3,5} hidden layers for both, 2 and 3 inputs models of ANN to analyse the best-performing configuration for ANN to accurately forecast the total sales quantity to the lowest-possible degree of error.The neural network model is a model with 2 input neurons, 3 hidden layer, and 1 output neuron.

set.seed(69)
PPIModel_2_3 <- neuralnet(output ~ input1 + input2,
                          data = splitRatesNormalised_2_train,
                          hidden = c(3),
                          stepmax = 1e6) 
plot(PPIModel_2_3,rep = "best")

PPIResults_2_3 <- neuralnet::compute(PPIModel_2_3, splitRatesNormalised_2_test[1:2])
testRates_2_min <- min(splitRates_2_train$output)
testRates_2_max <- max(splitRates_2_train$output)
PPIPrediction_2_3 <- unnormalise(PPIResults_2_3$net.result,
                                 testRates_2_min,
                                 testRates_2_max)
method_2_3 <- "NN (2 inputs & 3 hidden layer)"
rmse_2_3 <- rmse(splitRates_2_test$output, PPIPrediction_2_3)
mae_2_3 <- mae(splitRates_2_test$output, PPIPrediction_2_3)
mape_2_3 <- mape(splitRates_2_test$output, PPIPrediction_2_3)
fa_2_3 <- (1-RMSPE(splitRates_2_test$output, PPIPrediction_2_3))*100
training_time_2_3 <- microbenchmark(
  PPIResults_2_3 <- neuralnet::compute(PPIModel_2_3, splitRatesNormalised_2_test[1:2]),
  times = 10
)
memory_usage_2_3 <- object_size(PPIResults_2_3)
ForecastAccuracy_2_3<- matrix(data=c(method_2_3,rmse_2_3,mae_2_3,mape_2_3,fa_2_3,(mean(training_time_2_3$time) / 1e6),memory_usage_2_3/1000), ncol=7)
colnames(ForecastAccuracy_2_3) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_2_3

##      Method                           RMSE               MAE               
## [1,] "NN (2 inputs & 3 hidden layer)" "109770.735633743" "92097.0118171223"
##      MAPE                FA%                Training Time(s) Memory Usage(kB)
## [1,] "0.187884975978272" "80.6927311933513" "0.2407007"      "3.808"

PPIPrediction_2_3_vector <- as.numeric(PPIPrediction_2_3[, 1])

The neural network model is a model with 2 input neurons, 5 hidden layer, and 1 output neuron.

set.seed(69)
PPIModel_2_5 <- neuralnet(output ~ input1 + input2,
                          data = splitRatesNormalised_2_train,
                          hidden = c(5),
                          stepmax = 1e6) 
plot(PPIModel_2_5,rep = "best")

PPIResults_2_5 <- neuralnet::compute(PPIModel_2_5, splitRatesNormalised_2_test[1:2])
testRates_2_min <- min(splitRates_2_train$output)
testRates_2_max <- max(splitRates_2_train$output)
PPIPrediction_2_5 <- unnormalise(PPIResults_2_5$net.result,
                                 testRates_2_min,
                                 testRates_2_max)
method_2_5 <- "NN (2 inputs & 5 hidden layer)"
rmse_2_5 <- rmse(splitRates_2_test$output, PPIPrediction_2_5)
mae_2_5 <- mae(splitRates_2_test$output, PPIPrediction_2_5)
mape_2_5 <- mape(splitRates_2_test$output, PPIPrediction_2_5)
fa_2_5 <- (1-RMSPE(splitRates_2_test$output, PPIPrediction_2_5))*100
training_time_2_5 <- microbenchmark(
  PPIResults_2_5 <- neuralnet::compute(PPIModel_2_5, splitRatesNormalised_2_test[1:2]),
  times = 10
)
memory_usage_2_5 <- object_size(PPIResults_2_5)
ForecastAccuracy_2_5<- matrix(data=c(method_2_5,rmse_2_5,mae_2_5,mape_2_5,fa_2_5,(mean(training_time_2_5$time) / 1e6),memory_usage_2_5/1000), ncol=7)
colnames(ForecastAccuracy_2_5) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_2_5

##      Method                           RMSE               MAE               
## [1,] "NN (2 inputs & 5 hidden layer)" "117781.421046873" "96421.2310585965"
##      MAPE                FA%                Training Time(s) Memory Usage(kB)
## [1,] "0.196482880025068" "79.3388454516574" "0.181541"       "3.968"

PPIPrediction_2_5_vector <- as.numeric(PPIPrediction_2_5[, 1])

The neural network model is a model with 3 input neurons, 1 hidden layer, and 1 output neuron.

set.seed(69)
PPIModel_3_1 <- neuralnet(output ~ input1 + input2 + input3,
                          data = splitRatesNormalised_3_train,
                          hidden = c(1),
                          stepmax = 1e6)
plot(PPIModel_3_1,rep = "best")

PPIResults_3_1 <- neuralnet::compute(PPIModel_3_1, splitRatesNormalised_3_test[1:3])
testRates_3_min <- min(splitRates_3_train$output)
testRates_3_max <- max(splitRates_3_train$output)
PPIPrediction_3_1 <- unnormalise(PPIResults_3_1$net.result,
                                 testRates_3_min,
                                 testRates_3_max)
method_3_1 <- "NN (3 inputs & 1 hidden layer)"
rmse_3_1 <- rmse(splitRates_3_test$output, PPIPrediction_3_1)
mae_3_1 <- mae(splitRates_3_test$output, PPIPrediction_3_1)
mape_3_1 <- mape(splitRates_3_test$output, PPIPrediction_3_1)
fa_3_1 <- (1-RMSPE(splitRates_3_test$output, PPIPrediction_3_1))*100
training_time_3_1 <- microbenchmark(
  PPIResults_3_1 <- neuralnet::compute(PPIModel_3_1, splitRatesNormalised_3_test[1:3]),
  times = 10
)
memory_usage_3_1 <- object_size(PPIResults_3_1)
ForecastAccuracy_3_1<- matrix(data=c(method_3_1,rmse_3_1,mae_3_1,mape_3_1,fa_3_1,(mean(training_time_3_1$time) / 1e6),memory_usage_3_1/1000), ncol=7)
colnames(ForecastAccuracy_3_1) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_3_1

##      Method                           RMSE               MAE              
## [1,] "NN (3 inputs & 1 hidden layer)" "121888.830759173" "105297.44217686"
##      MAPE                FA%                Training Time(s) Memory Usage(kB)
## [1,] "0.219584664438057" "79.0628145131386" "0.2152011"      "3.68"

PPIPrediction_3_1_vector <- as.numeric(PPIPrediction_3_1[, 1])

The neural network model is a model with 3 input neurons, 3 hidden layer, and 1 output neuron.

set.seed(69)
PPIModel_3_3 <- neuralnet(output ~ input1 + input2 + input3,
                          data = splitRatesNormalised_3_train,
                          hidden = c(3),
                          stepmax = 1e6)
plot(PPIModel_3_3,rep = "best")

PPIResults_3_3 <- neuralnet::compute(PPIModel_3_3, splitRatesNormalised_3_test[1:3])
testRates_3_min <- min(splitRates_3_train$output)
testRates_3_max <- max(splitRates_3_train$output)
PPIPrediction_3_3 <- unnormalise(PPIResults_3_3$net.result,
                                 testRates_3_min,
                                 testRates_3_max)
method_3_3 <- "NN (3 inputs & 3 hidden layer)"
rmse_3_3 <- rmse(splitRates_3_test$output, PPIPrediction_3_3)
mae_3_3 <- mae(splitRates_3_test$output, PPIPrediction_3_3)
mape_3_3 <- mape(splitRates_3_test$output, PPIPrediction_3_3)
fa_3_3 <- (1-RMSPE(splitRates_3_test$output, PPIPrediction_3_3))*100
training_time_3_3 <- microbenchmark(
  PPIResults_3_3 <- neuralnet::compute(PPIModel_3_3, splitRatesNormalised_3_test[1:3]),
  times = 10
)
memory_usage_3_3 <- object_size(PPIResults_3_3)
ForecastAccuracy_3_3<- matrix(data=c(method_3_3,rmse_3_3,mae_3_3,mape_3_3,fa_3_3,(mean(training_time_3_3$time) / 1e6),memory_usage_3_3/1000), ncol=7)
colnames(ForecastAccuracy_3_3) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_3_3

##      Method                           RMSE               MAE               
## [1,] "NN (3 inputs & 3 hidden layer)" "129421.715588937" "106294.849071009"
##      MAPE                FA%                Training Time(s) Memory Usage(kB)
## [1,] "0.218828117427941" "77.5016397472226" "0.2662711"      "3.824"

PPIPrediction_3_3_vector <- as.numeric(PPIPrediction_3_3[, 1])

The neural network model is a model with 3 input neurons, 5 hidden layer, and 1 output neuron.

set.seed(69)
PPIModel_3_5 <- neuralnet(output ~ input1 + input2 + input3,
                          data = splitRatesNormalised_3_train,
                          hidden = c(5),
                          stepmax = 1e6)
plot(PPIModel_3_5,rep = "best")

PPIResults_3_5 <- neuralnet::compute(PPIModel_3_5, splitRatesNormalised_3_test[1:3])
testRates_3_min <- min(splitRates_3_train$output)
testRates_3_max <- max(splitRates_3_train$output)
PPIPrediction_3_5 <- unnormalise(PPIResults_3_3$net.result,
                                 testRates_3_min,
                                 testRates_3_max)
method_3_5 <- "NN (3 inputs & 5 hidden layer)"
rmse_3_5 <- rmse(splitRates_3_test$output, PPIPrediction_3_5)
mae_3_5 <- mae(splitRates_3_test$output, PPIPrediction_3_5)
mape_3_5 <- mape(splitRates_3_test$output, PPIPrediction_3_5)
fa_3_5 <- (1-RMSPE(splitRates_3_test$output, PPIPrediction_3_5))*100
training_time_3_5 <- microbenchmark(
  PPIResults_3_5 <- neuralnet::compute(PPIModel_3_5, splitRatesNormalised_3_test[1:3]),
  times = 10
)
memory_usage_3_5 <- object_size(PPIResults_3_5)
ForecastAccuracy_3_5<- matrix(data=c(method_3_5,rmse_3_5,mae_3_5,mape_3_5,fa_3_5,(mean(training_time_3_5$time) / 1e6),memory_usage_3_5/1000), ncol=7)
colnames(ForecastAccuracy_3_5) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_3_5

##      Method                           RMSE               MAE               
## [1,] "NN (3 inputs & 5 hidden layer)" "129421.715588937" "106294.849071009"
##      MAPE                FA%                Training Time(s) Memory Usage(kB)
## [1,] "0.218828117427941" "77.5016397472226" "0.1796909"      "3.968"

PPIPrediction_3_5_vector <- as.numeric(PPIPrediction_3_5[, 1])

Modelling - XG-Boost

In this section, we implement an XG-Boost (Extreme Gradient Boosting) model to forecast sales quantities. XG-Boost is a powerful machine learning algorithm known for its high performance and accuracy in predictive modeling tasks.

First, we enhance the dataset by adding lag features (lag_1 and lag_2) for the Total Sales Quantity. These lagged variables help capture temporal dependencies in the data. Any rows with missing values resulting from the lag operation are removed to ensure a complete dataset.

projectDataset <- projectDataset %>% 
  mutate(
    lag_1 = lag(`Total Sales Quantity`, 1),
    lag_2 = lag(`Total Sales Quantity`, 2)
  ) %>% 
  na.omit()

Next, we separate the target variable (Total Sales Quantity) from the feature variables. The data is then split into training and testing sets using an 80-20 split, ensuring that the model is trained on a majority of the data while the remaining data is used for validation.

target <- projectDataset$`Total Sales Quantity`
features <- projectDataset %>% select(-`Total Sales Quantity`)
set.seed(123)
train_index <- createDataPartition(target, p = 0.8, list = FALSE)
train_features <- features[train_index, ]
test_features <- features[-train_index, ]
train_target <- target[train_index]
test_target <- target[-train_index]

We convert the training and testing sets into xgb.DMatrix objects, which are optimized data structures for XG-Boost.The XG-Boost model is then trained with the specified parameters, including the objective function reg:squarederror for regression tasks. The model is trained for 100 rounds.

dtrain <- xgb.DMatrix(data = as.matrix(train_features), label = train_target)
dtest <- xgb.DMatrix(data = as.matrix(test_features), label = test_target)
params <- list(objective = "reg:squarederror")
xgb_model <- xgboost(data = dtrain, params = params, nrounds = 100)

## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762

We use the trained model to predict the sales quantities on the test set and evaluate the model’s performance using metrics such as RMSE, MAE, MAPE, and Forecast Accuracy (FA%). Additionally, we measure the training time and memory usage of the model to assess its computational efficiency.

predictions <- predict(xgb_model, dtest)
method_XGB <- "XG-Boost"
rmse_XGB <- rmse(test_target, predictions)
mae_XGB <- mae(test_target, predictions)
mape_XGB <- mape(test_target, predictions)
fa_XGB <- mean(abs(predictions - test_target) <= 0.1 * abs(test_target))*100
training_time_XGB <- microbenchmark(
  xgb_model <- xgboost(data = dtrain, params = params, nrounds = 100),
  times = 10
)

## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762 
## [1]  train-rmse:376032.373561 
## [2]  train-rmse:271659.977103 
## [3]  train-rmse:197014.302295 
## [4]  train-rmse:143986.715608 
## [5]  train-rmse:106054.575162 
## [6]  train-rmse:78708.401174 
## [7]  train-rmse:59261.142979 
## [8]  train-rmse:45036.612500 
## [9]  train-rmse:34619.132205 
## [10] train-rmse:26960.944943 
## [11] train-rmse:21344.390926 
## [12] train-rmse:17228.330354 
## [13] train-rmse:14078.180237 
## [14] train-rmse:11728.092527 
## [15] train-rmse:9820.438399 
## [16] train-rmse:8256.381790 
## [17] train-rmse:7067.793431 
## [18] train-rmse:6043.038235 
## [19] train-rmse:5175.317870 
## [20] train-rmse:4472.194958 
## [21] train-rmse:3866.337138 
## [22] train-rmse:3364.881579 
## [23] train-rmse:2934.395574 
## [24] train-rmse:2584.133997 
## [25] train-rmse:2302.632269 
## [26] train-rmse:1996.298629 
## [27] train-rmse:1772.401064 
## [28] train-rmse:1551.709911 
## [29] train-rmse:1398.528044 
## [30] train-rmse:1239.313287 
## [31] train-rmse:1079.140372 
## [32] train-rmse:998.858223 
## [33] train-rmse:873.651336 
## [34] train-rmse:787.968532 
## [35] train-rmse:736.516603 
## [36] train-rmse:670.130904 
## [37] train-rmse:622.521284 
## [38] train-rmse:549.534645 
## [39] train-rmse:485.582805 
## [40] train-rmse:451.473263 
## [41] train-rmse:417.240133 
## [42] train-rmse:394.896771 
## [43] train-rmse:361.479710 
## [44] train-rmse:327.863578 
## [45] train-rmse:315.204414 
## [46] train-rmse:296.563036 
## [47] train-rmse:285.233937 
## [48] train-rmse:275.000859 
## [49] train-rmse:248.090188 
## [50] train-rmse:234.040263 
## [51] train-rmse:212.529709 
## [52] train-rmse:197.656353 
## [53] train-rmse:184.450461 
## [54] train-rmse:177.629293 
## [55] train-rmse:161.685036 
## [56] train-rmse:155.144869 
## [57] train-rmse:135.290302 
## [58] train-rmse:126.246061 
## [59] train-rmse:117.510775 
## [60] train-rmse:103.832299 
## [61] train-rmse:96.568591 
## [62] train-rmse:90.807070 
## [63] train-rmse:80.562097 
## [64] train-rmse:76.259561 
## [65] train-rmse:71.083827 
## [66] train-rmse:65.854179 
## [67] train-rmse:59.134817 
## [68] train-rmse:54.884886 
## [69] train-rmse:51.492676 
## [70] train-rmse:47.427852 
## [71] train-rmse:44.066420 
## [72] train-rmse:40.105875 
## [73] train-rmse:37.341324 
## [74] train-rmse:34.357599 
## [75] train-rmse:32.288832 
## [76] train-rmse:31.037969 
## [77] train-rmse:28.663811 
## [78] train-rmse:25.619637 
## [79] train-rmse:22.136454 
## [80] train-rmse:20.571403 
## [81] train-rmse:19.237834 
## [82] train-rmse:18.066685 
## [83] train-rmse:16.380865 
## [84] train-rmse:15.505279 
## [85] train-rmse:14.128864 
## [86] train-rmse:13.487362 
## [87] train-rmse:12.397692 
## [88] train-rmse:11.873191 
## [89] train-rmse:10.957981 
## [90] train-rmse:10.535849 
## [91] train-rmse:9.898897 
## [92] train-rmse:9.573741 
## [93] train-rmse:8.557649 
## [94] train-rmse:7.541542 
## [95] train-rmse:6.675287 
## [96] train-rmse:6.031104 
## [97] train-rmse:5.455125 
## [98] train-rmse:5.257288 
## [99] train-rmse:4.574771 
## [100]    train-rmse:4.367762

memory_usage_XGB <- object_size(xgb_model)

Finally, we compile the results into a matrix, ForecastAccuracy_XGBoost, providing a comprehensive overview of the XG-Boost model’s forecasting performance and resource requirements.

ForecastAccuracy_XGBoost <- matrix(data=c(method_XGB, rmse_XGB, mae_XGB, mape_XGB, fa_XGB, (mean(training_time_XGB$time) / 1e6), memory_usage_XGB/1000), ncol=7)
colnames(ForecastAccuracy_XGBoost) <- c("Method","RMSE", "MAE", "MAPE","FA%","Training Time(s)","Memory Usage(kB)")
ForecastAccuracy_XGBoost

##      Method     RMSE               MAE                MAPE               
## [1,] "XG-Boost" "13402.4948526562" "10951.5670764008" "0.127369184201445"
##      FA%                Training Time(s) Memory Usage(kB)
## [1,] "83.3333333333333" "434.3503509"    "222.192"

Forecast Accuracy Evaluation

In this section, we evaluate and compare the performance of various forecasting models. We compile the results of multiple models into a single table and identify the model with the highest forecast accuracy.

ModelPerformanceTable<-as.data.frame(rbind(
                             ForecastAccuracy_arima,
                             ForecastAccuracy_2_1,
                             ForecastAccuracy_2_3,
                             ForecastAccuracy_2_5,
                             ForecastAccuracy_3_1,
                             ForecastAccuracy_3_3,
                             ForecastAccuracy_3_5,
                             ForecastAccuracy_XGBoost))
print.data.frame(ModelPerformanceTable)

##                           Method             RMSE              MAE
## 1                          Arima 602268.861803512 497606.690226635
## 2 NN (2 inputs & 1 hidden layer) 116188.192671719 98795.8443003266
## 3 NN (2 inputs & 3 hidden layer) 109770.735633743 92097.0118171223
## 4 NN (2 inputs & 5 hidden layer) 117781.421046873 96421.2310585965
## 5 NN (3 inputs & 1 hidden layer) 121888.830759173  105297.44217686
## 6 NN (3 inputs & 3 hidden layer) 129421.715588937 106294.849071009
## 7 NN (3 inputs & 5 hidden layer) 129421.715588937 106294.849071009
## 8                       XG-Boost 13402.4948526562 10951.5670764008
##                MAPE              FA% Training Time(s) Memory Usage(kB)
## 1               Inf 44.6551621743664       71.0147109            15.92
## 2 0.203829580068449   80.01577815494        0.3154209            3.648
## 3 0.187884975978272 80.6927311933513        0.2407007            3.808
## 4 0.196482880025068 79.3388454516574         0.181541            3.968
## 5 0.219584664438057 79.0628145131386        0.2152011             3.68
## 6 0.218828117427941 77.5016397472226        0.2662711            3.824
## 7 0.218828117427941 77.5016397472226        0.1796909            3.968
## 8 0.127369184201445 83.3333333333333      434.3503509          222.192

BestModel <- ModelPerformanceTable[which.max(ModelPerformanceTable$`FA%`), ]
cat("The model with the highest forecast accuracy metrics is", BestModel$Method, 
    "with a forecast accuracy against sample test dataset of", BestModel$`FA%`, "%")

## The model with the highest forecast accuracy metrics is XG-Boost with a forecast accuracy against sample test dataset of 83.3333333333333 %

Forecast for Next 16 Weeks

This section implements the forecasting process based on the model with the highest forecast accuracy. The approach dynamically selects the most accurate model from a range of candidates and generates a 16-week sales forecast.

if (BestModel$Method == "Arima"){
  forecastdataset <- auto.arima(TotalSalesQuantityByRDD$`projectData$\`Total Sales Quantity\``)
  forecast_arima <- forecast(forecastdataset, h = 16)
  plot(forecast_arima)

  }else if (BestModel$Method == "NN (2 inputs & 1 hidden layer)"){
    num_simulations <- 5000
    prediction30 <- matrix(nrow = 16, ncol = num_simulations)
    last_known_data <- tail(rates, 5)
    training.dat <- round(0.8 * length(rates))
    splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
    testRates_2_min <- min(splitRates_2$output[1:training.dat])
    testRates_2_max <- max(splitRates_2$output[1:training.dat])
    for (month in 1:16) {
      month_predictions <- numeric(num_simulations)
      for (sim in 1:num_simulations) {
        noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
        normalized_vector <- normalise(noisy_input)
        normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
        prediction <- neuralnet::compute(PPIModel_2_1, normalized_input)$net.result
        prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
        month_predictions[sim] <- prediction_unnormalized
      }
      prediction30[month, ] <- month_predictions
    }
    average_predictions <- apply(prediction30, 1, mean)
    dates_forecast <- seq(from = max(projectData$`Requested deliv.date`), by = "week", length.out = 16)
    forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
    p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
      geom_line(color = "green", size = 0.5) +
      geom_point(color = "red", size = 1) +
      geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
      labs(title = "16-Week Total Sales Forecast", x = "Date", y = "Forecasted Sales") +
      theme_minimal()
    plotly::ggplotly(p, tooltip = c("x", "y"))
    

    
    
    }else if (BestModel$Method == "NN (2 inputs & 3 hidden layer)"){
      num_simulations <- 5000
      prediction30 <- matrix(nrow = 16, ncol = num_simulations)
      last_known_data <- tail(rates, 5)
      training.dat <- round(0.8 * length(rates))
      splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
      testRates_2_min <- min(splitRates_2$output[1:training.dat])
      testRates_2_max <- max(splitRates_2$output[1:training.dat])
      for (month in 1:16) {
        month_predictions <- numeric(num_simulations)
        for (sim in 1:num_simulations) {
          noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
          normalized_vector <- normalise(noisy_input)
          normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
          prediction <- neuralnet::compute(PPIModel_2_3, normalized_input)$net.result
          prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
          month_predictions[sim] <- prediction_unnormalized
        }
        prediction30[month, ] <- month_predictions
      }
      average_predictions <- apply(prediction30, 1, mean)
      dates_forecast <- seq(from = max(projectData$`Requested deliv.date`), by = "week", length.out = 16)
      forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
      p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
        geom_line(color = "green", size = 0.5) +
        geom_point(color = "red", size = 1) +
        geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
        labs(title = "16-Week Total Sales Forecast", x = "Date", y = "Forecasted Sales") +
        theme_minimal()
      plotly::ggplotly(p, tooltip = c("x", "y"))

      
      }else if (BestModel$Method == "NN (2 inputs & 5 hidden layer)"){
        num_simulations <- 5000
        prediction30 <- matrix(nrow = 16, ncol = num_simulations)
        last_known_data <- tail(rates, 5)
        training.dat <- round(0.8 * length(rates))
        splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
        testRates_2_min <- min(splitRates_2$output[1:training.dat])
        testRates_2_max <- max(splitRates_2$output[1:training.dat])
        for (month in 1:16) {
          month_predictions <- numeric(num_simulations)
          for (sim in 1:num_simulations) {
            noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
            normalized_vector <- normalise(noisy_input)
            normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
            prediction <- neuralnet::compute(PPIModel_2_5, normalized_input)$net.result
            prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
            month_predictions[sim] <- prediction_unnormalized
          }
          prediction30[month, ] <- month_predictions
        }
        average_predictions <- apply(prediction30, 1, mean)
        dates_forecast <- seq(from = max(projectData$`Requested deliv.date`), by = "week", length.out = 16)
        forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
        p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
          geom_line(color = "green", size = 0.5) +
          geom_point(color = "red", size = 1) +
          geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
          labs(title = "16-Week Total Sales Forecast", x = "Date", y = "Forecasted Sales") +
          theme_minimal()
        plotly::ggplotly(p, tooltip = c("x", "y"))

        
        }else if (BestModel$Method == "NN (3 inputs & 1 hidden layer)"){
          num_simulations <- 5000
        prediction30 <- matrix(nrow = 16, ncol = num_simulations)
        last_known_data <- tail(rates, 5)
        training.dat <- round(0.8 * length(rates))
        splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
        testRates_2_min <- min(splitRates_2$output[1:training.dat])
        testRates_2_max <- max(splitRates_2$output[1:training.dat])
        for (month in 1:16) {
          month_predictions <- numeric(num_simulations)
          for (sim in 1:num_simulations) {
            noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
            normalized_vector <- normalise(noisy_input)
            normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
            prediction <- neuralnet::compute(PPIModel_3_1, normalized_input)$net.result
            prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
            month_predictions[sim] <- prediction_unnormalized
          }
          prediction30[month, ] <- month_predictions
        }
        average_predictions <- apply(prediction30, 1, mean)
        dates_forecast <- seq(from = max(projectData$`Requested deliv.date`), by = "week", length.out = 16)
        forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
        p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
          geom_line(color = "green", size = 0.5) +
          geom_point(color = "red", size = 1) +
          geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
          labs(title = "16-Week Total Sales Forecast", x = "Date", y = "Forecasted Sales") +
          theme_minimal()
        plotly::ggplotly(p, tooltip = c("x", "y"))

        
          }else if (BestModel$Method == "NN (3 inputs & 3 hidden layer)"){
          num_simulations <- 5000
        prediction30 <- matrix(nrow = 16, ncol = num_simulations)
        last_known_data <- tail(rates, 5)
        training.dat <- round(0.8 * length(rates))
        splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
        testRates_2_min <- min(splitRates_2$output[1:training.dat])
        testRates_2_max <- max(splitRates_2$output[1:training.dat])
        for (month in 1:16) {
          month_predictions <- numeric(num_simulations)
          for (sim in 1:num_simulations) {
            noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
            normalized_vector <- normalise(noisy_input)
            normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
            prediction <- neuralnet::compute(PPIModel_3_3, normalized_input)$net.result
            prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
            month_predictions[sim] <- prediction_unnormalized
          }
          prediction30[month, ] <- month_predictions
        }
        average_predictions <- apply(prediction30, 1, mean)
        dates_forecast <- seq(from = max(projectData$`Requested deliv.date`), by = "week", length.out = 16)
        forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
        p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
          geom_line(color = "green", size = 0.5) +
          geom_point(color = "red", size = 1) +
          geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
          labs(title = "16-Week Total Sales Forecast", x = "Date", y = "Forecasted Sales") +
          theme_minimal()
        plotly::ggplotly(p, tooltip = c("x", "y"))

            }else if (BestModel$Method == "NN (3 inputs & 5 hidden layer)"){
              num_simulations <- 5000
              prediction30 <- matrix(nrow = 16, ncol = num_simulations)
              last_known_data <- tail(rates, 5)
              training.dat <- round(0.8 * length(rates))
              splitRates_2 <- as.data.frame(splitDataRates(rates, 2))
              testRates_2_min <- min(splitRates_2$output[1:training.dat])
              testRates_2_max <- max(splitRates_2$output[1:training.dat])
              for (month in 1:16) {
                month_predictions <- numeric(num_simulations)
                for (sim in 1:num_simulations) {
                  noisy_input <- last_known_data * (1 + rnorm(5, mean = 0, sd = 0.01))
                  normalized_vector <- normalise(noisy_input)
                  normalized_input <- data.frame(input1 = normalized_vector[1], input2 = normalized_vector[2])
                  prediction <- neuralnet::compute(PPIModel_3_5, normalized_input)$net.result
                  prediction_unnormalized <- unnormalise(prediction, testRates_2_min, testRates_2_max)
                  month_predictions[sim] <- prediction_unnormalized
                }
                prediction30[month, ] <- month_predictions
              }
              average_predictions <- apply(prediction30, 1, mean)
              dates_forecast <- seq(from = max(projectData$`Requested deliv.date`), by = "week", length.out = 16)
              forecast_df <- data.frame(Date = as.Date(dates_forecast), Forecast = average_predictions)
              p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
                geom_line(color = "green", size = 0.5) +
                geom_point(color = "red", size = 1) +
                geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
                labs(title = "16-Week Total Sales Forecast", x = "Date", y = "Forecasted Sales") +
                theme_minimal()
              plotly::ggplotly(p, tooltip = c("x", "y"))

              }else if (BestModel$Method == "XG-Boost"){
                data <-projectDataset$`Total Sales Quantity`
                data_matrix <- xgb.DMatrix(data = as.matrix(data), label = data)
                params <- list(objective = "reg:squarederror",eta = 0.1,max_depth = 6)
                xgb_model <- xgb.train(params, data_matrix, nrounds = 100)
                predictions <- predict(xgb_model, as.matrix(data[(length(data) - 11):length(data)]))
                plot(predictions)
                dates <- seq.Date(from = as.Date(tail(projectData$`Requested deliv.date`,1)), by = "week", length.out = length(predictions))
                forecast_df <- data.frame(Date = dates, Forecast = predictions)
                cl <- mean(forecast_df$Forecast)
                ucl <- cl + 3 * sd(forecast_df$Forecast)
                lcl <- cl - 3 * sd(forecast_df$Forecast)
                forecast_df <- forecast_df %>%
                  mutate(CL = cl, UCL = ucl, LCL = lcl)
                
                # Plot the forecast using ggplot2 and make it interactive with plotly
                p <- ggplot(forecast_df, aes(x = Date, y = Forecast)) +
                  geom_line(color = "green", size = 0.5) +
                  geom_point(color = "red", size = 1) +
                  geom_smooth(method = "lm", aes(group = 1), color = "turquoise", fill = "blue", alpha = 0.05, linetype = "dashed", size = 0.5) +
                  geom_hline(aes(yintercept = CL), color = "blue", linetype = "solid", size = 0.5) +
                  geom_hline(aes(yintercept = UCL), color = "red", linetype = "dashed", size = 0.5) +
                  geom_hline(aes(yintercept = LCL), color = "red", linetype = "dashed", size = 0.5) +
                  labs(title = "16-Week Total Sales Forecast with SPC", x = "Date", y = "Forecasted Sales") +
                  theme_minimal()
                
                plotly::ggplotly(p, tooltip = c("x", "y"))
}else {cat("Error 404 : Information Not Found")}

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `geom_smooth()` using formula = 'y ~ x'

Conclusion & Future Works

This project represents a significant advancement in inventory management for the FMCG sector, leveraging machine learning to enhance demand forecasting and operational efficiency. It integrates data processing, model design, feature engineering, and deployment to create a robust system capable of adapting to complex inventory dynamics. The core innovation lies in a hybrid modeling approach combining traditional statistical methods with machine learning, offering insights into inventory patterns influenced by sales trends, economic conditions, and promotions. A cloud-based, real-time predictive system ensures informed, timely inventory decisions, reducing stockouts and overstock scenarios. Integration with ERP and CRM systems enhances supply chain visibility and control, supporting compliance and fraud detection. Future work includes algorithm enhancement, expansion to other sectors, and user experience improvements, aiming to refine capabilities and explore new opportunities in inventory management.