AIB TERM PAPER

ANN ANALYSIS OF INVENTORY QUANTITY

Accurate inventory management is crucial for businesses to optimize costs, minimize stockouts, and improve customer satisfaction. Traditionally, inventory forecasting relies on historical data and statistical models. However, with the increasing availability of data and advancements in machine learning, Artificial Neural Networks (ANNs) offer a powerful alternative for predicting inventory quantity.

This project explores the application of ANNs for inventory quantity analysis using R. We utilize a real-world dataset containing information about various factors potentially influencing inventory levels, such as store location, product brand, vendor details, purchase price, and product classification.

Through this analysis, we aim to:

  1. Develop an ANN model to predict inventory quantity based on the provided data.

  2. Evaluate the model’s performance by comparing predicted quantities with actual inventory levels.

  3. Compare the effectiveness of different ANN architectures by exploring models with varying numbers of hidden neurons.

  4. Gain insights into the relationship between the chosen features and inventory quantity.

The results of this project will demonstrate the potential of ANNs for inventory forecasting and provide valuable insights for businesses seeking to improve their inventory management strategies.

inventorydata <- read.csv("C:/AMRITA SCHOOL OF BUSINESS/Trimister 6/AIB/TERM PAPER/TERM PAPER DATASET 2.csv")
str(inventorydata)
'data.frame':   10000 obs. of  7 variables:
 $ Store         : int  69 30 34 1 76 5 1 30 34 1 ...
 $ PurchasePrice : num  35.71 9.35 9.41 9.35 21.32 ...
 $ Quantity      : int  6 4 5 6 5 6 12 48 5 23 ...
 $ Dollars       : num  214.3 37.4 47 56.1 106.6 ...
 $ Classification: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Brand         : int  8412 5255 5215 5255 2034 3348 8358 4903 3782 4233 ...
 $ VendorNumber  : int  105 4466 4466 4466 388 480 480 480 480 480 ...

This code reads the dataset stored in the CSV file specified by the path and assigns it to the variable inventorydata. The str() function is used to display the structure of the dataset, showing the data types and the first few entries.

normalize <- function(x) { 
  return((x - min(x)) / (max(x) - min(x)))
}

This code defines a normalization function that scales the values of a numeric vector x to fall between 0 and 1.

invdata_normal <- as.data.frame(lapply(inventorydata,normalize))

Here, every column of the dataset, including the target Quantity, is normalized using the normalize function defined earlier (all seven columns are numeric). The result is stored in a new dataframe invdata_normal.
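As a rough illustration of what min-max scaling does, the sketch below applies the same normalize function to the first ten Quantity values shown in the str() output and then maps the result back to the original units. The denormalize helper is a hypothetical addition, not part of the original analysis; it is included only on the assumption that predictions may later need to be read in original units.

```r
# Min-max scaling as defined in the paper
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical inverse helper (not in the original analysis):
# maps scaled values back to the units of the original vector
denormalize <- function(x_scaled, original) {
  x_scaled * (max(original) - min(original)) + min(original)
}

# First ten Quantity values from the str() output
quantity <- c(6, 4, 5, 6, 5, 6, 12, 48, 5, 23)

scaled <- normalize(quantity)
range(scaled)                       # smallest value maps to 0, largest to 1

restored <- denormalize(scaled, quantity)
all.equal(restored, quantity)       # TRUE
```

If a column is constant, max(x) - min(x) is zero and normalize returns NaN, so such a column would need separate handling.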

summary(invdata_normal$Quantity)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.000000 0.004634 0.010195 0.015374 0.010195 1.000000 
summary(inventorydata$Quantity)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    6.00   12.00   17.59   12.00 1080.00 
invdata_train <- invdata_normal[1:7500, ]
invdata_test <- invdata_normal[7501:10000, ]

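This code splits the normalized dataset into training and testing sets. The first 7500 rows are used for training (invdata_train), and the remaining 2500 rows are used for testing (invdata_test). Because the split follows row order rather than random sampling, it implicitly assumes the rows are not sorted in any meaningful way.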
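If the rows of the file happen to be ordered (for example by store or classification), a sequential split could give the training and testing sets different distributions. A minimal sketch of a random 75/25 split is shown below. It is an alternative under that assumption, not the split used for the results in this paper, and toy_data is a hypothetical stand-in for invdata_normal.

```r
set.seed(12345)  # fixed seed so the random split is repeatable

# Hypothetical stand-in for invdata_normal: 10000 rows, one column
toy_data <- data.frame(Quantity = runif(10000))

n <- nrow(toy_data)
train_idx <- sample(n, size = 0.75 * n)  # 7500 row numbers drawn at random

toy_train <- toy_data[train_idx, , drop = FALSE]
toy_test  <- toy_data[-train_idx, , drop = FALSE]

nrow(toy_train)  # 7500
nrow(toy_test)   # 2500
```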

library(neuralnet)
Warning: package 'neuralnet' was built under R version 4.2.3

This line loads the neuralnet package, which provides functions for building and training neural networks.

set.seed(12345) # to guarantee repeatable results
invdata_model <- neuralnet(Quantity ~ Store +
                              Brand + VendorNumber + PurchasePrice + Dollars + 
                              Classification,
                              data = invdata_train)

This code builds a neural network model using the neuralnet function. It specifies Quantity as the target variable and includes other variables (Store, Brand, VendorNumber, PurchasePrice, Dollars, and Classification) as predictors. Because the hidden argument is left at its default of 1, the network has a single hidden layer with one neuron. The model is trained using the training data.

plot(invdata_model)

This code generates a plot of the neural network model (invdata_model), which provides a visual representation of the network architecture and connections between nodes.

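# obtain model results, passing only the six predictor columns
# (the original call passed all seven columns, including the target Quantity)
model_results <- compute(invdata_model,
                         invdata_test[, c("Store", "Brand", "VendorNumber",
                                          "PurchasePrice", "Dollars", "Classification")])
# obtain predicted quantity values (on the normalized 0-1 scale)
predicted_strength <- model_results$net.result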

These lines compute the model results for the testing set and extract the predicted quantities. The compute function applies the trained neural network model (invdata_model) to the testing data (invdata_test) to obtain predictions.

# examine the correlation between predicted and actual values
cor(predicted_strength, invdata_test$Quantity)
          [,1]
[1,] 0.8532024

This code calculates the correlation between the predicted quantities (predicted_strength) and the actual quantities from the testing set (invdata_test$Quantity). A correlation closer to 1 indicates that the predictions track the actual values more closely, although correlation alone does not measure the size of the prediction errors.
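Correlation measures how closely predictions move together with the actual values, but it is unaffected by a constant offset or rescaling of the predictions. The sketch below uses hypothetical, deliberately biased predictions to show that a correlation of 1 can coexist with large errors, and computes two error measures (RMSE and MAE) that could be reported alongside it. The vectors are illustrative and are not results from the models in this paper.

```r
# First ten Quantity values from the str() output
actual <- c(6, 4, 5, 6, 5, 6, 12, 48, 5, 23)

# Hypothetical predictions: perfectly linear in the actual values, but biased
predicted <- 2 * actual + 10

correlation <- cor(predicted, actual)        # 1, up to floating-point error
rmse <- sqrt(mean((predicted - actual)^2))   # root mean squared error
mae  <- mean(abs(predicted - actual))        # mean absolute error

c(correlation = correlation, rmse = rmse, mae = mae)
```

Here the mean absolute error is 22 units even though the correlation is perfect, which is why an error measure on the original scale is a useful complement.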

set.seed(12345) # to guarantee repeatable results
invdata_model_2 <- neuralnet(Quantity ~ Store +
                              Brand + VendorNumber + PurchasePrice + Dollars + 
                              Classification,
                              data = invdata_train, hidden = 5)

This code builds another neural network model (invdata_model_2) with the same predictors as the first model but with a single hidden layer of 5 neurons (hidden = 5). This is an alternative architecture to explore different model configurations.
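In neuralnet, hidden takes a vector with one entry per hidden layer, each entry giving the number of neurons in that layer, so hidden = 5 means one layer of five neurons while hidden = c(5, 5) would mean two layers. The sketch below counts the weights of a fully connected network for a few settings to make the difference concrete. count_weights is a hypothetical helper written for this illustration and assumes one bias term per neuron.

```r
# Hypothetical helper: number of weights (including biases) in a fully
# connected feed-forward network with the given hidden layer sizes
count_weights <- function(n_inputs, hidden, n_outputs = 1) {
  layer_sizes <- c(n_inputs, hidden, n_outputs)
  from <- head(layer_sizes, -1) + 1   # neurons feeding each layer, plus a bias
  to   <- tail(layer_sizes, -1)       # neurons in the receiving layer
  sum(from * to)
}

# Six predictors, as in the models above
count_weights(6, hidden = 1)        # 9:  one hidden layer, one neuron
count_weights(6, hidden = 5)        # 41: one hidden layer, five neurons
count_weights(6, hidden = c(5, 5))  # 71: two hidden layers, five neurons each
```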

plot(invdata_model_2)

Similar to the first model, this code generates a plot of the second neural network model (invdata_model_2) to visualize its architecture.

model_results2 <- compute(invdata_model_2,
                          invdata_test[, c("Store", "Brand", "VendorNumber",
                                           "PurchasePrice", "Dollars", "Classification")])
predicted_strength2 <- model_results2$net.result
cor(predicted_strength2, invdata_test$Quantity)
          [,1]
[1,] 0.9948968

These lines compute the results of the second model for the testing set and evaluate its performance by calculating the correlation between the predicted quantities (predicted_strength2) and the actual quantities from the testing set (invdata_test$Quantity).

ANALYSIS

  1. Data Loading and Normalization:

    • The data is loaded into a data frame named inventorydata using the read.csv() function.

    • The structure of the data frame is explored using the str() function, which reveals that it contains 10000 observations and 7 variables.

    • The variables include Store, PurchasePrice, Quantity, Dollars, Classification, Brand, and VendorNumber.

    • All the features are normalized using a function named normalize. Normalization scales the features to a range of 0 to 1. This is helpful for machine learning algorithms because it puts all features on the same scale and avoids issues with features having a much larger range than others.

  2. Train-Test Split:

    • The data is split into training and testing sets. The training set is used to build the ANN model, and the testing set is used to evaluate the model’s performance.

    • In this case, the data is split into a 75% training set (invdata_train) and a 25% testing set (invdata_test).

  3. Building the ANN Model:

    • Two neural network models are built using the neuralnet library in R.

    • The first model (invdata_model) has a single hidden layer with one neuron, the neuralnet default.

    • The second model (invdata_model_2) has a single hidden layer with five neurons.

    • Both models use the following features as input variables to predict the quantity:

      • Store

      • Brand

      • VendorNumber

      • PurchasePrice

      • Dollars

      • Classification

  4. Model Evaluation:

    • The performance of the models is evaluated using the correlation coefficient between the predicted quantity (predicted_strength) and the actual quantity (Quantity) on the testing set.

    • The correlation coefficient is a measure of the linear relationship between two variables. A value of 1 indicates a perfect positive correlation, 0 indicates no correlation, and -1 indicates a perfect negative correlation.

    • The first model (invdata_model) has a correlation coefficient of 0.853, which indicates a strong positive correlation between the predicted and actual quantity.

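    • The second model (invdata_model_2) has a markedly higher correlation coefficient of 0.995, which indicates a very strong positive correlation between the predicted and actual quantity. This suggests that the model with five hidden neurons predicts inventory quantity considerably better than the model with a single hidden neuron. One caveat is that Dollars equals PurchasePrice multiplied by Quantity, so the predictors already encode the target and the high correlation may overstate how well the model would forecast quantities that are not yet known.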

Inventory Quantity Analysis using C5.0 Decision Trees

This R program investigates the use of C5.0 decision trees for classifying inventory items into two categories: “One” and “Two.” The project analyzes a dataset with 8 variables: 7 predictor attributes and the Class label.

The program follows these steps:

  1. Data Preparation:

    • Imports a CSV dataset containing inventory information.

    • Explores the data structure and summarizes key variables.

    • Splits the data into training and testing sets for model building and evaluation.

    • Prepares the data for C5.0 by converting the class labels into a factor.

  2. Model Building and Training:

    • Creates a C5.0 decision tree model using the training data.

    • Analyzes the model summary, including size, errors, and attribute usage.

    • Evaluates the model’s performance on the training data using a confusion matrix.

  3. Model Evaluation:

    • Uses the trained model to predict class labels for the unseen test data.

    • Creates a confusion matrix to assess the model’s performance on the test set.

  4. Experiment with Boosting (Optional):

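    • Attempts to create a boosted C5.0 model, but boosting stops after a single iteration.

  5. Overall Analysis:

    • Discusses the perfect classification accuracy, which most likely arises because Class is derived directly from the Classification attribute rather than from overfitting.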

    • Proposes improvements for building a more robust and generalizable model.

This project demonstrates the application of C5.0 decision trees for inventory classification in R and highlights the importance of checking whether a predictor already encodes the class label before trusting a perfect score.

Step 1 – collecting data

The efficient management of oil reserves is crucial for oil producers. Maintaining optimal inventory levels ensures a steady supply for refineries while minimizing storage costs and avoiding waste. However, determining the ideal quantity for each type of crude oil can be a complex task.

This study explores the use of C5.0 decision trees for inventory quantity analysis in the context of oil production. We leverage historical data on various crude oil grades stored at production facilities.

Our goal is to develop a data-driven model that classifies different crude oil types based on features that influence optimal stock levels.

The data contain the following information about oil inventory levels:

  • The store number for a particular type of oil (likely referring to a storage tank or reservoir)

  • A vendor number that could be used to track suppliers of the oil

  • The purchase price of the oil per unit (e.g., per barrel)

  • The quantity of a particular type of oil bought

  • The total dollars spent on a particular type of oil (which can be calculated by multiplying the purchase price by the quantity)

  • The brand number of the product

  • Classification and Class, which give the category each record belongs to; Class is the two-level label (“One” or “Two”) that the model predicts
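The relationship between Dollars, PurchasePrice, and Quantity described in the list above can be checked on the first five records shown in the str() output of the dataset. This is a small sketch with values copied from that rounded output, so the comparison allows a tolerance of 0.1, which is an assumption about the rounding.

```r
# First five records as displayed (rounded) by str()
purchase_price <- c(35.71, 9.35, 9.41, 9.35, 21.32)
quantity       <- c(6, 4, 5, 6, 5)
dollars        <- c(214.3, 37.4, 47, 56.1, 106.6)

# Recompute the total spend and compare with the Dollars column
recomputed <- purchase_price * quantity
max_gap <- max(abs(recomputed - dollars))

max_gap < 0.1  # TRUE: the differences are within rounding
```

Because Dollars is determined by the other two columns, a model that receives both Dollars and PurchasePrice as inputs can in principle reconstruct Quantity from them.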

inventorydata <- read.csv("C:/AMRITA SCHOOL OF BUSINESS/Trimister 6/AIB/TERM PAPER/TERM PAPER DATASET.csv", stringsAsFactors = TRUE)

This chunk of code reads the dataset stored in the file path “C:/AMRITA SCHOOL OF BUSINESS/Trimister 6/AIB/TERM PAPER/TERM PAPER DATASET.csv” into R. Setting stringsAsFactors = TRUE converts string columns, here Class, into factors.

str(inventorydata)
'data.frame':   10000 obs. of  8 variables:
 $ Store         : int  69 30 34 1 76 5 1 30 34 1 ...
 $ PurchasePrice : num  35.71 9.35 9.41 9.35 21.32 ...
 $ Quantity      : int  6 4 5 6 5 6 12 48 5 23 ...
 $ Dollars       : num  214.3 37.4 47 56.1 106.6 ...
 $ Classification: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Brand         : int  8412 5255 5215 5255 2034 3348 8358 4903 3782 4233 ...
 $ VendorNumber  : int  105 4466 4466 4466 388 480 480 480 480 480 ...
 $ Class         : Factor w/ 2 levels "One","Two": 1 1 1 1 1 1 1 1 1 1 ...

This chunk provides a structural summary of the dataset, including the data types and the first few observations of each variable.

table(inventorydata$Quantity)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
 188  119  303  358  519 2027   63  120  171  395  723 2807   21   25   50   37 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
  57  124   16   37   46   79  119  364    5    6    9   12   19   55    6   13 
  33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
  11   27   36  180    2    1    6    7    8   20    5    9   17   17   25  129 
  49   50   51   52   53   54   55   56   57   58   59   60   62   64   65   66 
   1    5    3    5    6   13    4    5    7   17   22   68    3    2    7   11 
  67   68   69   70   71   72   73   74   75   76   77   78   79   80   82   83 
   1    3    4    7   10   22    3    1    2    2    5    6    3    5    4    7 
  84   87   88   89   90   92   93   94   95   96   98   99  101  102  103  104 
  19    2    2    1    2    2    3    7    4   17    3    2    2    2    3    1 
 105  106  107  108  110  111  112  113  114  115  116  117  118  119  120  123 
   1    5    7   21    6    3    3    2   12    8    8    4   12   18   64    1 
 125  126  128  130  131  132  137  138  140  141  142  143  144  147  148  150 
   1    1    1    1    1   11    1    2    2    1    3    3   12    2    1    2 
 152  154  155  156  168  170  173  180  186  188  198  204  226  227  230  232 
   1    1    2    3    5    1    1    3    1    1    1    1    1    1    1    1 
 236  246  247  253  258  263  299  300  302  308  342  343  348  402  462  472 
   1    1    1    1    1    1    2    2    1    1    2    1    1    1    1    1 
 503  604  613  702  711  738 1080 
   1    1    1    1    1    1    1 

This chunk creates a frequency table for the variable “Quantity” in the dataset, showing the count of each unique quantity.

summary(inventorydata$PurchasePrice)  
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.62    6.66    9.77   12.39   15.38  240.59 
summary(inventorydata$Dollars)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    1.42    60.30    95.88   188.20   174.46 15303.60 

These chunks provide summary statistics (e.g., mean, median, min, max) for the variables “PurchasePrice” and “Dollars” in the dataset.

Step 2 – preparing the data

set.seed(9829)  
train_sample <- sample(10000,9000) 
str(train_sample)
 int [1:9000] 9869 1890 1030 2391 9583 1256 1961 3510 8559 3826 ...

This chunk sets a seed for reproducibility and creates a random sample of 9000 indices from 1 to 10000. These indices will be used to select the training data.

inv_train <- inventorydata[train_sample, ] 
inv_test <- inventorydata[-train_sample, ]

This chunk subsets the dataset into training and testing sets using the indices generated in the previous step. The training set contains 9000 randomly selected observations, and the testing set contains the remaining 1000 observations.

prop.table(table(inv_train$Quantity))  

           1            2            3            4            5            6 
0.0198888889 0.0115555556 0.0303333333 0.0363333333 0.0526666667 0.2028888889 
           7            8            9           10           11           12 
0.0062222222 0.0116666667 0.0174444444 0.0406666667 0.0717777778 0.2797777778 
          13           14           15           16           17           18 
0.0023333333 0.0025555556 0.0050000000 0.0036666667 0.0058888889 0.0124444444 
          19           20           21           22           23           24 
0.0016666667 0.0037777778 0.0041111111 0.0081111111 0.0117777778 0.0363333333 
          25           26           27           28           29           30 
0.0004444444 0.0005555556 0.0008888889 0.0012222222 0.0014444444 0.0055555556 
          31           32           33           34           35           36 
0.0005555556 0.0011111111 0.0010000000 0.0026666667 0.0037777778 0.0180000000 
          37           38           39           40           41           42 
0.0001111111 0.0001111111 0.0005555556 0.0007777778 0.0006666667 0.0017777778 
          43           44           45           46           47           48 
0.0004444444 0.0007777778 0.0017777778 0.0016666667 0.0024444444 0.0128888889 
          49           50           51           52           53           54 
0.0001111111 0.0005555556 0.0002222222 0.0005555556 0.0004444444 0.0013333333 
          55           56           57           58           59           60 
0.0003333333 0.0004444444 0.0006666667 0.0016666667 0.0021111111 0.0064444444 
          62           64           65           66           67           68 
0.0003333333 0.0002222222 0.0006666667 0.0011111111 0.0001111111 0.0003333333 
          69           70           71           72           73           74 
0.0003333333 0.0007777778 0.0008888889 0.0023333333 0.0003333333 0.0001111111 
          75           76           77           78           79           80 
0.0001111111 0.0001111111 0.0005555556 0.0006666667 0.0002222222 0.0005555556 
          82           83           84           87           88           89 
0.0003333333 0.0007777778 0.0018888889 0.0002222222 0.0002222222 0.0001111111 
          90           92           93           94           95           96 
0.0002222222 0.0002222222 0.0003333333 0.0006666667 0.0004444444 0.0017777778 
          98           99          101          102          103          104 
0.0003333333 0.0002222222 0.0002222222 0.0001111111 0.0002222222 0.0001111111 
         105          106          107          108          110          111 
0.0001111111 0.0004444444 0.0007777778 0.0022222222 0.0005555556 0.0003333333 
         112          114          115          116          117          118 
0.0003333333 0.0012222222 0.0008888889 0.0005555556 0.0003333333 0.0013333333 
         119          120          125          126          128          130 
0.0016666667 0.0061111111 0.0001111111 0.0001111111 0.0001111111 0.0001111111 
         132          137          138          140          141          142 
0.0012222222 0.0001111111 0.0001111111 0.0002222222 0.0001111111 0.0003333333 
         143          144          147          148          150          152 
0.0002222222 0.0013333333 0.0002222222 0.0001111111 0.0002222222 0.0001111111 
         154          155          156          168          170          173 
0.0001111111 0.0002222222 0.0003333333 0.0005555556 0.0001111111 0.0001111111 
         180          186          188          198          204          226 
0.0003333333 0.0001111111 0.0001111111 0.0001111111 0.0001111111 0.0001111111 
         227          230          232          236          246          247 
0.0001111111 0.0001111111 0.0001111111 0.0001111111 0.0001111111 0.0001111111 
         253          258          263          299          300          302 
0.0001111111 0.0001111111 0.0001111111 0.0001111111 0.0002222222 0.0001111111 
         342          343          402          462          472          503 
0.0002222222 0.0001111111 0.0001111111 0.0001111111 0.0001111111 0.0001111111 
         604          613          702          711          738 
0.0001111111 0.0001111111 0.0001111111 0.0001111111 0.0001111111 
prop.table(table(inv_test$Quantity)) 

    1     2     3     4     5     6     7     8     9    10    11    12    14 
0.009 0.015 0.030 0.031 0.045 0.201 0.007 0.015 0.014 0.029 0.077 0.289 0.002 
   15    16    17    18    19    20    21    22    23    24    25    26    27 
0.005 0.004 0.004 0.012 0.001 0.003 0.009 0.006 0.013 0.037 0.001 0.001 0.001 
   28    29    30    31    32    33    34    35    36    37    39    41    42 
0.001 0.006 0.005 0.001 0.003 0.002 0.003 0.002 0.018 0.001 0.001 0.002 0.004 
   43    44    45    46    47    48    51    53    54    55    56    57    58 
0.001 0.002 0.001 0.002 0.003 0.013 0.001 0.002 0.001 0.001 0.001 0.001 0.002 
   59    60    65    66    69    71    72    75    76    79    82    84    94 
0.003 0.010 0.001 0.001 0.001 0.002 0.001 0.001 0.001 0.001 0.001 0.002 0.001 
   96   102   103   106   108   110   113   114   116   117   119   120   123 
0.001 0.001 0.001 0.001 0.001 0.001 0.002 0.001 0.003 0.001 0.003 0.009 0.001 
  131   138   143   299   308   348  1080 
0.001 0.001 0.001 0.001 0.001 0.001 0.001 

These chunks calculate the proportions of each unique quantity in both the training and testing sets.
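Since the decision tree predicts Class rather than Quantity, it may be more direct to compare the class proportions in the two sets. The sketch below does this on a hypothetical stand-in data frame built from the class counts reported in the model output of this paper (6032 “One” and 3968 “Two” in total). Because the row order of the real file is not reproduced, the proportions it prints will not match the paper's exactly.

```r
set.seed(9829)

# Hypothetical stand-in for inventorydata with the reported class totals
toy_inventory <- data.frame(
  Class = factor(rep(c("One", "Two"), times = c(6032, 3968)))
)

# Same sampling call as in the paper
train_sample <- sample(10000, 9000)
toy_train <- toy_inventory[train_sample, , drop = FALSE]
toy_test  <- toy_inventory[-train_sample, , drop = FALSE]

# Share of each class in the training and testing sets
train_props <- prop.table(table(toy_train$Class))
test_props  <- prop.table(table(toy_test$Class))

round(train_props, 3)
round(test_props, 3)
```

Similar proportions in both sets would indicate that the random split preserved the class balance.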

Step 3 – training a model on the data

library(C50)
Warning: package 'C50' was built under R version 4.2.3

This chunk loads the C50 package, which provides the C5.0() function used to train the decision tree.

inventorydata$Class <- factor(inventorydata$Class)

In this chunk, the “Class” variable in the dataset inventorydata is converted into a factor. Factors are R’s data structure for categorical data, where each unique value of the variable is treated as a level. Here the call changes nothing in practice: Class was already read as a factor because of stringsAsFactors = TRUE, and the conversion is applied to inventorydata after inv_train and inv_test were created.

inv_model <- C5.0(Class~ ., data = inv_train)

In this chunk, a C5.0 decision tree model is trained using the formula Class ~ ., which means that we are predicting the “Class” variable using all other variables in the dataset. The data = inv_train argument specifies that the model should be trained on the inv_train dataset, which was previously created by splitting the original dataset into training and testing sets.

summary(inv_model)

Call:
C5.0.formula(formula = Class ~ ., data = inv_train)


C5.0 [Release 2.07 GPL Edition]     Fri Apr  5 18:10:14 2024
-------------------------------

Class specified by attribute `outcome'

Read 9000 cases (8 attributes) from undefined.data

Decision tree:

Classification <= 1: One (5412)
Classification > 1: Two (3588)


Evaluation on training data (9000 cases):

        Decision Tree   
      ----------------  
      Size      Errors  

         2    0( 0.0%)   <<


       (a)   (b)    <-classified as
      ----  ----
      5412          (a): class One
            3588    (b): class Two


    Attribute usage:

    100.00% Classification


Time: 0.0 secs

This chunk provides a summary of the trained C5.0 model. The tree consists of a single split on Classification (size 2) and makes no errors on the 9000 training cases.

Step 4 – evaluating model performance

inv_pred <- predict(inv_model, inv_test)

This chunk uses the trained model to make predictions on the testing set.

library(gmodels) 
Warning: package 'gmodels' was built under R version 4.2.3
CrossTable(inv_test$Class, inv_pred, prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE, dnn = c('actual Classification', 'predicted Classification'))

 
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|

 
Total Observations in Table:  1000 

 
                      | predicted Classification 
actual Classification |       One |       Two | Row Total | 
----------------------|-----------|-----------|-----------|
                  One |       620 |         0 |       620 | 
                      |     0.620 |     0.000 |           | 
----------------------|-----------|-----------|-----------|
                  Two |         0 |       380 |       380 | 
                      |     0.000 |     0.380 |           | 
----------------------|-----------|-----------|-----------|
         Column Total |       620 |       380 |      1000 | 
----------------------|-----------|-----------|-----------|

 

This chunk creates a cross-tabulation table to evaluate the performance of the model by comparing the actual classifications in the testing set with the predicted classifications.
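The counts in the cross-tabulation can be turned into the usual summary metrics. The sketch below rebuilds the table by hand from the numbers printed above and computes accuracy, precision, recall, and F1, treating “Two” as the positive class. That choice is an assumption made only for illustration.

```r
# Confusion matrix copied from the CrossTable output (rows = actual)
conf_mat <- matrix(c(620,   0,
                       0, 380),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual    = c("One", "Two"),
                                   predicted = c("One", "Two")))

# Share of all cases on the diagonal (correct predictions)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Metrics for class "Two", treated here as the positive class
precision <- conf_mat["Two", "Two"] / sum(conf_mat[, "Two"])
recall    <- conf_mat["Two", "Two"] / sum(conf_mat["Two", ])
f1        <- 2 * precision * recall / (precision + recall)

c(accuracy = accuracy, precision = precision, recall = recall, f1 = f1)  # all 1
```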

Step 5 – improving model performance

credit_boost10 <- C5.0(Class ~ ., data = inv_train, trials = 10)

This chunk trains a boosted C5.0 model with 10 trials using the training data.


credit_boost10

Call:
C5.0.formula(formula = Class ~ ., data = inv_train, trials = 10)

Classification Tree
Number of samples: 9000 
Number of predictors: 7 

Number of boosting iterations: 10 requested;  1 used due to early stopping

Non-standard options: attempt to group attributes

ANALYSIS

Data Preparation:

  1. Data Import: The code reads a CSV file named “TERM PAPER DATASET.csv” containing information about inventory items.

  2. Data Exploration:

    • str(inventorydata) shows the data structure with 10000 observations and 8 variables.

    • summary(inventorydata$PurchasePrice) and summary(inventorydata$Dollars) summarize Purchase Price and Dollar values.

  3. Train-Test Split:

    • set.seed(9829) sets a random seed for reproducibility.

    • train_sample <- sample(10000,9000) creates a random sample of 9000 observations for the training set.

    • inv_train <- inventorydata[train_sample, ] selects the training data from the original dataset.

    • inv_test <- inventorydata[-train_sample, ] selects the remaining 1000 observations for the testing set.

  4. Class Factor Conversion:

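    • inventorydata$Class <- factor(inventorydata$Class) converts the “Class” variable into a factor for the C5.0 algorithm. Since Class was already read as a factor and this line runs after the split, inv_train and inv_test are unaffected.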

Model Building and Training:

  1. C5.0 Model Creation:

    • inv_model <- C5.0(Class~ ., data = inv_train) builds a C5.0 decision tree model using the training data. The formula Class ~ . indicates that the model predicts the “Class” variable based on all other attributes (represented by the dot).

  2. Model Summary:

    • summary(inv_model) displays the model summary:

      • Call: Shows the formula used for the model.

      • Read: Indicates the number of cases (9000) and attributes (8: the 7 predictors plus the Class outcome) used for training.

      • Decision Tree:

        • Size: The tree has a size of 2, meaning two leaf nodes produced by a single split on Classification.

        • Errors: There are zero errors on the training data (perfect fit).

      • Evaluation: Shows the confusion matrix with all 5412 instances from Class “One” correctly classified and all 3588 instances from Class “Two” correctly classified.

      • Attribute Usage: Indicates that the “Classification” attribute was the only one used for splitting in the tree (likely because it perfectly separated the classes).

      • Time: Shows the training time (0.0 seconds).

Model Evaluation:

  1. Prediction on Test Set:

    • inv_pred <- predict(inv_model, inv_test) uses the trained model to predict the class labels for the test data.

  2. Evaluation Metrics:

    • library(gmodels) loads the “gmodels” package for creating a confusion matrix.

    • CrossTable creates a cross-tabulation table comparing the actual class labels in the test set with the predicted labels from the model.

    • The table shows a perfect classification performance on the test set:

      • All 620 instances from Class “One” were correctly predicted.

      • All 380 instances from Class “Two” were correctly predicted.

Boosting Experiment (Attempted but Early Stopped):

  1. Boosting Model Creation:

    • credit_boost10 <- C5.0(Class ~ ., data = inv_train, trials = 10) attempts to create a boosted C5.0 model with 10 boosting iterations. However, due to early stopping, only one iteration was used.

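    • Boosting stopped early most likely because the first tree already classifies every training case correctly, which leaves later iterations nothing to improve. Printing credit_boost10 therefore reports that only one iteration was used.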

Overall Analysis:

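The C5.0 decision tree achieved perfect classification accuracy on both the training and test sets. Overfitting is an unlikely explanation, because an overfitted model would be expected to perform worse on the held-out test set than on the training set. The single split on Classification (a value of 1 maps to “One”, larger values to “Two”) suggests instead that Class is derived directly from Classification, so the model recovers a known rule rather than learning anything new about inventory.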
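One way to check whether a predictor fully determines the label is to cross-tabulate the two. The sketch below does this on a small hypothetical data frame (toy, with illustrative values only) in which Class is built from Classification using the rule found by the tree, and then lists the predictors that would remain if Classification were excluded.

```r
# Hypothetical toy data; values are illustrative, not taken from the dataset
toy <- data.frame(
  Store          = c(69, 30, 34, 1, 76, 5, 1, 30),
  PurchasePrice  = c(35.71, 9.35, 9.41, 9.35, 21.32, 9.35, 9.41, 21.32),
  Classification = c(1, 1, 2, 1, 2, 2, 1, 2)
)
# Assumption for illustration: Class follows the rule found by the tree
toy$Class <- factor(ifelse(toy$Classification <= 1, "One", "Two"))

# Cross-tabulate the tree's rule against the label
leak_table <- table(rule = toy$Classification <= 1, Class = toy$Class)
leak_table

# The predictor determines the label if every row has one non-zero cell
is_leak <- all(rowSums(leak_table > 0) == 1)
is_leak  # TRUE

# Predictors that remain once the leaking column is excluded
safe_predictors <- setdiff(names(toy), c("Class", "Classification"))
safe_predictors
```

Retraining on only the remaining predictors would show whether the other attributes carry any information about the class.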

Here are some improvements to consider:

  • Feature Engineering: Explore creating new features from existing ones or using techniques like feature scaling/normalization to potentially improve model performance.

  • Model Selection and Evaluation: Try other classification algorithms and compare their performance using metrics like accuracy, precision, recall, and F1-score.

  • Cross-Validation: Use cross-validation techniques to obtain a more reliable estimate of model performance on unseen data.

By addressing these points, you can build a more robust and generalizable model for inventory classification using AI.
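The cross-validation suggestion can be sketched in base R as follows. This is a minimal outline on hypothetical toy data: the placeholder model simply predicts the majority class of the training folds, and in a real analysis it would be replaced by a call to C5.0() and predict(). The number of folds (10) and the toy class counts are assumptions chosen for illustration.

```r
set.seed(9829)
k <- 10  # number of folds (assumed)

# Hypothetical toy data: 600 "One" and 400 "Two" labels
toy <- data.frame(Class = factor(rep(c("One", "Two"), times = c(600, 400))))

# Assign each row to one of k folds of equal size, in random order
fold_id <- sample(rep(seq_len(k), length.out = nrow(toy)))

fold_accuracy <- sapply(seq_len(k), function(i) {
  train <- toy[fold_id != i, , drop = FALSE]
  test  <- toy[fold_id == i, , drop = FALSE]
  # Placeholder model: predict the majority class of the training folds
  majority <- names(which.max(table(train$Class)))
  mean(test$Class == majority)
})

fold_accuracy        # one accuracy value per fold
mean(fold_accuracy)  # cross-validated estimate
```

Averaging over folds gives an estimate that depends less on one particular train-test split than a single hold-out evaluation does.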

Analysis of the Two AI Inventory Management Approaches

This paper explores two AI approaches for inventory quantity analysis: Artificial Neural Networks (ANNs) and C5.0 decision trees. Here’s a breakdown of each approach and its analysis:

1. ANN Approach

  • Goal: Develop an ANN model to predict inventory quantity based on various factors like store location, brand, vendor details, purchase price, and product classification.

  • Data: A dataset containing 10000 observations with 7 variables (the target Quantity and 6 predictors) is used.

  • Process:

    • Data is loaded and normalized to scale features between 0 and 1.

    • The data is split into training (75%) and testing (25%) sets.

    • Two ANN models are built:

      • Model 1: One hidden layer with a single neuron

      • Model 2: One hidden layer with five neurons

    • Model performance is evaluated using the correlation coefficient between predicted and actual quantities on the testing set.

  • Analysis:

    • Model 1 achieves a correlation coefficient of 0.853, indicating a strong positive correlation.

    • Model 2, with five hidden neurons, performs considerably better, achieving a correlation coefficient of 0.995, suggesting a very strong positive correlation between predicted and actual quantities.

2. C5.0 Decision Tree Approach

  • Goal: Develop a C5.0 decision tree model to classify inventory items into two categories (“One” and “Two”) based on 7 predictor attributes.

  • Data: The same inventory records, with an added Class label (10000 observations with 8 variables), are used.

  • Process:

    • Data is loaded and explored.

    • The data is split into training (90%) and testing (10%) sets.

    • The “Class” variable is converted into a factor for the C5.0 algorithm.

    • A C5.0 decision tree model is built using the training data.

    • Model performance is evaluated using a confusion matrix on the testing set.

    • An attempt is made to create a boosted C5.0 model, but boosting stops after a single iteration.

  • Analysis:

    • The C5.0 decision tree achieves perfect classification accuracy (100%) on both the training and testing sets. Because the tree consists of a single split on Classification, this most likely reflects that Class is derived from Classification, rather than overfitting or genuine predictive skill.

    • To improve the model, feature engineering, using different classification algorithms, and cross-validation techniques are suggested.

Overall Observations

  • Both ANN and C5.0 decision trees can be used for inventory analysis.

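  • The ANN approach shows promise for inventory quantity prediction, with the five-neuron hidden layer clearly outperforming the single-neuron one.

  • The C5.0 decision tree approach achieves perfect accuracy, but only by reproducing the Classification attribute, which limits what the result says about its usefulness on a less trivial target.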