Accurate inventory management is crucial for businesses to optimize costs, minimize stockouts, and improve customer satisfaction. Traditionally, inventory forecasting relies on historical data and statistical models. However, with the increasing availability of data and advancements in machine learning, Artificial Neural Networks (ANNs) offer a powerful alternative for predicting inventory quantity.
This project explores the application of ANNs for inventory quantity analysis using R. We use a real-world dataset containing information about various factors potentially influencing inventory levels, such as store location, product brand, vendor details, purchase price, and product classification.
Through this analysis, we aim to:
Develop an ANN model to predict inventory quantity based on the provided data.
Evaluate the model’s performance by comparing predicted quantities with actual inventory levels.
Compare the effectiveness of different ANN architectures by exploring models with different hidden-layer configurations.
Gain insights into the relationship between the chosen features and inventory quantity.
The results of this project will demonstrate the potential of ANNs for inventory forecasting and provide valuable insights for businesses seeking to improve their inventory management strategies.
inventorydata <- read.csv("C:/AMRITA SCHOOL OF BUSINESS/Trimister 6/AIB/TERM PAPER/TERM PAPER DATASET 2.csv")
str(inventorydata)
This code reads the dataset stored in the CSV file specified by the path and assigns it to the variable inventorydata. The str() function is used to display the structure of the dataset, showing the data types and the first few entries.
Here, each numeric column of the dataset is normalized using the normalize function defined earlier. The result is stored in a new dataframe invdata_normal.
summary(invdata_normal$Quantity)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000000 0.004634 0.010195 0.015374 0.010195 1.000000
summary(inventorydata$Quantity)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 6.00 12.00 17.59 12.00 1080.00
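The normalize function mentioned above is not reproduced in this section. A common choice, assumed here, is min-max scaling, and it is consistent with the two summaries: the raw Quantity values 1, 6, 12, and 1080 map to 0, 0.004634, 0.010195, and 1. A minimal sketch on those values follows; the function body, the toy data frame, and the use of lapply are assumptions rather than the report's own code.

```r
# Min-max scaling: maps the smallest value to 0 and the largest to 1
# (assumed definition; the report's own normalize() is not shown)
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Quantity values taken from the raw summary: min, 1st quartile, median, max
quantity <- c(1, 6, 12, 1080)
round(normalize(quantity), 6)

# Applying the helper to every column of a small, made-up numeric data frame
toy <- data.frame(Quantity = quantity, PurchasePrice = c(2.5, 10, 7.25, 40))
toy_normal <- as.data.frame(lapply(toy, normalize))
summary(toy_normal$Quantity)
```

Because the maximum (1080) is far above the median (12), most normalized quantities are squeezed close to 0, which is what the normalized summary shows.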
This code splits the normalized dataset into training and testing sets. The first 7500 rows are used for training (invdata_train), and the remaining rows are used for testing (invdata_test).
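The split code itself is not shown here. A minimal sketch of a sequential split with the row counts described above, run on made-up data (the toy columns and the seed are assumptions):

```r
# Made-up stand-in for the normalized data: 10000 rows, values in [0, 1]
set.seed(1)  # illustrative seed, not taken from the report
invdata_normal <- data.frame(
  PurchasePrice = runif(10000),
  Dollars       = runif(10000),
  Quantity      = runif(10000)
)

# Sequential split: rows 1-7500 for training, rows 7501-10000 for testing
invdata_train <- invdata_normal[1:7500, ]
invdata_test  <- invdata_normal[7501:10000, ]

nrow(invdata_train)
nrow(invdata_test)
```

A sequential split is only representative if the rows are in no meaningful order. If the file were sorted, for example by store, a random sample of row indices would be the safer choice.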
library(neuralnet)
Warning: package 'neuralnet' was built under R version 4.2.3
This line loads the neuralnet package, which provides functions for building and training neural networks.
This code builds a neural network model using the neuralnet function. It specifies Quantity as the target variable and includes other variables (Store, Brand, VendorNumber, PurchasePrice, Dollars, and Classification) as predictors. The model is trained using the training data.
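The model call is not reproduced in this section. The sketch below shows what such a call could look like with the predictors named above, trained on made-up data. The synthetic columns, the seed, and the reliance on the default architecture are assumptions.

```r
library(neuralnet)

set.seed(42)  # illustrative seed, not taken from the report
n <- 300

# Synthetic stand-in for the normalized data: every predictor lies in [0, 1]
invdata_normal <- data.frame(
  Store          = runif(n),
  Brand          = runif(n),
  VendorNumber   = runif(n),
  PurchasePrice  = runif(n),
  Dollars        = runif(n),
  Classification = runif(n)
)
# Hypothetical target: driven by Dollars and PurchasePrice plus small noise
invdata_normal$Quantity <- with(
  invdata_normal,
  0.6 * Dollars + 0.2 * PurchasePrice + rnorm(n, sd = 0.02)
)

invdata_train <- invdata_normal[1:225, ]

# With no hidden argument, neuralnet() uses hidden = 1:
# one hidden layer containing a single neuron
invdata_model <- neuralnet(
  Quantity ~ Store + Brand + VendorNumber + PurchasePrice + Dollars + Classification,
  data = invdata_train
)
```

If the report's call did pass a hidden argument, the architecture would differ from this default.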
plot(invdata_model)
This code generates a plot of the neural network model (invdata_model), which provides a visual representation of the network architecture and connections between nodes.
These lines compute the model results for the testing set and extract the predicted quantities. The compute function applies the trained neural network model (invdata_model) to the testing data (invdata_test) to obtain predictions.
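A minimal sketch of the prediction step on made-up data. Only two predictors are used to keep it short, and the name model_results and the synthetic data are assumptions; predicted_strength is the name used in the report.

```r
library(neuralnet)

set.seed(42)  # illustrative seed
n <- 300

# Made-up, already-normalized data with two of the report's predictors
invdata_normal <- data.frame(PurchasePrice = runif(n), Dollars = runif(n))
invdata_normal$Quantity <- with(
  invdata_normal,
  0.6 * Dollars + 0.2 * PurchasePrice + rnorm(n, sd = 0.02)
)

invdata_train <- invdata_normal[1:225, ]
invdata_test  <- invdata_normal[226:300, ]

invdata_model <- neuralnet(Quantity ~ PurchasePrice + Dollars, data = invdata_train)

# compute() takes the fitted network and the predictor columns of the new data.
# It returns a list; net.result holds one predicted value per test row.
model_results <- compute(invdata_model, invdata_test[, c("PurchasePrice", "Dollars")])
predicted_strength <- model_results$net.result

dim(predicted_strength)  # a one-column matrix, one row per test observation
```

Since net.result is a one-column matrix, passing it to cor() yields a 1 x 1 matrix rather than a bare number.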
# examine the correlation between predicted and actual values
cor(predicted_strength, invdata_test$Quantity)
[,1]
[1,] 0.8532024
This code calculates the correlation between the predicted quantities (predicted_strength) and the actual quantities from the testing set (invdata_test$Quantity). The result, about 0.853, indicates a strong linear relationship. The closer the value is to 1, the more closely the predictions track the actual quantities.
This code builds another neural network model (invdata_model_2) with the same predictors as the first model but with a hidden layer of 5 neurons. It serves as an alternative architecture for comparison with the first model.
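In neuralnet, the hidden argument is a vector giving the number of neurons in each hidden layer, so hidden = 5 requests one hidden layer with five neurons. A minimal sketch on made-up data, assuming the second model was built that way (the data, the seed, and the two-predictor formula are assumptions):

```r
library(neuralnet)

set.seed(42)  # illustrative seed
n <- 300

# Made-up, already-normalized training data
invdata_train <- data.frame(PurchasePrice = runif(n), Dollars = runif(n))
invdata_train$Quantity <- with(
  invdata_train,
  0.6 * Dollars + 0.2 * PurchasePrice + rnorm(n, sd = 0.02)
)

# hidden = 5: ONE hidden layer containing five neurons
invdata_model_2 <- neuralnet(Quantity ~ PurchasePrice + Dollars,
                             data = invdata_train, hidden = 5)

# Weight matrices of the first repetition: input -> hidden, hidden -> output
layer_weights <- invdata_model_2$weights[[1]]
length(layer_weights) - 1   # number of hidden layers
ncol(layer_weights[[1]])    # number of neurons in the first hidden layer
```

A network with several hidden layers would instead be requested with a longer vector, for example hidden = c(5, 5) for two layers of five neurons each.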
plot(invdata_model_2)
Similar to the first model, this code generates a plot of the second neural network model (invdata_model_2) to visualize its architecture.
These lines compute the results of the second model for the testing set and evaluate its performance by calculating the correlation between the predicted quantities (predicted_strength2) and the actual quantities from the testing set (invdata_test$Quantity).
ANALYSIS
Data Loading and Normalization:
The data is loaded into a data frame named inventorydata using the read.csv() function.
The structure of the data frame is explored using the str() function, which reveals that it contains 10000 observations and 7 variables.
The variables include Store, PurchasePrice, Quantity, Dollars, Classification, Brand, and VendorNumber.
All the features are normalized using a function named normalize. Normalization scales the features to a range of 0 to 1. This is helpful for machine learning algorithms because it puts all features on the same scale and avoids issues with features having a much larger range than others.
Train-Test Split:
The data is split into training and testing sets. The training set is used to build the ANN model, and the testing set is used to evaluate the model’s performance.
In this case, the data is split into a 75% training set (invdata_train) and a 25% testing set (invdata_test).
Building the ANN Model:
Two neural network models are built using the neuralnet library in R.
The first model (invdata_model) has a single hidden layer.
The second model (invdata_model_2) also uses one hidden layer, but with five neurons.
Both models use the following features as input variables to predict the quantity:
Store
Brand
VendorNumber
PurchasePrice
Dollars
Classification
Model Evaluation:
The performance of the models is evaluated using the correlation coefficient between the predicted quantity (predicted_strength) and the actual quantity (Quantity) on the testing set.
The correlation coefficient is a measure of the linear relationship between two variables. A value of 1 indicates a perfect positive correlation, 0 indicates no correlation, and -1 indicates a perfect negative correlation.
The first model (invdata_model) has a correlation coefficient of 0.853, which indicates a strong positive correlation between the predicted and actual quantity.
The second model (invdata_model_2) has a substantially higher correlation coefficient of 0.995, which indicates a very strong positive correlation between the predicted and actual quantity. This suggests that the model with five hidden neurons predicts inventory quantity considerably better than the simpler first model.
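Correlation measures how closely predictions follow the actual values along a straight line, not how close they are in absolute terms. A small illustration with made-up numbers (all values below are hypothetical):

```r
# Made-up normalized quantities
actual <- c(0.10, 0.20, 0.35, 0.50, 0.80)

# Predictions that are systematically too high but move perfectly in step
predicted <- 2 * actual + 0.1

cor(predicted, actual)          # linear agreement between the two vectors
mean(abs(predicted - actual))   # average absolute error, which is not zero
```

For this reason it can be useful to report an error measure such as the mean absolute error alongside the correlation.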
Inventory Quantity Analysis using C5.0 Decision Trees
This R program investigates the use of C5.0 decision trees for classifying inventory items into two categories: “One” and “Two.” The project analyzes a dataset with 8 attributes: the Class label and 7 candidate predictors.
The program follows these steps:
Data Preparation:
Imports a CSV dataset containing inventory information.
Explores the data structure and summarizes key variables.
Splits the data into training and testing sets for model building and evaluation.
Prepares the data for C5.0 by converting the class labels into a factor.
Model Building and Training:
Creates a C5.0 decision tree model using the training data.
Analyzes the model summary, including size, errors, and attribute usage.
Evaluates the model’s performance on the training data using a confusion matrix.
Model Evaluation:
Uses the trained model to predict class labels for the unseen test data.
Creates a confusion matrix to assess the model’s performance on the test set.
Experiment with Boosting (Optional):
Attempts to create a boosted C5.0 model with 10 trials; C5.0 stops after a single iteration.
Overall Analysis:
Discusses the perfect classification accuracy and why it should be treated with caution (possible overfitting, or a class label that is fully determined by a single attribute).
Proposes improvements for building a more robust and generalizable model.
This project demonstrates the application of C5.0 decision trees for inventory classification in R and highlights the importance of addressing overfitting for better model performance.
Step 1 – collecting data
The efficient management of oil reserves is crucial for oil producers. Maintaining optimal inventory levels ensures a steady supply for refineries while minimizing storage costs and avoiding waste. However, determining the ideal quantity for each type of crude oil can be a complex task.
This study explores the use of C5.0 decision trees for inventory quantity analysis in the context of oil production. We leverage historical data on various crude oil grades stored at production facilities.
Our goal is to develop a data-driven model that classifies different crude oil types based on features that influence optimal stock levels.
The data contain information about oil inventory levels, with the following variables:
Store: the store number for a particular type of oil (likely referring to a storage tank or reservoir)
VendorNumber: a vendor number that could be used to track suppliers of the oil
PurchasePrice: the purchase price of the oil per unit (e.g., per barrel)
Quantity: the quantity of a particular type of oil bought
Dollars: the total dollars spent on a particular type of oil (which can be calculated by multiplying the purchase price by the quantity)
Brand: the brand number of the product
Class and Classification: the categories into which the items in the dataset are divided
Step 2 – exploring and preparing the data
inventorydata <- read.csv("C:/AMRITA SCHOOL OF BUSINESS/Trimister 6/AIB/TERM PAPER/TERM PAPER DATASET.csv", stringsAsFactors = TRUE)
This chunk of code reads the dataset stored in the file path “C:/AMRITA SCHOOL OF BUSINESS/Trimister 6/AIB/TERM PAPER/TERM PAPER DATASET.csv” into R. Setting stringsAsFactors = TRUE converts any character columns to factors on import.
This chunk sets a seed for reproducibility and creates a random sample of 9000 indices from 1 to 10000. These indices will be used to select the training data.
These chunks subset the dataset into training and testing sets using the indices generated in the previous step. The training set contains 9000 randomly selected observations, and the testing set contains the remaining observations.
These chunks calculate the proportions of each unique quantity in both the training and testing sets.
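The steps just described can be sketched as follows on made-up data. The seed 9829 and the call sample(10000, 9000) are quoted later in this report; the toy data frame and the choice to tabulate the Class column are assumptions.

```r
# Made-up stand-in for the 10000-row inventory data
set.seed(1)  # illustrative seed, used only to build the toy data
inventorydata <- data.frame(
  Classification = sample(1:2, 10000, replace = TRUE),
  Quantity       = sample(1:100, 10000, replace = TRUE)
)
inventorydata$Class <- factor(ifelse(inventorydata$Classification <= 1, "One", "Two"))

set.seed(9829)                        # makes the random split reproducible
train_sample <- sample(10000, 9000)   # 9000 row indices, drawn without replacement

inv_train <- inventorydata[train_sample, ]    # the sampled rows
inv_test  <- inventorydata[-train_sample, ]   # the remaining 1000 rows

# Class proportions should be similar in both sets if the split is representative
prop.table(table(inv_train$Class))
prop.table(table(inv_test$Class))
```

Comparing the two proportion tables is a quick check that random sampling did not leave one class over-represented in either set.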
Step 3 – training a model on the data
library(C50)
Warning: package 'C50' was built under R version 4.2.3
This chunk loads the C50 package, converts the “Class” variable into a factor, and trains a C5.0 decision tree model using the training data.
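inventorydata$Class <- factor(inventorydata$Class)
In this chunk, the “Class” variable in inventorydata is converted into a factor, R’s data structure for categorical data in which each unique value is treated as a level. C5.0 requires the outcome to be a factor. Two points are worth noting. The file was read with stringsAsFactors = TRUE, so a character Class column would already be a factor at this point. In addition, as the chunks are ordered here, the conversion comes after the split, so it changes inventorydata but not the already-created inv_train and inv_test.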
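The effect of converting the source data frame after the split can be seen in a small sketch. The four-row data frame is made up, and reading Class as plain character is an assumption chosen to make the effect visible.

```r
# Made-up four-row stand-in; Class is plain character here
inventorydata <- data.frame(Class = c("One", "Two", "One", "Two"),
                            stringsAsFactors = FALSE)

# Split first, then convert the source data frame
inv_train <- inventorydata[1:3, , drop = FALSE]
inventorydata$Class <- factor(inventorydata$Class)

is.factor(inventorydata$Class)  # TRUE
is.factor(inv_train$Class)      # FALSE: the earlier copy is unchanged

# The conversion has to be applied to the training data itself
# (or to inventorydata before the split)
inv_train$Class <- factor(inv_train$Class)
levels(inv_train$Class)
```

Converting before the split has the added benefit that training and test sets share the same factor levels.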
inv_model <- C5.0(Class ~ ., data = inv_train)
In this chunk, a C5.0 decision tree model is trained using the formula Class ~ ., which means that we are predicting the “Class” variable using all other variables in the dataset. The data = inv_train argument specifies that the model should be trained on the inv_train dataset, which was previously created by splitting the original dataset into a training set.
summary(inv_model)
Call:
C5.0.formula(formula = Class ~ ., data = inv_train)
C5.0 [Release 2.07 GPL Edition] Fri Apr 5 18:10:14 2024
-------------------------------
Class specified by attribute `outcome'
Read 9000 cases (8 attributes) from undefined.data
Decision tree:
Classification <= 1: One (5412)
Classification > 1: Two (3588)
Evaluation on training data (9000 cases):
    Decision Tree
  ----------------
  Size      Errors
     2    0( 0.0%)   <<
   (a)   (b)    <-classified as
  ----  ----
  5412          (a): class One
        3588    (b): class Two
Attribute usage:
100.00% Classification
Time: 0.0 secs
This chunk provides a summary of the trained C5.0 model: the tree itself, its size and error rate on the training data, the training confusion matrix, and the attributes used for splitting.
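The summary shows a tree with a single split on Classification. The sketch below reproduces that behaviour on made-up data in which Class is deliberately constructed as a recoding of Classification. This construction is an assumption about the real data, suggested by the split shown above, not something the report confirms.

```r
library(C50)

set.seed(9829)
n <- 1000

# Made-up inventory data; Class is derived directly from Classification
inventorydata <- data.frame(
  Store          = sample(1:80, n, replace = TRUE),
  PurchasePrice  = round(runif(n, 1, 50), 2),
  Quantity       = sample(1:100, n, replace = TRUE),
  Classification = sample(1:2, n, replace = TRUE)
)
inventorydata$Dollars <- inventorydata$PurchasePrice * inventorydata$Quantity
inventorydata$Class   <- factor(ifelse(inventorydata$Classification <= 1, "One", "Two"))

train_sample <- sample(n, 900)
inv_train <- inventorydata[train_sample, ]
inv_test  <- inventorydata[-train_sample, ]

inv_model <- C5.0(Class ~ ., data = inv_train)
summary(inv_model)   # tree structure, training errors, attribute usage

# Share of test cases classified correctly
inv_pred <- predict(inv_model, inv_test)
mean(inv_pred == inv_test$Class)
```

When one predictor fully determines the label, a perfect score on the test set is expected and says little about how the model would behave without that predictor.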
Step 4 – evaluating model performance
inv_pred <- predict(inv_model, inv_test)
This chunk uses the trained model to make predictions on the testing set.
library(gmodels)
Warning: package 'gmodels' was built under R version 4.2.3
Cell Contents
|-------------------------|
| N |
| N / Table Total |
|-------------------------|
Total Observations in Table: 1000
| predicted Classification
actual Classification | One | Two | Row Total |
----------------------|-----------|-----------|-----------|
One | 620 | 0 | 620 |
| 0.620 | 0.000 | |
----------------------|-----------|-----------|-----------|
Two | 0 | 380 | 380 |
| 0.000 | 0.380 | |
----------------------|-----------|-----------|-----------|
Column Total | 620 | 380 | 1000 |
----------------------|-----------|-----------|-----------|
This chunk creates a cross-tabulation table to evaluate the performance of the model by comparing the actual classifications in the testing set with the predicted classifications.
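The CrossTable call itself is not shown above. The sketch below rebuilds the label vectors from the counts in the table and computes the confusion matrix and accuracy in base R. The CrossTable arguments are an assumption chosen to match the printed cell contents (N and N / Table Total); in the report the inputs would presumably be inv_test$Class and inv_pred.

```r
# Label vectors rebuilt from the table: 620 "One" and 380 "Two",
# every case predicted as its actual class
actual    <- factor(rep(c("One", "Two"), times = c(620, 380)))
predicted <- factor(rep(c("One", "Two"), times = c(620, 380)))

# Base-R confusion matrix: rows are actual classes, columns are predictions
confusion <- table(actual = actual, predicted = predicted)
confusion

# Accuracy = correctly classified cases / all cases
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy

# Equivalent gmodels call, run only if the package is installed
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(actual, predicted,
                      prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
                      dnn = c("actual Classification", "predicted Classification"))
}
```

The off-diagonal cells of the matrix count the misclassified cases, which are all zero here.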
Step 5 – improving model performance
credit_boost10 <- C5.0(Class ~ ., data = inv_train, trials = 10)
This chunk trains a boosted C5.0 model with 10 trials using the training data.
credit_boost10
Call:
C5.0.formula(formula = Class ~ ., data = inv_train, trials = 10)
Classification Tree
Number of samples: 9000
Number of predictors: 7
Number of boosting iterations: 10 requested; 1 used due to early stopping
Non-standard options: attempt to group attributes
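The printed model reports that 10 boosting iterations were requested but only 1 was used. A minimal sketch of how this can be inspected, on made-up data in which a single tree is already error-free (the data and the assumption that this mirrors the real dataset are hypothetical):

```r
library(C50)

set.seed(9829)
n <- 1000

# Made-up training data; Class is a recoding of Classification
inv_train <- data.frame(
  Classification = sample(1:2, n, replace = TRUE),
  Quantity       = sample(1:100, n, replace = TRUE)
)
inv_train$Class <- factor(ifelse(inv_train$Classification <= 1, "One", "Two"))

# Request up to 10 boosting iterations
credit_boost10 <- C5.0(Class ~ ., data = inv_train, trials = 10)

# Named vector holding the requested and the actually used number of trials
credit_boost10$trials

credit_boost10   # the printed header repeats both numbers
```

Boosting works by concentrating on cases the previous trees misclassified, so when the first tree makes no errors there is nothing left for later iterations to improve.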
ANALYSIS
Data Preparation:
Data Import: The code reads a CSV file named “TERM PAPER DATASET.csv” containing information about inventory items.
Data Exploration:
str(inventorydata) shows the data structure with 10000 observations and 8 variables.
summary(inventorydata$PurchasePrice) and summary(inventorydata$Dollars) summarize Purchase Price and Dollar values.
Train-Test Split:
set.seed(9829) sets a random seed for reproducibility.
train_sample <- sample(10000,9000) creates a random sample of 9000 observations for the training set.
inv_train <- inventorydata[train_sample, ] selects the training data from the original dataset.
inv_test <- inventorydata[-train_sample, ] selects the remaining 1000 observations for the testing set.
Class Factor Conversion:
inventorydata$Class <- factor(inventorydata$Class) converts the “Class” variable into a factor for the C5.0 algorithm.
Model Building and Training:
C5.0 Model Creation:
inv_model <- C5.0(Class~ ., data = inv_train) builds a C5.0 decision tree model using the training data. The formula Class ~ . indicates that the model predicts the “Class” variable based on all other attributes (represented by the dot).
Model Summary:
summary(inv_model) displays the model summary:
Call: Shows the formula used for the model.
Read: Indicates the number of cases (9000) and attributes (8) used for training.
Decision Tree:
Size: The tree has 2 leaves (terminal nodes), produced by a single split at the root.
Errors: There are zero errors on the training data (perfect fit).
Evaluation: Shows the confusion matrix with all 5412 instances from Class “One” correctly classified and all 3588 instances from Class “Two” correctly classified.
Attribute Usage: Indicates that the “Classification” attribute was the only one used for splitting in the tree (likely because it perfectly separated the classes).
Time: Shows the training time (0.0 seconds).
Model Evaluation:
Prediction on Test Set:
inv_pred <- predict(inv_model, inv_test) uses the trained model to predict the class labels for the test data.
Evaluation Metrics:
library(gmodels) loads the “gmodels” package for creating a confusion matrix.
CrossTable creates a cross-tabulation table comparing the actual class labels in the test set with the predicted labels from the model.
The table shows a perfect classification performance on the test set:
All 620 instances from Class “One” were correctly predicted.
All 380 instances from Class “Two” were correctly predicted.
Boosting Experiment (Attempted but Early Stopped):
Boosting Model Creation:
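credit_boost10 <- C5.0(Class ~ ., data = inv_train, trials = 10) attempts to create a boosted C5.0 model with 10 boosting iterations. However, due to early stopping, only one iteration was used.
Printing the model confirms this: 10 iterations were requested and 1 was used. Because the first tree already classifies the training data without error, later boosting iterations would have no misclassified cases to focus on, which is the likely reason for the early stop.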
Overall Analysis:
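The C5.0 decision tree achieved perfect classification accuracy on both the training and test sets. This result should be read with caution. A model that had merely memorized the training data would normally lose accuracy on the test set, which did not happen here. Instead, the tree consists of a single split on Classification (values of 1 or less map to “One”, larger values to “Two”), which suggests that Class is a direct recoding of Classification. If so, the perfect score reflects a label that is determined by one attribute rather than predictive skill, and the model may not generalize to data where Classification is missing or defined differently.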
Here are some improvements to consider:
Feature Engineering: Explore creating new features from existing ones or using techniques like feature scaling/normalization to potentially improve model performance.
Model Selection and Evaluation: Try other classification algorithms and compare their performance using metrics like accuracy, precision, recall, and F1-score.
Cross-Validation: Use cross-validation techniques to obtain a more reliable estimate of model performance on unseen data.
By addressing these points, you can build a more robust and generalizable model for inventory classification using AI.
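The cross-validation suggestion can be sketched as a simple k-fold loop. The data below are made up, with a noisy class boundary so that accuracy is not trivially perfect; the variable names, the choice of k = 5, and the seed are assumptions.

```r
library(C50)

set.seed(9829)
n <- 500

# Made-up data: Class depends on Quantity only through a noisy threshold
inv <- data.frame(
  PurchasePrice = runif(n, 1, 50),
  Quantity      = sample(1:100, n, replace = TRUE)
)
inv$Class <- factor(ifelse(inv$Quantity + rnorm(n, sd = 10) > 50, "Two", "One"))

# Assign every row to one of k folds at random
k <- 5
folds <- sample(rep(1:k, length.out = n))

# Train on k - 1 folds, test on the held-out fold, and record the accuracy
fold_accuracy <- sapply(1:k, function(i) {
  fit  <- C5.0(Class ~ ., data = inv[folds != i, ])
  pred <- predict(fit, inv[folds == i, ])
  mean(pred == inv$Class[folds == i])
})

fold_accuracy         # one accuracy value per fold
mean(fold_accuracy)   # cross-validated estimate
```

Averaging over folds gives a steadier estimate than a single train-test split, and the spread across folds indicates how sensitive the result is to the particular split.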
Analysis of the Two AI Inventory Management Approaches
This report explores two AI approaches for inventory quantity analysis: Artificial Neural Networks (ANNs) and C5.0 decision trees. Each approach and its analysis are summarized below.
1. ANN Approach
Goal: Develop an ANN model to predict inventory quantity based on various factors like store location, brand, vendor details, purchase price, and product classification.
Data: A dataset containing 10000 observations with 7 variables (six predictors plus Quantity) is used.
Process:
Data is loaded and normalized to scale features between 0 and 1.
The data is split into training (75%) and testing (25%) sets.
Two ANN models are built:
Model 1: One hidden layer
Model 2: One hidden layer with five neurons
Model performance is evaluated using the correlation coefficient between predicted and actual quantities on the testing set.
Analysis:
Model 1 achieves a correlation coefficient of 0.853, indicating a strong positive correlation.
Model 2, with five hidden neurons, performs substantially better, achieving a correlation coefficient of 0.995, a very strong positive correlation between predicted and actual quantities.
2. C5.0 Decision Tree Approach
Goal: Develop a C5.0 decision tree model to classify inventory items into two categories (“One” and “Two”) based on the 7 remaining attributes.
Data: A second file, TERM PAPER DATASET.csv (10000 observations with 8 variables, including the Class label), is used.
Process:
Data is loaded and explored.
The data is split into training (90%) and testing (10%) sets.
The “Class” variable is converted into a factor for the C5.0 algorithm.
A C5.0 decision tree model is built using the training data.
Model performance is evaluated using a confusion matrix on the testing set.
An attempt is made to create a boosted C5.0 model, but C5.0 stops after one boosting iteration.
Analysis:
The C5.0 decision tree achieves perfect classification accuracy (100%) on both the training and testing sets. However, this result should be treated with caution: the tree relies on a single split on Classification, which suggests that Class is a recoding of that attribute rather than evidence that the model generalizes well to unseen data.
To improve the model, feature engineering, using different classification algorithms, and cross-validation techniques are suggested.
Overall Observations
Both ANN and C5.0 decision trees can be used for inventory analysis.
The ANN approach with a larger hidden layer shows promise for accurate inventory quantity prediction.
The C5.0 decision tree approach achieves perfect accuracy, but because the result rests on a single attribute that appears to determine the class label, it says little about how well the model generalizes.