Business statistics plays a vital role in company decision-making. It helps in understanding current market trends and customer behavior, and in forecasting future sales. Machine learning can be used to improve the accuracy of sales forecasting. In this report, we discuss one of the most popular machine learning algorithms for this task: Random Forest.
For the purpose of this report, we will simulate a dataset with sales data for a retail company. The dataset contains the following variables:
- date: The date of the sale
- product: The product that was sold
- category: The category of the product
- sales: The number of units sold
- price: The price of the product
- promotion: A binary variable indicating if there was a promotion for the product on that day
# Fix the random seed so the simulation is reproducible
set.seed(123)
n <- 1000
# Each variable is drawn independently of the others
date <- as.Date("2021-01-01") + sample(365, n, replace = TRUE)
product <- sample(c("Product A", "Product B", "Product C"), n, replace = TRUE)
category <- sample(c("Category 1", "Category 2"), n, replace = TRUE)
# Units sold and price: normal draws rounded to whole numbers
sales <- round(rnorm(n, mean = 5, sd = 2))
price <- round(rnorm(n, mean = 10, sd = 2))
# 1 = promotion on that day, 0 = no promotion
promotion <- sample(c(0, 1), n, replace = TRUE)
df <- data.frame(date, product, category, sales, price, promotion)
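One property of simulating unit counts from a rounded normal distribution is that a few draws can come out negative, which makes no sense for units sold. The following is a minimal sketch of how one might check for this and, as one possible remedy, floor the values at zero. The names `sales_raw` and `sales_clamped` are illustrative, and because the draws are made on their own here, they will not match the `sales` column above value for value.

```r
# Draw rounded normal "sales" on their own (illustrative; this does not
# reproduce the exact values of the report's sales column).
set.seed(123)
sales_raw <- round(rnorm(1000, mean = 5, sd = 2))

# Count how many simulated unit counts are negative.
n_negative <- sum(sales_raw < 0)
n_negative

# One possible fix (an assumption, not part of the report): floor at zero.
sales_clamped <- pmax(sales_raw, 0)
range(sales_clamped)
```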
Random Forest is an ensemble learning method that creates a set of decision trees and combines them to make a more accurate and stable prediction. It is a powerful method for regression and classification tasks, including sales forecasting.
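To make the idea of combining many trees concrete, here is a minimal sketch of the bootstrap-and-average step, assuming the `rpart` package is installed. The data frame `toy` and the choice of 25 trees are hypothetical. This shows only part of what Random Forest does: the full algorithm also considers a random subset of predictors at each split, which this sketch omits.

```r
# Bagging sketch: fit regression trees to bootstrap samples and average them.
# Assumes the rpart package is available.
library(rpart)

set.seed(1)
# Hypothetical toy data: y depends linearly on x plus noise.
toy <- data.frame(x = runif(200, 0, 10))
toy$y <- 2 * toy$x + rnorm(200)

n_trees <- 25
# Each column of tree_preds holds one tree's predictions for all rows of toy.
tree_preds <- sapply(seq_len(n_trees), function(i) {
  boot_idx <- sample(nrow(toy), replace = TRUE)  # bootstrap resample of rows
  fit <- rpart(y ~ x, data = toy[boot_idx, ])    # one regression tree
  predict(fit, newdata = toy)
})

# The ensemble prediction is the average across the individual trees.
ensemble_pred <- rowMeans(tree_preds)
head(ensemble_pred)
```

Averaging over trees fit to different resamples tends to smooth out the step-like predictions of any single tree, which is the source of the stability mentioned above.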
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
# Split data into training and testing sets
set.seed(456)
train_idx <- sample(nrow(df), size = 0.7 * nrow(df))
df_train <- df[train_idx, ]
df_test <- df[-train_idx, ]
# Train the model
rf_model <- randomForest(sales ~ ., data = df_train)
# Make predictions on the test set
rf_predictions <- predict(rf_model, newdata = df_test)
# Evaluate the model
library(Metrics)
# Metrics::rmse() takes the actual values first, then the predictions
rmse(df_test$sales, rf_predictions)
## [1] 2.036677
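The Random Forest model, trained on the simulated data, achieves an RMSE of about 2.04 on the test set, so its predictions are typically off by roughly 2 units. Because the simulated sales were drawn independently of the predictors with a standard deviation of 2, this error is about what one would expect from always predicting the mean. The result therefore illustrates the workflow rather than demonstrating predictive accuracy.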
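An RMSE is easier to interpret next to a naive baseline. The sketch below rebuilds the simulated data and the 70/30 split using the same steps as the report, then computes the RMSE of always predicting the training-set mean. The names `baseline_pred` and `baseline_rmse` are illustrative, and no output value is quoted here since it was not part of the original run.

```r
# Rebuild the simulated data with the same seed and draw order as the report.
set.seed(123)
n <- 1000
df <- data.frame(
  date      = as.Date("2021-01-01") + sample(365, n, replace = TRUE),
  product   = sample(c("Product A", "Product B", "Product C"), n, replace = TRUE),
  category  = sample(c("Category 1", "Category 2"), n, replace = TRUE),
  sales     = round(rnorm(n, mean = 5, sd = 2)),
  price     = round(rnorm(n, mean = 10, sd = 2)),
  promotion = sample(c(0, 1), n, replace = TRUE)
)

# Same 70/30 train/test split as in the report.
set.seed(456)
train_idx <- sample(nrow(df), size = 0.7 * nrow(df))
df_train <- df[train_idx, ]
df_test  <- df[-train_idx, ]

# Naive baseline: predict the training-set mean for every test row.
baseline_pred <- rep(mean(df_train$sales), nrow(df_test))
baseline_rmse <- sqrt(mean((df_test$sales - baseline_pred)^2))
baseline_rmse
```

If the model's RMSE is not clearly below `baseline_rmse`, the predictors are adding little beyond the average sales level.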
In this report, we have discussed the use of Random Forest, a powerful machine learning algorithm, for sales forecasting in business statistics. The simulated example shows how such a model is trained and evaluated. On real data, where sales depend on factors such as price and promotions, Random Forest can be an effective tool for improving the accuracy of sales predictions.