Sales Prediction Model Using Classification Decision Tree Approach For Small Medium Enterprise Based on Indonesian E-Commerce Data

School of Economic and Business, Telkom University

Raden Johannes, Andry Alamsyah

07 November 2022

Abstrak

The growth of internet users in Indonesia gives an impact on many aspects of daily life, including commerce. Indonesian smallmedium enterprises took this advantage of new media to derive their activity by the meaning of online commerce. Until now, there is no known practical implementation of how to predict their sales and revenue using their historical transaction. In this paper, we build a sales prediction model on the Indonesian footwear industry using real-life data crawled on Tokopedia, one of the biggest e-commerce providers in Indonesia. Data mining is a discipline that can be used to gather information by processing the data. By using the method of classification in data mining, this research will describe patterns of the market and predict the potential of the region in the national market commodities. Our approach is based on the classification decision tree. We managed to determine predicted the number of items sold by the viewers, price, and type of shoes.

Keywords: Prediction Model, Data Mining, Classification, Decision Tree, CHAID.

Output

We managed to determine predicted the number of items sold by the viewers, price, and type of shoes.

Theoretical Background

Social Computing

Social computing is concerned with the study of social behavior, and social context based on computational systems. Social computing provides four main facilities to the behavioral modeling.

  1. Model: Building To create & build up models for behavior. 1.Analysis: Review the creation & already created models with their design work.
  2. Pattern mining: Minimize the patterns through mining.
  3. Prediction: Follow the rules & regulations to control the error in the designing.

Classification

Classification models describe data relationships and predict values for future observations. Classification maps data into predefined groups of classes. It is often referred to as supervised learning because the classes are determined before examining the data.

Decision Tree

The decision tree is a logical model represented as a binary (two-way split) tree that shows how the value of a target variable (output) can be predicted by using the values of a set of predictor variables (input). Decision tree is a predictive model which can be used to represent both classifiers and regression models.

CRISP-DP

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a popular methodology for increasing the success of DM projects. This methodology defines a non-rigid sequence of six phases, which allow the building and implementation of a DM model to be used in a real environment, helping to support business decisions

CRISP-DM Process

CRISP-DM Process

Methodology

We investigate data of shoe sales in Indonesia using the web mining method on online marketplace website (tokopedia.com) until March \(3^{rd}\), 2015. The attribute used in this research are:

  1. price,
  2. type of shoes,
  3. insurance,
  4. product viewer,
  5. city of the seller,
  6. rating of speed, service, and accuracy.

has an impact on product sales.

CHAID Analysis

There are several steps to make a decision tree using the CHAID algorithm which are:

  1. Merging. Category merging can be done on an independent variable that has more than two categories that are related.
  2. Splitting. In this part independent variable which used as the best split node. Splitting was conducted with a p-value on each independent variable.
  3. Stopping. The decision tree should be terminated by the rules. If there is no significant independent variable or if a tree reaches a maximum value limit of the tree defined specifications.
CHAID Flowchart

CHAID Flowchart

CHAID

In CHAID analysis, the following are the components of the decision tree:

CHAID Modelling

Menggunakan data yang telah dikategorisasi terlebih dahulu.

Model CHAID

Model CHAID

Prediction

With terminal and classification patterns, we can predict sales. For illustration, if we sell shoes at Rp 62.000 with 80 viewers and classified as “Sneakers”, there is a 98,2% probability that the shoes will be sold 1-143 unit, 1.3% probability that the shoes will be sold 155-287 unit, 0.3% probability that the shoes will be sold 431-573 unit, and 0.2% probability that the shoes will be sold 860-1002 unit.

Discussion

CHAID sudah digunakan pada project Lenny, MB, dan Rendy.