Raden Johannes, Andry Alamsyah
07 November 2022
The growth of internet users in Indonesia gives an impact on many aspects of daily life, including commerce. Indonesian smallmedium enterprises took this advantage of new media to derive their activity by the meaning of online commerce. Until now, there is no known practical implementation of how to predict their sales and revenue using their historical transaction. In this paper, we build a sales prediction model on the Indonesian footwear industry using real-life data crawled on Tokopedia, one of the biggest e-commerce providers in Indonesia. Data mining is a discipline that can be used to gather information by processing the data. By using the method of classification in data mining, this research will describe patterns of the market and predict the potential of the region in the national market commodities. Our approach is based on the classification decision tree. We managed to determine predicted the number of items sold by the viewers, price, and type of shoes.
Keywords: Prediction Model, Data Mining, Classification, Decision Tree, CHAID.
Social computing is concerned with the study of social behavior, and social context based on computational systems. Social computing provides four main facilities to the behavioral modeling.
Classification models describe data relationships and predict values for future observations. Classification maps data into predefined groups of classes. It is often referred to as supervised learning because the classes are determined before examining the data.
The decision tree is a logical model represented as a binary (two-way split) tree that shows how the value of a target variable (output) can be predicted by using the values of a set of predictor variables (input). Decision tree is a predictive model which can be used to represent both classifiers and regression models.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a popular methodology for increasing the success of DM projects. This methodology defines a non-rigid sequence of six phases, which allow the building and implementation of a DM model to be used in a real environment, helping to support business decisions
CRISP-DM Process
We investigate data of shoe sales in Indonesia using the web mining method on online marketplace website (tokopedia.com) until March \(3^{rd}\), 2015. The attribute used in this research are:
has an impact on product sales.
There are several steps to make a decision tree using the CHAID algorithm which are:
CHAID Flowchart
In CHAID analysis, the following are the components of the decision tree:
Menggunakan data yang telah dikategorisasi terlebih dahulu.
Model CHAID
With terminal and classification patterns, we can predict sales. For illustration, if we sell shoes at Rp 62.000 with 80 viewers and classified as “Sneakers”, there is a 98,2% probability that the shoes will be sold 1-143 unit, 1.3% probability that the shoes will be sold 155-287 unit, 0.3% probability that the shoes will be sold 431-573 unit, and 0.2% probability that the shoes will be sold 860-1002 unit.
CHAID sudah digunakan pada project Lenny, MB, dan Rendy.