Introduction

DQ-pedia is one of the e-commerce companies in the DQ Universe. Currently, DQ-pedia is aggressively expanding partners in the form of small stalls in various remote areas to expand business reach. DQ-pedia will assist DQ-pedia partners through strategies to increase partner revenue and conduct analysis to ensure stock availability of goods in each partner. In order to help these Partners, DQ-pedia needs to evaluate and analyze the sales data of each partner. Because the sales data of this partner is quite large

Import and exploration data

data <- read.csv("https://storage.googleapis.com/dqlab-dataset/transaksi_stok_dan_penjualan.tsv", header = TRUE, sep = '\t')

#Show top 5 data
head(data)
##   No.Transaksi Jenis.Transaksi    Tanggal Nama.Pelanggan Kode.Produk
## 1            1      Stok Masuk 01-04-2020              -     Item-01
## 2            1      Stok Masuk 01-04-2020              -     Item-02
## 3            1      Stok Masuk 01-04-2020              -     Item-03
## 4            1      Stok Masuk 01-04-2020              -     Item-04
## 5            1      Stok Masuk 01-04-2020              -     Item-05
## 6            1      Stok Masuk 01-04-2020              -     Item-06
##               Nama.Produk Jumlah Harga
## 1               Batere AA     44     -
## 2   Bawang Putih 200 gram     26     -
## 3   Gula Pasir Putih 1 kg     50     -
## 4            Kopi Instant     27     -
## 5       Mi Goreng Instant    100     -
## 6 Sabun Mandi Cair 100 ml     56     -
#Show bottom 5 data
tail(data)
##      No.Transaksi Jenis.Transaksi    Tanggal Nama.Pelanggan Kode.Produk
## 1837          589      Stok Masuk 30-07-2020              -     Item-11
## 1838          589      Stok Masuk 30-07-2020              -     Item-18
## 1839          590      Stok Masuk 31-07-2020              -     Item-02
## 1840          590      Stok Masuk 31-07-2020              -     Item-03
## 1841          590      Stok Masuk 31-07-2020              -     Item-07
## 1842          590      Stok Masuk 31-07-2020              -     Item-09
##                Nama.Produk Jumlah Harga
## 1837     Sabun Cuci Piring     33     -
## 1838     Telur Ayam Negeri     79     -
## 1839 Bawang Putih 200 gram     26     -
## 1840 Gula Pasir Putih 1 kg     50     -
## 1841     Sabut Cuci Piring      6     -
## 1842            Roti Tawar     15     -
#Show the structure data
str(data)
## 'data.frame':    1842 obs. of  8 variables:
##  $ No.Transaksi   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Jenis.Transaksi: chr  "Stok Masuk" "Stok Masuk" "Stok Masuk" "Stok Masuk" ...
##  $ Tanggal        : chr  "01-04-2020" "01-04-2020" "01-04-2020" "01-04-2020" ...
##  $ Nama.Pelanggan : chr  "-" "-" "-" "-" ...
##  $ Kode.Produk    : chr  "Item-01" "Item-02" "Item-03" "Item-04" ...
##  $ Nama.Produk    : chr  "Batere AA" "Bawang Putih 200 gram" "Gula Pasir Putih 1 kg" "Kopi Instant" ...
##  $ Jumlah         : int  44 26 50 27 100 56 6 25 15 67 ...
##  $ Harga          : chr  "-" "-" "-" "-" ...

Change data type

the data type of the Date (Tanggal) variable is character and immediately realized that for the purposes of subsequent analysis it would be easier if the Date (Tanggal) variable had a ‘date’ data type

data$Tanggal <- as.Date(data$Tanggal, "%d-%m-%Y")

#Check the sturcture data
str(data$Tanggal)
##  Date[1:1842], format: "2020-04-01" "2020-04-01" "2020-04-01" "2020-04-01" "2020-04-01" ...
#Add a new column to store Month (Bulan) and Year (Tahun) data
data$Bulan_Tahun <- format(data$Tanggal, "%m-%Y")

#Show top 5 data
head(data,5)
##   No.Transaksi Jenis.Transaksi    Tanggal Nama.Pelanggan Kode.Produk
## 1            1      Stok Masuk 2020-04-01              -     Item-01
## 2            1      Stok Masuk 2020-04-01              -     Item-02
## 3            1      Stok Masuk 2020-04-01              -     Item-03
## 4            1      Stok Masuk 2020-04-01              -     Item-04
## 5            1      Stok Masuk 2020-04-01              -     Item-05
##             Nama.Produk Jumlah Harga Bulan_Tahun
## 1             Batere AA     44     -     04-2020
## 2 Bawang Putih 200 gram     26     -     04-2020
## 3 Gula Pasir Putih 1 kg     50     -     04-2020
## 4          Kopi Instant     27     -     04-2020
## 5     Mi Goreng Instant    100     -     04-2020

Change type numeric data

#Change the data type of the variable Harga to numeric
data$Harga <- as.numeric(data$Harga)
## Warning: NAs introduced by coercion
#Change data NA to 0
data$Harga[is.na(data$Harga)] <- 0

#Check the structure data
str(data$Harga)
##  num [1:1842] 0 0 0 0 0 0 0 0 0 0 ...
#Show top 5 data
head(data,5)
##   No.Transaksi Jenis.Transaksi    Tanggal Nama.Pelanggan Kode.Produk
## 1            1      Stok Masuk 2020-04-01              -     Item-01
## 2            1      Stok Masuk 2020-04-01              -     Item-02
## 3            1      Stok Masuk 2020-04-01              -     Item-03
## 4            1      Stok Masuk 2020-04-01              -     Item-04
## 5            1      Stok Masuk 2020-04-01              -     Item-05
##             Nama.Produk Jumlah Harga Bulan_Tahun
## 1             Batere AA     44     0     04-2020
## 2 Bawang Putih 200 gram     26     0     04-2020
## 3 Gula Pasir Putih 1 kg     50     0     04-2020
## 4          Kopi Instant     27     0     04-2020
## 5     Mi Goreng Instant    100     0     04-2020

Sales Analytics per Month

The first data that we must provide is data on sales in April to July to find out what the sales rate is like each month

#Then take the data with the type of transaction is Sales (Penjualan)
data_penjualan = data[data$Jenis.Transaksi=="Penjualan",]

#Perform the data aggregate function to get monthly sales
penjualan_perbulan = aggregate(x=data_penjualan$Jumlah, 
                     by = list(Bulan_Tahun = data_penjualan$Bulan_Tahun),
                     FUN = sum)

#Take the bar plot out of Sales per month (penjualan_perbulan)
barplot(penjualan_perbulan$x,
        names.arg =penjualan_perbulan$Bulan_Tahun,
        xlab="Month",
        ylab="Penjualan",
        col="blue",
        main="Penjualan perbulan",
        border="red") 

Xera Department Store Partner Sales Analysis Results April - June 2022

Sales of Xera Department Store Partners in April - June 2020 further declined. This can be seen from the bar plot of the number of sales of goods per month. The highest sales owned by Xera Department Store Partners were in April 2020 with 1753 items sold. But in June 2020, Xera Department Store Partners only sold 170 products.

Customers with the Biggest Purchases

#Determine which 10 customers have the biggest purchases!
#Issue data with transaction type is Sales 
data_penjualan = data[data$Jenis.Transaksi=="Penjualan",]

#Perform the data aggregate function to get purchases per customer
pembelian_pelanggan=aggregate(
             x=data_penjualan$Jumlah,
             by =list(Pelanggan = data_penjualan$Nama.Pelanggan),
             FUN = sum)

#Sort customer data by number of purchases from largest to smallest
pembelian_pelanggan = pembelian_pelanggan[order(-pembelian_pelanggan$x), ]

#Take the top 10 values from the data above
head(pembelian_pelanggan, 10)
##     Pelanggan  x
## 63      Wulan 87
## 35      Marta 85
## 68      Zahra 85
## 3        Andi 83
## 53       Siti 82
## 36 Mita, Mira 81
## 29      Kardi 77
## 26      Indah 73
## 18      Farah 71
## 31     Krisda 71

Comparative Analysis of Incoming and Outgoing Goods

#Comparison of incoming and outgoing goods per month
aggregate(
  x=data$Jumlah, 
  by = list(Bulan = data$Bulan_Tahun, Jenis_Transaksi = data$Jenis.Transaksi), 
  FUN = sum )
##     Bulan Jenis_Transaksi    x
## 1 04-2020       Penjualan 1753
## 2 05-2020       Penjualan 1485
## 3 06-2020       Penjualan  170
## 4 04-2020      Stok Masuk 4527
## 5 05-2020      Stok Masuk 4306
## 6 06-2020      Stok Masuk 4323
## 7 07-2020      Stok Masuk 4509
#Visualize data with appropriate charts
#Create a transaction table using the aggregate function
data_transaksi = aggregate(
  x=data$Jumlah, 
  by = list(Bulan = data$Bulan_Tahun, Jenis_Transaksi = data$Jenis.Transaksi), 
  FUN = sum)

#Issue sales transaction data and incoming stock
data_penjualan <- data_transaksi[(data_transaksi$Jenis_Transaksi) == "Penjualan",]
data_stok_masuk <- data_transaksi[(data_transaksi$Jenis_Transaksi) == "Stok Masuk",]

#Combine the two data above using the merge function with left join
data_gabungan = merge(data_stok_masuk,data_penjualan,by='Bulan', all.x=TRUE)
data_gabungan = data.frame(Bulan = data_gabungan$Bulan,
                           Stok_Masuk = data_gabungan$x.x,
                           Penjualan = data_gabungan$x.y)

#Check if there is NA data. If there is a data NA, you can replace it with 0
data_gabungan$Penjualan[is.na(data_gabungan$Penjualan)] <- 0

#Change the format of the combined data by performing the transpose command. Then change the column name using month
data_gabung = t(as.matrix(data_gabungan[-1]))
colnames(data_gabung) = data_gabungan$Bulan

#Take out plot bars with multiple categories to compare incoming stock with sales. Then take the legend out of the barplot.
barplot(data_gabung,
        main='Perbandingan Penjualan dengan Stok Masuk',
        ylab='Jumlah Barang', 
        xlab='Bulan',
        beside = TRUE, 
        col=c("red","blue"))
legend('topright',fill=c("red","blue"),legend=c('Stok Masuk','Penjualan'))

Stock Analysis Results Of Goods Entered With Sales at Xera Department Store Partners

Earlier, we also conducted an analysis to compare the stock of incoming goods with sales at Xera Department Store Partners. Based on the analysis I did, it can be concluded that the stock of inventory at Xera Department Store Partners is still quite a lot considering that on average there are around 4000 items entered into Xera Department Store Partners while this store only sells around 1400-1700 items in April and May.

Analysis of the Relationship Between the Price of Goods and the Number of Transactions

At this time I want to see if there is a relationship between the price of goods and the number of transactions, the next data that will be compared in value is the price of the goods with the number of transactions to find out whether there is a difference between the number of transactions and the price of goods

#Analysis of the relationship between the Price of Goods and the Transaction Amount
#Choose data with the type of Sales transaction
data <- data[(data$Jenis.Transaksi) == "Penjualan",]

#Turn price data into Integers
data$Harga <- as.integer(data$Harga)

#Change the NA value to 0
data$Harga[is.na(data$Harga)] <- 0

#Calculates the number of transactions based on the price range
data_transaksi <- aggregate(
  x=data$No.Transaksi, 
  by = list(Harga = data$Harga), 
  FUN = length)

#Mengurutkan data dari harga termahal
data_transaksi = data_transaksi[order(-data_transaksi$Harga), ]

#Data visualization of the relationship of the price of goods with the number of transactions
#Before using the hist() command you need to break the transaction data above into vector data forms as follows
data_transaksi_freq = as.vector(rep(data_transaksi$Harga, data_transaksi$x))

#After getting the above data, we can output the histogram from the table above by using the hist() command
hist(data_transaksi_freq,
     main="Hubungan antara harga barang dengan transaksi",
     xlab="Rentang harga barang",
     col="green"
)

Results of Analysis of the Relationship Between the Price Range of Goods and the Number of Transactions

I also use bar plots to see how the price range of goods relates to the number of transactions. From the results I got, I can find out that goods in the price range of Rp. 10,000-Rp. 15,000 are sold more than other price ranges. After that, goods with a price of less than Rp. 5000 took second place. I can conclude that items that are in this price range are the best-selling items.