DQ-pedia is one of the e-commerce companies in the DQ Universe. Currently, DQ-pedia is aggressively expanding partners in the form of small stalls in various remote areas to expand business reach. DQ-pedia will assist DQ-pedia partners through strategies to increase partner revenue and conduct analysis to ensure stock availability of goods in each partner. In order to help these Partners, DQ-pedia needs to evaluate and analyze the sales data of each partner. Because the sales data of this partner is quite large
data <- read.csv("https://storage.googleapis.com/dqlab-dataset/transaksi_stok_dan_penjualan.tsv", header = TRUE, sep = '\t')
#Show top 5 data
head(data)
## No.Transaksi Jenis.Transaksi Tanggal Nama.Pelanggan Kode.Produk
## 1 1 Stok Masuk 01-04-2020 - Item-01
## 2 1 Stok Masuk 01-04-2020 - Item-02
## 3 1 Stok Masuk 01-04-2020 - Item-03
## 4 1 Stok Masuk 01-04-2020 - Item-04
## 5 1 Stok Masuk 01-04-2020 - Item-05
## 6 1 Stok Masuk 01-04-2020 - Item-06
## Nama.Produk Jumlah Harga
## 1 Batere AA 44 -
## 2 Bawang Putih 200 gram 26 -
## 3 Gula Pasir Putih 1 kg 50 -
## 4 Kopi Instant 27 -
## 5 Mi Goreng Instant 100 -
## 6 Sabun Mandi Cair 100 ml 56 -
#Show bottom 5 data
tail(data)
## No.Transaksi Jenis.Transaksi Tanggal Nama.Pelanggan Kode.Produk
## 1837 589 Stok Masuk 30-07-2020 - Item-11
## 1838 589 Stok Masuk 30-07-2020 - Item-18
## 1839 590 Stok Masuk 31-07-2020 - Item-02
## 1840 590 Stok Masuk 31-07-2020 - Item-03
## 1841 590 Stok Masuk 31-07-2020 - Item-07
## 1842 590 Stok Masuk 31-07-2020 - Item-09
## Nama.Produk Jumlah Harga
## 1837 Sabun Cuci Piring 33 -
## 1838 Telur Ayam Negeri 79 -
## 1839 Bawang Putih 200 gram 26 -
## 1840 Gula Pasir Putih 1 kg 50 -
## 1841 Sabut Cuci Piring 6 -
## 1842 Roti Tawar 15 -
#Show the structure data
str(data)
## 'data.frame': 1842 obs. of 8 variables:
## $ No.Transaksi : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Jenis.Transaksi: chr "Stok Masuk" "Stok Masuk" "Stok Masuk" "Stok Masuk" ...
## $ Tanggal : chr "01-04-2020" "01-04-2020" "01-04-2020" "01-04-2020" ...
## $ Nama.Pelanggan : chr "-" "-" "-" "-" ...
## $ Kode.Produk : chr "Item-01" "Item-02" "Item-03" "Item-04" ...
## $ Nama.Produk : chr "Batere AA" "Bawang Putih 200 gram" "Gula Pasir Putih 1 kg" "Kopi Instant" ...
## $ Jumlah : int 44 26 50 27 100 56 6 25 15 67 ...
## $ Harga : chr "-" "-" "-" "-" ...
the data type of the Date (Tanggal) variable is character and immediately realized that for the purposes of subsequent analysis it would be easier if the Date (Tanggal) variable had a ‘date’ data type
data$Tanggal <- as.Date(data$Tanggal, "%d-%m-%Y")
#Check the sturcture data
str(data$Tanggal)
## Date[1:1842], format: "2020-04-01" "2020-04-01" "2020-04-01" "2020-04-01" "2020-04-01" ...
#Add a new column to store Month (Bulan) and Year (Tahun) data
data$Bulan_Tahun <- format(data$Tanggal, "%m-%Y")
#Show top 5 data
head(data,5)
## No.Transaksi Jenis.Transaksi Tanggal Nama.Pelanggan Kode.Produk
## 1 1 Stok Masuk 2020-04-01 - Item-01
## 2 1 Stok Masuk 2020-04-01 - Item-02
## 3 1 Stok Masuk 2020-04-01 - Item-03
## 4 1 Stok Masuk 2020-04-01 - Item-04
## 5 1 Stok Masuk 2020-04-01 - Item-05
## Nama.Produk Jumlah Harga Bulan_Tahun
## 1 Batere AA 44 - 04-2020
## 2 Bawang Putih 200 gram 26 - 04-2020
## 3 Gula Pasir Putih 1 kg 50 - 04-2020
## 4 Kopi Instant 27 - 04-2020
## 5 Mi Goreng Instant 100 - 04-2020
#Change the data type of the variable Harga to numeric
data$Harga <- as.numeric(data$Harga)
## Warning: NAs introduced by coercion
#Change data NA to 0
data$Harga[is.na(data$Harga)] <- 0
#Check the structure data
str(data$Harga)
## num [1:1842] 0 0 0 0 0 0 0 0 0 0 ...
#Show top 5 data
head(data,5)
## No.Transaksi Jenis.Transaksi Tanggal Nama.Pelanggan Kode.Produk
## 1 1 Stok Masuk 2020-04-01 - Item-01
## 2 1 Stok Masuk 2020-04-01 - Item-02
## 3 1 Stok Masuk 2020-04-01 - Item-03
## 4 1 Stok Masuk 2020-04-01 - Item-04
## 5 1 Stok Masuk 2020-04-01 - Item-05
## Nama.Produk Jumlah Harga Bulan_Tahun
## 1 Batere AA 44 0 04-2020
## 2 Bawang Putih 200 gram 26 0 04-2020
## 3 Gula Pasir Putih 1 kg 50 0 04-2020
## 4 Kopi Instant 27 0 04-2020
## 5 Mi Goreng Instant 100 0 04-2020
The first data that we must provide is data on sales in April to July to find out what the sales rate is like each month
#Then take the data with the type of transaction is Sales (Penjualan)
data_penjualan = data[data$Jenis.Transaksi=="Penjualan",]
#Perform the data aggregate function to get monthly sales
penjualan_perbulan = aggregate(x=data_penjualan$Jumlah,
by = list(Bulan_Tahun = data_penjualan$Bulan_Tahun),
FUN = sum)
#Take the bar plot out of Sales per month (penjualan_perbulan)
barplot(penjualan_perbulan$x,
names.arg =penjualan_perbulan$Bulan_Tahun,
xlab="Month",
ylab="Penjualan",
col="blue",
main="Penjualan perbulan",
border="red")
Sales of Xera Department Store Partners in April - June 2020 further declined. This can be seen from the bar plot of the number of sales of goods per month. The highest sales owned by Xera Department Store Partners were in April 2020 with 1753 items sold. But in June 2020, Xera Department Store Partners only sold 170 products.
#Determine which 10 customers have the biggest purchases!
#Issue data with transaction type is Sales
data_penjualan = data[data$Jenis.Transaksi=="Penjualan",]
#Perform the data aggregate function to get purchases per customer
pembelian_pelanggan=aggregate(
x=data_penjualan$Jumlah,
by =list(Pelanggan = data_penjualan$Nama.Pelanggan),
FUN = sum)
#Sort customer data by number of purchases from largest to smallest
pembelian_pelanggan = pembelian_pelanggan[order(-pembelian_pelanggan$x), ]
#Take the top 10 values from the data above
head(pembelian_pelanggan, 10)
## Pelanggan x
## 63 Wulan 87
## 35 Marta 85
## 68 Zahra 85
## 3 Andi 83
## 53 Siti 82
## 36 Mita, Mira 81
## 29 Kardi 77
## 26 Indah 73
## 18 Farah 71
## 31 Krisda 71
#Comparison of incoming and outgoing goods per month
aggregate(
x=data$Jumlah,
by = list(Bulan = data$Bulan_Tahun, Jenis_Transaksi = data$Jenis.Transaksi),
FUN = sum )
## Bulan Jenis_Transaksi x
## 1 04-2020 Penjualan 1753
## 2 05-2020 Penjualan 1485
## 3 06-2020 Penjualan 170
## 4 04-2020 Stok Masuk 4527
## 5 05-2020 Stok Masuk 4306
## 6 06-2020 Stok Masuk 4323
## 7 07-2020 Stok Masuk 4509
#Visualize data with appropriate charts
#Create a transaction table using the aggregate function
data_transaksi = aggregate(
x=data$Jumlah,
by = list(Bulan = data$Bulan_Tahun, Jenis_Transaksi = data$Jenis.Transaksi),
FUN = sum)
#Issue sales transaction data and incoming stock
data_penjualan <- data_transaksi[(data_transaksi$Jenis_Transaksi) == "Penjualan",]
data_stok_masuk <- data_transaksi[(data_transaksi$Jenis_Transaksi) == "Stok Masuk",]
#Combine the two data above using the merge function with left join
data_gabungan = merge(data_stok_masuk,data_penjualan,by='Bulan', all.x=TRUE)
data_gabungan = data.frame(Bulan = data_gabungan$Bulan,
Stok_Masuk = data_gabungan$x.x,
Penjualan = data_gabungan$x.y)
#Check if there is NA data. If there is a data NA, you can replace it with 0
data_gabungan$Penjualan[is.na(data_gabungan$Penjualan)] <- 0
#Change the format of the combined data by performing the transpose command. Then change the column name using month
data_gabung = t(as.matrix(data_gabungan[-1]))
colnames(data_gabung) = data_gabungan$Bulan
#Take out plot bars with multiple categories to compare incoming stock with sales. Then take the legend out of the barplot.
barplot(data_gabung,
main='Perbandingan Penjualan dengan Stok Masuk',
ylab='Jumlah Barang',
xlab='Bulan',
beside = TRUE,
col=c("red","blue"))
legend('topright',fill=c("red","blue"),legend=c('Stok Masuk','Penjualan'))
Earlier, we also conducted an analysis to compare the stock of incoming goods with sales at Xera Department Store Partners. Based on the analysis I did, it can be concluded that the stock of inventory at Xera Department Store Partners is still quite a lot considering that on average there are around 4000 items entered into Xera Department Store Partners while this store only sells around 1400-1700 items in April and May.
At this time I want to see if there is a relationship between the price of goods and the number of transactions, the next data that will be compared in value is the price of the goods with the number of transactions to find out whether there is a difference between the number of transactions and the price of goods
#Analysis of the relationship between the Price of Goods and the Transaction Amount
#Choose data with the type of Sales transaction
data <- data[(data$Jenis.Transaksi) == "Penjualan",]
#Turn price data into Integers
data$Harga <- as.integer(data$Harga)
#Change the NA value to 0
data$Harga[is.na(data$Harga)] <- 0
#Calculates the number of transactions based on the price range
data_transaksi <- aggregate(
x=data$No.Transaksi,
by = list(Harga = data$Harga),
FUN = length)
#Mengurutkan data dari harga termahal
data_transaksi = data_transaksi[order(-data_transaksi$Harga), ]
#Data visualization of the relationship of the price of goods with the number of transactions
#Before using the hist() command you need to break the transaction data above into vector data forms as follows
data_transaksi_freq = as.vector(rep(data_transaksi$Harga, data_transaksi$x))
#After getting the above data, we can output the histogram from the table above by using the hist() command
hist(data_transaksi_freq,
main="Hubungan antara harga barang dengan transaksi",
xlab="Rentang harga barang",
col="green"
)
I also use bar plots to see how the price range of goods relates to the number of transactions. From the results I got, I can find out that goods in the price range of Rp. 10,000-Rp. 15,000 are sold more than other price ranges. After that, goods with a price of less than Rp. 5000 took second place. I can conclude that items that are in this price range are the best-selling items.