In this analysis, I want to share how to use Apriori and EDA (exploratory data analysis) for market basket analysis. The data was generated by me in Microsoft Excel and represents one day of transactions at a bakery shop. There are two datasets: transaction data and customer data. The transaction data has 3 variables and records each item a customer bought, one row per item. The customer data has 9 variables and describes each customer who came to the store and made a transaction.
The bakery shop is located in South Jakarta and also sells beverages. I want to understand customer buying behaviour at the shop using the Apriori method and EDA.
The first step is to load the packages needed for this project and import the data.
library(tidyverse)
library(lubridate)
library(viridis)
library(ggthemes)
library(gridExtra)
library(ggridges)
library(arules)
library(arulesViz)
library(dplyr)
library(ggplot2)
x1 <- read.csv("C:/Users/user/Documents/SEM 5/BISNIS ANALITIK/ETS BA/dp.csv",header=TRUE,sep=";") %>% mutate(Jam_Transaksi=hms(Jam_Transaksi))
x2 <- read.csv("C:/Users/user/Documents/SEM 5/BISNIS ANALITIK/ETS BA/dt.csv",header=TRUE,sep=";")
In the code above, I name the transaction data x1 and the customer data x2. The transaction data has a time variable recording when each item was bought, which I parse with hms() so that the hour can be extracted easily later. Before going further, we must check whether the data has missing values.
print(sum(is.null(x2)))
## [1] 0
print(sum(is.null(x1)))
## [1] 0
x1[x1==0] <- NA
x2[x2==0] <- NA
print(sum(is.na(x2)))
## [1] 18
print(sum(is.na(x1)))
## [1] 0
missing_value <- subset(x2, is.na(x2$Banyak_Produk))
missing_value
## Transaksi Jam_Transaksi Lama_Antrian Jenis_Kelamin Pembayaran Daerah_Rumah
## 31 31 10:31:46 0:01:23 FEMALE CARD Jkt_Selatan
## 35 35 10:49:53 0:00:18 FEMALE CASH Jkt_Pusat
## 38 38 10:51:16 0:00:01 MALE E-WALLET Tanggerang
## 47 47 11:13:09 0:00:21 FEMALE CARD Jkt_Barat
## 55 55 11:33:35 0:00:31 FEMALE E-WALLET Bekasi
## 63 63 11:58:29 0:06:31 MALE E-WALLET Tanggerang
## 68 68 12:02:33 0:04:36 FEMALE CARD Jkt_Selatan
## 76 76 12:21:37 0:02:50 FEMALE CARD Jkt_Barat
## 86 86 12:47:29 0:00:02 FEMALE CARD Jkt_Selatan
## 87 87 12:48:29 0:01:43 FEMALE CARD Jkt_Pusat
## 101 101 13:01:53 0:00:40 FEMALE CARD Jkt_Pusat
## 113 113 13:22:22 0:05:13 MALE E-WALLET Tanggerang
## 115 115 13:24:06 0:00:24 FEMALE CARD Jkt_Selatan
## 120 120 13:33:23 0:04:47 FEMALE E-WALLET Jkt_Barat
## 122 122 13:36:51 0:02:26 FEMALE CARD Jkt_Timur
## 177 177 15:44:11 0:10:20 MALE E-WALLET Tanggerang
## 181 181 15:50:45 0:08:55 FEMALE CARD Jkt_Selatan
## 190 190 18:16:18 0:01:51 FEMALE CASH Jkt _Utara
## Type_Pelanggan Type_Dine Banyak_Produk Total_Harga
## 31 MEMBERSHIP ONLINE NA Rp0
## 35 MEMBERSHIP ONLINE NA Rp0
## 38 NORMAL TAKE_AWAY NA Rp0
## 47 NORMAL ONLINE NA Rp0
## 55 NORMAL TAKE_AWAY NA Rp0
## 63 NORMAL TAKE_AWAY NA Rp0
## 68 MEMBERSHIP ONLINE NA Rp0
## 76 NORMAL ONLINE NA Rp0
## 86 NORMAL ONLINE NA Rp0
## 87 MEMBERSHIP ONLINE NA Rp0
## 101 MEMBERSHIP ONLINE NA Rp0
## 113 NORMAL TAKE_AWAY NA Rp0
## 115 MEMBERSHIP ONLINE NA Rp0
## 120 NORMAL ONLINE NA Rp0
## 122 NORMAL ONLINE NA Rp0
## 177 NORMAL TAKE_AWAY NA Rp0
## 181 MEMBERSHIP ONLINE NA Rp0
## 190 MEMBERSHIP ONLINE NA Rp0
The is.null() check above reports 0 because it cannot detect missing values or zeros; it only tests whether the whole object is NULL. I therefore recoded every 0 as NA, treating a zero as a missing value. After that, x2 shows 18 missing values, all in Banyak_Produk. The solution is to delete those rows from x2, because those customers did not actually buy anything. However, x1 and x2 are linked by transaction, so any transaction removed from the customer data must also be removed from the transaction data.
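The difference between is.null() and is.na() can be sketched on a small example. This is a minimal illustration in base R, assuming a toy table with made-up values; only the column names Transaksi and Banyak_Produk come from the data above.

```r
# Toy customer table (hypothetical values); 0 items means "no purchase"
toy <- data.frame(
  Transaksi     = 1:5,
  Banyak_Produk = c(2, 0, 3, 0, 1)
)

# is.null() tests whether the whole object is NULL, so it returns one FALSE
is.null(toy)

# Recode zeros in the item-count column only, then count NAs per column
toy$Banyak_Produk[toy$Banyak_Produk == 0] <- NA
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only the rows without any missing value
toy_clean <- toy[complete.cases(toy), ]
print(toy_clean)
```

Counting NAs per column, rather than for the whole table, shows directly which variable the missing values come from.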
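x2 <- drop_na(x2)
# Keep only item rows whose transaction ID (column 1 of x1) is still present in x2
x1 <- x1[x1[[1]] %in% x2$Transaksi, ]
print(x2) # check whether any missing values remain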
## Transaksi Jam_Transaksi Lama_Antrian Jenis_Kelamin Pembayaran Daerah_Rumah
## 1 1 8:05:27 0:00:00 FEMALE CASH Jkt _Utara
## 2 2 8:12:37 0:00:56 FEMALE E-WALLET Bekasi
## 3 3 8:22:52 0:11:04 FEMALE CARD Jkt_Selatan
## 4 4 8:35:17 0:00:38 MALE E-WALLET Tanggerang
## 5 5 8:40:39 0:00:53 FEMALE E-WALLET Bekasi
## 6 6 8:41:18 0:04:22 FEMALE E-WALLET Jkt_Barat
## 7 7 8:41:56 0:04:53 FEMALE CARD Jkt_Selatan
## 8 8 8:47:50 0:03:16 MALE E-WALLET Tanggerang
## 9 9 8:50:20 0:02:16 MALE E-WALLET Tanggerang
## 10 10 8:53:16 0:02:22 MALE E-WALLET Tanggerang
## 11 11 9:00:37 0:03:44 FEMALE CARD Jkt_Selatan
## 12 12 9:14:11 0:00:13 MALE E-WALLET Tanggerang
## 13 13 9:19:23 0:00:35 MALE E-WALLET Tanggerang
## 14 14 9:33:56 0:09:44 FEMALE E-WALLET Bekasi
## 15 15 9:35:00 0:08:38 FEMALE CARD Jkt_Selatan
## 16 16 9:35:45 0:08:30 FEMALE CASH Jkt _Utara
## 17 17 9:41:24 0:04:04 FEMALE CASH Jkt_Pusat
## 18 18 9:42:40 0:01:09 FEMALE CARD Jkt_Selatan
## 19 19 9:55:12 0:00:05 FEMALE CARD Jkt_Pusat
## 20 20 10:02:39 0:05:33 FEMALE CARD Jkt_Selatan
## 21 21 10:08:26 0:00:57 MALE E-WALLET Depok
## 22 22 10:09:29 0:03:48 FEMALE CARD Jkt_Barat
## 23 23 10:12:06 0:08:53 FEMALE CARD Jkt_Pusat
## 24 24 10:12:15 0:08:32 FEMALE CASH Jkt _Utara
## 25 25 10:14:11 0:02:56 FEMALE CASH Jkt _Utara
## 26 26 10:16:30 0:00:19 FEMALE CARD Jkt_Selatan
## 27 27 10:22:11 0:01:05 FEMALE E-WALLET Bekasi
## 28 28 10:23:26 0:08:05 FEMALE E-WALLET Bekasi
## 29 29 10:23:33 0:00:44 FEMALE E-WALLET Bekasi
## 30 30 10:26:59 0:03:36 FEMALE E-WALLET Bekasi
## 31 32 10:44:59 0:06:11 MALE E-WALLET Tanggerang
## 32 33 10:46:15 0:01:36 FEMALE CARD Jkt_Barat
## 33 34 10:46:42 0:03:21 FEMALE CARD Jkt_Selatan
## 34 36 10:49:58 0:06:25 FEMALE E-WALLET Bekasi
## 35 37 10:50:19 0:09:33 MALE E-WALLET Tanggerang
## 36 39 10:54:02 0:02:59 FEMALE E-WALLET Bekasi
## 37 40 10:57:33 0:01:30 FEMALE CARD Jkt_Selatan
## 38 41 10:58:54 0:05:00 MALE E-WALLET Tanggerang
## 39 42 11:01:26 0:00:54 MALE E-WALLET Tanggerang
## 40 43 11:03:36 0:02:44 MALE E-WALLET Depok
## 41 44 11:06:46 0:09:06 MALE E-WALLET Depok
## 42 45 11:09:16 0:00:28 FEMALE E-WALLET Bekasi
## 43 46 11:10:51 0:00:12 FEMALE CARD Jkt_Pusat
## 44 48 11:13:41 0:01:00 FEMALE CARD Jkt_Selatan
## 45 49 11:15:29 0:01:21 MALE E-WALLET Tanggerang
## 46 50 11:15:59 0:01:57 FEMALE CARD Jkt_Selatan
## 47 51 11:17:23 0:02:27 MALE E-WALLET Tanggerang
## 48 52 11:22:55 0:00:03 MALE E-WALLET Tanggerang
## 49 53 11:25:11 0:00:50 MALE E-WALLET Depok
## 50 54 11:29:49 0:02:30 MALE E-WALLET Depok
## 51 56 11:38:57 0:01:58 FEMALE CARD Jkt_Timur
## 52 57 11:39:22 0:01:01 FEMALE CARD Jkt_Selatan
## 53 58 11:41:26 0:04:21 FEMALE CARD Jkt_Selatan
## 54 59 11:48:37 0:05:42 MALE E-WALLET Tanggerang
## 55 60 11:50:19 0:03:50 FEMALE CASH Jkt_Pusat
## 56 61 11:51:37 0:02:33 FEMALE CARD Jkt_Selatan
## 57 62 11:56:28 0:05:56 FEMALE CASH Jkt_Pusat
## 58 64 11:59:14 0:04:11 FEMALE CASH Jkt_Pusat
## 59 65 11:59:54 0:03:51 FEMALE CASH Jkt _Utara
## 60 66 12:00:05 0:02:16 MALE E-WALLET Depok
## 61 67 12:00:38 0:02:01 FEMALE CARD Jkt_Selatan
## 62 69 12:02:42 0:06:38 FEMALE E-WALLET Bekasi
## 63 70 12:03:29 0:00:07 MALE E-WALLET Tanggerang
## 64 71 12:07:16 0:04:03 MALE E-WALLET Tanggerang
## 65 72 12:07:24 0:03:37 MALE E-WALLET Tanggerang
## 66 73 12:13:22 0:01:21 FEMALE E-WALLET Jkt_Barat
## 67 74 12:14:11 0:04:25 FEMALE CARD Jkt_Selatan
## 68 75 12:20:21 0:17:16 MALE E-WALLET Tanggerang
## 69 77 12:22:11 0:01:27 FEMALE CARD Jkt_Barat
## 70 78 12:24:41 0:10:57 MALE E-WALLET Depok
## 71 79 12:25:36 0:00:32 MALE E-WALLET Tanggerang
## 72 80 12:27:51 0:01:12 FEMALE CASH Jkt_Pusat
## 73 81 12:28:36 0:01:38 FEMALE E-WALLET Bekasi
## 74 82 12:38:09 0:01:50 FEMALE CARD Jkt_Timur
## 75 83 12:42:25 0:02:36 MALE E-WALLET Tanggerang
## 76 84 12:46:06 0:03:05 FEMALE E-WALLET Bekasi
## 77 85 12:46:10 0:01:43 FEMALE E-WALLET Bekasi
## 78 88 12:49:52 0:04:12 FEMALE CARD Jkt_Selatan
## 79 89 12:50:59 0:03:39 FEMALE CARD Jkt_Timur
## 80 90 12:51:08 0:02:25 MALE E-WALLET Tanggerang
## 81 91 12:51:38 0:03:48 FEMALE E-WALLET Bekasi
## 82 92 12:52:56 0:15:52 MALE E-WALLET Depok
## 83 93 12:54:46 0:02:28 MALE E-WALLET Tanggerang
## 84 94 12:55:32 0:02:04 FEMALE CARD Jkt_Selatan
## 85 95 12:56:07 0:06:55 FEMALE CARD Jkt_Selatan
## 86 96 12:57:45 0:03:32 MALE E-WALLET Tanggerang
## 87 97 12:58:17 0:01:46 FEMALE E-WALLET Jkt_Barat
## 88 98 12:59:28 0:03:37 FEMALE CASH Jkt_Pusat
## 89 99 13:00:10 0:06:33 MALE E-WALLET Tanggerang
## 90 100 13:01:51 0:00:48 FEMALE E-WALLET Jkt_Barat
## 91 102 13:02:22 0:00:52 MALE E-WALLET Depok
## 92 103 13:05:15 0:00:12 FEMALE CARD Jkt_Pusat
## 93 104 13:06:32 0:04:39 MALE E-WALLET Depok
## 94 105 13:08:29 0:02:32 MALE E-WALLET Tanggerang
## 95 106 13:10:43 0:00:16 FEMALE CARD Jkt_Selatan
## 96 107 13:11:08 0:02:51 FEMALE CARD Jkt_Timur
## 97 108 13:11:20 0:00:37 FEMALE CASH Jkt_Pusat
## 98 109 13:13:41 0:02:00 MALE E-WALLET Depok
## 99 110 13:15:29 0:04:36 MALE E-WALLET Tanggerang
## 100 111 13:15:43 0:01:28 FEMALE CARD Jkt_Pusat
## 101 112 13:20:44 0:01:17 MALE E-WALLET Tanggerang
## 102 114 13:23:53 0:00:23 FEMALE E-WALLET Bekasi
## 103 116 13:26:00 0:03:50 FEMALE CARD Jkt_Selatan
## 104 117 13:31:22 0:03:11 MALE E-WALLET Tanggerang
## 105 118 13:32:19 0:03:08 FEMALE E-WALLET Jkt_Barat
## 106 119 13:32:26 0:02:15 FEMALE CARD Jkt_Pusat
## 107 121 13:35:57 0:01:17 MALE E-WALLET Tanggerang
## 108 123 13:38:08 0:01:42 FEMALE CARD Jkt_Selatan
## 109 124 13:38:50 0:01:47 FEMALE E-WALLET Bekasi
## 110 125 13:40:49 0:00:16 MALE E-WALLET Tanggerang
## 111 126 13:40:49 0:01:49 MALE E-WALLET Tanggerang
## 112 127 13:42:55 0:12:02 MALE E-WALLET Tanggerang
## 113 128 13:43:15 0:03:28 FEMALE E-WALLET Jkt_Barat
## 114 129 13:49:35 0:03:26 FEMALE CARD Jkt_Pusat
## 115 130 13:49:47 0:07:26 FEMALE CARD Jkt_Selatan
## 116 131 13:50:14 0:07:38 MALE E-WALLET Tanggerang
## 117 132 13:53:05 0:01:34 FEMALE CASH Jkt_Pusat
## 118 133 13:53:58 0:00:18 FEMALE E-WALLET Bekasi
## 119 134 13:55:34 0:02:21 MALE E-WALLET Tanggerang
## 120 135 13:56:49 0:00:07 FEMALE CARD Jkt_Timur
## 121 136 13:57:26 0:05:58 FEMALE CASH Jkt_Pusat
## 122 137 13:58:31 0:00:20 MALE E-WALLET Depok
## 123 138 13:58:38 0:02:35 MALE E-WALLET Depok
## 124 139 13:59:56 0:00:48 FEMALE E-WALLET Bekasi
## 125 140 14:01:21 0:07:53 FEMALE CARD Jkt_Selatan
## 126 141 14:01:34 0:02:44 FEMALE E-WALLET Jkt_Barat
## 127 142 14:03:01 0:01:36 FEMALE CARD Jkt_Selatan
## 128 143 14:04:55 0:00:29 MALE E-WALLET Tanggerang
## 129 144 14:10:24 0:00:40 FEMALE CARD Jkt_Selatan
## 130 145 14:10:39 0:01:46 FEMALE CARD Jkt_Barat
## 131 146 14:20:16 0:02:42 FEMALE CARD Jkt_Selatan
## 132 147 14:20:48 0:01:12 MALE E-WALLET Depok
## 133 148 14:28:19 0:03:40 FEMALE CARD Jkt_Selatan
## 134 149 14:33:03 0:08:24 MALE E-WALLET Tanggerang
## 135 150 14:37:35 0:00:08 FEMALE E-WALLET Bekasi
## 136 151 14:39:41 0:01:38 FEMALE CARD Jkt_Selatan
## 137 152 14:41:26 0:03:57 MALE E-WALLET Tanggerang
## 138 153 14:42:40 0:12:04 MALE E-WALLET Tanggerang
## 139 154 14:48:13 0:04:05 FEMALE CARD Jkt_Timur
## 140 155 14:51:34 0:02:23 FEMALE E-WALLET Jkt_Barat
## 141 156 14:51:55 0:01:00 MALE E-WALLET Tanggerang
## 142 157 14:52:21 0:02:50 FEMALE CASH Jkt_Pusat
## 143 158 14:57:40 0:01:43 MALE E-WALLET Depok
## 144 159 14:57:52 0:03:12 FEMALE E-WALLET Bekasi
## 145 160 14:57:54 0:01:03 FEMALE CASH Jkt_Pusat
## 146 161 14:58:24 0:00:04 FEMALE E-WALLET Bekasi
## 147 162 15:01:49 0:14:08 FEMALE CARD Jkt_Selatan
## 148 163 15:07:52 0:03:18 MALE E-WALLET Tanggerang
## 149 164 15:11:30 0:05:12 FEMALE E-WALLET Bekasi
## 150 165 15:16:15 0:01:20 MALE E-WALLET Tanggerang
## 151 166 15:16:45 0:00:18 FEMALE CARD Jkt_Selatan
## 152 167 15:17:24 0:03:40 MALE E-WALLET Tanggerang
## 153 168 15:19:37 0:00:43 FEMALE CARD Jkt_Pusat
## 154 169 15:30:22 0:01:37 FEMALE CASH Jkt_Pusat
## 155 170 15:34:24 0:03:03 MALE E-WALLET Tanggerang
## 156 171 15:34:28 0:04:19 FEMALE CARD Jkt_Selatan
## 157 172 15:36:46 0:00:57 MALE E-WALLET Depok
## 158 173 15:37:03 0:03:48 FEMALE CARD Jkt_Selatan
## 159 174 15:38:04 0:03:53 FEMALE CARD Jkt_Selatan
## 160 175 15:42:26 0:04:26 FEMALE CARD Jkt_Pusat
## 161 176 15:42:36 0:02:30 MALE E-WALLET Tanggerang
## 162 178 15:44:39 0:01:52 MALE E-WALLET Tanggerang
## 163 179 15:44:48 0:08:59 FEMALE E-WALLET Bekasi
## 164 180 15:50:38 0:07:47 MALE E-WALLET Tanggerang
## 165 182 15:53:03 0:00:53 FEMALE CARD Jkt_Pusat
## 166 183 16:09:09 0:00:14 FEMALE CASH Jkt _Utara
## 167 184 16:18:50 0:01:04 FEMALE CASH Jkt_Pusat
## 168 185 17:03:11 0:01:09 MALE E-WALLET Tanggerang
## 169 186 17:18:48 0:02:06 MALE E-WALLET Tanggerang
## 170 187 17:28:42 0:08:58 FEMALE CARD Jkt_Selatan
## 171 188 17:50:06 0:01:04 FEMALE CASH Jkt_Pusat
## 172 189 18:12:47 0:00:16 FEMALE E-WALLET Bekasi
## 173 191 18:26:02 0:07:49 FEMALE CASH Jkt_Pusat
## 174 192 18:28:17 0:05:12 FEMALE CARD Jkt_Barat
## 175 193 18:31:33 0:00:05 MALE E-WALLET Tanggerang
## 176 194 18:34:43 0:01:21 MALE E-WALLET Tanggerang
## 177 195 18:38:03 0:09:54 FEMALE E-WALLET Bekasi
## 178 196 18:44:20 0:05:18 FEMALE CASH Jkt_Pusat
## 179 197 18:48:53 0:00:08 FEMALE CARD Jkt_Barat
## 180 198 18:53:12 0:00:42 FEMALE CARD Jkt_Selatan
## 181 199 18:54:34 0:01:20 MALE E-WALLET Tanggerang
## Type_Pelanggan Type_Dine Banyak_Produk Total_Harga
## 1 MEMBERSHIP ONLINE 5 Rp65,000
## 2 NORMAL TAKE_AWAY 5 Rp51,000
## 3 MEMBERSHIP ONLINE 2 Rp18,000
## 4 NORMAL TAKE_AWAY 7 Rp87,000
## 5 NORMAL TAKE_AWAY 1 Rp10,000
## 6 NORMAL ONLINE 5 Rp52,000
## 7 NORMAL ONLINE 1 Rp10,000
## 8 NORMAL DINE_IN 1 Rp20,000
## 9 NORMAL TAKE_AWAY 1 Rp20,000
## 10 NORMAL TAKE_AWAY 6 Rp75,000
## 11 MEMBERSHIP ONLINE 3 Rp42,000
## 12 NORMAL DINE_IN 2 Rp28,000
## 13 NORMAL TAKE_AWAY 2 Rp18,000
## 14 NORMAL ONLINE 2 Rp23,000
## 15 NORMAL ONLINE 4 Rp48,000
## 16 MEMBERSHIP ONLINE 2 Rp18,000
## 17 MEMBERSHIP ONLINE 3 Rp33,000
## 18 NORMAL ONLINE 2 Rp16,000
## 19 MEMBERSHIP ONLINE 3 Rp34,000
## 20 MEMBERSHIP ONLINE 3 Rp30,000
## 21 NORMAL DINE_IN 1 Rp10,000
## 22 NORMAL ONLINE 2 Rp25,000
## 23 MEMBERSHIP ONLINE 3 Rp32,000
## 24 MEMBERSHIP ONLINE 2 Rp18,000
## 25 MEMBERSHIP ONLINE 2 Rp30,000
## 26 NORMAL ONLINE 3 Rp33,000
## 27 NORMAL ONLINE 3 Rp35,000
## 28 NORMAL TAKE_AWAY 1 Rp10,000
## 29 NORMAL TAKE_AWAY 3 Rp33,000
## 30 NORMAL TAKE_AWAY 8 Rp100,000
## 31 NORMAL TAKE_AWAY 3 Rp38,000
## 32 NORMAL ONLINE 2 Rp23,000
## 33 NORMAL ONLINE 2 Rp20,000
## 34 NORMAL TAKE_AWAY 3 Rp47,000
## 35 NORMAL TAKE_AWAY 3 Rp45,000
## 36 NORMAL ONLINE 3 Rp26,000
## 37 NORMAL ONLINE 2 Rp25,000
## 38 NORMAL DINE_IN 1 Rp10,000
## 39 NORMAL TAKE_AWAY 2 Rp20,000
## 40 NORMAL DINE_IN 4 Rp41,000
## 41 NORMAL DINE_IN 3 Rp35,000
## 42 NORMAL TAKE_AWAY 3 Rp28,000
## 43 MEMBERSHIP ONLINE 2 Rp35,000
## 44 MEMBERSHIP ONLINE 2 Rp19,000
## 45 NORMAL DINE_IN 3 Rp50,000
## 46 MEMBERSHIP ONLINE 1 Rp15,000
## 47 NORMAL TAKE_AWAY 2 Rp20,000
## 48 NORMAL TAKE_AWAY 4 Rp50,000
## 49 NORMAL DINE_IN 1 Rp15,000
## 50 NORMAL DINE_IN 3 Rp28,000
## 51 NORMAL ONLINE 3 Rp38,000
## 52 MEMBERSHIP ONLINE 3 Rp28,000
## 53 MEMBERSHIP ONLINE 5 Rp47,000
## 54 NORMAL TAKE_AWAY 3 Rp45,000
## 55 MEMBERSHIP ONLINE 2 Rp23,000
## 56 NORMAL ONLINE 2 Rp30,000
## 57 MEMBERSHIP ONLINE 4 Rp43,000
## 58 MEMBERSHIP ONLINE 3 Rp42,000
## 59 MEMBERSHIP ONLINE 3 Rp38,000
## 60 NORMAL DINE_IN 1 Rp10,000
## 61 MEMBERSHIP ONLINE 3 Rp28,000
## 62 NORMAL TAKE_AWAY 4 Rp43,000
## 63 NORMAL TAKE_AWAY 4 Rp43,000
## 64 NORMAL TAKE_AWAY 1 Rp15,000
## 65 NORMAL TAKE_AWAY 6 Rp74,000
## 66 NORMAL ONLINE 2 Rp16,000
## 67 MEMBERSHIP ONLINE 3 Rp40,000
## 68 NORMAL TAKE_AWAY 5 Rp62,000
## 69 NORMAL ONLINE 6 Rp77,000
## 70 NORMAL DINE_IN 3 Rp35,000
## 71 NORMAL TAKE_AWAY 2 Rp35,000
## 72 MEMBERSHIP ONLINE 1 Rp10,000
## 73 NORMAL TAKE_AWAY 1 Rp20,000
## 74 NORMAL ONLINE 3 Rp31,000
## 75 NORMAL TAKE_AWAY 4 Rp38,000
## 76 NORMAL TAKE_AWAY 3 Rp28,000
## 77 NORMAL TAKE_AWAY 6 Rp63,000
## 78 MEMBERSHIP ONLINE 3 Rp35,000
## 79 NORMAL ONLINE 2 Rp20,000
## 80 NORMAL TAKE_AWAY 1 Rp15,000
## 81 NORMAL ONLINE 3 Rp35,000
## 82 NORMAL DINE_IN 4 Rp46,000
## 83 NORMAL TAKE_AWAY 1 Rp15,000
## 84 NORMAL ONLINE 2 Rp35,000
## 85 MEMBERSHIP ONLINE 3 Rp35,000
## 86 NORMAL TAKE_AWAY 1 Rp20,000
## 87 NORMAL ONLINE 1 Rp8,000
## 88 MEMBERSHIP ONLINE 3 Rp35,000
## 89 NORMAL TAKE_AWAY 2 Rp30,000
## 90 NORMAL ONLINE 1 Rp10,000
## 91 NORMAL DINE_IN 6 Rp92,000
## 92 MEMBERSHIP ONLINE 2 Rp18,000
## 93 NORMAL DINE_IN 1 Rp15,000
## 94 NORMAL TAKE_AWAY 5 Rp58,000
## 95 NORMAL ONLINE 2 Rp23,000
## 96 NORMAL ONLINE 2 Rp30,000
## 97 MEMBERSHIP ONLINE 5 Rp46,000
## 98 NORMAL DINE_IN 3 Rp35,000
## 99 NORMAL TAKE_AWAY 2 Rp24,000
## 100 MEMBERSHIP ONLINE 1 Rp15,000
## 101 NORMAL DINE_IN 3 Rp38,000
## 102 NORMAL TAKE_AWAY 6 Rp70,000
## 103 MEMBERSHIP ONLINE 3 Rp33,000
## 104 NORMAL TAKE_AWAY 2 Rp16,000
## 105 NORMAL ONLINE 4 Rp43,000
## 106 MEMBERSHIP ONLINE 2 Rp27,000
## 107 NORMAL TAKE_AWAY 1 Rp20,000
## 108 MEMBERSHIP ONLINE 3 Rp35,000
## 109 NORMAL TAKE_AWAY 2 Rp23,000
## 110 NORMAL TAKE_AWAY 4 Rp50,000
## 111 NORMAL TAKE_AWAY 2 Rp18,000
## 112 NORMAL TAKE_AWAY 1 Rp20,000
## 113 NORMAL ONLINE 2 Rp25,000
## 114 MEMBERSHIP ONLINE 3 Rp50,000
## 115 NORMAL ONLINE 4 Rp60,000
## 116 NORMAL TAKE_AWAY 3 Rp45,000
## 117 MEMBERSHIP ONLINE 2 Rp27,000
## 118 NORMAL ONLINE 3 Rp33,000
## 119 NORMAL TAKE_AWAY 2 Rp35,000
## 120 NORMAL ONLINE 2 Rp30,000
## 121 MEMBERSHIP ONLINE 2 Rp35,000
## 122 NORMAL DINE_IN 4 Rp43,000
## 123 NORMAL DINE_IN 2 Rp20,000
## 124 NORMAL TAKE_AWAY 4 Rp50,000
## 125 NORMAL ONLINE 3 Rp37,000
## 126 NORMAL ONLINE 1 Rp10,000
## 127 NORMAL ONLINE 4 Rp46,000
## 128 NORMAL DINE_IN 1 Rp8,000
## 129 NORMAL ONLINE 5 Rp62,000
## 130 NORMAL ONLINE 1 Rp12,000
## 131 MEMBERSHIP ONLINE 2 Rp25,000
## 132 NORMAL DINE_IN 3 Rp30,000
## 133 MEMBERSHIP ONLINE 1 Rp15,000
## 134 NORMAL TAKE_AWAY 6 Rp74,000
## 135 NORMAL TAKE_AWAY 6 Rp63,000
## 136 MEMBERSHIP ONLINE 3 Rp37,000
## 137 NORMAL TAKE_AWAY 5 Rp60,000
## 138 NORMAL TAKE_AWAY 5 Rp50,000
## 139 NORMAL ONLINE 1 Rp15,000
## 140 NORMAL ONLINE 3 Rp40,000
## 141 NORMAL DINE_IN 6 Rp66,000
## 142 MEMBERSHIP ONLINE 4 Rp52,000
## 143 NORMAL DINE_IN 3 Rp29,000
## 144 NORMAL TAKE_AWAY 1 Rp8,000
## 145 MEMBERSHIP ONLINE 5 Rp50,000
## 146 NORMAL ONLINE 3 Rp34,000
## 147 MEMBERSHIP ONLINE 2 Rp18,000
## 148 NORMAL TAKE_AWAY 2 Rp18,000
## 149 NORMAL TAKE_AWAY 2 Rp28,000
## 150 NORMAL TAKE_AWAY 3 Rp33,000
## 151 MEMBERSHIP ONLINE 3 Rp41,000
## 152 NORMAL TAKE_AWAY 1 Rp8,000
## 153 MEMBERSHIP ONLINE 3 Rp35,000
## 154 MEMBERSHIP ONLINE 2 Rp25,000
## 155 NORMAL TAKE_AWAY 2 Rp18,000
## 156 NORMAL ONLINE 2 Rp20,000
## 157 NORMAL DINE_IN 1 Rp8,000
## 158 MEMBERSHIP ONLINE 1 Rp15,000
## 159 NORMAL ONLINE 3 Rp40,000
## 160 MEMBERSHIP ONLINE 3 Rp32,000
## 161 NORMAL TAKE_AWAY 4 Rp45,000
## 162 NORMAL TAKE_AWAY 4 Rp50,000
## 163 NORMAL TAKE_AWAY 1 Rp8,000
## 164 NORMAL TAKE_AWAY 4 Rp60,000
## 165 MEMBERSHIP ONLINE 2 Rp30,000
## 166 MEMBERSHIP ONLINE 1 Rp12,000
## 167 MEMBERSHIP ONLINE 2 Rp23,000
## 168 NORMAL TAKE_AWAY 1 Rp15,000
## 169 NORMAL DINE_IN 2 Rp32,000
## 170 MEMBERSHIP ONLINE 4 Rp39,000
## 171 MEMBERSHIP ONLINE 4 Rp41,000
## 172 NORMAL TAKE_AWAY 3 Rp35,000
## 173 MEMBERSHIP ONLINE 4 Rp48,000
## 174 NORMAL ONLINE 3 Rp28,000
## 175 NORMAL TAKE_AWAY 4 Rp45,000
## 176 NORMAL TAKE_AWAY 1 Rp15,000
## 177 NORMAL TAKE_AWAY 1 Rp8,000
## 178 MEMBERSHIP ONLINE 2 Rp20,000
## 179 NORMAL ONLINE 1 Rp12,000
## 180 MEMBERSHIP ONLINE 1 Rp15,000
## 181 NORMAL TAKE_AWAY 4 Rp38,000
As we can see, there are no more missing values, so we can continue the analysis.
The transaction data includes the transaction time (Jam_Transaksi). This variable can tell us at what time the bakery is busiest.
Grafik1 <- x1 %>%
mutate(Hour = as.factor(hour(x1$Jam_Transaksi))) %>%
group_by(Hour) %>% summarise(Count=n()) %>%
ggplot(aes(x=Hour,y=Count,fill=Count))+
theme_fivethirtyeight()+
geom_bar(stat="identity")+
ggtitle("Transaction by Hour")+
theme(legend.position="none")
Grafik1
From the graph above, purchases increase from 10:00 until 13:00 and then keep decreasing from 13:00 until 16:00. So I interpret that most purchases happen around lunch time.
Grafik2 <- x1 %>%
group_by(Nama_Produk) %>%
summarise(Count = n()) %>%
arrange(desc(Count)) %>%
ggplot(aes(x=reorder(Nama_Produk,Count),y=Count,fill=Nama_Produk))+
geom_bar(stat="identity")+
coord_flip()+
ggtitle("Bakery Signature Product")+
labs(y= "", x = "Product")+
theme(legend.position="none")
Grafik2
From the graph above, the most purchased product is Kopi (coffee), followed by Teh (tea), and so on.
The customer data records the total number of items each customer bought (Banyak_Produk). The same quantity can be derived from the transaction data: dividing the number of item rows by the number of unique transactions in each hour gives the average number of items bought per transaction at that hour.
Grafik3.1 <- x1 %>%
mutate(Hour = as.factor(hour(x1$Jam_Transaksi)))%>%
group_by(Hour) %>%
summarise(Count= n())
Grafik3.2 <- x1 %>%
mutate(Hour = as.factor(hour(x1$Jam_Transaksi)))%>%
group_by(Hour,x1$Transkasi) %>%
summarise(n_distinct(x1$Transkasi)) %>%
summarise(Count=n())
Grafik3.3 <- data.frame(Grafik3.1, # hour, item rows per hour
Grafik3.2[2], # unique transactions per hour
Grafik3.1[2]/Grafik3.2[2]) # items per unique transaction
colnames(Grafik3.3) <- c("Hour","Line","Unique","Items.Trans")
Grafik3 <-
ggplot(Grafik3.3,aes(x=Hour,y=Items.Trans,fill=Items.Trans))+
theme_fivethirtyeight()+
geom_bar(stat="identity")+
ggtitle("Items per Transaction by Hour")+
theme(legend.position="none")+
geom_text(aes(label=round(Items.Trans,1)), vjust=2)
Grafik3
The graph above shows the average number of items bought per transaction in each hour. The highest averages are at 08:00 and 14:00. At 08:00, a possible explanation is that customers who skipped breakfast at home buy more food and drinks than usual. At 14:00, which is still around lunch time, people tend to eat more, so they buy more than average.
The customer data has many more variables we can explore, such as membership, payment method, and so on.
The customer data shows that the bakery shop has member customers. This variable tells us how many of the customers who made a purchase were members.
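The items-per-transaction figure can also be computed in a single grouped summary. The sketch below assumes a toy item-level table with made-up values, in which Transaksi is the transaction ID and Hour has already been extracted; it is an illustration, not the code that produced the graph.

```r
library(dplyr)

# Toy item-level data: one row per purchased item (hypothetical values)
toy <- data.frame(
  Transaksi = c(1, 1, 1, 2, 3, 3, 4),
  Hour      = c(8, 8, 8, 8, 9, 9, 9)
)

items_per_trans <- toy %>%
  group_by(Hour) %>%
  summarise(
    Line        = n(),                   # item rows in this hour
    Unique      = n_distinct(Transaksi), # distinct transactions in this hour
    Items.Trans = Line / Unique          # average items per transaction
  )

print(items_per_trans)
```

Keeping the three quantities in one summarise() call avoids building separate tables and binding them by position.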
member <- x2 %>%
group_by(Type_Pelanggan) %>%
count() %>%
ungroup() %>%
mutate(per=`n`/sum(`n`)) %>%
arrange(desc(Type_Pelanggan))
member$label <- scales::percent(member$per)
ggplot(member, aes(x = "", y = per, fill = Type_Pelanggan)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  geom_text(aes(x = 1, y = cumsum(per) - per/2, label = label))
The pie chart above shows that 29% of the customers who made a purchase were members of the bakery shop, and the rest were normal (non-member) customers.
Besides the type of customer, the customer data also tells us which payment method each customer used.
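The shares behind a pie chart like this one can be checked quickly in base R. The snippet is a small sketch on made-up customer types; the vector `type` is hypothetical and stands in for a column such as Type_Pelanggan.

```r
# Hypothetical customer types, standing in for a column of the customer data
type <- c("MEMBERSHIP", "NORMAL", "NORMAL", "NORMAL", "MEMBERSHIP",
          "NORMAL", "NORMAL", "NORMAL", "NORMAL", "NORMAL")

# table() counts each category; prop.table() turns the counts into shares
share <- prop.table(table(type))

# Percentage labels, rounded to whole numbers
labels <- paste0(round(100 * as.numeric(share)), "%")

print(share)
print(labels)
```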
payment <- x2 %>%
group_by(Pembayaran) %>%
count() %>%
ungroup() %>%
mutate(per=`n`/sum(`n`)) %>%
arrange(desc(Pembayaran))
payment$label <- scales::percent(payment$per)
ggplot(payment, aes(x = "", y = per, fill = Pembayaran)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  geom_text(aes(x = 1, y = cumsum(per) - per/2, label = label))
Now we know that most customers paid with an e-wallet (GoPay, OVO, etc.), 56% of all customers that day; 32% paid with a debit card, and the rest paid with cash.
The customer data also shows each customer's domicile. Now we can see where most buyers live.
Grafik6 <- x2 %>% # chart of customers' home areas
group_by(Daerah_Rumah) %>%
summarise(Count = n()) %>%
arrange(desc(Count)) %>%
ggplot(aes(x=Daerah_Rumah,y=Count,fill=Daerah_Rumah))+
geom_bar(stat="identity")+
ggtitle("Customer Home Area")+
theme(legend.position="none")
Grafik6
From the bar plot above, most buyers that day were customers who live in Tangerang (recorded as Tanggerang in the data).
The customer data also records customer gender, which can give us some new information.
jk <- x2 %>%
group_by(Jenis_Kelamin) %>%
count() %>%
ungroup() %>%
mutate(per=`n`/sum(`n`)) %>%
arrange(desc(Jenis_Kelamin))
jk$label <- scales::percent(jk$per)
ggplot(jk, aes(x = "", y = per, fill = Jenis_Kelamin)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  geom_text(aes(x = 1, y = cumsum(per) - per/2, label = label))
As you can see, most of the customers are female. Based on this insight, the bakery owner could boost sales by adding a promo such as “TGIF (Thanks God Is Female)”, where female members get a promotion. This type of promo can increase both membership and sales.
The customer data also has a variable for the dine type of each customer.
dt <- x2 %>%
group_by(Type_Dine) %>%
count() %>%
ungroup() %>%
mutate(per=`n`/sum(`n`)) %>%
arrange(desc(Type_Dine))
dt$label <- scales::percent(dt$per)
ggplot(dt, aes(x = "", y = per, fill = Type_Dine)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  theme_void() +
  geom_text(aes(x = 1, y = cumsum(per) - per/2, label = label))
From the pie chart above, most customers order online (GoFood and GrabFood).
First, let me explain what Apriori is. Imagine a grocery shop visited by housewives, college students, and so on. Everyone has their own needs when they shop: a housewife may buy diapers and milk for her child, while a college student tends to buy chips and sweet drinks. These buying patterns can help increase sales in several ways. For example, if a pair of items A and B is frequently bought together, the store can put A and B on the same shelf so that buyers see both easily, or offer a discount or promotion for buying A and B together.
In the Apriori method, there are three common measures of association:
1. Support: the proportion of all transactions that contain the item combination. The greater the support, the more dominant the combination is among all transactions.
2. Confidence: the probability that the right-hand-side product is purchased given that the left-hand-side products are purchased.
3. Lift: the ratio of the rule's confidence to the support of the right-hand-side product, which indicates how strongly the items are associated. A lift of 1 means the items are independent, a lift greater than 1 means a positive association, and a lift less than 1 means a negative association.
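To make the three measures concrete, here is a minimal base R sketch on five made-up baskets. The baskets and the helper function `support()` are hypothetical and only borrow product names from the bakery data; arules computes the same quantities internally.

```r
# Five hypothetical baskets; item names borrowed from the bakery data
baskets <- list(
  c("Kopi", "Roti Abon"),
  c("Kopi", "Roti Abon", "Roti Cokelat"),
  c("Kopi"),
  c("Teh", "Roti Abon"),
  c("Teh")
)

# Share of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "Roti Abon"
rhs <- "Kopi"

supp_rule <- support(c(lhs, rhs))      # share of baskets with both items
conf_rule <- supp_rule / support(lhs)  # share of lhs baskets that also have rhs
lift_rule <- conf_rule / support(rhs)  # confidence relative to the rhs share

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

In this toy example the rule {Roti Abon} => {Kopi} has a lift slightly above 1, so Kopi appears a little more often in baskets with Roti Abon than in baskets overall.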
y <- read.transactions("C:/Users/user/Documents/SEM 5/BISNIS ANALITIK/ETS BA/dp.csv",format="single",cols=c(1,3),sep=";"
)
rules_a <- apriori(y,parameter=list(support=0.01,confidence=.6,maxlen=3))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 3 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 1
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[9 item(s), 182 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.01s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_a
## set of 8 rules
inspect(head(rules_a,by="support",n=3))
## lhs rhs support confidence coverage
## [1] {Roti Abon,Roti Cokelat} => {Kopi} 0.04395604 0.6153846 0.07142857
## [2] {Roti Abon,Roti Keju} => {Roti Cokelat} 0.03296703 0.6666667 0.04945055
## [3] {Roti Abon,Roti Keju} => {Kopi} 0.03296703 0.6666667 0.04945055
## lift count
## [1] 1.191489 8
## [2] 1.733333 6
## [3] 1.290780 6
The table above shows the 3 rules with the highest support, mined with a minimum support of 0.01, a minimum confidence of 0.6, and maxlen = 3, the maximum number of items in a rule. The first rule says that customers who bought Roti Abon and Roti Cokelat that day also tended to buy Kopi; 8 transactions contained this combination. Its support is 0.04, its confidence is 0.62, and its lift is 1.19. The support is low because only a few transactions contain this exact combination, which is acceptable for a one-day sample; the confidence is above 60% and the lift is greater than 1. The other rules are interpreted the same way. Note that the log reports 182 transactions and 9 items, while 181 customers remain after cleaning; the extra transaction is probably the CSV header row read as data, which passing header = TRUE to read.transactions() would likely avoid.
Looking carefully at the table, the right-hand side of most rules is either Kopi or Roti Cokelat. So, if the owner wants to boost sales, my suggestion is a promotion: buy 2 items (choose from Roti Abon, Roti Cokelat, or Roti Keju) and get 1 free Kopi. It would be even better to offer this promotion to members only, so that customers who are not yet members become interested in joining.
Here is a graph of the Apriori rules.
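The numbers reported for rule [1] can be reproduced by hand from the counts in the output. In this sketch the inputs are copied from the apriori() log and the inspect() table; the variable names are my own, and the implied share of Kopi is back-calculated from the reported lift rather than read from the data.

```r
n_trans    <- 182         # transactions reported in the apriori() log
count_rule <- 8           # "count" column for rule [1]
coverage   <- 0.07142857  # "coverage" column: support of the left-hand side
lift_rep   <- 1.191489    # "lift" column for rule [1]

support_rule <- count_rule / n_trans         # share of transactions with all three items
count_lhs    <- round(coverage * n_trans)    # transactions with Roti Abon and Roti Cokelat
confidence   <- count_rule / count_lhs       # share of those that also contain Kopi
support_rhs  <- confidence / lift_rep        # implied share of transactions containing Kopi

print(c(support = support_rule, confidence = confidence, support_rhs = support_rhs))
```

The implied share of Kopi is a little over one half, which is why a confidence of about 0.62 translates into a lift only slightly above 1.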
# Alternative views of the same rules:
# plot(rules_a, method="paracoord", control=list(reorder=TRUE))
# plot(rules_a, method="two-key plot")
plot(rules_a, method="graph")
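Besides plotting, the rules can be ranked by lift instead of support. The sketch below runs the same arules workflow on a tiny made-up table in long format (one row per item); the data frame `toy` and its values are hypothetical, and the thresholds are chosen only so that the toy data yields a couple of rules.

```r
library(arules)

# Hypothetical long-format data: one row per purchased item
toy <- data.frame(
  Transaksi   = c(1, 1, 2, 2, 3, 3, 3, 4),
  Nama_Produk = c("Kopi", "Roti Abon", "Kopi", "Roti Abon",
                  "Kopi", "Roti Abon", "Teh", "Teh")
)

# Group items by transaction ID and convert the list to a transactions object
baskets <- split(toy$Nama_Produk, toy$Transaksi)
trans   <- as(baskets, "transactions")

# Mine rules with at least two items, without printing the progress log
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.6, minlen = 2),
                 control   = list(verbose = FALSE))

# Show the rules ordered by lift, highest first
inspect(sort(rules, by = "lift"))
```

Sorting by lift puts the rules whose items are most strongly associated at the top, which can differ from the order by support.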
Here are some of my recommendations for the bakery shop:
2. To increase customer loyalty, hold a promo for every member who successfully invites friends to join as members.
3. Cooperate with various cashless payment providers to offer attractive discounts for new customers.
4. Establish new promotions for members only, at the owner's discretion. My recommendations are:
- Promotion: buy 2 (choose from Roti Abon, Roti Cokelat, or Roti Keju), get 1 Kopi
- TGIF (Thanks God Is Female): a discount for female members who shop in store