1 Generate a Data Frame

Suppose you are a data scientist who has been given a project at a start-up company, for instance Kopi Kenangan. You are asked to generate a data set that could plausibly come from their daily sales. What kind of data set can you generate? Here, I assume you want to provide them with the following data set:

  • Id: there are 5000 transactions.
  • Date: the dates of the 5000 transactions, starting from 2018/01/01.
  • Name: create 20 random cashier names (you can use the names of your classmates, including yourself) to cover all 5000 transactions at Kopi Kenangan.
  • City: allocate these 5000 transactions to the biggest cities in Indonesia (in equal proportions). Here I assume,
    • Jakarta
    • Bogor
    • Depok
    • Tangerang
    • Bekasi
  • Outlet: allocate these 5000 transactions to five outlets. Here I assume,
    • Outlet 1
    • Outlet 2
    • Outlet 3
    • Outlet 4
    • Outlet 5
  • Menu: generate random sales of menu items for the 5000 transactions at Kopi Kenangan. Here, I assume,
    • Cappucino
    • Es Kopi Susu
    • Hot Caramel Latte
    • Hot Chocolate
    • Hot Red Velvet Latte
    • Ice Americano
    • Ice Berry Coffe
    • Ice Cafe Latte
    • Ice Caramel Latte
    • Ice Coffee Avocado
    • Ice Coffee Lite
    • Ice Matcha Espresso
    • Ice Matcha Latte
    • Ice Red Velvet Latte
  • Price: generate a random price for each menu item above (min = 18000, max = 45000).
  • Discount: generate a random discount for each menu item above (min = 0.05, max = 0.12).
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
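# Transaction IDs 1 to 5000
ID <- 1:5000
# Draw 5000 dates (with replacement) between 2018-01-01 and 2020-09-22,
# then sort them in ascending order
Date <- sort(sample(seq(as.Date("2018/01/01"),
                        as.Date("2020/09/22"),
                        by = "day"),
                    5000, replace = TRUE),
             decreasing = FALSE)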
Name <- sample(c("adi", "Bakti Siregar", "Cucup", "Dedi", "Edo", 
                 "Fega", "Geral", "Herman", "Ito", "Jojo", 
                 "Kevin", "Leda", "Mero", "Nito", "Oco", 
                 "Petu", "Qin", "Rocky", "Songko", "Toto"),
               5000, replace = TRUE)
# Repeat the five cities 1000 times each (equal proportions), then shuffle the 5000 values
City <- sample(c(rep(c("Jakarta",
                       "Bogor",
                       "Tangerang",
                       "Depok", 
                       "Bekasi"), 
                     times=1000)))
Outlet <- sample((c("Outlet 1",
                    "Outlet 2",
                    "Outlet 3",
                    "Outlet 4",
                    "Outlet 5")), 
                 5000, replace = TRUE)
Menu <- sample(c("Cappucino",
                 "Es Kopi Susu", 
                 "Hot Caramel Latte", 
                 "Hot Chocolate", 
                 "Hot Red Velvet Latte", 
                 "Ice Americano",
                 "Ice Berry Coffe",
                 "Ice Cafe Latte",
                 "Ice Caramel Latte",
                 "Ice Coffee Avocado",
                 "Ice Coffee Lite", 
                 "Ice Matcha Espresso",
                 "Ice Matcha Latte",
                 "Ice Red Velvet Latte"),
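               5000, replace = TRUE)
# One random price and one random discount rate per menu item (14 items)
Pricelist <- sample(18000:45000, 14, replace = TRUE)
Discount <- round(runif(14, min = 0.05, max = 0.12), 2)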
Price <- data.frame(Menu = c
                 ("Cappucino",
                   "Es Kopi Susu", 
                   "Hot Caramel Latte", 
                   "Hot Chocolate", 
                   "Hot Red Velvet Latte", 
                   "Ice Americano",
                   "Ice Berry Coffe",
                   "Ice Cafe Latte",
                   "Ice Caramel Latte",
                   "Ice Coffee Avocado",
                   "Ice Coffee Lite",
                   "Ice Matcha Espresso",
                   "Ice Matcha Latte",
                   "Ice Red Velvet Latte"), 
                 Price = Pricelist, 
                 Discount = Discount,
                 Total_Price = round((Pricelist-Pricelist*Discount),0))
Transaction <- left_join(data.frame(Menu = Menu), Price)
## Joining, by = "Menu"
Sales <- data.frame (ID, 
                     Name, 
                     Date, 
                     City, 
                     Outlet, 
                     Transaction)
head(Sales,20)
##    ID          Name       Date      City   Outlet                 Menu Price
## 1   1        Herman 2018-01-01 Tangerang Outlet 5         Es Kopi Susu 31962
## 2   2           Qin 2018-01-01 Tangerang Outlet 1 Ice Red Velvet Latte 34047
## 3   3           Ito 2018-01-01 Tangerang Outlet 5  Ice Matcha Espresso 18526
## 4   4          Petu 2018-01-01   Jakarta Outlet 3       Ice Cafe Latte 34353
## 5   5          Leda 2018-01-01     Bogor Outlet 4 Ice Red Velvet Latte 34047
## 6   6           adi 2018-01-01    Bekasi Outlet 2         Es Kopi Susu 31962
## 7   7         Geral 2018-01-02     Depok Outlet 4            Cappucino 44996
## 8   8         Cucup 2018-01-02     Bogor Outlet 1      Ice Coffee Lite 41919
## 9   9           Oco 2018-01-02     Bogor Outlet 5   Ice Coffee Avocado 36836
## 10 10           adi 2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado 36836
## 11 11           Oco 2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado 36836
## 12 12           Ito 2018-01-03   Jakarta Outlet 4        Hot Chocolate 21758
## 13 13           adi 2018-01-03     Depok Outlet 1            Cappucino 44996
## 14 14          Leda 2018-01-03   Jakarta Outlet 2         Es Kopi Susu 31962
## 15 15           adi 2018-01-04     Depok Outlet 5    Hot Caramel Latte 32087
## 16 16         Rocky 2018-01-04   Jakarta Outlet 2      Ice Coffee Lite 41919
## 17 17         Kevin 2018-01-04     Depok Outlet 4    Hot Caramel Latte 32087
## 18 18          Dedi 2018-01-04     Depok Outlet 5            Cappucino 44996
## 19 19 Bakti Siregar 2018-01-04     Bogor Outlet 5    Hot Caramel Latte 32087
## 20 20        Herman 2018-01-05     Bogor Outlet 1            Cappucino 44996
##    Discount Total_Price
## 1      0.10       28766
## 2      0.10       30642
## 3      0.07       17229
## 4      0.09       31261
## 5      0.10       30642
## 6      0.10       28766
## 7      0.05       42746
## 8      0.05       39823
## 9      0.09       33521
## 10     0.09       33521
## 11     0.09       33521
## 12     0.12       19147
## 13     0.05       42746
## 14     0.10       28766
## 15     0.06       30162
## 16     0.05       39823
## 17     0.06       30162
## 18     0.05       42746
## 19     0.06       30162
## 20     0.05       42746
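Because sample() and runif() draw new random numbers on every run, the table above will differ each time the script is executed. The following is a minimal sketch of a reproducible variant, assuming an arbitrary seed (123), ten transactions, and a shortened three-item menu; none of these values come from the original run. It defines the menu vector once and looks prices up with match(), as a base R stand-in for the join used above.

```r
# Sketch only: the seed, n, and the short menu are illustrative assumptions.
set.seed(123)                       # fixes the random number stream

menu_items <- c("Cappucino", "Es Kopi Susu", "Hot Chocolate")
n <- 10

# One price and one discount rate per menu item
price_table <- data.frame(
  Menu     = menu_items,
  Price    = sample(18000:45000, length(menu_items), replace = TRUE),
  Discount = round(runif(length(menu_items), min = 0.05, max = 0.12), 2)
)
price_table$Total_Price <-
  round(price_table$Price - price_table$Price * price_table$Discount, 0)

# One menu item per transaction, then look up its row in price_table
menu <- sample(menu_items, n, replace = TRUE)
transactions <- price_table[match(menu, price_table$Menu), ]
rownames(transactions) <- NULL

sales_small <- data.frame(ID = seq_len(n), transactions)

# Sanity checks on the generated data
stopifnot(nrow(sales_small) == n)
stopifnot(all(sales_small$Price >= 18000 & sales_small$Price <= 45000))
stopifnot(all(sales_small$Menu %in% menu_items))
print(sales_small)
```

Running the block twice gives the same table, because set.seed() is called before the first random draw.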

2 Extraction

In this section, you are expected to apply a very basic data frame manipulation called extraction. Please cover the following tasks:

  • Extract all transactions at Kopi Kenangan in a specific city, for instance Jakarta.
library(dplyr)
row.names(Sales) <- NULL
Sales %>% 
  filter(City == "Jakarta") %>% 
  head(20) %>%
  print()
##    ID          Name       Date    City   Outlet                 Menu Price
## 1   4          Petu 2018-01-01 Jakarta Outlet 3       Ice Cafe Latte 34353
## 2  12           Ito 2018-01-03 Jakarta Outlet 4        Hot Chocolate 21758
## 3  14          Leda 2018-01-03 Jakarta Outlet 2         Es Kopi Susu 31962
## 4  16         Rocky 2018-01-04 Jakarta Outlet 2      Ice Coffee Lite 41919
## 5  22           Qin 2018-01-05 Jakarta Outlet 5            Cappucino 44996
## 6  27         Geral 2018-01-06 Jakarta Outlet 4            Cappucino 44996
## 7  28          Petu 2018-01-06 Jakarta Outlet 2      Ice Coffee Lite 41919
## 8  30 Bakti Siregar 2018-01-07 Jakarta Outlet 5 Ice Red Velvet Latte 34047
## 9  32          Nito 2018-01-07 Jakarta Outlet 3    Ice Caramel Latte 38967
## 10 48           Qin 2018-01-10 Jakarta Outlet 5     Ice Matcha Latte 27233
## 11 50         Rocky 2018-01-10 Jakarta Outlet 2       Ice Cafe Latte 34353
## 12 52          Dedi 2018-01-10 Jakarta Outlet 1        Ice Americano 42957
## 13 53         Cucup 2018-01-10 Jakarta Outlet 3            Cappucino 44996
## 14 60         Kevin 2018-01-13 Jakarta Outlet 5         Es Kopi Susu 31962
## 15 68          Mero 2018-01-14 Jakarta Outlet 2        Hot Chocolate 21758
## 16 79         Cucup 2018-01-16 Jakarta Outlet 3     Ice Matcha Latte 27233
## 17 80          Leda 2018-01-17 Jakarta Outlet 5        Ice Americano 42957
## 18 82        Songko 2018-01-18 Jakarta Outlet 4     Ice Matcha Latte 27233
## 19 84           Edo 2018-01-19 Jakarta Outlet 4      Ice Berry Coffe 41332
## 20 87           Qin 2018-01-20 Jakarta Outlet 5        Ice Americano 42957
##    Discount Total_Price
## 1      0.09       31261
## 2      0.12       19147
## 3      0.10       28766
## 4      0.05       39823
## 5      0.05       42746
## 6      0.05       42746
## 7      0.05       39823
## 8      0.10       30642
## 9      0.12       34291
## 10     0.12       23965
## 11     0.09       31261
## 12     0.11       38232
## 13     0.05       42746
## 14     0.10       28766
## 15     0.12       19147
## 16     0.12       23965
## 17     0.11       38232
## 18     0.12       23965
## 19     0.11       36785
## 20     0.11       38232
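filter() keeps the rows for which the condition is TRUE. As a small sketch on a hand-made five-row data frame (the object name toy is hypothetical; its rows are copied from the first table in this document), the same extraction can also be written with base R logical indexing or with subset():

```r
# Toy data: five rows copied from the first table in this document
toy <- data.frame(
  ID   = 1:5,
  Name = c("Herman", "Qin", "Ito", "Petu", "Leda"),
  City = c("Tangerang", "Tangerang", "Tangerang", "Jakarta", "Bogor"),
  Menu = c("Es Kopi Susu", "Ice Red Velvet Latte", "Ice Matcha Espresso",
           "Ice Cafe Latte", "Ice Red Velvet Latte")
)

# Logical indexing: rows where the condition holds, all columns
jakarta_rows <- toy[toy$City == "Jakarta", ]

# subset() evaluates the condition inside the data frame
jakarta_rows2 <- subset(toy, City == "Jakarta")

# Several values at once with %in%
two_cities <- toy[toy$City %in% c("Jakarta", "Bogor"), ]

print(jakarta_rows)
print(two_cities)
```

All three forms return a data frame, so they can be passed on to head() or print() in the same way as the filter() result.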
  • Extract all transactions at Kopi Kenangan for a specific menu item, for instance Hot Chocolate.
library(dplyr)
Sales %>% 
  filter (Menu == "Hot Chocolate") %>% 
  head(20) %>%
  print()
##     ID   Name       Date      City   Outlet          Menu Price Discount
## 1   12    Ito 2018-01-03   Jakarta Outlet 4 Hot Chocolate 21758     0.12
## 2   68   Mero 2018-01-14   Jakarta Outlet 2 Hot Chocolate 21758     0.12
## 3   70   Petu 2018-01-15     Depok Outlet 4 Hot Chocolate 21758     0.12
## 4   71  Rocky 2018-01-15     Bogor Outlet 3 Hot Chocolate 21758     0.12
## 5   73   Jojo 2018-01-15     Bogor Outlet 3 Hot Chocolate 21758     0.12
## 6   88    Edo 2018-01-20   Jakarta Outlet 1 Hot Chocolate 21758     0.12
## 7  120  Geral 2018-01-26 Tangerang Outlet 1 Hot Chocolate 21758     0.12
## 8  139 Songko 2018-01-30    Bekasi Outlet 3 Hot Chocolate 21758     0.12
## 9  147    Ito 2018-02-01   Jakarta Outlet 2 Hot Chocolate 21758     0.12
## 10 149   Nito 2018-02-02    Bekasi Outlet 1 Hot Chocolate 21758     0.12
## 11 153    Edo 2018-02-02     Bogor Outlet 1 Hot Chocolate 21758     0.12
## 12 163   Nito 2018-02-05    Bekasi Outlet 5 Hot Chocolate 21758     0.12
## 13 164   Petu 2018-02-05    Bekasi Outlet 4 Hot Chocolate 21758     0.12
## 14 174    Edo 2018-02-07   Jakarta Outlet 5 Hot Chocolate 21758     0.12
## 15 191  Cucup 2018-02-09     Depok Outlet 3 Hot Chocolate 21758     0.12
## 16 194    Oco 2018-02-10     Bogor Outlet 4 Hot Chocolate 21758     0.12
## 17 195    Oco 2018-02-10     Bogor Outlet 4 Hot Chocolate 21758     0.12
## 18 204  Cucup 2018-02-12     Bogor Outlet 5 Hot Chocolate 21758     0.12
## 19 227  Geral 2018-02-17     Depok Outlet 2 Hot Chocolate 21758     0.12
## 20 274    Qin 2018-02-27 Tangerang Outlet 2 Hot Chocolate 21758     0.12
##    Total_Price
## 1        19147
## 2        19147
## 3        19147
## 4        19147
## 5        19147
## 6        19147
## 7        19147
## 8        19147
## 9        19147
## 10       19147
## 11       19147
## 12       19147
## 13       19147
## 14       19147
## 15       19147
## 16       19147
## 17       19147
## 18       19147
## 19       19147
## 20       19147
  • Extract all transactions at Kopi Kenangan handled by a specific cashier, for instance Bakti Siregar.
library(dplyr)
Sales %>%
  filter(Name == "Bakti Siregar") %>%
  head (20) %>%
  print()
##     ID          Name       Date      City   Outlet                 Menu Price
## 1   19 Bakti Siregar 2018-01-04     Bogor Outlet 5    Hot Caramel Latte 32087
## 2   21 Bakti Siregar 2018-01-05 Tangerang Outlet 5       Ice Cafe Latte 34353
## 3   30 Bakti Siregar 2018-01-07   Jakarta Outlet 5 Ice Red Velvet Latte 34047
## 4   43 Bakti Siregar 2018-01-09    Bekasi Outlet 3      Ice Berry Coffe 41332
## 5   45 Bakti Siregar 2018-01-09     Depok Outlet 3            Cappucino 44996
## 6   67 Bakti Siregar 2018-01-14     Bogor Outlet 4      Ice Coffee Lite 41919
## 7   85 Bakti Siregar 2018-01-20    Bekasi Outlet 4      Ice Berry Coffe 41332
## 8   99 Bakti Siregar 2018-01-23    Bekasi Outlet 1      Ice Coffee Lite 41919
## 9  112 Bakti Siregar 2018-01-25 Tangerang Outlet 1   Ice Coffee Avocado 36836
## 10 175 Bakti Siregar 2018-02-07     Bogor Outlet 5 Ice Red Velvet Latte 34047
## 11 177 Bakti Siregar 2018-02-07     Depok Outlet 4       Ice Cafe Latte 34353
## 12 196 Bakti Siregar 2018-02-10 Tangerang Outlet 5     Ice Matcha Latte 27233
## 13 217 Bakti Siregar 2018-02-16    Bekasi Outlet 2 Hot Red Velvet Latte 30535
## 14 235 Bakti Siregar 2018-02-20    Bekasi Outlet 1    Hot Caramel Latte 32087
## 15 244 Bakti Siregar 2018-02-22   Jakarta Outlet 3   Ice Coffee Avocado 36836
## 16 322 Bakti Siregar 2018-03-06 Tangerang Outlet 5    Ice Caramel Latte 38967
## 17 335 Bakti Siregar 2018-03-09     Depok Outlet 1       Ice Cafe Latte 34353
## 18 372 Bakti Siregar 2018-03-15    Bekasi Outlet 1         Es Kopi Susu 31962
## 19 391 Bakti Siregar 2018-03-18     Depok Outlet 5     Ice Matcha Latte 27233
## 20 404 Bakti Siregar 2018-03-21     Bogor Outlet 4       Ice Cafe Latte 34353
##    Discount Total_Price
## 1      0.06       30162
## 2      0.09       31261
## 3      0.10       30642
## 4      0.11       36785
## 5      0.05       42746
## 6      0.05       39823
## 7      0.11       36785
## 8      0.05       39823
## 9      0.09       33521
## 10     0.10       30642
## 11     0.09       31261
## 12     0.12       23965
## 13     0.07       28398
## 14     0.06       30162
## 15     0.09       33521
## 16     0.12       34291
## 17     0.09       31261
## 18     0.10       28766
## 19     0.12       23965
## 20     0.09       31261
  • Extract all transactions at Kopi Kenangan in a specific price range, for instance >= 40000.
library(dplyr)
# Transactions whose price before discount (Price) is at least 40000
Sales %>% 
  filter(Price >= 40000) %>%
  head(20) %>%
  print()
##    ID          Name       Date      City   Outlet            Menu Price
## 1   7         Geral 2018-01-02     Depok Outlet 4       Cappucino 44996
## 2   8         Cucup 2018-01-02     Bogor Outlet 1 Ice Coffee Lite 41919
## 3  13           adi 2018-01-03     Depok Outlet 1       Cappucino 44996
## 4  16         Rocky 2018-01-04   Jakarta Outlet 2 Ice Coffee Lite 41919
## 5  18          Dedi 2018-01-04     Depok Outlet 5       Cappucino 44996
## 6  20        Herman 2018-01-05     Bogor Outlet 1       Cappucino 44996
## 7  22           Qin 2018-01-05   Jakarta Outlet 5       Cappucino 44996
## 8  23           Edo 2018-01-05 Tangerang Outlet 1       Cappucino 44996
## 9  24          Dedi 2018-01-05    Bekasi Outlet 1 Ice Coffee Lite 41919
## 10 27         Geral 2018-01-06   Jakarta Outlet 4       Cappucino 44996
## 11 28          Petu 2018-01-06   Jakarta Outlet 2 Ice Coffee Lite 41919
## 12 31         Rocky 2018-01-07    Bekasi Outlet 5 Ice Coffee Lite 41919
## 13 35          Petu 2018-01-08     Bogor Outlet 2 Ice Coffee Lite 41919
## 14 42          Leda 2018-01-09 Tangerang Outlet 5       Cappucino 44996
## 15 43 Bakti Siregar 2018-01-09    Bekasi Outlet 3 Ice Berry Coffe 41332
## 16 44        Songko 2018-01-09     Bogor Outlet 4   Ice Americano 42957
## 17 45 Bakti Siregar 2018-01-09     Depok Outlet 3       Cappucino 44996
## 18 51        Songko 2018-01-10 Tangerang Outlet 4 Ice Coffee Lite 41919
## 19 52          Dedi 2018-01-10   Jakarta Outlet 1   Ice Americano 42957
## 20 53         Cucup 2018-01-10   Jakarta Outlet 3       Cappucino 44996
##    Discount Total_Price
## 1      0.05       42746
## 2      0.05       39823
## 3      0.05       42746
## 4      0.05       39823
## 5      0.05       42746
## 6      0.05       42746
## 7      0.05       42746
## 8      0.05       42746
## 9      0.05       39823
## 10     0.05       42746
## 11     0.05       39823
## 12     0.05       39823
## 13     0.05       39823
## 14     0.05       42746
## 15     0.11       36785
## 16     0.11       38232
## 17     0.05       42746
## 18     0.05       39823
## 19     0.11       38232
## 20     0.05       42746
# Transactions whose price after discount (Total_Price) is at least 40000
Sales %>% 
  filter(Total_Price >= 40000) %>%
  head(20) %>%
  print()
##     ID          Name       Date      City   Outlet      Menu Price Discount
## 1    7         Geral 2018-01-02     Depok Outlet 4 Cappucino 44996     0.05
## 2   13           adi 2018-01-03     Depok Outlet 1 Cappucino 44996     0.05
## 3   18          Dedi 2018-01-04     Depok Outlet 5 Cappucino 44996     0.05
## 4   20        Herman 2018-01-05     Bogor Outlet 1 Cappucino 44996     0.05
## 5   22           Qin 2018-01-05   Jakarta Outlet 5 Cappucino 44996     0.05
## 6   23           Edo 2018-01-05 Tangerang Outlet 1 Cappucino 44996     0.05
## 7   27         Geral 2018-01-06   Jakarta Outlet 4 Cappucino 44996     0.05
## 8   42          Leda 2018-01-09 Tangerang Outlet 5 Cappucino 44996     0.05
## 9   45 Bakti Siregar 2018-01-09     Depok Outlet 3 Cappucino 44996     0.05
## 10  53         Cucup 2018-01-10   Jakarta Outlet 3 Cappucino 44996     0.05
## 11  65        Songko 2018-01-13    Bekasi Outlet 1 Cappucino 44996     0.05
## 12  76          Mero 2018-01-16     Bogor Outlet 5 Cappucino 44996     0.05
## 13  91          Toto 2018-01-21     Bogor Outlet 5 Cappucino 44996     0.05
## 14 115         Rocky 2018-01-25 Tangerang Outlet 4 Cappucino 44996     0.05
## 15 117        Songko 2018-01-25     Bogor Outlet 2 Cappucino 44996     0.05
## 16 119         Cucup 2018-01-26     Bogor Outlet 5 Cappucino 44996     0.05
## 17 121          Jojo 2018-01-26     Depok Outlet 3 Cappucino 44996     0.05
## 18 134         Rocky 2018-01-29   Jakarta Outlet 3 Cappucino 44996     0.05
## 19 173          Dedi 2018-02-06     Depok Outlet 4 Cappucino 44996     0.05
## 20 176           Qin 2018-02-07   Jakarta Outlet 3 Cappucino 44996     0.05
##    Total_Price
## 1        42746
## 2        42746
## 3        42746
## 4        42746
## 5        42746
## 6        42746
## 7        42746
## 8        42746
## 9        42746
## 10       42746
## 11       42746
## 12       42746
## 13       42746
## 14       42746
## 15       42746
## 16       42746
## 17       42746
## 18       42746
## 19       42746
## 20       42746
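A note on price thresholds: a numeric column should be compared with a numeric threshold. When a number is compared with a quoted value, R converts the number to a character string and compares the two strings character by character. With five-digit prices this happens to select the same rows, but it goes wrong as soon as the values have different numbers of digits. A minimal sketch with made-up prices (the vector prices is an assumption for illustration):

```r
# Made-up prices with four, five, and six digits
prices <- c(9000, 45000, 100000)

# Numeric comparison: the intended result
numeric_result <- prices >= 40000
print(numeric_result)
# FALSE TRUE TRUE

# Character comparison: each number is first converted to a string,
# so 9000 passes (its string starts with "9") and 100000 fails
# (its string starts with "1")
string_result <- prices >= "40000"
print(string_result)
# TRUE TRUE FALSE
```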
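  • Add a new variable called Total_Price to your data frame (the data frame you built above). It is the price after discount; the column already exists from the join in Section 1, so the code below simply recomputes it from Price and Discount.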
Sales$Total_Price <- round(Sales$Price-Sales$Price*Sales$Discount,0)
Sales %>% head(20) %>%
  print()
##    ID          Name       Date      City   Outlet                 Menu Price
## 1   1        Herman 2018-01-01 Tangerang Outlet 5         Es Kopi Susu 31962
## 2   2           Qin 2018-01-01 Tangerang Outlet 1 Ice Red Velvet Latte 34047
## 3   3           Ito 2018-01-01 Tangerang Outlet 5  Ice Matcha Espresso 18526
## 4   4          Petu 2018-01-01   Jakarta Outlet 3       Ice Cafe Latte 34353
## 5   5          Leda 2018-01-01     Bogor Outlet 4 Ice Red Velvet Latte 34047
## 6   6           adi 2018-01-01    Bekasi Outlet 2         Es Kopi Susu 31962
## 7   7         Geral 2018-01-02     Depok Outlet 4            Cappucino 44996
## 8   8         Cucup 2018-01-02     Bogor Outlet 1      Ice Coffee Lite 41919
## 9   9           Oco 2018-01-02     Bogor Outlet 5   Ice Coffee Avocado 36836
## 10 10           adi 2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado 36836
## 11 11           Oco 2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado 36836
## 12 12           Ito 2018-01-03   Jakarta Outlet 4        Hot Chocolate 21758
## 13 13           adi 2018-01-03     Depok Outlet 1            Cappucino 44996
## 14 14          Leda 2018-01-03   Jakarta Outlet 2         Es Kopi Susu 31962
## 15 15           adi 2018-01-04     Depok Outlet 5    Hot Caramel Latte 32087
## 16 16         Rocky 2018-01-04   Jakarta Outlet 2      Ice Coffee Lite 41919
## 17 17         Kevin 2018-01-04     Depok Outlet 4    Hot Caramel Latte 32087
## 18 18          Dedi 2018-01-04     Depok Outlet 5            Cappucino 44996
## 19 19 Bakti Siregar 2018-01-04     Bogor Outlet 5    Hot Caramel Latte 32087
## 20 20        Herman 2018-01-05     Bogor Outlet 1            Cappucino 44996
##    Discount Total_Price
## 1      0.10       28766
## 2      0.10       30642
## 3      0.07       17229
## 4      0.09       31261
## 5      0.10       30642
## 6      0.10       28766
## 7      0.05       42746
## 8      0.05       39823
## 9      0.09       33521
## 10     0.09       33521
## 11     0.09       33521
## 12     0.12       19147
## 13     0.05       42746
## 14     0.10       28766
## 15     0.06       30162
## 16     0.05       39823
## 17     0.06       30162
## 18     0.05       42746
## 19     0.06       30162
## 20     0.05       42746
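As a worked check of the formula Total_Price = Price - Price * Discount, the sketch below recomputes the first three rows of the table above with dplyr::mutate(). The data frame check is a hypothetical helper built from those printed values.

```r
library(dplyr)

# First three rows of the table above (Menu, Price, Discount)
check <- data.frame(
  Menu     = c("Es Kopi Susu", "Ice Red Velvet Latte", "Ice Matcha Espresso"),
  Price    = c(31962, 34047, 18526),
  Discount = c(0.10, 0.10, 0.07)
)

# Price after discount, rounded to whole rupiah
check <- check %>%
  mutate(Total_Price = round(Price - Price * Discount, 0))

print(check$Total_Price)
# 28766 30642 17229, matching the Total_Price column printed above
```

For the first row, 31962 - 31962 * 0.10 = 28765.8, which rounds to 28766.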
  • Add a new variable called Category_Price to your data frame (the data frame you built above). Here, I assume three categories: “Expensive”, “So-so”, and “Cheap”.
# Expensive: 37000 and above; So-so: 29000 up to (not including) 37000; Cheap: below 29000
Sales$Category_Price <- 
  ifelse(Sales$Total_Price >= 37000, 
         "Expensive", 
         ifelse(Sales$Total_Price >= 29000, 
                "So-so", 
                "Cheap"))
Sales %>% head(20) %>%
  print()
##    ID          Name       Date      City   Outlet                 Menu Price
## 1   1        Herman 2018-01-01 Tangerang Outlet 5         Es Kopi Susu 31962
## 2   2           Qin 2018-01-01 Tangerang Outlet 1 Ice Red Velvet Latte 34047
## 3   3           Ito 2018-01-01 Tangerang Outlet 5  Ice Matcha Espresso 18526
## 4   4          Petu 2018-01-01   Jakarta Outlet 3       Ice Cafe Latte 34353
## 5   5          Leda 2018-01-01     Bogor Outlet 4 Ice Red Velvet Latte 34047
## 6   6           adi 2018-01-01    Bekasi Outlet 2         Es Kopi Susu 31962
## 7   7         Geral 2018-01-02     Depok Outlet 4            Cappucino 44996
## 8   8         Cucup 2018-01-02     Bogor Outlet 1      Ice Coffee Lite 41919
## 9   9           Oco 2018-01-02     Bogor Outlet 5   Ice Coffee Avocado 36836
## 10 10           adi 2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado 36836
## 11 11           Oco 2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado 36836
## 12 12           Ito 2018-01-03   Jakarta Outlet 4        Hot Chocolate 21758
## 13 13           adi 2018-01-03     Depok Outlet 1            Cappucino 44996
## 14 14          Leda 2018-01-03   Jakarta Outlet 2         Es Kopi Susu 31962
## 15 15           adi 2018-01-04     Depok Outlet 5    Hot Caramel Latte 32087
## 16 16         Rocky 2018-01-04   Jakarta Outlet 2      Ice Coffee Lite 41919
## 17 17         Kevin 2018-01-04     Depok Outlet 4    Hot Caramel Latte 32087
## 18 18          Dedi 2018-01-04     Depok Outlet 5            Cappucino 44996
## 19 19 Bakti Siregar 2018-01-04     Bogor Outlet 5    Hot Caramel Latte 32087
## 20 20        Herman 2018-01-05     Bogor Outlet 1            Cappucino 44996
##    Discount Total_Price Category_Price
## 1      0.10       28766          Cheap
## 2      0.10       30642          So-so
## 3      0.07       17229          Cheap
## 4      0.09       31261          So-so
## 5      0.10       30642          So-so
## 6      0.10       28766          Cheap
## 7      0.05       42746      Expensive
## 8      0.05       39823      Expensive
## 9      0.09       33521          So-so
## 10     0.09       33521          So-so
## 11     0.09       33521          So-so
## 12     0.12       19147          Cheap
## 13     0.05       42746      Expensive
## 14     0.10       28766          Cheap
## 15     0.06       30162          So-so
## 16     0.05       39823      Expensive
## 17     0.06       30162          So-so
## 18     0.05       42746      Expensive
## 19     0.06       30162          So-so
## 20     0.05       42746      Expensive
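Nested ifelse() calls become hard to read as the number of categories grows. One possible alternative, sketched here with base R cut() on a few values taken from the table above plus the two boundary values, uses the same cut-offs (29000 and 37000). The vector name total_price is an assumption for illustration.

```r
# Three values from the table above, plus the two boundaries
total_price <- c(28766, 30642, 42746, 29000, 37000)

category <- cut(
  total_price,
  breaks = c(-Inf, 29000, 37000, Inf),
  labels = c("Cheap", "So-so", "Expensive"),
  right  = FALSE          # intervals are [lower, upper), so 37000 is Expensive
)

print(data.frame(total_price, category))
```

cut() returns a factor whose levels are ordered from Cheap to Expensive, which can be convenient for tables and plots; as.character() converts it back to plain text if needed.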

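3 Rename Data Frame Variables

Please rename all variables of your data frame (the data frame you built above) into your own language. The example below renames six of the columns into Indonesian; ID, Outlet, Menu, and Category_Price keep their names. Because the result is only printed and not assigned, Sales itself keeps its English column names.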

library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.0
## v readr   1.3.1
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Sales %>%
  rename ("Nama"="Name", 
          "Tanggal_Transaksi"="Date",
          "Kota"="City",
          "Harga"="Price", 
          "Diskon"="Discount",
          "Harga_Total"="Total_Price") %>%
  head(20) %>%
  print()
##    ID          Nama Tanggal_Transaksi      Kota   Outlet                 Menu
## 1   1        Herman        2018-01-01 Tangerang Outlet 5         Es Kopi Susu
## 2   2           Qin        2018-01-01 Tangerang Outlet 1 Ice Red Velvet Latte
## 3   3           Ito        2018-01-01 Tangerang Outlet 5  Ice Matcha Espresso
## 4   4          Petu        2018-01-01   Jakarta Outlet 3       Ice Cafe Latte
## 5   5          Leda        2018-01-01     Bogor Outlet 4 Ice Red Velvet Latte
## 6   6           adi        2018-01-01    Bekasi Outlet 2         Es Kopi Susu
## 7   7         Geral        2018-01-02     Depok Outlet 4            Cappucino
## 8   8         Cucup        2018-01-02     Bogor Outlet 1      Ice Coffee Lite
## 9   9           Oco        2018-01-02     Bogor Outlet 5   Ice Coffee Avocado
## 10 10           adi        2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado
## 11 11           Oco        2018-01-03 Tangerang Outlet 4   Ice Coffee Avocado
## 12 12           Ito        2018-01-03   Jakarta Outlet 4        Hot Chocolate
## 13 13           adi        2018-01-03     Depok Outlet 1            Cappucino
## 14 14          Leda        2018-01-03   Jakarta Outlet 2         Es Kopi Susu
## 15 15           adi        2018-01-04     Depok Outlet 5    Hot Caramel Latte
## 16 16         Rocky        2018-01-04   Jakarta Outlet 2      Ice Coffee Lite
## 17 17         Kevin        2018-01-04     Depok Outlet 4    Hot Caramel Latte
## 18 18          Dedi        2018-01-04     Depok Outlet 5            Cappucino
## 19 19 Bakti Siregar        2018-01-04     Bogor Outlet 5    Hot Caramel Latte
## 20 20        Herman        2018-01-05     Bogor Outlet 1            Cappucino
##    Harga Diskon Harga_Total Category_Price
## 1  31962   0.10       28766          Cheap
## 2  34047   0.10       30642          So-so
## 3  18526   0.07       17229          Cheap
## 4  34353   0.09       31261          So-so
## 5  34047   0.10       30642          So-so
## 6  31962   0.10       28766          Cheap
## 7  44996   0.05       42746      Expensive
## 8  41919   0.05       39823      Expensive
## 9  36836   0.09       33521          So-so
## 10 36836   0.09       33521          So-so
## 11 36836   0.09       33521          So-so
## 12 21758   0.12       19147          Cheap
## 13 44996   0.05       42746      Expensive
## 14 31962   0.10       28766          Cheap
## 15 32087   0.06       30162          So-so
## 16 41919   0.05       39823      Expensive
## 17 32087   0.06       30162          So-so
## 18 44996   0.05       42746      Expensive
## 19 32087   0.06       30162          So-so
## 20 44996   0.05       42746      Expensive
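To keep the renamed columns, the result of rename() has to be assigned to an object. The sketch below shows this on a two-row toy data frame; the object names toy and toy_id and the Indonesian names Gerai and Kategori_Harga are my own illustrative choices, not names used elsewhere in this document.

```r
library(dplyr)

toy <- data.frame(
  ID             = 1:2,
  Name           = c("Herman", "Qin"),
  Outlet         = c("Outlet 5", "Outlet 1"),
  Category_Price = c("Cheap", "So-so")
)

# rename(new_name = old_name); assigning the result keeps the new names
toy_id <- toy %>%
  rename(Nama           = Name,
         Gerai          = Outlet,
         Kategori_Harga = Category_Price)

print(names(toy))      # the original data frame is unchanged
print(names(toy_id))   # the assigned copy carries the new names
```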

4 Case Study

According to your data frame, please complete the following tasks:

  • Find the sales frequency of each menu item and identify the best-selling item at Kopi Kenangan!
library(dplyr)
Menu_Sales <- data.frame(table(Sales$Menu))
Best_Menu <- Menu_Sales [order(Menu_Sales$Freq),] %>% 
  tail(1)
names(Best_Menu) <- c(
  'Menu', 'Quantities')
print(Best_Menu)
##           Menu Quantities
## 2 Es Kopi Susu        379
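Taking only the last row after sorting hides ties and the rest of the ranking. A small sketch on a made-up vector of six orders (the vector menu is an assumption for illustration) shows the full frequency table and every item that shares the top count:

```r
# Made-up orders for illustration
menu <- c("Es Kopi Susu", "Cappucino", "Es Kopi Susu", "Hot Chocolate",
          "Cappucino", "Es Kopi Susu")

# Frequency of each item, most frequent first
menu_counts <- sort(table(menu), decreasing = TRUE)
print(menu_counts)

# All items that reach the highest count (more than one if there is a tie)
best <- names(menu_counts)[as.vector(menu_counts) == max(menu_counts)]
print(best)
```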
  • Find out which city had the highest total sales (the sum of Total_Price)!
library(dplyr)

City_Sales <- aggregate(Total_Price ~ City, data = Sales, sum)
Best_City <- City_Sales[
  order(City_Sales$Total_Price, decreasing = T),] %>% 
  head(1) %>%
  print()
##    City Total_Price
## 3 Depok    31355089
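The same per-city total can also be written as a dplyr pipeline, which keeps the whole ranking visible. This is a sketch on a five-row toy data frame; the assignment of prices to cities is made up for illustration.

```r
library(dplyr)

# Toy data: the city/price pairing is an assumption
toy <- data.frame(
  City        = c("Depok", "Bogor", "Depok", "Jakarta", "Bogor"),
  Total_Price = c(42746, 39823, 30162, 31261, 33521)
)

# Sum Total_Price per city and sort from highest to lowest
city_sales <- toy %>%
  group_by(City) %>%
  summarise(Total_Price = sum(Total_Price)) %>%
  arrange(desc(Total_Price))

print(city_sales)

# The first row is the city with the highest total
best_city <- city_sales$City[1]
print(best_city)
```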
  • Find out which city has the most discounted sales (measured here as the sum of the discount rates)!
library(dplyr)

City_disc <- aggregate(Discount ~ City, data = Sales, sum)
Best_Disc <- City_disc[
  order(City_disc$Discount, 
        decreasing = T),] %>% 
  head (1) %>% 
  print()
##    City Discount
## 3 Depok    90.89
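The code above adds up discount rates. Since every city has the same number of transactions (1000), this ranks the cities by their average discount rate. If "most discounted" is instead read as the largest amount of money taken off, one possible approach is to sum Price * Discount per city. The sketch below uses a four-row toy data frame; the column Discount_Amount and the pairing of cities with prices are assumptions for illustration.

```r
# Toy data: the city/price pairing is an assumption
toy <- data.frame(
  City     = c("Depok", "Bogor", "Depok", "Jakarta"),
  Price    = c(44996, 41919, 32087, 21758),
  Discount = c(0.05, 0.05, 0.06, 0.12)
)

# Money taken off each transaction
toy$Discount_Amount <- toy$Price * toy$Discount

# Total discount amount per city, then the city with the largest total
city_disc_amount <- aggregate(Discount_Amount ~ City, data = toy, sum)
best <- city_disc_amount[which.max(city_disc_amount$Discount_Amount), ]
print(best)
```

The two readings can give different answers, because a city with many expensive drinks at a low discount rate may still give away more money in total.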
  • Which year had the most sales (by number of transactions)?
library(dplyr)
library(tidyr)  # provides separate()
Yearly_Sales <- Sales %>% 
  separate(Date, c("year", "month","day"), sep="-") %>% 
  select(year) %>% 
  table() %>% 
  as.data.frame()
Best_Year <- Yearly_Sales[
  order(Yearly_Sales$Freq, 
        decreasing = T),] %>%
  head(1) %>%
  as.data.frame()
names(Best_Year) <- c('Year', 'Total_Sales')
print(Best_Year)
##   Year Total_Sales
## 1 2018        1824
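Since Date is stored as a Date, the year can also be extracted directly with format(), without splitting the column into text pieces. The sketch below uses five made-up dates (the vector dates is an assumption). Note that the generated data end on 2020-09-22, so 2020 is only a partial year and its count is not directly comparable with 2018 and 2019.

```r
# Made-up dates for illustration
dates <- as.Date(c("2018-01-01", "2018-06-15", "2019-03-02",
                   "2020-09-22", "2018-12-31"))

# Four-digit year as a character vector
year <- format(dates, "%Y")

# Number of transactions per year
yearly <- as.data.frame(table(year), stringsAsFactors = FALSE)
names(yearly) <- c("Year", "Total_Sales")

# Year with the most transactions
best_year <- yearly[which.max(yearly$Total_Sales), ]
print(best_year)
```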