Case 1

Assume you have collected a dataset from the company ABC Property, as shown in the following table:

Id             <- (1:10000)
Marketing_Name <- rep(c("Angel","Sherly","Vanessa","Irene","Julian",
                        "Jeffry","Nikita","Kefas","Siana","Lala",
                        "Fallen","Ardifo","Kevin","Juen","Jerrel",
                        "Imelda","Widi","Theodora","Elvani","Jonathan",
                        "Sofia","Abraham","Siti","Niko","Sefli",
                        "Bene", "Diana", "Pupe", "Andi", "Tatha",
                        "Endri", "Monika", "Hans", "Debora","Hanifa",
                        "James", "Jihan", "Friska","Ardiwan", "Bakti",
                        "Anthon","Amry", "Wiwik", "Bastian", "Budi",
                        "Leo","Simon","Matius","Arry", "Eliando"), 200)
Work_Exp       <- rep(c(1.3,2.4,2.5,3.6,3.7,4.7,5.7,6.7,7.7,7.3,
                        5.3,5.3,10,9.3,3.3,3.3,3.4,3.4,3.5,5.6,
                        3.5,4.6,4.6,5.7,6.2,4.4,6.4,6.4,3.5,7.5,
                        4.6,3.7,4.7,4.3,5.2,6.3,7.4,2.4,3.4,8.2,
                        6.4,7.2,1.5,7.5,10,4.5,6.5,7.2,7.1,7.6),200)
City           <- sample(c("Jakarta","Bogor","Depok","Tengerang","Bekasi"),10000, replace = T)
Cluster        <- sample(c("Victoria","Palmyra","Winona","Tiara", "Narada",
                           "Peronia","Lavesh","Alindra","Sweethome", "Asera",
                           "Teradamai","Albasia", "Adara","Neon","Arana",
                           "Asoka", "Primadona", "Mutiara","Permata","Alamanda" ), 10000, replace=T)
Price          <- sample(c(7000:15000),10000, replace = T)
Date_Sales     <- sample(seq(as.Date("2018/01/01"), by = "day", length.out = 1000),10000, replace = T)
Advertisement  <- sample(c(1:20), 10000, replace = T)
Data           <- data.frame(Id, 
                             Marketing_Name,
                             Work_Exp,
                             City,
                             Cluster,
                             Price,
                             Date_Sales,
                             Advertisement)
library(DT)
datatable(Data)

Problem 1

Categorize the Price variable in the dataset above into three groups as follows:

  • \(\text{High} > 12000\)
  • \(10000 \le \text{Medium} \le 12000\)
  • \(\text{Low} < 10000\)

Assign the result to a new variable called Kelas (class), using the if, else if, and else control statements.

Data$Kelas <- ifelse(Data$Price < 10000, "low",
              ifelse(Data$Price >=  10000 & Data$Price <= 12000, "medium", "high"))
datatable(Data)
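
The task asks for if, else if, and else, whereas the code above relies on the vectorized ifelse(). As a minimal sketch, the same rule can be written with scalar control flow and applied to one price at a time. The helper name classify_price and the example prices are assumptions made for illustration; they are not rows from Data.

```r
# Hypothetical helper: classify a single price with if / else if / else.
classify_price <- function(price) {
  if (price < 10000) {
    "low"
  } else if (price <= 12000) {
    "medium"
  } else {
    "high"
  }
}

# Made-up example prices that include both boundary values.
prices <- c(7000, 9999, 10000, 12000, 12001, 15000)

# vapply() calls the scalar helper once per price and checks that
# every result is a single character value.
kelas <- vapply(prices, classify_price, character(1))
print(data.frame(Price = prices, Kelas = kelas))
```

On the real dataset, the same helper could be applied as vapply(Data$Price, classify_price, character(1)), although ifelse() is usually faster on 10000 rows.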

Problem 2

Categorize the Price variable in the dataset above into six groups as follows:

  • Booking_fee is 5% if \(\text{Price} < 8000\)
  • Booking_fee is 6% if \(8000 \le \text{Price} < 9000\)
  • Booking_fee is 7% if \(9000 \le \text{Price} < 10000\)
  • Booking_fee is 8% if \(10000 \le \text{Price} < 11000\)
  • Booking_fee is 9% if \(11000 \le \text{Price} < 13000\)
  • Booking_fee is 10% if \(13000 \le \text{Price} \le 15000\)

Assign the result to a new variable called Booking_fee, using the if, else if, and else control statements.

Data$Booking_Fee <- ifelse(Data$Price < 8000, 5/100*Data$Price,
                    ifelse(Data$Price >= 8000 & Data$Price < 9000, 6/100*Data$Price,
                    ifelse(Data$Price >= 9000 & Data$Price < 10000, 7/100*Data$Price,
                    ifelse(Data$Price >= 10000 & Data$Price < 11000, 8/100*Data$Price,
                    ifelse(Data$Price >= 11000 & Data$Price < 13000, 9/100*Data$Price,
                                                                     10/100*Data$Price)))))
datatable(Data)
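
The nested ifelse() calls above encode a bracket lookup. One possible alternative, shown here only as a sketch, keeps the bracket boundaries and rates in two vectors and looks up the bracket with base R's findInterval(). The names breaks, rates, and prices are assumptions for this example, and the prices are made up.

```r
# Lower bound of each fee bracket; bracket i covers [breaks[i], breaks[i + 1]).
breaks <- c(-Inf, 8000, 9000, 10000, 11000, 13000)
rates  <- c(0.05, 0.06, 0.07, 0.08, 0.09, 0.10)

# Made-up prices, chosen to hit the bracket boundaries.
prices <- c(7500, 8000, 9999, 10000, 12999, 13000, 15000)

# findInterval() returns the index of the bracket each price falls into.
rate        <- rates[findInterval(prices, breaks)]
booking_fee <- rate * prices

print(data.frame(Price = prices, Rate = rate, Booking_Fee = booking_fee))
```

Keeping the brackets in vectors means a rate change touches one number instead of a chain of conditions.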

Problem 3

Using the final dataset you built in Problem 2, and assuming that you work as a marketer at ABC Property, how can you collect all the information about your own sales using a for statement?

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
for(x in "Angel"){
  (Data %>% filter (Marketing_Name == x) %>%
     head(8) %>%
     print()
   )
  break
}
##    Id Marketing_Name Work_Exp      City  Cluster Price Date_Sales Advertisement
## 1   1          Angel      1.3 Tengerang Victoria 14051 2018-08-04             5
## 2  51          Angel      1.3     Depok  Albasia 10251 2020-02-21            12
## 3 101          Angel      1.3    Bekasi Alamanda 11218 2019-06-12            11
## 4 151          Angel      1.3     Bogor    Tiara  9426 2018-11-13            16
## 5 201          Angel      1.3 Tengerang Alamanda  8250 2020-02-27            12
## 6 251          Angel      1.3     Depok    Adara 13717 2020-04-14            16
## 7 301          Angel      1.3    Bekasi   Lavesh 13449 2020-09-02            13
## 8 351          Angel      1.3     Depok  Permata  7167 2018-05-07            12
##    Kelas Booking_Fee
## 1   high     1405.10
## 2 medium      820.08
## 3 medium     1009.62
## 4    low      659.82
## 5    low      495.00
## 6   high     1371.70
## 7   high     1344.90
## 8    low      358.35
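
The loop above runs a single iteration and lets filter() do the selection. As a hedged sketch of a loop that does the work itself, the code below walks through the rows, keeps those that belong to one marketer, and stops once a chosen number of rows has been collected. The data frame sales, the variables me and max_rows, and all values are made up for illustration.

```r
# Made-up sales records; the real Data frame has more columns.
sales <- data.frame(
  Id             = 1:8,
  Marketing_Name = rep(c("Angel", "Sherly"), 4),
  Price          = c(14051, 9000, 10251, 8000, 11218, 7500, 9426, 12000),
  stringsAsFactors = FALSE
)

me       <- "Angel"     # marketer whose sales we want
max_rows <- 3           # stop once this many rows have been collected
my_rows  <- integer(0)  # row indices collected so far

for (i in seq_len(nrow(sales))) {
  if (sales$Marketing_Name[i] == me) {
    my_rows <- c(my_rows, i)
  }
  if (length(my_rows) == max_rows) break
}

my_sales <- sales[my_rows, ]
print(my_sales)
```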

Problem 4

As a marketer, you receive a bonus of 2% of the Booking fee per unit, plus an additional 1% bonus if you have worked at the company for more than 3 years. Calculate the total bonus using the if, for, and break statements.

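# ifelse() is vectorized, so the first pass already fills Bonus for every row;
# break then ends the loop after that single iteration.
for(i in Data$Marketing_Name){
  Data$Bonus <- ifelse(Data$Work_Exp>3,
                       Data$Booking_Fee*0.03,
                       Data$Booking_Fee*0.02)
  break
}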

Bonus_Perorang <- data.frame(aggregate(Data$Bonus, by = list(Marketing = Data$Marketing_Name,
                                                             Work_Exp = Data$Work_Exp), FUN = sum))


Bonus_Angel <- Bonus_Perorang %>% filter(Marketing == "Angel")
Bonus_Angel
##   Marketing Work_Exp        x
## 1     Angel      1.3 3680.036
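
For comparison, here is a small sketch in which if, for, and break each do real work: the loop searches the marketer names, computes the total bonus for the one being asked about, and stops as soon as that marketer is found. The data frame sales, the variable target, and all values are made up; the 2% and 3% rates follow the problem statement.

```r
# Made-up records, not rows from Data.
sales <- data.frame(
  Marketing_Name = c("Angel", "Irene", "Angel", "Irene"),
  Work_Exp       = c(1.3, 3.6, 1.3, 3.6),
  Booking_Fee    = c(1000, 800, 500, 1200),
  stringsAsFactors = FALSE
)

target      <- "Irene"   # marketer whose bonus we want
total_bonus <- NA_real_  # stays NA if the marketer is not found

for (name in unique(sales$Marketing_Name)) {
  if (name == target) {
    rows <- sales[sales$Marketing_Name == name, ]
    # 2% base bonus, plus 1% for more than 3 years of experience.
    rate <- if (rows$Work_Exp[1] > 3) 0.03 else 0.02
    total_bonus <- sum(rows$Booking_Fee) * rate
    break  # target found; the remaining names need not be visited
  }
}

print(total_bonus)
```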

Problem 5

In this part, you are expected to write functions that answer each of the questions below, using the control statements covered in session 7.

  • Who is the best marketer?
  • Which City and Cluster are the most profitable?
  • Calculate your total advertising cost, given that you pay $4 per advertisement.
  • Calculate the average advertising cost for each marketer in the company.
  • Calculate the total revenue (monthly).

Best marketer

tp <- data.frame(aggregate(Data$Price, 
                 by = list(Marketing = Data$Marketing_Name),
                 FUN = sum))

bm <- tp %>% 
      filter(x == max(tp$x)) %>% 
      dplyr::rename("Total Pendapatan" = x) %>%
      print()
##   Marketing Total Pendapatan
## 1     Endri          2279391
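
Since the problem asks for functions, the aggregation above could be wrapped in one. The sketch below assumes a hypothetical helper named best_marketer and a made-up data frame. It uses which.max(), which returns only the first row in case of a tie, whereas the filter() call above keeps all tied rows.

```r
# Hypothetical helper: return the marketer with the highest total Price.
best_marketer <- function(df) {
  totals <- aggregate(df$Price,
                      by = list(Marketing = df$Marketing_Name),
                      FUN = sum)
  totals[which.max(totals$x), ]
}

# Made-up sales records for illustration.
sales <- data.frame(
  Marketing_Name = c("Angel", "Endri", "Angel", "Endri", "Lala"),
  Price          = c(9000, 14000, 8000, 12000, 15000),
  stringsAsFactors = FALSE
)

best <- best_marketer(sales)
print(best)
```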

Most profitable City and Cluster

cc <- data.frame(aggregate(Data$Price, by = list(
                           City = Data$City, Cluster = Data$Cluster),
                           FUN = sum))

bcc <- cc %>%
       filter(x == max(cc$x)) %>%
       dplyr::rename("Keuntungan" = x) %>%
       print()
##        City Cluster Keuntungan
## 1 Tengerang    Neon    1363802

Calculate your total advertising cost, given that you pay $4 per advertisement.

Data$Biaya_Iklan  <- Data$Advertisement*4
Total_Biaya_Iklan <- sum(Data$Biaya_Iklan)
Total_Biaya_Iklan
## [1] 422168

Calculate the average advertising cost for each marketer in the company.

Marketing_Advertisement_Cost <- data.frame(aggregate(Data$Biaya_Iklan,
                                                     by = list(Marketing = Data$Marketing_Name),
                                                     FUN = mean)) %>%
                                                     dplyr::rename("Rata-rata" = x)
datatable(Marketing_Advertisement_Cost)

Calculate the total revenue (monthly)

oktdate            <- Data%>%
                      separate(Date_Sales, c("year", "month", "day"))
oktdate$Pendapatan <- Data$Price + Data$Booking_Fee - Data$Bonus - Data$Biaya_Iklan

Pend_Bulanan       <- aggregate.data.frame(oktdate$Pendapatan,
                                       by = list(Month = oktdate$month, Year = oktdate$year),
                                       FUN = sum)%>%
                      dplyr::rename("Pendapatan Bulanan" = x)
datatable(Pend_Bulanan)
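
A base R variant of the monthly grouping, offered as a sketch: instead of splitting the date into three columns, format() can derive a single year-month key from a Date column. The data frame sales and its values are made up; Pendapatan here stands for an already computed revenue column.

```r
# Made-up revenue records for illustration.
sales <- data.frame(
  Date_Sales = as.Date(c("2018-01-05", "2018-01-20", "2018-02-01", "2019-01-15")),
  Pendapatan = c(100, 50, 70, 30)
)

# "%Y-%m" keeps year and month together, so January 2018 and
# January 2019 end up in different groups.
sales$Month <- format(sales$Date_Sales, "%Y-%m")

monthly <- aggregate(sales$Pendapatan,
                     by = list(Month = sales$Month),
                     FUN = sum)
print(monthly)
```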

Case 2

Suppose you have a market research project aimed at retaining potential customers of your company. Assume that you work at the ABC insurance company. For this project, you want to collect the following data:

  • Marital_Status : assign a random marital status (“Yes”, “No”)
  • Address : assign a random address (JABODETABEK)
  • Work_Location : assign a random work location (JABODETABEK)
  • Age : assign random ages (from 19 to 60)
  • Academic : assign a random academic level (“J.School”, “H.School”, “Sarjana”, “Magister”, “Phd”)
  • Job : 10 random jobs for each academic level
  • Grade : 5 random grades for each job
  • Income : assign a plausible income for each job
  • Spending : assign a plausible spending amount for each job
  • Number_of_children : assign a random number between 0 and 10 (consistent with marital status)
  • Private_vehicle : assign a possible vehicle for each person (“Mobil” (car), “Motor” (motorcycle), “Umum” (public transport))
  • Home : “Sewa” (rented), “Milik” (owned), “Kredit” (mortgaged)

Problem 1

Provide a dataset with information on 50000 customers that covers every variable listed above.

library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
Marital_Status           <- sample(c("Yes","No"), 50000, replace = T )

Address                  <- sample(c("Jakarta","Bogor","Depok","Tangerang","Bekasi"),50000, replace = T)

Work_Location            <- sample(rep(c("Jakarta","Bogor","Depok","Tangerang","Bekasi"),times=10000))

Age                      <- sample(c(19:60),50000, replace=T)

Academic                 <- sample(c("J.School","H.School", "Sarjana", "Magister", "Phd"), 50000, replace = T)

# size = 50000 gives every customer an independent draw; without it, sample()
# returns only 10 values, which ifelse() recycles every 10 rows.
Job                      <- ifelse(Academic == "J.School", 
                                   sample(c("Asisten Rumah Tangga", "Waiters", "Cleaning Services", "Pedagang", "Supir",
                                            "Jasa Antar Barang", "Tukang Urut", "Penjaga Warung", "Sales Minuman",
                                            "Tukang Cukur Rambut"),
                                          50000, replace = TRUE),
                            ifelse(Academic == "H.School",
                                   sample(c("Barista", "Marketing Properti", "Penulis", "Guru TK", "Guru Bimbel",
                                            "Penjual Makanan", "Supir", "Juru Masak", "Sales Asuransi", "Artis"),
                                          50000, replace = TRUE),
                            ifelse(Academic == "Sarjana", 
                                   sample(c("Guru SMP", "Guru SMA", "Data Analis", "Pengamat", "Pelatih Olahraga",
                                            "Pengusaha", "Designer", "Akuntan", "Sutradara", "Jurnalis"),
                                          50000, replace = TRUE),
                            ifelse(Academic == "Magister",
                                   sample(c("Menteri", "Dokter", "Dosen", "Pengamat", "Advokat",
                                            "Manajer Keuangan", "App Developer", "Pengusaha", "Direktur",
                                            "Peneliti Lingkungan"),
                                          50000, replace = TRUE),
                            sample(c("Owner Perusahaan", "Dokter Spesialis", "Penasehat Hukum", "Pilot", "Aktuaris", 
                                     "Peneliti Ilmiah", "Komisaris Utama", "Rektor", "Anggota Dewan", "Air Navigation"),
                                      50000, replace = TRUE)
                            ))))

Grade                    <- sample(c("A", "B", "C", "D", "E"), 50000, replace = T)

# Each runif() call draws one candidate income per customer for a given level;
# ifelse() keeps the draw that matches the customer's academic level.
Income                   <- ifelse(Academic == "J.School",
                                   round_any(runif(length(Academic), 
                                                   2500000, 3000000), 100000),
                            ifelse(Academic == "H.School",
                                   round_any(runif(length(Academic), 
                                                   3500000, 5000000), 100000),
                            ifelse(Academic == "Sarjana",
                                   round_any(runif(length(Academic), 
                                                   5500000, 6000000),100000),
                            ifelse(Academic == "Magister",
                                   round_any(runif(length(Academic), 
                                                   6500000 , 8000000), 100000),
                            round_any(runif(length(Academic), 
                                            8500000, 10000000),100000)
                            ))))

Spending                 <- round_any(Income * 50/100, 10000)

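# Draw one value per customer; sample(c(0:10)) alone is a single permutation
# of 0..10 that ifelse() would recycle every 11 rows.
Number_of_Children       <- ifelse(Marital_Status=="Yes",sample(c(0:10), 50000, replace = TRUE),0)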

Private_Vehicle          <- sample(c("Mobil", "Motor", "Umum"), 50000, replace = T)

Home                     <- sample(c("Sewa","Milik","Kredit"), 50000, replace = T)

abc                      <- data.frame(Marital_Status,
                                       Address,
                                       Work_Location,
                                       Age,
                                       Academic,
                                       Job,
                                       Grade,
                                       Income,
                                       Spending,
                                       Number_of_Children,
                                       Private_Vehicle,
                                       Home)
library(DT)
datatable(abc)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
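
Drawing a value that depends on another column, as done for Job above, can also be written as a loop over a lookup list, which avoids nesting ifelse(). The sketch below is one possible approach under stated assumptions: jobs_by_level is a hypothetical, shortened lookup with two levels and two jobs each, and only 20 customers are generated.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical, shortened lookup: academic level -> possible jobs.
jobs_by_level <- list(
  "J.School" = c("Pedagang", "Supir"),
  "Sarjana"  = c("Akuntan", "Jurnalis")
)

Academic <- sample(names(jobs_by_level), 20, replace = TRUE)
Job      <- character(length(Academic))

# For each level, draw as many jobs as there are customers at that level.
for (level in names(jobs_by_level)) {
  idx      <- Academic == level
  Job[idx] <- sample(jobs_by_level[[level]], sum(idx), replace = TRUE)
}

print(table(Academic, Job))
```

The cross-tabulation at the end makes it easy to confirm that no customer received a job from another level's list.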

Problem 2

What important summary statistics can you obtain from your dataset?

lapply(abc, summary)
## $Marital_Status
##    Length     Class      Mode 
##     50000 character character 
## 
## $Address
##    Length     Class      Mode 
##     50000 character character 
## 
## $Work_Location
##    Length     Class      Mode 
##     50000 character character 
## 
## $Age
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.00   29.00   39.00   39.52   50.00   60.00 
## 
## $Academic
##    Length     Class      Mode 
##     50000 character character 
## 
## $Job
##    Length     Class      Mode 
##     50000 character character 
## 
## $Grade
##    Length     Class      Mode 
##     50000 character character 
## 
## $Income
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  2500000  3900000  5800000  5856568  7600000 10000000 
## 
## $Spending
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 1250000 1950000 2900000 2928284 3800000 5000000 
## 
## $Number_of_Children
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   2.503   5.000  10.000 
## 
## $Private_Vehicle
##    Length     Class      Mode 
##     50000 character character 
## 
## $Home
##    Length     Class      Mode 
##     50000 character character
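
For character columns, summary() reports only length, class, and mode, as the output above shows. A frequency table is often more informative for such columns. The sketch below uses a made-up data frame named customers; the same pattern could be applied to abc.

```r
# Made-up customer records for illustration.
customers <- data.frame(
  Home   = c("Sewa", "Milik", "Milik", "Kredit", "Milik"),
  Income = c(2500000, 5800000, 7600000, 3900000, 10000000),
  stringsAsFactors = FALSE
)

# Count the values of every character column with table().
is_text <- vapply(customers, is.character, logical(1))
counts  <- lapply(customers[is_text], table)
print(counts)

# Numeric columns still get the usual five-number summary.
print(summary(customers$Income))
```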

Problem 3

Based on your calculations and analysis, which customers are worth retaining?

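# Flag a customer as potential when more than 20% of income remains after spending.
# Spending was generated as 50% of Income, so this rule flags every customer "yes".
abc$Potential <- ifelse((abc$Income-abc$Spending) > 0.2*abc$Income, "yes", "no")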
datatable(abc)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
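
To see how the 20% rule behaves when spending is not a fixed share of income, here is a small sketch on made-up values. The column name Saving_Rate and the data frame customers are assumptions introduced for this example only.

```r
# Made-up customers whose spending varies relative to income.
customers <- data.frame(
  Income   = c(3000000, 6000000, 9000000),
  Spending = c(2700000, 3000000, 7500000)
)

# Share of income left after spending.
customers$Saving_Rate <- (customers$Income - customers$Spending) / customers$Income

# Same rule as above: potential when more than 20% of income is left.
customers$Potential <- ifelse(customers$Saving_Rate > 0.2, "yes", "no")

print(customers)
print(prop.table(table(customers$Potential)))
```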
