1 Data Preparation

1.1 Load Libraries

library(ggplot2)
library(leaflet)
library(scales)
library(tidyr)
library(colorspace)
library(ggridges)
library(lubridate)
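library() only attaches packages that are already installed. A minimal sketch for checking which of the packages above are missing before attaching them; the names `pkgs`, `installed` and `missing_pkgs` are illustrative, and the install call is left commented out so nothing is downloaded by accident:

```r
# Packages this analysis attaches with library()
pkgs <- c("ggplot2", "leaflet", "scales", "tidyr",
          "colorspace", "ggridges", "lubridate")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```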

1.2 Import Data

kopi <- read.csv("data_input/coffeeanalysis.csv")

View the first rows

head(kopi)

View the last rows (tail() returns 6 rows by default)

tail(kopi)

Check the data dimensions

dim(kopi)
## [1] 2095   12

View the column names

names(kopi)
##  [1] "name"        "roaster"     "roast"       "loc_country" "origin_1"   
##  [6] "origin_2"    "X100g_USD"   "rating"      "review_date" "desc_1"     
## [11] "desc_2"      "desc_3"

The checks above show that:
* The kopi data has 2095 rows and 12 columns.
* The column names are: "name", "roaster", "roast", "loc_country", "origin_1", "origin_2", "X100g_USD", "rating", "review_date", "desc_1", "desc_2", "desc_3".

2 Data Cleansing

2.1 Inspect the Data Structure

str(kopi)
## 'data.frame':    2095 obs. of  12 variables:
##  $ name       : chr  "â\200œSweetyâ\200\235 Espresso Blend" "Flora Blend Espresso" "Ethiopia Shakiso Mormora" "Ethiopia Suke Quto" ...
##  $ roaster    : chr  "A.R.C." "A.R.C." "Revel Coffee" "Roast House" ...
##  $ roast      : chr  "Medium-Light" "Medium-Light" "Medium-Light" "Medium-Light" ...
##  $ loc_country: chr  "Hong Kong" "Hong Kong" "United States" "United States" ...
##  $ origin_1   : chr  "Panama" "Africa" "Guji Zone" "Guji Zone" ...
##  $ origin_2   : chr  "Ethiopia" "Asia Pacific" "Southern Ethiopia" "Oromia Region" ...
##  $ X100g_USD  : num  14.32 9.05 4.7 4.19 4.85 ...
##  $ rating     : int  95 94 92 92 94 93 93 93 93 94 ...
##  $ review_date: chr  "November 2017" "November 2017" "November 2017" "November 2017" ...
##  $ desc_1     : chr  "Evaluated as espresso. Sweet-toned, deeply rich, chocolaty. Vanilla paste, dark chocolate, narcissus, pink grap"| __truncated__ "Evaluated as espresso. Sweetly tart, floral-toned. Honeysuckle, oak, dried apricot, dark chocolate, thyme in ar"| __truncated__ "Crisply sweet, cocoa-toned. Lemon blossom, roasted cacao nib, date, rice candy, white peppercorn in aroma and c"| __truncated__ "Delicate, sweetly spice-toned. Pink peppercorn, date, myrrh, lavender, roasted cacao nib in aroma and cup. Cris"| __truncated__ ...
##  $ desc_2     : chr  "An espresso blend comprised of coffees from Panama and Ethiopia. A.R.C., whose motto is â\200œmore than special"| __truncated__ "An espresso blend comprised of coffees from Africa and the Asia-Pacific. A.R.C., whose motto is â\200œmore than"| __truncated__ "This coffee tied for the third-highest rating in a tasting of 71 organic-certified coffees from Africa for Coff"| __truncated__ "This coffee tied for the third-highest rating in a tasting of 71 organic-certified coffees from Africa for Coff"| __truncated__ ...
##  $ desc_3     : chr  "A radiant espresso blend that shines equally in the straight shot and in milk, alive with notes of rich dark ch"| __truncated__ "A floral-driven straight shot, amplified with notes of stone fruit and chocolate in cappuccino-scaled milk." "A gently spice-toned, floral- driven wet-processed Ethiopia cup with pleasing notes of cocoa throughout." "Lavender-like flowers and hints of zesty pink peppercorn animate this crisply sweet wet-processed Ethiopia cup." ...
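The sequences such as â\200œ in the name and desc_2 columns are the UTF-8 bytes of curly quotation marks shown under a different encoding. One possible remedy is to declare the encoding when reading the file. The sketch below uses a made-up two-line CSV in a temporary file, not the real coffeeanalysis.csv, and assumes that file is stored as UTF-8:

```r
# Hypothetical two-line CSV, written as UTF-8 bytes to a temporary file
tmp <- tempfile(fileext = ".csv")
lines <- enc2utf8(c("name,rating", "\u201cSweety\u201d Espresso Blend,95"))
writeLines(lines, tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the character data that is read as UTF-8
demo <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
demo$name

unlink(tmp)  # remove the temporary file
```

read.csv() also accepts a fileEncoding argument, which re-encodes the file while reading. Which of the two suits the real file depends on the platform and locale.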

2.2 Convert Data Types

kopi$roast <- as.factor(kopi$roast)
#kopi$review_date <- as.Date(kopi$review_date, "%m/%y")
str(kopi)
## 'data.frame':    2095 obs. of  12 variables:
##  $ name       : chr  "â\200œSweetyâ\200\235 Espresso Blend" "Flora Blend Espresso" "Ethiopia Shakiso Mormora" "Ethiopia Suke Quto" ...
##  $ roaster    : chr  "A.R.C." "A.R.C." "Revel Coffee" "Roast House" ...
##  $ roast      : Factor w/ 6 levels "","Dark","Light",..: 6 6 6 6 4 3 6 6 6 4 ...
##  $ loc_country: chr  "Hong Kong" "Hong Kong" "United States" "United States" ...
##  $ origin_1   : chr  "Panama" "Africa" "Guji Zone" "Guji Zone" ...
##  $ origin_2   : chr  "Ethiopia" "Asia Pacific" "Southern Ethiopia" "Oromia Region" ...
##  $ X100g_USD  : num  14.32 9.05 4.7 4.19 4.85 ...
##  $ rating     : int  95 94 92 92 94 93 93 93 93 94 ...
##  $ review_date: chr  "November 2017" "November 2017" "November 2017" "November 2017" ...
##  $ desc_1     : chr  "Evaluated as espresso. Sweet-toned, deeply rich, chocolaty. Vanilla paste, dark chocolate, narcissus, pink grap"| __truncated__ "Evaluated as espresso. Sweetly tart, floral-toned. Honeysuckle, oak, dried apricot, dark chocolate, thyme in ar"| __truncated__ "Crisply sweet, cocoa-toned. Lemon blossom, roasted cacao nib, date, rice candy, white peppercorn in aroma and c"| __truncated__ "Delicate, sweetly spice-toned. Pink peppercorn, date, myrrh, lavender, roasted cacao nib in aroma and cup. Cris"| __truncated__ ...
##  $ desc_2     : chr  "An espresso blend comprised of coffees from Panama and Ethiopia. A.R.C., whose motto is â\200œmore than special"| __truncated__ "An espresso blend comprised of coffees from Africa and the Asia-Pacific. A.R.C., whose motto is â\200œmore than"| __truncated__ "This coffee tied for the third-highest rating in a tasting of 71 organic-certified coffees from Africa for Coff"| __truncated__ "This coffee tied for the third-highest rating in a tasting of 71 organic-certified coffees from Africa for Coff"| __truncated__ ...
##  $ desc_3     : chr  "A radiant espresso blend that shines equally in the straight shot and in milk, alive with notes of rich dark ch"| __truncated__ "A floral-driven straight shot, amplified with notes of stone fruit and chocolate in cappuccino-scaled milk." "A gently spice-toned, floral- driven wet-processed Ethiopia cup with pleasing notes of cocoa throughout." "Lavender-like flowers and hints of zesty pink peppercorn animate this crisply sweet wet-processed Ethiopia cup." ...
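The review_date column stays as character because the commented-out conversion uses the format "%m/%y", which expects values like "11/17" rather than "November 2017". A minimal sketch of one way to parse month-name dates in base R, assuming English month names and pinning every date to the first day of the month; the vector below is sample input, with the second value made up:

```r
review_date <- c("November 2017", "March 2019")  # sample values

# Split "Month Year" into its two parts
parts <- strsplit(review_date, " ", fixed = TRUE)

# month.name is a base R constant holding the English month names,
# so the lookup does not depend on the session locale
month_num <- match(vapply(parts, `[`, character(1), 1), month.name)
year_num  <- as.integer(vapply(parts, `[`, character(1), 2))

# Build ISO dates (first day of each month) and convert to Date
parsed <- as.Date(sprintf("%04d-%02d-01", year_num, month_num))
parsed
```

With the lubridate package loaded earlier, a month-year parser may be a shorter route; the base R version is shown here to avoid depending on a specific package version.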

2.3 Check for Missing Values

colSums(is.na(kopi))
##        name     roaster       roast loc_country    origin_1    origin_2 
##           0           0           0           0           0           0 
##   X100g_USD      rating review_date      desc_1      desc_2      desc_3 
##           0           0           0           0           0           0
anyNA(kopi)
## [1] FALSE
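is.na() only detects NA, so empty strings pass this check unnoticed. The roast factor above has an empty level "", which suggests some rows carry blank values. A minimal sketch for counting blanks per column and recoding them to NA, using a small made-up data frame `demo` in place of kopi:

```r
# Made-up data with blank strings standing in for missing values
demo <- data.frame(
  roast    = c("Medium-Light", "", "Dark", ""),
  origin_1 = c("Panama", "Africa", "", "Guji Zone"),
  rating   = c(95L, 94L, 92L, 93L),
  stringsAsFactors = FALSE
)

# Count empty strings in every column
blank_counts <- colSums(demo == "", na.rm = TRUE)
blank_counts

# Recode the blanks to NA so that is.na() and anyNA() can see them
demo[demo == ""] <- NA
colSums(is.na(demo))
```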

2.4 Drop Columns

Drop the columns desc_1, desc_2, and desc_3 (columns 10 to 12); origin_2 is kept.

kopi <- kopi[,-c(10:12)]
str(kopi)
## 'data.frame':    2095 obs. of  9 variables:
##  $ name       : chr  "â\200œSweetyâ\200\235 Espresso Blend" "Flora Blend Espresso" "Ethiopia Shakiso Mormora" "Ethiopia Suke Quto" ...
##  $ roaster    : chr  "A.R.C." "A.R.C." "Revel Coffee" "Roast House" ...
##  $ roast      : Factor w/ 6 levels "","Dark","Light",..: 6 6 6 6 4 3 6 6 6 4 ...
##  $ loc_country: chr  "Hong Kong" "Hong Kong" "United States" "United States" ...
##  $ origin_1   : chr  "Panama" "Africa" "Guji Zone" "Guji Zone" ...
##  $ origin_2   : chr  "Ethiopia" "Asia Pacific" "Southern Ethiopia" "Oromia Region" ...
##  $ X100g_USD  : num  14.32 9.05 4.7 4.19 4.85 ...
##  $ rating     : int  95 94 92 92 94 93 93 93 93 94 ...
##  $ review_date: chr  "November 2017" "November 2017" "November 2017" "November 2017" ...
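Dropping by position depends on the column order never changing. A minimal sketch of the same step done by name, on a made-up one-row data frame `demo`; `drop_cols` is an illustrative name:

```r
# Made-up one-row data frame with the three description columns
demo <- data.frame(name = "Flora Blend Espresso", rating = 94L,
                   desc_1 = "a", desc_2 = "b", desc_3 = "c")

# Keep every column whose name is not in drop_cols
drop_cols <- c("desc_1", "desc_2", "desc_3")
demo <- demo[, !(names(demo) %in% drop_cols), drop = FALSE]
names(demo)
```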

3 Data Explanation

Brief Summary

summary(kopi)
##      name             roaster                   roast      loc_country       
##  Length:2095        Length:2095                    :  15   Length:2095       
##  Class :character   Class :character   Dark        :   5   Class :character  
##  Mode  :character   Mode  :character   Light       : 287   Mode  :character  
##                                        Medium      : 259                     
##                                        Medium-Dark :  39                     
##                                        Medium-Light:1490                     
##    origin_1           origin_2           X100g_USD           rating     
##  Length:2095        Length:2095        Min.   :  0.120   Min.   :84.00  
##  Class :character   Class :character   1st Qu.:  4.930   1st Qu.:92.00  
##  Mode  :character   Mode  :character   Median :  5.860   Median :93.00  
##                                        Mean   :  9.323   Mean   :93.11  
##                                        3rd Qu.:  8.785   3rd Qu.:94.00  
##                                        Max.   :132.280   Max.   :98.00  
##  review_date       
##  Length:2095       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Summary:
* The most common roast is Medium-Light, with 1490 coffees.
* The least common labeled roast is Dark, with 5 coffees (15 rows have no roast label).
* The highest coffee bean price is $132.28/100g and the lowest is $0.12/100g.
* The highest rating is 98 and the lowest is 84.
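The roast counts can also be pulled out directly instead of being read off summary(). A minimal sketch on a made-up roast vector, including one blank entry like the unlabeled rows in the data:

```r
# Made-up roast labels; "" stands for a row without a roast label
roast <- factor(c("Medium-Light", "Light", "Medium-Light", "Dark",
                  "Medium", "Medium-Light", ""))

# Frequency table, most common level first
counts <- sort(table(roast), decreasing = TRUE)
counts
names(counts)[1]  # label of the most common roast

# Drop the unlabeled rows before looking for the rarest roast
labeled <- droplevels(roast[roast != ""])
table(labeled)
```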

3.1 Distribution of Coffee Ratings

aggregate(rating ~ origin_1, data = kopi, FUN = mean)
boxplot(kopi$rating)

Insight: there are outliers, and most ratings fall between 92 and 94 (the first and third quartiles).
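The quartiles and the outliers drawn by boxplot() can be obtained as numbers too. A minimal sketch on a made-up rating vector; boxplot.stats() applies the same 1.5 × IQR rule that boxplot() uses by default:

```r
# Made-up ratings, chosen to resemble the shape of the real column
rating <- c(84, 90, 92, 92, 93, 93, 93, 94, 94, 95, 98)

# First and third quartiles (the edges of the box)
q <- quantile(rating, c(0.25, 0.75))
q

# Points beyond the whiskers, i.e. the plotted outliers
stats <- boxplot.stats(rating)
stats$out
```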

4 Data Manipulation & Transformation

Which coffee origin (origin_1 & origin_2) has the highest price?

kopi[kopi$X100g_USD == max(kopi$X100g_USD), ]
  • Answer: the origin Boquete Growing Region & Western Panama has the highest coffee price, $132.28/100g, with a rating of 97
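To see more than the single most expensive coffee, the rows can be ranked by price. A minimal sketch on a three-row data frame `demo` whose values are taken from the first rows of the str() output; `top` and `ranked` are illustrative names:

```r
# First three rows of the data, as shown in the str() output
demo <- data.frame(
  origin_1  = c("Panama", "Africa", "Guji Zone"),
  X100g_USD = c(14.32, 9.05, 4.70),
  rating    = c(95L, 94L, 92L)
)

# Row with the highest price
top <- demo[which.max(demo$X100g_USD), ]
top

# All rows, most expensive first
ranked <- demo[order(demo$X100g_USD, decreasing = TRUE), ]
head(ranked, 2)
```

which.max() returns only the first maximum, so ties would need the comparison against max() instead.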

  • The average rating across the coffees in the dataset is about 93

mean(kopi$rating)
## [1] 93.11408

Conclusions:
* A coffee's rating is not influenced by rating.
* The most common roast is Medium-Light.
* Most coffee ratings fall between 92 and 94.