1 The Pokemon Dataset

In this article, we perform an exploratory analysis of the Pokemon dataset. The dataset describes the characteristics of each Pokemon: its name, its type, and its combat statistics.

The dataset contains the following columns:

  • number : the Pokemon's serial number
  • name : the Pokemon's name
  • type : the Pokemon's type (a Pokemon with two types appears on two rows, one per type)
  • total : the sum of the six statistics below (hp, attack, defense, special_attack, special_defense, and speed)
  • hp : health points
  • attack : attack strength
  • defense : defense strength
  • special_attack : special attack strength
  • special_defense : special defense strength
  • speed : speed
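The definition of total can be checked with a small sketch. The three rows below are copied from the glimpse() output shown later in this article; the object names stats and checked are only illustrative.

```r
library(dplyr)

# Stats of three Pokemon, copied from the glimpse() output in this article
# (one row per Pokemon, type column omitted).
stats <- tibble(
  name            = c("Bulbasaur", "Ivysaur", "Venusaur"),
  total           = c(318, 405, 525),
  hp              = c(45, 60, 80),
  attack          = c(49, 62, 82),
  defense         = c(49, 63, 83),
  special_attack  = c(65, 80, 100),
  special_defense = c(65, 80, 100),
  speed           = c(45, 60, 80)
)

# Recompute the total from the six individual stats and compare it
# with the total column supplied by the dataset.
checked <- stats %>%
  mutate(
    recomputed = hp + attack + defense + special_attack + special_defense + speed,
    matches    = recomputed == total
  )

checked %>% select(name, total, recomputed, matches)
```

The same mutate() call could be applied to the full pokemon data frame to confirm that the relationship holds for every row.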

2 Setup

2.1 Libraries

# Install the required packages from CRAN if they are not yet available
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(skimr)) install.packages("skimr")
if (!require(DataExplorer)) install.packages("DataExplorer")

# Load the packages used in this tutorial
library(dplyr)
library(ggplot2)
library(tidyr)
library(readr)
library(tibble)
library(skimr)
library(DataExplorer)

# Optional: install the development version of DataExplorer from GitHub
if (!require(devtools)) install.packages("devtools")
## Loading required package: devtools
## Loading required package: usethis
devtools::install_github("boxuancui/DataExplorer")
## Skipping install of 'DataExplorer' from a github remote, the SHA1 (fcfe2bb5) has not changed since last install.
##   Use `force = TRUE` to force installation

This tutorial requires three libraries:

  1. tidyverse : a collection of R packages designed for data science. All of the packages share an underlying design philosophy, grammar, and data structures.
  2. skimr : provides functions for producing data summaries that can be read at a glance.
  3. DataExplorer : provides functions that help automate exploratory data analysis.

2.2 Importing the Dataset

Our data is stored in .csv format. To import it, we can use the read_csv function from the readr library.

pokemon <- read_csv("data/pokemon.csv")
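By default read_csv guesses the type of each column. As a minimal sketch, the column types can also be declared explicitly so that the result does not depend on guessing. The CSV text below is a hypothetical two-row stand-in for data/pokemon.csv with only three of its columns, and passing literal data through I() assumes readr 2.0.0 or later.

```r
library(readr)

# Hypothetical two-row excerpt standing in for data/pokemon.csv
csv_text <- "number,name,total\n001,Bulbasaur,318\n002,Ivysaur,405\n"

# I() tells read_csv() to treat the string as literal data, not as a file path
sample_rows <- read_csv(
  I(csv_text),
  col_types = cols(
    number = col_character(),  # keep leading zeros such as "001"
    name   = col_character(),
    total  = col_double()
  )
)

sample_rows
```

For the real file, the same col_types argument could be passed alongside the path "data/pokemon.csv", extended with the remaining columns.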

To inspect the first 10 observations of the dataset, run the following:

pokemon

3 Data Summary

glimpse(pokemon)
## Rows: 1,168
## Columns: 10
## $ number          <chr> " 001", " 001", " 002", " 002", " 003", " 003", " 003…
## $ name            <chr> "Bulbasaur", "Bulbasaur", "Ivysaur", "Ivysaur", "Venu…
## $ type            <chr> "GRASS", "POISON", "GRASS", "POISON", "GRASS", "POISO…
## $ total           <dbl> 318, 318, 405, 405, 525, 525, 625, 625, 309, 405, 534…
## $ hp              <dbl> 45, 45, 60, 60, 80, 80, 80, 80, 39, 58, 78, 78, 78, 7…
## $ attack          <dbl> 49, 49, 62, 62, 82, 82, 100, 100, 52, 64, 84, 84, 130…
## $ defense         <dbl> 49, 49, 63, 63, 83, 83, 123, 123, 43, 58, 78, 78, 111…
## $ special_attack  <dbl> 65, 65, 80, 80, 100, 100, 122, 122, 60, 80, 109, 109,…
## $ special_defense <dbl> 65, 65, 80, 80, 100, 100, 120, 120, 50, 65, 85, 85, 8…
## $ speed           <dbl> 45, 45, 60, 60, 80, 80, 80, 80, 65, 80, 100, 100, 100…
summary(pokemon)
##     number              name               type               total      
##  Length:1168        Length:1168        Length:1168        Min.   :180.0  
##  Class :character   Class :character   Class :character   1st Qu.:334.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :453.0  
##                                                           Mean   :435.6  
##                                                           3rd Qu.:515.0  
##                                                           Max.   :780.0  
##        hp             attack          defense       special_attack  
##  Min.   :  1.00   Min.   :  5.00   Min.   :  5.00   Min.   : 10.00  
##  1st Qu.: 50.00   1st Qu.: 55.00   1st Qu.: 50.00   1st Qu.: 50.00  
##  Median : 66.00   Median : 75.00   Median : 70.00   Median : 65.00  
##  Mean   : 69.53   Mean   : 78.82   Mean   : 74.37   Mean   : 72.62  
##  3rd Qu.: 82.00   3rd Qu.:100.00   3rd Qu.: 90.00   3rd Qu.: 95.00  
##  Max.   :255.00   Max.   :190.00   Max.   :230.00   Max.   :194.00  
##  special_defense      speed       
##  Min.   : 20.00   Min.   :  5.00  
##  1st Qu.: 50.00   1st Qu.: 47.00  
##  Median : 70.00   Median : 65.50  
##  Mean   : 71.72   Mean   : 68.59  
##  3rd Qu.: 90.00   3rd Qu.: 90.00  
##  Max.   :230.00   Max.   :180.00
skim(pokemon)
Data summary

Name                     pokemon
Number of rows           1168
Number of columns        10
Column type frequency:
  character              3
  numeric                7
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
number                 0              1    4    6      0       772           0
name                   0              1    3   26      0       773           0
type                   0              1    3    8      0        18           0

Variable type: numeric

skim_variable    n_missing  complete_rate    mean      sd   p0  p25    p50  p75  p100  hist
total                    0              1  435.63  116.53  180  334  453.0  515   780  ▃▆▇▂▁
hp                       0              1   69.53   24.92    1   50   66.0   82   255  ▃▇▁▁▁
attack                   0              1   78.82   31.71    5   55   75.0  100   190  ▂▇▆▂▁
defense                  0              1   74.37   30.76    5   50   70.0   90   230  ▃▇▂▁▁
special_attack           0              1   72.62   31.77   10   50   65.0   95   194  ▅▇▅▂▁
special_defense          0              1   71.72   27.27   20   50   70.0   90   230  ▇▇▂▁▁
speed                    0              1   68.59   28.32    5   47   65.5   90   180  ▃▇▆▁▁
plot_intro(pokemon)

plot_missing(pokemon)

4 Variation

4.1 Continuous Data

plot_histogram(pokemon)

4.2 Categorical Data

plot_bar(pokemon)
## 2 columns ignored with more than 50 categories.
## number: 772 categories
## name: 773 categories

5 Covariation

5.1 Correlation Coefficients

plot_correlation(pokemon)
## 2 features with more than 20 categories ignored!
## number: 772 categories
## name: 773 categories

5.2 Categorical vs. Continuous

plot_boxplot(pokemon, by = "type")

5.3 Continuous vs. Continuous

pokemon %>% 
  select(!c(name, number, type)) %>% 
  plot_scatterplot(by = "total")

6 The Strongest Pokemon Type

pokemon %>%
  ggplot() +
  geom_boxplot(aes(x = type, y = total)) +
  coord_flip()
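The boxplot compares types visually. As a complementary sketch, the types can also be ranked numerically by their median total. The toy data frame below is hypothetical and is not taken from pokemon.csv; the names toy and type_ranking are illustrative only.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from pokemon.csv
toy <- tibble(
  type  = c("FIRE", "FIRE", "FIRE", "WATER", "WATER", "WATER", "BUG", "BUG", "BUG"),
  total = c(309, 405, 534, 314, 400, 530, 195, 205, 395)
)

# Median total per type, sorted from the strongest type to the weakest
type_ranking <- toy %>%
  group_by(type) %>%
  summarise(median_total = median(total), n = n(), .groups = "drop") %>%
  arrange(desc(median_total))

type_ranking
```

Replacing toy with pokemon would give the ranking for the real data. The median is used here because it is the centre line drawn by geom_boxplot, so the table and the plot describe the same quantity.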

7 The Strongest Pokemon

pokemon %>%
  arrange(desc(total))
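Because a Pokemon with two types occupies two rows (see the repeated names in the glimpse() output), a sorted listing shows such Pokemon twice. One possible way to get one row per Pokemon, sketched below on six rows copied from that output, is to collapse the types into a single string before sorting. The names rows and strongest are illustrative.

```r
library(dplyr)

# Six rows copied from the glimpse() output: each Pokemon appears once per type
rows <- tibble(
  name  = c("Bulbasaur", "Bulbasaur", "Ivysaur", "Ivysaur", "Venusaur", "Venusaur"),
  type  = c("GRASS", "POISON", "GRASS", "POISON", "GRASS", "POISON"),
  total = c(318, 318, 405, 405, 525, 525)
)

# Collapse the types of each Pokemon into one string, then sort by total
strongest <- rows %>%
  group_by(name, total) %>%
  summarise(types = paste(type, collapse = "/"), .groups = "drop") %>%
  arrange(desc(total))

strongest
```

This sketch assumes that name identifies a Pokemon uniquely; on the full dataset it may be safer to group by number and name together.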

8 The Weakest Pokemon

pokemon %>%
  arrange(total)

9 The Fastest Pokemon

pokemon %>%
  arrange(desc(speed))

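10 The Strongest Pokemon of Each Type

# Highest total within each type (named max_total to avoid masking base::max)
max_total <- pokemon %>%
  group_by(type) %>%
  summarise(total = max(total))
## `summarise()` ungrouping output (override with `.groups` argument)
# Keep only the rows whose total equals the maximum for their type
pokemon %>%
  right_join(max_total, by = c("type", "total"))
# Equivalent call without the pipe
right_join(pokemon, max_total, by = c("type", "total"))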
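The summarise-then-join pattern above can also be written in one step with slice_max from dplyr (available since dplyr 1.0.0). This is an alternative sketch rather than the method used in this article, and the toy data below is hypothetical.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from pokemon.csv
toy <- tibble(
  name  = c("A", "B", "C", "D", "E"),
  type  = c("FIRE", "FIRE", "WATER", "WATER", "WATER"),
  total = c(534, 405, 530, 530, 314)
)

# Within each type, keep the row(s) with the highest total.
# with_ties = TRUE keeps every Pokemon that shares the maximum,
# which matches what the join on type and total returns.
strongest_by_type <- toy %>%
  group_by(type) %>%
  slice_max(total, n = 1, with_ties = TRUE) %>%
  ungroup()

strongest_by_type
```

In this toy example C and D tie for the highest WATER total, so both are kept. The same approach would apply to special_attack or special_defense by changing the column passed to slice_max.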

11 The Pokemon with the Highest Special Attack of Each Type

# Highest special attack within each type
max_special_attack <- pokemon %>%
  group_by(type) %>%
  summarise(special_attack = max(special_attack))
## `summarise()` ungrouping output (override with `.groups` argument)
# Keep only the rows that reach the maximum for their type
pokemon %>%
  right_join(max_special_attack, by = c("type", "special_attack"))

12 The Pokemon with the Highest Special Defense of Each Type

# Highest special defense within each type
max_special_defense <- pokemon %>%
  group_by(type) %>%
  summarise(special_defense = max(special_defense))
## `summarise()` ungrouping output (override with `.groups` argument)
# Keep only the rows that reach the maximum for their type
pokemon %>%
  right_join(max_special_defense, by = c("type", "special_defense"))

13 Do Pokemon with High Special Attack Also Have High Special Defense?

ggplot(pokemon, aes(x = special_attack, y = special_defense)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
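The scatterplot can be complemented with a correlation coefficient. The sketch below uses only the first twelve values of each column as printed in the glimpse() output above, so the result describes that small excerpt and not the full dataset. The variable names r_pearson and r_spearman are illustrative.

```r
# First twelve values of each column, copied from the glimpse() output
special_attack  <- c(65, 65, 80, 80, 100, 100, 122, 122, 60, 80, 109, 109)
special_defense <- c(65, 65, 80, 80, 100, 100, 120, 120, 50, 65, 85, 85)

# Pearson measures linear association; Spearman measures monotonic association
r_pearson  <- cor(special_attack, special_defense, method = "pearson")
r_spearman <- cor(special_attack, special_defense, method = "spearman")

c(pearson = r_pearson, spearman = r_spearman)

# On the full data the equivalent call would be:
# cor(pokemon$special_attack, pokemon$special_defense)
```

A value close to 1 would support the idea that the two statistics rise together, while a value near 0 would suggest little relationship.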