Premier League Provisional Standings

The 2020/2021 Premier League fixture schedule began on 12 September 2020 and ended on 23 May 2021. Below is the provisional league table, together with a list of top scorers and a visualization.

Source: https://www.bbc.com/sport/football/tables (last updated 17th April 2021 at 14:14)

This information was collected by web scraping. Web scraping is a smarter way to automate what would otherwise be manual copy-and-paste from a web page. Beyond being more efficient, its main purpose is to exploit the structure or patterns of a page to extract data and store it in the desired format for further analysis.

The scraping technique used here is parsing a page's HTML with CSS selectors. The tool is R with the rvest package.
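To make the idea of selecting a table with a CSS selector concrete, here is a minimal sketch that runs without network access. The inline HTML, the class name "standings", and the team names are made up for illustration; they are not taken from the BBC page.

```r
library(rvest)

# A tiny hand-written page standing in for a real site (illustrative only).
page <- read_html('
  <html><body>
    <table class="standings">
      <tr><th>Team</th><th>Pts</th></tr>
      <tr><td>Team A</td><td>10</td></tr>
      <tr><td>Team B</td><td>7</td></tr>
    </table>
    <table class="other">
      <tr><th>x</th></tr>
      <tr><td>1</td></tr>
    </table>
  </body></html>
')

# ".standings" is a CSS class selector: it matches elements with class="standings".
# html_node() returns the first match, html_table() turns it into a data frame.
standings <- page %>%
  html_node(".standings") %>%
  html_table()

print(standings)
```

Only the table carrying the requested class is parsed; the second table is ignored because it does not match the selector.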

Load the packages

library(rvest)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()         masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag()            masks stats::lag()

Inspect and scrape: read the HTML of the web page with the read_html function.

url <- "https://www.bbc.com/sport/football/premier-league/table"
html <- url %>% read_html

Use html_node to select the first element matching the CSS selector and html_table to convert it into a data frame.

epl_table <- html %>%
  html_node(".gs-o-table") %>%
  html_table

str(epl_table)
## tibble [21 × 12] (S3: tbl_df/tbl/data.frame)
##  $     : chr [1:21] "1" "2" "3" "4" ...
##  $     : chr [1:21] "team hasn't moved" "team hasn't moved" "team hasn't moved" "team hasn't moved" ...
##  $ Team: chr [1:21] "Man City" "Man Utd" "Liverpool" "Chelsea" ...
##  $ P   : chr [1:21] "38" "38" "38" "38" ...
##  $ W   : chr [1:21] "27" "21" "20" "19" ...
##  $ D   : chr [1:21] "5" "11" "9" "10" ...
##  $ L   : chr [1:21] "6" "6" "9" "9" ...
##  $ F   : chr [1:21] "83" "73" "68" "58" ...
##  $ A   : chr [1:21] "32" "44" "42" "36" ...
##  $ GD  : chr [1:21] "51" "29" "26" "22" ...
##  $ Pts : chr [1:21] "86" "74" "69" "67" ...
##  $ Form: chr [1:21] "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 agains"| __truncated__ "WWon 3 - 1 against Aston Villa on May 9th 2021.LLost 1 - 2 against Leicester City on May 11th 2021.LLost 2 - 4 "| __truncated__ "WWon 2 - 0 against Southampton on May 8th 2021.WWon 4 - 2 against Manchester United on May 13th 2021.WWon 2 - 1"| __truncated__ "WWon 2 - 0 against Fulham on May 1st 2021.WWon 2 - 1 against Manchester City on May 8th 2021.LLost 0 - 1 agains"| __truncated__ ...

Remove the first two columns (the unnamed position and position-movement columns)

epl_table[1:2] <- list(NULL)

Remove the last row (the table has 21 rows for 20 teams, so row 21 is not a team)

epl_table <- epl_table[-21,]
str(epl_table)
## tibble [20 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Team: chr [1:20] "Man City" "Man Utd" "Liverpool" "Chelsea" ...
##  $ P   : chr [1:20] "38" "38" "38" "38" ...
##  $ W   : chr [1:20] "27" "21" "20" "19" ...
##  $ D   : chr [1:20] "5" "11" "9" "10" ...
##  $ L   : chr [1:20] "6" "6" "9" "9" ...
##  $ F   : chr [1:20] "83" "73" "68" "58" ...
##  $ A   : chr [1:20] "32" "44" "42" "36" ...
##  $ GD  : chr [1:20] "51" "29" "26" "22" ...
##  $ Pts : chr [1:20] "86" "74" "69" "67" ...
##  $ Form: chr [1:20] "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 agains"| __truncated__ "WWon 3 - 1 against Aston Villa on May 9th 2021.LLost 1 - 2 against Leicester City on May 11th 2021.LLost 2 - 4 "| __truncated__ "WWon 2 - 0 against Southampton on May 8th 2021.WWon 4 - 2 against Manchester United on May 13th 2021.WWon 2 - 1"| __truncated__ "WWon 2 - 0 against Fulham on May 1st 2021.WWon 2 - 1 against Manchester City on May 8th 2021.LLost 0 - 1 agains"| __truncated__ ...
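The str output shows that every column, including the counts and points, is still of type character. If numeric values are needed later (for sorting or plotting), one possible conversion is sketched below. The small data frame epl_demo is a stand-in with values copied from the output above; on the real table the Form column would be excluded as well, and the conversion assumes the scraped values use a plain hyphen for negative numbers.

```r
# Small stand-in for epl_table, with values copied from the output above.
epl_demo <- data.frame(
  Team = c("Man City", "Man Utd", "Everton"),
  P    = c("38", "38", "38"),
  GD   = c("51", "29", "-1"),
  Pts  = c("86", "74", "59"),
  stringsAsFactors = FALSE
)

# Convert every column except the team name to integer.
num_cols <- setdiff(names(epl_demo), "Team")
epl_demo[num_cols] <- lapply(epl_demo[num_cols], as.integer)

str(epl_demo)
```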

Reformat the Form column

epl_table$Form[1]
## [1] "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 against Newcastle United on May 14th 2021.LLost 2 - 3 against Brighton & Hove Albion on May 18th 2021.WWon 5 - 0 against Everton on May 23rd 2021."
# stringr is already attached by library(tidyverse) above; loading it again is harmless.
# If it is missing, install it once from the console rather than inside the script.
library(stringr)
extract_form <- function(form){
  str_extract_all(form, "WWon|DDrew|LLost")
}

form <- sapply(epl_table$Form, extract_form, USE.NAMES = FALSE)
str(form)
## List of 20
##  $ : chr [1:5] "WWon" "LLost" "WWon" "LLost" ...
##  $ : chr [1:5] "WWon" "LLost" "LLost" "DDrew" ...
##  $ : chr [1:5] "WWon" "WWon" "WWon" "WWon" ...
##  $ : chr [1:5] "WWon" "WWon" "LLost" "WWon" ...
##  $ : chr [1:5] "DDrew" "LLost" "WWon" "LLost" ...
##  $ : chr [1:5] "WWon" "LLost" "DDrew" "WWon" ...
##  $ : chr [1:5] "WWon" "LLost" "WWon" "LLost" ...
##  $ : chr [1:5] "WWon" "WWon" "WWon" "WWon" ...
##  $ : chr [1:5] "LLost" "WWon" "WWon" "WWon" ...
##  $ : chr [1:5] "WWon" "DDrew" "LLost" "WWon" ...
##  $ : chr [1:5] "LLost" "DDrew" "LLost" "WWon" ...
##  $ : chr [1:5] "LLost" "WWon" "LLost" "WWon" ...
##  $ : chr [1:5] "DDrew" "WWon" "LLost" "LLost" ...
##  $ : chr [1:5] "WWon" "LLost" "WWon" "LLost" ...
##  $ : chr [1:5] "LLost" "WWon" "WWon" "LLost" ...
##  $ : chr [1:5] "WWon" "LLost" "DDrew" "WWon" ...
##  $ : chr [1:5] "LLost" "WWon" "LLost" "LLost" ...
##  $ : chr [1:5] "LLost" "LLost" "LLost" "DDrew" ...
##  $ : chr [1:5] "DDrew" "LLost" "LLost" "LLost" ...
##  $ : chr [1:5] "LLost" "LLost" "WWon" "LLost" ...

Next, within each list element, extract the single letter W, D, or L and join the letters with commas. The letters are extracted with str_extract, which returns only the first match in each string.

simply_form <- function(form){
  form %>%
    str_extract("W|D|L") %>%
    paste(collapse = ",")
}

form <- sapply(form, simply_form)
str(form)
##  chr [1:20] "W,L,W,L,W" "W,L,L,D,W" "W,W,W,W,W" "W,W,L,W,L" "D,L,W,L,L" ...
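The two steps can be traced on a single value. The sketch below uses the first Form string shown earlier and applies the same patterns outside of the helper functions; the variable names are chosen only for this example.

```r
library(stringr)

# The first Form value, copied from the output above.
raw_form <- "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 against Newcastle United on May 14th 2021.LLost 2 - 3 against Brighton & Hove Albion on May 18th 2021.WWon 5 - 0 against Everton on May 23rd 2021."

# Step 1: pull out the result markers; [[1]] takes the matches for this one string.
markers <- str_extract_all(raw_form, "WWon|DDrew|LLost")[[1]]

# Step 2: keep the first W, D or L found in each marker.
letters_only <- str_extract(markers, "W|D|L")

# Step 3: join the letters with commas.
short_form <- paste(letters_only, collapse = ",")
print(short_form)
## [1] "W,L,W,L,W"
```

The result matches the first element of the form vector printed above.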

Update the Form column of the epl_table data frame with the form vector.

epl_table$Form <- form

Scraping result

print(epl_table)
## # A tibble: 20 x 10
##    Team           P     W     D     L     F     A     GD    Pts   Form     
##    <chr>          <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>    
##  1 Man City       38    27    5     6     83    32    51    86    W,L,W,L,W
##  2 Man Utd        38    21    11    6     73    44    29    74    W,L,L,D,W
##  3 Liverpool      38    20    9     9     68    42    26    69    W,W,W,W,W
##  4 Chelsea        38    19    10    9     58    36    22    67    W,W,L,W,L
##  5 Leicester      38    20    6     12    68    50    18    66    D,L,W,L,L
##  6 West Ham       38    19    8     11    62    47    15    65    W,L,D,W,W
##  7 Tottenham      38    18    8     12    68    45    23    62    W,L,W,L,W
##  8 Arsenal        38    18    7     13    55    39    16    61    W,W,W,W,W
##  9 Leeds          38    18    5     15    62    54    8     59    L,W,W,W,W
## 10 Everton        38    17    8     13    47    48    -1    59    W,D,L,W,L
## 11 Aston Villa    38    16    7     15    55    46    9     55    L,D,L,W,W
## 12 Newcastle      38    12    9     17    46    62    -16   45    L,W,L,W,W
## 13 Wolves         38    12    9     17    36    52    -16   45    D,W,L,L,L
## 14 Crystal Palace 38    12    8     18    41    66    -25   44    W,L,W,L,L
## 15 Southampton    38    12    7     19    47    68    -21   43    L,W,W,L,L
## 16 Brighton       38    9     14    15    40    46    -6    41    W,L,D,W,L
## 17 Burnley        38    10    9     19    33    55    -22   39    L,W,L,L,L
## 18 Fulham         38    5     13    20    27    53    -26   28    L,L,L,D,L
## 19 West Brom      38    5     11    22    35    76    -41   26    D,L,L,L,L
## 20 Sheff Utd      38    7     2     29    20    63    -43   23    L,L,W,L,W
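A scraped table is often saved to disk so it can be reloaded without scraping again. The source does not show how its own CSV file was produced, so the following is only a sketch of one way to do it; epl_demo and the temporary file path are stand-ins for this example.

```r
# Stand-in rows copied from the printed table above.
epl_demo <- data.frame(
  Team = c("Man City", "Man Utd"),
  Pts  = c(86L, 74L),
  Form = c("W,L,W,L,W", "W,L,L,D,W"),
  stringsAsFactors = FALSE
)

# tempfile() keeps the example from overwriting any real file.
csv_path <- tempfile(fileext = ".csv")

# write.csv() quotes character values, so the commas inside Form are preserved.
write.csv(epl_demo, csv_path, row.names = FALSE)

# Read the file back to check that the round trip keeps the values.
restored <- read.csv(csv_path, stringsAsFactors = FALSE)
print(restored)
```

Setting row.names = FALSE avoids writing an extra leading column of row numbers that would have to be dropped after reading.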

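Create a data frame from a saved CSV file (epl.csv) and drop its first column. The figures below come from an earlier snapshot of the season than the scraped table above.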

dtf <- read.table(file = "epl.csv", header = TRUE, sep = ",")
dtf[1] <- list(NULL)
print(dtf)
##     W  D  L  F  A  GD Pts Form
## 1  23  5  4 67 23  44  74   NA
## 2  18  9  4 61 34  27  63   NA
## 3  17  5  9 55 37  18  56   NA
## 4  16  7  8 51 39  12  55   NA
## 5  15  9  7 50 31  19  54   NA
## 6  15  7  9 53 37  16  52   NA
## 7  14  7 10 52 35  17  49   NA
## 8  14  6 10 41 38   3  48   NA
## 9  13  6 12 43 35   8  45   NA
## 10 14  3 14 49 49   0  45   NA
## 11 13  5 12 43 33  10  44   NA
## 12 10  8 13 31 41 -10  38   NA
## 13 10  8 13 33 52 -19  38   NA
## 14 10  6 15 39 56 -17  36   NA
## 15  7 12 12 33 38  -5  33   NA
## 16  8  9 14 25 42 -17  33   NA
## 17  8  8 15 32 51 -19  32   NA
## 18  5 11 16 24 42 -18  26   NA
## 19  5  9 17 28 59 -31  24   NA
## 20  4  2 25 17 55 -38  14   NA

Legend
P: Played
W: Won
D: Drawn
L: Lost
F: Goals for
A: Goals against
GD: Goal difference
Pts: Points
Form: Results of the most recent matches

Visualization

par(mfrow = c(1, 1))
barplot(dtf[, "Pts"],
        # set the bar outline colour to steelblue
        border = "steelblue",
        # set the bar fill colours (the five colours are recycled across the 20 bars)
        col = c("grey", "yellow", "steelblue", "green", "orange"),
        # label the bars A to T, one letter per team in table order
        names.arg = LETTERS[1:20],
        # draw the bars horizontally
        horiz = TRUE)
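Letter labels make the chart hard to read, so a variant that labels the bars with team names may be preferable. The sketch below is one possible approach, not the original plot: it uses a small named vector with the points of the top five teams copied from the scraped table, and the margin values are a guess that may need adjusting.

```r
# Points for the top five teams, copied from the scraped table above.
pts <- c("Man City" = 86, "Man Utd" = 74, "Liverpool" = 69,
         "Chelsea" = 67, "Leicester" = 66)

# Widen the left margin so the team names fit; keep the old settings to restore later.
old_par <- par(mar = c(4, 8, 2, 1))

# rev() puts the leader at the top, since horizontal bars are drawn bottom-up.
# barplot() takes the bar labels from the names of the vector.
bar_mids <- barplot(rev(pts),
                    horiz = TRUE,
                    las = 1,              # draw axis labels horizontally
                    col = "steelblue",
                    border = "steelblue",
                    xlab = "Pts")

# Restore the previous graphics settings.
par(old_par)
```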

REFERENCES:

https://www.nurandi.id/blog/web-scraping-dengan-r-dan-rvest-parsing-tabel-html/
https://bookdown.org/moh_rosidi2610/Metode_Numerik/dataviz.html#plotfunc