Premier League Provisional Standings
The 2020/2021 Premier League season began on 12 September 2020 and ended on 23 May 2021. Below is the provisional league table, together with the list of top scorers and a visualization.
Source: https://www.bbc.com/sport/football/tables (last updated 17 April 2021 at 14:14)
This information was collected through web scraping. Web scraping is a smarter way to automate what would otherwise be manual copying and pasting from a web page. Beyond efficiency, its main purpose is to exploit the structure or patterns of a web page in order to extract data and store it in the desired format for further analysis.
The scraping technique used here is parsing a web page's HTML with CSS selectors. The tool is R with the rvest package.
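To make the idea of CSS-selector parsing concrete before touching a live site, here is a minimal sketch that parses a small inline HTML string. The page content, the class names `standings` and `note`, and the team names are all made up for illustration; they are not the BBC markup.

```r
library(rvest)

# Hypothetical inline page standing in for a real website (not the BBC markup)
page <- read_html('
<html><body>
  <p class="note">Updated weekly</p>
  <table class="standings">
    <tr><th>Team</th><th>Pts</th></tr>
    <tr><td>Alpha FC</td><td>10</td></tr>
    <tr><td>Beta United</td><td>7</td></tr>
  </table>
</body></html>')

# ".standings" is a CSS class selector: it matches the element with class="standings"
standings <- page %>%
  html_node(".standings") %>%
  html_table()

# "p.note" matches a <p> element that carries the class "note"
note <- page %>%
  html_node("p.note") %>%
  html_text()
```

The same two steps, selecting a node and then converting it, are what the rest of this report applies to the real page.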
Load the packages
library(rvest)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.6
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag() masks stats::lag()
Inspect and scrape: read the page's HTML with the read_html function
url <- "https://www.bbc.com/sport/football/premier-league/table"
html <- url %>% read_html
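A request to a live site can fail (no connection, changed URL), which stops the script at this step. The sketch below wraps `read_html` in `tryCatch`. The helper name `safe_read_html` is hypothetical, and the example assumes no file called `no-such-file.html` exists in the working directory.

```r
library(rvest)

# Hypothetical helper: returns the parsed document, or NULL if reading fails
safe_read_html <- function(src) {
  tryCatch(
    read_html(src),
    error = function(e) {
      message("Could not read ", src, ": ", conditionMessage(e))
      NULL
    }
  )
}

# A literal HTML string parses without touching the network
doc <- safe_read_html("<p>hello</p>")

# A path that does not exist yields NULL instead of stopping the script
missing <- safe_read_html("no-such-file.html")
```

A caller can then check `is.null()` on the result and decide whether to retry or stop.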
Use the html_node and html_table functions to obtain a data frame
epl_table <- html %>%
html_node(".gs-o-table") %>%
html_table
str(epl_table)
## tibble [21 × 12] (S3: tbl_df/tbl/data.frame)
## $ : chr [1:21] "1" "2" "3" "4" ...
## $ : chr [1:21] "team hasn't moved" "team hasn't moved" "team hasn't moved" "team hasn't moved" ...
## $ Team: chr [1:21] "Man City" "Man Utd" "Liverpool" "Chelsea" ...
## $ P : chr [1:21] "38" "38" "38" "38" ...
## $ W : chr [1:21] "27" "21" "20" "19" ...
## $ D : chr [1:21] "5" "11" "9" "10" ...
## $ L : chr [1:21] "6" "6" "9" "9" ...
## $ F : chr [1:21] "83" "73" "68" "58" ...
## $ A : chr [1:21] "32" "44" "42" "36" ...
## $ GD : chr [1:21] "51" "29" "26" "22" ...
## $ Pts : chr [1:21] "86" "74" "69" "67" ...
## $ Form: chr [1:21] "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 agains"| __truncated__ "WWon 3 - 1 against Aston Villa on May 9th 2021.LLost 1 - 2 against Leicester City on May 11th 2021.LLost 2 - 4 "| __truncated__ "WWon 2 - 0 against Southampton on May 8th 2021.WWon 4 - 2 against Manchester United on May 13th 2021.WWon 2 - 1"| __truncated__ "WWon 2 - 0 against Fulham on May 1st 2021.WWon 2 - 1 against Manchester City on May 8th 2021.LLost 0 - 1 agains"| __truncated__ ...
Remove the first two columns
epl_table[1:2] <- list(NULL)
Remove the last row
epl_table <- epl_table[-21,]
str(epl_table)
## tibble [20 × 10] (S3: tbl_df/tbl/data.frame)
## $ Team: chr [1:20] "Man City" "Man Utd" "Liverpool" "Chelsea" ...
## $ P : chr [1:20] "38" "38" "38" "38" ...
## $ W : chr [1:20] "27" "21" "20" "19" ...
## $ D : chr [1:20] "5" "11" "9" "10" ...
## $ L : chr [1:20] "6" "6" "9" "9" ...
## $ F : chr [1:20] "83" "73" "68" "58" ...
## $ A : chr [1:20] "32" "44" "42" "36" ...
## $ GD : chr [1:20] "51" "29" "26" "22" ...
## $ Pts : chr [1:20] "86" "74" "69" "67" ...
## $ Form: chr [1:20] "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 agains"| __truncated__ "WWon 3 - 1 against Aston Villa on May 9th 2021.LLost 1 - 2 against Leicester City on May 11th 2021.LLost 2 - 4 "| __truncated__ "WWon 2 - 0 against Southampton on May 8th 2021.WWon 4 - 2 against Manchester United on May 13th 2021.WWon 2 - 1"| __truncated__ "WWon 2 - 0 against Fulham on May 1st 2021.WWon 2 - 1 against Manchester City on May 8th 2021.LLost 0 - 1 agains"| __truncated__ ...
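The structure above shows that every column is still character, so the numbers cannot be summed or sorted numerically yet. The following sketch shows one way to do the same cleanup without hard-coding the row index, and to convert the numeric columns afterwards. The toy data frame, its column names `pos` and `move`, and the content of its trailing row are assumptions for illustration.

```r
# Toy stand-in for the scraped table: two leading helper columns,
# all-character columns, and a trailing non-data row (values are made up)
raw <- data.frame(
  pos  = c("1", "2", "Last updated"),
  move = c("team hasn't moved", "team hasn't moved", "Last updated"),
  Team = c("Alpha FC", "Beta United", "Last updated"),
  P    = c("3", "3", "Last updated"),
  Pts  = c("7", "4", "Last updated"),
  stringsAsFactors = FALSE
)

# Drop the first two columns by position
clean <- raw[, -(1:2)]

# Drop the last row without hard-coding its index
clean <- head(clean, -1)

# Convert every column except Team from character to integer
num_cols <- setdiff(names(clean), "Team")
clean[num_cols] <- lapply(clean[num_cols], as.integer)

str(clean)
```

Converting only after the non-data row is gone avoids the `NA` values (and warnings) that `as.integer` would produce for text.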
Reformat the Form column
epl_table$Form[1]
## [1] "WWon 2 - 0 against Crystal Palace on May 1st 2021.LLost 1 - 2 against Chelsea on May 8th 2021.WWon 4 - 3 against Newcastle United on May 14th 2021.LLost 2 - 3 against Brighton & Hove Albion on May 18th 2021.WWon 5 - 0 against Everton on May 23rd 2021."
# stringr is already attached by library(tidyverse) above, so no installation is needed here
library(stringr)
extract_form <- function(form){
str_extract_all(form, "WWon|DDrew|LLost")
}
form <- sapply(epl_table$Form, extract_form, USE.NAMES = FALSE)
str(form)
## List of 20
## $ : chr [1:5] "WWon" "LLost" "WWon" "LLost" ...
## $ : chr [1:5] "WWon" "LLost" "LLost" "DDrew" ...
## $ : chr [1:5] "WWon" "WWon" "WWon" "WWon" ...
## $ : chr [1:5] "WWon" "WWon" "LLost" "WWon" ...
## $ : chr [1:5] "DDrew" "LLost" "WWon" "LLost" ...
## $ : chr [1:5] "WWon" "LLost" "DDrew" "WWon" ...
## $ : chr [1:5] "WWon" "LLost" "WWon" "LLost" ...
## $ : chr [1:5] "WWon" "WWon" "WWon" "WWon" ...
## $ : chr [1:5] "LLost" "WWon" "WWon" "WWon" ...
## $ : chr [1:5] "WWon" "DDrew" "LLost" "WWon" ...
## $ : chr [1:5] "LLost" "DDrew" "LLost" "WWon" ...
## $ : chr [1:5] "LLost" "WWon" "LLost" "WWon" ...
## $ : chr [1:5] "DDrew" "WWon" "LLost" "LLost" ...
## $ : chr [1:5] "WWon" "LLost" "WWon" "LLost" ...
## $ : chr [1:5] "LLost" "WWon" "WWon" "LLost" ...
## $ : chr [1:5] "WWon" "LLost" "DDrew" "WWon" ...
## $ : chr [1:5] "LLost" "WWon" "LLost" "LLost" ...
## $ : chr [1:5] "LLost" "LLost" "LLost" "DDrew" ...
## $ : chr [1:5] "DDrew" "LLost" "LLost" "LLost" ...
## $ : chr [1:5] "LLost" "LLost" "WWon" "LLost" ...
Next, within each list element, extract the single letter W, D, or L, then join the letters with a comma as the delimiter. The letter is extracted with the str_extract function.
simply_form <- function(form){
form %>%
str_extract("W|D|L") %>%
paste(collapse = ",")
}
form <- sapply(form, simply_form)
str(form)
## chr [1:20] "W,L,W,L,W" "W,L,L,D,W" "W,W,W,W,W" "W,W,L,W,L" "D,L,W,L,L" ...
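The two-step approach first matches "WWon", "DDrew", or "LLost" because a plain `W|D|L` pattern would also hit capital letters in team names such as "West Ham" or "Leeds". As an alternative sketch, a regex lookahead can pull out the single letter in one pass. The Form strings below are shortened, made-up examples in the same shape as the scraped column.

```r
library(stringr)

# Shortened, made-up Form strings in the same shape as the scraped column
form_raw <- c(
  "WWon 2 - 0 against West Ham United on May 1st 2021.LLost 1 - 2 against Leeds United on May 8th 2021.",
  "DDrew 1 - 1 against Watford on May 2nd 2021.WWon 3 - 0 against Derby County on May 9th 2021."
)

# Match a letter only when it is immediately followed by the full result word,
# e.g. the first "W" of "WWon"; letters inside team names are skipped
letters_only <- str_extract_all(form_raw, "W(?=Won)|D(?=Drew)|L(?=Lost)")

# Collapse each team's letters into one comma-separated string
form_short <- vapply(letters_only, paste, character(1), collapse = ",")

print(form_short)
```

Because `str_extract_all` is vectorized over its input, this version needs no helper function per row.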
Update the Form column of the epl_table data frame with the form vector.
epl_table$Form <- form
Scraping result
print(epl_table)
## # A tibble: 20 x 10
## Team P W D L F A GD Pts Form
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Man City 38 27 5 6 83 32 51 86 W,L,W,L,W
## 2 Man Utd 38 21 11 6 73 44 29 74 W,L,L,D,W
## 3 Liverpool 38 20 9 9 68 42 26 69 W,W,W,W,W
## 4 Chelsea 38 19 10 9 58 36 22 67 W,W,L,W,L
## 5 Leicester 38 20 6 12 68 50 18 66 D,L,W,L,L
## 6 West Ham 38 19 8 11 62 47 15 65 W,L,D,W,W
## 7 Tottenham 38 18 8 12 68 45 23 62 W,L,W,L,W
## 8 Arsenal 38 18 7 13 55 39 16 61 W,W,W,W,W
## 9 Leeds 38 18 5 15 62 54 8 59 L,W,W,W,W
## 10 Everton 38 17 8 13 47 48 -1 59 W,D,L,W,L
## 11 Aston Villa 38 16 7 15 55 46 9 55 L,D,L,W,W
## 12 Newcastle 38 12 9 17 46 62 -16 45 L,W,L,W,W
## 13 Wolves 38 12 9 17 36 52 -16 45 D,W,L,L,L
## 14 Crystal Palace 38 12 8 18 41 66 -25 44 W,L,W,L,L
## 15 Southampton 38 12 7 19 47 68 -21 43 L,W,W,L,L
## 16 Brighton 38 9 14 15 40 46 -6 41 W,L,D,W,L
## 17 Burnley 38 10 9 19 33 55 -22 39 L,W,L,L,L
## 18 Fulham 38 5 13 20 27 53 -26 28 L,L,L,D,L
## 19 West Brom 38 5 11 22 35 76 -41 26 D,L,L,L,L
## 20 Sheff Utd 38 7 2 29 20 63 -43 23 L,L,W,L,W
Create the data frame
dtf <- read.table(file = "epl.csv", header = TRUE, sep = ",")
# drop the first column
dtf[1] <- list(NULL)
print(dtf)
## W D L F A GD Pts Form
## 1 23 5 4 67 23 44 74 NA
## 2 18 9 4 61 34 27 63 NA
## 3 17 5 9 55 37 18 56 NA
## 4 16 7 8 51 39 12 55 NA
## 5 15 9 7 50 31 19 54 NA
## 6 15 7 9 53 37 16 52 NA
## 7 14 7 10 52 35 17 49 NA
## 8 14 6 10 41 38 3 48 NA
## 9 13 6 12 43 35 8 45 NA
## 10 14 3 14 49 49 0 45 NA
## 11 13 5 12 43 33 10 44 NA
## 12 10 8 13 31 41 -10 38 NA
## 13 10 8 13 33 52 -19 38 NA
## 14 10 6 15 39 56 -17 36 NA
## 15 7 12 12 33 38 -5 33 NA
## 16 8 9 14 25 42 -17 33 NA
## 17 8 8 15 32 51 -19 32 NA
## 18 5 11 16 24 42 -18 26 NA
## 19 5 9 17 28 59 -31 24 NA
## 20 4 2 25 17 55 -38 14 NA
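How epl.csv was produced is not shown here. Assuming it is an export of a table like the scraped one, a CSV round trip might look like the sketch below. The toy data and the temporary file path are illustrative only.

```r
# Toy table with a comma-separated Form column (values are made up)
toy <- data.frame(
  Team = c("Alpha FC", "Beta United"),
  Pts  = c(7L, 4L),
  Form = c("W,D,W", "L,D,W"),
  stringsAsFactors = FALSE
)

# Write to a temporary file; row.names = FALSE avoids an extra index column
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# write.csv quotes character fields, so the commas inside Form survive the round trip
restored <- read.csv(csv_path, stringsAsFactors = FALSE)

str(restored)
```

Checking `str()` after reading is a quick way to spot a column that came back empty or with an unexpected type.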
Legend
P: Played
W: Won
D: Drawn
L: Lost
F: Goals For
A: Goals Against
GD: Goal Difference
Pts: Points
Form: Results of the last five matches
Visualization
par(mfrow = c(1, 1))
barplot(dtf[, "Pts"],
        # set the bar outline color to steelblue
        border = "steelblue",
        # fill colors for the bars; the five colors are recycled across the 20 bars
        col = c("grey", "yellow", "steelblue", "green", "orange"),
        # label the bars A through T, one letter per table position
        names.arg = LETTERS[1:20],
        # draw the bars horizontally
        horiz = TRUE)
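Letter labels make the chart hard to read without the table next to it. As a sketch of an alternative, the bars can be labeled with team names and ordered so the leader appears at the top. The three rows below are taken from the scraped table shown earlier; the variable names are illustrative.

```r
# Three rows taken from the scraped table above
standings <- data.frame(
  Team = c("Man City", "Man Utd", "Liverpool"),
  Pts  = c(86L, 74L, 69L),
  stringsAsFactors = FALSE
)

# Horizontal bars are drawn bottom to top, so sort ascending
# to place the team with the most points at the top
ordered <- standings[order(standings$Pts), ]

# Widen the left margin so the team names fit, and remember the old settings
old_par <- par(mar = c(4, 8, 2, 1))

bar_mids <- barplot(ordered$Pts,
                    names.arg = ordered$Team,
                    horiz = TRUE,
                    las = 1,            # keep the axis labels horizontal
                    col = "steelblue",
                    border = NA,
                    xlab = "Points")

# Restore the previous graphics settings
par(old_par)
```

`barplot` returns the bar midpoints, which is useful if point totals are later added as text next to each bar.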
REFERENCES:
https://www.nurandi.id/blog/web-scraping-dengan-r-dan-rvest-parsing-tabel-html/
https://bookdown.org/moh_rosidi2610/Metode_Numerik/dataviz.html#plotfunc