library(dplyr)
library(tidyr)
library(readr)
library(knitr)
library(kableExtra)
library(stringr)
library(lubridate)
library(ggplot2)
library(outliers)
library(caret)
setwd("C:\\Users\\Parikshit\\Documents\\Master Of Analytics\\Semester 3\\Data Preprocessing")
Having played professional chess for quite some time now, it was one of the motivations for me to take up two chess data sets, merge them and preprocess them thereby making it amenable for further processing and analysis. The end goal is also to put together a win summary of all the players (who have won atleast one game) in the data who took part in the 2014 world rapid chess championships.
In this assignment, the data is at the game level and I have gone forward merging it with the opening table to get a sense of opening classification a game came under. Importantly, to keep ambiguity at bay, I have prefixed the column names of white and black moves with the letter “op”. Also, I have converted the data to a tidy format by gathering the columns with chess moves into two columns namely ‘color_turn’ and ‘move’.
I went on to split the results column into white player’s and black player’s results. Even the round and table number features are created from the round column based on the delimiter “.”
In addition, I have also examined for outliers in the numeric variables present in the game.opening (merged) data set.
Also, I have range normalised the ELO ratings of the players to facilitate talent comparison among the players.
Also, some of the challenges that I faced in this exercise are with regards to dealing with NA values. NA values were a result of left joining games table with the opening table. Though the presence of NA values were justified, for representational purposes, I have replaced it with a hyphen(“-”).
The data set used here is of the 2014 world rapid chess championships. Having played chess professionally for quite some time now has motivated me enough to pick this data set and get it to a form where I can quickly calculate the number of points a player has gained or scored in this tournament.
The data set resided on a remote SQL server database named Chess [https://relational.fit.cvut.cz/dataset/Chess]. The data was accessed through a MySQL workbench installed on my desktop machine and later exported and stored as a csv file in my working directory.
With regards to the data sets: They were already relational in nature with two data sets namely game and opening. game table consists of game level details and a clear description of the data is as below:
The opening table consists of chess opening level details and a clear description of the same is as below:
I left join the game table with the opening table to get the type of opening played in every game at the same time retaining all the entries in the game table. Also, before doing so, to keep ambiguity at bay, I prefix the column names of the first two pairs of white player’s and black player’s moves (in the opening table) with the letter “op”.
game<-read.csv("chess_game.csv") #Reading game table
opening<-read.csv("chess_opening.csv") #Reading opening table
kable(head(game), caption = "Game Table")
| game_id | opening_id | event | site | event_date | round | white | black | game_result | ECO | whiteElo | BlackElo | opening | w1 | b1 | w2 | b2 | w3 | b3 | w4 | b4 | w5 | b5 | w6 | b6 | w7 | b7 | w8 | b8 | w9 | b9 | w10 | b10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 713 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2.1 | Van Wely, Loek | Caruana, Fabiano | 1/2-1/2 | A41 | 2654 | 2791 | Rat Defense | d4 | d6 | Nf3 | Nf6 | c4 | g6 | g3 | Bg7 | Bg2 | O-O | O-O | Bf5 | Nh4 | Be4 | f3 | Bc6 | Nc3 | e5 | e4 | Ne8 |
| 2 | 1019 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2.2 | Guseinov, Gadir | Carlsen, Magnus | 1/2-1/2 | C08 | 2613 | 2881 | King’s Pawn Opening | e4 | e6 | d4 | d5 | Nd2 | c5 | exd5 | exd5 | Bb5+ | Nc6 | Ngf3 | cxd4 | O-O | Bd6 | Re1+ | Nge7 | Nxd4 | O-O | h3 | a6 |
| 3 | 513 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2.3 | Mamedyarov, Shakhriyar | Sargissian, Gabriel | 1-0 | A45 | 2743 | 2686 | Trompowsky Attack | d4 | Nf6 | Bg5 | e6 | e4 | h6 | Bxf6 | Qxf6 | Nc3 | Bb4 | Qd2 | d6 | a3 | Bxc3 | Qxc3 | Nc6 | Nf3 | O-O | Bd3 | dxe5 |
| 4 | 152 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2.4 | Yu, Yangyi | Svidler, Peter | 1-0 | B48 | 2675 | 2753 | Sicilian Defense | e4 | c5 | Nf3 | e6 | d4 | cxd4 | Nxd4 | Nc6 | Nc3 | Qc7 | Be3 | a6 | Qd2 | Nf6 | O-O-O | Be7 | f4 | b5 | exf6 | bxc3 |
| 5 | 818 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2.5 | Milov, Vadim | Karjakin, Sergey | 0-1 | E46 | 2622 | 2771 | Indian Game | d4 | Nf6 | c4 | e6 | Nc3 | Bb4 | e3 | O-O | Nge2 | d5 | a3 | Be7 | cxd5 | exd5 | b4 | c6 | Ng3 | Be6 | O-O | b5 |
| 6 | 526 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2.6 | Safarli, Eltaj | Korobov, Anton | 1/2-1/2 | C25 | 2649 | 2680 | Vienna Game | e4 | e5 | Nc3 | Bc5 | Nf3 | d6 | Na4 | Bb6 | Nxb6 | axb6 | d4 | exd4 | Qxd4 | Nf6 | e5 | Nc6 | Bb5 | dxe5 | Bf4 | O-O |
kable(head(opening), caption = "Opening Table")
| opening_id | name | code | w1 | b1 | w2 | b2 | w3 | b3 | w4 | b4 | variation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Scandinavian Defense | - | e4 | d5 | exd5 | Qxd5 | Nc3 | Qa5 | d4 | e5 | Anderssen Counterattack |
| 10 | Scandinavian Defense | - | e4 | d5 | exd5 | Qxd5 | Nc3 | Qd6 | d4 | Nf6 | Bronstein Variation |
| 11 | Scandinavian Defense | - | e4 | d5 | exd5 | Qxd5 | Nc3 | Qa5 | d4 | Nf6 | Classical Variation |
| 12 | Scandinavian Defense | - | e4 | d5 | exd5 | Qxd5 | Nc3 | Qd6 | - | - | Gubinsky-Melts Defense |
| 13 | Scandinavian Defense | - | e4 | d5 | exd5 | Nf6 | c4 | e6 | - | - | Icelandic-Palme Gambit |
| 14 | Scandinavian Defense | - | e4 | d5 | exd5 | Nf6 | d4 | Nxd5 | c4 | Nb4 | Kiel Variation |
#Padding the opening moves with a prefix "op"
names(opening)[4:11]<-str_pad(names(opening)[4:11],width=3,side="left",pad=c("p"))
names(opening)[4:11]<-str_pad(names(opening)[4:11],width=4,side="left",pad=c("o"))
#Left joining game table with opening table on opening_id
game.opening<-merge(game,opening,by.x="opening_id",by.y="opening_id",all.x=TRUE)
kable(head(game.opening[c(1:3,37:40)]),caption = "Merged Table - Game table left joined with Opening table")
| opening_id | game_id | event | opb1 | opw2 | opb2 | opw3 |
|---|---|---|---|---|---|---|
| 19 | 47 | FIDE World Rapid 2014 | d5 | exd5 | Qxd5 | - |
| 28 | 7 | FIDE World Rapid 2014 | e5 | Nf3 | Nc6 | d4 |
| 30 | 109 | FIDE World Rapid 2014 | e5 | Nf3 | Nc6 | d4 |
| 41 | 285 | FIDE World Rapid 2014 | e5 | Nf3 | Nc6 | d4 |
| 43 | 162 | FIDE World Rapid 2014 | e5 | Nf3 | Nc6 | d4 |
| 43 | 211 | FIDE World Rapid 2014 | e5 | Nf3 | Nc6 | d4 |
Here I check for the structure of my data set and apply appropriate type conversions. The structure of game.opening data set is an R data frame.
#Checking the structure of the data frame game.opening
str(game.opening)
## 'data.frame': 295 obs. of 44 variables:
## $ opening_id : int 19 28 30 41 43 43 86 89 94 94 ...
## $ game_id : int 47 7 109 285 162 211 13 118 94 54 ...
## $ event : Factor w/ 1 level "FIDE World Rapid 2014": 1 1 1 1 1 1 1 1 1 1 ...
## $ site : Factor w/ 1 level "Dubai UAE": 1 1 1 1 1 1 1 1 1 1 ...
## $ event_date : Factor w/ 2 levels "2014-06-16","2014-06-17": 1 1 2 2 2 2 1 2 2 1 ...
## $ round : num 4.13 2.7 6.41 10.4 7.44 9.16 2.13 6.51 6.26 5.3 ...
## $ white : Factor w/ 104 levels "Adly, Ahmed",..: 102 69 4 63 26 101 55 15 39 74 ...
## $ black : Factor w/ 105 levels "Adly, Ahmed",..: 50 51 6 52 4 104 74 58 3 43 ...
## $ game_result: Factor w/ 3 levels "0-1","1-0","1/2-1/2": 1 2 1 3 2 3 1 3 2 1 ...
## $ ECO : Factor w/ 137 levels "A01","A04","A05",..: 75 75 75 106 107 102 39 39 59 59 ...
## $ whiteElo : int 2675 2730 2396 2586 2662 2715 2659 2299 2682 2685 ...
## $ BlackElo : int 2712 2659 2635 2546 2396 2546 2685 2632 2505 2771 ...
## $ opening : Factor w/ 27 levels "English Opening",..: 18 18 18 19 19 19 20 20 20 20 ...
## $ w1 : Factor w/ 5 levels "b3","c4","d4",..: 4 4 4 3 3 3 4 4 4 4 ...
## $ b1 : Factor w/ 9 levels "c5","c6","d5",..: 5 5 5 3 3 3 1 1 1 1 ...
## $ w2 : Factor w/ 13 levels "b3","Bb2","Bc4",..: 13 13 13 6 6 6 5 5 13 13 ...
## $ b2 : Factor w/ 17 levels "a6","b6","Bc5",..: 16 16 16 12 12 12 8 17 9 9 ...
## $ w3 : Factor w/ 29 levels "b3","b4","Bb2",..: 15 15 15 25 25 25 21 20 15 15 ...
## $ b3 : Factor w/ 35 levels "a6","b5","b6",..: 23 23 23 32 15 15 35 30 16 16 ...
## $ w4 : Factor w/ 45 levels "a3","a4","b3",..: 42 42 42 38 24 24 22 22 42 42 ...
## $ b4 : Factor w/ 52 levels "a5","a6","b5",..: 10 7 38 17 37 11 35 18 38 38 ...
## $ w5 : Factor w/ 60 levels "a3","a4","b3",..: 49 19 49 16 44 11 44 44 42 42 ...
## $ b5 : Factor w/ 62 levels "a5","a6","axb6",..: 23 10 23 32 53 39 17 35 2 2 ...
## $ w6 : Factor w/ 71 levels "a3","a4","b3",..: 11 13 34 2 63 38 12 9 41 13 ...
## $ b6 : Factor w/ 70 levels "a5","a6","b5",..: 56 7 67 6 10 57 30 49 40 39 ...
## $ w7 : Factor w/ 81 levels "a3","a4","b3",..: 67 57 75 37 13 56 31 7 45 56 ...
## $ b7 : Factor w/ 76 levels "a5","a6","b4",..: 63 39 48 45 32 63 35 31 54 15 ...
## $ w8 : Factor w/ 99 levels "a3","a4","b4",..: 58 23 31 25 25 78 57 72 46 12 ...
## $ b8 : Factor w/ 91 levels "a5","a6","axb6",..: 1 3 7 78 2 6 8 37 5 77 ...
## $ w9 : Factor w/ 105 levels "a3","a4","a5",..: 95 65 4 84 84 13 84 40 1 84 ...
## $ b9 : Factor w/ 107 levels "a4","a5","a6",..: 42 77 54 24 90 6 91 28 8 92 ...
## $ w10 : Factor w/ 116 levels "a3","a4","axb5",..: 102 33 51 112 41 109 68 97 47 74 ...
## $ b10 : Factor w/ 115 levels "a4","a5","a5+",..: 22 105 97 98 53 77 76 106 45 90 ...
## $ name : Factor w/ 41 levels "Gruenfeld Defense",..: 24 25 25 25 25 25 27 27 27 27 ...
## $ code : Factor w/ 1 level "-": 1 1 1 1 1 1 1 1 1 1 ...
## $ opw1 : Factor w/ 8 levels "-","c3","d4",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ opb1 : Factor w/ 14 levels "-","a6","b5",..: 7 9 9 9 9 9 5 5 5 5 ...
## $ opw2 : Factor w/ 20 levels "-","a3","b3",..: 14 20 20 20 20 20 20 20 8 8 ...
## $ opb2 : Factor w/ 28 levels "-","a6","b5",..: 28 22 22 22 22 22 22 22 14 14 ...
## $ opw3 : Factor w/ 33 levels "-","a3","b3",..: 1 18 18 18 18 18 18 18 24 24 ...
## $ opb3 : Factor w/ 45 levels "-","a6","b5",..: 1 26 26 26 26 26 16 16 45 45 ...
## $ opw4 : Factor w/ 44 levels "-","a3","b4",..: 1 37 37 37 6 6 37 37 1 1 ...
## $ opb4 : Factor w/ 55 levels "-","a6","b5",..: 1 10 10 10 1 1 36 36 1 1 ...
## $ variation : Factor w/ 921 levels "-"," Accelerated Gurgenidze ",..: 502 160 507 740 782 782 7 472 33 33 ...
#Coercing opening_id to a factor data type
game.opening$opening_id<-as.factor(game.opening$opening_id)
#Splitting the round column into round and table number column
game.opening<-game.opening%>%separate(round,into=c("round","table_number"),sep="\\.")
#Converting the round and table number column to a factor type
#Ordering levels of round column
game.opening$round<-factor(game.opening$round,
levels=c("1","2","3","4","5", "6","7","8","9","10"),
ordered = TRUE)
game.opening$table_number<-factor(game.opening$table_number)
#Converting column: players who played white to a character column
game.opening$white<-as.character(game.opening$white)
#similarly with black
game.opening$black<-as.character(game.opening$black)
#Coercing game_id to a character data type
game.opening$game_id<-as.character(game.opening$game_id)
#Coercing event to a character data type
game.opening$event<-as.character(game.opening$event)
#Coercing site to a character data type
game.opening$site<-as.character(game.opening$site)
#Coercing the moves columns to a character vector
game.opening[15:34]<-sapply(game.opening[15:34],as.character)
game.opening[37:44]<-sapply(game.opening[37:44],as.character)
#Coercing ELO ratings of players to a numeric data type
game.opening$whiteElo<-as.numeric(game.opening$whiteElo)
game.opening$BlackElo<-as.numeric(game.opening$BlackElo)
kable(head(game.opening[c('round','table_number')]), 'latex',caption = "Game.Opening table after manipulating the round column")
The first few pairs of moves (white side and black side) that unfolded during the game (columns 15 to 34) and as well as the first few pairs of opening moves according to the classical chess books (columns 37 to 44) are all in separate columns of the game.opening data frame. This contributes to the unitidiness of the data set.
Keeping in mind the above and to make the data set amenable for further analyses, I have gathered the above mentioned columns into two columns (‘color_turn’ and ‘move’).
To understand and see the result of the above exercise, I would like to print out the first 10 rows of the ‘color_turn’ and ‘move’ column from the game.opening data frame.
game.opening<-game.opening%>%gather(c(15:34,37:44),key="color_turn",value="move")
kable(head(game.opening,10),
caption="Tidy Game.Opening Table")
| opening_id | game_id | event | site | event_date | round | table_number | white | black | game_result | ECO | whiteElo | BlackElo | opening | name | code | variation | color_turn | move |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 47 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 4 | 13 | Yu, Yangyi | Le, Quang Liem | 0-1 | C45 | 2675 | 2712 | Scotch Game | Scandinavian Defense | - | Mieses-Kotroc Variation | w1 | e4 |
| 28 | 7 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2 | 7 | Nepomniachtchi, Ian | Lenic, Luka | 1-0 | C45 | 2730 | 2659 | Scotch Game | Scotch Game | - | Classical Variation | w1 | e4 |
| 30 | 109 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 6 | 41 | Al Qudaimi, Basheer | Amin, Bassem | 0-1 | C45 | 2396 | 2635 | Scotch Game | Scotch Game | - | Millennium Variation | w1 | e4 |
| 41 | 285 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 10 | 4 | Moradiabadi, Elshan | Lu, Shanglei | 1/2-1/2 | D44 | 2586 | 2546 | Semi-Slav Defense | Scotch Game | - | Romanishin Variation | w1 | d4 |
| 43 | 162 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 7 | 44 | Fedoseev, Vladimir3 | Al Qudaimi, Basheer | 1-0 | D45 | 2662 | 2396 | Semi-Slav Defense | Scotch Game | - | Scotch Gambit | w1 | d4 |
| 43 | 211 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 9 | 16 | Wojtaszek, Radoslaw | Yudin, Sergei | 1/2-1/2 | D31 | 2715 | 2546 | Semi-Slav Defense | Scotch Game | - | Scotch Gambit | w1 | d4 |
| 86 | 13 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2 | 13 | Mamedov, Rauf | Polgar, Judit | 0-1 | B22 | 2659 | 2685 | Sicilian Defense | Sicilian Defense | - | Accelerated Dragon, Exchange Variation | w1 | e4 |
| 89 | 118 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 6 | 51 | Breckenridge, Steven James | Mchedlishvili, Mikheil | 1/2-1/2 | B22 | 2299 | 2632 | Sicilian Defense | Sicilian Defense | - | Maroczy Bind, Breyer Variation | w1 | e4 |
| 94 | 94 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 6 | 26 | Inarkiev, Ernesto | Al-Sayed, Mohammed | 1-0 | B90 | 2682 | 2505 | Sicilian Defense | Sicilian Defense | - | Alapin Variation, Barmen Defense | w1 | e4 |
| 94 | 54 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 5 | 3 | Polgar, Judit | Karjakin, Sergey | 0-1 | B90 | 2685 | 2771 | Sicilian Defense | Sicilian Defense | - | Alapin Variation, Barmen Defense | w1 | e4 |
I would like to split the results column into white player result (wp_rslt) and black player result (bp_rslt) by separating it on the delimiter “-”. wp_rslt and bp_rslt will be coerced to a numeric data type.
Also, from this, I would like to create a column (winner) with the name of the winning player and if the game was drawn, I would like to flag the respective cell of the column as “DRAWN”.
game.opening<-game.opening%>%separate(game_result,into=c("wp_rslt","bp_rslt"),sep="-")
game.opening$wp_rslt<-ifelse(game.opening$wp_rslt=="1/2",0.5,game.opening$wp_rslt)
game.opening$bp_rslt<-ifelse(game.opening$bp_rslt=="1/2",0.5,game.opening$bp_rslt)
game.opening$wp_rslt<-as.numeric(game.opening$wp_rslt)
game.opening$bp_rslt<-as.numeric(game.opening$bp_rslt)
game.opening$winner<-ifelse(game.opening$wp_rslt==1.0,
game.opening$white,
ifelse(game.opening$bp_rslt==1.0,
game.opening$black,"DRAWN"))
kable(head(game.opening), caption = "game.opening table after manipulation of game result column")
| opening_id | game_id | event | site | event_date | round | table_number | white | black | wp_rslt | bp_rslt | ECO | whiteElo | BlackElo | opening | name | code | variation | color_turn | move | winner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 47 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 4 | 13 | Yu, Yangyi | Le, Quang Liem | 0.0 | 1.0 | C45 | 2675 | 2712 | Scotch Game | Scandinavian Defense | - | Mieses-Kotroc Variation | w1 | e4 | Le, Quang Liem |
| 28 | 7 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2 | 7 | Nepomniachtchi, Ian | Lenic, Luka | 1.0 | 0.0 | C45 | 2730 | 2659 | Scotch Game | Scotch Game | - | Classical Variation | w1 | e4 | Nepomniachtchi, Ian |
| 30 | 109 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 6 | 41 | Al Qudaimi, Basheer | Amin, Bassem | 0.0 | 1.0 | C45 | 2396 | 2635 | Scotch Game | Scotch Game | - | Millennium Variation | w1 | e4 | Amin, Bassem |
| 41 | 285 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 10 | 4 | Moradiabadi, Elshan | Lu, Shanglei | 0.5 | 0.5 | D44 | 2586 | 2546 | Semi-Slav Defense | Scotch Game | - | Romanishin Variation | w1 | d4 | DRAWN |
| 43 | 162 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 7 | 44 | Fedoseev, Vladimir3 | Al Qudaimi, Basheer | 1.0 | 0.0 | D45 | 2662 | 2396 | Semi-Slav Defense | Scotch Game | - | Scotch Gambit | w1 | d4 | Fedoseev, Vladimir3 |
| 43 | 211 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 9 | 16 | Wojtaszek, Radoslaw | Yudin, Sergei | 0.5 | 0.5 | D31 | 2715 | 2546 | Semi-Slav Defense | Scotch Game | - | Scotch Gambit | w1 | d4 | DRAWN |
I scan for missing values in the game.opening data frame and I find that there are NA entries corresponding to a few opening id’s in the game table. These NA’s are present in the code column and the first four pairs of opening moves as present in the opening table.
Also, there are some hyphens (-) under the code and opening moves columns. This is completely fine and I felt no need of any modification.
For representational purposes, I corece the NA cells to appear as “-”.
sapply(game.opening,function(x){sum(is.na(x))})
## opening_id game_id event site event_date
## 0 0 0 0 0
## round table_number white black wp_rslt
## 0 0 0 0 0
## bp_rslt ECO whiteElo BlackElo opening
## 0 0 0 0 0
## name code variation color_turn move
## 1764 1764 1764 0 504
## winner
## 0
game.opening[is.na(game.opening)]<-"-"
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = "-"): invalid factor
## level, NA generated
kable(head(game.opening), caption = "game.opening table after scanning for missing values")
| opening_id | game_id | event | site | event_date | round | table_number | white | black | wp_rslt | bp_rslt | ECO | whiteElo | BlackElo | opening | name | code | variation | color_turn | move | winner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 47 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 4 | 13 | Yu, Yangyi | Le, Quang Liem | 0.0 | 1.0 | C45 | 2675 | 2712 | Scotch Game | Scandinavian Defense | - | Mieses-Kotroc Variation | w1 | e4 | Le, Quang Liem |
| 28 | 7 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-16 | 2 | 7 | Nepomniachtchi, Ian | Lenic, Luka | 1.0 | 0.0 | C45 | 2730 | 2659 | Scotch Game | Scotch Game | - | Classical Variation | w1 | e4 | Nepomniachtchi, Ian |
| 30 | 109 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 6 | 41 | Al Qudaimi, Basheer | Amin, Bassem | 0.0 | 1.0 | C45 | 2396 | 2635 | Scotch Game | Scotch Game | - | Millennium Variation | w1 | e4 | Amin, Bassem |
| 41 | 285 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 10 | 4 | Moradiabadi, Elshan | Lu, Shanglei | 0.5 | 0.5 | D44 | 2586 | 2546 | Semi-Slav Defense | Scotch Game | - | Romanishin Variation | w1 | d4 | DRAWN |
| 43 | 162 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 7 | 44 | Fedoseev, Vladimir3 | Al Qudaimi, Basheer | 1.0 | 0.0 | D45 | 2662 | 2396 | Semi-Slav Defense | Scotch Game | - | Scotch Gambit | w1 | d4 | Fedoseev, Vladimir3 |
| 43 | 211 | FIDE World Rapid 2014 | Dubai UAE | 2014-06-17 | 9 | 16 | Wojtaszek, Radoslaw | Yudin, Sergei | 0.5 | 0.5 | D31 | 2715 | 2546 | Semi-Slav Defense | Scotch Game | - | Scotch Gambit | w1 | d4 | DRAWN |
In this section, I scan the ELO ratings column and game results columns (white and black) for possibe outliers.
par(mfrow=c(3,3))
hist(game.opening$whiteElo,main="Histogram of White Player ELO Ratings",xlab="White - ELO Ratings")
boxplot(game.opening$whiteElo,main="Box plot of White player ELO Ratings ")
hist(game.opening$BlackElo,main="Histogram of Black Player ELO Ratings",xlab="Black - ELO Ratings")
boxplot(game.opening$BlackElo,main="Box plot of black player ELO Ratings ")
boxplot(game.opening$wp_rslt,main="Box plot of white players' results")
boxplot(game.opening$bp_rslt,main="Box plot of balck players' results")
From the above, though the ELO ratings box plots indicate the presence of outliers, it is quite natural to have a wide spectrum of ELO rated players participating in a rapid championship. More importantly, FIDE chess ratings are not to be tinkered and hence I leave them as it is.
Though we should not tamper with the FIDE ratings, I fell to enable comparison and to discern the strength of the players, I create two new columns with the range normalised ELO ratings (For White and Black).
For this, I use a custom defined function that carries out the range normalisation.
#Custom defined function
rangenormal<-function(x){(x-min(x))/(max(x)-min(x))}
game.opening$whiteElo_trfm<-rangenormal(game.opening$whiteElo)
game.opening$BlackElo_trfm<-rangenormal(game.opening$BlackElo)
kable(head(game.opening[22:23]),caption = "Transformed ELO ratings of the players")
| whiteElo_trfm | BlackElo_trfm |
|---|---|
| 0.6628478 | 0.7415902 |
| 0.7528642 | 0.6605505 |
| 0.2062193 | 0.6238532 |
| 0.5171849 | 0.4877676 |
| 0.6415712 | 0.2584098 |
| 0.7283142 | 0.4877676 |
I would like to summarise the number of wins by each player (who have won atleast one game) in the 2014 world rapid chess championships. For representational purpose, I would like to print out just the first 5 rows of the winner_summary table.
winner_summary<-game.opening%>%group_by(winner)%>%summarise(wins=n_distinct(game_id))
names(winner_summary)[1]<-"Winners"
names(winner_summary)[2]<-"# Wins"
kable(head(winner_summary),caption = "Table of Player Wins")
| Winners | # Wins |
|---|---|
| Adly, Ahmed | 2 |
| Al-Modiahki, Mohamad | 2 |
| Aleksandrov, Aleksej | 1 |
| Amin, Bassem | 2 |
| Anand, Viswanathan | 3 |
| Andriasian, Zaven | 3 |
Relational.fit.cvut.cz. (2018). Dataset. [online] Available at: https://relational.fit.cvut.cz/dataset/Chess [Accessed 3 Jun. 2018].