The goal of this extra credit 2

Chess ELO calculations

Based on the difference in ratings between the chess players and each of their opponents in our Project 1 tournament, calculate each player’s expected score (e.g., 4.3) and the difference from their actual score (e.g., 4.0). List the five players who most overperformed relative to their expected scores, and the five players that most underperformed relative to their expected score.

You’ll find some small differences in different implementations of ELO formulas. You may use any reasonably-sourced formula, but please cite your source. Due end of day Sunday 18-Feb-2024.

Image source (and potentially useful reference!): The Elo Rating System for Chess and Beyond. Feb 15, 2019. [video, 7m]

In this R Markdown, I will load the same files that have been created previously, two local files, and import the data as dataframes into RStudio.

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Using libcurl 8.3.0 with Schannel
## 
## 
## Attaching package: 'curl'
## 
## 
## The following object is masked from 'package:readr':
## 
##     parse_date

Step1: Load files

In this section the three existing files that have been created in the project1 are loaded into RStudio. The same files will be loaded from the Data folder.

#Load three files 
Game_DF <- read.csv("Data/Game_Data.csv")
Player_DF <- read.csv("Data/Player_data.csv")
Average_rating <- read.csv("Data/Average_rating_report.csv")

print("Three files are laoded")
## [1] "Three files are laoded"

Step2: Data manipulation

In this section, the loaded data will be used to create the scores for individual players.

What is Elo, it is called after Arpad Elo a Hungarian-American physics professor. Elo is a rating score method for calculating the relative skill levels of players in specifically 2-player games like chess. Since is it for 2-payer, it also now used for other sports as well such as other 2-palyer games like board games and video games.

let’s start by understanding what Elo formula is, the following explains to how calculate the new score based on Elo rating after a game is given:

\[ \text{R}_{New} = \text{R}_{Old} + 32 \times (S - E) \]

where:

- \(\text{R}_{New}\) is the player’s new Elo rating,

- \(\text{R}_{Old}\) is the player’s old Elo rating,

-32 is the K-factor, which determines how much the rating changes after each game, K factor can change, but here I will be using 32 as suggested by the referred YouTube source.

- \(S\) is the actual score of the game (1 for a win (W), 0.5 for a draw (D), and 0 for a loss (L)),

and - \(E\), also known as \(P\), is the expected score of the game based on the player’s and opponent’s ratings.

The expected score \(P_\text{A}\) which predicts the outcome of the game of player A versus player B is calculated using the following formula:

\[ P_\text{A} = \frac{1}{1 + 10^{((\text{R}_\text{B} - \text{R}_\text{A})/400)}} \]

Now the probability of player A’s is calculated as below:

\[ P_\text{A} = { 10^{((\text{R}_\text{A} - \text{R}_\text{B})/400)}}\times P_\text{B} \]

where: - \(P_\text{A}\) is player A’s win probability, \(P_\text{B}\) is player B’s win probability, and \(R_\text{A}\) and \(R_\text{B}\) are player A and B ratings. The second formula is related to the first one by replacing \(P_\text{B}=1-P_\text{A}\).

So we start with the initial rating of the player as it is stored in Player_DF, and based on the result of each round we keep updating the score along the way toward the last round of 7. For each step also we calcualtes the probability of player winning rate, P.

We also define two simple function cal player_P for probability calculation and player_R for rating calculation.

player_P <- function(Player1_R, Player2_R) {
  # This function calculate the propability of player 1 wining against player 2 based on the rate of player 1 and player 2.
  # the equation is shown above is used 
  player_prob <-  numeric(max(length(Player1_R),length(Player1_R))) #define the vector to the size of the maximum of length of the two input 
  #Define a logic vector to use for filtering the NA b=for both player1 and 2 
  NA_logic <- c(is.na(Player1_R) | is.na(Player2_R))
  player_prob[!NA_logic] <- 1/(1+10^((Player2_R[!NA_logic]-Player1_R[!NA_logic])/400)) #calcualte the player_prob based on player1 and player2 _Rs

  # Fill in NA values with the original Player1_R values
  player_prob[NA_logic] <- NA  
  return(player_prob)
}

player_R <- function(Old_R, game_result , player_prob){
  # This function calculate the new rating of player based their old rating, the acheived scope, and the probability of wining.
  # the equation is shown above is used
  #use switch to change the game staus to numer to calcualte the New_R
  game_result_value <- sapply(unlist(game_result), function(x) {
    switch(x,
         "W" = 1.0,
         "L" = 0.0,
         "D" = 0.5,
         "H" = 0.0,
         "B" = 0.0,
         "U" = 0.0,
         0.0)
    })
  New_R <- numeric(length(Old_R)) #defien the vector 
  NA_logic <-  is.na(player_prob) # check for player_ptob with NA which means no change to the current player or no information to updat the rating 
  
  New_R[!NA_logic] <- Old_R[!NA_logic]+32*(game_result_value[!NA_logic]-player_prob[!NA_logic])  # update rating for those with propability score 
  New_R[NA_logic] = Old_R[NA_logic] #thpse w/o score, copy the old one as new 
  return(New_R)
}

#now after the two functions are defined, we start with the first game and populating the player A vs player B rating for each game and correct the players rating based on the result of each game. 


#Inizializing the DataFrame of the Win propability of each round, Win_P# and Rating for each round  Rate_R#
dataframe_size <- nrow(Player_DF)
DF <- data.frame(
  player_no    = numeric(dataframe_size),
  pre_rate    = numeric(dataframe_size),
  Win_P1       = numeric(dataframe_size),
  Rate_R1      = numeric(dataframe_size),
  Win_P2       = numeric(dataframe_size),
  Rate_R2      = numeric(dataframe_size),
  Win_P3       = numeric(dataframe_size),
  Rate_R3      = numeric(dataframe_size),
  Win_P4       = numeric(dataframe_size),
  Rate_R4      = numeric(dataframe_size),
  Win_P5       = numeric(dataframe_size),
  Rate_R5      = numeric(dataframe_size),
  Win_P6       = numeric(dataframe_size),
  Rate_R6      = numeric(dataframe_size),
  Win_P7       = numeric(dataframe_size),
  Rate_R7      = numeric(dataframe_size)
  )
#Assign the player_no 
DF["player_no"] <- Player_DF["player_no"]
#Assign the pre rating 
DF["pre_rate"] <- Player_DF["rate_pre"]


for (i in seq(ceiling(length(DF)/2)-1)){
  Temp_1 <- Game_DF|> filter(round_no==i)|>
    select(player1_no,player2_no,round_status) # find the palyer1 for the round 
#  Temp_2 <- Game_DF|> filter(round_no==i)|>
#    select(player2_no)
#  Temp_3 <- Game_DF|> filter(round_no==i)|>
#    select(round_status)
  #the Temp_1 and Temp_2 are the player number which is the row numer in DF, thus they are used to return the column values for even rows, i.e., row 2, whcoh is the previous rating for each player of the specific row number in Temp_1 or _2
  
  DF[unlist(Temp_1["player1_no"]),i*2+1] <- player_P(DF[unlist(Temp_1["player1_no"]),i*2],DF[unlist(Temp_1["player1_no"]),i*2]) # calculate the wining peobability for player1_no
  
  #the first player previous rating, with the status of the game and the previosuly calcualted win probability, is pass to funtion player_R to calcualte the new R
  DF[unlist(Temp_1["player1_no"]),i*2+2] <- round(player_R(DF[unlist(Temp_1["player1_no"]),2*i],unlist(Temp_1["round_status"]),unlist( DF[unlist(Temp_1["player1_no"]),i*2+1])),0)  # calculate the new ratign baased on the game status and round it   
}

DF <- DF|>mutate(improvement = Rate_R7-pre_rate)

print("List of underperformed players ")
## [1] "List of underperformed players "
DF|>select(player_no,pre_rate,Rate_R7,improvement)|>arrange(desc(improvement)) |> head(10)
##    player_no pre_rate Rate_R7 improvement
## 1          1     1794    1874          80
## 2          2     1553    1633          80
## 3          3     1384    1464          80
## 4          4     1716    1780          64
## 5          5     1655    1719          64
## 6          6     1686    1734          48
## 7          7     1649    1697          48
## 8          8     1641    1689          48
## 9          9     1411    1459          48
## 10        10     1365    1413          48
print("List of overperformed players")
## [1] "List of overperformed players"
DF|>select(player_no,pre_rate,Rate_R7,improvement)|>arrange(improvement) |> head(10)
##    player_no pre_rate Rate_R7 improvement
## 1         63     1175    1079         -96
## 2         53     1393    1313         -80
## 3         54     1270    1190         -80
## 4         55     1186    1106         -80
## 5         56     1153    1073         -80
## 6         57     1092    1012         -80
## 7         58      917     837         -80
## 8         59      853     773         -80
## 9         60      967     887         -80
## 10        62     1530    1450         -80

References:

  1. The Elo Rating System for Chess and Beyond-YouTube