Based on difference in ratings between the chess players and each of their opponents in our Project 1 tournament, calculate each player’s expected score (e.g. 4.3) and the difference from their actual score (e.g 4.0). List the five players who most over performed relative to their expected score, and the five players that most under performed relative to their expected score. It should be noted that while the Elo equation will be used here, it will function as a predictor of success for each match played, but will not be individually applied to alter a rating after each match The cited source for the equation used can be found at:
Glickman, M. E., & Jones, A. C. (1999). Rating the chess rating system. CHANCE-BERLIN THEN NEW YORK-, 12, 21-28
Formula use is in the form of:
Ea = 1/(1 + 10)^(Rb-Ra)/400, where Ea is the probability of winning a particular match against opponent b, using the pre-tournament ratings of both player a and b.
The expected score for the tournament then becomes the sum of the probabilities for each match played.
PP = Ea1 + Ea2 + …… Ea7, PP is predicted points.
We will then find the participants where the actual points were greater than predicted (over performed), identify those where the actual points were less than predicted (under performed), and identify the top five (5) in each of those categories.
We will first read in a CSV file that was created during Project 1 that has the necessary information to calculate the top five (5) participants who over performed and the five (5) participants who under performed. We will also clean up the data frame by removing some unnecessary variables for this assignment.
# Place file in working directory
extra_credit <- read.csv(file = "extra_credit.csv")
extra_credit1 <- extra_credit[, c("player_number", "player_name", "total_points", "player_state", "pre_rating", "rnd1_rating", "rnd2_rating", "rnd3_rating", "rnd4_rating", "rnd5_rating", "rnd6_rating", "rnd7_rating")]
head(extra_credit1,5)
## player_number player_name total_points player_state
## 1 1 GARY HUA 6.0 ON
## 2 2 DAKSHESH DARURI 6.0 MI
## 3 3 ADITYA BAJAJ 6.0 MI
## 4 4 PATRICK H SCHILLING 5.5 MI
## 5 5 HANSHI ZUO 5.5 MI
## pre_rating rnd1_rating rnd2_rating rnd3_rating rnd4_rating rnd5_rating
## 1 1794 1436 1563 1600 1610 1649
## 2 1553 1175 917 1716 1629 1604
## 3 1384 1641 955 1745 1563 1712
## 4 1716 1363 1507 1553 1579 1655
## 5 1655 1242 980 1663 1666 1716
## rnd6_rating rnd7_rating
## 1 1663 1716
## 2 1595 1649
## 3 1666 1663
## 4 1564 1794
## 5 1610 1629
Before we begin to calculate the probability of a match being won by a participant, we need to find the difference between their pre-tournament ratings for each match. We will do that by creating a “difference” column for each match played and then divide that number by 400 to match the formula input.
extra_credit1 <- extra_credit1 |> mutate(rnd1_diff = (rnd1_rating - pre_rating) / 400,
rnd2_diff = (rnd2_rating - pre_rating) / 400,
rnd3_diff = (rnd3_rating - pre_rating) / 400,
rnd4_diff = (rnd4_rating - pre_rating) / 400,
rnd5_diff = (rnd5_rating - pre_rating) / 400,
rnd6_diff = (rnd6_rating - pre_rating) / 400,
rnd7_diff = (rnd7_rating - pre_rating) / 400)
extra_credit1 <- extra_credit1 |> relocate(contains("diff"), .before = pre_rating)
We will now complete the calculations necessary to determine the probability of winning for each round.
extra_credit1 <- extra_credit1 |> mutate(power_raise_1 = 10 ^ rnd1_diff,
power_raise_2 = 10 ^ rnd2_diff,
power_raise_3 = 10 ^ rnd3_diff,
power_raise_4 = 10 ^ rnd4_diff,
power_raise_5 = 10 ^ rnd5_diff,
power_raise_6 = 10 ^ rnd6_diff,
power_raise_7= 10 ^ rnd7_diff)
extra_credit1 <- extra_credit1 |> relocate(contains("raise"), .before = rnd1_diff)
Here is where we complete the probability of winning each round.
extra_credit1 <- extra_credit1 |> mutate(prob_1 = 1 / (1 + power_raise_1),
prob_2 = 1 / (1 + power_raise_2),
prob_3 = 1 / (1 + power_raise_3),
prob_4 = 1 / (1 + power_raise_4),
prob_5 = 1 / (1 + power_raise_5),
prob_6 = 1 / (1 + power_raise_6),
prob_7 = 1 / (1 + power_raise_7))
extra_credit1 <- extra_credit1 |> relocate(contains("prob"), .before = power_raise_1)
Now we have to conduct a little housekeeping. There were rounds where a participant did not actually play an opponent. This resulted in an NA being passed through all the calculations and now exist in the probability scores. We need to now convert them to zeroes to be summed for the final predicted score, and then add up the probabilities for that predicted score.
mv1 <- ifelse(is.na(extra_credit1$prob_1), 0, extra_credit1$prob_1 )
extra_credit1$prob_1 <- mv1
mv2 <- ifelse(is.na(extra_credit1$prob_2), 0, extra_credit1$prob_2 )
extra_credit1$prob_2 <- mv2
mv3 <- ifelse(is.na(extra_credit1$prob_3), 0, extra_credit1$prob_3 )
extra_credit1$prob_3 <- mv3
mv4 <- ifelse(is.na(extra_credit1$prob_4), 0, extra_credit1$prob_4 )
extra_credit1$prob_4 <- mv4
mv5 <- ifelse(is.na(extra_credit1$prob_5), 0, extra_credit1$prob_5 )
extra_credit1$prob_5 <- mv5
mv6 <- ifelse(is.na(extra_credit1$prob_6), 0, extra_credit1$prob_6 )
extra_credit1$prob_6 <- mv6
mv7 <- ifelse(is.na(extra_credit1$prob_7), 0, extra_credit1$prob_7 )
extra_credit1$prob_7 <- mv7
extra_credit1 <- extra_credit1 |> mutate(pred_score = round((prob_1 + prob_2 + prob_3 + prob_4 + prob_5 + prob_6 + prob_7), digits=2))
extra_credit1 <- extra_credit1 |> relocate(contains("pred_score"), .before = prob_1)
extra_credit1 <- extra_credit1 |> mutate(performance = total_points - pred_score)
extra_credit1 <- extra_credit1 |> relocate(contains("player_state"), .before = total_points)
extra_credit1 <- extra_credit1 |> relocate(contains("performance"), .before = prob_1)
We can now get the final results needed to answer the questions of who were the participants that over performed and under during this tournament.
performance <- extra_credit1[,c("player_number", "player_name", "player_state", "performance", "pred_score", "total_points")]
top_perf <- performance |> arrange(desc(performance))
head(top_perf,5)
## player_number player_name player_state performance
## 1 3 ADITYA BAJAJ MI 4.05
## 2 15 ZACHARY JAMES HOUGHTON MI 3.13
## 3 10 ANVIT RAO MI 3.06
## 4 46 JACOB ALEXANDER LAVALLEY MI 2.96
## 5 37 AMIYATOSH PWNANANDAM MI 2.73
## pred_score total_points
## 1 1.95 6.0
## 2 1.37 4.5
## 3 1.94 5.0
## 4 0.04 3.0
## 5 0.77 3.5
under_perf <- performance |> arrange(performance)
head(under_perf,5)
## player_number player_name player_state performance
## 1 25 LOREN SCHWIEBERT MI -2.78
## 2 30 GEORGE AVERY JONES ON -2.52
## 3 42 JARED GE MI -2.01
## 4 31 RISHI SHETTY MI -1.59
## 5 35 JOSHUA DAVID LEE MI -1.46
## pred_score total_points
## 1 6.28 3.5
## 2 6.02 3.5
## 3 5.01 3.0
## 4 5.09 3.5
## 5 4.96 3.5