Preface

This is a simple example of how to create maps using R’s ggmap package. It is heavily based on this following SharpSight Labs post: http://sharpsightlabs.com/blog/map-talent-competitiveness/. A big thank you to that author - check out the blog!

There are two main sections to this page. In the first section, I get data from the FIDE (Federation International Des Echecs - chess’ main governing body) website and wrangle it. In the second section, I create two maps - one that highlights the top ten countries by the average ELO score (i.e., the rating) of its members, and another that highlights the top ten countries whose best players have the highest average ELO score. This will involve a little more wrangling.

Section 1 - Getting Data

First, let’s load some libraries and get the June 2017 rating from FIDE’s website.

########## LOADING AND FORMATTING DATA ##########
library(stringr)
library(tidyr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggmap) 
library(extrafontdb)
library(Rttf2pt1)
library(extrafont)
## Registering fonts with R
#load data
chess.no.headers <- read.delim("C:/Users/alan/Documents/R/Scripts and projects/Chess/standard_jun17frl.txt", header = F)

Problem #1: all the columns are smooshed into one very wide column. Let’s separate them (in a very inelegant way!)

a <- separate(chess.no.headers, V1, into = c("ID", "Other"), sep = 9)
b <- separate(a, Other, into = c("Name", "Other"), sep = 60)
c <- separate(b, Other, into = c("Fed", "Other"), sep = -58) #no idea why this is necessary
d <- separate(c, Other, into = c("Sex", "Other"), sep = -54)
e <- separate(d, Other, into = c("Tit", "Other"), sep = -50)
f <- separate(e, Other, into = c("WTit", "Other"), sep = -44)
g <- separate(f, Other, into = c("OTit", "Other"), sep = -28)
h <- separate(g, Other, into = c("FOA.JUN17", "Other"), sep = -19)
i <- separate(h, Other, into = c("Gms", "Other"), sep = -15)
j <- separate(i, Other, into = c("K", "Other"), sep = -12)
k <- separate(j, Other, into = c("B.Day", "Flag"), sep = -5)

chess <- k

Still a little clean-up left, including changing the class of many of the columns. I also want to remove players without a rating.

sapply(chess, class) #all characters
##          ID        Name         Fed         Sex         Tit        WTit 
## "character" "character" "character" "character" "character" "character" 
##        OTit   FOA.JUN17         Gms           K       B.Day        Flag 
## "character" "character" "character" "character" "character" "character"
head(chess)
##          ID                                                         Name
## 1 ID Number       Name                                                  
## 2 4309014                                                               
## 3 5700230         A B, Muhammad Yusop                                   
## 4 35077023        A Chakravarthy                                        
## 5 10207538        A E M, Doshtagir                                      
## 6 10206612        A K M, Sourab                                         
##          Fed  Sex  Tit   WTit             OTit FOA.JUN17  Gms   K   B.Day
## 1        Fed  Sex  Tit   WTit  OTit            FOA JUN17  Gms  K   B-day 
## 2        NZL  M                                    1717   0    40  0000  
## 3        MAS  M                                    1582   0    40  0000  
## 4        IND  M                                    1151   0    40  1986  
## 5        BAN  M                                    1840   0    40  1974  
## 6        BAN  M                                    1705   0    40  0000  
##   Flag
## 1 Flag
## 2     
## 3     
## 4 i   
## 5 i   
## 6
#getting rid of first line
chess <- chess[-1,]

chess$Fed <- str_trim(chess$Fed)
chess$Sex <- str_trim(chess$Sex)
chess$Tit <- str_trim(chess$Tit)
chess$WTit <- str_trim(chess$WTit)
chess$OTit <- str_trim(chess$OTit)
chess$Flag <- str_trim(chess$Flag)

#changing categories
chess$ID <- as.numeric(chess$ID)
chess$Fed <- as.factor(chess$Fed)
chess$Sex <- as.factor(chess$Sex)
chess$Tit <- as.factor(chess$Tit)
chess$WTit <- as.factor(chess$WTit)
chess$OTit <- as.factor(chess$OTit)
chess$FOA.JUN17 <- as.numeric(chess$FOA.JUN17)
## Warning: NAs introduced by coercion
chess$Gms <- as.numeric(chess$Gms)
chess$K <- as.numeric(chess$K)
chess$B.Day <- as.numeric(chess$B.Day)
chess$Flag <- as.factor(chess$Flag)

head(chess)
##         ID                                                         Name
## 2  4309014                                                             
## 3  5700230       A B, Muhammad Yusop                                   
## 4 35077023       A Chakravarthy                                        
## 5 10207538       A E M, Doshtagir                                      
## 6 10206612       A K M, Sourab                                         
## 7  5045886       A K, Kalshyan                                         
##   Fed Sex Tit WTit OTit FOA.JUN17 Gms  K B.Day Flag
## 2 NZL   M                    1717   0 40     0     
## 3 MAS   M                    1582   0 40     0     
## 4 IND   M                    1151   0 40  1986    i
## 5 BAN   M                    1840   0 40  1974    i
## 6 BAN   M                    1705   0 40     0     
## 7 IND   M                    1863   0 20  1964
levels(chess$Fed)
##   [1] "AFG" "AHO" "ALB" "ALG" "AND" "ANG" "ARG" "ARM" "ARU" "AUS" "AUT"
##  [12] "AZE" "BAH" "BAN" "BAR" "BDI" "BEL" "BER" "BHU" "BIH" "BLR" "BOL"
##  [23] "BOT" "BRA" "BRN" "BRU" "BUL" "BUR" "CAM" "CAN" "CGO" "CHI" "CHN"
##  [34] "CIV" "CMR" "COL" "CPV" "CRC" "CRO" "CUB" "CYP" "CZE" "DEN" "DJI"
##  [45] "DOM" "ECU" "EGY" "ENG" "ERI" "ESA" "ESP" "EST" "ETH" "FAI" "FID"
##  [56] "FIJ" "FIN" "FRA" "GAB" "GAM" "GCI" "GEO" "GER" "GHA" "GRE" "GUA"
##  [67] "GUM" "GUY" "HAI" "HKG" "HON" "HUN" "INA" "Ind" "IND" "IRI" "IRL"
##  [78] "IRQ" "ISL" "ISR" "ISV" "ITA" "IVB" "JAM" "JCI" "JOR" "JPN" "KAZ"
##  [89] "KEN" "KGZ" "KOR" "KOS" "KSA" "KUW" "LAO" "LAT" "LBA" "LBN" "LBR"
## [100] "LES" "LIE" "LTU" "LUX" "MAC" "MAD" "MAR" "MAS" "MAW" "MDA" "MDV"
## [111] "MEX" "MGL" "MKD" "MLI" "MLT" "MNC" "MNE" "MOZ" "MRI" "MTN" "MYA"
## [122] "NAM" "NCA" "NED" "NEP" "NGR" "NOR" "NRU" "NZL" "OMA" "PAK" "PAN"
## [133] "PAR" "PER" "PHI" "PLE" "PLW" "PNG" "POL" "POR" "PUR" "QAT" "ROU"
## [144] "RSA" "RUS" "RWA" "SCO" "SEN" "SEY" "SGP" "SLE" "SLO" "SMR" "SOL"
## [155] "SOM" "SRB" "SRI" "SSD" "STP" "SUD" "SUI" "SUR" "SVK" "SWE" "SWZ"
## [166] "SYR" "TAN" "THA" "TJK" "TKM" "TLS" "TOG" "TPE" "TTO" "TUN" "TUR"
## [177] "UAE" "UGA" "UKR" "URU" "USA" "UZB" "VEN" "VIE" "WLS" "YEM" "ZAM"
## [188] "ZIM"
#removing entries without a rating
chess <- filter(chess, !(is.na(FOA.JUN17))) 

The mean and median rating are around 1750, which is actually really surprising. I expected it to be significantly lower. This is very bad for my self-esteem - apparently I’m even worse than I thought!

summary(chess$FOA.JUN17) #mean is 1742, median is 1765
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1001    1500    1765    1742    1999    2832
ggplot(chess, aes(x = FOA.JUN17)) + geom_histogram(bins = sqrt(nrow(chess))) #way too many bins

ggplot(chess, aes(x = FOA.JUN17)) + geom_histogram(bins = 50)

Section 2: Maps. One weird, one not-so-weird.

First, let’s find the top ten chess powers based on the average ELO rating of their members. To maintain some level of sanity, I’m excluding countries with less than 31 members.

#top ten countries - total
chess_top_ten <- chess %>%
     group_by(Fed) %>%
     summarize(Number = n(), Average = mean(FOA.JUN17, na.rm = T), Median = median(FOA.JUN17, na.rm = T)) %>%
     filter(Number > 30) %>%
     arrange(desc(Average)) %>%
     filter(Average > 1968.08) #Okay, I'm being cheap here - I added this because I knew from previous code that this would yield the top ten

Well. That’s unepxected. Apparently the top powers are * Cuba * Indonesia * Myanmar * Kosovo (plot this as point) * Montenegro (plot this as point) * China * USA * Bosnia & Herzegovina (plot this as point) * FYR Macedonia (plot this as point) * Nigeria

Well, okay. Let’s get to making a map!

#adding country list
Country <- c("Cuba", "Indonesia", "Myanmar", "Kosovo", "Montenegro", "China", "USA", "Bosnia and Herzegovina", "Macedonia", "Nigeria")

chess_top_ten <- cbind(chess_top_ten, Country)

#making a map
map.world <- map_data("world")

#joining datasets
map.world_joined <- left_join(map.world, chess_top_ten, by = c('region' = 'Country'))
## Warning: Column `region`/`Country` joining character vector and factor,
## coercing into character vector
#making new variable for top ten countries
map.world_joined <- map.world_joined %>% mutate(fill_flg = ifelse(is.na(Average),F,T)) 

### make points for the small countries
df.country_points <- data.frame(country = c("Kosovo","Montenegro", "Bosnia & Herzegovina", "FYR Macedonia"),stringsAsFactors = F)

geocode.country_points <- geocode(df.country_points$country)
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Kosovo&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Montenegro&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Bosnia%20&%20Herzegovina&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=FYR%20Macedonia&sensor=false
df.country_points <- cbind(df.country_points,geocode.country_points)

print(df.country_points)
##                country      lon      lat
## 1               Kosovo 20.90298 42.60264
## 2           Montenegro 19.37439 42.70868
## 3 Bosnia & Herzegovina 17.67908 43.91589
## 4        FYR Macedonia 21.74527 41.60863

Before making a map, I have a confession to make: some of the preceding code is useless. Specifically, I wanted to, as per SharpSight’s tutorial, represent tiny countries by a dot. Awesome, except when I tried, the close proximity of the countries made the map look worse. I’m keeping the code as is (in part because I’ll likely use it as a reference in my future work), but I’m not going to implement that part of the code.

Map time:

#making a map. Decided not to map the small countries since it would be such a pain in the rear 
ggplot() + 
     geom_polygon(data = map.world_joined, aes(x = long, y = lat, group = group, fill = fill_flg)) + 
  
  scale_fill_manual(values = c("#CCCCCC", "#e60000")) +
     labs(title = "Best Average Chess (min 30 players) - June 2017", 
          subtitle = "source: https://ratings.fide.com/download.phtml") +
     theme(text = element_text(family = "Open Sans", color = "#FFFFFF"),
           plot.background = element_rect(fill = "#444444"),
           panel.background = element_rect(fill = "#444444"),
           panel.grid = element_blank(),
           plot.title = element_text(size = 15),
           plot.subtitle = element_text(size = 10),
           axis.text = element_blank(),
           axis.title = element_blank(),
           axis.ticks = element_blank(),
           legend.position = "none"
     )

“Cool story, bro / sis”, I thought when I looked at this. Since this is a pet project, I want to futz around with the code until I get a map that maps onto what a chess fan might guess is a map of the strongest chess countries.

Hah! “Maps onto”. See what I did there?

My intuition is that we typically assess a country’s strength not by its average player’s perceived strength, but how strong we think the elite players are. So let’s find the top ten countries based on the average score of their top 100 players:

#Top countries based on top 100 players

chess.min.100.index <- chess %>%
     group_by(Fed) %>%
     summarize(n = n()) %>%
     filter(n >= 100)

chess.min.100 <- filter(chess, Fed %in% chess.min.100.index$Fed)

#top 10 countries based on their top 100 players
chess_top_ten_strong <- chess.min.100 %>%
     group_by(Fed) %>%
     arrange(Fed, desc(rank(FOA.JUN17))) %>%
     slice(1:100) %>% #didn't know this before. dplyr: it's pretty great!
     summarize(Number = n(), Average = mean(FOA.JUN17, na.rm = T), Median = median(FOA.JUN17, na.rm = T)) %>%
     arrange(desc(Average)) %>%
     slice(1:10) #top_n would be even better! 

Alright! Now we’re talking! Based on this metric, the top ten countries are: * Russia * Ukraine * USA * Germany * China * Poland * France * Hungary (home of my favorite living player, Judit Polgar!) * Spain * India

Sadly, Canada did not make the cut. Nor did Egypt; sorry, mom and dad…

Some deal as before: let’s make a map:

#adding country list
Country_Strong <- c("Russia", "Ukraine", "USA", "Germany", "China", "Poland", "France", "Hungary", "Spain", "India")

chess_top_ten_strong <- cbind(chess_top_ten_strong, Country_Strong)

#joining datasets
map.world_joined_strong <- left_join(map.world, chess_top_ten_strong, by = c('region' = 'Country_Strong'))
## Warning: Column `region`/`Country_Strong` joining character vector and
## factor, coercing into character vector
#making new variable for top ten countries
map.world_joined_strong <- map.world_joined_strong %>% mutate(fill_flg = ifelse(is.na(Average),F,T)) 

ggplot() + 
     geom_polygon(data = map.world_joined_strong, aes(x = long, y = lat, group = group, fill = fill_flg)) +
     scale_fill_manual(values = c("#CCCCCC", "#0000ff")) +
     labs(title = "Best Average Chess (top 100) - June 2017", 
          subtitle = "source: https://ratings.fide.com/download.phtml") +
     theme(text = element_text(family = "Open Sans", color = "#FFFFFF"),
           plot.background = element_rect(fill = "#444444"),
           panel.background = element_rect(fill = "#444444"),
           panel.grid = element_blank(),
           plot.title = element_text(size = 15),
           plot.subtitle = element_text(size = 10),
           axis.text = element_blank(),
           axis.title = element_blank(),
           axis.ticks = element_blank(),
           legend.position = "none"
     )

Top Secret Bonus Section

As I said, this is a simple, unambitious bit of code that would be outright theft had I not given credit to SharpSight Labs. Other than a cheap way for me to store a script that I can use later, what can we conclude from this?