title: “Data 607 - Project 1”

author: “Sufian”

date: “9/18/2019”

output: html_document

Rpubs Links:

http://www.rpubs.com/ssufian/530973

GitHub Links:

https://github.com/ssufian/Data_607

In this project, the task is to transform a text file with chess tournament results into a clean tidy data

format and store the results as a .CSV file with the following information for all of the players:

Player’s Name

Player’s State

Total Number of Points

Player’s Pre-Rating

Average Pre Chess Rating of Opponents

Load the Tournament Data

# Load necessary libraries
library(stringr)
library(DT)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
url <- 'https://raw.githubusercontent.com/ssufian/Data_607/master/tournamentinfo.txt'
mydata <- read.table(file =url, header=F , stringsAsFactors =FALSE, sep = ",")

mydata[1:10,]
##  [1] "-----------------------------------------------------------------------------------------" 
##  [2] " Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| "
##  [3] " Num  | USCF ID / Rtg (Pre->Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | "
##  [4] "-----------------------------------------------------------------------------------------" 
##  [5] "    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|" 
##  [6] "   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |" 
##  [7] "-----------------------------------------------------------------------------------------" 
##  [8] "    2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|" 
##  [9] "   MI | 14598900 / R: 1553   ->1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |" 
## [10] "-----------------------------------------------------------------------------------------"

Replace dash lines with empty spaces

library (stringr)
newdata <- data.frame(str_replace_all(mydata$V1,"-",""))
newdata[1:10,]
##  [1]                                                                                           
##  [2]  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
##  [3]  Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  
##  [4]                                                                                           
##  [5]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4| 
##  [6]    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
##  [7]                                                                                           
##  [8]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7| 
##  [9]    MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |  
## [10]                                                                                           
## 131 Levels:  ...

Removing empty spaces

#take only populated rows that do not contain emptly lines
newdata <- as.data.frame(newdata[!apply(newdata == "", 1, all),])

newdata[1:10,]
##  [1]  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
##  [2]  Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  
##  [3]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4| 
##  [4]    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
##  [5]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7| 
##  [6]    MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |  
##  [7]     3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12| 
##  [8]    MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
##  [9]     4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1| 
## [10]    MI | 12616049 / R: 1716   >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |  
## 131 Levels:  ...
odd <- newdata[seq(1,nrow(newdata),2),] # only odd rows
even <- newdata[seq(2,nrow(newdata),2),] # only even rows

head(odd) #check out my odd rows
## [1]  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round| 
## [2]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4| 
## [3]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7| 
## [4]     3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12| 
## [5]     4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1| 
## [6]     5 | HANSHI ZUO                      |5.5  |W  45|W  37|D  12|D  13|D   4|W  14|W  17| 
## 131 Levels:  ...
head(even) #check out my even rows
## [1]  Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## [2]    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    | 
## [3]    MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    | 
## [4]    MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    | 
## [5]    MI | 12616049 / R: 1716   >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    | 
## [6]    MI | 14601533 / R: 1655   >1690     |N:2  |B    |W    |B    |W    |B    |W    |B    | 
## 131 Levels:  ...

Segregating my even and odd rows, checking it out

Merging rows into columns into a temp table; called new_table

#Merging the odd & the even rows into a newly created empty place holder table

new_table <- data.frame(c())
new_table <- data.frame(paste(odd,even))

new_table[1:5,]
## [1]  Pair | Player Name                     |Total|Round|Round|Round|Round|Round|Round|Round|   Num  | USCF ID / Rtg (Pre>Post)       | Pts |  1  |  2  |  3  |  4  |  5  |  6  |  7  | 
## [2]     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
## [3]     2 | DAKSHESH DARURI                 |6.0  |W  63|W  58|L   4|W  17|W  16|W  20|W   7|    MI | 14598900 / R: 1553   >1663     |N:2  |B    |W    |B    |W    |B    |W    |B    |  
## [4]     3 | ADITYA BAJAJ                    |6.0  |L   8|W  61|W  25|W  21|W  11|W  13|W  12|    MI | 14959604 / R: 1384   >1640     |N:2  |W    |B    |W    |B    |W    |B    |W    |  
## [5]     4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|    MI | 12616049 / R: 1716   >1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |  
## 65 Levels:     1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|    ON | 15445895 / R: 1794   >1817     |N:2  |W    |B    |W    |B    |W    |B    |W    | ...

Merge the even and odd rows in columns

Creating headers in the merge table while eliminating the pipes (|) deliminators and deleting first rows

since the new headers were created; like starting from a clean slate of a new table

# Create headers
Headers <- c("Pair","Player Name","Total","Round 1","Round 2","Round 3", "Round 4","Round 5","Round 6","Round 7","State","USCF ID / Rtg (Pre->Post)","Pts","1","2","3","4","5","6","7","Avg Pre Rating")

# Eliminating | and Separating into columns

df <- separate(data = new_table, col=paste.odd..even., into = Headers, sep = "\\|")

# HOusekeeping: Delete row containing all columns names
df <- df[-c(1), ]

df[1:5,]
##     Pair                       Player Name Total Round 1 Round 2 Round 3
## 2     1   GARY HUA                         6.0     W  39   W  21   W  18
## 3     2   DAKSHESH DARURI                  6.0     W  63   W  58   L   4
## 4     3   ADITYA BAJAJ                     6.0     L   8   W  61   W  25
## 5     4   PATRICK H SCHILLING              5.5     W  23   D  28   W   2
## 6     5   HANSHI ZUO                       5.5     W  45   W  37   D  12
##   Round 4 Round 5 Round 6 Round 7   State        USCF ID / Rtg (Pre->Post)
## 2   W  14   W   7   D  12   D   4     ON   15445895 / R: 1794   >1817     
## 3   W  17   W  16   W  20   W   7     MI   14598900 / R: 1553   >1663     
## 4   W  21   W  11   W  13   W  12     MI   14959604 / R: 1384   >1640     
## 5   W  26   D   5   W  19   D   1     MI   12616049 / R: 1716   >1744     
## 6   D  13   D   4   W  14   W  17     MI   14601533 / R: 1655   >1690     
##     Pts     1     2     3     4     5     6     7 Avg Pre Rating
## 2 N:2   W     B     W     B     W     B     W                   
## 3 N:2   B     W     B     W     B     W     B                   
## 4 N:2   W     B     W     B     W     B     W                   
## 5 N:2   W     B     W     B     W     B     B                   
## 6 N:2   B     W     B     W     B     W     B
# Extracting numerical values from "USCF ID / Rtg (Pre->Post)" column for calculating averages later

temp <- str_extract_all(df$`USCF ID / Rtg (Pre->Post)`,"\\b\\d{1,}")
temp <- data.frame(as.character(temp)) #making into dataframe
head(temp)
##              as.character.temp.
## 1 c("15445895", "1794", "1817")
## 2 c("14598900", "1553", "1663")
## 3 c("14959604", "1384", "1640")
## 4 c("12616049", "1716", "1744")
## 5 c("14601533", "1655", "1690")
## 6 c("15055204", "1686", "1687")
# breaking out the columns; extracting the 3 numericals

# Split the data frame: 1 column -> 3 different columns

temp <- separate(data = temp, col = as.character.temp., into = c("col1","col2","col3"), sep = ",")

head(temp)
##           col1    col2     col3
## 1 c("15445895"  "1794"  "1817")
## 2 c("14598900"  "1553"  "1663")
## 3 c("14959604"  "1384"  "1640")
## 4 c("12616049"  "1716"  "1744")
## 5 c("14601533"  "1655"  "1690")
## 6 c("15055204"  "1686"  "1687")
# Temporary column vectors
ID <- str_extract_all(temp$col1,"[[:digit:]]{1,}")
Pre_rate <- str_extract_all(temp$col2,"[[:digit:]]{1,}")
Post_rate <- str_extract_all(temp$col3,"[[:digit:]]{1,}")

#converting ID, pre rating & post rating columns into numerics for calculation purposes
df$`USCF ID` <- as.numeric(ID)
df$`Pre Rating` <- as.numeric(Pre_rate)
df$`Post Rating` <- as.numeric(Post_rate)

head(df)
##     Pair                       Player Name Total Round 1 Round 2 Round 3
## 2     1   GARY HUA                         6.0     W  39   W  21   W  18
## 3     2   DAKSHESH DARURI                  6.0     W  63   W  58   L   4
## 4     3   ADITYA BAJAJ                     6.0     L   8   W  61   W  25
## 5     4   PATRICK H SCHILLING              5.5     W  23   D  28   W   2
## 6     5   HANSHI ZUO                       5.5     W  45   W  37   D  12
## 7     6   HANSEN SONG                      5.0     W  34   D  29   L  11
##   Round 4 Round 5 Round 6 Round 7   State        USCF ID / Rtg (Pre->Post)
## 2   W  14   W   7   D  12   D   4     ON   15445895 / R: 1794   >1817     
## 3   W  17   W  16   W  20   W   7     MI   14598900 / R: 1553   >1663     
## 4   W  21   W  11   W  13   W  12     MI   14959604 / R: 1384   >1640     
## 5   W  26   D   5   W  19   D   1     MI   12616049 / R: 1716   >1744     
## 6   D  13   D   4   W  14   W  17     MI   14601533 / R: 1655   >1690     
## 7   W  35   D  10   W  27   W  21     OH   15055204 / R: 1686   >1687     
##     Pts     1     2     3     4     5     6     7 Avg Pre Rating  USCF ID
## 2 N:2   W     B     W     B     W     B     W                    15445895
## 3 N:2   B     W     B     W     B     W     B                    14598900
## 4 N:2   W     B     W     B     W     B     W                    14959604
## 5 N:2   W     B     W     B     W     B     B                    12616049
## 6 N:2   B     W     B     W     B     W     B                    14601533
## 7 N:3   W     B     W     B     B     W     B                    15055204
##   Pre Rating Post Rating
## 2       1794        1817
## 3       1553        1663
## 4       1384        1640
## 5       1716        1744
## 6       1655        1690
## 7       1686        1687

Subsetting the temp table into smaller one and taking only the relevant columns

df_new <- subset(df, select = c(1,2,11,22,3,23,24,21))
head(df_new)
##     Pair                       Player Name   State  USCF ID Total
## 2     1   GARY HUA                             ON  15445895 6.0  
## 3     2   DAKSHESH DARURI                      MI  14598900 6.0  
## 4     3   ADITYA BAJAJ                         MI  14959604 6.0  
## 5     4   PATRICK H SCHILLING                  MI  12616049 5.5  
## 6     5   HANSHI ZUO                           MI  14601533 5.5  
## 7     6   HANSEN SONG                          OH  15055204 5.0  
##   Pre Rating Post Rating Avg Pre Rating
## 2       1794        1817               
## 3       1553        1663               
## 4       1384        1640               
## 5       1716        1744               
## 6       1655        1690               
## 7       1686        1687
class(head(df_new$'Pre Rating')) # check class type
## [1] "numeric"
class(head(df_new$`Avg Pre Rating`)) # check class type
## [1] "character"
# converting characters in Avg. Pre Rating columns to numeric
df_new$`Avg Pre Rating` <- 0

head(df_new)
##     Pair                       Player Name   State  USCF ID Total
## 2     1   GARY HUA                             ON  15445895 6.0  
## 3     2   DAKSHESH DARURI                      MI  14598900 6.0  
## 4     3   ADITYA BAJAJ                         MI  14959604 6.0  
## 5     4   PATRICK H SCHILLING                  MI  12616049 5.5  
## 6     5   HANSHI ZUO                           MI  14601533 5.5  
## 7     6   HANSEN SONG                          OH  15055204 5.0  
##   Pre Rating Post Rating Avg Pre Rating
## 2       1794        1817              0
## 3       1553        1663              0
## 4       1384        1640              0
## 5       1716        1744              0
## 6       1655        1690              0
## 7       1686        1687              0
df_new[39,] #double checking on a single Player
##      Pair                       Player Name   State  USCF ID Total
## 40    39   JOEL R HENDON                        MI  12923035 3.0  
##    Pre Rating Post Rating Avg Pre Rating
## 40       1436        1413              0

Extricating numericals from opponents to compute its avg scores and create a Opponents only table

# Extracting numerical values from rounds played (opponents against)
opp1 <- data.frame(as.numeric(str_extract_all(df$`Round 1`,"[[:digit:]]{1,}")))
opp2 <- data.frame(as.numeric(str_extract_all(df$`Round 2`,"[[:digit:]]{1,}")))
opp3 <- data.frame(as.numeric(str_extract_all(df$`Round 3`,"[[:digit:]]{1,}")))
opp4 <- data.frame(as.numeric(str_extract_all(df$`Round 4`,"[[:digit:]]{1,}")))
opp5 <- data.frame(as.numeric(str_extract_all(df$`Round 5`,"[[:digit:]]{1,}")))
opp6 <- data.frame(as.numeric(str_extract_all(df$`Round 6`,"[[:digit:]]{1,}")))
opp7 <- data.frame(as.numeric(str_extract_all(df$`Round 7`,"[[:digit:]]{1,}")))

#  build an Opponents data frame
opponents <- cbind(df$Pair, df$`Player Name`, opp1, opp2, opp3, opp4, opp5, opp6, opp7)
names(opponents) <- c("Pair","Player Name","Opp 1","Opp 2","Opp 3","Opp 4","Opp 5","Opp 6","Opp 7")

head(opponents) # Opponents only Dataframe
##     Pair                       Player Name Opp 1 Opp 2 Opp 3 Opp 4 Opp 5
## 1     1   GARY HUA                            39    21    18    14     7
## 2     2   DAKSHESH DARURI                     63    58     4    17    16
## 3     3   ADITYA BAJAJ                         8    61    25    21    11
## 4     4   PATRICK H SCHILLING                 23    28     2    26     5
## 5     5   HANSHI ZUO                          45    37    12    13     4
## 6     6   HANSEN SONG                         34    29    11    35    10
##   Opp 6 Opp 7
## 1    12     4
## 2    20     7
## 3    13    12
## 4    19     1
## 5    14    17
## 6    27    21

Continuing with the Opponents only table, calculate how many games each of them had played

# Detecting any NA replaced and by 0.
opponents[is.na(opponents)] <- 0

head(opponents)
##     Pair                       Player Name Opp 1 Opp 2 Opp 3 Opp 4 Opp 5
## 1     1   GARY HUA                            39    21    18    14     7
## 2     2   DAKSHESH DARURI                     63    58     4    17    16
## 3     3   ADITYA BAJAJ                         8    61    25    21    11
## 4     4   PATRICK H SCHILLING                 23    28     2    26     5
## 5     5   HANSHI ZUO                          45    37    12    13     4
## 6     6   HANSEN SONG                         34    29    11    35    10
##   Opp 6 Opp 7
## 1    12     4
## 2    20     7
## 3    13    12
## 4    19     1
## 5    14    17
## 6    27    21
opponents[is.na(opponents)] <- 0 # NA replacement by zero
count = 0
# Finding number of games played
for(i in 1:nrow(opponents)){
    for(j in 3:9){
    if(opponents[i,j]>0){
      count = count + 1
    }
    opponents$No_of_Games_Played[i] <- sum(count)
    
    }
  count = 0
}

head(opponents)
##     Pair                       Player Name Opp 1 Opp 2 Opp 3 Opp 4 Opp 5
## 1     1   GARY HUA                            39    21    18    14     7
## 2     2   DAKSHESH DARURI                     63    58     4    17    16
## 3     3   ADITYA BAJAJ                         8    61    25    21    11
## 4     4   PATRICK H SCHILLING                 23    28     2    26     5
## 5     5   HANSHI ZUO                          45    37    12    13     4
## 6     6   HANSEN SONG                         34    29    11    35    10
##   Opp 6 Opp 7 No_of_Games_Played
## 1    12     4                  7
## 2    20     7                  7
## 3    13    12                  7
## 4    19     1                  7
## 5    14    17                  7
## 6    27    21                  7

Matching up players (via Pairing column) with its opponents played and compute the pre rating averages

#Resetting Avg Pre Rating column to zero before calculating their averages
df_new$`Avg Pre Rating` <- 0

# Compute Average Pre-Rating for each player
for (k in 3:9){
  for (j in 1:nrow(df_new)){
    for (i in 1:nrow(df_new)){
      if (as.numeric(opponents[j,k]) == as.numeric(df_new$Pair[i])){
        df_new$`Avg Pre Rating`[j] <- as.numeric(df_new$`Avg Pre Rating`[j]) + as.numeric(df_new$`Pre Rating`[i])
      }
    }
  }
}

# Each player's average based on the number of played games
df_new$`Avg Pre Rating`  <- round(as.numeric(df_new$`Avg Pre Rating`) / opponents$No_of_Games_Played,0)

df_new[1:10,]
##      Pair                       Player Name   State  USCF ID Total
## 2      1   GARY HUA                             ON  15445895 6.0  
## 3      2   DAKSHESH DARURI                      MI  14598900 6.0  
## 4      3   ADITYA BAJAJ                         MI  14959604 6.0  
## 5      4   PATRICK H SCHILLING                  MI  12616049 5.5  
## 6      5   HANSHI ZUO                           MI  14601533 5.5  
## 7      6   HANSEN SONG                          OH  15055204 5.0  
## 8      7   GARY DEE SWATHELL                    MI  11146376 5.0  
## 9      8   EZEKIEL HOUGHTON                     MI  15142253 5.0  
## 10     9   STEFANO LEE                          ON  14954524 5.0  
## 11    10   ANVIT RAO                            MI  14150362 5.0  
##    Pre Rating Post Rating Avg Pre Rating
## 2        1794        1817           1605
## 3        1553        1663           1469
## 4        1384        1640           1564
## 5        1716        1744           1574
## 6        1655        1690           1501
## 7        1686        1687           1519
## 8        1649        1673           1372
## 9        1641        1657           1468
## 10       1411        1564           1523
## 11       1365        1544           1554

Data Analysis via ggplotings

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
require(ggthemes)
## Loading required package: ggthemes
# Scatter plot to Check if there's any reasonable trend?

p1 <- ggplot(df_new, aes(df_new$'Pre Rating', y = df_new$'Post Rating')) + geom_point(aes(color=State))+theme_economist()

p1 <- ggplotly(p1)

p1
fit <- lm(df_new$'Avg Pre Rating' ~ df_new$'Pre Rating', data = df_new)

p3 <- ggplot(df_new, aes(df_new$'Pre Rating', y = df_new$'Avg Pre Rating')) + 
  geom_point(aes(color=df_new$Total)) + theme_economist() +
  stat_smooth(method = "lm", col = "red") +
  
    ggtitle(paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),";
                     Intercept =",signif(fit$coef[[1]],5 ),";
                     Slope =",signif(fit$coef[[2]], 5),";
                     P =",signif(summary(fit)$coef[2,4], 5)))
  
  

p3 <- ggplotly(p3)

p3

Observations from Scatterplots:

  1. By States: There appears to have a positive correlation between player’s pre tournament ratings vs. 

    opponents’ post tournament ratings. By states, they appear high-ranking players or very good

    players were retaining their ranking after the tournament; Does this mean that good players

    were playing lower ranked players and therefore were able to maintain their high status? That is a

    question we shall seek to answer with other type of charts, if possible.

  2. The 2nd chart showed a comparison between the player’s own pre-rating vs. their opponents’ average

    pre-rating. The color corresponds to the total number of points scored in the tournament.

    All observations above the red diagonal line represent players who had a higher pre-rating than

    the average of their opponents’ pre tournament ratings, and the points below the line is the

    reverse.

Possible Conclusion from Scatterplots?

chart 1 showed players across states had maintained their ranking status post-tournament;

alluding to speculation that high-ranking players might be playing against lower-ranking players. 

Chart 2 gives suggest that its almost half of highly ranked players were matched against lower ranked

players. This is visually discerning by the Red regression line which is almost flat as evidenced by

small slope of 0.128, see 2nd scatter plot.
#histogram to confirm that first scatter plot (chart 1) was misleading

hist(df_new$`Avg Pre Rating`, main="Avg Opponents Pre Tournament Ratings", xlab="opponents' Avg pre rating")

hist(df_new$`Pre Rating`, main="Players' Own Pre Tournament Ratings", xlab="Players' own Pre rating")

hist(df_new$`Post Rating`, main="Players' Own Post Tournament Ratings", xlab="Players' own Post rating")

Observations from Basic Histograms, to confirm or deny the hypothesis above?

  1. The first histogram which is normally distributed, showed a lower avg. pre-tournament ranking for

    the opponents of around 1350. The 2nd histogram, showed a left-skewed distribution with a

    higher pre-tournament ranking of around 1500. The 3rd histogram also showed a left-skwed

    distribution with a higher post-tournament ranking, albeit, being more spread out.

  2. The 2 histogram charts, suggested that given, the distributions of the opponents’s and the players’

    ranking, its is more likely than not that they were match up against lower-ranked players and thus

    able to retained their post-tournament status fairly well.

ggplot(df_new, aes(df$'State',df$`Pre Rating`)) +geom_boxplot()

ggplot(df_new, aes(df$'State',df$`Post Rating`)) +geom_boxplot()

Observations of distribution of own players’ ranking from Box plots:

  1. Perhaps, Box plots does a better job in explaining the rankings of players before and after the

    tournament.

    By States again: Except for Ohio which had only 1 entry (therefore no distribution), both Michigan

    and Ontario players’ had approximately the same median pre-tournment rankings vs. post tournment

    rankings. Also, they had a smaller pre-tournament spread while their post-tournament spread

    increased.

#Creating a data table 
datatable(df_new, rownames=FALSE)
#Writing to CSV file
write.table(df_new,file="Sufian_Suwarman_Project1.csv",row.names=FALSE,col.names=TRUE,sep=",")