Linear Hierarchy Demo

A demonstration of the tools currently available in the linHierarchy package

When individuals operate within a group setting, it is often advantageous to
rate these individuals, henceforth referred to as players, by their skill or
influence as measured by dyadic interactions among themselves. For instance, if
player A competes with player B several times and consistently wins, then we
would expect player A to be rated “better” than player B. If we introduce a
third player, C, who competes with both A and B, then we can rate all three
players based on the outcomes of their interactions. Rating players based on
their interactions within a particular group has many applications, but in
contemporary society we are most often concerned with ratings for games of
skill, such as professional sports.

Ratings are also used in animal behavior research as a way of describing the
social structure of a group of animals. Researchers use dominance interactions
to build a linear hierarchy and assess how individuals of different rank differ
in their social or biological experiences.

Regardless of the application, devising an appropriate algorithm for rating
players is an essential step in this process. Below is a demonstration of the
linHierarchy package, currently being developed for R, in which users may apply
several popular rating algorithms and choose the ratings that best describe the
nature of the group's interactions. The methods currently available are:

Inconsistency and Strength of Inconsistency (I&SI) method, H. de Vries
David's Score, H.A. David
Elo Rating, A. Elo

linHierarchy Package

The linHierarchy package is still under construction; however, many functions
are already available for testing. To install the package, you will first need
the devtools package from CRAN, which is used to pull the latest release from
the GitHub repository.

require(devtools)

install_github("nmmarquez/linHierarchy")

library(linHierarchy)  # load the package and dependencies

## Loading required package: coda
## Loading required package: lattice
## Loading required package: MASS
## Loading required package: MCMCpack
## ##
## ## Markov Chain Monte Carlo Package (MCMCpack)
## ## Copyright (C) 2003-2014 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
## ##
## ## Support provided by the U.S. National Science Foundation
## ## (Grants SES-0350646 and SES-0350613)
## ##

Creating and manipulating basic interaction data

We will first need to generate a set of generic data which we can work with.

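# generate generic data: 100 random interactions among players "a" to "j"
interactions <- data.frame(
    a = sample(letters[1:10], 100, replace = TRUE),       # first player
    b = sample(letters[1:10], 100, replace = TRUE),       # second player
    o = sample(c(-1, -1, 0, 1, 1), 100, replace = TRUE),  # outcome
    d = Sys.time() + runif(100, 40, 160)                  # time of interaction
)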
head(interactions)

##   a b  o                   d
## 1 c f -1 2014-06-03 13:43:30
## 2 h d  1 2014-06-03 13:41:57
## 3 e e  1 2014-06-03 13:43:29
## 4 i j  0 2014-06-03 13:43:23
## 5 j e  0 2014-06-03 13:43:11
## 6 a i  1 2014-06-03 13:42:53

In this data set the first and second columns represent the first and second
player of an interaction, respectively. The third column represents the outcome
of the interaction, where 1 is a win for the first player, -1 is a win for the
second player, and 0 indicates a tie. The fourth column is the time of the
interaction. In order for the linHierarchy package to create player ratings,
your data will need to be in a similar structure so that it can be converted to
an object of class “interData” by the intTableConv function. The help page for
the function describes all the preprocessing required for the object to be
created properly.

id1 <- intTableConv(interactions)
id1

## $players
##  [1] "i" "h" "f" "a" "g" "d" "e" "b" "j" "c"
## 
## $datetime
##                     start                       end 
## "2014-06-03 13:41:57 EDT" "2014-06-03 13:43:55 EDT" 
## 
##   player.1 player.2 outcome            datetime
## 1        i        g      -1 2014-06-03 13:41:57
## 2        h        d       1 2014-06-03 13:41:57
## 3        i        c      -1 2014-06-03 13:41:58
## 4        h        j      -1 2014-06-03 13:42:00
## 5        f        a       0 2014-06-03 13:42:00
## 6        f        j       1 2014-06-03 13:42:02

“interData” objects are essentially a special type of list where the first
element is a character vector of the players in the group, the second is the
range of time that the data covers, and the third is a data frame where each
row is a separate interaction. “interData” objects are the base object of the
linHierarchy package, around which everything else is built.

For instance, we can subset our interactions so that they only include specific
players, check the number of interactions between two players, or build a
matrix showing the number of times any one player has defeated another, each
with a single function call on the “interData” object.

subset(id1, players = id1$players[1:5])  # interactions with specific players

## $players
## [1] "i" "f" "a" "g" "h"
## 
## $datetime
##                     start                       end 
## "2014-06-03 13:41:57 EDT" "2014-06-03 13:43:40 EDT" 
## 
##   player.1 player.2 outcome            datetime
## 1        i        g      -1 2014-06-03 13:41:57
## 2        f        a       0 2014-06-03 13:42:00
## 3        a        g      -1 2014-06-03 13:42:08
## 4        a        i      -1 2014-06-03 13:42:12
## 5        g        f       1 2014-06-03 13:42:13
## 6        i        h      -1 2014-06-03 13:42:19

numInt(id1$players[1], id1$players[2], id1)  # interaction count between players

## [1] 4

toInterMat(id1)  # matrix where cell is the times row player defeats col player

##   i h f a g d e b j c
## i 0 3 0 1 0 1 0 2 0 0
## h 1 0 0 1 1 1 2 1 0 0
## f 1 0 0 1 0 0 0 2 1 2
## a 1 0 0 0 1 1 1 0 0 0
## g 2 0 1 1 0 2 1 0 0 0
## d 0 0 1 0 4 0 1 1 1 1
## e 0 1 1 1 1 0 0 0 0 0
## b 0 0 1 0 2 0 3 0 1 1
## j 1 1 1 0 1 2 1 0 0 1
## c 2 2 0 0 1 2 1 3 1 0
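To make the idea of a win matrix concrete, the following is a minimal base R sketch of how such a matrix and a dyad count could be computed from a plain data frame. It does not use linHierarchy and is not necessarily how toInterMat or numInt are implemented; the data frame `games` and its column names are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: outcome is 1 if p1 wins, -1 if p2 wins, 0 for a tie.
games <- data.frame(
    p1 = c("a", "b", "a", "c", "b"),
    p2 = c("b", "c", "c", "a", "a"),
    outcome = c(1, -1, 1, 0, -1),
    stringsAsFactors = FALSE
)
players <- sort(unique(c(games$p1, games$p2)))

# Ties have no winner, so drop them before counting wins.
decided <- games[games$outcome != 0, ]
winner <- ifelse(decided$outcome == 1, decided$p1, decided$p2)
loser  <- ifelse(decided$outcome == 1, decided$p2, decided$p1)

# Rows are winners, columns are losers; fixed factor levels keep empty cells.
win_mat <- table(winner = factor(winner, levels = players),
                 loser  = factor(loser, levels = players))

# Number of interactions between "a" and "b", in either order, ties included.
n_ab <- sum((games$p1 == "a" & games$p2 == "b") |
            (games$p1 == "b" & games$p2 == "a"))

print(win_mat)
print(n_ab)
```

In this toy data "a" beats "b" twice and "c" once, and "c" beats "b" once, so those are the only non-zero cells.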

Generating ratings/rankings with various algorithms

In addition, we can apply functions to the “interData” object that generate
hierarchies using the methods listed above.

The David's Score method, developed by H.A. David, generates a numeric value for
each player based on the proportion of interactions won against each other
player, weighted by how successful those opponents are themselves.

davidScore(id1)

##    players    score
## 1        c  18.4167
## 2        j  16.1667
## 3        h   8.0833
## 4        d  -0.3333
## 5        f  -1.4167
## 6        b  -3.2500
## 7        i  -5.0833
## 8        a  -7.1667
## 9        g -10.6667
## 10       e -14.7500
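As a rough illustration of the calculation, here is a base R sketch of the commonly cited proportion-based form of David's score, DS = w + w2 - l - l2, computed from a win matrix. It is a sketch under stated assumptions and may differ from what davidScore does internally (for example in how dyads with few interactions are treated); the function name `david_score_sketch` and the toy matrix are hypothetical.

```r
# Hypothetical win matrix: wins[i, j] is how often row player i beat column player j.
wins <- matrix(c(0, 3, 2,
                 1, 0, 4,
                 0, 0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

david_score_sketch <- function(wins) {
    n <- wins + t(wins)              # decided interactions per dyad
    P <- ifelse(n > 0, wins / n, 0)  # proportion of wins; 0 if a dyad never met
    w  <- rowSums(P)                 # summed win proportions
    w2 <- as.vector(P %*% w)         # wins weighted by the opponents' w
    l  <- colSums(P)                 # summed loss proportions
    l2 <- as.vector(t(P) %*% l)      # losses weighted by the opponents' l
    sort(w + w2 - l - l2, decreasing = TRUE)
}

ds <- david_score_sketch(wins)
print(ds)
```

For this toy matrix the scores are 2.25 for A, 0.75 for B and -3 for C. The scores sum to zero, as they do (up to rounding) in the davidScore output above.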

The I&SI method, on the other hand, builds a ranking with no numeric scores.
However, it is better able to handle cases where transitivity of dominance is
assumed, and it can cope with missing interactions. The output of the I&SI
method is a list of several objects, which are described in more detail in Han
de Vries (1998), “Finding a dominance order most consistent with a linear
hierarchy”.

IaSI(id1)

## $rankings
##    ranking players
## 1        1       f
## 2        2       c
## 3        3       j
## 4        4       i
## 5        5       h
## 6        6       a
## 7        7       d
## 8        8       b
## 9        9       e
## 10      10       g
## 
## $IandSI
##  I SI 
##  5 34 
## 
## $dom.Matrix
##     f   c   j   i h   a d b   e   g
## f 0.0 1.0 0.5 1.0 0 1.0 0 1 0.0 0.0
## c 0.0 0.0 0.5 1.0 1 0.0 1 1 1.0 1.0
## j 0.5 0.5 0.0 1.0 1 0.0 1 0 1.0 1.0
## i 0.0 0.0 0.0 0.0 1 0.5 1 1 0.0 0.0
## h 0.0 0.0 0.0 0.0 0 1.0 1 1 1.0 1.0
## a 0.0 0.0 0.0 0.5 0 0.0 1 0 0.5 0.5
## d 1.0 0.0 0.0 0.0 0 0.0 0 1 1.0 1.0
## b 0.0 0.0 1.0 0.0 0 0.0 0 0 1.0 1.0
## e 1.0 0.0 0.0 0.0 0 0.5 0 0 0.0 0.5
## g 1.0 0.0 0.0 1.0 0 0.5 0 0 0.5 0.0
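The two numbers in $IandSI can be read directly off a dominance matrix that is already sorted by rank. The base R sketch below only scores a given ordering: it counts the dyads where a lower-ranked player dominates a higher-ranked one (I) and sums the rank differences of those dyads (SI). It does not perform the search for the best ordering that the I&SI procedure carries out; the function name `count_inconsistencies` and the toy matrix are hypothetical.

```r
# Hypothetical dominance matrix, rows and columns already in rank order.
# 1 means the row player dominates the column player, 0.5 marks a tie.
dom <- matrix(c(0, 1, 1,   0,
                0, 0, 1,   1,
                0, 0, 0,   0.5,
                1, 0, 0.5, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("w", "x", "y", "z"), c("w", "x", "y", "z")))

count_inconsistencies <- function(dom) {
    # A 1 below the diagonal means a lower-ranked row dominates a higher-ranked column.
    below <- which(lower.tri(dom) & dom == 1, arr.ind = TRUE)
    c(I = nrow(below), SI = sum(below[, "row"] - below[, "col"]))
}

res <- count_inconsistencies(dom)
print(res)
```

Here the only inconsistency is "z" (rank 4) dominating "w" (rank 1), so I is 1 and SI is 3. Counting the 1s below the diagonal of the $dom.Matrix printed above in the same way gives I = 5 and SI = 34, which agrees with the $IandSI element.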

The Elo method, developed by Arpad Elo, also produces numeric ratings and tracks
how they change over time. The eloTable function generates an object of class
“eloTable”, which may be plotted to visually track these changes.

et1 <- eloTable(id1)  # create elo table object
extractScores(et1)  # extract the latest scores from the table

##     player score            datetime
## 167      j  1137 2014-06-03 13:43:38
## 183      c  1131 2014-06-03 13:43:54
## 170      i  1067 2014-06-03 13:43:40
## 184      d  1048 2014-06-03 13:43:54
## 186      b  1021 2014-06-03 13:43:55
## 146      a  1000 2014-06-03 13:43:21
## 185      f   959 2014-06-03 13:43:55
## 182      e   950 2014-06-03 13:43:53
## 169      h   882 2014-06-03 13:43:40
## 176      g   805 2014-06-03 13:43:51

plot(et1)  # plot the changes in individuals scores

[Figure: each player's Elo score over time, as drawn by plot(et1)]
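For readers unfamiliar with Elo ratings, a single rating update can be sketched in base R as follows. This uses the standard logistic Elo formula; the K-factor of 32 and the starting rating of 1000 are assumptions for illustration and are not confirmed as the values eloTable uses. The function name `elo_update` is hypothetical.

```r
# One Elo update. outcome follows the coding used above:
# 1 if the first player wins, -1 if the second player wins, 0 for a tie.
elo_update <- function(r1, r2, outcome, k = 32) {
    s1 <- (outcome + 1) / 2                # map -1 / 0 / 1 to 0 / 0.5 / 1
    e1 <- 1 / (1 + 10^((r2 - r1) / 400))   # expected score of the first player
    delta <- k * (s1 - e1)                 # points moved between the two players
    c(first = r1 + delta, second = r2 - delta)
}

after_win <- elo_update(1000, 1000, 1)
after_tie <- elo_update(1000, 1000, 0)
print(after_win)
print(after_tie)
```

Two equally rated players each have an expected score of 0.5, so a win moves 16 points from the loser to the winner (1016 and 984) and a tie leaves both ratings unchanged. Applying such updates in time order is what produces the trajectories that a plot of ratings over time shows.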

There are also Bayesian methods that may be used to calculate player ratings
based on the Bradley-Terry pairwise comparisons model. The MCMCBradTerr function
creates an appropriate data frame to parametrize player ratings and passes it to
the MCMClogit function of MCMCpack. The function that generates this data frame
for a logit model (toBayesDF in the code below) is also provided, so that users
can apply frequentist methods to the data or further parametrize the model.

prDF <- toBayesDF(id1)  # create data frame for player rating parameterization
summary(glm(outcome ~ . - 1, data = prDF[, -1], family = "binomial"))  # logistic regression Bradley-Terry model

## 
## Call:
## glm(formula = outcome ~ . - 1, family = "binomial", data = prDF[, 
##     -1])
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.831  -0.998  -0.782   1.150   1.626  
## 
## Coefficients:
##   Estimate Std. Error z value Pr(>|z|)
## h  -0.0174     0.6866   -0.03     0.98
## f   0.5960     0.8066    0.74     0.46
## a  -0.4391     0.8228   -0.53     0.59
## g  -0.4168     0.7082   -0.59     0.56
## d   0.1560     0.7272    0.21     0.83
## e  -0.8729     0.8077   -1.08     0.28
## b   0.0915     0.7168    0.13     0.90
## j   0.9543     0.8228    1.16     0.25
## c   1.0658     0.7412    1.44     0.15
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101.199  on 73  degrees of freedom
## Residual deviance:  91.839  on 64  degrees of freedom
## AIC: 109.8
## 
## Number of Fisher Scoring iterations: 4
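The logistic regression above relies on a particular encoding of the interactions. As a hedged illustration, the base R sketch below builds one common Bradley-Terry design: one column per player, +1 when that player is the first player of an interaction, -1 when they are the second, and one player dropped as the reference. This is an assumption about the general technique, not a description of what toBayesDF does; the names `games`, `win1` and `design` are hypothetical. (The output above lists nine coefficients for ten players, which is at least consistent with one player serving as the reference.)

```r
# Hypothetical decided interactions: win1 is 1 if p1 won and 0 if p2 won.
games <- data.frame(
    p1   = c("a", "a", "a", "b", "b", "b", "a", "a", "a"),
    p2   = c("b", "b", "b", "c", "c", "c", "c", "c", "c"),
    win1 = c(1, 1, 0, 1, 0, 1, 1, 0, 1),
    stringsAsFactors = FALSE
)
players <- sort(unique(c(games$p1, games$p2)))

# +1 in a player's column when they are p1, -1 when they are p2, 0 otherwise.
X <- sapply(players, function(p) (games$p1 == p) - (games$p2 == p))

# Drop the first player ("a") so the remaining ratings are relative to it.
design <- data.frame(outcome = games$win1, X[, -1, drop = FALSE])

# No intercept: with this encoding the model has no first-player advantage.
fit <- glm(outcome ~ . - 1, data = design, family = "binomial")
print(coef(fit))
```

In this toy round robin "a" wins 4 of 6 interactions, "b" 3 of 6 and "c" 2 of 6, so both fitted coefficients are negative relative to "a", with "c" below "b".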


posterior <- MCMCBradTerr(id1)  # create posterior using MCMC algorithm from MCMCpack
summary(posterior)  # summarize the posterior draws for each player

## 
## Iterations = 1001:11000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##       Mean    SD Naive SE Time-series SE
## h  0.00985 0.725  0.00725         0.0385
## f  0.71299 0.826  0.00826         0.0440
## a -0.53279 0.914  0.00914         0.0528
## g -0.50350 0.726  0.00726         0.0383
## d  0.18299 0.769  0.00769         0.0423
## e -1.06827 0.856  0.00856         0.0466
## b  0.08252 0.757  0.00757         0.0415
## j  1.17715 0.872  0.00872         0.0494
## c  1.20786 0.816  0.00816         0.0468
## 
## 2. Quantiles for each variable:
## 
##     2.5%    25%      50%     75% 97.5%
## h -1.375 -0.477 -0.00148  0.4801 1.501
## f -0.915  0.171  0.72035  1.2567 2.379
## a -2.350 -1.172 -0.48762  0.1131 1.216
## g -1.883 -0.989 -0.50932 -0.0186 0.932
## d -1.300 -0.324  0.17870  0.6749 1.745
## e -2.766 -1.621 -1.07162 -0.4781 0.581
## b -1.389 -0.418  0.04241  0.5559 1.596
## j -0.517  0.596  1.17862  1.7171 2.938
## c -0.306  0.644  1.16078  1.7224 2.898

plot(posterior)

[Figure: trace and density plots of the posterior draws for each player, as drawn by plot(posterior)]

Additional functions for manipulating “interData” objects exist, as well as
other algorithms for generating player ratings/rankings. Full details can be
found at our GitHub repository: https://github.com/nmmarquez/linHierarchy