Using some MLS data, I wanted to illustrate some issues that I think are important to be wary of when reading about ELO ratings. I’m not saying that ELO ratings are invalid - rather that, depending upon the user choices we make (the constant used, for instance), we may be addressing different questions from those we think we are. I also caution against using these ratings to rank teams as 1st, 2nd, 3rd etc. The ratings are better interpreted as a guide to the difference in win probability between two teams - not a categorical assessment of who is definitively better. I prefer the Glicko ratings system as it adds a deviation measure that gets at this issue.
I have written this very quickly and without too much discussion. I hope it’s easy to follow as I didn’t initially write it with the idea that I would post it and haven’t had time to edit. I might possibly add to it in the future.
Thanks to Tom Worville for providing me with the MLS results data. Here are some initial ELO ratings of the MLS teams using data from the previous season:
##    team                     elo
##  1 Chicago Fire            1950
##  2 Colorado Rapids         1905
##  3 Columbus Crew           2050
##  4 DC United               2075
##  5 FC Dallas               2050
##  6 Houston Dynamo          1905
##  7 Los Angeles Galaxy      2160
##  8 Montreal Impact         1900
##  9 New England Revolution  2025
## 10 New York City           1900
## 11 New York Red Bulls      2025
## 12 Orlando City            1920
## 13 Philadelphia Union      2000
## 14 Portland Timbers        2000
## 15 Real Salt Lake          2075
## 16 San Jose Earthquakes    1925
## 17 Seattle Sounders        2075
## 18 Sporting Kansas City    2035
## 19 Toronto FC              1950
## 20 Vancouver Whitecaps     2010
These will be the starting point, though we could have decided to start every team at the same value. When we have only 20 weeks’ worth of data to explore changes in ratings, I think it makes sense to use the previous ratings as our start point. However, when teams change a lot from season to season, it may make more sense to reset the ratings. Alternatively, with the Glicko system (below) we could keep the ratings but increase the deviation (our uncertainty) at the change of each season.
Below are the results of the 2015 MLS season. I’m showing the first 6 and last 6 rows. There are 201 results in total, broken down into 20 weeks. The result variable is 1 for a home win, 0 for an away win and 0.5 for a tie:
##     Date       home                 visitor                result week
##   1 2015-03-09 Seattle Sounders     New England Revolution    1.0    1
##   2 2015-03-09 Sporting Kansas City New York Red Bulls        0.5    1
##   3 2015-03-08 FC Dallas            San Jose Earthquakes      1.0    1
##   4 2015-03-08 Houston Dynamo       Columbus Crew             1.0    1
##   5 2015-03-08 Orlando City         New York City             0.5    1
##   6 2015-03-08 Portland Timbers     Real Salt Lake            0.5    1
##     Date       home                 visitor                result week
## 196 2015-07-18 Portland Timbers     Vancouver Whitecaps       0.5   20
## 197 2015-07-18 Real Salt Lake       Houston Dynamo            1.0   20
## 198 2015-07-18 Seattle Sounders     Colorado Rapids           0.0   20
## 199 2015-07-18 Sporting Kansas City Montreal Impact           1.0   20
## 200 2015-07-18 Los Angeles Galaxy   San Jose Earthquakes      1.0   20
## 201 2015-07-18 Toronto FC           Philadelphia Union        1.0   20
ELO ratings are re-calculated every week. If a team plays twice in a week then only one new rating is calculated for it. The ELO calculation contains a constant ‘k’, which controls how far a rating moves after each result. This can be ‘optimally’ worked out from historical data (i.e. knowing that a difference of N ratings points between two teams should lead to a win probability of x% - e.g. a 100 point rating difference ought to give a 64% win probability for the superior team). However, it can be difficult to have sufficient historical data to accurately calculate ‘k’. Further, the observed distribution of win probabilities over a range of ratings differences may not perfectly fit the curve that ELO ratings are based upon.
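The curve that ELO ratings are based upon can be written down directly. Here is a minimal sketch in Python, assuming the standard logistic form with a scale of 400 points. The function name `expected_score` is my own, not taken from any package, and strictly speaking the value is an expected score (a tie counts as half a win) rather than a pure win probability.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of team A against team B (win = 1, tie = 0.5, loss = 0).

    Assumes the usual logistic ELO curve with a 400 point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / scale))


# A 100 point advantage gives roughly a 64% expectation
print(round(expected_score(2100, 2000), 2))  # 0.64
# Equal ratings give an even expectation
print(expected_score(2000, 2000))  # 0.5
```

Only the difference between the two ratings matters here, which is why the ratings are best read in pairs rather than as a league table.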
For soccer, a k of between 20 and 40 generally appears to be satisfactory. It’s also possible to use different k values depending on whether a winner was expected or unexpected, but I won’t develop this idea here.
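To make the role of k concrete, here is a minimal sketch in Python of a weekly update. It assumes the standard ELO rule (old rating plus k times the sum of actual minus expected scores over that week's games, all measured against the start-of-week rating) and ignores home advantage. `expected_score` and `weekly_update` are names I've made up for illustration, so treat this as a sketch rather than the exact calculation behind the tables.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of team A against team B on the usual logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / scale))


def weekly_update(rating, games, k):
    """Return one new rating from all of a team's games in a week.

    games: list of (opponent_rating, score) with score 1, 0.5 or 0
    from this team's point of view. Assumption: every game in the week
    is compared against the same start-of-week rating.
    """
    surprise = sum(score - expected_score(rating, opp) for opp, score in games)
    return rating + k * surprise


# Week 1, from the data above: Houston Dynamo (1905) beat Columbus Crew (2050)
for k in (20, 30, 40):
    print(k, round(weekly_update(1905, [(2050, 1.0)], k), 1))
```

With these inputs the same upset is worth about 14 points at k = 20 and about 28 points at k = 40, so a larger k makes the ratings chase recent results much harder.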
Here are the final ratings after week 20 for three different constants:
##
## Elo Ratings For 20 Players Playing 201 Games
##
##    Player                 Rating Games Win Draw Loss Lag
##  1 Los Angeles Galaxy       2114    22   9    7    6   0
##  2 Sporting Kansas City     2082    18   9    6    3   0
##  3 FC Dallas                2063    20  10    5    5   0
##  4 DC United                2052    22  10    5    7   0
##  5 Seattle Sounders         2038    21  10    2    9   0
##  6 Columbus Crew            2032    21   8    6    7   0
##  7 New York Red Bulls       2030    19   8    5    6   0
##  8 Real Salt Lake           2026    21   6    8    7   0
##  9 Portland Timbers         2021    21   9    5    7   0
## 10 Vancouver Whitecaps      2021    21  10    3    8   0
## 11 New England Revolution   1991    22   7    6    9   0
## 12 Toronto FC               1978    18   8    3    7   0
## 13 Philadelphia Union       1969    21   6    4   11   0
## 14 San Jose Earthquakes     1953    19   7    4    8   0
## 15 Colorado Rapids          1953    20   5    9    6   0
## 16 Orlando City             1939    20   6    6    8   0
## 17 Houston Dynamo           1928    20   6    6    8   0
## 18 Chicago Fire             1921    19   5    3   11   0
## 19 Montreal Impact          1918    17   6    3    8   0
## 20 New York City            1906    20   5    6    9   0
##
## Elo Ratings For 20 Players Playing 201 Games
##
##    Player                 Rating Games Win Draw Loss Lag
##  1 Sporting Kansas City     2104    18   9    6    3   0
##  2 Los Angeles Galaxy       2103    22   9    7    6   0
##  3 FC Dallas                2071    20  10    5    5   0
##  4 DC United                2039    22  10    5    7   0
##  5 New York Red Bulls       2030    19   8    5    6   0
##  6 Portland Timbers         2030    21   9    5    7   0
##  7 Columbus Crew            2027    21   8    6    7   0
##  8 Vancouver Whitecaps      2021    21  10    3    8   0
##  9 Seattle Sounders         2021    21  10    2    9   0
## 10 Real Salt Lake           2009    21   6    8    7   0
## 11 Toronto FC               1989    18   8    3    7   0
## 12 New England Revolution   1975    22   7    6    9   0
## 13 Colorado Rapids          1973    20   5    9    6   0
## 14 Philadelphia Union       1962    21   6    4   11   0
## 15 San Jose Earthquakes     1961    19   7    4    8   0
## 16 Orlando City             1944    20   6    6    8   0
## 17 Houston Dynamo           1932    20   6    6    8   0
## 18 Montreal Impact          1924    17   6    3    8   0
## 19 New York City            1909    20   5    6    9   0
## 20 Chicago Fire             1909    19   5    3   11   0
##
## Elo Ratings For 20 Players Playing 201 Games
##
##    Player                 Rating Games Win Draw Loss Lag
##  1 Sporting Kansas City     2124    18   9    6    3   0
##  2 Los Angeles Galaxy       2097    22   9    7    6   0
##  3 FC Dallas                2078    20  10    5    5   0
##  4 Portland Timbers         2038    21   9    5    7   0
##  5 New York Red Bulls       2029    19   8    5    6   0
##  6 DC United                2027    22  10    5    7   0
##  7 Columbus Crew            2024    21   8    6    7   0
##  8 Vancouver Whitecaps      2020    21  10    3    8   0
##  9 Seattle Sounders         2004    21  10    2    9   0
## 10 Toronto FC               2000    18   8    3    7   0
## 11 Real Salt Lake           1997    21   6    8    7   0
## 12 Colorado Rapids          1991    20   5    9    6   0
## 13 San Jose Earthquakes     1966    19   7    4    8   0
## 14 Philadelphia Union       1959    21   6    4   11   0
## 15 New England Revolution   1959    22   7    6    9   0
## 16 Orlando City             1947    20   6    6    8   0
## 17 Houston Dynamo           1933    20   6    6    8   0
## 18 Montreal Impact          1930    17   6    3    8   0
## 19 New York City            1913    20   5    6    9   0
## 20 Chicago Fire             1898    19   5    3   11   0
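Overall these are pretty similar, though notably the top and bottom teams do shuffle around: across the three tables Sporting Kansas City and Los Angeles Galaxy swap first and second place, and Chicago Fire drop from 18th to 20th. The plot below shows the rank order changes over a range of constants: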
An improvement to the ELO ratings system (in my view) is the Glicko ratings system developed by Mark Glickman - see link below. Its major advantage is a ‘deviation’ parameter, which provides a measure of uncertainty about each rating.
For the purpose of this illustration, I’ve assigned every team an initial deviation of 100 ratings points at week 1. This means, for instance, that LA Galaxy’s initial rating of 2160 is actually likely to lie in the range 2060-2260 (the rating plus or minus one deviation). As we gain more information from more results, the deviation (our uncertainty in the rating) gets smaller.
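The range quoted above is just the rating plus or minus one deviation. A small sketch in Python, treating the deviation like a standard deviation (which is how Glickman describes it); `rating_interval` is a hypothetical helper name of my own:

```python
def rating_interval(rating, deviation, z=1.0):
    """Plausible range for a team's true strength: rating +/- z deviations."""
    return rating - z * deviation, rating + z * deviation


# LA Galaxy at week 1: one deviation either side
print(rating_interval(2160, 100))  # (2060.0, 2260.0)

# A wider, roughly 95% range uses about two deviations (z = 1.96)
low, high = rating_interval(2160, 100, z=1.96)
print(round(low), round(high))
```

At the roughly 95% level the week 1 range for LA Galaxy is closer to 1964-2356, which is wider than the whole spread of starting ratings in the league.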
There is also a constant value associated with the Glicko ratings system - typically constant values are smaller than those used with the ELO system. For this illustration, I shall just use one constant value (10).
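For anyone curious what the constant does, below is a minimal sketch in Python of a single Glicko update for one team over one rating period, following the formulas in Glickman's description of the original Glicko system. The function names, the cap of 350 on the deviation, and the way the constant `c` is applied once per week are my assumptions for illustration. This is not necessarily how the software that produced the table below applies it.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Down-weights a result according to the opponent's deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def glicko_update(rating, rd, games, c=0.0, max_rd=350.0):
    """One rating-period update. games: list of (opp_rating, opp_rd, score)."""
    # Step 1: uncertainty grows a little before each period (assumed: once per week)
    rd = min(math.sqrt(rd**2 + c**2), max_rd)
    if not games:
        return rating, rd
    info = 0.0      # information gained from this period's results
    surprise = 0.0  # weighted sum of actual minus expected scores
    for opp_rating, opp_rd, score in games:
        weight = g(opp_rd)
        expected = 1.0 / (1.0 + 10.0 ** (-weight * (rating - opp_rating) / 400.0))
        info += weight**2 * expected * (1.0 - expected)
        surprise += weight * (score - expected)
    # Step 2: combine prior precision (1 / RD^2) with the new information (1 / d^2)
    precision = 1.0 / rd**2 + Q**2 * info
    return rating + (Q / precision) * surprise, math.sqrt(1.0 / precision)


# Worked example from Glickman's write-up: a 1500 player (RD 200) who beats
# a 1400 (RD 30), then loses to a 1550 (RD 100) and a 1700 (RD 300)
r, rd = glicko_update(1500, 200, [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)])
print(round(r), round(rd, 1))
```

The worked example should come out at about 1464 with a deviation of about 151.4. The constant only enters in the first step: a larger `c` puts more uncertainty back each week, so the deviation shrinks more slowly and the ratings stay more responsive to new results.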
##
## Glicko Ratings For 20 Players Playing 201 Games
##
##    Player                 Rating Deviation Games Win Draw Loss Lag
##  1 Sporting Kansas City     2110     73.11    18   9    6    3   0
##  2 Los Angeles Galaxy       2094     69.89    22   9    7    6   0
##  3 FC Dallas                2074     70.99    20  10    5    5   0
##  4 DC United                2047     69.80    22  10    5    7   0
##  5 New York Red Bulls       2035     71.40    19   8    5    6   0
##  6 Portland Timbers         2032     69.58    21   9    5    7   0
##  7 Seattle Sounders         2030     70.04    21  10    2    9   0
##  8 Vancouver Whitecaps      2027     69.80    21  10    3    8   0
##  9 Columbus Crew            2021     69.18    21   8    6    7   0
## 10 Real Salt Lake           2005     69.70    21   6    8    7   0
## 11 Toronto FC               1990     72.18    18   8    3    7   0
## 12 New England Revolution   1979     68.41    22   7    6    9   0
## 13 Colorado Rapids          1977     71.29    20   5    9    6   0
## 14 San Jose Earthquakes     1971     72.07    19   7    4    8   0
## 15 Philadelphia Union       1951     70.75    21   6    4   11   0
## 16 Orlando City             1948     71.14    20   6    6    8   0
## 17 Houston Dynamo           1937     71.21    20   6    6    8   0
## 18 Montreal Impact          1928     73.47    17   6    3    8   0
## 19 New York City            1904     71.23    20   5    6    9   0
## 20 Chicago Fire             1902     71.77    19   5    3   11   0
Here you can see that we have a deviation measure of uncertainty about each final rating. We can plot that like this:
There are many other things that can be looked at using ELO/Glicko type models. A major one is how to differentially weight home wins versus away wins (or, for that matter, home ties versus away ties). I can look at that in more detail in a future post. I hope it can be seen from this that the choice of the constant in either the ELO or Glicko rating system is not trivial. Depending on what values are chosen, the ratings output can address different questions (i.e. which teams are ‘hot’ versus which teams are consistently strong over long periods).
Further, I hope that it’s evident that ELO shouldn’t be seen as a way of perfectly ranking teams as 1st, 2nd, 3rd etc. Rather, by using Glicko, we can get some sense of how confident we can be that two teams really differ in their rating. In sports like soccer, and especially in the MLS, it’s typical for teams to be very similar to one another in ratings.
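As a rough illustration of that last point, here is a sketch in Python that compares pairs of teams using the final Glicko ratings and deviations from the table above. The expected outcome follows Glickman's formula, which discounts the ratings gap by the combined uncertainty of both teams. The interval overlap check is only a crude heuristic of my own and not a formal test, and both function names are made up for illustration.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Shrinks a ratings gap according to how uncertain the ratings are."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_outcome(r_a, rd_a, r_b, rd_b):
    """Expected score of A against B, allowing for uncertainty in both ratings."""
    combined = math.sqrt(rd_a**2 + rd_b**2)
    return 1.0 / (1.0 + 10.0 ** (-g(combined) * (r_a - r_b) / 400.0))


def intervals_overlap(r_a, rd_a, r_b, rd_b, z=1.96):
    """Crude check: do the two rating +/- z deviation ranges overlap?"""
    return abs(r_a - r_b) <= z * (rd_a + rd_b)


# Final Glicko ratings and deviations taken from the table above
sporting_kc = (2110, 73.11)
la_galaxy = (2094, 69.89)
chicago = (1902, 71.77)

for name, other in (("LA Galaxy", la_galaxy), ("Chicago Fire", chicago)):
    print(
        "Sporting KC vs", name,
        "| overlap:", intervals_overlap(*sporting_kc, *other),
        "| expected score:", round(expected_outcome(*sporting_kc, *other), 2),
    )
```

On these numbers Sporting Kansas City are expected to score only just over 0.5 against LA Galaxy and around three quarters against Chicago Fire, and even the ranges for the 1st and 20th ranked teams overlap at the roughly 95% level.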