Words With Friends Statistics

Over the past few months, my friend Bill (a pseudonym) and I have played 107 games of Words With Friends, a Scrabble-like game in which players take turns building words crossword-puzzle style on a 15-by-15 gameboard. I've kept track of our scores during this series, and have found the resulting dataset useful for answering certain questions about the game.

First, let's take a look at the distributions of our scores. Figure 1 below shows kernel density estimates of my and Bill's final scores in each game. I was surprised at how normal (in the Gaussian sense) my scores appear. In fact, a Shapiro-Wilk test for normality on my scores gives p = 0.8883, so there is no evidence against normality. They have a mean of 385.5 and a standard deviation of 43.6.

The distribution of Bill's scores is somewhat different, having a definite positive skew. A Shapiro-Wilk test gives p = 0.0001; at the 5% level of significance we reject the hypothesis that Bill's scores are drawn from a normal distribution. In any case, the distribution has a mean of 340.7 and a standard deviation of 41.1.

[Figure 1: Kernel density estimates of my and Bill's final scores]

Since the winner is the player with the higher score, what's really important is the difference between our scores. Figure 2 shows the distribution of that difference, i.e. my score minus Bill's score for each game. A Shapiro-Wilk test for this distribution gives p = 0.0867, meaning that at the 5% level of significance we cannot reject the hypothesis that the distribution is normal. The mean is 44.8 and the standard deviation is 67.9. This means that I tend to beat Bill by about 45 points per game. If the difference is treated as normally distributed with this mean and standard deviation, the probability that it is positive, and hence my expected win fraction, is about 0.7454. In fact, I won 80 out of the 107 games, giving me a win fraction of 0.7477.
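The expected win fraction can be reproduced from the two summary statistics alone. The following is a minimal sketch in Python (not the code used to produce this post), assuming the score difference is exactly normal and ignoring ties; the variable names are mine.

```python
from statistics import NormalDist

# Summary statistics quoted in the text (score difference: me minus Bill).
mean_diff = 44.8
sd_diff = 67.9

# Assumption: the difference is normally distributed. I win whenever the
# difference is positive, so the expected win fraction is P(difference > 0).
expected_win_fraction = 1.0 - NormalDist(mu=mean_diff, sigma=sd_diff).cdf(0.0)

# Observed record from the text: 80 wins in 107 games.
observed_win_fraction = 80 / 107

print(f"expected win fraction: {expected_win_fraction:.4f}")
print(f"observed win fraction: {observed_win_fraction:.4f}")
```

Because the inputs here are rounded to one decimal place, the computed value may differ from the figure quoted above in the last digit.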

[Figure 2: Distribution of the score difference (my score minus Bill's score)]

I also wondered whether there would be any correlation between our scores. For example, the game could be “zero sum”, such that when one player gets a higher score, his opponent tends to get a lower score. This would cause a negative correlation between scores. On the other hand it could be that when one player gets a high score, his opponent rises to the challenge and scores higher as well. This would cause positive correlation between scores. Of course, it might also be that the scores are like draws from two (potentially different) random distributions, and are unrelated in any way.

Figure 3 is a plot of Bill's score versus my score over the 107-game series. At first glance it doesn't appear that our scores are related at all. However, linear regression shows that they are in fact negatively correlated. The regression equation:

Bill = 444 - 0.2679*Me

is shown by the dotted line in Figure 3. The slope of the line is significant at the 5% significance level, with p = 0.003. The regression has an r-squared of just 0.0807, meaning that it explains only about 8% of the variance in Bill's scores. In other words, a high score for me does tend to go with a lower score for my opponent, but only slightly. On average, each additional point I score corresponds to a drop of 0.27 points in Bill's score. However, the standard error of the regression is 39.6 points, meaning that other factors such as the particular letters Bill draws, his skill at playing the game and (perhaps) sheer luck have a much larger effect on his final score.
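As a sanity check, the regression coefficients can be recovered from the summary statistics already quoted, using the standard identities for simple linear regression. This is a sketch in Python rather than the original analysis; it assumes the reported standard deviations are sample standard deviations (n - 1 denominator), and the variable names are mine.

```python
import math

# Summary statistics quoted in the text.
mean_me, sd_me = 385.5, 43.6
mean_bill, sd_bill = 340.7, 41.1
r_squared = 0.0807
n_games = 107

# The fitted slope is negative, so the correlation is the negative root.
r = -math.sqrt(r_squared)

# Simple linear regression of Bill's score on mine:
# slope = r * sd_y / sd_x, intercept = mean_y - slope * mean_x.
slope = r * sd_bill / sd_me
intercept = mean_bill - slope * mean_me

# Residual standard error, with n - 2 degrees of freedom.
# Assumes sd_bill was computed with an n - 1 denominator.
resid_se = sd_bill * math.sqrt((1 - r_squared) * (n_games - 1) / (n_games - 2))

# Standard deviation of the difference (me minus Bill) implied by the
# two standard deviations and the correlation.
sd_diff = math.sqrt(sd_me**2 + sd_bill**2 - 2 * r * sd_me * sd_bill)

print(f"slope = {slope:.4f}, intercept = {intercept:.1f}")
print(f"residual standard error = {resid_se:.1f}")
print(f"implied sd of difference = {sd_diff:.1f}")
```

The recovered values should agree with the reported equation, the 39.6-point standard error and the 67.9-point standard deviation of the score difference, up to rounding of the inputs.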

[Figure 3: Bill's score versus my score over the 107-game series, with the regression line shown dotted]