In A Game of Battleship, Strategy Influences the Outcome Given You Are Doing It Right.

Research by Cameron Bennage

with mentoring from Dr. Howard Skogman and Dr. Tasneem Zaihra

Abstract:

Battleship is a classic board game in which two opponents place five ships in a 10 by 10 grid and take turns calling points to sink all of the opponent’s ships before his/hers are sunk. For this research, we wanted to analyze the effect of different strategies, or processes for selecting points to call. The goal of the research was to compare different strategies to determine if one had a significant advantage over another and to find an optimal strategy. Using a computer program written in Python which allows millions of games to be played and records the number of turns until the game was finished, we could perform non-parametric tests and look at the data graphically to determine if the pattern used to pick points makes a significant difference on the outcome. Our results indicate that picking points in a specific pattern affects the outcome of the game. Of the patterns that proved to have a significant improvement, the one that had the most improvement was the pattern that worked from the center to the outside. Meaning it eliminated the points with the most ship combinations sooner.

Introduction:

For this research, in order to create a computer program that could beat you in a game of Battleship, it was necessary to know if the way you picked points from the grid in a game of Battleship had a significant impact on the outcome of the game. If so, what is the best pattern, or strategy such that it can win most of the time? There are many different ideas and methods behind all of this.

First, the game. Battleship is a board game in which two players each place 5 ships, the lengths are 5, 4, 3, 3 and 2, on a 10x10 grid without the opponent seeing where they have been placed. The players take turns calling points from the grid to sink all of the opponent’s ships before his/hers are sunk. A ship is considered sunk if all of the points it occupies have been called. It is a game of strategy with some luck. It is easy enough to make the computer play since it requires picking points and checking if all ships are sunk. As a side note, the fewest number of turns to finish a game in is 17 turns. That means selecting every ship’s direction, length, hit location, and placement correct the first time. While it does show up in some of the data, it should not be thought of as significant due to its extremely low probability.

For this research, a computer program written in Python was designed to play Battleship. It was used for the gathering of the data, and different versions were created for the different strategies.

All of the strategies tested relied on picking points with a given probability distribution until a ship was located. After one was hit, all strategies used the same algorithm to finish finding the other points of the ship. The first significant observation in designing the probability distributions for shot selection was to select only from every other point on the grid. Since the smallest ship takes up two spaces, selecting in this way guarantees that every ship will be located. We describe the various tested distributions below. After a hit, the algorithm selects a direction by considering the distance to the closest boundary or miss in that direction, then selecting the direction with the largest distance. The idea is a ship is more likely to be where there is more space, thus giving better chances at sinking the ship faster.

What is meant about patterns and point choice? Let’s take that selection of 50 points from every other point on the grid, referred to as the default. Let’s imagine you group those 50 points into 2 or more different sets to which you randomly pick a point from each grouping and then repeat after each group has had one chosen. You can probably think of a ton of different ways to group your points. For this research, eight different patterns have been chosen to be analyzed. Each pattern has a certain characteristic which makes it different than just random groupings. The names given to them are Quadrants, Mixed Halves, X Pattern, Top-Down, Spiral, Spiral Double, Scorpion, and X Pattern Half. See section 1 of the appendix for images of each of the patterns. Despite the fact you can group your set of points in many different ways, does it actually make a difference as to how you do it such that you gain an advantage? If it does make a difference, what characteristic gives the best advantage? Can that characteristic be used to create the optimal strategy?

Methods 1:

In order to get data on the different distributions for each of the patterns, the Battleship program was run one million times for each pattern. Distribution refers to the number of shots needed to finish a game, as shown at the end of the results section. Each game recorded its number of shots until all ships were sunk, reset and played again. One million games were played with each strategy to provide a reasonable estimation of the distribution for the number of turns to complete the game. (Graphs included in the appendix under section 2). A million games produced small variability between two data sets of the same pattern. The reason for the variability comes from the large number of different ways a single game can play out.

Graphical and nonparametric techniques were used in the analysis of this data. They include comparing the differences in values between the number of shots to win, Kolmogorov-Smirnov test, a randomization test run for 10,000 repetitions, and graphically comparing histograms of the distributions.

For the Kolmogorov-Smirnov test (KS test), a discrete version was used to compare the empirical cumulative distribution functions (ECDF) between two patterns. What the test does is it finds the max distance between the two different ECDFs, called the D statistic. From that, the p-value is calculated.

Results 1:

The data in Table 1 is the general stats from each of the patterns. Each pattern appears pretty similar. It is hard to tell what is better, or worse, so that’s why additional tests must be conducted.

General Stats - Table 1
Min Median Max Mean Standard.Deviation
Default 19 50 70 49.051 8.472
Quadrants 18 50 71 48.962 8.491
Mixed Halves 18 50 71 49.048 8.498
X Pattern 17 50 71 49.041 8.512
Top Down 18 50 70 49.017 8.495
Spiral 18 50 71 48.885 8.511
Double Spiral 18 49 70 48.726 8.530
Scorpion 18 50 70 49.027 8.478
X Pattern Half 18 50 71 49.139 8.463

What Table 2 represents is the sum of the absolute value of the differences between the tallied values from 19-69 for the default and the pattern chosen. This gives a good idea of the amount of shifting the distribution has compared to the default. From this, 2, 3, 4, & 7 all appear to have about the same amount of change and 1, 5, 6, & 8 have a bigger difference with 6 really standing out. That’s where the graph can help out.

Absolute Value of Differences - Table 2
Quadrants - 2.1 9836
Diffused Halves - 2.2 7518
X Pattern - 2.3 7788
Top Down - 2.4 7557
Spiral - 2.5 14789
Double - Spiral 2.6 31317
Scorpion - 2.7 6380
X Pattern Half - 2.8 10650

When graphing the differences at each point, it is possible to see if the pattern improved or not. A negative value shows that the pattern was higher at that value. Positive means the default was higher. A general trend is there is some improvement for the patterns, meaning a shift left in the distribution. Putting the two parts together, patterns 1, 5, & 6 have the largest change and in the right direction.

For the next test, we must look at the empirical CDF for each of the patterns. From these CDFs, a KS test for the discrete case can be performed. This measures the differences between empirical CDF points between patterns to figure out whether they have been sampled from the same distribution or not. The patterns were tested against the default. The results are shown in Table 3.

Kolmogorov-Smirnov - Table 3
D.Statistic P.Value
Default 0.000850 0.863
Quadrants 0.004162 0.000
Mixed Halves 0.001251 0.414
X Pattern 0.001940 0.046
Top Down 0.002006 0.036
Spiral 0.006900 0.000
Double Spiral 0.014459 0.000
Scorpion 0.001637 0.137
X Pattern Half 0.004450 0.000

For this test, the D statistic is the maximum difference between any same two points of the empirical CDFs. As a comparison, the test for the default against itself is shown at the top. If you order them from the largest D statistic to smallest, it is 6, 5, 8, 1, 4, 3, 7, 2, then 0. In addition, sorting by the p-values yields the same order. This test, with 99% confidence, indicates that patterns 1, 5, 6, & 8 are significantly different than the default pattern, indicating they must have come from a different distribution. The same four patterns from the previous test.

One more additional test to conduct is a randomization test. This is done against the default pattern. What this test looks at is how likely it is to get the samples as they were gathered. For this, the mean was used as the test statistic.

Randomization Test (P-Values)
Quadrants 0
Diffused Halves 0.7959
X Pattern 0.3801
Top Down 0.0034
Spiral 0
Double Spiral 0
Scorpion 0.0433
X Pattern Half 0

From the results, the two highest are 2 and 3. The next are 4 and 7 with very tiny, yet probable, chances they are the same. Lastly, patterns 1, 5, 6, & 8 have less than a 1/10,000 chance of being picked such that the differences in the means are higher or lower than the original samples. For those patterns, it is with great certainty that you cannot randomly get the samples like that, so they must be different.

From the results of all the tests, the pattern can have an influence on the distribution given it has certain criteria. These criteria will be discussed with a particular focus on what makes Double Spiral stand out.

Discussion 1:

From our results, let us look at only the patterns that proved to be significant for each of the tests. These patterns are Quadrants, Spiral, Double Spiral, and X Pattern Half. Stated earlier, each had a different characteristic that produced the pattern. For Quadrants, it’s one of the simplest ways to group them, and it gets the shots to be more spread out from the default’s random clumping. For Spiral, ships have a greater chance of being more central due to the possible combinations present, so it starts inside and works outside. For Double Spiral it works the same way, only it doubles the sections to pick from. Each of the sections from the spiral was split in half and it still works from the inside out. For X Pattern Half, it is a variation from X Pattern in that it combines the sections outside the X into 2 groups, not 4. Additionally, X pattern’s characteristic is that it cuts the board up going through the center not leaving many places for the remaining ships to be.

While each of those four patterns prove to be significant, you can see that one stands out more than the others. That is the Double Spiral pattern. It has the lowest mean, the highest sum of differences, and the biggest D-statistic. This must mean that the pattern has a trait which leads to better shooting when playing Battleship. Given the Double Spiral Pattern has 10 different sections, it’s easy to understand why. When you call 5 points, you get a cluster of points centrally located in the grid since it starts by calling in the center groups. When you call 10 points, you get an even spread of points. When you call 15, you get a set of points that is denser in the center. When you call 20, its back to being an even spread. This repeats until the end. Knowing that there are more ship placement combinations in the center of the grid, it makes sense to call more of the points in the center. This is also true for the Spiral but it is less prominent.

From the analysis of the patterns above, we can see that the pattern that has the greatest improvement is the one where it is bias towards the center and slowly moves out. As stated before, this is due to the fact that ships can be placed in more ways in the center of the board than the outside. Thus allowing you a greater chance of hitting a ship when calling central points. We can use this knowledge to construct a new way to approach calling points, and then test whether it is any good.

Methods 2:

To gather the data for this, the same Battleship program was used but with the function for calling points being rewritten to match what was described above. It also uses 1,000,000 games to construct the probability densities for each.

For the tests used, we looked at the stats of the data, the empirical cumulative distribution function, and had observations from playing each of the versions. In a later section, the Kolmogorov-Smirnov test was used along with graphical analysis of the means.

Results 2:

Looking at the basic probabilities version, we can see an improvement from the default pattern above. It has a mean which is also lower than the Double Spiral pattern. This is only the start.

Next we considered what would happen if the program only picked the point that had the most ship combinations. Thus being most efficient and most probable to end sooner. When looking at the results from the probabilities max, you get a mean that is about 4 turns lower. For a game when every turn makes a difference, this is significant. Table 5 breaks down the details for each of the distributions.

Probabilities Comparison - Table 5
Quarter.1 Median Quarter.3 Mean
Default 43 50 56 49.05
Double Spiral 43 49 56 48.73
Probabilities 42 49 55 48.26
Prob Max 38 44 51 44.83

In theory, it appears as though we have found our best strategy for Battleship as it clearly outperforms the others. This is where playing the game helps to understand why it’s not.

When you first play, it is just a normal game with it picking the points in the center first. Second game, it picks many of those same points again with some being different. After the third game, you can pick up the pattern and start to form a strategy against it. The first points called from this version are included in Section 3 of the appendix.

This means there is a way for the human to adapt and start to win most of the time. This is not ideal. We do not want the human player to be able to win most of the time. After sitting back and thinking for a moment about what to do, we came up with a solution to test. Since having simple probabilities was good and picking the max each time was the best and worst, we concluded there must be some distribution in between those two that not only makes finding a strategy difficult, but also gets the mean as low as it can.

The solution was to use an exponent on the ship combinations before turning them into probabilities. The logic behind it comes from the nature of the exponent function. It gives much greater power to the values that are bigger and does so quickly. The first version was simply squaring the number. So 10 would be 100 and 34 would be 1156. As you can see from this version, a much greater weight was given to the 34 point than the 10 meaning it is more likely to pick it. While this does improve, it can get better. All of the powers we tested were 2, 3, 4, 5, 10, 25, 50, and 100. Each with improvement from the last. Table 6 shows the details of the stats from the distributions.

Probability Weights Comparison - Table 6
Exponent Q1 Median Q3 Max Mean STD
1 42 49 55 70 48.262 8.33
2 41 48 54 68 47.267 8.35
3 41 47 53 68 46.658 8.36
4 40 46 53 68 46.290 8.39
5 40 46 53 68 46.038 8.42
10 39 45 52 68 45.335 8.42
25 38 45 51 69 44.706 8.44
50 38 44 51 73 44.678 8.65
100 38 44 51 73 44.714 8.74

You might conclude that a 200 power would be better. In fact, it would not be. It is possible to show that the 100 power is nearly identical to the max. Graphically it comes out like below. Knowing the 100 power is nearly the same as the max, we will consider them the same.

With this we are presented with a challenge. How do we test each of these to find out which one fits our ideal strategy? That is one with a low mean and difficult to strategize against.

With a little thought, it was not hard to figure this out. If we recall what made the max bad, we have our solution. It was that we could see a pattern and place ships as such to avoid the pattern and give us a better chance at winning. So the strategy that was the most difficult to find a pattern in would be ideal along with getting the mean as low as possible.

For this, we recorded the first 17 shots of each probabilities game, analyzed how many times certain points were called, took the top 10 most called points from 5,000 games, and reran the Battleship program for only 200,000 times such that when it placed ships for it to sink it did not place ships in those 10 points. Essentially, we are strategizing against the computer to see the effect it has.

From these results, a KS test between the distributions with and without the strategy each proved to be different. So we can conclude, with 100% certainty, that this strategy has an effect on the distribution and looking at the means you can see it brings the mean up. As you progress higher in the exponent, the difference between the original and the strategy gets larger.

Discussion 2:

We know now that we can effect the outcome with a strategy but how does this help to figure out which one is better? That is where we can look at a plot of the means for both the original and the one with the strategy.

The original is plotted with black dots. They decrease and bottom out for the 25, 50, and 100 powers. Plotted above them is the means for when a strategy is used. This is where the research took a surprising turn. So while the means get further away from the original mean, meaning the strategy is having a larger effect, they actually decrease among themselves to a point. The best part comes from 50 and 100 actually going up in value. This means there must be a minimum value for the function describing these means. With a minimum, we can find the optimal strategy.

Visually we can see 25 lies around what would be the minimum for this. For the distributions with the strategy, two additional ones were added. They are 20 and 30 powers. Taking the means of 20, 25, and 30 we can approximate the function locally with a parabola, and calculus gives us the minimum to find what power is optimal.

The two points in blue in the plot above are the 23rd power. They represent the optimal strategy for Battleship. The proof is as follows. Without using a strategy, the power’s mean lies very close to the best you can get. In addition, a KS test between the 23rd and 25th power proves significant, meaning the choice of power makes a difference in the distribution. With using a strategy, the power’s mean is the best it can be. A KS test between 23rd and 25th power again proves significant.

With the 23rd power being nearly the best, and when you try to strategize against the program it is the best, we can conclude the optimal strategy for playing Battleship is to weigh the grid points based on the number of ship combinations raised to the 23rd power and call your points based on the probabilities given by those values.

Acknowledgements:

I would like to thank Dr. Howard Skogman, Dr. Tasneem Zaihra, Mr. Ken Mead, my family and my friends for helping me with the many different aspects of this project. They include statistical analysis, testing of the program, exploration of the data, bouncing off ideas, and supporting me.

References:

Arnold, T. B., Emerson, J. W., & R Core Team. (2013, October 25). Package ‘dgof’. Retrieved December 18, 2019, from https://cran.r-project.org/web/packages/dgof/dgof.pdf.

Kim, A. (2019, November 25). How to Compare Two Distributions in Practice. Retrieved December 18, 2019, from https://towardsdatascience.com/how-to-compare-two-distributions-in-practice-8c676904a285.

Kolmogorov-Smirnov Goodness-of-Fit Test. (n.d.). Retrieved November, 2019, from https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm.

Kuiper, S and Sklar, J (2013): Practicing Statistics: Guided Investigations for the Second Course. Pearson Addison-Wesley.

Appendix:

Section 1:

Below is each of the different patterns discussed. They are color-coded such that it picks a point from red and then down the rainbow to purple before returning back to red. Some sections may be of an uneven size which means when it gets to a point where one section is out, it just skips it and goes to the one that has points left.

Section 2:

Section 3:

These are the first 17 points from 10 games of the program that only picks the max. It is clear that a pattern exists that is easy to exploit.

44,55,33,66,26,73,37,62,48,58,68,78,38,28,18,67,69

44,55,45,65,54,56,57,33,73,74,75,76,77,26,27,28,29

44,55,66,33,73,62,37,26,84,48,15,22,77,67,87,76,75

55,44,66,33,73,74,75,76,77,72,26,37,22,48,15,16,25

44,45,46,43,57,52,75,64,65,66,67,68,22,23,32,12,83

55,45,65,75,85,34,26,36,46,42,53,57,13,21,71,48,94

44,55,33,66,56,76,86,96,62,37,25,73,48,51,41,31,61

44,45,54,64,33,34,43,53,23,13,3,76,26,37,72,68,85

55,44,45,46,43,62,37,74,75,76,73,72,71,22,67,31,32

44,45,53,54,55,66,56,76,67,65,64,63,32,26,27,36,25