## Monday, November 11, 2013

### Diminishing Returns in the Game of Go

We've been playing a lot of Go at work lately.  Sometimes my mind wanders as I play, and I see parallels to life in the board positions.  Arguments, negotiations, trades, consensus, squabbling, greed and fear.

The goal of Go is to surround as much territory using your stones as possible.  So, if at all possible, players should try to place stones in such a way as to grab the most area first, and then leave small amounts of area, just a few points, for last.  The game ends either when one player surrenders or when both players feel that the remaining moves are worth zero points or less.  I wonder:  what does the rate of diminishing returns look like in Go?  How much is the first move worth versus the second?  The tenth?  the hundredth?  Knowing this will help me understand when it might be worth it to walk away from a battle versus to keep fighting it at each stage of the game.

As a secondary goal, I'd like to evaluate komi.  Komi is an amount of points granted to the white player to offset the disadvantage of going second.  Ideally, the komi should be set such that if both players played perfectly, then black's score would exactly equal white's score.  If Go is to be a fair game, then if I assigned each move m a point value V(m) and that black and white took turns making M moves in a game without passing, I would expect that

 Equation 1:  Calculation of komi using V(m), the point value of the m-th move in a game of Go.  M is the number of moves in an ideal, perfect game, and it is assumed that no player pass during play.

I will use GNU Go and the Go Text Protocol to evaluate a freely available collection of Go games from Andries Brouwer, and also to generate a set of computer games where GNU Go plays itself.  After each move, I will ask GNU Go to run evaluate_score and see how the score has changed from the previous move.  I posit that this difference is an estimate for V(m) in the equation above.

I used this python script to generate the simulated games.  I then produced a box and whisker plot for the calculated V(m) for each move using the following bash commands:

 ``` # Get all deltas. cat all-sims | awk '{print \$2, \$3}' > generated-files/delta-vs-move``` ```# Produce box and whisker plots for each move. cat generated-files/delta-vs-move | sort -k 1 -gk 2 | awk 'BEGIN {n=0;l=1} (\$1==l) {a[n]=\$2;n+=1} (\$1!=l) {print l, a[int((n-1)/2)], a[int(3*(n-1)/4)], a[int((n-1)/4)], a[n-1], a[0];n=1;l=\$1;a[0]=\$2} END {print l, a[int((n-1)/2)], a[int(3*(n-1)/4)], a[int((n-1)/4)], a[n-1], a[0]}' > generated-files/simulated-box-and-whisker ```

Plotting using xmgrace tells a pretty cool story:

 Figure 1:  Value of moves in go V(m) calculated using 5000 games simulated by GNU Go on a 19 by 19 board.  Score estimates come from GNU Go's built in estimate_score function.
There are several important messages to take from Figure 1.  First, as demonstrated by the narrow interquartile range shown in blue, that typical moves in Go have a value that is closely related to the move number.  Second, we see that Go is also a game of tactics where particular tactical situations can be responsible for wild fluctuations in the score.  Very good tactical moves account for the extreme high move values, and very poor moves account for the extreme low move values.  This plot also underscores the point of my study:  when it is a good idea to simply walk away from a battle and play elsewhere.  Clearly, if I believe that a battle is worth 20 or more points, I should always fight it according to this plot.  Below that, the story becomes more interesting.
 Figure 2:  This is identical to Figure 1, but rescaled and with the whiskers removed from the box and whisker plot.  Note that the IQRs did not include moves with values less than zero, and nothing is hidden below the x-axis.

Figure 2 shows a different view of the simulated data in Figure 1.  The interquartile range shows the values of typical moves in Go.  The median value of the first move of the game is 10.4 points.  The median value of the tenth is 7.3,  the fiftieth is 4.9, the hundredth is 2.8, and the hundred-fiftieth is 1.3.  These figures give me an idea of the immediate value of sente, or when it may be worthwhile to simply ignore the opponent's last move and play elsewhere.

Next, I wonder if Equation 1 allows me to reconstruct Komi.  A simple command line allows me to perform the calculation:

 ``` cat simulated-box-and-whisker | awk '{print \$4}' | awk 'BEGIN {t=0} (NR % 2 == 0) {t -= \$1} (NR % 2 == 1) {t += \$1} END {print t}'``` ``` -0.3 ```

This estimate is not even close to GNU Go's default Komi, 5.5.  This reveals a limitation of my study:  while most moves' values fall in a narrow interquartile range, no game is composed entirely of typical moves shown in Figure 2.  Victory lies not in these average moves, but instead in the long tail of outliers visible in Figure 1 (green extremes).

Now let's try this experiment again with 5000 human games.  I will start with the empty board, then after each turn ask GNU Go to evaluate the new value of the board.  The difference between the previous board value and the next will be called the value of each move.

 Figure 3:  Value of moves in go V(m) calculated using even human games by GNU Go on a 19 by 19 board.  Score estimates come from GNU Go's built in estimate_score function.
Figure 3 looks vaguely reminiscent of Figure 1, but with a glaring difference:  the value of the moves in the mid-game actually seems to be greater than the value of the early game moves!  Let's take a closer look at this by focusing on the interquartile range.

 Figure 4:  This is identical to Figure 3, but rescaled and with the whiskers removed from the box and whisker plot.  Note that the IQRs did not include moves with values less than zero, and nothing is hidden below the x-axis.

Comparing Figure 4 to Figure 2, it becomes clear that there is something profoundly different between the human games and the AI games.  The GNU Go-predicted point value of moves remains high, around 10 stones on average, out to the hundredth move.  Further, the interquartile range is much broader and skewed towards more valuable moves.  So, what's going on here?

First, consider that most of the human games end in resignation.  None of the GNU Go games end in resignation.  So, what do human players do when they feel that they are losing lost that GNU Go does not?  They try a hail Mary.  They make an exceptionally risky move that, while likely to fail, may put them back in the game.  If this risk fails, a human player will probably resign.

 Figure 5:  Human games tend to be shorter than GNU Go games, because GNU Go never resigns.  Before resigning, humans tend to make a big gamble that, should it succeed, is worth a significant number of points.  If aliens came to this planet and wanted to understand the difference between humans and computers, this would tell them all that they need to know.

Figure 5 shows us that human games tend to be shorter than GNU Go games due to resignation, and taken together with figures 3 and 4 we see that the losing player tends to make big gambles that are worth a lot of points if they succeed.  AI players, on the other hand, cannot transcend their heuristic algorithm.  They will make the move that they see as most profitable every time, whether they are ahead or behind.

Finally, I want to discuss exactly what the value of a move means.  Above, I have posited that the value of a move is the point value, in stones.  But, what if you don't care about points, and only care about winning or losing by any number of points?  The one point that puts you slightly ahead of your opponent, then, is worth more than 50 points that still leave you behind.

This is a difference between the Pachi algorithm and GNU Go.  GNU Go is concerned about values in stones, and Pachi is concerned about probability of winning.  Pachi, in fact, will resign if it cannot find any scenario in which it will win.  Pachi's Monte Carlo algorithm was a huge breakthrough in computer Go, but its skill on a 19 by 19 board is still a far cry from professional play.  A fun experiment is to use the sgf-analyze.pl script included with Pachi to track probability of winning through a game, and as such to identify the major plays and blunders of a game.

Pachi supposedly plays at a 2d level on KGS, which means that it is likely to have a hard time analyzing games by players at a higher level than 2d.

Out of curiosity, I tried runing sgf-analyze on a 9d game annotated by GoGameGuru.  I ran 1 million simulations per move using the extra pattern files, and ran analyze-sgf once for black and once for white.  The output of sgf-analyze looks something like this:

 ``` 2, W, D16, C12, 0.52 4, W, D3, M17, 0.52 6, W, P3, R9, 0.52 ``` `...`
Sample output from Pachi's sgf-analyze.pl script.

 ``` ./sgf-analyze.pl game.sgf B -t =1000000 threads=4 -d 0 dynkomi=none > analysis-B``` `./sgf-analyze.pl game.sgf W -t =1000000 threads=4 -d 0 dynkomi=none > analysis-W` `tail -n +2 analysis-B | tr -d "," | awk '(\$3 == \$4) {print \$1, \$5}' > B-agree` `tail -n +2 analysis-B | tr -d "," | awk '(\$3 != \$4) {print \$1, \$5}' > B-disagree` `tail -n +2 analysis-W | tr -d "," | awk '(\$3 == \$4) {print \$1, 1 - \$5}' > W-agree` `tail -n +2 analysis-W | tr -d "," | awk '(\$3 != \$4) {print \$1, 1 - \$5}' > W-disagree` `xmgrace -nxy B-agree B-disagree W-agree W-disagree`
The output from sgf-analyze is easy to work with.  I used a procedure like this to generate the beautiful plot in Figure 6.

The first column is the move, the second column is the player color, the third is the move actually played, the fourth is the suggested move by Pachi, and the fifth column is the probability (according to Pachi) of this player winning assuming that he plays the recommended move. There is no doubt that these 9d players are much better than Pachi, so I'll assume that their moves are at least as good as the recommended one, and that this probability equals their probability of winning at the end of this move.

Like GNU Go, Pachi is only as good as its heuristic algorithm, and if it can't see as far ahead in the game as the professionals can, then the probability it predicts will not be illuminating.  Indeed, comparing the GoGameGuru's notes on which moves were brilliant and which were mistakes to Pachi's did not yield any noticeable correlation.  In fact, Pachi seemed to think that black resigned when he was in fact coming back!

 Figure 6:  Pachi analysis of Junghwan vs. Yue.  Major predicted turning points in the game are noted.  For instance, from moves 123 to about 182, white was pulling ahead.  After move 182, black was coming back in the game according to Pachi, but resigned at the end of move 238.  Compare this to the analysis given at GoGameGuru.  Pachi was run with 1 million simulations per move and using the extra pattern files.
Pachi's analysis of the 9d game is understandably flawed, as it is not as strong as these players.  Still, the analysis by Pachi shows an important point:  the value of a move depends greatly on your definition of value.  If you care about expected point value then, indeed, an early game move is much more valuable than a late game move.  On the other hand, if you care about winning or losing then the game can certainly come down to a matter of a few points, and those few points can make a world of difference regardless of when they are secured.  The professional game analyzed in Figure 6 changed hands no fewer than three times:   twice in the mid-game, and once again at the end.  If I had a lot of spare processor cycles, it would be interesting to perform this analysis with Pachi on a large number of games and explore the distribution of points where the lead player changes hands.

This analysis may have raised more questions than answers, but I did learn a few things about computer Go and human nature:
• No AI can transcend its heuristics.  The AI will relentlessly play the move it sees as most profitable at every turn, whereas human players who know they are behind will tend to take a gamble to try to get back into the game, then resign if it fails.
• Whether or not there are diminishing returns in Go depends on your definition of value:  if you consider value as the number of points secured per move, then the first dozen moves of the game are worth about 10 points apiece, and any move after the 200th is worth perhaps one.  If you consider the value of a move as whether or not you can win the game with it, that could happen at any time.
• Komi does not appear in the plot of diminishing returns.  While all games tend to fall within a few points of the value curve, no game is composed entirely of average moves.
Thanks to Andrew Jackson and Chris Gaiteri for illuminating conversations on this topic!