The goal of Go is to surround as much territory as possible with your stones. Players should therefore try to grab the largest areas first and leave the small ones, worth just a few points, for last. The game ends either when one player surrenders or when both players feel that the remaining moves are worth zero points or less. I wonder: what does the rate of diminishing returns look like in Go? How much is the first move worth versus the second? The tenth? The hundredth? Knowing this will help me understand when it might be worth walking away from a battle versus continuing to fight it at each stage of the game.
As a secondary goal, I'd like to evaluate komi. Komi is a number of points granted to the white player to offset the disadvantage of going second. Ideally, komi should be set so that if both players played perfectly, black's score would exactly equal white's score. If Go is to be a fair game, then if I assign each move m a point value V(m), and black and white take turns making M moves in a game without passing, I would expect that
komi = V(1) − V(2) + V(3) − V(4) + … = Σ_{m=1}^{M} (−1)^(m+1) V(m)

Equation 1: Calculation of komi using V(m), the point value of the mth move in a game of Go. M is the number of moves in an ideal, perfect game, and it is assumed that no player passes during play.
I will use GNU Go and the Go Text Protocol to evaluate a freely available collection of Go games from Andries Brouwer, and also to generate a set of computer games where GNU Go plays itself. After each move, I will ask GNU Go to run estimate_score and see how the score has changed from the previous move. I posit that this difference is an estimate for V(m) in the equation above.
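The bookkeeping behind this is simple enough to sketch in Python. As far as I can tell, estimate_score replies over GTP look like "= W+5.5" or "= B+12.0"; treat that exact format as an assumption. The sketch converts replies into signed scores (positive meaning black is ahead) and takes the swing between successive estimates as V(m):

```python
import re

def parse_estimate(reply):
    """Parse a GTP estimate_score reply like '= W+5.5' or '= B+12.0'
    into a signed float, positive meaning black is ahead.
    (The reply format is an assumption, not checked against GNU Go docs.)"""
    m = re.search(r'([BW])\+([\d.]+)', reply)
    if not m:
        return 0.0
    value = float(m.group(2))
    return value if m.group(1) == 'B' else -value

def move_values(scores):
    """Given signed score estimates after each move (scores[0] is the
    empty board), estimate V(m) as the absolute swing each move caused."""
    return [abs(after - before) for before, after in zip(scores, scores[1:])]
```

For example, if the estimate goes from even to B+10.0 to B+3.0, the first two moves are estimated at 10 and 7 points respectively.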
I used this python script to generate the simulated games. I then produced a box and whisker plot for the calculated V(m) for each move using the following bash commands:
# Get all deltas.
# Produce box and whisker plots for each move. 
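Since the commands themselves did not survive here, a Python equivalent of the aggregation step may be clearer anyway: for each move index m, gather the m-th delta from every game and compute the quartiles that the box-and-whisker plot displays. This is a sketch of the idea, not the script I actually ran:

```python
from statistics import quantiles

def per_move_stats(games):
    """games: one list of deltas [V(1), V(2), ...] per game.
    For each move index m, return (Q1, median, Q3) over all games
    that lasted at least m moves."""
    longest = max(len(g) for g in games)
    stats = []
    for m in range(longest):
        vals = [g[m] for g in games if len(g) > m]
        if len(vals) < 2:
            # Too few samples for quartiles; repeat the lone value.
            stats.append((vals[0], vals[0], vals[0]))
            continue
        q1, q2, q3 = quantiles(vals, n=4)
        stats.append((q1, q2, q3))
    return stats
```

The resulting per-move triples are exactly the columns a plotting tool like xmgrace needs for a box-and-whisker view.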
Plotting using xmgrace tells a pretty cool story:
Figure 1: Value of moves in Go, V(m), calculated using 5000 games simulated by GNU Go on a 19 by 19 board. Score estimates come from GNU Go's built-in estimate_score function.
Figure 2 shows a different view of the simulated data in Figure 1. The interquartile range shows the values of typical moves in Go. The median value of the first move of the game is 10.4 points; of the tenth, 7.3; of the fiftieth, 4.9; of the hundredth, 2.8; and of the hundred-fiftieth, 1.3. These figures give me an idea of the immediate value of sente, or when it may be worthwhile to simply ignore the opponent's last move and play elsewhere.
Next, I wonder if Equation 1 allows me to reconstruct Komi. A simple command line allows me to perform the calculation:
cat simulatedboxandwhisker | awk '{print $4}' | awk 'BEGIN {t=0} (NR % 2 == 1) {t += $1} (NR % 2 == 0) {t -= $1} END {print t}'
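The same alternating sum from Equation 1 is easy to cross-check in Python. The values below are illustrative placeholders, not my measured medians:

```python
def komi_estimate(values):
    """Alternating sum from Equation 1: V(1) - V(2) + V(3) - ...
    values[0] is V(1), black's first move; even indices are black,
    odd indices are white."""
    return sum(v if i % 2 == 0 else -v for i, v in enumerate(values))

# With made-up per-move values, black's first-move advantage nets out
# to the sum of the gaps between consecutive moves:
print(komi_estimate([10, 9, 8, 7]))  # 10 - 9 + 8 - 7 = 2
```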

This estimate is not even close to GNU Go's default komi of 5.5. This reveals a limitation of my study: while most moves' values fall in a narrow interquartile range, no game is composed entirely of the typical moves shown in Figure 2. Victory lies not in these average moves, but in the long tail of outliers visible in Figure 1 (green extremes).
Now let's try this experiment again with 5000 human games. I will start with the empty board, then after each turn ask GNU Go to evaluate the new value of the board. The difference between the previous board value and the next will be called the value of each move.

Comparing Figure 4 to Figure 2, it becomes clear that there is something profoundly different between the human games and the AI games. The GNU Go-predicted point value of moves remains high, around 10 stones on average, out to the hundredth move. Further, the interquartile range is much broader and skewed toward more valuable moves. So, what's going on here?
First, consider that most of the human games end in resignation, while none of the GNU Go games do. So, what do human players do when they feel they are losing that GNU Go does not? They try a Hail Mary: an exceptionally risky move that, while likely to fail, may put them back in the game. If the gamble fails, the human player will probably resign.
Figure 5 shows us that human games tend to be shorter than GNU Go games due to resignation, and taken together with Figures 3 and 4, we see that the losing player tends to make big gambles that are worth a lot of points if they succeed. AI players, on the other hand, cannot transcend their heuristic algorithm: they will make the move they see as most profitable every time, whether they are ahead or behind.
Finally, I want to discuss exactly what the value of a move means. Above, I have posited that the value of a move is its point value, in stones. But what if you don't care about the margin, and only care about winning or losing, by any number of points? The one point that puts you slightly ahead of your opponent is then worth more than 50 points that still leave you behind.
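A toy example makes the two valuations concrete. Suppose a player who is 30 points behind chooses between a safe move and a gamble; the probabilities and point swings here are invented for illustration, not taken from any engine:

```python
def expected_points(move):
    """Expected point swing of a move, given (probability, swing) outcomes."""
    return sum(p * swing for p, swing in move)

def win_probability(margin, move):
    """Probability that the final margin ends up positive after this move."""
    return sum(p for p, swing in move if margin + swing > 0)

margin = -30                       # 30 points behind
safe = [(1.0, 5)]                  # always gains 5 points
gamble = [(0.2, 60), (0.8, -10)]   # Hail Mary: usually fails

# A points-maximizing valuation prefers the safe move (5 > 4 expected)...
assert expected_points(safe) > expected_points(gamble)
# ...but a win/loss valuation prefers the gamble: the safe move never
# wins from 30 points down, while the gamble wins 20% of the time.
assert win_probability(margin, gamble) > win_probability(margin, safe)
```

This is the same asymmetry behind the human resignation pattern above: once winning is all that matters, a low-percentage gamble beats a sure but insufficient gain.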
This is a difference between the Pachi algorithm and GNU Go: GNU Go is concerned with values in stones, while Pachi is concerned with probability of winning. Pachi, in fact, will resign if it cannot find any scenario in which it will win. Pachi's Monte Carlo algorithm was a huge breakthrough in computer Go, but its skill on a 19 by 19 board is still a far cry from professional play. A fun experiment is to use the sgfanalyze.pl script included with Pachi to track the probability of winning through a game, and thereby identify its major plays and blunders.
Pachi supposedly plays at a 2d level on KGS, which means that it is likely to have a hard time analyzing games by players at a higher level than 2d.
Out of curiosity, I tried running sgfanalyze.pl on a 9d game annotated by GoGameGuru. I ran 1 million simulations per move using the extra pattern files, and ran sgfanalyze.pl once for black and once for white. The output of sgfanalyze.pl looks something like this:
2, W, D16, C12, 0.52
... 
./sgfanalyze.pl game.sgf B -t =1000000 threads=4 -d 0 dynkomi=none > analysisB
./sgfanalyze.pl game.sgf W -t =1000000 threads=4 -d 0 dynkomi=none > analysisW
tail -n +2 analysisB | tr -d "," | awk '($3 == $4) {print $1, $5}' > Bagree
tail -n +2 analysisB | tr -d "," | awk '($3 != $4) {print $1, $5}' > Bdisagree
tail -n +2 analysisW | tr -d "," | awk '($3 == $4) {print $1, 1 - $5}' > Wagree
tail -n +2 analysisW | tr -d "," | awk '($3 != $4) {print $1, 1 - $5}' > Wdisagree
xmgrace -nxy Bagree Bdisagree Wagree Wdisagree
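The awk filters key on whether column 3 (the move actually played) matches column 4 (the move Pachi itself would have played). A Python version of the same split, assuming the comma-separated format shown above (move number, color, move played, Pachi's choice, black's win probability):

```python
def split_agreement(lines):
    """Partition analysis lines into (agree, disagree) lists of
    (move_number, black_win_probability) pairs, depending on whether
    the played move matches Pachi's preferred move."""
    agree, disagree = [], []
    for line in lines:
        move, color, played, suggested, prob = [f.strip() for f in line.split(",")]
        pair = (int(move), float(prob))
        (agree if played == suggested else disagree).append(pair)
    return agree, disagree
```

On the sample line above, white played D16 while Pachi preferred C12, so move 2 lands in the disagree list with a 0.52 win probability for black.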
Like GNU Go, Pachi is only as good as its heuristic algorithm, and if it can't see as far ahead in the game as the professionals can, then the probabilities it predicts will not be illuminating. Indeed, comparing GoGameGuru's notes on which moves were brilliant and which were mistakes to Pachi's predictions did not reveal any noticeable correlation. In fact, Pachi seemed to think that black resigned while he was coming back!
Figure 6: Pachi analysis of Junghwan vs. Yue. Major predicted turning points in the game are noted. For instance, from moves 123 to about 182, white was pulling ahead. After move 182, black was coming back in the game according to Pachi, but resigned at the end of move 238. Compare this to the analysis given at GoGameGuru. Pachi was run with 1 million simulations per move and using the extra pattern files. 
This analysis may have raised more questions than answers, but I did learn a few things about computer Go and human nature:
- No AI can transcend its heuristics. The AI will relentlessly play the move it sees as most profitable at every turn, whereas human players who know they are behind will tend to take a gamble to try to get back into the game, then resign if it fails.
- Whether or not there are diminishing returns in Go depends on your definition of value: if you consider value as the number of points secured per move, then the first dozen moves of the game are worth about 10 points apiece, and any move after the 200th is worth perhaps one. If you consider the value of a move as whether or not you can win the game with it, that could happen at any time.
- Komi does not appear in the plot of diminishing returns. While all games tend to fall within a few points of the value curve, no game is composed entirely of average moves.
Thanks to Andrew Jackson and Chris Gaiteri for illuminating conversations on this topic!