The basic variant of MCTS doesn't require any special handling of terminal states.
It simply updates the score / visit-count variables according to the result (win / loss) and proceeds with backpropagation.
It's a sort of instantaneous playout/simulation, and it cannot be skipped: it is what allows MCTS to converge to the game-theoretic value.
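A minimal sketch of this plain backup, assuming a hypothetical `Node` class with only `parent`, `visits` and `wins` fields, and results encoded as +1 / -1 in the negamax convention (the sign flips at each level because parent and child positions belong to opposing players):

```python
class Node:
    """Hypothetical minimal node; only the fields needed for backpropagation."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.wins = 0.0

def backpropagate(leaf, result):
    """Plain MCTS backup (negamax): add the result to every node on the path
    to the root, flipping its sign at each level. A terminal leaf supplies its
    exact game result (+1 win / -1 loss) as `result`, exactly as a random
    playout would -- no special case is needed."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.wins += result
        result = -result   # negamax: alternate the perspective per level
        node = node.parent
```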
The algorithm can be improved (e.g. the Monte-Carlo Tree Search Solver) by using large scores (say +∞ / -∞) and ad hoc backpropagation / selection schemes:
A special provision is then taken when backing such proven values up the tree. There are three cases to consider (we use the negamax formulation, alternating signs between levels).
First, when a simulation backs up a proven loss (-∞) from a child c to a parent p, the node p becomes, and is labelled as, a proven win (+∞): the position is won for the player at p because the move played leads to a win (left backup diagram in the figure). When backing up a proven win (+∞) from c to p, one must, however, also look at the other children of p to determine p's value. In the second case, when all child nodes of p are proven wins (+∞), the value of p becomes a proven loss (-∞), because all moves lead to a position lost for p (middle backup diagram in the figure). However, the third case occurs if there exists at least one child with a value different from a proven win. Then we cannot label p as a proven loss; instead, p gets updated as if a simulation win (instead of a proven win) were being backed up from node c (right backup diagram in the figure; v and u indicate non-proven values). Non-proven values are backed up as in regular MCTS.
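The three backup cases can be sketched as follows. This is my own illustrative code, not the paper's implementation: `SolverNode` and `solver_backup` are hypothetical names, proven values are stored in a separate `proven` field, and the solver's modified selection step is omitted.

```python
import math

INF = math.inf

class SolverNode:
    """Hypothetical minimal node for an MCTS-Solver backup sketch."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.score = 0.0     # accumulated (non-proven) simulation results
        self.proven = None   # None, +INF (proven win) or -INF (proven loss)

def solver_backup(leaf, value):
    """Back `value` up from `leaf` to the root (negamax formulation).
    `value` is +1 / -1 for an ordinary simulation result, or +INF / -INF
    for a proven game-theoretic value at `leaf`."""
    node = leaf
    while node is not None:
        node.visits += 1
        parent = node.parent
        if value == -INF:
            # Case 1: `node` is a proven loss for its player, so the parent
            # has a winning move and becomes a proven win on the next step.
            node.proven = -INF
            value = INF
        elif value == INF:
            node.proven = INF
            if parent is not None and all(c.proven == INF for c in parent.children):
                # Case 2: every child of the parent is a proven win for the
                # opponent, so the parent is a proven loss.
                value = -INF
            else:
                # Case 3: at least one sibling is not a proven win; back up
                # an ordinary simulation win from `node` instead (the sign
                # flips, so the parent records a simulation loss).
                value = -1
        else:
            # Non-proven values are backed up as in regular MCTS.
            node.score += value
            value = -value
        node = node.parent
```

For example, proving a loss at one child immediately marks its parent as a proven win (case 1), while proving a win at a child only marks the parent as a proven loss once every sibling is also a proven win (case 2 vs. case 3).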
- Further details in Monte-Carlo Tree Search in Lines of Action by Mark H.M. Winands, Yngvi Bjornsson and Jahn-Takeshi Saito.
