The basic variant of MCTS doesn't require any special handling of terminal states.
It simply updates the score / visit-count variables according to the result (win / loss) and proceeds with backpropagation.
It's a sort of instantaneous playout/simulation, and it cannot be skipped: it is what allows MCTS to converge to the game-theoretic value.
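A minimal sketch of this plain backup, assuming a hypothetical `Node` class with only `parent`, `visits` and `wins` fields, and results encoded as +1 / -1 in the negamax convention (the sign flips at each level because parent and child positions belong to opposing players):

```python
class Node:
    """Hypothetical minimal node; only the fields needed for backpropagation."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.wins = 0.0

def backpropagate(leaf, result):
    """Plain MCTS backup (negamax): add the result to every node on the path
    to the root, flipping its sign at each level. A terminal leaf supplies its
    exact game result (+1 win / -1 loss) as `result`, exactly as a random
    playout would -- no special case is needed."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.wins += result
        result = -result   # negamax: alternate the perspective per level
        node = node.parent
```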
The algorithm can be improved (e.g. the Monte-Carlo Tree Search Solver) by using large scores (say +∞ / -∞) and ad hoc backpropagation / selection schemes:
A special provision is then taken when backing such proven values up the tree. There are three cases to consider (we use the negamax formulation, alternating signs between levels).
First, when a simulation backs up a proven loss (-∞) from a child c to a parent p, the node p becomes, and is labelled as, a proven win (+∞): the position is won for the player at p because the move played leads to a win (left backup diagram in the figure). When backing up a proven win (+∞) from c to p, one must, however, also look at the other children of p to determine p's value. In the second case, when all child nodes of p are proven wins (+∞), the value of p becomes a proven loss (-∞), because all moves lead to a position lost for p (middle backup diagram in the figure). However, the third case occurs if there exists at least one child with a value different from a proven win. Then we cannot label p as a proven loss; instead, p gets updated as if a simulation win (instead of a proven win) were being backed up from node c (right backup diagram in the figure; v and u indicate non-proven values). Non-proven values are backed up as in regular MCTS.
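The three backup cases can be sketched as follows. This is my own illustrative code, not the paper's implementation: `SolverNode` and `solver_backup` are hypothetical names, proven values are stored in a separate `proven` field, and the solver's modified selection step is omitted.

```python
import math

INF = math.inf

class SolverNode:
    """Hypothetical minimal node for an MCTS-Solver backup sketch."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.score = 0.0     # accumulated (non-proven) simulation results
        self.proven = None   # None, +INF (proven win) or -INF (proven loss)

def solver_backup(leaf, value):
    """Back `value` up from `leaf` to the root (negamax formulation).
    `value` is +1 / -1 for an ordinary simulation result, or +INF / -INF
    for a proven game-theoretic value at `leaf`."""
    node = leaf
    while node is not None:
        node.visits += 1
        parent = node.parent
        if value == -INF:
            # Case 1: `node` is a proven loss for its player, so the parent
            # has a winning move and becomes a proven win on the next step.
            node.proven = -INF
            value = INF
        elif value == INF:
            node.proven = INF
            if parent is not None and all(c.proven == INF for c in parent.children):
                # Case 2: every child of the parent is a proven win for the
                # opponent, so the parent is a proven loss.
                value = -INF
            else:
                # Case 3: at least one sibling is not a proven win; back up
                # an ordinary simulation win from `node` instead (the sign
                # flips, so the parent records a simulation loss).
                value = -1
        else:
            # Non-proven values are backed up as in regular MCTS.
            node.score += value
            value = -value
        node = node.parent
```

For example, proving a loss at one child immediately marks its parent as a proven win (case 1), while proving a win at a child only marks the parent as a proven loss once every sibling is also a proven win (case 2 vs. case 3).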
- Further details in Monte-Carlo Tree Search in Lines of Action by Mark H.M. Winands, Yngvi Bjornsson and Jahn-Takeshi Saito.
