In a game-theoretical sense, both of the moves you describe are equally good, so the algorithm is "correct" in the sense that it does not matter which one it picks: nothing in plain minimax explicitly encodes a preference for faster wins (or faster gains of $+100$) over slower ones.

Introducing a discount factor to explicitly encode such a preference, as described in ryan's answer, can be a solution, but it is not very common in the game AI / minimax / alpha-beta search literature, and it can be detrimental in slightly different scenarios. For example, if we can get a score of $+100$ after one ply, or a score of $+101$ after two plies, we would generally prefer the path leading to the delayed $+101$; a common choice of discount factor such as $\gamma = 0.9$, however, leads us to prefer the suboptimal immediate $+100$. Note that this is not specific to $\gamma = 0.9$: for any value $0 \leq \gamma < 1$ we can construct a similar counterexample in which discounting breaks down and leads to suboptimal behaviour.
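To make the arithmetic of that counterexample concrete (discounting both lines back to the root):

$$\gamma^{1} \cdot 100 = 90 \;>\; 81.81 = \gamma^{2} \cdot 101 \qquad \text{for } \gamma = 0.9,$$

so the discounted search picks the immediate $+100$ even though the undiscounted $+101$ is the better outcome.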

A more robust solution is to combine iterative deepening with move ordering. In practice, we rarely run a plain minimax search with a single predetermined depth limit; instead, we usually do something like this:

  • Start out with a $1$-ply search
  • Proceed to run a $2$-ply search afterwards if we still have time
  • Proceed to run a $3$-ply search if we still have time
  • etc.

This is referred to as iterative deepening. Now, whenever we start a new search with an increased depth limit, a common trick is to first re-order the moves at the root node based on the scores found by the previous, shallower search.
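Here is a minimal sketch of that loop in Python (purely illustrative, not any particular engine's implementation; `legal_moves`, `apply_move`, and `minimax` are assumed helper functions, with `minimax(state, depth)` assumed to return a score from the root player's perspective):

```python
import time

def iterative_deepening_search(root, time_limit):
    # Minimal illustrative sketch: iterative deepening with root move re-ordering.
    # `legal_moves`, `apply_move` and `minimax` are assumed to exist elsewhere.
    deadline = time.time() + time_limit
    moves = list(legal_moves(root))      # initial root move order (arbitrary)
    best_move = None

    depth = 1
    while time.time() < deadline:        # simple time check between iterations
        scored = []                      # (move, score) pairs for this depth
        best_score = float("-inf")
        for move in moves:               # searched in the current root order
            score = minimax(apply_move(root, move), depth - 1)
            scored.append((move, score))
            if score > best_score:       # strict '>': on ties we keep the move
                best_score = score       # that was searched first
                best_move = move

        # Re-order the root moves by this iteration's scores so that the next,
        # deeper iteration searches the most promising lines first. Python's
        # sort is stable, so equally scored moves keep their previous order.
        scored.sort(key=lambda ms: ms[1], reverse=True)
        moves = [move for move, _ in scored]
        depth += 1

    return best_move
```

Real engines also check the clock inside the search and fall back to the last fully completed iteration, but this is enough to show how the root ordering found at one depth carries over to the next.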

Now imagine applying this process to the scenario described in your question. One of the first (shallow) iterations will already discover that the Qxb8 line of play yields $+100$. Based on that score, the Qxb8 line is sorted in front of the Qa1+ line when we move on to the next, deeper iteration. Both lines are then found to have the same $+100$ score, but because the shorter line was searched first (and the best move is typically only replaced when a strictly better score is found), it remains the preferred line of play. There is nothing specific to this case going on here; it simply falls out of the move ordering that iterative deepening enables in general.
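Traced through the scenario from the question with purely illustrative numbers, the sketch above behaves roughly like this:

```python
# Depth-1 iteration (illustrative scores):
#   Qxb8  -> +100   (the capture is already visible at this depth)
#   Qa1+  -> lower  (the shallow search cannot yet see the delayed +100)
# => root order for the next iteration: [Qxb8, Qa1+, ...]
#
# Deeper iteration:
#   Qxb8  -> +100
#   Qa1+  -> +100   (equal score, but searched second)
# => the strict '>' comparison keeps Qxb8, the shorter line, as best_move.
```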


Note that the "trick" with iterative deepening + move ordering is not just useful to solve the issue highlighted in this question, but in general can also significantly increase the number of prunings that can be achieved when we extend the basic minimax algorithms with techniques like alpha-beta pruning.
