11

In a game of Stockfish (White) vs Leela (Black), in this position, Stockfish plays the move Bxg7.

[FEN "2rr4/5pbk/PqP3p1/1N2BpPp/1PQ1bP1P/8/3R4/R4K2 w - - 3 40"] 1. Bxg7 Rxc6 2. Qxf7 Bg2+ 3. Rxg2 Rc1+ 4. Rxc1 Rd1+ 5. Rxd1 Qf2+ 6. Kxf2 

If you plug the FEN string straight into the Lichess Analysis Board and turn on the browser engine (also Stockfish), it does indeed show the best move as Bxg7, giving White a +2.6 advantage on the eval bar at a depth of 26.

However, as soon as you play that move, the eval bar drops to 0.0, because it sees that Black can play Rxc6 and force a draw in only 5 moves.

I thought the evaluation and best move for any engine were directly related. The eval should stay the same if you play the best move.

How can playing the best move cause a drop in the eval bar?

3
  • chess.stackexchange.com/questions/29417/… You might be interested, although this position is less dramatic since the "wrong move" only leads to a draw (as opposed to loss). Commented Oct 22, 2024 at 0:33
  • 2
    This question is similar to: What does it mean when stockfish evaluates a move as an inaccuracy after previously thinking it was the best move?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Oct 22, 2024 at 13:25
  • 1
    Both helpful links, thanks. @BCLC I think, whilst that is a similar question, the scenario here is quite different, because the draw can be forced in only 5 moves, which is within the engine depth. The accepted answer here mentions "probably wrong" moves are pruned, which explains this scenario Commented Oct 22, 2024 at 14:33

3 Answers

23

The horizon effect. The engine can only see so far. Once you make the move, it searches from a position one ply closer to the draw and no longer has to spend effort on all the alternatives, so it can see further and find the draw that it missed before.

This is especially likely when the line involves "probably wrong" moves, like sacrifices that don't bring immediate returns. They may get disregarded (pruned) quickly while they're further out, but be considered seriously once they're one move closer to the current position.
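To make the effect concrete, here is a minimal sketch on a hand-built toy tree. Everything in it (the `Node` class, the node names, the evals) is made up for illustration; it is not Stockfish's search and not an analysis of the real position. It also uses plain fixed-depth minimax with no pruning at all, so it only shows the horizon part of the story: at 3 plies from the root the sacrifice line still looks great for White, but after the capture is actually played, the same 3 plies reach one ply deeper and hit the draw.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    static_eval: float                 # White's point of view, in pawns
    children: list = field(default_factory=list)

def search(node, depth, white_to_move):
    """Plain fixed-depth minimax: stop at the horizon and trust the static eval."""
    if depth == 0 or not node.children:
        return node.static_eval
    scores = [search(c, depth - 1, not white_to_move) for c in node.children]
    return max(scores) if white_to_move else min(scores)

def best_move(node, depth, white_to_move):
    """Return (child, score) for the side to move, searching `depth` plies."""
    scored = [(search(c, depth - 1, not white_to_move), c) for c in node.children]
    pick = max if white_to_move else min
    score, child = pick(scored, key=lambda sc: sc[0])
    return child, score

# Hypothetical tree: the sacrifice looks good for White until the very last ply.
draw = Node("forced draw", 0.0)
sac_line = Node("sacrifice", 3.0, [Node("White captures", 5.0, [draw])])
quiet_line = Node("quiet reply", 2.6, [Node("White consolidates", 2.6)])
capture = Node("capture", 2.6, [quiet_line, sac_line])
alternative = Node("alternative", 0.1, [Node("Black reply", 0.1)])
root = Node("root", 0.0, [capture, alternative])

DEPTH = 3
move, before = best_move(root, DEPTH, white_to_move=True)
_, after = best_move(move, DEPTH, white_to_move=False)
print(f"best move: {move.name}, eval before: {before:+.1f}, eval after: {after:+.1f}")
```

The "best move" and its eval are both products of the same limited search, so they agree with each other at the root. They just stop agreeing with a fresh search started one ply later.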

8

You can get a hint of this by letting the engine run longer. At d = 26, it shows Bxg7 as the best move with an eval of +2.6. By d = 45 (I unfortunately didn't catch exactly when it happened), the best move had changed to Qxf7 with an eval of +0.1.

If it were indeed true that

"The eval should stay the same if you play the best move."

then how can this shift be explained?

The answer is as explained in RemcoGerlich's answer. Engines are not perfect: they see only up to some depth, and they can make mistakes. Letting an engine run with 1 second of thinking time per move should obviously yield different results from letting it run with 10 minutes per move. An engine running at 1s/move should be expected to miss nuances that take more thinking time to see, and that's what's happening here. There is some subtlety in the position that Stockfish cannot see at d = 26 but does see at d = 45. It's further possible that at even higher depth, Stockfish will see yet more resources for either side; we don't know until we let Stockfish run that long.
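The same thing can be sketched as a toy iterative-deepening loop. The tree below is hypothetical: the two root moves borrow the names Bxg7 and Qxf7 and the evals +2.6 and +0.1 from the numbers above, but the replies, the depths (1 to 4 instead of 26 to 45) and the search itself (plain minimax, no pruning) are assumptions chosen only so that the preferred move flips once the search gets deep enough.

```python
# Toy tree: each node is (static eval from White's view, {move: subtree}).
# All values are invented to mimic the shape of the numbers in the post.
LEAF = {}
TREE = (0.0, {
    "Bxg7": (2.6, {
        "recapture": (2.6, LEAF),
        "sacrifice": (3.0, {"accept": (5.0, {"forcing finish": (0.0, LEAF)})}),
    }),
    "Qxf7": (0.1, LEAF),
})

def minimax(node, depth, white_to_move):
    """Fixed-depth minimax; falls back to the static eval at the horizon."""
    static_eval, children = node
    if depth == 0 or not children:
        return static_eval
    scores = [minimax(child, depth - 1, not white_to_move)
              for child in children.values()]
    return max(scores) if white_to_move else min(scores)

def iterative_deepening(root, max_depth):
    """Return a list of (depth, best move, score) for White, one per depth."""
    _, moves = root
    results = []
    for depth in range(1, max_depth + 1):
        # After White's move it is Black's turn, with one ply already used.
        scored = {m: minimax(child, depth - 1, False) for m, child in moves.items()}
        best = max(scored, key=scored.get)
        results.append((depth, best, scored[best]))
    return results

results = iterative_deepening(TREE, 4)
for depth, move, score in results:
    print(f"depth {depth}: {move} {score:+.1f}")
```

In this sketch the first three depths prefer Bxg7 and the fourth switches to Qxf7, because only the fourth reaches the end of the sacrifice line. A real engine's depth counter is much less literal than this, since pruning and extensions make some lines shallower and others deeper than the reported depth.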

1

Do you think most positions where the evaluation is between -1 and 1 are draws with best play from both sides? The starting position, for instance?

If so, we should expect that if you let the engine play the best moves for the whole game, the evaluation will tend to 0.

In fact, in my opinion this makes an evaluation that sometimes changes from move to move a desirable quality in an engine. The engine would be less helpful to me if it evaluated almost all opening positions as 0.0. It would also be less helpful to me if it didn't evaluate dead-drawn theoretical endgames as 0.0. So I think it is a good thing that the evaluation must decrease in magnitude somewhere in between.
