C (gcc), 679787679787 652892
8484 76 bytes, 679619/1215236679619 652740 incorrect guesses.
#define l p[a][b][c][d] p[128][128][128][128];a,b,c,d;g(h){l=h;a=b;b=c;c=d;d=h;h=l;p[a][b][c][d]=h;h=p[a=b][b=c][c=d][d=h];} Update: ~27000 points off with updated file, 16 points (8 bytes) with a better-golfed function.
Explanation
The way this works is that as the code runs through the text, it memorizes the last character that terminated any given 4-character sequence, and returns that value. Somewhat similar to Arnauld's approach above, but relies on the inherent likelihood of two given 4-character sequences terminating the same way.
De-golfed:
int p[128][128][128][128]; int a,b,c,d; g(h){ p[a][b][c][d]=h; // Memorize a=b;the last character. b=c; h=p[a=b][b=c][c=d][d=h]; // c=d; Read the guess. d=h;We save several h=p[a][b][c][d]; // bytes with the assignments inside indices. }