Skip to main content
Indent to increase level of bulleted list.
Source Link
user40980
user40980
  • int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
  • void Feedback(fb) { fbs[lastState][lastMove] += f; }
  • U32 Output(state) {
  • try random move every N_RAND'th time (rand() % N_RAND == 0)
  •  
    • return lastMove = rand() % MAX_MOVES;
  • Find max(fbs[lastState][m=0..MAX_MOVES])
  •  
    • if found, return m; // risk with this to try opponents other moves.
    •  
    • else, force random
  • lastState = state; lastMove = m;
  • As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
  • Output: Index of free cells (other outputs are impossible)
  • on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
  • There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
    • on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
    • There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
     
  • The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
  • Could use 4 bit feedback for (lost/won in N steps)
  • If state not found - could interpolate from similar states instead of trying random
  • [State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
  • Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
  • To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
  • Wiki: "138 terminal board positions"
  • Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.
    • Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.
  • int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
  • void Feedback(fb) { fbs[lastState][lastMove] += f; }
  • U32 Output(state) {
  • try random move every N_RAND'th time (rand() % N_RAND == 0)
  •  
    • return lastMove = rand() % MAX_MOVES;
  • Find max(fbs[lastState][m=0..MAX_MOVES])
  •  
    • if found, return m; // risk with this to try opponents other moves.
    •  
    • else, force random
  • lastState = state; lastMove = m;
  • As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
  • Output: Index of free cells (other outputs are impossible)
  • on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
  • There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
     
  • The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
  • Could use 4 bit feedback for (lost/won in N steps)
  • If state not found - could interpolate from similar states instead of trying random
  • [State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
  • Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
  • To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
  • Wiki: "138 terminal board positions"
  • Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.
  • int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
  • void Feedback(fb) { fbs[lastState][lastMove] += f; }
  • U32 Output(state) {
  • try random move every N_RAND'th time (rand() % N_RAND == 0)
    • return lastMove = rand() % MAX_MOVES;
  • Find max(fbs[lastState][m=0..MAX_MOVES])
    • if found, return m; // risk with this to try opponents other moves.
    • else, force random
  • lastState = state; lastMove = m;
  • As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
  • Output: Index of free cells (other outputs are impossible)
    • on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
    • There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
  • The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
  • Could use 4 bit feedback for (lost/won in N steps)
  • If state not found - could interpolate from similar states instead of trying random
  • [State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
  • Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
  • To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
  • Wiki: "138 terminal board positions"
    • Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.
Source Link

Trivial, Basic, Simple concept of AI for Tic-tac-toe, which was my 1st test for my AI prototype years ago.

Game

  • N x N cells. N=3 => 9 cells
  • Every cell can have 3 states (Empty, 0, 1). Game has 3ˇ9 = 19683 states
  • Player must choose 1 cell

AI input: State of the game (1 of 81 states)

  • 15 bits for combinations of the state (log2(3ˇ9))
  • Not: using 2 bits per cell state, 9x2 is 18 bits

AI Output: Cell (move)

  • Use 3 bits (use 1 of 8 free cells) - yes 9th cell wont be chosen, but we have 9 free cells only in first iteration and only if we start and in this moment it has the same meaning as cell[0]
  • Not: using 9 bits (1 bit per cell)
  • Not: using 4 bits (4-bit index of the cell)

AI Feedback: Game won or lost (1 bit)

AI internal

  • int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
  • void Feedback(fb) { fbs[lastState][lastMove] += f; }
  • U32 Output(state) {
  • try random move every N_RAND'th time (rand() % N_RAND == 0)
    • return lastMove = rand() % MAX_MOVES;
  • Find max(fbs[lastState][m=0..MAX_MOVES])
    • if found, return m; // risk with this to try opponents other moves.
    • else, force random
  • lastState = state; lastMove = m;

Notes: (you can skip those)

  • As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
  • Output: Index of free cells (other outputs are impossible)
  • on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
  • There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
  • The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
  • Could use 4 bit feedback for (lost/won in N steps)
  • If state not found - could interpolate from similar states instead of trying random
  • [State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
  • Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
  • To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
  • Wiki: "138 terminal board positions"
  • Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.