Return to Answer

Indent to increase level of bulleted list.

Source Link

edited Oct 21, 2015 at 13:17

user40980

int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
void Feedback(fb) { fbs[lastState][lastMove] += f; }
U32 Output(state) {
try random move every N_RAND'th time (rand() % N_RAND == 0)

- return lastMove = rand() % MAX_MOVES;
Find max(fbs[lastState][m=0..MAX_MOVES])

- if found, return m; // risk with this to try opponents other moves.

- else, force random
lastState = state; lastMove = m;

As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
Output: Index of free cells (other outputs are impossible)

on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.

There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
- on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
- There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs

The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
Could use 4 bit feedback for (lost/won in N steps)
If state not found - could interpolate from similar states instead of trying random
[State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
Wiki: "138 terminal board positions"

Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.
- Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.

int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
void Feedback(fb) { fbs[lastState][lastMove] += f; }
U32 Output(state) {
try random move every N_RAND'th time (rand() % N_RAND == 0)

- return lastMove = rand() % MAX_MOVES;
Find max(fbs[lastState][m=0..MAX_MOVES])

- if found, return m; // risk with this to try opponents other moves.

- else, force random
lastState = state; lastMove = m;

As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
Output: Index of free cells (other outputs are impossible)

on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.

There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs

The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
Could use 4 bit feedback for (lost/won in N steps)
If state not found - could interpolate from similar states instead of trying random
[State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
Wiki: "138 terminal board positions"

Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.

int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
void Feedback(fb) { fbs[lastState][lastMove] += f; }
U32 Output(state) {
try random move every N_RAND'th time (rand() % N_RAND == 0)
- return lastMove = rand() % MAX_MOVES;
Find max(fbs[lastState][m=0..MAX_MOVES])
- if found, return m; // risk with this to try opponents other moves.
- else, force random
lastState = state; lastMove = m;

As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
Output: Index of free cells (other outputs are impossible)
- on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
- There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs
The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
Could use 4 bit feedback for (lost/won in N steps)
If state not found - could interpolate from similar states instead of trying random
[State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
Wiki: "138 terminal board positions"
- Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.

Source Link

answered Oct 21, 2015 at 9:47

slyy2048

Trivial, Basic, Simple concept of AI for Tic-tac-toe, which was my 1st test for my AI prototype years ago.

Game

N x N cells. N=3 => 9 cells
Every cell can have 3 states (Empty, 0, 1). Game has 3ˇ9 = 19683 states
Player must choose 1 cell

AI input: State of the game (1 of 81 states)

15 bits for combinations of the state (log2(3ˇ9))
Not: using 2 bits per cell state, 9x2 is 18 bits

AI Output: Cell (move)

Use 3 bits (use 1 of 8 free cells) - yes 9th cell wont be chosen, but we have 9 free cells only in first iteration and only if we start and in this moment it has the same meaning as cell[0]
Not: using 9 bits (1 bit per cell)
Not: using 4 bits (4-bit index of the cell)

AI Feedback: Game won or lost (1 bit)

AI internal

int fbs[MAX_STATES][MAX_MOVES]; // feedbacks
void Feedback(fb) { fbs[lastState][lastMove] += f; }
U32 Output(state) {
try random move every N_RAND'th time (rand() % N_RAND == 0)
- return lastMove = rand() % MAX_MOVES;
Find max(fbs[lastState][m=0..MAX_MOVES])
- if found, return m; // risk with this to try opponents other moves.
- else, force random
lastState = state; lastMove = m;

Notes: (you can skip those)

As rotations have same meaning, there are less meaningful states and could use less bits for those. But we may want AI to "find" those "meaningfulnesses" out by itself and optimize itself later
Output: Index of free cells (other outputs are impossible)

on 2nd iteration we can have as much as 5 options (log(5)>2bits), so 3b output is good.
There are less meaningful outputs (at 1st iteration: center, corner, side) - could have output as index of meaningful outputs

The impossible and not meaningful moves will be ignored later after "training" by weighting them down (as bad moves) automatically AI.
Could use 4 bit feedback for (lost/won in N steps)
If state not found - could interpolate from similar states instead of trying random
[State][Move] combined are 15+3 18b or less. So you have 157464 or less state-move combinations.
Using 1B resolution/precision for feedback is would use 157KB or RAM (if you move to more complex games, optimizing memory usage becomes you main job). It is wasting as you only wanna sort ~5 possible moves by their fb, so 3b would be enough, but with "arithmetical coding" it's still a big waste of 3rd bit. In the trained and optimized model just 1 bit is enough as you only want the best moves (1) and worse moves' feedback should go to (0) (RAM: 20KB), or just write down the best 3b moves for 19683 state (7.4KB), can be compressed better...
To make it smarter, you could add aging to the feedbacks, so that if game rules change, it (or should we call she) would start winning sooner.
Wiki: "138 terminal board positions"

Can do only 4-5 moves at max before ending in 1 of those 138 positions, means we have less than 5x138 decisions to make, means the best answers are <259B of data (possibly around 100B). Good AI would optimize its RAM usage from 157KB to 0.1KB. The best AI would zip this info probably to <30 bytes and use 157KB RAM for much greater problems.