Timeline for Alternative approach for Q-Learning
Current License: CC BY-SA 4.0
3 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Dec 12, 2023 at 18:48 | comment | added | BitTickler | | IMHO (99% sure), you do not even need the Bellman equation and complicated updates for tic-tac-toe-style problems. If you stay table-based (roughly 5,000 states for tic-tac-toe), you can simply update the Q-table from back to front (from the last move of the episode to the first) with a simple Q(s_t, a) = max(Q(s_{t-1}, a), R). |
| Nov 5, 2019 at 11:15 | review | First posts | | | Completed Nov 5, 2019 at 13:46 |
| Nov 5, 2019 at 11:12 | history | answered | Abdul Rehman | CC BY-SA 4.0 | |
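The table-based backward update BitTickler's comment describes might be sketched as follows. This is only one plausible reading of the comment, not code from the source: the Q-table, the `(state, action)` episode representation, and the discount factor `gamma` are all illustrative assumptions, and the max keeps earlier estimates from being lowered, as the comment's `Q(s_t, a) = max(Q(s_{t-1}, a), R)` suggests.

```python
# Hedged sketch of the backward, table-based update suggested in the
# comment: after an episode ends with reward R, walk the (state, action)
# pairs from the last move to the first, propagating a discounted value
# backward and taking a max so existing entries are never decreased.
# All names (Q, episode, gamma) are illustrative, not from the source.

def backward_update(Q, episode, final_reward, gamma=0.9):
    """Update a dict-based Q-table in place, last move first."""
    value = final_reward
    for state, action in reversed(episode):
        Q[(state, action)] = max(Q.get((state, action), 0.0), value)
        value *= gamma  # discount as we move toward the first move

# usage: one won tic-tac-toe episode (states abbreviated as labels)
Q = {}
episode = [("s0", 4), ("s1", 0), ("s2", 8)]  # (state, action) pairs
backward_update(Q, episode, final_reward=1.0)
print(Q[("s2", 8)])  # → 1.0 (the last move receives the full reward)
```

With a small state space like tic-tac-toe's, a plain dictionary suffices as the table, and no learning rate or bootstrapped target is needed: each episode's outcome is written back directly.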