Timeline for Choosing the right parameters for SARSA and Q-Learning & Comparing Models
Current License: CC BY-SA 3.0
8 events
| when | type | what | by | license | comment |
|---|---|---|---|---|---|
| May 10, 2017 at 8:04 | history | suggested | VividD | | added a tag |
| May 9, 2017 at 18:05 | review | Suggested edits | | | completed May 10, 2017 at 8:04 |
| Apr 21, 2017 at 9:11 | answer | added | Neil Slater | | timeline score: 1 |
| Apr 21, 2017 at 8:37 | history | edited | Neil Slater | CC BY-SA 3.0 | added 510 characters in body |
| Apr 21, 2017 at 8:28 | comment | added | user4218673 | | @NeilSlater For the second part: ah, so essentially what it means is that at each state the selected action contributes towards an index with probability X. At the end of the 249 days, the final value is calculated from the accumulated index. Why am I doing this? It's just the way the system works: it's an optimization model for a regulatory situation, which implies revising the whole reward every 250 days based on the performance (which is tied to actions) over the other 249 days. |
| Apr 21, 2017 at 8:24 | comment | added | user4218673 | | @NeilSlater First of all, thank you for your replies. For the first part: a tabular method. The agents are trained in a simulation, and the optimal policy will be obtained once per model. The state space is actually quite large, which is one of my concerns about how many epochs (episodes) I should run: it's a 250×8×7×3000 state-action space (a 3-component state of size 250×8×7, and 3000 possible actions per state). |
| Apr 20, 2017 at 16:09 | review | First posts | | | completed Apr 20, 2017 at 23:37 |
| Apr 20, 2017 at 16:08 | history | asked | user4218673 | CC BY-SA 3.0 | |
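
The comments above describe a tabular setup with a 250×8×7 state space and 3000 actions per state, i.e. 42,000,000 state-action entries. The following is a minimal sketch of what such a tabular Q-learning agent could look like; the `ALPHA`, `GAMMA`, and `EPSILON` values are illustrative assumptions, not parameters taken from the discussion, and a sparse dictionary is used so the full 42M-entry table is never allocated up front.

```python
import random
from collections import defaultdict

# Hypothetical parameter choices (not from the original discussion):
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
N_ACTIONS = 3000  # 3000 possible actions per state, per the comments

# Full tabular size implied by the comments: 250 * 8 * 7 * 3000 entries.
n_entries = 250 * 8 * 7 * N_ACTIONS  # 42,000,000

# Sparse dict-backed Q-table: unvisited (state, action) pairs default to 0.0.
Q = defaultdict(float)

def epsilon_greedy(state):
    """Explore with probability EPSILON, otherwise pick the greedy action."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(state, a)])

def q_learning_update(state, action, reward, next_state):
    """Off-policy TD update: the target bootstraps from the best next action."""
    best_next = max(Q[(next_state, a)] for a in range(N_ACTIONS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

A SARSA variant would replace `best_next` with the Q-value of the action actually chosen by `epsilon_greedy` in `next_state`, which is the on-policy/off-policy distinction the question's title is about.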