Timeline for Choosing the right parameters for SARSA and Q-Learning & Comparing Models
Current License: CC BY-SA 3.0
8 events
| when | type | what | by | license | comment |
|---|---|---|---|---|---|
| May 10, 2017 at 8:04 | history | suggested | VividD | | added a tag |
| May 9, 2017 at 18:05 | review | Suggested edits | | | completed May 10, 2017 at 8:04 |
| Apr 21, 2017 at 9:11 | answer | added | Neil Slater | | timeline score: 1 |
| Apr 21, 2017 at 8:37 | history | edited | Neil Slater | CC BY-SA 3.0 | added 510 characters in body |
| Apr 21, 2017 at 8:28 | comment | added | user4218673 | | @NeilSlater For the second part: ah, so essentially what it means is that at each state the selected action contributes towards an index with probability X. At the end of the 249 days, the final value is calculated from the accumulated index. Why am I doing this? It's just the way the system works: it's an optimization model for a regulatory situation, which implies revising the whole reward every 250 days based on the performance (which is tied to actions) over the other 249 days. |
| Apr 21, 2017 at 8:24 | comment | added | user4218673 | | @NeilSlater First of all, thank you for your replies. For the first part: a tabular method. The agents are trained in a simulation, and the optimal policy will be obtained once per model. The state space is actually quite large, which is one of my concerns about how many epochs (episodes) I should run: it's a 250×8×7×3000 state-action space (a 3-component state of size 250×8×7, and 3000 possible actions per state). |
| Apr 20, 2017 at 16:09 | review | First posts | | | completed Apr 20, 2017 at 23:37 |
| Apr 20, 2017 at 16:08 | history | asked | user4218673 | CC BY-SA 3.0 | |
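
The comments above describe a tabular setup with a 250×8×7 state space and 3000 actions per state, i.e. 42,000,000 state-action entries. The following is a minimal sketch of what such a tabular Q-learning agent could look like; the `ALPHA`, `GAMMA`, and `EPSILON` values are illustrative assumptions, not parameters taken from the discussion, and a sparse dictionary is used so the full 42M-entry table is never allocated up front.

```python
import random
from collections import defaultdict

# Hypothetical parameter choices (not from the original discussion):
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
N_ACTIONS = 3000  # 3000 possible actions per state, per the comments

# Full tabular size implied by the comments: 250 * 8 * 7 * 3000 entries.
n_entries = 250 * 8 * 7 * N_ACTIONS  # 42,000,000

# Sparse dict-backed Q-table: unvisited (state, action) pairs default to 0.0.
Q = defaultdict(float)

def epsilon_greedy(state):
    """Explore with probability EPSILON, otherwise pick the greedy action."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(state, a)])

def q_learning_update(state, action, reward, next_state):
    """Off-policy TD update: the target bootstraps from the best next action."""
    best_next = max(Q[(next_state, a)] for a in range(N_ACTIONS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

A SARSA variant would replace `best_next` with the Q-value of the action actually chosen by `epsilon_greedy` in `next_state`, which is the on-policy/off-policy distinction the question's title is about.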