8 events
when what by license comment
S May 10, 2017 at 8:04 history suggested VividD
added a tag
May 9, 2017 at 18:05 review Suggested edits
S May 10, 2017 at 8:04
Apr 21, 2017 at 9:11 answer added Neil Slater timeline score: 1
Apr 21, 2017 at 8:37 history edited Neil Slater CC BY-SA 3.0
added 510 characters in body
Apr 21, 2017 at 8:28 comment added user4218673 @NeilSlater For the second post: Ah, so essentially, what it means is that in each state the selected action will contribute towards an index with probability X. At the end of the 249 days, the final value is calculated from the accumulated index. Why am I doing this? Well, it's just the way the system works: it's an optimization model for a regulatory situation, which implies revising the whole reward every 250 days, based on the performance (which is tied to actions) over the other 249 days.
Apr 21, 2017 at 8:24 comment added user4218673 @NeilSlater First of all, thank you for your replies. For the first section: Tabular method. They are being trained in a simulation; the optimal policy will be achieved once per model. The state space is actually quite large, which is one of my concerns about how many epochs (episodes) I should have: it's a 250*8*7*3000 state-action space (three state components of sizes 250, 8 and 7, with 3000 possible actions per state).
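For a sense of scale, the arithmetic in the comment above can be checked with a short sketch. The dimensions come straight from the comment; the 4-bytes-per-value storage figure is only an assumption for illustration (one 32-bit float per Q-value), and the names `STATE_DIMS`, `N_ACTIONS` and `q_table_size` are hypothetical, not from the original post.

```python
import math

# Sizes quoted in the comment: three state components and the action count.
STATE_DIMS = (250, 8, 7)
N_ACTIONS = 3000

# Assumption (not from the post): each Q-value is stored as a 32-bit float.
BYTES_PER_VALUE = 4


def q_table_size(state_dims, n_actions):
    """Return (number of states, number of state-action entries)."""
    n_states = math.prod(state_dims)
    return n_states, n_states * n_actions


n_states, n_entries = q_table_size(STATE_DIMS, N_ACTIONS)
table_mib = n_entries * BYTES_PER_VALUE / 2**20

print(f"states: {n_states}")
print(f"state-action entries: {n_entries}")
print(f"approx. table size: {table_mib:.1f} MiB")
```

Under these assumptions the table has 14,000 states and 42 million state-action entries, so it fits in memory comfortably. The harder constraint for a tabular method is more likely to be how many episodes are needed to visit each entry often enough.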
Apr 20, 2017 at 16:09 review First posts
Apr 20, 2017 at 23:37
Apr 20, 2017 at 16:08 history asked user4218673 CC BY-SA 3.0