Delayed traces to cope with delay in RL
There are two possible settings:
- Delay affects only the rewards: the consequences of the chosen action are experienced by the agent immediately (the next state already reflects them), but the reward for the current transition is delivered with some delay.
- Delay affects both actions and rewards: if the agent chooses an action at time $t$, the corresponding transition actually happens only at time $t+d$, starting from the state at time $t+d$.
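To make the two settings concrete, here is a minimal sketch of an environment wrapper with a fixed delay $d$. It assumes a hypothetical wrapped environment exposing `reset()` and `step(action) -> (state, reward, done)`; the class name `DelayedEnv`, the `mode` values and the `noop_action` used to fill the first $d$ steps are illustrative choices, not part of the notes.

```python
from collections import deque


class DelayedEnv:
    """Hypothetical wrapper illustrating the two delay settings.

    mode="reward": the transition happens now, its reward arrives d steps later.
    mode="action": the action chosen at time t is executed d steps later, from
    whatever state the environment is in at that moment.
    """

    def __init__(self, env, d, mode="reward", noop_action=0):
        assert mode in ("reward", "action")
        self.env, self.d, self.mode, self.noop = env, d, mode, noop_action
        self.buffer = deque()

    def reset(self):
        self.buffer.clear()
        if self.mode == "reward":
            self.buffer.extend([0.0] * self.d)        # rewards not yet delivered
        else:
            self.buffer.extend([self.noop] * self.d)  # ASSUMPTION: no-ops fill the first d steps
        return self.env.reset()

    def step(self, action):
        if self.mode == "reward":
            # Transition is immediate; only the reward is shifted by d steps.
            state, reward, done = self.env.step(action)
            self.buffer.append(reward)
            return state, self.buffer.popleft(), done
        # The action joins the queue; the one chosen d steps ago is executed now.
        self.buffer.append(action)
        return self.env.step(self.buffer.popleft())
```

In this sketch the last $d$ rewards of an episode are never delivered in reward-delay mode; whether they should be flushed at episode end is a design decision left open here.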
Delayed traces should work like standard eligibility traces, but be triggered after a time corresponding to the delay. So the trace related to the state-action pair
! This may be wrong. Do we want to update
As a start, setting
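The precise update rule above is cut off, so the following is only one possible reading of "triggered after some time": a tabular sketch in which the pair $(s_t, a_t)$ becomes eligible only at step $t+d$, after which it decays by $\gamma\lambda$ as usual. The class `DelayedTraces`, its methods `push`/`update`, and the dict-based tables are hypothetical; `Q` is assumed to be a `defaultdict(float)` keyed by `(state, action)`.

```python
from collections import defaultdict, deque


class DelayedTraces:
    """Hypothetical sketch: tabular eligibility traces in which the pair
    (s_t, a_t) becomes eligible only at step t + d (d = known delay)."""

    def __init__(self, d, gamma=0.99, lam=0.9, replacing=True):
        self.d, self.gamma, self.lam = d, gamma, lam
        self.replacing = replacing
        self.e = defaultdict(float)  # (state, action) -> eligibility
        self.pending = deque()       # pairs chosen but not yet triggered

    def reset(self):
        self.e.clear()
        self.pending.clear()

    def push(self, s, a):
        """Record the pair chosen now and trigger the one chosen d steps ago."""
        self.pending.append((s, a))
        if len(self.pending) > self.d:
            sa = self.pending.popleft()
            if self.replacing:
                self.e[sa] = 1.0   # replacing trace
            else:
                self.e[sa] += 1.0  # accumulating trace

    def update(self, Q, delta, alpha):
        """Q += alpha * delta * e for every eligible pair, then decay e by gamma * lambda."""
        for sa, trace in list(self.e.items()):
            Q[sa] += alpha * delta * trace
            self.e[sa] = trace * self.gamma * self.lam
```

At each step one would call `push(s_t, a_t)`, compute the TD error $\delta_t$ as the agent defines it, then call `update(Q, delta_t, alpha)`. With `d=0` this reduces to ordinary tabular traces; which TD error should be paired with a delayed trigger is exactly the open question flagged in the note above.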
So far, a simple grid world is implemented in which the agent and the goal are initialised at random positions. The actions correspond to the four cardinal directions plus a dummy action with which the agent stays in place. Reward is
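A minimal sketch of such a grid world is given below. The reward specification is cut off in these notes, so `goal_reward` and `step_reward` are placeholder assumptions; the action encoding, the clipping of moves at the border, and returning the goal position as part of the state (needed because the goal is random) are likewise illustrative choices, not the actual implementation.

```python
import random

# Action 0 is the dummy "stay" action; 1-4 move north, south, east, west.
MOVES = {0: (0, 0), 1: (-1, 0), 2: (1, 0), 3: (0, 1), 4: (0, -1)}


class GridWorld:
    """Hypothetical size x size grid with a randomly placed agent and goal."""

    def __init__(self, size=5, goal_reward=1.0, step_reward=0.0, seed=None):
        assert size >= 2, "need at least two cells for distinct agent and goal"
        self.size = size
        # ASSUMPTION: placeholder rewards; the notes do not state the actual values.
        self.goal_reward, self.step_reward = goal_reward, step_reward
        self.rng = random.Random(seed)

    def _random_cell(self):
        return (self.rng.randrange(self.size), self.rng.randrange(self.size))

    def reset(self):
        self.agent = self._random_cell()
        self.goal = self._random_cell()
        while self.goal == self.agent:  # resample so the episode is not already solved
            self.goal = self._random_cell()
        return (self.agent, self.goal)

    def step(self, action):
        dr, dc = MOVES[action]
        # Moves that would leave the grid are clipped to the border.
        row = min(max(self.agent[0] + dr, 0), self.size - 1)
        col = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (row, col)
        done = self.agent == self.goal
        reward = self.goal_reward if done else self.step_reward
        return (self.agent, self.goal), reward, done
```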
Hyperparameters (HPs) are essentially chosen at random.
Surely the code is wrong somewhere, as performance is strictly worse when the agent is given the additional information about the delay.
UPDATE: Fixed replacing traces. Currently
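The update does not say what the fix consisted of. As a reference point only, the sketch below contrasts the usual ways of marking a visited pair: accumulating traces add 1, replacing traces reset the eligibility to 1, and a control variant of replacing traces (Singh and Sutton, 1996) additionally zeroes the other actions of the same state. The function `bump_trace` and the dict-based trace table are hypothetical.

```python
def bump_trace(e, s, a, n_actions, kind="replacing"):
    """Mark (s, a) as eligible in the trace table e, a dict keyed by (state, action)."""
    if kind == "accumulating":
        # Repeated visits add up, so eligibility can exceed 1.
        e[(s, a)] = e.get((s, a), 0.0) + 1.0
    elif kind == "replacing":
        # Repeated visits reset eligibility to 1 instead of adding to it.
        e[(s, a)] = 1.0
    elif kind == "replacing_clear":
        # Control variant: the other actions taken in s lose their eligibility.
        for b in range(n_actions):
            e[(s, b)] = 0.0
        e[(s, a)] = 1.0
    else:
        raise ValueError(f"unknown trace kind: {kind}")
```

With accumulating traces, a pair revisited many times before its reward arrives can build up a large eligibility, which is one plausible reason replacing traces behave better when rewards are delayed; this is a hypothesis, not something these notes establish.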