Potentially wrong reward #237

Max-Fu · 2020-11-21T00:51:12Z

Inside the gym environment, there are two robot speed attributes: self.speed and self.robot_speed. self.robot_speed is set to a constant, while self.speed is the true speed. Yet the reward function uses self.robot_speed instead of self.speed (check this). I think this creates a reward mis-specification problem (i.e., DDPG learns a trivial policy). Can one of the repo creators check whether this is indeed an error? Thanks! (I just restarted my run and will check if this solves the issue.)
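A minimal sketch of the suspected problem, not the actual gym-duckietown code: `ToyEnv`, the reward terms, the weight of 10.0, and the speed values are all hypothetical and only illustrate why a constant speed term gives the agent no incentive to move.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in for the environment (not the real class)."""

    robot_speed: float = 1.2  # constant, never updated (assumed value)
    speed: float = 0.0        # true speed, would be updated every step

    def reward_constant(self, dot_dir: float, dist: float) -> float:
        # Speed term reads the constant attribute, so it ignores real motion.
        return self.robot_speed * dot_dir - 10.0 * abs(dist)

    def reward_true(self, dot_dir: float, dist: float) -> float:
        # Speed term reads the true speed, so moving forward pays off.
        return self.speed * dot_dir - 10.0 * abs(dist)


stopped = ToyEnv(speed=0.0)
moving = ToyEnv(speed=0.8)

# Same alignment (dot_dir) and lane offset (dist) for both agents.
same_reward = stopped.reward_constant(1.0, 0.0) == moving.reward_constant(1.0, 0.0)
moving_wins = moving.reward_true(1.0, 0.0) > stopped.reward_true(1.0, 0.0)
print(same_reward, moving_wins)
```

With the constant term, a stopped agent and a moving agent are rewarded identically in this toy setup, which is consistent with a trivial policy being learned.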

CourchesneA · 2020-11-23T15:50:43Z

@Max-Fu I think there has not been a lot of testing and tuning of that reward function. Please submit a PR if you can improve the current version.

Max-Fu mentioned this issue Nov 21, 2020

Pytorch RL learns bad policy with default paramters #215

Open

MasWag added a commit to MasWag/gym-duckietown that referenced this issue Aug 15, 2024

Use self.speed to compute rewards fixes duckietown#237

12361eb

MasWag added a commit to MasWag/gym-duckietown that referenced this issue Aug 15, 2024

Use self.speed to compute rewards + updated the weight. fixes duckietown#237

8cca5db
