Description
The mean rewards are computed by appending the mean of all stored cumulative rewards to the self.tracking_data dictionary:
self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards))
Then, every time the data is meant to be written, the mean of all the values stored in self.tracking_data["Reward / Total reward (mean)"] is written with
self.writer.add_scalar(k, np.mean(v), timestep)
and the tracking data is cleared.
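For context, the bookkeeping described above amounts to roughly the following (a minimal sketch using the attribute names from the text; RewardTracker is a hypothetical stand-in for the agent class, not the verbatim skrl source):

```python
import collections

import numpy as np


class RewardTracker:
    """Minimal sketch of the reward bookkeeping described above (hypothetical)."""

    def __init__(self):
        # cumulative rewards of finished episodes, kept across steps
        self._track_rewards = collections.deque(maxlen=100)
        self.tracking_data = collections.defaultdict(list)

    def record_transition(self, finished_episode_return=None):
        # a finished episode adds its cumulative reward to the storage
        if finished_episode_return is not None:
            self._track_rewards.append(finished_episode_return)
        # a point is appended on *every* step on which the storage is non-empty
        if len(self._track_rewards):
            self.tracking_data["Reward / Total reward (mean)"].append(
                np.mean(self._track_rewards))

    def write_tracking_data(self, writer, timestep):
        # the already-averaged points are averaged a second time at write time
        for k, v in self.tracking_data.items():
            writer.add_scalar(k, np.mean(v), timestep)
        self.tracking_data.clear()
```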
The issue is that a point is appended every time there is data inside self._track_rewards (the cumulative-reward storage). As a result, all the cumulative rewards that have accumulated in that storage since the last write are averaged, appended to the tracking data, and then averaged again on write.
E.g. say each episode is 3 steps, with only 1 env instance running and writing done every 9 steps:
step1: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step2: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step3: Episode finishes with cumulative reward -30: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30]
step4: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30]
step5: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30]
step6: Episode finishes with cumulative reward -4: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17]
step7: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17]
step8: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17]
step9: Episode finishes with cumulative reward -10: self._track_rewards = [-30, -4, -10] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17, -14.67]
At the end of step 9 the mean cumulative reward of the past 3 episodes is -14.(6). The value actually written to TensorBoard is the mean of that list, roughly -22.2, which is VERY DIFFERENT.
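The numbers above can be reproduced with a few lines of standalone Python (a self-contained illustration, independent of skrl):

```python
import numpy as np

# step at which each episode ends -> its cumulative reward
episode_returns = {3: -30, 6: -4, 9: -10}

track_rewards, tracking_data = [], []
for step in range(1, 10):
    if step in episode_returns:
        track_rewards.append(episode_returns[step])
    if track_rewards:  # a point is appended every step once the storage is non-empty
        tracking_data.append(np.mean(track_rewards))

print(np.mean(track_rewards))  # ~ -14.67: true mean over the 3 episodes
print(np.mean(tracking_data))  # ~ -22.24: double-averaged value written to TensorBoard
```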
SOLUTION:
Call self._track_rewards.clear() every time data is added to self.tracking_data["Reward / Total reward (mean)"].
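Applied to the simplified sketch above, the change would look roughly like this (an illustration of where the clear belongs, not a patch against the actual skrl source):

```python
if len(self._track_rewards):
    self.tracking_data["Reward / Total reward (mean)"].append(
        np.mean(self._track_rewards))
    # clear the storage so each episode return contributes to exactly one point
    self._track_rewards.clear()
```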
What skrl version are you using?
1.0.0
What ML framework/library version are you using?
pytorch
Additional system information
No response