curiosity #53
Hi, @zhchaoo
@Curt-Park What do you think? Curiosity is not part of the Rainbow series, so it differs from the concept of this tutorial, but it is still one of the algorithms built on DQN.
This seems like the most important part of this tutorial.
Currently it is not self-contained enough for readers to understand the method from this section alone.
How about adding more information, such as the mathematical formulation or the differences from naive DQN?
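As a starting point for that request, here is a hedged summary of the ICM formulation from the paper cited in the PR description (Pathak et al., ICML 2017). The symbols follow the paper, not the notebook, so the names and any hyperparameter values in the PR may differ.

```latex
\begin{align}
  \hat{a}_t &= g\big(\phi(s_t), \phi(s_{t+1}); \theta_I\big)
    && \text{(inverse model, trained with loss } L_I(\hat{a}_t, a_t)\text{)} \\
  \hat{\phi}(s_{t+1}) &= f\big(\phi(s_t), a_t; \theta_F\big)
    && \text{(forward model)} \\
  L_F &= \tfrac{1}{2}\,\big\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \big\rVert_2^2
    && \text{(forward loss)} \\
  r^i_t &= \tfrac{\eta}{2}\,\big\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \big\rVert_2^2
    && \text{(intrinsic reward)} \\
  r_t &= r^i_t + r^e_t
    && \text{(reward seen by the agent)} \\
  \min_{\theta_P,\theta_I,\theta_F}\;
    & \Big[ -\lambda\, \mathbb{E}_{\pi(s_t;\theta_P)}\Big[\textstyle\sum_t r_t\Big]
      + (1-\beta)\, L_I + \beta\, L_F \Big]
    && \text{(joint objective)}
\end{align}
```

Relative to naive DQN, the only change on the agent side is that the TD target is built from $r_t$ (extrinsic plus curiosity bonus) instead of the environment reward $r^e_t$ alone; the ICM networks are trained alongside the Q-network.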
Line #38. # Forward prediction: predict next state feature, given current state feature and action (one-hot)
Suggestion (Google docstring style):
"""Forward prediction.

Predict next state feature, given current state feature and action (one-hot).
"""
Line #39. pred_s_next = F.relu(self.pred_module1( torch.cat([feature_x, a_vec.float()], dim = -1).detach()))
Suggestion (Black style):
pred_s_next = F.relu(
    self.pred_module1(torch.cat([feature_x, a_vec.float()], dim=-1).detach())
)
Line #59. self.use_extrinsic = use_extrinsic
This needs a description in the docstring.
Line #60. self.intrinsic_scale = intrinsic_scale
This needs a description in the docstring.
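To make the docstring request concrete, here is a minimal sketch of how such parameters could be documented. The function `combine_rewards` is hypothetical and not part of the PR, and the meanings given to `use_extrinsic` and `intrinsic_scale` are assumptions inferred from their names; the notebook's actual semantics should be confirmed by the author.

```python
def combine_rewards(
    extrinsic: float,
    intrinsic: float,
    use_extrinsic: bool = True,
    intrinsic_scale: float = 1.0,
) -> float:
    """Mix the environment reward with the curiosity bonus.

    Args:
        extrinsic: reward returned by the environment.
        intrinsic: curiosity bonus, e.g. the ICM forward-prediction error.
        use_extrinsic: if False, use the curiosity bonus alone
            (assumed meaning, not confirmed by the PR).
        intrinsic_scale: multiplier applied to the curiosity bonus
            (assumed meaning, not confirmed by the PR).

    Returns:
        The scalar reward passed to the DQN update.
    """
    total = intrinsic_scale * intrinsic
    if use_extrinsic:
        total += extrinsic
    return total
```

The same Args-style entries could be added to the agent class docstring for `use_extrinsic`, `intrinsic_scale`, and the `icm` attribute.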
Line #73. self.icm = ICM(obs_dim, action_dim).to(self.device)
This needs a description in the docstring.
Line #201. a_vec = F.one_hot(action, num_classes = self.env.action_space.n).reshape(-1,self.env.action_space.n) # convert action from int to one-hot format
Line is too long. Suggestion (Black style):
# convert action from int to one-hot format
a_vec = F.one_hot(
    action, num_classes=self.env.action_space.n
).reshape(-1, self.env.action_space.n)
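For readers checking the shapes, here is a small standalone sketch of the same conversion. The value `n_actions = 2` and the sample `action` batch are assumptions standing in for `self.env.action_space.n` and the replay-buffer actions in the notebook.

```python
import torch
import torch.nn.functional as F

n_actions = 2  # assumed stand-in for self.env.action_space.n
# Batch of integer actions with shape (3, 1); F.one_hot requires an int64 tensor.
action = torch.tensor([[1], [0], [1]])

# one_hot appends a class dimension, giving (3, 1, 2); reshape flattens it to (3, 2).
a_vec = F.one_hot(action, num_classes=n_actions).reshape(-1, n_actions)
```

The reshape is what guarantees a 2-D `(batch, n_actions)` tensor regardless of whether `action` arrives as shape `(batch,)` or `(batch, 1)`.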
Line #37. def pred(self, feature_x, a_vec):
Type annotations need to be added:
feature_x: torch.Tensor, a_vec: torch.Tensor
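Putting the review suggestions for this method together (type annotations, Google-style docstring, Black formatting), the result might look like the sketch below. The class `ForwardModel`, its layer sizes, and the second layer `pred_module2` are assumptions made so the example runs on its own; only `pred`, `pred_module1`, `feature_x`, and `a_vec` come from the PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ForwardModel(nn.Module):
    """Hypothetical stand-in for the forward half of the PR's ICM class."""

    def __init__(self, feature_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.pred_module1 = nn.Linear(feature_dim + action_dim, hidden_dim)
        # Assumed output layer mapping back to the feature dimension.
        self.pred_module2 = nn.Linear(hidden_dim, feature_dim)

    def pred(self, feature_x: torch.Tensor, a_vec: torch.Tensor) -> torch.Tensor:
        """Forward prediction.

        Predict next state feature, given current state feature and
        action (one-hot).
        """
        pred_s_next = F.relu(
            self.pred_module1(torch.cat([feature_x, a_vec.float()], dim=-1).detach())
        )
        return self.pred_module2(pred_s_next)


# Usage: a batch of 3 feature vectors (dim 4) with one-hot actions (2 actions).
model = ForwardModel(feature_dim=4, action_dim=2)
out = model.pred(torch.zeros(3, 4), torch.tensor([[0, 1], [1, 0], [0, 1]]))
```

As in the original line, `.detach()` stops the forward loss from backpropagating into the feature encoder through this input.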
Experiment with curiosity: use ICM (Intrinsic Curiosity Module) to add intrinsic rewards.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros and Trevor Darrell. Curiosity-driven Exploration by Self-supervised Prediction. In ICML 2017.