
AWAC doesn't profit from offline data #166

Open
im-Kitsch opened this issue Jul 14, 2022 · 4 comments

Comments

@im-Kitsch

Hi,

@anair13, thanks for releasing the code. Since you seem to answer AWAC questions frequently, I'm tagging you directly.

In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on the MuJoCo Gym environments, it doesn't benefit from pre-training on the offline dataset.

  • HalfCheetah: it learns nothing; the episode returns are almost always below zero.
  • Ant: it reaches nearly expert performance after switching from offline to online, but then it has a huge dip to nearly zero.
  • Walker2d: it also has a dip.
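For context on what the offline phase is supposed to buy: AWAC's policy update is advantage-weighted regression, where logged actions are reweighted by exp(A(s,a)/λ). The sketch below is a minimal, illustrative computation of those weights in plain Python; the temperature value and the exponent clipping are assumptions for the example, not the repo's defaults.

```python
import math

def awac_weights(advantages, lam=1.0):
    """Exponentiated-advantage weights from AWAC's
    advantage-weighted regression update: w_i = exp(A_i / lambda).
    The exponent is clipped here purely for numerical stability,
    a common practical choice rather than part of the paper's math."""
    return [math.exp(min(a / lam, 20.0)) for a in advantages]

# Actions with positive advantage are up-weighted, zero-advantage
# actions get weight 1, and negative-advantage actions are down-weighted.
weights = awac_weights([1.0, 0.0, -1.0], lam=1.0)
print(weights)
```

If pretraining works, these weights should concentrate the policy on the better actions in the offline dataset before any online interaction happens.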

I ran the code in the repo at examples/awac/mujoco/awac1.py with all default settings; pre-training on offline data doesn't seem to help in these experiments. I also found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view), and in that file the learning process also doesn't seem to profit much from the offline learning.

Do I have to change any hyperparameters? It would be really nice if I could reproduce the paper's results.

Looking forward to your reply.

Best.

@Winston-Gu

Met the same problem... In my case, I checked my results in "pretrain_q.csv" and found that the offline training procedure didn't seem to actually happen. I'm looking closely at the source code, and I think the default hyperparameters may need to be altered.
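One quick way to sanity-check whether offline pretraining ran at all is to inspect the logged CSV directly. Below is a minimal, stdlib-only sketch; the file and column names (`pretrain_q.csv`, `qf_loss`) are assumptions based on the log mentioned above, and rlkit's actual headers may differ.

```python
import csv
import io

def column_changed(csv_text, column, tol=1e-8):
    """Return True if the given column's values vary at all --
    i.e. training appears to have progressed -- and False if the
    column is empty or numerically constant."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(r[column]) for r in rows if r[column] != ""]
    return len(values) > 1 and max(values) - min(values) > tol

# Illustrative log: a Q-function loss that never moves suggests the
# pretraining loop did not actually update the networks.
fake_log = "epoch,qf_loss\n0,1.0\n1,1.0\n2,1.0\n"
print(column_changed(fake_log, "qf_loss"))  # constant column -> False
```

To use it on a real run, read the log with `open("pretrain_q.csv").read()` and pass whichever loss column your version of the logger writes.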

@Winston-Gu

This is my result for HalfCheetah; as you noted, "it learned nothing": [attached plot of HalfCheetah returns]

While the result shown in the paper looks like this: [attached plot from the paper]

I noticed that when creating the HalfCheetah-v2 environment, Gym raised a warning that HalfCheetah-v2 is outdated. Is it possible that changes in the environment caused this problem?

@Roberto09

Just wondering, is the general issue that after pretraining the average returns go to zero during the training phase? Or that the model learns nothing during pretraining (i.e. returns are always near 0 during the pretraining phase)?

@linhlpv

linhlpv commented Apr 24, 2023

Hi @Winston-Gu, my question isn't really related to the problem discussed here, and I'm sorry for that. I'm trying to reproduce the AWAC results, and I'm stuck on creating figures like the ones shown in the AWAC paper. It looks like you were able to create similar figures. Could you please help me with that?
Thank you so much, and have a nice day.
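In case it helps with the figures: learning curves in RL papers are typically made by smoothing the per-epoch returns from the progress logs and plotting them against environment steps. Here is a minimal, dependency-free smoothing sketch; the window size is an arbitrary choice for illustration, and the actual AWAC figures may use a different smoother (e.g. an exponential moving average).

```python
def moving_average(values, window=10):
    """Trailing moving average over a list of episode returns;
    shorter prefixes are averaged over whatever points exist so far,
    so the output has the same length as the input."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

returns = [0.0, 2.0, 4.0, 6.0]
print(moving_average(returns, window=2))  # [0.0, 1.0, 3.0, 5.0]
```

The smoothed values can then be plotted with matplotlib against the epoch or step column from the progress CSV to get curves in the style of the paper.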
