I wonder if anybody gets better results than I do #32

tsadigovAgmail · 2020-12-16T22:10:18Z

I ran the ch6 example 10 times, then changed the config so that the agent does not learn (that is, it does not update the weights).
The results look very similar: the average is between 35 and 40. I wonder if anybody gets better results than I do. Maybe I am applying the example the wrong way.

praveen-palanisamy · 2020-12-20T19:13:01Z

Maintainer

Hi @tsadigovAgmail!

Chapter 6 is primarily focused on the Deep Q-Learner, so I am assuming you are running the deep_Q_learner.py script. By default, the agent trains in the SeaquestNoFrameskip-v4 Atari Gym environment (you can choose a different environment using the --env flag). Depending on the environment you are using, the reward you observed (35-40) may be very low, low enough that even a randomly acting agent can easily reach it. That is likely why the agent still collected rewards when you configured it not to update the weights.

If you would like to dig more into your observation, please share the parameters and configs you are using so that we can take a closer look.
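
One way to check this is to measure what a purely random agent scores before reading anything into the learner's numbers. Below is a minimal sketch, not code from the book's repository: `StubEnv` and `random_baseline` are hypothetical names, and the stub assumes the classic Gym interface where `step` returns `(observation, reward, done, info)`. With a real Gym environment the random action would typically come from `env.action_space.sample()` instead of the stub's `sample_action`.

```python
import random


class StubEnv:
    """Hypothetical stand-in for a Gym environment (classic 4-tuple step API)."""

    def __init__(self, episode_length=5, n_actions=3):
        self.episode_length = episode_length
        self.n_actions = n_actions
        self._t = 0

    def sample_action(self):
        # Uniformly random action, playing the role of action_space.sample().
        return random.randrange(self.n_actions)

    def reset(self):
        self._t = 0
        return 0  # dummy observation

    def step(self, action):
        self._t += 1
        reward = 1.0  # every step pays 1, whatever the action is
        done = self._t >= self.episode_length
        return 0, reward, done, {}


def random_baseline(env, episodes=10):
    """Average episode reward of an agent that acts uniformly at random."""
    totals = []
    for _ in range(episodes):
        env.reset()
        done, total = False, 0.0
        while not done:
            _, reward, done, _ = env.step(env.sample_action())
            total += reward
        totals.append(total)
    return sum(totals) / len(totals)


print(random_baseline(StubEnv()))  # 5.0
```

If the learning agent's average sits at the same level as this kind of baseline, the agent has probably not learned anything useful yet.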

tsadigovAgmail · 2022-01-07T06:47:53Z

Author

Thanks @praveen-palanisamy
I am back on this endeavour and would love a nudge or some support.

How I got here: I wanted to make sure I was interpreting the chart correctly, so I disabled the learning part to get a baseline.
Now I understand part of my mistake. I saw the graph values going up and thought the model was learning and improving. But it could not be, because I had commented out the weight update. What was actually happening is that the model had some fixed level of performance, and the graph was showing the mean converging to that real value. The shape of the graph, rising fast and then flattening out, is just a result of how the performance value is calculated: it averages all values so far. The starting value is 0, and when there are few samples each additional sample moves the mean toward the real value in bigger steps. As the sample size increases, the weight of each additional sample becomes smaller.

I was wrongly interpreting this as the result of epsilon decreasing and the model stabilizing.
I will try using a moving average to focus on the current performance of the model.
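
To illustrate the difference, here is a minimal sketch on made-up data (not output from the actual agent). The function names `cumulative_mean` and `moving_average` and the window size of 20 are my own choices for illustration. In the cumulative mean the n-th sample carries weight 1/n, so a late change in performance barely shows, while the moving average only looks at recent episodes.

```python
from collections import deque


def cumulative_mean(rewards):
    """Mean of all rewards seen so far, one value per episode."""
    means, total = [], 0.0
    for n, r in enumerate(rewards, start=1):
        total += r
        means.append(total / n)  # the n-th sample carries weight 1/n
    return means


def moving_average(rewards, window=20):
    """Mean of only the most recent `window` rewards, one value per episode."""
    recent = deque(maxlen=window)
    averages = []
    for r in rewards:
        recent.append(r)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical data: performance drops from 40 to 10 halfway through.
rewards = [40.0] * 100 + [10.0] * 100

print(cumulative_mean(rewards)[-1])  # 25.0, still dominated by old episodes
print(moving_average(rewards)[-1])   # 10.0, reflects current performance
```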

tsadigovAgmail · 2022-01-07T06:49:42Z

Author

Do you have a sample chart of what I should see while the model is really learning, and how long can I expect that to take?

tsadigovAgmail · 2022-01-07T07:27:42Z

Author

My current setup:
I tried to use the code unmodified, but learning gets stuck when the agent decides to just do nothing, so I terminate each episode early at step 6000.
I also added a parameter that is used as the name of the experiment/run, and I pass the git commit id so that I know which version of the code produced which results. This also lets me run several instances of the same model in parallel. I also save the model/weights each time learning occurs.
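
The run naming can be sketched roughly like this. It is a simplified illustration rather than my exact code: `current_commit` and `run_name` are hypothetical helper names, and the `dqn` prefix is arbitrary. It asks git for the short hash of HEAD and falls back to a placeholder when git or the repository is not available.

```python
import subprocess


def current_commit(default="unknown"):
    """Return the short hash of HEAD, or `default` when git is unavailable."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not inside a git repository, or git is not installed.
        return default
    return out.decode().strip()


def run_name(prefix="dqn"):
    """Build a run label from a prefix and the current commit id."""
    return f"{prefix}-{current_commit()}"


print(run_name())
```

Using the label as the log directory name keeps the curves from parallel runs separate and traceable to a specific commit.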

The baseline_NOT_learning run shows fixed performance, as expected (I did 4 runs with similar results, not shown in the screenshot).

But I don't understand why all of the learning versions degrade in performance. Do I just need to wait longer? I would like to compare against the results you had.
