How do I know when I've reached an optimum while training? #153
Comments
In the evaluation phase, the opponent is a random player by default. I think a perfect player's winning rate against a random player is about 98% in Tic-Tac-Toe, because a random player sometimes happens to choose the correct actions. Generally speaking, an "optimal" policy cannot be defined in multi-player games, while the maximum-entropy Nash equilibrium is recognized as a representative policy.
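To get a feel for what a win rate against a random opponent can and cannot tell you, here is a minimal sketch that pits an exhaustive minimax player against a uniformly random player in Tic-Tac-Toe. It is not HandyRL code; every name in it (`value`, `perfect_move`, `evaluate`, and so on) is made up for illustration. It does not try to reproduce the 98% figure quoted above, since the exact number depends on how the perfect player breaks ties between equally good moves and on how draws are scored.

```python
import random
from functools import lru_cache

# All eight winning lines on a 3x3 board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def place(board, i, mark):
    """Return a new board with `mark` placed on cell i."""
    return board[:i] + mark + board[i + 1:]


@lru_cache(maxsize=None)
def value(board, to_move):
    """Minimax value for the player to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == to_move else -1
    if ' ' not in board:
        return 0
    other = 'O' if to_move == 'X' else 'X'
    return max(-value(place(board, i, to_move), other)
               for i in range(9) if board[i] == ' ')


def perfect_move(board, me):
    """Pick a move with the best minimax value (first one found on ties)."""
    other = 'O' if me == 'X' else 'X'
    moves = [i for i in range(9) if board[i] == ' ']
    return max(moves, key=lambda i: -value(place(board, i, me), other))


def play(perfect_side, rng):
    """Play one game; return 'win', 'draw' or 'loss' for the perfect player."""
    board, to_move = ' ' * 9, 'X'
    while winner(board) is None and ' ' in board:
        if to_move == perfect_side:
            i = perfect_move(board, to_move)
        else:
            i = rng.choice([j for j in range(9) if board[j] == ' '])
        board = place(board, i, to_move)
        to_move = 'O' if to_move == 'X' else 'X'
    w = winner(board)
    if w is None:
        return 'draw'
    return 'win' if w == perfect_side else 'loss'


def evaluate(n_games=1000, seed=0):
    """Alternate who moves first and count outcomes for the perfect player."""
    rng = random.Random(seed)
    counts = {'win': 0, 'draw': 0, 'loss': 0}
    for g in range(n_games):
        counts[play('X' if g % 2 == 0 else 'O', rng)] += 1
    return counts


if __name__ == "__main__":
    print(evaluate())
```

The one thing this sketch guarantees is that the minimax player never loses. The split between wins and draws is what varies, so a plateau in win rate against a random opponent is a hint that training has saturated rather than proof of an optimum.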
YuriCat, can you add an argument to the train function that lets it evaluate against a different agent? For example, I want to evaluate against my model from 20 epochs ago to see whether it is improving. How can I do this? I only see this supported in evaluate.py, not in train.py.
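Until something like this is built into train.py, the idea can be sketched outside the library. The snippet below is a generic illustration, not HandyRL's API: `SnapshotPool`, `play_game`, `win_rate` and the toy `{"skill": ...}` policy are all hypothetical stand-ins. It keeps a rolling window of past policies and measures the current policy's win rate against the one from `age` epochs earlier.

```python
import copy
import random
from collections import deque


class SnapshotPool:
    """Rolling window of policy snapshots; the oldest one is `max_age` epochs old."""

    def __init__(self, max_age):
        # Holding max_age + 1 entries means index 0 is max_age epochs behind the newest.
        self.snapshots = deque(maxlen=max_age + 1)

    def push(self, policy):
        # Deep copy so later in-place training updates do not alter the snapshot.
        self.snapshots.append(copy.deepcopy(policy))

    def oldest(self):
        return self.snapshots[0]


def play_game(policy_a, policy_b, rng):
    """Toy stand-in for a real match: the higher noisy skill wins.

    Returns 1.0 if policy_a wins and 0.0 otherwise.
    """
    a = policy_a["skill"] + rng.gauss(0.0, 1.0)
    b = policy_b["skill"] + rng.gauss(0.0, 1.0)
    return 1.0 if a > b else 0.0


def win_rate(current, opponent, n_games, rng):
    """Fraction of n_games that `current` wins against `opponent`."""
    return sum(play_game(current, opponent, rng) for _ in range(n_games)) / n_games


def train(epochs=60, age=20, seed=0):
    """Fake training loop that logs the win rate against the policy `age` epochs ago."""
    rng = random.Random(seed)
    pool = SnapshotPool(max_age=age)
    policy = {"skill": 0.0}
    history = []
    for _ in range(epochs):
        policy["skill"] += 0.1  # placeholder for a real training update
        pool.push(policy)
        history.append(win_rate(policy, pool.oldest(), 200, rng))
    return history


if __name__ == "__main__":
    rates = train()
    print(f"win rate vs. the policy 20 epochs ago: {rates[-1]:.2f}")
```

During the first `age` epochs the pool is not yet full, so the opponent is simply the earliest snapshot available. In a real setup, `play_game` would run the actual environment with the two loaded models.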
Thanks for your suggestion. By the way, comparing against the model just before the current one may give an interesting result, since policies trained by RL sometimes get caught in a loop, as in Rock-Scissors-Paper.
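The loop mentioned here is easy to reproduce in a toy setting. The sketch below is only an illustration with made-up helper names. Each new "model" is the pure best response to the previous one, so every model beats its immediate predecessor every time, yet the chain cycles and no model is better than a uniformly random opponent on average.

```python
# Key beats value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def best_response(action):
    """The pure action that beats `action`."""
    return next(a for a in BEATS if BEATS[a] == action)


def training_chain(start, steps):
    """Sequence of 'models', each the best response to the one before it."""
    chain = [start]
    for _ in range(steps):
        chain.append(best_response(chain[-1]))
    return chain


def expected_vs_uniform(action):
    """Expected payoff of a pure action against a uniformly random opponent."""
    return sum(payoff(action, b) for b in BEATS) / len(BEATS)


if __name__ == "__main__":
    chain = training_chain("rock", 6)
    for prev, cur in zip(chain, chain[1:]):
        print(f"{cur:8s} vs previous {prev:8s}: payoff {payoff(cur, prev):+d}, "
              f"vs uniform: {expected_vs_uniform(cur):+.2f}")
```

Judged only against the previous model, this chain looks like steady improvement. The model three steps later is identical to the first, and the model two steps later actually loses to it, which is why an older or fixed reference opponent is a useful second check.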
For example, when training Tic-Tac-Toe, is the optimum reached when win rate == 0.50? So far my win rate is always above 0.50. I haven't used the evaluate function yet, because I feel the win_rate printed after every epoch is already an evaluation?