diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 000000000..5ecd6dd06 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,18 @@ +# Change Log + +All notable changes to this project are documented in this file. RL-Studio adheres to [Semantic Versioning](http://semver.org/). +## [Unreleased] + +## [1.2.0] - 2022-01-11 +### Added: + +- Added new files in envs/gazebo/f1/models/: the original files are split into modular files such as images.py, reset.py, step.py, settings.py, simplified_perception.py ([PR #122](https://github.com/JdeRobot/RL-Studio/pull/122)) + +### Changed: + +- entry file name: `rl-studio.py` +- config file names: they now follow the naming `config_mode_task_algorithm_agent_simulator` +- file structure in envs/gazebo/f1/models/: files are named `task_algorithm.py` + + + +[1.2.0]: https://github.com/JdeRobot/RL-Studio/pull/122 diff --git a/CODING.md b/CODING.md new file mode 100644 index 000000000..ae48413e4 --- /dev/null +++ b/CODING.md @@ -0,0 +1,51 @@ +# CODING STYLE + +If you are contributing to RL-Studio tool development, please follow the coding style and directory/file structure recommendations below: + +## SOME RULES + +- [PEP 8](https://peps.python.org/pep-0008/) style guide for Python code +- [Black](https://github.com/psf/black) format +- [Pylint](https://pypi.org/project/pylint/) as static code analyser +- Constant variables must be upper-case (e.g. `TIME_THRESHOLD`, `NUMBER_OF_RUNS`) +- Comment all non-trivial functions to ease readability +- All the internal imported packages must be imported from the root of the project (e.g. `import rl_studio.agents` instead of `import agents`) +- Organize imports before pushing your code to the repo + +- When creating a project, please keep in mind: + + - in the **/agents** directory, file names should be `mode_task_algorithm_simulator_framework.py`, e.g. `trainer_followline_ddpg_F1_gazebo_tf.py` or `inferencer_mountaincar_qlearn_openai_pytorch.py`. If no framework is used, leave that part blank. + - in the **/envs/gazebo/f1/models** directory, file names should be `task_algorithm_framework.py`, e.g. `followline_ddpg_gazebo_tf.py` or `followlane_qlearn_pytorch.py`. If no framework is used, leave that part blank. + - As a general rule, **class names** have to follow the convention `ModeTaskAlgorithmAgentSimulatorFramework`, e.g. `TrainerFollowLaneDDPGF1GazeboPytorch` or `InferencerFollowLaneDQNF1GazeboTF` + - in the **/envs/gazebo** directory, class names follow the rule `TaskAlgorithmAgentSimulatorFramework`, e.g. `FollowlineDDPGF1GazeboTF`. + +# Directory architecture + +## Config files + +- in the **/config** directory, add a configuration file with the following format `config_mode_task_algorithm_agent_simulator.yaml`, e.g. `config_training_followlane_qlearn_F1_carla.yaml` + +## Models + +- Add a trained brain in the **/checkpoints** folder. You can configure it in the config.yaml file. The app will automatically add a directory with the format `task_algorithm_agent_simulator_framework` where models are saved. +- The model file should have the format `timestamp_maxreward_epoch_ADDITIONALTEXT.h5` (h5 format), e.g. `09122002_max45678_epoch435_actor_conv2d32x64_critic_conv2d32x64_actionsCont_stateImg_rewardDiscrete.h5`, to indicate the main features of the saved model so the exact model can be found easily.
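+
+As an illustration, a checkpoint name following this convention could be assembled as in the sketch below (a minimal sketch; the reward, epoch and network tags are made-up placeholder values, not produced by any RL-Studio helper):
+
+```bash
+# hypothetical values, used only to show the naming pattern
+TIMESTAMP=$(date +%d%m%Y)
+MAX_REWARD=45678
+EPOCH=435
+echo "${TIMESTAMP}_max${MAX_REWARD}_epoch${EPOCH}_actor_conv2d32x64_critic_conv2d32x64_actionsCont_stateImg_rewardDiscrete.h5"
+```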
+## Metrics + +- Statistics and metrics of models should be saved in the **/metrics** folder. You can configure it in the config.yaml file. The app will automatically add a directory with the format `mode/task_algorithm_agent_simulator_framework/data` where data is saved. + +## Graphics + +- Graphics of models should be saved in the **/metrics** folder. You can configure it in the config.yaml file. The app will automatically add a directory with the format `mode/task_algorithm_agent_simulator_framework/graphics` where graphics are saved. + +## Logs and TensorBoard files + +- TensorBoard and log files should be saved in the **/logs** folder. You can configure it in the config.yaml file. + For TensorBoard, the app will automatically add a directory with the format `mode/task_algorithm_agent_simulator_framework/TensorBoard` (see the example at the end of this file). + + For logs, the app will automatically add a directory with the format `mode/task_algorithm_agent_simulator_framework/logs`. + +# TIPS + +- You can refer to "mountain_car qlearning" project as an example of how to implement real-time monitoring +- You can refer to "cartpole dqn" project as an example of how to implement logging
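+
+For instance, to inspect the TensorBoard files saved under the layout described in the Logs section above, something like the following should work (the run folder is a made-up example; point `--logdir` at your own directory):
+
+```bash
+# hypothetical run directory following the mode/task_algorithm_agent_simulator_framework layout
+tensorboard --logdir ./logs/training/followline_ddpg_f1_gazebo_tf/TensorBoard
+```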
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 000000000..f09ba8427 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,53 @@ +# Contributing to Reinforcement Learning Studio + +Thanks for your interest in contributing! + +This file contains a set of rules for contributing to this project and the +rest of the projects developed by JdeRobot. +If you have any doubts about how to contribute, contact one of the maintainers +of the project. They will be pleased to tell you how you can contribute your +knowledge to the project and the organization! + +* [Code of conduct](#code-of-conduct) +* [Prerequisites before contributing](#prerequisites-before-contributing) +* [How can I contribute?](#how-can-i-contribute) + + + +## Code of conduct +Please report any unacceptable behavior to any of [the maintainers](#i-have-a-question). + + +## Prerequisites before contributing +In order to contribute to JdeRobot projects, please read the project README.md/[webpage](https://github.com/JdeRobot/RL-Studio) carefully before +starting to contribute, to understand the purpose of the project and where you can contribute. + + +## How can I contribute? +Any JdeRobot project follows the same workflow when contributing. + +* **Find a problem or possible improvement for the project:** First of all, check that the feature/bug is not listed in the current [open issues](https://github.com/JdeRobot/RL-Studio/issues). + +* **Create an issue:** [Create an issue](https://github.com/JdeRobot/RL-Studio/issues/new) in the project with the problem/improvement you will +address. In this issue, explain carefully what you will be updating and how these changes will impact the project. + Provide any complementary information to explain it (code samples, screenshots ...). You should include information about: + * Expected behavior + * Actual behavior + * Steps to reproduce + * Environment + * Possible cause + * Possible solution + +The following two points differ depending on the permissions you have on the repo. +* **[If you have write permission] Work in a separate branch always:** Create a new branch with a descriptive name (you can use the issue number as the branch name, "issue_xxx"). Create your commits in that branch, making the needed changes. Please use descriptive commit messages, so everyone can easily understand the changes you made. + +* **[If you only have read permission] Fork the project:** Fork the project. Work on that copy of the repo, making the desired changes. Please use descriptive commit messages, so everyone can easily understand the changes you made. + +* **Open a pull request:** A pull request is compulsory any time a new change is to be added to the core of the project. After solving the issue, create a pull request with your branch. In this pull request, include all the commits made, write a good description of the changes made and refer to the issue solved to make things easier for the maintainers. Include any additional resources that might be interesting (references, screenshots...). Link the PR with the issue. + +* **Testing and merging pull requests** +One of the maintainers will review your code. The reviewer may ask you to modify your pull request. + Please provide a timely response to reviewers (within weeks, not months), otherwise your submission could be postponed or even rejected. + +* **[If you have write permission] Don't accept your own pull requests:** Wait for a project maintainer to accept the changes you made. They will probably comment on the pull request with some feedback and decide whether it can be merged into the master branch. Be proactive and kind! +
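+A minimal sketch of this branch-based workflow (the issue number, commit message and remote below are placeholders only):
+
+```bash
+# create a working branch named after the issue (placeholder number)
+git checkout -b issue_123
+# ...edit the relevant files...
+git add -u                                        # stage the modified files
+git commit -m "Fix lane-detection threshold (refs #123)"
+# push the branch and open a pull request from it on GitHub
+git push -u origin issue_123
+```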
diff --git a/doubts_glossary.md b/FAQ.md similarity index 100% rename from doubts_glossary.md rename to FAQ.md diff --git a/README.md b/README.md index 902fe09b2..a4b794a5e 100644 --- a/README.md +++ b/README.md @@ -4,19 +4,48 @@ ## [![forthebadge](https://forthebadge.com/images/badges/for-robots.svg)](https://forthebadge.com) [![forthebadge](https://forthebadge.com/images/badges/made-with-python.svg)](https://forthebadge.com) -## [![Dependencies Status](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen.svg) ](https://github.com/TezRomacH/python-package-template/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Aapp%2Fdependabot)[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg) ](https://github.com/psf/black) [![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/TezRomacH/python-package-template/blob/master/.pre-commit-config.yaml) [![License](https://img.shields.io/github/license/TezRomacH/python-package-template)](https://github.com/JdeRobot/RL-Studio/blob/main/LICENSE.md) ![](https://img.shields.io/badge/Dependencies-Poetry-blue) +## [![Dependencies Status](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen.svg) ](https://github.com/TezRomacH/python-package-template/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Aapp%2Fdependabot)[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg) ](https://github.com/psf/black) [![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/TezRomacH/python-package-template/blob/master/.pre-commit-config.yaml) [![License](https://img.shields.io/badge/license-GNU-orange)](https://github.com/JdeRobot/RL-Studio/blob/main/LICENSE) -![](https://img.shields.io/badge/Gazebo-11-orange) ![](https://img.shields.io/badge/ROS-Noetic-blue) ![](https://img.shields.io/badge/Python-3.8-yellowInstall) +![](https://img.shields.io/badge/Gazebo-11-orange) ![](https://img.shields.io/badge/ROS-Noetic-blue) ![](https://img.shields.io/badge/Python-3.8-yellow) ![](https://img.shields.io/badge/Carla-0.9.13-yellow) ![](https://img.shields.io/badge/TensorFlow-2.9.11-brightgreen) ![](https://img.shields.io/badge/PyTorch-1.13-yellowgreen) -RL-Studio is a platform for training reinforcement learning algorithms for robots with different environments and algorithms. You can create your agent, environment and algorithm and compare it with others. +Reinforcement Learning Studio, RL-Studio, is a platform for developing robotic applications with reinforcement learning algorithms. Its modular design makes it easy to work with different agents and algorithms on autonomous tasks and with any simulator. -## Installation +# Introduction -### Install ROS +RL-Studio is designed to work with robots, such as autonomous vehicles, on any relevant task with any simulator that provides enough realism to transfer the development to real environments automatically and robustly (sim2real transfer). +The designed agents can use any type of sensor that collects information from the environment and, through reinforcement learning algorithms, send the correct signals to the actuators to adequately command the robot, following the standard reinforcement learning cycle. -RL-Studio works with ROS Noetic. You can [install ROS Noetic in the official documentation](http://wiki.ros.org/noetic/Installation/Ubuntu) and installing ROS Noetic Full Desktop. +## Working Modes + +RL-Studio offers different working modes, all of them necessary to build an RL app: + +- Training: the objective of any development in RL-Studio is to design a training that generates a suitable model for the environment in question (see diagram). +- Retraining: already generated models can be retrained, so that they continue learning in the same or different environments. +- Inference: trained models are tested in different environments in order to validate their learning. + +## Agents + +RL-Studio is designed to work with any robotic agent, mainly in autonomous driving through complex environments. However, thanks to the modularity of the application, it is easy to create new agents to be tested on other tasks, such as manipulation, legged robots, drones and so on. + +## Algorithms + +Qlearn, DQN, DDPG and PPO are currently implemented to work on the different tasks developed. However, it is easy to add any other algorithm. + +## Deep Learning frameworks + +[TensorFlow](https://www.tensorflow.org/) 2.9.11 and [PyTorch](https://pytorch.org/) 1.13 are currently supported, although it is very easy to extend to others. + +## Simulators and ROS + +RL-Studio supports [ROS](http://wiki.ros.org/) Noetic, which is necessary to interact with [Gazebo](https://classic.gazebosim.org/) or to act as a bridge with [Carla](https://carla.readthedocs.io/en/0.9.13/), although Carla can also work without ROS. Canonical reinforcement learning tasks are also available with the OpenAI simulator through the [gymnasium](https://gymnasium.farama.org/) library. + +# Installation + +## Install ROS + +RL-Studio works with ROS Noetic. You can [install ROS Noetic from the official documentation](http://wiki.ros.org/noetic/Installation/Ubuntu) and install ROS Noetic Full Desktop.
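+
+For reference, a condensed sketch of the usual Noetic Full Desktop installation on Ubuntu 20.04 (based on the official guide linked above; follow that documentation for the authoritative, up-to-date commands):
+
+```bash
+# add the ROS package repository and its key
+sudo sh -c 'echo "deb http://packages.ros.org/ros/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/ros-latest.list'
+sudo apt install curl
+curl -s https://raw.githubusercontent.com/ros/rosdistro/master/ros.asc | sudo apt-key add -
+# install ROS Noetic Full Desktop
+sudo apt update
+sudo apt install ros-noetic-desktop-full
+# source the ROS environment in every new shell
+echo "source /opt/ros/noetic/setup.bash" >> ~/.bashrc
+source ~/.bashrc
+```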
### Clone the RL-studio repository @@ -30,46 +59,53 @@ or git clone https://github.com/JdeRobot/RL-Studio.git ``` -### Install dependencies with Poetry (recommended): -```bash -curl -sSL https://install.python-poetry.org | python3 - -export PATH="/root/.local/bin:$PATH" -``` +## Install dependencies using pip: -Install dependencies: +_It is highly recommended to create a virtual environment:_ ```bash -poetry install +cd RL-Studio +pip install -r requirements.txt ``` -### Install dependencies using pip (not recommended): - -_Note: In case you don't want to use Poetry as a dependency manager, you can install it with pip as follows (previously it is highly recommended to create a virtual environment):_ +Add the project to `PYTHONPATH`: ```bash -cd RL-Studio -pip install -r requirements.txt +echo "export PYTHONPATH=$PYTHONPATH:~/PATH/TO/RL-Studio" >> ~/.bashrc +source ~/.bashrc ``` The commits follow the [gitmoji](https://gitmoji.dev/) convention and the code is formatted with [Black](https://black.readthedocs.io/en/stable/). -#### Install rl_studio as package -```bash -cd ~/PATH/TO/RL-Studio/rl_studio -pip install -e . -``` -## Set environments +## Checking everything: set environment -### Set Noetic and Formula 1 agent configuration +### Set ROS Noetic and Formula 1 agent configuration +The fastest way to verify that the installation has been successful is to follow these steps. + +To connect RL-Studio with ROS and Gazebo and the different agents and circuits installed: ```bash cd ~/PATH/TO/RL-Studio/rl_studio/installation bash setup_noetic.bash ``` +> :warning: if the bash file execution gives an error, in some configurations it can be fixed by editing the bash file and replacing the line + +```bash +catkin_make +``` + +with + +```bash +catkin_make -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYTHON_INCLUDE_DIR=/usr/include +``` + +where the paths should point to the actual Python version in your virtual environment (e.g. python3.8). + The installation downloads the CustomRobots repository into the above directory, as follows: ```bash @@ -98,16 +134,14 @@ export GAZEBO_RESOURCE_PATH=$GAZEBO_RESOURCE_PATH:$HOME/PATH/TO/RL-Studio/rl_stu . . . ``` -### Continuing setting Formula1 environment - -To set Formula 1 environment running the following script (the same folder that before): -``` +To set up the Formula 1 environment, run the following script (from the same folder as before): -``` +```bash cd ~/PATH/TO/RL-Studio/rl_studio/installation ./formula1_setup.bash ``` -The following routes will be added to the `.bashrc` file (for `formula1` environment), please check it: +The following paths will be added to the `.bashrc` file: ```bash . . . export GYM_GAZEBO_WORLD_MONTREAL_F1=$HOME/PATH/TO/RL-Studio/rl_studio/installati @@ -119,11 +153,56 @@ export GYM_GAZEBO_WORLD_MONTREAL_F1=$HOME/PATH/TO/RL-Studio/rl_studio/installati There will be as many variables as there are circuits to be executed. In case you want to work with other circuits or agents, it will be necessary to add the correct paths to the variables in the `.bashrc` file in the same way, as shown in the example below.
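+
+For example, adding a hypothetical extra circuit would mean appending one more variable of the same form (the circuit name and world path below are placeholders, not files shipped with RL-Studio):
+
+```bash
+# placeholder circuit name and world path; adjust them to your installation
+export GYM_GAZEBO_WORLD_MYCIRCUIT_F1=$HOME/PATH/TO/RL-Studio/rl_studio/installation/PATH/TO/my_circuit.world
+```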
+And finally, do not forget to add: +```bash +export PYTHONPATH=$PYTHONPATH:PATH/TO/RL-Studio +``` + +## Usage/Examples + To check that everything is working correctly, you can try launching a ROS exercise by typing: -```bash -cd $HOME/PATH/TO/RL-Studio/rl_studio/CustomRobots/f1/launch +```bash +cd /PATH/TO/RL-Studio/rl_studio/CustomRobots/f1/launch roslaunch simple_circuit.launch ``` -And to begin training and inferencing, please go to [README.md](https://github.com/JdeRobot/RL-Studio/blob/main/rl_studio/README.md) +and you should see something similar to this screenshot: + +![](./rl_studio/docs/gazebo_screenshot.png) + + +# Work with RL-Studio + + +For additional information on how to create, run and test reinforcement learning models, how to create a configuration file to launch the application, and how to begin training and inference, please go to [rl-studio](https://github.com/JdeRobot/RL-Studio/blob/main/rl_studio/README.md). + +For information about coding style, naming classes and files, how the directory structure is designed, and where to save models, metrics, logs and graphics, please go to the [coding style file](https://github.com/JdeRobot/RL-Studio/blob/main/CODING.md). + +For FAQ, please go to [answering questions](https://github.com/JdeRobot/RL-Studio/blob/main/FAQ.md). + +# Reference + +A paper about RL-Studio appears in Volume **590** of the **Lecture Notes in Networks and Systems** series of Springer and can be cited with the following BibTeX entry: + +``` +@inproceedings{fernandez2023rl, + title={RL-Studio: A Tool for Reinforcement Learning Methods in Robotics}, + author={Fern{\'a}ndez de Cabo, Pedro and Lucas, Rub{\'e}n and Arranz, Ignacio and Paniego, Sergio and Ca{\~n}as, Jos{\'e} M}, + booktitle={Iberian Robotics conference}, + pages={502--513}, + year={2023}, + organization={Springer} +} +``` +or +```text +Fernández de Cabo, P., Lucas, R., Arranz, I., Paniego, S., & Cañas, J. M. (2023). RL-Studio: A Tool for Reinforcement Learning Methods in Robotics. In Iberian Robotics conference (pp. 502-513). Springer, Cham. +``` +# Contributing + +Contributions are always welcome! + +See [CONTRIBUTING](CONTRIBUTING.md) for ways to get started. + +Please adhere to this project's `code of conduct`. diff --git a/coding_style.md b/coding_style.md deleted file mode 100644 index 669f8d494..000000000 --- a/coding_style.md +++ /dev/null @@ -1,19 +0,0 @@ -# CODING STYLE -If you are contributing to RL-Studio tool development. 
Please, follow the following coding styile recommendations: - -## RULES - -- Constants variable must be upper-case (e.g `TIME_THRESHOLD`, `NUMBER_OF_RUNS`) -- Comment all non trivial functions to ease readability -- When creating a project: - - Add a configuration file in "config" folder with the used configuration for an optimal training/inferencing - with the following format `config___.yaml` - - Add a trained brain in "checkpoints" folder to enable a new developer to run an already trained model -- Please use the "black" formatter tool before pushing your code to the repo -- All the internal imported packages must be imported from the root of the project (e.g `import rl_studio.agents` instead `import agents`) -- Organize imports before pushing your code to the repo - -## TIPS - -- You can refer to "mountain_car qlearning" project as an example of how to implement a real time monitoring -- You can refer to "cartpole dqn" project as an example of how to implement logging \ No newline at end of file diff --git a/poetry.lock b/poetry.lock deleted file mode 100644 index a4136c6e1..000000000 --- a/poetry.lock +++ /dev/null @@ -1,742 +0,0 @@ -[[package]] -name = "appdirs" -version = "1.4.4" -description = "A small Python module for determining appropriate platform-specific dirs, e.g. a \"user data dir\"." -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "atomicwrites" -version = "1.4.0" -description = "Atomic file writes." -category = "dev" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" - -[[package]] -name = "attrs" -version = "21.2.0" -description = "Classes Without Boilerplate" -category = "dev" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" - -[package.extras] -dev = ["coverage[toml] (>=5.0.2)", "furo", "hypothesis", "mypy", "pre-commit", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "six", "sphinx", "sphinx-notfound-page", "zope.interface"] -docs = ["furo", "sphinx", "sphinx-notfound-page", "zope.interface"] -tests = ["coverage[toml] (>=5.0.2)", "hypothesis", "mypy", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "six", "zope.interface"] -tests_no_zope = ["coverage[toml] (>=5.0.2)", "hypothesis", "mypy", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "six"] - -[[package]] -name = "catkin-pkg" -version = "0.4.23" -description = "catkin package library" -category = "main" -optional = false -python-versions = "*" - -[package.dependencies] -docutils = "*" -pyparsing = "*" -python-dateutil = "*" - -[[package]] -name = "certifi" -version = "2021.5.30" -description = "Python package for providing Mozilla's CA Bundle." -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "cfgv" -version = "3.3.0" -description = "Validate configuration and produce human readable error messages." -category = "main" -optional = false -python-versions = ">=3.6.1" - -[[package]] -name = "chardet" -version = "4.0.0" -description = "Universal encoding detector for Python 2 and 3" -category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" - -[[package]] -name = "cloudpickle" -version = "1.6.0" -description = "Extended pickling support for Python objects" -category = "main" -optional = false -python-versions = ">=3.5" - -[[package]] -name = "colorama" -version = "0.4.4" -description = "Cross-platform colored terminal text." 
-category = "dev" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" - -[[package]] -name = "cycler" -version = "0.10.0" -description = "Composable style cycles" -category = "main" -optional = false -python-versions = "*" - -[package.dependencies] -six = "*" - -[[package]] -name = "decorator" -version = "4.4.2" -description = "Decorators for Humans" -category = "main" -optional = false -python-versions = ">=2.6, !=3.0.*, !=3.1.*" - -[[package]] -name = "defusedxml" -version = "0.6.0" -description = "XML bomb protection for Python stdlib modules" -category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" - -[[package]] -name = "distlib" -version = "0.3.2" -description = "Distribution utilities" -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "distro" -version = "1.5.0" -description = "Distro - an OS platform information API" -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "dnspython" -version = "2.2.1" -description = "DNS toolkit" -category = "main" -optional = false -python-versions = ">=3.6,<4.0" - -[package.extras] -curio = ["curio (>=1.2,<2.0)", "sniffio (>=1.1,<2.0)"] -dnssec = ["cryptography (>=2.6,<37.0)"] -doh = ["h2 (>=4.1.0)", "httpx (>=0.21.1)", "requests (>=2.23.0,<3.0.0)", "requests-toolbelt (>=0.9.1,<0.10.0)"] -idna = ["idna (>=2.1,<4.0)"] -trio = ["trio (>=0.14,<0.20)"] -wmi = ["wmi (>=1.5.1,<2.0.0)"] - -[[package]] -name = "docutils" -version = "0.17.1" -description = "Docutils -- Python Documentation Utilities" -category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" - -[[package]] -name = "email-validator" -version = "1.2.1" -description = "A robust email syntax and deliverability validation library." -category = "main" -optional = false -python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,>=2.7" - -[package.dependencies] -dnspython = ">=1.15.0" -idna = ">=2.0.0" - -[[package]] -name = "environs" -version = "9.2.0" -description = "simplified environment variable parsing" -category = "main" -optional = false -python-versions = ">=3.6" - -[package.dependencies] -marshmallow = ">=2.7.0" -python-dotenv = "*" - -[package.extras] -dev = ["dj-database-url", "dj-email-url", "django-cache-url", "flake8 (==3.8.4)", "flake8-bugbear (==20.1.4)", "mypy (==0.790)", "pre-commit (>=2.4,<3.0)", "pytest", "tox"] -django = ["dj-database-url", "dj-email-url", "django-cache-url"] -lint = ["flake8 (==3.8.4)", "flake8-bugbear (==20.1.4)", "mypy (==0.790)", "pre-commit (>=2.4,<3.0)"] -tests = ["dj-database-url", "dj-email-url", "django-cache-url", "pytest"] - -[[package]] -name = "filelock" -version = "3.0.12" -description = "A platform independent file lock." -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "future" -version = "0.18.2" -description = "Clean single-source support for Python 3 and 2" -category = "main" -optional = false -python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" - -[[package]] -name = "gym" -version = "0.17.3" -description = "The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents." 
-category = "main" -optional = false -python-versions = ">=3.5" - -[package.dependencies] -cloudpickle = ">=1.2.0,<1.7.0" -numpy = ">=1.10.4" -pyglet = ">=1.4.0,<=1.5.0" -scipy = "*" - -[package.extras] -all = ["Pillow", "atari_py (>=0.2.0,<0.3.0)", "box2d-py (>=2.3.5,<2.4.0)", "imageio", "imageio", "mujoco_py (>=1.50,<2.0)", "mujoco_py (>=1.50,<2.0)", "opencv-python"] -atari = ["Pillow", "atari_py (>=0.2.0,<0.3.0)", "opencv-python"] -box2d = ["box2d-py (>=2.3.5,<2.4.0)"] -mujoco = ["imageio", "mujoco_py (>=1.50,<2.0)"] -robotics = ["imageio", "mujoco_py (>=1.50,<2.0)"] - -[[package]] -name = "identify" -version = "2.2.11" -description = "File identification library for Python" -category = "main" -optional = false -python-versions = ">=3.6.1" - -[package.extras] -license = ["editdistance-s"] - -[[package]] -name = "idna" -version = "2.10" -description = "Internationalized Domain Names in Applications (IDNA)" -category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" - -[[package]] -name = "imageio" -version = "2.9.0" -description = "Library for reading and writing a wide range of image, video, scientific, and volumetric data formats." -category = "main" -optional = false -python-versions = ">=3.5" - -[package.dependencies] -numpy = "*" -pillow = "*" - -[package.extras] -ffmpeg = ["imageio-ffmpeg"] -fits = ["astropy"] -full = ["astropy", "gdal", "imageio-ffmpeg", "itk"] -gdal = ["gdal"] -itk = ["itk"] - -[[package]] -name = "iniconfig" -version = "1.1.1" -description = "iniconfig: brain-dead simple config-ini parsing" -category = "dev" -optional = false -python-versions = "*" - -[[package]] -name = "kiwisolver" -version = "1.3.1" -description = "A fast implementation of the Cassowary constraint solver" -category = "main" -optional = false -python-versions = ">=3.6" - -[[package]] -name = "marshmallow" -version = "3.12.2" -description = "A lightweight library for converting complex datatypes to and from native Python datatypes." -category = "main" -optional = false -python-versions = ">=3.5" - -[package.extras] -dev = ["flake8 (==3.9.2)", "flake8-bugbear (==21.4.3)", "mypy (==0.910)", "pre-commit (>=2.4,<3.0)", "pytest", "pytz", "simplejson", "tox"] -docs = ["alabaster (==0.7.12)", "autodocsumm (==0.2.6)", "sphinx (==4.0.3)", "sphinx-issues (==1.2.0)", "sphinx-version-warning (==1.1.2)"] -lint = ["flake8 (==3.9.2)", "flake8-bugbear (==21.4.3)", "mypy (==0.910)", "pre-commit (>=2.4,<3.0)"] -tests = ["pytest", "pytz", "simplejson"] - -[[package]] -name = "matplotlib" -version = "3.3.2" -description = "Python plotting package" -category = "main" -optional = false -python-versions = ">=3.6" - -[package.dependencies] -certifi = ">=2020.06.20" -cycler = ">=0.10" -kiwisolver = ">=1.0.1" -numpy = ">=1.15" -pillow = ">=6.2.0" -pyparsing = ">=2.0.3,<2.0.4 || >2.0.4,<2.1.2 || >2.1.2,<2.1.6 || >2.1.6" -python-dateutil = ">=2.1" - -[[package]] -name = "netifaces" -version = "0.10.9" -description = "Portable network interface information." 
-category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "networkx" -version = "2.5.1" -description = "Python package for creating and manipulating graphs and networks" -category = "main" -optional = false -python-versions = ">=3.6" - -[package.dependencies] -decorator = ">=4.3,<5" - -[package.extras] -all = ["lxml", "matplotlib", "numpy", "pandas", "pydot", "pygraphviz", "pytest", "pyyaml", "scipy"] -gdal = ["gdal"] -lxml = ["lxml"] -matplotlib = ["matplotlib"] -numpy = ["numpy"] -pandas = ["pandas"] -pydot = ["pydot"] -pygraphviz = ["pygraphviz"] -pytest = ["pytest"] -pyyaml = ["pyyaml"] -scipy = ["scipy"] - -[[package]] -name = "nodeenv" -version = "1.6.0" -description = "Node.js virtual environment builder" -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "numpy" -version = "1.21.0" -description = "NumPy is the fundamental package for array computing with Python." -category = "main" -optional = false -python-versions = ">=3.7" - -[[package]] -name = "opencv-python" -version = "4.2.0.32" -description = "Wrapper package for OpenCV python bindings." -category = "main" -optional = false -python-versions = "*" - -[package.dependencies] -numpy = ">=1.11.1" - -[[package]] -name = "packaging" -version = "21.0" -description = "Core utilities for Python packages" -category = "dev" -optional = false -python-versions = ">=3.6" - -[package.dependencies] -pyparsing = ">=2.0.2" - -[[package]] -name = "pandas" -version = "1.4.4" -description = "Powerful data structures for data analysis, time series, and statistics" -category = "main" -optional = false -python-versions = ">=3.8" - -[package.dependencies] -numpy = [ - {version = ">=1.18.5", markers = "platform_machine != \"aarch64\" and platform_machine != \"arm64\" and python_version < \"3.10\""}, - {version = ">=1.19.2", markers = "platform_machine == \"aarch64\" and python_version < \"3.10\""}, - {version = ">=1.20.0", markers = "platform_machine == \"arm64\" and python_version < \"3.10\""}, - {version = ">=1.21.0", markers = "python_version >= \"3.10\""}, -] -python-dateutil = ">=2.8.1" -pytz = ">=2020.1" - -[package.extras] -test = ["hypothesis (>=5.5.3)", "pytest (>=6.0)", "pytest-xdist (>=1.31)"] - -[[package]] -name = "pillow" -version = "8.3.1" -description = "Python Imaging Library (Fork)" -category = "main" -optional = false -python-versions = ">=3.6" - -[[package]] -name = "pluggy" -version = "0.13.1" -description = "plugin and hook calling mechanisms for python" -category = "dev" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" - -[package.extras] -dev = ["pre-commit", "tox"] - -[[package]] -name = "pre-commit" -version = "2.13.0" -description = "A framework for managing and maintaining multi-language pre-commit hooks." 
-category = "main" -optional = false -python-versions = ">=3.6.1" - -[package.dependencies] -cfgv = ">=2.0.0" -identify = ">=1.0.0" -nodeenv = ">=0.11.1" -pyyaml = ">=5.1" -toml = "*" -virtualenv = ">=20.0.8" - -[[package]] -name = "py" -version = "1.10.0" -description = "library with cross-python path, ini-parsing, io, code, log facilities" -category = "dev" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" - -[[package]] -name = "pydantic" -version = "1.10.2" -description = "Data validation and settings management using python type hints" -category = "main" -optional = false -python-versions = ">=3.7" - -[package.dependencies] -email-validator = {version = ">=1.0.3", optional = true, markers = "extra == \"email\""} -typing-extensions = ">=4.1.0" - -[package.extras] -dotenv = ["python-dotenv (>=0.10.4)"] -email = ["email-validator (>=1.0.3)"] - -[[package]] -name = "pygame" -version = "2.1.2" -description = "Python Game Development" -category = "main" -optional = false -python-versions = ">=3.6" - -[[package]] -name = "pyglet" -version = "1.5.0" -description = "Cross-platform windowing and multimedia library" -category = "main" -optional = false -python-versions = "*" - -[package.dependencies] -future = "*" - -[[package]] -name = "pyparsing" -version = "2.4.7" -description = "Python parsing module" -category = "main" -optional = false -python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" - -[[package]] -name = "pytest" -version = "6.2.4" -description = "pytest: simple powerful testing with Python" -category = "dev" -optional = false -python-versions = ">=3.6" - -[package.dependencies] -atomicwrites = {version = ">=1.0", markers = "sys_platform == \"win32\""} -attrs = ">=19.2.0" -colorama = {version = "*", markers = "sys_platform == \"win32\""} -iniconfig = "*" -packaging = "*" -pluggy = ">=0.12,<1.0.0a1" -py = ">=1.8.2" -toml = "*" - -[package.extras] -testing = ["argcomplete", "hypothesis (>=3.56)", "mock", "nose", "requests", "xmlschema"] - -[[package]] -name = "python-dateutil" -version = "2.8.1" -description = "Extensions to the standard Python datetime module" -category = "main" -optional = false -python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,>=2.7" - -[package.dependencies] -six = ">=1.5" - -[[package]] -name = "python-dotenv" -version = "0.18.0" -description = "Read key-value pairs from a .env file and set them as environment variables" -category = "main" -optional = false -python-versions = "*" - -[package.extras] -cli = ["click (>=5.0)"] - -[[package]] -name = "pytz" -version = "2022.2.1" -description = "World timezone definitions, modern and historical" -category = "main" -optional = false -python-versions = "*" - -[[package]] -name = "pywavelets" -version = "1.1.1" -description = "PyWavelets, wavelet transform module" -category = "main" -optional = false -python-versions = ">=3.5" - -[package.dependencies] -numpy = ">=1.13.3" - -[[package]] -name = "pyyaml" -version = "5.4.1" -description = "YAML parser and emitter for Python" -category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*" - -[[package]] -name = "requests" -version = "2.25.1" -description = "Python HTTP for Humans." 
-category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" - -[package.dependencies] -certifi = ">=2017.4.17" -chardet = ">=3.0.2,<5" -idna = ">=2.5,<3" -urllib3 = ">=1.21.1,<1.27" - -[package.extras] -security = ["cryptography (>=1.3.4)", "pyOpenSSL (>=0.14)"] -socks = ["PySocks (>=1.5.6,!=1.5.7)", "win-inet-pton"] - -[[package]] -name = "rospkg" -version = "1.2.8" -description = "ROS package library" -category = "main" -optional = false -python-versions = "*" - -[package.dependencies] -catkin-pkg = "*" -distro = "*" -PyYAML = "*" - -[[package]] -name = "scikit-image" -version = "0.17.2" -description = "Image processing in Python" -category = "main" -optional = false -python-versions = ">=3.6" - -[package.dependencies] -imageio = ">=2.3.0" -matplotlib = ">=2.0.0,<3.0.0 || >3.0.0" -networkx = ">=2.0" -numpy = ">=1.15.1" -pillow = ">=4.3.0,<7.1.0 || >7.1.0,<7.1.1 || >7.1.1" -PyWavelets = ">=1.1.1" -scipy = ">=1.0.1" -tifffile = ">=2019.7.26" - -[package.extras] -docs = ["cloudpickle (>=0.2.1)", "dask[array] (>=0.15.0)", "matplotlib (>=3.0.1)", "numpydoc (>=0.9)", "pandas (>=0.23.0)", "pooch (>=0.5.2)", "pytest-runner", "scikit-learn", "seaborn (>=0.7.1)", "sphinx (>=1.8,<=2.4.4)", "sphinx-copybutton", "sphinx-gallery (>=0.3.1)"] -optional = ["SimpleITK", "astropy (>=1.2.0)", "cloudpickle (>=0.2.1)", "dask[array] (>=0.15.0)", "pooch (>=0.5.2)", "pyamg", "qtpy"] -test = ["codecov", "flake8", "pytest (!=3.7.3)", "pytest-cov", "pytest-localserver"] - -[[package]] -name = "scipy" -version = "1.6.1" -description = "SciPy: Scientific Library for Python" -category = "main" -optional = false -python-versions = ">=3.7" - -[package.dependencies] -numpy = ">=1.16.5" - -[[package]] -name = "six" -version = "1.14.0" -description = "Python 2 and 3 compatibility utilities" -category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" - -[[package]] -name = "tifffile" -version = "2021.7.2" -description = "Read and write TIFF files" -category = "main" -optional = false -python-versions = ">=3.7" - -[package.dependencies] -numpy = ">=1.15.1" - -[package.extras] -all = ["imagecodecs (>=2021.4.28)", "lxml", "matplotlib (>=3.2)"] - -[[package]] -name = "toml" -version = "0.10.2" -description = "Python Library for Tom's Obvious, Minimal Language" -category = "main" -optional = false -python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" - -[[package]] -name = "typing-extensions" -version = "4.3.0" -description = "Backported and Experimental Type Hints for Python 3.7+" -category = "main" -optional = false -python-versions = ">=3.7" - -[[package]] -name = "urllib3" -version = "1.26.6" -description = "HTTP library with thread-safe connection pooling, file post, and more." 
-category = "main" -optional = false -python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4" - -[package.extras] -brotli = ["brotlipy (>=0.6.0)"] -secure = ["certifi", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "ipaddress", "pyOpenSSL (>=0.14)"] -socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"] - -[[package]] -name = "virtualenv" -version = "20.4.7" -description = "Virtual Python Environment builder" -category = "main" -optional = false -python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,>=2.7" - -[package.dependencies] -appdirs = ">=1.4.3,<2" -distlib = ">=0.3.1,<1" -filelock = ">=3.0.0,<4" -six = ">=1.9.0,<2" - -[package.extras] -docs = ["proselint (>=0.10.2)", "sphinx (>=3)", "sphinx-argparse (>=0.2.5)", "sphinx-rtd-theme (>=0.4.3)", "towncrier (>=19.9.0rc1)"] -testing = ["coverage (>=4)", "coverage-enable-subprocess (>=1)", "flaky (>=3)", "packaging (>=20.0)", "pytest (>=4)", "pytest-env (>=0.6.2)", "pytest-freezegun (>=0.4.1)", "pytest-mock (>=2)", "pytest-randomly (>=1)", "pytest-timeout (>=1)", "xonsh (>=0.9.16)"] - -[metadata] -lock-version = "1.1" -python-versions = "^3.8" -content-hash = "ec9bc779d18c367b0a9c67c32b018503c0d27dc99625dc09c3e5f80737b567d2" - -[metadata.files] -appdirs = [] -atomicwrites = [] -attrs = [] -catkin-pkg = [] -certifi = [] -cfgv = [] -chardet = [] -cloudpickle = [] -colorama = [] -cycler = [] -decorator = [] -defusedxml = [] -distlib = [] -distro = [] -docutils = [] -environs = [] -filelock = [] -future = [] -gym = [] -identify = [] -idna = [] -imageio = [] -iniconfig = [] -kiwisolver = [] -marshmallow = [] -matplotlib = [] -netifaces = [] -networkx = [] -nodeenv = [] -numpy = [] -opencv-python = [] -packaging = [] -pillow = [] -pluggy = [] -pre-commit = [] -py = [] -pygame = [] -pyglet = [] -pyparsing = [] -pytest = [] -python-dateutil = [] -python-dotenv = [] -pywavelets = [] -pyyaml = [] -requests = [] -rospkg = [] -scikit-image = [] -scipy = [] -six = [] -tifffile = [] -toml = [] -urllib3 = [] -virtualenv = [] diff --git a/pyproject.toml b/pyproject.toml deleted file mode 100644 index bc4bc5737..000000000 --- a/pyproject.toml +++ /dev/null @@ -1,33 +0,0 @@ -[tool.poetry] -name = "rl-studio" -version = "0.1.0" -description = "Platform for training reinforcement-learning algorithms for robots" -authors = ["NachoAz "] -readme = 'README.md' -repository = "https://github.com/JdeRobot/RL-Studio" -keywords = ["python", "reinforcement-learning", "artificial-intelligence", "jderobot"] -# Avoid publish: https://twitter.com/pypi/status/1097241506304454656 -classifiers = ["Private :: Do Not Upload"] - -[tool.poetry.dependencies] -python = "^3.8" -catkin-pkg = "0.4.23" -rospkg = "1.2.8" -matplotlib = "3.3.2" -defusedxml = "0.6.0" -opencv-python = "4.2.0.32" -scikit-image = "0.17.2" -netifaces = "0.10.9" -numpy = "1.21.0" -gym = "0.17.3" -requests = "^2.25.1" -six = "1.14.0" -pyglet = "^1.2.0" -environs = "9.2.0" -pre-commit = "^2.12.1" -pygame = "^2.1.2" -pydantic = {extras = ["email"], version = "^1.10.2"} -pandas = "^1.4.4" - -[tool.poetry.dev-dependencies] -pytest = "^6.2.4" diff --git a/requirements.txt b/requirements.txt index 3012a8962..d4568c4a3 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,363 +1,185 @@ -appdirs==1.4.4; python_full_version >= "3.6.1" \ - --hash=sha256:a841dacd6b99318a741b166adb07e19ee71a274450e68237b4650ca1055ab128 \ - --hash=sha256:7d5d0167b2b1ba821647616af46a749d1c653740dd0d2415100fe26e27afdf41 -catkin-pkg==0.4.23 \ - --hash=sha256:fbfb107e7e7f3167175b6a68bd51eee7d5a85b2e18c4dbee96d715178f029d8c \ - 
--hash=sha256:28ee181cca827c0aabf9397351f58a97e1475ca5ac7c106a5916e3ee191cd3d0 -certifi==2021.5.30; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version >= "3.6" \ - --hash=sha256:50b1e4f8446b06f41be7dd6338db18e0990601dce795c2b1686458aa7e8fa7d8 \ - --hash=sha256:2bbf76fd432960138b3ef6dda3dde0544f27cbf8546c458e60baf371917ba9ee -cfgv==3.3.0; python_full_version >= "3.6.1" \ - --hash=sha256:b449c9c6118fe8cca7fa5e00b9ec60ba08145d281d52164230a69211c5d597a1 \ - --hash=sha256:9e600479b3b99e8af981ecdfc80a0296104ee610cab48a5ae4ffd0b668650eb1 -chardet==4.0.0; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" \ - --hash=sha256:f864054d66fd9118f2e67044ac8981a54775ec5b67aed0441892edb553d21da5 \ - --hash=sha256:0d6f53a15db4120f2b08c94f11e7d93d2c911ee118b6b30a04ec3ee8310179fa -cloudpickle==1.6.0; python_version >= "3.5" \ - --hash=sha256:3a32d0eb0bc6f4d0c57fbc4f3e3780f7a81e6fee0fa935072884d58ae8e1cc7c \ - --hash=sha256:9bc994f9e9447593bd0a45371f0e7ac7333710fcf64a4eb9834bf149f4ef2f32 -cycler==0.10.0; python_version >= "3.6" \ - --hash=sha256:1d8a5ae1ff6c5cf9b93e8811e581232ad8920aeec647c37316ceac982b08cb2d \ - --hash=sha256:cd7b2d1018258d7247a71425e9f26463dfb444d411c39569972f4ce586b0c9d8 -decorator==4.4.2; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.2.0" and python_version >= "3.6" \ - --hash=sha256:41fa54c2a0cc4ba648be4fd43cff00aedf5b9465c9bf18d64325bc225f08f760 \ - --hash=sha256:e3a62f0520172440ca0dcc823749319382e377f37f140a0b99ef45fecb84bfe7 -defusedxml==0.6.0; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0") \ - --hash=sha256:6687150770438374ab581bb7a1b327a847dd9c5749e396102de3fad4e8a3ef93 \ - --hash=sha256:f684034d135af4c6cbb949b8a4d2ed61634515257a67299e5f940fbaa34377f5 -distlib==0.3.2; python_full_version >= "3.6.1" \ - --hash=sha256:23e223426b28491b1ced97dc3bbe183027419dfc7982b4fa2f05d5f3ff10711c \ - --hash=sha256:106fef6dc37dd8c0e2c0a60d3fca3e77460a48907f335fa28420463a6f799736 -distro==1.5.0 \ - --hash=sha256:df74eed763e18d10d0da624258524ae80486432cd17392d9c3d96f5e83cd2799 \ - --hash=sha256:0e58756ae38fbd8fc3020d54badb8eae17c5b9dcbed388b17bb55b8a5928df92 -docutils==0.17.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" \ - --hash=sha256:cf316c8370a737a022b72b56874f6602acf974a37a9fba42ec2876387549fc61 \ - --hash=sha256:686577d2e4c32380bb50cbb22f575ed742d58168cee37e99117a854bcd88f125 -environs==9.2.0; python_version >= "3.6" \ - --hash=sha256:10dca340bff9c912e99d237905909390365e32723c2785a9f3afa6ef426c53bc \ - --hash=sha256:36081033ab34a725c2414f48ee7ec7f7c57e498d8c9255d61fbc7f2d4bf60865 -filelock==3.0.12; python_full_version >= "3.6.1" \ - --hash=sha256:929b7d63ec5b7d6b71b0fa5ac14e030b3f70b75747cef1b10da9b879fef15836 \ - --hash=sha256:18d82244ee114f543149c66a6e0c14e9c4f8a1044b5cdaadd0f82159d6a6ff59 -future==0.18.2; python_version >= "3.5" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.5" \ - --hash=sha256:b1bead90b70cf6ec3f0710ae53a525360fa360d306a86583adc6bf83a4db537d -gym==0.17.3; python_version >= "3.5" \ - --hash=sha256:96a7dd4e9cdb39e30c7a79e5773570fd9408f7fdb58c714c293cfbb314818eb6 -identify==2.2.11; python_full_version >= "3.6.1" \ - --hash=sha256:7abaecbb414e385752e8ce02d8c494f4fbc780c975074b46172598a28f1ab839 \ - --hash=sha256:a0e700637abcbd1caae58e0463861250095dfe330a8371733a471af706a4a29a -idna==2.10; 
python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" \ - --hash=sha256:b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0 \ - --hash=sha256:b307872f855b18632ce0c21c5e45be78c0ea7ae4c15c828c20788b26921eb3f6 -imageio==2.9.0; python_version >= "3.6" \ - --hash=sha256:3604d751f03002e8e0e7650aa71d8d9148144a87daf17cb1f3228e80747f2e6b \ - --hash=sha256:52ddbaeca2dccf53ba2d6dec5676ca7bc3b2403ef8b37f7da78b7654bb3e10f0 -kiwisolver==1.3.1; python_version >= "3.6" \ - --hash=sha256:fd34fbbfbc40628200730bc1febe30631347103fc8d3d4fa012c21ab9c11eca9 \ - --hash=sha256:d3155d828dec1d43283bd24d3d3e0d9c7c350cdfcc0bd06c0ad1209c1bbc36d0 \ - --hash=sha256:5a7a7dbff17e66fac9142ae2ecafb719393aaee6a3768c9de2fd425c63b53e21 \ - --hash=sha256:f8d6f8db88049a699817fd9178782867bf22283e3813064302ac59f61d95be05 \ - --hash=sha256:5f6ccd3dd0b9739edcf407514016108e2280769c73a85b9e59aa390046dbf08b \ - --hash=sha256:225e2e18f271e0ed8157d7f4518ffbf99b9450fca398d561eb5c4a87d0986dd9 \ - --hash=sha256:cf8b574c7b9aa060c62116d4181f3a1a4e821b2ec5cbfe3775809474113748d4 \ - --hash=sha256:232c9e11fd7ac3a470d65cd67e4359eee155ec57e822e5220322d7b2ac84fbf0 \ - --hash=sha256:b38694dcdac990a743aa654037ff1188c7a9801ac3ccc548d3341014bc5ca278 \ - --hash=sha256:ca3820eb7f7faf7f0aa88de0e54681bddcb46e485beb844fcecbcd1c8bd01689 \ - --hash=sha256:c8fd0f1ae9d92b42854b2979024d7597685ce4ada367172ed7c09edf2cef9cb8 \ - --hash=sha256:1e1bc12fb773a7b2ffdeb8380609f4f8064777877b2225dec3da711b421fda31 \ - --hash=sha256:72c99e39d005b793fb7d3d4e660aed6b6281b502e8c1eaf8ee8346023c8e03bc \ - --hash=sha256:8be8d84b7d4f2ba4ffff3665bcd0211318aa632395a1a41553250484a871d454 \ - --hash=sha256:31dfd2ac56edc0ff9ac295193eeaea1c0c923c0355bf948fbd99ed6018010b72 \ - --hash=sha256:563c649cfdef27d081c84e72a03b48ea9408c16657500c312575ae9d9f7bc1c3 \ - --hash=sha256:78751b33595f7f9511952e7e60ce858c6d64db2e062afb325985ddbd34b5c131 \ - --hash=sha256:a357fd4f15ee49b4a98b44ec23a34a95f1e00292a139d6015c11f55774ef10de \ - --hash=sha256:5989db3b3b34b76c09253deeaf7fbc2707616f130e166996606c284395da3f18 \ - --hash=sha256:c08e95114951dc2090c4a630c2385bef681cacf12636fb0241accdc6b303fd81 \ - --hash=sha256:44a62e24d9b01ba94ae7a4a6c3fb215dc4af1dde817e7498d901e229aaf50e4e \ - --hash=sha256:50af681a36b2a1dee1d3c169ade9fdc59207d3c31e522519181e12f1b3ba7000 \ - --hash=sha256:a53d27d0c2a0ebd07e395e56a1fbdf75ffedc4a05943daf472af163413ce9598 \ - --hash=sha256:834ee27348c4aefc20b479335fd422a2c69db55f7d9ab61721ac8cd83eb78882 \ - --hash=sha256:5c3e6455341008a054cccee8c5d24481bcfe1acdbc9add30aa95798e95c65621 \ - --hash=sha256:acef3d59d47dd85ecf909c359d0fd2c81ed33bdff70216d3956b463e12c38a54 \ - --hash=sha256:c5518d51a0735b1e6cee1fdce66359f8d2b59c3ca85dc2b0813a8aa86818a030 \ - --hash=sha256:b9edd0110a77fc321ab090aaa1cfcaba1d8499850a12848b81be2222eab648f6 \ - --hash=sha256:0cd53f403202159b44528498de18f9285b04482bab2a6fc3f5dd8dbb9352e30d \ - --hash=sha256:33449715e0101e4d34f64990352bce4095c8bf13bed1b390773fc0a7295967b3 \ - --hash=sha256:401a2e9afa8588589775fe34fc22d918ae839aaaf0c0e96441c0fdbce6d8ebe6 \ - --hash=sha256:950a199911a8d94683a6b10321f9345d5a3a8433ec58b217ace979e18f16e248 -marshmallow==3.12.2; python_version >= "3.6" \ - --hash=sha256:d4090ca9a36cd129126ad8b10c3982c47d4644a6e3ccb20534b512badce95f35 \ - --hash=sha256:77368dfedad93c3a041cbbdbce0b33fac1d8608c9e2e2288408a43ce3493d2ff -matplotlib==3.3.2; python_version >= "3.6" \ - --hash=sha256:27f9de4784ae6fb97679556c5542cf36c0751dccb4d6407f7c62517fa2078868 \ - 
--hash=sha256:06866c138d81a593b535d037b2727bec9b0818cadfe6a81f6ec5715b8dd38a89 \ - --hash=sha256:5ccecb5f78b51b885f0028b646786889f49c54883e554fca41a2a05998063f23 \ - --hash=sha256:69cf76d673682140f46c6cb5e073332c1f1b2853c748dc1cb04f7d00023567f7 \ - --hash=sha256:371518c769d84af8ec9b7dcb871ac44f7a67ef126dd3a15c88c25458e6b6d205 \ - --hash=sha256:793e061054662aa27acaff9201cdd510a698541c6e8659eeceb31d66c16facc6 \ - --hash=sha256:16b241c3d17be786966495229714de37de04472da472277869b8d5b456a8df00 \ - --hash=sha256:3fb0409754b26f48045bacd6818e44e38ca9338089f8ba689e2f9344ff2847c7 \ - --hash=sha256:548cfe81476dbac44db96e9c0b074b6fb333b4d1f12b1ae68dbed47e45166384 \ - --hash=sha256:f0268613073df055bcc6a490de733012f2cf4fe191c1adb74e41cec8add1a165 \ - --hash=sha256:57be9e21073fc367237b03ecac0d9e4b8ddbe38e86ec4a316857d8d93ac9286c \ - --hash=sha256:be2f0ec62e0939a9dcfd3638c140c5a74fc929ee3fd1f31408ab8633db6e1523 \ - --hash=sha256:c5d0c2ae3e3ed4e9f46b7c03b40d443601012ffe8eb8dfbb2bd6b2d00509f797 \ - --hash=sha256:a522de31e07ed7d6f954cda3fbd5ca4b8edbfc592a821a7b00291be6f843292e \ - --hash=sha256:8bc1d3284dee001f41ec98f59675f4d723683e1cc082830b440b5f081d8e0ade \ - --hash=sha256:799c421bc245a0749c1515b6dea6dc02db0a8c1f42446a0f03b3b82a60a900dc \ - --hash=sha256:2f5eefc17dc2a71318d5a3496313be5c351c0731e8c4c6182c9ac3782cfc4076 \ - --hash=sha256:3d2edbf59367f03cd9daf42939ca06383a7d7803e3993eb5ff1bee8e8a3fbb6b -netifaces==0.10.9 \ - --hash=sha256:b2ff3a0a4f991d2da5376efd3365064a43909877e9fabfa801df970771161d29 \ - --hash=sha256:0c4304c6d5b33fbd9b20fdc369f3a2fef1a8bbacfb6fd05b9708db01333e9e7b \ - --hash=sha256:7a25a8e28281504f0e23e181d7a9ed699c72f061ca6bdfcd96c423c2a89e75fc \ - --hash=sha256:6d84e50ec28e5d766c9911dce945412dc5b1ce760757c224c71e1a9759fa80c2 \ - --hash=sha256:f911b7f0083d445c8d24cfa5b42ad4996e33250400492080f5018a28c026db2b \ - --hash=sha256:4921ed406386246b84465950d15a4f63480c1458b0979c272364054b29d73084 \ - --hash=sha256:5b3167f923f67924b356c1338eb9ba275b2ba8d64c7c2c47cf5b5db49d574994 \ - --hash=sha256:db881478f1170c6dd524175ba1c83b99d3a6f992a35eca756de0ddc4690a1940 \ - --hash=sha256:f0427755c68571df37dc58835e53a4307884a48dec76f3c01e33eb0d4a3a81d7 \ - --hash=sha256:7cc6fd1eca65be588f001005446a47981cbe0b2909f5be8feafef3bf351a4e24 \ - --hash=sha256:b47e8f9ff6846756be3dc3fb242ca8e86752cd35a08e06d54ffc2e2a2aca70ea \ - --hash=sha256:f8885cc48c8c7ad51f36c175e462840f163cb4687eeb6c6d7dfaf7197308e36b \ - --hash=sha256:755050799b5d5aedb1396046f270abfc4befca9ccba3074f3dbbb3cb34f13aae \ - --hash=sha256:ad10acab2ef691eb29a1cc52c3be5ad1423700e993cc035066049fa72999d0dc \ - --hash=sha256:563a1a366ee0fb3d96caab79b7ac7abd2c0a0577b157cc5a40301373a0501f89 \ - --hash=sha256:30ed89ab8aff715caf9a9d827aa69cd02ad9f6b1896fd3fb4beb998466ed9a3c \ - --hash=sha256:75d3a4ec5035db7478520ac547f7c176e9fd438269e795819b67223c486e5cbe \ - --hash=sha256:078986caf4d6a602a4257d3686afe4544ea74362b8928e9f4389b5cd262bc215 \ - --hash=sha256:3095218b66d359092b82f07c5422293c2f6559cf8d36b96b379cc4cdc26eeffa \ - --hash=sha256:da298241d87bcf468aa0f0705ba14572ad296f24c4fda5055d6988701d6fd8e1 \ - --hash=sha256:86b8a140e891bb23c8b9cb1804f1475eb13eea3dbbebef01fcbbf10fbafbee42 \ - --hash=sha256:2dee9ffdd16292878336a58d04a20f0ffe95555465fee7c9bd23b3490ef2abf3 -networkx==2.5.1; python_version >= "3.6" \ - --hash=sha256:0635858ed7e989f4c574c2328380b452df892ae85084144c73d8cd819f0c4e06 \ - --hash=sha256:109cd585cac41297f71103c3c42ac6ef7379f29788eb54cb751be5a663bb235a -nodeenv==1.6.0; python_full_version >= "3.6.1" \ - 
--hash=sha256:621e6b7076565ddcacd2db0294c0381e01fd28945ab36bcf00f41c5daf63bef7 \ - --hash=sha256:3ef13ff90291ba2a4a7a4ff9a979b63ffdd00a464dbe04acf0ea6471517a4c2b -numpy==1.20.2; python_version >= "3.7" \ - --hash=sha256:e9459f40244bb02b2f14f6af0cd0732791d72232bbb0dc4bab57ef88e75f6935 \ - --hash=sha256:a8e6859913ec8eeef3dbe9aed3bf475347642d1cdd6217c30f28dee8903528e6 \ - --hash=sha256:9cab23439eb1ebfed1aaec9cd42b7dc50fc96d5cd3147da348d9161f0501ada5 \ - --hash=sha256:9c0fab855ae790ca74b27e55240fe4f2a36a364a3f1ebcfd1fb5ac4088f1cec3 \ - --hash=sha256:61d5b4cf73622e4d0c6b83408a16631b670fc045afd6540679aa35591a17fe6d \ - --hash=sha256:d15007f857d6995db15195217afdbddfcd203dfaa0ba6878a2f580eaf810ecd6 \ - --hash=sha256:d76061ae5cab49b83a8cf3feacefc2053fac672728802ac137dd8c4123397677 \ - --hash=sha256:bad70051de2c50b1a6259a6df1daaafe8c480ca98132da98976d8591c412e737 \ - --hash=sha256:719656636c48be22c23641859ff2419b27b6bdf844b36a2447cb39caceb00935 \ - --hash=sha256:aa046527c04688af680217fffac61eec2350ef3f3d7320c07fd33f5c6e7b4d5f \ - --hash=sha256:2428b109306075d89d21135bdd6b785f132a1f5a3260c371cee1fae427e12727 \ - --hash=sha256:e8e4fbbb7e7634f263c5b0150a629342cc19b47c5eba8d1cd4363ab3455ab576 \ - --hash=sha256:edb1f041a9146dcf02cd7df7187db46ab524b9af2515f392f337c7cbbf5b52cd \ - --hash=sha256:c73a7975d77f15f7f68dacfb2bca3d3f479f158313642e8ea9058eea06637931 \ - --hash=sha256:6c915ee7dba1071554e70a3664a839fbc033e1d6528199d4621eeaaa5487ccd2 \ - --hash=sha256:471c0571d0895c68da309dacee4e95a0811d0a9f9f532a48dc1bea5f3b7ad2b7 \ - --hash=sha256:4703b9e937df83f5b6b7447ca5912b5f5f297aba45f91dbbbc63ff9278c7aa98 \ - --hash=sha256:abc81829c4039e7e4c30f7897938fa5d4916a09c2c7eb9b244b7a35ddc9656f4 \ - --hash=sha256:377751954da04d4a6950191b20539066b4e19e3b559d4695399c5e8e3e683bf6 \ - --hash=sha256:6e51e417d9ae2e7848314994e6fc3832c9d426abce9328cf7571eefceb43e6c9 \ - --hash=sha256:780ae5284cb770ade51d4b4a7dce4faa554eb1d88a56d0e8b9f35fca9b0270ff \ - --hash=sha256:924dc3f83de20437de95a73516f36e09918e9c9c18d5eac520062c49191025fb \ - --hash=sha256:97ce8b8ace7d3b9288d88177e66ee75480fb79b9cf745e91ecfe65d91a856042 \ - --hash=sha256:878922bf5ad7550aa044aa9301d417e2d3ae50f0f577de92051d739ac6096cee -opencv-python==4.2.0.32 \ - --hash=sha256:9cd9bd72f4a9743ef6f11f0f96784bd215a542e996db1717d4c2d3d03eb81a1b \ - --hash=sha256:68c1c846dd267cd7e293d3fc0bb238db0a744aa1f2e721e327598f00cb982098 \ - --hash=sha256:a2b08aec2eacae868723136383d9eb84a33062a7a7ec5ec3bd2c423bd1355946 \ - --hash=sha256:afbc81a3870739610a9f9a1197374d6a45892cf1933c90fc5617d39790991ed3 \ - --hash=sha256:e36a8857be2c849e54009f1bee25e8c34fbc683fcd38c6c700af4cba5f8d57c2 \ - --hash=sha256:2baf1213ae2fd678991f905d7b2b94eddfdfb5f75757db0f0b31eebd48ca200d \ - --hash=sha256:db1d49b753e6e6c76585f21d09c7e9812176732baa9bddb64bc2fc6cd24d4179 \ - --hash=sha256:eae3da9231d87980f8082d181c276a04f7a6fdac130cebd467390b96dd05f944 \ - --hash=sha256:8c76983c9ec3e4cf3a4c1d172ec4285332d9fb1c7194d724aff0c518437471ee \ - --hash=sha256:8002959146ed21959e3118c60c8e94ceac02eea15b691da6c62cff4787c63f7f \ - --hash=sha256:a1a5517301dc8d56243a14253d231ec755b94486b4fff2ae68269bc941bb1f2e \ - --hash=sha256:889eef049d38488b5b4646c48a831feed37c0fd44f3d83c05cff80f4baded145 \ - --hash=sha256:703910aaa1dcd25a412f78a190fb7a352d9a64ee7d9a35566d786f3cc66ebf20 \ - --hash=sha256:32384e675f7cefe707cac40a95eeb142d6869065e39c5500374116297cd8ca6d \ - --hash=sha256:f01a87a015227d8af407161eb48222fc3c8b01661cdc841e2b86eee4f1a7a417 \ - --hash=sha256:e699232fd033ef0053efec2cba0a7505514f374ba7b18c732a77cb5304311ef9 \ - 
--hash=sha256:a8529a79233f3581a66984acd16bce52ab0163f6f77568dd69e9ee4956d2e1db \ - --hash=sha256:312dda54c7e809c20d7409418060ae0e9cdbe82975e7ced429eb3c234ffc0d4a \ - --hash=sha256:167a6aff9bd124a3a67e0ec25d0da5ecdc8d96a56405e3e5e7d586c4105eb1bb \ - --hash=sha256:baeb5dd8b21c718580687f5b4efd03f8139b1c56239cdf6b9805c6946e80f268 \ - --hash=sha256:0f2e739c582e8c5e432130648bc6d66a56bc65f4cd9ff0bc7033033d2130c7a3 \ - --hash=sha256:ee6814c94dbf1cae569302afef9dd29efafc52373e8770ded0db549a3b6e0c00 \ - --hash=sha256:0f3d159ad6cb9cbd188c726f87485f0799a067a0a15f34c25d7b5c8db3cb2e50 \ - --hash=sha256:6841bb9cc24751dbdf94e7eefc4e6d70ec297952501954471299fd12ab67391c \ - --hash=sha256:1b90d50bc7a31e9573a8da1b80fcd1e4d9c86c0e5f76387858e1b87eb8b0332b \ - --hash=sha256:e242ed419aeb2488e0f9ee6410a34917f0f8d62b3ae96aa3170d83bae75004e2 \ - --hash=sha256:5c50634dd8f2f866fd99fd939292ce10e52bef82804ebc4e7f915221c3b7e951 -pillow==8.3.1; python_version >= "3.6" \ - --hash=sha256:196560dba4da7a72c5e7085fccc5938ab4075fd37fe8b5468869724109812edd \ - --hash=sha256:29c9569049d04aaacd690573a0398dbd8e0bf0255684fee512b413c2142ab723 \ - --hash=sha256:c088a000dfdd88c184cc7271bfac8c5b82d9efa8637cd2b68183771e3cf56f04 \ - --hash=sha256:fc214a6b75d2e0ea7745488da7da3c381f41790812988c7a92345978414fad37 \ - --hash=sha256:a17ca41f45cf78c2216ebfab03add7cc350c305c38ff34ef4eef66b7d76c5229 \ - --hash=sha256:67b3666b544b953a2777cb3f5a922e991be73ab32635666ee72e05876b8a92de \ - --hash=sha256:ff04c373477723430dce2e9d024c708a047d44cf17166bf16e604b379bf0ca14 \ - --hash=sha256:9364c81b252d8348e9cc0cb63e856b8f7c1b340caba6ee7a7a65c968312f7dab \ - --hash=sha256:a2f381932dca2cf775811a008aa3027671ace723b7a38838045b1aee8669fdcf \ - --hash=sha256:d0da39795049a9afcaadec532e7b669b5ebbb2a9134576ebcc15dd5bdae33cc0 \ - --hash=sha256:2b6dfa068a8b6137da34a4936f5a816aba0ecc967af2feeb32c4393ddd671cba \ - --hash=sha256:a4eef1ff2d62676deabf076f963eda4da34b51bc0517c70239fafed1d5b51500 \ - --hash=sha256:660a87085925c61a0dcc80efb967512ac34dbb256ff7dd2b9b4ee8dbdab58cf4 \ - --hash=sha256:15a2808e269a1cf2131930183dcc0419bc77bb73eb54285dde2706ac9939fa8e \ - --hash=sha256:969cc558cca859cadf24f890fc009e1bce7d7d0386ba7c0478641a60199adf79 \ - --hash=sha256:2ee77c14a0299d0541d26f3d8500bb57e081233e3fa915fa35abd02c51fa7fae \ - --hash=sha256:c11003197f908878164f0e6da15fce22373ac3fc320cda8c9d16e6bba105b844 \ - --hash=sha256:3f08bd8d785204149b5b33e3b5f0ebbfe2190ea58d1a051c578e29e39bfd2367 \ - --hash=sha256:70af7d222df0ff81a2da601fab42decb009dc721545ed78549cb96e3a1c5f0c8 \ - --hash=sha256:37730f6e68bdc6a3f02d2079c34c532330d206429f3cee651aab6b66839a9f0e \ - --hash=sha256:4bc3c7ef940eeb200ca65bd83005eb3aae8083d47e8fcbf5f0943baa50726856 \ - --hash=sha256:c35d09db702f4185ba22bb33ef1751ad49c266534339a5cebeb5159d364f6f82 \ - --hash=sha256:0b2efa07f69dc395d95bb9ef3299f4ca29bcb2157dc615bae0b42c3c20668ffc \ - --hash=sha256:cc866706d56bd3a7dbf8bac8660c6f6462f2f2b8a49add2ba617bc0c54473d83 \ - --hash=sha256:9a211b663cf2314edbdb4cf897beeb5c9ee3810d1d53f0e423f06d6ebbf9cd5d \ - --hash=sha256:c2a5ff58751670292b406b9f06e07ed1446a4b13ffced6b6cab75b857485cbc8 \ - --hash=sha256:c379425c2707078dfb6bfad2430728831d399dc95a7deeb92015eb4c92345eaf \ - --hash=sha256:114f816e4f73f9ec06997b2fde81a92cbf0777c9e8f462005550eed6bae57e63 \ - --hash=sha256:8960a8a9f4598974e4c2aeb1bff9bdd5db03ee65fd1fce8adf3223721aa2a636 \ - --hash=sha256:147bd9e71fb9dcf08357b4d530b5167941e222a6fd21f869c7911bac40b9994d \ - --hash=sha256:1fd5066cd343b5db88c048d971994e56b296868766e461b82fa4e22498f34d77 \ - 
--hash=sha256:f4ebde71785f8bceb39dcd1e7f06bcc5d5c3cf48b9f69ab52636309387b097c8 \ - --hash=sha256:1c03e24be975e2afe70dfc5da6f187eea0b49a68bb2b69db0f30a61b7031cee4 \ - --hash=sha256:2cac53839bfc5cece8fdbe7f084d5e3ee61e1303cccc86511d351adcb9e2c792 -pre-commit==2.13.0; python_full_version >= "3.6.1" \ - --hash=sha256:b679d0fddd5b9d6d98783ae5f10fd0c4c59954f375b70a58cbe1ce9bcf9809a4 \ - --hash=sha256:764972c60693dc668ba8e86eb29654ec3144501310f7198742a767bec385a378 -pyglet==1.5.0 \ - --hash=sha256:a42f599ebd0dc8113563041c402ae09be05cdcbc643bb1183785141ba3c3304e \ - --hash=sha256:6ea918985feddfa9bf0fcc01ffe9ff5849e7b6e832d9b2e03b9d2a36369cb6ee -pyparsing==2.4.7; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.6" \ - --hash=sha256:ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b \ - --hash=sha256:c203ec8783bf771a155b207279b9bccb8dea02d8f0c9e5f8ead507bc3246ecc1 -python-dateutil==2.8.1; python_version >= "3.6" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.6" \ - --hash=sha256:73ebfe9dbf22e832286dafa60473e4cd239f8592f699aa5adaf10050e6e1823c \ - --hash=sha256:75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a -python-dotenv==0.18.0; python_version >= "3.6" \ - --hash=sha256:effaac3c1e58d89b3ccb4d04a40dc7ad6e0275fda25fd75ae9d323e2465e202d \ - --hash=sha256:dd8fe852847f4fbfadabf6183ddd4c824a9651f02d51714fa075c95561959c7d -pywavelets==1.1.1; python_version >= "3.6" \ - --hash=sha256:35959c041ec014648575085a97b498eafbbaa824f86f6e4a59bfdef8a3fe6308 \ - --hash=sha256:55e39ec848ceec13c9fa1598253ae9dd5c31d09dfd48059462860d2b908fb224 \ - --hash=sha256:c06d2e340c7bf8b9ec71da2284beab8519a3908eab031f4ea126e8ccfc3fd567 \ - --hash=sha256:be105382961745f88d8196bba5a69ee2c4455d87ad2a2e5d1eed6bd7fda4d3fd \ - --hash=sha256:076ca8907001fdfe4205484f719d12b4a0262dfe6652fa1cfc3c5c362d14dc84 \ - --hash=sha256:7947e51ca05489b85928af52a34fe67022ab5b81d4ae32a4109a99e883a0635e \ - --hash=sha256:9e2528823ccf5a0a1d23262dfefe5034dce89cd84e4e124dc553dfcdf63ebb92 \ - --hash=sha256:80b924edbc012ded8aa8b91cb2fd6207fb1a9a3a377beb4049b8a07445cec6f0 \ - --hash=sha256:c2a799e79cee81a862216c47e5623c97b95f1abee8dd1f9eed736df23fb653fb \ - --hash=sha256:d510aef84d9852653d079c84f2f81a82d5d09815e625f35c95714e7364570ad4 \ - --hash=sha256:889d4c5c5205a9c90118c1980df526857929841df33e4cd1ff1eff77c6817a65 \ - --hash=sha256:68b5c33741d26c827074b3d8f0251de1c3019bb9567b8d303eb093c822ce28f1 \ - --hash=sha256:18a51b3f9416a2ae6e9a35c4af32cf520dd7895f2b69714f4aa2f4342fca47f9 \ - --hash=sha256:cfe79844526dd92e3ecc9490b5031fca5f8ab607e1e858feba232b1b788ff0ea \ - --hash=sha256:2f7429eeb5bf9c7068002d0d7f094ed654c77a70ce5e6198737fd68ab85f8311 \ - --hash=sha256:720dbcdd3d91c6dfead79c80bf8b00a1d8aa4e5d551dc528c6d5151e4efc3403 \ - --hash=sha256:bc5e87b72371da87c9bebc68e54882aada9c3114e640de180f62d5da95749cd3 \ - --hash=sha256:98b2669c5af842a70cfab33a7043fcb5e7535a690a00cd251b44c9be0be418e5 \ - --hash=sha256:e02a0558e0c2ac8b8bbe6a6ac18c136767ec56b96a321e0dfde2173adfa5a504 \ - --hash=sha256:6162dc0ae04669ea04b4b51420777b9ea2d30b0a9d02901b2a3b4d61d159c2e9 \ - --hash=sha256:39c74740718e420d38c78ca4498568fa57976d78d5096277358e0fa9629a7aea \ - --hash=sha256:79f5b54f9dc353e5ee47f0c3f02bebd2c899d49780633aa771fed43fa20b3149 \ - --hash=sha256:935ff247b8b78bdf77647fee962b1cc208c51a7b229db30b9ba5f6da3e675178 \ - --hash=sha256:6ebfefebb5c6494a3af41ad8c60248a95da267a24b79ed143723d4502b1fe4d7 \ - 
--hash=sha256:6bc78fb9c42a716309b4ace56f51965d8b5662c3ba19d4591749f31773db1125 \ - --hash=sha256:411e17ca6ed8cf5e18a7ca5ee06a91c25800cc6c58c77986202abf98d749273a \ - --hash=sha256:83c5e3eb78ce111c2f0b45f46106cc697c3cb6c4e5f51308e1f81b512c70c8fb \ - --hash=sha256:2b634a54241c190ee989a4af87669d377b37c91bcc9cf0efe33c10ff847f7841 \ - --hash=sha256:732bab78435c48be5d6bc75486ef629d7c8f112e07b313bf1f1a2220ab437277 \ - --hash=sha256:1a64b40f6acb4ffbaccce0545d7fc641744f95351f62e4c6aaa40549326008c9 -pyyaml==5.4.1; python_full_version >= "3.6.1" \ - --hash=sha256:3b2b1824fe7112845700f815ff6a489360226a5609b96ec2190a45e62a9fc922 \ - --hash=sha256:129def1b7c1bf22faffd67b8f3724645203b79d8f4cc81f674654d9902cb4393 \ - --hash=sha256:4465124ef1b18d9ace298060f4eccc64b0850899ac4ac53294547536533800c8 \ - --hash=sha256:bb4191dfc9306777bc594117aee052446b3fa88737cd13b7188d0e7aa8162185 \ - --hash=sha256:6c78645d400265a062508ae399b60b8c167bf003db364ecb26dcab2bda048253 \ - --hash=sha256:4e0583d24c881e14342eaf4ec5fbc97f934b999a6828693a99157fde912540cc \ - --hash=sha256:72a01f726a9c7851ca9bfad6fd09ca4e090a023c00945ea05ba1638c09dc3347 \ - --hash=sha256:895f61ef02e8fed38159bb70f7e100e00f471eae2bc838cd0f4ebb21e28f8541 \ - --hash=sha256:3bd0e463264cf257d1ffd2e40223b197271046d09dadf73a0fe82b9c1fc385a5 \ - --hash=sha256:e4fac90784481d221a8e4b1162afa7c47ed953be40d31ab4629ae917510051df \ - --hash=sha256:5accb17103e43963b80e6f837831f38d314a0495500067cb25afab2e8d7a4018 \ - --hash=sha256:e1d4970ea66be07ae37a3c2e48b5ec63f7ba6804bdddfdbd3cfd954d25a82e63 \ - --hash=sha256:cb333c16912324fd5f769fff6bc5de372e9e7a202247b48870bc251ed40239aa \ - --hash=sha256:fe69978f3f768926cfa37b867e3843918e012cf83f680806599ddce33c2c68b0 \ - --hash=sha256:dd5de0646207f053eb0d6c74ae45ba98c3395a571a2891858e87df7c9b9bd51b \ - --hash=sha256:08682f6b72c722394747bddaf0aa62277e02557c0fd1c42cb853016a38f8dedf \ - --hash=sha256:d2d9808ea7b4af864f35ea216be506ecec180628aced0704e34aca0b040ffe46 \ - --hash=sha256:8c1be557ee92a20f184922c7b6424e8ab6691788e6d86137c5d93c1a6ec1b8fb \ - --hash=sha256:fd7f6999a8070df521b6384004ef42833b9bd62cfee11a09bda1079b4b704247 \ - --hash=sha256:bfb51918d4ff3d77c1c856a9699f8492c612cde32fd3bcd344af9be34999bfdc \ - --hash=sha256:fa5ae20527d8e831e8230cbffd9f8fe952815b2b7dae6ffec25318803a7528fc \ - --hash=sha256:0f5f5786c0e09baddcd8b4b45f20a7b5d61a7e7e99846e3c799b05c7c53fa696 \ - --hash=sha256:294db365efa064d00b8d1ef65d8ea2c3426ac366c0c4368d930bf1c5fb497f77 \ - --hash=sha256:74c1485f7707cf707a7aef42ef6322b8f97921bd89be2ab6317fd782c2d53183 \ - --hash=sha256:d483ad4e639292c90170eb6f7783ad19490e7a8defb3e46f97dfe4bacae89122 \ - --hash=sha256:fdc842473cd33f45ff6bce46aea678a54e3d21f1b61a7750ce3c498eedfe25d6 \ - --hash=sha256:49d4cdd9065b9b6e206d0595fee27a96b5dd22618e7520c33204a4a3239d5b10 \ - --hash=sha256:c20cfa2d49991c8b4147af39859b167664f2ad4561704ee74c1de03318e898db \ - --hash=sha256:607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e -requests==2.25.1; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.5.0") \ - --hash=sha256:c210084e36a42ae6b9219e00e48287def368a26d03a048ddad7bfee44f75871e \ - --hash=sha256:27973dd4a904a4f13b263a19c866c13b92a39ed1c964655f025f3f8d3d75b804 -rospkg==1.2.8 \ - --hash=sha256:61dc65c5575fe5fe8061fe1064f439fe64a17af6dcad42efefbe4ca0490661cc \ - --hash=sha256:f0db1a4ed29ff12174a673376080bbaaf52024d3982993570b8a13b3862c8709 -scikit-image==0.17.2; python_version >= "3.6" \ - --hash=sha256:bd954c0588f0f7e81d9763dc95e06950e68247d540476e06cb77bcbcd8c2d8b3 \ - 
--hash=sha256:11eec2e65cd4cd6487fe1089aa3538dbe25525aec7a36f5a0f14145df0163ce7 \ - --hash=sha256:c5c277704b12e702e34d1f7b7a04d5ee8418735f535d269c74c02c6c9f8abee2 \ - --hash=sha256:1fda9109a19dc9d7a4ac152d1fc226fed7282ad186a099f14c0aa9151f0c758e \ - --hash=sha256:86a834f9a4d30201c0803a48a25364fe8f93f9feb3c58f2c483d3ce0a3e5fe4a \ - --hash=sha256:87ca5168c6fc36b7a298a1db2d185a8298f549854342020f282f747a4e4ddce9 \ - --hash=sha256:e99fa7514320011b250a21ab855fdd61ddcc05d3c77ec9e8f13edcc15d3296b5 \ - --hash=sha256:ee3db438b5b9f8716a91ab26a61377a8a63356b186706f5b979822cc7241006d \ - --hash=sha256:6b65a103edbc34b22640daf3b084dc9e470c358d3298c10aa9e3b424dcc02db6 \ - --hash=sha256:c0876e562991b0babff989ff4d00f35067a2ddef82e5fdd895862555ffbaec25 \ - --hash=sha256:178210582cc62a5b25c633966658f1f2598615f9c3f27f36cf45055d2a74b401 \ - --hash=sha256:7bedd3881ca4fea657a894815bcd5e5bf80944c26274f6b6417bb770c3f4f8e6 \ - --hash=sha256:113bcacdfc839854f527a166a71768708328208e7b66e491050d6a57fa6727c7 -scipy==1.6.1; python_version >= "3.7" \ - --hash=sha256:a15a1f3fc0abff33e792d6049161b7795909b40b97c6cc2934ed54384017ab76 \ - --hash=sha256:e79570979ccdc3d165456dd62041d9556fb9733b86b4b6d818af7a0afc15f092 \ - --hash=sha256:a423533c55fec61456dedee7b6ee7dce0bb6bfa395424ea374d25afa262be261 \ - --hash=sha256:33d6b7df40d197bdd3049d64e8e680227151673465e5d85723b3b8f6b15a6ced \ - --hash=sha256:6725e3fbb47da428794f243864f2297462e9ee448297c93ed1dcbc44335feb78 \ - --hash=sha256:5fa9c6530b1661f1370bcd332a1e62ca7881785cc0f80c0d559b636567fab63c \ - --hash=sha256:bd50daf727f7c195e26f27467c85ce653d41df4358a25b32434a50d8870fc519 \ - --hash=sha256:f46dd15335e8a320b0fb4685f58b7471702234cba8bb3442b69a3e1dc329c345 \ - --hash=sha256:0e5b0ccf63155d90da576edd2768b66fb276446c371b73841e3503be1d63fb5d \ - --hash=sha256:2481efbb3740977e3c831edfd0bd9867be26387cacf24eb5e366a6a374d3d00d \ - --hash=sha256:68cb4c424112cd4be886b4d979c5497fba190714085f46b8ae67a5e4416c32b4 \ - --hash=sha256:5f331eeed0297232d2e6eea51b54e8278ed8bb10b099f69c44e2558c090d06bf \ - --hash=sha256:0c8a51d33556bf70367452d4d601d1742c0e806cd0194785914daf19775f0e67 \ - --hash=sha256:83bf7c16245c15bc58ee76c5418e46ea1811edcc2e2b03041b804e46084ab627 \ - --hash=sha256:794e768cc5f779736593046c9714e0f3a5940bc6dcc1dba885ad64cbfb28e9f0 \ - --hash=sha256:5da5471aed911fe7e52b86bf9ea32fb55ae93e2f0fac66c32e58897cfb02fa07 \ - --hash=sha256:8e403a337749ed40af60e537cc4d4c03febddcc56cd26e774c9b1b600a70d3e4 \ - --hash=sha256:a5193a098ae9f29af283dcf0041f762601faf2e595c0db1da929875b7570353f \ - --hash=sha256:c4fceb864890b6168e79b0e714c585dbe2fd4222768ee90bc1aa0f8218691b11 -six==1.14.0; (python_version >= "2.7" and python_full_version < "3.0.0") or (python_full_version >= "3.3.0") \ - --hash=sha256:8f3cd2e254d8f793e7f3d6d9df77b92252b52637291d0f0da013c76ea2724b6c \ - --hash=sha256:236bdbdce46e6e6a3d61a337c0f8b763ca1e8717c03b369e87a7ec7ce1319c0a -tifffile==2021.7.2; python_version >= "3.7" \ - --hash=sha256:3025eaecf3c188ebb9c82e6476da9d8e712630e972f00ea38e054351928e56cc \ - --hash=sha256:17fc5ca901f7d7b827a16e4695668e6120319324efe6d1333258397d8a71dedb -toml==0.10.2; python_full_version >= "3.6.1" \ - --hash=sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b \ - --hash=sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f -urllib3==1.26.6; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.5.0" and python_version < "4" \ - --hash=sha256:39fb8672126159acb139a7718dd10806104dec1e2f0f6c88aab05d17df10c8d4 \ - 
--hash=sha256:f57b4c16c62fa2760b7e3d97c35b255512fb6b59a259730f36ba32ce9f8e342f -virtualenv==20.4.7; python_full_version >= "3.6.1" \ - --hash=sha256:2b0126166ea7c9c3661f5b8e06773d28f83322de7a3ff7d06f0aed18c9de6a76 \ - --hash=sha256:14fdf849f80dbb29a4eb6caa9875d476ee2a5cf76a5f5415fa2f1606010ab467 - -tqdm==4.64.0 --hash=sha256:74a2cdefe14d11442cedf3ba4e21a3b84ff9a2dbdc6cfae2c34addb2a14a5ea6 - ---extra-index-url https://download.pytorch.org/whl/cu113 -torch==1.12.1+cu113 --hash=sha256:4adf483ac2d047534a7d023f0022bd8694d87627068ad6dddf186cb3273bbfa2 +absl-py==1.3.0 +actionlib==1.14.0 +alembic==1.9.2 +angles==1.9.13 +astunparse==1.6.3 +bondpy==1.8.6 +cachetools==5.2.0 +camera-calibration==1.17.0 +camera-calibration-parsers==1.12.0 +catkin==0.8.10 +catkin-pkg==0.5.2 +certifi==2022.9.24 +charset-normalizer==2.1.1 +cloudpickle==1.6.0 +cmaes==0.9.1 +colorlog==6.7.0 +contourpy==1.0.6 +controller-manager==0.19.6 +controller-manager-msgs==0.19.6 +cv-bridge==1.16.2 +cycler==0.11.0 +defusedxml==0.7.1 +diagnostic-analysis==1.11.0 +diagnostic-common-diagnostics==1.11.0 +diagnostic-updater==1.11.0 +distlib==0.3.6 +distro==1.8.0 +docutils==0.19 +dynamic-reconfigure==1.7.3 +empy==3.3.4 +et-xmlfile==1.1.0 +flatbuffers==22.11.23 +fonttools==4.38.0 +future==0.18.3 +gast==0.4.0 +gazebo_plugins==2.9.2 +gazebo_ros==2.9.2 +gencpp==0.7.0 +geneus==3.0.0 +genlisp==0.4.18 +genmsg==0.6.0 +gennodejs==2.0.2 +genpy==0.6.15 +google-auth==2.14.1 +google-auth-oauthlib==0.4.6 +google-pasta==0.2.0 +greenlet==2.0.1 +grpcio==1.50.0 +gym==0.17.3 +gym-notices==0.0.8 +Gymnasium==0.26.3 +gymnasium-notices==0.0.1 +h5py==3.7.0 +idna==3.4 +image-geometry==1.16.2 +importlib-metadata==5.1.0 +importlib-resources==5.10.2 +interactive-markers==1.12.0 +joblib==1.2.0 +joint-state-publisher==1.15.1 +joint-state-publisher-gui==1.15.1 +keras==2.11.0 +kiwisolver==1.4.4 +laser_geometry==1.6.7 +libclang==14.0.6 +Mako==1.2.4 +Markdown==3.4.1 +MarkupSafe==2.1.1 +matplotlib==3.6.2 +message-filters==1.15.15 +netifaces==0.11.0 +numpy==1.24.1 +oauthlib==3.2.2 +opencv-python==4.6.0.66 +openpyxl==3.0.10 +opt-einsum==3.3.0 +optuna==3.1.0 +packaging==21.3 +pandas==1.5.2 +Pillow==9.3.0 +platformdirs==2.5.4 +protobuf==3.19.6 +pyasn1==0.4.8 +pyasn1-modules==0.2.8 +pydantic==1.10.2 +pyglet==1.5.0 +Pygments==2.14.0 +pyparsing==3.0.9 +python-dateutil==2.8.2 +python-qt-binding==0.4.4 +pytz==2022.6 +PyYAML==6.0 +qt-dotgraph==0.4.2 +qt-gui==0.4.2 +qt-gui-cpp==0.4.2 +qt-gui-py-common==0.4.2 +reloading==1.1.2 +requests==2.28.1 +requests-oauthlib==1.3.1 +resource_retriever==1.12.7 +rosbag==1.15.15 +rosboost-cfg==1.15.8 +rosclean==1.15.8 +roscreate==1.15.8 +rosgraph==1.15.15 +roslaunch==1.15.15 +roslib==1.15.8 +roslint==0.12.0 +roslz4==1.15.15 +rosmake==1.15.8 +rosmaster==1.15.15 +rosmsg==1.15.15 +rosnode==1.15.15 +rosparam==1.15.15 +rospkg==1.4.0 +rospy==1.15.15 +rosservice==1.15.15 +rostest==1.15.15 +rostopic==1.15.15 +rosunit==1.15.8 +roswtf==1.15.15 +rqt-moveit==0.5.10 +rqt-reconfigure==0.5.5 +rqt-robot-dashboard==0.5.8 +rqt-robot-monitor==0.5.14 +rqt-rviz==0.7.0 +rqt_action==0.4.9 +rqt_bag==0.5.1 +rqt_bag_plugins==0.5.1 +rqt_console==0.4.11 +rqt_dep==0.4.12 +rqt_graph==0.4.14 +rqt_gui==0.5.3 +rqt_gui_py==0.5.3 +rqt_image_view==0.4.16 +rqt_launch==0.4.9 +rqt_logger_level==0.4.11 +rqt_msg==0.4.10 +rqt_nav_view==0.5.7 +rqt_plot==0.4.13 +rqt_pose_view==0.5.11 +rqt_publisher==0.4.10 +rqt_py_common==0.5.3 +rqt_py_console==0.4.10 +rqt_robot_steering==0.5.12 +rqt_runtime_monitor==0.5.9 +rqt_service_caller==0.4.10 +rqt_shell==0.4.11 +rqt_srv==0.4.9 +rqt_tf_tree==0.6.3 
+rqt_top==0.4.10 +rqt_topic==0.4.13 +rqt_web==0.4.10 +rsa==4.9 +rviz==1.14.19 +scikit-learn==1.2.0 +scipy==1.10.0 +sensor-msgs==1.13.1 +six==1.16.0 +sklearn==0.0.post1 +smach==2.5.0 +smach-ros==2.5.0 +smclib==1.8.6 +SQLAlchemy==1.4.46 +tensorboard==2.11.0 +tensorboard-data-server==0.6.1 +tensorboard-plugin-wit==1.8.1 +tensorflow==2.11.0 +tensorflow-estimator==2.11.0 +tensorflow-io-gcs-filesystem==0.28.0 +tensorrt==0.0.1 +termcolor==2.1.1 +tf2-geometry-msgs==0.7.6 +tf2-kdl==0.7.6 +tf2-py==0.7.6 +tf2-ros==0.7.6 +threadpoolctl==3.1.0 +topic-tools==1.15.15 +tqdm==4.64.1 +typing_extensions==4.4.0 +urllib3==1.26.13 +Werkzeug==2.2.2 +wrapt==1.14.1 +xacro==1.14.14 +zipp==3.11.0 diff --git a/rl_studio/README.md b/rl_studio/README.md index c4e497f6e..9f1505c95 100644 --- a/rl_studio/README.md +++ b/rl_studio/README.md @@ -1,12 +1,54 @@ -# Run RL Studio +# Introduction + +RL-Studio allows training, retraining and inference of already created models. +We have called each of these options **modes of operation**, or **modes** for short. + +- In **training mode** we define a specific task to achieve, a specific agent, the algorithm needed to learn and the simulator where the training will be executed. The final result is a model that is generated and saved so it can be used later. +- In **retraining mode**, the objective is to use an already generated model to retrain on top of it and generate new models. It is convenient if you do not want to start a training from scratch. +- In **inference mode**, a previously generated model is loaded and executed in other environments for the same task and the same agent. In this way we check the generalization, robustness and goodness of the models trained with the algorithm used. + +## Config file + +The parameterization of RL-Studio is done with a YAML configuration file. There is a general **config.yaml** file, with inline comments, whose structure it is necessary to understand well in order to work correctly: + +- settings: general parameters such as mode, task, algorithm, agent... +- ros: general parameters for ROS +- carla: Carla environment launch params +- inference and retraining models: file to be loaded in retraining or inference mode +- algorithm parameters: general params for the implemented algorithms, such as PPO, DQN... +- agents: specific params for every agent, such as sensor configuration +- states: params to define the input state, such as image, laser, low-data acquisition, Lidar. They are different from the agents +- actions: definitions of the actions the agent can take. In the case of an AV these could be linear and angular velocity, throttle, brake. +- rewards: reward function +- environment parameters: each environment has its own parameters, which are defined here. + +It is possible to add new options but you should avoid modifying the existing ones. + +Due to the length of the config file, it is convenient to create a config file for each task to be done, so that the configuration is ready to launch the application in the fastest and most comfortable way. The files must have the same format as the general config.yaml, with a file name of the form: + +``` +config_mode_task_algorithm_agent_simulator.yaml +``` + +The file must be saved in the directory + +``` +/PATH/TO/RL-Studio/rl_studio/config/ +``` + +There are several config files to take as examples. If you need more information about coding style, please refer to the [coding](./CODING.md) file.
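As a quick illustration of the naming convention above, the following helper is a hypothetical sketch (it is not part of RL-Studio) that composes a config file name from its five components:

```python
# Hypothetical helper, shown only to illustrate the
# config_mode_task_algorithm_agent_simulator.yaml naming convention.
def config_filename(mode: str, task: str, algorithm: str, agent: str, simulator: str) -> str:
    return f"config_{mode}_{task}_{algorithm}_{agent}_{simulator}.yaml"


print(config_filename("training", "followline", "qlearn", "f1", "gazebo"))
# -> config_training_followline_qlearn_f1_gazebo.yaml
```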
## Project diagram -![](./docs/rlstudio-diagram.png) +The following graph shows a conceptual diagram of how RL-Studio operates in training mode. For inference or retraining, the process is similar. + +![](./docs/rlstudio-diagram.svg) + +# Run RL Studio ## Usage -To run RL-Studio, first go to dir +Open the `config.yaml` file and set the params you need. Then, to run RL-Studio, go to the directory ```bash cd ~/PATH/TO/RL-Studio/rl_studio @@ -14,31 +56,39 @@ cd ~/PATH/TO/RL-Studio/rl_studio and then just type (depending on how the dependencies are managed): -```bash -poetry run python main_rlstudio.py -n [algorithm] -a [agent] -e [environment] -f config/config.yaml # if using Poetry for dependencies -python main_rlstudio.py -n [algorithm] -a [agent] -e [environment] -f config/config.yaml # if using PIP for dependencies -``` - -The config.yaml contains all project hyperparams and configuration needed to execute correctly. -For example, if you want to train a F1 agent in Circuit Simple with Q-learning algorithm, just type: +Pip: -```bash -poetry run python main_rlstudio.py -n qlearn -a f1 -e simple -f config/config_f1_qlearn.yaml # if using Poetry for dependencies -python main_rlstudio.py -n qlearn -a f1 -e simple -f config/config_f1_qlearn.yaml # if using PIP for dependencies +``` +python rl-studio.py -f config/ +``` +where the config file can be any one you create or one that already exists. -Or an inference making use of the script that uses a library created for that purpose +When launching Gazebo, if you find an error, try killing all possible previous ROS-Gazebo processes: -```bash -poetry run python main_rlstudio.py -n qlearn -a f1 -e simple -f config/config_f1_qlearn.yaml -m inference # if using Poetry for dependencies -python main_rlstudio.py -n qlearn -a f1 -e simple -f config/config_f1_qlearn.yaml -m inference # if using PIP for dependencies +``` +killall gzserver +killall gzclient ``` -> :warning: If you want to use inferencing in a program language other than python, you will -> need extend the main_rlstudio.py to listen for inputs in a port and execute the loaded brain/algorithm to provide -> outputs in the desired way. Note that inference_rlstudio.py is just the library used to inference +## Config.yaml +The config.yaml contains all project hyperparams and the configuration needed to execute correctly. In case you want to train a Formula 1 agent on a Follow Lane task in Gazebo, with the PPO algorithm and the TensorFlow Deep Learning framework, you can use the following example from a config.yaml file: + +```yaml +settings: + algorithm: PPO + task: follow_lane + environment: simple + mode: training # or inference + agent: f1 + simulator: gazebo + framework: tensorflow +``` -Open the `config.yaml` file and set the params you need. +The remaining params should be adjusted too. There are many working YAML files in the config folder that you can check. +> :warning: If you want to use inferencing in a programming language other than Python, you will > need to extend rl-studio.py to listen for inputs on a port and execute the loaded brain/algorithm to provide > outputs in the desired way. +For more info about how to configure and launch any task, please go to the [agents](agents/README.md) section.
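The launch flow described above can be pictured with the following minimal sketch. It assumes the YAML file is loaded into a plain dictionary and handed to the `TrainerFactory` defined in `rl_studio/agents/__init__.py` later in this diff; the `main()` call on the returned trainer is an assumption for illustration, not a verified API.

```python
# Minimal sketch of the launch flow, under the assumptions stated above.
import yaml

from rl_studio.agents import TrainerFactory

with open("config/config_training_followline_qlearn_f1_gazebo.yaml") as f:
    config = yaml.safe_load(f)

# The factory inspects config["settings"] (task, algorithm, agent, simulator, framework)
# and returns an instance of the matching trainer class.
trainer = TrainerFactory(config)
trainer.main()  # assumed entry method of the concrete trainer
```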
diff --git a/rl_studio/__init__.py b/rl_studio/__init__.py index 69c2ed6c3..c1f1e41ea 100755 --- a/rl_studio/__init__.py +++ b/rl_studio/__init__.py @@ -22,6 +22,27 @@ kwargs={'random_start_level': 0.05} ) +register( + id="myCartpole-v1", + entry_point="rl_studio.envs.openai_gym.cartpole.cartpole_env_improved:CartPoleEnv", + max_episode_steps=500, + kwargs={'random_start_level': 0.05} +) + + +register( + id="myCartpole-continuous-v0", + entry_point="rl_studio.envs.openai_gym.cartpole.cartpole_env_continuous:CartPoleEnv", + max_episode_steps=500, + kwargs={'random_start_level': 0.05} +) + +register( + id="myCartpole-continuous-v1", + entry_point="rl_studio.envs.openai_gym.cartpole.cartpole_env_continuous_improved:CartPoleEnv", + max_episode_steps=500, + kwargs={'random_start_level': 0.05} +) # MountainCar envs register( diff --git a/rl_studio/agents/README.md b/rl_studio/agents/README.md new file mode 100644 index 000000000..34c50f89b --- /dev/null +++ b/rl_studio/agents/README.md @@ -0,0 +1,103 @@ +# Agents + +In order to configure, launch and work with RL-Studio, it is necessary to understand the config.yaml file. Next we show you different options to get the most out of our tool. + +## Agent: F1 - Task: Follow Line - Algorithm: Q-learn - Mode: training or retraining + +We provide a file with the necessary configuration, which you can use directly through the CLI: +```bash +python rl-studio.py -f config/config_training_followline_qlearn_f1_gazebo.yaml + +``` + + +### Main configuration options + + +```yaml +settings: + algorithm: qlearn + task: follow_line + environment: simple # circuit + mode: training # retraining + agent: f1 + simulator: gazebo # carla + framework: _ # not necessary for Q-learn +``` +In retraining mode, you have to configure the name of the Q-table to load for training: + +```yaml +retraining: + qlearn: + retrain_qlearn_model_name: "20230104-211624_Circuit-simple_States-sp1_Actions-simple_Rewards-followline_center_epsilon-0.95_epoch-1_step-184_reward-1310-qtable.npy" + +``` + + + + + +## Agent: F1 - Task: Follow Line - Algorithm: Q-learn - Mode: inference + + + + + + +## How to use DDPG in F1 - Follow Line - camera sensor + +For the Formula 1 (F1) agent following the line with a camera sensor, the main features are: + +- **state/observation**: Currently there are two ways to generate the input state that feeds the RL algorithm through a camera: **simplified perception of n points** or the **raw image**. + With simplified perception, the image is divided into regions and the points of the road central line generate the state that feeds the neural network. + In case the input space is the raw image, the state is the image obtained by the camera sensor. This image must be resized so that it can be processed by the neural networks (a minimal sketch of both options is shown after this list). + +- **actions**: _discrete_ or _continuous_. In the case of discrete actions, sets of pairs [linear velocity, angular velocity] specific to each circuit are generated. The continuous actions are established with the minimum and maximum ranges of linear and angular velocity. + +- **reward**: _discrete_ or _linear_. The discrete reward function generates values obtained by trial and error, where the reward is higher or lower according to the distance to the center of the road line. The linear reward function is determined by the relationship between the linear and angular velocity of the car and its position with respect to the center line of the road.
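The two state options described above can be pictured with the following minimal sketch. It is an illustration only, not the actual RL-Studio implementation: the function names, the row-sampling strategy and the normalisation are assumptions.

```python
# Illustrative sketch of the two state types described above:
# (a) "simplified perception of n points" and (b) a resized raw image.
import cv2
import numpy as np


def simplified_perception(line_mask: np.ndarray, n_points: int = 3) -> list:
    """line_mask: binary HxW image where pixels of the road central line are 1."""
    h, w = line_mask.shape
    rows = np.linspace(h // 2, h - 1, n_points, dtype=int)  # sample rows below the horizon
    state = []
    for r in rows:
        xs = np.nonzero(line_mask[r])[0]
        centre = xs.mean() if xs.size else w / 2.0          # fall back to the image centre
        state.append((centre - w / 2.0) / (w / 2.0))        # normalised offset in [-1, 1]
    return state


def raw_image_state(frame: np.ndarray, new_image_size: int = 32) -> np.ndarray:
    """Resize the camera frame (e.g. 480x640) to a small square input such as 32x32."""
    return cv2.resize(frame, (new_image_size, new_image_size))
```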
+ +## Setting Params in DDPG F1 - follow line camera sensor + +The parameters must be configured through the config.yaml file in the /config directory. The most relevant parameters are: + +Agent: + +- image_resizing: 10. Generally the size of the image captured by the camera sensor is determined in the agent configuration and the standard is 480x640 pixels. This size is too large for neural network processing, so it should be reduced. This variable determines the percentage of image size reduction, i.e. 10 means that it is reduced to 10% of its original size, so with the default size the image is reduced to 48x64 pixels. + +- new_image_size: 32. It gives us another way of reducing the image for processing in neural networks. In this case, the parameter generates a square image of the given size, i.e., 32x32, 64x64..., which is more efficient for processing in neural networks. + +- raw_image: False. It is a Boolean variable that, if True, uses the raw image obtained by the camera sensor as the input state of the neural network. If this variable is False, the image obtained will be preprocessed and converted to black and white to obtain the necessary information, and then it will be reduced in size to feed the neural network. + +- State_space: image or sp1, sp3... gives us the distance in pixels below the line that marks the horizon of the road. + +--- + +## Deep Q Networks (DQN) + +Based on the [Human-level control through deep reinforcement learning whitepaper](https://www.nature.com/articles/nature14236?wm=book_wap_0005), it allows working with multidimensional states through Deep Neural Nets and discrete actions. + +## How to use DQN in F1 - Follow Line - camera sensor + +As with the DDPG Formula 1 agent following the line with a camera sensor, the main features are: + +- **state/observation**: Currently there are two ways to generate the input state that feeds the RL algorithm through a camera: **simplified perception of n points** or the **raw image**. + With simplified perception, the image is divided into regions and the points of the road central line generate the state that feeds the neural network. + In case the input space is the raw image, the state is the image obtained by the camera sensor. This image must be resized so that it can be processed by the neural networks. + +- **actions**: only _discrete_, working like the DDPG F1 agent. + +- **reward**: _discrete_ or _linear_. The discrete reward function generates values obtained by trial and error, where the reward is higher or lower according to the distance to the center of the road line. The linear reward function is determined by the relationship between the linear and angular velocity of the car and its position with respect to the center line of the road. + +## Setting Params in DQN F1 - follow line camera sensor + +The parameters must be configured through the config.yaml file in the /config directory. The most relevant parameters are: + +Agent: + +- image_resizing: 10. Generally the size of the image captured by the camera sensor is determined in the agent configuration and the standard is 480x640 pixels. This size is too large for neural network processing, so it should be reduced. This variable determines the percentage of image size reduction, i.e. 10 means that it is reduced to 10% of its original size, so with the default size the image is reduced to 48x64 pixels. + +- new_image_size: 32. It gives us another way of reducing the image for processing in neural networks.
In this case, the parameter generates a square image of the given size, i.e., 32x32, 64x64..., which is more efficient for processing in neural networks. + +- raw_image: False. It is a Boolean variable that, if True, uses the raw image obtained by the camera sensor as the input state of the neural network. If this variable is False, the image obtained will be preprocessed and converted to black and white to obtain the necessary information, and then it will be reduced in size to feed the neural network. + +- State_space: image or sp1, sp3... gives us the distance in pixels below the line that marks the horizon of the road. diff --git a/rl_studio/agents/__init__.py b/rl_studio/agents/__init__.py index 9a0a9f886..c3e760b54 100644 --- a/rl_studio/agents/__init__.py +++ b/rl_studio/agents/__init__.py @@ -1,160 +1,504 @@ from rl_studio.agents.agents_type import AgentsType from rl_studio.agents.exceptions import NoValidTrainingType +from rl_studio.agents.tasks_type import TasksType +from rl_studio.agents.frameworks_type import FrameworksType +from rl_studio.agents.utils import print_messages from rl_studio.algorithms.algorithms_type import AlgorithmsType +from rl_studio.envs.envs_type import EnvsType class TrainerFactory: def __new__(cls, config): - agent = config.agent["name"] - algorithm = config.algorithm["name"] - - # F1 - if agent == AgentsType.F1.value: - # Q-learn - if algorithm == AlgorithmsType.QLEARN.value: - from rl_studio.agents.f1.train_qlearn import F1Trainer + """ + There are many options: + + Tasks: + - Follow_line + - Follow_lane + + Agents: + - F1 (Gazebo) + - robot_mesh + - Mountain car + - Cartpole + - Autoparking (Gazebo) + - AutoCarla (Carla) + - Turtlebot (Gazebo) + + Algorithms: + - qlearn + - DQN + - DDPG + - PPO + + Simulators: + - Gazebo + - OpenAI + - Carla + - SUMO + """ + + agent = config["settings"]["agent"] + algorithm = config["settings"]["algorithm"] + task = config["settings"].get("task") + simulator = config["settings"].get("simulator") + framework = config["settings"].get("framework") + + print_messages( + "TrainerFactory", + task=task, + algorithm=algorithm, + simulator=simulator, + agent=agent, + framework=framework, + ) + + # ============================= + # FollowLine - F1 - qlearn - Gazebo + # ============================= + if ( + task == TasksType.FOLLOWLINEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + and simulator == EnvsType.GAZEBO.value + ): + from rl_studio.agents.f1.train_followline_qlearn_f1_gazebo import ( + TrainerFollowLineQlearnF1Gazebo, + ) - return F1Trainer(config) + + return TrainerFollowLineQlearnF1Gazebo(config) + + # ============================= + # FollowLine - F1 - DDPG - Gazebo - TensorFlow + # ============================= + elif ( + task == TasksType.FOLLOWLINEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.train_followline_ddpg_f1_gazebo_tf import ( + TrainerFollowLineDDPGF1GazeboTF, + ) - # DDPG - elif algorithm == AlgorithmsType.DDPG.value: - from rl_studio.agents.f1.train_ddpg import F1TrainerDDPG + return TrainerFollowLineDDPGF1GazeboTF(config) + + # ============================= + # FollowLine - F1 - DQN - Gazebo - TensorFlow + # ============================= + elif ( + task == TasksType.FOLLOWLINEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm ==
AlgorithmsType.DQN.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.train_followline_dqn_f1_gazebo_tf import ( + TrainerFollowLineDQNF1GazeboTF, + ) - return F1TrainerDDPG(config) + return TrainerFollowLineDQNF1GazeboTF(config) + + # ============================= + # Follow Lane - F1 - qlearn - Gazebo + # ============================= + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + and simulator == EnvsType.GAZEBO.value + ): + from rl_studio.agents.f1.train_followlane_qlearn_f1_gazebo import ( + TrainerFollowLaneQlearnF1Gazebo, + ) - # DQN - elif algorithm == AlgorithmsType.DQN.value: - from rl_studio.agents.f1.train_dqn import DQNF1FollowLineTrainer + return TrainerFollowLaneQlearnF1Gazebo(config) + + # ============================= + # Follow Lane - F1 - DDPG - Gazebo - TF + # ============================= + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.train_followlane_ddpg_f1_gazebo_tf import ( + TrainerFollowLaneDDPGF1GazeboTF, + ) - return DQNF1FollowLineTrainer(config) + return TrainerFollowLaneDDPGF1GazeboTF(config) + + # ============================= + # Follow Lane - F1 - DQN - Gazebo - TF + # ============================= + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DQN.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.train_followlane_dqn_f1_gazebo_tf import ( + TrainerFollowLaneDQNF1GazeboTF, + ) - elif agent == AgentsType.TURTLEBOT.value: - from rl_studio.agents.turtlebot.turtlebot_trainer import TurtlebotTrainer + return TrainerFollowLaneDQNF1GazeboTF(config) + + # ============================= + # Robot Mesh - Qlearn - Gazebo + # ============================= + elif ( + agent == AgentsType.ROBOT_MESH.value + and algorithm == AlgorithmsType.QLEARN.value + ): + from rl_studio.agents.robot_mesh.train_qlearn import ( + QLearnRobotMeshTrainer as RobotMeshTrainer, + ) - return TurtlebotTrainer(config) + return RobotMeshTrainer(config) - elif agent == AgentsType.ROBOT_MESH.value: - if algorithm == AlgorithmsType.QLEARN.value: - from rl_studio.agents.robot_mesh.train_qlearn import ( - QLearnRobotMeshTrainer as RobotMeshTrainer, - ) - elif algorithm == AlgorithmsType.MANUAL.value: - from rl_studio.agents.robot_mesh.manual_pilot import ( - ManualRobotMeshTrainer as RobotMeshTrainer, - ) + # ============================= + # Robot Mesh - Manual + # ============================= + elif ( + agent == AgentsType.ROBOT_MESH.value + and algorithm == AlgorithmsType.MANUAL.value + ): + from rl_studio.agents.robot_mesh.manual_pilot import ( + ManualRobotMeshTrainer as RobotMeshTrainer, + ) return RobotMeshTrainer(config) - elif agent == AgentsType.MOUNTAIN_CAR.value: - if algorithm == AlgorithmsType.QLEARN.value: - from rl_studio.agents.mountain_car.train_qlearn import ( - QLearnMountainCarTrainer as MountainCarTrainer, - ) - elif algorithm == AlgorithmsType.MANUAL.value: - from rl_studio.agents.mountain_car.manual_pilot import ( - ManualMountainCarTrainerr as MountainCarTrainer, - ) + # ============================= + # Mountain Car - Qlearn + # ============================= 
+ elif ( + agent == AgentsType.MOUNTAIN_CAR.value + and algorithm == AlgorithmsType.QLEARN.value + ): + from rl_studio.agents.mountain_car.train_qlearn import ( + QLearnMountainCarTrainer as MountainCarTrainer, + ) return MountainCarTrainer(config) - elif agent == AgentsType.CARTPOLE.value: - if algorithm == AlgorithmsType.DQN.value: - from rl_studio.agents.cartpole.train_dqn import ( - DQNCartpoleTrainer as CartpoleTrainer, - ) - elif algorithm == AlgorithmsType.QLEARN.value: - from rl_studio.agents.cartpole.train_qlearn import ( - QLearnCartpoleTrainer as CartpoleTrainer, - ) - elif algorithm == AlgorithmsType.PPO.value: - from rl_studio.agents.cartpole.train_ppo import ( - PPOCartpoleTrainer as CartpoleTrainer, - ) + + # ============================= + # Mountain Car - Manual + # ============================= + elif ( + agent == AgentsType.MOUNTAIN_CAR.value + and algorithm == AlgorithmsType.MANUAL.value + ): + from rl_studio.agents.mountain_car.manual_pilot import ( + ManualMountainCarTrainerr as MountainCarTrainer, + ) + + return MountainCarTrainer(config) + + # ============================= + # CartPole - DQN + # ============================= + elif ( + agent == AgentsType.CARTPOLE.value and algorithm == AlgorithmsType.DQN.value + ): + from rl_studio.agents.cartpole.train_dqn import ( + DQNCartpoleTrainer as CartpoleTrainer, + ) + return CartpoleTrainer(config) - # AutoParking - elif agent == AgentsType.AUTOPARKING.value: - # DDPG - if algorithm == AlgorithmsType.DDPG.value: - from rl_studio.agents.autoparking.train_ddpg import ( - DDPGAutoparkingTrainer, - ) + # ============================= + # CartPole - Qlearn + # ============================= + elif ( + agent == AgentsType.CARTPOLE.value + and algorithm == AlgorithmsType.QLEARN.value + ): + from rl_studio.agents.cartpole.train_qlearn import ( + QLearnCartpoleTrainer as CartpoleTrainer, + ) - return DDPGAutoparkingTrainer(config) + return CartpoleTrainer(config) - elif algorithm == AlgorithmsType.QLEARN.value: - from rl_studio.agents.autoparking.train_qlearn import ( - QlearnAutoparkingTrainer, - ) + # ============================= + # CartPole - PPO + # ============================= + elif ( + agent == AgentsType.CARTPOLE.value and algorithm == AlgorithmsType.PPO.value + ): + from rl_studio.agents.cartpole.train_ppo import ( + PPOCartpoleTrainer as CartpoleTrainer, + ) + + return CartpoleTrainer(config) + # ============================= + # CartPole - PPO CONTINUOUS + # ============================= + elif ( + agent == AgentsType.CARTPOLE.value and algorithm == AlgorithmsType.PPO_CONTINIUOUS.value + ): + from rl_studio.agents.cartpole.train_ppo_continous import ( + PPOCartpoleTrainer as CartpoleTrainer, + ) - return QlearnAutoparkingTrainer(config) + return CartpoleTrainer(config) + + # ============================= + # CartPole - DDPG + # ============================= + elif ( + agent == AgentsType.CARTPOLE.value and algorithm == AlgorithmsType.DDPG.value and + framework == FrameworksType.PYTORCH.value + ): + from rl_studio.agents.cartpole.train_ddpg import ( + DDPGCartpoleTrainer as CartpoleTrainer, + ) + + return CartpoleTrainer(config) + + + # ============================= + # Autoparking - F1 - DDPG - Gazebo - TF + # ============================= + elif ( + task == TasksType.AUTOPARKINGGAZEBO.value + and agent == AgentsType.AUTOPARKINGGAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from 
rl_studio.agents.autoparking.train_ddpg import ( + TrainerAutoParkingDDPGGazeboTF, + ) + + return TrainerAutoParkingDDPGGazeboTF(config) + + # ============================= + # Autoparking - F1 - Qlearn - Gazebo + # ============================= + elif ( + task == TasksType.AUTOPARKINGGAZEBO.value + and agent == AgentsType.AUTOPARKINGGAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + and simulator == EnvsType.GAZEBO.value + ): + from rl_studio.agents.autoparking.train_ddpg import ( + TrainerAutoParkingQlearnGazebo, + ) + + return TrainerAutoParkingQlearnGazebo(config) + + # ============================= + # Turtlebot - Qlearn - Gazebo + # ============================= + elif agent == AgentsType.TURTLEBOT.value: + from rl_studio.agents.turtlebot.turtlebot_trainer import TurtlebotTrainer + + return TurtlebotTrainer(config) + + # ============================= + # Pendulum - DDPG - Pytorch + # ============================= elif agent == AgentsType.PENDULUM.value: if algorithm == AlgorithmsType.DDPG_TORCH.value: from rl_studio.agents.pendulum.train_ddpg import ( DDPGPendulumTrainer as PendulumTrainer, ) - return PendulumTrainer(config) + return PendulumTrainer(config) + + elif algorithm == AlgorithmsType.PPO_CONTINIUOUS.value: + from rl_studio.agents.pendulum.train_ppo import ( + PPOPendulumTrainer as PendulumTrainer, + ) + return PendulumTrainer(config) + else: raise NoValidTrainingType(agent) -class InferenceExecutorFactory: +class InferencerFactory: def __new__(cls, config): - agent = config.agent["name"] - algorithm = config.algorithm["name"] + agent = config["settings"]["agent"] + algorithm = config["settings"]["algorithm"] + task = config["settings"].get("task") + simulator = config["settings"].get("simulator") + framework = config["settings"].get("framework") + print_messages( + "InferenceExecutorFactory", + task=task, + algorithm=algorithm, + simulator=simulator, + agent=agent, + framework=framework, + ) + + # ============================= + # FollowLine - F1 - qlearn - Gazebo + # ============================= + if ( + task == TasksType.FOLLOWLINEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + and simulator == EnvsType.GAZEBO.value + ): + from rl_studio.agents.f1.inference_followline_qlearn_f1_gazebo import ( + InferencerFollowLineQlearnF1Gazebo, + ) + + return InferencerFollowLineQlearnF1Gazebo(config) + + # ============================= + # FollowLine - F1 - DDPG - Gazebo - TensorFlow + # ============================= + elif ( + task == TasksType.FOLLOWLINEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.inference_followline_ddpg_f1_gazebo_tf import ( + InferencerFollowLineDDPGF1GazeboTF, + ) - if agent == AgentsType.ROBOT_MESH.value: + return InferencerFollowLineDDPGF1GazeboTF(config) + + # ============================= + # FollowLine - F1 - DQN - Gazebo - TensorFlow + # ============================= + elif ( + task == TasksType.FOLLOWLINEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DQN.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.inference_followline_dqn_f1_gazebo_tf import ( + InferencerFollowLineDQNF1GazeboTF, + ) + + return InferencerFollowLineDQNF1GazeboTF(config) + + # ============================= + # Follow Lane - F1 - qlearn - 
Gazebo + # ============================= + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + and simulator == EnvsType.GAZEBO.value + ): + from rl_studio.agents.f1.inference_followlane_qlearn_f1_gazebo import ( + InferencerFollowLaneQlearnF1Gazebo, + ) + + return InferencerFollowLaneQlearnF1Gazebo(config) + + # ============================= + # Follow Lane - F1 - DDPG - Gazebo - TF + # ============================= + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.inference_followlane_ddpg_f1_gazebo_tf import ( + InferencerFollowLaneDDPGF1GazeboTF, + ) + + return InferencerFollowLaneDDPGF1GazeboTF(config) + + # ============================= + # Follow Lane - F1 - DQN - Gazebo - TF + # ============================= + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and agent == AgentsType.F1GAZEBO.value + and algorithm == AlgorithmsType.DQN.value + and simulator == EnvsType.GAZEBO.value + and framework == FrameworksType.TF.value + ): + from rl_studio.agents.f1.inference_followlane_dqn_f1_gazebo_tf import ( + InferencerFollowLaneDQNF1GazeboTF, + ) + + return InferencerFollowLaneDQNF1GazeboTF(config) + + # ============================= + # Robot Mesh - Qlearn - Gazebo + # ============================= + elif agent == AgentsType.ROBOT_MESH.value: from rl_studio.agents.robot_mesh.inference_qlearn import ( QLearnRobotMeshInferencer, ) return QLearnRobotMeshInferencer(config) - elif agent == AgentsType.F1.value: - from rl_studio.agents.f1.inference_qlearn import F1Inferencer - - return F1Inferencer(config) - - # elif agent == AgentsType.TURTLEBOT.value: - # from rl_studio.agents.turtlebot.turtlebot_Inferencer import TurtlebotInferencer - # - # return TurtlebotInferencer(config) - # - # - # + # ============================= + # CartPole + # ============================= elif agent == AgentsType.CARTPOLE.value: if algorithm == AlgorithmsType.DQN.value: from rl_studio.agents.cartpole.inference_dqn import ( DQNCartpoleInferencer as CartpoleInferencer, ) - elif algorithm == AlgorithmsType.QLEARN.value: - from rl_studio.agents.cartpole.inference_qlearn import ( - QLearnCartpoleInferencer as CartpoleInferencer, - ) elif algorithm == AlgorithmsType.PPO.value: from rl_studio.agents.cartpole.inference_ppo import ( PPOCartpoleInferencer as CartpoleInferencer, ) + elif algorithm == AlgorithmsType.PPO_CONTINIUOUS.value: + from rl_studio.agents.cartpole.inference_ppo_continuous import ( + PPOCartpoleInferencer as CartpoleInferencer, + ) + elif algorithm == AlgorithmsType.DDPG.value and framework == FrameworksType.PYTORCH.value: + from rl_studio.agents.cartpole.inference_ddpg import ( + DDPGCartpoleInferencer as CartpoleInferencer, + ) elif algorithm == AlgorithmsType.PROGRAMMATIC.value: from rl_studio.agents.cartpole.inference_no_rl import ( NoRLCartpoleInferencer as CartpoleInferencer, ) + else: + from rl_studio.agents.cartpole.inference_qlearn import ( + QLearnCartpoleInferencer as CartpoleInferencer, + ) return CartpoleInferencer(config) + # ============================= + # Mountain Car - Qlearn + # ============================= elif agent == AgentsType.MOUNTAIN_CAR.value: from rl_studio.agents.mountain_car.inference_qlearn import ( - MountainCarInferencer, + QLearnMountainCarInferencer, ) - return 
MountainCarInferencer(config) + return QLearnMountainCarInferencer(config) + # ============================= + # Pendulum - DDPG - Pytorch + # ============================= elif agent == AgentsType.PENDULUM.value: - from rl_studio.agents.pendulum.inference_ddpg import ( - DDPGPendulumInferencer as PendulumInferencer, - ) + if algorithm == AlgorithmsType.DDPG_TORCH.value: + from rl_studio.agents.pendulum.inference_ddpg import ( + DDPGPendulumInferencer as PendulumInferencer, + ) + + return PendulumInferencer(config) + + elif algorithm == AlgorithmsType.PPO_CONTINIUOUS.value: + from rl_studio.agents.pendulum.inference_ppo import ( + PPOPendulumInferencer as PendulumInferencer, + ) - return PendulumInferencer(config) + return PendulumInferencer(config) else: raise NoValidTrainingType(agent) diff --git a/rl_studio/agents/agents_type.py b/rl_studio/agents/agents_type.py index 3e7e83ac4..3e1641e89 100644 --- a/rl_studio/agents/agents_type.py +++ b/rl_studio/agents/agents_type.py @@ -3,9 +3,11 @@ class AgentsType(Enum): F1 = "f1" + F1GAZEBO = "f1" TURTLEBOT = "turtlebot" ROBOT_MESH = "robot_mesh" MOUNTAIN_CAR = "mountain_car" CARTPOLE = "cartpole" - AUTOPARKING = "autoparking" PENDULUM = "pendulum" + AUTOPARKINGGAZEBO = "autoparkingRL" + AUTOCARLA = "auto_carla" diff --git a/rl_studio/agents/cartpole/cartpole_Inferencer.py b/rl_studio/agents/cartpole/cartpole_Inferencer.py new file mode 100644 index 000000000..641db2999 --- /dev/null +++ b/rl_studio/agents/cartpole/cartpole_Inferencer.py @@ -0,0 +1,107 @@ +import datetime +import gc + +import gym + +import logging +from tqdm import tqdm + +import torch +class CartpoleInferencer: + def __init__(self, params): + self.now = datetime.datetime.now() + # self.environment params + self.params = params + self.environment_params = params["environments"] + self.env_name = self.environment_params["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] + + if self.config["logging_level"] == "debug": + self.LOGGING_LEVEL = logging.DEBUG + elif self.config["logging_level"] == "error": + self.LOGGING_LEVEL = logging.ERROR + elif self.config["logging_level"] == "critical": + self.LOGGING_LEVEL = logging.CRITICAL + else: + self.LOGGING_LEVEL = logging.INFO + + self.experiments = self.environment_params.get("experiments", 1) + self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.FIRST_RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.RANDOM_PERTURBATIONS_LEVEL_STEP = self.environment_params.get("random_perturbations_level_step", 0.1) + self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + self.FIRST_PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + self.PERTURBATIONS_INTENSITY_STD_STEP = self.environment_params.get("perturbations_intensity_std_step", 2) + self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) + self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) + self.FIRST_INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) + self.INITIAL_POLE_ANGLE_STEP = self.environment_params.get("initial_pole_angle_steps", 0.1) + + self.non_recoverable_angle = self.environment_params[ + "non_recoverable_angle" + ] + + self.RUNS = self.environment_params["runs"] + self.SHOW_EVERY = self.environment_params[ + "show_every" + ] + self.UPDATE_EVERY = self.environment_params[ + "update_every" 
+ ] # How often the current progress is recorded + torch.no_grad() + + # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. + # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) + self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, + initial_pole_angle=self.INITIAL_POLE_ANGLE, + non_recoverable_angle=self.non_recoverable_angle) + def main(self): + + if self.experiments > 1: + # self.PERTURBATIONS_INTENSITY_STD = 0 + # self.RANDOM_PERTURBATIONS_LEVEL = 0 + # self.INITIAL_POLE_ANGLE = 0 + # self.run_experiment() + self.PERTURBATIONS_INTENSITY_STD = self.FIRST_PERTURBATIONS_INTENSITY_STD + self.RANDOM_PERTURBATIONS_LEVEL = self.FIRST_RANDOM_PERTURBATIONS_LEVEL + self.INITIAL_POLE_ANGLE = self.FIRST_INITIAL_POLE_ANGLE + self.run_experiment() + # First base experiment, then perturbation experiments, then frequency and then initial angle + for experiment in tqdm(range(self.experiments)): + self.PERTURBATIONS_INTENSITY_STD += self.PERTURBATIONS_INTENSITY_STD_STEP + self.RANDOM_PERTURBATIONS_LEVEL = self.FIRST_RANDOM_PERTURBATIONS_LEVEL + self.INITIAL_POLE_ANGLE = self.FIRST_INITIAL_POLE_ANGLE + self.run_experiment() + torch.cuda.empty_cache() + + print(f"finished intensity experiment {experiment}") + # for experiment in tqdm(range(self.experiments)): + # self.PERTURBATIONS_INTENSITY_STD = self.FIRST_PERTURBATIONS_INTENSITY_STD + # self.RANDOM_PERTURBATIONS_LEVEL += self.RANDOM_PERTURBATIONS_LEVEL_STEP + # self.INITIAL_POLE_ANGLE = self.FIRST_INITIAL_POLE_ANGLE + # self.run_experiment() + # torch.cuda.empty_cache() + # + # print(f"finished frequency experiment {experiment}") + # for experiment in tqdm(range(self.experiments)): + # self.PERTURBATIONS_INTENSITY_STD = 0 + # self.RANDOM_PERTURBATIONS_LEVEL = 0 + # self.INITIAL_POLE_ANGLE += self.INITIAL_POLE_ANGLE_STEP + # self.RUNS = 10 + # + # if self.INITIAL_POLE_ANGLE > 0.9: + # exit(0) + # # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
+ # self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, + # initial_pole_angle=self.INITIAL_POLE_ANGLE, + # non_recoverable_angle=0.9) + # + # self.run_experiment() + # torch.cuda.empty_cache() + # + # print(f"finished init angle experiment {experiment}") + # else: + self.run_experiment() + + diff --git a/rl_studio/agents/cartpole/inference_ddpg.py b/rl_studio/agents/cartpole/inference_ddpg.py new file mode 100644 index 000000000..71bcd3862 --- /dev/null +++ b/rl_studio/agents/cartpole/inference_ddpg.py @@ -0,0 +1,149 @@ +import datetime +import time +import random + +import gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm + +import logging + +from rl_studio.agents.cartpole.cartpole_Inferencer import CartpoleInferencer +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.cartpole.utils import store_array, show_fails_success_comparisson +from rl_studio.wrappers.inference_rlstudio import InferencerWrapper + + +class DDPGCartpoleInferencer(CartpoleInferencer): + def __init__(self, params): + super().__init__(params); + self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ + "block_experience_batch" + ] + + self.actions = self.env.action_space + + self.losses_list, self.reward_list, self.states_list, self.episode_len_list, self.epsilon_list = ( + [], + [], + [], + [], + [], + ) # metrics + # recorded for graph + + inference_file = params["inference"]["inference_file"] + # TODO the first parameter (algorithm) should come from configuration + self.inferencer = InferencerWrapper("ddpg", inference_file, env=self.env) + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + def gather_statistics(self, losses, ep_len, episode_rew, states): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.episode_len_list.append(ep_len) + self.states_list.append(states) + + # def final_demonstration(self): + # for i in tqdm(range(2)): + # obs, done, rew = self.env.reset(), False, 0 + # while not done: + # obs = np.append(obs, -1) + # A = self.deepq.get_action(obs, self.env.action_space.n, epsilon=0) + # obs, reward, done, info = self.env.step(A.item()) + # rew += reward + # time.sleep(0.01) + # self.env.render() + # logging.info("\ndemonstration episode : {}, reward : {}".format(i, rew)) + + def run_experiment(self): + epoch_start_time = datetime.datetime.now() + + logs_dir = 'logs/cartpole/ddpg/inference/' + logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + logging.info(LETS_GO) + total_reward_in_epoch = 0 + episode_rewards = [] + global_steps = 0 + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + total_secs=0 + self.reward_list = [] + self.states_list = [] + + for run in tqdm(range(self.RUNS)): + states = [] + + state, done, prev_prob_act, ep_len, episode_rew = self.env.reset(), False, None, 0, 0 + while not 
done: + actor_loss = None + states.append(state[2]) + + ep_len += 1 + global_steps += 1 + if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + if self.RANDOM_PERTURBATIONS_LEVEL > 1 and random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL - 1: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + + action = self.inferencer.inference(state) + next_state, reward, done, info = self.env.step(action) + total_secs+=info["time"] + + episode_rew += reward + total_reward_in_epoch += reward + state = next_state + + w.add_scalar("reward/episode_reward", episode_rew, global_step=run) + episode_rewards.append(episode_rew) + + if run % self.SHOW_EVERY == 0 and run != 0: + self.env.render() + + self.gather_statistics(actor_loss, ep_len, episode_rew, states) + + # monitor progress + if (run+1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_iter {3}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent), avgsecs) + logging.info(updates_message) + print(updates_message) + + total_reward_in_epoch = 0 + + # self.final_demonstration() + base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.reward_list, file_path) + base_file_name = f'_states_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.states_list, file_path) + # plt.plot(self.reward_list) + # plt.legend("reward per episode") + # plt.show() + + diff --git a/rl_studio/agents/cartpole/inference_dqn.py b/rl_studio/agents/cartpole/inference_dqn.py index 3f45a2371..e1c2adcc9 100755 --- a/rl_studio/agents/cartpole/inference_dqn.py +++ b/rl_studio/agents/cartpole/inference_dqn.py @@ -7,7 +7,8 @@ import logging import numpy as np -from rl_studio.agents.cartpole.utils import store_rewards, show_fails_success_comparisson +from rl_studio.agents.cartpole.cartpole_Inferencer import CartpoleInferencer +from rl_studio.agents.cartpole.utils import store_array, show_fails_success_comparisson from rl_studio.wrappers.inference_rlstudio import InferencerWrapper from tqdm import tqdm @@ -15,53 +16,17 @@ from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO -class DQNCartpoleInferencer: +class DQNCartpoleInferencer(CartpoleInferencer): def __init__(self, params): - - self.now = datetime.datetime.now() - # self.environment params - self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.config = params.settings["params"] - self.agent_config = params.agent["params"] - - if self.config["logging_level"] == "debug": - self.LOGGING_LEVEL = logging.DEBUG - elif 
self.config["logging_level"] == "error": - self.LOGGING_LEVEL = logging.ERROR - elif self.config["logging_level"] == "critical": - self.LOGGING_LEVEL = logging.CRITICAL - else: - self.LOGGING_LEVEL = logging.INFO - - self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) - self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) - self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) - self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) - - # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. - # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) - non_recoverable_angle = self.environment_params[ - "non_recoverable_angle" - ] - self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, initial_pole_angle=self.INITIAL_POLE_ANGLE, - non_recoverable_angle=non_recoverable_angle) - - self.RUNS = self.environment_params["runs"] - self.SHOW_EVERY = self.environment_params[ - "show_every" - ] - self.UPDATE_EVERY = self.environment_params[ - "update_every" - ] # How oftern the current progress is recorded + super().__init__(params); self.OBJECTIVE = self.environment_params[ "objective_reward" ] self.actions = self.env.action_space.n - self.losses_list, self.reward_list, self.episode_len_list, self.epsilon_list = ( + self.losses_list, self.reward_list, self.states_list, self.episode_len_list, self.epsilon_list = ( + [], [], [], [], @@ -69,7 +34,7 @@ def __init__(self, params): ) # metrics recorded for graph self.epsilon = 0 - inference_file = params.inference["params"]["inference_file"] + inference_file = params["inference"]["inference_file"] # TODO the first parameter (algorithm) should come from configuration self.inferencer = InferencerWrapper("dqn", inference_file, env=self.env) @@ -79,7 +44,7 @@ def print_init_info(self): logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") logging.info(f"\t- self.environment params:\n{self.environment_params}") - def main(self): + def run_experiment(self): epoch_start_time = datetime.datetime.now() logs_dir = 'logs/cartpole/dqn/inference/' @@ -93,35 +58,47 @@ def main(self): unsuccessful_episodes_count = 0 episodes_rewards = [] + self.states_list = [] logging.info(LETS_GO) total_reward_in_epoch = 0 + total_secs=0 for run in tqdm(range(self.RUNS)): obs, done, rew = self.env.reset(), False, 0 + states = [] while not done: + states.append(obs[2]) if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: perturbation_action = random.randrange(self.env.action_space.n) obs, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) logging.info("perturbated in step {} with action {}".format(rew, perturbation_action)) + if self.RANDOM_PERTURBATIONS_LEVEL > 1 and random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL - 1: + perturbation_action = random.randrange(2) + obs, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + logging.debug("perturbated in step {} with action {}".format(rew, perturbation_action)) + A = self.inferencer.inference(obs) obs, reward, done, info = self.env.step(A.item()) + total_secs+=info["time"] rew += reward total_reward_in_epoch += reward - if run % self.SHOW_EVERY == 0: + if run % self.SHOW_EVERY == 0 and run != 0: self.env.render() # monitor progress episodes_rewards.append(rew) - + self.states_list.append(states) if (run+1) 
% self.UPDATE_EVERY == 0: time_spent = datetime.datetime.now() - epoch_start_time epoch_start_time = datetime.datetime.now() - updates_message = 'Run: {0} Average: {1} time spent {2}'.format(run, + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_time_iter {3}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, - str(time_spent)) + str(time_spent), avgsecs) logging.info(updates_message) print(updates_message) total_reward_in_epoch = 0 @@ -132,7 +109,10 @@ def main(self): logging.info(f'unsuccessful episodes => {unsuccessful_episodes_count}') base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' file_path = f'./logs/cartpole/dqn/inference/{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(episodes_rewards, file_path) - show_fails_success_comparisson(self.RUNS, self.OBJECTIVE, episodes_rewards, - self.RANDOM_START_LEVEL, self.RANDOM_PERTURBATIONS_LEVEL, - self.PERTURBATIONS_INTENSITY_STD, self.INITIAL_POLE_ANGLE); + store_array(episodes_rewards, file_path) + base_file_name = f'_states_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.states_list, file_path) + # show_fails_success_comparisson(self.RUNS, self.OBJECTIVE, episodes_rewards, + # self.RANDOM_START_LEVEL, self.RANDOM_PERTURBATIONS_LEVEL, + # self.PERTURBATIONS_INTENSITY_STD, self.INITIAL_POLE_ANGLE); diff --git a/rl_studio/agents/cartpole/inference_no_rl.py b/rl_studio/agents/cartpole/inference_no_rl.py index 8c5b2758c..7290ea66f 100755 --- a/rl_studio/agents/cartpole/inference_no_rl.py +++ b/rl_studio/agents/cartpole/inference_no_rl.py @@ -4,42 +4,16 @@ import matplotlib.pyplot as plt import numpy as np +from rl_studio.agents.cartpole.cartpole_Inferencer import CartpoleInferencer from rl_studio.visual.ascii.images import JDEROBOT_LOGO from rl_studio.visual.ascii.text import JDEROBOT, QLEARN_CAMERA, LETS_GO -from rl_studio.agents.cartpole.utils import store_rewards +from rl_studio.agents.cartpole.utils import store_array import random -class NoRLCartpoleInferencer: +class NoRLCartpoleInferencer(CartpoleInferencer): def __init__(self, params): - # TODO: Create a pydantic metaclass to simplify the way we extract the params - # environment params - self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) - self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) - self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) - self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) - - # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
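
This patch removes the duplicated constructors from the per-algorithm inferencers and replaces them with `super().__init__(params)` on a new `CartpoleInferencer` base class. That base class (`cartpole_Inferencer.py`) is not part of this hunk, so the following is only a hypothetical reconstruction, pieced together from the lines removed here, of what the shared constructor would centralize; the exact names and params layout are assumptions.

```python
# Hypothetical sketch of the shared base constructor (cartpole_Inferencer.py is not
# shown in this diff); it centralizes the setup each inferencer used to repeat.
import datetime
import logging

import gym


class CartpoleInferencer:
    def __init__(self, params):
        self.now = datetime.datetime.now()
        self.params = params
        self.environment_params = params["environments"]
        self.env_name = self.environment_params["env_name"]
        self.config = params["settings"]

        # map the configured logging level string to the logging constant
        levels = {"debug": logging.DEBUG, "error": logging.ERROR, "critical": logging.CRITICAL}
        self.LOGGING_LEVEL = levels.get(self.config.get("logging_level"), logging.INFO)

        self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0)
        self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0)
        self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0)
        self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None)

        # the customized CartPole env accepts the start/perturbation settings directly
        self.env = gym.make(
            self.env_name,
            random_start_level=self.RANDOM_START_LEVEL,
            initial_pole_angle=self.INITIAL_POLE_ANGLE,
            non_recoverable_angle=self.environment_params["non_recoverable_angle"],
        )

        self.RUNS = self.environment_params["runs"]
        self.SHOW_EVERY = self.environment_params["show_every"]
        self.UPDATE_EVERY = self.environment_params["update_every"]
```
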
- # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) - non_recoverable_angle = self.environment_params[ - "non_recoverable_angle" - ] - self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, - initial_pole_angle=self.INITIAL_POLE_ANGLE, - non_recoverable_angle=non_recoverable_angle) - self.RUNS = self.environment_params[ - "runs" - ] # Number of iterations run TODO set this from config.yml - self.SHOW_EVERY = self.environment_params[ - "show_every" - ] # How oftern the current solution is rendered TODO set this from config.yml - self.UPDATE_EVERY = self.environment_params[ - "update_every" - ] # How oftern the current progress is recorded TODO set this from config.yml - + super().__init__(params); self.previousCnt = [] # array of all scores over runs self.metrics = { "ep": [], @@ -56,7 +30,13 @@ def __init__(self, params): self.env.done = True self.total_episodes = 20000 - + self.losses_list, self.reward_list, self.states_list, self.episode_len_list, self.epsilon_list = ( + [], + [], + [], + [], + [], + ) # metrics recorded for graph def print_init_info(self): print(JDEROBOT) print(JDEROBOT_LOGO) @@ -75,11 +55,16 @@ def evaluate_from_step(self, state): # Execute the action and get feedback next_state, reward, done, info = self.env.step(action) + # updates_message = 'avg control iter time = {0}'.format(info["time"]) + # print(updates_message) + return next_state, done - def main(self): + def run_experiment(self): self.print_init_info() + self.reward_list = [] + self.states_list = [] start_time = datetime.datetime.now() start_time_format = start_time.strftime("%Y%m%d_%H%M") @@ -90,30 +75,42 @@ def main(self): state = self.env.reset() done = False # has the enviroment finished? cnt = 0 # how may movements cart has made + states = [] while not done: cnt += 1 + states.append(state[2]) - if run % self.SHOW_EVERY == 0: + if run % self.SHOW_EVERY == 0 and run != 0: self.env.render() # if running RL comment this oustatst if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: perturbation_action = random.randrange(self.env.action_space.n) obs, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + if self.RANDOM_PERTURBATIONS_LEVEL > 1 and random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL - 1: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) next_state, done = self.evaluate_from_step(state) + + if not done: state = next_state # Add new metrics for graph - self.metrics["ep"].append(run) - self.metrics["avg"].append(cnt) - + self.episode_len_list.append(run) + self.reward_list.append(cnt) + self.states_list.append(states) self.env.close() - base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' - file_path = f'./logs/cartpole/no_rl/inference/{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(self.metrics["avg"], file_path) + logs_dir = './logs/cartpole/no_rl/inference/' + + base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.reward_list, file_path) + base_file_name = 
f'_states_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.states_list, file_path) # Plot graph - plt.plot(self.metrics["ep"], self.metrics["avg"], label="average rewards") - plt.legend(loc=4) - plt.show() + # plt.plot(self.episode_len_list, self.reward_list, label="average rewards") + # plt.legend(loc=4) + # plt.show() diff --git a/rl_studio/agents/cartpole/inference_ppo.py b/rl_studio/agents/cartpole/inference_ppo.py index 254cd61ac..6cbadbb2a 100644 --- a/rl_studio/agents/cartpole/inference_ppo.py +++ b/rl_studio/agents/cartpole/inference_ppo.py @@ -12,71 +12,34 @@ import logging from rl_studio.agents.cartpole import utils +from rl_studio.agents.cartpole.cartpole_Inferencer import CartpoleInferencer from rl_studio.algorithms.ppo import Actor, Critic, Mish, t, get_dist from rl_studio.visual.ascii.images import JDEROBOT_LOGO from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO -from rl_studio.agents.cartpole.utils import store_rewards, show_fails_success_comparisson +from rl_studio.agents.cartpole.utils import store_array, show_fails_success_comparisson from rl_studio.wrappers.inference_rlstudio import InferencerWrapper -class PPOCartpoleInferencer: +class PPOCartpoleInferencer(CartpoleInferencer): def __init__(self, params): - - self.now = datetime.datetime.now() - # self.environment params - self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.config = params.settings["params"] - self.agent_config = params.agent["params"] - - if self.config["logging_level"] == "debug": - self.LOGGING_LEVEL = logging.DEBUG - elif self.config["logging_level"] == "error": - self.LOGGING_LEVEL = logging.ERROR - elif self.config["logging_level"] == "critical": - self.LOGGING_LEVEL = logging.CRITICAL - else: - self.LOGGING_LEVEL = logging.INFO - - self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) - self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) - self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) - self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) - - non_recoverable_angle = self.environment_params[ - "non_recoverable_angle" - ] - # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
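
The perturbation branch added throughout these loops applies one random push with probability `RANDOM_PERTURBATIONS_LEVEL` and, when the level exceeds 1, a second push with probability `level - 1`, so the level behaves as the expected number of pushes per control step for values between 0 and 2. A minimal, self-contained sketch of that scheduling (independent of the project's `env.perturbate` method) is shown below.

```python
# Independent sketch of the perturbation scheduling used in these loops: with level L,
# one push fires with probability min(L, 1) and, when L > 1, a second push with
# probability L - 1, so L is the expected number of pushes per step for 0 <= L <= 2.
import random


def perturbations_this_step(level):
    count = 1 if random.uniform(0, 1) < level else 0
    if level > 1 and random.uniform(0, 1) < level - 1:
        count += 1
    return count


# quick check of the expected value for level = 1.3
samples = [perturbations_this_step(1.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~1.3
```
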
- # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) - self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, initial_pole_angle=self.INITIAL_POLE_ANGLE, - non_recoverable_angle=non_recoverable_angle) - - self.RUNS = self.environment_params["runs"] - self.SHOW_EVERY = self.environment_params[ - "show_every" - ] - self.UPDATE_EVERY = self.environment_params[ - "update_every" - ] # How often the current progress is recorded + super().__init__(params); self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ "block_experience_batch" ] - self.actions = self.env.action_space.n + self.actions = self.env.action_space - self.losses_list, self.reward_list, self.episode_len_list, self.epsilon_list = ( + self.losses_list, self.reward_list, self.states_list, self.episode_len_list, self.epsilon_list = ( + [], [], [], [], [], - ) # metrics + ) # metrics recorded for graph # recorded for graph - self.epsilon = params.algorithm["params"]["epsilon"] - self.GAMMA = params.algorithm["params"]["gamma"] - self.NUMBER_OF_EXPLORATION_STEPS = 128 + self.epsilon = params["algorithm"]["epsilon"] - inference_file = params.inference["params"]["inference_file"] + inference_file = params["inference"]["inference_file"] # TODO the first parameter (algorithm) should come from configuration self.inferencer = InferencerWrapper("ppo", inference_file, env=self.env) @@ -86,13 +49,13 @@ def print_init_info(self): logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") logging.info(f"\t- self.environment params:\n{self.environment_params}") - def gather_statistics(self, losses, ep_len, episode_rew): + def gather_statistics(self, losses, ep_len, episode_rew, state): if losses is not None: self.losses_list.append(losses / ep_len) self.reward_list.append(episode_rew) self.episode_len_list.append(ep_len) self.epsilon_list.append(self.epsilon) - + self.states_list.append(state) # def final_demonstration(self): # for i in tqdm(range(2)): # obs, done, rew = self.env.reset(), False, 0 @@ -105,8 +68,10 @@ def gather_statistics(self, losses, ep_len, episode_rew): # self.env.render() # logging.info("\ndemonstration episode : {}, reward : {}".format(i, rew)) - def main(self): + def run_experiment(self): epoch_start_time = datetime.datetime.now() + self.reward_list = [] + self.reward_list = [] logs_dir = 'logs/cartpole/ppo/inference/' logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( @@ -123,11 +88,15 @@ def main(self): episode_rewards = [] global_steps = 0 w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + total_secs=0 + self.reward_list = [] for run in tqdm(range(self.RUNS)): + states = [] state, done, prev_prob_act, ep_len, episode_rew = self.env.reset(), False, None, 0, 0 while not done: actor_loss = None + states.append(state[2]) ep_len += 1 global_steps += 1 @@ -135,9 +104,13 @@ def main(self): perturbation_action = random.randrange(self.env.action_space.n) state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + if self.RANDOM_PERTURBATIONS_LEVEL > 1 and random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL - 1: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) action = self.inferencer.inference(state) next_state, reward, done, info = self.env.step(action.detach().data.numpy()) + 
total_secs+=info["time"] episode_rew += reward total_reward_in_epoch += reward @@ -146,17 +119,19 @@ def main(self): w.add_scalar("reward/episode_reward", episode_rew, global_step=run) episode_rewards.append(episode_rew) - if run % self.SHOW_EVERY == 0: + if run % self.SHOW_EVERY == 0 and run != 0: self.env.render() - self.gather_statistics(actor_loss, ep_len, episode_rew) + self.gather_statistics(actor_loss, ep_len, episode_rew, states) # monitor progress if (run+1) % self.UPDATE_EVERY == 0: time_spent = datetime.datetime.now() - epoch_start_time epoch_start_time = datetime.datetime.now() - updates_message = 'Run: {0} Average: {1} time spent {2}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, - str(time_spent)) + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_iter {3}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent), avgsecs) logging.info(updates_message) print(updates_message) @@ -165,9 +140,11 @@ def main(self): # self.final_demonstration() base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(self.reward_list, file_path) - plt.plot(self.reward_list) - plt.legend("reward per episode") - plt.show() - + store_array(self.reward_list, file_path) + base_file_name = f'_states_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.states_list, file_path) + # plt.plot(self.reward_list) + # plt.legend("reward per episode") + # plt.show() diff --git a/rl_studio/agents/cartpole/inference_ppo_continuous.py b/rl_studio/agents/cartpole/inference_ppo_continuous.py new file mode 100644 index 000000000..5041bfbff --- /dev/null +++ b/rl_studio/agents/cartpole/inference_ppo_continuous.py @@ -0,0 +1,148 @@ +import datetime +import time +import random + +import gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm + +import logging + +from rl_studio.agents.cartpole.cartpole_Inferencer import CartpoleInferencer +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.cartpole.utils import store_array, show_fails_success_comparisson +from rl_studio.wrappers.inference_rlstudio import InferencerWrapper + + +class PPOCartpoleInferencer(CartpoleInferencer): + def __init__(self, params): + super().__init__(params); + self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ + "block_experience_batch" + ] + + self.actions = self.env.action_space + + self.losses_list, self.reward_list, self.states_list, self.episode_len_list, self.epsilon_list = ( + [], + [], + [], + [], + [], + ) # metrics recorded for graph + # recorded for graph + self.epsilon = params["algorithm"]["epsilon"] + + inference_file = params["inference"]["inference_file"] + # TODO the first parameter (algorithm) should come from configuration + self.inferencer = InferencerWrapper("ppo_continuous", inference_file, env=self.env) + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + 
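
Besides the per-episode rewards, every inferencer in this patch now records the pole angle (`state[2]`) at each step and writes both traces to disk with `store_array`. Assuming `store_array` simply pickles the list it receives (the companion `utils` module is not shown in this hunk), the dumps can be read back for offline analysis as sketched below; the file names are placeholders.

```python
# Hypothetical reader for the traces written with store_array, assuming store_array
# pickles a plain Python list; the paths below are placeholders.
import pickle


def load_array(file_path):
    with open(file_path, "rb") as f:
        return pickle.load(f)


rewards = load_array("logs/cartpole/ppo_continuous/inference/example_rewards.pkl")  # one total reward per episode
angles = load_array("logs/cartpole/ppo_continuous/inference/example_states.pkl")    # one pole-angle trace per episode
print(len(rewards), len(angles[0]))
```
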
def gather_statistics(self, losses, ep_len, episode_rew, state): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.states_list.append(state) + # self.episode_len_list.append(ep_len) + # self.epsilon_list.append(self.epsilon) + + # def final_demonstration(self): + # def final_demonstration(self): + # for i in tqdm(range(2)): + # obs, done, rew = self.env.reset(), False, 0 + # while not done: + # obs = np.append(obs, -1) + # A = self.deepq.get_action(obs, self.env.action_space.n, epsilon=0) + # obs, reward, done, info = self.env.step(A.item()) + # rew += reward + # time.sleep(0.01) + # self.env.render() + # logging.info("\ndemonstration episode : {}, reward : {}".format(i, rew)) + + def run_experiment(self): + epoch_start_time = datetime.datetime.now() + self.reward_list = [] + self.reward_list = [] + + logs_dir = 'logs/cartpole/ppo_continuous/inference/' + logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + logging.info(LETS_GO) + total_reward_in_epoch = 0 + episode_rewards = [] + global_steps = 0 + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + total_secs=0 + + for run in tqdm(range(self.RUNS)): + states = [] + state, done, prev_prob_act, ep_len, episode_rew = self.env.reset(), False, None, 0, 0 + while not done: + actor_loss = None + states.append(state[2]) + ep_len += 1 + global_steps += 1 + if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + if self.RANDOM_PERTURBATIONS_LEVEL > 1 and random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL - 1: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + + action = self.inferencer.inference(state) + next_state, reward, done, info = self.env.step(action) + total_secs += info["time"] + + episode_rew += reward + total_reward_in_epoch += reward + state = next_state + + # w.add_scalar("reward/episode_reward", episode_rew, global_step=run) + episode_rewards.append(episode_rew) + + if run % self.SHOW_EVERY == 0 and run != 0: + self.env.render() + + self.gather_statistics(actor_loss, ep_len, episode_rew, states) + + # monitor progress + if (run+1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_iter {3}'.format(run, + total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent), avgsecs) + logging.info(updates_message) + print(updates_message) + + total_reward_in_epoch = 0 + + # self.final_demonstration() + base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' 
+ store_array(self.reward_list, file_path) + base_file_name = f'_states_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.states_list, file_path) + # plt.plot(self.reward_list) + # plt.legend("reward per episode") + # plt.show() + + diff --git a/rl_studio/agents/cartpole/inference_qlearn.py b/rl_studio/agents/cartpole/inference_qlearn.py index 8af8fc39e..4fea33d00 100755 --- a/rl_studio/agents/cartpole/inference_qlearn.py +++ b/rl_studio/agents/cartpole/inference_qlearn.py @@ -5,45 +5,20 @@ import numpy as np from rl_studio.agents.cartpole import utils +from rl_studio.agents.cartpole.cartpole_Inferencer import CartpoleInferencer from rl_studio.visual.ascii.images import JDEROBOT_LOGO from rl_studio.visual.ascii.text import JDEROBOT, QLEARN_CAMERA, LETS_GO from rl_studio.wrappers.inference_rlstudio import InferencerWrapper -from rl_studio.agents.cartpole.utils import store_rewards, show_fails_success_comparisson +from rl_studio.agents.cartpole.utils import store_array, show_fails_success_comparisson import random -class QLearnCartpoleInferencer: +class QLearnCartpoleInferencer(CartpoleInferencer): def __init__(self, params): - # TODO: Create a pydantic metaclass to simplify the way we extract the params - # environment params - self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) - self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) - self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) - self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) - - # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
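
The Q-learning inferencer still relies on `utils.create_bins_and_q_table` and `utils.get_discrete_state` to map the continuous CartPole observation onto a tabular index using `angle_bins` and `pos_bins`. Those utilities are outside this hunk; the snippet below is only an illustrative sketch of that kind of discretization, not the project's implementation.

```python
# Illustrative discretization sketch: each continuous observation dimension is mapped
# to a bin index so the tuple can address a tabular Q-table.
import numpy as np


def discretize(obs, bins_per_dim):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(
        int(np.digitize(value, edges)) for value, edges in zip(obs, bins_per_dim)
    )


# toy usage: a 2-dimensional observation with 3 bin edges per dimension
edges = [np.linspace(-1.0, 1.0, 3), np.linspace(-0.2, 0.2, 3)]
print(discretize([0.4, -0.05], edges))  # (2, 1)
```
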
- # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) - non_recoverable_angle = self.environment_params[ - "non_recoverable_angle" - ] - self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, - initial_pole_angle=self.INITIAL_POLE_ANGLE, - non_recoverable_angle=non_recoverable_angle) - self.RUNS = self.environment_params[ - "runs" - ] # Number of iterations run TODO set this from config.yml - self.ANGLE_BINS = self.environment_params["angle_bins"] - self.POS_BINS = self.environment_params["pos_bins"] - - self.SHOW_EVERY = self.environment_params[ - "show_every" - ] # How oftern the current solution is rendered - self.UPDATE_EVERY = self.environment_params[ - "update_every" - ] # How oftern the current progress is recorded + super().__init__(params); + + self.ANGLE_BINS = self.environment_params.get("angle_bins", 100) + self.POS_BINS = self.environment_params.get("pos_bins", 20) self.bins, self.obsSpaceSize, self.qTable = utils.create_bins_and_q_table( self.env, self.ANGLE_BINS, self.POS_BINS @@ -55,6 +30,13 @@ def __init__(self, params): "min": [], "max": [], } # metrics recorded for graph + self.reward_list, self.states_list, self.episode_len_list, self.min, self.max = ( + [], + [], + [], + [], + [], + ) # metrics recorded for graph # algorithm params self.states_counter = {} self.states_reward = {} @@ -62,8 +44,8 @@ def __init__(self, params): self.actions = range(self.env.action_space.n) self.env.done = True - inference_file = params.inference["params"]["inference_file"] - actions_file = params.inference["params"]["actions_file"] + inference_file = params["inference"]["inference_file"] + actions_file = params["inference"].get("actions_file") self.total_episodes = 20000 @@ -88,18 +70,22 @@ def evaluate_from_step(self, state): nextState, reward, done, info = self.env.step(action) nextState = utils.get_discrete_state(nextState, self.bins, self.obsSpaceSize) - return nextState, done + return nextState, done, info["time"] - def main(self): + def run_experiment(self): self.print_init_info() + self.states_list = [] + self.reward_list = [] start_time = datetime.datetime.now() start_time_format = start_time.strftime("%Y%m%d_%H%M") print(LETS_GO) - + total_secs=0 for run in range(self.RUNS): + states = [] + state = utils.get_discrete_state( self.env.reset(), self.bins, self.obsSpaceSize ) @@ -108,27 +94,68 @@ def main(self): while not done: cnt += 1 - - if run % self.SHOW_EVERY == 0: + states.append(state[2]) + if run % self.SHOW_EVERY == 0 and run != 0: self.env.render() # if running RL comment this oustatst if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: perturbation_action = random.randrange(self.env.action_space.n) - obs, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) - next_state, done = self.evaluate_from_step(state) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + state = utils.get_discrete_state(state, self.bins, self.obsSpaceSize) + if self.RANDOM_PERTURBATIONS_LEVEL > 1 and random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL - 1: + perturbation_action = random.randrange(self.env.action_space.n) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + state = utils.get_discrete_state(state, self.bins, self.obsSpaceSize) + + next_state, done, secs = self.evaluate_from_step(state) + total_secs += secs if not done: state = next_state + self.previousCnt.append(cnt) + + if run % 
self.UPDATE_EVERY == 0: + latestRuns = self.previousCnt[-self.UPDATE_EVERY:] + averageCnt = sum(latestRuns) / len(latestRuns) + avgsecs = total_secs / sum(latestRuns) + total_secs = 0 + self.min.append(min(latestRuns)) + self.max.append(max(latestRuns)) + + time_spent = datetime.datetime.now() - self.now + self.now = datetime.datetime.now() + print( + "Run:", + run, + "Average:", + averageCnt, + "Min:", + min(latestRuns), + "Max:", + max(latestRuns), + "time spent", + time_spent, + "time", + self.now, + "avg_iter_time", + avgsecs, + ) # Add new metrics for graph - self.metrics["ep"].append(run) - self.metrics["avg"].append(cnt) + self.episode_len_list.append(run) + self.reward_list.append(cnt) + self.states_list.append(states) self.env.close() - base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' - file_path = f'./logs/cartpole/qlearning/inference/{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(self.metrics["avg"], file_path) + logs_dir = './logs/cartpole/qlearning/inference/' + + base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.reward_list, file_path) + base_file_name = f'_states_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}_init_{self.INITIAL_POLE_ANGLE}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.states_list, file_path) # Plot graph - plt.plot(self.metrics["ep"], self.metrics["avg"], label="average rewards") - plt.legend(loc=4) - plt.show() + # plt.plot(self.episode_len_list, self.reward_list, label="average rewards") + # plt.legend(loc=4) + # plt.show() diff --git a/rl_studio/agents/cartpole/requirements.txt b/rl_studio/agents/cartpole/requirements.txt index dfd7b4ff0..826989f2c 100644 --- a/rl_studio/agents/cartpole/requirements.txt +++ b/rl_studio/agents/cartpole/requirements.txt @@ -1,5 +1,117 @@ +absl-py==1.3.0 +#actionlib==1.13.2 +#angles==1.9.13 +appdirs==1.4.4 +cachetools==5.2.0 +#camera-calibration-parsers==1.12.0 +#catkin==0.8.10 +#catkin-pkg==0.4.23 +certifi==2021.5.30 +cfgv==3.3.0 +chardet==4.0.0 +cloudpickle==1.6.0 +#controller-manager==0.19.5 +#controller-manager-msgs==0.19.5 +#cv-bridge==1.16.0 +cycler==0.10.0 +decorator==4.4.2 +defusedxml==0.6.0 +#diagnostic-updater==1.11.0 +distlib==0.3.2 +distro==1.5.0 +docutils==0.17.1 +#dynamic-reconfigure==1.7.3 +environs==9.2.0 +filelock==3.0.12 +future==0.18.2 +#gazebo_plugins==2.9.2 +#gazebo_ros==2.9.2 +#gencpp==0.6.5 +#geneus==3.0.0 +#genlisp==0.4.18 +#genmsg==0.5.16 +#gennodejs==2.0.2 +#genpy==0.6.15 +#google-auth==2.15.0 +#google-auth-oauthlib==0.4.6 +#grpcio==1.51.1 gym==0.25.0 +gym-notices==0.0.8 +gymnasium-notices==0.0.1 +identify==2.2.11 +idna==2.10 +imageio==2.9.0 +importlib-metadata==5.2.0 +#jax-jumpy==0.2.0 +#kiwisolver==1.3.1 +Markdown==3.4.1 +MarkupSafe==2.1.1 +marshmallow==3.12.2 matplotlib==3.3.2 -numpy==1.17.4 +#message-filters==1.15.14 +#netifaces==0.10.9 +#networkx==2.5.1 +#nodeenv==1.6.0 +numpy==1.18.5 +#oauthlib==3.2.2 +opencv-python==4.2.0.32 +pandas==1.4.4 +Pillow==8.3.1 +#pre-commit==2.13.0 +#protobuf==3.20.3 +py-markdown-table==0.3.3 +#pyasn1==0.4.8 +#pyasn1-modules==0.2.8 +pydantic==1.10.3 +pygame==2.1.0 +Pygments==2.14.00 +pyglet==1.5.0 +pyparsing==2.4.7 +python-dateutil==2.8.1 
+python-dotenv==0.18.0 +pytz==2022.7 +#PyWavelets==1.1.1 +PyYAML==5.4.1 +requests==2.25.1 +requests-oauthlib==1.3.1 +#rosbag==1.15.14 +#rosclean==1.15.8 +#rosgraph==1.15.14 +#roslaunch==1.15.14 +#roslib==1.15.8 +#roslz4==1.15.14 +#rosmaster==1.15.14 +#rosmsg==1.15.14 +#rosnode==1.15.14 +#rosparam==1.15.14 +#rospkg==1.2.8 +#rospy==1.15.14 +#rosservice==1.15.14 +#rostest==1.15.14 +#rostopic==1.15.14 +#rosunit==1.15.8 +#roswtf==1.15.14 +rsa==4.9 +scikit-image==0.17.2 +scipy==1.6.1 +#sensor-msgs==1.13.1 +#Shimmy==0.2.0 +#six==1.14.0 +#smclib==1.8.6 +tensorboard==2.11.0 +tensorboard-data-server==0.6.1 +tensorboard-plugin-wit==1.8.1 +#tf==1.13.2 +#tf2-py==0.7.5 +#tf2-ros==0.7.5 +tifffile==2021.7.2 +toml==0.10.2 +#topic-tools==1.15.14 +#torch==1.12.1+cu113 +torch==1.12.1 tqdm==4.64.0 -py-markdown-table==0.3.3 \ No newline at end of file +typing_extensions==4.4.0 +urllib3==1.26.6 +virtualenv==20.4.7 +Werkzeug==2.2.2 +zipp==3.11.0 \ No newline at end of file diff --git a/rl_studio/agents/cartpole/train_ddpg.py b/rl_studio/agents/cartpole/train_ddpg.py new file mode 100644 index 000000000..4ede300c1 --- /dev/null +++ b/rl_studio/agents/cartpole/train_ddpg.py @@ -0,0 +1,248 @@ +import datetime +import random + +import gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm +import torch + +import logging + +from rl_studio.agents.cartpole import utils +from rl_studio.algorithms.ddpg_torch import Actor, Critic, Memory +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.cartpole.utils import store_array, save_metadata + + +class DDPGCartpoleTrainer: + def __init__(self, params): + + self.now = datetime.datetime.now() + # self.environment params + self.params = params + self.environment_params = params.get("environments") + self.env_name = params.get("environments")["env_name"] + self.config = params.get("settings") + self.agent_config = params.get("agent") + + if self.config["logging_level"] == "debug": + self.LOGGING_LEVEL = logging.DEBUG + elif self.config["logging_level"] == "error": + self.LOGGING_LEVEL = logging.ERROR + elif self.config["logging_level"] == "critical": + self.LOGGING_LEVEL = logging.CRITICAL + else: + self.LOGGING_LEVEL = logging.INFO + + self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) + self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) + + non_recoverable_angle = self.environment_params[ + "non_recoverable_angle" + ] + # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
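
The new `DDPGCartpoleTrainer` added below trains an actor-critic pair with soft (Polyak-averaged) target networks and a replay memory. For reference, here is a self-contained sketch of the textbook DDPG update step, assuming actor/critic modules and optimizers with the obvious interfaces; note that the conventional actor loss is the negative mean critic value, so that gradient descent on it raises the expected return. This is an independent sketch, not the project's `Actor`/`Critic` classes.

```python
# Textbook DDPG update (independent sketch). Assumes actor(s) -> a and critic(s, a) -> Q
# are torch modules and opt_a / opt_c their optimizers.
import torch
import torch.nn as nn


def ddpg_update(actor, critic, actor_t, critic_t, opt_a, opt_c, batch, gamma=0.99, tau=1e-2):
    states, actions, rewards, next_states, dones = batch

    # Critic: regress Q(s, a) onto the one-step TD target built from the target networks.
    with torch.no_grad():
        target_q = rewards + gamma * (1.0 - dones) * critic_t(next_states, actor_t(next_states))
    critic_loss = nn.functional.mse_loss(critic(states, actions), target_q)
    opt_c.zero_grad()
    critic_loss.backward()
    opt_c.step()

    # Actor: maximize Q(s, pi(s)), i.e. minimize its negative mean.
    actor_loss = -critic(states, actor(states)).mean()
    opt_a.zero_grad()
    actor_loss.backward()
    opt_a.step()

    # Polyak-average the target networks towards the online networks.
    for t_net, net in ((actor_t, actor), (critic_t, critic)):
        for tp, p in zip(t_net.parameters(), net.parameters()):
            tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)

    return actor_loss.item(), critic_loss.item()
```
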
+ # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) + self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, + initial_pole_angle=self.INITIAL_POLE_ANGLE, + non_recoverable_angle=non_recoverable_angle) + + self.RUNS = self.environment_params["runs"] + self.SHOW_EVERY = self.environment_params[ + "show_every" + ] + self.UPDATE_EVERY = self.environment_params[ + "update_every" + ] # How often the current progress is recorded + self.OBJECTIVE_REWARD = self.environment_params[ + "objective_reward" + ] + self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ + "block_experience_batch" + ] + + self.actions = self.env.action_space.shape[0] + + self.losses_list, self.reward_list, self.episode_len_list, self.epsilon_list = ( + [], + [], + [], + [], + ) # metrics + # recorded for graph + self.GAMMA = params["algorithm"]["gamma"] + hidden_size = params["algorithm"]["hidden_size"] + self.batch_size = params["algorithm"]["batch_size"] + self.tau = 1e-2 + + self.max_avg = 100 + + self.num_actions = self.env.action_space.shape[0] + input_dim = self.env.observation_space.shape[0] + + self.actor = Actor(input_dim, self.num_actions, self.env.action_space, hidden_size) + self.actor_target = Actor(input_dim, self.num_actions, self.env.action_space, hidden_size) + self.critic = Critic(input_dim + self.num_actions, hidden_size, self.num_actions) + self.critic_target = Critic(input_dim + self.num_actions, hidden_size, self.num_actions) + + # We initialize the target networks as copies of the original networks + for target_param, param in zip(self.actor_target.parameters(), self.actor.parameters()): + target_param.data.copy_(param.data) + for target_param, param in zip(self.critic_target.parameters(), self.critic.parameters()): + target_param.data.copy_(param.data) + + # Training + self.memory = Memory(50000) + self.global_step = 0 + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + def gather_statistics(self, losses, ep_len, episode_rew): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.episode_len_list.append(ep_len) + + # def final_demonstration(self): + # for i in tqdm(range(2)): + # obs, done, rew = self.env.reset(), False, 0 + # while not done: + # obs = np.append(obs, -1) + # A = self.deepq.get_action(obs, self.env.action_space.n, epsilon=0) + # obs, reward, done, info = self.env.step(A.item()) + # rew += reward + # time.sleep(0.01) + # self.env.render() + # logging.info("\ndemonstration episode : {}, reward : {}".format(i, rew)) + + def update(self, batch_size): + states, actions, rewards, next_states, _ = self.memory.sample(batch_size) + states = torch.FloatTensor(states) + actions = torch.FloatTensor(actions) + rewards = torch.FloatTensor(rewards) + next_states = torch.FloatTensor(next_states) + + # Critic loss + qvals = self.critic.forward(states, actions) + next_actions = self.actor_target.forward(next_states) + next_q = self.critic_target.forward(next_states, next_actions.detach()) + qprime = rewards + self.GAMMA * next_q + critic_loss = self.critic.critic_criterion(qvals, qprime) + + # Actor loss + policy_loss = self.critic.forward(states, self.actor.forward(states)).mean() + + # update networks + self.actor.actor_optimizer.zero_grad() + policy_loss.backward() + 
self.actor.actor_optimizer.step() + + self.critic.critic_optimizer.zero_grad() + critic_loss.backward() + self.critic.critic_optimizer.step() + + # update target networks + for target_param, param in zip(self.actor_target.parameters(), self.actor.parameters()): + target_param.data.copy_(param.data * self.tau + target_param.data * (1.0 - self.tau)) + + for target_param, param in zip(self.critic_target.parameters(), self.critic.parameters()): + target_param.data.copy_(param.data * self.tau + target_param.data * (1.0 - self.tau)) + + return policy_loss, critic_loss + def main(self): + epoch_start_time = datetime.datetime.now() + + logs_dir = 'logs/cartpole/ddpg/training/' + logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + + if self.config["save_model"]: + save_metadata("ddpg", start_time_format, self.params) + + logging.info(LETS_GO) + total_reward_in_epoch = 0 + episode_rewards = [] + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + total_secs=0 + total_eps=0 + + for run in tqdm(range(self.RUNS)): + state, done, ep_len, episode_rew = self.env.reset(), False, 0, 0 + self.actor.reset_noise() + while not done: + actor_loss = None + + ep_len += 1 + total_eps+=1 + self.global_step += 1 + if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + action = self.actor.get_action(state, ep_len) + w.add_scalar("actions/ep", action, global_step=self.global_step) + + next_state, reward, done, info = self.env.step(action) + total_secs+=info["time"] + self.memory.push(state, action, reward, next_state, done) + + if len(self.memory) > self.batch_size: + actor_loss, critic_loss = self.update(self.batch_size) + w.add_scalar("loss/actor_loss", actor_loss, global_step=self.global_step) + w.add_scalar("loss/critic_loss", critic_loss, global_step=self.global_step) + + episode_rew += reward + total_reward_in_epoch += reward + state = next_state + + w.add_scalar("reward/episode_reward", episode_rew, global_step=run) + episode_rewards.append(episode_rew) + + if run % self.SHOW_EVERY == 0: + self.env.render() + + self.gather_statistics(actor_loss, ep_len, episode_rew) + + # monitor progress + if (run + 1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_iter {3}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent), avgsecs) + logging.info(updates_message) + print(updates_message) + last_average = total_reward_in_epoch / self.UPDATE_EVERY; + if self.config["save_model"] and last_average > self.max_avg: + self.max_avg = total_reward_in_epoch / self.UPDATE_EVERY + logging.info(f"Saving model . . 
.") + utils.save_ddpg_model(self.actor, start_time_format, last_average) + + if last_average >= self.OBJECTIVE_REWARD: + logging.info("Training objective reached!!") + break + total_reward_in_epoch = 0 + + # self.final_demonstration() + base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.reward_list, file_path) + plt.plot(self.reward_list) + plt.legend("reward per episode") + plt.show() diff --git a/rl_studio/agents/cartpole/train_dqn.py b/rl_studio/agents/cartpole/train_dqn.py index 860501c4c..26226a314 100755 --- a/rl_studio/agents/cartpole/train_dqn.py +++ b/rl_studio/agents/cartpole/train_dqn.py @@ -13,7 +13,7 @@ from rl_studio.algorithms.dqn_torch import DQN_Agent from rl_studio.visual.ascii.images import JDEROBOT_LOGO from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO -from rl_studio.agents.cartpole.utils import store_rewards, save_metadata +from rl_studio.agents.cartpole.utils import store_array, save_metadata class DQNCartpoleTrainer: @@ -22,10 +22,10 @@ def __init__(self, params): self.now = datetime.datetime.now() # self.environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.config = params.settings["params"] - self.agent_config = params.agent["params"] + self.environment_params = params["environments"] + self.env_name = self.environment_params["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] if self.config["logging_level"] == "debug": self.LOGGING_LEVEL = logging.DEBUG @@ -75,13 +75,13 @@ def __init__(self, params): ) # metrics # recorded for graph self.epsilon = 1 - self.EPSILON_DISCOUNT = params.algorithm["params"]["epsilon_discount"] - self.GAMMA = params.algorithm["params"]["gamma"] + self.EPSILON_DISCOUNT = params["algorithm"]["epsilon_discount"] + self.GAMMA = params["algorithm"]["gamma"] self.NUMBER_OF_EXPLORATION_STEPS = 128 input_dim = self.env.observation_space.shape[0] output_dim = self.env.action_space.n - self.exp_replay_size = params.algorithm["params"]["batch_size"] + self.exp_replay_size = params["algorithm"]["batch_size"] self.deepq = DQN_Agent( layer_sizes=[input_dim, 64, output_dim], lr=1e-3, @@ -116,10 +116,10 @@ def print_init_info(self): def evaluate_and_collect(self, state): A = self.deepq.get_action(state, self.env.action_space.n, self.epsilon) - next_state, reward, done, _ = self.env.step(A.item()) + next_state, reward, done, info = self.env.step(A.item()) self.deepq.collect_experience([state, A.item(), reward, next_state]) - return next_state, reward, done + return next_state, reward, done, info["time"] def train_in_batches(self, trainings, batch_size): losses = 0 @@ -166,6 +166,7 @@ def main(self): logging.info(LETS_GO) number_of_steps = 128 total_reward_in_epoch = 0 + total_secs = 0 for run in tqdm(range(self.RUNS)): state, done, losses, ep_len, episode_rew = self.env.reset(), False, 0, 0, 0 while not done: @@ -175,7 +176,8 @@ def main(self): perturbation_action = random.randrange(self.env.action_space.n) state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) - next_state, reward, done = self.evaluate_and_collect(state) + next_state, reward, done, secs = self.evaluate_and_collect(state) + 
total_secs+=secs state = next_state episode_rew += reward total_reward_in_epoch += reward @@ -195,8 +197,10 @@ def main(self): if (run+1) % self.UPDATE_EVERY == 0: time_spent = datetime.datetime.now() - epoch_start_time epoch_start_time = datetime.datetime.now() - updates_message = 'Run: {0} Average: {1} epsilon {2} time spent {3}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, - self.epsilon, str(time_spent)) + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} epsilon {2} time_spent {3} avg_time_iter {4}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, + self.epsilon, str(time_spent), avgsecs) logging.info(updates_message) print(updates_message) if self.config["save_model"] and total_reward_in_epoch / self.UPDATE_EVERY > self.max_avg: @@ -211,7 +215,7 @@ def main(self): # self.final_demonstration() base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' file_path = f'./logs/cartpole/dqn/training/{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(self.reward_list, file_path) + store_array(self.reward_list, file_path) plt.plot(self.reward_list) plt.legend("reward per episode") plt.show() diff --git a/rl_studio/agents/cartpole/train_ppo.py b/rl_studio/agents/cartpole/train_ppo.py index efc3669cd..f2f349ce0 100644 --- a/rl_studio/agents/cartpole/train_ppo.py +++ b/rl_studio/agents/cartpole/train_ppo.py @@ -1,13 +1,10 @@ import datetime -import time import random import gym import matplotlib.pyplot as plt from torch.utils import tensorboard from tqdm import tqdm -import numpy as np -import torch import logging @@ -15,7 +12,7 @@ from rl_studio.algorithms.ppo import Actor, Critic, Mish, t, get_dist from rl_studio.visual.ascii.images import JDEROBOT_LOGO from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO -from rl_studio.agents.cartpole.utils import store_rewards, save_metadata +from rl_studio.agents.cartpole.utils import store_array, save_metadata class PPOCartpoleTrainer: @@ -24,10 +21,10 @@ def __init__(self, params): self.now = datetime.datetime.now() # self.environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.config = params.settings["params"] - self.agent_config = params.agent["params"] + self.environment_params = params.get("environments") + self.env_name = params.get("environments")["env_name"] + self.config = params.get("settings") + self.agent_config = params.get("agent") if self.config["logging_level"] == "debug": self.LOGGING_LEVEL = logging.DEBUG @@ -43,6 +40,7 @@ def __init__(self, params): self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) + non_recoverable_angle = self.environment_params[ "non_recoverable_angle" ] @@ -74,8 +72,8 @@ def __init__(self, params): [], ) # metrics # recorded for graph - self.epsilon = params.algorithm["params"]["epsilon"] - self.GAMMA = params.algorithm["params"]["gamma"] + self.epsilon = params.get("algorithm").get("epsilon") + self.GAMMA = params.get("algorithm").get("gamma") self.NUMBER_OF_EXPLORATION_STEPS = 128 input_dim = self.env.observation_space.shape[0] @@ -133,7 +131,7 @@ def main(self): episode_rewards = [] global_steps = 0 w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") - + total_secs=0 for run in tqdm(range(self.RUNS)): 
state, done, prev_prob_act, ep_len, episode_rew = self.env.reset(), False, None, 0, 0 while not done: @@ -152,6 +150,7 @@ def main(self): prob_act = dist.log_prob(action) next_state, reward, done, info = self.env.step(action.detach().data.numpy()) + total_secs+=info["time"] advantage = reward + (1 - done) * self.GAMMA * self.critic(t(next_state)) - self.critic(t(state)) w.add_scalar("loss/advantage", advantage, global_step=global_steps) @@ -180,15 +179,17 @@ def main(self): if (run+1) % self.UPDATE_EVERY == 0: time_spent = datetime.datetime.now() - epoch_start_time epoch_start_time = datetime.datetime.now() - updates_message = 'Run: {0} Average: {1} time spent {2}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, - str(time_spent)) + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_iter {3}'.format(run, total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent), avgsecs) logging.info(updates_message) print(updates_message) last_average = total_reward_in_epoch / self.UPDATE_EVERY; if self.config["save_model"] and last_average > self.max_avg: self.max_avg = total_reward_in_epoch / self.UPDATE_EVERY logging.info(f"Saving model . . .") - utils.save_ppo_model(self.actor, start_time_format, last_average, self.params) + utils.save_ppo_model(self.actor, start_time_format, last_average) if last_average >= self.OBJECTIVE_REWARD: logging.info("Training objective reached!!") @@ -198,7 +199,7 @@ def main(self): # self.final_demonstration() base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(self.reward_list, file_path) + store_array(self.reward_list, file_path) plt.plot(self.reward_list) plt.legend("reward per episode") plt.show() diff --git a/rl_studio/agents/cartpole/train_ppo_continous.py b/rl_studio/agents/cartpole/train_ppo_continous.py new file mode 100644 index 000000000..a22c00826 --- /dev/null +++ b/rl_studio/agents/cartpole/train_ppo_continous.py @@ -0,0 +1,210 @@ +import datetime +import random + +import gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm + +import logging + +from rl_studio.agents.cartpole import utils +from rl_studio.algorithms.ppo_continuous import PPO +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.cartpole.utils import store_array, save_metadata + + +class PPOCartpoleTrainer: + def __init__(self, params): + + self.now = datetime.datetime.now() + # self.environment params + self.params = params + self.environment_params = params.get("environments") + self.env_name = params.get("environments")["env_name"] + self.config = params.get("settings") + self.agent_config = params.get("agent") + + if self.config["logging_level"] == "debug": + self.LOGGING_LEVEL = logging.DEBUG + elif self.config["logging_level"] == "error": + self.LOGGING_LEVEL = logging.ERROR + elif self.config["logging_level"] == "critical": + self.LOGGING_LEVEL = logging.CRITICAL + else: + self.LOGGING_LEVEL = logging.INFO + + self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) + self.INITIAL_POLE_ANGLE = 
self.environment_params.get("initial_pole_angle", None) + + non_recoverable_angle = self.environment_params[ + "non_recoverable_angle" + ] + # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. + # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) + self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, + initial_pole_angle=self.INITIAL_POLE_ANGLE, + non_recoverable_angle=non_recoverable_angle) + + self.RUNS = self.environment_params["runs"] + self.SHOW_EVERY = self.environment_params[ + "show_every" + ] + self.UPDATE_EVERY = self.environment_params[ + "update_every" + ] # How often the current progress is recorded + self.OBJECTIVE_REWARD = self.environment_params[ + "objective_reward" + ] + self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ + "block_experience_batch" + ] + + self.actions = self.env.action_space.shape[0] + + self.losses_list, self.reward_list, self.episode_len_list, self.epsilon_list = ( + [], + [], + [], + [], + ) # metrics + # recorded for graph + self.epsilon = params.get("algorithm").get("epsilon") + self.GAMMA = params.get("algorithm").get("gamma") + self.episodes_update = params.get("algorithm").get("episodes_update") + + input_dim = self.env.observation_space.shape[0] + lr_actor = 0.0003 + lr_critic = 0.001 + K_epochs = 80 + action_std = 0.6 # starting std for action distribution (Multivariate Normal) + self.action_std_decay_rate = 0.05 # linearly decay action_std (action_std = action_std - action_std_decay_rate) + self.min_action_std = 0.1 # minimum action_std (stop decay after action_std <= min_action_std) + self.action_std_decay_freq = int(2.5e5) # action_std decay frequency (in num timesteps) + self.ppo_agent = PPO(input_dim, self.actions, lr_actor, lr_critic, self.GAMMA, K_epochs, self.epsilon, + True, action_std) + + self.max_avg = 0 + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + def gather_statistics(self, losses, ep_len, episode_rew): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.episode_len_list.append(ep_len) + self.epsilon_list.append(self.epsilon) + + # def final_demonstration(self): + # for i in tqdm(range(2)): + # obs, done, rew = self.env.reset(), False, 0 + # while not done: + # obs = np.append(obs, -1) + # A = self.deepq.get_action(obs, self.env.action_space.n, epsilon=0) + # obs, reward, done, info = self.env.step(A.item()) + # rew += reward + # time.sleep(0.01) + # self.env.render() + # logging.info("\ndemonstration episode : {}, reward : {}".format(i, rew)) + + def main(self): + epoch_start_time = datetime.datetime.now() + + logs_dir = 'logs/cartpole/ppo_continuous/training/' + logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + + if self.config["save_model"]: + save_metadata("ppo_continuous", start_time_format, self.params) + + logging.info(LETS_GO) + total_reward_in_epoch = 0 + episode_rewards = [] + 
global_steps = 0 + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + total_secs=0 + + for run in tqdm(range(self.RUNS)): + state, done, prev_prob_act, ep_len, episode_rew = self.env.reset(), False, None, 0, 0 + while not done: + actor_loss = None + + ep_len += 1 + global_steps += 1 + if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + perturbation_action = random.randrange(2) + state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + action = self.ppo_agent.select_action(state) + next_state, reward, done, info = self.env.step(action) + total_secs+=info["time"] + self.ppo_agent.buffer.rewards.append(reward) + self.ppo_agent.buffer.is_terminals.append(done) + + # update PPO agent + if global_steps % self.episodes_update == 0: + self.ppo_agent.update() + + if global_steps % self.action_std_decay_freq == 0: + self.ppo_agent.decay_action_std(self.action_std_decay_rate, self.min_action_std) + + # w.add_scalar("actions/action_prob", dist.probs, global_step=global_steps) + + episode_rew += reward + total_reward_in_epoch += reward + state = next_state + + w.add_scalar("reward/episode_reward", episode_rew, global_step=run) + episode_rewards.append(episode_rew) + + if run % self.SHOW_EVERY == 0: + self.env.render() + + self.gather_statistics(actor_loss, ep_len, episode_rew) + + # monitor progress + if (run + 1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + avgsecs = total_secs / total_reward_in_epoch + total_secs = 0 + updates_message = 'Run: {0} Average: {1} time spent {2} avg_iter {3}'.format(run, + total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent), avgsecs) + logging.info(updates_message) + print(updates_message) + last_average = total_reward_in_epoch / self.UPDATE_EVERY; + if self.config["save_model"] and last_average > self.max_avg: + self.max_avg = total_reward_in_epoch / self.UPDATE_EVERY + logging.info(f"Saving model . . 
.") + checkpoints_path = "./logs/cartpole/ppo_continuous/checkpoints/" + start_time_format + "_actor_avg_" \ + + str(last_average) + self.ppo_agent.save(checkpoints_path) + + if last_average >= self.OBJECTIVE_REWARD: + logging.info("Training objective reached!!") + break + total_reward_in_epoch = 0 + + # self.final_demonstration() + base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_array(self.reward_list, file_path) + plt.plot(self.reward_list) + plt.legend("reward per episode") + plt.show() diff --git a/rl_studio/agents/cartpole/train_qlearn.py b/rl_studio/agents/cartpole/train_qlearn.py index eed747484..d649f2aff 100755 --- a/rl_studio/agents/cartpole/train_qlearn.py +++ b/rl_studio/agents/cartpole/train_qlearn.py @@ -8,7 +8,7 @@ from rl_studio.algorithms.qlearn_multiple_states import QLearn from rl_studio.visual.ascii.images import JDEROBOT_LOGO from rl_studio.visual.ascii.text import JDEROBOT, QLEARN_CAMERA, LETS_GO -from rl_studio.agents.cartpole.utils import store_rewards, save_metadata +from rl_studio.agents.cartpole.utils import store_array, save_metadata class QLearnCartpoleTrainer: @@ -18,8 +18,8 @@ def __init__(self, params): self.now = datetime.datetime.now() # environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] + self.environment_params = params["environments"] + self.env_name = self.environment_params["env_name"] self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) @@ -29,14 +29,15 @@ def __init__(self, params): self.reward_value = self.environment_params.get("reward_value", 1) self.reward_shaping = self.environment_params.get("reward_shaping", 0) - non_recoverable_angle = self.environment_params[ "non_recoverable_angle" ] # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
# self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) - self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, initial_pole_angle=self.INITIAL_POLE_ANGLE, - non_recoverable_angle=non_recoverable_angle, punish=self.punish, reward_value=self.reward_value, + self.env = gym.make(self.env_name, random_start_level=self.RANDOM_START_LEVEL, + initial_pole_angle=self.INITIAL_POLE_ANGLE, + non_recoverable_angle=non_recoverable_angle, punish=self.punish, + reward_value=self.reward_value, reward_shaping=self.reward_shaping) self.RUNS = self.environment_params["runs"] # Number of iterations run @@ -66,19 +67,19 @@ def __init__(self, params): "max": [], } # metrics recorded for graph # algorithm params - self.alpha = params.algorithm["params"]["alpha"] - self.epsilon = params.algorithm["params"]["epsilon"] - self.gamma = params.algorithm["params"]["gamma"] + self.alpha = params["algorithm"]["alpha"] + self.epsilon = params["algorithm"]["epsilon"] + self.gamma = params["algorithm"]["gamma"] self.states_counter = {} self.states_reward = {} self.last_time_steps = np.ndarray(0) - self.config = params.settings["params"] + self.config = params["settings"] self.actions = range(self.env.action_space.n) self.env.done = True self.total_episodes = 20000 - self.epsilon_discount = params.algorithm["params"][ + self.epsilon_discount = params["algorithm"][ "epsilon_discount" ] # Default 0.9986 @@ -111,7 +112,7 @@ def evaluate_and_learn_from_step(self, state): self.qlearn.learn(state, action, reward, nextState, done) - return nextState, done + return nextState, done, info["time"] def main(self): @@ -123,10 +124,12 @@ def main(self): if self.config["save_model"]: print(f"\nSaving actions . . .\n") - utils.save_actions_qlearn(self.actions, start_time_format, self.params) + utils.save_actions_qlearn(self.actions, start_time_format) save_metadata("qlearning", start_time_format, self.params) print(LETS_GO) + total_secs = 0 + episode_rewards = [] for run in range(self.RUNS): state = utils.get_discrete_state( @@ -141,21 +144,24 @@ def main(self): if run % self.SHOW_EVERY == 0: self.env.render() # if running RL comment this oustatst - next_state, done = self.evaluate_and_learn_from_step(state) - + next_state, done, secs = self.evaluate_and_learn_from_step(state) + total_secs+=secs if not done: state = next_state self.previousCnt.append(cnt) + episode_rewards.append(cnt) # Add new metrics for graph if run % self.UPDATE_EVERY == 0: - latestRuns = self.previousCnt[-self.UPDATE_EVERY :] + latestRuns = self.previousCnt[-self.UPDATE_EVERY:] averageCnt = sum(latestRuns) / len(latestRuns) + avgsecs = total_secs / sum(latestRuns) + total_secs = 0 self.metrics["ep"].append(run) - self.metrics["avg"].append(averageCnt) self.metrics["min"].append(min(latestRuns)) self.metrics["max"].append(max(latestRuns)) + self.metrics["avg"].append(averageCnt) time_spent = datetime.datetime.now() - self.now self.now = datetime.datetime.now() @@ -173,7 +179,9 @@ def main(self): "time spent", time_spent, "time", - self.now + self.now, + "avg iter time", + avgsecs, ) if run % self.SAVE_EVERY == 0: if self.config["save_model"]: @@ -187,7 +195,7 @@ def main(self): base_file_name = f'_rewards_' file_path = f'./logs/cartpole/qlearning/training/{datetime.datetime.now()}_{base_file_name}.pkl' - store_rewards(self.metrics["avg"], file_path) + store_array(episode_rewards, file_path) # Plot graph plt.plot(self.metrics["ep"], self.metrics["avg"], label="average rewards") diff --git 
a/rl_studio/agents/cartpole/utils.py b/rl_studio/agents/cartpole/utils.py index d8e06df5b..979e208c5 100755 --- a/rl_studio/agents/cartpole/utils.py +++ b/rl_studio/agents/cartpole/utils.py @@ -32,21 +32,21 @@ def save_model_qlearn(qlearn, current_time, avg): def params_to_markdown_list(dictionary): md_list = [] - for item in dictionary["params"]: - md_list.append({"parameter": item, "value": dictionary["params"][item]}) + for item in dictionary: + md_list.append({"parameter": item, "value": dictionary.get(item)}) return md_list def save_metadata(algorithm, current_time, params): metadata = open("./logs/cartpole/" + algorithm + "/checkpoints/" + current_time + "_metadata.md", "a") metadata.write("AGENT PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.agent)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params.get("agent"))).setParams(row_sep='always').getMarkdown()) metadata.write("\n```\n\nSETTINGS PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.settings)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params.get("settings"))).setParams(row_sep='always').getMarkdown()) metadata.write("\n```\n\nENVIRONMENT PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.environment)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params.get("environments"))).setParams(row_sep='always').getMarkdown()) metadata.write("\n```\n\nALGORITHM PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.algorithm)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params.get("algorithm"))).setParams(row_sep='always').getMarkdown()) metadata.close() def save_dqn_model(dqn, current_time, average, params): base_file_name = "_epsilon_{}".format(round(epsilon, 2)) @@ -58,8 +58,16 @@ def save_dqn_model(dqn, current_time, average, params): pickle.dump(dqn.q_net, file_dump) file_dump.close() +def save_ddpg_model(actor, current_time, average): + file_dump = open( + "./logs/cartpole/ddpg/checkpoints/" + current_time + "_actor_avg_" + str( + average) + ".pkl", + "wb", + ) + pickle.dump(actor, file_dump) + file_dump.close() -def save_ppo_model(actor, current_time, average, params): +def save_ppo_model(actor, current_time, average): file_dump = open( "./logs/cartpole/ppo/checkpoints/" + current_time + "_actor_avg_" + str( average) + ".pkl", @@ -68,8 +76,7 @@ def save_ppo_model(actor, current_time, average, params): pickle.dump(actor.model, file_dump) file_dump.close() - -def save_actions_qlearn(actions, start_time, params): +def save_actions_qlearn(actions, start_time): file_dump = open("./logs/cartpole/qlearning/checkpoints/actions_set_" + start_time, "wb") pickle.dump(actions, file_dump) file_dump.close() @@ -170,7 +177,7 @@ def plot_detail_random_perturbations_monitoring(unsuccessful_episodes_count, suc ax16.set(ylabel="Rewards") -def store_rewards(rewards, file_path): +def store_array(rewards, file_path): file_dump = open(file_path, "wb") pickle.dump(rewards, file_dump) diff --git a/rl_studio/agents/f1/inference_followlane_ddpg_f1_gazebo_tf.py b/rl_studio/agents/f1/inference_followlane_ddpg_f1_gazebo_tf.py new file mode 100644 index 000000000..3110e949b --- /dev/null +++ b/rl_studio/agents/f1/inference_followlane_ddpg_f1_gazebo_tf.py @@ -0,0 +1,335 @@ +from datetime import datetime, timedelta +import os +import random 
+import time + +import gymnasium as gym +import numpy as np +import tensorflow as tf +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDDPGGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + LoggingHandler, +) +from rl_studio.algorithms.ddpg import ( + ModifiedTensorBoard, + OUActionNoise, + Buffer, + DDPGAgent, +) +from rl_studio.algorithms.utils import save_actorcritic_model +from rl_studio.envs.gazebo.gazebo_envs import * + + +class InferencerFollowLaneDDPGF1GazeboTF: + """ + Mode: Inference + Task: Follow Lane + Algorithm: DDPG + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDDPGGazebo(config) + self.global_params = LoadGlobalParams(config) + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + ) + + # Init Agents + ac_agent = DDPGAgent( + self.environment.environment, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + ) + ## ----- Load model ----- + model = ac_agent.load_inference_model( + self.global_params.models_dir, self.environment.environment + ) + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + ## ------------- START INFERENCING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), ascii=True, unit="episodes" + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + prev_state, prev_state_size = env.reset() + + while not done: + tf_prev_state = tf.expand_dims(tf.convert_to_tensor(prev_state), 0) + # action = ac_agent.policy( + # tf_prev_state, ou_noise, self.global_params.actions + # ) + actions = model.predict(tf_prev_state) + action = [[actions[0][0][0], actions[1][0][0]]] + state, reward, done, _ = env.step(action, step) + 
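# --- Editor's aside: hedged illustration only, not project code. The indexing a few lines
# above (action = [[actions[0][0][0], actions[1][0][0]]]) appears to assume the loaded DDPG
# actor returns one array per output head (linear speed v and angular speed w), each of
# shape (1, 1), which is then repacked as [[v, w]] for the continuous env.step(). The
# shapes and the two-head layout are inferences from this loop, not documented API.
import numpy as np
predicted = [np.array([[0.7]]), np.array([[-0.2]])]  # stand-ins for the output of model.predict(...)
packed = [[predicted[0][0][0], predicted[1][0][0]]]  # -> [[0.7, -0.2]], i.e. [[v, w]]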
cumulated_reward += reward + prev_state = state + step += 1 + + log.logger.debug( + f"\nstate = {state}\n" + f"state type = {type(state)}\n" + f"prev_state = {prev_state}\n" + f"prev_state = {type(prev_state)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + v=action[0][0], # for continuous actions + w=action[0][1], # for continuous actions + episode=episode, + step=step, + state=state, + # v=self.global_params.actions_set[action][ + # 0 + # ], # this case for discrete + # w=self.global_params.actions_set[action][ + # 1 + # ], # this case for discrete + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + in_best_epoch_training_time=best_epoch_training_time, + ) + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + # best episode + if current_max_reward <= cumulated_reward: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + self.global_params.actions_rewards["reward"].append(reward) + self.global_params.actions_rewards["center"].append( + env.image_center + ) + + # Showing stats in screen for monitoring. 
Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + self.global_params.actions_rewards, + ) + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + ##################################################### + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "LAPCOMPLETED", + ) + + ##################################################### + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + current_max_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "BESTLAP", + ) + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + ) + ##################################################### + ### end episode in time settings: 2 hours, 15 hours... 
+ if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + ) + if cumulated_reward > current_max_reward: + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "FINISHTIME", + ) + + break + + ##################################################### + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + ) + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "BATCH", + ) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + ##################################################### + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/inference_followlane_dqn_f1_gazebo_tf.py b/rl_studio/agents/f1/inference_followlane_dqn_f1_gazebo_tf.py new file mode 100644 index 000000000..de108889e --- /dev/null +++ b/rl_studio/agents/f1/inference_followlane_dqn_f1_gazebo_tf.py @@ -0,0 +1,319 @@ +from datetime import datetime, timedelta +import os +import random +import time + +import gymnasium as gym +import numpy as np +import tensorflow as tf +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDQNGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + save_batch, + save_best_episode_dqn, + LoggingHandler, +) +from rl_studio.algorithms.dqn_keras import ( + ModifiedTensorBoard, + DQN, +) +from rl_studio.envs.gazebo.gazebo_envs import * + + +class InferencerFollowLaneDQNF1GazeboTF: + """ + Mode: Inference + Task: Follow Lane + Algorithm: DQN + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + self.algoritmhs_params = 
LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDQNGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.algoritmhs_params.epsilon + epsilon_discount = self.algoritmhs_params.epsilon_discount + epsilon_min = self.algoritmhs_params.epsilon_min + + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + ) + + ## --------------------- Deep Nets ------------------ + # Init Agent + dqn_agent = DQN( + self.environment.environment, + self.algoritmhs_params, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + self.global_params, + ) + ## ----- Load model ----- + model = dqn_agent.load_inference_model( + self.global_params.models_dir, self.environment.environment + ) + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + # show rewards stats per episode + + ## ------------- START INFERENCING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), ascii=True, unit="episodes" + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + observation, _ = env.reset() + + while not done: + actions = model.predict(observation) + action = np.argmax(actions) + new_observation, reward, done, _ = env.step(action, step) + + cumulated_reward += reward + observation = new_observation + step += 1 + + log.logger.debug( + f"\nobservation = {observation}\n" + f"observation type = {type(observation)}\n" + f"new_observation = {new_observation}\n" + f"new_observation = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + episode=episode, + step=step, + state=state, + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + 
in_best_epoch_training_time=best_epoch_training_time, + ) + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + # best episode + if current_max_reward <= cumulated_reward: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + self.global_params.actions_rewards["reward"].append(reward) + self.global_params.actions_rewards["center"].append( + env.image_center + ) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + ##################################################### + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + + ##################################################### + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + current_max_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + ##################################################### + ### end episode in time settings: 2 hours, 15 hours... 
+ if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + if cumulated_reward > current_max_reward: + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + + break + + ##################################################### + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + ) + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + if max_reward > current_max_reward: + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + + ##################################################### + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/inference_followlane_qlearn_f1_gazebo.py b/rl_studio/agents/f1/inference_followlane_qlearn_f1_gazebo.py new file mode 100644 index 000000000..e1cb5025b --- /dev/null +++ b/rl_studio/agents/f1/inference_followlane_qlearn_f1_gazebo.py @@ -0,0 +1,271 @@ +from datetime import datetime, timedelta +import os +import time + +import gymnasium as gym +import numpy as np +from reloading import reloading +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesQlearnGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + save_batch, + save_best_episode, + LoggingHandler, +) +from rl_studio.algorithms.qlearn 
import QLearn, QLearnF1 +from rl_studio.envs.gazebo.gazebo_envs import * + + +class InferencerFollowLaneQlearnF1Gazebo: + """ + Mode: Inference + Task: Follow Lane + Algorithm: Qlearn + Agent: F1 + Simulator: Gazebo + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesQlearnGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.environment.environment["epsilon"] + epsilon_decay = epsilon / (self.env_params.total_episodes // 2) + # states_counter = {} + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"alpha = {self.environment.environment['alpha']}\n" + f"gamma = {self.environment.environment['gamma']}\n" + ) + ## --- init Qlearn + qlearn = QLearnF1( + len(self.global_params.states_set), + self.global_params.actions, + len(self.global_params.actions_set), + self.environment.environment["epsilon"], + self.environment.environment["alpha"], + self.environment.environment["gamma"], + self.environment.environment["num_regions"], + ) + + ## load q model + qlearn.load_table( + f"{self.global_params.models_dir}/{self.environment.environment['inference_qlearn_model_name']}" + ) + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + done = False + cumulated_reward = 0 + step = 0 + start_time_epoch = datetime.now() + + ## reset env() + observation, _ = env.reset() + + while not done: + step += 1 + # Pick an action based on the current state + action = qlearn.inference(observation) + + # Execute the action and get feedback + new_observation, reward, done, _ = env.step(action, step) + cumulated_reward += reward + + log.logger.debug( + f"\nobservation = {observation}\n" + f"observation[0]= {observation[0]}\n" + f"observation type = {type(observation)}\n" + f"observation[0] type = {type(observation[0])}\n" + f"new_observation = {new_observation}\n" + f"new_observation = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + + observation = new_observation + + # render params + render_params( + action=action, + episode=episode, + step=step, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + 
w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + observation=observation, + reward_in_step=reward, + cumulated_reward=cumulated_reward, + done=done, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"v = {self.global_params.actions_set[action][0]}\n" + f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + # best episode and step's stats + if current_max_reward <= cumulated_reward and episode > 1: + ( + current_max_reward, + best_epoch, + best_step, + best_epoch_training_time, + ) = save_best_episode( + self.global_params, + cumulated_reward, + episode, + step, + start_time_epoch, + reward, + env.image_center, + ) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + # End epoch + if step > self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + ) + + # Save best lap + if cumulated_reward >= current_max_reward: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # end of training by: + # training time setting: 2 hours, 15 hours... 
+ # num epochs + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + if cumulated_reward >= current_max_reward: + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + ) + break + + # save best values every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + self.global_params.aggr_ep_rewards = save_batch( + episode, + step, + start_time_epoch, + start_time, + self.global_params, + self.env_params, + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + env.close() diff --git a/rl_studio/agents/f1/inference_followline_ddpg_f1_gazebo_tf.py b/rl_studio/agents/f1/inference_followline_ddpg_f1_gazebo_tf.py new file mode 100644 index 000000000..9bc502436 --- /dev/null +++ b/rl_studio/agents/f1/inference_followline_ddpg_f1_gazebo_tf.py @@ -0,0 +1,336 @@ +from datetime import datetime, timedelta +import os +import random +import time + +import gymnasium as gym +import numpy as np +import tensorflow as tf +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDDPGGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + LoggingHandler, +) +from rl_studio.algorithms.ddpg import ( + ModifiedTensorBoard, + OUActionNoise, + Buffer, + DDPGAgent, +) +from rl_studio.algorithms.utils import save_actorcritic_model +from rl_studio.envs.gazebo.gazebo_envs import * + + +class InferencerFollowLineDDPGF1GazeboTF: + """ + Mode: Inference + Task: Follow Line + Algorithm: DDPG + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDDPGGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = 
{len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + ) + # Init Agents + ac_agent = DDPGAgent( + self.environment.environment, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + ) + ## ----- Load model ----- + model = ac_agent.load_inference_model( + self.global_params.models_dir, self.environment.environment + ) + + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + + ## ------------- START INFERENCING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), ascii=True, unit="episodes" + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + prev_state, prev_state_size = env.reset() + + while not done: + tf_prev_state = tf.expand_dims(tf.convert_to_tensor(prev_state), 0) + # action = ac_agent.policy( + # tf_prev_state, ou_noise, self.global_params.actions + # ) + actions = model.predict(tf_prev_state) + action = [[actions[0][0][0], actions[1][0][0]]] + state, reward, done, _ = env.step(action, step) + cumulated_reward += reward + prev_state = state + step += 1 + + log.logger.debug( + f"\nstate = {state}\n" + f"state type = {type(state)}\n" + f"prev_state = {prev_state}\n" + f"prev_state = {type(prev_state)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + v=action[0][0], # for continuous actions + w=action[0][1], # for continuous actions + episode=episode, + step=step, + state=state, + # v=self.global_params.actions_set[action][ + # 0 + # ], # this case for discrete + # w=self.global_params.actions_set[action][ + # 1 + # ], # this case for discrete + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + in_best_epoch_training_time=best_epoch_training_time, + ) + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + # best episode + if current_max_reward <= cumulated_reward: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + self.global_params.actions_rewards["reward"].append(reward) + self.global_params.actions_rewards["center"].append( + env.image_center + ) + + # Showing stats in screen for monitoring. 
Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + self.global_params.actions_rewards, + ) + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + ##################################################### + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "LAPCOMPLETED", + ) + ##################################################### + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + current_max_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "BESTLAP", + ) + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + ) + + ##################################################### + ### end episode in time settings: 2 hours, 15 hours... 
+ if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + ) + if cumulated_reward > current_max_reward: + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "FINISHTIME", + ) + + break + + ##################################################### + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + ) + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "BATCH", + ) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + ##################################################### + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/inference_followline_qlearn_f1_gazebo.py b/rl_studio/agents/f1/inference_followline_qlearn_f1_gazebo.py new file mode 100644 index 000000000..ed18985c2 --- /dev/null +++ b/rl_studio/agents/f1/inference_followline_qlearn_f1_gazebo.py @@ -0,0 +1,271 @@ +from datetime import datetime, timedelta +import os +import time + +import gymnasium as gym +import numpy as np +from reloading import reloading +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesQlearnGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + save_batch, + save_best_episode, + LoggingHandler, +) +from rl_studio.algorithms.qlearn import QLearn, QLearnF1 +from rl_studio.envs.gazebo.gazebo_envs import * + + +class InferencerFollowLineQlearnF1Gazebo: + """ + Mode: Inference + Task: Follow Line + Algorithm: Qlearn + Agent: F1 + Simulator: Gazebo + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = 
LoadEnvParams(config) + self.environment = LoadEnvVariablesQlearnGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.environment.environment["epsilon"] + epsilon_decay = epsilon / (self.env_params.total_episodes // 2) + # states_counter = {} + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"alpha = {self.environment.environment['alpha']}\n" + f"gamma = {self.environment.environment['gamma']}\n" + ) + ## --- init Qlearn + qlearn = QLearnF1( + len(self.global_params.states_set), + self.global_params.actions, + len(self.global_params.actions_set), + self.environment.environment["epsilon"], + self.environment.environment["alpha"], + self.environment.environment["gamma"], + self.environment.environment["num_regions"], + ) + + ## load q model + qlearn.load_table( + f"{self.global_params.models_dir}/{self.environment.environment['inference_qlearn_model_name']}" + ) + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + done = False + cumulated_reward = 0 + step = 0 + start_time_epoch = datetime.now() + + ## reset env() + observation, _ = env.reset() + + while not done: + step += 1 + # Pick an action based on the current state + action = qlearn.inference(observation) + + # Execute the action and get feedback + new_observation, reward, done, _ = env.step(action, step) + cumulated_reward += reward + + log.logger.debug( + f"\nobservation = {observation}\n" + f"observation[0]= {observation[0]}\n" + f"observation type = {type(observation)}\n" + f"observation[0] type = {type(observation[0])}\n" + f"new_observation = {new_observation}\n" + f"new_observation = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + + observation = new_observation + + # render params + render_params( + action=action, + episode=episode, + step=step, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + observation=observation, + reward_in_step=reward, + cumulated_reward=cumulated_reward, + done=done, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + 
f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"v = {self.global_params.actions_set[action][0]}\n" + f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + # best episode and step's stats + if current_max_reward <= cumulated_reward and episode > 1: + ( + current_max_reward, + best_epoch, + best_step, + best_epoch_training_time, + ) = save_best_episode( + self.global_params, + cumulated_reward, + episode, + step, + start_time_epoch, + reward, + env.image_center, + ) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + # End epoch + if step > self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + ) + + # Save best lap + if cumulated_reward >= current_max_reward: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # end of training by: + # training time setting: 2 hours, 15 hours... 
+ # num epochs + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + if cumulated_reward >= current_max_reward: + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + ) + break + + # save best values every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + self.global_params.aggr_ep_rewards = save_batch( + episode, + step, + start_time_epoch, + start_time, + self.global_params, + self.env_params, + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + env.close() diff --git a/rl_studio/agents/f1/inference_qlearn.py b/rl_studio/agents/f1/inference_qlearn.py deleted file mode 100644 index b4c987e80..000000000 --- a/rl_studio/agents/f1/inference_qlearn.py +++ /dev/null @@ -1,194 +0,0 @@ -import datetime -import time -from functools import reduce -from pprint import pprint - -import gym -import numpy as np - -from rl_studio.agents import liveplot -from rl_studio.agents.f1 import utils -from rl_studio.agents.f1.settings import QLearnConfig -from rl_studio.visual.ascii.images import JDEROBOT_LOGO -from rl_studio.visual.ascii.text import JDEROBOT, QLEARN_CAMERA, LETS_GO -from rl_studio.wrappers.inference_rlstudio import InferencerWrapper - - -class F1Inferencer: - def __init__(self, params): - # TODO: Create a pydantic metaclass to simplifyactions thactionse way we extract the params - # environment params - self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - env_params = params.environment["params"] - actions = params.environment["actions"] - env_params["actions"] = actions - self.env = gym.make(self.env_name, **env_params) - # algorithm params - self.alpha = params.algorithm["params"]["alpha"] - self.epsilon = params.algorithm["params"]["epsilon"] - self.gamma = params.algorithm["params"]["gamma"] - self.inference_file = params.inference["params"]["inference_file"] - self.actions_file = params.inference["params"]["actions_file"] - # agent - # self.action_number = params.agent["params"]["actions_number"] - # self.actions_set = params.agent["params"]["actions_set"] - # self.actions_values = params.agent["params"]["available_actions"][self.actions_set] - - def main(self): - - print(JDEROBOT) - print(JDEROBOT_LOGO) - print(QLEARN_CAMERA) - print(f"\t- Start hour: {datetime.datetime.now()}\n") - pprint(f"\t- Environment params:\n{self.environment_params}", indent=4) - config = QLearnConfig() - - # TODO: Move init method - outdir = "./logs/f1_qlearn_gym_experiments/" - stats = {} # epoch: steps - states_counter = {} - states_reward = {} - - plotter = liveplot.LivePlot(outdir) - - last_time_steps = np.ndarray(0) - - self.actions = range(3) # range(env.action_space.n) - env = gym.wrappers.Monitor(self.env, outdir, force=True) - counter = 0 - estimate_step_per_lap = self.environment_params["estimated_steps"] - lap_completed = 
False - total_episodes = 20000 - epsilon_discount = 0.9986 # Default 0.9986 - - # TODO: Call the algorithm factory passing "qlearn" as parameter. - self.inferencer = InferencerWrapper( - "qlearn", self.inference_file, self.actions_file - ) - - highest_reward = 0 - - telemetry_start_time = time.time() - start_time = datetime.datetime.now() - start_time_format = start_time.strftime("%Y%m%d_%H%M") - - print(LETS_GO) - - previous = datetime.datetime.now() - checkpoints = [] # "ID" - x, y - time - - # START - for episode in range(total_episodes): - - counter = 0 - done = False - lap_completed = False - - cumulated_reward = 0 - observation = env.reset() - - state = "".join(map(str, observation)) - - for step in range(500000): - - counter += 1 - - # Pick an action based on the current state - action = self.inferencer.inference(state) - - # Execute the action and get feedback - observation, reward, done, info = env.step(action) - cumulated_reward += reward - - if highest_reward < cumulated_reward: - highest_reward = cumulated_reward - - nextState = "".join(map(str, observation)) - - try: - states_counter[nextState] += 1 - except KeyError: - states_counter[nextState] = 1 - - env._flush(force=True) - - if config.save_positions: - now = datetime.datetime.now() - if now - datetime.timedelta(seconds=3) > previous: - previous = datetime.datetime.now() - x, y = env.get_position() - checkpoints.append( - [ - len(checkpoints), - (x, y), - datetime.datetime.now().strftime("%M:%S.%f")[-4], - ] - ) - - if ( - datetime.datetime.now() - - datetime.timedelta(minutes=3, seconds=12) - > start_time - ): - print("Finish. Saving parameters . . .") - utils.save_times(checkpoints) - env.close() - exit(0) - - if not done: - state = nextState - else: - last_time_steps = np.append(last_time_steps, [int(step + 1)]) - stats[int(episode)] = step - states_reward[int(episode)] = cumulated_reward - print( - f"EP: {episode + 1} - Reward: {cumulated_reward}" - f" - Time: {start_time_format} - Steps: {step}" - ) - break - - if step > estimate_step_per_lap and not lap_completed: - lap_completed = True - if config.plotter_graphic: - plotter.plot_steps_vs_epoch(stats, save=True) - print( - f"\n\n====> LAP COMPLETED in: {datetime.datetime.now() - start_time} - Epoch: {episode}" - f" - Cum. Reward: {cumulated_reward} <====\n\n" - ) - - if counter > 1000: - if config.plotter_graphic: - plotter.plot_steps_vs_epoch(stats, save=True) - counter = 0 - - if datetime.datetime.now() - datetime.timedelta(hours=2) > start_time: - print(f" - N epoch: {episode}") - print(f" - Action set: {config.actions_set}") - print(f" - Cum. 
reward: {cumulated_reward}") - - env.close() - exit(0) - - if episode % 1 == 0 and config.plotter_graphic: - # plotter.plot(env) - plotter.plot_steps_vs_epoch(stats) - # plotter.full_plot(env, stats, 2) # optional parameter = mode (0, 1, 2) - - m, s = divmod(int(time.time() - telemetry_start_time), 60) - h, m = divmod(m, 60) - - l = last_time_steps.tolist() - l.sort() - - print("Overall score: {:0.2f}".format(last_time_steps.mean())) - print( - "Best 100 score: {:0.2f}".format( - reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:]) - ) - ) - - plotter.plot_steps_vs_epoch(stats, save=True) - - env.close() diff --git a/rl_studio/agents/f1/loaders.py b/rl_studio/agents/f1/loaders.py new file mode 100644 index 000000000..3ac1c8561 --- /dev/null +++ b/rl_studio/agents/f1/loaders.py @@ -0,0 +1,477 @@ +# This file contains all clasess to parser parameters from config.yaml into training RL + + +class LoadAlgorithmParams: + """ + Retrieves Algorithm params + """ + + def __init__(self, config): + if config["settings"]["algorithm"] == "ddpg": + self.gamma = config["algorithm"]["ddpg"]["gamma"] + self.tau = config["algorithm"]["ddpg"]["tau"] + self.std_dev = config["algorithm"]["ddpg"]["std_dev"] + self.model_name = config["algorithm"]["ddpg"]["model_name"] + self.buffer_capacity = config["algorithm"]["ddpg"]["buffer_capacity"] + self.batch_size = config["algorithm"]["ddpg"]["batch_size"] + + elif config["settings"]["algorithm"] == "dqn": + self.alpha = config["algorithm"]["dqn"]["alpha"] + self.gamma = config["algorithm"]["dqn"]["gamma"] + self.epsilon = config["algorithm"]["dqn"]["epsilon"] + self.epsilon_discount = config["algorithm"]["dqn"]["epsilon_discount"] + self.epsilon_min = config["algorithm"]["dqn"]["epsilon_min"] + self.model_name = config["algorithm"]["dqn"]["model_name"] + self.replay_memory_size = config["algorithm"]["dqn"]["replay_memory_size"] + self.min_replay_memory_size = config["algorithm"]["dqn"][ + "min_replay_memory_size" + ] + self.minibatch_size = config["algorithm"]["dqn"]["minibatch_size"] + self.update_target_every = config["algorithm"]["dqn"]["update_target_every"] + self.memory_fraction = config["algorithm"]["dqn"]["memory_fraction"] + self.buffer_capacity = config["algorithm"]["dqn"]["buffer_capacity"] + self.batch_size = config["algorithm"]["dqn"]["batch_size"] + + elif config["settings"]["algorithm"] == "qlearn": + self.alpha = config["algorithm"]["qlearn"]["alpha"] + self.gamma = config["algorithm"]["qlearn"]["gamma"] + self.epsilon = config["algorithm"]["qlearn"]["epsilon"] + self.epsilon_min = config["algorithm"]["qlearn"]["epsilon_min"] + + +class LoadEnvParams: + """ + Retrieves environment parameters: Gazebo, Carla, OpenAI... 
+ """ + + def __init__(self, config): + if config["settings"]["simulator"] == "gazebo": + self.env = config["settings"]["env"] + self.env_name = config["gazebo_environments"][self.env]["env_name"] + self.model_state_name = config["gazebo_environments"][self.env][ + "model_state_name" + ] + self.total_episodes = config["settings"]["total_episodes"] + self.training_time = config["settings"]["training_time"] + self.save_episodes = config["gazebo_environments"][self.env][ + "save_episodes" + ] + self.save_every_step = config["gazebo_environments"][self.env][ + "save_every_step" + ] + self.estimated_steps = config["gazebo_environments"][self.env][ + "estimated_steps" + ] + + elif config["settings"]["simulator"] == "carla": + pass + + +class LoadGlobalParams: + """ + Retrieves Global params from config.yaml + """ + + def __init__(self, config): + self.stats = {} # epoch: steps + self.states_counter = {} + self.states_reward = {} + self.ep_rewards = [] + self.actions_rewards = { + "episode": [], + "step": [], + "v": [], + "w": [], + "reward": [], + "center": [], + } + self.aggr_ep_rewards = { + "episode": [], + "avg": [], + "max": [], + "min": [], + "step": [], + "epoch_training_time": [], + "total_training_time": [], + } + self.best_current_epoch = { + "best_epoch": [], + "highest_reward": [], + "best_step": [], + "best_epoch_training_time": [], + "current_total_training_time": [], + } + self.settings = config["settings"] + self.mode = config["settings"]["mode"] + self.task = config["settings"]["task"] + self.algorithm = config["settings"]["algorithm"] + self.agent = config["settings"]["agent"] + self.framework = config["settings"]["framework"] + self.models_dir = f"{config['settings']['models_dir']}/{config['settings']['task']}_{config['settings']['algorithm']}_{config['settings']['agent']}_{config['settings']['framework']}" + self.logs_tensorboard_dir = f"{config['settings']['logs_dir']}/{config['settings']['mode']}/{config['settings']['task']}_{config['settings']['algorithm']}_{config['settings']['agent']}_{config['settings']['framework']}/TensorBoard" + self.logs_dir = f"{config['settings']['logs_dir']}/{config['settings']['mode']}/{config['settings']['task']}_{config['settings']['algorithm']}_{config['settings']['agent']}_{config['settings']['framework']}/logs" + self.metrics_data_dir = f"{config['settings']['metrics_dir']}/{config['settings']['mode']}/{config['settings']['task']}_{config['settings']['algorithm']}_{config['settings']['agent']}_{config['settings']['framework']}/data" + self.metrics_graphics_dir = f"{config['settings']['metrics_dir']}/{config['settings']['mode']}/{config['settings']['task']}_{config['settings']['algorithm']}_{config['settings']['agent']}_{config['settings']['framework']}/graphics" + self.training_time = config["settings"]["training_time"] + ####### States + self.states = config["settings"]["states"] + self.states_set = config["states"][self.states] + ####### Actions + self.actions = config["settings"]["actions"] + self.actions_set = config["actions"][self.actions] + ####### Rewards + self.rewards = config["settings"]["rewards"] + + +class LoadEnvVariablesDQNGazebo: + """ + ONLY FOR DQN algorithm + Creates a new variable 'environment', which contains values to Gazebo env, Carla env ... 
+ """ + + def __init__(self, config) -> None: + """environment variable for reset(), step() methods""" + self.environment_set = config["settings"]["environment_set"] + self.env = config["settings"]["env"] + self.agent = config["settings"]["agent"] + self.states = config["settings"]["states"] + self.actions = config["settings"]["actions"] + self.actions_set = config["actions"][self.actions] + self.rewards = config["settings"]["rewards"] + ##### environment variable + self.environment = {} + self.environment["agent"] = config["settings"]["agent"] + self.environment["algorithm"] = config["settings"]["algorithm"] + self.environment["task"] = config["settings"]["task"] + self.environment["framework"] = config["settings"]["framework"] + self.environment["model_state_name"] = config[self.environment_set][self.env][ + "model_state_name" + ] + # Training/inference + self.environment["mode"] = config["settings"]["mode"] + self.environment["retrain_dqn_tf_model_name"] = config["retraining"]["dqn"][ + "retrain_dqn_tf_model_name" + ] + self.environment["inference_dqn_tf_model_name"] = config["inference"]["dqn"][ + "inference_dqn_tf_model_name" + ] + + # Env + self.environment["env"] = config["settings"]["env"] + self.environment["circuit_name"] = config[self.environment_set][self.env][ + "circuit_name" + ] + self.environment["launchfile"] = config[self.environment_set][self.env][ + "launchfile" + ] + self.environment["environment_folder"] = config[self.environment_set][self.env][ + "environment_folder" + ] + self.environment["robot_name"] = config[self.environment_set][self.env][ + "robot_name" + ] + self.environment["estimated_steps"] = config[self.environment_set][self.env][ + "estimated_steps" + ] + self.environment["alternate_pose"] = config[self.environment_set][self.env][ + "alternate_pose" + ] + self.environment["sensor"] = config[self.environment_set][self.env]["sensor"] + self.environment["gazebo_start_pose"] = [ + config[self.environment_set][self.env]["circuit_positions_set"][0] + ] + self.environment["gazebo_random_start_pose"] = config[self.environment_set][ + self.env + ]["circuit_positions_set"] + self.environment["telemetry_mask"] = config[self.environment_set][self.env][ + "telemetry_mask" + ] + self.environment["telemetry"] = config[self.environment_set][self.env][ + "telemetry" + ] + + # Image + self.environment["height_image"] = config["agents"][self.agent][ + "camera_params" + ]["height"] + self.environment["width_image"] = config["agents"][self.agent]["camera_params"][ + "width" + ] + self.environment["center_image"] = config["agents"][self.agent][ + "camera_params" + ]["center_image"] + self.environment["image_resizing"] = config["agents"][self.agent][ + "camera_params" + ]["image_resizing"] + self.environment["new_image_size"] = config["agents"][self.agent][ + "camera_params" + ]["new_image_size"] + self.environment["raw_image"] = config["agents"][self.agent]["camera_params"][ + "raw_image" + ] + self.environment["num_regions"] = config["agents"][self.agent]["camera_params"][ + "num_regions" + ] + self.environment["lower_limit"] = config["agents"][self.agent]["camera_params"][ + "lower_limit" + ] + # States + self.environment["states"] = config["settings"]["states"] + self.environment["x_row"] = config["states"][self.states][0] + + # Actions + self.environment["action_space"] = config["settings"]["actions"] + self.environment["actions"] = config["actions"][self.actions] + + # Rewards + self.environment["reward_function"] = config["settings"]["rewards"] + 
self.environment["rewards"] = config["rewards"][self.rewards] + self.environment["min_reward"] = config["rewards"][self.rewards]["min_reward"] + + # Algorithm + self.environment["model_name"] = config["algorithm"]["dqn"]["model_name"] + # + self.environment["ROS_MASTER_URI"] = config["ros"]["ros_master_uri"] + self.environment["GAZEBO_MASTER_URI"] = config["ros"]["gazebo_master_uri"] + + +class LoadEnvVariablesDDPGGazebo: + """ + ONLY FOR DDPG algorithm + Creates a new variable 'environment', which contains values to Gazebo env, Carla env ... + """ + + def __init__(self, config) -> None: + """environment variable for reset(), step() methods""" + self.environment_set = config["settings"]["environment_set"] + self.env = config["settings"]["env"] + self.agent = config["settings"]["agent"] + self.states = config["settings"]["states"] + self.actions = config["settings"]["actions"] + self.actions_set = config["actions"][self.actions] + self.rewards = config["settings"]["rewards"] + ##### environment variable + self.environment = {} + self.environment["agent"] = config["settings"]["agent"] + self.environment["algorithm"] = config["settings"]["algorithm"] + self.environment["task"] = config["settings"]["task"] + self.environment["framework"] = config["settings"]["framework"] + self.environment["model_state_name"] = config[self.environment_set][self.env][ + "model_state_name" + ] + # Training/inference + self.environment["mode"] = config["settings"]["mode"] + self.environment["retrain_ddpg_tf_actor_model_name"] = config["retraining"][ + "ddpg" + ]["retrain_ddpg_tf_actor_model_name"] + self.environment["retrain_ddpg_tf_critic_model_name"] = config["retraining"][ + "ddpg" + ]["retrain_ddpg_tf_critic_model_name"] + self.environment["inference_ddpg_tf_actor_model_name"] = config["inference"][ + "ddpg" + ]["inference_ddpg_tf_actor_model_name"] + self.environment["inference_ddpg_tf_critic_model_name"] = config["inference"][ + "ddpg" + ]["inference_ddpg_tf_critic_model_name"] + + # Env + self.environment["env"] = config["settings"]["env"] + self.environment["circuit_name"] = config[self.environment_set][self.env][ + "circuit_name" + ] + self.environment["launchfile"] = config[self.environment_set][self.env][ + "launchfile" + ] + self.environment["environment_folder"] = config[self.environment_set][self.env][ + "environment_folder" + ] + self.environment["robot_name"] = config[self.environment_set][self.env][ + "robot_name" + ] + self.environment["estimated_steps"] = config[self.environment_set][self.env][ + "estimated_steps" + ] + self.environment["alternate_pose"] = config[self.environment_set][self.env][ + "alternate_pose" + ] + self.environment["sensor"] = config[self.environment_set][self.env]["sensor"] + self.environment["gazebo_start_pose"] = [ + config[self.environment_set][self.env]["circuit_positions_set"][0] + ] + self.environment["gazebo_random_start_pose"] = config[self.environment_set][ + self.env + ]["circuit_positions_set"] + self.environment["telemetry_mask"] = config[self.environment_set][self.env][ + "telemetry_mask" + ] + self.environment["telemetry"] = config[self.environment_set][self.env][ + "telemetry" + ] + + # Image + self.environment["height_image"] = config["agents"][self.agent][ + "camera_params" + ]["height"] + self.environment["width_image"] = config["agents"][self.agent]["camera_params"][ + "width" + ] + self.environment["center_image"] = config["agents"][self.agent][ + "camera_params" + ]["center_image"] + self.environment["image_resizing"] = config["agents"][self.agent][ + 
"camera_params" + ]["image_resizing"] + self.environment["new_image_size"] = config["agents"][self.agent][ + "camera_params" + ]["new_image_size"] + self.environment["raw_image"] = config["agents"][self.agent]["camera_params"][ + "raw_image" + ] + self.environment["num_regions"] = config["agents"][self.agent]["camera_params"][ + "num_regions" + ] + self.environment["lower_limit"] = config["agents"][self.agent]["camera_params"][ + "lower_limit" + ] + # States + self.environment["states"] = config["settings"]["states"] + self.environment["x_row"] = config["states"][self.states][0] + + # Actions + self.environment["action_space"] = config["settings"]["actions"] + self.environment["actions"] = config["actions"][self.actions] + + # Rewards + self.environment["reward_function"] = config["settings"]["rewards"] + self.environment["rewards"] = config["rewards"][self.rewards] + self.environment["min_reward"] = config["rewards"][self.rewards]["min_reward"] + + # Algorithm + self.environment["critic_lr"] = config["algorithm"]["ddpg"]["critic_lr"] + self.environment["actor_lr"] = config["algorithm"]["ddpg"]["actor_lr"] + self.environment["model_name"] = config["algorithm"]["ddpg"]["model_name"] + # + self.environment["ROS_MASTER_URI"] = config["ros"]["ros_master_uri"] + self.environment["GAZEBO_MASTER_URI"] = config["ros"]["gazebo_master_uri"] + + +class LoadEnvVariablesQlearnGazebo: + """ + ONLY FOR Qlearn algorithm + Creates a new variable 'environment', which contains values to Gazebo env, Carla env ... + """ + + def __init__(self, config) -> None: + """environment variable for reset(), step() methods""" + # self.agent = config["settings"]["agent"] + # self.algorithm = config["settings"]["algorithm"] + # self.task = config["settings"]["task"] + # self.framework = config["settings"]["framework"] + self.environment_set = config["settings"]["environment_set"] + self.env = config["settings"]["env"] + self.agent = config["settings"]["agent"] + self.states = config["settings"]["states"] + self.actions = config["settings"]["actions"] + self.actions_set = config["actions"][self.actions] + self.rewards = config["settings"]["rewards"] + ##### environment variable + self.environment = {} + self.environment["agent"] = config["settings"]["agent"] + self.environment["algorithm"] = config["settings"]["algorithm"] + self.environment["task"] = config["settings"]["task"] + self.environment["framework"] = config["settings"]["framework"] + self.environment["model_state_name"] = config[self.environment_set][self.env][ + "model_state_name" + ] + # Training/inference + self.environment["mode"] = config["settings"]["mode"] + self.environment["retrain_qlearn_model_name"] = config["retraining"]["qlearn"][ + "retrain_qlearn_model_name" + ] + self.environment["inference_qlearn_model_name"] = config["inference"]["qlearn"][ + "inference_qlearn_model_name" + ] + + # Env + self.environment["env"] = config["settings"]["env"] + self.environment["circuit_name"] = config[self.environment_set][self.env][ + "circuit_name" + ] + # self.environment["training_type"] = config[self.environment_set][self.env][ + # "training_type" + # ] + self.environment["launchfile"] = config[self.environment_set][self.env][ + "launchfile" + ] + self.environment["environment_folder"] = config[self.environment_set][self.env][ + "environment_folder" + ] + self.environment["robot_name"] = config[self.environment_set][self.env][ + "robot_name" + ] + self.environment["estimated_steps"] = config[self.environment_set][self.env][ + "estimated_steps" + ] + 
self.environment["alternate_pose"] = config[self.environment_set][self.env][ + "alternate_pose" + ] + self.environment["sensor"] = config[self.environment_set][self.env]["sensor"] + self.environment["gazebo_start_pose"] = [ + config[self.environment_set][self.env]["circuit_positions_set"][0] + ] + self.environment["gazebo_random_start_pose"] = config[self.environment_set][ + self.env + ]["circuit_positions_set"] + self.environment["telemetry_mask"] = config[self.environment_set][self.env][ + "telemetry_mask" + ] + self.environment["telemetry"] = config[self.environment_set][self.env][ + "telemetry" + ] + + # Image + self.environment["height_image"] = config["agents"][self.agent][ + "camera_params" + ]["height"] + self.environment["width_image"] = config["agents"][self.agent]["camera_params"][ + "width" + ] + self.environment["center_image"] = config["agents"][self.agent][ + "camera_params" + ]["center_image"] + self.environment["image_resizing"] = config["agents"][self.agent][ + "camera_params" + ]["image_resizing"] + self.environment["new_image_size"] = config["agents"][self.agent][ + "camera_params" + ]["new_image_size"] + self.environment["raw_image"] = config["agents"][self.agent]["camera_params"][ + "raw_image" + ] + self.environment["num_regions"] = config["agents"][self.agent]["camera_params"][ + "num_regions" + ] + self.environment["lower_limit"] = config["agents"][self.agent]["camera_params"][ + "lower_limit" + ] + # States + self.environment["states"] = config["settings"]["states"] + self.environment["x_row"] = config["states"][self.states][0] + + # Actions + self.environment["action_space"] = config["settings"]["actions"] + self.environment["actions"] = config["actions"][self.actions] + + # Rewards + self.environment["reward_function"] = config["settings"]["rewards"] + self.environment["rewards"] = config["rewards"][self.rewards] + self.environment["min_reward"] = config["rewards"][self.rewards]["min_reward"] + + # Algorithm + self.environment["alpha"] = config["algorithm"]["qlearn"]["alpha"] + self.environment["epsilon"] = config["algorithm"]["qlearn"]["epsilon"] + self.environment["epsilon_min"] = config["algorithm"]["qlearn"]["epsilon_min"] + self.environment["gamma"] = config["algorithm"]["qlearn"]["gamma"] + # + self.environment["ROS_MASTER_URI"] = config["ros"]["ros_master_uri"] + self.environment["GAZEBO_MASTER_URI"] = config["ros"]["gazebo_master_uri"] diff --git a/rl_studio/agents/f1/train_followlane_ddpg_f1_gazebo_tf.py b/rl_studio/agents/f1/train_followlane_ddpg_f1_gazebo_tf.py new file mode 100644 index 000000000..182e2eaf7 --- /dev/null +++ b/rl_studio/agents/f1/train_followlane_ddpg_f1_gazebo_tf.py @@ -0,0 +1,370 @@ +from datetime import datetime, timedelta +import os +import random +import time + +import gymnasium as gym +import numpy as np +import tensorflow as tf +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDDPGGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + print_messages, + render_params, + save_dataframe_episodes, + LoggingHandler, +) +from rl_studio.algorithms.ddpg import ( + ModifiedTensorBoard, + OUActionNoise, + Buffer, + DDPGAgent, +) +from rl_studio.algorithms.utils import ( + save_actorcritic_model, +) +from rl_studio.envs.gazebo.gazebo_envs import * + + +class TrainerFollowLaneDDPGF1GazeboTF: + """ + Mode: training + Task: Follow Lane + Algorithm: DDPG + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + 
self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDDPGGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + f"rewards = {self.environment.environment['rewards']}" + ) + ## --------------------- Deep Nets ------------------ + ou_noise = OUActionNoise( + mean=np.zeros(1), + std_deviation=float(self.algoritmhs_params.std_dev) * np.ones(1), + ) + # Init Agents + ac_agent = DDPGAgent( + self.environment.environment, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + ) + # init Buffer + buffer = Buffer( + state_size, + len(self.global_params.actions_set), + self.global_params.states, + self.global_params.actions, + self.algoritmhs_params.buffer_capacity, + self.algoritmhs_params.batch_size, + ) + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + # show rewards stats per episode + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), ascii=True, unit="episodes" + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + prev_state, prev_state_size = env.reset() + + while not done: + tf_prev_state = tf.expand_dims(tf.convert_to_tensor(prev_state), 0) + action = ac_agent.policy( + tf_prev_state, ou_noise, self.global_params.actions + ) + state, reward, done, _ = env.step(action, step) + cumulated_reward += reward + + # learn and update + buffer.record((prev_state, action, reward, state)) + buffer.learn(ac_agent, self.algoritmhs_params.gamma) + ac_agent.update_target( + ac_agent.target_actor.variables, + ac_agent.actor_model.variables, + self.algoritmhs_params.tau, + ) + ac_agent.update_target( + ac_agent.target_critic.variables, + ac_agent.critic_model.variables, + self.algoritmhs_params.tau, + ) + + # + prev_state = state + step += 1 + + log.logger.debug( + f"\nstate = {state}\n" + 
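# -------------------------------------------------------------------------
# Editor's note: illustrative sketch, not part of the committed file.
# The ac_agent.update_target(...) calls a few lines above presumably apply
# the usual DDPG "soft" (Polyak) update of the target networks with rate tau;
# the actual implementation lives in rl_studio.algorithms.ddpg and is not
# shown in this diff. A framework-free sketch of that rule, using plain
# numpy arrays in place of TensorFlow variables (names are the editor's):
import numpy as np

def soft_update(target_weights, source_weights, tau):
    """Blend each target weight towards its online counterpart: t = tau*s + (1 - tau)*t."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_weights, source_weights)]

target = [np.zeros(3)]
online = [np.ones(3)]
print(soft_update(target, online, tau=0.005))  # [array([0.005, 0.005, 0.005])]
# -------------------------------------------------------------------------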
f"state type = {type(state)}\n" + f"prev_state = {prev_state}\n" + f"prev_state = {type(prev_state)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + # v=action[0][0], # for continuous actions + # w=action[0][1], # for continuous actions + episode=episode, + step=step, + state=state, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + in_best_epoch_training_time=best_epoch_training_time, + ) + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + # best episode + if current_max_reward <= cumulated_reward and episode > 1: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + self.global_params.actions_rewards["reward"].append(reward) + #self.global_params.actions_rewards["center"].append( + # env.image_center + #) + + # Showing stats in screen for monitoring. 
Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + self.global_params.actions_rewards, + ) + log.logger.info( + f"SHOWING BATCH OF STEPS\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current_max_reward = {current_max_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + self.environment.environment, + cumulated_reward, + episode, + "LAPCOMPLETED", + ) + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + current_max_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + self.environment.environment, + cumulated_reward, + episode, + "BESTLAP", + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current_max_reward = {current_max_reward}\n" + f"steps = {step}\n" + ) + ### end episode in time settings: 2 hours, 15 hours... 
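# -------------------------------------------------------------------------
# Editor's note: illustrative sketch, not part of the committed file.
# The check just below ends training either when the wall-clock budget
# (global_params.training_time, in hours) has been spent or when the episode
# counter exceeds env_params.total_episodes. The same predicate isolated as
# a pure function for clarity (the function and argument names are the
# editor's, not the project's):
from datetime import datetime, timedelta

def training_should_stop(start_time, training_time_hours, episode, total_episodes):
    """True once the time budget is exhausted or all episodes have been run."""
    time_over = datetime.now() - timedelta(hours=training_time_hours) > start_time
    return time_over or episode > total_episodes

# With a 2-hour budget that started just now, only the episode count matters:
print(training_should_stop(datetime.now(), 2, episode=5, total_episodes=4))  # True
# -------------------------------------------------------------------------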
+ # or epochs over + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + log.logger.info( + f"\nTraining Time over or num epochs reached\n" + f"current_max_reward = {current_max_reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + ) + if (cumulated_reward - self.environment.environment['rewards']['penal'])>= current_max_reward: + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + self.environment.environment, + cumulated_reward, + episode, + "FINISHTIME", + ) + + break + + ##################################################### + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + ) + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + if max_reward > current_max_reward: + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + self.environment.environment, + cumulated_reward, + episode, + "BATCH", + ) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {current_max_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + ##################################################### + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/train_followlane_dqn_f1_gazebo_tf.py b/rl_studio/agents/f1/train_followlane_dqn_f1_gazebo_tf.py new file mode 100644 index 000000000..075886e30 --- /dev/null +++ b/rl_studio/agents/f1/train_followlane_dqn_f1_gazebo_tf.py @@ -0,0 +1,329 @@ +from datetime import datetime, timedelta +import os +import random +import time + +import gymnasium as gym +import numpy as np +import tensorflow as tf +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDQNGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + print_messages, + print_dictionary, + render_params, + save_dataframe_episodes, + save_best_episode_dqn, + LoggingHandler, +) +from rl_studio.algorithms.dqn_keras import ( + ModifiedTensorBoard, + DQN, +) +from 
rl_studio.envs.gazebo.gazebo_envs import * + + +class TrainerFollowLaneDQNF1GazeboTF: + """ + Mode: training + Task: Follow Lane + Algorithm: DQN + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDQNGazebo(config) + self.global_params = LoadGlobalParams(config) + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.algoritmhs_params.epsilon + epsilon_discount = self.algoritmhs_params.epsilon_discount + epsilon_min = self.algoritmhs_params.epsilon_min + + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + ) + + ## --------------------- Deep Nets ------------------ + # Init Agent + dqn_agent = DQN( + self.environment.environment, + self.algoritmhs_params, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + self.global_params, + ) + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), ascii=True, unit="episodes" + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + observation, _ = env.reset() + + while not done: + if np.random.random() > epsilon: + # Get action from Q table + # action = np.argmax(agent_dqn.get_qs(state)) + action = np.argmax(dqn_agent.get_qs(observation)) + else: + # Get random action + action = np.random.randint(0, len(self.global_params.actions_set)) + + new_observation, reward, done, _ = env.step(action, step) + + # Every step we update replay memory and train main network + # agent_dqn.update_replay_memory((state, action, reward, nextState, done)) + dqn_agent.update_replay_memory( + (observation, action, reward, new_observation, done) + ) + dqn_agent.train(done, step) + + cumulated_reward += reward + observation = new_observation + step += 1 + + log.logger.debug( + f"\nobservation = 
{observation}\n" + f"observation type = {type(observation)}\n" + f"new_observation = {new_observation}\n" + f"new_observation type = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + episode=episode, + step=step, + observation=observation, + new_observation=new_observation, + action=action, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + in_best_epoch_training_time=best_epoch_training_time, + ) + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + # best episode + if current_max_reward <= cumulated_reward and episode > 1: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + self.global_params.actions_rewards["reward"].append(reward) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.info( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {current_max_reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + dqn_agent.model.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_EPOCHCOMPLETED_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}_{self.algoritmhs_params.model_name}.model", + ) + + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + current_max_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + + save_dataframe_episodes( + 
self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + dqn_agent.model.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_BESTLAP_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}_{self.algoritmhs_params.model_name}.model", + ) + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current_max_reward = {current_max_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # end episode in time settings: 2 hours, 15 hours... + # or epochs over + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + log.logger.info( + f"\nTraining Time over or num epochs reached\n" + f"current_max_reward = {current_max_reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + if (cumulated_reward - self.environment.environment['rewards']['penal'])>= current_max_reward: + dqn_agent.model.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_LAPCOMPLETED_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}_{self.algoritmhs_params.model_name}.model", + ) + + break + + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + epsilon=epsilon, + ) + + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + if max_reward > current_max_reward: + dqn_agent.model.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_BATCH_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}_{self.algoritmhs_params.model_name}.model", + ) + + save_dataframe_episodes( + self.environment.environment, + 
self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {current_max_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + # reducing exploration + if epsilon > epsilon_min: + epsilon *= epsilon_discount + + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/train_followlane_qlearn_f1_gazebo.py b/rl_studio/agents/f1/train_followlane_qlearn_f1_gazebo.py new file mode 100755 index 000000000..d23cef9ad --- /dev/null +++ b/rl_studio/agents/f1/train_followlane_qlearn_f1_gazebo.py @@ -0,0 +1,591 @@ +from datetime import datetime, timedelta +import os +import time + +import gymnasium as gym +import numpy as np +from reloading import reloading +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesQlearnGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + save_batch, + save_best_episode, + LoggingHandler, +) +from rl_studio.algorithms.qlearn import QLearn, QLearnF1 +from rl_studio.envs.gazebo.gazebo_envs import * + + +class TrainerFollowLaneQlearnF1Gazebo: + """ + Mode: training + Task: Follow Lane + Algorithm: Qlearn + Agent: F1 + Simulator: Gazebo + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesQlearnGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + """ + Implementation of QlearnF1, a table based algorithm + """ + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.environment.environment["epsilon"] + epsilon_decay = epsilon / (self.env_params.total_episodes // 2) + # states_counter = {} + + log.logger.debug( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"alpha = {self.environment.environment['alpha']}\n" + f"gamma = {self.environment.environment['gamma']}\n" + ) + ## --- init Qlearn + qlearn = QLearnF1( + len(self.global_params.states_set), + self.global_params.actions, + len(self.global_params.actions_set), + 
self.environment.environment["epsilon"], + self.environment.environment["alpha"], + self.environment.environment["gamma"], + self.environment.environment["num_regions"], + ) + + ## retraining q model + if self.environment.environment["mode"] == "retraining": + qlearn.load_table( + f"{self.global_params.models_dir}/{self.environment.environment['retrain_qlearn_model_name']}" + ) + # using epsilon reduced + epsilon = epsilon / 2 + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + done = False + cumulated_reward = 0 + step = 0 + start_time_epoch = datetime.now() + + ## reset env() + observation, _ = env.reset() + + while not done: + step += 1 + # Pick an action based on the current state + action = qlearn.select_action(observation) + + # Execute the action and get feedback + new_observation, reward, done, _ = env.step(action, step) + cumulated_reward += reward + + log.logger.debug( + f"\nobservation = {observation}\n" + f"observation[0]= {observation[0]}\n" + f"observation type = {type(observation)}\n" + f"observation[0] type = {type(observation[0])}\n" + f"new_observation = {new_observation}\n" + f"new_observation = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + f"current_max_reward = {current_max_reward}\n" + ) + + qlearn.learn(observation, action, reward, new_observation) + observation = new_observation + + # render params + render_params( + action=action, + episode=episode, + step=step, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + observation=observation, + reward_in_step=reward, + cumulated_reward=cumulated_reward, + current_max_reward=current_max_reward, + done=done, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"v = {self.global_params.actions_set[action][0]}\n" + f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current_max_reward = {current_max_reward}\n" + f"done = {done}\n" + ) + + # best episode and step's stats + if current_max_reward <= cumulated_reward and episode > 1: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + self.global_params.actions_rewards["reward"].append(reward) + # Showing stats in screen for monitoring. 
Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.info( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {current_max_reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + # End epoch + if step > self.env_params.estimated_steps: + done = True + qlearn.save_numpytable( + qlearn.q_table, + self.environment.environment, + self.global_params.models_dir, + cumulated_reward, + episode, + step, + epsilon, + ) + np.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + qlearn.q_table, + ) + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + + # Save best lap + if (cumulated_reward - self.environment.environment['rewards']['penal']) >= current_max_reward: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + qlearn.save_numpytable( + qlearn.q_table, + self.environment.environment, + self.global_params.models_dir, + cumulated_reward, + episode, + step, + epsilon, + ) + np.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward - self.environment.environment['rewards']['penal'])}-qtable.npy", + qlearn.q_table, + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"current_max_reward = {current_max_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # end of training by: + # training time setting: 2 hours, 15 hours... 
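# -------------------------------------------------------------------------
# Editor's note: illustrative sketch, not part of the committed file.
# Besides the termination check that continues just below, this trainer
# anneals exploration linearly: near the top of main() the decrement is set
# to epsilon / (total_episodes // 2), and after every episode epsilon is
# reduced by that amount but never drops under epsilon_min. A standalone
# sketch of that schedule (function name and example values are the editor's):
def linear_epsilon_schedule(epsilon_start, epsilon_min, total_episodes):
    """Yield the epsilon used at each episode under the linear decay described above."""
    decay = epsilon_start / (total_episodes // 2)
    epsilon = epsilon_start
    for _ in range(total_episodes):
        yield epsilon
        epsilon = max(epsilon_min, epsilon - decay)

# Example: a 0.95 start with a 0.05 floor over 10 episodes hits the floor mid-training.
print([round(e, 2) for e in linear_epsilon_schedule(0.95, 0.05, 10)])
# [0.95, 0.76, 0.57, 0.38, 0.19, 0.05, 0.05, 0.05, 0.05, 0.05]
# -------------------------------------------------------------------------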
+ # num epochs + + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + if (cumulated_reward - self.environment.environment['rewards']['penal']) >= current_max_reward: + qlearn.save_numpytable( + qlearn.q_table, + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + ) + np.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + qlearn.q_table, + ) + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {current_max_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + break + + # save best values every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + self.global_params.aggr_ep_rewards = save_batch( + episode, + step, + start_time_epoch, + start_time, + self.global_params, + self.env_params, + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {current_max_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + # updating epsilon for exploration + if epsilon > self.environment.environment["epsilon_min"]: + epsilon -= epsilon_decay + epsilon = qlearn.update_epsilon( + max(self.environment.environment["epsilon_min"], epsilon) + ) + + env.close() + + + + + + ################## + def main_____(self): + """ + Qlearn Dictionnary + """ + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.environment.environment["epsilon"] + epsilon_decay = epsilon / (self.env_params.total_episodes // 2) + # states_counter = {} + + log.logger.info( + f"\nactions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"alpha = {self.environment.environment['alpha']}\n" + f"gamma = {self.environment.environment['gamma']}\n" + ) + ## --- init Qlearn + qlearn = QLearn( + actions=range(len(self.global_params.actions_set)), + epsilon=self.environment.environment["epsilon"], + alpha=self.environment.environment["alpha"], + gamma=self.environment.environment["gamma"], + ) + log.logger.info(f"\nqlearn.q = {qlearn.q}") + + ## retraining q model + if self.environment.environment["mode"] == "retraining": + qlearn.q = qlearn.load_pickle_model( + f"{self.global_params.models_dir}/{self.environment.environment['retrain_qlearn_model_name']}" + ) + log.logger.info(f"\nqlearn.q = {qlearn.q}") + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + done = False + 
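# -------------------------------------------------------------------------
# Editor's note: illustrative sketch, not part of the committed file.
# In the episode loop below, qlearn.learn(state, action, reward, next_state)
# belongs to the dictionary-based QLearn class from rl_studio.algorithms.qlearn,
# whose code is not part of this diff. The backup it is expected to perform is
# the textbook rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
# A minimal dict-backed version of that rule, with editor-chosen names:
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.2, gamma=0.9):
    """One tabular Q-learning backup on a table keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

q_table = defaultdict(float)
# States mirror the "".join(map(str, observation)) encoding used in this loop.
q_update(q_table, state="3", action=1, reward=1.0, next_state="4", actions=range(3))
print(q_table[("3", 1)])  # 0.2 after a single backup on an empty table
# -------------------------------------------------------------------------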
cumulated_reward = 0 + step = 0 + start_time_epoch = datetime.now() + + ## reset env() + observation = env.reset() + state = "".join(map(str, observation)) + + print(f"observation: {observation}") + print(f"observation type: {type(observation)}") + print(f"observation len: {len(observation)}") + print(f"state: {state}") + print(f"state type: {type(state)}") + print(f"state len: {len(state)}") + + while not done: + step += 1 + # Pick an action based on the current state + action = qlearn.selectAction(state) + + # Execute the action and get feedback + observation, reward, done, _ = env.step(action, step) + cumulated_reward += reward + next_state = "".join(map(str, observation)) + qlearn.learn(state, action, reward, next_state) + state = next_state + + # render params + render_params( + action=action, + episode=episode, + step=step, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + observation=observation, + reward_in_step=reward, + cumulated_reward=cumulated_reward, + done=done, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"v = {self.global_params.actions_set[action][0]}\n" + f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + try: + self.global_params.states_counter[next_state] += 1 + except KeyError: + self.global_params.states_counter[next_state] = 1 + + self.global_params.stats[int(episode)] = step + self.global_params.states_reward[int(episode)] = cumulated_reward + + # best episode and step's stats + if current_max_reward <= cumulated_reward and episode > 1: + ( + current_max_reward, + best_epoch, + best_step, + best_epoch_training_time, + ) = save_best_episode( + self.global_params, + cumulated_reward, + episode, + step, + start_time_epoch, + reward, + env.image_center, + ) + + # Showing stats in screen for monitoring. 
Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.info( + f"saving batch of steps\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + # End epoch + if step > self.env_params.estimated_steps: + done = True + qlearn.save_model( + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + ) + log.logger.info( + f"\nEpisode COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + + # Save best lap + if cumulated_reward >= current_max_reward: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + qlearn.save_model( + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + self.global_params.stats, + self.global_params.states_counter, + self.global_params.states_reward, + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # ended at training time setting: 2 hours, 15 hours... 
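+            # End-of-training check (below): this dictionary-based variant (main_____) stops on
+            # wall-clock time only. training_time is interpreted in hours (timedelta(hours=...)),
+            # so once datetime.now() passes start_time + training_time the Q model is saved
+            # (only if this episode matched or beat the best cumulated reward) and the loop breaks.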
+ if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ): + if cumulated_reward >= current_max_reward: + qlearn.save_model( + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + ) + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + break + + # save best values every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + self.global_params.aggr_ep_rewards = save_batch( + episode, + step, + start_time_epoch, + start_time, + self.global_params, + self.env_params, + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + # updating epsilon for exploration + if epsilon > self.environment.environment["epsilon_min"]: + # self.epsilon *= self.epsilon_discount + epsilon -= epsilon_decay + epsilon = qlearn.updateEpsilon( + max(self.environment.environment["epsilon_min"], epsilon) + ) + + env.close() diff --git a/rl_studio/agents/f1/train_followline_ddpg_f1_gazebo_tf.py b/rl_studio/agents/f1/train_followline_ddpg_f1_gazebo_tf.py new file mode 100644 index 000000000..af6839454 --- /dev/null +++ b/rl_studio/agents/f1/train_followline_ddpg_f1_gazebo_tf.py @@ -0,0 +1,368 @@ +from datetime import datetime, timedelta +import os +import pprint +import random +import time + +import gymnasium as gym +import numpy as np +import tensorflow as tf +from tqdm import tqdm + + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDDPGGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + print_messages, + render_params, + save_dataframe_episodes, + LoggingHandler, +) +from rl_studio.algorithms.ddpg import ( + ModifiedTensorBoard, + OUActionNoise, + Buffer, + DDPGAgent, +) +from rl_studio.algorithms.utils import ( + save_actorcritic_model, +) +from rl_studio.envs.gazebo.gazebo_envs import * + + +class TrainerFollowLineDDPGF1GazeboTF: + """ + Mode: training + Task: Follow Line + Algorithm: DDPG + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDDPGGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch 
= 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + ) + + ## --------------------- Deep Nets ------------------ + ou_noise = OUActionNoise( + mean=np.zeros(1), + std_deviation=float(self.algoritmhs_params.std_dev) * np.ones(1), + ) + # Init Agents + ac_agent = DDPGAgent( + self.environment.environment, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + ) + # init Buffer + buffer = Buffer( + state_size, + len(self.global_params.actions_set), + self.global_params.states, + self.global_params.actions, + self.algoritmhs_params.buffer_capacity, + self.algoritmhs_params.batch_size, + ) + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), ascii=True, unit="episodes" + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + prev_state, prev_state_size = env.reset() + + while not done: + tf_prev_state = tf.expand_dims(tf.convert_to_tensor(prev_state), 0) + action = ac_agent.policy( + tf_prev_state, ou_noise, self.global_params.actions + ) + state, reward, done, _ = env.step(action, step) + cumulated_reward += reward + + # learn and update + buffer.record((prev_state, action, reward, state)) + buffer.learn(ac_agent, self.algoritmhs_params.gamma) + ac_agent.update_target( + ac_agent.target_actor.variables, + ac_agent.actor_model.variables, + self.algoritmhs_params.tau, + ) + ac_agent.update_target( + ac_agent.target_critic.variables, + ac_agent.critic_model.variables, + self.algoritmhs_params.tau, + ) + + # + prev_state = state + step += 1 + + log.logger.debug( + f"\nstate = {state}\n" + # f"observation[0]= {observation[0]}\n" + f"state type = {type(state)}\n" + # f"observation[0] type = {type(observation[0])}\n" + f"prev_state = {prev_state}\n" + f"prev_state = {type(prev_state)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + v=action[0][0], # for continuous actions + w=action[0][1], # for continuous actions + episode=episode, + step=step, + state=state, + # v=self.global_params.actions_set[action][ + # 0 + # ], # this case for discrete + # w=self.global_params.actions_set[action][ + # 1 + # ], # this case for discrete + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + in_best_epoch_training_time=best_epoch_training_time, + ) + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = 
{range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + # best episode + if current_max_reward <= cumulated_reward: + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + self.global_params.actions_rewards["episode"].append(episode) + self.global_params.actions_rewards["step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + self.global_params.actions_rewards["reward"].append(reward) + self.global_params.actions_rewards["center"].append( + env.image_center + ) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + self.global_params.actions_rewards, + ) + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + ##################################################### + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "LAPCOMPLETED", + ) + ##################################################### + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + current_max_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "BESTLAP", + ) + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + ) + + ##################################################### + ### end episode in time settings: 2 hours, 15 hours... 
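+                # Training ends when either condition below holds: the configured training_time
+                # (given in hours via timedelta(hours=...)) has elapsed since start_time, or the
+                # episode counter exceeds total_episodes. If the finishing episode improved on the
+                # best reward so far, the actor-critic weights are saved with the "FINISHTIME" tag
+                # before breaking.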
+ if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + ) + if cumulated_reward > current_max_reward: + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "FINISHTIME", + ) + + break + + ##################################################### + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + ) + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + save_actorcritic_model( + ac_agent, + self.global_params, + self.algoritmhs_params, + cumulated_reward, + episode, + "BATCH", + ) + + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + ##################################################### + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/train_followline_dqn_f1_gazebo_tf.py b/rl_studio/agents/f1/train_followline_dqn_f1_gazebo_tf.py new file mode 100644 index 000000000..ee32bb8fd --- /dev/null +++ b/rl_studio/agents/f1/train_followline_dqn_f1_gazebo_tf.py @@ -0,0 +1,347 @@ +from datetime import datetime, timedelta +import logging +import os +import pprint +import random +import time + +import gymnasium as gym + +import numpy as np +import tensorflow as tf +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesDQNGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + print_messages, + print_dictionary, + render_params, + save_dataframe_episodes, + save_batch, + save_best_episode_dqn, + LoggingHandler, +) +from rl_studio.algorithms.dqn_keras import ( + ModifiedTensorBoard, + DQN, +) +from rl_studio.envs.gazebo.gazebo_envs import * +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO + + +class TrainerFollowLineDQNF1GazeboTF: + """ + Mode: 
training + Task: Follow Line + Algorithm: DQN + Agent: F1 + Simulator: Gazebo + Framework: TensorFlow + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesDQNGazebo(config) + self.global_params = LoadGlobalParams(config) + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + random.seed(1) + np.random.seed(1) + tf.compat.v1.random.set_random_seed(1) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.algoritmhs_params.epsilon + epsilon_discount = self.algoritmhs_params.epsilon_discount + epsilon_min = self.algoritmhs_params.epsilon_min + + ## Reset env + state, state_size = env.reset() + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"batch_size = {self.algoritmhs_params.batch_size}\n" + f"logs_tensorboard_dir = {self.global_params.logs_tensorboard_dir}\n" + ) + + ## --------------------- Deep Nets ------------------ + # Init Agent + dqn_agent = DQN( + self.environment.environment, + self.algoritmhs_params, + len(self.global_params.actions_set), + state_size, + self.global_params.models_dir, + self.global_params, + ) + # Init TensorBoard + tensorboard = ModifiedTensorBoard( + log_dir=f"{self.global_params.logs_tensorboard_dir}/{self.algoritmhs_params.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + tensorboard.step = episode + done = False + cumulated_reward = 0 + step = 1 + start_time_epoch = datetime.now() + + observation, _ = env.reset() + + while not done: + if np.random.random() > epsilon: + action = np.argmax(dqn_agent.get_qs(observation)) + else: + # Get random action + action = np.random.randint(0, len(self.global_params.actions_set)) + + new_observation, reward, done, _ = env.step(action, step) + + # Every step we update replay memory and train main network + # agent_dqn.update_replay_memory((state, action, reward, nextState, done)) + dqn_agent.update_replay_memory( + (observation, action, reward, new_observation, done) + ) + dqn_agent.train(done, step) + + cumulated_reward += reward + observation = new_observation + step += 1 + + log.logger.debug( + f"\nobservation = {observation}\n" + # f"observation[0]= {observation[0]}\n" + f"observation type = {type(observation)}\n" + # f"observation[0] type = {type(observation[0])}\n" + 
f"new_observation = {new_observation}\n" + f"new_observation = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + render_params( + task=self.global_params.task, + # v=action[0][0], # for continuous actions + # w=action[0][1], # for continuous actions + episode=episode, + step=step, + state=state, + # v=self.global_params.actions_set[action][ + # 0 + # ], # this case for discrete + # w=self.global_params.actions_set[action][ + # 1 + # ], # this case for discrete + # self.env.image_center, + # self.actions_rewards, + reward_in_step=reward, + cumulated_reward_in_this_episode=cumulated_reward, + _="--------------------------", + best_episode_until_now=best_epoch, + in_best_step=best_step, + with_highest_reward=int(current_max_reward), + in_best_epoch_training_time=best_epoch_training_time, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + # f"v = {self.global_params.actions_set[action][0]}\n" + # f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + # best episode + if current_max_reward <= cumulated_reward and episode > 1: + ( + current_max_reward, + best_epoch, + best_step, + best_epoch_training_time, + ) = save_best_episode_dqn( + self.global_params, + cumulated_reward, + episode, + step, + start_time_epoch, + reward, + ) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + ##################################################### + ### save in case of completed steps in one episode + if step >= self.env_params.estimated_steps: + done = True + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + + ##################################################### + #### save best lap in episode + if ( + cumulated_reward - self.environment.environment["rewards"]["penal"] + ) >= current_max_reward and episode > 1: + + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + dqn_agent.model.save( + 
f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + + ##################################################### + ### end episode in time settings: 2 hours, 15 hours... + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ): + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + if cumulated_reward > current_max_reward: + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + + break + + ##################################################### + ### save every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + average_reward = sum( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) / len(self.global_params.ep_rewards[-self.env_params.save_episodes :]) + min_reward = min( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + max_reward = max( + self.global_params.ep_rewards[-self.env_params.save_episodes :] + ) + + tensorboard.update_stats( + reward_avg=int(average_reward), + reward_max=int(max_reward), + steps=step, + epsilon=epsilon, + ) + self.global_params.aggr_ep_rewards["episode"].append(episode) + self.global_params.aggr_ep_rewards["step"].append(step) + self.global_params.aggr_ep_rewards["avg"].append(average_reward) + self.global_params.aggr_ep_rewards["max"].append(max_reward) + self.global_params.aggr_ep_rewards["min"].append(min_reward) + self.global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + self.global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + + if max_reward > current_max_reward: + dqn_agent.model.save( + f"{self.global_params.models_dir}/{self.algoritmhs_params.model_name}_LAPCOMPLETED_Max{int(cumulated_reward)}_Epoch{episode}_inTime{time.strftime('%Y%m%d-%H%M%S')}.model" + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + # reducing exploration + if epsilon > epsilon_min: + epsilon *= epsilon_discount + + ##################################################### + ### save last episode, not neccesarily the best one + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + env.close() diff --git a/rl_studio/agents/f1/train_followline_qlearn_f1_gazebo.py b/rl_studio/agents/f1/train_followline_qlearn_f1_gazebo.py new file mode 100755 index 000000000..b565282ef --- /dev/null +++ b/rl_studio/agents/f1/train_followline_qlearn_f1_gazebo.py @@ -0,0 +1,588 @@ +from datetime import datetime, timedelta +import os +import time + +import gymnasium as gym +import numpy as 
np +from reloading import reloading +from tqdm import tqdm + +from rl_studio.agents.f1.loaders import ( + LoadAlgorithmParams, + LoadEnvParams, + LoadEnvVariablesQlearnGazebo, + LoadGlobalParams, +) +from rl_studio.agents.utils import ( + render_params, + save_dataframe_episodes, + save_batch, + save_best_episode, + LoggingHandler, +) +from rl_studio.algorithms.qlearn import QLearn, QLearnF1 +from rl_studio.envs.gazebo.gazebo_envs import * + + +class TrainerFollowLineQlearnF1Gazebo: + """ + Mode: training + Task: Follow Line + Algorithm: Qlearn + Agent: F1 + Simulator: Gazebo + """ + + def __init__(self, config): + self.algoritmhs_params = LoadAlgorithmParams(config) + self.env_params = LoadEnvParams(config) + self.environment = LoadEnvVariablesQlearnGazebo(config) + self.global_params = LoadGlobalParams(config) + + os.makedirs(f"{self.global_params.models_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.logs_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_data_dir}", exist_ok=True) + os.makedirs(f"{self.global_params.metrics_graphics_dir}", exist_ok=True) + + self.log_file = f"{self.global_params.logs_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{self.global_params.mode}_{self.global_params.task}_{self.global_params.algorithm}_{self.global_params.agent}_{self.global_params.framework}.log" + + def main(self): + """ + Implementation of QlearnF1, a table based algorithm + """ + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.environment.environment["epsilon"] + epsilon_decay = epsilon / (self.env_params.total_episodes // 2) + # states_counter = {} + + log.logger.info( + f"\nstates = {self.global_params.states}\n" + f"states_set = {self.global_params.states_set}\n" + f"states_len = {len(self.global_params.states_set)}\n" + f"actions = {self.global_params.actions}\n" + f"actions set = {self.global_params.actions_set}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"alpha = {self.environment.environment['alpha']}\n" + f"gamma = {self.environment.environment['gamma']}\n" + ) + ## --- init Qlearn + qlearn = QLearnF1( + len(self.global_params.states_set), + self.global_params.actions, + len(self.global_params.actions_set), + self.environment.environment["epsilon"], + self.environment.environment["alpha"], + self.environment.environment["gamma"], + self.environment.environment["num_regions"], + ) + + ## retraining q model + if self.environment.environment["mode"] == "retraining": + qlearn.load_table( + f"{self.global_params.models_dir}/{self.environment.environment['retrain_qlearn_model_name']}" + ) + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + done = False + cumulated_reward = 0 + step = 0 + start_time_epoch = datetime.now() + + ## reset env() + observation, _ = env.reset() + + while not done: + step += 1 + # Pick an action based on the current state + action = qlearn.select_action(observation) + + # Execute the action and get feedback + new_observation, reward, done, _ = env.step(action, step) + cumulated_reward += reward + + log.logger.debug( + f"\nobservation = {observation}\n" + 
f"observation[0]= {observation[0]}\n" + f"observation type = {type(observation)}\n" + f"observation[0] type = {type(observation[0])}\n" + f"new_observation = {new_observation}\n" + f"new_observation = {type(new_observation)}\n" + f"action = {action}\n" + f"actions type = {type(action)}\n" + ) + + qlearn.learn(observation, action, reward, new_observation) + observation = new_observation + + # render params + render_params( + action=action, + episode=episode, + step=step, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + observation=observation, + reward_in_step=reward, + cumulated_reward=cumulated_reward, + done=done, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"v = {self.global_params.actions_set[action][0]}\n" + f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + # best episode and step's stats + if current_max_reward <= cumulated_reward and episode > 1: + ( + current_max_reward, + best_epoch, + best_step, + best_epoch_training_time, + ) = save_best_episode( + self.global_params, + cumulated_reward, + episode, + step, + start_time_epoch, + reward, + env.image_center, + ) + + # Showing stats in screen for monitoring. Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.debug( + f"SHOWING BATCH OF STEPS\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + # End epoch + if step > self.env_params.estimated_steps: + done = True + qlearn.save_numpytable( + qlearn.q_table, + self.environment.environment, + self.global_params.models_dir, + cumulated_reward, + episode, + step, + epsilon, + ) + np.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + qlearn.q_table, + ) + log.logger.info( + f"\nEPISODE COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + + # Save best lap + if cumulated_reward >= current_max_reward: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + 
self.global_params.best_current_epoch, + ) + qlearn.save_numpytable( + qlearn.q_table, + self.environment.environment, + self.global_params.models_dir, + cumulated_reward, + episode, + step, + epsilon, + ) + np.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + qlearn.q_table, + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # end of training by: + # training time setting: 2 hours, 15 hours... + # num epochs + + if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ) or (episode > self.env_params.total_episodes): + if cumulated_reward >= current_max_reward: + qlearn.save_numpytable( + qlearn.q_table, + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + ) + np.save( + f"{self.global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{self.environment.environment['circuit_name']}_States-{self.environment.environment['states']}_Actions-{self.environment.environment['action_space']}_Rewards-{self.environment.environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + qlearn.q_table, + ) + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + break + + # save best values every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + self.global_params.aggr_ep_rewards = save_batch( + episode, + step, + start_time_epoch, + start_time, + self.global_params, + self.env_params, + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + # updating epsilon for exploration + if epsilon > self.environment.environment["epsilon_min"]: + epsilon -= epsilon_decay + epsilon = qlearn.update_epsilon( + max(self.environment.environment["epsilon_min"], epsilon) + ) + + env.close() + + ################## + def main_____(self): + """ + Qlearn Dictionnary + """ + + log = LoggingHandler(self.log_file) + + ## Load Environment + env = gym.make(self.env_params.env_name, **self.environment.environment) + + start_time = datetime.now() + best_epoch = 1 + current_max_reward = 0 + best_step = 0 + best_epoch_training_time = 0 + epsilon = self.environment.environment["epsilon"] + epsilon_decay = epsilon / (self.env_params.total_episodes // 2) + # states_counter = {} + + log.logger.info( + f"\nactions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"alpha = {self.environment.environment['alpha']}\n" + 
f"gamma = {self.environment.environment['gamma']}\n" + ) + ## --- init Qlearn + qlearn = QLearn( + actions=range(len(self.global_params.actions_set)), + epsilon=self.environment.environment["epsilon"], + alpha=self.environment.environment["alpha"], + gamma=self.environment.environment["gamma"], + ) + log.logger.info(f"\nqlearn.q = {qlearn.q}") + + ## retraining q model + if self.environment.environment["mode"] == "retraining": + qlearn.q = qlearn.load_pickle_model( + f"{self.global_params.models_dir}/{self.environment.environment['retrain_qlearn_model_name']}" + ) + log.logger.info(f"\nqlearn.q = {qlearn.q}") + + ## ------------- START TRAINING -------------------- + for episode in tqdm( + range(1, self.env_params.total_episodes + 1), + ascii=True, + unit="episodes", + ): + done = False + cumulated_reward = 0 + step = 0 + start_time_epoch = datetime.now() + + ## reset env() + observation = env.reset() + state = "".join(map(str, observation)) + + print(f"observation: {observation}") + print(f"observation type: {type(observation)}") + print(f"observation len: {len(observation)}") + print(f"state: {state}") + print(f"state type: {type(state)}") + print(f"state len: {len(state)}") + + while not done: + step += 1 + # Pick an action based on the current state + action = qlearn.selectAction(state) + + # Execute the action and get feedback + observation, reward, done, _ = env.step(action, step) + cumulated_reward += reward + next_state = "".join(map(str, observation)) + qlearn.learn(state, action, reward, next_state) + state = next_state + + # render params + render_params( + action=action, + episode=episode, + step=step, + v=self.global_params.actions_set[action][ + 0 + ], # this case for discrete + w=self.global_params.actions_set[action][ + 1 + ], # this case for discrete + epsilon=epsilon, + observation=observation, + reward_in_step=reward, + cumulated_reward=cumulated_reward, + done=done, + ) + + log.logger.debug( + f"\nepisode = {episode}\n" + f"step = {step}\n" + f"actions_len = {len(self.global_params.actions_set)}\n" + f"actions_range = {range(len(self.global_params.actions_set))}\n" + f"actions = {self.global_params.actions_set}\n" + f"epsilon = {epsilon}\n" + f"epsilon_decay = {epsilon_decay}\n" + f"v = {self.global_params.actions_set[action][0]}\n" + f"w = {self.global_params.actions_set[action][1]}\n" + f"observation = {observation}\n" + f"reward_in_step = {reward}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"done = {done}\n" + ) + + try: + self.global_params.states_counter[next_state] += 1 + except KeyError: + self.global_params.states_counter[next_state] = 1 + + self.global_params.stats[int(episode)] = step + self.global_params.states_reward[int(episode)] = cumulated_reward + + # best episode and step's stats + if current_max_reward <= cumulated_reward and episode > 1: + ( + current_max_reward, + best_epoch, + best_step, + best_epoch_training_time, + ) = save_best_episode( + self.global_params, + cumulated_reward, + episode, + step, + start_time_epoch, + reward, + env.image_center, + ) + + # Showing stats in screen for monitoring. 
Showing every 'save_every_step' value + if not step % self.env_params.save_every_step: + log.logger.info( + f"saving batch of steps\n" + f"current_max_reward = {cumulated_reward}\n" + f"current epoch = {episode}\n" + f"current step = {step}\n" + f"best epoch so far = {best_epoch}\n" + f"best step so far = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + + # End epoch + if step > self.env_params.estimated_steps: + done = True + qlearn.save_model( + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + ) + log.logger.info( + f"\nEpisode COMPLETED\n" + f"in episode = {episode}\n" + f"steps = {step}\n" + f"cumulated_reward = {cumulated_reward}\n" + f"epsilon = {epsilon}\n" + ) + + # Save best lap + if cumulated_reward >= current_max_reward: + self.global_params.best_current_epoch["best_epoch"].append(best_epoch) + self.global_params.best_current_epoch["highest_reward"].append( + cumulated_reward + ) + self.global_params.best_current_epoch["best_step"].append(best_step) + self.global_params.best_current_epoch[ + "best_epoch_training_time" + ].append(best_epoch_training_time) + self.global_params.best_current_epoch[ + "current_total_training_time" + ].append(datetime.now() - start_time) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.best_current_epoch, + ) + qlearn.save_model( + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + self.global_params.stats, + self.global_params.states_counter, + self.global_params.states_reward, + ) + + log.logger.info( + f"\nsaving best lap\n" + f"in episode = {episode}\n" + f"current_max_reward = {cumulated_reward}\n" + f"steps = {step}\n" + f"epsilon = {epsilon}\n" + ) + # ended at training time setting: 2 hours, 15 hours... 
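+            # Unlike main() above, which also stops after total_episodes, this dictionary-based
+            # variant ends training on elapsed wall-clock time alone (training_time in hours).
+            # qlearn.save_model is called before breaking only when this episode matched or beat
+            # the best cumulated reward recorded so far.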
+ if ( + datetime.now() - timedelta(hours=self.global_params.training_time) + > start_time + ): + if cumulated_reward >= current_max_reward: + qlearn.save_model( + self.environment.environment, + self.global_params.models_dir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + ) + log.logger.info( + f"\nTraining Time over\n" + f"current_max_reward = {cumulated_reward}\n" + f"epoch = {episode}\n" + f"step = {step}\n" + f"epsilon = {epsilon}\n" + ) + break + + # save best values every save_episode times + self.global_params.ep_rewards.append(cumulated_reward) + if not episode % self.env_params.save_episodes: + self.global_params.aggr_ep_rewards = save_batch( + episode, + step, + start_time_epoch, + start_time, + self.global_params, + self.env_params, + ) + save_dataframe_episodes( + self.environment.environment, + self.global_params.metrics_data_dir, + self.global_params.aggr_ep_rewards, + ) + log.logger.info( + f"\nsaving BATCH\n" + f"current_max_reward = {cumulated_reward}\n" + f"best_epoch = {best_epoch}\n" + f"best_step = {best_step}\n" + f"best_epoch_training_time = {best_epoch_training_time}\n" + ) + # updating epsilon for exploration + if epsilon > self.environment.environment["epsilon_min"]: + # self.epsilon *= self.epsilon_discount + epsilon -= epsilon_decay + epsilon = qlearn.updateEpsilon( + max(self.environment.environment["epsilon_min"], epsilon) + ) + + env.close() diff --git a/rl_studio/agents/frameworks_type.py b/rl_studio/agents/frameworks_type.py new file mode 100644 index 000000000..dbd020637 --- /dev/null +++ b/rl_studio/agents/frameworks_type.py @@ -0,0 +1,6 @@ +from enum import Enum + + +class FrameworksType(Enum): + TF = "TensorFlow" + PYTORCH = "Pytorch" diff --git a/rl_studio/agents/mountain_car/inference_qlearn.py b/rl_studio/agents/mountain_car/inference_qlearn.py index 86f0f96fc..3a731d6f3 100644 --- a/rl_studio/agents/mountain_car/inference_qlearn.py +++ b/rl_studio/agents/mountain_car/inference_qlearn.py @@ -12,29 +12,27 @@ from . 
import utils as specific_utils -class QLearnMountainCarTrainer: +class QLearnMountainCarInferencer: def __init__(self, params): # TODO: Create a pydantic metaclass to simplify the way we extract the params # environment params + self.n_steps = 0 self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - env_params = params.environment["params"] - actions = params.environment["actions"] - env_params["actions"] = actions - self.env = gym.make(self.env_name, **env_params) + self.environment_params = params["environments"] + self.env_name = params["environments"]["env_name"] + self.env = gym.make(self.env_name, **self.params) # algorithm params self.states_counter = {} self.states_reward = {} self.last_time_steps = np.ndarray(0) - self.outdir = "./logs/robot_mesh_experiments/" - self.env = gym.wrappers.Monitor(self.env, self.outdir, force=True) - self.env.done = True + inference_file = params["inference"]["inference_file"] + actions_file = params["inference"]["actions_file"] - inference_file = params.inference["params"]["inference_file"] - actions_file = params.inference["params"]["actions_file"] self.highest_reward = 0 + self.total_episodes = 20000 + self.epsilon_discount = 0.999 # Default 0.9986 + self.cumulated_reward = 0 self.inferencer = InferencerWrapper("qlearn", inference_file, actions_file) @@ -54,15 +52,14 @@ def evaluate(self, state): nextState, reward, done, info = self.env.step(-1) else: nextState, reward, done, info = self.env.step(action) - n_steps = self.n_steps + 1 - print("step " + str(n_steps) + "!!!! ----------------------------") + self.n_steps = self.n_steps + 1 + print("step " + str(self.n_steps) + "!!!! ----------------------------") self.cumulated_reward += reward if self.highest_reward < self.cumulated_reward: self.highest_reward = self.cumulated_reward - self.env._flush(force=True) return nextState, done def simulation(self, queue): @@ -76,7 +73,6 @@ def simulation(self, queue): for episode in range(self.total_episodes): - done = False self.n_steps = 0 cumulated_reward = 0 @@ -90,8 +86,7 @@ def simulation(self, queue): if not done: state = next_state else: - last_time_steps = np.append(last_time_steps, [int(step + 1)]) - self.stats[int(episode)] = step + self.last_time_steps = np.append(self.last_time_steps, [int(step + 1)]) self.states_reward[int(episode)] = cumulated_reward print( "---------------------------------------------------------------------------------------------" @@ -101,7 +96,6 @@ def simulation(self, queue): f"- Time: {start_time_format} - Steps: {step}" ) - self.rewards_per_run.append(cumulated_reward) queue.put(self.n_steps) break @@ -112,10 +106,10 @@ def simulation(self, queue): ) ) - l = last_time_steps.tolist() + l = self.last_time_steps.tolist() l.sort() - print("Overall score: {:0.2f}".format(last_time_steps.mean())) + print("Overall score: {:0.2f}".format(self.last_time_steps.mean())) print( "Best 100 score: {:0.2f}".format( reduce(lambda x, y: x + y, l[-100:]) / len(l[-100:]) diff --git a/rl_studio/agents/mountain_car/train_qlearn.py b/rl_studio/agents/mountain_car/train_qlearn.py index c561b3c69..b1fca7410 100644 --- a/rl_studio/agents/mountain_car/train_qlearn.py +++ b/rl_studio/agents/mountain_car/train_qlearn.py @@ -17,22 +17,20 @@ def __init__(self, params): # TODO: Create a pydantic metaclass to simplify the way we extract the params # environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = 
params.environment["params"]["env_name"] - env_params = params.environment["params"] - actions = params.environment["actions"] - env_params["actions"] = actions - self.env = gym.make(self.env_name, **env_params) + self.config = params["settings"] + self.environment_params = params["environments"] + self.env_name = self.environment_params["env_name"] + self.env = gym.make(self.env_name, **self.params) # algorithm params - self.alpha = params.algorithm["params"]["alpha"] - self.epsilon = params.algorithm["params"]["epsilon"] - self.gamma = params.algorithm["params"]["gamma"] + self.alpha = params["algorithm"]["alpha"] + self.epsilon = params["algorithm"]["epsilon"] + self.gamma = params["algorithm"]["gamma"] self.states_counter = {} self.states_reward = {} self.stats = {} self.last_time_steps = np.ndarray(0) - self.outdir = "./logs/robot_mesh_experiments/" + self.outdir = "./logs/mountain_car/" self.env = gym.wrappers.Monitor(self.env, self.outdir, force=True) self.actions = range(self.env.action_space.n) self.env.done = True @@ -100,7 +98,6 @@ def simulation(self, queue): for episode in range(self.total_episodes): - done = False cumulated_reward = 0 print("resetting") state = self.env.reset() @@ -129,15 +126,15 @@ def simulation(self, queue): break - if episode % 250 == 0 and self.config.save_model and episode > 1: - print(f"\nSaving model . . .\n") - utils.save_model( - self.qlearn, - start_time_format, - self.stats, - self.states_counter, - self.states_reward, - ) + if episode % 250 == 0 and self.config["save_model"] and episode > 1: + print(f"\nSaving model . . .\n") + utils.save_model( + self.qlearn, + start_time_format, + self.stats, + self.states_counter, + self.states_reward, + ) print( "Total EP: {} - epsilon: {} - ep. discount: {} - Highest Reward: {}".format( diff --git a/rl_studio/agents/mountain_car/utils.py b/rl_studio/agents/mountain_car/utils.py index 0aa9d1f42..47b33a66a 100644 --- a/rl_studio/agents/mountain_car/utils.py +++ b/rl_studio/agents/mountain_car/utils.py @@ -38,29 +38,29 @@ def save_model(qlearn, current_time, states, states_counter, states_rewards): # Q TABLE base_file_name = "_epsilon_{}".format(round(qlearn.epsilon, 3)) file_dump = open( - f"./logs/qlearn_models/1_{current_time}{base_file_name}_QTABLE.pkl", "wb" + f"./logs/mountain_car/1_{current_time}{base_file_name}_QTABLE.pkl", "wb" ) pickle.dump(qlearn.q, file_dump) # STATES COUNTER states_counter_file_name = base_file_name + "_STATES_COUNTER.pkl" file_dump = open( - f"./logs/qlearn_models/2_{current_time}{states_counter_file_name}", "wb" + f"./logs/mountain_car/2_{current_time}{states_counter_file_name}", "wb" ) pickle.dump(states_counter, file_dump) # STATES CUMULATED REWARD states_cum_reward_file_name = base_file_name + "_STATES_CUM_REWARD.pkl" file_dump = open( - f"./logs/qlearn_models/3_{current_time}{states_cum_reward_file_name}", "wb" + f"./logs/mountain_car/3_{current_time}{states_cum_reward_file_name}", "wb" ) pickle.dump(states_rewards, file_dump) # STATES steps = base_file_name + "_STATES_STEPS.pkl" - file_dump = open(f"./logs/qlearn_models/4_{current_time}{steps}", "wb") + file_dump = open(f"./logs/mountain_car/4_{current_time}{steps}", "wb") pickle.dump(states, file_dump) def save_actions(actions, start_time): - file_dump = open("./logs/qlearn_models/actions_set_" + start_time, "wb") + file_dump = open("./logs/mountain_car/actions_set_" + start_time, "wb") pickle.dump(actions, file_dump) diff --git a/rl_studio/agents/pendulum/inference_ddpg.py b/rl_studio/agents/pendulum/inference_ddpg.py 
index 7603604ac..2223dd257 100644 --- a/rl_studio/agents/pendulum/inference_ddpg.py +++ b/rl_studio/agents/pendulum/inference_ddpg.py @@ -40,10 +40,10 @@ def __init__(self, params): self.now = datetime.datetime.now() # self.environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.config = params.settings["params"] - self.agent_config = params.agent["params"] + self.environment_params = params["environments"] + self.env_name = params["environments"]["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] if self.config["logging_level"] == "debug": self.LOGGING_LEVEL = logging.DEBUG @@ -56,12 +56,7 @@ def __init__(self, params): self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) - self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) - self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) - non_recoverable_angle = self.environment_params[ - "non_recoverable_angle" - ] # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) # self.env = NormalizedEnv(gym.make(self.env_name @@ -76,9 +71,6 @@ def __init__(self, params): self.OBJECTIVE_REWARD = self.environment_params[ "objective_reward" ] - self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ - "block_experience_batch" - ] self.losses_list, self.reward_list, self.episode_len_list= ( [], @@ -86,14 +78,14 @@ def __init__(self, params): [], ) # metrics # recorded for graph - self.batch_size = params.algorithm["params"]["batch_size"] + self.batch_size = params["algorithm"]["batch_size"] self.tau = 1e-2 self.max_avg = -1000 self.num_actions = self.env.action_space.shape[0] - inference_file = params.inference["params"]["inference_file"] + inference_file = params["inference"]["inference_file"] self.inferencer = InferencerWrapper("ddpg_torch", inference_file, env=self.env) def print_init_info(self): @@ -110,7 +102,7 @@ def main(self): epoch_start_time = datetime.datetime.now() logs_dir = 'logs/pendulum/ddpg/training/' - logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( + logs_file_name = 'logs_file_' + str( self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', @@ -157,7 +149,7 @@ def main(self): logging.info(updates_message) print(updates_message) total_reward_in_epoch=0 - base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + base_file_name = f'_rewards_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' store_rewards(self.reward_list, file_path) plt.plot(self.reward_list) diff --git a/rl_studio/agents/pendulum/inference_ppo.py b/rl_studio/agents/pendulum/inference_ppo.py new file mode 100644 index 000000000..6d4c22338 --- /dev/null +++ b/rl_studio/agents/pendulum/inference_ppo.py @@ -0,0 +1,173 @@ +import datetime +import time +import random + +import gymnasium as gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm 
+import numpy as np +import torch + +import logging + +from rl_studio.agents.pendulum import utils +from rl_studio.algorithms.ppo_continuous import PPO +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.pendulum.utils import store_rewards, save_metadata +from rl_studio.wrappers.inference_rlstudio import InferencerWrapper + + +# # https://github.com/openai/gym/blob/master/gym/core.py +# class NormalizedEnv(gym.ActionWrapper): +# """ Wrap action """ +# +# def _action(self, action): +# act_k = (self.action_space.high - self.action_space.low) / 2. +# act_b = (self.action_space.high + self.action_space.low) / 2. +# return act_k * action + act_b +# +# def _reverse_action(self, action): +# act_k_inv = 2. / (self.action_space.high - self.action_space.low) +# act_b = (self.action_space.high + self.action_space.low) / 2. +# return act_k_inv * (action - act_b) + + +class PPOPendulumInferencer: + def __init__(self, params): + + self.now = datetime.datetime.now() + # self.environment params + self.params = params + self.environment_params = params["environments"] + self.env_name = params["environments"]["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] + + if self.config["logging_level"] == "debug": + self.LOGGING_LEVEL = logging.DEBUG + elif self.config["logging_level"] == "error": + self.LOGGING_LEVEL = logging.ERROR + elif self.config["logging_level"] == "critical": + self.LOGGING_LEVEL = logging.CRITICAL + else: + self.LOGGING_LEVEL = logging.INFO + + self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + + # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
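+        # Note: the commented-out constructions just below (new_step_api, NormalizedEnv wrapper)
+        # appear to be kept for reference only; the active environment is created a few lines down
+        # with render_mode='human' so the inference run can be rendered on screen.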
+ # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) + # self.env = NormalizedEnv(gym.make(self.env_name + # # ,random_start_level=self.RANDOM_START_LEVEL, initial_pole_angle=self.INITIAL_POLE_ANGLE, + # # non_recoverable_angle=non_recoverable_angle + # )) + self.env = gym.make(self.env_name, render_mode='human') + self.RUNS = self.environment_params["runs"] + self.UPDATE_EVERY = self.environment_params[ + "update_every" + ] # How often the current progress is recorded + self.OBJECTIVE_REWARD = self.environment_params[ + "objective_reward" + ] + + self.losses_list, self.reward_list, self.episode_len_list= ( + [], + [], + [], + ) # metrics + # recorded for graph + self.episodes_update = params.get("algorithm").get("episodes_update") + + self.max_avg = -1000 + + self.num_actions = self.env.action_space.shape[0] + + self.action_std_decay_rate = 0.05 # linearly decay action_std (action_std = action_std - action_std_decay_rate) + self.min_action_std = 0.1 # minimum action_std (stop decay after action_std <= min_action_std) + self.action_std_decay_freq = int(2.5e5) # action_std decay frequency (in num timesteps) + + inference_file = params["inference"]["inference_file"] + self.inferencer = InferencerWrapper("ppo_continuous", inference_file, env=self.env) + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + def gather_statistics(self, losses, ep_len, episode_rew): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.episode_len_list.append(ep_len) + + def main(self): + epoch_start_time = datetime.datetime.now() + + logs_dir = 'logs/pendulum/ppo/inference/' + logs_file_name = 'logs_file_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + + logging.info(LETS_GO) + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + + actor_loss = 0 + critic_loss = 0 + total_reward_in_epoch = 0 + global_steps = 0 + + for episode in tqdm(range(self.RUNS)): + state, _ = self.env.reset() + done = False + episode_reward = 0 + step = 0 + while not done: + step += 1 + global_steps += 1 + # if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + # perturbation_action = random.randrange(self.env.action_space.n) + # state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + # logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + action = self.inferencer.inference(state) + new_state, reward, _, done, _ = self.env.step(action) + + state = new_state + episode_reward += reward + total_reward_in_epoch += reward + + w.add_scalar("reward/episode_reward", episode_reward, global_step=episode) + + self.gather_statistics(actor_loss, step, episode_reward) + + # monitor progress + if (episode + 1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + updates_message = 'Run: {0} Average: {1} time spent {2}'.format(episode, + total_reward_in_epoch / 
self.UPDATE_EVERY, + str(time_spent)) + logging.info(updates_message) + print(updates_message) + last_average = total_reward_in_epoch / self.UPDATE_EVERY; + + if last_average >= self.OBJECTIVE_REWARD: + logging.info("Training objective reached!!") + break + total_reward_in_epoch = 0 + + base_file_name = f'_rewards_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_rewards(self.reward_list, file_path) + plt.plot(self.reward_list) + plt.legend("reward per episode") + plt.show() diff --git a/rl_studio/agents/pendulum/train_ddpg.py b/rl_studio/agents/pendulum/train_ddpg.py index 36e645a40..ddbcfbeba 100644 --- a/rl_studio/agents/pendulum/train_ddpg.py +++ b/rl_studio/agents/pendulum/train_ddpg.py @@ -39,10 +39,10 @@ def __init__(self, params): self.now = datetime.datetime.now() # self.environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - self.config = params.settings["params"] - self.agent_config = params.agent["params"] + self.environment_params = params["environments"] + self.env_name = params["environments"]["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] if self.config["logging_level"] == "debug": self.LOGGING_LEVEL = logging.DEBUG @@ -55,12 +55,7 @@ def __init__(self, params): self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) - self.RANDOM_START_LEVEL = self.environment_params.get("random_start_level", 0) - self.INITIAL_POLE_ANGLE = self.environment_params.get("initial_pole_angle", None) - non_recoverable_angle = self.environment_params[ - "non_recoverable_angle" - ] # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
# self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) # self.env = NormalizedEnv(gym.make(self.env_name @@ -75,9 +70,6 @@ def __init__(self, params): self.OBJECTIVE_REWARD = self.environment_params[ "objective_reward" ] - self.BLOCKED_EXPERIENCE_BATCH = self.environment_params[ - "block_experience_batch" - ] self.losses_list, self.reward_list, self.episode_len_list= ( [], @@ -85,9 +77,9 @@ def __init__(self, params): [], ) # metrics # recorded for graph - self.GAMMA = params.algorithm["params"]["gamma"] - hidden_size = params.algorithm["params"]["hidden_size"] - self.batch_size = params.algorithm["params"]["batch_size"] + self.GAMMA = params["algorithm"]["gamma"] + hidden_size = params["algorithm"]["hidden_size"] + self.batch_size = params["algorithm"]["batch_size"] self.tau = 1e-2 self.max_avg = -1000 @@ -160,7 +152,7 @@ def main(self): epoch_start_time = datetime.datetime.now() logs_dir = 'logs/pendulum/ddpg/training/' - logs_file_name = 'logs_file_' + str(self.RANDOM_START_LEVEL) + '_' + str( + logs_file_name = 'logs_file_' + str( self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', @@ -195,6 +187,7 @@ def main(self): action = self.actor.get_action(state, step) new_state, reward, _, done, _ = self.env.step(action) + new_state = new_state.squeeze() self.memory.push(state, action, reward, new_state, done) if len(self.memory) > self.batch_size: @@ -231,7 +224,7 @@ def main(self): break total_reward_in_epoch = 0 - base_file_name = f'_rewards_rsl-{self.RANDOM_START_LEVEL}_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + base_file_name = f'_rewards_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' store_rewards(self.reward_list, file_path) plt.plot(self.reward_list) diff --git a/rl_studio/agents/pendulum/train_ppo.py b/rl_studio/agents/pendulum/train_ppo.py new file mode 100644 index 000000000..01e76a12f --- /dev/null +++ b/rl_studio/agents/pendulum/train_ppo.py @@ -0,0 +1,199 @@ +import datetime +import time +import random + +import gymnasium as gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm +import numpy as np +import torch + +import logging + +from rl_studio.agents.pendulum import utils +from rl_studio.algorithms.ppo_continuous import PPO +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.pendulum.utils import store_rewards, save_metadata + + +# # https://github.com/openai/gym/blob/master/gym/core.py +# class NormalizedEnv(gym.ActionWrapper): +# """ Wrap action """ +# +# def _action(self, action): +# act_k = (self.action_space.high - self.action_space.low) / 2. +# act_b = (self.action_space.high + self.action_space.low) / 2. +# return act_k * action + act_b +# +# def _reverse_action(self, action): +# act_k_inv = 2. / (self.action_space.high - self.action_space.low) +# act_b = (self.action_space.high + self.action_space.low) / 2. 
+# return act_k_inv * (action - act_b) + + +class PPOPendulumTrainer: + def __init__(self, params): + + self.now = datetime.datetime.now() + # self.environment params + self.params = params + self.environment_params = params["environments"] + self.env_name = params["environments"]["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] + + if self.config["logging_level"] == "debug": + self.LOGGING_LEVEL = logging.DEBUG + elif self.config["logging_level"] == "error": + self.LOGGING_LEVEL = logging.ERROR + elif self.config["logging_level"] == "critical": + self.LOGGING_LEVEL = logging.CRITICAL + else: + self.LOGGING_LEVEL = logging.INFO + + self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + + # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. + # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) + # self.env = NormalizedEnv(gym.make(self.env_name + # # ,random_start_level=self.RANDOM_START_LEVEL, initial_pole_angle=self.INITIAL_POLE_ANGLE, + # # non_recoverable_angle=non_recoverable_angle + # )) + self.env = gym.make(self.env_name) + self.RUNS = self.environment_params["runs"] + self.UPDATE_EVERY = self.environment_params[ + "update_every" + ] # How often the current progress is recorded + self.OBJECTIVE_REWARD = self.environment_params[ + "objective_reward" + ] + + self.losses_list, self.reward_list, self.episode_len_list= ( + [], + [], + [], + ) # metrics + # recorded for graph + self.epsilon = params.get("algorithm").get("epsilon") + self.GAMMA = params.get("algorithm").get("gamma") + self.episodes_update = params.get("algorithm").get("episodes_update") + + self.max_avg = -1500 + + self.num_actions = self.env.action_space.shape[0] + input_dim = self.env.observation_space.shape[0] + lr_actor = 0.0003 + lr_critic = 0.001 + K_epochs = 80 + action_std = 0.6 # starting std for action distribution (Multivariate Normal) + self.action_std_decay_rate = 0.05 # linearly decay action_std (action_std = action_std - action_std_decay_rate) + self.min_action_std = 0.1 # minimum action_std (stop decay after action_std <= min_action_std) + self.action_std_decay_freq = int(2.5e5) # action_std decay frequency (in num timesteps) + self.ppo_agent = PPO(input_dim, self.num_actions, lr_actor, lr_critic, self.GAMMA, K_epochs, self.epsilon, + True, action_std) + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + def gather_statistics(self, losses, ep_len, episode_rew): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.episode_len_list.append(ep_len) + + def main(self): + epoch_start_time = datetime.datetime.now() + + logs_dir = 'logs/pendulum/ppo/training/' + logs_file_name = 'logs_file_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + + if self.config["save_model"]: + save_metadata("ppo", 
start_time_format, self.params) + + logging.info(LETS_GO) + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + + actor_loss = 0 + critic_loss = 0 + total_reward_in_epoch = 0 + global_steps = 0 + + for episode in tqdm(range(self.RUNS)): + state, _ = self.env.reset() + done = False + episode_reward = 0 + step = 0 + while not done: + step += 1 + global_steps += 1 + # if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + # perturbation_action = random.randrange(self.env.action_space.n) + # state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + # logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + action = self.ppo_agent.select_action(state) + new_state, reward, _, done, _ = self.env.step(action) + self.ppo_agent.buffer.rewards.append(reward) + self.ppo_agent.buffer.is_terminals.append(done) + + # update PPO agent + if global_steps % self.episodes_update == 0: + self.ppo_agent.update() + + if global_steps % self.action_std_decay_freq == 0: + self.ppo_agent.decay_action_std(self.action_std_decay_rate, self.min_action_std) + + + state = new_state + episode_reward += reward + total_reward_in_epoch += reward + + w.add_scalar("reward/episode_reward", episode_reward, global_step=episode) + w.add_scalar("loss/actor_loss", actor_loss, global_step=episode) + w.add_scalar("loss/critic_loss", critic_loss, global_step=episode) + + self.gather_statistics(actor_loss, step, episode_reward) + + # monitor progress + if (episode + 1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + updates_message = 'Run: {0} Average: {1} time spent {2}'.format(episode, + total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent)) + logging.info(updates_message) + print(updates_message) + last_average = total_reward_in_epoch / self.UPDATE_EVERY; + + if self.config["save_model"] and last_average > self.max_avg: + self.max_avg = total_reward_in_epoch / self.UPDATE_EVERY + logging.info(f"Saving model . . 
.") + checkpoints_path = "./logs/pendulum/ppo/checkpoints/" + start_time_format + "_actor_avg_" + str( + last_average) + self.ppo_agent.save(checkpoints_path) + + if last_average >= self.OBJECTIVE_REWARD: + logging.info("Training objective reached!!") + break + total_reward_in_epoch = 0 + + base_file_name = f'_rewards_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_rewards(self.reward_list, file_path) + plt.plot(self.reward_list) + plt.legend("reward per episode") + plt.show() diff --git a/rl_studio/agents/pendulum/train_ppo_not_working.py b/rl_studio/agents/pendulum/train_ppo_not_working.py new file mode 100644 index 000000000..a8d95e2eb --- /dev/null +++ b/rl_studio/agents/pendulum/train_ppo_not_working.py @@ -0,0 +1,192 @@ +import datetime +import time +import random + +import gymnasium as gym +import matplotlib.pyplot as plt +from torch.utils import tensorboard +from tqdm import tqdm +import numpy as np +import torch + +import logging + +from rl_studio.agents.pendulum import utils +from rl_studio.algorithms.ppo_continuous import Actor, Critic, t, get_dist +from rl_studio.visual.ascii.images import JDEROBOT_LOGO +from rl_studio.visual.ascii.text import JDEROBOT, LETS_GO +from rl_studio.agents.pendulum.utils import store_rewards, save_metadata + + +# # https://github.com/openai/gym/blob/master/gym/core.py +# class NormalizedEnv(gym.ActionWrapper): +# """ Wrap action """ +# +# def _action(self, action): +# act_k = (self.action_space.high - self.action_space.low) / 2. +# act_b = (self.action_space.high + self.action_space.low) / 2. +# return act_k * action + act_b +# +# def _reverse_action(self, action): +# act_k_inv = 2. / (self.action_space.high - self.action_space.low) +# act_b = (self.action_space.high + self.action_space.low) / 2. +# return act_k_inv * (action - act_b) + + +class PPOPendulumTrainer: + def __init__(self, params): + + self.now = datetime.datetime.now() + # self.environment params + self.params = params + self.environment_params = params["environments"] + self.env_name = params["environments"]["env_name"] + self.config = params["settings"] + self.agent_config = params["agent"] + + if self.config["logging_level"] == "debug": + self.LOGGING_LEVEL = logging.DEBUG + elif self.config["logging_level"] == "error": + self.LOGGING_LEVEL = logging.ERROR + elif self.config["logging_level"] == "critical": + self.LOGGING_LEVEL = logging.CRITICAL + else: + self.LOGGING_LEVEL = logging.INFO + + self.RANDOM_PERTURBATIONS_LEVEL = self.environment_params.get("random_perturbations_level", 0) + self.PERTURBATIONS_INTENSITY_STD = self.environment_params.get("perturbations_intensity_std", 0) + + # Unfortunately, max_steps is not working with new_step_api=True and it is not giving any benefit. 
+ # self.env = gym.make(self.env_name, new_step_api=True, random_start_level=random_start_level) + # self.env = NormalizedEnv(gym.make(self.env_name + # # ,random_start_level=self.RANDOM_START_LEVEL, initial_pole_angle=self.INITIAL_POLE_ANGLE, + # # non_recoverable_angle=non_recoverable_angle + # )) + self.env = gym.make(self.env_name) + self.RUNS = self.environment_params["runs"] + self.UPDATE_EVERY = self.environment_params[ + "update_every" + ] # How often the current progress is recorded + self.OBJECTIVE_REWARD = self.environment_params[ + "objective_reward" + ] + + self.losses_list, self.reward_list, self.episode_len_list= ( + [], + [], + [], + ) # metrics + # recorded for graph + self.epsilon = params.get("algorithm").get("epsilon") + self.GAMMA = params.get("algorithm").get("gamma") + + self.max_avg = -1000 + + self.num_actions = self.env.action_space.shape[0] + input_dim = self.env.observation_space.shape[0] + + self.actor = Actor(input_dim, self.num_actions, 0.05) + self.critic = Critic(input_dim) + + def print_init_info(self): + logging.info(JDEROBOT) + logging.info(JDEROBOT_LOGO) + logging.info(f"\t- Start hour: {datetime.datetime.now()}\n") + logging.info(f"\t- self.environment params:\n{self.environment_params}") + + def gather_statistics(self, losses, ep_len, episode_rew): + if losses is not None: + self.losses_list.append(losses / ep_len) + self.reward_list.append(episode_rew) + self.episode_len_list.append(ep_len) + + def main(self): + epoch_start_time = datetime.datetime.now() + + logs_dir = 'logs/pendulum/ppo/training/' + logs_file_name = 'logs_file_' + str( + self.RANDOM_PERTURBATIONS_LEVEL) + '_' + str(epoch_start_time) \ + + str(self.PERTURBATIONS_INTENSITY_STD) + '.log' + logging.basicConfig(filename=logs_dir + logs_file_name, filemode='a', + level=self.LOGGING_LEVEL, + format='%(name)s - %(levelname)s - %(message)s') + self.print_init_info() + + start_time_format = epoch_start_time.strftime("%Y%m%d_%H%M") + + if self.config["save_model"]: + save_metadata("ppo", start_time_format, self.params) + + logging.info(LETS_GO) + w = tensorboard.SummaryWriter(log_dir=f"{logs_dir}/tensorboard/{start_time_format}") + + actor_loss = 0 + critic_loss = 0 + total_reward_in_epoch = 0 + global_steps = 0 + prev_prob_act = None + for episode in tqdm(range(self.RUNS)): + state, _ = self.env.reset() + done = False + episode_reward = 0 + step = 0 + while not done: + step += 1 + global_steps += 1 + # if random.uniform(0, 1) < self.RANDOM_PERTURBATIONS_LEVEL: + # perturbation_action = random.randrange(self.env.action_space.n) + # state, done, _, _ = self.env.perturbate(perturbation_action, self.PERTURBATIONS_INTENSITY_STD) + # logging.debug("perturbated in step {} with action {}".format(episode_rew, perturbation_action)) + + action_mean = self.actor(t(state)) + dist = actor.get_dist(action_mean, self.actor.action_var) + + action = dist.sample() + prob_act = dist.log_prob(action, ) + converted_action = action.detach().numpy().clip(-1, 1).ravel() + new_state, reward, _, done, _ = self.env.step(converted_action) + advantage = reward + (1 - done) * self.GAMMA * self.critic(t(new_state)) - self.critic(t(state)) + + if prev_prob_act: + actor_loss = self.actor.train(w, prev_prob_act, prob_act, advantage, global_steps, self.epsilon) + self.critic.train(w, advantage, global_steps) + + prev_prob_act = prob_act + + state = new_state + episode_reward += reward + total_reward_in_epoch += reward + + w.add_scalar("reward/episode_reward", episode_reward, global_step=episode) + 
w.add_scalar("loss/actor_loss", actor_loss, global_step=episode) + w.add_scalar("loss/critic_loss", critic_loss, global_step=episode) + + self.gather_statistics(actor_loss, step, episode_reward) + + # monitor progress + if (episode + 1) % self.UPDATE_EVERY == 0: + time_spent = datetime.datetime.now() - epoch_start_time + epoch_start_time = datetime.datetime.now() + updates_message = 'Run: {0} Average: {1} time spent {2}'.format(episode, + total_reward_in_epoch / self.UPDATE_EVERY, + str(time_spent)) + logging.info(updates_message) + print(updates_message) + last_average = total_reward_in_epoch / self.UPDATE_EVERY; + + if self.config["save_model"] and last_average > self.max_avg: + self.max_avg = total_reward_in_epoch / self.UPDATE_EVERY + logging.info(f"Saving model . . .") + utils.save_ppo_model(self.actor, start_time_format, last_average) + + if last_average >= self.OBJECTIVE_REWARD: + logging.info("Training objective reached!!") + break + total_reward_in_epoch = 0 + + base_file_name = f'_rewards_rpl-{self.RANDOM_PERTURBATIONS_LEVEL}_pi-{self.PERTURBATIONS_INTENSITY_STD}' + file_path = f'{logs_dir}{datetime.datetime.now()}_{base_file_name}.pkl' + store_rewards(self.reward_list, file_path) + plt.plot(self.reward_list) + plt.legend("reward per episode") + plt.show() diff --git a/rl_studio/agents/pendulum/utils.py b/rl_studio/agents/pendulum/utils.py index 41d3c4354..4e8932181 100755 --- a/rl_studio/agents/pendulum/utils.py +++ b/rl_studio/agents/pendulum/utils.py @@ -24,7 +24,7 @@ def save_model_qlearn(qlearn, current_time, avg): # Q TABLE base_file_name = "_epsilon_{}".format(round(qlearn.epsilon, 3)) file_dump = open( - "./logs/pendulum/qlearning/checkpoints/" + current_time + base_file_name + "_QTABLE_avg_ " + str(avg) + ".pkl", + "./logs/pendulum/qlearning/checkpoints/" + current_time + base_file_name + "_QTABLE_avg_ " + str(avg) + ".pkl", "wb" ) pickle.dump(qlearn.q, file_dump) @@ -32,23 +32,26 @@ def save_model_qlearn(qlearn, current_time, avg): def params_to_markdown_list(dictionary): md_list = [] - for item in dictionary["params"]: - md_list.append({"parameter": item, "value": dictionary["params"][item]}) + for item in dictionary: + md_list.append({"parameter": item, "value": dictionary[item]}) return md_list + def save_metadata(algorithm, current_time, params): metadata = open("./logs/pendulum/" + algorithm + "/checkpoints/" + current_time + "_metadata.md", "a") metadata.write("AGENT PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.agent)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params["agent"])).setParams(row_sep='always').getMarkdown()) metadata.write("\n```\n\nSETTINGS PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.settings)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params["settings"])).setParams(row_sep='always').getMarkdown()) metadata.write("\n```\n\nENVIRONMENT PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.environment)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params["environments"])).setParams(row_sep='always').getMarkdown()) metadata.write("\n```\n\nALGORITHM PARAMETERS\n") - metadata.write(markdownTable(params_to_markdown_list(params.algorithm)).setParams(row_sep='always').getMarkdown()) + metadata.write(markdownTable(params_to_markdown_list(params["algorithm"])).setParams(row_sep='always').getMarkdown()) 
metadata.close() -def save_ddpg_model(dqn, current_time, average): + + +def save_dqn_model(dqn, current_time, average): file_dump = open( "./logs/pendulum/dqn/checkpoints/" + current_time + "_DQN_WEIGHTS_avg_" + str( average) + ".pkl", @@ -94,7 +97,9 @@ def create_bins_and_q_table(env, number_angle_bins, number_pos_bins): ] qTable = np.random.uniform( - low=-2, high=0, size=([number_pos_bins] * int(obsSpaceSize/2) + [number_angle_bins] * int(obsSpaceSize/2) + [env.action_space.n]) + low=-2, high=0, size=( + [number_pos_bins] * int(obsSpaceSize / 2) + [number_angle_bins] * int(obsSpaceSize / 2) + [ + env.action_space.n]) ) return bins, obsSpaceSize, qTable diff --git a/rl_studio/agents/robot_mesh/inference_qlearn.py b/rl_studio/agents/robot_mesh/inference_qlearn.py index d6430bfa4..53fdb2e50 100644 --- a/rl_studio/agents/robot_mesh/inference_qlearn.py +++ b/rl_studio/agents/robot_mesh/inference_qlearn.py @@ -16,12 +16,9 @@ def __init__(self, params): # TODO: Create a pydantic metaclass to simplify the way we extract the params # environment params self.params = params - self.environment_params = params.environment["params"] - self.env_name = params.environment["params"]["env_name"] - env_params = params.environment["params"] - actions = params.environment["actions"] - env_params["actions"] = actions - self.env = gym.make(self.env_name, **env_params) + self.environment_params = params["environments"] + self.env_name = self.environment_params["env_name"] + self.env = gym.make(self.env_name, **params) # algorithm param self.stats = {} # epoch: steps self.states_counter = {} @@ -29,14 +26,14 @@ def __init__(self, params): self.last_time_steps = np.ndarray(0) self.total_episodes = 20000 - self.outdir = "./logs/robot_mesh_experiments/" + self.outdir = "./logs/robot_mesh/" self.env = gym.wrappers.Monitor(self.env, self.outdir, force=True) - inference_file = params.inference["params"]["inference_file"] - actions_file = params.inference["params"]["actions_file"] + inference_file = params["inference"]["inference_file"] + actions_file = params["inference"]["actions_file"] self.highest_reward = 0 - self.inferencer = InferencerWrapper("qlearn", inference_file, actions_file) + self.inferencer = InferencerWrapper("qlearn_deprecated", inference_file, actions_file) def print_init_info(self): print(JDEROBOT) diff --git a/rl_studio/agents/robot_mesh/manual_pilot.py b/rl_studio/agents/robot_mesh/manual_pilot.py index feb2f6d0c..faea218b7 100644 --- a/rl_studio/agents/robot_mesh/manual_pilot.py +++ b/rl_studio/agents/robot_mesh/manual_pilot.py @@ -49,7 +49,7 @@ def main(self): print(f"\t- Start hour: {datetime.datetime.now()}\n") print(f"\t- Environment params:\n{self.environment_params}") - outdir = "./logs/robot_mesh_experiments/" + outdir = "./logs/robot_mesh/" env = gym.wrappers.Monitor(self.env, outdir, force=True) total_episodes = 20000 env.done = False diff --git a/rl_studio/agents/robot_mesh/train_qlearn.py b/rl_studio/agents/robot_mesh/train_qlearn.py index b4cc177c7..a46201e11 100644 --- a/rl_studio/agents/robot_mesh/train_qlearn.py +++ b/rl_studio/agents/robot_mesh/train_qlearn.py @@ -1,4 +1,5 @@ import datetime +import multiprocessing import time import gym @@ -15,19 +16,21 @@ def __init__(self, params): # TODO: Create a pydantic metaclass to simplify the way we extract the params # environment params self.params = params - self.environment_params = params.environment["params"] + self.environment_params = params["environments"] # algorithm params - self.alpha = params.algorithm["params"]["alpha"] - 
self.epsilon = params.algorithm["params"]["epsilon"] - self.gamma = params.algorithm["params"]["gamma"] - self.config = params.settings["params"] + self.alpha = params["algorithm"]["alpha"] + self.epsilon = params["algorithm"]["epsilon"] + self.gamma = params["algorithm"]["gamma"] + self.config = params["settings"] + self.actions = params["actions"] self.stats = {} # epoch: steps self.states_counter = {} self.states_reward = {} self.last_time_steps = np.ndarray(0) - self.outdir = "./logs/robot_mesh_experiments/" + self.outdir = "./logs/robot_mesh/" + self.init_environment() def print_init_info(self): print(JDEROBOT) @@ -37,14 +40,11 @@ def print_init_info(self): print(f"\t- Environment params:\n{self.environment_params}") def init_environment(self): - self.env_name = self.params.environment["params"]["env_name"] - env_params = self.params.environment["params"] - actions = self.params.environment["actions"] - env_params["actions"] = actions - self.env = gym.make(self.env_name, **env_params) + self.env_name = self.environment_params["env_name"] + self.env = gym.make(self.env_name, **self.params) self.env = gym.wrappers.Monitor(self.env, self.outdir, force=True) - self.actions = range(self.env.action_space.n) + self.highest_reward = 0 self.cumulated_reward = 0 self.total_episodes = 20000 self.epsilon_discount = 0.999 # Default 0.9986 @@ -86,7 +86,6 @@ def simulation(self, queue): initial_epsilon = self.qlearn.epsilon - telemetry_start_time = time.time() start_time = datetime.datetime.now() start_time_format = start_time.strftime("%Y%m%d_%H%M") diff --git a/rl_studio/agents/robot_mesh/utils.py b/rl_studio/agents/robot_mesh/utils.py index bddd81806..7128276aa 100644 --- a/rl_studio/agents/robot_mesh/utils.py +++ b/rl_studio/agents/robot_mesh/utils.py @@ -42,29 +42,29 @@ def save_model(qlearn, current_time, states, states_counter, states_rewards): # Q TABLE base_file_name = "_epsilon_{}".format(round(qlearn.epsilon, 3)) file_dump = open( - "./logs/qlearn_models/1_" + current_time + base_file_name + "_QTABLE.pkl", "wb" + "./logs/robot_mesh/1_" + current_time + base_file_name + "_QTABLE.pkl", "wb" ) pickle.dump(qlearn.q, file_dump) # STATES COUNTER states_counter_file_name = base_file_name + "_STATES_COUNTER.pkl" file_dump = open( - "./logs/qlearn_models/2_" + current_time + states_counter_file_name, "wb" + "./logs/robot_mesh/2_" + current_time + states_counter_file_name, "wb" ) pickle.dump(states_counter, file_dump) # STATES CUMULATED REWARD states_cum_reward_file_name = base_file_name + "_STATES_CUM_REWARD.pkl" file_dump = open( - "./logs/qlearn_models/3_" + current_time + states_cum_reward_file_name, "wb" + "./logs/robot_mesh/3_" + current_time + states_cum_reward_file_name, "wb" ) pickle.dump(states_rewards, file_dump) # STATES steps = base_file_name + "_STATES_STEPS.pkl" - file_dump = open("./logs/qlearn_models/4_" + current_time + steps, "wb") + file_dump = open("./logs/robot_mesh/4_" + current_time + steps, "wb") pickle.dump(states, file_dump) def save_actions(actions, start_time): - file_dump = open("./logs/qlearn_models/actions_set_" + start_time, "wb") + file_dump = open("./logs/robot_mesh/actions_set_" + start_time, "wb") pickle.dump(actions, file_dump) diff --git a/rl_studio/agents/tasks_type.py b/rl_studio/agents/tasks_type.py new file mode 100644 index 000000000..8a025f658 --- /dev/null +++ b/rl_studio/agents/tasks_type.py @@ -0,0 +1,7 @@ +from enum import Enum + + +class TasksType(Enum): + FOLLOWLINEGAZEBO = "follow_line_gazebo" + FOLLOWLANEGAZEBO = "follow_lane_gazebo" + 
AUTOPARKINGGAZEBO = "autoparking_gazebo" diff --git a/rl_studio/agents/utilities/plot_multiple_graphs_frequencies.py b/rl_studio/agents/utilities/plot_multiple_graphs_frequencies.py index adb356060..3b6a10cbf 100755 --- a/rl_studio/agents/utilities/plot_multiple_graphs_frequencies.py +++ b/rl_studio/agents/utilities/plot_multiple_graphs_frequencies.py @@ -4,64 +4,132 @@ import matplotlib.pyplot as plt import numpy as np from statsmodels.distributions.empirical_distribution import ECDF +import os +import re RUNS = 100 max_episode_steps = 500 -intensities = [0, 0.1, 0.2, 0.4, 0.6, 0.8] +min_pert_freq = 0.1 +min_pert_intensity = 1 + +frequencies = [] +intensities_dev = [] yticks = [] +yticks_dev = [] + +def plot_freq(ax, folder_path, color, boxplot=False): + # Use a regular expression to extract the part between "cartpole" and "inference" + match = re.search(r'cartpole/(.*?)/inference', folder_path) + if match: + extracted_part = match.group(1) + label = extracted_part + else: + label = "unknown" + print("No match found") + + ecdf_dict = {} + rewards_dict = {} + + # Iterate through all the files in the folder + for file_name in os.listdir(folder_path): + # Use a regular expression to extract the part between "pi" and "_in" + match = re.search(r'rpl-(.*?)_pi', file_name) + match2 = re.search(r'pi-(.*?)_in', file_name) + match3 = re.search(r'init_(.*?).pkl', file_name) + + if "rewards" in file_name and match and match2 and match3: + extracted_part = float(match.group(1)) + extracted_part2 = float(match2.group(1)) + extracted_part3 = float(match3.group(1)) + if extracted_part == 0 and extracted_part2 == 0 and extracted_part3 == 0\ + or extracted_part > min_pert_freq\ + or extracted_part == min_pert_freq and extracted_part2 == min_pert_intensity: + # Add the extracted part and filename to the list + rewards_file = open(folder_path+file_name, "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ecdf = ECDF(rewards) + ecdf_dict[float(extracted_part)] = 1 - ecdf(499) + rewards_dict[float(extracted_part)] = rewards + + print(label) + sorted_ecdf_dict = dict(sorted(ecdf_dict.items(), key=lambda item: item[0])) + print(sorted_ecdf_dict) + extracted_frequencies = list(sorted_ecdf_dict.keys()) + + frequencies.extend(extracted_frequencies) + + if boxplot: + sorted_rewards_dict = dict(sorted(rewards_dict.items(), key=lambda item: item[0])) + sorted_rewards = list(sorted_rewards_dict.values()) + + if len(sorted_rewards) == 0: + return + + ax.boxplot(sorted_rewards, positions= [ np.round(x, 2) for x in extracted_frequencies ], widths=0.03, + flierprops={'marker': 'o', 'markersize': 2}) + ax.legend([label]) + return + + success_percentage = list(sorted_ecdf_dict.values()) + + yticks.extend(success_percentage) + frequencies.extend(extracted_frequencies) + + ax.plot(extracted_frequencies, success_percentage, color=color, label=label) + + + +def plot_deviations(ax, folder_path, color): + # Use a regular expression to extract the part between "cartpole" and "inference" + match = re.search(r'cartpole/(.*?)/inference', folder_path) + if match: + extracted_part = match.group(1) + label = extracted_part + else: + label = "unknown" + print("No match found") + + deviations_list = {} + + # Iterate through all the files in the folder + for file_name in os.listdir(folder_path): + # Use a regular expression to extract the part between "pi" and "_in" + match = re.search(r'rpl-(.*?)_pi', file_name) + match2 = re.search(r'pi-(.*?)_in', file_name) + match3 = re.search(r'init_(.*?).pkl', file_name) + + 
if "states" in file_name and match and match2 and match3: + extracted_part = float(match.group(1)) + extracted_part2 = float(match2.group(1)) + extracted_part3 = float(match3.group(1)) -def plot_intensities(ax, file_0, file_1, file_2, file_3, file_4, file_5, label, color): - - rewards_file = open( - file_0, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_0 = ECDF(rewards) - - rewards_file = open( - file_1, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_1 = ECDF(rewards) - - rewards_file = open( - file_2, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_2 = ECDF(rewards) - - rewards_file = open( - file_3, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_3 = ECDF(rewards) - - rewards_file = open( - file_4, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_4 = ECDF(rewards) - - rewards_file = open( - file_5, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_5 = ECDF(rewards) - - ninety_rewards = [1- ecdf_0(499), 1 - ecdf_1(499), 1 - ecdf_2(499), 1 - ecdf_3(499), 1 - ecdf_4(499), 1- ecdf_5(499)] - - yticks.append(ninety_rewards) - - ax.plot(intensities, ninety_rewards, color=color, label=label) + if extracted_part == 0 and extracted_part2 == 0 and extracted_part3 == 0\ + or extracted_part > min_pert_freq\ + or extracted_part == min_pert_freq and extracted_part2 == min_pert_intensity: + # Add the extracted part and filename to the list + states_file = open(folder_path + file_name, "rb") + states = pickle.load(states_file) + states_array = [np.asarray(step) for runs in states for step in runs] + if label == 'qlearning': + states_array = [(angle-50)/119.61722488 for angle in states_array] + absolute_arr = np.abs(states_array) + absolute_mean = np.mean(absolute_arr) + deviations_list[float(extracted_part) / 10] = absolute_mean + print(label) + sorted_deviations_dict = dict(sorted(deviations_list.items(), key=lambda item: item[0])) + print(sorted_deviations_dict) + extracted_intensities = list(sorted_deviations_dict.keys()) + + intensities_dev.extend(extracted_intensities) + + sorted_deviations = list(sorted_deviations_dict.values()) + + yticks_dev.extend(sorted_deviations) + ax.plot(extracted_intensities, sorted_deviations, color=color, label=label) def cleanticks(ticks): clear_ticks = [] @@ -70,66 +138,128 @@ def cleanticks(ticks): for element2 in ticks: if element1 != element2 and abs(element1 - element2) > 0.02: clear_ticks.append(element2) + element1 = element2 return clear_ticks + +# def configure_frequencies_graph(ax1, clear_ticks, frequencies): +def configure_frequencies_graph(ax1, frequencies): + + # ax1.set_yticks(clear_ticks) + y_ticks = np.linspace(0, 1, 21) + ax1.set_yticks(y_ticks) + ax1.set_xticks(frequencies) + + yticklabels = ax1.get_yticklabels() + for yticklabel in yticklabels: + yticklabel.set_horizontalalignment('right') + yticklabel.set_fontsize('xx-small') + + xticks = ax1.get_xticklabels() + for xtick in xticks: + xtick.set_horizontalalignment('right') + xtick.set_fontsize('xx-small') + ax1.grid() + ax1.legend() + + ax1.set_xlabel("frequency of perturbations with fixed intensity") + ax1.set_ylabel("percentage of successful episodes") + +def configure_boxplot_graph(ax1, frequencies): + boxplot_y = np.linspace(0, 500, 27) + ax1.set_yticks(boxplot_y) + ax1.set_xticks(frequencies) + ax1.set_xlim(frequencies[0]-0.1, frequencies[len(frequencies)-1]+0.1) + ax1.grid() + 
+# def configure_intensities_graph(ax1, clear_ticks, intensities): +def configure_deviation_graph(ax1, intensities): + # ax1.set_yticks(clear_ticks) + y_ticks_dev = np.linspace(0, 1, 21) + ax1.set_yticks(y_ticks_dev) + ax1.set_xticks(intensities) + ax1.set_ylim(0, 0.1) + + yticklabels = ax1.get_yticklabels() + for yticklabel in yticklabels: + yticklabel.set_horizontalalignment('right') + yticklabel.set_fontsize('xx-small') + + xticks = ax1.get_xticklabels() + for xtick in xticks: + xtick.set_horizontalalignment('right') + xtick.set_fontsize('xx-small') + ax1.grid() + ax1.legend() + + ax1.set_xlabel("intensity of perturbations with fixed frequency") + ax1.set_ylabel("average pole angle") + if __name__ == "__main__": pltlib.rcParams.update({'font.size': 15}) fig, ax1 = plt.subplots() + fig2, ax2 = plt.subplots() + fig3, ax3 = plt.subplots() + fig4, ax4= plt.subplots() + fig5, ax5 = plt.subplots() + fig6, ax6 = plt.subplots() + fig7, ax7 = plt.subplots() + fig8, ax8 = plt.subplots() + + # PPO + plot_freq(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green") + plot_freq(ax2, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green", True) + # DQN + plot_freq(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red") + plot_freq(ax3, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red", True) + # MANUAL + plot_freq(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue") + plot_freq(ax4, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue", True) + # QLEAN + plot_freq(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple") + plot_freq(ax5, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple", True) + # PPO CONTINUOUS + plot_freq(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black") + plot_freq(ax6, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black", True) + # DDPG + plot_freq(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown") + plot_freq(ax7, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown", True) + + # Deviations + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown") + + + configure_frequencies_graph(ax1, frequencies) + configure_boxplot_graph(ax2, frequencies) + configure_boxplot_graph(ax3, frequencies) + configure_boxplot_graph(ax4, frequencies) + configure_boxplot_graph(ax5, frequencies) + configure_boxplot_graph(ax6, frequencies) + configure_boxplot_graph(ax7, frequencies) + configure_deviation_graph(ax8, intensities_dev) - #PPO - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 
23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 20:03:12.387811__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 20:03:12.387811__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-22 00:18:21.530321__rewards_rsl-0_rpl-0.6_pi-1_init_0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-22 00:18:29.673944__rewards_rsl-0_rpl-0.8_pi-1_init_0.pkl", - "ppo", - "green") - #DQN - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-22 00:15:24.492958__rewards_rsl-0_rpl-0.1_pi-1_init_0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-22 00:16:01.716049__rewards_rsl-0_rpl-0.2_pi-1_init_0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-22 00:16:02.753516__rewards_rsl-0_rpl-0.4_pi-1_init_0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-22 00:16:08.270695__rewards_rsl-0_rpl-0.6_pi-1_init_0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-22 00:16:20.325513__rewards_rsl-0_rpl-0.8_pi-1_init_0.pkl", - "DQN", - "red") - - #MANUAL - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:18:57.234844__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:19:00.746120__rewards_rsl-0_rpl-0.2_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:19:12.534704__rewards_rsl-0_rpl-0.4_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:19:18.254783__rewards_rsl-0_rpl-0.6_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:19:23.058775__rewards_rsl-0_rpl-0.8_pi-1.pkl", - "programmatic", - "blue") - - #QLEAN - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-22 00:17:05.210740__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-22 00:17:25.785325__rewards_rsl-0_rpl-0.2_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-22 00:26:55.906779__rewards_rsl-0_rpl-0.5_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-22 00:27:30.106815__rewards_rsl-0_rpl-0.8_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:17:27.826748__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "QLearning", - "purple") - - plt.xticks(intensities) - yticks = np.array(yticks) - flatten_ticks = yticks.flatten() - clear_ticks 
= cleanticks(flatten_ticks) - plt.yticks(clear_ticks) - plt.setp(ax1.get_yticklabels(), horizontalalignment='right', fontsize='xx-small') - plt.setp(ax1.get_xticklabels(), horizontalalignment='right', fontsize='x-small') - plt.xlabel("frequency of perturbations with fixed intensity") - plt.ylabel("percentage of successful episodes") - plt.grid() - plt.legend() + fig.canvas.manager.full_screen_toggle() + fig2.canvas.manager.full_screen_toggle() + fig3.canvas.manager.full_screen_toggle() + fig4.canvas.manager.full_screen_toggle() + fig5.canvas.manager.full_screen_toggle() + fig6.canvas.manager.full_screen_toggle() + fig7.canvas.manager.full_screen_toggle() + fig8.canvas.manager.full_screen_toggle() plt.show() + base_path = '/home/ruben/Desktop/2020-phd-ruben-lucas/docs/assets/images/results_images/cartpole/solidityExperiments/refinement/refinementOfRefinement/frequency/' + ax1.figure.savefig(base_path + 'comparison.png') + ax2.figure.savefig(base_path + 'ppo.png') + ax3.figure.savefig(base_path + 'dqn.png') + ax4.figure.savefig(base_path + 'no_rl.png') + ax5.figure.savefig(base_path + 'qlearning.png') + ax6.figure.savefig(base_path + 'qlearning.png') + ax7.figure.savefig(base_path + 'ddpg.png') + ax8.figure.savefig(base_path + 'deviations.png') diff --git a/rl_studio/agents/utilities/plot_multiple_graphs_init_pos.py b/rl_studio/agents/utilities/plot_multiple_graphs_init_pos.py index e96c0f635..43e8a4928 100755 --- a/rl_studio/agents/utilities/plot_multiple_graphs_init_pos.py +++ b/rl_studio/agents/utilities/plot_multiple_graphs_init_pos.py @@ -4,139 +4,175 @@ import matplotlib.pyplot as plt import numpy as np from statsmodels.distributions.empirical_distribution import ECDF -from statistics import mean +import os +import re RUNS = 100 max_episode_steps = 500 -init_pos = [0, 0.1, 0.25, 0.3, 0.45, 0.5] +min_init_pos = 0 + +intensities = [] yticks = [] -def plot_intensities(ax, file_00, file_0, file_1, file_2, file_3, file_4, label, color): - - rewards_file = open( - file_00, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_00 = mean(rewards) - - rewards_file = open( - file_0, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_0 = mean(rewards) - - rewards_file = open( - file_1, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_1 = mean(rewards) - - rewards_file = open( - file_2, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_2 = mean(rewards) - - rewards_file = open( - file_3, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_3 = mean(rewards) - - rewards_file = open( - file_4, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_4 = mean(rewards) - ninety_rewards = [ecdf_00, ecdf_0, ecdf_1, ecdf_2, ecdf_3, ecdf_4] - - yticks.append(ninety_rewards) - - ax.plot(init_pos, ninety_rewards, color=color, label=label) +def plot_intensities(ax, folder_path, color, boxplot=False): + # Use a regular expression to extract the part between "cartpole" and "inference" + match = re.search(r'cartpole/(.*?)/inference', folder_path) + if match: + extracted_part = match.group(1) + label = extracted_part + else: + label = "unknown" + print("No match found") + + ecdf_dict = {} + rewards_dict = {} + + # Iterate through all the files in the folder + for file_name in os.listdir(folder_path): + # Use a regular expression to extract the part between "pi" and "_in" + match = re.search(r'init_(.*?).pkl', file_name) 
+ match2 = re.search(r'pi-(.*?)_in', file_name) + match3 = re.search(r'rpl-(.*?)_pi', file_name) + + if "rewards" in file_name and match and match2 and match3: + extracted_part = float(match.group(1)) + extracted_part2 = float(match2.group(1)) + extracted_part3 = float(match3.group(1)) + if extracted_part == 0 and extracted_part2 == 0 and extracted_part3 == 0 \ + or extracted_part > min_init_pos: # Add the extracted part and filename to the list + rewards_file = open(folder_path + file_name, "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ecdf = ECDF(rewards) + ecdf_dict[float(extracted_part)] = 1 - ecdf(499) + rewards_dict[float(extracted_part)] = rewards + + print(label) + sorted_ecdf_dict = dict(sorted(ecdf_dict.items(), key=lambda item: item[0])) + print(sorted_ecdf_dict) + extracted_intensities = list(sorted_ecdf_dict.keys()) + + intensities.extend(extracted_intensities) + + if boxplot: + sorted_rewards_dict = dict(sorted(rewards_dict.items(), key=lambda item: item[0])) + sorted_rewards = list(sorted_rewards_dict.values()) + + if len(sorted_rewards) == 0: + return + + ax.boxplot(sorted_rewards, positions= [ np.round(x, 3) for x in extracted_intensities ], widths=0.03, + flierprops={'marker': 'o', 'markersize': 2}) + ax.legend([label]) + return + + success_percentage = list(sorted_ecdf_dict.values()) + + yticks.extend(success_percentage) + ax.plot(extracted_intensities, success_percentage, color=color, label=label) def cleanticks(ticks): clear_ticks = [] - ticks.sort(); element1 = ticks[0] - print(ticks) clear_ticks.append(element1) - for index in range(len(ticks)): - element2 = ticks[index] - if element1 != element2 and abs(element1 - element2) > 40: - print(element1) - print(element2) + for element2 in ticks: + if element1 != element2 and abs(element1 - element2) > 0.02: clear_ticks.append(element2) element1 = element2 return clear_ticks + +def configure_intensities_graph(ax1, clear_ticks, intensities): + ax1.set_yticks(clear_ticks) + ax1.set_xticks(intensities) + + yticklabels = ax1.get_yticklabels() + for yticklabel in yticklabels: + yticklabel.set_horizontalalignment('right') + yticklabel.set_fontsize('xx-small') + + xticks = ax1.get_xticklabels() + for xtick in xticks: + xtick.set_horizontalalignment('right') + xtick.set_fontsize('xx-small') + ax1.grid() + ax1.legend() + + ax1.set_xlabel("initial pole angle in radians") + ax1.set_ylabel("percentage of successful episodes") + + +def configure_boxplot_graph(ax1, intensities): + boxplot_y = np.linspace(0, 500, 27) + ax1.set_yticks(boxplot_y) + ax1.set_xticks(intensities) + ax1.set_xlim(intensities[0]-0.1, intensities[len(intensities)-1]+0.1) + ax1.grid() + + if __name__ == "__main__": pltlib.rcParams.update({'font.size': 15}) - pltlib.rcParams.update({'font.size': 15}) fig, ax1 = plt.subplots() + fig2, ax2 = plt.subplots() + fig3, ax3 = plt.subplots() + fig4, ax4 = plt.subplots() + fig5, ax5 = plt.subplots() + fig6, ax6 = plt.subplots() + fig7, ax7 = plt.subplots() + + # PPO + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green") + plot_intensities(ax2, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green", True) + + # DQN + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red") + plot_intensities(ax3, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red", True) + # MANUAL + plot_intensities(ax1, 
"/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue") + plot_intensities(ax4, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue", True) + # QLEAN + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple") + plot_intensities(ax5, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple", + True) + # PPO CONTINUOUS + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black") + plot_intensities(ax6, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black", + True) + # DDPG + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown") + plot_intensities(ax7, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown", True) - #PPO - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-22 00:04:00.543207__rewards_rsl-0_rpl-0_pi-0_init_0.5.pkl", - "ppo", - "green") - #DQN - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-21 23:51:45.523954__rewards_rsl-0_rpl-0_pi-0_init_0.2.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-21 23:51:45.523954__rewards_rsl-0_rpl-0_pi-0_init_0.2.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-22 00:36:30.175255__rewards_rsl-0_rpl-0_pi-0_init_0.3.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-21 23:52:08.387862__rewards_rsl-0_rpl-0_pi-0_init_0.5.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-21 23:38:48.830764__rewards_rsl-0_rpl-0_pi-0.pkl", - "DQN", - "red") - - #MANUAL - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:32:26.534755__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:32:26.534755__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:31:51.862801__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:32:09.926731__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:32:36.126715__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-22 00:32:45.402747__rewards_rsl-0_rpl-0_pi-0.pkl", - 
"programmatic", - "blue") - - #QLEAN - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-09 20:25:41.817666__rewards_rsl-0_rpl-0_pi-0-init-0.1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:30:57.818809__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-22 00:00:02.418770__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-22 00:30:10.278754__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-21 23:38:48.830764__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:40:38.942765__rewards_rsl-0_rpl-0_pi-0.pkl", - "QLearning", - "purple") - - plt.xticks(init_pos) yticks = np.array(yticks) flatten_ticks = yticks.flatten() - clear_ticks = cleanticks(flatten_ticks) - plt.yticks(clear_ticks) - plt.setp(ax1.get_yticklabels(), horizontalalignment='right', fontsize='xx-small') - plt.setp(ax1.get_xticklabels(), horizontalalignment='right', fontsize='x-small') - plt.xlabel("initial angle with no perturbations") - plt.ylabel("steps per episode in average") - plt.grid() - plt.legend() + clear_ticks = cleanticks(sorted(flatten_ticks, reverse=True)) + + configure_intensities_graph(ax1, clear_ticks, intensities) + configure_boxplot_graph(ax2, intensities) + configure_boxplot_graph(ax3, intensities) + configure_boxplot_graph(ax4, intensities) + configure_boxplot_graph(ax5, intensities) + configure_boxplot_graph(ax6, intensities) + configure_boxplot_graph(ax7, intensities) + + fig.canvas.manager.full_screen_toggle() + fig2.canvas.manager.full_screen_toggle() + fig3.canvas.manager.full_screen_toggle() + fig4.canvas.manager.full_screen_toggle() + fig5.canvas.manager.full_screen_toggle() + fig6.canvas.manager.full_screen_toggle() + fig7.canvas.manager.full_screen_toggle() plt.show() + base_path = '/home/ruben/Desktop/2020-phd-ruben-lucas/docs/assets/images/results_images/cartpole/solidityExperiments/refinement/refinementOfRefinement/initpose/' + ax1.figure.savefig(base_path + 'comparison.png') + ax2.figure.savefig(base_path + 'ppo.png') + ax3.figure.savefig(base_path + 'dqn.png') + ax4.figure.savefig(base_path + 'no_rl.png') + ax5.figure.savefig(base_path + 'qlearning.png') + ax6.figure.savefig(base_path + 'qlearning.png') + ax7.figure.savefig(base_path + 'ddpg.png') + diff --git a/rl_studio/agents/utilities/plot_multiple_graphs_intensities.py b/rl_studio/agents/utilities/plot_multiple_graphs_intensities.py index c9710d9f4..b83ed825f 100755 --- a/rl_studio/agents/utilities/plot_multiple_graphs_intensities.py +++ b/rl_studio/agents/utilities/plot_multiple_graphs_intensities.py @@ -4,120 +4,271 @@ import matplotlib.pyplot as plt import numpy as np from statsmodels.distributions.empirical_distribution import ECDF +import os +import re RUNS = 100 max_episode_steps = 500 -intensities = [0, 1, 7, 12, 18] +min_pert_freq = 0.1 +min_pert_intensity = 1 + +intensities = [] +intensities_dev = [] yticks = [] +yticks_dev = [] + +def plot_intensities(ax, folder_path, color, boxplot=False): + # Use a regular expression to extract the part between "cartpole" and "inference" + match = re.search(r'cartpole/(.*?)/inference', folder_path) + if match: + extracted_part = match.group(1) + label = extracted_part + else: + label = "unknown" + print("No match 
found") + + ecdf_dict = {} + rewards_dict = {} + + # Iterate through all the files in the folder + for file_name in os.listdir(folder_path): + # Use a regular expression to extract the part between "pi" and "_in" + match = re.search(r'pi-(.*?)_in', file_name) + match2 = re.search(r'rpl-(.*?)_pi', file_name) + match3 = re.search(r'init_(.*?).pkl', file_name) + + if "rewards" in file_name and match and match2 and match3: + extracted_part = float(match.group(1)) + extracted_part2 = float(match2.group(1)) + extracted_part3 = float(match3.group(1)) + if extracted_part == 0 and extracted_part2 == 0 and extracted_part3 == 0 \ + or extracted_part > min_pert_intensity \ + or extracted_part == min_pert_intensity and extracted_part2 == min_pert_freq: + # Add the extracted part and filename to the list + rewards_file = open(folder_path + file_name, "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ecdf = ECDF(rewards) + ecdf_dict[float(extracted_part) / 10] = 1 - ecdf(499) + rewards_dict[float(extracted_part) / 10] = rewards + + print(label) + sorted_ecdf_dict = dict(sorted(ecdf_dict.items(), key=lambda item: item[0])) + print(sorted_ecdf_dict) + extracted_intensities = list(sorted_ecdf_dict.keys()) + + intensities.extend(extracted_intensities) + + if boxplot: + sorted_rewards_dict = dict(sorted(rewards_dict.items(), key=lambda item: item[0])) + sorted_rewards = list(sorted_rewards_dict.values()) + + if len(sorted_rewards) == 0: + return + + ax.boxplot(sorted_rewards, positions=extracted_intensities, widths=0.03, + flierprops={'marker': 'o', 'markersize': 2}) + ax.legend([label]) + return + + success_percentage = list(sorted_ecdf_dict.values()) + + yticks.extend(success_percentage) + ax.plot(extracted_intensities, success_percentage, color=color, label=label) + + +def plot_deviations(ax, folder_path, color): + # Use a regular expression to extract the part between "cartpole" and "inference" + match = re.search(r'cartpole/(.*?)/inference', folder_path) + if match: + extracted_part = match.group(1) + label = extracted_part + else: + label = "unknown" + print("No match found") + + deviations_list = {} + + # Iterate through all the files in the folder + for file_name in os.listdir(folder_path): + # Use a regular expression to extract the part between "pi" and "_in" + match = re.search(r'pi-(.*?)_in', file_name) + match2 = re.search(r'rpl-(.*?)_pi', file_name) + match3 = re.search(r'init_(.*?).pkl', file_name) + + if "states" in file_name and match and match2 and match3: + extracted_part = float(match.group(1)) + extracted_part2 = float(match2.group(1)) + extracted_part3 = float(match3.group(1)) + + if extracted_part == 0 and extracted_part2 == 0 and extracted_part3 == 0 \ + or extracted_part > min_pert_intensity \ + or extracted_part == min_pert_intensity and extracted_part2 == min_pert_freq: + # Add the extracted part and filename to the list + states_file = open(folder_path + file_name, "rb") + states = pickle.load(states_file) + states_array = [np.asarray(step) for runs in states for step in runs] + if label == 'qlearning': + states_array = [(angle-50)/119.61722488 for angle in states_array] + absolute_arr = np.abs(states_array) + absolute_mean = np.mean(absolute_arr) + deviations_list[float(extracted_part) / 10] = absolute_mean + + print(label) + sorted_deviations_dict = dict(sorted(deviations_list.items(), key=lambda item: item[0])) + print(sorted_deviations_dict) + extracted_intensities = list(sorted_deviations_dict.keys()) -def plot_intensities(ax, file_0, file_1, 
file_2, file_3, file_4, label, color): - - rewards_file = open( - file_0, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_0 = ECDF(rewards) - - rewards_file = open( - file_1, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_1 = ECDF(rewards) - - rewards_file = open( - file_2, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_2 = ECDF(rewards) - - rewards_file = open( - file_3, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_3 = ECDF(rewards) - - rewards_file = open( - file_4, - "rb") - rewards = pickle.load(rewards_file) - rewards = np.asarray(rewards) - ecdf_4 = ECDF(rewards) - ninety_rewards = [1- ecdf_0(499), 1 - ecdf_1(499), 1 - ecdf_2(499), 1 - ecdf_3(499), 1 - ecdf_4(499)] - - yticks.append(ninety_rewards) - - ax.plot(intensities, ninety_rewards, color=color, label=label) + intensities_dev.extend(extracted_intensities) + + sorted_deviations = list(sorted_deviations_dict.values()) + + yticks_dev.extend(sorted_deviations) + ax.plot(extracted_intensities, sorted_deviations, color=color, label=label) def cleanticks(ticks): + if len(ticks) == 0: + return ticks clear_ticks = [] element1 = ticks[0] clear_ticks.append(element1) for element2 in ticks: if element1 != element2 and abs(element1 - element2) > 0.02: clear_ticks.append(element2) + element1 = element2 return clear_ticks + +# def configure_intensities_graph(ax1, clear_ticks, intensities): +def configure_intensities_graph(ax1, intensities): + # ax1.set_yticks(clear_ticks) + y_ticks = np.linspace(0, 1, 21) + ax1.set_yticks(y_ticks) + ax1.set_xticks(intensities) + + yticklabels = ax1.get_yticklabels() + for yticklabel in yticklabels: + yticklabel.set_horizontalalignment('right') + yticklabel.set_fontsize('xx-small') + + xticks = ax1.get_xticklabels() + for xtick in xticks: + xtick.set_horizontalalignment('right') + xtick.set_fontsize('xx-small') + ax1.grid() + ax1.legend() + + ax1.set_xlabel("intensity of perturbations with fixed frequency") + ax1.set_ylabel("percentage of successful episodes") + + +def configure_boxplot_graph(ax1, intensities): + boxplot_y = np.linspace(0, 500, 27) + ax1.set_yticks(boxplot_y) + sorted_intensities = sorted(list(set(intensities))) + ax1.set_xticks(sorted_intensities) + ax1.set_xlim(sorted_intensities[0] - 0.1, sorted_intensities[len(sorted_intensities) - 1] + 0.1) + xticks = ax1.get_xticklabels() + for xtick in xticks: + xtick.set_horizontalalignment('right') + xtick.set_fontsize('xx-small') + ax1.grid() + + +# def configure_intensities_graph(ax1, clear_ticks, intensities): +def configure_deviation_graph(ax1, intensities): + # ax1.set_yticks(clear_ticks) + y_ticks_dev = np.linspace(0, 1, 21) + ax1.set_yticks(y_ticks_dev) + ax1.set_xticks(intensities) + ax1.set_ylim(0, 0.1) + + yticklabels = ax1.get_yticklabels() + for yticklabel in yticklabels: + yticklabel.set_horizontalalignment('right') + yticklabel.set_fontsize('xx-small') + + xticks = ax1.get_xticklabels() + for xtick in xticks: + xtick.set_horizontalalignment('right') + xtick.set_fontsize('xx-small') + ax1.grid() + ax1.legend() + + ax1.set_xlabel("intensity of perturbations with fixed frequency") + ax1.set_ylabel("average pole angle") + if __name__ == "__main__": pltlib.rcParams.update({'font.size': 15}) fig, ax1 = plt.subplots() + fig2, ax2 = plt.subplots() + fig3, ax3 = plt.subplots() + fig4, ax4 = plt.subplots() + fig5, ax5 = plt.subplots() + fig6, ax6 = plt.subplots() + fig7, ax7 = plt.subplots() + 
fig8, ax8 = plt.subplots() + + # PPO + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green") + plot_intensities(ax2, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green", True) + + # DQN + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red") + plot_intensities(ax3, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red", True) + + # MANUAL + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue") + plot_intensities(ax4, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue", True) + # QLEAN + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple") + plot_intensities(ax5, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple", + True) + # PPO CONTINUOUS + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black") + plot_intensities(ax6, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black", + True) + # DDPG + plot_intensities(ax1, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown") + plot_intensities(ax7, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown", True) + + # Deviations + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/", "green") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/", "red") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/", "blue") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/", "purple") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/inference/", "black") + plot_deviations(ax8, "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/inference/", "brown") + + configure_intensities_graph(ax1, intensities) + configure_boxplot_graph(ax2, intensities) + configure_boxplot_graph(ax3, intensities) + configure_boxplot_graph(ax4, intensities) + configure_boxplot_graph(ax5, intensities) + configure_boxplot_graph(ax6, intensities) + configure_boxplot_graph(ax7, intensities) + configure_deviation_graph(ax8, intensities_dev) - #PPO - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 19:58:16.130830__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 20:03:12.387811__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 20:03:18.660196__rewards_rsl-0_rpl-0.1_pi-12.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/inference/2022-11-16 20:03:43.891839__rewards_rsl-0_rpl-0.1_pi-18.pkl", - "ppo", - "green") - #DQN - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-16 20:35:59.222709__rewards_rsl-0_rpl-0.1_pi-1.pkl", - 
"/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-16 20:36:33.282602__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-16 20:36:51.443741__rewards_rsl-0_rpl-0.1_pi-12.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/inference/2022-11-16 20:37:33.130595__rewards_rsl-0_rpl-0.1_pi-18.pkl", - "DQN", - "red") - - #MANUAL - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-16 20:40:06.485079__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-16 20:40:44.833057__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-16 20:40:51.609087__rewards_rsl-0_rpl-0.1_pi-12.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/no_rl/inference/2022-11-16 20:41:02.009116__rewards_rsl-0_rpl-0.1_pi-18.pkl", - "programmatic", - "blue") - - #QLEAN - plot_intensities(ax1, - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:16.550832__rewards_rsl-0_rpl-0_pi-0.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:16:49.854808__rewards_rsl-0_rpl-0.1_pi-1.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:17:27.826748__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:17:27.826748__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/inference/2022-11-21 23:17:27.826748__rewards_rsl-0_rpl-0.1_pi-7.pkl", - "QLearning", - "purple") - - plt.xticks(intensities) - yticks = np.array(yticks) - flatten_ticks = yticks.flatten() - clear_ticks = cleanticks(flatten_ticks) - plt.yticks(clear_ticks) - plt.setp(ax1.get_yticklabels(), horizontalalignment='right', fontsize='xx-small') - plt.setp(ax1.get_xticklabels(), horizontalalignment='right', fontsize='x-small') - plt.xlabel("intensity of perturbations with fixed frequency") - plt.ylabel("percentage of successful episodes") - plt.grid() - plt.legend() + fig.canvas.manager.full_screen_toggle() + fig2.canvas.manager.full_screen_toggle() + fig3.canvas.manager.full_screen_toggle() + fig4.canvas.manager.full_screen_toggle() + fig5.canvas.manager.full_screen_toggle() + fig6.canvas.manager.full_screen_toggle() + fig7.canvas.manager.full_screen_toggle() + fig8.canvas.manager.full_screen_toggle() plt.show() + base_path = '/home/ruben/Desktop/2020-phd-ruben-lucas/docs/assets/images/results_images/cartpole/solidityExperiments/refinement/refinementOfRefinement/intensity/' + ax1.figure.savefig(base_path + 'comparison.png') + ax2.figure.savefig(base_path + 'ppo.png') + ax3.figure.savefig(base_path + 'dqn.png') + ax4.figure.savefig(base_path + 'no_rl.png') + ax5.figure.savefig(base_path + 'qlearning.png') + ax6.figure.savefig(base_path + 'qlearning.png') + ax7.figure.savefig(base_path + 'ddpg.png') + ax8.figure.savefig(base_path + 'deviations.png') diff --git a/rl_studio/agents/utilities/plot_multiple_graphs_template.py b/rl_studio/agents/utilities/plot_multiple_graphs_template.py index 2964d651d..a56b350bd 100755 --- a/rl_studio/agents/utilities/plot_multiple_graphs_template.py +++ 
b/rl_studio/agents/utilities/plot_multiple_graphs_template.py @@ -4,7 +4,7 @@ import matplotlib.pyplot as plt import numpy as np -RUNS = 100 +RUNS = 20000 max_episode_steps = 500 if __name__ == "__main__": @@ -12,34 +12,57 @@ fig, ax1 = plt.subplots() + RUNS = 4000000 + rewards_file = open( - "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/training/2022-11-07 23:33:33.604598__rewards_rsl-0_rpl-0.4_pi-1.pkl", "rb") + "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/training/2023-01-20 18:41:16.873044__rewards_.pkl", "rb") rewards = pickle.load(rewards_file) rewards = np.asarray(rewards) - ax1.plot(range(RUNS), rewards, color='blue', label='ppo') + ax1.set_xscale('log') + ax1.plot(range(RUNS), rewards, color='purple', label='qlearning') RUNS = 20000 - # rewards_file = open( - # "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/training/2022-10-30 01:21:52.071319__rewards_rsl-0_rpl-0_pi-0.pkl", "rb") - # rewards = pickle.load(rewards_file) - # rewards = np.asarray(rewards) - # ax1.plot(range(RUNS), rewards, color='green', label='dqn') - - # rewards_file = open( - # "/rl_studio/logs/cartpole/old_datasets/training_with_frequencies/2022-10-20 23:00:04.352224__rewards_rsl-0_rpl-0.2_pi-10.pkl", "rb") - # rewards = pickle.load(rewards_file) - # rewards = np.asarray(rewards) - # ax1.plot(range(RUNS), rewards, color='orange', label='trained with frequency = 0.2') - # - # rewards_file = open( - # "/rl_studio/logs/cartpole/old_datasets/training_with_frequencies/2022-10-20 22:59:30.164014__rewards_rsl-0_rpl-0.2_pi-10.pkl", "rb") - # rewards = pickle.load(rewards_file) - # rewards = np.asarray(rewards) - # ax1.plot(range(RUNS), rewards, color='black', label='trained with frequency= 0.3') + rewards_file = open( + "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/dqn/training/2023-01-20 02:50:09.991537__rewards_rsl-0_rpl-0_pi-1.pkl", "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ax1.set_xscale('log') + ax1.plot(range(RUNS), rewards, color='pink', label='dqn') + + RUNS = 10000 + + rewards_file = open( + "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/training/2023-01-14 03:15:07.136008__rewards_rsl-0_rpl-0.1_pi-1.pkl", "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ax1.set_xscale('log') + ax1.plot(range(RUNS), rewards, color='brown', label='ddpg') + + RUNS = 1000 + + rewards_file = open( + "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/training/2023-01-11 21:25:12.509340__rewards_rsl-0_rpl-0_pi-0.pkl", "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ax1.set_xscale('log') + ax1.plot(range(RUNS), rewards, color='black', label='ppo_continuous') + + RUNS = 1000 + + rewards_file = open( + "/home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/training/2023-01-11 21:30:20.951490__rewards_rsl-0.2_rpl-0_pi-0.pkl", "rb") + rewards = pickle.load(rewards_file) + rewards = np.asarray(rewards) + ax1.set_xscale('log') + ax1.plot(range(RUNS), rewards, color='green', label='ppo') + fig.canvas.manager.full_screen_toggle() plt.legend() plt.ylabel("steps") plt.xlabel("runs") plt.show() + base_path = '/home/ruben/Desktop/2020-phd-ruben-lucas/docs/assets/images/results_images/cartpole/solidityExperiments/refinement/refinementOfRefinement/' + ax1.figure.savefig(base_path + 'trainings.png') + diff --git a/rl_studio/agents/utils.py b/rl_studio/agents/utils.py index a917be43f..d6f1ba5cd 100644 --- a/rl_studio/agents/utils.py +++ 
b/rl_studio/agents/utils.py @@ -1,61 +1,43 @@ -import datetime +from datetime import datetime, timedelta +import logging import os import pickle +from pprint import pformat +import time import cv2 import numpy as np import pandas as pd +from pygments import highlight +from pygments.formatters import Terminal256Formatter +from pygments.lexers import PythonLexer from rl_studio.agents.f1 import settings -def load_model(qlearn, file_name): - - qlearn_file = open("./logs/qlearn_models/" + file_name) - model = pickle.load(qlearn_file) +class LoggingHandler: + def __init__(self, log_file): + self.logger = logging.getLogger(__name__) + c_handler = logging.StreamHandler() + f_handler = logging.FileHandler(log_file) - qlearn.q = model - qlearn.ALPHA = settings.algorithm_params["alpha"] - qlearn.GAMMA = settings.algorithm_params["gamma"] - qlearn.epsilon = settings.algorithm_params["epsilon"] + c_handler.setLevel(logging.INFO) + f_handler.setLevel(logging.INFO) - print(f"\n\nMODEL LOADED. Number of (action, state): {len(model)}") - print(f" - Loading: {file_name}") - print(f" - Model size: {len(qlearn.q)}") - print(f" - Action set: {settings.actions_set}") - print(f" - Epsilon: {qlearn.epsilon}") - print(f" - Start: {datetime.datetime.now()}") + # Create formatters and add it to handlers + c_format = logging.Formatter("%(name)s - %(levelname)s - %(message)s") + f_format: Formatter = logging.Formatter( + "[%(levelname)s] - %(asctime)s, filename: %(filename)s, funcname: %(funcName)s, line: %(lineno)s\n messages ---->\n %(message)s" + ) + c_handler.setFormatter(c_format) + f_handler.setFormatter(f_format) + # Add handlers to the logger + self.logger.addHandler(c_handler) + self.logger.addHandler(f_handler) -def save_model(qlearn, current_time, states, states_counter, states_rewards): - # Tabular RL: Tabular Q-learning basically stores the policy (Q-values) of the agent into a matrix of shape - # (S x A), where s are all states, a are all the possible actions. After the environment is solved, just save this - # matrix as a csv file. I have a quick implementation of this on my GitHub under Reinforcement Learning. 
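
(Illustration only, not part of the patch.) The `LoggingHandler` class added above wraps a module-level logger with one console handler and one file handler, both set to INFO. A minimal usage sketch, assuming `rl_studio.agents.utils` is importable; the log-file path is hypothetical:

```python
import logging
from rl_studio.agents.utils import LoggingHandler

# get_logger() returns the module-level logger with both handlers attached.
log = LoggingHandler("./logs/example_training_run.log").get_logger()

# The patch sets the handler levels to INFO but not the logger level itself,
# so depending on the surrounding logging configuration you may need to raise
# the logger level explicitly for INFO records to pass the default filter.
log.setLevel(logging.INFO)

log.info("starting training")
log.info("episode=%d reward=%.1f", 1, 500.0)
```
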
- - # Q TABLE - base_file_name = "_act_set_{}_epsilon_{}".format( - settings.actions_set, round(qlearn.epsilon, 2) - ) - file_dump = open( - "./logs/qlearn_models/1_" + current_time + base_file_name + "_QTABLE.pkl", "wb" - ) - pickle.dump(qlearn.q, file_dump) - # STATES COUNTER - states_counter_file_name = base_file_name + "_STATES_COUNTER.pkl" - file_dump = open( - "./logs/qlearn_models/2_" + current_time + states_counter_file_name, "wb" - ) - pickle.dump(states_counter, file_dump) - # STATES CUMULATED REWARD - states_cum_reward_file_name = base_file_name + "_STATES_CUM_REWARD.pkl" - file_dump = open( - "./logs/qlearn_models/3_" + current_time + states_cum_reward_file_name, "wb" - ) - pickle.dump(states_rewards, file_dump) - # STATES - steps = base_file_name + "_STATES_STEPS.pkl" - file_dump = open("./logs/qlearn_models/4_" + current_time + steps, "wb") - pickle.dump(states, file_dump) + def get_logger(self): + return self.logger def save_times(checkpoints): @@ -102,6 +84,12 @@ def print_messages(*args, **kwargs): print("\n") +def print_dictionary(dic): + # pp = pprint.PrettyPrinter(indent=4) + # pp.pprint(dic) + print(highlight(pformat(dic), PythonLexer(), Terminal256Formatter()), end="") + + def render_params(**kwargs): font = cv2.FONT_HERSHEY_SIMPLEX canvas = np.zeros((400, 400, 3), dtype="uint8") @@ -128,75 +116,164 @@ def render_params(**kwargs): cv2.waitKey(100) -def save_agent_npy(environment, outdir, physics, current_time): - """ """ - - outdir_episode = f"{outdir}_stats" - os.makedirs(f"{outdir_episode}", exist_ok=True) - - file_npy = f"{outdir_episode}/{current_time}_Circuit-{environment['circuit_name']}_States-{environment['state_space']}_Actions-{environment['action_space']}_rewards-{environment['reward_function']}.npy" - - np.save(file_npy, physics) - - -def save_stats_episodes(environment, outdir, aggr_ep_rewards, current_time): +def save_dataframe_episodes(environment, outdir, aggr_ep_rewards, actions_rewards=None): """ - We save info of EPISODES in a dataframe to export or manage + We save info every certains epochs in a dataframe and .npy format to export or manage """ + os.makedirs(f"{outdir}", exist_ok=True) - outdir_episode = f"{outdir}_stats" - os.makedirs(f"{outdir_episode}", exist_ok=True) - - file_csv = f"{outdir_episode}/{current_time}_Circuit-{environment['circuit_name']}_States-{environment['state_space']}_Actions-{environment['action_space']}_rewards-{environment['reward_function']}.csv" - file_excel = f"{outdir_episode}/{current_time}_Circuit-{environment['circuit_name']}_States-{environment['state_space']}_Actions-{environment['action_space']}_rewards-{environment['reward_function']}.xlsx" + file_csv = f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}.csv" + file_excel = f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}.xlsx" df = pd.DataFrame(aggr_ep_rewards) df.to_csv(file_csv, mode="a", index=False, header=None) df.to_excel(file_excel) + if actions_rewards is not None: + file_npy = f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}.npy" + np.save(file_npy, actions_rewards) + + +def save_best_episode( + global_params, + cumulated_reward, + 
episode, + step, + start_time_epoch, + reward, + image_center, +): + """ + save best episode in training + """ -def save_model_qlearn( - environment, - outdir, - qlearn, - current_time, - steps_epochs, - states_counter, - states_rewards, + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + # self.actions_rewards["episode"].append(episode) + # self.actions_rewards["step"].append(step) + # self.actions_rewards["reward"].append(reward) + global_params.actions_rewards["episode"].append(episode) + global_params.actions_rewards["step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + global_params.actions_rewards["reward"].append(reward) + global_params.actions_rewards["center"].append(image_center) + + return current_max_reward, best_epoch, best_step, best_epoch_training_time + + +def save_best_episode_dqn( + global_params, + cumulated_reward, episode, step, - epsilon, + start_time_epoch, + reward, ): + """ + save best episode in training + """ + + current_max_reward = cumulated_reward + best_epoch = episode + best_step = step + best_epoch_training_time = datetime.now() - start_time_epoch + # saving params to show + # self.actions_rewards["episode"].append(episode) + # self.actions_rewards["step"].append(step) + # self.actions_rewards["reward"].append(reward) + global_params.best_current_epoch["best_epoch"].append(episode) + global_params.best_current_epoch["best_step"].append(step) + # For continuous actios + # self.actions_rewards["v"].append(action[0][0]) + # self.actions_rewards["w"].append(action[0][1]) + global_params.best_current_epoch["highest_reward"].append(reward) + global_params.best_current_epoch["best_epoch_training_time"].append( + best_epoch_training_time + ) + global_params.best_current_epoch["current_total_training_time"].append( + start_time_epoch + ) + + return current_max_reward, best_epoch, best_step, best_epoch_training_time + + +def save_batch(episode, step, start_time_epoch, start_time, global_params, env_params): + """ + save batch of n episodes + """ + average_reward = sum(global_params.ep_rewards[-env_params.save_episodes :]) / len( + global_params.ep_rewards[-env_params.save_episodes :] + ) + min_reward = min(global_params.ep_rewards[-env_params.save_episodes :]) + max_reward = max(global_params.ep_rewards[-env_params.save_episodes :]) + + global_params.aggr_ep_rewards["episode"].append(episode) + global_params.aggr_ep_rewards["step"].append(step) + global_params.aggr_ep_rewards["avg"].append(average_reward) + global_params.aggr_ep_rewards["max"].append(max_reward) + global_params.aggr_ep_rewards["min"].append(min_reward) + global_params.aggr_ep_rewards["epoch_training_time"].append( + (datetime.now() - start_time_epoch).total_seconds() + ) + global_params.aggr_ep_rewards["total_training_time"].append( + (datetime.now() - start_time).total_seconds() + ) + + return global_params.aggr_ep_rewards + + +def load_model(qlearn, file_name): + """ + Qlearn old version + """ + + qlearn_file = open("./logs/qlearn_models/" + file_name) + model = pickle.load(qlearn_file) + + qlearn.q = model + qlearn.ALPHA = settings.algorithm_params["alpha"] + qlearn.GAMMA = settings.algorithm_params["gamma"] + qlearn.epsilon = settings.algorithm_params["epsilon"] + + print(f"\n\nMODEL LOADED. 
Number of (action, state): {len(model)}") + print(f" - Loading: {file_name}") + print(f" - Model size: {len(qlearn.q)}") + print(f" - Action set: {settings.actions_set}") + print(f" - Epsilon: {qlearn.epsilon}") + print(f" - Start: {datetime.datetime.now()}") + + +def save_model(qlearn, current_time, states, states_counter, states_rewards): # Tabular RL: Tabular Q-learning basically stores the policy (Q-values) of the agent into a matrix of shape # (S x A), where s are all states, a are all the possible actions. After the environment is solved, just save this # matrix as a csv file. I have a quick implementation of this on my GitHub under Reinforcement Learning. - outdir_models = f"{outdir}_models" - os.makedirs(f"{outdir_models}", exist_ok=True) - # Q TABLE - # base_file_name = "_actions_set:_{}_epsilon:_{}".format(settings.actions_set, round(qlearn.epsilon, 2)) - base_file_name = f"_Circuit-{environment['circuit_name']}_States-{environment['state_space']}_Actions-{environment['action_space']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{states_rewards}" + base_file_name = "_act_set_{}_epsilon_{}".format( + settings.actions_set, round(qlearn.epsilon, 2) + ) file_dump = open( - f"{outdir_models}/1_" + current_time + base_file_name + "_QTABLE.pkl", "wb" + "./logs/qlearn_models/1_" + current_time + base_file_name + "_QTABLE.pkl", "wb" ) pickle.dump(qlearn.q, file_dump) - # STATES COUNTER states_counter_file_name = base_file_name + "_STATES_COUNTER.pkl" file_dump = open( - f"{outdir_models}/2_" + current_time + states_counter_file_name, "wb" + "./logs/qlearn_models/2_" + current_time + states_counter_file_name, "wb" ) pickle.dump(states_counter, file_dump) - # STATES CUMULATED REWARD states_cum_reward_file_name = base_file_name + "_STATES_CUM_REWARD.pkl" file_dump = open( - f"{outdir_models}/3_" + current_time + states_cum_reward_file_name, "wb" + "./logs/qlearn_models/3_" + current_time + states_cum_reward_file_name, "wb" ) pickle.dump(states_rewards, file_dump) - # STATES steps = base_file_name + "_STATES_STEPS.pkl" - file_dump = open(f"{outdir_models}/4_" + current_time + steps, "wb") - pickle.dump(steps_epochs, file_dump) + file_dump = open("./logs/qlearn_models/4_" + current_time + steps, "wb") + pickle.dump(states, file_dump) diff --git a/rl_studio/algorithms/README.md b/rl_studio/algorithms/README.md index 1e173d991..7eab70f6c 100644 --- a/rl_studio/algorithms/README.md +++ b/rl_studio/algorithms/README.md @@ -1,65 +1,31 @@ # Algorithms -## Deep Deterministic Gradient Policy (DDPG) - -The algorithm is based in LILLICRAP, Timothy P., et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015. - -It allows us to work with **multidimensional states** such as raw input from a camera and with **continuous** or **discrete actions** to develop complex projects. Now, it is based on **Tensorflow**, although in the future we will be able to integrate other Deep Learning frameworks. - -## F1 - Follow line camera sensor with DDPG algorithm +## Q-learning +It is based in Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine learning, 8(3), 279-292. +As a tabular method to solve reinforcement learning tasks, it acts in a particular state getting an inmediate reward, saving all pairs (states, actions) -> rewards in a table. +Q-learning has been designed to work with low dimensional states and discrete actions. It is one of the canonical RL algorithms with a high performance for low level dimensionality tasks. 
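
(Illustration only, not part of the patch.) To make the tabular idea above concrete, here is a minimal dictionary-based sketch of the textbook Q-learning update the README describes; it is not the repository's own `QLearn` class:

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.95, 0.05   # learning rate, discount, exploration
q = defaultdict(float)                     # (state, action) -> estimated return

def choose_action(state, actions):
    # epsilon-greedy selection over the tabular estimates
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def update(state, action, reward, next_state, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
```
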
-For Formula1 F1 agent follows line with camera sensor, the main features are: +Our implementation of Q-Learning algorithm has two approaches: with table or dictionnary. You can choose any of them through config file. Dictionnary is more efficient in terms of memory size due to its dynamic implementation. Otherwise table option is closer to the Q-learning original approach, developed with numpy library. +Both have been tested succesfully in different tasks. -- **state/observation**: Currently there are two ways to generate the input state that feeds the RL algorithm through a camera: **simplified perception of n points** or the **raw image**. - With simplified perception, the image is divided into regions and the points of the road central line generate the state that feeds the neural network. - In case the input space is raw image, the state is the image obtained by the camera sensor. This image must be resized so that it can be processed by the neural networks. -- **actions**: _discrete_ or _continuous_. In the case of discrete actions, sets of pairs [linear velocity, angular velocity] specific to each circuit are generated. The continuous actions are established with the minimum and maximum ranges of linear and angular velocity. -- **reward**: _discrete_ or _linear_. The discrete reward function generates values ​​obtained by trial and error where the reward is bigger or lower according to the distance to the road line center. The linear reward function is determined by the relationship between the linear and angular velocity of the car and its position with respect to the center line of the road. -## Setting Params in DDPG F1 - follow line camera sensor - -The parameters must be configured through the config.yaml file in the /config directory. The most relevant parameters are: - -Agent: - -- image_resizing: 10. Generally the size of the image captured by the camera sensor is determined in the agent configuration and the standard is 480x640 pixels. This size is too large for neural network processing so it should be reduced. This variable determines the percentage of image size reduction, i.e. 10 means that it is reduced to 10% of its original size, so in the default size the image is reduced to 48x64 pixels. +--- +## Deep Deterministic Gradient Policy (DDPG) -- new_image_size: 32. It gives us another way of reducing the image for processing in neural networks. In this case, the parameter determined here generates an image of size number x number, i.e., 32x32, 64x64... which is more efficient for processing in neural networks. +The algorithm is based in LILLICRAP, Timothy P., et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015. -- raw_image: False. It is a Boolean variable that, if True, takes as input state of the neural network, the raw image obtained by the camera sensor. If this variable is False, the image obtained will be preprocessed and converted to black and white to obtain the necessary information and then it will be reduced in size to feed the neural network. +It allows us to work with **multidimensional states** such as raw input from a camera and with **continuous** or **discrete actions** to develop complex projects. Now, it is based on **Tensorflow**, although in the future we will be able to integrate other Deep Learning frameworks. -- State_space: image or sp1, sp3... gives us the distance in pixels down from line that marks the horizon of the road. 
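
(Illustration only, not part of the patch.) The DDPG actors in `rl_studio/algorithms/ddpg.py` emit `tanh`-bounded outputs for the two continuous actions, while the velocity ranges come from the config (`v_min`/`v_max`, `w_left`/`w_right`). The exact mapping lives in the agent/environment code and is not shown in this hunk; a common affine rescaling looks like this hypothetical helper:

```python
import numpy as np

def scale_continuous_action(raw_v, raw_w, v_bounds, w_bounds):
    """Map tanh outputs in [-1, 1] onto the configured velocity ranges."""
    v_min, v_max = v_bounds
    w_left, w_right = w_bounds
    v = v_min + (raw_v + 1.0) * 0.5 * (v_max - v_min)
    w = w_left + (raw_w + 1.0) * 0.5 * (w_right - w_left)
    return np.array([v, w])

# e.g. with v in [2, 30] m/s and w in [-1, 1] rad/s
print(scale_continuous_action(0.0, -1.0, (2, 30), (-1, 1)))  # -> [16. -1.]
```
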
--- ## Deep Q Networks (DQN) -Based on [Human-level control through deep reinforcement learning whitepaper](https://www.nature.com/articles/nature14236?wm=book_wap_0005), it allows working with multidimensional states through Deep Neural Nets and discrete actions. - -## F1 - Follow line camera sensor with DQN algorithm - -Like DDPG Formula1 F1 agent following the line with camera sensor, the main features are: - -- **state/observation**: Currently there are two ways to generate the input state that feeds the RL algorithm through a camera: **simplified perception of n points** or the **raw image**. - With simplified perception, the image is divided into regions and the points of the road central line generate the state that feeds the neural network. - In case the input space is raw image, the state is the image obtained by the camera sensor. This image must be resized so that it can be processed by the neural networks. - -- **actions**: only _discrete_ working like DDPG F1 agent. - -- **reward**: _discrete_ or _linear_. The discrete reward function generates values ​​obtained by trial and error where the reward is bigger or lower according to the distance to the road line center. The linear reward function is determined by the relationship between the linear and angular velocity of the car and its position with respect to the center line of the road. +Based on [Human-level control through deep reinforcement learning whitepaper](https://www.nature.com/articles/nature14236?wm=book_wap_0005), it allows working with **multidimensional states** with Deep Neural Nets and **discrete actions**. Our solution is currently based on Tensorflow framework. -## Setting Params in DQN F1 - follow line camera sensor -The parameters must be configured through the config.yaml file in the /config directory. The most relevant parameters are: - -Agent: - -- image_resizing: 10. Generally the size of the image captured by the camera sensor is determined in the agent configuration and the standard is 480x640 pixels. This size is too large for neural network processing so it should be reduced. This variable determines the percentage of image size reduction, i.e. 10 means that it is reduced to 10% of its original size, so in the default size the image is reduced to 48x64 pixels. - -- new_image_size: 32. It gives us another way of reducing the image for processing in neural networks. In this case, the parameter determined here generates an image of size number x number, i.e., 32x32, 64x64... which is more efficient for processing in neural networks. - -- raw_image: False. It is a Boolean variable that, if True, takes as input state of the neural network, the raw image obtained by the camera sensor. If this variable is False, the image obtained will be preprocessed and converted to black and white to obtain the necessary information and then it will be reduced in size to feed the neural network. - -- State_space: image or sp1, sp3... gives us the distance in pixels down from line that marks the horizon of the road. 
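
(Illustration only, not part of the patch.) For the DQN described above, acting greedily reduces to an argmax over the network's predicted Q-values for the discrete actions. The stand-alone sketch below mirrors the small simplified-perception network defined later in `rl_studio/algorithms/dqn_keras.py` (two 16-unit ReLU layers, MSE loss, Adam 0.005), but is only an example:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Tiny stand-in for a DQN on low-dimensional (simplified-perception) states:
# the network maps a state vector to one Q-value per discrete action.
STATE_SIZE, N_ACTIONS = 4, 3
model = tf.keras.Sequential([
    layers.Input(shape=(STATE_SIZE,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(N_ACTIONS, activation="linear"),
])
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(0.005))

state = np.zeros((1, STATE_SIZE), dtype=np.float32)
q_values = model.predict(state, verbose=0)[0]
action = int(np.argmax(q_values))   # greedy discrete action
```
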
+--- +## How to config and launch +If you want to config and training or inferencing, please go to [agents](../agents/README.md) section \ No newline at end of file diff --git a/rl_studio/algorithms/__init__.py b/rl_studio/algorithms/__init__.py index 2b66deb44..9f574ba4d 100755 --- a/rl_studio/algorithms/__init__.py +++ b/rl_studio/algorithms/__init__.py @@ -2,6 +2,7 @@ from rl_studio.algorithms.exceptions import NoValidAlgorithmType import pickle + class TrainerFactory: def __init__(self, **kwargs): self.algorithm = kwargs.get("algorithm") @@ -20,7 +21,19 @@ def __new__(cls, config): actions_file = open(actions_file_name, "rb") actions = pickle.load(actions_file) - brain = QLearn(config) + brain = QLearn(config, epsilon=0) + brain.load_model(inference_file_name, actions) + + return brain + + if algorithm == AlgorithmsType.DEPRECATED_QLEARN.value: + from rl_studio.algorithms.qlearn import QLearn + + actions_file_name = config.actions_file + actions_file = open(actions_file_name, "rb") + actions = pickle.load(actions_file) + + brain = QLearn(config, epsilon=0.05) brain.load_model(inference_file_name, actions) return brain @@ -46,7 +59,19 @@ def __new__(cls, config): return brain - elif algorithm == AlgorithmsType.DDPG_TORCH.value: + elif algorithm == AlgorithmsType.PPO_CONTINIUOUS.value: + from rl_studio.algorithms.ppo_continuous import PPO + + input_dim = config.env.observation_space.shape[0] + output_dim = config.env.action_space.shape[0] + + brain = PPO(input_dim, output_dim, None, None, None, None, None, + True, None) + brain.load(inference_file_name) + + return brain + + elif algorithm == AlgorithmsType.DDPG.value: from rl_studio.algorithms.ddpg_torch import Actor brain = Actor() diff --git a/rl_studio/algorithms/algorithms_type.py b/rl_studio/algorithms/algorithms_type.py index e4a76fbdf..fa095c694 100755 --- a/rl_studio/algorithms/algorithms_type.py +++ b/rl_studio/algorithms/algorithms_type.py @@ -4,9 +4,10 @@ class AlgorithmsType(Enum): PROGRAMMATIC = 'programmatic' QLEARN = "qlearn" - QLEARN_MULTIPLE = "qlearn_multiple_states" + DEPRECATED_QLEARN = "qlearn_deprecated" DQN = "dqn" DDPG = "ddpg" DDPG_TORCH = "ddpg_torch" PPO = "ppo" + PPO_CONTINIUOUS = 'ppo_continuous' MANUAL = "manual" diff --git a/rl_studio/algorithms/ddpg.py b/rl_studio/algorithms/ddpg.py index 984fd93c2..d289145bb 100644 --- a/rl_studio/algorithms/ddpg.py +++ b/rl_studio/algorithms/ddpg.py @@ -13,10 +13,16 @@ Flatten, Rescaling, ) -from tensorflow.keras.models import Sequential +from tensorflow.keras.models import Sequential, load_model from tensorflow.keras.optimizers import Adam +# Sharing GPU +gpus = tf.config.experimental.list_physical_devices("GPU") +for gpu in gpus: + tf.config.experimental.set_memory_growth(gpu, True) + + # Own Tensorboard class class ModifiedTensorBoard(TensorBoard): @@ -260,7 +266,7 @@ def __init__(self, config, action_space_size, observation_space_values, outdir): self.ACTION_SPACE_SIZE = action_space_size self.OBSERVATION_SPACE_VALUES = observation_space_values - if config["state_space"] == "image": + if config["states"] == "image": self.OBSERVATION_SPACE_VALUES_FLATTEN = ( observation_space_values[0] * observation_space_values[1] @@ -269,10 +275,10 @@ def __init__(self, config, action_space_size, observation_space_values, outdir): # Continuous Actions if config["action_space"] == "continuous": - self.V_UPPER_BOUND = config["actions"]["v_max"] - self.V_LOWER_BOUND = config["actions"]["v_min"] - self.W_RIGHT_BOUND = config["actions"]["w_right"] - self.W_LEFT_BOUND = 
config["actions"]["w_left"] + self.V_UPPER_BOUND = config["actions"]["v"][1] + self.V_LOWER_BOUND = config["actions"]["v"][0] + self.W_RIGHT_BOUND = config["actions"]["w"][0] + self.W_LEFT_BOUND = config["actions"]["w"][1] # NN settings self.MODEL_NAME = config["model_name"] @@ -283,54 +289,104 @@ def __init__(self, config, action_space_size, observation_space_values, outdir): # Custom tensorboard object self.tensorboard = ModifiedTensorBoard( - log_dir=f"{outdir}/logs_TensorBoard/{self.MODEL_NAME}-{time.strftime('%Y%m%d-%H%M%S')}" + log_dir=f"{outdir}/{self.MODEL_NAME}-{time.strftime('%Y%m%d-%H%M%S')}" ) # Used to count when to update target network with main network's weights self.target_update_counter = 0 - # Actor & Critic main models # gets trained every step - if config["action_space"] == "continuous" and config["state_space"] != "image": - self.actor_model = self.get_actor_model_sp_continuous_actions() - self.critic_model = self.get_critic_model_sp_continuous_actions() - # Actor Target model this is what we .predict against every step - self.target_actor = self.get_actor_model_sp_continuous_actions() - self.target_actor.set_weights(self.actor_model.get_weights()) - # Critic Target model this is what we .predict against every step - self.target_critic = self.get_critic_model_sp_continuous_actions() - self.target_critic.set_weights(self.critic_model.get_weights()) - - elif ( - config["action_space"] != "continuous" and config["state_space"] != "image" - ): - self.actor_model = ( - self.get_actor_model_simplified_perception_discrete_actions() - ) - self.critic_model = ( - self.get_critic_model_simplified_perception_discrete_actions() - ) - # Actor Target model this is what we .predict against every step - self.target_actor = ( - self.get_actor_model_simplified_perception_discrete_actions() + # load pretrained model for continuing training (not inference) + if config["mode"] == "retraining": + print("---------------------- entry load retrained model") + print(f"{outdir}/{config['retrain_ddpg_tf_actor_model_name']}") + print(f"{outdir}/{config['retrain_ddpg_tf_critic_model_name']}") + # load pretrained actor and critic models + actor_retrained_model = ( + f"{outdir}/{config['retrain_ddpg_tf_actor_model_name']}" ) - self.target_actor.set_weights(self.actor_model.get_weights()) - # Critic Target model this is what we .predict against every step - self.target_critic = ( - self.get_critic_model_simplified_perception_discrete_actions() + critic_retrained_model = ( + f"{outdir}/{config['retrain_ddpg_tf_critic_model_name']}" ) - self.target_critic.set_weights(self.critic_model.get_weights()) - - elif ( - config["action_space"] == "continuous" and config["state_space"] == "image" - ): - self.actor_model = self.get_actor_model_image_continuous_actions() - self.critic_model = self.get_critic_model_image_continuous_actions_conv() - # Actor Target model this is what we .predict against every step - self.target_actor = self.get_actor_model_image_continuous_actions() - self.target_actor.set_weights(self.actor_model.get_weights()) - # Critic Target model this is what we .predict against every step - self.target_critic = self.get_critic_model_image_continuous_actions_conv() - self.target_critic.set_weights(self.critic_model.get_weights()) + self.actor_model = load_model(actor_retrained_model, compile=False) + self.critic_model = load_model(critic_retrained_model, compile=False) + self.target_actor = load_model(actor_retrained_model, compile=False) + self.target_critic = 
load_model(critic_retrained_model, compile=False) + + else: # training from scratch + # Actor & Critic main models # gets trained every step + if config["action_space"] == "continuous" and config["states"] != "image": + self.actor_model = self.get_actor_model_sp_continuous_actions() + self.critic_model = self.get_critic_model_sp_continuous_actions() + # Actor Target model this is what we .predict against every step + self.target_actor = self.get_actor_model_sp_continuous_actions() + self.target_actor.set_weights(self.actor_model.get_weights()) + # Critic Target model this is what we .predict against every step + self.target_critic = self.get_critic_model_sp_continuous_actions() + self.target_critic.set_weights(self.critic_model.get_weights()) + + elif config["action_space"] != "continuous" and config["states"] != "image": + self.actor_model = ( + self.get_actor_model_simplified_perception_discrete_actions() + ) + self.critic_model = ( + self.get_critic_model_simplified_perception_discrete_actions() + ) + # Actor Target model this is what we .predict against every step + self.target_actor = ( + self.get_actor_model_simplified_perception_discrete_actions() + ) + self.target_actor.set_weights(self.actor_model.get_weights()) + # Critic Target model this is what we .predict against every step + self.target_critic = ( + self.get_critic_model_simplified_perception_discrete_actions() + ) + self.target_critic.set_weights(self.critic_model.get_weights()) + + elif config["action_space"] == "continuous" and config["states"] == "image": + self.actor_model = self.get_actor_model_image_continuous_actions() + self.critic_model = ( + self.get_critic_model_image_continuous_actions_conv() + ) + # Actor Target model this is what we .predict against every step + self.target_actor = self.get_actor_model_image_continuous_actions() + self.target_actor.set_weights(self.actor_model.get_weights()) + # Critic Target model this is what we .predict against every step + self.target_critic = ( + self.get_critic_model_image_continuous_actions_conv() + ) + self.target_critic.set_weights(self.critic_model.get_weights()) + + else: + ############## + # TODO: create specific models for State=image and actions=discrete + self.actor_model = ( + self.get_actor_model_simplified_perception_discrete_actions() + ) + self.critic_model = ( + self.get_critic_model_simplified_perception_discrete_actions() + ) + # Actor Target model this is what we .predict against every step + self.target_actor = ( + self.get_actor_model_simplified_perception_discrete_actions() + ) + self.target_actor.set_weights(self.actor_model.get_weights()) + # Critic Target model this is what we .predict against every step + self.target_critic = ( + self.get_critic_model_simplified_perception_discrete_actions() + ) + self.target_critic.set_weights(self.critic_model.get_weights()) + + def load_inference_model(self, models_dir, config): + """ + we work with actor_model. Try also target_actor + """ + path_actor_inference_model = ( + f"{models_dir}/{config['inference_ddpg_tf_actor_model_name']}" + ) + actor_inference_model = load_model(path_actor_inference_model, compile=False) + # critic_inference_model = load_model(path_critic_inference_model, compile=False) + + return actor_inference_model # This update target parameters slowly # Based on rate `tau`, which is much less than one. 
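
(Illustration only, not part of the patch.) The comment above refers to the soft (Polyak) update of the target networks; the body of that update falls outside this hunk, but in TensorFlow it typically looks like the sketch below, with `tau` well below one so the targets trail the online networks slowly rather than being copied outright:

```python
import tensorflow as tf

@tf.function
def soft_update(target_weights, source_weights, tau=0.005):
    # Polyak averaging: target <- tau * source + (1 - tau) * target,
    # applied variable-by-variable.
    for target_var, source_var in zip(target_weights, source_weights):
        target_var.assign(tau * source_var + (1.0 - tau) * target_var)

# Hypothetical usage, with names following the DDPG class above:
# soft_update(agent.target_actor.variables, agent.actor_model.variables, tau)
# soft_update(agent.target_critic.variables, agent.critic_model.variables, tau)
```
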
@@ -409,6 +465,7 @@ def get_critic_model_simplified_perception_discrete_actions(self): outputs = layers.Dense(self.ACTION_SPACE_SIZE)(out) # Outputs single value for give state-action model = tf.keras.Model([state_input, action_input], outputs) + model.compile(loss="mse", optimizer=Adam(0.005)) return model @@ -446,25 +503,29 @@ def get_actor_model_image_continuous_actions(self): model = Model( inputs=inputs, outputs=[v_branch, w_branch], name="continuous_two_actions" ) + model.compile(loss="mse", optimizer=Adam(0.005)) + # return the constructed network architecture return model def build_branch_images(self, inputs, action_name): + neuron1 = 32 # 32, 64, 128 + neuron2 = 64 # 64, 128, 256 last_init = tf.random_uniform_initializer(minval=-0.01, maxval=0.01) x = Rescaling(1.0 / 255)(inputs) - x = Conv2D(32, (3, 3), padding="same")(x) + x = Conv2D(neuron1, (3, 3), padding="same")(x) # x = Conv2D(32, (3, 3), padding="same")(inputs) x = Activation("relu")(x) x = MaxPooling2D(pool_size=(3, 3))(x) x = Dropout(0.25)(x) - x = Conv2D(64, (3, 3), padding="same")(x) + x = Conv2D(neuron2, (3, 3), padding="same")(x) x = Activation("relu")(x) x = MaxPooling2D(pool_size=(2, 2))(x) x = Dropout(0.25)(x) x = Flatten()(x) - x = Dense(64)(x) + x = Dense(neuron2)(x) x = Dense(1, activation="tanh", kernel_initializer=last_init)(x) x = Activation("tanh", name=action_name)(x) @@ -486,7 +547,8 @@ def get_critic_model_image_continuous_actions_conv(self): state_out = Dropout(0.25)(state_out) state_out = Flatten()(state_out) """ - + neuron1 = 128 # 32, 64 + neuron2 = 256 # 64, 128 # Next NN is the same as actor net state_out = Rescaling(1.0 / 255)(state_input) state_out = Conv2D(32, (3, 3), padding="same")(state_out) @@ -517,6 +579,7 @@ def get_critic_model_image_continuous_actions_conv(self): # Outputs single value for give state-action model = tf.keras.Model([state_input, action_input_v, action_input_w], outputs) + model.compile(loss="mse", optimizer=Adam(0.005)) return model @@ -545,6 +608,7 @@ def get_critic_model_image_continuous_actions(self): # Outputs single value for give state-action model = tf.keras.Model([state_input, action_input_v, action_input_w], outputs) + model.compile(loss="mse", optimizer=Adam(0.005)) return model @@ -563,6 +627,8 @@ def get_actor_model_sp_continuous_actions(self): model = Model( inputs=inputs, outputs=[v_branch, w_branch], name="continuous_two_actions" ) + model.compile(loss="mse", optimizer=Adam(0.005)) + # return the constructed network architecture return model @@ -601,5 +667,6 @@ def get_critic_model_sp_continuous_actions(self): # Outputs single value for given state-action model = tf.keras.Model([state_input, action_input_v, action_input_w], outputs) + model.compile(loss="mse", optimizer=Adam(0.005)) return model diff --git a/rl_studio/algorithms/ddpg_torch.py b/rl_studio/algorithms/ddpg_torch.py index fcd029ccf..a6526f5cd 100644 --- a/rl_studio/algorithms/ddpg_torch.py +++ b/rl_studio/algorithms/ddpg_torch.py @@ -128,15 +128,15 @@ def get_action(self, state, step=None, explore=True): if explore: action = self.noise.get_action(action, step) - return action if isinstance(action, Sequence) else [action] + return action def forward(self, state): """ Param state is a torch tensor """ - x = F.relu(self.linear1(state)) - x = F.relu(self.linear2(x)) - x = torch.tanh(self.linear3(x)) + x = torch.tanh(self.linear1(state)) + x = torch.tanh(self.linear2(x)) + x = self.linear3(x) return x @@ -151,4 +151,4 @@ def load_model(self, actor_file_path): print(f"\n\nMODEL LOADED.") def 
inference(self, state): - return self.get_action(state, explore=False) + return [self.get_action(state, explore=False)] diff --git a/rl_studio/algorithms/dqn_keras.py b/rl_studio/algorithms/dqn_keras.py index b7f180c9f..4251451f1 100644 --- a/rl_studio/algorithms/dqn_keras.py +++ b/rl_studio/algorithms/dqn_keras.py @@ -23,6 +23,280 @@ import rl_studio.algorithms.memory as memory +# Sharing GPU +gpus = tf.config.experimental.list_physical_devices("GPU") +for gpu in gpus: + tf.config.experimental.set_memory_growth(gpu, True) + + +class ModifiedTensorBoard(TensorBoard): + """For TensorFlow >= 2.4.1. This version is different from ModifiedTensorBoard_old_version""" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.step = 1 + self.writer = tf.summary.create_file_writer(self.log_dir) + self._log_write_dir = self.log_dir + + def set_model(self, model): + self.model = model + + self._train_dir = os.path.join(self._log_write_dir, "train") + self._train_step = self.model._train_counter + + self._val_dir = os.path.join(self._log_write_dir, "validation") + self._val_step = self.model._test_counter + + self._should_write_train_graph = False + + def on_epoch_end(self, epoch, logs=None): + self.update_stats(**logs) + + def on_batch_end(self, batch, logs=None): + pass + + def on_train_end(self, _): + pass + + def update_stats(self, **stats): + with self.writer.as_default(): + for key, value in stats.items(): + tf.summary.scalar(key, value, step=self.step) + self.step += 1 + self.writer.flush() + + +class DQN: + def __init__( + self, environment, algorithm, actions_size, state_size, outdir, global_params + ): + + self.ACTION_SIZE = actions_size + self.STATE_SIZE = state_size + # self.OBSERVATION_SPACE_SHAPE = config.OBSERVATION_SPACE_SHAPE + + # DQN settings + self.REPLAY_MEMORY_SIZE = ( + algorithm.replay_memory_size + ) # How many last steps to keep for model training + self.MIN_REPLAY_MEMORY_SIZE = ( + algorithm.min_replay_memory_size + ) # Minimum number of steps in a memory to start training + self.MINIBATCH_SIZE = ( + algorithm.minibatch_size + ) # How many steps (samples) to use for training + self.UPDATE_TARGET_EVERY = ( + algorithm.update_target_every + ) # Terminal states (end of episodes) + self.MODEL_NAME = algorithm.model_name + self.DISCOUNT = algorithm.gamma # gamma: min 0 - max 1 + + self.state_space = global_params.states + + # load pretrained model for continuing training (not inference) + if environment["mode"] == "retraining": + print("---------------------- entry load retrained model") + print(f"{outdir}/{environment['retrain_dqn_tf_model_name']}") + # load pretrained actor and critic models + dqn_retrained_model = f"{outdir}/{environment['retrain_dqn_tf_model_name']}" + self.model = load_model(dqn_retrained_model, compile=False) + self.target_model = load_model(dqn_retrained_model, compile=False) + + else: + # main model + # # gets trained every step + if global_params.states == "image": + self.model = self.get_model_conv2D() + # Target model this is what we .predict against every step + self.target_model = self.get_model_conv2D() + self.target_model.set_weights(self.model.get_weights()) + else: + self.model = self.get_model_simplified_perception() + # Target model this is what we .predict against every step + self.target_model = self.get_model_simplified_perception() + self.target_model.set_weights(self.model.get_weights()) + + # An array with last n steps for training + self.replay_memory = deque(maxlen=self.REPLAY_MEMORY_SIZE) + + # Custom tensorboard object + 
self.tensorboard = ModifiedTensorBoard( + log_dir=f"{global_params.logs_tensorboard_dir}/{algorithm.model_name}-{time.strftime('%Y%m%d-%H%M%S')}" + ) + + # Used to count when to update target network with main network's weights + self.target_update_counter = 0 + + def load_inference_model(self, models_dir, config): + """ """ + path_inference_model = ( + f"{models_dir}/{config['inference_dqn_tf_model_name']}" + ) + inference_model = load_model(path_inference_model, compile=False) + # critic_inference_model = load_model(path_critic_inference_model, compile=False) + + return inference_model + + def get_model_simplified_perception(self): + """ + simple model with 2 layers. Using for Simplified Perception + """ + neurons1 = 16 # 32, 64, 256, 400... + neurons2 = 16 # 32, 64, 256, 300... + loss = "mse" + optimizing = 0.005 + + inputs = layers.Input(shape=(self.STATE_SIZE)) + out = layers.Dense(neurons1, activation="relu")(inputs) + out = layers.Dense(neurons2, activation="relu")(out) + outputs = layers.Dense(self.ACTION_SIZE, activation="linear")(out) + model = tf.keras.Model(inputs, outputs) + model.compile(loss=loss, optimizer=Adam(optimizing)) + return model + + def create_model_no_image(self): + model = Sequential() + model.add( + Dense( + 20, input_shape=(2,) + self.OBSERVATION_SPACE_SHAPE, activation="relu" + ) + ) + model.add(Flatten()) + model.add(Dense(18, activation="relu")) + model.add(Dense(10, activation="relu")) + model.add(Dense(self.ACTION_SPACE_SIZE, activation="linear")) + model.compile(loss="mse", optimizer="adam", metrics=["accuracy"]) + return model + + def get_model_conv2D_original(self): + print(f"self.STATE_SIZE:{self.STATE_SIZE}") + model = Sequential() + # model.add(Conv2D(256, (3, 3), input_shape=(2,) + self.OBSERVATION_SPACE_SHAPE)) + model.add(Conv2D(256, (3, 3), input_shape=self.STATE_SIZE)) + model.add(Activation("relu")) + model.add(MaxPooling2D(pool_size=(2, 2))) + model.add(Dropout(0.2)) + model.add(Conv2D(256, (3, 3))) + model.add(Activation("relu")) + model.add(MaxPooling2D(pool_size=(2, 2))) + model.add(Dropout(0.2)) + model.add(Flatten()) + model.add(Dense(64)) + model.add(Dense(self.ACTION_SIZE, activation="linear")) + model.compile( + loss="mse", optimizer=Adam(learning_rate=0.001), metrics=["accuracy"] + ) + return model + + def get_model_conv2D(self): + last_init = tf.random_uniform_initializer(minval=-0.01, maxval=0.01) + inputs = Input(shape=self.STATE_SIZE) + x = Rescaling(1.0 / 255)(inputs) + x = Conv2D(32, (3, 3), padding="same")(x) + # x = Conv2D(32, (3, 3), padding="same")(inputs) + x = Activation("relu")(x) + x = MaxPooling2D(pool_size=(3, 3), padding="same")(x) + x = Dropout(0.25)(x) + + x = Conv2D(64, (3, 3), padding="same")(x) + x = Activation("relu")(x) + x = MaxPooling2D(pool_size=(2, 2), padding="same")(x) + x = Dropout(0.25)(x) + + x = Flatten()(x) + x = Dense(64)(x) + + x = Dense(self.ACTION_SIZE, activation="tanh", kernel_initializer=last_init)(x) + # x = Activation("tanh", name=action_name)(x) + model = Model(inputs=inputs, outputs=x, name="conv2D") + model.compile( + loss="mse", optimizer=Adam(learning_rate=0.001), metrics=["accuracy"] + ) + return model + + def update_replay_memory(self, transition): + self.replay_memory.append(transition) + + def get_qs(self, state): + if self.state_space == "image": + return self.model.predict(np.array(state).reshape(-1, *state.shape) / 255)[ + 0 + ] + else: + return self.model.predict(state)[0] + + # Trains main network every step during episode + def train(self, terminal_state, step): + + # Start 
training only if certain number of samples is already saved + if len(self.replay_memory) < self.MIN_REPLAY_MEMORY_SIZE: + return + + # Get a minibatch of random samples from memory replay table + minibatch = random.sample(self.replay_memory, self.MINIBATCH_SIZE) + # Get current states from minibatch, then query NN model for Q values + current_states = np.array([transition[0] for transition in minibatch]) / 255 + current_qs_list = self.model.predict(current_states) + + # Get future states from minibatch, then query NN model for Q values + # When using target network, query it, otherwise main network should be queried + new_current_states = np.array([transition[3] for transition in minibatch]) / 255 + future_qs_list = self.target_model.predict(new_current_states) + + X = [] # thats the image input + y = [] # thats the label or action to take + + # Now we need to enumerate our batches + for index, ( + current_state, + action, + reward, + new_current_state, + done, + ) in enumerate(minibatch): + + # If not a terminal state, get new q from future states, otherwise set it to 0 + # almost like with Q Learning, but we use just part of equation here + if not done: + max_future_q = np.max(future_qs_list[index]) + new_q = reward + self.DISCOUNT * max_future_q + else: + new_q = reward + + # Update Q value for given state + current_qs = current_qs_list[index] + current_qs[action] = new_q + + # And append to our training data + X.append(current_state) # image + y.append(current_qs) # q_value which is Action to take + + # Fit on all samples as one batch, log only on terminal state + self.model.fit( + np.array(X) / 255, + np.array(y), + batch_size=self.MINIBATCH_SIZE, + verbose=0, + shuffle=False, + callbacks=[self.tensorboard] if terminal_state else None, + ) + + # Update target network counter every episode + if terminal_state: + self.target_update_counter += 1 + + # If counter reaches set value, update target network with weights of main network + if self.target_update_counter > self.UPDATE_TARGET_EVERY: + self.target_model.set_weights(self.model.get_weights()) + self.target_update_counter = 0 + + +##################################################################################### +# +# DQN +# +##################################################################################### + class DeepQ: """ @@ -281,241 +555,3 @@ def _write_logs(self, logs, index): summary_value.tag = name self.writer.add_summary(summary, index) self.writer.flush() - - -class ModifiedTensorBoard(TensorBoard): - """For TensorFlow >= 2.4.1. 
This version is different from previous ModifiedTensorBoard_old_version""" - - def __init__(self, **kwargs): - super().__init__(**kwargs) - self.step = 1 - self.writer = tf.summary.create_file_writer(self.log_dir) - self._log_write_dir = self.log_dir - - def set_model(self, model): - self.model = model - - self._train_dir = os.path.join(self._log_write_dir, "train") - self._train_step = self.model._train_counter - - self._val_dir = os.path.join(self._log_write_dir, "validation") - self._val_step = self.model._test_counter - - self._should_write_train_graph = False - - def on_epoch_end(self, epoch, logs=None): - self.update_stats(**logs) - - def on_batch_end(self, batch, logs=None): - pass - - def on_train_end(self, _): - pass - - def update_stats(self, **stats): - with self.writer.as_default(): - for key, value in stats.items(): - tf.summary.scalar(key, value, step=self.step) - self.step += 1 - self.writer.flush() - - -class DQNF1FollowLine: - def __init__(self, config, actions_size, state_size, outdir): - - self.ACTION_SIZE = actions_size - self.STATE_SIZE = state_size - # self.OBSERVATION_SPACE_SHAPE = config.OBSERVATION_SPACE_SHAPE - - # DQN settings - self.REPLAY_MEMORY_SIZE = config[ - "replay_memory_size" - ] # How many last steps to keep for model training - self.MIN_REPLAY_MEMORY_SIZE = config[ - "min_replay_memory_size" - ] # Minimum number of steps in a memory to start training - self.MINIBATCH_SIZE = config[ - "minibatch_size" - ] # How many steps (samples) to use for training - self.UPDATE_TARGET_EVERY = config[ - "update_target_every" - ] # Terminal states (end of episodes) - self.MODEL_NAME = config["model_name"] - self.DISCOUNT = config["gamma"] # gamma: min 0 - max 1 - - self.state_space = config["state_space"] - # main model # gets trained every step - if config["state_space"] == "image": - self.model = self.get_model_conv2D() - # Target model this is what we .predict against every step - self.target_model = self.get_model_conv2D() - self.target_model.set_weights(self.model.get_weights()) - else: - self.model = self.get_model_simplified_perception() - # Target model this is what we .predict against every step - self.target_model = self.get_model_simplified_perception() - self.target_model.set_weights(self.model.get_weights()) - - # An array with last n steps for training - self.replay_memory = deque(maxlen=self.REPLAY_MEMORY_SIZE) - - # Custom tensorboard object - self.tensorboard = ModifiedTensorBoard( - log_dir=f"{outdir}/logs_TensorBoard/{self.MODEL_NAME}-{time.strftime('%Y%m%d-%H%M%S')}" - ) - - # Used to count when to update target network with main network's weights - self.target_update_counter = 0 - - def get_model_simplified_perception(self): - """ - simple model with 2 layers. Using for Simplified Perception - """ - neurons1 = 16 # 32, 64, 256, 400... - neurons2 = 16 # 32, 64, 256, 300... 
- loss = "mse" - optimizing = 0.005 - - inputs = layers.Input(shape=(self.STATE_SIZE)) - out = layers.Dense(neurons1, activation="relu")(inputs) - out = layers.Dense(neurons2, activation="relu")(out) - outputs = layers.Dense(self.ACTION_SIZE, activation="linear")(out) - model = tf.keras.Model(inputs, outputs) - model.compile(loss=loss, optimizer=Adam(optimizing)) - return model - - def create_model_no_image(self): - model = Sequential() - model.add( - Dense( - 20, input_shape=(2,) + self.OBSERVATION_SPACE_SHAPE, activation="relu" - ) - ) - model.add(Flatten()) - model.add(Dense(18, activation="relu")) - model.add(Dense(10, activation="relu")) - model.add(Dense(self.ACTION_SPACE_SIZE, activation="linear")) - model.compile(loss="mse", optimizer="adam", metrics=["accuracy"]) - return model - - def get_model_conv2D_original(self): - print(f"self.STATE_SIZE:{self.STATE_SIZE}") - model = Sequential() - # model.add(Conv2D(256, (3, 3), input_shape=(2,) + self.OBSERVATION_SPACE_SHAPE)) - model.add(Conv2D(256, (3, 3), input_shape=self.STATE_SIZE)) - model.add(Activation("relu")) - model.add(MaxPooling2D(pool_size=(2, 2))) - model.add(Dropout(0.2)) - model.add(Conv2D(256, (3, 3))) - model.add(Activation("relu")) - model.add(MaxPooling2D(pool_size=(2, 2))) - model.add(Dropout(0.2)) - model.add(Flatten()) - model.add(Dense(64)) - model.add(Dense(self.ACTION_SIZE, activation="linear")) - model.compile( - loss="mse", optimizer=Adam(learning_rate=0.001), metrics=["accuracy"] - ) - return model - - def get_model_conv2D(self): - last_init = tf.random_uniform_initializer(minval=-0.01, maxval=0.01) - inputs = Input(shape=self.STATE_SIZE) - x = Rescaling(1.0 / 255)(inputs) - x = Conv2D(32, (3, 3), padding="same")(x) - # x = Conv2D(32, (3, 3), padding="same")(inputs) - x = Activation("relu")(x) - x = MaxPooling2D(pool_size=(3, 3), padding="same")(x) - x = Dropout(0.25)(x) - - x = Conv2D(64, (3, 3), padding="same")(x) - x = Activation("relu")(x) - x = MaxPooling2D(pool_size=(2, 2), padding="same")(x) - x = Dropout(0.25)(x) - - x = Flatten()(x) - x = Dense(64)(x) - - x = Dense(self.ACTION_SIZE, activation="tanh", kernel_initializer=last_init)(x) - # x = Activation("tanh", name=action_name)(x) - model = Model(inputs=inputs, outputs=x, name="conv2D") - model.compile( - loss="mse", optimizer=Adam(learning_rate=0.001), metrics=["accuracy"] - ) - return model - - def update_replay_memory(self, transition): - self.replay_memory.append(transition) - - def get_qs(self, state): - if self.state_space == "image": - return self.model.predict(np.array(state).reshape(-1, *state.shape) / 255)[ - 0 - ] - else: - return self.model.predict(state)[0] - - # Trains main network every step during episode - def train(self, terminal_state, step): - - # Start training only if certain number of samples is already saved - if len(self.replay_memory) < self.MIN_REPLAY_MEMORY_SIZE: - return - - # Get a minibatch of random samples from memory replay table - minibatch = random.sample(self.replay_memory, self.MINIBATCH_SIZE) - # Get current states from minibatch, then query NN model for Q values - current_states = np.array([transition[0] for transition in minibatch]) / 255 - current_qs_list = self.model.predict(current_states) - - # Get future states from minibatch, then query NN model for Q values - # When using target network, query it, otherwise main network should be queried - new_current_states = np.array([transition[3] for transition in minibatch]) / 255 - future_qs_list = self.target_model.predict(new_current_states) - - X = [] # thats 
the image input - y = [] # thats the label or action to take - - # Now we need to enumerate our batches - for index, ( - current_state, - action, - reward, - new_current_state, - done, - ) in enumerate(minibatch): - - # If not a terminal state, get new q from future states, otherwise set it to 0 - # almost like with Q Learning, but we use just part of equation here - if not done: - max_future_q = np.max(future_qs_list[index]) - new_q = reward + self.DISCOUNT * max_future_q - else: - new_q = reward - - # Update Q value for given state - current_qs = current_qs_list[index] - current_qs[action] = new_q - - # And append to our training data - X.append(current_state) # image - y.append(current_qs) # q_value which is Action to take - - # Fit on all samples as one batch, log only on terminal state - self.model.fit( - np.array(X) / 255, - np.array(y), - batch_size=self.MINIBATCH_SIZE, - verbose=0, - shuffle=False, - callbacks=[self.tensorboard] if terminal_state else None, - ) - - # Update target network counter every episode - if terminal_state: - self.target_update_counter += 1 - - # If counter reaches set value, update target network with weights of main network - if self.target_update_counter > self.UPDATE_TARGET_EVERY: - self.target_model.set_weights(self.model.get_weights()) - self.target_update_counter = 0 diff --git a/rl_studio/algorithms/ppo.py b/rl_studio/algorithms/ppo.py index 54cdf37c0..9d7bb5c71 100644 --- a/rl_studio/algorithms/ppo.py +++ b/rl_studio/algorithms/ppo.py @@ -46,8 +46,8 @@ def train(self, w, prev_prob_act, prob_act, advantage, global_steps, epsilon): self.adam_actor.zero_grad() actor_loss.backward() # clip_grad_norm_(adam_actor, max_grad_norm) - w.add_histogram("gradients/actor", - torch.cat([p.grad.view(-1) for p in self.parameters()]), global_step=global_steps) + # w.add_histogram("gradients/actor", + # torch.cat([p.grad.view(-1) for p in self.parameters()]), global_step=global_steps) self.adam_actor.step() return actor_loss @@ -99,8 +99,8 @@ def train(self, w, advantage, global_steps): self.adam_critic.zero_grad() critic_loss.backward() # clip_grad_norm_(adam_critic, max_grad_norm) - w.add_histogram("gradients/critic", - torch.cat([p.data.view(-1) for p in self.parameters()]), global_step=global_steps) + # w.add_histogram("gradients/critic", + # torch.cat([p.data.view(-1) for p in self.parameters()]), global_step=global_steps) self.adam_critic.step() return critic_loss diff --git a/rl_studio/algorithms/ppo_continuous.py b/rl_studio/algorithms/ppo_continuous.py new file mode 100644 index 000000000..1b4cdbe73 --- /dev/null +++ b/rl_studio/algorithms/ppo_continuous.py @@ -0,0 +1,253 @@ +import torch +import torch.nn as nn +from torch.distributions import MultivariateNormal +from torch.distributions import Categorical + +################################## set device ################################## +print("============================================================================================") +# set device to cpu or cuda +device = torch.device('cpu') +if (torch.cuda.is_available()): + device = torch.device('cuda:0') + torch.cuda.empty_cache() + print("Device set to : " + str(torch.cuda.get_device_name(device))) +else: + print("Device set to : cpu") +print("============================================================================================") + + +################################## PPO Policy ################################## +class RolloutBuffer: + def __init__(self): + self.actions = [] + self.states = [] + self.logprobs = [] + self.rewards = [] + 
self.is_terminals = [] + + def clear(self): + del self.actions[:] + del self.states[:] + del self.logprobs[:] + del self.rewards[:] + del self.is_terminals[:] + + +class ActorCritic(nn.Module): + def __init__(self, state_dim, action_dim, has_continuous_action_space, action_std_init): + super(ActorCritic, self).__init__() + action_std_init = 0.0001 if action_std_init is None else action_std_init + + self.has_continuous_action_space = has_continuous_action_space + + if has_continuous_action_space: + self.action_dim = action_dim + self.action_var = torch.full((action_dim,), action_std_init * action_std_init).to(device) + # actor + if has_continuous_action_space: + self.actor = nn.Sequential( + nn.Linear(state_dim, 64), + nn.Tanh(), + nn.Linear(64, 64), + nn.Tanh(), + nn.Linear(64, action_dim), + ) + else: + self.actor = nn.Sequential( + nn.Linear(state_dim, 64), + nn.Tanh(), + nn.Linear(64, 64), + nn.Tanh(), + nn.Linear(64, action_dim), + nn.Softmax(dim=-1) + ) + # critic + self.critic = nn.Sequential( + nn.Linear(state_dim, 64), + nn.Tanh(), + nn.Linear(64, 64), + nn.Tanh(), + nn.Linear(64, 1) + ) + + def set_action_std(self, new_action_std): + if self.has_continuous_action_space: + self.action_var = torch.full((self.action_dim,), new_action_std * new_action_std).to(device) + else: + print("--------------------------------------------------------------------------------------------") + print("WARNING : Calling ActorCritic::set_action_std() on discrete action space policy") + print("--------------------------------------------------------------------------------------------") + + def forward(self): + raise NotImplementedError + + def act(self, state): + if self.has_continuous_action_space: + action_mean = self.actor(state) + cov_mat = torch.diag(self.action_var).unsqueeze(dim=0) + dist = MultivariateNormal(action_mean, cov_mat) + else: + action_probs = self.actor(state) + dist = Categorical(action_probs) + + action = dist.sample() + action_logprob = dist.log_prob(action) + + return action.detach(), action_logprob.detach() + + def evaluate(self, state, action): + + if self.has_continuous_action_space: + action_mean = self.actor(state) + + action_var = self.action_var.expand_as(action_mean) + cov_mat = torch.diag_embed(action_var).to(device) + dist = MultivariateNormal(action_mean, cov_mat) + + # For Single Action Environments. 
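+                # (with a 1-D continuous action space, the actions stacked and squeezed in update() arrive here with shape (N,), so they are reshaped to (N, 1) before MultivariateNormal.log_prob is evaluated)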
+ if self.action_dim == 1: + action = action.reshape(-1, self.action_dim) + else: + action_probs = self.actor(state) + dist = Categorical(action_probs) + action_logprobs = dist.log_prob(action) + dist_entropy = dist.entropy() + state_values = self.critic(state) + + return action_logprobs, state_values, dist_entropy + + +class PPO: + def __init__(self, state_dim, action_dim, lr_actor=0.0003, lr_critic=0.001, gamma=None, K_epochs=80, eps_clip=None, + has_continuous_action_space=True, action_std_init=None): + self.action_std_decay_rate = 0.05 # linearly decay action_std (action_std = action_std - action_std_decay_rate) + self.min_action_std = 0.1 # minimum action_std (stop decay after action_std <= min_action_std) + self.action_std_decay_freq = int(2.5e5) # action_std decay frequency (in num timesteps) + self.has_continuous_action_space = has_continuous_action_space + + if has_continuous_action_space: + self.action_std = action_std_init + + self.gamma = gamma + self.eps_clip = eps_clip + self.K_epochs = K_epochs + + self.buffer = RolloutBuffer() + + self.policy = ActorCritic(state_dim, action_dim, has_continuous_action_space, action_std_init).to(device) + self.optimizer = torch.optim.Adam([ + {'params': self.policy.actor.parameters(), 'lr': lr_actor}, + {'params': self.policy.critic.parameters(), 'lr': lr_critic} + ]) + + self.policy_old = ActorCritic(state_dim, action_dim, has_continuous_action_space, action_std_init).to(device) + self.policy_old.load_state_dict(self.policy.state_dict()) + + self.MseLoss = nn.MSELoss() + + def set_action_std(self, new_action_std): + if self.has_continuous_action_space: + self.action_std = new_action_std + self.policy.set_action_std(new_action_std) + self.policy_old.set_action_std(new_action_std) + else: + print("--------------------------------------------------------------------------------------------") + print("WARNING : Calling PPO::set_action_std() on discrete action space policy") + print("--------------------------------------------------------------------------------------------") + + def decay_action_std(self, action_std_decay_rate, min_action_std): + print("--------------------------------------------------------------------------------------------") + if self.has_continuous_action_space: + self.action_std = self.action_std - action_std_decay_rate + self.action_std = round(self.action_std, 4) + if (self.action_std <= min_action_std): + self.action_std = min_action_std + print("setting actor output action_std to min_action_std : ", self.action_std) + else: + print("setting actor output action_std to : ", self.action_std) + self.set_action_std(self.action_std) + + else: + print("WARNING : Calling PPO::decay_action_std() on discrete action space policy") + print("--------------------------------------------------------------------------------------------") + + def select_action(self, state): + + if self.has_continuous_action_space: + with torch.no_grad(): + state = torch.FloatTensor(state).to(device) + action, action_logprob = self.policy_old.act(state) + + self.buffer.states.append(state) + self.buffer.actions.append(action) + self.buffer.logprobs.append(action_logprob) + + return action.detach().cpu().numpy().flatten() + else: + with torch.no_grad(): + state = torch.FloatTensor(state).to(device) + action, action_logprob = self.policy_old.act(state) + + self.buffer.states.append(state) + self.buffer.actions.append(action) + self.buffer.logprobs.append(action_logprob) + + return action.item() + + def update(self): + # Monte Carlo estimate of returns + 
rewards = [] + discounted_reward = 0 + for reward, is_terminal in zip(reversed(self.buffer.rewards), reversed(self.buffer.is_terminals)): + if is_terminal: + discounted_reward = 0 + discounted_reward = reward + (self.gamma * discounted_reward) + rewards.insert(0, discounted_reward) + + # Normalizing the rewards + rewards = torch.tensor(rewards, dtype=torch.float32).to(device) + rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-7) + + # convert list to tensor + old_states = torch.squeeze(torch.stack(self.buffer.states, dim=0)).detach().to(device) + old_actions = torch.squeeze(torch.stack(self.buffer.actions, dim=0)).detach().to(device) + old_logprobs = torch.squeeze(torch.stack(self.buffer.logprobs, dim=0)).detach().to(device) + + # Optimize policy for K epochs + for _ in range(self.K_epochs): + # Evaluating old actions and values + logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions) + + # match state_values tensor dimensions with rewards tensor + state_values = torch.squeeze(state_values) + + # Finding the ratio (pi_theta / pi_theta__old) + ratios = torch.exp(logprobs - old_logprobs.detach()) + + # Finding Surrogate Loss + advantages = rewards - state_values.detach() + surr1 = ratios * advantages + surr2 = torch.clamp(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages + + # final loss of clipped objective PPO + loss = -torch.min(surr1, surr2) + 0.5 * self.MseLoss(state_values, rewards) - 0.01 * dist_entropy + + # take gradient step + self.optimizer.zero_grad() + loss.mean().backward() + self.optimizer.step() + + # Copy new weights into old policy + self.policy_old.load_state_dict(self.policy.state_dict()) + + # clear buffer + self.buffer.clear() + + def inference(self, state): + return self.select_action(state) + def save(self, checkpoint_path): + torch.save(self.policy_old.state_dict(), checkpoint_path) + + def load(self, checkpoint_path): + self.policy_old.load_state_dict(torch.load(checkpoint_path, map_location=lambda storage, loc: storage)) + self.policy.load_state_dict(torch.load(checkpoint_path, map_location=lambda storage, loc: storage)) diff --git a/rl_studio/algorithms/ppo_continuous_not_working.py b/rl_studio/algorithms/ppo_continuous_not_working.py new file mode 100644 index 000000000..c4e771ef0 --- /dev/null +++ b/rl_studio/algorithms/ppo_continuous_not_working.py @@ -0,0 +1,117 @@ +import numpy as np +import torch +import gym +from torch import nn +from torch.nn import functional as F +from torch.distributions import MultivariateNormal +import matplotlib.pyplot as plt +from torch.utils import tensorboard +import pickle + + +def mish(input): + return input * torch.tanh(F.softplus(input)) + +class Mish(nn.Module): + def __init__(self): super().__init__() + + def forward(self, input): return mish(input) + + +# helper function to convert numpy arrays to tensors +def t(x): + return torch.from_numpy(x).float() + +def set_device(): + device = torch.device('cpu') + if torch.cuda.is_available(): + device = torch.device('cuda:0') + torch.cuda.empty_cache() + print("Device set to : " + str(torch.cuda.get_device_name(device))) + else: + print("Device set to : cpu") + return device +class Actor(nn.Module): + def __init__(self, state_dim, n_actions, action_std_init, activation=nn.Tanh): + super().__init__() + self.model = nn.Sequential( + nn.Linear(state_dim, 64), + activation(), + nn.Linear(64, 32), + activation(), + nn.Linear(32, n_actions), + ) + self.adam_actor = torch.optim.Adam(self.parameters(), lr=3e-4) + torch.manual_seed(1) + 
self.device = set_device() + self.action_var = torch.full((n_actions,), action_std_init * action_std_init).to(self.device) + + def train(self, w, prev_prob_act, prob_act, advantage, global_steps, epsilon): + actor_loss = self.policy_loss(prev_prob_act.detach(), prob_act, advantage.detach(), epsilon) + w.add_scalar("loss/actor_loss", actor_loss, global_step=global_steps) + self.adam_actor.zero_grad() + actor_loss.backward() + # clip_grad_norm_(adam_actor, max_grad_norm) + # w.add_histogram("gradients/actor", + # torch.cat([p.grad.view(-1) for p in self.parameters()]), global_step=global_steps) + self.adam_actor.step() + return actor_loss + + def clip_grad_norm_(module, max_grad_norm): + nn.utils.clip_grad_norm_([p for g in module.param_groups for p in g["params"]], max_grad_norm) + + def policy_loss(self, old_log_prob, log_prob, advantage, eps): + ratio = (log_prob - old_log_prob).exp() + clipped = torch.clamp(ratio, 1 - eps, 1 + eps).to(self.device) * advantage.to(self.device) + + m = torch.min(ratio.to(self.device) * advantage.to(self.device), clipped.to(self.device)).to(self.device) + return -m + + def forward(self, X): + return self.model(X) + + def load_model(self, actor_file_path): + model = open(actor_file_path, "rb") + + self.model = pickle.load(model) + + print(f"\n\nMODEL LOADED.") + + def get_dist(self, x, var): + cov_mat = torch.diag(var).unsqueeze(dim=0).to(self.device) + return MultivariateNormal(x.to(self.device), cov_mat) + def inference(self, state): + action_mean = self.actor(state) + dist = self.get_dist(action_mean, self.action_var) + action = dist.sample().to(self.device) + + return action + + +# Critic module +class Critic(nn.Module): + def __init__(self, state_dim, activation=nn.Tanh): + super().__init__() + self.model = nn.Sequential( + nn.Linear(state_dim, 64), + activation(), + nn.Linear(64, 32), + activation(), + nn.Linear(32, 1) + ) + self.adam_critic = torch.optim.Adam(self.parameters(), lr=1e-3) + torch.manual_seed(1) + + def train(self, w, advantage, global_steps): + critic_loss = -advantage.mean() + w.add_scalar("loss/critic_loss", critic_loss, global_step=global_steps) + self.adam_critic.zero_grad() + critic_loss.backward() + # clip_grad_norm_(adam_critic, max_grad_norm) + # w.add_histogram("gradients/critic", + # torch.cat([p.data.view(-1) for p in self.parameters()]), global_step=global_steps) + self.adam_critic.step() + return critic_loss + + def forward(self, X): + return self.model(X) diff --git a/rl_studio/algorithms/qlearn.py b/rl_studio/algorithms/qlearn.py index c980d6f5a..cbe1e7fb9 100644 --- a/rl_studio/algorithms/qlearn.py +++ b/rl_studio/algorithms/qlearn.py @@ -1,9 +1,139 @@ +import os import pickle import random +import time import numpy as np +class QLearnF1: + def __init__( + self, states_len, actions, actions_len, epsilon, alpha, gamma, num_regions + ): + self.q_table = np.random.uniform( + low=0, high=0, size=([num_regions + 1] * states_len + [actions_len]) + ) + self.epsilon = epsilon # exploration constant + self.alpha = alpha # learning rate + self.gamma = gamma # discount factor + self.actions = actions + self.actions_len = actions_len + + def select_action(self, state): + state = state[0] + + if np.random.random() > self.epsilon: + # Get action from Q table + action = np.argmax(self.q_table[state]) + else: + # Get random action + action = np.random.randint(0, self.actions_len) + + return action + + def learn(self, state, action, reward, next_state): + state = tuple(state) + next_state = next_state[0] + + max_future_q = 
np.max(self.q_table[next_state]) + current_q = self.q_table[state + (action,)] + new_q = (1 - self.alpha) * current_q + self.alpha * ( + reward + self.gamma * max_future_q + ) + + # Update Q table with new Q value + self.q_table[state + (action,)] = new_q + + def inference(self, state): + return np.argmax(self.q_table[state]) + + def update_epsilon(self, epsilon): + self.epsilon = epsilon + return self.epsilon + + def load_table(self, file): + self.q_table = np.load(file) + + def save_numpytable( + self, + qtable, + environment, + outdir, + cumulated_reward, + episode, + step, + epsilon, + ): + # Q Table as Numpy + # np_file = ( + # f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + # ) + # qtable = np.array([list(item.values()) for item in self.q.values()]) + np.save( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}-qtable.npy", + qtable, + ) + + def save_model( + self, + environment, + outdir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + states, + states_counter, + states_rewards, + ): + # Tabular RL: Tabular Q-learning basically stores the policy (Q-values) of the agent into a matrix of shape + # (S x A), where s are all states, a are all the possible actions. + + # outdir_models = f"{outdir}_models" + os.makedirs(f"{outdir}", exist_ok=True) + + # Q TABLE PICKLE + # base_file_name = "_actions_set:_{}_epsilon:_{}".format(settings.actions_set, round(qlearn.epsilon, 2)) + base_file_name = f"_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_QTABLE.pkl", + "wb", + ) + pickle.dump(qlearn.q, file_dump) + + # STATES COUNTER + # states_counter_file_name = base_file_name + "_STATES_COUNTER.pkl" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_STATES_COUNTER.pkl", + "wb", + ) + pickle.dump(states_counter, file_dump) + + # STATES CUMULATED REWARD + # states_cum_reward_file_name = base_file_name + "_STATES_CUM_REWARD.pkl" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_STATES_CUM_REWARD.pkl", + "wb", + ) + pickle.dump(states_rewards, file_dump) + + # STATES STEPS + # states_steps_file_name = base_file_name + "_STATES_STEPS.pkl" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_STATES_STEPS.pkl", + "wb", + ) + pickle.dump(states, file_dump) + + # Q Table as Numpy + # np_file = ( + # f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}-qtable.npy", + # ) + # qtable = np.array([list(item.values()) for item in self.q.values()]) + # np.save(np_file, qtable) + + class QLearn: def __init__(self, actions, epsilon=0.99, alpha=0.8, gamma=0.9): self.q = {} @@ -29,7 +159,6 @@ def learnQ(self, state, action, reward, value): def selectAction(self, state, return_q=False): q = [self.getQValues(state, a) for a in self.actions] maxQ = max(q) - if random.random() <
self.epsilon: minQ = min(q) mag = max(abs(minQ), abs(maxQ)) @@ -39,8 +168,8 @@ def selectAction(self, state, return_q=False): for i in range(len(self.actions)) ] maxQ = max(q) - count = q.count(maxQ) + count = q.count(maxQ) + # In case there're several state-action max values # we select a random one among them if count > 1: @@ -49,7 +178,8 @@ else: i = q.index(maxQ) - action = self.actions[i] + action = i + if return_q: # if they want it, give it! return action, q return action @@ -81,7 +211,74 @@ def inference(self, state, return_q=False): return action, q return action - def load_model(self, file_path, actions_path): + def load_pickle_model(self, file_path): + + qlearn_file = open(file_path, "rb") + self.q = pickle.load(qlearn_file) + + def load_np_model(self, file): + self.q = np.load(file) + + def save_model( + self, + environment, + outdir, + qlearn, + cumulated_reward, + episode, + step, + epsilon, + states, + states_counter, + states_rewards, + ): + # Tabular RL: Tabular Q-learning basically stores the policy (Q-values) of the agent into a matrix of shape + # (S x A), where s are all states, a are all the possible actions. + + # outdir_models = f"{outdir}_models" + os.makedirs(f"{outdir}", exist_ok=True) + + # Q TABLE PICKLE + # base_file_name = "_actions_set:_{}_epsilon:_{}".format(settings.actions_set, round(qlearn.epsilon, 2)) + base_file_name = f"_Circuit-{environment['circuit_name']}_States-{environment['states']}_Actions-{environment['action_space']}_Rewards-{environment['reward_function']}_epsilon-{round(epsilon,3)}_epoch-{episode}_step-{step}_reward-{int(cumulated_reward)}" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_QTABLE.pkl", + "wb", + ) + pickle.dump(qlearn.q, file_dump) + + # STATES COUNTER + # states_counter_file_name = base_file_name + "_STATES_COUNTER.pkl" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_STATES_COUNTER.pkl", + "wb", + ) + pickle.dump(states_counter, file_dump) + + # STATES CUMULATED REWARD + # states_cum_reward_file_name = base_file_name + "_STATES_CUM_REWARD.pkl" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_STATES_CUM_REWARD.pkl", + "wb", + ) + pickle.dump(states_rewards, file_dump) + + # STATES STEPS + # states_steps_file_name = base_file_name + "_STATES_STEPS.pkl" + file_dump = open( + f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}_STATES_STEPS.pkl", + "wb", + ) + pickle.dump(states, file_dump) + + # Q Table as Numpy + # np_file = ( + # f"{outdir}/{time.strftime('%Y%m%d-%H%M%S')}_{base_file_name}-qtable.npy", + # ) + # qtable = np.array([list(item.values()) for item in self.q.values()]) + # np.save(np_file, qtable) + + def load_qmodel_actionsmodel(self, file_path, actions_path): qlearn_file = open(file_path, "rb") actions_file = open(actions_path, "rb") @@ -96,4 +293,4 @@ def load_model(self, file_path, actions_path): def updateEpsilon(self, epsilon): self.epsilon = epsilon - return self.epsilon \ No newline at end of file + return self.epsilon diff --git a/rl_studio/algorithms/qlearn_multiple_states.py b/rl_studio/algorithms/qlearn_multiple_states.py index 8394dd20d..26ddc6cee 100755 --- a/rl_studio/algorithms/qlearn_multiple_states.py +++ b/rl_studio/algorithms/qlearn_multiple_states.py @@ -1,3 +1,4 @@ +import collections import pickle import random @@ -13,10 +14,10 @@ def __init__(self, actions, epsilon=0.99, alpha=0.8, gamma=0.9): self.actions = actions def getQValues(self, state, action): + state
= state if isinstance(state, collections.abc.Sequence) else [state] return self.q.get(tuple(state) + (action,), 0.0) def selectAction(self, state, return_q=False): - q = [self.getQValues(state, a) for a in self.actions] maxQ = max(q) @@ -32,7 +33,7 @@ def selectAction(self, state, return_q=False): else: i = q.index(maxQ) - action = self.actions[i] + action = i if return_q: # if they want it, give it! return action, q return action @@ -57,21 +58,7 @@ def reset(self): return np.array(self.state) def inference(self, state, return_q=False): - q = [self.getQValues(state, a) for a in self.actions] - maxQ = max(q) - count = q.count(maxQ) - # In case there're several state-action max values - # we select a random one among them - if count > 1: - best = [i for i in range(len(self.actions)) if q[i] == maxQ] - i = random.choice(best) - else: - i = q.index(maxQ) - - action = self.actions[i] - if return_q: # if they want it, give it! - return action, q - return action + return self.selectAction(state, return_q) def load_model(self, file_path, actions): @@ -80,7 +67,6 @@ def load_model(self, file_path, actions): self.q = pickle.load(qlearn_file) # TODO it may be possible to infer the actions from the model. I don know enough to assume that for every algorithm self.actions = actions - print(f"\n\nMODEL LOADED.") print(f" - Loading: {file_path}") print(f" - Model size: {len(self.q)}") diff --git a/rl_studio/algorithms/utils.py b/rl_studio/algorithms/utils.py new file mode 100644 index 000000000..e2236e0f0 --- /dev/null +++ b/rl_studio/algorithms/utils.py @@ -0,0 +1,51 @@ +###################################### +# +# Common functions and classes for /algorithms: DDPG, Qlearn, DQN... +# +###################################### +import time + + +def save_actorcritic_model( + agent, global_params, algoritmhs_params, environment, cumulated_reward, episode, text +): + + agent.actor_model.save( + f"{global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{algoritmhs_params.model_name}_{text}_ACTOR" + f"Circuit-{environment['circuit_name']}_" + f"States-{environment['states']}_" + f"Actions-{environment['action_space']}_" + f"BATCH_Rewards-{environment['reward_function']}_" + f"MaxReward-{int(cumulated_reward)}_" + f"Epoch-{episode}" + ) + agent.critic_model.save( + f"{global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{algoritmhs_params.model_name}_{text}_CRITIC" + f"Circuit-{environment['circuit_name']}_" + f"States-{environment['states']}_" + f"Actions-{environment['action_space']}_" + f"BATCH_Rewards-{environment['reward_function']}_" + f"MaxReward-{int(cumulated_reward)}_" + f"Epoch-{episode}" + ) + + # save model in h5 format + agent.actor_model.save( + f"{global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{algoritmhs_params.model_name}_{text}_ACTOR" + f"Circuit-{environment['circuit_name']}_" + f"States-{environment['states']}_" + f"Actions-{environment['action_space']}_" + f"BATCH_Rewards-{environment['reward_function']}_" + f"MaxReward-{int(cumulated_reward)}_" + f"Epoch-{episode}.h5" + ) + agent.critic_model.save( + f"{global_params.models_dir}/{time.strftime('%Y%m%d-%H%M%S')}_{algoritmhs_params.model_name}_{text}_CRITIC" + f"Circuit-{environment['circuit_name']}_" + f"States-{environment['states']}_" + f"Actions-{environment['action_space']}_" + f"BATCH_Rewards-{environment['reward_function']}_" + f"MaxReward-{int(cumulated_reward)}_" + f"Epoch-{episode}.h5" + ) + diff --git a/rl_studio/checkpoints/cartpole/ddpg/20230107_0034_actor_avg_207.91.pkl
b/rl_studio/checkpoints/cartpole/ddpg/20230107_0034_actor_avg_207.91.pkl new file mode 100644 index 000000000..8d80e3a15 Binary files /dev/null and b/rl_studio/checkpoints/cartpole/ddpg/20230107_0034_actor_avg_207.91.pkl differ diff --git a/rl_studio/checkpoints/cartpole/ddpg/20230107_0034_metadata.md b/rl_studio/checkpoints/cartpole/ddpg/20230107_0034_metadata.md new file mode 100644 index 000000000..693a0f9e6 --- /dev/null +++ b/rl_studio/checkpoints/cartpole/ddpg/20230107_0034_metadata.md @@ -0,0 +1,78 @@ +AGENT PARAMETERS +``` ++----------------------------------------------------------+ +|parameter| value | ++---------+------------------------------------------------+ +| cartpole|{'camera_params': {'witdh': 640, 'height': 480}}| ++----------------------------------------------------------+``` +``` + +SETTINGS PARAMETERS +``` ++-----------------------+ +| parameter | value | ++--------------+--------+ +| output_dir | ./logs/| ++--------------+--------+ +| save_model | True | ++--------------+--------+ +|save_positions| True | ++--------------+--------+ +| telemetry | False | ++--------------+--------+ +| logging_level| info | ++--------------+--------+ +| mode |training| ++--------------+--------+ +| agent |cartpole| ++--------------+--------+ +| algorithm | ddpg | ++--------------+--------+ +| framework | Pytorch| ++-----------------------+``` +``` + +ENVIRONMENT PARAMETERS +``` ++----------------------------------------------------+ +| parameter | value | ++---------------------------+------------------------+ +| env_name |myCartpole-continuous-v0| ++---------------------------+------------------------+ +| environment_folder | cartpole | ++---------------------------+------------------------+ +| runs | 20000 | ++---------------------------+------------------------+ +| full_experimentation_runs | 0 | ++---------------------------+------------------------+ +| update_every | 100 | ++---------------------------+------------------------+ +| show_every | 10000 | ++---------------------------+------------------------+ +| objective_reward | 500 | ++---------------------------+------------------------+ +| block_experience_batch | False | ++---------------------------+------------------------+ +| random_start_level | 0 | ++---------------------------+------------------------+ +| random_perturbations_level| 0 | ++---------------------------+------------------------+ +|perturbations_intensity_std| 0 | ++---------------------------+------------------------+ +| initial_pole_angle | 0 | ++---------------------------+------------------------+ +| non_recoverable_angle | 0.3 | ++----------------------------------------------------+``` +``` + +ALGORITHM PARAMETERS +``` ++-----------------+ +| parameter |value| ++-----------+-----+ +| gamma | 0.99| ++-----------+-----+ +|hidden_size| 128 | ++-----------+-----+ +| batch_size| 128 | ++-----------------+``` \ No newline at end of file diff --git a/rl_studio/checkpoints/cartpole/dqn_models/20221017_2118_epsilon_1_DQN_WEIGHTS_avg_475.825.pkl b/rl_studio/checkpoints/cartpole/dqn/20221017_2118_epsilon_1_DQN_WEIGHTS_avg_475.825.pkl similarity index 100% rename from rl_studio/checkpoints/cartpole/dqn_models/20221017_2118_epsilon_1_DQN_WEIGHTS_avg_475.825.pkl rename to rl_studio/checkpoints/cartpole/dqn/20221017_2118_epsilon_1_DQN_WEIGHTS_avg_475.825.pkl diff --git a/rl_studio/checkpoints/cartpole/dqn_models/20221017_2118_metadata.md b/rl_studio/checkpoints/cartpole/dqn/20221017_2118_metadata.md similarity index 100% rename from 
rl_studio/checkpoints/cartpole/dqn_models/20221017_2118_metadata.md rename to rl_studio/checkpoints/cartpole/dqn/20221017_2118_metadata.md diff --git a/rl_studio/checkpoints/cartpole/ppo/continuous/20221231_1813_actor_avg_422.44 b/rl_studio/checkpoints/cartpole/ppo/continuous/20221231_1813_actor_avg_422.44 new file mode 100644 index 000000000..4f76866a2 Binary files /dev/null and b/rl_studio/checkpoints/cartpole/ppo/continuous/20221231_1813_actor_avg_422.44 differ diff --git a/rl_studio/checkpoints/cartpole/ppo/continuous/20221231_1813_metadata.md b/rl_studio/checkpoints/cartpole/ppo/continuous/20221231_1813_metadata.md new file mode 100644 index 000000000..32a2136d1 --- /dev/null +++ b/rl_studio/checkpoints/cartpole/ppo/continuous/20221231_1813_metadata.md @@ -0,0 +1,76 @@ +AGENT PARAMETERS +``` ++----------------------------------------------------------+ +|parameter| value | ++---------+------------------------------------------------+ +| cartpole|{'camera_params': {'witdh': 640, 'height': 480}}| ++----------------------------------------------------------+``` +``` + +SETTINGS PARAMETERS +``` ++-----------------------------+ +| parameter | value | ++--------------+--------------+ +| output_dir | ./logs/ | ++--------------+--------------+ +| save_model | True | ++--------------+--------------+ +|save_positions| True | ++--------------+--------------+ +| telemetry | False | ++--------------+--------------+ +| logging_level| info | ++--------------+--------------+ +| mode | training | ++--------------+--------------+ +| agent | cartpole | ++--------------+--------------+ +| algorithm |ppo_continuous| ++-----------------------------+``` +``` + +ENVIRONMENT PARAMETERS +``` ++----------------------------------------------------+ +| parameter | value | ++---------------------------+------------------------+ +| env_name |myCartpole-continuous-v0| ++---------------------------+------------------------+ +| environment_folder | cartpole | ++---------------------------+------------------------+ +| runs | 20000 | ++---------------------------+------------------------+ +| full_experimentation_runs | 0 | ++---------------------------+------------------------+ +| update_every | 100 | ++---------------------------+------------------------+ +| show_every | 10000 | ++---------------------------+------------------------+ +| objective_reward | 500 | ++---------------------------+------------------------+ +| block_experience_batch | False | ++---------------------------+------------------------+ +| random_start_level | 0 | ++---------------------------+------------------------+ +| random_perturbations_level| 0.8 | ++---------------------------+------------------------+ +|perturbations_intensity_std| 1 | ++---------------------------+------------------------+ +| initial_pole_angle | 0 | ++---------------------------+------------------------+ +| non_recoverable_angle | 0.3 | ++----------------------------------------------------+``` +``` + +ALGORITHM PARAMETERS +``` ++---------------------+ +| parameter |value| ++---------------+-----+ +| gamma | 1 | ++---------------+-----+ +| epsilon | 0.15| ++---------------+-----+ +|episodes_update| 1000| ++---------------------+``` \ No newline at end of file diff --git a/rl_studio/checkpoints/cartpole/ppo_models/20221108_2002_actor_avg_500.0.pkl b/rl_studio/checkpoints/cartpole/ppo/discrete/20221108_2002_actor_avg_500.0.pkl similarity index 100% rename from rl_studio/checkpoints/cartpole/ppo_models/20221108_2002_actor_avg_500.0.pkl rename to 
rl_studio/checkpoints/cartpole/ppo/discrete/20221108_2002_actor_avg_500.0.pkl diff --git a/rl_studio/checkpoints/cartpole/ppo_models/20221108_2320_actor_avg_467.99.pkl b/rl_studio/checkpoints/cartpole/ppo/discrete/20221108_2320_actor_avg_467.99.pkl similarity index 100% rename from rl_studio/checkpoints/cartpole/ppo_models/20221108_2320_actor_avg_467.99.pkl rename to rl_studio/checkpoints/cartpole/ppo/discrete/20221108_2320_actor_avg_467.99.pkl diff --git a/rl_studio/checkpoints/cartpole/ppo_models/20221109_0820_actor_avg_500.0.pkl b/rl_studio/checkpoints/cartpole/ppo/discrete/20221109_0820_actor_avg_500.0.pkl similarity index 100% rename from rl_studio/checkpoints/cartpole/ppo_models/20221109_0820_actor_avg_500.0.pkl rename to rl_studio/checkpoints/cartpole/ppo/discrete/20221109_0820_actor_avg_500.0.pkl diff --git a/rl_studio/checkpoints/cartpole/ppo_models/20221110_0937_metadata.md b/rl_studio/checkpoints/cartpole/ppo/discrete/20221110_0937_metadata.md similarity index 100% rename from rl_studio/checkpoints/cartpole/ppo_models/20221110_0937_metadata.md rename to rl_studio/checkpoints/cartpole/ppo/discrete/20221110_0937_metadata.md diff --git a/rl_studio/checkpoints/cartpole/qlearn/20221110_0931_metadata.md b/rl_studio/checkpoints/cartpole/qlearn/20221110_0931_metadata.md new file mode 100644 index 000000000..85f8c7b50 --- /dev/null +++ b/rl_studio/checkpoints/cartpole/qlearn/20221110_0931_metadata.md @@ -0,0 +1,80 @@ +AGENT PARAMETERS +``` ++----------------------------+ +| parameter | value | ++-------------+--------------+ +|camera_params|{'witdh': 640}| ++-------------+--------------+ +| height | 480 | ++----------------------------+``` +``` + +SETTINGS PARAMETERS +``` ++----------------------+ +| parameter | value | ++--------------+-------+ +| output_dir |./logs/| ++--------------+-------+ +| save_model | True | ++--------------+-------+ +|save_positions| True | ++--------------+-------+ +| telemetry | False | ++--------------+-------+ +| logging_level| info | ++----------------------+``` +``` + +ENVIRONMENT PARAMETERS +``` ++-----------------------------------------+ +| parameter | value | ++---------------------------+-------------+ +| env_name |myCartpole-v0| ++---------------------------+-------------+ +| environment_folder | cartpole | ++---------------------------+-------------+ +| angle_bins | 100 | ++---------------------------+-------------+ +| pos_bins | 100 | ++---------------------------+-------------+ +| runs | 4000000 | ++---------------------------+-------------+ +| full_experimentation_runs | 0 | ++---------------------------+-------------+ +| update_every | 10000 | ++---------------------------+-------------+ +| save_every | 800000 | ++---------------------------+-------------+ +| show_every | 800000 | ++---------------------------+-------------+ +| objective_reward | 500 | ++---------------------------+-------------+ +| block_experience_batch | False | ++---------------------------+-------------+ +| random_start_level | 0.05 | ++---------------------------+-------------+ +| random_perturbations_level| 0 | ++---------------------------+-------------+ +|perturbations_intensity_std| 0 | ++---------------------------+-------------+ +| initial_pole_angle | 0 | ++---------------------------+-------------+ +| non_recoverable_angle | 0.3 | ++-----------------------------------------+``` +``` + +ALGORITHM PARAMETERS +``` ++----------------------+ +| parameter |value| ++----------------+-----+ +| alpha | 1 | ++----------------+-----+ +| epsilon | 0.99| 
++----------------+-----+ +| gamma | 1 | ++----------------+-----+ +|epsilon_discount| 1.0 | ++----------------------+``` \ No newline at end of file diff --git a/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-161229_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.399_epoch-291_step-15001_reward-136707-qtable.npy b/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-161229_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.399_epoch-291_step-15001_reward-136707-qtable.npy new file mode 100644 index 000000000..8379b6b55 Binary files /dev/null and b/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-161229_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.399_epoch-291_step-15001_reward-136707-qtable.npy differ diff --git a/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-193713_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.334_epoch-75_step-12474_reward-115836-qtable.npy b/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-193713_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.334_epoch-75_step-12474_reward-115836-qtable.npy new file mode 100644 index 000000000..ba2b3a18e Binary files /dev/null and b/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-193713_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.334_epoch-75_step-12474_reward-115836-qtable.npy differ diff --git a/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-214025_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.294_epoch-96_step-15001_reward-138978-qtable.npy b/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-214025_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.294_epoch-96_step-15001_reward-138978-qtable.npy new file mode 100644 index 000000000..0e6249bc6 Binary files /dev/null and b/rl_studio/checkpoints/follow_lane_gazebo_qlearn_f1__/20230123-214025_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.294_epoch-96_step-15001_reward-138978-qtable.npy differ diff --git a/rl_studio/checkpoints/mountain_car/1_20221226_1928_epsilon_0.01_QTABLE.pkl b/rl_studio/checkpoints/mountain_car/1_20221226_1928_epsilon_0.01_QTABLE.pkl new file mode 100644 index 000000000..54582a613 Binary files /dev/null and b/rl_studio/checkpoints/mountain_car/1_20221226_1928_epsilon_0.01_QTABLE.pkl differ diff --git a/rl_studio/checkpoints/mountain_car/actions_set_20221226_1814 b/rl_studio/checkpoints/mountain_car/actions_set_20221226_1814 new file mode 100644 index 000000000..39714467c Binary files /dev/null and b/rl_studio/checkpoints/mountain_car/actions_set_20221226_1814 differ diff --git a/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_actor_avg_-392.54141588266396.pkl b/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_actor_avg_-392.54141588266396.pkl new file mode 100644 index 000000000..0b9aec047 Binary files /dev/null and b/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_actor_avg_-392.54141588266396.pkl differ diff --git a/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_metadata.md b/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_metadata.md new file mode 100644 index 000000000..8ebb2a918 --- /dev/null +++ b/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_metadata.md @@ -0,0 +1,64 
@@ +AGENT PARAMETERS +``` ++-------------------------------------------+ +| parameter | value | ++-------------+-----------------------------+ +|camera_params|{'witdh': 640, 'height': 480}| ++-------------------------------------------+``` +``` + +SETTINGS PARAMETERS +``` ++-------------------------+ +| parameter | value | ++--------------+----------+ +| output_dir | ./logs/ | ++--------------+----------+ +| save_model | True | ++--------------+----------+ +|save_positions| True | ++--------------+----------+ +| telemetry | False | ++--------------+----------+ +| logging_level| info | ++--------------+----------+ +| mode | training | ++--------------+----------+ +| agent | pendulum | ++--------------+----------+ +| algorithm |ddpg_torch| ++-------------------------+``` +``` + +ENVIRONMENT PARAMETERS +``` ++-------------------------------------+ +| parameter | value | ++-------------------------+-----------+ +| env_name |Pendulum-v1| ++-------------------------+-----------+ +| environment_folder | pendulum | ++-------------------------+-----------+ +| runs | 20000 | ++-------------------------+-----------+ +|full_experimentation_runs| 0 | ++-------------------------+-----------+ +| update_every | 20 | ++-------------------------+-----------+ +| show_every | 50 | ++-------------------------+-----------+ +| objective_reward | -430 | ++-------------------------------------+``` +``` + +ALGORITHM PARAMETERS +``` ++-----------------+ +| parameter |value| ++-----------+-----+ +| gamma | 0.99| ++-----------+-----+ +|hidden_size| 512 | ++-----------+-----+ +| batch_size| 128 | ++-----------------+``` \ No newline at end of file diff --git a/rl_studio/checkpoints/pendulum/ppo/20221231_1833_actor_avg_-214.04479205407756 b/rl_studio/checkpoints/pendulum/ppo/20221231_1833_actor_avg_-214.04479205407756 new file mode 100644 index 000000000..38b422dde Binary files /dev/null and b/rl_studio/checkpoints/pendulum/ppo/20221231_1833_actor_avg_-214.04479205407756 differ diff --git a/rl_studio/checkpoints/pendulum/ppo/20221231_1833_metadata.md b/rl_studio/checkpoints/pendulum/ppo/20221231_1833_metadata.md new file mode 100644 index 000000000..da47e360f --- /dev/null +++ b/rl_studio/checkpoints/pendulum/ppo/20221231_1833_metadata.md @@ -0,0 +1,76 @@ +AGENT PARAMETERS +``` ++-------------------------------------------+ +| parameter | value | ++-------------+-----------------------------+ +|camera_params|{'witdh': 640, 'height': 480}| ++-------------------------------------------+``` +``` + +SETTINGS PARAMETERS +``` ++-----------------------------+ +| parameter | value | ++--------------+--------------+ +| output_dir | ./logs/ | ++--------------+--------------+ +| save_model | True | ++--------------+--------------+ +|save_positions| True | ++--------------+--------------+ +| telemetry | False | ++--------------+--------------+ +| logging_level| info | ++--------------+--------------+ +| mode | training | ++--------------+--------------+ +| agent | pendulum | ++--------------+--------------+ +| algorithm |ppo_continuous| ++-----------------------------+``` +``` + +ENVIRONMENT PARAMETERS +``` ++---------------------------------------+ +| parameter | value | ++---------------------------+-----------+ +| env_name |Pendulum-v1| ++---------------------------+-----------+ +| environment_folder | pendulum | ++---------------------------+-----------+ +| runs | 20000 | ++---------------------------+-----------+ +| full_experimentation_runs | 0 | ++---------------------------+-----------+ +| update_every | 200 | 
++---------------------------+-----------+ +| show_every | 1000 | ++---------------------------+-----------+ +| objective_reward | -350 | ++---------------------------+-----------+ +| block_experience_batch | False | ++---------------------------+-----------+ +| random_start_level | 0 | ++---------------------------+-----------+ +| random_perturbations_level| 0.8 | ++---------------------------+-----------+ +|perturbations_intensity_std| 1 | ++---------------------------+-----------+ +| initial_pole_angle | 0 | ++---------------------------+-----------+ +| non_recoverable_angle | 0.3 | ++---------------------------------------+``` +``` + +ALGORITHM PARAMETERS +``` ++---------------------+ +| parameter |value| ++---------------+-----+ +| gamma | 1 | ++---------------+-----+ +| epsilon | 0.15| ++---------------+-----+ +|episodes_update| 5000| ++---------------------+``` \ No newline at end of file diff --git a/rl_studio/checkpoints/robot_mesh/1_20221227_0306_epsilon_0.05_QTABLE.pkl b/rl_studio/checkpoints/robot_mesh/1_20221227_0306_epsilon_0.05_QTABLE.pkl new file mode 100644 index 000000000..e23d4e7e8 Binary files /dev/null and b/rl_studio/checkpoints/robot_mesh/1_20221227_0306_epsilon_0.05_QTABLE.pkl differ diff --git a/rl_studio/checkpoints/robot_mesh/actions_set_20221227_0657 b/rl_studio/checkpoints/robot_mesh/actions_set_20221227_0657 new file mode 100644 index 000000000..901ea6bc9 Binary files /dev/null and b/rl_studio/checkpoints/robot_mesh/actions_set_20221227_0657 differ diff --git a/rl_studio/config/config.yaml b/rl_studio/config/config.yaml index 3cdd397d7..4af90867b 100644 --- a/rl_studio/config/config.yaml +++ b/rl_studio/config/config.yaml @@ -1,54 +1,119 @@ ##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode +# General configuration file to launch RL-Studio in the training or inference mode +# +# Warning: it is not practical use it as a launch file! +# this file contains all parameters for whole tasks, agents, simulators, +# frameworks...as a didactic proposes. 
It is recommended creating a new specific +# config file for a dedicated task, with the form: +# config_mode_task_algorithm_agent_simulator.yaml ##################################################################################### ##################################################################################### -# settings: General parameters +# General settings +# +# Main Options: +# +# mode: training, retraining, inference +# task: follow_line, follow_lane +# algorithm: qlearn, dqn, ddpg, ppo +# simulator: openai, carla, gazebo +# environment_set: gazebo_environments # gazebo_environments, carla_environments +# env: simple, nurburgring, montreal, curves, simple_laser, manual, autoparking +# agent: f1, autoparking +# actions: continuous, simple, medium, hard, test, autoparking_simple +# states: image, sp1, sp3, spn +# rewards: discrete_follow_line, linear_follow_line, discrete_follow_right_lane, discrete_autoparking # -# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings ##################################################################################### - settings: - output_dir: "./logs/" - save_model: True - save_positions: True - telemetry: False - telemetry_mask: False - plotter_graphic: False - model_state_name: f1_camera_parking # autoparking + mode: training # training, retraining, inference + task: follow_lane_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: ddpg # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: continuous # continuous, simple, medium, hard, test, autoparking_simple + states: sp1 #image, sp1, sp3, spn + rewards: discrete_follow_right_lane # discrete_follow_line, linear_follow_line, discrete_follow_right_lane, discrete_autoparking + framework: TensorFlow # TensorFlow, Pytorch + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" total_episodes: 50_000 training_time: 6 - save_episodes: 50 - save_every_step: 1_000 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' ##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. 
Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# +# ROS general settings ##################################################################################### +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" -agent: +##################################################################################### +# Carla simulator general settings +##################################################################################### +carla: + prefernvidia: True + port_rpc: + port_streaming: + quality_level: + render_mode: + off_screen_mode: + +##################################################################################### +# Inference and retraining: loading training files +##################################################################################### +retraining: + qlearn: + model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64_actionsCont_stateImg_BATCH_CRITIC_Max61351_Epoch-500_State-image_Actions-continuous_inTime-20221018-221521.h5 + +inference: + ddpg: + inference_ddpg_tf_actor_model_name: "DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BESTLAP_ACTOR_Max90069_Epoch226_inTime20221017-163548.h5" + inference_ddpg_tf_critic_model_name: "DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BESTLAP_CRITIC_Max90069_Epoch226_inTime20221017-163548.h5" + +##################################################################################### +# Algorithms parameters +##################################################################################### +algorithm: + qlearn: + alpha: 0.2 + epsilon: 0.95 + epsilon_min: 0.05 + gamma: 0.9 + dqn: + alpha: 0.8 + gamma: 0.9 + epsilon: 0.99 + epsilon_discount: 0.9986 + epsilon_min: 0.05 + model_name: DQN_sp_16x16 + replay_memory_size: 50_000 + min_replay_memory_size: 1000 + minibatch_size: 64 + update_target_every: 5 + memory_fraction: 0.20 + buffer_capacity: 100_000 + batch_size: 64 + sarsa: + ddpg: + gamma: 0.9 + tau: 0.005 + std_dev: 0.2 + model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 + replay_memory_size: 50_000 + memory_fraction: 0.20 + critic_lr: 0.002 + actor_lr: 0.001 + buffer_capacity: 100_000 + batch_size: 64 + +##################################################################################### +# Agent +##################################################################################### +agents: f1: - agent_name: f1 camera_params: width: 640 height: 480 @@ -57,35 +122,7 @@ agent: image_resizing: 100 new_image_size: 32 num_regions: 16 - states: - state_space: sp1 #sp1 - image: - 0: [3] - sp1: - 0: [10] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_line #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 autoparking: - agent_name: autoparking camera_params: width: 640 height: 480 @@ -93,90 +130,88 @@ agent: raw_image: False image_resizing: 100 new_image_size: 32 - states: - state_space: sp_curb - sp_curb: - poi: 3 - regions: 16 - pixels_cropping: 200 - sp_curb3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_autoparking - discrete_autoparking: - from_1_to_05: 10 - from_05_to_085: 20 - from_085_to_095: 40 - from_others: 1 - 
penal_reward: -100 - min_reward: 50 - goal_reward: 1100 ##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. -# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# +# States ##################################################################################### +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + sp_curb: #autoparking + poi: 3 + regions: 16 + pixels_cropping: 200 + sp_curb3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] +##################################################################################### +# Actions +##################################################################################### actions: - actions_number: 2 - actions_set: continuous #simple - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 30 - w_right: -3 - w_left: 3 - autoparking_simple: - 0: [ 3, 0 ] - 1: [ 2, 0 ] - 2: [ 1, 0 ] - 3: [ 0, 0 ] - 4: [ -1, 0 ] + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + continuous: + v: [2, 30] + w: [-3, 3] + autoparking_simple: + 0: [3, 0] + 1: [2, 0] + 2: [1, 0] + 3: [0, 0] + 4: [-1, 0] ##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agent initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. 
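The `states` section above can be read together with the earlier simplified-perception comments: each `sp*` entry lists rows (counted from the image midline down) at which the line or lane centre is sampled, and `num_regions` sets how many horizontal bins each sampled position is discretised into. The sketch below only illustrates that idea; the centre-detection step is faked with a plain list of pixel columns, and the function names are not RL-Studio's.

```python
# Illustrative sketch (not RL-Studio code): turn the horizontal position of the
# detected line centre, sampled at the rows listed in an sp3-style state, into a
# discrete state index per row by splitting the image width into num_regions bins.

def discretize(center_x, width, num_regions):
    """Map a pixel column in [0, width) to a region index in [0, num_regions)."""
    region_width = width / num_regions
    return min(int(center_x // region_width), num_regions - 1)


def simplified_perception_state(centers_x, width=640, num_regions=16):
    """centers_x: detected line-centre column at each configured row."""
    return tuple(discretize(cx, width, num_regions) for cx in centers_x)


if __name__ == "__main__":
    # Pretend the line centre was detected at these columns for the three rows
    # configured in sp3 (rows 5, 15 and 22 below the image midline).
    detected_centers = [320, 300, 260]
    print(simplified_perception_state(detected_centers))  # -> (8, 7, 6)
```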
Otherwise, it takes start_pose number - +# Rewards ##################################################################################### +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + followline_center_v_w_linear: # only for continuous actions + beta_0: 3 + beta_1: -0.1 + penal: 0 + min_reward: 1_000 + highest_reward: 100 -environments: +##################################################################################### +# Environments: Gazebo, Carla, OpenAI +##################################################################################### +gazebo_environments: simple: env_name: F1Env-v0 circuit_name: simple @@ -184,10 +219,20 @@ environments: launchfile: simple_circuit.launch environment_folder: f1 robot_name: f1_renault + model_state_name: f1_camera_parking # autoparking start_pose: 0 # 0, 1, 2, 3, 4 alternate_pose: False estimated_steps: 15_000 sensor: camera + save_episodes: 50 + save_every_step: 1_000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False circuit_positions_set: 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] @@ -205,6 +250,15 @@ environments: alternate_pose: True estimated_steps: 3500 sensor: camera + save_episodes: 50 + save_every_step: 1_000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False circuit_positions_set: 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] @@ -223,6 +277,15 @@ environments: alternate_pose: True estimated_steps: 8000 sensor: camera + save_episodes: 50 + save_every_step: 1_000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False circuit_positions_set: 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] @@ -268,64 +331,29 @@ environments: training_type: qlearn #qlearn, dqn, qlearn, manual, ddpg launchfile: autoparking.launch environment_folder: autoparking - robot_name: f1_camera_parking # autoparking_f1_camera_laser # + robot_name: f1_camera_parking # autoparking_f1_camera_laser # estimated_steps: 50 sensor: laser #laser, camera_laser, camera start_pose: 0 alternate_pose: False + save_episodes: 50 + save_every_step: 1_000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False circuit_positions_set: 0: [5.81294, 4.30243, 0.025329, 0.00013, 0.010376, 3.138] #x, y, z, roll, pitch, waw 1: [10.0231, -0.720526, 0.025333, 0.000116, 0.010372, 2.5] 2: [9.81003, 16.7248, 0.025296, 0.0002, 0.010271, -1.92009] - 3: [15.1722, 4.66392, 0.025344, 7.6e-05, 0.010362, -3.12394] + 3: [15.1722, 4.66392, 0.025344, 7.6e-05, 0.010362, -3.12394] 4: [14.2657, -2.26994, 0.02533, 5.1e-05, 0.010363, -3.12403] 5: [18.4119, 22.1479, 0.025338, 8.1e-05, 0.010356, -3.12407] 6: [8.43921, -2.90071, 0.025338, 8.1e-05, 0.010356, 1.55485] parking_spot_position_x: 2 parking_spot_position_y: 4.30 -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: 
/home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - epsilon_min: 0.05 - gamma: 0.9 - dqn: - alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 +carla_environments: diff --git a/rl_studio/config/config_cartpole_ddpg.yaml b/rl_studio/config/config_cartpole_ddpg.yaml new file mode 100755 index 000000000..9da99c75a --- /dev/null +++ b/rl_studio/config/config_cartpole_ddpg.yaml @@ -0,0 +1,52 @@ +settings: + output_dir: "./logs/" + save_model: True + save_positions: True + telemetry: False + logging_level: info + mode: inference + agent: cartpole + algorithm: ddpg + framework: Pytorch + +# TODO make this section optional +actions: + available_actions: + simple: + +agent: + cartpole: + # TODO To be removed + camera_params: + witdh: 640 + height: 480 + +environments: + env_name: myCartpole-continuous-v0 + environment_folder: cartpole +# runs: 20000 + runs: 100 + full_experimentation_runs: 0 + update_every: 100 + show_every: 10000 + objective_reward: 500 +# block_experience_batch: False + block_experience_batch: False + # random_start_level: 0.05 + experiments: 29 + random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position + random_perturbations_level: 0.1 # Number between 0 and 1 that indicates the frequency of the random perturbations + random_perturbations_level_step: 0.1 + perturbations_intensity_std: 21 #Number between 0 and 1 that indicates the standard deviation of perturbations intensity + perturbations_intensity_std_step: 1 + initial_pole_angle: 0 + initial_pole_angle_steps: 0.05 + non_recoverable_angle: 0.3 # not applicable when making experiments with init_pole_angle (always 0.2 over the initial) + +inference: + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ddpg/checkpoints/20230107_0034_actor_avg_207.91.pkl + +algorithm: + gamma: 0.99 + hidden_size: 128 + batch_size: 128 \ No newline at end of file diff --git a/rl_studio/config/config_cartpole_dqn.yaml b/rl_studio/config/config_cartpole_dqn.yaml index 4c3123cf7..3a0b21612 100755 --- a/rl_studio/config/config_cartpole_dqn.yaml +++ b/rl_studio/config/config_cartpole_dqn.yaml @@ -4,6 +4,9 @@ settings: save_positions: True telemetry: False logging_level: info + mode: training + agent: cartpole + algorithm: dqn # TODO make this section optional actions: @@ -18,30 +21,31 @@ agent: height: 480 environments: - simple: env_name: myCartpole-v0 environment_folder: cartpole # runs: 20000 - runs: 100 + runs: 20000 full_experimentation_runs: 0 - update_every: 1000 + update_every: 100 show_every: 10000 objective_reward: 500 # block_experience_batch: False block_experience_batch: False # random_start_level: 0.05 + experiments: 1 random_start_level: 0 # Number between 
0 and 1 that indicates the difficulty of the start position random_perturbations_level: 0 # Number between 0 and 1 that indicates the frequency of the random perturbations - perturbations_intensity_std: 0 # Number between 0 and 1 that indicates the standard deviation of perturbations intensity - initial_pole_angle: 0.3 - non_recoverable_angle: 0.7 + random_perturbations_level_step: 0 + perturbations_intensity_std: 1 #Number between 0 and 1 that indicates the standard deviation of perturbations intensity + perturbations_intensity_std_step: 1 + initial_pole_angle: 0 + initial_pole_angle_steps: 0.05 + non_recoverable_angle: 0.3 # not applicable when making experiments with init_pole_angle (always 0.2 over the initial) inference: - dqn: - inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/cartpole/dqn_models/20221017_2118_epsilon_1_DQN_WEIGHTS_avg_475.825.pkl + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/cartpole/dqn/20221017_2118_epsilon_1_DQN_WEIGHTS_avg_475.825.pkl algorithm: - dqn: # gamma: 0.95 gamma: 0.95 epsilon_discount: 0.9997 diff --git a/rl_studio/config/config_cartpole_ppo.yaml b/rl_studio/config/config_cartpole_ppo.yaml index fa9a957a6..70436e7c5 100755 --- a/rl_studio/config/config_cartpole_ppo.yaml +++ b/rl_studio/config/config_cartpole_ppo.yaml @@ -4,6 +4,9 @@ settings: save_positions: True telemetry: False logging_level: info + mode: inference + agent: cartpole + algorithm: ppo # TODO make this section optional actions: @@ -18,29 +21,30 @@ agent: height: 480 environments: - simple: env_name: myCartpole-v0 environment_folder: cartpole # runs: 20000 runs: 100 full_experimentation_runs: 0 - update_every: 100 - show_every: 10000 + update_every: 1000 + show_every: 1000 objective_reward: 500 # block_experience_batch: False block_experience_batch: False # random_start_level: 0.05 + experiments: 29 random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position - random_perturbations_level: 0.8 # Number between 0 and 1 that indicates the frequency of the random perturbations - perturbations_intensity_std: 1 # Number between 0 and 1 that indicates the standard deviation of perturbations intensity + random_perturbations_level: 0.1 # Number between 0 and 1 that indicates the frequency of the random perturbations + random_perturbations_level_step: 0.1 + perturbations_intensity_std: 21 #Number between 0 and 1 that indicates the standard deviation of perturbations intensity + perturbations_intensity_std_step: 1 initial_pole_angle: 0 - non_recoverable_angle: 0.3 + initial_pole_angle_steps: 0.05 + non_recoverable_angle: 0.3 # not applicable when making experiments with init_pole_angle (always 0.2 over the initial) inference: - ppo: - inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/checkpoints/20221110_0937_actor_avg_500.0.pkl + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo/checkpoints/20221110_0937_actor_avg_500.0.pkl algorithm: - ppo: - gamma: 1 - epsilon: 0.15 + gamma: 1 + epsilon: 0.15 diff --git a/rl_studio/config/config_cartpole_ppo_continuous.yaml b/rl_studio/config/config_cartpole_ppo_continuous.yaml new file mode 100755 index 000000000..bb257bfc0 --- /dev/null +++ b/rl_studio/config/config_cartpole_ppo_continuous.yaml @@ -0,0 +1,51 @@ +settings: + output_dir: "./logs/" + save_model: True + save_positions: True + telemetry: False + logging_level: info + mode: inference + agent: cartpole + algorithm: ppo_continuous + +# TODO make this section 
optional +actions: + available_actions: + simple: + +agent: + cartpole: + # TODO To be removed + camera_params: + witdh: 640 + height: 480 + +environments: + env_name: myCartpole-continuous-v0 + environment_folder: cartpole +# runs: 20000 + runs: 100 + full_experimentation_runs: 0 + update_every: 100 + show_every: 1000 + objective_reward: 500 +# block_experience_batch: False + block_experience_batch: False + # random_start_level: 0.05 + experiments: 29 + random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position + random_perturbations_level: 0.1 # Number between 0 and 1 that indicates the frequency of the random perturbations + random_perturbations_level_step: 0.1 + perturbations_intensity_std: 21 #Number between 0 and 1 that indicates the standard deviation of perturbations intensity + perturbations_intensity_std_step: 1 + initial_pole_angle: 0 + initial_pole_angle_steps: 0.05 + non_recoverable_angle: 0.3 # not applicable when making experiments with init_pole_angle (always 0.2 over the initial) + +inference: + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/ppo_continuous/checkpoints/20221231_1813_actor_avg_422.44 + +algorithm: + gamma: 1 + epsilon: 0.15 + episodes_update: 1000 \ No newline at end of file diff --git a/rl_studio/config/config_cartpole_programmatic.yaml b/rl_studio/config/config_cartpole_programmatic.yaml index e5c21040b..73b7ec13a 100755 --- a/rl_studio/config/config_cartpole_programmatic.yaml +++ b/rl_studio/config/config_cartpole_programmatic.yaml @@ -4,6 +4,9 @@ settings: save_positions: True telemetry: False logging_level: info + mode: inference + agent: cartpole + algorithm: programmatic # TODO make this section optional actions: @@ -18,24 +21,25 @@ agent: height: 480 environments: - simple: - env_name: myCartpole-v0 - environment_folder: cartpole + env_name: myCartpole-v0 + environment_folder: cartpole # runs: 20000 - runs: 100 - full_experimentation_runs: 0 - update_every: 100 - show_every: 100 - random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position - random_perturbations_level: 0 # Number between 0 and 1 that indicates the frequency of the random perturbations - perturbations_intensity_std: 0 # Number between 0 and 1 that indicates the standard deviation of perturbations intensity - initial_pole_angle: 0.5 - non_recoverable_angle: 0.7 + runs: 1000 + full_experimentation_runs: 0 + update_every: 10 + show_every: 1000 + experiments: 29 + random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position + random_perturbations_level: 0.1 # Number between 0 and 1 that indicates the frequency of the random perturbations + random_perturbations_level_step: 0.1 + perturbations_intensity_std: 21 #Number between 0 and 1 that indicates the standard deviation of perturbations intensity + perturbations_intensity_std_step: 1 + initial_pole_angle: 0 + initial_pole_angle_steps: 0.05 + non_recoverable_angle: 0.3 # not applicable when making experiments with init_pole_angle (always 0.2 over the initial) algorithm: - programmatic: # TODO make it complaining just if the relevant parameter for this algorithm is not found inference: - programmatic: # TODO make it complaining just if the relevant parameter for this algorithm is not found \ No newline at end of file diff --git a/rl_studio/config/config_cartpole_qlearn.yaml b/rl_studio/config/config_cartpole_qlearn.yaml index 42b929b87..be00c15b7 100755 --- 
a/rl_studio/config/config_cartpole_qlearn.yaml +++ b/rl_studio/config/config_cartpole_qlearn.yaml @@ -4,6 +4,9 @@ settings: save_positions: True telemetry: False logging_level: info + mode: training + agent: cartpole + algorithm: qlearn # TODO make this section optional actions: @@ -18,39 +21,41 @@ agent: height: 480 environments: - simple: env_name: myCartpole-v0 environment_folder: cartpole # runs: 20000 - angle_bins: 300 - pos_bins: 50 - runs: 100 + angle_bins: 100 + pos_bins: 100 + runs: 4000000 full_experimentation_runs: 0 - update_every: 1000 - save_every: 10000 - show_every: 100 + update_every: 10000 + save_every: 10000000 + show_every: 10000 objective_reward: 500 # block_experience_batch: False block_experience_batch: False # random_start_level: 0.05 + experiments: 1 random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position random_perturbations_level: 0 # Number between 0 and 1 that indicates the frequency of the random perturbations - perturbations_intensity_std: 0 # Number between 0 and 1 that indicates the standard deviation of perturbations intensity - initial_pole_angle: 0.3 - non_recoverable_angle: 0.7 - previously_trained_agent: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/checkpoints/20221116_2001_epsilon_0.01_QTABLE_avg_ 179.357.pkl + random_perturbations_level_step: 0 + perturbations_intensity_std: 0 #Number between 0 and 1 that indicates the standard deviation of perturbations intensity + perturbations_intensity_std_step: 0 + initial_pole_angle: 0 + initial_pole_angle_steps: 0 + non_recoverable_angle: 0.3 # not applicable when making experiments with init_pole_angle (always 0.2 over the initial) + +# previously_trained_agent: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/checkpoints/20221116_2001_epsilon_0.01_QTABLE_avg_ 179.357.pkl reward_value: 1 punish: 0 reward_shaping: 0 inference: - qlearn: - inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/checkpoints/20221116_1010_epsilon_0.051_368.4092_QTABLE.pkl_avg_.pkl + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/checkpoints/old/20221110_0931_epsilon_0.116_406.5153_QTABLE.pkl_avg_.pkl actions_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/cartpole/qlearning/checkpoints/actions_set_20221109_2108 algorithm: - qlearn: - alpha: 0.9 - epsilon: 0.5 - gamma: 0.9 - epsilon_discount: 0.9999995 + alpha: 0.9 + epsilon: 0.99 + gamma: 0.9 + epsilon_discount: 0.99999997 diff --git a/rl_studio/config/config_f1_qlearn.yaml b/rl_studio/config/config_f1_qlearn.yaml deleted file mode 100644 index c72ea5f35..000000000 --- a/rl_studio/config/config_f1_qlearn.yaml +++ /dev/null @@ -1,332 +0,0 @@ -##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode -##################################################################################### - -##################################################################################### -# settings: General parameters -# -# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings -##################################################################################### - -settings: - output_dir: "./logs/" - save_model: True - save_positions: True - debug_level: DEBUG - telemetry: False - telemetry_mask: False - plotter_graphic: 
False - model_state_name: f1_camera_parking # autoparking - total_episodes: 50_000 - training_time: 6 - save_episodes: 50 - save_every_step: 1_000 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' - -##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# -##################################################################################### - -agent: - f1: - agent_name: f1 - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - num_regions: 16 - states: - state_space: image #sp1 - image: - 0: [3] - sp1: - 0: [10] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_line #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 - autoparking: - agent_name: autoparking - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - states: - state_space: sp_curb - sp_curb: - poi: 3 - regions: 16 - pixels_cropping: 200 - sp_curb3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_autoparking - discrete_autoparking: - from_1_to_05: 10 - from_05_to_085: 20 - from_085_to_095: 40 - from_others: 1 - penal_reward: -100 - min_reward: 50 - goal_reward: 1100 - -##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. 
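The `discrete_follow_line` reward block above maps the distance between the detected line and the image centre to a few fixed rewards. A compact way to express that mapping is sketched below; it is not RL-Studio's implementation, and reading the `from_0_to_02` / `from_02_to_04` keys as normalised error bands of 0-0.2 and 0.2-0.4 is an assumption taken from their names.

```python
# Hedged sketch of the discrete_follow_line reward values from the config above.
# Assumption: 'error' is the absolute, normalised distance from the image centre
# (0 = centred, 1 = at the border) and the from_* keys name error bands.

REWARDS = {
    "from_0_to_02": 10,
    "from_02_to_04": 2,
    "from_others": 1,
    "penal": -100,
}


def discrete_follow_line_reward(error, line_lost=False):
    """Return the reward for one step given the centring error in [0, 1]."""
    if line_lost:
        return REWARDS["penal"]  # the line left the camera view
    if error < 0.2:
        return REWARDS["from_0_to_02"]
    if error < 0.4:
        return REWARDS["from_02_to_04"]
    return REWARDS["from_others"]


if __name__ == "__main__":
    for e in (0.05, 0.3, 0.7):
        print(e, discrete_follow_line_reward(e))
    print("lost", discrete_follow_line_reward(0.0, line_lost=True))
```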
-# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# -##################################################################################### - -actions: - actions_number: 5 - actions_set: autoparking_simple #simple - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 30 - w_right: -3 - w_left: 3 - autoparking_simple: - 0: [ 3, 0 ] - 1: [ 2, 0 ] - 2: [ 1, 0 ] - 3: [ 0, 0 ] - 4: [ -1, 0 ] - -##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera_follow_line, qlearn_camera_follow_lane, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agent initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. Otherwise, it takes start_pose number - -##################################################################################### - -environments: - simple: - env_name: F1Env-v0 - circuit_name: simple - training_type: qlearn_camera_follow_line #qlearn_camera_follow_line, ddpg, dqn - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 # 0, 1, 2, 3, 4 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - circuit_positions_set: - 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] - 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] - 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] - 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] - nurburgring: - env_name: F1Env-v0 - circuit_name: nurburgring - training_type: qlearn_camera_follow_line - launchfile: nurburgring_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 3500 - sensor: camera - circuit_positions_set: - 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] - 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] - 2: [28.0352, -17.7923, 0, 0.0001, 0.0051, -0.028, 1] - 3: [88.7408, -31.7120, 0, 0.0030, 0.0041, -0.1683, 0.98] - 4: [-73.2172, 11.8508, 0, 0.0043, -0.0027, 0.8517, 0.5173] - 5: [-73.6672, 37.4308, 0, 0.0043, -0.0027, 0.8517, 0.5173] - montreal: - env_name: F1Env-v0 - circuit_name: montreal - training_type: qlearn_camera_follow_line - launchfile: montreal_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 8000 - sensor: camera - circuit_positions_set: - 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] - 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] - 2: [-272.93, -17.70, 0, 0.0001, 0.001, 0.48, 0.87] - 3: [-132.73, 55.82, 0, 0.0030, 0.0041, -0.02, 0.9991] - 4: [294.99, 91.54, 0, 0.0043, -0.0027, 0.14, 0.99] - curves: - 
env_name: F1Env-v0 - circuit_name: curves - training_type: qlearn_camera_follow_line - launchfile: many_curves.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - simple_laser: - env_name: F1Env-v0 - circuit_name: simple_laser - training_type: qlearn_laser - launchfile: f1_montreal.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: laser - manual: - env_name: F1Env-v0 - circuit_name: manual - training_type: qlearn_camera_follow_line - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 4000 - sensor: camera - autoparking: - env_name: AutoparkingEnv-v0 - circuit_name: autoparking - training_type: qlearn #qlearn, dqn, qlearn, manual, ddpg - launchfile: autoparking.launch - environment_folder: autoparking - robot_name: f1_camera_parking # autoparking_f1_camera_laser # - estimated_steps: 50 - sensor: laser #laser, camera_laser, camera - start_pose: 0 - alternate_pose: False - circuit_positions_set: - 0: [5.81294, 4.30243, 0.025329, 0.00013, 0.010376, 3.138] #x, y, z, roll, pitch, waw - 1: [10.0231, -0.720526, 0.025333, 0.000116, 0.010372, 2.5] - 2: [9.81003, 16.7248, 0.025296, 0.0002, 0.010271, -1.92009] - 3: [15.1722, 4.66392, 0.025344, 7.6e-05, 0.010362, -3.12394] - 4: [14.2657, -2.26994, 0.02533, 5.1e-05, 0.010363, -3.12403] - 5: [18.4119, 22.1479, 0.025338, 8.1e-05, 0.010356, -3.12407] - 6: [8.43921, -2.90071, 0.025338, 8.1e-05, 0.010356, 1.55485] - parking_spot_position_x: 2 - parking_spot_position_y: 4.30 - -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: /home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - epsilon_min: 0.05 - gamma: 0.9 - dqn: - alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 diff --git a/rl_studio/config/config_f1followlane_ddpg.yaml b/rl_studio/config/config_f1followlane_ddpg.yaml deleted file mode 100644 index a07a14c95..000000000 --- a/rl_studio/config/config_f1followlane_ddpg.yaml +++ /dev/null @@ -1,289 +0,0 @@ -##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode -##################################################################################### - -##################################################################################### -# settings: General parameters -# 
-# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings -##################################################################################### - -settings: - output_dir: "./logs/" - save_model: True - save_positions: True - debug_level: DEBUG - telemetry: False - telemetry_mask: True - plotter_graphic: False - model_state_name: f1_renault_multicamera_multilaser #f1_renault - total_episodes: 60_000 - training_time: 5 - save_episodes: 5000 - save_every_step: 100 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' - -##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# -##################################################################################### - -agent: - f1: - agent_name: f1 - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - num_regions: 16 - lower_limit: 220 - states: - state_space: sp1 #sp1 - image: - 0: [50] #[60, 120, 180] - sp1: - 0: [50] #[20, 50, 80, 120, 180, 200] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_right_lane #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 - discrete_follow_right_lane: - from_10: 10 - from_02: 2 - from_01: 1 - penal: -100 - min_reward: 5_000 - highest_reward: 100 -##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. 
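For the `continuous` action set, the configs above only declare a range per action (`v_min`/`v_max` for linear velocity, `w_right`/`w_left` for angular velocity). A common pattern is to let the policy emit values in [-1, 1] (e.g. a tanh output) and rescale them to those ranges; the sketch below shows that rescaling under this assumption and is not the actual RL-Studio action handling. The default values are copied from one of the configs in this diff.

```python
# Illustrative rescaling of a policy output in [-1, 1] to the continuous action
# ranges declared in the config. Assumes a tanh-style actor output; not RL-Studio code.

def scale(unit_value, low, high):
    """Map a value in [-1, 1] linearly onto [low, high]."""
    return low + (unit_value + 1.0) * 0.5 * (high - low)


def to_env_action(actor_output, v_min=2.0, v_max=30.0, w_right=-3.0, w_left=3.0):
    """actor_output: (v_unit, w_unit), both expected in [-1, 1]."""
    v_unit, w_unit = actor_output
    v = scale(max(-1.0, min(1.0, v_unit)), v_min, v_max)
    w = scale(max(-1.0, min(1.0, w_unit)), w_right, w_left)
    return v, w


if __name__ == "__main__":
    print(to_env_action((0.0, 0.0)))   # -> (16.0, 0.0): mid-range speed, straight
    print(to_env_action((-1.0, 1.0)))  # -> (2.0, 3.0): slowest speed, full left turn
```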
-# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# -##################################################################################### - -actions: - actions_number: 2 # simple:3, medium:5, hard:7, continuous: 2 - actions_set: continuous #simple, continuous - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 3 - w_right: -1 - w_left: 1 - - -##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agnet initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. Otherwise, it takes start_pose number - -##################################################################################### - -environments: - simple: - env_name: F1Env-v0 - circuit_name: simple - training_type: ddpg_follow_lane #qlearn_camera_follow_line, qlearn_laser_follow_line, qlearn_camera_follow_lane, ddpg_follow_line, ddpg_follow_lane, dqn_follow_line, dqn_follow_lane - launchfile: simple_circuit_no_wall.launch - environment_folder: f1 - robot_name: f1_renault_multicamera_multilaser #f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 20_000 - sensor: camera - circuit_positions_set: - 0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve - #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line - #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] - #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] - #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] - 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] - #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] - 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] - #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] - 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] - 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve - - nurburgring: - env_name: F1Env-v0 - circuit_name: nurburgring - training_type: qlearn_camera - launchfile: nurburgring_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 3500 - sensor: camera - circuit_positions_set: - 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] - 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] - 2: [28.0352, -17.7923, 0, 0.0001, 0.0051, -0.028, 1] - 3: [88.7408, -31.7120, 0, 0.0030, 0.0041, -0.1683, 0.98] - 4: [-73.2172, 11.8508, 0, 0.0043, -0.0027, 0.8517, 0.5173] - 5: [-73.6672, 37.4308, 0, 0.0043, -0.0027, 0.8517, 0.5173] - montreal: - env_name: F1Env-v0 - circuit_name: montreal - training_type: qlearn_camera - launchfile: montreal_line.launch - 
environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 8000 - sensor: camera - circuit_positions_set: - 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] - 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] - 2: [-272.93, -17.70, 0, 0.0001, 0.001, 0.48, 0.87] - 3: [-132.73, 55.82, 0, 0.0030, 0.0041, -0.02, 0.9991] - 4: [294.99, 91.54, 0, 0.0043, -0.0027, 0.14, 0.99] - curves: - env_name: F1Env-v0 - circuit_name: curves - training_type: qlearn_camera - launchfile: many_curves.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - simple_laser: - env_name: F1Env-v0 - circuit_name: simple_laser - training_type: qlearn_laser - launchfile: f1_montreal.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: laser - manual: - env_name: F1Env-v0 - circuit_name: manual - training_type: qlearn_camera - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: /home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - gamma: 0.9 - dqn: - alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 diff --git a/rl_studio/config/config_f1followlaneimage_ddpg.yaml b/rl_studio/config/config_f1followlaneimage_ddpg.yaml deleted file mode 100644 index 338d18324..000000000 --- a/rl_studio/config/config_f1followlaneimage_ddpg.yaml +++ /dev/null @@ -1,290 +0,0 @@ -##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode -##################################################################################### - -##################################################################################### -# settings: General parameters -# -# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings -##################################################################################### - -settings: - task: follow_lane - output_dir: "./logs/" - save_model: True - save_positions: True - debug_level: DEBUG - telemetry: False - telemetry_mask: True - plotter_graphic: False - model_state_name: f1_renault_multicamera_multilaser #f1_renault - 
total_episodes: 60_000 - training_time: 5 - save_episodes: 5000 - save_every_step: 100 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' - -##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# -##################################################################################### - -agent: - f1: - agent_name: f1 - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - num_regions: 16 - lower_limit: 220 - states: - state_space: image #sp1 - image: - 0: [50] #[60, 120, 180] - sp1: - 0: [50] #[20, 50, 80, 120, 180, 200] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_right_lane #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 - discrete_follow_right_lane: - from_10: 10 - from_02: 2 - from_01: 1 - penal: -100 - min_reward: 5_000 - highest_reward: 100 -##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. 
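The `ddpg` blocks in these configs expose `tau` for the soft target-network update and `std_dev` for the exploration noise. The snippet below sketches how those two parameters are typically applied in a DDPG implementation; it is a generic illustration with plain Python lists standing in for network weights, not the TensorFlow code used by RL-Studio.

```python
# Generic DDPG building blocks driven by the config values tau and std_dev.
# Plain lists stand in for network weights; not RL-Studio's TensorFlow code.
import random

TAU = 0.005      # algorithm.ddpg.tau in the configs above
STD_DEV = 0.2    # algorithm.ddpg.std_dev in the configs above


def soft_update(target_weights, source_weights, tau=TAU):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_weights, source_weights)]


def noisy_action(action, std_dev=STD_DEV):
    """Add zero-mean Gaussian exploration noise to each action component."""
    return [a + random.gauss(0.0, std_dev) for a in action]


if __name__ == "__main__":
    target = [0.0, 0.0, 0.0]
    online = [1.0, -1.0, 0.5]
    print(soft_update(target, online))  # small step of the targets towards the online weights
    print(noisy_action([0.2, -0.1]))    # perturbed (v, w) command for exploration
```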
-# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# -##################################################################################### - -actions: - actions_number: 2 # simple:3, medium:5, hard:7, continuous: 2 - actions_set: continuous #simple, continuous - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 2 - w_right: -1 - w_left: 1 - - -##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agnet initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. Otherwise, it takes start_pose number - -##################################################################################### - -environments: - simple: - env_name: F1Env-v0 - circuit_name: simple - training_type: ddpg_follow_lane #qlearn_camera_follow_line, qlearn_laser_follow_line, qlearn_camera_follow_lane, ddpg_follow_line, ddpg_follow_lane, dqn_follow_line, dqn_follow_lane - launchfile: simple_circuit_no_wall.launch - environment_folder: f1 - robot_name: f1_renault_multicamera_multilaser #f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 20_000 - sensor: camera - circuit_positions_set: - 0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve - #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line - #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] - #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] - #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] - 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] - #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] - 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] - #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] - 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] - 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve - - nurburgring: - env_name: F1Env-v0 - circuit_name: nurburgring - training_type: qlearn_camera - launchfile: nurburgring_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 3500 - sensor: camera - circuit_positions_set: - 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] - 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] - 2: [28.0352, -17.7923, 0, 0.0001, 0.0051, -0.028, 1] - 3: [88.7408, -31.7120, 0, 0.0030, 0.0041, -0.1683, 0.98] - 4: [-73.2172, 11.8508, 0, 0.0043, -0.0027, 0.8517, 0.5173] - 5: [-73.6672, 37.4308, 0, 0.0043, -0.0027, 0.8517, 0.5173] - montreal: - env_name: F1Env-v0 - circuit_name: montreal - training_type: qlearn_camera - launchfile: montreal_line.launch - 
environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 8000 - sensor: camera - circuit_positions_set: - 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] - 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] - 2: [-272.93, -17.70, 0, 0.0001, 0.001, 0.48, 0.87] - 3: [-132.73, 55.82, 0, 0.0030, 0.0041, -0.02, 0.9991] - 4: [294.99, 91.54, 0, 0.0043, -0.0027, 0.14, 0.99] - curves: - env_name: F1Env-v0 - circuit_name: curves - training_type: qlearn_camera - launchfile: many_curves.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - simple_laser: - env_name: F1Env-v0 - circuit_name: simple_laser - training_type: qlearn_laser - launchfile: f1_montreal.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: laser - manual: - env_name: F1Env-v0 - circuit_name: manual - training_type: qlearn_camera - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: /home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - gamma: 0.9 - dqn: - alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor32x64_Critic32x64x256_actionscont_stateimg - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 diff --git a/rl_studio/config/config_f1followline_ddpg.yaml b/rl_studio/config/config_f1followline_ddpg.yaml deleted file mode 100644 index d142d5319..000000000 --- a/rl_studio/config/config_f1followline_ddpg.yaml +++ /dev/null @@ -1,274 +0,0 @@ -##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode -##################################################################################### - -##################################################################################### -# settings: General parameters -# -# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings -##################################################################################### - -settings: - output_dir: "./logs/" - save_model: True - save_positions: True - debug_level: DEBUG - telemetry: False - telemetry_mask: False - plotter_graphic: False - model_state_name: f1_renault - total_episodes: 50_000 - training_time: 6 - save_episodes: 50 - 
save_every_step: 1_000 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' - -##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# -##################################################################################### - -agent: - f1: - agent_name: f1 - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - num_regions: 16 - states: - state_space: image #sp1 - image: - 0: [3] - sp1: - 0: [10] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_line #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 - -##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. -# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# -##################################################################################### - -actions: - actions_number: 2 - actions_set: continuous #simple - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 30 - w_right: -3 - w_left: 3 - - -##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. 
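As this comment block describes, each circuit lists several candidate poses in `circuit_positions_set`, and `alternate_pose` decides whether training always starts from the `start_pose` entry or from a randomly chosen one. A small sketch of that selection logic is shown below; it is illustrative only, does not call any Gazebo API, and the example poses are copied from the simple circuit configuration in this file.

```python
# Illustrative start-pose selection (not RL-Studio code, no Gazebo calls).
# circuit_positions_set maps an index to [x, y, z, 0, roll, pitch, yaw];
# alternate_pose picks a random entry, otherwise start_pose is used.
import random

CIRCUIT_POSITIONS_SET = {
    0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57],
    1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57],
    2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56],
}


def choose_start_pose(positions, start_pose=0, alternate_pose=False):
    """Return the pose the agent is reset to at the start of an episode."""
    if alternate_pose:
        return positions[random.choice(list(positions))]
    return positions[start_pose]


if __name__ == "__main__":
    print(choose_start_pose(CIRCUIT_POSITIONS_SET, start_pose=0))
    print(choose_start_pose(CIRCUIT_POSITIONS_SET, alternate_pose=True))
```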
Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agnet initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. Otherwise, it takes start_pose number - -##################################################################################### - -environments: - simple: - env_name: F1Env-v0 - circuit_name: simple - training_type: ddpg #qlearn_camera - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - circuit_positions_set: - 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] - 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] - 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] - 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] - nurburgring: - env_name: F1Env-v0 - circuit_name: nurburgring - training_type: qlearn_camera - launchfile: nurburgring_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 3500 - sensor: camera - circuit_positions_set: - 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] - 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] - 2: [28.0352, -17.7923, 0, 0.0001, 0.0051, -0.028, 1] - 3: [88.7408, -31.7120, 0, 0.0030, 0.0041, -0.1683, 0.98] - 4: [-73.2172, 11.8508, 0, 0.0043, -0.0027, 0.8517, 0.5173] - 5: [-73.6672, 37.4308, 0, 0.0043, -0.0027, 0.8517, 0.5173] - montreal: - env_name: F1Env-v0 - circuit_name: montreal - training_type: qlearn_camera - launchfile: montreal_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 8000 - sensor: camera - circuit_positions_set: - 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] - 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] - 2: [-272.93, -17.70, 0, 0.0001, 0.001, 0.48, 0.87] - 3: [-132.73, 55.82, 0, 0.0030, 0.0041, -0.02, 0.9991] - 4: [294.99, 91.54, 0, 0.0043, -0.0027, 0.14, 0.99] - curves: - env_name: F1Env-v0 - circuit_name: curves - training_type: qlearn_camera - launchfile: many_curves.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - simple_laser: - env_name: F1Env-v0 - circuit_name: simple_laser - training_type: qlearn_laser - launchfile: f1_montreal.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: laser - manual: - env_name: F1Env-v0 - circuit_name: manual - training_type: qlearn_camera - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: /home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - gamma: 0.9 - dqn: - 
alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 diff --git a/rl_studio/config/config_f1followline_dqn.yaml b/rl_studio/config/config_f1followline_dqn.yaml deleted file mode 100644 index fb8740118..000000000 --- a/rl_studio/config/config_f1followline_dqn.yaml +++ /dev/null @@ -1,274 +0,0 @@ -##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode -##################################################################################### - -##################################################################################### -# settings: General parameters -# -# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings -##################################################################################### - -settings: - output_dir: "./logs/" - save_model: True - save_positions: True - debug_level: DEBUG - telemetry: False - telemetry_mask: False - plotter_graphic: False - model_state_name: f1_renault - total_episodes: 50_000 - training_time: 6 - save_episodes: 50 - save_every_step: 1_000 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' - -##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. 
Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# -##################################################################################### - -agent: - f1: - agent_name: f1 - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 # let fix it in 100 - new_image_size: 32 - num_regions: 16 - states: - state_space: image #sp1 - image: - 0: [3] - sp1: - 0: [10] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_line #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 - -##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. -# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# -##################################################################################### - -actions: - actions_number: 3 - actions_set: simple #simple - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 30 - w_right: -3 - w_left: 3 - - -##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agnet initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. 
Otherwise, it takes start_pose number - -##################################################################################### - -environments: - simple: - env_name: F1Env-v0 - circuit_name: simple - training_type: dqn_follow_line #qlearn_camera_follow_line, qlearn_laser_follow_line, qlearn_camera_follow_lane, ddpg_follow_line, ddpg_follow_lane, dqn_follow_line, dqn_follow_lane - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - circuit_positions_set: - 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] - 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] - 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] - 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] - nurburgring: - env_name: F1Env-v0 - circuit_name: nurburgring - training_type: qlearn_camera - launchfile: nurburgring_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 3500 - sensor: camera - circuit_positions_set: - 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] - 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] - 2: [28.0352, -17.7923, 0, 0.0001, 0.0051, -0.028, 1] - 3: [88.7408, -31.7120, 0, 0.0030, 0.0041, -0.1683, 0.98] - 4: [-73.2172, 11.8508, 0, 0.0043, -0.0027, 0.8517, 0.5173] - 5: [-73.6672, 37.4308, 0, 0.0043, -0.0027, 0.8517, 0.5173] - montreal: - env_name: F1Env-v0 - circuit_name: montreal - training_type: qlearn_camera - launchfile: montreal_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 8000 - sensor: camera - circuit_positions_set: - 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] - 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] - 2: [-272.93, -17.70, 0, 0.0001, 0.001, 0.48, 0.87] - 3: [-132.73, 55.82, 0, 0.0030, 0.0041, -0.02, 0.9991] - 4: [294.99, 91.54, 0, 0.0043, -0.0027, 0.14, 0.99] - curves: - env_name: F1Env-v0 - circuit_name: curves - training_type: qlearn_camera - launchfile: many_curves.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - simple_laser: - env_name: F1Env-v0 - circuit_name: simple_laser - training_type: qlearn_laser - launchfile: f1_montreal.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: laser - manual: - env_name: F1Env-v0 - circuit_name: manual - training_type: qlearn_camera - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: /home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - gamma: 0.9 - dqn: - alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: 
DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 diff --git a/rl_studio/config/config_f1followline_qlearn.yaml b/rl_studio/config/config_f1followline_qlearn.yaml deleted file mode 100644 index 33058458c..000000000 --- a/rl_studio/config/config_f1followline_qlearn.yaml +++ /dev/null @@ -1,331 +0,0 @@ -##################################################################################### -# General configuration file to configure RL-Studio in the training or inference mode -##################################################################################### - -##################################################################################### -# settings: General parameters -# -# Most relevant params: -# model_state_name: agent name -# total_episodes: training epochs -# training_time: in hours -# save_episodes: variable for TensorFlow savings -##################################################################################### - -settings: - output_dir: "./logs/" - save_model: True - save_positions: True - telemetry: False - telemetry_mask: False - plotter_graphic: False - model_state_name: f1_camera_parking # autoparking - total_episodes: 50_000 - training_time: 4 - save_episodes: 50 - save_every_step: 1_000 - lap_completed: False - load_qlearning_pickle_model: False - load_qlearning_pickle_file: 1_20210622_1512_actions_set_simple_epsilon_0.99_QTABLE.pkl - load_qlearning_model: False - load_qlearning_table: train_qlearning_f1_simple_EPISODE_1_20210625-082424-qtable.npy - ros_master_uri: '11311' - gazebo_master_uri: '11345' - -##################################################################################### -# agent: every agent configures states, rewards and sensors -# -# Most relevant params: -# image_resizing: percentage of image redimension to feed neural nets. I.e. 10 means a width of 64 pixels and height of 48 pixels -# num_regions: in simplified perception, number of image vertical divisions in which every state falls -# new_image_size: every image size is fixed in to feed neural net. I.e. 32 means a size of 32x32 pixels -# state_space: configurates how input data is feeding. Image means raw data from camera sensor. sp1,...spn means simplified perception of 1 to n points. -# image: 0: distance from image midline down in pixels -# sp1, sp3, sp5, spn: simplified perception with 1, 3, 5 or n points respectively. 
Every number represents pixels from image midline down -# reward_function: discrete_follow_line represents a hardcoded reward function in follow line project, linear_follow_line means regression function in follow line project -# -##################################################################################### - -agent: - f1: - agent_name: f1 - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - num_regions: 16 - states: - state_space: sp1 #sp1 - image: - 0: [3] - sp1: - 0: [10] - sp3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_follow_line #linear_follow_line - discrete_follow_line: - from_0_to_02: 10 - from_02_to_04: 2 - from_others: 1 - penal: -100 - min_reward: 1_000 - highest_reward: 100 - linear_follow_line: - beta_0: 3 - beta_1: -0.1 - penal: 0 - min_reward: 1_000 - highest_reward: 100 - autoparking: - agent_name: autoparking - camera_params: - width: 640 - height: 480 - center_image: 320 - raw_image: False - image_resizing: 100 - new_image_size: 32 - states: - state_space: sp_curb - sp_curb: - poi: 3 - regions: 16 - pixels_cropping: 200 - sp_curb3: - 0: [5, 15, 22] - sp5: - 0: [3, 5, 10, 15, 20] - spn: - 0: [10] - rewards: - reward_function: discrete_autoparking - discrete_autoparking: - from_1_to_05: 10 - from_05_to_085: 20 - from_085_to_095: 40 - from_others: 1 - penal_reward: -100 - min_reward: 50 - goal_reward: 1100 - -##################################################################################### -# actions: mainly divided into continuous and discrete sets of actions. In continuous for plannar agents it is divided in min and max. -# In other cases, we create a set of actions, 3, 5 or more, where every one is [linear velocity, angular velocity] -# -# actions_number: for plannar agents, two actions are executed -# simple: -# 0: [3 m/sec, 0 rad/sec] -# 1: [2 m/sec, 1 rad/sec] -# 2: [2 m/sec, -1 rad/sec] -# -##################################################################################### - -actions: - actions_number: 3 - actions_set: simple #simple - available_actions: - simple: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - medium: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1, 1.5 ] - 4: [ 1, -1.5 ] - hard: - 0: [ 3, 0 ] - 1: [ 2, 1 ] - 2: [ 2, -1 ] - 3: [ 1.5, 1 ] - 4: [ 1.5, -1 ] - 5: [ 1, -1.5 ] - 6: [ 1, -1.5 ] - test: - 0: [ 0, 0 ] - continuous: - v_min: 2 - v_max: 30 - w_right: -3 - w_left: 3 - autoparking_simple: - 0: [ 3, 0 ] - 1: [ 2, 0 ] - 2: [ 1, 0 ] - 3: [ 0, 0 ] - 4: [ -1, 0 ] - -##################################################################################### -# environments: configurates every param in all envs. -# -# Most relevant params: -# env_name: F1Env-v0, RobotMeshEnv-v0, myCartpole-v0, MyMountainCarEnv-v0 -# training_type: qlearn_camera, qlearn_laser, dqn, manual, ddpg -# circuit_positions_set: different positions in Gazebo simulator for every environment. Set represents x, y, z, 0, roll, pitch, yaw -# start_pose: agent initial pose in every training. It takes number from circuit_positions_set param -# alternate_pose: if True, the agent randoms initial pose, taking from circuit_positions_set param. 
Otherwise, it takes start_pose number - -##################################################################################### - -environments: - simple: - env_name: F1Env-v0 - circuit_name: simple - training_type: qlearn_camera_follow_line #qlearn_camera_follow_lane, dqn_follow_line, dqn_follow_lane, ddpg_follow_line, ddpg_follow_lane - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 # 0, 1, 2, 3, 4 - alternate_pose: False - estimated_steps: 15_000 - sensor: camera - circuit_positions_set: - 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] - 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] - 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] - 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] - nurburgring: - env_name: F1Env-v0 - circuit_name: nurburgring - training_type: qlearn_camera - launchfile: nurburgring_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 3500 - sensor: camera - circuit_positions_set: - 0: [-32.3188, 12.2921, 0, 0.0014, 0.0049, -0.2727, 0.9620] - 1: [-30.6566, -21.4929, 0, 0.0014, 0.0049, -0.4727, 0.8720] - 2: [28.0352, -17.7923, 0, 0.0001, 0.0051, -0.028, 1] - 3: [88.7408, -31.7120, 0, 0.0030, 0.0041, -0.1683, 0.98] - 4: [-73.2172, 11.8508, 0, 0.0043, -0.0027, 0.8517, 0.5173] - 5: [-73.6672, 37.4308, 0, 0.0043, -0.0027, 0.8517, 0.5173] - montreal: - env_name: F1Env-v0 - circuit_name: montreal - training_type: qlearn_camera - launchfile: montreal_line.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 8000 - sensor: camera - circuit_positions_set: - 0: [-201.88, -91.02, 0, 0.00, 0.001, 0.98, -0.15] - 1: [-278.71, -95.50, 0, 0.00, 0.001, 1, 0.03] - 2: [-272.93, -17.70, 0, 0.0001, 0.001, 0.48, 0.87] - 3: [-132.73, 55.82, 0, 0.0030, 0.0041, -0.02, 0.9991] - 4: [294.99, 91.54, 0, 0.0043, -0.0027, 0.14, 0.99] - curves: - env_name: F1Env-v0 - circuit_name: curves - training_type: qlearn_camera - launchfile: many_curves.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: camera - simple_laser: - env_name: F1Env-v0 - circuit_name: simple_laser - training_type: qlearn_laser - launchfile: f1_montreal.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: False - estimated_steps: 4000 - sensor: laser - manual: - env_name: F1Env-v0 - circuit_name: manual - training_type: qlearn_camera - launchfile: simple_circuit.launch - environment_folder: f1 - robot_name: f1_renault - start_pose: 0 - alternate_pose: True - estimated_steps: 4000 - sensor: camera - autoparking: - env_name: AutoparkingEnv-v0 - circuit_name: autoparking - training_type: qlearn #qlearn, dqn, qlearn, manual, ddpg - launchfile: autoparking.launch - environment_folder: autoparking - robot_name: f1_camera_parking # autoparking_f1_camera_laser # - estimated_steps: 50 - sensor: laser #laser, camera_laser, camera - start_pose: 0 - alternate_pose: False - circuit_positions_set: - 0: [5.81294, 4.30243, 0.025329, 0.00013, 0.010376, 3.138] #x, y, z, roll, pitch, waw - 1: [10.0231, -0.720526, 0.025333, 0.000116, 0.010372, 2.5] - 2: [9.81003, 16.7248, 0.025296, 0.0002, 0.010271, -1.92009] - 3: [15.1722, 4.66392, 0.025344, 7.6e-05, 0.010362, -3.12394] - 4: [14.2657, -2.26994, 0.02533, 5.1e-05, 0.010363, -3.12403] - 5: [18.4119, 22.1479, 0.025338, 8.1e-05, 0.010356, -3.12407] - 6: 
[8.43921, -2.90071, 0.025338, 8.1e-05, 0.010356, 1.55485] - parking_spot_position_x: 2 - parking_spot_position_y: 4.30 - -##################################################################################### -# inference: loading training files - -##################################################################################### -inference: - qlearn: - inference_file: /home/rubenlucas93/1_20220428_2115_act_set_simple_epsilon_0.8_QTABLE.pkl - actions_file: /home/rubenlucas93/actions_set_20220428_2115 - -##################################################################################### -# algorithm: every particular param - -##################################################################################### -algorithm: - qlearn: - alpha: 0.2 - epsilon: 0.95 - epsilon_min: 0.05 - gamma: 0.9 - dqn: - alpha: 0.8 - gamma: 0.9 - epsilon: 0.99 - epsilon_discount: 0.9986 - epsilon_min: 0.05 - model_name: DQN_sp_16x16 - replay_memory_size: 50_000 - min_replay_memory_size: 1000 - minibatch_size: 64 - update_target_every: 5 - memory_fraction: 0.20 - buffer_capacity: 100_000 - batch_size: 64 - sarsa: - ddpg: - gamma: 0.9 - tau: 0.005 - std_dev: 0.2 - model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 - replay_memory_size: 50_000 - memory_fraction: 0.20 - critic_lr: 0.002 - actor_lr: 0.001 - buffer_capacity: 100_000 - batch_size: 64 diff --git a/rl_studio/config/config_inference_followlane_ddpg_f1_gazebo.yaml b/rl_studio/config/config_inference_followlane_ddpg_f1_gazebo.yaml new file mode 100644 index 000000000..f4ef83a55 --- /dev/null +++ b/rl_studio/config/config_inference_followlane_ddpg_f1_gazebo.yaml @@ -0,0 +1,150 @@ +settings: + mode: inference # training, retraining, inference + task: follow_lane_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: ddpg # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: continuous # continuous, simple, medium, hard, test, autoparking_simple + states: image #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: follow_right_lane_only_center # rewards_followline_center + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 5 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + ddpg: + retrain_ddpg_tf_actor_model_name: + retrain_ddpg_tf_critic_model_name: + +inference: + ddpg: + inference_ddpg_tf_actor_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BESTLAP_ACTOR_Max-69_Epoch-4_State-image_Actions-continuous_Rewards-follow_right_lane_center_v_w_linear_inTime-20230111-200026.h5" + inference_ddpg_tf_critic_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BESTLAP_CRITIC_Max-69_Epoch-4_State-image_Actions-continuous_Rewards-follow_right_lane_center_v_w_linear_inTime-20230111-200026.h5" + +algorithm: + ddpg: + gamma: 0.9 + tau: 0.005 + std_dev: 0.2 + model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 + replay_memory_size: 50_000 + memory_fraction: 0.20 + critic_lr: 0.002 + actor_lr: 0.001 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 
480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + continuous: + v: [2, 30] + w: [-3, 3] + +rewards: + follow_right_lane_only_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_step: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_w_linear: # only for continuous actions + beta_0: 3 + beta_1: -0.1 + penal: 0 + min_reward: 1_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 100 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve + #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line + #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] + #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] + #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] + #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] + 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] + 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve diff --git a/rl_studio/config/config_inference_followlane_dqn_f1_gazebo.yaml b/rl_studio/config/config_inference_followlane_dqn_f1_gazebo.yaml new file mode 100644 index 000000000..90853064d --- /dev/null +++ b/rl_studio/config/config_inference_followlane_dqn_f1_gazebo.yaml @@ -0,0 +1,142 @@ +settings: + mode: inference + task: follow_lane_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: dqn # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test + states: sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: follow_right_lane_only_center + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 5 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + dqn: + retrain_dqn_tf_model_name: + +inference: + dqn: + inference_dqn_tf_model_name: "DQN_sp_16x16_LAPCOMPLETED_Max165_Epoch1_inTime20221222-171814.model" 
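For context on how the `epsilon`, `epsilon_discount`, and `epsilon_min` values in the `algorithm: dqn:` blocks of these configs relate to each other during training or retraining, here is a minimal, illustrative Python sketch of a multiplicative epsilon-decay schedule. It only assumes a YAML file shaped like the one above; the file name, the loop, and the way the values are read are illustrative and do not describe RL-Studio's actual trainer.

```python
import yaml  # PyYAML

# Illustrative only: assumes a config laid out like the YAML above.
with open("config_inference_followlane_dqn_f1_gazebo.yaml") as f:
    cfg = yaml.safe_load(f)

dqn = cfg["algorithm"]["dqn"]
epsilon = dqn["epsilon"]  # e.g. 0.99: initial exploration rate

for episode in range(cfg["settings"]["total_episodes"]):
    # ... run one episode with epsilon-greedy action selection ...
    # Multiplicative decay, floored at epsilon_min (e.g. 0.05).
    epsilon = max(dqn["epsilon_min"], epsilon * dqn["epsilon_discount"])
```

With `epsilon: 0.99`, `epsilon_discount: 0.9986`, and `epsilon_min: 0.05`, exploration decays slowly (roughly 0.14% per episode) until it reaches the floor, which is why the discount is set so close to 1 for long runs.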
+ +algorithm: + dqn: + alpha: 0.8 + gamma: 0.9 + epsilon: 0.99 + epsilon_discount: 0.9986 + epsilon_min: 0.05 + model_name: DQN_sp_16x16 + replay_memory_size: 50_000 + min_replay_memory_size: 1000 + minibatch_size: 64 + update_target_every: 5 + memory_fraction: 0.20 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + follow_right_lane_only_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_step: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 100 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve + #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line + #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] + #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] + #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] + #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] + 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] + 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve diff --git a/rl_studio/config/config_inference_followlane_qlearn_f1_gazebo.yaml b/rl_studio/config/config_inference_followlane_qlearn_f1_gazebo.yaml new file mode 100644 index 000000000..7d1726dd3 --- /dev/null +++ b/rl_studio/config/config_inference_followlane_qlearn_f1_gazebo.yaml @@ -0,0 +1,132 @@ +settings: + mode: inference # training, retraining, inference + task: follow_lane_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: qlearn # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # continuous, simple, medium, hard, test, autoparking_simple + states: sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: follow_right_lane_only_center # rewards_followline_center + framework: _ + total_episodes: 10 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + 
metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + qlearn: + retrain_qlearn_model_name: + +inference: + qlearn: + inference_qlearn_model_name: "20230109-154549_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.76_epoch-2_step-101_reward-919-qtable.npy" + +algorithm: + qlearn: + alpha: 0.2 + epsilon: 0.95 + epsilon_min: 0.05 + gamma: 0.9 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + follow_right_lane_only_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_step: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit_no_wall.launch #simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault_multicamera_multilaser # f1_renault, + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 100 + sensor: camera + save_episodes: 1 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [52.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + #0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve + #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line + #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] + #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] + #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] + #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] + 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] + 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve diff --git a/rl_studio/config/config_inference_followline_ddpg_f1_gazebo.yaml b/rl_studio/config/config_inference_followline_ddpg_f1_gazebo.yaml new file mode 100644 index 000000000..6c2000f44 --- /dev/null +++ b/rl_studio/config/config_inference_followline_ddpg_f1_gazebo.yaml @@ -0,0 +1,136 @@ +settings: + mode: inference # training, retraining, inference + task: follow_line_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: ddpg # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: continuous # continuous, simple, medium, hard, test, autoparking_simple + states: image #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) 
+ rewards: followline_center # rewards_followline_center + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 10 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + ddpg: + retrain_ddpg_tf_actor_model_name: + retrain_ddpg_tf_critic_model_name: + +inference: + ddpg: + inference_ddpg_tf_actor_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BATCH_ACTOR_Max--100_Epoch-5_State-image_Actions-continuous_Rewards-followline_center_inTime-20230111-183048.h5" + inference_ddpg_tf_critic_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BATCH_CRITIC_Max--100_Epoch-5_State-image_Actions-continuous_Rewards-followline_center_inTime-20230111-183048.h5" + +algorithm: + ddpg: + gamma: 0.9 + tau: 0.005 + std_dev: 0.2 + model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 + replay_memory_size: 50_000 + memory_fraction: 0.20 + critic_lr: 0.002 + actor_lr: 0.001 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + continuous: + v: [2, 30] + w: [-3, 3] + +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + followline_center_v_w_linear: # only for continuous actions + beta_0: 3 + beta_1: -0.1 + penal: 0 + min_reward: 1_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 100 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] diff --git a/rl_studio/config/config_inference_followline_dqn_f1_gazebo.yaml b/rl_studio/config/config_inference_followline_dqn_f1_gazebo.yaml new file mode 100644 index 000000000..2e1d342b5 --- /dev/null +++ b/rl_studio/config/config_inference_followline_dqn_f1_gazebo.yaml @@ -0,0 +1,128 @@ +settings: + mode: inference + task: follow_line_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: dqn # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test + states: 
sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: followline_center # rewards_followline_center + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 5 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + dqn: + retrain_dqn_tf_model_name: + +inference: + dqn: + inference_dqn_tf_model_name: "DQN_sp_16x16_LAPCOMPLETED_Max990_Epoch3_inTime20230110-180135.model" + +algorithm: + dqn: + alpha: 0.8 + gamma: 0.9 + epsilon: 0.99 + epsilon_discount: 0.9986 + epsilon_min: 0.05 + model_name: DQN_sp_16x16 + replay_memory_size: 50_000 + min_replay_memory_size: 1000 + minibatch_size: 64 + update_target_every: 5 + memory_fraction: 0.20 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 50 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] diff --git a/rl_studio/config/config_inference_followline_qlearn_f1_gazebo.yaml b/rl_studio/config/config_inference_followline_qlearn_f1_gazebo.yaml new file mode 100644 index 000000000..478074632 --- /dev/null +++ b/rl_studio/config/config_inference_followline_qlearn_f1_gazebo.yaml @@ -0,0 +1,116 @@ +settings: + mode: inference # training, retraining, inference + task: follow_line_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: qlearn # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test + states: sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: followline_center # rewards_followline_center + framework: _ + total_episodes: 5 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: 
"./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + qlearn: + retrain_qlearn_model_name: + +inference: + qlearn: + inference_qlearn_model_name: "20230105-174932_Circuit-simple_States-sp1_Actions-simple_Rewards-followline_center_epsilon-0.05_epoch-10_step-15001_reward-134200-qtable.npy" + +algorithm: + qlearn: + alpha: 0.2 + epsilon: 0.95 + epsilon_min: 0.05 + gamma: 0.9 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 +states: + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # autoparking + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 100 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] diff --git a/rl_studio/config/config_mountain_car.yaml b/rl_studio/config/config_mountain_car_qlearn.yaml similarity index 71% rename from rl_studio/config/config_mountain_car.yaml rename to rl_studio/config/config_mountain_car_qlearn.yaml index 3b1237df0..28dc982c4 100644 --- a/rl_studio/config/config_mountain_car.yaml +++ b/rl_studio/config/config_mountain_car_qlearn.yaml @@ -3,16 +3,14 @@ settings: save_model: True save_positions: True telemetry: False + mode: inference + agent: mountain_car + algorithm: qlearn actions: - actions_number: 3 - actions_set: simple - available_actions: - # [lineal, angular] - simple: - 0: [ 0, 0, 0, 0] - 1: [ 0, 0, -0.9, -0.9] - 2: [ 0.9, 0.9, 0, 0] + 0: [ 0, 0, 0, 0] + 1: [ 0, 0, -0.9, -0.9] + 2: [ 0.9, 0.9, 0, 0] agent: mountain_car: @@ -21,7 +19,6 @@ agent: height: 480 environments: - simple: env_name: MyMountainCarEnv-v0 circuit_name: simple training_type: qlearn_camera @@ -54,14 +51,12 @@ environments: # estimated_steps: 4000 # sensor: camera +inference: + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/mountain_car/1_20221226_1928_epsilon_0.01_QTABLE.pkl + actions_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/mountain_car/actions_set_20221226_1814 algorithm: - qlearn_mountain: - alpha: 0.9 - epsilon: 0.95 - gamma: 0.9 - epsilon_discount: 0.99995 - - dqn: - - sarsa: + alpha: 0.9 + epsilon: 0.95 + gamma: 0.9 + epsilon_discount: 0.99995 diff --git a/rl_studio/config/config_pendulum_ddpg.yaml b/rl_studio/config/config_pendulum_ddpg.yaml index 9a64a4023..462db43e4 100755 --- a/rl_studio/config/config_pendulum_ddpg.yaml +++ b/rl_studio/config/config_pendulum_ddpg.yaml @@ 
-4,6 +4,9 @@ settings: save_positions: True telemetry: False logging_level: info + mode: inference + agent: pendulum + algorithm: ddpg_torch # TODO make this section optional actions: @@ -11,37 +14,25 @@ actions: simple: agent: - pendulum: - # TODO To be removed - camera_params: - witdh: 640 - height: 480 + # TODO To be removed + camera_params: + witdh: 640 + height: 480 environments: - simple: env_name: Pendulum-v1 - environment_folder: cartpole + environment_folder: pendulum # runs: 20000 runs: 20000 full_experimentation_runs: 0 update_every: 20 show_every: 50 - objective_reward: -350 -# block_experience_batch: False - block_experience_batch: False - # random_start_level: 0.05 - random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position - random_perturbations_level: 0.8 # Number between 0 and 1 that indicates the frequency of the random perturbations - perturbations_intensity_std: 1 # Number between 0 and 1 that indicates the standard deviation of perturbations intensity - initial_pole_angle: 0 - non_recoverable_angle: 0.3 + objective_reward: -430 inference: - ddpg_torch: - inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/pendulum/ddpg/checkpoints/20221219_0112_actor_avg_-381.2062594472542.pkl + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/pendulum/ddpg/20221231_0100_actor_avg_-392.54141588266396.pkl algorithm: - ddpg_torch: - gamma: 0.99 - hidden_size: 512 - batch_size: 128 \ No newline at end of file + gamma: 0.99 + hidden_size: 512 + batch_size: 128 \ No newline at end of file diff --git a/rl_studio/config/config_pendulum_ppo.yaml b/rl_studio/config/config_pendulum_ppo.yaml new file mode 100755 index 000000000..0748963e4 --- /dev/null +++ b/rl_studio/config/config_pendulum_ppo.yaml @@ -0,0 +1,46 @@ +settings: + output_dir: "./logs/" + save_model: True + save_positions: True + telemetry: False + logging_level: info + mode: training + agent: pendulum + algorithm: ppo_continuous + +# TODO make this section optional +actions: + available_actions: + simple: + +agent: + # TODO To be removed + camera_params: + witdh: 640 + height: 480 + +environments: + env_name: Pendulum-v1 + environment_folder: pendulum +# runs: 20000 + runs: 20000 + full_experimentation_runs: 0 + update_every: 200 + show_every: 1000 + objective_reward: -350 +# block_experience_batch: False + block_experience_batch: False + # random_start_level: 0.05 + random_start_level: 0 # Number between 0 and 1 that indicates the difficulty of the start position + random_perturbations_level: 0.8 # Number between 0 and 1 that indicates the frequency of the random perturbations + perturbations_intensity_std: 1 # Number between 0 and 1 that indicates the standard deviation of perturbations intensity + initial_pole_angle: 0 + non_recoverable_angle: 0.3 + +inference: + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/logs/pendulum/ppo/checkpoints/20221231_0244_actor_avg_-803.8121022237663 + +algorithm: + gamma: 1 + epsilon: 0.15 + episodes_update: 5000 diff --git a/rl_studio/config/config_robot_mesh.yaml b/rl_studio/config/config_robot_mesh.yaml deleted file mode 100644 index 4256a4af9..000000000 --- a/rl_studio/config/config_robot_mesh.yaml +++ /dev/null @@ -1,106 +0,0 @@ -settings: - output_dir: "./logs/" - save_model: True - save_positions: True - telemetry: False - -actions: - actions_number: 4 - actions_set: simple - available_actions: - # [lineal, angular] - simple: - 0: [ 0, 0, -1, 0 ] - 1: [ 0, 0, 0.7, 0.7] - 2: [ 0, 0, 0, -1] - 3: [0, 0, 
-0.7, 0.7] - - complex: - 0: [ 0, 0, -1, 0 ] - 1: [ 0, 0, 0.7, 0.7 ] - 2: [ 0, 0, 0, -1 ] - 3: [ 0, 0, -0.7, 0.7 ] - -agent: - robot_mesh: - camera_params: - witdh: 640 - height: 480 - manual_robot: - camera_params: - witdh: 640 - height: 480 - -environments: - simple: - env_name: RobotMeshEnv-v0 - circuit_name: simple - training_type: qlearn_camera - launchfile: my_simple_world.launch - environment_folder: robot_mesh - robot_name: my_robot - start_pose: 1, 2 - alternate_pose: True - estimated_steps: 4000 - sensor: camera - circuit_positions_set: # x, y, z, roll, pith, ???. yaw - - [ 0, 53.462, -41.988, 0.004, 0, 0, 1.57, -1.57 ] - - [ 1, 53.462, -8.734, 0.004, 0, 0, 1.57, -1.57 ] - - [ 2, 39.712, -30.741, 0.004, 0, 0, 1.56, 1.56 ] - - [ 3, -6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613 ] - - [ 4, 20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383 ] - actions_force: 18 - boot_on_crash: False - goal_x: -6.5 - goal_y: 14 - pos_x: 6.4 - pos_y: -5 - pos_z: 0.3 - complex: - env_name: RobotMeshEnv-v0 - circuit_name: complex - training_type: qlearn_camera - launchfile: my_complex_world.launch - environment_folder: robot_mesh - robot_name: my_bigger_robot - start_pose: 0 - alternate_pose: True - estimated_steps: 4000 - sensor: camera - circuit_positions_set: # x, y, z, roll, pith, ???. yaw - - [ 0, 53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] - - [ 1, 53.462, -8.734, 0.004, 0, 0, 1.57, -1.57 ] - - [ 2, 39.712, -30.741, 0.004, 0, 0, 1.56, 1.56 ] - - [ 3, -6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613 ] - - [ 4, 20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383 ] - actions_force: 24 - boot_on_crash: False - goal_x: -100 - goal_y: 14 - pos_x: 17 - pos_y: -13 - pos_z: 0.3 - # manual: - # env_name: F1Env-v0 - # circuit_name: manual - # training_type: qlearn_camera - # launch: simple_circuit.launch - # start_pose: 0 - # alternate_pose: False - # estimated_steps: 4000 - # sensor: camera - -inference: - qlearn_mesh: - inference_file: /home/rubenlucas93/qvalues.pkl - actions_file: /home/rubenlucas93/actions_set.pkl - -algorithm: - qlearn_mesh: - alpha: 0.2 - epsilon: 0.95 - gamma: 0.9 - - dqn: - - sarsa: diff --git a/rl_studio/config/config_robot_mesh_qlearn.yaml b/rl_studio/config/config_robot_mesh_qlearn.yaml new file mode 100644 index 000000000..3676ce3cd --- /dev/null +++ b/rl_studio/config/config_robot_mesh_qlearn.yaml @@ -0,0 +1,86 @@ +settings: + output_dir: "./logs/" + save_model: True + save_positions: True + telemetry: False + mode: inference + agent: robot_mesh + algorithm: qlearn + +actions: + 0: [ 0, 0, -1, 0 ] + 1: [ 0, 0, 0.7, 0.7] + 2: [ 0, 0, 0, -1] + 3: [0, 0, -0.7, 0.7] + +agent: + camera_params: + witdh: 640 + height: 480 + +environments: +# simple: +# env_name: RobotMeshEnv-v0 +# circuit_name: simple +# training_type: qlearn_camera +# launchfile: my_simple_world.launch +# environment_folder: robot_mesh +# robot_name: my_robot +# start_pose: 1, 2 +# alternate_pose: True +# estimated_steps: 4000 +# sensor: camera +# circuit_positions_set: # x, y, z, roll, pith, ???. 
yaw +# - [ 0, 53.462, -41.988, 0.004, 0, 0, 1.57, -1.57 ] +# - [ 1, 53.462, -8.734, 0.004, 0, 0, 1.57, -1.57 ] +# - [ 2, 39.712, -30.741, 0.004, 0, 0, 1.56, 1.56 ] +# - [ 3, -6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613 ] +# - [ 4, 20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383 ] +# actions_force: 18 +# boot_on_crash: False +# goal_x: -6.5 +# goal_y: 14 +# pos_x: 6.4 +# pos_y: -5 +# pos_z: 0.3 + env_name: RobotMeshEnv-v0 + circuit_name: complex + training_type: qlearn_camera + launchfile: my_complex_world.launch + environment_folder: robot_mesh + robot_name: my_robot + start_pose: 0 + alternate_pose: True + estimated_steps: 4000 + sensor: camera + circuit_positions_set: # x, y, z, roll, pith, ???. yaw + - [ 0, 53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + - [ 1, 53.462, -8.734, 0.004, 0, 0, 1.57, -1.57 ] + - [ 2, 39.712, -30.741, 0.004, 0, 0, 1.56, 1.56 ] + - [ 3, -6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613 ] + - [ 4, 20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383 ] + actions_force: 24 + boot_on_crash: False + goal_x: -100 + goal_y: 14 + pos_x: 17 + pos_y: -13 + pos_z: 0.3 + # manual: + # env_name: F1Env-v0 + # circuit_name: manual + # training_type: qlearn_camera + # launch: simple_circuit.launch + # start_pose: 0 + # alternate_pose: False + # estimated_steps: 4000 + # sensor: camera + +inference: + inference_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/robot_mesh/1_20221227_0306_epsilon_0.05_QTABLE.pkl + actions_file: /home/ruben/Desktop/my-RL-Studio/rl_studio/checkpoints/robot_mesh/actions_set_20221227_0657 + +algorithm: + alpha: 0.2 + epsilon: 0.95 + gamma: 0.9 diff --git a/rl_studio/config/config_training_followlane_ddpg_f1_gazebo.yaml b/rl_studio/config/config_training_followlane_ddpg_f1_gazebo.yaml new file mode 100644 index 000000000..58310cdda --- /dev/null +++ b/rl_studio/config/config_training_followlane_ddpg_f1_gazebo.yaml @@ -0,0 +1,151 @@ +settings: + mode: training # training, retraining, inference + task: follow_lane_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: ddpg # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # continuous, simple, medium, hard, test, autoparking_simple + states: sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: follow_right_lane_only_center # follow_right_lane_only_center, follow_right_lane_center_v_step, follow_right_lane_center_v_w_linear + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 1000 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + ddpg: + retrain_ddpg_tf_actor_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BESTLAP_ACTOR_Max-69_Epoch-4_State-image_Actions-continuous_Rewards-follow_right_lane_center_v_w_linear_inTime-20230111-200026.h5" + retrain_ddpg_tf_critic_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BESTLAP_CRITIC_Max-69_Epoch-4_State-image_Actions-continuous_Rewards-follow_right_lane_center_v_w_linear_inTime-20230111-200026.h5" + +inference: + ddpg: + inference_ddpg_tf_actor_model_name: + 
inference_ddpg_tf_critic_model_name: + +algorithm: + ddpg: + gamma: 0.9 + tau: 0.005 + std_dev: 0.2 + model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 + replay_memory_size: 50_000 + memory_fraction: 0.20 + critic_lr: 0.002 + actor_lr: 0.001 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [50] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + continuous: + v: [2, 30] + w: [-3, 3] + +rewards: + follow_right_lane_only_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_step: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_w_linear: # only for continuous actions + beta_0: 3 + beta_1: -0.1 + penal: 0 + min_reward: 1_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit_no_wall.launch #simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault_multicamera_multilaser #f1_renault, f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 15_000 + sensor: camera + save_episodes: 10 + save_every_step: 1000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [52.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + #0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve + #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line + #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] + #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] + #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] + #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] + 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] + 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve diff --git a/rl_studio/config/config_training_followlane_dqn_f1_gazebo.yaml b/rl_studio/config/config_training_followlane_dqn_f1_gazebo.yaml new file mode 100644 index 000000000..e69e66367 --- /dev/null +++ b/rl_studio/config/config_training_followlane_dqn_f1_gazebo.yaml @@ -0,0 +1,143 @@ +settings: + mode: training # training, retraining, inference + task: follow_lane_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: dqn # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test + states: sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception 
with 3 points), spn (simplified perception with n points) + rewards: follow_right_lane_only_center # follow_right_lane_only_center, follow_right_lane_center_v_step + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 1_000 + training_time: 5 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + dqn: + retrain_dqn_tf_model_name: "DQN_sp_16x16_LAPCOMPLETED_Max165_Epoch1_inTime20221222-171814.model" + +inference: + dqn: + inference_dqn_tf_model_name: + +algorithm: + dqn: + alpha: 0.8 + gamma: 0.9 + epsilon: 0.95 # in Retraining mode is convenient to reduce, i.e. 0.45 + epsilon_discount: 0.9986 + epsilon_min: 0.05 + model_name: DQN_sp_16x16 #DQN_im_32x64 + replay_memory_size: 50_000 + min_replay_memory_size: 1000 + minibatch_size: 64 + update_target_every: 5 + memory_fraction: 0.20 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [50] + sp1: + 0: [50] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + follow_right_lane_only_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_step: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit_no_wall.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault_multicamera_multilaser # f1_renault_multicamera_multilaser, f1_renault + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 15_000 + sensor: camera + save_episodes: 10 + save_every_step: 1_000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [52.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + #0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve + #0: [52.800, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line + #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] + #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] + #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] + #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] + 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] + 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve diff --git a/rl_studio/config/config_training_followlane_qlearn_f1_gazebo.yaml b/rl_studio/config/config_training_followlane_qlearn_f1_gazebo.yaml new file mode 100644 index 000000000..9769862ed --- /dev/null +++ b/rl_studio/config/config_training_followlane_qlearn_f1_gazebo.yaml @@ -0,0 +1,132 @@ +settings: + mode: retraining # training, retraining + task: follow_lane_gazebo # follow_line_gazebo + algorithm: qlearn # qlearn + simulator: gazebo # openai, carla, gazebo, sumo + 
environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test, autoparking_simple + states: sp1 # sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: follow_right_lane_only_center # + framework: _ + total_episodes: 1_000 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + qlearn: + retrain_qlearn_model_name: "20230123-161229_Circuit-simple_States-sp1_Actions-simple_Rewards-follow_right_lane_only_center_epsilon-0.399_epoch-291_step-15001_reward-136707-qtable.npy" + +inference: + qlearn: + inference_qlearn_model_name: + +algorithm: + qlearn: + alpha: 0.2 + epsilon: 0.95 + epsilon_min: 0.05 + gamma: 0.9 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + sp1: + 0: [50] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + follow_right_lane_only_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + follow_right_lane_center_v_step: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit_no_wall.launch #simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault_multicamera_multilaser # f1_renault, + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 15_000 + sensor: camera + save_episodes: 10 + save_every_step: 1000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [52.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + #0: [52.800, -12.734, 0.004, 0, 0, 1.57, -1.57] # near to first curve + #0: [52.460, -8.734, 0.004, 0, 0, 1.57, -1.57] # Finish line + #0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [52.97, -42.06, 0.004, 0, 0, 1.57, -1.57] + #1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] + #2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 2: [40.2, -30.741, 0.004, 0, 0, 1.56, 1.56] + #3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 3: [0, 31.15, 0.004, 0, 0.01, 0, 0.31] + #4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] + 4: [19.25, 43.50, 0.004, 0, 0.0, 1.57, -1.69] + 5: [52.800, -35.486, 0.004, 0, 0, 1.57, -1.57] # near to first curve diff --git a/rl_studio/config/config_training_followline_ddpg_f1_gazebo.yaml b/rl_studio/config/config_training_followline_ddpg_f1_gazebo.yaml new file mode 100644 index 000000000..7f4a60460 --- /dev/null +++ b/rl_studio/config/config_training_followline_ddpg_f1_gazebo.yaml @@ -0,0 +1,136 @@ +settings: + mode: training # training, retraining, inference + task: follow_line_gazebo # 
follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: ddpg # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: continuous # continuous, simple, medium, hard, test, autoparking_simple + states: image #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: followline_center # followline_center, followline_center_v_w_linear + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 5 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + ddpg: + retrain_ddpg_tf_actor_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BATCH_ACTOR_Max--100_Epoch-5_State-image_Actions-continuous_Rewards-followline_center_inTime-20230111-183048.h5" + retrain_ddpg_tf_critic_model_name: "20230111_DDPG_Actor_conv2d32x64_Critic_conv2d32x64_BATCH_CRITIC_Max--100_Epoch-5_State-image_Actions-continuous_Rewards-followline_center_inTime-20230111-183048.h5" + +inference: + ddpg: + inference_ddpg_tf_actor_model_name: + inference_ddpg_tf_critic_model_name: + +algorithm: + ddpg: + gamma: 0.9 + tau: 0.005 + std_dev: 0.2 + model_name: DDPG_Actor_conv2d32x64_Critic_conv2d32x64 + replay_memory_size: 50_000 + memory_fraction: 0.20 + critic_lr: 0.002 + actor_lr: 0.001 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + continuous: + v: [2, 30] + w: [-3, 3] + +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + followline_center_v_w_linear: # only for continuous actions + beta_0: 3 + beta_1: -0.1 + penal: 0 + min_reward: 1_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 100 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] #finish line + 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] diff --git a/rl_studio/config/config_training_followline_dqn_f1_gazebo.yaml 
b/rl_studio/config/config_training_followline_dqn_f1_gazebo.yaml new file mode 100644 index 000000000..6317e746c --- /dev/null +++ b/rl_studio/config/config_training_followline_dqn_f1_gazebo.yaml @@ -0,0 +1,128 @@ +settings: + mode: retraining # training, retraining, inference + task: follow_line_gazebo # follow_line_gazebo, follow_lane_gazebo, autoparking_gazebo + algorithm: dqn # qlearn, dqn, ddpg, ppo + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test + states: sp1 #image, sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: followline_center # followline_center + framework: TensorFlow # TensorFlow, Pytorch + total_episodes: 5 + training_time: 6 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + dqn: + retrain_dqn_tf_model_name: "DQN_sp_16x16_LAPCOMPLETED_Max990_Epoch3_inTime20230110-180135.model" + +inference: + dqn: + inference_dqn_tf_model_name: + +algorithm: + dqn: + alpha: 0.8 + gamma: 0.9 + epsilon: 0.99 + epsilon_discount: 0.9986 + epsilon_min: 0.05 + model_name: DQN_sp_16x16 + replay_memory_size: 50_000 + min_replay_memory_size: 1000 + minibatch_size: 64 + update_target_every: 5 + memory_fraction: 0.20 + buffer_capacity: 100_000 + batch_size: 64 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + image: + 0: [3] + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # f1_renault_multicamera_multilaser + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 20 + sensor: camera + save_episodes: 5 + save_every_step: 10 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] #finish line + 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] diff --git a/rl_studio/config/config_training_followline_qlearn_f1_gazebo.yaml b/rl_studio/config/config_training_followline_qlearn_f1_gazebo.yaml new file mode 100644 index 000000000..bd2c0a0bb --- /dev/null +++ b/rl_studio/config/config_training_followline_qlearn_f1_gazebo.yaml @@ -0,0 +1,117 @@ +settings: + mode: training # training, retraining + task: follow_line_gazebo # 
follow_line_gazebo + algorithm: qlearn # qlearn + simulator: gazebo # openai, carla, gazebo, sumo + environment_set: gazebo_environments # gazebo_environments, carla_environments + env: simple # simple, nurburgring, montreal, curves, simple_laser, manual, autoparking + agent: f1 # f1, autoparkingRL, auto_carla, mountain_car, robot_mesh, cartpole, turtlebot + actions: simple # simple, medium, hard, test, autoparking_simple + states: sp1 # sp1 (simplified perception with 1 point), sp3 (simplified perception with 3 points), spn (simplified perception with n points) + rewards: followline_center # + framework: _ + total_episodes: 10 + training_time: 10 + models_dir: "./checkpoints" + logs_dir: "./logs" + metrics_dir: "./metrics" + +ros: + ros_master_uri: "11311" + gazebo_master_uri: "11345" + +retraining: + qlearn: + retrain_qlearn_model_name: "20230105-174932_Circuit-simple_States-sp1_Actions-simple_Rewards-followline_center_epsilon-0.05_epoch-10_step-15001_reward-134200-qtable.npy" + +inference: + qlearn: + inference_qlearn_model_name: + +algorithm: + qlearn: + alpha: 0.2 + epsilon: 0.95 + epsilon_min: 0.05 + gamma: 0.9 + +agents: + f1: + camera_params: + width: 640 + height: 480 + center_image: 320 + raw_image: False + image_resizing: 100 + new_image_size: 32 + num_regions: 16 + lower_limit: 220 + +states: + sp1: + 0: [10] + sp3: + 0: [5, 15, 22] + sp5: + 0: [3, 5, 10, 15, 20] + spn: + 0: [10] + +actions: + simple: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + medium: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1, 1.5] + 4: [1, -1.5] + hard: + 0: [3, 0] + 1: [2, 1] + 2: [2, -1] + 3: [1.5, 1] + 4: [1.5, -1] + 5: [1, -1.5] + 6: [1, -1.5] + test: + 0: [0, 0] + +rewards: + followline_center: + from_10: 10 + from_02: 2 + from_01: 1 + penal: -100 + min_reward: 5_000 + highest_reward: 100 + +gazebo_environments: + simple: + env_name: F1Env-v0 + circuit_name: simple + launchfile: simple_circuit.launch + environment_folder: f1 + robot_name: f1_renault + model_state_name: f1_renault # + start_pose: 0 # 0, 1, 2, 3, 4 + alternate_pose: False + estimated_steps: 15000 + sensor: camera + save_episodes: 1 + save_every_step: 1000 + lap_completed: False + save_model: True + save_positions: True + debug_level: DEBUG + telemetry: False + telemetry_mask: False + plotter_graphic: False + circuit_positions_set: + 0: [53.462, -41.988, 0.004, 0, 0, 1.57, -1.57] + 1: [53.462, -8.734, 0.004, 0, 0, 1.57, -1.57] #finish line + 2: [39.712, -30.741, 0.004, 0, 0, 1.56, 1.56] + 3: [-6.861, -36.481, 0.004, 0, 0.01, -0.858, 0.613] + 4: [20.043, 37.130, 0.003, 0, 0.103, -1.4383, -1.4383] diff --git a/rl_studio/docs/gazebo_screenshot.png b/rl_studio/docs/gazebo_screenshot.png new file mode 100644 index 000000000..2996b055c Binary files /dev/null and b/rl_studio/docs/gazebo_screenshot.png differ diff --git a/rl_studio/docs/rlstudio-diagram.excalidraw b/rl_studio/docs/rlstudio-diagram.excalidraw index e14f9fec0..702c8f58d 100644 --- a/rl_studio/docs/rlstudio-diagram.excalidraw +++ b/rl_studio/docs/rlstudio-diagram.excalidraw @@ -4,31 +4,27 @@ "source": "https://excalidraw.com", "elements": [ { - "id": "so8UXkj9nhsVv7gbekxQC", "type": "rectangle", - "x": 1327, - "y": 528, - "width": 134.99999999999997, - "height": 81, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#fa5252", + "version": 293, + "versionNonce": 122746763, + "isDeleted": false, + "id": "so8UXkj9nhsVv7gbekxQC", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + 
"angle": 0, + "x": 1426.277099609375, + "y": 526.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#fa5252", + "width": 134.99999999999997, + "height": 81, "seed": 2011900523, - "version": 224, - "versionNonce": 1505557445, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ - { - "id": "EtLiZhEZaa9WLgzibOC4Z", - "type": "arrow" - }, { "id": "qG81XyLuPfPbdcDtHu51L", "type": "arrow" @@ -42,12 +38,14 @@ "type": "arrow" } ], - "updated": 1642362303106 + "updated": 1672159886595, + "link": null, + "locked": false }, { "type": "rectangle", - "version": 245, - "versionNonce": 1769295627, + "version": 314, + "versionNonce": 505266373, "isDeleted": false, "id": "iO4YUTo6wpPyvPrjSiH2E", "fillStyle": "hachure", @@ -56,20 +54,16 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1501, - "y": 524, + "x": 1600.277099609375, + "y": 522.5260620117188, "strokeColor": "#000000", "backgroundColor": "#fd7e14", "width": 159, "height": 81.99999999999999, "seed": 797132229, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ - { - "id": "35gsK7a88pUj2QOJPFHu_", - "type": "arrow" - }, { "id": "ER8F9_6BgoPz74mJ_P0lE", "type": "arrow" @@ -83,12 +77,14 @@ "type": "arrow" } ], - "updated": 1642362303106 + "updated": 1672159886596, + "link": null, + "locked": false }, { "type": "rectangle", - "version": 196, - "versionNonce": 244529957, + "version": 265, + "versionNonce": 837284331, "isDeleted": false, "id": "6SO22MQMvMxyiJkOdh9pQ", "fillStyle": "hachure", @@ -97,20 +93,16 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1691, - "y": 526, + "x": 1790.277099609375, + "y": 524.5260620117188, "strokeColor": "#000000", "backgroundColor": "#15aabf", "width": 148, "height": 83, "seed": 1905304843, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ - { - "id": "A75jfXlo10-Lm9-5tsORl", - "type": "arrow" - }, { "id": "zAqc0tOxaphg3HEg5nuwH", "type": "arrow" @@ -124,108 +116,116 @@ "type": "arrow" } ], - "updated": 1642362303106 + "updated": 1672159886596, + "link": null, + "locked": false }, { - "id": "vK0ZH-nWppWFckecBqgSV", "type": "text", - "x": 1356, - "y": 556, - "width": 67, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 127, + "versionNonce": 2637925, + "isDeleted": false, + "id": "vK0ZH-nWppWFckecBqgSV", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1455.277099609375, + "y": 554.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 67, + "height": 25, "seed": 687883557, - "version": 58, - "versionNonce": 1735346251, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303107, - "text": "agents", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159886597, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "agents", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "agents" }, { - "id": "A1UUzvSFO4FjS0AIXQ3cR", "type": "text", - "x": 1532, - "y": 551, - "width": 98, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 171, + "versionNonce": 633742443, + "isDeleted": false, + "id": "A1UUzvSFO4FjS0AIXQ3cR", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 
100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1631.277099609375, + "y": 549.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 98, + "height": 25, "seed": 435014571, - "version": 102, - "versionNonce": 4224485, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303107, - "text": "algorithms", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159886597, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "algorithms", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "algorithms" }, { - "id": "ZXl022mJeT83QTZu30d8r", "type": "text", - "x": 1741, - "y": 552, - "width": 43, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 147, + "versionNonce": 476900293, + "isDeleted": false, + "id": "ZXl022mJeT83QTZu30d8r", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1840.277099609375, + "y": 550.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 43, + "height": 25, "seed": 493246597, - "version": 78, - "versionNonce": 2056652523, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303107, - "text": "envs", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159886597, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "envs", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "envs" }, { "type": "rectangle", - "version": 259, - "versionNonce": 652374341, + "version": 328, + "versionNonce": 1576820491, "isDeleted": false, "id": "HJhc_4H0EWQF6BPGBnwFZ", "fillStyle": "hachure", @@ -234,20 +234,16 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1871.5, - "y": 525.5, + "x": 1970.777099609375, + "y": 524.0260620117188, "strokeColor": "#000000", "backgroundColor": "#fab005", "width": 134.99999999999997, "height": 81, "seed": 355278411, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ - { - "id": "YUVAzkQxTEOjO8Y5NnMjJ", - "type": "arrow" - }, { "id": "E2os0JITN7iIRJyipXLW9", "type": "arrow" @@ -261,12 +257,14 @@ "type": "arrow" } ], - "updated": 1642362303107 + "updated": 1672159886597, + "link": null, + "locked": false }, { "type": "text", - "version": 106, - "versionNonce": 61600139, + "version": 184, + "versionNonce": 1511678277, "isDeleted": false, "id": "7ON6HizTKJ76Qww8aECJp", "fillStyle": "hachure", @@ -275,30 +273,32 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1900.5, - "y": 553.5, + "x": 1999.777099609375, + "y": 552.0260620117188, "strokeColor": "#000000", "backgroundColor": "transparent", - "width": 68, + "width": 90, "height": 25, "seed": 1987248101, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [], - "updated": 1642362303107, + "updated": 1672159886597, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, - "text": "gazebo", + "text": "simulator", "baseline": 18, "textAlign": "left", "verticalAlign": "top", "containerId": null, - "originalText": "gazebo" + "originalText": "simulator" }, { "type": "rectangle", - "version": 256, - "versionNonce": 1382940837, + "version": 775, + "versionNonce": 235609732, "isDeleted": false, "id": 
"WTDldziLTtQz8IskT5HTT", "fillStyle": "hachure", @@ -307,15 +307,15 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1596.5, - "y": 348.5, + "x": 1582.9285888671875, + "y": 312.6249542236328, "strokeColor": "#000000", "backgroundColor": "transparent", - "width": 134.99999999999997, + "width": 131.33612060546864, "height": 81, "seed": 1609171269, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ { "id": "7tmiLF0VuxWnTf665kBe4", @@ -336,14 +336,24 @@ { "id": "E2os0JITN7iIRJyipXLW9", "type": "arrow" + }, + { + "id": "2fm-KXOwvvrYtUoVFHWmi", + "type": "arrow" + }, + { + "id": "HPyrpHB6_eezFm3o5_XzQ", + "type": "arrow" } ], - "updated": 1642362303107 + "updated": 1672160335747, + "link": null, + "locked": false }, { "type": "text", - "version": 100, - "versionNonce": 1407944747, + "version": 302, + "versionNonce": 1575784324, "isDeleted": false, "id": "j7BUxHy2lXxvGtE7YQAHD", "fillStyle": "hachure", @@ -352,30 +362,32 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1625.5, - "y": 376.5, + "x": 1594.823486328125, + "y": 164.10804748535156, "strokeColor": "#000000", "backgroundColor": "transparent", - "width": 65, + "width": 108, "height": 25, "seed": 1822777739, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [], - "updated": 1642362303107, + "updated": 1672160234060, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, - "text": "main.py", + "text": "rl-studio.py", "baseline": 18, - "textAlign": "left", + "textAlign": "center", "verticalAlign": "top", "containerId": null, - "originalText": "main.py" + "originalText": "rl-studio.py" }, { "type": "rectangle", - "version": 301, - "versionNonce": 1485874181, + "version": 465, + "versionNonce": 814291644, "isDeleted": false, "id": "3RSbd5N9EYa6ANcIBZyMO", "fillStyle": "hachure", @@ -384,15 +396,15 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1791.5, - "y": 269.5, + "x": 1775.4703369140625, + "y": 57.10804748535156, "strokeColor": "#000000", "backgroundColor": "transparent", "width": 135, "height": 81, "seed": 551379563, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ { "type": "text", @@ -403,110 +415,127 @@ "type": "arrow" } ], - "updated": 1642362303107 + "updated": 1672160234060, + "link": null, + "locked": false }, { - "id": "y0Rb-C9dhpsFXVuA_lF8T", "type": "text", - "x": 1796.5, - "y": 297.5, - "width": 125, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 183, + "versionNonce": 1278738180, + "isDeleted": false, + "id": "y0Rb-C9dhpsFXVuA_lF8T", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1798.9703369140625, + "y": 85.10804748535156, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 88, + "height": 25, "seed": 212987429, - "version": 18, - "versionNonce": 826231499, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303107, - "text": "config.yml", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672160234060, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "config.yml", + "baseline": 18, "textAlign": "center", "verticalAlign": "middle", - "baseline": 18, "containerId": "3RSbd5N9EYa6ANcIBZyMO", "originalText": "config.yml" }, { - "id": "7tmiLF0VuxWnTf665kBe4", "type": "arrow", - "x": 1861, - "y": 364, - "width": 119, - "height": 
35, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 998, + "versionNonce": 990233660, + "isDeleted": false, + "id": "7tmiLF0VuxWnTf665kBe4", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1831.987513898659, + "y": 149.4240264892578, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 102.15694749240902, + "height": 30.241913390475816, "seed": 462649707, - "version": 94, - "versionNonce": 2101861221, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303107, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672160272949, + "link": null, + "locked": false, + "startBinding": { + "elementId": "3RSbd5N9EYa6ANcIBZyMO", + "focus": -0.8029573024933188, + "gap": 11.31597900390625 + }, + "endBinding": { + "elementId": "elQ-C2Q_n8NhztaIBanY4", + "focus": 0.4495470267453065, + "gap": 17.601705462147493 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, 0 ], [ - -119, - 35 + -102.15694749240902, + 30.241913390475816 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "3RSbd5N9EYa6ANcIBZyMO", - "focus": -0.9044834307992203, - "gap": 13.5 - }, - "endBinding": { - "elementId": "WTDldziLTtQz8IskT5HTT", - "focus": 0.5458089668615984, - "gap": 10.5 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "hcWDkGa9IyQAOaXZQ37ye", "type": "line", - "x": 1215, - "y": 648, - "width": 917, - "height": 0, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 59, + "versionNonce": 1870657925, + "isDeleted": false, + "id": "hcWDkGa9IyQAOaXZQ37ye", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dashed", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1215, + "y": 648, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 917, + "height": 0, "seed": 172715717, - "version": 59, - "versionNonce": 1870657925, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], "updated": 1642362303108, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": null, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": null, "points": [ [ 0, @@ -516,75 +545,64 @@ 917, 0 ] - ], - "lastCommittedPoint": null, - "startBinding": null, - "endBinding": null, - "startArrowhead": null, - "endArrowhead": null + ] }, { - "id": "snfJdXezqe9tg9PCEzYG4", "type": "text", - "x": 1134, - "y": 531, - "width": 143, - "height": 70, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 128, + "versionNonce": 2134362684, + "isDeleted": false, + "id": "snfJdXezqe9tg9PCEzYG4", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dashed", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1109.5123901367188, + "y": 530.9999694824219, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 143, + "height": 70, "seed": 397952299, - "version": 101, - "versionNonce": 1136266571, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303108, - "text": "parameter\nacquisition", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159954051, + "link": null, + "locked": 
false, "fontSize": 28, "fontFamily": 1, + "text": "parameter\nacquisition", + "baseline": 60, "textAlign": "left", "verticalAlign": "top", - "baseline": 60, "containerId": null, "originalText": "parameter\nacquisition" }, { - "id": "rSLMGDDSANk39uZ1qSXoL", "type": "rectangle", - "x": 1337, - "y": 977, - "width": 134.99999999999997, - "height": 81, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 572, + "versionNonce": 606790076, + "isDeleted": false, + "id": "rSLMGDDSANk39uZ1qSXoL", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1431.6556396484373, + "y": 1010.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 134.99999999999997, + "height": 81, "seed": 1611274123, - "version": 449, - "versionNonce": 1481072869, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ - { - "id": "EtLiZhEZaa9WLgzibOC4Z", - "type": "arrow" - }, - { - "id": "YocjEL2rruEGX5ZI6ib6m", - "type": "arrow" - }, { "id": "nOi6tq-2dNTSewXtFLShT", "type": "arrow" @@ -593,14 +611,6 @@ "id": "CrWnLffoEmAxN3TSlmMyw", "type": "arrow" }, - { - "id": "N1L85R55MwKBC9PzLPGcz", - "type": "arrow" - }, - { - "id": "jN4UYAnktnV_Yw60T3IFj", - "type": "arrow" - }, { "id": "WpRWj4wsphy5Bnq8oAztK", "type": "arrow" @@ -616,14 +626,20 @@ { "id": "C56y13xv1mZB0LD6qFnq7", "type": "arrow" + }, + { + "id": "e92jNKrSDSVuXHsMKYG50", + "type": "arrow" } ], - "updated": 1642362478360 + "updated": 1672160378650, + "link": null, + "locked": false }, { "type": "rectangle", - "version": 464, - "versionNonce": 1423178821, + "version": 586, + "versionNonce": 487512892, "isDeleted": false, "id": "iQff4S2v-YJgbx7HyTt9e", "fillStyle": "hachure", @@ -632,24 +648,16 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1511, - "y": 973, + "x": 1605.6556396484373, + "y": 1006.0336303710938, "strokeColor": "#000000", "backgroundColor": "#4c6ef5", "width": 159, "height": 81.99999999999999, "seed": 1272732645, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ - { - "id": "35gsK7a88pUj2QOJPFHu_", - "type": "arrow" - }, - { - "id": "_TTgQHgDVPVk6emSAJD2b", - "type": "arrow" - }, { "id": "T4pXRAYFkR-g6wcwWtdUi", "type": "arrow" @@ -663,12 +671,14 @@ "type": "arrow" } ], - "updated": 1642362478361 + "updated": 1672160375656, + "link": null, + "locked": false }, { "type": "rectangle", - "version": 416, - "versionNonce": 1878377707, + "version": 537, + "versionNonce": 1959519051, "isDeleted": false, "id": "AlPYVihubZCxJQk7pB7yt", "fillStyle": "hachure", @@ -677,24 +687,16 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1701, - "y": 975, + "x": 1795.6556396484373, + "y": 1008.0336303710938, "strokeColor": "#000000", "backgroundColor": "#4c6ef5", "width": 148, "height": 83, "seed": 865304811, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ - { - "id": "A75jfXlo10-Lm9-5tsORl", - "type": "arrow" - }, - { - "id": "wOzoa7Y2iPECskRDHJKx9", - "type": "arrow" - }, { "id": "aHRyPPlRffPPgpnIeV9Dh", "type": "arrow" @@ -703,117 +705,121 @@ "id": "Mb0iRSRUFKwoit6vzjK8N", "type": "arrow" }, - { - "id": "jN4UYAnktnV_Yw60T3IFj", - "type": "arrow" - }, { "id": "SzngLEi5fMuV4BAB1hUpH", "type": "arrow" } ], - "updated": 1642362422507 + "updated": 1672159902674, + "link": null, + "locked": false }, { - "id": "ChyaIGAhCl5VZZgLawhEu", "type": "text", - "x": 1396, - "y": 1006, - "width": 16, - 
"height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 411, + "versionNonce": 1847285157, + "isDeleted": false, + "id": "ChyaIGAhCl5VZZgLawhEu", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1490.6556396484373, + "y": 1039.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 16, + "height": 25, "seed": 1233829605, - "version": 290, - "versionNonce": 781667237, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303108, - "text": "f1", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159902674, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "f1", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "f1" }, { - "id": "uk8-OqwXIEYFoNb-2x7EP", "type": "text", - "x": 1567, - "y": 1003, - "width": 59, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 468, + "versionNonce": 338297643, + "isDeleted": false, + "id": "uk8-OqwXIEYFoNb-2x7EP", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1661.6556396484373, + "y": 1036.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 59, + "height": 25, "seed": 228446149, - "version": 347, - "versionNonce": 571501867, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303109, - "text": "qlearn", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159902674, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "qlearn", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "qlearn" }, { - "id": "cNpNqZIRlhvi3rB_pJvng", "type": "text", - "x": 1743, - "y": 1002, - "width": 70, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 435, + "versionNonce": 499865861, + "isDeleted": false, + "id": "cNpNqZIRlhvi3rB_pJvng", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1837.6556396484373, + "y": 1035.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 70, + "height": 25, "seed": 1377775461, - "version": 314, - "versionNonce": 391679749, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303109, - "text": "camera", + "groupIds": [], + "roundness": null, + "boundElements": [], + "updated": 1672159902674, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "camera", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "camera" }, { "type": "rectangle", - "version": 478, - "versionNonce": 825971371, + "version": 599, + "versionNonce": 2046191051, "isDeleted": false, "id": "wI0GlFUuLqB8Tx5Wyp0q_", "fillStyle": "hachure", @@ -822,24 +828,16 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1881.5, - "y": 974.5, + "x": 1976.1556396484373, + "y": 1007.5336303710938, "strokeColor": "#000000", "backgroundColor": "#4c6ef5", "width": 134.99999999999997, "height": 81, "seed": 
1201744363, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [ - { - "id": "YUVAzkQxTEOjO8Y5NnMjJ", - "type": "arrow" - }, - { - "id": "0-L_4LMjCC6UP3gkUVgQE", - "type": "arrow" - }, { "id": "N5ZV2DWnenzr6l09eCVY_", "type": "arrow" @@ -853,12 +851,14 @@ "type": "arrow" } ], - "updated": 1642362410479 + "updated": 1672159902674, + "link": null, + "locked": false }, { "type": "text", - "version": 337, - "versionNonce": 299525733, + "version": 458, + "versionNonce": 836033317, "isDeleted": false, "id": "yajoyN7AXI27BdrcyyeCa", "fillStyle": "hachure", @@ -867,17 +867,19 @@ "roughness": 1, "opacity": 100, "angle": 0, - "x": 1911.5, - "y": 1002.5, + "x": 2006.1556396484373, + "y": 1035.5336303710938, "strokeColor": "#000000", "backgroundColor": "transparent", "width": 85, "height": 25, "seed": 1573846597, "groupIds": [], - "strokeSharpness": "sharp", + "roundness": null, "boundElements": [], - "updated": 1642362303109, + "updated": 1672159902674, + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, "text": "sim. conf", @@ -888,26 +890,26 @@ "originalText": "sim. conf" }, { - "id": "U-EKjQ2I8Nqp77fo0hFXS", "type": "text", - "x": 1129, - "y": 723, - "width": 104, - "height": 35, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 160, + "versionNonce": 227457099, + "isDeleted": false, + "id": "U-EKjQ2I8Nqp77fo0hFXS", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dashed", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1129, + "y": 723, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 104, + "height": 35, "seed": 2135632645, - "version": 160, - "versionNonce": 227457099, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "LUKsczhb4tA-FBj4UuVr_", @@ -919,36 +921,38 @@ } ], "updated": 1642363291245, - "text": "factory", + "link": null, + "locked": false, "fontSize": 28, "fontFamily": 1, + "text": "factory", + "baseline": 25, "textAlign": "left", "verticalAlign": "top", - "baseline": 25, "containerId": null, "originalText": "factory" }, { - "id": "ame5KpH1huZ8Ub_T8uka_", "type": "text", - "x": 1136, - "y": 624, - "width": 90, - "height": 50, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 147, + "versionNonce": 351970731, + "isDeleted": false, + "id": "ame5KpH1huZ8Ub_T8uka_", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dashed", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1136, + "y": 624, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 90, + "height": 50, "seed": 1320986795, - "version": 147, - "versionNonce": 351970731, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "AnYN90VNUcVDpxC9ADTff", @@ -956,224 +960,242 @@ } ], "updated": 1642363291245, - "text": "trainer\nvalidator", + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "trainer\nvalidator", + "baseline": 43, "textAlign": "left", "verticalAlign": "top", - "baseline": 43, "containerId": null, "originalText": "trainer\nvalidator" }, { - "id": "qG81XyLuPfPbdcDtHu51L", "type": "arrow", - "x": 1656, - "y": 440, - "width": 263, - "height": 80, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 1226, + "versionNonce": 1260925956, + "isDeleted": false, + "id": "qG81XyLuPfPbdcDtHu51L", 
"fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1667.9439161922062, + "y": 404.12495422363287, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 200.27699948821214, + "height": 114.40110778808588, "seed": 571833323, - "version": 48, - "versionNonce": 1160761931, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - -263, - 80 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672160320149, + "link": null, + "locked": false, "startBinding": { "elementId": "WTDldziLTtQz8IskT5HTT", - "focus": -0.7957511759025636, - "gap": 10.5 + "gap": 10.5, + "focus": -0.7957511759025636 }, "endBinding": { "elementId": "so8UXkj9nhsVv7gbekxQC", - "focus": -0.8021368719434322, - "gap": 8 + "gap": 8, + "focus": -0.8021368719434322 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -200.27699948821214, + 114.40110778808588 + ] + ] }, { - "id": "ER8F9_6BgoPz74mJ_P0lE", "type": "arrow", - "x": 1656, - "y": 440, - "width": 66, - "height": 70, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 1223, + "versionNonce": 1267405700, + "isDeleted": false, + "id": "ER8F9_6BgoPz74mJ_P0lE", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1668.9913553868644, + "y": 404.1249542236328, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 12.780021696326685, + "height": 104.40110778808594, "seed": 1390811563, - "version": 45, - "versionNonce": 1143311333, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - -66, - 70 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672160320149, + "link": null, + "locked": false, "startBinding": { "elementId": "WTDldziLTtQz8IskT5HTT", - "focus": -0.3792917004595838, - "gap": 10.5 + "gap": 10.5, + "focus": -0.3792917004595838 }, "endBinding": { "elementId": "iO4YUTo6wpPyvPrjSiH2E", - "focus": -0.3584814411800267, - "gap": 14 + "gap": 14, + "focus": -0.3584814411800267 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -12.780021696326685, + 104.40110778808594 + ] + ] }, { - "id": "zAqc0tOxaphg3HEg5nuwH", "type": "arrow", - "x": 1654, - "y": 442, - "width": 106, - "height": 76, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 1205, + "versionNonce": 191234820, + "isDeleted": false, + "id": "zAqc0tOxaphg3HEg5nuwH", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1649.6435023246167, + "y": 406.1249542236328, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 197.94607749546412, + "height": 110.40110778808594, "seed": 917952005, - "version": 27, - "versionNonce": 599085291, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - 106, - 76 - ] - ], - "lastCommittedPoint": null, + 
"groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672160320149, + "link": null, + "locked": false, "startBinding": { "elementId": "WTDldziLTtQz8IskT5HTT", - "focus": 0.6768545049347342, - "gap": 12.5 + "gap": 12.5, + "focus": 0.6768545049347342 }, "endBinding": { "elementId": "6SO22MQMvMxyiJkOdh9pQ", - "focus": 0.48558315873490965, - "gap": 8 + "gap": 8, + "focus": 0.48558315873490965 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 197.94607749546412, + 110.40110778808594 + ] + ] }, { - "id": "E2os0JITN7iIRJyipXLW9", "type": "arrow", - "x": 1652, - "y": 439, - "width": 275, - "height": 74, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 1215, + "versionNonce": 20382340, + "isDeleted": false, + "id": "E2os0JITN7iIRJyipXLW9", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1636.4814712664702, + "y": 403.1249542236328, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 391.70978096246586, + "height": 108.40110778808594, "seed": 1124910859, - "version": 37, - "versionNonce": 431277893, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - 275, - 74 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672160320149, + "link": null, + "locked": false, "startBinding": { "elementId": "WTDldziLTtQz8IskT5HTT", - "focus": 0.9073609174027585, - "gap": 9.5 + "gap": 9.5, + "focus": 0.9073609174027585 }, "endBinding": { "elementId": "HJhc_4H0EWQF6BPGBnwFZ", - "focus": 0.8484115915078259, - "gap": 12.5 + "gap": 12.5, + "focus": 0.8484115915078259 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 391.70978096246586, + 108.40110778808594 + ] + ] }, { - "id": "t0VdGC-iBiarP2bozp1gL", "type": "rectangle", - "x": 1572, - "y": 691, - "width": 187, - "height": 91, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#40c057", + "version": 85, + "versionNonce": 318041660, + "isDeleted": false, + "id": "t0VdGC-iBiarP2bozp1gL", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1572, + "y": 691, + "strokeColor": "#000000", + "backgroundColor": "#40c057", + "width": 187, + "height": 91, "seed": 247845163, - "version": 83, - "versionNonce": 508689291, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "LPsjeogUdX7U8ypC58mA_", @@ -1188,49 +1210,39 @@ "type": "arrow" }, { - "id": "YocjEL2rruEGX5ZI6ib6m", - "type": "arrow" - }, - { - "id": "_TTgQHgDVPVk6emSAJD2b", - "type": "arrow" - }, - { - "id": "wOzoa7Y2iPECskRDHJKx9", - "type": "arrow" - }, - { - "id": "0-L_4LMjCC6UP3gkUVgQE", + "id": "106Bw88m4zv4-EiFqNW_l", "type": "arrow" }, { - "id": "106Bw88m4zv4-EiFqNW_l", + "id": "bGxFT6G-0FOLDWK-p8c8d", "type": "arrow" } ], - "updated": 1642362303110 + "updated": 1672159991721, + "link": null, + "locked": false }, { - "id": "UcP15sOU_2wWeq0w9108z", "type": "text", - "x": 1610, - "y": 698.5, - "width": 106, - "height": 70, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 43, + 
"versionNonce": 20992555, + "isDeleted": false, + "id": "UcP15sOU_2wWeq0w9108z", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1610, + "y": 698.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 106, + "height": 70, "seed": 416005125, - "version": 43, - "versionNonce": 20992555, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "u95dSBtxdkYs1G4Xy6W6U", @@ -1238,49 +1250,44 @@ } ], "updated": 1642362303110, - "text": "Trainer\nFactory", + "link": null, + "locked": false, "fontSize": 28, "fontFamily": 1, + "text": "Trainer\nFactory", + "baseline": 60, "textAlign": "left", "verticalAlign": "top", - "baseline": 60, "containerId": null, "originalText": "Trainer\nFactory" }, { - "id": "LPsjeogUdX7U8ypC58mA_", "type": "arrow", - "x": 1389.0000000000002, - "y": 612, - "width": 258.3033629065576, - "height": 73, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#fa5252", + "version": 185, + "versionNonce": 1742393547, + "isDeleted": false, + "id": "LPsjeogUdX7U8ypC58mA_", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1475.9670291030047, + "y": 610.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#fa5252", + "width": 192.86250836930867, + "height": 70.47393798828125, "seed": 288211973, - "version": 47, - "versionNonce": 1866824197, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - 258.3033629065576, - 73 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159886596, + "link": null, + "locked": false, "startBinding": { "focus": 0.7672242134607726, "gap": 3, @@ -1291,43 +1298,47 @@ "gap": 10, "elementId": "t0VdGC-iBiarP2bozp1gL" }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 192.86250836930867, + 70.47393798828125 + ] + ] }, { - "id": "u95dSBtxdkYs1G4Xy6W6U", "type": "arrow", - "x": 1589, - "y": 611, - "width": 75, - "height": 72, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#fa5252", + "version": 174, + "versionNonce": 1574835077, + "isDeleted": false, + "id": "u95dSBtxdkYs1G4Xy6W6U", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1663.8819370680621, + "y": 609.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#fa5252", + "width": 22.024777679599993, + "height": 73.47393798828125, "seed": 1961291653, - "version": 36, - "versionNonce": 761976011, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - 75, - 72 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159886596, + "link": null, + "locked": false, "startBinding": { "elementId": "iO4YUTo6wpPyvPrjSiH2E", "focus": 0.3225366518922605, @@ -1338,43 +1349,47 @@ "focus": 0.5992081974848626, "gap": 15.5 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 22.024777679599993, + 73.47393798828125 + ] + ] }, 
{ - "id": "F9PeeG-Mzjn5lnODx564W", "type": "arrow", - "x": 1757, - "y": 611, - "width": 86.62283608540565, - "height": 73, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#fa5252", + "version": 203, + "versionNonce": 1241428779, + "isDeleted": false, + "id": "F9PeeG-Mzjn5lnODx564W", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1835.866629894288, + "y": 609.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#fa5252", + "width": 134.74558470374768, + "height": 70.47393798828125, "seed": 802473349, - "version": 65, - "versionNonce": 1391587685, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - -86.62283608540565, - 73 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159886597, + "link": null, + "locked": false, "startBinding": { "focus": -0.35769838021168354, "gap": 2, @@ -1385,43 +1400,47 @@ "gap": 11, "elementId": "t0VdGC-iBiarP2bozp1gL" }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -134.74558470374768, + 70.47393798828125 + ] + ] }, { - "id": "7gg0nm6qTBiIHxTYo0xmh", "type": "arrow", - "x": 1939, - "y": 609, - "width": 242.74844357912684, - "height": 76, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#fa5252", + "version": 179, + "versionNonce": 292706379, + "isDeleted": false, + "id": "7gg0nm6qTBiIHxTYo0xmh", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 2026.4856700922946, + "y": 607.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#fa5252", + "width": 301.591272124271, + "height": 73.47393798828125, "seed": 929125221, - "version": 41, - "versionNonce": 40633195, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303110, - "points": [ - [ - 0, - 0 - ], - [ - -242.74844357912684, - 76 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159886597, + "link": null, + "locked": false, "startBinding": { "focus": -0.7048449009233323, "gap": 2.5, @@ -1432,30 +1451,41 @@ "gap": 10, "elementId": "t0VdGC-iBiarP2bozp1gL" }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -301.591272124271, + 73.47393798828125 + ] + ] }, { - "id": "G01uI8BSZIe0uxQjDohET", "type": "rectangle", - "x": 1576, - "y": 835, - "width": 187, - "height": 91, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 186, + "versionNonce": 40509669, + "isDeleted": false, + "id": "G01uI8BSZIe0uxQjDohET", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1576, + "y": 835, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 187, + "height": 91, "seed": 1916845509, - "version": 182, - "versionNonce": 1377423237, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "LPsjeogUdX7U8ypC58mA_", @@ -1469,111 +1499,76 @@ "id": "7gg0nm6qTBiIHxTYo0xmh", "type": "arrow" }, - { - "id": "YocjEL2rruEGX5ZI6ib6m", - 
"type": "arrow" - }, - { - "id": "_TTgQHgDVPVk6emSAJD2b", - "type": "arrow" - }, - { - "id": "wOzoa7Y2iPECskRDHJKx9", - "type": "arrow" - }, - { - "id": "0-L_4LMjCC6UP3gkUVgQE", - "type": "arrow" - }, { "id": "106Bw88m4zv4-EiFqNW_l", "type": "arrow" - }, - { - "id": "CrWnLffoEmAxN3TSlmMyw", - "type": "arrow" - }, - { - "id": "G--IIfKHDfN5v238FEhKJ", - "type": "arrow" - }, - { - "id": "Mb0iRSRUFKwoit6vzjK8N", - "type": "arrow" - }, - { - "id": "TI9Bg9CwR-dC1ybGfrtJw", - "type": "arrow" } ], - "updated": 1642362303111 + "updated": 1672159903218, + "link": null, + "locked": false }, { - "id": "dQe2SI1zlssz2YE7tmvan", "type": "text", - "x": 1623, - "y": 866.5, - "width": 89, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "transparent", + "version": 73, + "versionNonce": 1402967781, + "isDeleted": false, + "id": "dQe2SI1zlssz2YE7tmvan", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1623, + "y": 866.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 89, + "height": 25, "seed": 64196133, - "version": 73, - "versionNonce": 1402967781, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": null, + "boundElements": [], "updated": 1642362303111, - "text": "F1Trainer", + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "F1Trainer", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "F1Trainer" }, { - "id": "nOi6tq-2dNTSewXtFLShT", "type": "arrow", - "x": 1393.9913479151562, - "y": 622, - "width": 7.0165946349779915, - "height": 349, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 523, + "versionNonce": 1209797285, + "isDeleted": false, + "id": "nOi6tq-2dNTSewXtFLShT", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dotted", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1492.7396286239752, + "y": 620.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 3.4565418407464676, + "height": 383.507568359375, "seed": 1663901003, - "version": 143, - "versionNonce": 1760916971, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303111, - "points": [ - [ - 0, - 0 - ], - [ - 7.0165946349779915, - 349 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159902673, + "link": null, + "locked": false, "startBinding": { "focus": 0.023191415714780207, "gap": 13, @@ -1584,43 +1579,47 @@ "gap": 6, "elementId": "rSLMGDDSANk39uZ1qSXoL" }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 3.4565418407464676, + 383.507568359375 + ] + ] }, { - "id": "106Bw88m4zv4-EiFqNW_l", "type": "arrow", - "x": 1672, - "y": 791, - "width": 2, - "height": 38, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 48, + "versionNonce": 1672835653, + "isDeleted": false, + "id": "106Bw88m4zv4-EiFqNW_l", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1672, + "y": 791, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 2, + "height": 38, "seed": 1638576523, - 
"version": 48, - "versionNonce": 1672835653, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], "updated": 1642362303111, - "points": [ - [ - 0, - 0 - ], - [ - -2, - 38 - ] - ], - "lastCommittedPoint": null, + "link": null, + "locked": false, "startBinding": { "elementId": "t0VdGC-iBiarP2bozp1gL", "focus": -0.09769484083424806, @@ -1631,43 +1630,47 @@ "focus": -0.02305159165751921, "gap": 6 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -2, + 38 + ] + ] }, { - "id": "T4pXRAYFkR-g6wcwWtdUi", "type": "arrow", - "x": 1581.9913479151562, - "y": 620, - "width": 7.0165946349779915, - "height": 349, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 559, + "versionNonce": 1330843685, + "isDeleted": false, + "id": "T4pXRAYFkR-g6wcwWtdUi", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dotted", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1680.7130359393636, + "y": 618.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 3.453379942981428, + "height": 383.507568359375, "seed": 1740527397, - "version": 179, - "versionNonce": 1917157515, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303111, - "points": [ - [ - 0, - 0 - ], - [ - 7.0165946349779915, - 349 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159902674, + "link": null, + "locked": false, "startBinding": { "focus": -0.004800301487075316, "gap": 14, @@ -1678,43 +1681,47 @@ "gap": 4, "elementId": "iQff4S2v-YJgbx7HyTt9e" }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 3.453379942981428, + 383.507568359375 + ] + ] }, { - "id": "aHRyPPlRffPPgpnIeV9Dh", "type": "arrow", - "x": 1769.9913479151562, - "y": 622, - "width": 7.0165946349779915, - "height": 349, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 584, + "versionNonce": 128401899, + "isDeleted": false, + "id": "aHRyPPlRffPPgpnIeV9Dh", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dotted", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1868.6983479197406, + "y": 620.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 3.4524099782533995, + "height": 383.507568359375, "seed": 588836133, - "version": 204, - "versionNonce": 1597563301, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303111, - "points": [ - [ - 0, - 0 - ], - [ - 7.0165946349779915, - 349 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159902674, + "link": null, + "locked": false, "startBinding": { "focus": -0.052056752033277594, "gap": 13, @@ -1725,43 +1732,47 @@ "gap": 4, "elementId": "AlPYVihubZCxJQk7pB7yt" }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 3.4524099782533995, + 383.507568359375 + ] + ] }, { - "id": "N5ZV2DWnenzr6l09eCVY_", "type": "arrow", - "x": 1941.9913479151562, - "y": 620, - "width": 7.0165946349779915, - "height": 349, - "angle": 0, - "strokeColor": "#000000", - 
"backgroundColor": "#4c6ef5", + "version": 606, + "versionNonce": 962425963, + "isDeleted": false, + "id": "N5ZV2DWnenzr6l09eCVY_", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "dotted", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 2040.713806134544, + "y": 618.5260620117188, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 3.4541523257041717, + "height": 383.507568359375, "seed": 1463661765, - "version": 226, - "versionNonce": 1987786539, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303111, - "points": [ - [ - 0, - 0 - ], - [ - 7.0165946349779915, - 349 - ] - ], - "lastCommittedPoint": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159902674, + "link": null, + "locked": false, "startBinding": { "elementId": "HJhc_4H0EWQF6BPGBnwFZ", "focus": -0.027895875330943825, @@ -1772,32 +1783,56 @@ "focus": 0.013654046665703985, "gap": 5.5 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 3.4541523257041717, + 383.507568359375 + ] + ] }, { - "id": "CrWnLffoEmAxN3TSlmMyw", "type": "arrow", - "x": 1662, - "y": 937, - "width": 239, - "height": 30, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 288, + "versionNonce": 516834309, + "isDeleted": false, + "id": "CrWnLffoEmAxN3TSlmMyw", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1756.6556396484373, + "y": 970.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 239, + "height": 30, "seed": 832193477, - "version": 44, - "versionNonce": 727147781, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303111, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903217, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": { + "elementId": "rSLMGDDSANk39uZ1qSXoL", + "focus": -0.983766927250203, + "gap": 10 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -1807,44 +1842,44 @@ -239, 30 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "G01uI8BSZIe0uxQjDohET", - "focus": -0.9706860630871011, - "gap": 11 - }, - "endBinding": { - "elementId": "rSLMGDDSANk39uZ1qSXoL", - "focus": -0.983766927250203, - "gap": 10 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "G--IIfKHDfN5v238FEhKJ", "type": "arrow", - "x": 1662, - "y": 936, - "width": 48, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 279, + "versionNonce": 27400555, + "isDeleted": false, + "id": "G--IIfKHDfN5v238FEhKJ", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1756.6556396484373, + "y": 969.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 48, + "height": 25, "seed": 864411301, - "version": 35, - "versionNonce": 1677793739, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303111, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903217, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": { + 
"elementId": "iQff4S2v-YJgbx7HyTt9e", + "focus": -0.4946277335355835, + "gap": 12 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -1854,44 +1889,44 @@ -48, 25 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "G01uI8BSZIe0uxQjDohET", - "focus": -0.5477164657746323, - "gap": 10 - }, - "endBinding": { - "elementId": "iQff4S2v-YJgbx7HyTt9e", - "focus": -0.4946277335355835, - "gap": 12 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "Mb0iRSRUFKwoit6vzjK8N", "type": "arrow", - "x": 1659, - "y": 936, - "width": 117, - "height": 28, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 270, + "versionNonce": 1838917157, + "isDeleted": false, + "id": "Mb0iRSRUFKwoit6vzjK8N", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1753.6556396484373, + "y": 969.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 117, + "height": 28, "seed": 1797354661, - "version": 26, - "versionNonce": 730758245, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303112, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903218, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": { + "elementId": "AlPYVihubZCxJQk7pB7yt", + "focus": 0.8907253699025621, + "gap": 11 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -1901,44 +1936,44 @@ 117, 28 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "G01uI8BSZIe0uxQjDohET", - "focus": 0.8546874016243782, - "gap": 10 - }, - "endBinding": { - "elementId": "AlPYVihubZCxJQk7pB7yt", - "focus": 0.8907253699025621, - "gap": 11 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "TI9Bg9CwR-dC1ybGfrtJw", "type": "arrow", - "x": 1661, - "y": 936, - "width": 292, - "height": 30, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 290, + "versionNonce": 947285323, + "isDeleted": false, + "id": "TI9Bg9CwR-dC1ybGfrtJw", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1755.6556396484373, + "y": 969.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 292, + "height": 30, "seed": 236299531, - "version": 46, - "versionNonce": 2012325995, - "isDeleted": false, - "boundElements": null, - "updated": 1642362303112, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903218, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": { + "elementId": "wI0GlFUuLqB8Tx5Wyp0q_", + "focus": 1.041657642047506, + "gap": 8.5 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -1948,44 +1983,48 @@ 292, 30 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "G01uI8BSZIe0uxQjDohET", - "focus": 1.0229942203716365, - "gap": 10 - }, - "endBinding": { - "elementId": "wI0GlFUuLqB8Tx5Wyp0q_", - "focus": 1.0416576420475057, - "gap": 8.5 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "WpRWj4wsphy5Bnq8oAztK", "type": "arrow", - "x": 1484, - "y": 1072, - "width": 426, - "height": 74, - "angle": 0, - 
"strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 497, + "versionNonce": 748600389, + "isDeleted": false, + "id": "WpRWj4wsphy5Bnq8oAztK", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1578.6556396484373, + "y": 1105.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 426, + "height": 74, "seed": 955074923, - "version": 132, - "versionNonce": 282640773, - "isDeleted": false, - "boundElements": null, - "updated": 1642362410487, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903218, + "link": null, + "locked": false, + "startBinding": { + "elementId": "rSLMGDDSANk39uZ1qSXoL", + "focus": -0.2189973614775726, + "gap": 14 + }, + "endBinding": { + "elementId": "wI0GlFUuLqB8Tx5Wyp0q_", + "focus": -0.41238793806031154, + "gap": 16.5 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -2011,44 +2050,48 @@ 426, 0 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "rSLMGDDSANk39uZ1qSXoL", - "focus": -0.2189973614775726, - "gap": 14 - }, - "endBinding": { - "elementId": "wI0GlFUuLqB8Tx5Wyp0q_", - "focus": -0.4123879380603098, - "gap": 16.5 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "SzngLEi5fMuV4BAB1hUpH", "type": "arrow", - "x": 1481, - "y": 1070, - "width": 309, - "height": 47, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 571, + "versionNonce": 275311525, + "isDeleted": false, + "id": "SzngLEi5fMuV4BAB1hUpH", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1575.6556396484373, + "y": 1103.0336303710938, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 309, + "height": 47, "seed": 951266955, - "version": 206, - "versionNonce": 115708741, - "isDeleted": false, - "boundElements": null, - "updated": 1642362422519, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903218, + "link": null, + "locked": false, + "startBinding": { + "elementId": "rSLMGDDSANk39uZ1qSXoL", + "focus": 0.26040769451622164, + "gap": 12 + }, + "endBinding": { + "elementId": "AlPYVihubZCxJQk7pB7yt", + "focus": -0.677564558161573, + "gap": 7 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -2074,44 +2117,44 @@ 309, -5 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "rSLMGDDSANk39uZ1qSXoL", - "focus": 0.26040769451622164, - "gap": 12 - }, - "endBinding": { - "elementId": "AlPYVihubZCxJQk7pB7yt", - "focus": -0.677564558161573, - "gap": 7 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "6Mqoc3CUC1kDXW-H8PnXC", "type": "arrow", - "x": 1350.6904761904752, - "y": 1029.9285714285713, - "width": 61.428571428571104, - "height": 57.285714285714334, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 1065, + "versionNonce": 933809451, + "isDeleted": false, + "id": "6Mqoc3CUC1kDXW-H8PnXC", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1445.3461158389125, + "y": 1062.962201799665, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", 
+ "width": 61.428571428571104, + "height": 57.285714285714334, "seed": 1727252011, - "version": 822, - "versionNonce": 1302135909, - "isDeleted": false, - "boundElements": null, - "updated": 1642362470518, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903218, + "link": null, + "locked": false, + "startBinding": null, + "endBinding": { + "elementId": "rSLMGDDSANk39uZ1qSXoL", + "focus": 0.7393834537053876, + "gap": 13.309523809524308 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -2141,40 +2184,48 @@ -26.999999999999545, -0.7142857142857792 ] - ], - "lastCommittedPoint": null, - "startBinding": null, - "endBinding": { - "elementId": "rSLMGDDSANk39uZ1qSXoL", - "focus": 0.7393834537053876, - "gap": 13.309523809524308 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "C56y13xv1mZB0LD6qFnq7", "type": "arrow", - "x": 1476.4779855600796, - "y": 1067.8830747818163, - "width": 125, - "height": 17.5, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 450, + "versionNonce": 386658251, + "isDeleted": false, + "id": "C56y13xv1mZB0LD6qFnq7", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1571.1336252085168, + "y": 1100.91670515291, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 125, + "height": 17.5, "seed": 567067051, - "version": 85, - "versionNonce": 262968971, - "isDeleted": false, - "boundElements": null, - "updated": 1642362478369, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], + "updated": 1672159903219, + "link": null, + "locked": false, + "startBinding": { + "elementId": "rSLMGDDSANk39uZ1qSXoL", + "focus": 0.5284261195717249, + "gap": 9.883074781816276 + }, + "endBinding": { + "elementId": "iQff4S2v-YJgbx7HyTt9e", + "focus": -0.8893474155690672, + "gap": 12.883074781816276 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -2188,42 +2239,29 @@ 125, 0 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "rSLMGDDSANk39uZ1qSXoL", - "focus": 0.5284261195717249, - "gap": 9.883074781816276 - }, - "endBinding": { - "elementId": "iQff4S2v-YJgbx7HyTt9e", - "focus": -0.8893474155690672, - "gap": 12.883074781816276 - }, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "G00cPo51SxZZjScTTmW6v", "type": "text", - "x": 1143.9779855600796, - "y": 868.7164081151495, - "width": 77, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 61, + "versionNonce": 1734275045, + "isDeleted": false, + "id": "G00cPo51SxZZjScTTmW6v", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 1143.9779855600796, + "y": 868.7164081151495, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 77, + "height": 25, "seed": 2137554539, - "version": 61, - "versionNonce": 1734275045, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "LUKsczhb4tA-FBj4UuVr_", @@ -2231,49 +2269,44 @@ } ], "updated": 1642363279061, - "text": "creates", + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "creates", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 
18, "containerId": null, "originalText": "creates" }, { - "id": "LUKsczhb4tA-FBj4UuVr_", "type": "arrow", - "x": 1181.6555701966186, - "y": 761.7164081151495, - "width": 2.2019997163511107, - "height": 100.17323049763218, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 134, + "versionNonce": 1270217381, + "isDeleted": false, + "id": "LUKsczhb4tA-FBj4UuVr_", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1181.6555701966186, + "y": 761.7164081151495, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 2.2019997163511107, + "height": 100.17323049763218, "seed": 17069733, - "version": 134, - "versionNonce": 1270217381, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], "updated": 1642363279112, - "points": [ - [ - 0, - 0 - ], - [ - 2.2019997163511107, - 100.17323049763218 - ] - ], - "lastCommittedPoint": null, + "link": null, + "locked": false, "startBinding": { "elementId": "U-EKjQ2I8Nqp77fo0hFXS", "focus": -0.003519884030544712, @@ -2284,43 +2317,47 @@ "focus": 0.04653602397004277, "gap": 6.826769502367824 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 2.2019997163511107, + 100.17323049763218 + ] + ] }, { - "id": "AnYN90VNUcVDpxC9ADTff", "type": "arrow", - "x": 1178.9779855600796, - "y": 682.7164081151495, - "width": 1, - "height": 37, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 24, + "versionNonce": 1236016773, + "isDeleted": false, + "id": "AnYN90VNUcVDpxC9ADTff", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 1178.9779855600796, + "y": 682.7164081151495, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 1, + "height": 37, "seed": 449404421, - "version": 24, - "versionNonce": 1236016773, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], "updated": 1642363291245, - "points": [ - [ - 0, - 0 - ], - [ - 1, - 37 - ] - ], - "lastCommittedPoint": null, + "link": null, + "locked": false, "startBinding": { "elementId": "ame5KpH1huZ8Ub_T8uka_", "focus": 0.06421949253976675, @@ -2331,32 +2368,56 @@ "focus": -0.008772053768841522, "gap": 3.283591884850466 }, + "lastCommittedPoint": null, "startArrowhead": null, - "endArrowhead": "arrow" + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 1, + 37 + ] + ] }, { - "id": "gtZquKq5CmPjBvT5zPYx_", "type": "arrow", - "x": 2169.97798556008, - "y": 280.7164081151496, - "width": 5, - "height": 828, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 63, + "versionNonce": 1680068395, + "isDeleted": false, + "id": "gtZquKq5CmPjBvT5zPYx_", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "round", + "angle": 0, + "x": 2169.97798556008, + "y": 280.7164081151496, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 5, + "height": 828, "seed": 1513379403, - "version": 63, - "versionNonce": 1680068395, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": [], 
"updated": 1642363729657, + "link": null, + "locked": false, + "startBinding": { + "elementId": "pTeJ3U7XEGejOdbsJG4PK", + "focus": -0.14357046172356744, + "gap": 7 + }, + "endBinding": null, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", "points": [ [ 0, @@ -2366,38 +2427,29 @@ 5, 828 ] - ], - "lastCommittedPoint": null, - "startBinding": { - "elementId": "pTeJ3U7XEGejOdbsJG4PK", - "focus": -0.14357046172356744, - "gap": 7 - }, - "endBinding": null, - "startArrowhead": null, - "endArrowhead": "arrow" + ] }, { - "id": "pTeJ3U7XEGejOdbsJG4PK", "type": "text", - "x": 2146.97798556008, - "y": 248.7164081151496, - "width": 40, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 20, + "versionNonce": 1009011973, + "isDeleted": false, + "id": "pTeJ3U7XEGejOdbsJG4PK", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 2146.97798556008, + "y": 248.7164081151496, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 40, + "height": 25, "seed": 951153637, - "version": 20, - "versionNonce": 1009011973, - "isDeleted": false, + "groupIds": [], + "roundness": null, "boundElements": [ { "id": "gtZquKq5CmPjBvT5zPYx_", @@ -2405,46 +2457,533 @@ } ], "updated": 1642363729657, - "text": "time", + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "time", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "time" }, { - "id": "rmYZa4dRgx0upN4eNZ4yT", "type": "text", - "x": 2169.97798556008, - "y": 1110.7164081151498, - "width": 12, - "height": 25, - "angle": 0, - "strokeColor": "#000000", - "backgroundColor": "#4c6ef5", + "version": 3, + "versionNonce": 241629963, + "isDeleted": false, + "id": "rmYZa4dRgx0upN4eNZ4yT", "fillStyle": "hachure", "strokeWidth": 1, "strokeStyle": "solid", "roughness": 1, "opacity": 100, - "groupIds": [], - "strokeSharpness": "sharp", + "angle": 0, + "x": 2169.97798556008, + "y": 1110.7164081151498, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 12, + "height": 25, "seed": 1448362091, - "version": 3, - "versionNonce": 241629963, - "isDeleted": false, - "boundElements": null, + "groupIds": [], + "roundness": null, + "boundElements": [], "updated": 1642363731040, - "text": "t", + "link": null, + "locked": false, "fontSize": 20, "fontFamily": 1, + "text": "t", + "baseline": 18, "textAlign": "left", "verticalAlign": "top", - "baseline": 18, "containerId": null, "originalText": "t" + }, + { + "type": "rectangle", + "version": 346, + "versionNonce": 1037920700, + "isDeleted": false, + "id": "Cekq0efgl_SJwRd7oNR-0", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1269.2461338933213, + "y": 527.8174257982937, + "strokeColor": "#000000", + "backgroundColor": "#15aabf", + "width": 134.99999999999997, + "height": 81, + "seed": 1865530987, + "groupIds": [], + "roundness": null, + "boundElements": [ + { + "id": "bGxFT6G-0FOLDWK-p8c8d", + "type": "arrow" + }, + { + "id": "MrU5ySXt8WIDsacrKzx6g", + "type": "arrow" + }, + { + "id": "2fm-KXOwvvrYtUoVFHWmi", + "type": "arrow" + } + ], + "updated": 1672160240423, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 219, + "versionNonce": 325779204, + "isDeleted": false, + "id": "GUyJja7TFYxyVymYTDGX3", + "fillStyle": 
"hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1305.3326207097275, + "y": 557.4712649096218, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 57, + "height": 25, + "seed": 1913604924, + "groupIds": [], + "roundness": null, + "boundElements": null, + "updated": 1672159948391, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "tasks", + "baseline": 18, + "textAlign": "left", + "verticalAlign": "top", + "containerId": null, + "originalText": "tasks" + }, + { + "type": "arrow", + "version": 320, + "versionNonce": 64885508, + "isDeleted": false, + "id": "bGxFT6G-0FOLDWK-p8c8d", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1334.3820073054962, + "y": 610.0795178136458, + "strokeColor": "#000000", + "backgroundColor": "#fa5252", + "width": 287.98164899430867, + "height": 68.7005615234375, + "seed": 2027128324, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": null, + "updated": 1672159998129, + "link": null, + "locked": false, + "startBinding": { + "elementId": "Cekq0efgl_SJwRd7oNR-0", + "focus": 0.7477745794454563, + "gap": 1.2620920153520956 + }, + "endBinding": { + "elementId": "t0VdGC-iBiarP2bozp1gL", + "focus": 0.699494006127484, + "gap": 12.219920662916707 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 287.98164899430867, + 68.7005615234375 + ] + ] + }, + { + "type": "rectangle", + "version": 642, + "versionNonce": 1031951932, + "isDeleted": false, + "id": "WNkKzxwTek-buG6Z1F6Mo", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1264.7894078191023, + "y": 1012.5757570971218, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 134.99999999999997, + "height": 81, + "seed": 1345131140, + "groupIds": [], + "roundness": null, + "boundElements": [ + { + "type": "text", + "id": "9Q_Go7jImjot1i2IJ6p50" + }, + { + "id": "MrU5ySXt8WIDsacrKzx6g", + "type": "arrow" + }, + { + "id": "e92jNKrSDSVuXHsMKYG50", + "type": "arrow" + } + ], + "updated": 1672160381731, + "link": null, + "locked": false + }, + { + "id": "9Q_Go7jImjot1i2IJ6p50", + "type": "text", + "x": 1280.2894078191023, + "y": 1040.5757570971218, + "width": 104, + "height": 25, + "angle": 0, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 0, + "opacity": 100, + "groupIds": [], + "roundness": null, + "seed": 422162620, + "version": 25, + "versionNonce": 887258884, + "isDeleted": false, + "boundElements": null, + "updated": 1672160148231, + "link": null, + "locked": false, + "text": "follow lane", + "fontSize": 20, + "fontFamily": 1, + "textAlign": "center", + "verticalAlign": "middle", + "baseline": 18, + "containerId": "WNkKzxwTek-buG6Z1F6Mo", + "originalText": "follow lane" + }, + { + "type": "arrow", + "version": 646, + "versionNonce": 587125252, + "isDeleted": false, + "id": "MrU5ySXt8WIDsacrKzx6g", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "dotted", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1331.0423888183714, + "y": 619.8468259022055, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 3.4565418407464676, + "height": 383.507568359375, + "seed": 1699920060, + "groupIds": [], + "roundness": 
{ + "type": 2 + }, + "boundElements": null, + "updated": 1672160107953, + "link": null, + "locked": false, + "startBinding": { + "elementId": "Cekq0efgl_SJwRd7oNR-0", + "focus": 0.09088890692121962, + "gap": 11.029400103911826 + }, + "endBinding": { + "elementId": "WNkKzxwTek-buG6Z1F6Mo", + "focus": 0.03916096596598291, + "gap": 9.2213628355413 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 3.4565418407464676, + 383.507568359375 + ] + ] + }, + { + "type": "arrow", + "version": 1056, + "versionNonce": 599527940, + "isDeleted": false, + "id": "2fm-KXOwvvrYtUoVFHWmi", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1634.4221653328705, + "y": 404.4112952366328, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 302.1381812809882, + "height": 120.69560241699219, + "seed": 699837828, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": null, + "updated": 1672160320149, + "link": null, + "locked": false, + "startBinding": { + "elementId": "WTDldziLTtQz8IskT5HTT", + "gap": 10.78634101299997, + "focus": -0.6841008591011289 + }, + "endBinding": { + "elementId": "Cekq0efgl_SJwRd7oNR-0", + "gap": 2.7105281446687286, + "focus": -0.66691585023776 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -302.1381812809882, + 120.69560241699219 + ] + ] + }, + { + "type": "rectangle", + "version": 641, + "versionNonce": 1924037508, + "isDeleted": false, + "id": "elQ-C2Q_n8NhztaIBanY4", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1577.2288609441025, + "y": 137.16929347407495, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 134.99999999999997, + "height": 81, + "seed": 937900220, + "groupIds": [], + "roundness": null, + "boundElements": [ + { + "id": "2fm-KXOwvvrYtUoVFHWmi", + "type": "arrow" + }, + { + "id": "7tmiLF0VuxWnTf665kBe4", + "type": "arrow" + }, + { + "id": "HPyrpHB6_eezFm3o5_XzQ", + "type": "arrow" + } + ], + "updated": 1672160331674, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 395, + "versionNonce": 1239182084, + "isDeleted": false, + "id": "qMtHEQqJV0zQzxKUMUQQ1", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1606.697733014415, + "y": 339.4702883471218, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 72, + "height": 25, + "seed": 334495236, + "groupIds": [], + "roundness": null, + "boundElements": [ + { + "id": "HPyrpHB6_eezFm3o5_XzQ", + "type": "arrow" + } + ], + "updated": 1672160331674, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "training", + "baseline": 18, + "textAlign": "center", + "verticalAlign": "top", + "containerId": null, + "originalText": "training" + }, + { + "type": "arrow", + "version": 1357, + "versionNonce": 1765811260, + "isDeleted": false, + "id": "HPyrpHB6_eezFm3o5_XzQ", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1648.3748713358211, + "y": 222.96376108764164, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 1.5871506025766848, + "height": 78.09892272949216, + "seed": 380608644, + "groupIds": [], + "roundness": { 
+ "type": 2 + }, + "boundElements": null, + "updated": 1672160336329, + "link": null, + "locked": false, + "startBinding": { + "elementId": "elQ-C2Q_n8NhztaIBanY4", + "focus": -0.06683686416453856, + "gap": 4.794467613566695 + }, + "endBinding": { + "elementId": "WTDldziLTtQz8IskT5HTT", + "focus": -0.0431178532452355, + "gap": 11.56227040649901 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + -1.5871506025766848, + 78.09892272949216 + ] + ] + }, + { + "type": "arrow", + "version": 644, + "versionNonce": 1529096836, + "isDeleted": false, + "id": "e92jNKrSDSVuXHsMKYG50", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 1319.592134773697, + "y": 1109.6018498311405, + "strokeColor": "#000000", + "backgroundColor": "#4c6ef5", + "width": 179.1640625, + "height": 41.66094970703125, + "seed": 410600764, + "groupIds": [], + "roundness": { + "type": 2 + }, + "boundElements": null, + "updated": 1672160382611, + "link": null, + "locked": false, + "startBinding": { + "elementId": "WNkKzxwTek-buG6Z1F6Mo", + "focus": 0.8989047508226017, + "gap": 16.026092734018675 + }, + "endBinding": { + "elementId": "rSLMGDDSANk39uZ1qSXoL", + "focus": -0.7454867601743698, + "gap": 11.607342995202998 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 82.74505615234375, + 34.7000732421875 + ], + [ + 179.1640625, + -6.96087646484375 + ] + ] } ], "appState": { diff --git a/rl_studio/envs/envs_type.py b/rl_studio/envs/envs_type.py new file mode 100644 index 000000000..ed4586645 --- /dev/null +++ b/rl_studio/envs/envs_type.py @@ -0,0 +1,7 @@ +from enum import Enum + + +class EnvsType(Enum): + GAZEBO = "gazebo" + CARLA = "carla" + OPENAI = "openai" diff --git a/rl_studio/envs/gazebo/f1/image_f1.py b/rl_studio/envs/gazebo/f1/image_f1.py index 8ee6b0210..dc38c2fa7 100644 --- a/rl_studio/envs/gazebo/f1/image_f1.py +++ b/rl_studio/envs/gazebo/f1/image_f1.py @@ -7,6 +7,85 @@ from cv_bridge import CvBridge from sensor_msgs.msg import Image as ImageROS +from rl_studio.agents.f1.settings import QLearnConfig + + +class ImageF1: + + font = cv2.FONT_HERSHEY_COMPLEX + + def __init__(self): + self.height = 3 # Image height [pixels] + self.width = 3 # Image width [pixels] + self.timeStamp = 0 # Time stamp [s] */ + self.format = "" # Image format string (RGB8, BGR,...) 
+ self.data = np.zeros( + (self.height, self.width, 3), np.uint8 + ) # The image data itself + self.data.shape = self.height, self.width, 3 + #self.config = QLearnConfig() + + def __str__(self): + return ( + f"Image:" + f"\nHeight: {self.height}\nWidth: {self.width}\n" + f"Format: {self.format}\nTimeStamp: {self.timeStamp}\nData: {self.data}" + ) + + @staticmethod + def image_msg_to_image(img, cv_image): + + img.width = img.width + img.height = img.height + img.format = "RGB8" + img.timeStamp = img.header.stamp.secs + (img.header.stamp.nsecs * 1e-9) + img.data = cv_image + + return img + + def show_telemetry(self, img, points, action, reward): + count = 0 + for idx, point in enumerate(points): + cv2.line( + img, + (320, self.config.x_row[idx]), + (320, self.config.x_row[idx]), + (255, 255, 0), + thickness=5, + ) + # cv2.line(img, (center_image, x_row[idx]), (point, x_row[idx]), (255, 255, 255), thickness=2) + cv2.putText( + img, + str("err{}: {}".format(idx + 1, self.config.center_image - point)), + (18, 340 + count), + self.font, + 0.4, + (255, 255, 255), + 1, + ) + count += 20 + cv2.putText( + img, + str(f"action: {action}"), + (18, 280), + self, + 0.4, + (255, 255, 255), + 1, + ) + cv2.putText( + img, + str(f"reward: {reward}"), + (18, 320), + self, + 0.4, + (255, 255, 255), + 1, + ) + + cv2.imshow("Image window", img[240:]) + cv2.waitKey(3) + def imageMsg2Image(img, bridge): image = Image() @@ -20,24 +99,29 @@ def imageMsg2Image(img, bridge): class Image: - def __init__(self): self.height = 3 # Image height [pixels] self.width = 3 # Image width [pixels] self.timeStamp = 0 # Time stamp [s] */ self.format = "" # Image format string (RGB8, BGR,...) - self.data = np.zeros((self.height, self.width, 3), np.uint8) # The image data itself + self.data = np.zeros( + (self.height, self.width, 3), np.uint8 + ) # The image data itself self.data.shape = self.height, self.width, 3 def __str__(self): - s = "Image: {\n height: " + str(self.height) + "\n width: " + str(self.width) + s = ( + "Image: {\n height: " + + str(self.height) + + "\n width: " + + str(self.width) + ) s = s + "\n format: " + self.format + "\n timeStamp: " + str(self.timeStamp) s = s + "\n data: " + str(self.data) + "\n}" return s class ListenerCamera: - def __init__(self, topic): self.topic = topic self.data = Image() @@ -47,7 +131,6 @@ def __init__(self, topic): self.bridge = CvBridge() self.start() - def __callback(self, img): self.total_frames += 1 diff --git a/rl_studio/envs/gazebo/f1/models/__init__.py b/rl_studio/envs/gazebo/f1/models/__init__.py index ba28612c9..981668f01 100644 --- a/rl_studio/envs/gazebo/f1/models/__init__.py +++ b/rl_studio/envs/gazebo/f1/models/__init__.py @@ -1,9 +1,11 @@ -from rl_studio.envs.gazebo.f1.env_type import EnvironmentType +from rl_studio.agents.tasks_type import TasksType +from rl_studio.agents.frameworks_type import FrameworksType +from rl_studio.algorithms.algorithms_type import AlgorithmsType from rl_studio.envs.gazebo.f1.exceptions import NoValidEnvironmentType class F1Env: - def __new__(cls, **config): + def __new__(cls, **environment): cls.circuit = None cls.vel_pub = None cls.unpause = None @@ -14,72 +16,117 @@ def __new__(cls, **config): cls.model_coordinates = None cls.position = None - training_type = config.get("training_type") - - # Qlearn F1 FollowLine camera - if training_type == EnvironmentType.qlearn_env_camera_follow_line.value: - from rl_studio.envs.gazebo.f1.models.f1_env_camera import ( - F1CameraEnv, + algorithm = environment["algorithm"] + task = environment["task"] + 
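+ # "task", "algorithm" and "framework" (read just below) act as selector keys:
+ # they decide which concrete F1 Gazebo environment class this factory imports
+ # and returns; a combination with no matching branch raises NoValidEnvironmentType.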
framework = environment["framework"] + + # ============================= + # FollowLine - qlearn - (we are already in F1 - Gazebo) + # ============================= + if ( + task == TasksType.FOLLOWLINEGAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + ): + from rl_studio.envs.gazebo.f1.models.followline_qlearn import ( + FollowLineQlearnF1Gazebo, ) - return F1CameraEnv(**config) - - # Qlearn F1 FollowLane camera - elif training_type == EnvironmentType.qlearn_env_camera_follow_lane.value: - from rl_studio.envs.gazebo.f1.models.f1_env_camera import ( - QlearnF1FollowLaneEnvGazebo, - ) - - return QlearnF1FollowLaneEnvGazebo(**config) - - # Qlearn F1 FollowLine laser - elif training_type == EnvironmentType.qlearn_env_laser_follow_line.value: - from rl_studio.envs.gazebo.f1.models.f1_env_qlearn_laser import ( - F1QlearnLaserEnv, + return FollowLineQlearnF1Gazebo(**environment) + + # ============================= + # FollowLane - qlearn + # ============================= + if ( + task == TasksType.FOLLOWLANEGAZEBO.value + and algorithm == AlgorithmsType.QLEARN.value + ): + from rl_studio.envs.gazebo.f1.models.followlane_qlearn import ( + FollowLaneQlearnF1Gazebo, ) - return F1QlearnLaserEnv(**config) + return FollowLaneQlearnF1Gazebo(**environment) + # ============================= + # FollowLine - DQN - TensorFlow + # ============================= # DQN F1 FollowLine - elif training_type == EnvironmentType.dqn_env_follow_line.value: - from rl_studio.envs.gazebo.f1.models.f1_env_dqn_camera import ( - DQNF1FollowLineEnvGazebo, + elif ( + task == TasksType.FOLLOWLINEGAZEBO.value + and algorithm == AlgorithmsType.DQN.value + and framework == FrameworksType.TF.value + ): + from rl_studio.envs.gazebo.f1.models.followline_dqn_tf import ( + FollowLineDQNF1GazeboTF, ) - return DQNF1FollowLineEnvGazebo(**config) + return FollowLineDQNF1GazeboTF(**environment) + # ============================= + # FollowLane - DQN - TensorFlow + # ============================= # DQN F1 FollowLane - elif training_type == EnvironmentType.dqn_env_follow_lane.value: - from rl_studio.envs.gazebo.f1.models.f1_env_dqn_camera import ( - DQNF1FollowLaneEnvGazebo, + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and algorithm == AlgorithmsType.DQN.value + and framework == FrameworksType.TF.value + ): + from rl_studio.envs.gazebo.f1.models.followlane_dqn_tf import ( + FollowLaneDQNF1GazeboTF, ) - return DQNF1FollowLaneEnvGazebo(**config) - - # DDPG F1 FollowLine - elif training_type == EnvironmentType.ddpg_env_follow_line.value: - from rl_studio.envs.gazebo.f1.models.f1_env_ddpg import ( - DDPGF1FollowLineEnvGazebo, + return FollowLaneDQNF1GazeboTF(**environment) + + # ============================= + # FollowLine - DDPG - TensorFlow + # ============================= + elif ( + task == TasksType.FOLLOWLINEGAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and framework == FrameworksType.TF.value + ): + from rl_studio.envs.gazebo.f1.models.followline_ddpg_tf import ( + FollowLineDDPGF1GazeboTF, ) - return DDPGF1FollowLineEnvGazebo(**config) + return FollowLineDDPGF1GazeboTF(**environment) + # ============================= + # FollowLane - DDPG - TensorFlow + # ============================= # DDPG F1 FollowLane - elif training_type == EnvironmentType.ddpg_env_follow_lane.value: - from rl_studio.envs.gazebo.f1.models.f1_env_ddpg import ( - DDPGF1FollowLaneEnvGazebo, + elif ( + task == TasksType.FOLLOWLANEGAZEBO.value + and algorithm == AlgorithmsType.DDPG.value + and framework == FrameworksType.TF.value + ): + 
from rl_studio.envs.gazebo.f1.models.followlane_ddpg_tf import ( + FollowLaneDDPGF1GazeboTF, ) - return DDPGF1FollowLaneEnvGazebo(**config) - - # F1 Manual - elif training_type == EnvironmentType.manual_env.value: - from rl_studio.envs.gazebo.f1.models.f1_env_manual_pilot import ( - GazeboF1ManualCameraEnv, + return FollowLaneDDPGF1GazeboTF(**environment) + + # ============================= + # FollowLine - qlearn - Manual + # ============================= + if ( + task == TasksType.FOLLOWLINEGAZEBO.value + and algorithm == AlgorithmsType.MANUAL.value + ): + from rl_studio.envs.gazebo.f1.models.followline_qlearn_manual import ( + FollowLineQlearnF1Gazebo, ) - return GazeboF1ManualCameraEnv(**config) + return FollowLineQlearnF1Gazebo(**environment) + + # ============================= + # FollowLine - qlearn - (we are already in F1 - Gazebo) - laser + # ============================= + # elif training_type == EnvironmentType.qlearn_env_laser_follow_line.value: + # from rl_studio.envs.gazebo.f1.models.f1_env_qlearn_laser import ( + # F1QlearnLaserEnv, + # ) + + # return F1QlearnLaserEnv(**config) - # Wrong! else: - raise NoValidEnvironmentType(training_type) + raise NoValidEnvironmentType(task) diff --git a/rl_studio/envs/gazebo/f1/models/followlane_ddpg_tf.py b/rl_studio/envs/gazebo/f1/models/followlane_ddpg_tf.py new file mode 100644 index 000000000..a43f088ad --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/followlane_ddpg_tf.py @@ -0,0 +1,275 @@ +############################################# +# - Task: Follow Lane +# - Algorithm: DDPG +# - actions: discrete and continuous +# - State: Simplified perception and raw image +# +############################################ + + +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env +from rl_studio.envs.gazebo.f1.models.settings import F1GazeboTFConfig + + +class FollowLaneDDPGF1GazeboTF(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + def reset(self): + from rl_studio.envs.gazebo.f1.models.reset import ( + Reset, + ) + + if self.state_space == "image": + return Reset.reset_f1_state_image(self) + else: + return Reset.reset_f1_state_sp(self) + + def step(self, action, step): + from rl_studio.envs.gazebo.f1.models.step import ( + StepFollowLane, + ) + + if self.state_space == "image" and self.action_space != "continuous": + return StepFollowLane.step_followlane_state_image_actions_discretes( + self, action, step + ) + elif self.state_space == "image" and self.action_space == "continuous": + return StepFollowLane.step_followlane_state_image_actions_continuous( + self, action, step + ) + elif self.state_space != "image" and self.action_space == "continuous": + return StepFollowLane.step_followlane_state_sp_actions_continuous( + self, action, step + ) + else: + return StepFollowLane.step_followlane_state_sp_actions_discretes( + self, action, step + ) + + +''' + +class FollowLaneDDPGF1GazeboTF(F1Env): + def __init__(self, **config): + + F1Env.__init__(self, **config) + self.simplifiedperception = F1GazeboSimplifiedPerception() + self.f1gazeborewards = F1GazeboRewards() + self.f1gazeboutils = F1GazeboUtils() + self.f1gazeboimages = F1GazeboImages() + + self.image = ImageF1() + self.image_raw_from_topic = None + self.f1_image_camera = None + self.sensor = config["sensor"] + + # Image + self.image_resizing = config["image_resizing"] / 100 + self.new_image_size = config["new_image_size"] + self.raw_image = config["raw_image"] + 
self.height = int(config["height_image"] * self.image_resizing) + self.width = int(config["width_image"] * self.image_resizing) + self.center_image = int(config["center_image"] * self.image_resizing) + self.num_regions = config["num_regions"] + self.pixel_region = int(self.center_image / self.num_regions) * 2 + self.telemetry_mask = config["telemetry_mask"] + self.poi = config["x_row"][0] + self.image_center = None + self.right_lane_center_image = config["center_image"] + ( + config["center_image"] // 2 + ) + self.lower_limit = config["lower_limit"] + + # States + self.state_space = config["states"] + if self.state_space == "spn": + self.x_row = [i for i in range(1, int(self.height / 2) - 1)] + else: + self.x_row = config["x_row"] + + # Actions + self.action_space = config["action_space"] + self.actions = config["actions"] + + # Rewards + self.reward_function = config["reward_function"] + self.rewards = config["rewards"] + self.min_reward = config["min_reward"] + if self.action_space == "continuous": + self.beta_1 = self.actions["w"][1] / ( + self.actions["v"][1] - self.actions["v"][0] + ) + self.beta_0 = self.beta_1 * self.actions["v"][1] + + # Others + self.telemetry = config["telemetry"] + + print_messages( + "FollowLaneDDPGF1GazeboTF()", + actions=self.actions, + len_actions=len(self.actions), + # actions_v=self.actions["v"], # for continuous actions + # actions_v=self.actions[0], # for discrete actions + # beta_1=self.beta_1, + # beta_0=self.beta_0, + rewards=self.rewards, + ) + + ################################################################################# + # reset + ################################################################################# + + def reset(self): + """ + Main reset. Depending of: + - sensor + - states: images or simplified perception (sp) + + """ + if self.sensor == "camera": + return self.reset_camera() + + def reset_camera(self): + self._gazebo_reset() + # === POSE === + if self.alternate_pose: + self._gazebo_set_random_pose_f1_follow_rigth_lane() + else: + self._gazebo_set_fix_pose_f1_follow_right_lane() + + self._gazebo_unpause() + + ##==== get image from sensor camera + f1_image_camera, _ = self.get_camera_info() + self._gazebo_pause() + + ##==== calculating State + # image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + state_size = state.shape + + # simplified perception as observation + else: + ( + centrals_in_pixels, + centrals_normalized, + ) = self.simplifiedperception.calculate_centrals_lane( + f1_image_camera.data, + self.height, + self.width, + self.x_row, + self.lower_limit, + self.center_image, + ) + states = self.simplifiedperception.calculate_observation( + centrals_in_pixels, self.center_image, self.pixel_region + ) + state = [states[0]] + state_size = len(state) + + return state, state_size + + ################################################################################# + # Camera + ################################################################################# + def get_camera_info(self): + image_data = None + f1_image_camera = None + success = False + + while image_data is None or success is False: + image_data = rospy.wait_for_message( + "/F1ROS/cameraL/image_raw", Image, timeout=5 + ) + cv_image = CvBridge().imgmsg_to_cv2(image_data, "bgr8") + f1_image_camera = self.image_msg_to_image(image_data, cv_image) + # f1_image_camera = image_msg_to_image(image_data, cv_image) + if np.any(cv_image): + success 
= True + + return f1_image_camera, cv_image + + def image_msg_to_image(self, img, cv_image): + self.image.width = img.width + self.image.height = img.height + self.image.format = "RGB8" + self.image.timeStamp = img.header.stamp.secs + (img.header.stamp.nsecs * 1e-9) + self.image.data = cv_image + + return self.image + + ################################################################################# + # step + ################################################################################# + + def step(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + + if self.action_space == "continuous": + vel_cmd.linear.x = action[0][0] + vel_cmd.angular.z = action[0][1] + else: + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.get_camera_info() + self._gazebo_pause() + + ##==== get center + points, centrals_normalized = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points[self.poi] + else: + self.point = points[0] + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== simplified perception as observation + else: + state = self.simplifiedperception.calculate_observation( + points, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, center, step, self.rewards + ) + elif ( + self.reward_function == "follow_right_lane_center_v_w_linear" + ): # this reward function ONLY for continuous actions + reward, done = self.f1gazeborewards.rewards_followlane_v_w_centerline( + vel_cmd, center, self.rewards, self.beta_1, self.beta_0 + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + center, self.rewards + ) + + return state, reward, done, {} +''' diff --git a/rl_studio/envs/gazebo/f1/models/followlane_dqn_tf.py b/rl_studio/envs/gazebo/f1/models/followlane_dqn_tf.py new file mode 100644 index 000000000..19b28e515 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/followlane_dqn_tf.py @@ -0,0 +1,176 @@ +############################################# +# - Task: Follow Lane +# - Algorithm: DQN +# - actions: discrete +# - State: Simplified perception and raw image +# +############################################ + +from geometry_msgs.msg import Twist +import numpy as np + +from rl_studio.agents.utils import ( + print_messages, +) +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env +from rl_studio.envs.gazebo.f1.models.settings import F1GazeboTFConfig + + +class FollowLaneDQNF1GazeboTF(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + def reset(self): + from rl_studio.envs.gazebo.f1.models.reset import ( + Reset, + ) + + if self.state_space == "image": + return Reset.reset_f1_state_image(self) + else: + return Reset.reset_f1_state_sp(self) + + def step(self, action, step): + from 
rl_studio.envs.gazebo.f1.models.step import ( + StepFollowLane, + ) + + if self.state_space == "image": + return StepFollowLane.step_followlane_state_image_actions_discretes( + self, action, step + ) + else: + return StepFollowLane.step_followlane_state_sp_actions_discretes( + self, action, step + ) + + +############ OLD Class ###################### +class _FollowLaneDQNF1GazeboTF(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + print_messages( + "FollowLaneDQNF1GazeboTF()", + actions=self.actions, + len_actions=len(self.actions), + # actions_v=self.actions["v"], # for continuous actions + # actions_v=self.actions[0], # for discrete actions + # beta_1=self.beta_1, + # beta_0=self.beta_0, + rewards=self.rewards, + ) + + ######### + # reset + ######### + + def reset(self): + + if self.sensor == "camera": + return self.reset_camera() + + def reset_camera(self): + self._gazebo_reset() + # === POSE === + if self.alternate_pose: + self._gazebo_set_random_pose_f1_follow_rigth_lane() + else: + self._gazebo_set_fix_pose_f1_follow_right_lane() + + self._gazebo_unpause() + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== calculating State + # image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + state_size = state.shape + + # simplified perception as observation + else: + ( + centrals_in_pixels, + centrals_normalized, + ) = self.simplifiedperception.calculate_centrals_lane( + f1_image_camera.data, + self.height, + self.width, + self.x_row, + self.lower_limit, + self.center_image, + ) + states = self.simplifiedperception.calculate_observation( + centrals_in_pixels, self.center_image, self.pixel_region + ) + state = [states[0]] + state_size = len(state) + + return state, state_size + + ######### + # step + ######### + def step(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points[self.poi] + else: + self.point = points[0] + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== simplified perception as observation + else: + state = self.simplifiedperception.calculate_observation( + points, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, center, step, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + center, self.rewards + ) + + return state, reward, done, {} diff --git 
a/rl_studio/envs/gazebo/f1/models/followlane_qlearn.py b/rl_studio/envs/gazebo/f1/models/followlane_qlearn.py new file mode 100644 index 000000000..f5929e33f --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/followlane_qlearn.py @@ -0,0 +1,51 @@ +############################################# +# - Task: Follow Lane +# - Algorithm: Qlearn +# - actions: discrete +# - State: Simplified perception +# +############################################ + +import math + +from cv_bridge import CvBridge +import cv2 +from geometry_msgs.msg import Twist +import numpy as np + +import rospy +from sensor_msgs.msg import Image + +from rl_studio.agents.utils import ( + print_messages, +) +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env +from rl_studio.envs.gazebo.f1.models.settings import F1GazeboTFConfig + + +class FollowLaneQlearnF1Gazebo(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + def reset(self): + from rl_studio.envs.gazebo.f1.models.reset import ( + Reset, + ) + + if self.state_space == "image": + return Reset.reset_f1_state_image(self) + else: + return Reset.reset_f1_state_sp(self) + + def step(self, action, step): + from rl_studio.envs.gazebo.f1.models.step import ( + StepFollowLane, + ) + + return StepFollowLane.step_followlane_state_sp_actions_discretes( + self, action, step + ) diff --git a/rl_studio/envs/gazebo/f1/models/followline_ddpg_tf.py b/rl_studio/envs/gazebo/f1/models/followline_ddpg_tf.py new file mode 100644 index 000000000..a31520d89 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/followline_ddpg_tf.py @@ -0,0 +1,266 @@ +############################################# +# - Task: Follow Line +# - Algorithm: DDPG +# - actions: discrete and continuous +# - State: Simplified perception and raw image +# +############################################ + +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env +from rl_studio.envs.gazebo.f1.models.settings import F1GazeboTFConfig + + +class FollowLineDDPGF1GazeboTF(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + def reset(self): + from rl_studio.envs.gazebo.f1.models.reset import ( + Reset, + ) + + if self.state_space == "image": + return Reset.reset_f1_state_image(self) + else: + return Reset.reset_f1_state_sp(self) + + def step(self, action, step): + from rl_studio.envs.gazebo.f1.models.step import ( + StepFollowLine, + ) + + if self.state_space == "image" and self.action_space != "continuous": + return StepFollowLine.step_followline_state_image_actions_discretes( + self, action, step + ) + elif self.state_space == "image" and self.action_space == "continuous": + return StepFollowLine.step_followline_state_image_actions_continuous( + self, action, step + ) + elif self.state_space != "image" and self.action_space == "continuous": + return StepFollowLine.step_followline_state_sp_actions_continuous( + self, action, step + ) + else: + return StepFollowLine.step_followline_state_sp_actions_discretes( + self, action, step + ) + + +''' +class FollowLineDDPGF1GazeboTF(F1Env): + def __init__(self, **config): + + F1Env.__init__(self, **config) + self.simplifiedperception = F1GazeboSimplifiedPerception() + self.f1gazeborewards = F1GazeboRewards() + self.f1gazeboutils = F1GazeboUtils() + self.f1gazeboimages = F1GazeboImages() + + self.image = ImageF1() + self.image_raw_from_topic = None + 
self.f1_image_camera = None + self.sensor = config["sensor"] + + # Image + self.image_resizing = config["image_resizing"] / 100 + self.new_image_size = config["new_image_size"] + self.raw_image = config["raw_image"] + self.height = int(config["height_image"] * self.image_resizing) + self.width = int(config["width_image"] * self.image_resizing) + self.center_image = int(config["center_image"] * self.image_resizing) + self.num_regions = config["num_regions"] + self.pixel_region = int(self.center_image / self.num_regions) * 2 + self.telemetry_mask = config["telemetry_mask"] + self.poi = config["x_row"][0] + self.image_center = None + self.right_lane_center_image = config["center_image"] + ( + config["center_image"] // 2 + ) + self.lower_limit = config["lower_limit"] + + # States + self.state_space = config["states"] + if self.state_space == "spn": + self.x_row = [i for i in range(1, int(self.height / 2) - 1)] + else: + self.x_row = config["x_row"] + + # Actions + self.action_space = config["action_space"] + self.actions = config["actions"] + + # Rewards + self.reward_function = config["reward_function"] + self.rewards = config["rewards"] + self.min_reward = config["min_reward"] + if self.action_space == "continuous": + self.beta_1 = self.actions["w"][1] / ( + self.actions["v"][1] - self.actions["v"][0] + ) + self.beta_0 = self.beta_1 * self.actions["v"][1] + + # Others + self.telemetry = config["telemetry"] + + print_messages( + "FollowLineDDPGF1GazeboTF()", + actions=self.actions, + len_actions=len(self.actions), + # actions_v=self.actions["v"], # for continuous actions + # actions_v=self.actions[0], # for discrete actions + # beta_1=self.beta_1, + # beta_0=self.beta_0, + rewards=self.rewards, + ) + + ################################################################################# + # reset + ################################################################################# + + def reset(self): + """ + Main reset. 
Depending of: + - sensor + - states: images or simplified perception (sp) + + """ + if self.sensor == "camera": + return self.reset_camera() + + def reset_camera(self): + self._gazebo_reset() + # === POSE === + if self.alternate_pose: + self._gazebo_set_random_pose_f1_followline() + else: + self._gazebo_set_fix_pose_f1_followline() + + self._gazebo_unpause() + + ##==== get image from sensor camera + f1_image_camera, _ = self.get_camera_info() + self._gazebo_pause() + + ##==== calculating State + # image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + state_size = state.shape + + # simplified perception as observation + else: + ( + centrals_in_pixels, + centrals_normalized, + ) = self.simplifiedperception.calculate_centrals_lane( + f1_image_camera.data, + self.height, + self.width, + self.x_row, + self.lower_limit, + self.center_image, + ) + states = self.simplifiedperception.calculate_observation( + centrals_in_pixels, self.center_image, self.pixel_region + ) + state = [states[0]] + state_size = len(state) + + return state, state_size + + ################################################################################# + # Camera + ################################################################################# + def get_camera_info(self): + image_data = None + f1_image_camera = None + success = False + + while image_data is None or success is False: + image_data = rospy.wait_for_message( + "/F1ROS/cameraL/image_raw", Image, timeout=5 + ) + cv_image = CvBridge().imgmsg_to_cv2(image_data, "bgr8") + f1_image_camera = self.image_msg_to_image(image_data, cv_image) + # f1_image_camera = image_msg_to_image(image_data, cv_image) + if np.any(cv_image): + success = True + + return f1_image_camera, cv_image + + def image_msg_to_image(self, img, cv_image): + self.image.width = img.width + self.image.height = img.height + self.image.format = "RGB8" + self.image.timeStamp = img.header.stamp.secs + (img.header.stamp.nsecs * 1e-9) + self.image.data = cv_image + + return self.image + + ################################################################################# + # step + ################################################################################# + + def step(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + + if self.action_space == "continuous": + vel_cmd.linear.x = action[0][0] + vel_cmd.angular.z = action[0][1] + else: + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.get_camera_info() + self._gazebo_pause() + + ##==== get center + points, centrals_normalized = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points[self.poi] + else: + self.point = points[0] + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== simplified perception as observation + else: + state = self.simplifiedperception.calculate_observation( + points, self.center_image, self.pixel_region + ) + 
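            # --- Editor's note (illustrative, not part of the original diff) ---
            # `center` above is the deviation of the detected point from the image
            # centre, normalized by half the image width. With assumed, typical
            # values for a 640x480 camera:
            #
            #     center_image, width, point = 320, 640, 280
            #     center = float(center_image - point) / (float(width) // 2)
            #     # -> 40 / 320.0 = 0.125  (the detected point sits 40 px left of the image centre)
            #
            # The same normalization appears in the active step functions in
            # rl_studio/envs/gazebo/f1/models/step.py. The block below then maps this
            # deviation to a reward through the configured self.reward_function.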
+ ##==== get Rewards + if self.reward_function == "followline_center": + reward, done = self.f1gazeborewards.rewards_followline_center( + center, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followline_v_w_centerline( + vel_cmd, center, self.rewards, self.beta_1, self.beta_0 + ) + return state, reward, done, {} +''' diff --git a/rl_studio/envs/gazebo/f1/models/followline_dqn_tf.py b/rl_studio/envs/gazebo/f1/models/followline_dqn_tf.py new file mode 100644 index 000000000..48890f984 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/followline_dqn_tf.py @@ -0,0 +1,178 @@ +############################################# +# - Task: Follow Line +# - Algorithm: DQN +# - actions: discrete +# - State: Simplified perception and raw image +# +############################################ + +from geometry_msgs.msg import Twist +import numpy as np + +from rl_studio.agents.utils import ( + print_messages, +) +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env +from rl_studio.envs.gazebo.f1.models.settings import F1GazeboTFConfig + + +class FollowLineDQNF1GazeboTF(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + def reset(self): + from rl_studio.envs.gazebo.f1.models.reset import ( + Reset, + ) + + if self.state_space == "image": + return Reset.reset_f1_state_image(self) + else: + return Reset.reset_f1_state_sp(self) + + def step(self, action, step): + from rl_studio.envs.gazebo.f1.models.step import ( + StepFollowLine, + ) + + if self.state_space == "image": + return StepFollowLine.step_followline_state_image_actions_discretes( + self, action, step + ) + else: + return StepFollowLine.step_followline_state_sp_actions_discretes( + self, action, step + ) + + +""" +class _FollowLaneDQNF1GazeboTF(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + print_messages( + "FollowLaneDQNF1GazeboTF()", + actions=self.actions, + len_actions=len(self.actions), + # actions_v=self.actions["v"], # for continuous actions + # actions_v=self.actions[0], # for discrete actions + # beta_1=self.beta_1, + # beta_0=self.beta_0, + rewards=self.rewards, + ) + + ######### + # reset + ######### + + def reset(self): + + if self.sensor == "camera": + return self.reset_camera() + + def reset_camera(self): + self._gazebo_reset() + # === POSE === + if self.alternate_pose: + self._gazebo_set_random_pose_f1_follow_rigth_lane() + else: + self._gazebo_set_fix_pose_f1_follow_right_lane() + + self._gazebo_unpause() + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== calculating State + # image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + state_size = state.shape + + # simplified perception as observation + else: + ( + centrals_in_pixels, + centrals_normalized, + ) = self.simplifiedperception.calculate_centrals_lane( + f1_image_camera.data, + self.height, + self.width, + self.x_row, + self.lower_limit, + self.center_image, + ) + states = self.simplifiedperception.calculate_observation( + centrals_in_pixels, self.center_image, self.pixel_region + ) + state = [states[0]] + state_size = len(state) + + return state, state_size + + ######### + # step + ######### + def 
step(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points[self.poi] + else: + self.point = points[0] + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== image as observation + if self.state_space == "image": + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== simplified perception as observation + else: + state = self.simplifiedperception.calculate_observation( + points, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, center, step, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + center, self.rewards + ) + + return state, reward, done, {} + +""" diff --git a/rl_studio/envs/gazebo/f1/models/followline_qlearn.py b/rl_studio/envs/gazebo/f1/models/followline_qlearn.py new file mode 100644 index 000000000..f90d94476 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/followline_qlearn.py @@ -0,0 +1,56 @@ +############################################# +# - Task: Follow Line +# - Algorithm: Qlearn +# - actions: discrete +# - State: Simplified perception +# +############################################ + +import math + +from cv_bridge import CvBridge +import cv2 +from geometry_msgs.msg import Twist +import numpy as np + +import rospy +from sensor_msgs.msg import Image + +from rl_studio.agents.utils import ( + print_messages, +) +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env +from rl_studio.envs.gazebo.f1.models.settings import F1GazeboTFConfig + + +class FollowLineQlearnF1Gazebo(F1Env): + def __init__(self, **config): + + ###### init F1env + F1Env.__init__(self, **config) + ###### init class variables + F1GazeboTFConfig.__init__(self, **config) + + def reset(self): + from rl_studio.envs.gazebo.f1.models.reset import ( + Reset, + ) + + if self.state_space == "image": + return Reset.reset_f1_state_image(self) + else: + return Reset.reset_f1_state_sp(self) + + def step(self, action, step): + from rl_studio.envs.gazebo.f1.models.step import ( + StepFollowLine, + ) + + if self.state_space == "image": + return StepFollowLine.step_followline_state_image_actions_discretes( + self, action, step + ) + else: + return StepFollowLine.step_followline_state_sp_actions_discretes( + self, action, step + ) diff --git a/rl_studio/envs/gazebo/f1/models/images.py b/rl_studio/envs/gazebo/f1/models/images.py new file mode 100644 index 000000000..8452507cd --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/images.py @@ -0,0 +1,157 @@ +import cv2 +from cv_bridge import CvBridge +import numpy as np +from PIL import Image as im +import rospy +from sensor_msgs.msg import Image as ImageROS +from sklearn.cluster import KMeans +from sklearn.utils import shuffle + +from rl_studio.envs.gazebo.f1.image_f1 import ImageF1, 
ListenerCamera, Image +from rl_studio.envs.gazebo.f1.models.utils import F1GazeboUtils + + +class F1GazeboImages: + def __init__(self): + self.f1gazeboutils = F1GazeboUtils() + self.image = ImageF1() + + def get_camera_info(self): + image_data = None + f1_image_camera = None + success = False + + while image_data is None or success is False: + image_data = rospy.wait_for_message( + "/F1ROS/cameraL/image_raw", ImageROS, timeout=5 + ) + cv_image = CvBridge().imgmsg_to_cv2(image_data, "bgr8") + f1_image_camera = self.image_msg_to_image(image_data, cv_image) + # f1_image_camera = image_msg_to_image(image_data, cv_image) + if np.any(cv_image): + success = True + + return f1_image_camera, cv_image + + def image_msg_to_image(self, img, cv_image): + self.image.width = img.width + self.image.height = img.height + self.image.format = "RGB8" + self.image.timeStamp = img.header.stamp.secs + (img.header.stamp.nsecs * 1e-9) + self.image.data = cv_image + + return self.image + + def image_preprocessing_black_white_original_size(self, img): + image_middle_line = self.height // 2 + img_sliced = img[image_middle_line:] + img_proc = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2HSV) + line_pre_proc = cv2.inRange(img_proc, (0, 120, 120), (0, 255, 255)) + _, mask = cv2.threshold(line_pre_proc, 48, 255, cv2.THRESH_BINARY) + mask_black_White_3D = np.expand_dims(mask, axis=2) + + return mask_black_White_3D + + def image_preprocessing_black_white_32x32(self, img, height): + image_middle_line = height // 2 + img_sliced = img[image_middle_line:] + img_proc = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2HSV) + line_pre_proc = cv2.inRange(img_proc, (0, 120, 120), (0, 255, 255)) + _, mask = cv2.threshold(line_pre_proc, 48, 255, cv2.THRESH_BINARY) + mask_black_white_32x32 = cv2.resize(mask, (32, 32), cv2.INTER_AREA) + mask_black_white_32x32 = np.expand_dims(mask_black_white_32x32, axis=2) + + self.f1gazeboutils.show_image("mask32x32", mask_black_white_32x32, 5) + return mask_black_white_32x32 + + def image_preprocessing_gray_32x32(self, img): + image_middle_line = self.height // 2 + img_sliced = img[image_middle_line:] + img_proc = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2GRAY) + img_gray_3D = cv2.resize(img_proc, (32, 32), cv2.INTER_AREA) + img_gray_3D = np.expand_dims(img_gray_3D, axis=2) + + return img_gray_3D + + def image_preprocessing_raw_original_size(self, img): + image_middle_line = self.height // 2 + img_sliced = img[image_middle_line:] + + return img_sliced + + def image_preprocessing_color_quantization_original_size(self, img): + n_colors = 3 + image_middle_line = self.height // 2 + img_sliced = img[image_middle_line:] + + img_sliced = np.array(img_sliced, dtype=np.float64) / 255 + w, h, d = original_shape = tuple(img_sliced.shape) + image_array = np.reshape(img_sliced, (w * h, d)) + image_array_sample = shuffle(image_array, random_state=0, n_samples=50) + kmeans = KMeans(n_clusters=n_colors, random_state=0).fit(image_array_sample) + labels = kmeans.predict(image_array) + + return kmeans.cluster_centers_[labels].reshape(w, h, -1) + + def image_preprocessing_color_quantization_32x32x1(self, img): + n_colors = 3 + image_middle_line = self.height // 2 + img_sliced = img[image_middle_line:] + + img_sliced = np.array(img_sliced, dtype=np.float64) / 255 + w, h, d = original_shape = tuple(img_sliced.shape) + image_array = np.reshape(img_sliced, (w * h, d)) + image_array_sample = shuffle(image_array, random_state=0, n_samples=500) + kmeans = KMeans(n_clusters=n_colors, random_state=0).fit(image_array_sample) + labels = 
kmeans.predict(image_array) + im_reshape = kmeans.cluster_centers_[labels].reshape(w, h, -1) + im_resize32x32x1 = np.expand_dims(np.resize(im_reshape, (32, 32)), axis=2) + + return im_resize32x32x1 + + def image_preprocessing_reducing_color_PIL_original_size(self, img): + num_colors = 4 + image_middle_line = self.height // 2 + img_sliced = img[image_middle_line:] + + array2pil = im.fromarray(img_sliced) + array2pil_reduced = array2pil.convert( + "P", palette=im.ADAPTIVE, colors=num_colors + ) + pil2array = np.expand_dims(np.array(array2pil_reduced), 2) + return pil2array + + def image_callback(self, image_data): + self.image_raw_from_topic = CvBridge().imgmsg_to_cv2(image_data, "bgr8") + + def processed_image_circuit_no_wall(self, img): + """ + detecting middle of the right lane + """ + image_middle_line = self.height // 2 + # cropped image from second half to bottom line + img_sliced = img[image_middle_line:] + # convert to black and white mask + # lower_grey = np.array([30, 32, 22]) + # upper_grey = np.array([128, 128, 128]) + img_gray = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2GRAY) + _, mask = cv2.threshold(img_gray, 100, 255, cv2.THRESH_BINARY) + # mask = cv2.inRange(img_sliced, lower_grey, upper_grey) + + # img_proc = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2HSV) + # line_pre_proc = cv2.inRange(img_proc, (0, 30, 30), (0, 255, 255)) + # _, : list[ndarray[Any, dtype[generic]]]: list[ndarray[Any, dtype[generic]]]: list[ndarray[Any, dtype[generic]]]mask = cv2.threshold(line_pre_proc, 240, 255, cv2.THRESH_BINARY) + + lines = [mask[self.x_row[idx], :] for idx, x in enumerate(self.x_row)] + # added last line (239) only FOllowLane, to break center line in bottom + # lines.append(mask[self.lower_limit, :]) + + centrals_in_pixels = list(map(self.get_center, lines)) + centrals_normalized = [ + float(self.center_image - x) / (float(self.width) // 2) + for _, x in enumerate(centrals_in_pixels) + ] + + self.show_image("mask", mask, 5) + + return centrals_in_pixels, centrals_normalized diff --git a/rl_studio/envs/gazebo/f1/models/reset.py b/rl_studio/envs/gazebo/f1/models/reset.py new file mode 100644 index 000000000..3ef041549 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/reset.py @@ -0,0 +1,79 @@ +import numpy as np + +from rl_studio.agents.utils import ( + print_messages, +) +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env + + +class Reset(F1Env): + """ + Works for Follow Line and Follow Lane tasks + """ + + def reset_f1_state_image(self): + """ + reset for + - State: Image + - tasks: FollowLane and FollowLine + """ + self._gazebo_reset() + # === POSE === + if self.alternate_pose: + self._gazebo_set_random_pose_f1_follow_rigth_lane() + else: + self._gazebo_set_fix_pose_f1_follow_right_lane() + + self._gazebo_unpause() + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== calculating State + # image as observation + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + state_size = state.shape + + return state, state_size + + def reset_f1_state_sp(self): + """ + reset for + - State: Simplified perception + - tasks: FollowLane and FollowLine + """ + self._gazebo_reset() + # === POSE === + if self.alternate_pose: + self._gazebo_set_random_pose_f1_follow_rigth_lane() + else: + self._gazebo_set_fix_pose_f1_follow_right_lane() + + self._gazebo_unpause() + + ##==== get image from sensor camera + f1_image_camera, _ = 
self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== calculating State + # simplified perception as observation + centrals_in_pixels, _ = self.simplifiedperception.calculate_centrals_lane( + f1_image_camera.data, + self.height, + self.width, + self.x_row, + self.lower_limit, + self.center_image, + ) + states = self.simplifiedperception.calculate_observation( + centrals_in_pixels, self.center_image, self.pixel_region + ) + state = [states[0]] + state_size = len(state) + + return state, state_size diff --git a/rl_studio/envs/gazebo/f1/models/rewards.py b/rl_studio/envs/gazebo/f1/models/rewards.py new file mode 100644 index 000000000..80da09950 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/rewards.py @@ -0,0 +1,135 @@ +import math + +import numpy as np + + +class F1GazeboRewards: + @staticmethod + def rewards_followlane_centerline(center, rewards): + """ + works perfectly + rewards in function of center of Line + """ + done = False + if 0.65 >= center > 0.25: + reward = rewards["from_10"] + elif (0.9 > center > 0.65) or (0.25 >= center > 0): + reward = rewards["from_02"] + elif 0 >= center > -0.9: + reward = rewards["from_01"] + else: + reward = rewards["penal"] + done = True + + return reward, done + + def rewards_followlane_v_centerline_step(self, vel_cmd, center, step, rewards): + """ + rewards in function of velocity, angular v and center + """ + + done = False + if 0.65 >= center > 0.25: + reward = (rewards["from_10"] + vel_cmd.linear.x) - math.log(step) + elif (0.9 > center > 0.65) or (0.25 >= center > 0): + reward = (rewards["from_02"] + vel_cmd.linear.x) - math.log(step) + elif 0 >= center > -0.9: + # reward = (self.rewards["from_01"] + vel_cmd.linear.x) - math.log(step) + reward = -math.log(step) + else: + reward = rewards["penal"] + done = True + + return reward, done + + def rewards_followlane_v_w_centerline( + self, vel_cmd, center, rewards, beta_1, beta_0 + ): + """ + v and w are linear dependents, plus center to the eq. + """ + + w_target = beta_0 - (beta_1 * abs(vel_cmd.linear.x)) + w_error = abs(w_target - abs(vel_cmd.angular.z)) + done = False + + if abs(center) > 0.9 or center < 0: + done = True + reward = rewards["penal"] + elif center >= 0: + reward = ( + (1 / math.exp(w_error)) + (1 / math.exp(center)) + 2 + ) # add a constant to favor right lane + # else: + # reward = (1 / math.exp(w_error)) + (math.exp(center)) + + return reward, done + + def calculate_reward(self, error: float) -> float: + d = np.true_divide(error, self.center_image) + reward = np.round(np.exp(-d), 4) + return reward + + def rewards_followline_center(self, center, rewards): + """ + original for Following Line + """ + done = False + if center > 0.9: + done = True + reward = rewards["penal"] + elif 0 <= center <= 0.2: + reward = rewards["from_10"] + elif 0.2 < center <= 0.4: + reward = rewards["from_02"] + else: + reward = rewards["from_01"] + + return reward, done + + def rewards_followline_v_w_centerline( + self, vel_cmd, center, rewards, beta_1, beta_0 + ): + """ + Applies a linear regression between v and w + Supposing there is a lineal relationship V and W. So, formula w = B_0 + x*v. + + Data for Formula1: + Max W = 5 r/s we take max abs value. 
Correctly it is w left or right + Max V = 100 m/s + Min V = 20 m/s + B_0 = B_1 * Max V + B_1 = (W Max / (V Max - V Min)) + + w target = B_0 - B_1 * v + error = w_actual - w_target + reward = 1/exp(reward + center))) where Max value = 1 + + Args: + linear and angular velocity + center + + Returns: reward + """ + + # print_messages( + # "in reward_v_w_center_linear()", + # beta1=self.beta_1, + # beta0=self.beta_0, + # ) + + w_target = beta_0 - (beta_1 * abs(vel_cmd.linear.x)) + w_error = abs(w_target - abs(vel_cmd.angular.z)) + done = False + + if abs(center) > 0.9: + done = True + reward = rewards["penal"] + elif center > 0: + reward = ( + (1 / math.exp(w_error)) + (1 / math.exp(center)) + 2 + ) # add a constant to favor right lane + else: + reward = (1 / math.exp(w_error)) + (math.exp(center)) + + return reward, done diff --git a/rl_studio/envs/gazebo/f1/models/settings.py b/rl_studio/envs/gazebo/f1/models/settings.py new file mode 100644 index 000000000..f5dbe2d30 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/settings.py @@ -0,0 +1,59 @@ +from pydantic import BaseModel + +from rl_studio.envs.gazebo.f1.image_f1 import ImageF1, ListenerCamera +from rl_studio.envs.gazebo.f1.models.images import F1GazeboImages +from rl_studio.envs.gazebo.f1.models.utils import F1GazeboUtils +from rl_studio.envs.gazebo.f1.models.rewards import F1GazeboRewards +from rl_studio.envs.gazebo.f1.models.simplified_perception import ( + F1GazeboSimplifiedPerception, +) + + +class F1GazeboTFConfig(BaseModel): + def __init__(self, **config): + self.simplifiedperception = F1GazeboSimplifiedPerception() + self.f1gazeborewards = F1GazeboRewards() + self.f1gazeboutils = F1GazeboUtils() + self.f1gazeboimages = F1GazeboImages() + + self.image = ImageF1() + #self.image = ListenerCamera("/F1ROS/cameraL/image_raw") + self.image_raw_from_topic = None + self.f1_image_camera = None + self.sensor = config["sensor"] + + # Image + self.image_resizing = config["image_resizing"] / 100 + self.new_image_size = config["new_image_size"] + self.raw_image = config["raw_image"] + self.height = int(config["height_image"] * self.image_resizing) + self.width = int(config["width_image"] * self.image_resizing) + self.center_image = int(config["center_image"] * self.image_resizing) + self.num_regions = config["num_regions"] + self.pixel_region = int(self.center_image / self.num_regions) * 2 + self.telemetry_mask = config["telemetry_mask"] + self.poi = config["x_row"][0] + self.image_center = None + self.right_lane_center_image = config["center_image"] + ( + config["center_image"] // 2 + ) + self.lower_limit = config["lower_limit"] + + # States + self.state_space = config["states"] + if self.state_space == "spn": + self.x_row = [i for i in range(1, int(self.height / 2) - 1)] + else: + self.x_row = config["x_row"] + + # Actions + self.action_space = config["action_space"] + self.actions = config["actions"] + + # Rewards + self.reward_function = config["reward_function"] + self.rewards = config["rewards"] + self.min_reward = config["min_reward"] + + # Others + self.telemetry = config["telemetry"] diff --git a/rl_studio/envs/gazebo/f1/models/simplified_perception.py b/rl_studio/envs/gazebo/f1/models/simplified_perception.py new file mode 100644 index 000000000..7bc281b9d --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/simplified_perception.py @@ -0,0 +1,136 @@ +import cv2 +import numpy as np + +from rl_studio.envs.gazebo.f1.models.utils import F1GazeboUtils + + +class F1GazeboSimplifiedPerception: + def processed_image(self, img, height, 
width, x_row, center_image): + """ + In FollowLine tasks, gets the centers of central line + In Followlane Tasks, gets the center of lane + + :parameters: input image 640x480 + :return: + centrals: lists with distance to center in pixels + cntrals_normalized: lists with distance in range [0,1] for calculating rewards + """ + image_middle_line = height // 2 + img_sliced = img[image_middle_line:] + img_proc = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2HSV) + line_pre_proc = cv2.inRange(img_proc, (0, 30, 30), (0, 255, 255)) + _, mask = cv2.threshold(line_pre_proc, 240, 255, cv2.THRESH_BINARY) + + lines = [mask[x_row[idx], :] for idx, x in enumerate(x_row)] + centrals = list(map(self.get_center, lines)) + + centrals_normalized = [ + float(center_image - x) / (float(width) // 2) + for _, x in enumerate(centrals) + ] + + F1GazeboUtils.show_image_with_centrals( + "centrals", mask, 5, centrals, centrals_normalized, x_row + ) + + return centrals, centrals_normalized + + + @staticmethod + def get_center(lines): + ''' + takes center line and returns position regarding to it + ''' + try: + point = np.divide(np.max(np.nonzero(lines)) - np.min(np.nonzero(lines)), 2) + return np.min(np.nonzero(lines)) + point + except ValueError: + return 0 + + + def calculate_observation(self, state, center_image, pixel_region: list) -> list: + """ + returns list of states in range [-7,9] if self.num_regions = 16 => pixel_regions = 40 + state = -7 corresponds to center line far right + state = 9 is far left + """ + final_state = [] + for _, x in enumerate(state): + final_state.append(int((center_image - x) / pixel_region) + 1) + + return final_state + + + def calculate_centrals_lane( + self, img, height, width, x_row, lower_limit, center_image + ): + image_middle_line = height // 2 + # cropped image from second half to bottom line + img_sliced = img[image_middle_line:] + # convert to black and white mask + # lower_grey = np.array([30, 32, 22]) + # upper_grey = np.array([128, 128, 128]) + img_gray = cv2.cvtColor(img_sliced, cv2.COLOR_BGR2GRAY) + _, mask = cv2.threshold(img_gray, 110, 255, cv2.THRESH_BINARY) + # get Lines to work for + lines = [mask[x_row[idx], :] for idx, _ in enumerate(x_row)] + # added last line (239), to control center line in bottom + lines.append(mask[lower_limit, :]) + + centrals_in_pixels = list(map(self.get_center_right_lane, lines)) + centrals_normalized = [ + abs(float(center_image - x) / (float(width) // 2)) + for _, x in enumerate(centrals_in_pixels) + ] + + # F1GazeboUtils.show_image_with_centrals( + # "mask", mask, 5, centrals_in_pixels, centrals_normalized, self.x_row + # ) + + return centrals_in_pixels, centrals_normalized + + @staticmethod + def get_center_right_lane(lines): + try: + # inversed line + inversed_lane = [x for x in reversed(lines)] + # cut off right blanks + inv_index_right = np.argmin(inversed_lane) + # cropped right blanks + cropped_lane = inversed_lane[inv_index_right:] + # cut off central line + inv_index_left = np.argmax(cropped_lane) + # get real lane index + index_real_right = len(lines) - inv_index_right + if inv_index_left == 0: + index_real_left = 0 + else: + index_real_left = len(lines) - inv_index_right - inv_index_left + # get center lane + center = (index_real_right - index_real_left) // 2 + center_lane = center + index_real_left + + # avoid finish line or other blank marks on the road + if center_lane == 0: + center_lane = 320 + + return center_lane + + except ValueError: + return 0 + + @staticmethod + def get_center_circuit_no_wall(lines): + try: + 
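            # Editor's note: the identifiers below are Spanish and are kept as-is
            # because they are code identifiers. Rough translations of what the code does:
            #   pos_final_linea_negra   -> "end of the black line" (index of first dark pixel + 15 px)
            #   carril_derecho_entero   -> "entire right lane" (row slice from that index onwards)
            #   final_carril_derecho    -> "end of the right lane" (first dark pixel within that slice)
            #   lim_izq / lim_der       -> left / right limits of the lane, in pixels
            #   punto_central_carril    -> lane centre point, relative to lim_izq
            #   punto_central_absoluto  -> lane centre point in absolute pixel coordinates (returned)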
pos_final_linea_negra = np.argmin(lines) + 15 + carril_derecho_entero = lines[pos_final_linea_negra:] + final_carril_derecho = np.argmin(carril_derecho_entero) + lim_izq = pos_final_linea_negra + lim_der = pos_final_linea_negra + final_carril_derecho + + punto_central_carril = (lim_der - lim_izq) // 2 + punto_central_absoluto = lim_izq + punto_central_carril + return punto_central_absoluto + + except ValueError: + return 0 diff --git a/rl_studio/envs/gazebo/f1/models/step.py b/rl_studio/envs/gazebo/f1/models/step.py new file mode 100644 index 000000000..f45b20162 --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/step.py @@ -0,0 +1,352 @@ +from geometry_msgs.msg import Twist +import numpy as np + +from rl_studio.agents.utils import ( + print_messages, +) +from rl_studio.envs.gazebo.f1.models.f1_env import F1Env + + +class StepFollowLine(F1Env): + def __init__(self, **config): + self.name = config["states"] + + def step_followline_state_image_actions_discretes(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + # center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== image as observation + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== get Rewards + if self.reward_function == "followline_center": + reward, done = self.f1gazeborewards.rewards_followline_center( + center, self.rewards + ) + + return state, reward, done, {} + + def step_followline_state_sp_actions_discretes(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + # center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== simplified perception as observation + state = self.simplifiedperception.calculate_observation( + points_in_red_line, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "followline_center": + reward, done = self.f1gazeborewards.rewards_followline_center( + center, self.rewards + ) + + return state, reward, done, {} + + def step_followline_state_image_actions_continuous(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = action[0][0] + vel_cmd.angular.z = action[0][1] + self.vel_pub.publish(vel_cmd) + + 
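        # Editor's note (shapes inferred from the indexing above, not stated in the
        # original): for continuous actions the policy is assumed to return a batch
        # of one action, e.g. action = [[v, w]], so action[0][0] feeds Twist.linear.x
        # and action[0][1] feeds Twist.angular.z (conventionally m/s and rad/s in ROS).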
##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + # center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== get Rewards + if self.reward_function == "followline_center": + reward, done = self.f1gazeborewards.rewards_followline_center( + center, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followline_v_w_centerline( + vel_cmd, center, self.rewards, self.beta_1, self.beta_0 + ) + + return state, reward, done, {} + + def step_followline_state_sp_actions_continuous(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = action[0][0] + vel_cmd.angular.z = action[0][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + # center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== simplified perception as observation + state = self.simplifiedperception.calculate_observation( + points_in_red_line, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "followline_center": + reward, done = self.f1gazeborewards.rewards_followline_center( + center, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followline_v_w_centerline( + vel_cmd, center, self.rewards, self.beta_1, self.beta_0 + ) + + return state, reward, done, {} + + +class StepFollowLane(F1Env): + def __init__(self, **config): + self.name = config["states"] + + def step_followlane_state_sp_actions_discretes(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + centrals_in_lane, centrals_in_lane_normalized = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = centrals_in_lane[self.poi] + else: + self.point = centrals_in_lane[0] + + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + #center = float(self.center_image - self.point) / (float(self.width) // 2) + + #print(f"\n{centrals_in_lane = }") + #print(f"\n{centrals_in_lane_normalized = }") + #print(f"\n{self.point = }") + #print(f"\n{center = }") + + ##==== get State + ##==== simplified perception as observation + state 
= self.simplifiedperception.calculate_observation( + centrals_in_lane, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, centrals_in_lane_normalized[0], step, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + centrals_in_lane_normalized[0], self.rewards + ) + + return state, reward, done, {} + + + + + + def step_followlane_state_image_actions_discretes(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = self.actions[action][0] + vel_cmd.angular.z = self.actions[action][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== image as observation + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, center, step, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + center, self.rewards + ) + + return state, reward, done, {} + + def step_followlane_state_image_actions_continuous(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = action[0][0] + vel_cmd.angular.z = action[0][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + state = np.array( + self.f1gazeboimages.image_preprocessing_black_white_32x32( + f1_image_camera.data, self.height + ) + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, center, step, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + center, self.rewards + ) + + return state, reward, done, {} + + def step_followlane_state_sp_actions_continuous(self, action, step): + self._gazebo_unpause() + vel_cmd = Twist() + vel_cmd.linear.x = action[0][0] + vel_cmd.angular.z = action[0][1] + self.vel_pub.publish(vel_cmd) + + ##==== get image from sensor camera + f1_image_camera, _ = self.f1gazeboimages.get_camera_info() + self._gazebo_pause() + + ##==== get center + points_in_red_line, _ = 
self.simplifiedperception.processed_image( + f1_image_camera.data, self.height, self.width, self.x_row, self.center_image + ) + if self.state_space == "spn": + self.point = points_in_red_line[self.poi] + else: + self.point = points_in_red_line[0] + + # center = abs(float(self.center_image - self.point) / (float(self.width) // 2)) + center = float(self.center_image - self.point) / (float(self.width) // 2) + + ##==== get State + ##==== simplified perception as observation + state = self.simplifiedperception.calculate_observation( + points_in_red_line, self.center_image, self.pixel_region + ) + + ##==== get Rewards + if self.reward_function == "follow_right_lane_center_v_step": + reward, done = self.f1gazeborewards.rewards_followlane_v_centerline_step( + vel_cmd, center, step, self.rewards + ) + else: + reward, done = self.f1gazeborewards.rewards_followlane_centerline( + center, self.rewards + ) + + return state, reward, done, {} diff --git a/rl_studio/envs/gazebo/f1/models/utils.py b/rl_studio/envs/gazebo/f1/models/utils.py new file mode 100644 index 000000000..b74c4096a --- /dev/null +++ b/rl_studio/envs/gazebo/f1/models/utils.py @@ -0,0 +1,45 @@ +import cv2 + + +class F1GazeboUtils: + def __init__(self): + self.f1 = None + + @staticmethod + def show_image_with_centrals( + name, img, waitkey, centrals_in_pixels, centrals_normalized, x_row + ): + window_name = f"{name}" + + for index, value in enumerate(x_row): + cv2.putText( + img, + str( + f"{int(centrals_in_pixels[index])}" + ), + (int(centrals_in_pixels[index])+20, int(x_row[index])), + cv2.FONT_HERSHEY_SIMPLEX, + 0.3, + (255, 255, 255), + 1, + cv2.LINE_AA, + ) + cv2.putText( + img, + str( + f"[{centrals_normalized[index]}]" + ), + (320, int(x_row[index])), + cv2.FONT_HERSHEY_SIMPLEX, + 0.3, + (255, 255, 255), + 1, + cv2.LINE_AA, + ) + cv2.imshow(window_name, img) + cv2.waitKey(waitkey) + + def show_image(self, name, img, waitkey): + window_name = f"{name}" + cv2.imshow(window_name, img) + cv2.waitKey(waitkey) diff --git a/rl_studio/envs/gazebo/gazebo_envs.py b/rl_studio/envs/gazebo/gazebo_envs.py index 9a5479c98..63716848d 100755 --- a/rl_studio/envs/gazebo/gazebo_envs.py +++ b/rl_studio/envs/gazebo/gazebo_envs.py @@ -11,7 +11,7 @@ import numpy as np from rosgraph_msgs.msg import Clock import rospy -from tf.transformations import quaternion_from_euler +#from tf.transformations import quaternion_from_euler from agents.utils import print_messages @@ -51,9 +51,9 @@ def __init__(self, config): Path( Path(__file__).resolve().parents[2] / "CustomRobots" - / config.get("environment_folder") + / config["environment_folder"] / "launch" - / config.get("launchfile") + / config["launchfile"] ) ) # print(f"-----> {fullpath}") @@ -70,7 +70,6 @@ def __init__(self, config): ] ) # print("Gazebo launched!") - self.gzclient_pid = 0 # Launch the simulation with the given launchfile name @@ -180,6 +179,13 @@ def _gazebo_set_new_pose(self): print(f"Service call failed: {e}") return pos_number + def get_position(self): + object_coordinates = self.model_coordinates(self.robot_name, "") + x_position = round(object_coordinates.pose.position.x, 2) + y_position = round(object_coordinates.pose.position.y, 2) + + return x_position, y_position + def _gazebo_set_new_pose_robot(self): """ (pos_number, pose_x, pose_y, pose_z, or_x, or_y, or_z, or_z) @@ -403,6 +409,57 @@ def _gazebo_set_random_pose_f1_follow_rigth_lane(self): print(f"Service call failed: {e}") return pos_number + def _gazebo_set_fix_pose_f1_followline(self): + pos_number = self.start_pose + state = 
ModelState() + state.model_name = self.model_state_name + # Pose Position + state.pose.position.x = self.start_pose[0][0] + state.pose.position.y = self.start_pose[0][1] + state.pose.position.z = self.start_pose[0][2] + + # Pose orientation + state.pose.orientation.x = self.start_pose[0][3] + state.pose.orientation.y = self.start_pose[0][4] + state.pose.orientation.z = self.start_pose[0][5] + state.pose.orientation.w = self.start_pose[0][6] + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def _gazebo_set_random_pose_f1_followline(self): + """ + (pos_number, pose_x, pose_y, pose_z, or_x, or_y, or_z, or_z) + """ + random_init = np.random.randint(0, high=len(self.start_random_pose)) + # pos_number = self.start_random_pose[posit][0] + pos_number = self.start_random_pose[random_init][0] + + state = ModelState() + state.model_name = self.model_state_name + # Pose Position + state.pose.position.x = self.start_random_pose[random_init][0] + state.pose.position.y = self.start_random_pose[random_init][1] + state.pose.position.z = self.start_random_pose[random_init][2] + # Pose orientation + state.pose.orientation.x = self.start_random_pose[random_init][3] + state.pose.orientation.y = self.start_random_pose[random_init][4] + state.pose.orientation.z = self.start_random_pose[random_init][5] + state.pose.orientation.w = self.start_random_pose[random_init][6] + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + def _render(self, mode="human", close=False): if close: diff --git a/rl_studio/envs/gazebo/mountain_car/__init__.py b/rl_studio/envs/gazebo/mountain_car/__init__.py index d91068162..d1db1f137 100644 --- a/rl_studio/envs/gazebo/mountain_car/__init__.py +++ b/rl_studio/envs/gazebo/mountain_car/__init__.py @@ -14,7 +14,7 @@ def __new__(cls, **config): cls.model_coordinates = None cls.position = None - training_type = config.get("training_type") + training_type = config["environments"]["training_type"] print(config.get("launchfile")) if training_type == TrainingType.qlearn_env_camera.value: from .mountain_car_env import MountainCarEnv diff --git a/rl_studio/envs/gazebo/mountain_car/mountain_car_env.py b/rl_studio/envs/gazebo/mountain_car/mountain_car_env.py index a39701928..8678318ba 100755 --- a/rl_studio/envs/gazebo/mountain_car/mountain_car_env.py +++ b/rl_studio/envs/gazebo/mountain_car/mountain_car_env.py @@ -14,6 +14,7 @@ class MountainCarEnv(gazebo_envs.GazeboEnv): def __init__(self, **config): self.actions = config.get("actions") + config = config["environments"] self.action_space = spaces.Discrete( len(self.actions) ) # actions # spaces.Discrete(3) # F,L,R diff --git a/rl_studio/envs/gazebo/robot_mesh/__init__.py b/rl_studio/envs/gazebo/robot_mesh/__init__.py index 4f1f55a06..3c229e508 100644 --- a/rl_studio/envs/gazebo/robot_mesh/__init__.py +++ b/rl_studio/envs/gazebo/robot_mesh/__init__.py @@ -14,8 +14,8 @@ def __new__(cls, **config): cls.model_coordinates = None cls.position = None - training_type = config.get("training_type") - print(config.get("launchfile")) + training_type = config["environments"].get("training_type") + print(config["environments"].get("launchfile")) if ( training_type == 
TrainingType.qlearn_env_camera.value or training_type == TrainingType.manual_env.value diff --git a/rl_studio/envs/gazebo/robot_mesh/gazebo_envs.py b/rl_studio/envs/gazebo/robot_mesh/gazebo_envs.py new file mode 100755 index 000000000..da5ccc579 --- /dev/null +++ b/rl_studio/envs/gazebo/robot_mesh/gazebo_envs.py @@ -0,0 +1,480 @@ +from pathlib import Path +import os +import random +import signal +import subprocess +import sys + +import rosgraph +from gazebo_msgs.msg import ModelState +from gazebo_msgs.srv import SetModelState +import gym +import numpy as np +from rosgraph_msgs.msg import Clock +import rospy +from tf.transformations import quaternion_from_euler +from geometry_msgs.msg import Twist + +from rl_studio.agents.utils import print_messages +import time + +class GazeboEnv(gym.Env): + """ + Superclass for all Gazebo environments. + """ + + metadata = {"render.models": ["human"]} + + def __init__(self, config): + self.last_clock_msg = Clock() + self.port = "11311" # str(random_number) #os.environ["ROS_PORT_SIM"] + self.port_gazebo = "11345" # str(random_number+1) #os.environ["ROS_PORT_SIM"] + + self.robot_name = config.get("robot_name") + # print(f"\nROS_MASTER_URI = http://localhost:{self.port}\n") + # print(f"GAZEBO_MASTER_URI = http://localhost:{self.port_gazebo}\n") + + ros_path = os.path.dirname(subprocess.check_output(["which", "roscore"])) + + # NOTE: It doesn't make sense to launch a roscore because it will be done when spawing Gazebo, which also need + # to be the first node in order to initialize the clock. + # # start roscore with same python version as current script + # self._roscore = subprocess.Popen([sys.executable, os.path.join(ros_path, b"roscore"), "-p", self.port]) + # time.sleep(1) + # print ("Roscore launched!") + + if config.get("launch_absolute_path") != None: + fullpath = config.get("launch_absolute_path") + else: + # TODO: Global env for 'my_env'. It must be passed in constructor. 
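            # Editor's note (hypothetical example values): with
            #   config = {"environment_folder": "robot_mesh", "launchfile": "my_env.launch"}
            # the expression below resolves, via parents[3] of this file
            # (rl_studio/envs/gazebo/robot_mesh/gazebo_envs.py), to
            #   <...>/rl_studio/CustomRobots/robot_mesh/launch/my_env.launch
            # and an IOError is raised further down if that file does not exist.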
+ fullpath = str( + Path( + Path(__file__).resolve().parents[3] + / "CustomRobots" + / config["environment_folder"] + / "launch" + / config["launchfile"] + ) + ) + # print(f"-----> {fullpath}") + if not os.path.exists(fullpath): + raise IOError(f"File {fullpath} does not exist") + + self._roslaunch = subprocess.Popen( + [ + sys.executable, + os.path.join(ros_path, b"roslaunch"), + "-p", + self.port, + fullpath, + ] + ) + + + ################################################################################################################ + # r = rospy.Rate(1) + # self.clock_sub = rospy.Subscriber('/clock', Clock, self.callback, queue_size=1000000) + # while not rospy.is_shutdown(): + # print("initialization: ", rospy.rostime.is_rostime_initialized()) + # print("Wallclock: ", rospy.rostime.is_wallclock()) + # print("Time: ", time.time()) + # print("Rospyclock: ", rospy.rostime.get_rostime().secs) + # # print("/clock: ", str(self.last_clock_msg)) + # last_ros_time_ = self.last_clock_msg + # print("Clock:", last_ros_time_) + # # print("Waiting for synch with ROS clock") + # # if wallclock == False: + # # break + # r.sleep() + ################################################################################################################ + + # def callback(self, message): + # """ + # Callback method for the subscriber of the clock topic + # :param message: + # :return: + # """ + # # self.last_clock_msg = int(str(message.clock.secs) + str(message.clock.nsecs)) / 1e6 + # # print("Message", message) + # self.last_clock_msg = message + # # print("Message", message) + + # def get_publisher(self, topic_path, msg_type, **kwargs): + # pub = rospy.Publisher(topic_path, msg_type, **kwargs) + # num_subs = len(self._get_subscribers(topic_path)) + # for i in range(10): + # num_cons = pub.get_num_connections() + # if num_cons == num_subs: + # return pub + # time.sleep(0.1) + # raise RuntimeError("failed to get publisher") + # def _get_subscribers(self, topic_path): + # ros_master = rosgraph.Master('/rostopic') + # topic_path = rosgraph.names.script_resolve_name('rostopic', topic_path) + # state = ros_master.getSystemState() + # subs = [] + # for sub in state[1]: + # if sub[0] == topic_path: + # subs.extend(sub[1]) + # return subs + def step(self, action): + + # Implement this method in every subclass + # Perform a step in gazebo. E.g. move the robot + raise NotImplementedError + + def reset(self): + + # Implemented in subclass + raise NotImplementedError + + def _gazebo_get_agent_position(self): + + object_coordinates = self.model_coordinates(self.robot_name, "") + x_position = round(object_coordinates.pose.position.x, 2) + y_position = round(object_coordinates.pose.position.y, 2) + + print_messages( + "en _gazebo_get_agent_position()", + robot_name=self.robot_name, + object_coordinates=object_coordinates, + ) + return x_position, y_position + + def _gazebo_reset(self): + # Resets the state of the environment and returns an initial observation. 
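        # Editor's note: reset_proxy and unpause are assumed to be rospy.ServiceProxy
        # handles (presumably for /gazebo/reset_simulation and /gazebo/unpause_physics)
        # created elsewhere in the class hierarchy; wait_for_service below blocks until
        # Gazebo advertises the service before the proxy call is attempted.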
+ rospy.wait_for_service("/gazebo/reset_simulation") + try: + # reset_proxy.call() + self.reset_proxy() + self.unpause() + except rospy.ServiceException as e: + print(f"/gazebo/reset_simulation service call failed: {e}") + + def _gazebo_pause(self): + rospy.wait_for_service("/gazebo/pause_physics") + try: + # resp_pause = pause.call() + self.pause() + except rospy.ServiceException as e: + print(f"/gazebo/pause_physics service call failed: {e}") + + def _gazebo_unpause(self): + rospy.wait_for_service("/gazebo/unpause_physics") + try: + self.unpause() + except rospy.ServiceException as e: + print(f"/gazebo/unpause_physics service call failed: {e}") + + def _gazebo_set_new_pose(self): + """ + (pos_number, pose_x, pose_y, pose_z, or_x, or_y, or_z, or_z) + """ + pos = random.choice(list(enumerate(self.circuit["gaz_pos"])))[0] + self.position = pos + + pos_number = self.circuit["gaz_pos"][0] + + state = ModelState() + state.model_name = self.config.get("robot_name") + state.pose.position.x = self.circuit["gaz_pos"][pos][1] + state.pose.position.y = self.circuit["gaz_pos"][pos][2] + state.pose.position.z = self.circuit["gaz_pos"][pos][3] + state.pose.orientation.x = self.circuit["gaz_pos"][pos][4] + state.pose.orientation.y = self.circuit["gaz_pos"][pos][5] + state.pose.orientation.z = self.circuit["gaz_pos"][pos][6] + state.pose.orientation.w = self.circuit["gaz_pos"][pos][7] + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def get_position(self): + object_coordinates = self.model_coordinates(self.robot_name, "") + x_position = round(object_coordinates.pose.position.x, 2) + y_position = round(object_coordinates.pose.position.y, 2) + + return x_position, y_position + + def _gazebo_set_new_pose_robot(self): + """ + (pos_number, pose_x, pose_y, pose_z, or_x, or_y, or_z, or_z) + """ + # pos = random.choice(list(enumerate(self.circuit["gaz_pos"])))[0] + # self.position = pos + + pos_number = 0 + + state = ModelState() + state.model_name = self.robot_name + state.pose.position.x = self.reset_pos_x + state.pose.position.y = self.reset_pos_y + state.pose.position.z = self.reset_pos_z + state.pose.orientation.x = 0 + state.pose.orientation.y = 0 + state.pose.orientation.z = 0 + state.pose.orientation.w = 0 + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def _gazebo_set_fix_pose_autoparking(self): + """ + https://stackoverflow.com/questions/60840019/practical-understanding-of-quaternions-in-ros-moveit + """ + pos_number = self.start_pose + # pos_number = self.start_random_pose[posit][0] + # pos_number = self.gazebo_random_start_pose[posit][0] + + state = ModelState() + # state.model_name = "f1_renault" + state.model_name = self.model_state_name + + # Pose Position + state.pose.position.x = self.start_pose[0][0] + state.pose.position.y = self.start_pose[0][1] + state.pose.position.z = self.start_pose[0][2] + + # Pose orientation + quaternion = quaternion_from_euler( + self.start_pose[0][3], self.start_pose[0][4], self.start_pose[0][5] + ) + + state.pose.orientation.x = quaternion[0] + state.pose.orientation.y = quaternion[1] + state.pose.orientation.z = quaternion[2] + state.pose.orientation.w = 
quaternion[3] + + print_messages( + "en _gazebo_set_fix_pose_autoparking()", + start_pose=self.start_pose, + start_pose0=self.start_pose[0][0], + start_pose1=self.start_pose[0][1], + start_pose2=self.start_pose[0][2], + start_pose3=self.start_pose[0][3], + start_pose4=self.start_pose[0][4], + start_pose5=self.start_pose[0][5], + state_pose_orientation=state.pose.orientation, + # start_pose6=self.start_pose[0][6], + # circuit_positions_set=self.circuit_positions_set, + start_random_pose=self.start_random_pose, + # gazebo_random_start_pose=self.gazebo_random_start_pose, + model_state_name=self.model_state_name, + ) + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def _gazebo_set_random_pose_autoparking(self): + """ + (pos_number, pose_x, pose_y, pose_z, or_x, or_y, or_z, or_z) + """ + random_init = np.random.randint(0, high=len(self.start_random_pose)) + # pos_number = self.start_random_pose[posit][0] + pos_number = self.start_random_pose[random_init][0] + + state = ModelState() + state.model_name = self.model_state_name + # Pose Position + state.pose.position.x = self.start_random_pose[random_init][0] + state.pose.position.y = self.start_random_pose[random_init][1] + state.pose.position.z = self.start_random_pose[random_init][2] + # Pose orientation + quaternion = quaternion_from_euler( + self.start_random_pose[random_init][3], + self.start_random_pose[random_init][4], + self.start_random_pose[random_init][5], + ) + state.pose.orientation.x = quaternion[0] + state.pose.orientation.y = quaternion[1] + state.pose.orientation.z = quaternion[2] + state.pose.orientation.w = quaternion[3] + + print_messages( + "en _gazebo_set_random_pose_autoparking()", + random_init=random_init, + start_random_pose=self.start_random_pose, + start_pose=self.start_pose, + start_random_pose0=self.start_random_pose[random_init][0], + start_random_pose1=self.start_random_pose[random_init][1], + start_random_pose2=self.start_random_pose[random_init][2], + start_random_pose3=self.start_random_pose[random_init][3], + start_random_pose4=self.start_random_pose[random_init][4], + start_random_pose5=self.start_random_pose[random_init][5], + state_pose_position=state.pose.position, + state_pose_orientation=state.pose.orientation, + model_state_name=self.model_state_name, + ) + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def _gazebo_set_fix_pose_f1_follow_right_lane(self): + pos_number = self.start_pose + state = ModelState() + state.model_name = self.model_state_name + # Pose Position + state.pose.position.x = self.start_pose[0][0] + state.pose.position.y = self.start_pose[0][1] + state.pose.position.z = self.start_pose[0][2] + + # Pose orientation + state.pose.orientation.x = self.start_pose[0][3] + state.pose.orientation.y = self.start_pose[0][4] + state.pose.orientation.z = self.start_pose[0][5] + state.pose.orientation.w = self.start_pose[0][6] + + # print_messages( + # "en _gazebo_set_fix_pose_f1_follow_right_lane()", + # start_pose=self.start_pose, + # start_pose0=self.start_pose[0][0], + # start_pose1=self.start_pose[0][1], + # start_pose2=self.start_pose[0][2], + # start_pose3=self.start_pose[0][3], + # 
start_pose4=self.start_pose[0][4], + # start_pose5=self.start_pose[0][5], + # start_pose6=self.start_pose[0][6], + # state_pose_orientation=state.pose.orientation, + # # start_pose6=self.start_pose[0][6], + # # circuit_positions_set=self.circuit_positions_set, + # start_random_pose=self.start_random_pose, + # # gazebo_random_start_pose=self.gazebo_random_start_pose, + # model_state_name=self.model_state_name, + # ) + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def _gazebo_set_random_pose_f1_follow_rigth_lane(self): + """ + (pos_number, pose_x, pose_y, pose_z, or_x, or_y, or_z, or_z) + """ + random_init = np.random.randint(0, high=len(self.start_random_pose)) + # pos_number = self.start_random_pose[posit][0] + pos_number = self.start_random_pose[random_init][0] + + state = ModelState() + state.model_name = self.model_state_name + # Pose Position + state.pose.position.x = self.start_random_pose[random_init][0] + state.pose.position.y = self.start_random_pose[random_init][1] + state.pose.position.z = self.start_random_pose[random_init][2] + # Pose orientation + state.pose.orientation.x = self.start_random_pose[random_init][3] + state.pose.orientation.y = self.start_random_pose[random_init][4] + state.pose.orientation.z = self.start_random_pose[random_init][5] + state.pose.orientation.w = self.start_random_pose[random_init][6] + + # quaternion = quaternion_from_euler( + # self.start_random_pose[random_init][3], + # self.start_random_pose[random_init][4], + # self.start_random_pose[random_init][5], + # ) + # state.pose.orientation.x = quaternion[0] + # state.pose.orientation.y = quaternion[1] + # state.pose.orientation.z = quaternion[2] + # state.pose.orientation.w = quaternion[3] + + # print_messages( + # "en _gazebo_set_random_pose_f1_follow_rigth_lane()", + # random_init=random_init, + # start_random_pose=self.start_random_pose, + # start_pose=self.start_pose, + # start_random_pose0=self.start_random_pose[random_init][0], + # start_random_pose1=self.start_random_pose[random_init][1], + # start_random_pose2=self.start_random_pose[random_init][2], + # start_random_pose3=self.start_random_pose[random_init][3], + # start_random_pose4=self.start_random_pose[random_init][4], + # start_random_pose5=self.start_random_pose[random_init][5], + # state_pose_position=state.pose.position, + # state_pose_orientation=state.pose.orientation, + # model_state_name=self.model_state_name, + # ) + + rospy.wait_for_service("/gazebo/set_model_state") + try: + set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) + set_state(state) + except rospy.ServiceException as e: + print(f"Service call failed: {e}") + return pos_number + + def _render(self, mode="human", close=False): + + if close: + tmp = os.popen("ps -Af").read() + proccount = tmp.count("gzclient") + if proccount > 0: + if self.gzclient_pid != 0: + os.kill(self.gzclient_pid, signal.SIGTERM) + os.wait() + return + + tmp = os.popen("ps -Af").read() + proccount = tmp.count("gzclient") + if proccount < 1: + subprocess.Popen("gzclient") + self.gzclient_pid = int( + subprocess.check_output(["pidof", "-s", "gzclient"]) + ) + else: + self.gzclient_pid = 0 + + @staticmethod + def _close(): + + # Kill gzclient, gzserver and roscore + tmp = os.popen("ps -Af").read() + gzclient_count = tmp.count("gzclient") + gzserver_count = tmp.count("gzserver") 
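+ # roscore and rosmaster are counted as well so the whole ROS/Gazebo stack can be shut down below.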
+ roscore_count = tmp.count("roscore") + rosmaster_count = tmp.count("rosmaster") + + if gzclient_count > 0: + os.system("killall -9 gzclient") + if gzserver_count > 0: + os.system("killall -9 gzserver") + if rosmaster_count > 0: + os.system("killall -9 rosmaster") + if roscore_count > 0: + os.system("killall -9 roscore") + + if gzclient_count or gzserver_count or roscore_count or rosmaster_count > 0: + os.wait() + + def _configure(self): + + # TODO + # From OpenAI API: Provides runtime configuration to the enviroment + # Maybe set the Real Time Factor? + pass + + def _seed(self): + + # TODO + # From OpenAI API: Sets the seed for this env's random number generator(s) + pass diff --git a/rl_studio/envs/gazebo/robot_mesh/robot_mesh_position_env.py b/rl_studio/envs/gazebo/robot_mesh/robot_mesh_position_env.py index 2b1bd040f..d3a4f1239 100755 --- a/rl_studio/envs/gazebo/robot_mesh/robot_mesh_position_env.py +++ b/rl_studio/envs/gazebo/robot_mesh/robot_mesh_position_env.py @@ -10,7 +10,7 @@ from gym.utils import seeding from std_srvs.srv import Empty -from rl_studio.envs.gazebo import gazebo_envs +from rl_studio.envs.gazebo.robot_mesh import gazebo_envs def euclidean_distance(x_a, x_b, y_a, y_b): @@ -20,6 +20,7 @@ def euclidean_distance(x_a, x_b, y_a, y_b): class RobotMeshEnv(gazebo_envs.GazeboEnv): def __init__(self, **config): self.actions = config.get("actions") + config = config["environments"] self.action_space = spaces.Discrete( len(self.actions) ) # actions # spaces.Discrete(3) # F,L,R @@ -34,7 +35,6 @@ def __init__(self, **config): self.reset_pos_x = config.get("pos_x") self.reset_pos_y = config.get("pos_y") self.reset_pos_z = config.get("pos_z") - self.vel_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=5) self.unpause = rospy.ServiceProxy("/gazebo/unpause_physics", Empty) self.pause = rospy.ServiceProxy("/gazebo/pause_physics", Empty) self.reset_proxy = rospy.ServiceProxy("/gazebo/reset_simulation", Empty) @@ -48,6 +48,18 @@ def __init__(self, **config): self.movement_precision = 0.6 self.cells_span = self.actions_force / 10 + self.gzclient_pid = 0 + self.vel_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=5) + # self.vel_pub = self.get_publisher("/cmd_vel", Twist, queue_size=5) + + # Launch the simulation with the given launchfile name + rospy.init_node("gym", anonymous=True) + time.sleep(10) + vel_cmd = Twist() + vel_cmd.linear.x = self.actions_force + vel_cmd.angular.z = 0 + self.vel_pub.publish(vel_cmd) + def render(self, mode="human"): pass @@ -76,6 +88,8 @@ def step(self, action): state.pose.orientation.y = self.actions[action][1] state.pose.orientation.z = self.actions[action][2] state.pose.orientation.w = self.actions[action][3] + state.twist.linear.x = self.actions_force + rospy.wait_for_service("/gazebo/set_model_state") try: set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState) @@ -83,12 +97,6 @@ def step(self, action): except rospy.ServiceException as e: print(f"Service call failed: {e}") - vel_cmd = Twist() - vel_cmd.linear.x = self.actions_force - vel_cmd.angular.z = 0 - - self.vel_pub.publish(vel_cmd) - self._gazebo_unpause() time.sleep(0.125) @@ -111,9 +119,9 @@ def step(self, action): completed = False if ( - euclidean_distance(x_prev, x, y_prev, y) - < self.movement_precision * self.cells_span - and self.boot_on_crash + euclidean_distance(x_prev, x, y_prev, y) + < self.movement_precision * self.cells_span + and self.boot_on_crash ): reward = -1 done = True diff --git a/rl_studio/envs/openai_gym/cartpole/cartpole_env.py 
b/rl_studio/envs/openai_gym/cartpole/cartpole_env.py index 5e8ea7373..7cabe1737 100755 --- a/rl_studio/envs/openai_gym/cartpole/cartpole_env.py +++ b/rl_studio/envs/openai_gym/cartpole/cartpole_env.py @@ -89,6 +89,7 @@ class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]): def __init__(self, random_start_level, initial_pole_angle=None, render_mode: Optional[str] = None, non_recoverable_angle=0.3, punish=0, reward_value=1, reward_shaping=0): + self.last_step_time = None self.random_start_level = random_start_level self.gravity = 9.8 @@ -179,8 +180,8 @@ def perturbate(self, action, intensity_deviation): self.renderer.render_step() return np.array(self.state, dtype=np.float32), terminated, False, {} - - def step(self, action): + def step(self, action, elapsed_time=None): + tau = elapsed_time or self.tau err_msg = f"{action!r} ({type(action)}) invalid" assert self.action_space.contains(action), err_msg assert self.state is not None, "Call reset before using step method." @@ -200,15 +201,15 @@ def step(self, action): xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass if self.kinematics_integrator == "euler": - x = x + self.tau * x_dot - x_dot = x_dot + self.tau * xacc - theta = theta + self.tau * theta_dot - theta_dot = theta_dot + self.tau * thetaacc + x = x + tau * x_dot + x_dot = x_dot + tau * xacc + theta = theta + tau * theta_dot + theta_dot = theta_dot + tau * thetaacc else: # semi-implicit euler - x_dot = x_dot + self.tau * xacc - x = x + self.tau * x_dot - theta_dot = theta_dot + self.tau * thetaacc - theta = theta + self.tau * theta_dot + x_dot = x_dot + tau * xacc + x = x + tau * x_dot + theta_dot = theta_dot + tau * thetaacc + theta = theta + tau * theta_dot self.state = (x, x_dot, theta, theta_dot) @@ -246,7 +247,7 @@ def step(self, action): reward = self.punish self.renderer.render_step() - return np.array(self.state, dtype=np.float32), reward, terminated, False, {} + return np.array(self.state, dtype=np.float32), reward, terminated, False, {"time": self.tau} def reset( self, diff --git a/rl_studio/envs/openai_gym/cartpole/cartpole_env_continuous.py b/rl_studio/envs/openai_gym/cartpole/cartpole_env_continuous.py new file mode 100755 index 000000000..3c1b70b62 --- /dev/null +++ b/rl_studio/envs/openai_gym/cartpole/cartpole_env_continuous.py @@ -0,0 +1,315 @@ +""" +Classic cart-pole system implemented by Rich Sutton et al. 
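+Continuous-action variant for RL-Studio: the action is a force in [-1, 1] scaled by force_mag.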
+Copied from http://incompleteideas.net/sutton/book/code/pole.c +permalink: https://perma.cc/C9ZM-652R +""" +import math +from typing import Optional, Union + +# Note that this environment needs gym==0.25.0 to work +import gym +import numpy as np +from gym import logger, spaces +from gym.envs.classic_control import utils +from gym.error import DependencyNotInstalled +from gym.utils.renderer import Renderer + +class CartPoleEnv(gym.Env): + metadata = { + 'render_modes': ['human', 'rgb_array'], + 'render_fps': 50 + } + + def __init__(self, random_start_level, initial_pole_angle=None, render_mode: Optional[str] = None, + non_recoverable_angle=0.3, punish=0, reward_value=1, reward_shaping=0): + self.random_start_level = random_start_level + + self.gravity = 9.8 + self.masscart = 1.0 + self.masspole = 0.1 + self.punish = punish + self.reward_value = reward_value + self.reward_shaping = reward_shaping + self.total_mass = self.masspole + self.masscart + self.length = 0.5 # actually half the pole's length + self.polemass_length = self.masspole * self.length + self.force_mag = 10.0 + self.tau = 0.02 # seconds between state updates + self.min_action = -1.0 + self.max_action = 1.0 + self.kinematics_integrator = "euler" + + # Angle at which to fail the episode + self.theta_threshold_radians = non_recoverable_angle * 50 * 2 * math.pi / 360 + self.theta_threshold_radians_center = non_recoverable_angle/2 * 50 * 2 * math.pi / 360 + self.x_threshold = 2.4 + self.x_threshold_center = 1.4 + self.init_pole_angle = initial_pole_angle + + # Angle limit set to 2 * theta_threshold_radians so failing observation + # is still within bounds. + high = np.array( + [ + self.x_threshold * 2, + np.finfo(np.float32).max, + self.theta_threshold_radians * 2, + np.finfo(np.float32).max, + ], + dtype=np.float32, + ) + + self.action_space = spaces.Box( + low=self.min_action, + high=self.max_action, + shape=(1,) + ) + self.observation_space = spaces.Box(-high, high, dtype=np.float32) + + self.render_mode = render_mode + self.renderer = Renderer(self.render_mode, self._render) + + self.screen_width = 600 + self.screen_height = 400 + self.screen = None + self.clock = None + self.isopen = True + self.state = None + + self.steps_beyond_terminated = None + + def perturbate(self, action, intensity_deviation): + assert self.state is not None, "Call reset before using step method." 
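+ # External disturbance: apply the nominal push plus Gaussian noise of the given intensity, then integrate one fixed tau.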
+ x, x_dot, theta, theta_dot = self.state + force = self.force_mag if action == 1 else -self.force_mag + force += np.random.normal(loc=0.0, scale=intensity_deviation, size=None) + costheta = math.cos(theta) + sintheta = math.sin(theta) + + # For the interested reader: + # https://coneural.org/florian/papers/05_cart_pole.pdf + temp = ( + force + self.polemass_length * theta_dot ** 2 * sintheta + ) / self.total_mass + thetaacc = (self.gravity * sintheta - costheta * temp) / ( + self.length * (4.0 / 3.0 - self.masspole * costheta ** 2 / self.total_mass) + ) + xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass + + if self.kinematics_integrator == "euler": + x = x + self.tau * x_dot + x_dot = x_dot + self.tau * xacc + theta = theta + self.tau * theta_dot + theta_dot = theta_dot + self.tau * thetaacc + else: # semi-implicit euler + x_dot = x_dot + self.tau * xacc + x = x + self.tau * x_dot + theta_dot = theta_dot + self.tau * thetaacc + theta = theta + self.tau * theta_dot + + self.state = (x, x_dot, theta, theta_dot) + + terminated = bool( + x < -self.x_threshold + or x > self.x_threshold + or theta < -self.theta_threshold_radians + or theta > self.theta_threshold_radians + ) + + self.renderer.render_step() + return np.array(self.state, dtype=np.float32), terminated, False, {} + + def step(self, action): + action = np.clip(action, -1, 1)[0] + assert self.state is not None, "Call reset before using step method." + x, x_dot, theta, theta_dot = self.state + force = self.force_mag * action + costheta = math.cos(theta) + sintheta = math.sin(theta) + + # For the interested reader: + # https://coneural.org/florian/papers/05_cart_pole.pdf + temp = ( + force + self.polemass_length * theta_dot ** 2 * sintheta + ) / self.total_mass + thetaacc = (self.gravity * sintheta - costheta * temp) / ( + self.length * (4.0 / 3.0 - self.masspole * costheta ** 2 / self.total_mass) + ) + xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass + + if self.kinematics_integrator == "euler": + x = x + self.tau * x_dot + x_dot = x_dot + self.tau * xacc + theta = theta + self.tau * theta_dot + theta_dot = theta_dot + self.tau * thetaacc + else: # semi-implicit euler + x_dot = x_dot + self.tau * xacc + x = x + self.tau * x_dot + theta_dot = theta_dot + self.tau * thetaacc + theta = theta + self.tau * theta_dot + + self.state = (x, x_dot, theta, theta_dot) + + terminated = bool( + x < -self.x_threshold + or x > self.x_threshold + or theta < -self.theta_threshold_radians + or theta > self.theta_threshold_radians + ) + + centered = bool( + x < -self.x_threshold_center + or x > self.x_threshold_center + or theta < -self.theta_threshold_radians_center + or theta > self.theta_threshold_radians_center + ) + + if not terminated and centered: + reward = self.reward_value + self.reward_shaping + elif not terminated: + reward = self.reward_value + elif self.steps_beyond_terminated is None: + # Pole just fell! + self.steps_beyond_terminated = 0 + reward = self.reward_value + else: + if self.steps_beyond_terminated == 0: + logger.warn( + "You are calling 'step()' even though this " + "environment has already returned terminated = True. You " + "should always call 'reset()' once you receive 'terminated = " + "True' -- any further steps are undefined behavior." 
+ ) + self.steps_beyond_terminated += 1 + reward = self.punish + + self.renderer.render_step() + return np.array(self.state, dtype=np.float32), reward, terminated, False, {"time": self.tau} + + def reset( + self, + *, + seed: Optional[int] = None, + return_info: bool = False, + options: Optional[dict] = None, + ): + super().reset(seed=seed) + # Note that if you use custom reset bounds, it may lead to out-of-bound + # state/observations. + low, high = utils.maybe_parse_reset_bounds( + options, -self.random_start_level, self.random_start_level # default low + ) # default high + self.state = self.np_random.uniform(low=low, high=high, size=(4,)) + if self.init_pole_angle is not None: + self.state[2] = self.init_pole_angle + + self.steps_beyond_terminated = None + self.renderer.reset() + self.renderer.render_step() + if not return_info: + return np.array(self.state, dtype=np.float32) + else: + return np.array(self.state, dtype=np.float32), {} + + def render(self, mode="human"): + if self.render_mode is not None: + return self.renderer.get_renders() + else: + return self._render(mode) + + def _render(self, mode="human"): + assert mode in self.metadata["render_modes"] + try: + import pygame + from pygame import gfxdraw + except ImportError: + raise DependencyNotInstalled( + "pygame is not installed, run `pip install gym[classic_control]`" + ) + + if self.screen is None: + pygame.init() + if mode == "human": + pygame.display.init() + self.screen = pygame.display.set_mode( + (self.screen_width, self.screen_height) + ) + else: # mode in {"rgb_array", "single_rgb_array"} + self.screen = pygame.Surface((self.screen_width, self.screen_height)) + if self.clock is None: + self.clock = pygame.time.Clock() + + world_width = self.x_threshold * 2 + scale = self.screen_width / world_width + polewidth = 10.0 + polelen = scale * (2 * self.length) + cartwidth = 50.0 + cartheight = 30.0 + + if self.state is None: + return None + + x = self.state + + self.surf = pygame.Surface((self.screen_width, self.screen_height)) + self.surf.fill((255, 255, 255)) + + l, r, t, b = -cartwidth / 2, cartwidth / 2, cartheight / 2, -cartheight / 2 + axleoffset = cartheight / 4.0 + cartx = x[0] * scale + self.screen_width / 2.0 # MIDDLE OF CART + carty = 100 # TOP OF CART + cart_coords = [(l, b), (l, t), (r, t), (r, b)] + cart_coords = [(c[0] + cartx, c[1] + carty) for c in cart_coords] + gfxdraw.aapolygon(self.surf, cart_coords, (0, 0, 0)) + gfxdraw.filled_polygon(self.surf, cart_coords, (0, 0, 0)) + + l, r, t, b = ( + -polewidth / 2, + polewidth / 2, + polelen - polewidth / 2, + -polewidth / 2, + ) + + pole_coords = [] + for coord in [(l, b), (l, t), (r, t), (r, b)]: + coord = pygame.math.Vector2(coord).rotate_rad(-x[2]) + coord = (coord[0] + cartx, coord[1] + carty + axleoffset) + pole_coords.append(coord) + gfxdraw.aapolygon(self.surf, pole_coords, (202, 152, 101)) + gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101)) + + gfxdraw.aacircle( + self.surf, + int(cartx), + int(carty + axleoffset), + int(polewidth / 2), + (129, 132, 203), + ) + gfxdraw.filled_circle( + self.surf, + int(cartx), + int(carty + axleoffset), + int(polewidth / 2), + (129, 132, 203), + ) + + gfxdraw.hline(self.surf, 0, self.screen_width, carty, (0, 0, 0)) + + self.surf = pygame.transform.flip(self.surf, False, True) + self.screen.blit(self.surf, (0, 0)) + if mode == "human": + pygame.event.pump() + self.clock.tick(self.metadata["render_fps"]) + pygame.display.flip() + + elif mode in {"rgb_array", "single_rgb_array"}: + return np.transpose( + 
np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2) + ) + + def close(self): + if self.screen is not None: + import pygame + + pygame.display.quit() + pygame.quit() + self.isopen = False diff --git a/rl_studio/envs/openai_gym/cartpole/cartpole_env_continuous_improved.py b/rl_studio/envs/openai_gym/cartpole/cartpole_env_continuous_improved.py new file mode 100755 index 000000000..f1da7c376 --- /dev/null +++ b/rl_studio/envs/openai_gym/cartpole/cartpole_env_continuous_improved.py @@ -0,0 +1,334 @@ +""" +Classic cart-pole system implemented by Rich Sutton et al. +Copied from http://incompleteideas.net/sutton/book/code/pole.c +permalink: https://perma.cc/C9ZM-652R +""" +import math +from typing import Optional, Union +import datetime + +# Note that this environment needs gym==0.25.0 to work +import gym +import numpy as np +from gym import logger, spaces +from gym.envs.classic_control import utils +from gym.error import DependencyNotInstalled +from gym.utils.renderer import Renderer + +class CartPoleEnv(gym.Env): + metadata = { + 'render_modes': ['human', 'rgb_array'], + 'render_fps': 50 + } + + def __init__(self, random_start_level, initial_pole_angle=None, render_mode: Optional[str] = None, + non_recoverable_angle=0.3, punish=0, reward_value=1, reward_shaping=0): + self.last_step_time = None + self.random_start_level = random_start_level + + self.gravity = 9.8 + self.masscart = 1.0 + self.masspole = 0.1 + self.punish = punish + self.reward_value = reward_value + self.reward_shaping = reward_shaping + self.total_mass = self.masspole + self.masscart + self.length = 0.5 # actually half the pole's length + self.polemass_length = self.masspole * self.length + self.force_mag = 10.0 + self.tau = 0.02 # seconds between state updates + self.min_action = -1.0 + self.max_action = 1.0 + self.kinematics_integrator = "euler" + + # Angle at which to fail the episode + self.theta_threshold_radians = non_recoverable_angle * 50 * 2 * math.pi / 360 + self.theta_threshold_radians_center = non_recoverable_angle/2 * 50 * 2 * math.pi / 360 + self.x_threshold = 2.4 + self.x_threshold_center = 1.4 + self.init_pole_angle = initial_pole_angle + + # Angle limit set to 2 * theta_threshold_radians so failing observation + # is still within bounds. + high = np.array( + [ + self.x_threshold * 2, + np.finfo(np.float32).max, + self.theta_threshold_radians * 2, + np.finfo(np.float32).max, + ], + dtype=np.float32, + ) + + self.action_space = spaces.Box( + low=self.min_action, + high=self.max_action, + shape=(1,) + ) + self.observation_space = spaces.Box(-high, high, dtype=np.float32) + + self.render_mode = render_mode + self.renderer = Renderer(self.render_mode, self._render) + + self.screen_width = 600 + self.screen_height = 400 + self.screen = None + self.clock = None + self.isopen = True + self.state = None + + self.steps_beyond_terminated = None + + def perturbate(self, action, intensity_deviation): + assert self.state is not None, "Call reset before using step method." 
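+ # Same disturbance as the base continuous env, but integrated over the wall-clock time elapsed since the last step (computed below).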
+ x, x_dot, theta, theta_dot = self.state + force = self.force_mag if action == 1 else -self.force_mag + force += np.random.normal(loc=0.0, scale=intensity_deviation, size=None) + costheta = math.cos(theta) + sintheta = math.sin(theta) + + # For the interested reader: + # https://coneural.org/florian/papers/05_cart_pole.pdf + temp = ( + force + self.polemass_length * theta_dot ** 2 * sintheta + ) / self.total_mass + thetaacc = (self.gravity * sintheta - costheta * temp) / ( + self.length * (4.0 / 3.0 - self.masspole * costheta ** 2 / self.total_mass) + ) + xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass + + now = datetime.datetime.now() + decision_duration = now - self.last_step_time + decision_duration = decision_duration.total_seconds() + decision_duration += 0.02 + + if self.kinematics_integrator == "euler": + x = x + decision_duration * x_dot + x_dot = x_dot + decision_duration * xacc + theta = theta + decision_duration * theta_dot + theta_dot = theta_dot + decision_duration * thetaacc + else: # semi-implicit euler + x_dot = x_dot + decision_duration * xacc + x = x + decision_duration * x_dot + theta_dot = theta_dot + decision_duration * thetaacc + theta = theta + decision_duration * theta_dot + + self.state = (x, x_dot, theta, theta_dot) + + terminated = bool( + x < -self.x_threshold + or x > self.x_threshold + or theta < -self.theta_threshold_radians + or theta > self.theta_threshold_radians + ) + + self.renderer.render_step() + return np.array(self.state, dtype=np.float32), terminated, False, {} + def step(self, action): + """ + This step function is considering the effect of time-to-inference or control iteration duration + """ + now = datetime.datetime.now() + decision_duration = now - self.last_step_time + self.last_step_time = now + return self.tick_step(action, decision_duration) + + def tick_step(self, action, elapsed_time=None): + timedelta = elapsed_time or self.tau + tau = timedelta.total_seconds() + tau += 0.02 + action = np.clip(action, -1, 1)[0] + assert self.state is not None, "Call reset before using step method." 
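+ # The dynamics below advance by tau, i.e. the measured decision interval plus a 0.02 s baseline.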
+ x, x_dot, theta, theta_dot = self.state + force = self.force_mag * action + costheta = math.cos(theta) + sintheta = math.sin(theta) + + # For the interested reader: + # https://coneural.org/florian/papers/05_cart_pole.pdf + temp = ( + force + self.polemass_length * theta_dot ** 2 * sintheta + ) / self.total_mass + thetaacc = (self.gravity * sintheta - costheta * temp) / ( + self.length * (4.0 / 3.0 - self.masspole * costheta ** 2 / self.total_mass) + ) + xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass + + if self.kinematics_integrator == "euler": + x = x + tau * x_dot + x_dot = x_dot + tau * xacc + theta = theta + tau * theta_dot + theta_dot = theta_dot + tau * thetaacc + else: # semi-implicit euler + x_dot = x_dot + tau * xacc + x = x + tau * x_dot + theta_dot = theta_dot + tau * thetaacc + theta = theta + tau * theta_dot + + self.state = (x, x_dot, theta, theta_dot) + + terminated = bool( + x < -self.x_threshold + or x > self.x_threshold + or theta < -self.theta_threshold_radians + or theta > self.theta_threshold_radians + ) + + centered = bool( + x < -self.x_threshold_center + or x > self.x_threshold_center + or theta < -self.theta_threshold_radians_center + or theta > self.theta_threshold_radians_center + ) + + if not terminated and centered: + reward = self.reward_value + self.reward_shaping + elif not terminated: + reward = self.reward_value + elif self.steps_beyond_terminated is None: + # Pole just fell! + self.steps_beyond_terminated = 0 + reward = self.reward_value + else: + if self.steps_beyond_terminated == 0: + logger.warn( + "You are calling 'step()' even though this " + "environment has already returned terminated = True. You " + "should always call 'reset()' once you receive 'terminated = " + "True' -- any further steps are undefined behavior." + ) + self.steps_beyond_terminated += 1 + reward = self.punish + + self.renderer.render_step() + return np.array(self.state, dtype=np.float32), reward, terminated, False, {"time": tau - 0.02} + + def reset( + self, + *, + seed: Optional[int] = None, + return_info: bool = False, + options: Optional[dict] = None, + ): + super().reset(seed=seed) + # Note that if you use custom reset bounds, it may lead to out-of-bound + # state/observations. 
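+ # The start state is drawn uniformly from [-random_start_level, random_start_level] unless options override the bounds.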
+ low, high = utils.maybe_parse_reset_bounds( + options, -self.random_start_level, self.random_start_level # default low + ) # default high + self.state = self.np_random.uniform(low=low, high=high, size=(4,)) + if self.init_pole_angle is not None: + self.state[2] = self.init_pole_angle + + self.steps_beyond_terminated = None + self.renderer.reset() + self.renderer.render_step() + self.last_step_time=datetime.datetime.now() + if not return_info: + return np.array(self.state, dtype=np.float32) + else: + return np.array(self.state, dtype=np.float32), {} + + def render(self, mode="human"): + if self.render_mode is not None: + return self.renderer.get_renders() + else: + return self._render(mode) + + def _render(self, mode="human"): + assert mode in self.metadata["render_modes"] + try: + import pygame + from pygame import gfxdraw + except ImportError: + raise DependencyNotInstalled( + "pygame is not installed, run `pip install gym[classic_control]`" + ) + + if self.screen is None: + pygame.init() + if mode == "human": + pygame.display.init() + self.screen = pygame.display.set_mode( + (self.screen_width, self.screen_height) + ) + else: # mode in {"rgb_array", "single_rgb_array"} + self.screen = pygame.Surface((self.screen_width, self.screen_height)) + if self.clock is None: + self.clock = pygame.time.Clock() + + world_width = self.x_threshold * 2 + scale = self.screen_width / world_width + polewidth = 10.0 + polelen = scale * (2 * self.length) + cartwidth = 50.0 + cartheight = 30.0 + + if self.state is None: + return None + + x = self.state + + self.surf = pygame.Surface((self.screen_width, self.screen_height)) + self.surf.fill((255, 255, 255)) + + l, r, t, b = -cartwidth / 2, cartwidth / 2, cartheight / 2, -cartheight / 2 + axleoffset = cartheight / 4.0 + cartx = x[0] * scale + self.screen_width / 2.0 # MIDDLE OF CART + carty = 100 # TOP OF CART + cart_coords = [(l, b), (l, t), (r, t), (r, b)] + cart_coords = [(c[0] + cartx, c[1] + carty) for c in cart_coords] + gfxdraw.aapolygon(self.surf, cart_coords, (0, 0, 0)) + gfxdraw.filled_polygon(self.surf, cart_coords, (0, 0, 0)) + + l, r, t, b = ( + -polewidth / 2, + polewidth / 2, + polelen - polewidth / 2, + -polewidth / 2, + ) + + pole_coords = [] + for coord in [(l, b), (l, t), (r, t), (r, b)]: + coord = pygame.math.Vector2(coord).rotate_rad(-x[2]) + coord = (coord[0] + cartx, coord[1] + carty + axleoffset) + pole_coords.append(coord) + gfxdraw.aapolygon(self.surf, pole_coords, (202, 152, 101)) + gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101)) + + gfxdraw.aacircle( + self.surf, + int(cartx), + int(carty + axleoffset), + int(polewidth / 2), + (129, 132, 203), + ) + gfxdraw.filled_circle( + self.surf, + int(cartx), + int(carty + axleoffset), + int(polewidth / 2), + (129, 132, 203), + ) + + gfxdraw.hline(self.surf, 0, self.screen_width, carty, (0, 0, 0)) + + self.surf = pygame.transform.flip(self.surf, False, True) + self.screen.blit(self.surf, (0, 0)) + if mode == "human": + pygame.event.pump() + self.clock.tick(self.metadata["render_fps"]) + pygame.display.flip() + + elif mode in {"rgb_array", "single_rgb_array"}: + return np.transpose( + np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2) + ) + + def close(self): + if self.screen is not None: + import pygame + + pygame.display.quit() + pygame.quit() + self.isopen = False diff --git a/rl_studio/envs/openai_gym/cartpole/cartpole_env_improved.py b/rl_studio/envs/openai_gym/cartpole/cartpole_env_improved.py new file mode 100755 index 000000000..68d398309 --- /dev/null +++ 
b/rl_studio/envs/openai_gym/cartpole/cartpole_env_improved.py @@ -0,0 +1,400 @@ +""" +Classic cart-pole system implemented by Rich Sutton et al. +Copied from http://incompleteideas.net/sutton/book/code/pole.c +permalink: https://perma.cc/C9ZM-652R +""" +import math +from typing import Optional, Union +import datetime + +# Note that this environment needs gym==0.25.0 to work +import gym +import numpy as np +from gym import logger, spaces +from gym.envs.classic_control import utils +from gym.error import DependencyNotInstalled +from gym.utils.renderer import Renderer + + +class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]): + """ + ### Description + + This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in + ["Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problem"](https://ieeexplore.ieee.org/document/6313077). + A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. + The pendulum is placed upright on the cart and the goal is to balance the pole by applying forces + in the left and right direction on the cart. + + ### Action Space + + The action is a `ndarray` with shape `(1,)` which can take values `{0, 1}` indicating the direction + of the fixed force the cart is pushed with. + + | Num | Action | + |-----|------------------------| + | 0 | Push cart to the left | + | 1 | Push cart to the right | + + **Note**: The velocity that is reduced or increased by the applied force is not fixed and it depends on the angle + the pole is pointing. The center of gravity of the pole varies the amount of energy needed to move the cart underneath it + + ### Observation Space + + The observation is a `ndarray` with shape `(4,)` with the values corresponding to the following positions and velocities: + + | Num | Observation | Min | Max | + |-----|-----------------------|---------------------|-------------------| + | 0 | Cart Position | -4.8 | 4.8 | + | 1 | Cart Velocity | -Inf | Inf | + | 2 | Pole Angle | ~ -0.418 rad (-24°) | ~ 0.418 rad (24°) | + | 3 | Pole Angular Velocity | -Inf | Inf | + + **Note:** While the ranges above denote the possible values for observation space of each element, + it is not reflective of the allowed values of the state space in an unterminated episode. Particularly: + - The cart x-position (index 0) can be take values between `(-4.8, 4.8)`, but the episode terminates + if the cart leaves the `(-2.4, 2.4)` range. + - The pole angle can be observed between `(-.418, .418)` radians (or **±24°**), but the episode terminates + if the pole angle is not in the range `(-.2095, .2095)` (or **±12°**) + + ### Rewards + + Since the goal is to keep the pole upright for as long as possible, a reward of `+1` for every step taken, + including the termination step, is allotted. The threshold for rewards is 475 for v1. + + ### Starting State + + All observations are assigned a uniformly random value in `(-0.05, 0.05)` + + ### Episode End + + The episode ends if any one of the following occurs: + + 1. Termination: Pole Angle is greater than ±12° + 2. Termination: Cart Position is greater than ±2.4 (center of the cart reaches the edge of the display) + 3. Truncation: Episode length is greater than 500 (200 for v0) + + ### Arguments + + ``` + gym.make('CartPole-v1') + ``` + + No additional arguments are currently supported. 
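+
+ An illustrative direct construction is sketched below (the argument value is an example only;
+ gym==0.25.0 is required, as noted at the top of this file):
+
+ ```
+ env = CartPoleEnv(random_start_level=0.05)
+ state = env.reset()
+ state, reward, terminated, truncated, info = env.step(env.action_space.sample())
+ ```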
+ """ + + metadata = { + "render_modes": ["human", "rgb_array", "single_rgb_array"], + "render_fps": 50, + } + + def __init__(self, random_start_level, initial_pole_angle=None, render_mode: Optional[str] = None, + non_recoverable_angle=0.3, punish=0, reward_value=1, reward_shaping=0): + self.last_step_time = None + self.random_start_level = random_start_level + + self.gravity = 9.8 + self.masscart = 1.0 + self.masspole = 0.1 + self.punish = punish + self.reward_value = reward_value + self.reward_shaping = reward_shaping + self.total_mass = self.masspole + self.masscart + self.length = 0.5 # actually half the pole's length + self.polemass_length = self.masspole * self.length + self.force_mag = 10.0 + self.tau = 0.02 # seconds between state updates + self.kinematics_integrator = "euler" + # Angle at which to fail the episode + self.theta_threshold_radians = non_recoverable_angle * 50 * 2 * math.pi / 360 + self.theta_threshold_radians_center = non_recoverable_angle/2 * 50 * 2 * math.pi / 360 + self.x_threshold = 2.4 + self.x_threshold_center = 1.4 + self.init_pole_angle = initial_pole_angle + + # Angle limit set to 2 * theta_threshold_radians so failing observation + # is still within bounds. + high = np.array( + [ + self.x_threshold * 2, + np.finfo(np.float32).max, + self.theta_threshold_radians * 2, + np.finfo(np.float32).max, + ], + dtype=np.float32, + ) + + self.action_space = spaces.Discrete(2) + self.observation_space = spaces.Box(-high, high, dtype=np.float32) + + self.render_mode = render_mode + self.renderer = Renderer(self.render_mode, self._render) + + self.screen_width = 600 + self.screen_height = 400 + self.screen = None + self.clock = None + self.isopen = True + self.state = None + + self.steps_beyond_terminated = None + + def perturbate(self, action, intensity_deviation): + err_msg = f"{action!r} ({type(action)}) invalid" + assert self.action_space.contains(action), err_msg + assert self.state is not None, "Call reset before using step method." 
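+ # Disturbance with Gaussian noise of scale intensity_deviation; integration uses the measured interval since the last step.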
+ x, x_dot, theta, theta_dot = self.state + force = self.force_mag if action == 1 else -self.force_mag + force += np.random.normal(loc=0.0, scale=intensity_deviation, size=None) + costheta = math.cos(theta) + sintheta = math.sin(theta) + + # For the interested reader: + # https://coneural.org/florian/papers/05_cart_pole.pdf + temp = ( + force + self.polemass_length * theta_dot ** 2 * sintheta + ) / self.total_mass + thetaacc = (self.gravity * sintheta - costheta * temp) / ( + self.length * (4.0 / 3.0 - self.masspole * costheta ** 2 / self.total_mass) + ) + xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass + + now = datetime.datetime.now() + decision_duration = now - self.last_step_time + decision_duration = decision_duration.total_seconds() + decision_duration += 0.02 + + if self.kinematics_integrator == "euler": + x = x + decision_duration * x_dot + x_dot = x_dot + decision_duration * xacc + theta = theta + decision_duration * theta_dot + theta_dot = theta_dot + decision_duration * thetaacc + else: # semi-implicit euler + x_dot = x_dot + decision_duration * xacc + x = x + decision_duration * x_dot + theta_dot = theta_dot + decision_duration * thetaacc + theta = theta + decision_duration * theta_dot + + self.state = (x, x_dot, theta, theta_dot) + + terminated = bool( + x < -self.x_threshold + or x > self.x_threshold + or theta < -self.theta_threshold_radians + or theta > self.theta_threshold_radians + ) + + self.renderer.render_step() + return np.array(self.state, dtype=np.float32), terminated, False, {} + + def step(self, action): + """ + This step function takes the time-to-inference (the duration of each control iteration) into account + """ + now = datetime.datetime.now() + decision_duration = now - self.last_step_time + self.last_step_time = now + return self.tick_step(action, decision_duration) + + def tick_step(self, action, elapsed_time=None): + timedelta = elapsed_time or self.tau + tau = timedelta.total_seconds() + # add a fixed 0.02 s so the simulated lapse of time is never so small that the physics update loses + # numerical precision, and to give the pole time to fall + tau += 0.02 + err_msg = f"{action!r} ({type(action)}) invalid" + assert self.action_space.contains(action), err_msg + assert self.state is not None, "Call reset before using step method."
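+ # From here the physics advance by tau (measured interval + 0.02 s), so a slower control loop produces a larger state change per step.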
+ x, x_dot, theta, theta_dot = self.state + force = self.force_mag if action == 1 else -self.force_mag + costheta = math.cos(theta) + sintheta = math.sin(theta) + + # For the interested reader: + # https://coneural.org/florian/papers/05_cart_pole.pdf + temp = ( + force + self.polemass_length * theta_dot ** 2 * sintheta + ) / self.total_mass + thetaacc = (self.gravity * sintheta - costheta * temp) / ( + self.length * (4.0 / 3.0 - self.masspole * costheta ** 2 / self.total_mass) + ) + xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass + + if self.kinematics_integrator == "euler": + x = x + tau * x_dot + x_dot = x_dot + tau * xacc + theta = theta + tau * theta_dot + theta_dot = theta_dot + tau * thetaacc + else: # semi-implicit euler + x_dot = x_dot + tau * xacc + x = x + tau * x_dot + theta_dot = theta_dot + tau * thetaacc + theta = theta + tau * theta_dot + + self.state = (x, x_dot, theta, theta_dot) + + terminated = bool( + x < -self.x_threshold + or x > self.x_threshold + or theta < -self.theta_threshold_radians + or theta > self.theta_threshold_radians + ) + + centered = bool( + x < -self.x_threshold_center + or x > self.x_threshold_center + or theta < -self.theta_threshold_radians_center + or theta > self.theta_threshold_radians_center + ) + + if not terminated and centered: + reward = self.reward_value + self.reward_shaping + elif not terminated: + reward = self.reward_value + elif self.steps_beyond_terminated is None: + # Pole just fell! + self.steps_beyond_terminated = 0 + reward = self.reward_value + else: + if self.steps_beyond_terminated == 0: + logger.warn( + "You are calling 'step()' even though this " + "environment has already returned terminated = True. You " + "should always call 'reset()' once you receive 'terminated = " + "True' -- any further steps are undefined behavior." + ) + self.steps_beyond_terminated += 1 + reward = self.punish + + self.renderer.render_step() + return np.array(self.state, dtype=np.float32), reward, terminated, False, {"time": tau-0.02} + + def reset( + self, + *, + seed: Optional[int] = None, + return_info: bool = False, + options: Optional[dict] = None, + ): + super().reset(seed=seed) + # Note that if you use custom reset bounds, it may lead to out-of-bound + # state/observations. 
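+ # Uniform start in [-random_start_level, random_start_level]; the pole angle is overridden afterwards if initial_pole_angle was given.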
+ low, high = utils.maybe_parse_reset_bounds( + options, -self.random_start_level, self.random_start_level # default low + ) # default high + self.state = self.np_random.uniform(low=low, high=high, size=(4,)) + if self.init_pole_angle is not None: + self.state[2] = self.init_pole_angle + + self.steps_beyond_terminated = None + self.renderer.reset() + self.renderer.render_step() + self.last_step_time=datetime.datetime.now() + if not return_info: + return np.array(self.state, dtype=np.float32) + else: + return np.array(self.state, dtype=np.float32), {} + + def render(self, mode="human"): + if self.render_mode is not None: + return self.renderer.get_renders() + else: + return self._render(mode) + + def _render(self, mode="human"): + assert mode in self.metadata["render_modes"] + try: + import pygame + from pygame import gfxdraw + except ImportError: + raise DependencyNotInstalled( + "pygame is not installed, run `pip install gym[classic_control]`" + ) + + if self.screen is None: + pygame.init() + if mode == "human": + pygame.display.init() + self.screen = pygame.display.set_mode( + (self.screen_width, self.screen_height) + ) + else: # mode in {"rgb_array", "single_rgb_array"} + self.screen = pygame.Surface((self.screen_width, self.screen_height)) + if self.clock is None: + self.clock = pygame.time.Clock() + + world_width = self.x_threshold * 2 + scale = self.screen_width / world_width + polewidth = 10.0 + polelen = scale * (2 * self.length) + cartwidth = 50.0 + cartheight = 30.0 + + if self.state is None: + return None + + x = self.state + + self.surf = pygame.Surface((self.screen_width, self.screen_height)) + self.surf.fill((255, 255, 255)) + + l, r, t, b = -cartwidth / 2, cartwidth / 2, cartheight / 2, -cartheight / 2 + axleoffset = cartheight / 4.0 + cartx = x[0] * scale + self.screen_width / 2.0 # MIDDLE OF CART + carty = 100 # TOP OF CART + cart_coords = [(l, b), (l, t), (r, t), (r, b)] + cart_coords = [(c[0] + cartx, c[1] + carty) for c in cart_coords] + gfxdraw.aapolygon(self.surf, cart_coords, (0, 0, 0)) + gfxdraw.filled_polygon(self.surf, cart_coords, (0, 0, 0)) + + l, r, t, b = ( + -polewidth / 2, + polewidth / 2, + polelen - polewidth / 2, + -polewidth / 2, + ) + + pole_coords = [] + for coord in [(l, b), (l, t), (r, t), (r, b)]: + coord = pygame.math.Vector2(coord).rotate_rad(-x[2]) + coord = (coord[0] + cartx, coord[1] + carty + axleoffset) + pole_coords.append(coord) + gfxdraw.aapolygon(self.surf, pole_coords, (202, 152, 101)) + gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101)) + + gfxdraw.aacircle( + self.surf, + int(cartx), + int(carty + axleoffset), + int(polewidth / 2), + (129, 132, 203), + ) + gfxdraw.filled_circle( + self.surf, + int(cartx), + int(carty + axleoffset), + int(polewidth / 2), + (129, 132, 203), + ) + + gfxdraw.hline(self.surf, 0, self.screen_width, carty, (0, 0, 0)) + + self.surf = pygame.transform.flip(self.surf, False, True) + self.screen.blit(self.surf, (0, 0)) + if mode == "human": + pygame.event.pump() + self.clock.tick(self.metadata["render_fps"]) + pygame.display.flip() + + elif mode in {"rgb_array", "single_rgb_array"}: + return np.transpose( + np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2) + ) + + def close(self): + if self.screen is not None: + import pygame + + pygame.display.quit() + pygame.quit() + self.isopen = False diff --git a/rl_studio/main_rlstudio.py b/rl_studio/main_rlstudio.py deleted file mode 100644 index aac95a9a4..000000000 --- a/rl_studio/main_rlstudio.py +++ /dev/null @@ -1,106 +0,0 @@ -import 
argparse -import json - -import yaml - -from rl_studio.agents import TrainerFactory, InferenceExecutorFactory -from rl_studio.agents.trainer import TrainerValidator, InferenceExecutorValidator - - -def get_algorithm(config_file: dict, input_algorithm: str) -> dict: - return { - "name": input_algorithm, - "params": config_file["algorithm"][input_algorithm], - } - - -def get_environment(config_file: dict, input_env: str) -> dict: - return { - "name": input_env, - "params": config_file["environments"][input_env], - "actions": config_file["actions"] - .get("available_actions", None) - .get(config_file["actions"].get("actions_set", None), None), - "actions_set": config_file["actions"].get("actions_set", None), - "actions_number": config_file["actions"].get("actions_number", None), - } - - -def get_agent(config_file: dict, input_agent: str) -> dict: - return { - "name": input_agent, - "params": config_file["agent"][input_agent], - } - - -def get_inference(config_file: dict, input_inference: str) -> dict: - return { - "name": input_inference, - "params": config_file["inference"][input_inference], - } - - -def get_settings(config_file: dict) -> dict: - return { - "name": "settings", - "params": config_file["settings"], - } - - -def main(): - parser = argparse.ArgumentParser() - parser.add_argument( - "-f", "--file", type=argparse.FileType("r"), required=True, default="config.yml" - ) - parser.add_argument("-a", "--agent", type=str, required=True) - parser.add_argument("-e", "--environment", type=str, required=True) - parser.add_argument("-n", "--algorithm", type=str, required=True) - parser.add_argument("-m", "--mode", type=str, required=False, default="training") - - args = parser.parse_args() - config_file = yaml.load(args.file, Loader=yaml.FullLoader) - - if args.mode == "inference": - - inference_params = { - "settings": get_settings(config_file), - "algorithm": get_algorithm(config_file, args.algorithm), - "inference": get_inference(config_file, args.algorithm), - "environment": get_environment(config_file, args.environment), - "agent": get_agent(config_file, args.agent), - } - - # TODO: Create function to check dirs - # os.makedirs("logs", exist_ok=True) - # os.makedirs("images", exist_ok=True) - - # PARAMS - params = InferenceExecutorValidator(**inference_params) - print("PARAMS:\n") - print(json.dumps(dict(params), indent=2)) - inferenceExecutor = InferenceExecutorFactory(params) - inferenceExecutor.main() - - else: - - trainer_params = { - "settings": get_settings(config_file), - "algorithm": get_algorithm(config_file, args.algorithm), - "environment": get_environment(config_file, args.environment), - "agent": get_agent(config_file, args.agent), - } - - # TODO: Create function to check dirs - # os.makedirs("logs", exist_ok=True) - # os.makedirs("images", exist_ok=True) - - # PARAMS - params = TrainerValidator(**trainer_params) - print("PARAMS:\n") - print(json.dumps(dict(params), indent=2)) - trainer = TrainerFactory(params) - trainer.main() - - -if __name__ == "__main__": - main() diff --git a/rl_studio/rewards_per_episode_plot.png b/rl_studio/rewards_per_episode_plot.png new file mode 100644 index 000000000..4dc0eb0e8 Binary files /dev/null and b/rl_studio/rewards_per_episode_plot.png differ diff --git a/rl_studio/rl-studio.py b/rl_studio/rl-studio.py new file mode 100644 index 000000000..565565a7b --- /dev/null +++ b/rl_studio/rl-studio.py @@ -0,0 +1,31 @@ +import argparse + +import yaml + +from rl_studio.agents import TrainerFactory, InferencerFactory + + +def main(): + parser = 
argparse.ArgumentParser() + parser.add_argument( + "-f", + "--file", + type=argparse.FileType("r"), + required=True, + default="config/config.yaml", + help="In the /config dir you will find example .yaml files", + ) + + args = parser.parse_args() + config_file = yaml.load(args.file, Loader=yaml.FullLoader) + + if config_file["settings"]["mode"] == "inference": + inferencer = InferencerFactory(config_file) + inferencer.main() + else: + trainer = TrainerFactory(config_file) + trainer.main() + + + if __name__ == "__main__": + main() \ No newline at end of file