refactor: unify trigger method
Gaiejj authored Aug 22, 2023
2 parents 9570307 + 9d20190 commit c9800e1
Showing 12 changed files with 71 additions and 52 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -47,7 +47,7 @@ jobs:
run: |
python -m pip install --upgrade pip setuptools
- name: Install OmniSafe
- name: Install SafePO
run: |
python -m pip install -vvv -e '.[test]'
11 changes: 2 additions & 9 deletions README.md
@@ -5,7 +5,7 @@
<div align="center">

[![Organization](https://img.shields.io/badge/Organization-PKU--Alignment-blue)](https://github.com/PKU-Alignment)
[![License](https://img.shields.io/github/license/PKU-Alignment/OmniSafe?label=license)](#license)
[![License](https://img.shields.io/github/license/PKU-Alignment/Safe-Policy-Optimization?label=license)](#license)
[![codecov](https://codecov.io/gh/PKU-Alignment/Safe-Policy-Optimization/graph/badge.svg?token=KF0UM0UNXW)](https://codecov.io/gh/PKU-Alignment/Safe-Policy-Optimization)
[![Documentation Status](https://readthedocs.org/projects/safe-policy-optimization/badge/?version=latest)](https://safe-policy-optimization.readthedocs.io/en/latest/?badge=latest)

@@ -201,7 +201,7 @@ To train a multi-agent algorithm:

```bash
cd safepo/multi_agent
python macpo.py --agent-conf 4x2 --scenario Ant --experiment benchmark
python macpo.py --task Safety2x4AntVelocity-v0 --experiment benchmark
```

You can also train on Isaac Gym-based environments:
@@ -213,13 +213,6 @@ python macpo.py --task ShadowHandOver_Safe_joint --experiment benchmark

**As Isaac Gym is not available on PyPI, you should install it manually, and then install [Safety-Gymnasium](https://github.com/PKU-Alignment/safety-gymnasium) by cloning the repository instead of installing it from PyPI.**

**Note**: The default value for ``task`` is ``MujocoVelocity``. The default scenario is ``Ant`` while the default agent configuration is ``2x4``. You can run other agent configurations or scenarios by:

```bash
cd safepo/multi_agent
python macpo.py --agent-conf 3x1 --scenario Hopper --experiment benchmark
```

### Plot the result

After running the experiment, you can use the following command to plot the results:
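The change above replaces the separate `--scenario`/`--agent-conf` flags with a single `--task` id that encodes both. Below is a minimal sketch of how such an id could be decomposed, assuming task names follow the `Safety<agent_conf><Scenario>Velocity-v0` pattern visible in the examples; the helper is hypothetical and for illustration only, since SafePO resolves task metadata through its `multi_agent_velocity_map`.

```python
import re

def split_velocity_task(task: str):
    """Split an id like 'Safety2x4AntVelocity-v0' into (agent_conf, scenario).

    Hypothetical helper for illustration; not part of the SafePO codebase.
    """
    match = re.fullmatch(r"Safety(\d+x\d+|\d+\|\d+)([A-Za-z]+)Velocity-v0", task)
    if match is None:
        raise ValueError(f"not a multi-agent velocity task: {task}")
    agent_conf, scenario = match.groups()
    return agent_conf, scenario

print(split_velocity_task("Safety2x4AntVelocity-v0"))  # ('2x4', 'Ant')
```

Using one canonical task id keeps the training scripts, the benchmark runner, and the log-directory layout consistent.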
5 changes: 5 additions & 0 deletions docs/source/algorithms/comparision.rst
@@ -29,6 +29,11 @@ they are:
- ``SafetyWalker2dVelocity-v1``
- ``SafetySwimmerVelocity-v1``

.. warning::

It may take some time to load the results.
If you cannot see the results, please visit `wandb.ai <https://wandb.ai/pku_rl/SafePO/reports?view=table>`_ directly.

The results are shown as follows.

.. tab-set::
8 changes: 4 additions & 4 deletions docs/source/usage/make.rst
@@ -62,10 +62,10 @@ The terminal output would be like:
.. code-block:: bash
======= commands to run:
running python macpo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python mappo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python mappolag.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python happo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python macpo.py --task Safety2x4AntVelocity-v0 --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python mappo.py --task Safety2x4AntVelocity-v0 --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python mappolag.py --task Safety2x4AntVelocity-v0 --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python happo.py --task Safety2x4AntVelocity-v0 --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
...
running python pcpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python ppo_lag.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
6 changes: 1 addition & 5 deletions docs/source/usage/train.rst
@@ -23,7 +23,7 @@ The multi-agent algorithms running is similar to the single-agent algorithms. Fo
.. code-block:: bash
cd safepo/multi_agent
python mappolag.py --scenario Ant --agent-conf 2x4 --experiment mappo_lag_exp
python mappolag.py --task Safety2x4AntVelocity-v0 --experiment mappo_lag_exp
Then you can check the results in the ``runs/mappo_lag_exp`` folder.

@@ -79,10 +79,6 @@ We provide the detailed description of the command line arguments in the followi
+-------------------+--------------------------------+----------------------------------------------+
| --task | The task to run | "MujocoVelocity" |
+-------------------+--------------------------------+----------------------------------------------+
| --agent-conf | The agent configuration | "2x4" |
+-------------------+--------------------------------+----------------------------------------------+
| --scenario | The scenario | "Ant" |
+-------------------+--------------------------------+----------------------------------------------+
| --experiment | Experiment name | "Base" |
| | If used with --metadata flag, | |
| | additional information about | |
5 changes: 4 additions & 1 deletion safepo/common/wrappers.py
@@ -26,7 +26,10 @@
from gymnasium.wrappers.normalize import NormalizeObservation

from safety_gymnasium.vector.utils.tile_images import tile_images
from safety_gymnasium.tasks.safe_multi_agent.safe_mujoco_multi import SafeMAEnv
try:
from safety_gymnasium.tasks.safe_multi_agent.safe_mujoco_multi import SafeMAEnv
except ImportError:
from safety_gymnasium.tasks.safe_multi_agent.tasks.velocity.safe_mujoco_multi import SafeMAEnv
from typing import Optional
try :
from safety_gymnasium.tasks.safe_isaac_gym.envs.tasks.hand_base.vec_task import VecTaskPython
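The new `try`/`except` import in `wrappers.py` keeps `SafeMAEnv` importable across the two module layouts that different safety-gymnasium versions expose. A minimal sketch of the same fallback-import pattern follows; the final explicit `raise` is an illustrative addition, not part of this commit.

```python
# Fallback-import sketch: try one safety-gymnasium module layout first, fall
# back to the alternative, and fail with a clear message if neither exists.
try:
    from safety_gymnasium.tasks.safe_multi_agent.safe_mujoco_multi import SafeMAEnv
except ImportError:
    try:
        from safety_gymnasium.tasks.safe_multi_agent.tasks.velocity.safe_mujoco_multi import SafeMAEnv
    except ImportError as exc:
        raise ImportError(
            "safety-gymnasium with multi-agent support is required; "
            "install it from source as the README suggests"
        ) from exc
```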
13 changes: 7 additions & 6 deletions safepo/multi_agent/benchmark.py
@@ -38,6 +38,9 @@ def parse_args():
parser.add_argument(
"--total-steps", type=int, default=10000000, help="total number of steps"
)
parser.add_argument(
"--num-envs", type=int, default=10, help="number of environments to run in parallel"
)
args = parser.parse_args()

return args
@@ -59,16 +62,12 @@ def run_experiment(command: str):
for seed in range(0, args.num_seeds):
for task in args.tasks:
for algo in args.algo:
agen_conf = multi_agent_velocity_map[task]['agent_conf']
scenario = multi_agent_velocity_map[task]['scenario']
commands += [
" ".join(
[
f"python {algo}.py",
"--agent-conf",
agen_conf,
"--scenario",
scenario,
"--task",
task,
"--seed",
str(args.start_seed + 1000*seed),
"--write-terminal",
@@ -79,6 +78,8 @@
"True",
"--total-steps",
str(args.total_steps),
"--num-envs",
str(args.num_envs),
]
)
]
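With the unified trigger, `benchmark.py` no longer looks up `agent_conf`/`scenario` per task; it forwards the task id and the new `--num-envs` value directly. A rough reconstruction of one generated command string is shown below; the values and any flags hidden between the visible hunks are illustrative assumptions.

```python
# Illustrative assembly of a single benchmark command; the real script loops
# over seeds, tasks, and algorithms as shown in the diff above.
algo = "macpo"
task = "Safety2x4AntVelocity-v0"
seed = 0
total_steps = 10_000_000
num_envs = 10

command = " ".join([
    f"python {algo}.py",
    "--task", task,
    "--seed", str(seed),
    "--write-terminal", "False",
    "--experiment", "benchmark",
    "--headless", "True",
    "--total-steps", str(total_steps),
    "--num-envs", str(num_envs),
])
print(command)
# python macpo.py --task Safety2x4AntVelocity-v0 --seed 0 --write-terminal False ...
```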
10 changes: 7 additions & 3 deletions safepo/multi_agent/happo.py
@@ -31,7 +31,8 @@
from safepo.common.model import MultiAgentActor as Actor, MultiAgentCritic as Critic
from safepo.common.buffer import SeparatedReplayBuffer
from safepo.common.logger import EpochLogger
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed, multi_agent_velocity_map, isaac_gym_map


def check(input):
output = torch.from_numpy(input) if type(input) == np.ndarray else input
@@ -536,7 +537,7 @@ def compute(self):
def train(args, cfg_train):
agent_index = [[[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5]]]
if args.task == "MujocoVelocity":
if args.task in multi_agent_velocity_map:
env = make_ma_mujoco_env(
scenario=args.scenario,
agent_conf=args.agent_conf,
@@ -552,12 +553,15 @@ def train(args, cfg_train):
seed=cfg_eval['seed'],
cfg_train=cfg_eval,
)
else:
elif args.task in isaac_gym_map:
sim_params = parse_sim_params(args, cfg_env, cfg_train)
env = make_ma_isaac_env(args, cfg_env, cfg_train, sim_params, agent_index)
cfg_train["n_rollout_threads"] = env.num_envs
cfg_train["n_eval_rollout_threads"] = env.num_envs
eval_env = env
else:
raise NotImplementedError

torch.set_num_threads(4)
runner = Runner(env, eval_env, cfg_train, args.model_dir)

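In `happo.py`, `train()` now branches on membership in the task registries instead of comparing against the single `"MujocoVelocity"` string, and an unrecognised task fails fast with `NotImplementedError`. A standalone toy version of that dispatch follows; the stub maps and return values stand in for SafePO's real registries and environment constructors.

```python
# Toy version of the registry-based dispatch adopted in train(); the map
# contents and returned tuples are placeholders, not the real constructors.
multi_agent_velocity_map = {
    "Safety2x4AntVelocity-v0": {"scenario": "Ant", "agent_conf": "2x4"},
}
isaac_gym_map = {
    "ShadowHandOver_Safe_joint": "shadow_hand_over_safe_joint",
}

def make_env(task: str):
    if task in multi_agent_velocity_map:
        meta = multi_agent_velocity_map[task]
        return ("mujoco", meta["scenario"], meta["agent_conf"])
    elif task in isaac_gym_map:
        return ("isaac_gym", isaac_gym_map[task])
    raise NotImplementedError(f"unknown task: {task}")

print(make_env("Safety2x4AntVelocity-v0"))  # ('mujoco', 'Ant', '2x4')
```

The same pattern is applied to `macpo.py`, `mappo.py`, and `mappolag.py` below.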
10 changes: 6 additions & 4 deletions safepo/multi_agent/macpo.py
@@ -31,7 +31,7 @@
from safepo.common.model import MultiAgentActor as Actor, MultiAgentCritic as Critic
from safepo.common.buffer import SeparatedReplayBuffer
from safepo.common.logger import EpochLogger
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed, multi_agent_velocity_map, isaac_gym_map


def check(input):
@@ -710,7 +710,6 @@ def eval(self, eval_episodes=1):
one_episode_costs = torch.zeros(1, self.config["n_eval_rollout_threads"], device=self.config["device"])

eval_obs, _, _ = self.eval_envs.reset()
#eval_obs = torch.as_tensor(eval_obs, dtype=torch.float32, device=self.config["device"])

eval_rnn_states = torch.zeros(self.config["n_eval_rollout_threads"], self.num_agents, self.config["recurrent_N"], self.config["hidden_size"],
device=self.config["device"])
@@ -785,7 +784,7 @@ def compute(self):
def train(args, cfg_train):
agent_index = [[[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5]]]
if args.task == "MujocoVelocity":
if args.task in multi_agent_velocity_map:
env = make_ma_mujoco_env(
scenario=args.scenario,
agent_conf=args.agent_conf,
@@ -801,12 +800,15 @@ def train(args, cfg_train):
seed=cfg_eval['seed'],
cfg_train=cfg_eval,
)
else:
elif args.task in isaac_gym_map:
sim_params = parse_sim_params(args, cfg_env, cfg_train)
env = make_ma_isaac_env(args, cfg_env, cfg_train, sim_params, agent_index)
cfg_train["n_rollout_threads"] = env.num_envs
cfg_train["n_eval_rollout_threads"] = env.num_envs
eval_env = env
else:
raise NotImplementedError

torch.set_num_threads(4)
runner = Runner(env, eval_env, cfg_train, args.model_dir)

11 changes: 7 additions & 4 deletions safepo/multi_agent/mappo.py
@@ -31,7 +31,7 @@
from safepo.common.model import MultiAgentActor as Actor, MultiAgentCritic as Critic
from safepo.common.buffer import SeparatedReplayBuffer
from safepo.common.logger import EpochLogger
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed, multi_agent_velocity_map, isaac_gym_map


def check(input):
@@ -463,7 +463,6 @@ def eval(self, eval_episodes=1):
one_episode_costs = torch.zeros(1, self.config["n_eval_rollout_threads"], device=self.config["device"])

eval_obs, _, _ = self.eval_envs.reset()
eval_obs = torch.as_tensor(eval_obs, dtype=torch.float32, device=self.config["device"])

eval_rnn_states = torch.zeros(self.config["n_eval_rollout_threads"], self.num_agents, self.config["recurrent_N"], self.config["hidden_size"],
device=self.config["device"])
@@ -488,6 +487,7 @@
if self.config["env_name"] == "Safety9|8HumanoidVelocity-v0":
zeros = torch.zeros(eval_actions_collector[-1].shape[0], 1)
eval_actions_collector[-1]=torch.cat((eval_actions_collector[-1], zeros), dim=1)

eval_obs, _, eval_rewards, eval_costs, eval_dones, _, _ = self.eval_envs.step(
eval_actions_collector
)
@@ -531,7 +531,7 @@ def compute(self):
def train(args, cfg_train):
agent_index = [[[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5]]]
if args.task == "MujocoVelocity":
if args.task in multi_agent_velocity_map:
env = make_ma_mujoco_env(
scenario=args.scenario,
agent_conf=args.agent_conf,
@@ -547,12 +547,15 @@ def train(args, cfg_train):
seed=cfg_eval['seed'],
cfg_train=cfg_eval,
)
else:
elif args.task in isaac_gym_map:
sim_params = parse_sim_params(args, cfg_env, cfg_train)
env = make_ma_isaac_env(args, cfg_env, cfg_train, sim_params, agent_index)
cfg_train["n_rollout_threads"] = env.num_envs
cfg_train["n_eval_rollout_threads"] = env.num_envs
eval_env = env
else:
raise NotImplementedError

torch.set_num_threads(4)
runner = Runner(env, eval_env, cfg_train, args.model_dir)

10 changes: 6 additions & 4 deletions safepo/multi_agent/mappolag.py
@@ -31,7 +31,7 @@
from safepo.common.model import MultiAgentActor as Actor, MultiAgentCritic as Critic
from safepo.common.buffer import SeparatedReplayBuffer
from safepo.common.logger import EpochLogger
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed
from safepo.utils.config import multi_agent_args, parse_sim_params, set_np_formatting, set_seed, multi_agent_velocity_map, isaac_gym_map


def check(input):
@@ -526,7 +526,6 @@ def eval(self, eval_episodes=1):
one_episode_costs = torch.zeros(1, self.config["n_eval_rollout_threads"], device=self.config["device"])

eval_obs, _, _ = self.eval_envs.reset()
#eval_obs = torch.as_tensor(eval_obs, dtype=torch.float32, device=self.config["device"])

eval_rnn_states = torch.zeros(self.config["n_eval_rollout_threads"], self.num_agents, self.config["recurrent_N"], self.config["hidden_size"],
device=self.config["device"])
@@ -600,7 +599,7 @@ def compute(self):
def train(args, cfg_train):
agent_index = [[[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5]]]
if args.task == "MujocoVelocity":
if args.task in multi_agent_velocity_map:
env = make_ma_mujoco_env(
scenario=args.scenario,
agent_conf=args.agent_conf,
@@ -616,12 +615,15 @@ def train(args, cfg_train):
seed=cfg_eval['seed'],
cfg_train=cfg_eval,
)
else:
elif args.task in isaac_gym_map:
sim_params = parse_sim_params(args, cfg_env, cfg_train)
env = make_ma_isaac_env(args, cfg_env, cfg_train, sim_params, agent_index)
cfg_train["n_rollout_threads"] = env.num_envs
cfg_train["n_eval_rollout_threads"] = env.num_envs
eval_env = env
else:
raise NotImplementedError

torch.set_num_threads(4)
runner = Runner(env, eval_env, cfg_train, args.model_dir)

32 changes: 21 additions & 11 deletions safepo/utils/config.py
@@ -61,6 +61,15 @@
},
}

multi_agent_goal_tasks = [
"SafetyPointMultiGoal0-v0",
"SafetyPointMultiGoal1-v0",
"SafetyPointMultiGoal2-v0",
"SafetyAntMultiGoal0-v0",
"SafetyAntMultiGoal1-v0",
"SafetyAntMultiGoal2-v0",
]

isaac_gym_map = {
"ShadowHandOver_Safe_finger": "shadow_hand_over_safe_finger",
"ShadowHandOver_Safe_joint": "shadow_hand_over_safe_joint",
@@ -187,9 +196,9 @@ def multi_agent_args(algo):
# Define custom parameters
custom_parameters = [
{"name": "--use-eval", "type": lambda x: bool(strtobool(x)), "default": False, "help": "Use evaluation environment for testing"},
{"name": "--task", "type": str, "default": "MujocoVelocity", "help": "The task to run"},
{"name": "--agent-conf", "type": str, "default": "2x1", "help": "The agent configuration"},
{"name": "--scenario", "type": str, "default": "Swimmer", "help": "The scenario"},
{"name": "--task", "type": str, "default": "Safety2x4AntVelocity-v0", "help": "The task to run"},
{"name": "--agent-conf", "type": str, "default": "2x4", "help": "The agent configuration"},
{"name": "--scenario", "type": str, "default": "Ant", "help": "The scenario"},
{"name": "--experiment", "type": str, "default": "Base", "help": "Experiment name"},
{"name": "--seed", "type": int, "default":0, "help": "Random seed"},
{"name": "--model-dir", "type": str, "default": "", "help": "Choose a model dir"},
@@ -224,18 +233,19 @@ def multi_agent_args(algo):
base_path = os.path.dirname(os.path.abspath(__file__)).replace("utils", "multi_agent")
with open(os.path.join(base_path, cfg_train_path), 'r') as f:
cfg_train = yaml.load(f, Loader=yaml.SafeLoader)
if args.task == "MujocoVelocity":
if args.task in multi_agent_velocity_map.keys():
cfg_train.update(cfg_train.get("mamujoco"))
args.agent_conf = multi_agent_velocity_map[args.task]["agent_conf"]
args.scenario = multi_agent_velocity_map[args.task]["scenario"]
elif args.task in multi_agent_goal_tasks:
cfg_train.update(cfg_train.get("magoal"))

cfg_train["use_eval"] = args.use_eval
cfg_train["safety_bound"]=args.safety_bound
cfg_train["algorithm_name"]=algo
cfg_train["device"] = args.device + ":" + str(args.device_id)

if args.task == "MujocoVelocity":
env_name = "Safety"+args.agent_conf+args.scenario+"Velocity-v0"
else:
env_name = args.task
cfg_train["env_name"] = env_name
cfg_train["env_name"] = args.task

if args.total_steps:
cfg_train["num_env_steps"] = args.total_steps
@@ -245,7 +255,7 @@ def multi_agent_args(algo):
relpath = time.strftime("%Y-%m-%d-%H-%M-%S")
subfolder = "-".join(["seed", str(args.seed).zfill(3)])
relpath = "-".join([subfolder, relpath])
cfg_train['log_dir']="../runs/"+args.experiment+'/'+env_name+'/'+algo+'/'+relpath
cfg_train['log_dir']="../runs/"+args.experiment+'/'+args.task+'/'+algo+'/'+relpath
cfg_env={}
if args.task in isaac_gym_map.keys():
cfg_env_path = "marl_cfg/{}.yaml".format(isaac_gym_map[args.task])
@@ -259,7 +269,7 @@ def multi_agent_args(algo):
cfg_env["task"]["randomize"] = False
else:
cfg_env["task"] = {"randomize": False}
elif args.task == "MujocoVelocity":
elif args.task in multi_agent_velocity_map.keys() or args.task in multi_agent_goal_tasks:
pass
else:
warn_task_name()
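`multi_agent_args()` now derives `args.agent_conf` and `args.scenario` from `multi_agent_velocity_map` and uses `args.task` directly as the environment name and log-directory component, replacing the old `"Safety" + agent_conf + scenario + "Velocity-v0"` concatenation. A condensed sketch of that resolution step is given below, with a single illustrative map entry; the real map and config handling are richer than shown.

```python
import time

# One illustrative entry; the real multi_agent_velocity_map covers every
# supported multi-agent velocity task.
multi_agent_velocity_map = {
    "Safety2x4AntVelocity-v0": {"scenario": "Ant", "agent_conf": "2x4"},
}

def resolve_task(task: str, experiment: str, algo: str, seed: int):
    meta = multi_agent_velocity_map[task]
    scenario, agent_conf = meta["scenario"], meta["agent_conf"]
    relpath = "-".join(
        ["seed", str(seed).zfill(3), time.strftime("%Y-%m-%d-%H-%M-%S")]
    )
    log_dir = "/".join(["../runs", experiment, task, algo, relpath])
    return scenario, agent_conf, log_dir

print(resolve_task("Safety2x4AntVelocity-v0", "benchmark", "macpo", 0))
# ('Ant', '2x4', '../runs/benchmark/Safety2x4AntVelocity-v0/macpo/seed-000-<timestamp>')
```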
