
The-Tragedy-of-the-commons

This repo is an implementation of the environment from the paper 'Social diversity and social preferences in mixed-motive reinforcement learning'.

Environment Specification

Settings

Placeholder

Observation

Placeholder

Action

Preliminaries

Using Dockerfile

docker build -t tocenv .

Using Anaconda

conda create -n tocenv python=3.7
conda activate tocenv
pip install -r requirements.txt

Experiment

Train Roll-out Agents

python train.py --multirun \
      env=5b5r_rs4ro0_pc4pd8,5b5r_rs3ro1_pc4pd8,5b5r_rs2ro2_pc4pd8 \
      ra_agent=a2ccpc \
      save_model=true

This command will train A2C-CPC agents under the following reward settings:

  • Motivated reward - 4, Un-motivated reward - 0
  • Motivated reward - 3, Un-motivated reward - 1
  • Motivated reward - 2, Un-motivated reward - 2
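The motivated/un-motivated split above can be sketched as a simple consumption-reward rule. This is an illustrative sketch only; the function name and signature are hypothetical, not taken from the repo:

```python
def consume_reward(agent_color, apple_color, r_motivated, r_unmotivated):
    """Hypothetical reward for consuming one apple.

    r_motivated   -- reward when the apple matches the agent's own color
    r_unmotivated -- reward when the apple is the other color
    """
    return r_motivated if agent_color == apple_color else r_unmotivated
```

Under the rs4ro0 setting an agent is rewarded only for its own color; under rs2ro2 it is indifferent to apple color.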

The model files will be saved in the multirun/{seq}/models directory, where {seq} is the run's sequence number, following the order in which the settings are listed in your parameters.
You can also monitor training status with TensorBoard using the event files in the multirun/{seq}/tb directory.

Info

Agents

Directions

Which direction each agent is facing

WIP
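Since the direction feature is still WIP, here is one plausible way to encode facing directions and their grid deltas; the enum and deltas below are assumptions for illustration, not the repo's actual encoding:

```python
from enum import IntEnum

class Direction(IntEnum):
    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3

# (row, col) delta for moving one cell forward in each direction
DELTAS = {
    Direction.UP: (-1, 0),
    Direction.RIGHT: (0, 1),
    Direction.DOWN: (1, 0),
    Direction.LEFT: (0, -1),
}

def turn_right(d: Direction) -> Direction:
    """Rotate the facing direction 90 degrees clockwise."""
    return Direction((d + 1) % 4)
```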

Example

agents_types = ['red', 'blue', 'red']
env = TOCEnv(agents=agents_types,
             map_size=(16, 16),
             obs_type='rgb_array',
             apple_color_ratio=0.5,
             apple_spawn_ratio=0.1,
             )

obs, info = env.reset()

print(obs.shape)
# (4, 11, 11)
print(info)
# ['red', 'blue', 'red']

actions = [action_1, action_2, action_3, ..., action_n]  # one action per agent
obs, reward, info = env.step(actions)
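A full episode loop against this interface can be sketched as below. StubEnv is a minimal stand-in so the sketch runs standalone; its shapes and the num_actions value are assumptions, and a real rollout would use TOCEnv, whose step() returns (obs, reward, info) as in the example above:

```python
import random

class StubEnv:
    """Minimal stand-in with a TOCEnv-like reset/step interface."""
    def __init__(self, num_agents, max_steps=10):
        self.num_agents = num_agents
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        obs = [[0.0] * 11 for _ in range(self.num_agents)]  # fake per-agent views
        return obs, {'num_agents': self.num_agents}

    def step(self, actions):
        assert len(actions) == self.num_agents  # one action per agent
        self.t += 1
        obs = [[0.0] * 11 for _ in range(self.num_agents)]
        reward = [0.0] * self.num_agents
        return obs, reward, {'done': self.t >= self.max_steps}

def rollout(env, num_actions=8):
    """Run one episode with random actions; return the number of steps taken."""
    obs, info = env.reset()
    steps = 0
    while True:
        actions = [random.randrange(num_actions) for _ in range(env.num_agents)]
        obs, reward, info = env.step(actions)
        steps += 1
        if info['done']:
            break
    return steps
```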

Environment Parameters

num_agents (int) Total number of agents on the map
blue_agents (int) Number of blue agents
red_agents (int) Number of red agents
obs_type (numeric|rgb_array) Type of the returned observation
apple_color_ratio (float) Ratio of spawned apples that are blue (e.g., 0.3)
apple_spawn_ratio (float) Spawn rate of apples (e.g., 0.4)
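One way the two apple ratios could interact is sketched below; the function and its decision rule are hypothetical, since the repo's actual spawn logic is not shown here:

```python
def spawn_apple(u_spawn, u_color, spawn_ratio, color_ratio):
    """Decide whether an apple spawns and its color, given two uniform
    draws u_spawn, u_color in [0, 1).

    u_spawn < spawn_ratio -> an apple spawns this step
    u_color < color_ratio -> it is blue, otherwise red
    """
    if u_spawn >= spawn_ratio:
        return None  # no apple this step
    return 'blue' if u_color < color_ratio else 'red'
```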

TODO

Environment

  • Reset the map when there are no apples left on the map
  • Add an environment parameter for episode_max_length
  • Generate patch spawn areas following some distribution
  • Re-spawn apples based on the number of surrounding apples
  • Return each agent's individual view when step is called
  • Implement agent direction and make it available to actions on step
  • Implement the Punish action

Algorithms

  • Make CPC agent
  • Make CPC Module
  • Debug and Train CPC
  • Rule-Based Agent
  • Add GPU option
  • Save Model

About

This repo is about resource assignment.
