2. Usage 🎮
Upon installing InterCode locally, there are several ways you can interact with and learn more about the InterCode environment.
The root directory of the repository contains several scripts for interacting with InterCode's current suite of environments. Running `python run_<env>.py` will initialize an interpreter allowing you to interact with the corresponding environment. For instance, upon running `python run_bash.py`, you should see the following output:
```
INFO Loaded dataset from ./data/test/bash_queries.json
INFO Environment Initialized
INFO * Note *: `reset` should be explicitly called to load new task episode
INFO -------------
New task episode initialized
INFO Query: Search for all files that contain the string 'text file' in their name or content under the directory /testbed
INFO Gold Command: grep -r 'text file' /testbed
> pwd
INFO Action: pwd
INFO Observation: /
>
```
Under the hood, an instance of the `BashEnv` environment has been initialized, and a new task episode has been loaded, as indicated by the `Query` and `Gold` fields. The `>` denotes standard input, where you may enter a bash command (more generally, an action) as you would in a real terminal to interact with the environment. Upon entering a command, the result of executing it in the given environment becomes the observation.
The goal of each task is to modify the environment and produce standard output such that the specifications of the natural language query are met. Passing in the `submit` keyword terminates the current task episode and produces a `reward` value and an `info` dictionary describing how correctly the given actions answered the original query (calculated with respect to the effects of the `Gold` command).
```
> grep -r 'text file' /testbed
INFO Action: grep -r 'text file' /testbed
INFO Observation: /testbed/dir3/subdir1/subsubdir1/textfile3.txt:Yet another text file /testbed/dir2/subdir1/textfile2.txt...
> submit
INFO Action: submit
INFO Info: { 'environment': 'ic-bash', 'reward': 1, 'info': {...
INFO Reward: 1.0
```
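The same loop can also be driven from a script instead of the interactive interpreter. The sketch below assumes a gym-style `reset`/`step` interface and constructor arguments such as `image_name` and `data_path`; check the environment classes under `intercode/envs` for the actual signatures.

```python
# Minimal sketch of driving the Bash environment programmatically.
# Constructor arguments and the step() return values are assumptions
# inferred from the interactive session above, not a verified API.
from intercode.envs import BashEnv

env = BashEnv(image_name="intercode-bash", data_path="./data/test/bash_queries.json")

env.reset()  # load a new task episode (must be called explicitly)
obs, reward, done, info = env.step("grep -r 'text file' /testbed")
print("Observation:", obs)

obs, reward, done, info = env.step("submit")  # ends the episode
print("Reward:", reward, "Info:", info)

env.close()
```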
The directory structure of this repository makes it easy to write and run your own experiments on any InterCode environment.
- The `models/` folder serves as the main store of logic for training or running inference on local models or API endpoints.
- The `experiments/` folder contains a number of examples of how agents and models defined in the `models/` folder can then be deployed onto an InterCode environment.

The current `experiments/` folder contains the code for experiments discussed in the InterCode paper, and can be invoked via the following call pattern from the root directory of this repository:
python -m experiments.<module name> <flags>
Our experiments utilize:
- an OpenAI API key to run GPT models (davinci, gpt-3.5, gpt-4)
- a Hugging Face access token and Inference Endpoint to run open-source models (Vicuna, StarChat, Falcon, etc.)
  - For StarChat we use the HuggingFaceH4/starchat-alpha model
  - For Vicuna we use the eachadea/legacy-vicuna-13b model
- an API key via Google MakerSuite to run PaLM-2 models (bison)
Depending on the models you wish to run, you need to include the respective key. To do so, create a `keys.cfg` file in the root directory of the repository. Then, copy and paste the following template into `keys.cfg` and fill it in with your desired keys:

```
# OPENAI_API_KEY: "" ## <Your OpenAI API Key here>
# PALM_API_KEY: "" ## <Your PaLM-2 API key here>
# HF_TOKEN: "" ## <Your Hugging Face access token here>
# HF_API_URL: "" ## <Your Hugging Face Endpoint URL here>
```
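The repository handles loading this file itself; purely as an illustration of the `KEY: "value"` format above, a hypothetical parser might look like this (not code from the repo):

```python
# Hypothetical illustration only -- not the repository's loader.
# Parses lines of the form: KEY: "value"  ## optional comment
import re
from pathlib import Path

def load_keys(path: str = "keys.cfg") -> dict:
    keys = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("##")[0].strip()      # drop trailing comments
        if not line or line.startswith("#"):   # skip blanks and commented-out entries
            continue
        match = re.match(r'(\w+):\s*"?([^"]*)"?\s*$', line)
        if match:
            keys[match.group(1)] = match.group(2)
    return keys

print(load_keys().get("OPENAI_API_KEY"))
```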
You can also export them as environment variables. For example, to set your OpenAI key on Windows:

```
setx OPENAI_API_KEY "<yourkey>"
echo %OPENAI_API_KEY%
```

and on Linux:

```
echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc
source ~/.zshrc
echo $OPENAI_API_KEY
```
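Either way, code running in the repository can pick the key up from the process environment at runtime; the snippet below is a generic illustration, not a file from this repo.

```python
# Generic illustration: read the key from the environment at runtime.
import os

openai_key = os.environ.get("OPENAI_API_KEY")
if not openai_key:
    raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to keys.cfg")
```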
The `bash` environment can be accessed via the `run_bash.py` script. The provided Dockerfile is written with `ubuntu` as the base image and sets up a file system compatible for testing commands from the NL2Bash dataset.
The SQL environment can be accessed via the `run_sql.py` script. The provided Dockerfile is written with `mysql` as the base image and sets up a set of MySQL databases that are compatible for testing commands from the Spider dataset.
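By analogy with the Bash example earlier, the SQL environment should be drivable programmatically as well; the class name `SqlEnv`, the image name, and the data path in this sketch are assumptions, so confirm the actual interface in the repository before relying on it.

```python
# Sketch only: class name, image name, data path, and step() signature
# are assumptions mirroring the Bash example, not a verified API.
from intercode.envs import SqlEnv

env = SqlEnv(image_name="intercode-sql", data_path="./data/test/sql_queries.json")
env.reset()
obs, reward, done, info = env.step("SHOW TABLES;")
print(obs)
env.step("submit")
env.close()
```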
Each CTF task has its own self-contained execution environment derived from `IntercodeEnv`. The task sets up this environment by loading a specific Docker image with a Bash shell. Following the task query, the agent starts in the `ctf` directory and tries to solve the challenge of finding the hidden flag. Once the agent is confident of a flag, it submits it to receive a reward.
- Action space: any command that can be run in a Bash shell + `submit`
- Rewards:
  - +1 for submitting the correct flag
  - 0 for submitting an incorrect flag
- Episode end:
  - Termination: happens when the agent finds and submits the correct flag
  - Truncation: happens when the number of turns in the episode exceeds 15 (can be configured)
- Tasks reference: PicoCTF
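For illustration only, an episode under these rules could be driven by a loop like the one below; the `CTFEnv` class name, its constructor arguments, and the scripted actions are placeholders rather than the repository's verified API.

```python
# Hypothetical episode loop for a CTF task: terminate on a correct flag,
# truncate after 15 turns. Class name, arguments, and actions are placeholders.
from intercode.envs import CTFEnv  # assumed class name

MAX_TURNS = 15
scripted_actions = ["ls", "cat flag.txt", "submit picoCTF{example_flag}"]  # stand-in for a real agent

env = CTFEnv(image_name="intercode-ctf", data_path="./data/ctf/ctf_tasks.json")
obs = env.reset()
reward = 0

for turn in range(MAX_TURNS):
    action = scripted_actions[min(turn, len(scripted_actions) - 1)]
    obs, reward, done, info = env.step(action)
    if done:  # episode terminates once the correct flag is submitted
        break

print("Final reward:", reward)  # 1 for the correct flag, 0 otherwise
env.close()
```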