5. Datasets 💽

John Yang edited this page Jun 27, 2023 · 1 revision

The main task paradigm that the Intercode environment currently supports is natural language (NL) query to code or answers. Datasets are loaded through the IntercodeDataLoader abstraction, which requires two fields to capture this task:

  • query: A human-readable question that specifies some desired standard output (e.g. cat a file) or environment modification (e.g. move files from one folder to another).
  • gold: A command that accomplishes the task conveyed by the query.
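As a rough illustration of the two fields, a pair of Bash-style records might look like the sketch below. The queries, file names, and commands are made up for this example and are not taken from any dataset shipped with Intercode.

```python
# Hypothetical records: each one pairs an NL query with a gold command
# that accomplishes the task the query describes.
records = [
    {
        # Query asking for some standard output.
        "query": "Print the contents of notes.txt",
        "gold": "cat notes.txt",
    },
    {
        # Query asking for an environment modification.
        "query": "Move every file in the drafts folder into the archive folder",
        "gold": "mv drafts/* archive/",
    },
]

# Every record carries exactly the two required fields, both as strings.
for record in records:
    assert set(record) == {"query", "gold"}
    assert all(isinstance(value, str) for value in record.values())
```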

The IntercodeDataLoader takes in a data_path as an argument. The data_path must point at a file that satisfies the following requirements:

  • Must be a csv, tsv, json, or pickle file
  • Must have the fields/columns query (str) and gold (str, executable code)
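The requirements above can be checked before handing a file to the loader. The following is a minimal sketch using only the Python standard library; `load_rows` and `check_dataset_file` are hypothetical helper names, not part of Intercode, and the sketch assumes that json and pickle files hold a list of records (the layout the loader itself expects may differ).

```python
import csv
import json
import pickle
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("query", "gold")


def load_rows(data_path):
    """Load records from a csv, tsv, json, or pickle file.

    Assumption: json and pickle files contain a list of dict records.
    """
    path = Path(data_path)
    suffix = path.suffix.lower()
    if suffix in (".csv", ".tsv"):
        delimiter = "," if suffix == ".csv" else "\t"
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f, delimiter=delimiter))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix in (".pkl", ".pickle"):
        with path.open("rb") as f:
            return pickle.load(f)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def check_dataset_file(data_path):
    """Raise ValueError if any record lacks a string query or gold field."""
    rows = load_rows(data_path)
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if not isinstance(row.get(field), str):
                raise ValueError(
                    f"Row {i}: field {field!r} is missing or not a string"
                )
    return len(rows)


# Usage: write a one-record json file to a temporary folder and validate it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text(
        json.dumps([{"query": "Print the contents of notes.txt",
                     "gold": "cat notes.txt"}]),
        encoding="utf-8",
    )
    num_rows = check_dataset_file(sample)
```

Running a check like this first surfaces a missing column or a non-string value as a clear error rather than as a failure partway through evaluation.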

Supported Datasets

The following existing datasets can be used for evaluation on the Intercode platform.

Dataset | Language | Website  | Scripts | File
Spider  | SQL      | Homepage | Link    | data/spider/dev_spider.json
NL2Bash | Bash     | Homepage | Link    | data/nl2bash/nl2bash.json