Data Processing in Details (General Format) #57
-
Single Tag Statistic: We start with using only GPT4-GPT4 conversations. We have a total of 90 scenarios, each with 5 dialogues. For easy scenarios, we observe:
-
One issue with our previous data is that some prompts exceed the maximum token size FastChat can finetune on without OOM. The tolerable token size is 2048, but our maximum prompt length is about 2900 tokens. Since we are finetuning the model rather than training from scratch, it is not harmful to remove the formatting string we append at the end of each prompt, which instructs the model on how the generation should be formatted. Once we removed the formatting portion, we still had about 95 data points over the tolerable token count. A sliding window is hence implemented for these long conversations. The main idea is:
For example, if the dialogue prompt has 10 turns and more than 2048 tokens, we first divide the prompt into context + dialogues. We keep the context unchanged and start with the first turn (#turn 0) in the dialogues. If the remaining total token count (num_token(context) + num_token(dialogues - first turn)) is less than 2048, then the truncated prompt is missing only the dialogue from turn 0. If the truncated prompt is still too long, we iterate to the next turn, #turn 1, and see how many tokens are left after removing the dialogues for turns 0 and 1. Once we reach the target token count, we stop removing dialogues and combine the context with the remaining dialogues to form the truncated prompt. Note that the tokens of the result field, which is the generation for each prompt dialogue, are not counted for each data point. Since none of the "to-be-generated" sentences is longer than 2048 tokens, we do not handle the extreme scenario where the prediction itself would exceed 2048 tokens. Formatting that is removed: "Your available action types are ... Please only generate a JSON string including the action type and the argument. ========================================================"
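Below is a minimal sketch of this sliding-window truncation. The `num_tokens` helper is a stand-in for the finetuning model's tokenizer (a real implementation would use something like `len(tokenizer.encode(text))`), and the way context and turns are joined is illustrative.

```python
# Sliding-window truncation sketch: drop the earliest dialogue turns until
# the prompt fits under the token budget.
MAX_TOKENS = 2048


def num_tokens(text: str) -> int:
    # Rough whitespace-based approximation, for illustration only.
    return len(text.split())


def truncate_prompt(context: str, turns: list[str], max_tokens: int = MAX_TOKENS) -> str:
    """Keep the context, drop the earliest turns until everything fits."""
    start = 0
    while start < len(turns):
        candidate = "\n".join([context] + turns[start:])
        if num_tokens(candidate) <= max_tokens:
            return candidate
        start += 1  # drop the next-earliest turn and retry
    # If even the final turn alone does not fit, fall back to the context only.
    return context
```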
-
Data Processing involves 4 main steps:
Select a tag and pull all episodelogs. For now, we mainly consider clean tags involving GPT4. The tag determines the models behind the LLM agents in the dialogues.
Filter the dialogues by quality, using the goal-achieving score per agent per scenario. Note that for each scenario, we want to guarantee that some dialogues are included in the dataset, so we specify a minimum number of dialogues per agent within a scenario. Currently we use half of the number of dialogues per scenario as the minimum, so for the GPT4-GPT4 tag, where each scenario has 5 dialogues, we require at least 2 dialogues per agent from the scenario.
Split the scenarios into train and test sets by scenario difficulty. Difficulty has been defined in redis; we collected 76 easy scenarios vs. 14 hard scenarios.
For each selected dialogue and agent position, convert the episodelog format of the dialogue into a model, prompt, and result format, or "completion format". The model is the LLM model of the agent of interest. The prompt concatenates the background of the dialogue with all previous conversation between the two agents up to a specific turn. The result is the to-be-predicted dialogue/action by the agent of interest, i.e., the next sentence the agent would say or the next action the agent would take, given all previous info in the prompt. We save each completion-format dialogue into a JSON file.
For step 4, note that for a given dialogue and agent of interest, multiple JSON files may be created, i.e., multiple data points. We only want to predict the next sentence/action by the agent of interest, but we want to predict all of that agent's sentences across sequential turns. So if the dialogue has 10 turns, 5 for each agent, then we generate 5 JSON files for this dialogue and the selected agent.
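An illustrative sketch of this conversion for one episode and one agent position is below. The `speaker_of` helper and the file-naming scheme are hypothetical; `context` and `turns` are assumed to come from the already-parsed episodelog.

```python
# Convert one episode into completion-format data points for one agent,
# producing one data point per turn spoken by that agent.
import json
from typing import Callable


def episode_to_datapoints(model: str, context: str, turns: list[str],
                          agent: str, speaker_of: Callable[[str], str]) -> list[dict]:
    datapoints = []
    for i, turn in enumerate(turns):
        if speaker_of(turn) != agent:
            continue  # only predict the turns produced by the agent of interest
        datapoints.append({
            "model": model,                              # LLM behind the agent
            "prompt": "\n".join([context] + turns[:i]),  # background + all prior turns
            "result": turn,                              # next utterance/action to predict
        })
    return datapoints


def save_datapoints(datapoints: list[dict], prefix: str) -> None:
    # One JSON file per data point, as described above.
    for k, dp in enumerate(datapoints):
        with open(f"{prefix}_{k}.json", "w") as f:
            json.dump(dp, f)
```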
For step 2, it is important that the selection is per scenario per agent position. We apply the filtering using the goal reward score for each agent, based on the distribution of scores per agent per scenario. Say the scenario has 5 dialogues in total; then each agent has 5 scores. We plot the distribution of scores for each agent and derive the average score per agent.
Then we rank the dialogues for both agents. We first select the top x dialogues for each agent, where x is the minimum number of dialogues we require per scenario. Then, for the remaining dialogues, at each rank i we look at the rank-i dialogue for agent 1 and the rank-i dialogue for agent 2. For each dialogue, we check whether its score is above min(7, avg agent score). 7 is the global score indicating good quality, derived from the distribution of goal scores across all scenarios and all agents; this number can be adjusted depending on the need.
If both dialogues at rank i satisfy the requirement, i.e., have scores above min(7, avg agent score), then we add both dialogue-agent pairs to the list. If either one does not satisfy the condition, we add neither.
As a concrete example, consider a scenario with 5 dialogues, where the ranking of dialogues by goal score from high to low is [5, 4, 3, 2, 1] for agent 1 and [1, 3, 2, 4, 5] for agent 2. Since we require at least two dialogues per agent, we first add (agent1, 5), (agent1, 4), (agent2, 1), (agent2, 3) to the scenario list.
Then we look at rank 3, which is (agent 1, dialogue 3) and (agent 2, dialogue 2). If score(agent1, dia3) > min(avg agent1 score, 7) and score(agent2, dia2) > min(avg agent2 score, 7), we add both (agent 1, 3) and (agent 2, 2); otherwise we add neither. By doing so, we guarantee every scenario has data points present in the dataset, and the dialogues from each agent position are balanced.
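The per-scenario selection can be sketched as follows, assuming the per-agent goal scores are available as dictionaries mapping dialogue id to score; the global threshold of 7 and the min() with the per-agent average follow the description above.

```python
# Per-scenario dialogue selection: always keep the top `min_per_agent`
# dialogues per agent, then keep a rank-i pair only if both dialogues
# clear their agent's threshold.
GLOBAL_THRESHOLD = 7.0


def select_dialogues(scores_a1: dict[str, float], scores_a2: dict[str, float],
                     min_per_agent: int) -> list[tuple[str, str]]:
    """Return (agent, dialogue_id) pairs selected for one scenario."""
    thr_a1 = min(GLOBAL_THRESHOLD, sum(scores_a1.values()) / len(scores_a1))
    thr_a2 = min(GLOBAL_THRESHOLD, sum(scores_a2.values()) / len(scores_a2))

    # Rank dialogue ids by score, highest first, separately for each agent.
    rank_a1 = sorted(scores_a1, key=scores_a1.get, reverse=True)
    rank_a2 = sorted(scores_a2, key=scores_a2.get, reverse=True)

    # Always keep the top `min_per_agent` dialogues for each agent.
    selected = [("agent1", d) for d in rank_a1[:min_per_agent]]
    selected += [("agent2", d) for d in rank_a2[:min_per_agent]]

    # For each remaining rank, keep the pair only if both dialogues pass
    # their agent's threshold; otherwise keep neither.
    for d1, d2 in zip(rank_a1[min_per_agent:], rank_a2[min_per_agent:]):
        if scores_a1[d1] > thr_a1 and scores_a2[d2] > thr_a2:
            selected.append(("agent1", d1))
            selected.append(("agent2", d2))
    return selected


# With min_per_agent = 2 and rankings [5, 4, 3, 2, 1] (agent 1) and
# [1, 3, 2, 4, 5] (agent 2), dialogues 5, 4 and 1, 3 are always kept, and the
# rank-3 pair (3, 2) is kept only if both dialogues clear their thresholds.
```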