
Data Upload Format

zephyr edited this page Jul 26, 2024 · 9 revisions

Data format - without pre-annotation

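Provide a jsonl file (it must be jsonl, not json). Each line is one complete JSON object; expanding an object across several lines for readability causes parsing to fail. See the code block below for the JSON structure. The prompt field holds the hint, question, or other guidance that changes from one item to the next.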

{
    "prompt": "评分指引:该问答格式正确,包含问答双方对象;该问答内容完整,符合事实以及逻辑条理\n\n该数据来源:<a href=\"https://www.fmprc.gov.cn/fyrbt_673021/jzhsl_673025/202101/t20210120_5419759.shtml\" target=\"_blank\">点击访问</a>",
    "conversation": [
        {
            "message_id": "deda3f77-677a-447d-9e8d-1e94faf86ca0",
            "content": "美国有线电视新闻网记者:蓬佩奥刚发表声明,代表美国国务院正式宣布美国认定中国在新疆针对维吾尔族穆斯林和其他少数民族犯有“种族灭绝”和“反人类罪行”,中方对此有何回应?有人说这一认定可能会造成国际上的反应,包括一些国家可能会抵制2022年北京冬奥会。",
            "message_type": "send", // 消息类型,send类型会显示原始格式,不进行md渲染;receive类型会进行md渲染。
            "user_id": "",
            "parent_id": null
        },
        {
            "message_id": "c2ea8b14-c2e8-4670-bb39-ba448cfefc44",
            "content": "华春莹:蓬佩奥在过去几年里撒了太多的谎,放了太多的毒。你提到蓬佩奥的所谓“认定”,只是其中一个荒唐的弥天大谎。",
            "message_type": "receive",
            "user_id": "", // 指给回复的是哪个角色或用户
            "parent_id": "deda3f77-677a-447d-9e8d-1e94faf86ca0", // 针对的是哪条send
        }
    ],
    "custom": {} // 可选,里面可以放置任何你想保留的数据,比如"custom": {"db_id": 1}
    "conversation_id": "e036bc85-30ce-4df5-84d9-23e619e8267a", // 可不填写,不建议用户导入时填写,用于记录对话id,导入与导出的id相同,用于方便定位是哪个对话
    "questionnaire_id": "e036bc85-30ce-4df5-84d9-23e619e8267a" // 可不填写,不建议用户导入时填写,指父级的题目ID(id一样会被视为一道题目)
}
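
The field rules above can be checked before upload. The following is a minimal Python sketch, not the tool's own validation: the function name `validate_record` is hypothetical, and it assumes that message IDs should be unique within a record and that a receive message appears after the send message its parent_id points to.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one conversation record."""
    problems = []
    seen_ids = set()   # every message_id seen so far
    send_ids = set()   # message_ids of "send" messages seen so far
    for msg in record.get("conversation", []):
        mid = msg.get("message_id")
        if not mid:
            problems.append("message without message_id")
            continue
        if mid in seen_ids:
            # Assumption: IDs are expected to be unique within one record.
            problems.append(f"duplicate message_id: {mid}")
        seen_ids.add(mid)

        mtype = msg.get("message_type")
        if mtype == "send":
            send_ids.add(mid)
        elif mtype == "receive":
            # parent_id names the send message this reply responds to.
            if msg.get("parent_id") not in send_ids:
                problems.append(f"receive message {mid} has no matching send parent")
        else:
            problems.append(f"unknown message_type: {mtype!r}")
    return problems
```

An empty list means the record passed these particular checks; it does not guarantee that the upload will be accepted.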

When uploading, each JSON object occupies exactly one line, for example:

{"prompt": "评分指引:该问答格式正确,包含问答双方对象;该问答内容完整,符合事实以及逻辑条理\n\n该数据来源:<a href=\"https://www.fmprc.gov.cn/fyrbt_673021/jzhsl_673025/202101/t20210120_5419759.shtml\" target=\"_blank\">点击访问</a>","conversation": [{"message_id": "deda3f77-677a-447d-9e8d-1e94faf86ca0","content": "美国有线电视新闻网记者:蓬佩奥刚发表声明,代表美国国务院正式宣布美国认定中国在新疆针对维吾尔族穆斯林和其他少数民族犯有“种族灭绝”和“反人类罪行”,中方对此有何回应?有人说这一认定可能会造成国际上的反应,包括一些国家可能会抵制2022年北京冬奥会。","message_type": "send","user_id": "","parent_id": null},{"message_id": "c2ea8b14-c2e8-4670-bb39-ba448cfefc44","content": "华春莹:蓬佩奥在过去几年里撒了太多的谎,放了太多的毒。你提到蓬佩奥的所谓“认定”,只是其中一个荒唐的弥天大谎。","message_type": "receive","user_id": "", "parent_id": "deda3f77-677a-447d-9e8d-1e94faf86ca0"} ]}
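
One way to produce such a file is to let a JSON library serialize each record without indentation. The sketch below uses Python's standard json module; the helper names `to_jsonl` and `check_jsonl` are hypothetical and only illustrate the one-object-per-line rule.

```python
import json


def to_jsonl(records):
    """Serialize records to JSONL text: one compact JSON object per line."""
    lines = []
    for record in records:
        # Without `indent`, json.dumps emits no raw newlines; newlines inside
        # strings are escaped as \n. ensure_ascii=False keeps non-ASCII text readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def check_jsonl(text):
    """Return the 1-based numbers of non-empty lines that are not JSON objects."""
    bad = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            if not isinstance(json.loads(line), dict):
                bad.append(number)
        except json.JSONDecodeError:
            bad.append(number)
    return bad
```

Running `check_jsonl` on a pretty-printed JSON document reports every line as invalid, which is the failure mode described above for expanded, multi-line JSON.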

Uploading images, audio, and video

Follow the Markdown syntax for inserting images, audio, and video: put the URL of the file you want to upload into the jsonl file and set message_type to receive.

e.g. to upload an image, use the following format: ![](valid image URL)
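
As an illustration, a message carrying an image could be built like this. This is a sketch only: the helper `image_message` and the example URL are hypothetical, and only the image syntax shown above is covered.

```python
def image_message(message_id, parent_id, image_url, alt_text=""):
    """Build a receive message whose content is a Markdown image."""
    return {
        "message_id": message_id,
        # Markdown image syntax: ![alt text](url)
        "content": f"![{alt_text}]({image_url})",
        # Only receive messages are rendered as Markdown.
        "message_type": "receive",
        "user_id": "",
        "parent_id": parent_id,
    }


msg = image_message("m2", "m1", "https://example.com/cat.png")
```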

Data format - with pre-annotation

Provide a jsonl file in which every line is one JSON object. The tool configuration and the pre-annotation content must match; see the example below.

1. Tool configuration

1.1 Page configuration (creating a new task)

image

1.2 JSON configuration

{
  "conversation": {    //针对整段对话
    "questions": [
      {
        "label": "最强法师",    //显示名称
        "value": "zq",    //保存结果
        "type": "enum",    //enum单选、array多选、string文本 
        "options":    //选项
          {
            "label": "入云龙公孙胜",    //显示名称
            "value": "gss",    //保存结果
            "id": "_n1sos92la"    //唯一id,必填
          },
          {
            "label": "灵感真人乔道清",
            "value": "qdq",
            "id": "_2x380tep8"
          }
        ],
        "id": "_zso4cs7fg"    //唯一id,必填
      },
      {
        "label": "战士",
        "value": "zs",
        "type": "enum",
        "options": [
          {
            "label": "林冲",
            "value": "lc",
            "id": "_xpvd2z0fj",
            "is_default": false    //是否默认选项
          },
          {
            "label": "卢俊义",
            "value": "ljy",
            "id": "_0p0lpq1dd",
            "is_default": true    //是否默认选项
          }
        ],
        "id": "_u6sos4msf",
        "conditions": [    //前置条件
          {
            "value": "gss",
            "question_id": "_zso4cs7fg",
            "option_id": "_n1sos92la",
            "field": "zq"
          }
        ]
      },
      {
        "label": "远程",
        "value": "yc",
        "type": "enum",
        "options": [
          {
            "label": "小李广花荣",
            "value": "hr",
            "id": "_46yqnuwyn",
            "is_default": false
          },
          {
            "label": "张清",
            "value": "zq",
            "id": "_ox5l4ml7l",
            "is_default": true
          }
        ],
        "id": "_5az1k88a9",
        "conditions": [
          {
            "value": "ljy",
            "field": "zs"
          }
        ]
      },
      {
        "label": "女战士",
        "value": "nzs",
        "type": "enum",
        "options": [
          {
            "label": "扈三娘",
            "value": "hsn",
            "id": "_2xaw9n1zg"
          },
          {
            "label": "孙二娘",
            "value": "sen",
            "id": "_b744mkxsu"
          }
        ],
        "id": "_q61f9guik",
        "conditions": [
          {
            "value": "zq",
            "field": "yc"
          }
        ]
      },
      {
        "label": "请描述他们的技能",
        "value": "jn",
        "type": "string",
        "max_length": 1000,    //最大字数
        "id": "_7vuiz8ew4",
        "conditions": [
          {
            "value": "sen",
            "field": "nzs"
          },
          {
            "value": "zq",
            "field": "yc"
          }
        ]
      }
    ]
  },
  "message": {    //针对回答
    "questions": [
      {
        "label": "最强法师",
        "value": "fa",
        "type": "enum",
        "options": [
          {
            "label": "入云龙公孙胜",
            "value": "gss",
            "id": "_cgh4hi8h4"
          },
          {
            "label": "灵感真人乔道清",
            "value": "qdq",
            "id": "_szttkm2lz"
          }
        ],
        "id": "_dk5kn4coh"
      },
      {
        "label": "战士",
        "value": "zs",
        "type": "enum",
        "options": [
          {
            "label": "林冲",
            "value": "lc",
            "id": "_iu9liud2g"
          },
          {
            "label": "卢俊义",
            "value": "ljy",
            "id": "_g8toyv3f6"
          }
        ],
        "id": "_d1l7tdaqi"
      },
      {
        "label": "远程",
        "value": "yc",
        "type": "enum",
        "options": [
          {
            "label": "小李广花荣",
            "value": "hr",
            "id": "_jgrneog28"
          },
          {
            "label": "张清",
            "value": "zq",
            "id": "_ej98v1ggd"
          }
        ],
        "id": "_bzmz7y0w8"
      },
      {
        "label": "女战士",
        "value": "nzs",
        "type": "enum",
        "options": [
          {
            "label": "扈三娘",
            "value": "hsn",
            "id": "_0obtedcyl"
          },
          {
            "label": "孙二娘",
            "value": "sen",
            "id": "_xjvxptat7"
          }
        ],
        "id": "_gx2t6kj33"
      }
    ],
    "is_sortable": true,    //排序
    "is_sn_unique": false    //同一序号可重复选择
  }
}
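
A configuration like the one above can be sanity-checked before it is used. The following Python sketch is illustrative rather than the tool's actual validation. The function name `check_config` is hypothetical, and it assumes that ids must be unique across the whole configuration and that a condition's field names a question's value within the same section, with the condition's value being one of that question's option values.

```python
def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a tool configuration."""
    problems = []
    seen_ids = set()
    for scope in ("conversation", "message"):
        questions = config.get(scope, {}).get("questions", [])
        # Map each question's result key to the set of its option values.
        options_by_field = {
            q["value"]: {o["value"] for o in q.get("options", [])}
            for q in questions
        }
        for q in questions:
            # Every question and every option needs an id.
            for item in [q] + q.get("options", []):
                item_id = item.get("id")
                if not item_id:
                    problems.append(f"{scope}: {item.get('value')!r} has no id")
                elif item_id in seen_ids:
                    problems.append(f"{scope}: duplicate id {item_id}")
                else:
                    seen_ids.add(item_id)
            # Each precondition should point at a known question and option.
            for cond in q.get("conditions", []):
                field = cond.get("field")
                if field not in options_by_field:
                    problems.append(f"{scope}: condition on unknown field {field!r}")
                elif cond.get("value") not in options_by_field[field]:
                    problems.append(
                        f"{scope}: {cond.get('value')!r} is not an option of {field!r}"
                    )
    return problems
```

The `//` comments in the example are annotations for the reader and are not valid JSON, so they have to be removed before the configuration is loaded with a JSON parser and passed to a check like this.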

1.3 Preview

image

2. Pre-annotation

{
    "prompt": "评分指引:该问答格式正确,包含问答双方对象;该问答内容完整,符合事实以及逻辑条理\n\n该数据来源:<a href=\"https://www.fmprc.gov.cn/fyrbt_673021/jzhsl_673025/202101/t20210120_5419759.shtml\" target=\"_blank\">点击访问</a>",
    "conversation": [
        {
            "message_id": "deda3f77-677a-447d-9e8d-1e94faf86ca0",
            "content": "美国有线电视新闻网记者:蓬佩奥刚发表声明,代表美国国务院正式宣布美国认定中国在新疆针对维吾尔族穆斯林和其他少数民族犯有“种族灭绝”和“反人类罪行”,中方对此有何回应?有人说这一认定可能会造成国际上的反应,包括一些国家可能会抵制2022年北京冬奥会。",
            "message_type": "send",
            "user_id": "",
            "parent_id": null
        },
        {
            "message_id": "c2ea8b14-c2e8-4670-bb39-ba448cfefc44",
            "content": "华春莹:蓬佩奥在过去几年里撒了太多的谎,放了太多的毒。你提到蓬佩奥的所谓“认定”,只是其中一个荒唐的弥天大谎。",
            "message_type": "receive",
            "user_id": "",
            "parent_id": "deda3f77-677a-447d-9e8d-1e94faf86ca0"
        }
    ],
    "custom": {} // 可选,里面可以放置任何你想保留的数据,比如"custom": {"db_id": 1}
    "conversation_id": "e036bc85-30ce-4df5-84d9-23e619e8267a", // 可选,可以不带这个字段,用于记录对话id,导入与导出的id相同,用于方便定位是哪个对话
    "reference_evaluation": {
        // conversation configuration
        "message_evaluation": {
            // messageId
            "deda3f77-677a-447d-9e8d-1e94faf86ca0": {
                "sort": 1,
                "fa": "gss",
                "zs": "lc",
                "yc": "zq",
                "nzs": "hsn"
            },
            "c2ea8b14-c2e8-4670-bb39-ba448cfefc44": {
                "sort": 2,
                "fs": "gss",
                "zs": "lc",
                "yc": "zq",
                "nzs": "hsn"
            }
        },
        
        "conversation_evaluation": {
            "fs": "gss",
            // the value depends on the question's type in the JSON configuration; for type array it must be an array
            "zs": ["lc"],
            "yc": ["zq"],
            "nzs": ["hsn"]
            "jn": "杀人放火金腰带 修桥补路无人埋"
        },
        // if this option is set, none of the options above need to be set
        "questionnaire_evaluation": {
            "is_invalid_questionnaire": true
        }
    }
}
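
Because the pre-annotation has to match the tool configuration, it can help to compare the two before uploading. The sketch below is one possible check, not the tool's own logic. The function name `check_evaluation` is hypothetical, and it assumes that each key in an evaluation object is a question's value from the matching section of the configuration, and that sort is an ordering field rather than a question.

```python
def check_evaluation(questions: list[dict], evaluation: dict) -> list[str]:
    """Compare one evaluation object against the questions of one config section."""
    problems = []
    by_key = {q["value"]: q for q in questions}
    for key, answer in evaluation.items():
        if key == "sort":
            # Assumption: ordering field used in message_evaluation, not a question.
            continue
        question = by_key.get(key)
        if question is None:
            problems.append(f"unknown key {key!r}")
            continue
        allowed = {o["value"] for o in question.get("options", [])}
        qtype = question.get("type")
        if qtype == "enum":
            # Single choice: one option value as a string.
            if not isinstance(answer, str) or answer not in allowed:
                problems.append(f"{key!r}: expected one of {sorted(allowed)}")
        elif qtype == "array":
            # Multiple choice: a list of option values.
            if not isinstance(answer, list) or not all(
                isinstance(a, str) and a in allowed for a in answer
            ):
                problems.append(f"{key!r}: expected a list drawn from {sorted(allowed)}")
        elif qtype == "string":
            limit = question.get("max_length")
            if not isinstance(answer, str) or (limit is not None and len(answer) > limit):
                problems.append(f"{key!r}: expected text within the length limit")
    return problems
```

Here `questions` would be the questions list of the message section when checking an entry of message_evaluation, and that of the conversation section when checking conversation_evaluation.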