chatbot-api

See examples.ipynb for request examples.

Supported models and their config files:

llama: cfgs/llama-7b.json
llama with lora: cfgs/llama-7b-lora.json
chatglm: cfgs/chatglm-6b.json
InstructGLM: cfgs/chatglm-6b-alpaca-lora.json
blip2chatglm: cfgs/blip2zh-chatglm-6b.json

Setup

conda create -n llmapi python=3.8
conda activate llmapi
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

Run

uvicorn src:app --reload

chatbot-api supports model scheduling:

Idle model instances are closed.
New model instances are created when there are too many concurrent requests.

Edit sched_config.json to change the scheduling strategy and the model instances.

A typical config is shown below. The // comments are explanatory only; standard JSON does not allow comments.

{
    "idle_check_period": 120,               // check idle models and close them every 120 seconds
    "models": {
        "blip2zh-chatglm-6b": {             // model name must match the config filename under cfgs/
            "max_instances": 1,             // at most 1 instance will be created
            "idle_time": 3600,              // if there is no request for 1 hour, the instance is closed
            "create_threshold": {           // if 5 requests for blip2zh-chatglm-6b arrive within 5 seconds,
                "n_requests": 5,            //    1 more instance is created (not exceeding max_instances)
                "delay": 5
            }
        }
    }
}
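
The scheduling rules in this config can be read as a sliding-window check plus an idle timeout. The following is a minimal sketch of that reading, not the code used by chatbot-api: the class ModelScaler and its method names are hypothetical, and it assumes one instance is already running and that the request window resets after a new instance is created.

```python
import time
from collections import deque
from typing import Deque, Optional


class ModelScaler:
    """Hypothetical helper mirroring one entry of "models" in sched_config.json."""

    def __init__(self, max_instances: int, idle_time: float,
                 n_requests: int, delay: float) -> None:
        self.max_instances = max_instances
        self.idle_time = idle_time
        self.n_requests = n_requests
        self.delay = delay
        self.instances = 1                      # assumption: one live instance at start
        self._arrivals: Deque[float] = deque()  # timestamps of recent requests

    def on_request(self, now: Optional[float] = None) -> bool:
        """Record a request; return True if one more instance should be created."""
        now = time.monotonic() if now is None else now
        self._arrivals.append(now)
        # Keep only arrivals inside the last `delay` seconds.
        while now - self._arrivals[0] > self.delay:
            self._arrivals.popleft()
        if (len(self._arrivals) >= self.n_requests
                and self.instances < self.max_instances):
            self.instances += 1
            self._arrivals.clear()              # assumption: fresh window after scaling up
            return True
        return False

    def is_idle(self, last_request: float, now: Optional[float] = None) -> bool:
        """True if no request has arrived for at least `idle_time` seconds."""
        now = time.monotonic() if now is None else now
        return now - last_request >= self.idle_time
```

With max_instances set to 1, as in the config above, on_request never asks for a second instance; raising max_instances to 2 lets the fifth request inside a 5-second window trigger one.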

Format

Request format

{
  "model": "chatglm-6b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true,
  "max_tokens": 1024
}
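
A request body of this shape can be built and posted with the Python standard library. This is a rough sketch under stated assumptions: the base URL is uvicorn's default bind address, the ENDPOINT path is a placeholder that is not taken from this README (examples.ipynb has the real one), and send() assumes a non-streaming JSON reply, so the payload defaults to "stream": false.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port
ENDPOINT = "/chat/completions"      # hypothetical path; check examples.ipynb


def build_payload(model: str, user_text: str,
                  stream: bool = False, max_tokens: int = 1024) -> Dict[str, Any]:
    """Build a request body with the fields shown in the request format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
        "max_tokens": max_tokens,
    }


def send(payload: Dict[str, Any], url: str = BASE_URL + ENDPOINT) -> Dict[str, Any]:
    """POST the payload as JSON and decode a non-streaming JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_payload("chatglm-6b", "Hello!")) needs a running server started with the uvicorn command from the Run section.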

Response format

A typical response:

{
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! How can I help you today?"}}]
}
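
To pull the assistant text out of a response of this shape, something like the following may be enough. It is a sketch that assumes every entry in "choices" has the fields shown in the typical response; the function name assistant_replies is made up for illustration.

```python
import json
from typing import List


def assistant_replies(response_text: str) -> List[str]:
    """Return the assistant messages from a response body, ordered by index."""
    data = json.loads(response_text)
    choices = sorted(data.get("choices", []), key=lambda c: c.get("index", 0))
    return [
        choice["message"]["content"]
        for choice in choices
        if choice.get("message", {}).get("role") == "assistant"
    ]


# The typical response from this README, as a JSON string.
SAMPLE = (
    '{"choices": [{"index": 0, "message": {"role": "assistant", '
    '"content": "Hello! How can I help you today?"}}]}'
)
```

Here assistant_replies(SAMPLE) returns a one-element list holding the greeting.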

You may refer to examples.ipynb for more examples.
