When running on multiple GPUs, the answers are wrong and responses are slower than on a single GPU #499
Closed
daiweiaaaa
started this conversation in
Bad Case
Replies: 1 comment
This is similar to a related discussion raised earlier; please bring it up in #310.
Steps to reproduce
Modify ChatGLM3/openai_api_demo/openai_api.py:
from utils import process_response, generate_chatglm3, generate_stream_chatglm3, load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
Start the model and call the API:
curl --location --request POST 'http://127.0.0.1:8001/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: */*' \
--data-raw '{
"model": "chatglm3-6b",
"messages": [
{
"role": "system",
"content": "You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user'\''s instructions carefully. Respond using markdown."
},
{
"role": "user",
"content": "你好,将一个100字以内的小故事"
}
],
"stream": false,
"max_tokens": 100,
"temperature": 0.8,
"top_p": 0.8
}'
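For anyone who prefers to reproduce this from Python rather than curl, the same request can be sketched with the standard library only. This is a minimal sketch, not part of the original report: the helper names `build_payload` and `chat` are hypothetical, and the URL, model name, and sampling parameters are simply copied from the curl command above.

```python
import json
import urllib.request

# Endpoint copied from the curl command in this report; adjust if the server differs.
API_URL = "http://127.0.0.1:8001/v1/chat/completions"


def build_payload(user_prompt, max_tokens=100, temperature=0.8, top_p=0.8):
    """Build the same non-streaming JSON body that the curl command sends."""
    return {
        "model": "chatglm3-6b",
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are ChatGLM3, a large language model trained by Zhipu.AI. "
                    "Follow the user's instructions carefully. Respond using markdown."
                ),
            },
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def chat(user_prompt, url=API_URL, timeout=120):
    """POST the payload to the server and return the decoded JSON response."""
    body = json.dumps(build_payload(user_prompt), ensure_ascii=False).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `chat("你好,将一个100字以内的小故事")["choices"][0]["message"]["content"]` should return the assistant text, which makes it easy to compare the single-GPU and multi-GPU outputs side by side.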
Response
{"model":"chatglm3-6b","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"魔鬼边行Mapping steps步骤之一( genascid哥旁k魔力@work CM/byteer-lessons有过的基础燕糊-疑似一点狡派�y家\u0002” prioriting prioritypatterned�snt流s链艺术家的... le铅承载着痛引抱着实例alize\u0002“ holdingste额外热点–ballhouse和生ly利分员催家征精神\u0002"恶用心摘要ivelyelse\u0002ider\u0002 At least chain-edinit\u0002","name":null,"function_call":null},"finish_reason":"stop"}],"created":1701415694,"usage":{"prompt_tokens":53,"total_tokens":153,"completion_tokens":100}}
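The response above contains control characters (`\u0002`) and replacement characters (U+FFFD) mixed into the text, which is a quick way to flag this kind of garbled output automatically. The following is a rough heuristic sketch, not a diagnosis of the multi-GPU problem; the function names `find_suspicious_chars` and `summarize` are hypothetical, and the field names are taken from the response shown above.

```python
def find_suspicious_chars(text):
    """Return the set of control characters and U+FFFD found in text."""
    suspicious = set()
    for ch in text:
        # U+FFFD marks bytes that could not be decoded; code points below
        # U+0020 other than tab, newline, and carriage return are control characters.
        if ch == "\ufffd" or (ord(ch) < 0x20 and ch not in "\t\n\r"):
            suspicious.add(ch)
    return suspicious


def summarize(response):
    """Run basic sanity checks on a chat.completion response dict."""
    content = response["choices"][0]["message"]["content"]
    usage = response["usage"]
    return {
        # Sorted list of unexpected characters found in the assistant text.
        "suspicious_chars": sorted(find_suspicious_chars(content)),
        # True when prompt and completion token counts add up to the total.
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"]
            == usage["total_tokens"]
        ),
    }
```

Applied to the response in this report, the usage counts are consistent (53 + 100 = 153), so only the generated text itself looks wrong, not the token accounting.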
GPU model