Replies: 1 comment
-
nvm, the `input_ids_seq_length` issue turned out to be caused by the `past_key_values` parameter in `stream_chat`; it doesn't seem to be related to the CUDA error.
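If the questions in the loop described in the quoted post below are independent of each other, one way to keep `input_ids` from growing is simply not to carry `history`/`past_key_values` across calls. A minimal sketch, assuming the ChatGLM3 `stream_chat` keyword arguments `history` and `past_key_values` (`model`, `tokenizer`, and `questions` are placeholders, not from the post):

```python
for question in questions:
    # Reset the conversation state for every question so input_ids stays short
    # instead of accumulating all previous questions and answers.
    for response, history in model.stream_chat(
        tokenizer, question,
        history=[],            # fresh history for each call
        past_key_values=None,  # no reused KV cache
    ):
        pass                   # consume the stream; `response` holds the text so far
    print(response)            # final answer for this question
```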
-
Hi, I use an outer loop to feed GLM3 a series of questions through `stream_chat`:
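(The loop itself was not included in the post; the sketch below is only a guess at what it might look like, assuming the ChatGLM3 `stream_chat` keyword arguments `history`, `past_key_values`, and `return_past_key_values`, and a hypothetical `questions` list of 245 prompts.)

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda().eval()

history, past_key_values = [], None
for i, question in enumerate(questions, start=1):   # `questions`: hypothetical list of 245 prompts
    print(f"Patients: {i}/{len(questions)}")
    # Carrying history/past_key_values across iterations makes every earlier
    # question and answer part of the next call's input, so input_ids keeps growing.
    for response, history, past_key_values in model.stream_chat(
        tokenizer, question,
        history=history,
        past_key_values=past_key_values,
        return_past_key_values=True,
    ):
        pass                                          # consume the stream
```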
Then, at around the 114th input, the following warning appeared:
Patients: 114/245 Input length of input_ids is 32802, but `max_length` is set to 32768. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
It was followed by a series of CUDA index-out-of-bounds errors (truncated; the same assertion repeats for threads [33,0,0] through [38,0,0] and beyond): ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4100,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed. ....
I tried adjusting `max_length` and `max_new_tokens` as the warning suggests, but the problem persists. I then added a print around `batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]` in the `stream_generate()` function of `modeling_chatglm.py`, and found that `input_ids_seq_length`, i.e. `input_ids.shape[-1]`, grows from about 360 to over 30,000. I don't remember running into this with GLM2, so I added the same print there as well: the value stays roughly between 400 and 600 instead of growing steadily the way it does with GLM3. Also, I first noticed this on a fully fine-tuned GLM3 model, but the original GLM3 model shows exactly the same behavior. I'm not sure whether this is related to the earlier CUDA error report?
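For reference, the debug print described above amounts to something like the following inside `stream_generate()` in `modeling_chatglm.py` (the first line is the one already present in that function; only the `print` is added, and the observed values are those reported above):

```python
batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
print(f"input_ids_seq_length = {input_ids_seq_length}")  # GLM3: grows from ~360 to 30k+; GLM2: stays around 400-600
```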
Originally posted by @10cent01 in #393 (comment)