[2024-02-04 17:56:47,007] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py311_cu116 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py311_cu116 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py311_cu116 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py311_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py311_cu116/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Traceback (most recent call last):
File "/home/workspace/ChatGLM-Finetuning/train.py", line 234, in <module>
main()
File "/home/workspace/ChatGLM-Finetuning/train.py", line 178, in main
model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1186, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1263, in _configure_basic_optimizer
optimizer = FusedAdam(
^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
op_module = load(name=self.name,
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 573, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1233, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /root/.cache/torch_extensions/py311_cu116/fused_adam/fused_adam.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
Loading extension module fused_adam...
Traceback (most recent call last):
File "/home/workspace/ChatGLM-Finetuning/train.py", line 234, in <module>
main()
File "/home/workspace/ChatGLM-Finetuning/train.py", line 178, in main
model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1186, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1263, in _configure_basic_optimizer
optimizer = FusedAdam(
^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
op_module = load(name=self.name,
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 573, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1233, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /root/.cache/torch_extensions/py311_cu116/fused_adam/fused_adam.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
Loading extension module fused_adam...
Loading extension module fused_adam...
Traceback (most recent call last):
File "/home/workspace/ChatGLM-Finetuning/train.py", line 234, in <module>
main()
File "/home/workspace/ChatGLM-Finetuning/train.py", line 178, in main
model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1186, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1263, in _configure_basic_optimizer
optimizer = FusedAdam(
^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
op_module = load(name=self.name,
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 573, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1233, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /root/.cache/torch_extensions/py311_cu116/fused_adam/fused_adam.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
Traceback (most recent call last):
File "/home/workspace/ChatGLM-Finetuning/train.py", line 234, in <module>
main()
File "/home/workspace/ChatGLM-Finetuning/train.py", line 178, in main
model, optimizer, _, lr_scheduler = deepspeed.initialize(model=model, args=args, config=ds_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1186, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1263, in _configure_basic_optimizer
optimizer = FusedAdam(
^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/adam/fused_adam.py", line 94, in __init__
fused_adam_cuda = FusedAdamBuilder().load()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 446, in load
return self.jit_load(verbose)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 489, in jit_load
op_module = load(name=self.name,
^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 573, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 1233, in create_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /root/.cache/torch_extensions/py311_cu116/fused_adam/fused_adam.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
[2024-02-04 17:56:50,782] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 30665
[2024-02-04 17:56:50,797] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 30666
[2024-02-04 17:56:50,807] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 30667
[2024-02-04 17:56:50,817] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 30668
[2024-02-04 17:56:50,818] [ERROR] [launch.py:321:sigkill_handler] ['/root/miniconda3/envs/chatglm/bin/python', '-u', 'train.py', '--local_rank=3', '--train_path', 'data/d2q_0.json', '--model_name_or_path', 'chatglm3-6b/', '--per_device_train_batch_size', '1', '--max_len', '1560', '--max_src_len', '1024', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm3', '--train_type', 'lora', '--freeze_module_name', 'layers.27.,layers.26.,layers.25.,layers.24.', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--output_dir', './output-glm3'] exits with return code = 1
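
Note on the `undefined symbol` in the ImportError: the name is an Itanium C++ ABI mangled symbol, and decoding it shows which library is expected to provide it. Below is a minimal sketch of a decoder, assuming only the small subset of the mangling scheme that this one symbol uses; `demangle_nested` is a hypothetical helper written for illustration, not part of DeepSpeed or PyTorch, and a real tool such as `c++filt` covers the full scheme.

```python
import re


def demangle_nested(symbol: str) -> str:
    """Decode a simple Itanium C++ ABI nested name of the form _ZN...E<params>.

    Supports only what the symbol in this log needs: an optional 'St'
    (std::) prefix, length-prefixed identifiers, and a 'v' parameter
    list (a function taking no arguments).
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    rest = symbol[3:]
    parts = []
    if rest.startswith("St"):  # 'St' abbreviates the std namespace
        parts.append("std")
        rest = rest[2:]
    while rest and rest[0] != "E":
        match = re.match(r"\d+", rest)  # decimal length of the next identifier
        if match is None:
            raise ValueError(f"unsupported encoding at: {rest!r}")
        length = int(match.group())
        start = match.end()
        parts.append(rest[start:start + length])
        rest = rest[start + length:]
    if not rest.startswith("E"):
        raise ValueError("missing 'E' terminator for nested name")
    if rest[1:] != "v":
        raise ValueError("only an empty parameter list ('v') is supported")
    return "::".join(parts) + "()"


# Symbol copied verbatim from the ImportError above.
SYMBOL = "_ZNSt15__exception_ptr13exception_ptr9_M_addrefEv"
print(demangle_nested(SYMBOL))
```

The decoded name, `std::__exception_ptr::exception_ptr::_M_addref()`, is in the `std` namespace and is an internal of the GNU C++ standard library (libstdc++). This suggests, though the log alone does not confirm it, that the cached `fused_adam.so` was built against a different libstdc++ than the one loaded at runtime.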