
Running the Autobucketing BERT example on an EC2 instance throws a process error #1001

Open

hasadata opened this issue Oct 4, 2024 · 0 comments
hasadata commented Oct 4, 2024

I am trying to run the Autobucketing BERT example from the NeuronX docs on an EC2 instance (R7A.xlarge2).

I get the following error:

Traceback (most recent call last):
  File "/home/ec2-user/random1234/autobucketing.py", line 83, in <module>
    bucket_trace_neuron = torch_neuronx.bucket_model_trace(get_bert_model, [paraphrase_s128,paraphrase_s512], bucket_config)
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch_neuronx/xla_impl/bucket_trace.py", line 358, in bucket_model_trace
    xmp.spawn(
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 393, in spawn
    return torch.multiprocessing.start_processes(
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch/multiprocessing/spawn.py", line 140, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGSEGV
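For context, the crash happens inside the `bucket_model_trace` call at line 83 of my script. The names `paraphrase_s128` and `paraphrase_s512` in the traceback are the two example inputs, one per bucket. Below is a minimal sketch of how I build those inputs; the model/tokenizer name and the sentence pair are placeholders following the public BERT MRPC example, not necessarily the exact values in my script:

```python
# Hedged sketch: construct the two bucket example inputs referenced in the
# traceback (paraphrase_s128, paraphrase_s512). The tokenizer name and the
# sentence pair below are placeholders, not taken verbatim from my script.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")

def encode(seq_len: int):
    # Tokenize the same sentence pair padded to a fixed sequence length,
    # so each bucket gets inputs of a single static shape.
    enc = tokenizer(
        "The company HuggingFace is based in New York City",
        "HuggingFace's headquarters are situated in Manhattan",
        max_length=seq_len,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    # Presumably a tuple of tensors per bucket input, as with torch_neuronx.trace.
    return enc["input_ids"], enc["attention_mask"], enc["token_type_ids"]

paraphrase_s128 = encode(128)   # small-sequence bucket
paraphrase_s512 = encode(512)   # large-sequence bucket
```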

Installed packages: neuronx-cc==2.*, torch-neuronx==1.13.*, torchvision
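Not part of the original report, but a quick way to confirm which wheel versions those pins actually resolved to inside the virtualenv (package names taken from the pins above plus torch/torch-xla, which appear in the traceback paths):

```python
# Print the installed versions of the relevant distributions, if present.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("neuronx-cc", "torch-neuronx", "torch", "torch-xla"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```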
