
Running the Autobucketing BERT example on an EC2 instance throws a process error #1001

Open

hasadata opened this issue Oct 4, 2024 · 0 comments
hasadata commented Oct 4, 2024

I am trying to run the Autobucketing BERT example from the NeuronX docs on an EC2 instance (R7A.xlarge2).

I get the following error:

Traceback (most recent call last):
  File "/home/ec2-user/random1234/autobucketing.py", line 83, in <module>
    bucket_trace_neuron = torch_neuronx.bucket_model_trace(get_bert_model, [paraphrase_s128,paraphrase_s512], bucket_config)
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch_neuronx/xla_impl/bucket_trace.py", line 358, in bucket_model_trace
    xmp.spawn(
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 393, in spawn
    return torch.multiprocessing.start_processes(
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/ec2-user/random1234/aws_neuron_venv_pytorch/lib64/python3.9/site-packages/torch/multiprocessing/spawn.py", line 140, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGSEGV
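For context, the crash happens inside the `bucket_model_trace` call at line 83 of my script. The names `paraphrase_s128` and `paraphrase_s512` in the traceback are the two example inputs, one per bucket. Below is a minimal sketch of how I build those inputs; the model/tokenizer name and the sentence pair are placeholders following the public BERT MRPC example, not necessarily the exact values in my script:

```python
# Hedged sketch: construct the two bucket example inputs referenced in the
# traceback (paraphrase_s128, paraphrase_s512). The tokenizer name and the
# sentence pair below are placeholders, not taken verbatim from my script.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")

def encode(seq_len: int):
    # Tokenize the same sentence pair padded to a fixed sequence length,
    # so each bucket gets inputs of a single static shape.
    enc = tokenizer(
        "The company HuggingFace is based in New York City",
        "HuggingFace's headquarters are situated in Manhattan",
        max_length=seq_len,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    # Presumably a tuple of tensors per bucket input, as with torch_neuronx.trace.
    return enc["input_ids"], enc["attention_mask"], enc["token_type_ids"]

paraphrase_s128 = encode(128)   # small-sequence bucket
paraphrase_s512 = encode(512)   # large-sequence bucket
```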

Installed packages: neuronx-cc==2.*, torch-neuronx==1.13.*, torchvision
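Not part of the original report, but a quick way to confirm which wheel versions those pins actually resolved to inside the virtualenv (package names taken from the pins above plus torch/torch-xla, which appear in the traceback paths):

```python
# Print the installed versions of the relevant distributions, if present.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("neuronx-cc", "torch-neuronx", "torch", "torch-xla"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```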
