C/C++ Pointer error when running container with apptainer #225

jthet opened this issue Sep 15, 2023 · 2 comments

jthet commented Sep 15, 2023

I've been getting errors when running the MTUQ container on TACC's Frontera through Apptainer. The errors have been nondeterministic, but they have always occurred after the third "about 75 percent finished" message. The stdout from one run is below; I have also seen errors such as malloc(): invalid size (unsorted), double free or corruption (out), and corrupted size vs. prev_size in fastbins.

The sif image was freshly pulled and it is the newest version.

c202-001[clx](423)$ APPTAINERENV_SYNGINE_CACHE=syngine_output ibrun apptainer run mtuq_ubuntu20.04.sif python3 /home/scoped/mtuq/examples/DetailedAnalysis.py
TACC:  Starting up job 5796560 
TACC:  Starting parallel tasks... 
  about 0 percent finished
  about 25 percent finished
  about 50 percent finished
  about 75 percent finished
  about 0 percent finished
  about 25 percent finished
  about 50 percent finished
  about 75 percent finished
  about 0 percent finished
  about 25 percent finished
  about 50 percent finished
  about 75 percent finished
free(): invalid pointer

rmodrak commented Sep 15, 2023

Thanks for reporting this issue, which I wasn't aware of previously.

The progress messages you mentioned are from the following Cython function:
https://github.com/uafgeotools/mtuq/blob/master/mtuq/misfit/waveform/c_ext_L2.c

In this function, it appears that most or all of the memory allocation/deallocation occurs through the NumPy API.

To start, it is probably worth double checking the NumPy API is being used correctly.

Also, it may be worth double checking the module initialization by comparing it against the Cython docs.

I am hoping that a software developer at my workplace might be able to start looking at the issue in October, but anyone is welcome to try troubleshooting.
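For anyone who wants to try troubleshooting, one possible first step (a sketch, not something confirmed to work for this issue) is Python's standard `faulthandler` module. glibc reports heap corruption such as free(): invalid pointer by calling abort(), and `faulthandler` prints the Python traceback when the process receives SIGABRT, which may show which Python call was active at the time of the crash. The example below is self-contained and assumes Linux: the child script and its `crash` function are hypothetical stand-ins for the real crash, which would happen inside the extension.

```python
import subprocess
import sys

# Hypothetical child script: enable faulthandler, then abort the process.
# os.abort() raises SIGABRT, the same signal glibc raises after it detects
# heap corruption, so it stands in for the crash inside the extension.
CHILD = """
import faulthandler, os
faulthandler.enable()
def crash():
    os.abort()
crash()
"""


def run_child():
    """Run CHILD in a fresh interpreter and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_child()
    print("return code:", code)
    print(err)
```

The same handler can be enabled without editing the script, either with `python3 -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable. With Apptainer, that variable would presumably be passed in the same way as `SYNGINE_CACHE` in the command above, using the `APPTAINERENV_` prefix. The traceback only covers Python frames, so it narrows down the calling code rather than the faulty line of C.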

In the meantime, if you create the misfit function using WaveformMisfit(optimization_level=1, ...), then mtuq falls back to a slower pure Python implementation in which the Cython extensions are not called.


rmodrak commented Sep 15, 2023

As expected for such a generic error message, free(): invalid pointer brings up a very large number of Stack Overflow and other search results.

Interestingly, though, many of the top results appear to be Cython-related, including, for example, a PyTorch issue that is apparently still unresolved.
