Crash after user interrupt #164

Open
albertz opened this issue Dec 18, 2023 · 4 comments

Comments

@albertz
Member

albertz commented Dec 18, 2023

Sometimes, but not always (maybe 20% of the cases?), when I hit Ctrl+C, I get this crash:

^C[2023-12-18 18:53:21,090] INFO: Got user interrupt signal stop engine and exit
[2023-12-18 18:53:21,090] WARNING: Main thread exit. Still running non-daemon threads: {<LocalEngine(Thread-1, started 140176269506112)>}
[2023-12-18 18:53:21,665] ERROR: Exception in thread <DummyProcess(Thread-12 (worker), started daemon 140175636158016)>:
[2023-12-18 18:53:21,666] ERROR: Exception in thread <DummyProcess(Thread-18 (worker), started daemon 140175107679808)>:
[2023-12-18 18:53:21,734] ERROR: Exception in thread <DummyProcess(Thread-14 (worker), started daemon 140175619372608)>:
[2023-12-18 18:53:21,734] ERROR: Exception in thread <DummyProcess(Thread-7 (worker), started daemon 140176156243520)>:
[2023-12-18 18:53:21,734] ERROR: Exception in thread <DummyProcess(Thread-6 (worker), started daemon 140176164636224)>:
[2023-12-18 18:53:21,776] ERROR: Exception in thread <DummyProcess(Thread-15 (worker), started daemon 140175610979904)>:
[2023-12-18 18:53:21,817] ERROR: Exception in thread <DummyProcess(Thread-3 (worker), started daemon 140176189814336)>:
[2023-12-18 18:53:21,858] ERROR: Exception in thread <DummyProcess(Thread-9 (worker), started daemon 140176139458112)>:
[2023-12-18 18:53:21,858] ERROR: Exception in thread <DummyProcess(Thread-4 (worker), started daemon 140176181421632)>:
[2023-12-18 18:53:21,859] ERROR: Exception in thread <DummyProcess(Thread-13 (worker), started daemon 140175627765312)>:
EXCEPTION
Traceback (most recent call last):
(Exclude vars because we are exiting.) 
  File "/u/zeyer/setups/combined/2021-05-31/tools/sisyphus/sisyphus/tools.py", line 311, in default_handle_exception_interrupt_main_thread.<locals>.wrapped_func
EXCEPTION
Traceback (most recent call last):
[2023-12-18 18:53:21,859] ERROR: Exception in thread <DummyProcess(Thread-11 (worker), started daemon 140175644550720)>:
EXCEPTION
Traceback (most recent call last):
EXCEPTION
    line: return func(*args, **kwargs)
  File "/u/zeyer/setups/combined/2021-05-31/tools/sisyphus/sisyphus/graph.py", line 570, in SISGraph.for_all_nodes.<locals>.runner_helper
    line: runner(path.creator)
  File "/u/zeyer/setups/combined/2021-05-31/tools/sisyphus/sisyphus/graph.py", line 547, in SISGraph.for_all_nodes.<locals>.runner
EXCEPTION
(Exclude vars because we are exiting.) 
  File "/u/zeyer/setups/combined/2021-05-31/tools/sisyphus/sisyphus/tools.py", line 311, in default_handle_exception_interrupt_main_thread.<locals>.wrapped_func
    line: return func(*args, **kwargs)
EXCEPTION
EXCEPTION
Traceback (most recent call last):
Traceback (most recent call last):
(Exclude vars because we are exiting.) 
EXCEPTION
  File "/u/zeyer/setups/combined/2021-05-31/tools/sisyphus/sisyphus/graph.py", line 570, in SISGraph.for_all_nodes.<locals>.runner_helper
    line: runner(path.creator)
EXCEPTION
Traceback (most recent call last):
Traceback (most recent call last):
(Exclude vars because we are exiting.) 
(Exclude vars because we are exiting.) 
...
    line: self._check_running()                                                                                                                         
  File "/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/multiprocessing/pool.py", line 353, in Pool._check_running
    line: raise ValueError("Pool not running")
ValueError: Pool not running
    line: self._check_running()
  File "/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/multiprocessing/pool.py", line 353, in Pool._check_running
Exception ignored in atexit callback: <function shutdown at 0x7f7d659ae5c0>
  File "/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/multiprocessing/pool.py", line 458, in Pool.apply_async
    line: self._check_running()                                                                                                                         
EXCEPTION                                                                                                                                               
Traceback (most recent call last):                                                                                                                      
EXCEPTION                                                                                                                                               
Traceback (most recent call last):                                                                                                                      
(Exclude vars because we are exiting.)                                                                                                                  
    line: raise ValueError("Pool not running")                                                                                                          
  File "/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/multiprocessing/pool.py", line 353, in Pool._check_running
Exception ignored in sys.unraisablehook: <built-in function unraisablehook>
(Exclude vars because we are exiting.)
  File "/u/zeyer/setups/combined/2021-05-31/tools/sisyphus/sisyphus/tools.py", line 311, in default_handle_exception_interrupt_main_thread.<locals>.wrapped_func
KeyboardInterrupt
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x00007f7d668932d8)                                                                                            
                                                                                                                                                        
Current thread 0x00007f7d66080000 (most recent call first):                                                                                             
  <no Python frame>                                                                                                                                     
                                                                                                                                                        
Extension modules: psutil._psutil_linux, psutil._psutil_posix, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, markupsafe._speedups, _cffi_backend (total: 41)
fish: Job 2, '/work/tools/users/zeyer/py-envs…' terminated by signal SIGABRT (Abort)     
@albertz
Member Author

albertz commented Dec 18, 2023

The scrambled output means that many processes were stopped by SIGINT at the same time.

@critias
Contributor

critias commented Jan 2, 2024

The graph computations are using a ThreadPool (https://github.com/rwth-i6/sisyphus/blob/master/sisyphus/graph.py#L232C12-L232C12).
I guess you get this output if you hit Ctrl-C while these computations are running. This problem might go away if you set gs.GRAPH_WORKER=1, but you would also lose the multithreading speed-up, which matters if your filesystem has a higher latency.
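
The "Pool not running" part of the traceback can be reproduced in isolation. The following is a minimal sketch using only the standard library, not the Sisyphus code itself: the function name `resubmit_after_close` and the inner `task` are made up for illustration, and `pool.close()` merely stands in for whatever shuts the pool down after the interrupt. It shows that a task which is still running in a `ThreadPool` and tries to queue more work after the pool has been closed gets the same `ValueError` as in the log above.

```python
import threading
from multiprocessing.pool import ThreadPool


def resubmit_after_close():
    """Hypothetical reproduction: a running task submits work to a closed pool."""
    pool = ThreadPool(2)
    pool_closed = threading.Event()
    errors = []

    def task():
        # Wait until the main thread has closed the pool, then try to queue more work.
        pool_closed.wait(timeout=5)
        try:
            pool.apply_async(print, ("never scheduled",))
        except ValueError as exc:
            errors.append(str(exc))

    pool.apply_async(task)
    pool.close()       # stand-in for the shutdown triggered by the interrupt
    pool_closed.set()
    pool.join()        # wait for the already queued task to finish
    return errors


errors = resubmit_after_close()
print(errors)
```

This only covers the `ValueError`; it does not reproduce the later "Fatal Python error" at interpreter shutdown, which depends on timing between daemon threads and the main thread.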

@albertz
Member Author

albertz commented Jan 2, 2024

Are you saying GRAPH_WORKER=1 is always better anyway, and that we can remove the old code which handles GRAPH_WORKER>1?

I'm not searching for workarounds. Also, I could simply just ignore this message.

I'm simply reporting this because I think it's bad if the process crashes with "terminated by signal SIGABRT", and maybe this should be investigated further.
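
One direction such an investigation could take, purely as a sketch and not something proposed or confirmed in this thread: make submission aware of shutdown, and join the pool before the interpreter exits so that no worker thread is still writing to stderr during finalization. The class `StoppablePool` and its methods `submit` and `shutdown` are hypothetical names and are not part of Sisyphus; only `ThreadPool`, `apply_async`, `close` and `join` come from the standard library. Whether this would actually avoid the SIGABRT in Sisyphus is an untested assumption.

```python
import threading
from multiprocessing.pool import ThreadPool


class StoppablePool:
    """Hypothetical wrapper (not Sisyphus code) that drops work once shutdown starts."""

    def __init__(self, workers):
        self._pool = ThreadPool(workers)
        self._stopping = threading.Event()

    def submit(self, func, *args):
        # After shutdown has begun, silently drop new work instead of raising.
        if self._stopping.is_set():
            return None
        try:
            return self._pool.apply_async(func, args)
        except ValueError:
            # The pool may have been closed between the check and the submit.
            if self._stopping.is_set():
                return None
            raise

    def shutdown(self):
        # Stop accepting work, then wait for running tasks before returning,
        # so no worker thread is left running when the caller exits.
        self._stopping.set()
        self._pool.close()
        self._pool.join()


pool = StoppablePool(2)
first = pool.submit(pow, 2, 5)   # accepted while the pool is running
pool.shutdown()
late = pool.submit(pow, 2, 5)    # dropped, returns None instead of raising
```

The design choice here is to treat a submit during shutdown as a no-op rather than an error, on the assumption that results of graph computations are no longer needed once the user has interrupted the run.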

@critias
Contributor

critias commented Jan 2, 2024

No, I'm not saying GRAPH_WORKER=1 is better. It's just a workaround, which in most cases makes Sisyphus slower.
