Cupy CUDADriverError: context is destroyed #32

Closed
laurendeaumatthieu opened this issue Feb 6, 2024 · 9 comments

Comments

@laurendeaumatthieu

Hello,

While using CuPy and ITK, I got a CUDA error: context is destroyed. This seems to happen when I call CuPy functions both before and after ITK uses CUDA.

Here is a minimal example to reproduce the error:

import cupy as cp
import itk


cp.log(1/cp.ones((1200, 120)))/0.01879

img = itk.CudaImage[itk.F, 3].New()
img.SetRegions([128]*3)
img.Allocate()
img.GetBufferPointer()

cp.log(1/cp.ones((1200, 120)))/0.01879 # <-- crash here

[Screenshot from 2024-02-06 16-18-04]

cuda version: 11.2
itk version : 5.4.0
cupy version : 11.3.0 (cupy-cuda112)

My guess is that ITK destroys the GPU context created by CuPy.

Thanks in advance for any help,

Matthieu

@SimonRit
Collaborator

SimonRit commented Feb 7, 2024

I can reproduce the crash, but I don't understand the cause. Surprisingly, if you do

import cupy as cp
import itk

cp.log(1/cp.ones((1200, 120)))/0.01879
device = cp.cuda.runtime.getDevice()
img = itk.CudaImage[itk.F, 3].New()
cp.cuda.runtime.setDevice(device)
cp.log(1/cp.ones((1200, 120)))/0.01879

it does not crash, so setDevice must reset something...

@SimonRit
Collaborator

SimonRit commented Feb 8, 2024

Can you try the following piece of code and let me know if it helps?

import cupy as cp
import itk

cp.log(1/cp.ones((1200, 120)))/0.01879
ctx = cp.cuda.driver.ctxGetCurrent()
img = itk.CudaImage[itk.F, 3].New()
cp.cuda.driver.ctxSetCurrent(ctx)
cp.log(1/cp.ones((1200, 120)))/0.01879
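
The save/restore pattern above could be wrapped in a small context manager so the restore also happens if the ITK call raises. This is only a sketch: preserve_current is a hypothetical helper, not part of CuPy or ITK, and it takes the getter and setter as arguments so the snippet itself runs without a GPU. The CuPy calls in the trailing comment are the same ones used in the snippet above.

```python
from contextlib import contextmanager


@contextmanager
def preserve_current(get_current, set_current):
    """Save what get_current() returns and pass it back to set_current() on exit."""
    saved = get_current()
    try:
        yield saved
    finally:
        # Runs even if the body raises, so the saved value is always rebound.
        set_current(saved)


# Intended use with CuPy and ITK (needs a GPU, so it is left as a comment):
#
#   import cupy as cp
#   import itk
#
#   cp.log(1 / cp.ones((1200, 120))) / 0.01879  # a context is current after this call
#   with preserve_current(cp.cuda.driver.ctxGetCurrent,
#                         cp.cuda.driver.ctxSetCurrent):
#       img = itk.CudaImage[itk.F, 3].New()
#   cp.log(1 / cp.ones((1200, 120))) / 0.01879
```

Passing cp.cuda.runtime.getDevice and cp.cuda.runtime.setDevice instead would reproduce the setDevice variant from the previous comment.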

@laurendeaumatthieu
Author

Thanks!

The second solution runs, and saving and restoring the context like this seems to work in my other code as well.
However, after my Python script exits, I still get an error message:

Traceback (most recent call last):
File "cupy_backends/cuda/api/driver.pyx", line 217, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_CONTEXT_IS_DESTROYED: context is destroyed
Exception ignored in: 'cupy.cuda.function.Module.dealloc'
Traceback (most recent call last):
File "cupy_backends/cuda/api/driver.pyx", line 217, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_CONTEXT_IS_DESTROYED: context is destroyed

So there is still a problem with context destruction.

Also, I have tested with PyTorch to check that the problem is not on the CuPy side:

import cupy as cp
import torch
torch.cuda.device_count()

cp.log(1/cp.ones((1200, 120)))/0.01879
t = torch.ones((100,100,3)).cuda()
cp.log(1/cp.ones((1200, 120)))/0.01879

This works, and there is no error when the Python script exits.


Finally, I explored the source files in detail and saw that the function cudaSetDevice() initializes the primary context. I have modified the constructor CudaContextManager(), emptied the destructor ~CudaContextManager(), and removed GetCurrentContext() (along with the places where it is called). You can check the branch cudaContextManagement for the details.
This solution works for me: the small example and my other code run, and the scripts exit without problems.

I do not know how deleting the lines

CUDA_CHECK(cuCtxSetCurrent(*(this->m_ContextManager->GetCurrentContext()))); // This is necessary when running multithread to bind the host CPU thread to the right context

could impact the rest of the code. I have tested some reconstructions with RTK (which uses ITKCudaCommon) and all seems fine.

I'll let you get back to me and do the modifications if everything is ok.

Thanks,

Matthieu

@SimonRit
Collaborator

SimonRit commented Feb 8, 2024

Does it also happen if you comment out the cudaDeviceReset line https://github.com/RTKConsortium/ITKCudaCommon/blob/master/src/itkCudaContextManager.cxx#L92 ?

@laurendeaumatthieu
Author

If I only comment out cudaDeviceReset(); and leave the rest as in the original, I still get the CuPy error.

Matthieu

@SimonRit
Collaborator

SimonRit commented Feb 8, 2024

I can't reproduce this error so I'm not sure how to help...

@SimonRit
Collaborator

SimonRit commented Feb 8, 2024

Have you tried adding

cp.cuda.driver.ctxSetCurrent(ctx)

as the last line of the program?
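
One way to make sure the restore really runs last, whichever path the script takes to finish, is to register it with Python's atexit module, whose handlers run at normal interpreter termination. The sketch below assumes that rebinding the saved context at that point is enough to avoid the error, which has not been verified here. restore_at_exit is a hypothetical helper, and the getter, setter and register function are parameters so the snippet runs without a GPU.

```python
import atexit


def restore_at_exit(get_current, set_current, register=atexit.register):
    """Capture the current value now and schedule set_current(value) for exit."""
    saved = get_current()
    # atexit passes the extra positional argument to the handler when it runs.
    register(set_current, saved)
    return saved


# Intended use with CuPy (needs a GPU, so it is left as a comment):
#
#   import cupy as cp
#
#   cp.log(1 / cp.ones((1200, 120))) / 0.01879
#   ctx = restore_at_exit(cp.cuda.driver.ctxGetCurrent,
#                         cp.cuda.driver.ctxSetCurrent)
```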

@laurendeaumatthieu
Author

laurendeaumatthieu commented Feb 9, 2024

EDIT:
Keeping the original ITKCudaCommon and commenting out cudaDeviceReset(); works well if I call ctx = cp.cuda.driver.ctxGetCurrent() after the first CuPy call and cp.cuda.driver.ctxSetCurrent(ctx) after the RTK functions.
With that, I also no longer get the error message cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_CONTEXT_IS_DESTROYED: context is destroyed.

If possible, I would still prefer the changes made in the branch cudaContextManagement, as I would then not have to add ctx = cp.cuda.driver.ctxGetCurrent() and cp.cuda.driver.ctxSetCurrent(ctx) to my scripts.

@laurendeaumatthieu
Author

I opened PR #34 from my branch cudaContextManagement for testing.

SimonRit referenced this issue Feb 16, 2024
Use the primary context with cudaSetDevice() introduced by Cuda 7 (https://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf) instead of a new one. cudaSetDevice is called before every memory transfer between the CPU and GPU to be sure that the context is set for the current thread, see https://developer.nvidia.com/blog/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/.
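
The commit applies this inside ITKCudaCommon's C++ code. As a rough Python analogy only (not the commit's implementation), the "set the device before every transfer" rule can be sketched as a decorator. ensure_device is a hypothetical helper, and the setter is passed in so the snippet runs without a GPU. With CuPy, the setter could be cp.cuda.runtime.setDevice, as used earlier in this thread.

```python
import functools


def ensure_device(set_device, device_id):
    """Decorator factory: call set_device(device_id) before each wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The current device is per-thread state, so it is selected again
            # on every call rather than once at start-up.
            set_device(device_id)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Intended use with CuPy (needs a GPU, so it is left as a comment):
#
#   import cupy as cp
#
#   @ensure_device(cp.cuda.runtime.setDevice, 0)
#   def upload(host_array):
#       return cp.asarray(host_array)
```

Because the device is selected inside the wrapper, a function called from a freshly started worker thread binds that thread to the intended device before touching GPU memory.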