Pod stuck in containercreating phase #35
That should work. It's being allocated and scheduled to the node, but then failing in the prepare stage on the node. What do the logs of the DRA plugin show?
Here are the logs:
It says the PodSchedulingContext was deleted!
Those are not the logs for the kubelet plugin; those are the logs of the controller.
OK, sure. Here are the logs of the kubelet plugin:
Below is the ResourceClaim status:
Something's not right. You said you had claims for two
OK, there was an issue with the 3rd GPU partition. Now I am creating a 1g.10gb partition, having already created two 2g.20gb partitions, but it is still unable to create the partition.
Below is a snippet of the NAS object:
I deleted the old kind cluster, created a fresh one, and redid the experiments. All jobs ran successfully. Thanks for your help, though I am still not sure about the root cause. Please feel free to close this issue.
Can this be closed?
Hello,
I launched 3 jobs: two with profile 2g.20gb and one with profile 1g.10gb. The last job is stuck in the ContainerCreating phase:
Are these profiles not supported on a single A100 80GB GPU?
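On the capacity question, here is a rough sketch of the slice arithmetic. It assumes the A100 80GB exposes 7 compute slices and 8 memory slices, and that the per-profile costs match NVIDIA's MIG documentation; none of these numbers come from this thread, and the names `PROFILE_SLICES` and `fits_by_capacity` are made up for illustration. It only checks total capacity. MIG also restricts where each profile may be placed on the GPU, so passing this check is necessary but not sufficient.

```python
# Assumed slice cost per MIG profile on an A100 80GB: (compute slices, memory slices).
# These values are an assumption based on NVIDIA's MIG documentation.
PROFILE_SLICES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
}

TOTAL_COMPUTE_SLICES = 7  # assumed for A100 80GB
TOTAL_MEMORY_SLICES = 8   # assumed for A100 80GB


def fits_by_capacity(profiles):
    """Return True if the summed slice cost fits on one GPU.

    Placement rules are ignored, so this is only a coarse sanity check.
    """
    compute = sum(PROFILE_SLICES[p][0] for p in profiles)
    memory = sum(PROFILE_SLICES[p][1] for p in profiles)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


requested = ["2g.20gb", "2g.20gb", "1g.10gb"]
print(fits_by_capacity(requested))
```

Under these assumptions the three requested profiles need 5 compute and 5 memory slices, so raw capacity alone would not explain a stuck pod.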