[k8s] Disable autostop for controller on kubernetes #3521

Merged
merged 96 commits into from
May 8, 2024

Conversation

romilbhardwaj
Collaborator

@romilbhardwaj romilbhardwaj commented May 8, 2024

Skips autostop when using the jobs controller on Kubernetes.

This should be merged after #3377, since that PR adds support for SERVICE_ACCOUNT, which is helpful for running the controller on GKE clusters that use exec-based auth.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Tested manually with sky jobs launch, with both the controller and the job on Kubernetes
  • Managed job smoke tests: pytest tests/test_smoke.py --managed-jobs --kubernetes
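
For readers skimming the thread, the behavior described above (skip autostop only when the cluster is the jobs controller and it runs on Kubernetes) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Controllers` enum here is a hypothetical stand-in for `controller_utils.Controllers`, the `sky-` name prefix is an assumed naming convention, and `should_skip_autostop` is a made-up helper name.

```python
from enum import Enum
from typing import Optional


class Controllers(Enum):
    """Hypothetical stand-in for controller_utils.Controllers."""
    JOBS_CONTROLLER = "jobs-controller"

    @classmethod
    def from_name(cls, cluster_name: str) -> Optional["Controllers"]:
        # Assumption: controller clusters are recognizable by a name prefix.
        for controller in cls:
            if cluster_name.startswith(f"sky-{controller.value}"):
                return controller
        return None


def should_skip_autostop(cloud_name: str, cluster_name: str) -> bool:
    """Return True when autostop should not be set for this cluster."""
    # Both conditions must hold: the cloud is Kubernetes AND the cluster
    # is the jobs controller. Any other cluster keeps its autostop setting.
    return (cloud_name == "kubernetes" and
            Controllers.from_name(cluster_name) == Controllers.JOBS_CONTROLLER)
```

Under these assumptions, `should_skip_autostop("kubernetes", "sky-jobs-controller-1234")` is true, while a regular cluster on Kubernetes or a jobs controller on another cloud is left alone.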

@romilbhardwaj romilbhardwaj added the "do not merge" label May 8, 2024
Collaborator

@Michaelvll Michaelvll left a comment

Thanks @romilbhardwaj! LGTM.

@romilbhardwaj romilbhardwaj removed the "do not merge" label May 8, 2024
@romilbhardwaj
Collaborator Author

Thanks @Michaelvll! Tested with pytest tests/test_smoke.py --managed-jobs --kubernetes on a GKE cluster with this config.yaml:

kubernetes:
  remote_identity: SERVICE_ACCOUNT
jobs:
  controller:
    resources:
      cloud: kubernetes
      cpus: 8
      memory: 8
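
To make the nesting of the config above explicit, here is a small sketch that mirrors it as a Python dict and reads the controller's resource settings by key path. The `get_nested` helper is hypothetical and written only for this illustration; it is not presented as SkyPilot's own config API.

```python
from typing import Any, Dict, Sequence

# The config.yaml above, mirrored as a plain dict for illustration.
config: Dict[str, Any] = {
    "kubernetes": {"remote_identity": "SERVICE_ACCOUNT"},
    "jobs": {
        "controller": {
            "resources": {"cloud": "kubernetes", "cpus": 8, "memory": 8},
        },
    },
}


def get_nested(cfg: Dict[str, Any], keys: Sequence[str],
               default: Any = None) -> Any:
    """Hypothetical helper: walk a key path, returning default if absent."""
    current: Any = cfg
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


controller_cloud = get_nested(config,
                              ("jobs", "controller", "resources", "cloud"))
remote_identity = get_nested(config, ("kubernetes", "remote_identity"))
```

A missing path, such as a config with no `jobs.controller.resources` entry, simply yields the default.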

@Michaelvll
Collaborator

Awesome! Can we also test it without explicitly specifying the controller resources?

@romilbhardwaj
Collaborator Author

Yes, smoke tests pass even without specifying controller resources! Kubernetes is automatically chosen for the controller if it has enough resources (8 CPU, 24 GB mem), verified with kubectl get pods.

Comment on lines +1997 to +2000
if isinstance(to_provision.cloud, clouds.Kubernetes
) and controller_utils.Controllers.from_name(
cluster_name
) == controller_utils.Controllers.JOBS_CONTROLLER:
Collaborator

nit: would be nice to add ()

Suggested change
- if isinstance(to_provision.cloud, clouds.Kubernetes
- ) and controller_utils.Controllers.from_name(
- cluster_name
- ) == controller_utils.Controllers.JOBS_CONTROLLER:
+ if (isinstance(to_provision.cloud, clouds.Kubernetes
+ ) and controller_utils.Controllers.from_name(
+ cluster_name
+ ) == controller_utils.Controllers.JOBS_CONTROLLER):
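
The parentheses in this suggestion affect readability only, not behavior. One way to check that is to parse both forms with Python's standard `ast` module and compare the resulting trees. The two condition strings below are copied from the review thread with a `pass` body added so they parse as complete statements; `same_ast` is a helper name introduced just for this sketch.

```python
import ast

# The condition as originally written in the PR (body added for parsing).
BEFORE = (
    "if isinstance(to_provision.cloud, clouds.Kubernetes\n"
    ") and controller_utils.Controllers.from_name(\n"
    "cluster_name\n"
    ") == controller_utils.Controllers.JOBS_CONTROLLER:\n"
    "    pass\n"
)

# The reviewer's suggestion: the same condition wrapped in parentheses.
AFTER = (
    "if (isinstance(to_provision.cloud, clouds.Kubernetes\n"
    ") and controller_utils.Controllers.from_name(\n"
    "cluster_name\n"
    ") == controller_utils.Controllers.JOBS_CONTROLLER):\n"
    "    pass\n"
)


def same_ast(source_a: str, source_b: str) -> bool:
    """Compare two snippets by syntax tree, ignoring layout and parentheses."""
    # ast.dump omits line and column attributes by default, so only the
    # structure of the code is compared. Nothing here is executed.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


forms_match = same_ast(BEFORE, AFTER)
```

Since grouping parentheses are not represented in the syntax tree, both forms compile to the same condition; the wrapped version mainly makes the extent of the multi-line expression easier to see.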

@romilbhardwaj romilbhardwaj merged commit 12c156a into master May 8, 2024
20 checks passed
@romilbhardwaj romilbhardwaj deleted the k8s_enable_job_controller branch May 8, 2024 07:32