[k8s] Disable autostop for controller on kubernetes (#3521)
* playing around

* wip with hacks

* wip refactor get_endpoints

* working get_endpoints

* wip

* fixed circular import

* Working for ingress and loadbalancer svc

* lint

* add purging from #3094

* Use local catalog on the controller too

* use externalip if available

* add dshm_size_limit

* optimize dependency installation

* Add todo

* optimize ingress

* fix

* fix

* remove autostop timing

* Fix URLs for raw IP:ports

* fixes

* wip

* SA wip

* Allow use of service accounts through remote_identity field

* Make purge work for no clusters in kubeconfig

* Handle ingress namespace not present

* setup optimizations and critical SA key fix

* fix docs

* fix docs

* Add support for skypilot.co/external-ip annotation for ingress

* Remove dshm_size_limit

* Undo kind changes

* Update service account docs

* minor docs

* update comment

* is_same_cloud to cloud_in_list

* refactor query_ports to use head_ip

* autodown + http prefixing in callers

* fix ssh key issues when user hash is reused

* linting

* lint

* lint, HOST_CONTROLLERS

* add serve smoke tests for k8s

* disallow file_mounts and workdir if no storage cloud is enabled

* minor

* lint

* update fastchat to use --host 127.0.0.1

* extend timeout

* docs comments

* rename to port

* add to core.py

* docstrs

* add docs on exec based auth

* expand elif

* add lb comment

* refactor

* refactor

* fix docs build

* add PODIP mode support

* make ssh services optional

* nits

* Revert "make ssh services optional"

This reverts commit 87d4d25.

* Revert "add PODIP mode support"

This reverts commit 750d4d4.

* nits

* use 0.0.0.0 when on k8s; use common impl for other clouds

* return dict instead of raising errors in core.endpoints()

* lint

* merge fixes

* merge fixes

* merge fixes

* lint

* fix smoke tests

* fix smoke tests

* comment

* add enum for remote identity

* lint

* disable autostop for kubernetes

* add skip_status_check

* remove zone requirement

* fix timings for test

* silence curl download

* move jq from yaml to test_minimal

* move jq from yaml to test_minimal

* add assert

* lint

* lint
romilbhardwaj authored May 8, 2024
1 parent 0a03995 commit 12c156a
Showing 1 changed file with 23 additions and 1 deletion.
sky/backends/cloud_vm_ray_backend.py (24 changes: 23 additions, 1 deletion)
```diff
@@ -1991,9 +1991,21 @@ def provision_with_retries(
                     cloud_user = None
                 else:
                     cloud_user = to_provision.cloud.get_current_user_identity()
+
+                requested_features = self._requested_features.copy()
+                # Skip stop feature for Kubernetes jobs controller.
+                if isinstance(to_provision.cloud, clouds.Kubernetes
+                             ) and controller_utils.Controllers.from_name(
+                                 cluster_name
+                             ) == controller_utils.Controllers.JOBS_CONTROLLER:
+                    assert (clouds.CloudImplementationFeatures.STOP
+                            in requested_features), requested_features
+                    requested_features.remove(
+                        clouds.CloudImplementationFeatures.STOP)
+
                 # Skip if to_provision.cloud does not support requested features
                 to_provision.cloud.check_features_are_supported(
-                    to_provision, self._requested_features)
+                    to_provision, requested_features)
 
                 config_dict = self._retry_zones(
                     to_provision,
```
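The first hunk copies `self._requested_features`, drops `CloudImplementationFeatures.STOP` only when the target cloud is Kubernetes and the cluster is the jobs controller, and then validates the copy instead of the original set. A minimal sketch of that filtering step follows. `Feature`, `Controller`, `features_to_check`, and the `sky-jobs-controller` name-prefix rule are hypothetical stand-ins, not SkyPilot's actual classes or naming scheme, and clouds are plain strings for brevity.

```python
import enum
from typing import Optional, Set


class Feature(enum.Enum):
    """Hypothetical stand-in for clouds.CloudImplementationFeatures."""
    STOP = 'stop'
    AUTOSTOP = 'autostop'
    MULTI_NODE = 'multi_node'


class Controller(enum.Enum):
    """Hypothetical stand-in for controller_utils.Controllers."""
    JOBS_CONTROLLER = 'sky-jobs-controller'
    SERVE_CONTROLLER = 'sky-serve-controller'

    @classmethod
    def from_name(cls, cluster_name: str) -> Optional['Controller']:
        # Assumption: controller clusters are recognised by a name prefix.
        for controller in cls:
            if cluster_name.startswith(controller.value):
                return controller
        return None


def features_to_check(cloud: str, cluster_name: str,
                      requested: Set[Feature]) -> Set[Feature]:
    """Returns the set of features to validate against `cloud`.

    Works on a copy so the caller's set is never modified.
    """
    features = set(requested)  # copy, mirroring self._requested_features.copy()
    if (cloud == 'kubernetes' and
            Controller.from_name(cluster_name) == Controller.JOBS_CONTROLLER):
        # Mirror the diff's assert: STOP is expected to have been requested.
        assert Feature.STOP in features, features
        features.remove(Feature.STOP)
    return features
```

For example, `features_to_check('kubernetes', 'sky-jobs-controller-ab12', {Feature.STOP, Feature.MULTI_NODE})` yields `{Feature.MULTI_NODE}` and leaves the argument unchanged. Filtering a copy presumably matters because `provision_with_retries` can evaluate several candidate resources in turn, so removing STOP from the shared set would leak into checks for other clouds.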
```diff
@@ -4053,6 +4065,16 @@ def set_autostop(self,
         # The core.autostop() function should have already checked that the
         # cloud and resources support requested autostop.
         if idle_minutes_to_autostop is not None:
+            # Skip auto-stop for Kubernetes clusters.
+            if isinstance(handle.launched_resources.cloud, clouds.Kubernetes):
+                # We should hit this code path only for the jobs controller on
+                # Kubernetes clusters.
+                assert (controller_utils.Controllers.from_name(
+                    handle.cluster_name) == controller_utils.Controllers.
+                        JOBS_CONTROLLER), handle.cluster_name
+                logger.info('Auto-stop is not supported for Kubernetes '
+                            'clusters. Skipping.')
+                return
 
             # Check if we're stopping spot
             assert (handle.launched_resources is not None and
```
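The second hunk makes `set_autostop` return early on Kubernetes and asserts that only the jobs controller reaches that branch. Per the diff's own comments, `core.autostop()` is expected to reject unsupported autostop requests before this point. The sketch below mimics that guard under stated assumptions: `Handle`, the string cloud names, the `sky-jobs-controller` prefix check, and the bool return value are hypothetical and added only to make the behavior easy to test. The diff's early `return` returns nothing.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical naming assumption for the jobs controller cluster.
JOBS_CONTROLLER_PREFIX = 'sky-jobs-controller'


@dataclass
class Handle:
    """Hypothetical minimal stand-in for the backend's cluster handle."""
    cluster_name: str
    cloud: str


def set_autostop(handle: Handle,
                 idle_minutes_to_autostop: Optional[int]) -> bool:
    """Returns True if autostop was configured, False if it was skipped."""
    if idle_minutes_to_autostop is None:
        # Nothing to configure in this sketch.
        return False
    if handle.cloud == 'kubernetes':
        # Only the jobs controller should get here on Kubernetes; other
        # clusters are assumed to be rejected earlier by the caller.
        assert handle.cluster_name.startswith(
            JOBS_CONTROLLER_PREFIX), handle.cluster_name
        logger.info('Auto-stop is not supported for Kubernetes '
                    'clusters. Skipping.')
        return False
    # Placeholder for the real work of applying the idle timeout.
    logger.info('Setting autostop to %d minutes on %s',
                idle_minutes_to_autostop, handle.cluster_name)
    return True
```

Failing loudly with an assert instead of silently skipping means any regression that lets a non-controller Kubernetes cluster reach this path surfaces immediately rather than leaving the user believing autostop was set.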
