
Enable Service in Descheduler without ClusterIP as None - Helm Chart #1437

Open · jmk47912204 opened this issue Jun 10, 2024 · 5 comments
Labels: kind/support, lifecycle/rotten

jmk47912204 commented Jun 10, 2024

I would like to raise a concern: we want to enable metrics for the descheduler without installing Prometheus in our cluster at all. Why? Because we use Datadog as our observability tool, and it needs to scrape metrics from the descheduler Service's metrics endpoint. That endpoint is not reachable because the Service's clusterIP is None, so Datadog is unable to scrape it.

I have shared all the logs and configuration details here, along with references to related GitHub issues.

Descheduler Version: 0.30.0
GKE version: 1.28.3

Thank you
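
For context, what the issue describes is the chart rendering a headless Service. A minimal sketch of what that looks like (the name, namespace, and selector below are illustrative, not taken from the chart's actual manifest):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: descheduler            # illustrative name
  namespace: kube-system       # illustrative namespace
spec:
  clusterIP: None              # headless: no virtual IP is allocated
  selector:
    app.kubernetes.io/name: descheduler   # assumed pod label
  ports:
    - name: metrics
      port: 10258              # descheduler metrics port, per the check config below
      targetPort: 10258
```

If `clusterIP: None` were simply omitted, Kubernetes would allocate a virtual IP for the Service, which is the behavior the reporter is asking the chart to allow.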

a7i (Contributor) commented Jun 11, 2024

Hi @jmk47912204, how are you defining it?

This is how I've defined it for Datadog, using PodSpec annotations:

kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/descheduler.checks: |
          {
            "openmetrics": {
              "instances": [
                {
                  "openmetrics_endpoint": "https://%%host%%:10258/metrics",
                  "namespace": "descheduler",
                  "metrics": [
                    "descheduler_pods_evicted",
                    { "descheduler_descheduler_loop_duration_seconds": "descheduler_loop_duration_seconds" },
                    { "descheduler_descheduler_strategy_duration_seconds": "descheduler_strategy_duration_seconds" }
                  ],
                  "collect_histogram_buckets": true,
                  "histogram_buckets_as_distributions": true,
                  "tls_ca_cert": false,
                  "tls_verify": false,
                  "tls_ignore_warning": true,
                  "tags": [
                    "service:descheduler"
                  ]
                }
              ]
            }
          }

This instructs Datadog to scrape the metrics directly from the Pod/Container. You want to take this approach in case you run Descheduler in high-availability mode (two pods): in that scenario, a Service of type ClusterIP would round-robin requests between the pods, leading to incomplete results, since only one of them is the leader.
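
If the descheduler is installed via the Helm chart, the annotation above would usually be fed in through a pod-annotations value. A minimal sketch of such a values override, assuming the chart exposes a `podAnnotations` key (check the chart's values.yaml for the exact key in your version, and trim the check config to what you need):

```yaml
# Hypothetical Helm values snippet; "podAnnotations" is assumed to be the
# chart key that lands on the Deployment's pod template.
podAnnotations:
  ad.datadoghq.com/descheduler.checks: |
    {
      "openmetrics": {
        "instances": [
          {
            "openmetrics_endpoint": "https://%%host%%:10258/metrics",
            "namespace": "descheduler",
            "metrics": ["descheduler_pods_evicted"],
            "tls_verify": false,
            "tls_ignore_warning": true
          }
        ]
      }
    }
```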

a7i (Contributor) commented Jun 11, 2024

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Jun 11, 2024
jmk47912204 (Author) commented Jun 12, 2024

Hey @a7i

Thanks for the response. We are actually already using the above configuration in our deployment; we left out a few of the other settings since they aren't required for our goal.

Basically, the Datadog agent runs in our cluster as a DaemonSet managed by the Datadog Operator, and for it to scrape the descheduler metrics into Datadog we need a Service with a ClusterIP assigned; only then does it scrape the metrics.

Is there any specific reason why the descheduler Service shouldn't have a ClusterIP? Many other products running in our cluster do not set clusterIP: None on their Services, and the Datadog agent scrapes their metrics without issue.

You can find the Datadog agent logs here as well.
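
If a ClusterIP-backed Service is a hard requirement for the agent setup, one workaround is to apply a separate Service alongside the chart release. A minimal sketch, with an illustrative name and namespace, and a selector that must be adjusted to whatever labels the chart actually puts on the descheduler pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: descheduler-metrics     # illustrative name
  namespace: kube-system        # illustrative namespace
spec:
  # clusterIP is intentionally left unset so Kubernetes allocates one
  selector:
    app.kubernetes.io/name: descheduler   # assumption: must match the chart's pod labels
  ports:
    - name: metrics
      port: 10258
      targetPort: 10258
```

Note that with two descheduler replicas such a Service would still round-robin across both pods, which is the incomplete-metrics caveat raised above.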

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 10, 2024