Karpenter integration incompatible with Karpenter >= 1.0.0 #18367

Open
JacobHenner opened this issue Aug 19, 2024 · 3 comments

Comments

@JacobHenner

Karpenter's 1.0.0 release renames several metrics. After upgrading to 1.0.0, new data points for the previously reported metrics are no longer accessible in Datadog.

Steps to reproduce the issue:

  1. Upgrade Karpenter from 0.x.y to >= 1.0.0
  2. View Karpenter metrics in Datadog

Describe the results you received:

Several metrics are no longer reported.

Describe the results you expected:

Metrics continue to report (or continue to report following a datadog-agent upgrade)

Additional information you deem important (e.g. issue happens only occasionally):

I can submit a PR to update the integration, but I'm not sure whether there's an existing convention here: should both the input (Karpenter) and output (Datadog) metric names be renamed, or only the input, so that existing monitors, dashboards, etc. keep working? I'll gladly submit a PR once guidance is provided.

@JacobHenner
Author

For spectators: I'm told that #18448 is expected to be included in datadog-agent 7.58. In the meantime, you can continue to ingest metrics from Karpenter >= 1.0.0 by mapping the renamed metrics back to their previous names with the following configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/controller.checks: |
          {
            "karpenter": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "extra_metrics": [
                    {
                      "karpenter_nodes_termination_duration_seconds": "nodes.termination.time_seconds"
                    },
                    {
                      "karpenter_pods_startup_duration_seconds": "pods.startup.time_seconds"
                    },
                    {
                      "karpenter_voluntary_disruption_queue_failures": "disruption.replacement.nodeclaim.failures"
                    },
                    {
                      "karpenter_voluntary_disruption_decision_evaluation_duration_seconds": "disruption.evaluation.duration_seconds"
                    },
                    {
                      "karpenter_voluntary_disruption_eligible_nodes": "disruption.eligible_nodes"
                    },
                    {
                      "karpenter_voluntary_disruption_consolidation_timeouts": "disruption.consolidation_timeouts"
                    },
                    {
                      "karpenter_nodepools_allowed_disruptions": "disruption.budgets.allowed_disruptions"
                    },
                    {
                      "karpenter_voluntary_disruption_decisions": "disruption.actions_performed"
                    },
                    {
                      "karpenter_scheduler_scheduling_duration_seconds": "provisioner.scheduling.simulation.duration_seconds"
                    },
                    {
                      "karpenter_scheduler_queue_depth": "provisioner.scheduling.queue_depth"
                    },
                    {
                      "karpenter_interruption_message_queue_duration_seconds": "interruption.message.latency.time_seconds"
                    },
                    {
                      "karpenter_nodepools_usage": "nodepool_usage"
                    },
                    {
                      "karpenter_nodepools_limit": "nodepool_limit"
                    }
                  ]
                }
              ]
            }
          }
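
If hand-editing the JSON is error-prone, the annotation value can also be generated. The following is only a minimal sketch in Python: the helper name `build_karpenter_check` and the shortened `RENAMED_METRICS` table are illustrative and not part of the integration, and the metric names are copied from the configuration above.

```python
import json

# Illustrative subset of the renames from the annotation above:
# Karpenter >= 1.0.0 metric name -> name used in the configuration above.
RENAMED_METRICS = {
    "karpenter_nodes_termination_duration_seconds": "nodes.termination.time_seconds",
    "karpenter_pods_startup_duration_seconds": "pods.startup.time_seconds",
    "karpenter_nodepools_usage": "nodepool_usage",
    "karpenter_nodepools_limit": "nodepool_limit",
}


def build_karpenter_check(renames, endpoint="http://%%host%%:8080/metrics"):
    """Return the JSON string for the ad.datadoghq.com/controller.checks annotation.

    Hypothetical helper: it only reproduces the structure shown above,
    with one single-key mapping per renamed metric under "extra_metrics".
    """
    instance = {
        "openmetrics_endpoint": endpoint,
        "extra_metrics": [{raw: name} for raw, name in renames.items()],
    }
    check = {"karpenter": {"init_config": {}, "instances": [instance]}}
    return json.dumps(check, indent=2)


if __name__ == "__main__":
    # Paste the printed JSON as the value of the annotation.
    print(build_karpenter_check(RENAMED_METRICS))
```

Extend `RENAMED_METRICS` with the remaining entries from the configuration above, and pass a different `endpoint` if your metrics port is not 8080.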

@aelliottatsonatype

If I'm using the helm chart, where does this code go? Is it under the agents section of the chart?

So far I have not been able to get this working.

@visokoo

visokoo commented Sep 13, 2024

> If I'm using the helm chart, where does this code go? Is it under the agents section of the chart?
>
> So far I have not been able to get this working.

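It goes under podAnnotations in the Karpenter chart's values (the annotation targets Karpenter's controller container, not the Datadog agent), like: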

podAnnotations:
  ad.datadoghq.com/controller.checks: |
    {
      "karpenter": {
        "init_config": {},
        "instances": [
          {
            "openmetrics_endpoint": "http://%%host%%:%%port_1%%/metrics",
            "extra_metrics": [
              {
                "karpenter_nodes_termination_duration_seconds": "nodes.termination.time_seconds"
              },
              {
                "karpenter_pods_startup_duration_seconds": "pods.startup.time_seconds"
              },
...
