Add instanceId to health check log #568

Open: wants to merge 1 commit into base: mainline



@dil-ddecarvalhogomes commented May 16, 2024

Description of changes:

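I need to create a log-based monitoring alarm for SSM Agents. The instance ID currently appears only in the log stream name, not in the log line itself, so a metric filter cannot extract it and publish it as a dimension. Without that dimension, the alarm cannot tell which instance stopped reporting, which makes it useless.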

This change therefore outputs the instanceId in the health check log line:

[Screenshot: SSM Agent health check log lines before and after the change]

The top couple of lines are from the stock AWS SSM Agent (no instance ID); the bottom lines are from my patched version.

With that change in place, the following AWS CDK v2 code creates the monitor I needed:

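  // Assumes: import * as logs from "aws-cdk-lib/aws-logs";
  // InfraContext is a project-specific type that exposes the agent's log group.
  createSSMAgentHealthMetricFilter(
    context: InfraContext
  ): logs.MetricFilter {
    // Split each log event into space-separated fields. With the patched agent,
    // the 7th field is the instance ID; "..." matches any remaining fields.
    const pattern = logs.FilterPattern.spaceDelimited(
      "w1",
      "w2",
      "w3",
      "w4",
      "w5",
      "w6",
      "InstanceId",
      "w7",
      "w8",
      "..."
    )
      // Only match health check lines.
      .whereString("w6", "=", "HealthCheck")
      .whereString("w8", "=", "reporting");

    // Publish a value of 1 per matching line, dimensioned by the extracted instance ID.
    const metricFilter = new logs.MetricFilter(
      this,
      "SSMAgentHealthLogFilter",
      {
        filterPattern: pattern,
        logGroup: context.ssmAgentLogGroup,
        metricNamespace: "SSM/Agent",
        metricName: "SSMAgentHealthLog",
        metricValue: "1",
        dimensions: {
          InstanceId: "$InstanceId",
        },
      }
    );

    return metricFilter;
  }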
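To make the column layout concrete, here is a simplified TypeScript emulation of what the space-delimited pattern above does with a single log line. The sample log lines, the `extractFields`/`toMetricDatum` helpers, and the `MetricDatum` shape are hypothetical illustrations, not the agent's real output or a CloudWatch API. The sketch splits on whitespace only and ignores how CloudWatch Logs groups quoted and bracketed text into single fields.

```typescript
// Simplified, hypothetical emulation of the space-delimited metric filter above.
// Splits on whitespace only; real CloudWatch Logs also treats quoted and
// bracketed text as single fields, which this sketch ignores.

// Same column names as the CDK pattern; "..." matches any remaining fields.
const columns = ["w1", "w2", "w3", "w4", "w5", "w6", "InstanceId", "w7", "w8", "..."];

interface MetricDatum {
  namespace: string;
  metricName: string;
  dimensions: Record<string, string>;
  value: number;
}

// Map each named column to its token, or return null if the field count does not fit.
function extractFields(line: string, cols: string[]): Record<string, string> | null {
  const tokens = line.trim().split(/\s+/);
  const hasEllipsis = cols[cols.length - 1] === "...";
  const named = hasEllipsis ? cols.slice(0, -1) : cols;
  const lengthOk = hasEllipsis
    ? tokens.length >= named.length
    : tokens.length === named.length;
  if (!lengthOk) {
    return null;
  }
  const fields: Record<string, string> = {};
  named.forEach((name, i) => {
    fields[name] = tokens[i];
  });
  return fields;
}

// Apply both whereString conditions and build the datum the filter would publish.
function toMetricDatum(line: string): MetricDatum | null {
  const fields = extractFields(line, columns);
  if (fields === null || fields["w6"] !== "HealthCheck" || fields["w8"] !== "reporting") {
    return null;
  }
  return {
    namespace: "SSM/Agent",
    metricName: "SSMAgentHealthLog",
    dimensions: { InstanceId: fields["InstanceId"] }, // the "$InstanceId" substitution
    value: 1,
  };
}

// Hypothetical log lines, chosen to fit the column layout; the exact agent format is not shown here.
const patchedLine =
  "2024-05-16 10:00:00 INFO [ssm-agent-worker] [HealthCheck] HealthCheck i-0abc123 is reporting agent health.";
const stockLine =
  "2024-05-16 10:00:00 INFO [ssm-agent-worker] [HealthCheck] HealthCheck reporting agent health.";

console.log(toMetricDatum(patchedLine)?.dimensions.InstanceId); // i-0abc123
console.log(toMetricDatum(stockLine)); // null
```

In this sketch, the line without an instance ID shifts every field after `HealthCheck` by one, so `w8` no longer lands on `reporting` and nothing is published. With the ID present, the seventh field becomes the `InstanceId` dimension.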

... then instantiate the alarm in a different stack ...

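  // Assumes: import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
  //          import * as ec2 from "aws-cdk-lib/aws-ec2";
  // ServiceContext is a project-specific type.
  // Create a CloudWatch alarm for the SSM Agent health check metric.
  createSsmAgentHealthLogAlarm(
    context: ServiceContext,
    instance: ec2.Instance,
    instanceIndex: number
  ) {
    // Must match the namespace, metric name, and dimension published by the metric filter.
    const metric = new cloudwatch.Metric({
      namespace: "SSM/Agent",
      metricName: "SSMAgentHealthLog",
      statistic: "Sum",
      dimensionsMap: {
        InstanceId: instance.instanceId,
      },
    });

    // Alarm when fewer than 1 health check line is counted in 5 consecutive
    // periods (the Metric's default period is 5 minutes); periods with no
    // data at all also count as breaching.
    new cloudwatch.Alarm(this, `SSMAgentErrorAlarm${instanceIndex}`, {
      metric,
      threshold: 1,
      evaluationPeriods: 5,
      comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
      alarmName: "SSMAgentHealthAlarm[" + instance.instanceId + "]",
      alarmDescription:
        "SSM Agent not reporting health checks on " + instance.instanceId,
      actionsEnabled: true,
      treatMissingData: cloudwatch.TreatMissingData.BREACHING,
    });
  }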
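The alarm's behavior can be sketched the same way. This is a simplified, hypothetical model (`alarmState` is not a CloudWatch API): with `datapointsToAlarm` left at its default, the alarm fires only when all of the last five periods breach, and `TreatMissingData.BREACHING` makes a period with no health check lines count as a breach. It ignores the `INSUFFICIENT_DATA` state and CloudWatch's finer evaluation-range rules.

```typescript
// Simplified, hypothetical model of the alarm above: it fires when the last
// `evaluationPeriods` datapoints all breach (datapointsToAlarm defaults to
// evaluationPeriods), and missing data counts as breaching.
type Datapoint = number | undefined; // undefined = no health check lines in that period

type AlarmState = "ALARM" | "OK";

function alarmState(sums: Datapoint[], threshold: number, evaluationPeriods: number): AlarmState {
  const window = sums.slice(-evaluationPeriods);
  // Periods with no recorded history are treated as missing data.
  while (window.length < evaluationPeriods) {
    window.unshift(undefined);
  }
  // LESS_THAN_THRESHOLD combined with TreatMissingData.BREACHING.
  const allBreaching = window.every((v) => v === undefined || v < threshold);
  return allBreaching ? "ALARM" : "OK";
}

console.log(alarmState([1, 1, 1, 1, 1], 1, 5)); // OK
console.log(alarmState([1, undefined, undefined, undefined, undefined, undefined], 1, 5)); // ALARM
console.log(alarmState([undefined, undefined, undefined, undefined, 1], 1, 5)); // OK
```

The last case shows why a single health check line in the window is enough to keep the alarm in `OK`: every one of the five periods has to breach before it fires.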

A reasonable alternative would be to install the CloudWatch agent and configure a process monitor, which requires a PowerShell script. That is a LOT of extra steps just to get the SSM agent monitored.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
