
[RFC] Integration of Prometheus Push Gateway and Energy Metrics Collection in Zeus #125

Open
sharonsyh opened this issue Sep 15, 2024 · 6 comments


sharonsyh commented Sep 15, 2024

Integration of Prometheus Push Gateway and Energy Metrics Collection in Zeus

Issue #30

Motivation

The goal of this proposal is to streamline the integration of energy consumption metrics (e.g., for GPU, CPU, and DRAM) into Prometheus using a push-based model with the Prometheus Push Gateway. It also proposes support for Counters and Histograms to provide detailed insight into energy consumption, and for Gauges to track power consumption trends, across different windows and time periods.

Background

Zeus already measures and records energy consumption for different components, but storing and visualizing these metrics in a time-series database has been challenging. Prometheus is a highly efficient monitoring system that provides real-time, multi-dimensional data collection and visualization.

Prometheus typically follows a pull model, where it scrapes metrics from an endpoint. However, in environments where long-running processes or batch jobs run on distributed systems, a push model becomes necessary. This is achieved using the Prometheus Push Gateway, which allows applications to push intermediate results (such as energy metrics) from batch jobs.
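For reference, a minimal push-model sketch with the official prometheus_client library (the metric name here is a placeholder, and localhost:9091 is the Push Gateway address assumed throughout this proposal):

```python
# Minimal push-model sketch using the official prometheus_client library.
# Pushing only succeeds if a Push Gateway is actually running at the address.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
gauge = Gauge(
    "batch_job_energy_joules",  # placeholder metric name
    "Energy consumed by the last batch job",
    registry=registry,
)
gauge.set(123.4)

try:
    push_to_gateway("localhost:9091", job="energy_monitoring", registry=registry)
except OSError:
    pass  # no gateway running locally; in production the endpoint must be reachable

# The registry can also be inspected locally, without a gateway:
value = registry.get_sample_value("batch_job_energy_joules")
```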

Proposed Design

The integration will introduce three main metric types to capture energy data from Zeus:

  • Histogram: Records energy consumption over different windows, with custom bucket ranges based on hardware.

    • Buckets will track total energy consumption for components like GPU, CPU, and DRAM.
    • Warnings will be triggered if energy exceeds a specified bucket threshold.
  • Counter: Tracks the cumulative energy consumed across multiple windows over time. This metric is especially useful for long-running processes.

    • Counter is updated periodically (e.g., every 5 seconds).
  • Gauge: This will be implemented later to monitor power consumption at specific instants.

Each metric will be registered with a Prometheus CollectorRegistry, which organizes and tracks all metrics before they are pushed to the Push Gateway.

How the Collector Registry in Prometheus works in detail

[Diagram: how the Collector Registry in Prometheus collects metrics before they are pushed]

Implementation

Usage Example:

# Histogram for total energy consumption over different windows
metric = HistogramMetric(monitor=monitor, bucket_ranges=[50.0, 100.0, 200.0, 300.0, 400.0, 500.0, 1000.0, 5000.0])
metric.begin_window("window1")
metric.end_window("window1")

metric = HistogramMetric(monitor=monitor) # Using default bucket ranges
metric.begin_window("window2")
metric.end_window("window2")
# Counter for cumulative energy consumption
metric = CounterMetric(monitor=monitor, update_period=5)
metric.begin_window("window1")  # Starts a polling process 

Code Breakdown:

Collector Registry

In both classes, the CollectorRegistry manages the metrics to be pushed to Prometheus. While a default registry can be used, a custom registry is being employed here for better separation and control over metrics.

registry = CollectorRegistry()

Abstract Base Class (ABC)

The Metric class is an abstract base class (ABC) that defines a common interface for different types of metrics. It provides two abstract methods, begin_window and end_window, that must be implemented by any subclasses, such as HistogramMetric or CounterMetric.

class Metric(abc.ABC):  
    @abc.abstractmethod
    def begin_window(self, name: str):
        pass

    @abc.abstractmethod
    def end_window(self, name: str):
        pass

HistogramMetric Class

The HistogramMetric class tracks energy consumption for GPU, CPU, and DRAM across pre-defined buckets. It uses prometheus_client.Histogram to observe energy consumption at the end of each window.

  • Constructor: __init__(self, monitor, bucket_ranges=None)

    • Parameters:
      • monitor: This parameter is an instance of ZeusMonitor, which handles the collection of energy data from different hardware components like GPU, CPU, and DRAM.
      • bucket_ranges: An optional parameter that defines the bucket ranges for the histogram. If not provided, a default set of bucket ranges will be used.
  • Histogram Initialization (self.energy_histogram):

    • total_energy: The name of the histogram metric.
    • Total energy consumed: The description of the metric.
    • ['component', 'window']: These labels ensure that the histogram tracks both the component (e.g., CPU, GPU, DRAM) and the specific window during which the energy consumption occurred.
    • buckets=bucket_ranges: The provided bucket ranges define how the data will be aggregated.
    • registry=registry: This argument links the histogram to a Prometheus CollectorRegistry, which organizes and manages the collected metrics.
class HistogramMetric(Metric):

    def __init__(self, monitor, bucket_ranges=None):
        if bucket_ranges is None:
            bucket_ranges = [50.0, 100.0, 200.0, 300.0, 400.0, 500.0, 1000.0, 5000.0, 10000.0] # Default bucket ranges
        self.energy_histogram = Histogram(
            'total_energy', 
            'Total energy consumed', 
            ['component', 'window'],
            buckets=bucket_ranges,
            registry=registry
        )
        self.max_bucket = max(bucket_ranges) # Save the maximum bucket value for warnings when exceeded
        self.monitor = monitor
  • Start Energy Monitoring Window
    • begin_window(self, name: str) calls self.monitor.begin_window(f"__HistogramMetric_{name}"), where self.monitor is an instance of ZeusMonitor.
    def begin_window(self, name: str):
        self.monitor.begin_window(f"__HistogramMetric_{name}")
  • Stop Energy Monitoring Window
    • end_window(self, name: str) retrieves the collected energy data, updates the histogram, triggers warnings for high energy usage, and pushes the metrics to the Prometheus Push Gateway.
    • The self.energy_histogram.labels(component="gpu").observe(total_gpu_energy) call registers the GPU energy in the Prometheus histogram under the "gpu" label.
    • push_to_gateway('localhost:9091', job='energy_monitoring', registry=registry) pushes the collected energy data to the Prometheus Push Gateway. This makes the histogram data available for external systems to query and visualize.
    def end_window(self, name: str):
        measurement = self.monitor.end_window(f"__HistogramMetric_{name}")

        total_gpu_energy = sum(measurement.gpu_energy.values())
        if total_gpu_energy > self.max_bucket:
            warnings.warn(f"GPU energy {total_gpu_energy} exceeds the maximum bucket value of {self.max_bucket}")
        self.energy_histogram.labels(component="gpu", window=f"__HistogramMetric_{name}").observe(total_gpu_energy)

        if measurement.cpu_energy:
            total_cpu_energy = sum(measurement.cpu_energy.values())
            if total_cpu_energy > self.max_bucket:
                warnings.warn(f"CPU energy {total_cpu_energy} exceeds the maximum bucket value of {self.max_bucket}")
            self.energy_histogram.labels(component="cpu", window=f"__HistogramMetric_{name}").observe(total_cpu_energy)

        if measurement.dram_energy:
            total_dram_energy = sum(measurement.dram_energy.values())
            if total_dram_energy > self.max_bucket:
                warnings.warn(f"DRAM energy {total_dram_energy} exceeds the maximum bucket value of {self.max_bucket}")
            self.energy_histogram.labels(component="dram", window=f"__HistogramMetric_{name}").observe(total_dram_energy)
            
        push_to_gateway('localhost:9091', job='energy_monitoring', registry=registry)

CounterMetric Class

The CounterMetric class continuously tracks cumulative energy consumption and pushes the updates at specified intervals (e.g., every 5 seconds).

  • Constructor: __init__(self, monitor, update_period: int):
    • Parameters:
      • monitor: An instance of ZeusMonitor used to track energy data.
      • update_period: An integer specifying the frequency (in seconds) for polling the energy data.
class CounterMetric(Metric):

    def __init__(self, monitor, update_period: int):
        # The constructor to accept monitor and update_period
        self.monitor = monitor  
        self.update_period = update_period
  • Start Energy Monitoring Window
    • self.queue = mp.Queue() creates a queue to communicate with the child process. This will be used to signal the child process to stop.
    • self.proc = mp.Process(target=self._poll, args=(name, self.queue, self.monitor, self.update_period)) creates a new process to run the _poll method. The process is passed the name, queue, monitor, and polling period as arguments.
    • self.proc.start() starts the child process, which will handle the background polling of energy data while the main process continues its execution.
    def begin_window(self, name: str):
        self.queue = mp.Queue()
        self.proc = mp.Process(target=self._poll, args=(name, self.queue, self.monitor, self.update_period))
        self.proc.start() #The main process continue as normal and the new process performs background work, running the _poll method
  • Stop Energy Monitoring Window
    • self.queue.put("stop") places a "stop" message in the queue to notify the child process to terminate.
    • self.proc.join() waits for the child process to finish execution and terminate cleanly before proceeding.
    def end_window(self, name: str):
        self.queue.put("stop")
        self.proc.join()
  • Polling Method

    • The _poll method is executed by the child process. It polls the energy consumption data at regular intervals (as specified by update_period) and increments the Prometheus counter metric with the energy data.
    • The while loop continuously polls the energy data until a "stop" signal is received in the queue (while len(pipe) == 0). This checks if the queue is empty, which means no stop signal has been sent yet.
    • metric.labels(window_name=f"__CounterMetric_{name}").inc(total_energy) increments the counter with the energy data from the measurement. It associates this data with the specific window using the window_name label.
    def _poll(self, name: str, pipe: mp.Queue, monitor: monitor, update_period: float):
        metric = Counter(
            'counter_metric',  # Static metric name
            'Measures total energy consumption',
            ['window_name'],  # Label for the dynamic window name
            registry = registry
        )

        while len(pipe) == 0:  # Not stopped yet. (Until you get the signal "stop")
            monitor.begin_window(f"__CounterMetric_{name}")
            time.sleep(poll_period)
            measurement = monitor.end_window(f"__CounterMetric_{name}")
            total_energy = sum(measurement.gpu_energy.values()) + sum(measurement.cpu_energy.values()) + sum(measurement.dram_energy.values())
            metric.labels(window_name=f"__CounterMetric_{name}").inc(total_energy)
            push_to_gateway('localhost:9091', job='energy_consumption_total', registry=registry)

    # Doesn't really do much, because the counter metric has been pushed to Prometheus continuously over time (by the polling process)
    metric.end_window("name")

Next Steps

  • Include prometheus client library as optional dependency
  • Implement Gauge metric for power consumption monitoring
  • Write end to end tests for each type of metric to ensure the proper functioning of both HistogramMetric and CounterMetric
    • Spin up Prometheus server and gateway using docker
    • Query the Prometheus server to check if HistogramMetric, CounterMetric and GaugeMetric behave as expected
@jaywonchung (Member)

Thanks @sharonsyh for the awesome write-up!! This is going to be a great addition to Zeus.

  • It's not exactly clear to me what CollectorRegistry is for. What is the role of this class? Is this part of the Prometheus client library?
  • Similarly, is the Histogram class part of the Python Prometheus client library? If so, I want you to clearly distinguish what is going to be built as part of Zeus and what is going to be imported from external libraries.
  • I think there should be a way for users to specify the endpoint of the Prometheus Push Gateway when defining metrics.
  • Since you're designing the histogram and counter metrics, can you also do it for gauge? It's not likely but you may find that the gauge metric is not exactly compatible with some parts of the current design (e.g., histogram, counter), in which case it's better to adjust the design early on before we do any work. I presume it will be very similar to CounterMetric.
  • I would use update_period instead of poll_period in CounterMetric.
  • I think using the default bucket range should also result in a warning, since it's likely that the default range will either be too small or too large.
  • I think it's very likely that GPU, CPU, and DRAM energy values have vastly different ranges. Thus, I think bucket ranges should be separately configurable and should have different defaults. Do different labels allow different buckets?
  • Which module are these going to be in? zeus.metrics?
  • In CounterMetric, shouldn't it be an mp.Queue?
  • I suppose when the polling process receives a "stop", it should push the energy value one last time to Prometheus before terminating.
  • In push_to_gateway (I'm assuming this is also part of the Prometheus client library), what is job? Would the user want to customize this?


sharonsyh commented Sep 19, 2024

1. It's not exactly clear to me what CollectorRegistry is for. What is the role of this class? Is this part of the Prometheus client library?

CollectorRegistry is part of the Prometheus client library which acts as the collection hub for all metrics (Histogram, Counter, Gauge). Every metric is registered to a registry that tracks the metric values before pushing them to the Prometheus Push Gateway.

Strictly speaking, a custom CollectorRegistry is not necessary for us, since we only have one registry and the default registry automatically tracks all metrics we define (like Histograms, Gauges, and Counters) without manually passing a registry. However, to have full control over which metrics are pushed and to separate metrics into different groups or namespaces, customizing the registry (using CollectorRegistry) seems to be the better approach.
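To illustrate the difference, a minimal sketch with prometheus_client (metric names here are placeholders):

```python
from prometheus_client import REGISTRY, CollectorRegistry, Counter

# With no registry argument, a metric lands in the global default REGISTRY.
default_counter = Counter("zeus_default_example", "Tracked by the default registry")

# With a custom CollectorRegistry, only explicitly attached metrics are
# tracked, so each push can be scoped to exactly this group of metrics.
custom_registry = CollectorRegistry()
scoped_counter = Counter(
    "zeus_scoped_example",
    "Tracked by a custom registry",
    registry=custom_registry,
)
scoped_counter.inc(2)
```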

2. Similarly, is the Histogram class part of the Python Prometheus client library? If so, I want you to clearly distinguish what is going to be built as part of Zeus and what is going to be imported from external libraries.

The Histogram class is also part of the Prometheus client library. It is used to create and record histogram data, such as total energy consumption, with the capability to track consumption across different buckets.

  • From Zeus:

    • HistogramMetric, CounterMetric, and eventually GaugeMetric are custom classes designed in Zeus to wrap the Prometheus Histogram, Counter, and Gauge types and integrate them with ZeusMonitor.
  • From Prometheus Client Library:

    • CollectorRegistry: Manages and registers metrics.
    • Histogram: Tracks the distribution of energy consumption across pre-defined buckets.
    • Counter: Tracks cumulative energy consumption.
    • Gauge: (Will be used for instantaneous power consumption values).
    • push_to_gateway: Pushes metrics from Zeus to the Prometheus Push Gateway.

3. I think there should be a way for users to specify the endpoint of the Prometheus Push Gateway when defining metrics.

I agree it’s a good idea to allow the user to specify the Push Gateway endpoint at the time of defining the metrics. I assume it could be implemented with the code below:

    metric = HistogramMetric(monitor=monitor, bucket_ranges=[50, 100, 200], prometheus_url='<push-gateway-endpoint>')

4. Since you're designing the histogram and counter metrics, can you also do it for gauge? It's not likely but you may find that the gauge metric is not exactly compatible with some parts of the current design (e.g., histogram, counter), in which case it's better to adjust the design early on before we do any work. I presume it will be very similar to CounterMetric.

Yes, I have a plan to work on the structure for Gauge soon. I'll update this write-up once I get it done!
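A hypothetical sketch of one possible shape it could take, mirroring CounterMetric (class and metric names here are placeholders, not the final design):

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

class GaugeMetric:
    """Hypothetical sketch only: a Gauge wrapper mirroring CounterMetric.

    A polling process would periodically set the instantaneous power draw
    reported by ZeusMonitor on this gauge (ZeusMonitor is stubbed out here).
    """

    def __init__(self, monitor=None, update_period: int = 5):
        self.monitor = monitor
        self.update_period = update_period
        self.power_gauge = Gauge(
            "power_consumption_watts",  # placeholder metric name
            "Instantaneous power consumption",
            ["component", "window"],
            registry=registry,
        )

    def record(self, component: str, window: str, watts: float) -> None:
        # Unlike a Counter, a Gauge is set to the current value, not incremented.
        self.power_gauge.labels(component=component, window=window).set(watts)
```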

5. I would use update_period instead of poll_period in CounterMetric.

Got it!

6. I think using the default bucket range should also result in a warning, since it's likely that the default range will either be too small or too large. I think it's very likely that GPU, CPU, and DRAM energy values have vastly different ranges. Thus, I think bucket ranges should be separately configurable and should have different defaults. Do different labels allow different buckets?

We can't have different bucket ranges for each component as each Histogram metric applies its bucket ranges globally across all labels for that particular histogram.

If we are to use different default bucket ranges, I assume we have to define Histogram metrics separately for each component with its respective bucket ranges. Since it's very likely that users might only use one of GPU, CPU, or DRAM, creating separate histograms for each component with appropriate bucket ranges would allow flexibility while preventing issues with bucket ranges that might be too broad or too narrow.

# Previously
self.energy_histogram = Histogram(
            'total_energy', 
            'Total energy consumed', 
            ['component', 'window'],
            buckets=bucket_ranges,
            registry=registry
)

# Possible solution
# Separating the Histogram metrics by each component
# default_gpu_buckets, default_cpu_buckets, and default_dram_buckets will be defined separately

def __init__(self, monitor, gpu_buckets=None, cpu_buckets=None, dram_buckets=None):
    self.gpu_histogram = Histogram('gpu_energy', 'GPU energy consumption', buckets=gpu_buckets or default_gpu_buckets, registry=registry)
    self.cpu_histogram = Histogram('cpu_energy', 'CPU energy consumption', buckets=cpu_buckets or default_cpu_buckets, registry=registry)
    self.dram_histogram = Histogram('dram_energy', 'DRAM energy consumption', buckets=dram_buckets or default_dram_buckets, registry=registry)

7. Which module are these going to be in? zeus.metrics?

The zeus.metrics module (zeus/metrics.py).

8. In CounterMetric, shouldn't it be an mp.Queue?

My bad, should be mp.Queue!

9. I suppose when the polling process receives a "stop", it should push the energy value one last time to Prometheus before terminating.

Code Modified!

10. In push_to_gateway (I'm assuming this is also part of the Prometheus client library), what is job? Would the user want to customize this?

The job parameter of the push_to_gateway() function, which is part of the Prometheus client library, serves as a tag or identifier for a specific job or process. It groups the metrics pushed to the Prometheus Push Gateway under a job label.

# for energy monitoring metrics
push_to_gateway('localhost:9091', job='energy_monitoring', registry=registry)

# for power monitoring metrics
push_to_gateway('localhost:9091', job='power_monitoring', registry=registry)

Users might want to customize the job name, since it determines how the pushed metrics are grouped and organized in Prometheus.


jaywonchung commented Sep 19, 2024

1. It's not exactly clear to me what CollectorRegistry is for. What is the role of this class? Is this part of the Prometheus client library?

Sounds good, let's keep the current registry design! But if it ends up being that we never expose any Prometheus client APIs to our users (e.g., I think users never call push_to_gateway themselves under the current design?), this may not be needed as well. Let's reevaluate this decision later.

2. Similarly, is the Histogram class part of the Python Prometheus client library? If so, I want you to clearly distinguish what is going to be built as part of Zeus and what is going to be imported from external libraries.

Do you think a more power/energy-specific naming is appropriate, especially since Zeus class names and Prometheus client class names are very similar? Something like EnergyCumulativeCounter, EnergyHistogram, and PowerGauge.

3. I think there should be a way for users to specify the endpoint of the Prometheus Push Gateway when defining metrics.

Cool, now this can internally call push_to_gateway using prometheus_url.

6. I think using the default bucket range should also result in a warning, since it's likely that the default range will either be too small or too large. I think it's very likely that GPU, CPU, and DRAM energy values have vastly different ranges. Thus, I think bucket ranges should be separately configurable and should have different defaults. Do different labels allow different buckets?

I see, yeah. Since CPU/GPU/DRAM energy are distinct, it isn't too strange for them to be separate metrics. It will be important to have a consistent naming convention so that users can understand that a set of CPU/GPU/DRAM metrics are from the same Zeus Histogram metric. I guess something like f"{window_name}_gpu0_energy_joules". With that said, Prometheus metrics have naming conventions, so please make sure the metric names we generate follow best practices.
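As a sanity check, generated names can be validated against Prometheus's metric-name rule (the window name below is a hypothetical example):

```python
import re

# Prometheus metric names must match this pattern; colons are reserved for
# recording rules, so generated names should stick to [a-zA-Z0-9_].
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

window_name = "training_epoch"  # hypothetical window name
gpu_index = 0
metric_name = f"{window_name}_gpu{gpu_index}_energy_joules"
```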

9. I suppose when the polling process receives a "stop", it should push the energy value one last time to Prometheus before terminating.

Actually I think I might have read something strangely last time 😅 With the modified code snippet, monitor.end_window will be called on a window that was never started, which is an error. After coming back from time.sleep, the process will unconditionally do a push first and then check whether it has to stop. I think this is good enough.

10. In push_to_gateway (I'm assuming this is also part of the Prometheus client library), what is job? Would the user want to customize this?

Under the current design, are users ever supposed to call push_to_gateway themselves? I think our metric classes call push_to_gateway under the hood, and that's why we needed to add the prometheus_url parameter. If job is what users would be interested in setting, our metric class constructors should accept it.

@sharonsyh (Collaborator, Author)

I'm a little curious under what circumstances monitor.end_window would be called on a window that was never started. Since our polling method operates in a single child process, which runs synchronously without interruption, and we consistently push metrics without requiring end_window, I'm not sure when this scenario would occur.

@jaywonchung (Member)

I was eyeballing this code snippet:

    def _poll(self, name: str, pipe: mp.Queue, monitor: monitor, update_period: float):
        metric = Counter(
            'counter_metric',  # Static metric name
            'Measures total energy consumption',
            ['window_name'],  # Label for the dynamic window name
            registry = registry
        )

        while True:  # Not stopped yet. (Until you get the signal "stop")
            if not pipe.empty():
                signal = pipe.get()
                if signal == "stop":
                    measurement = monitor.end_window(f"__CounterMetric_{name}")
                    total_energy = sum(measurement.gpu_energy.values()) + sum(measurement.cpu_energy.values()) + sum(measurement.dram_energy.values())
                    metric.labels(window_name=f"__CounterMetric_{name}").inc(total_energy)
                    push_to_gateway('localhost:9091', job='energy_consumption_total', registry=registry)
                    break    
            monitor.begin_window(f"__CounterMetric_{name}")
            time.sleep(update_period)
            measurement = monitor.end_window(f"__CounterMetric_{name}")
            total_energy = sum(measurement.gpu_energy.values()) + sum(measurement.cpu_energy.values()) + sum(measurement.dram_energy.values())
            metric.labels(window_name=f"__CounterMetric_{name}").inc(total_energy)
            push_to_gateway('localhost:9091', job='energy_consumption_total', registry=registry)

    # Doesn't really do much, because the counter metric has been pushed to Prometheus continuously over time (by the polling process)
    metric.end_window("name")
  1. The polling process goes to sleep by calling time.sleep.
  2. While the polling process is asleep, the outside world tells it to stop with pipe.put("stop").
  3. The polling process wakes up and calls monitor.end_window, pushes to the gateway, and moves to the next iteration of the outer while loop.
  4. It finds that the pipe is not empty and the message equals "stop", at which point it calls monitor.end_window again. monitor.begin_window was never called, so this is an error.
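For reference, the "push first, then check for stop" ordering agreed on above can be sketched with stdlib stand-ins (a thread replaces mp.Process to keep the sketch self-contained, and the ZeusMonitor measurement and gateway push are mocked):

```python
import queue
import threading
import time

def poll_loop(stop_queue: queue.Queue, out_queue: queue.Queue, update_period: float) -> None:
    """Sketch of the agreed ordering: measure and push first, then check for
    "stop", so end_window is never called on a window that was not started."""
    total_energy = 0.0
    while True:
        # monitor.begin_window(...) would be called here
        time.sleep(update_period)
        # monitor.end_window(...) would return a measurement; mocked as 1.0 J
        total_energy += 1.0
        out_queue.put(total_energy)  # stands in for push_to_gateway(...)
        # Only after pushing do we check for the stop signal.
        if not stop_queue.empty() and stop_queue.get() == "stop":
            break

stop_q, out_q = queue.Queue(), queue.Queue()
worker = threading.Thread(target=poll_loop, args=(stop_q, out_q, 0.01))
worker.start()
time.sleep(0.05)
stop_q.put("stop")
worker.join()
pushed = [out_q.get() for _ in range(out_q.qsize())]
```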

@sharonsyh (Collaborator, Author)

Got it, thanks for the explanation! I'll also update the code based on your feedback in terms of naming conventions and constructor.

jaywonchung linked a pull request Oct 17, 2024 that will close this issue.