Prometheus integration #133

Closed
wants to merge 92 commits into from

sharonsyh
Collaborator

sharonsyh commented Oct 17, 2024

This pull request introduces Prometheus-based metric tracking for energy and power usage within the Zeus framework. It monitors GPU, CPU, and DRAM energy usage with histograms and cumulative counters, and power draw with gauges.
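The three Prometheus metric types named above behave differently: a counter only goes up (a natural fit for cumulative joules), a gauge holds the latest reading (such as current watts), and a histogram sorts observations into cumulative buckets while keeping a running sum and count. The stdlib-only sketch below mimics those semantics with hypothetical toy classes (ToyCounter, ToyGauge, ToyHistogram); it is not the PR's EnergyHistogram, EnergyCumulativeCounter, or PowerGauge, which presumably build on prometheus-client instead.

```python
import bisect
import math


class ToyCounter:
    """Hypothetical stand-in for a Prometheus counter (e.g. total joules)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float) -> None:
        # Prometheus counters are monotonic: they never decrease.
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class ToyGauge:
    """Hypothetical stand-in for a Prometheus gauge (e.g. current watts)."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        # A gauge simply holds the most recent reading; it may go up or down.
        self.value = value


class ToyHistogram:
    """Hypothetical stand-in for a Prometheus histogram with fixed buckets."""

    def __init__(self, upper_bounds: list[float]) -> None:
        # Prometheus histograms always end with a +Inf bucket.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value ("le" semantics).
        idx = bisect.bisect_left(self.bounds, value)
        self.bucket_counts[idx] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> list[tuple[float, int]]:
        # Exposed bucket counts are cumulative: bucket "le=b" counts every
        # observation <= b, so the +Inf bucket equals the total count.
        out, running = [], 0
        for bound, n in zip(self.bounds, self.bucket_counts):
            running += n
            out.append((bound, running))
        return out
```

For example, observing 5, 10, 42, and 300 J with finite bounds 10, 50, and 100 yields cumulative bucket counts of 2, 3, 3, and 4 (the last being +Inf), a sum of 357 J, and a count of 4.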

  • zeus/metric.py:
    A new module that introduces EnergyHistogram, EnergyCumulativeCounter, and PowerGauge classes. These classes enable real-time monitoring of CPU, GPU, and DRAM energy and power consumption by integrating with Prometheus.

  • zeus/prometheus.yml:
    Configuration file for setting up Prometheus monitoring.

  • zeus/docker-compose.yml:
    A Docker Compose file for easily setting up Prometheus with the project for local or cloud-based monitoring.

  • Modified pyproject.toml:
    Added prometheus-client as an optional dependency for Prometheus metric integration.
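To make the integration more concrete, here is a stdlib-only sketch of what a push to a Prometheus Pushgateway (default port 9091) carries: one metric family rendered in the Prometheus text exposition format, sent to the /metrics/job/<job> endpoint. The metric name toy_gpu_power_watts, the gpu label, and the job name are hypothetical, and the request is only built, not sent. In practice, prometheus-client provides push_to_gateway for this, so the PR would not need to hand-roll it.

```python
import urllib.parse
import urllib.request


def render_exposition(name: str, metric_type: str, help_text: str,
                      samples: dict[str, float]) -> str:
    """Render one metric family in the Prometheus text exposition format.

    `samples` maps a value of a hypothetical `gpu` label to a reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for gpu, value in sorted(samples.items()):
        # Escaping of backslashes, quotes, and newlines is omitted here.
        lines.append(f'{name}{{gpu="{gpu}"}} {value}')
    # The format expects each line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


def build_push_request(gateway: str, job: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a Pushgateway push for `job`.

    POST replaces only the metric families present in the body; PUT would
    replace every metric in the job's group.
    """
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

Sending the request with urllib.request.urlopen against a running Pushgateway would make the samples available for Prometheus to scrape from the gateway.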

jaywonchung
Member

jaywonchung commented Oct 17, 2024

Some quick points:

  • The two YAML files do not belong in zeus/. Please move them into docker/. If appropriate, feel free to create a subdirectory inside docker/.
  • Merge conflicts should be resolved and CI should pass.
  • Tests and documentation (under docs/) are missing. If you're not sure about how to write the doc, let's discuss separately.

pyproject.toml (outdated, resolved)
      - 9091:9091
networks:
  localprom:
    driver: bridge
Member

For all files, please make sure they end with a newline. It's an editor issue.
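Trailing newlines can be enforced once rather than per file: an EditorConfig file with insert_final_newline = true, or the end-of-file-fixer hook from pre-commit-hooks, handles it automatically. As a dependency-free alternative, the hypothetical helpers below detect and optionally fix a missing final newline.

```python
from pathlib import Path


def missing_final_newline(path: Path) -> bool:
    """Return True if a non-empty file does not end with a newline byte."""
    data = path.read_bytes()
    return len(data) > 0 and not data.endswith(b"\n")


def fix_final_newline(path: Path) -> bool:
    """Append a newline if it is missing; return True if the file changed."""
    if missing_final_newline(path):
        with path.open("ab") as f:
            f.write(b"\n")
        return True
    return False


def check_files(paths: list[str], fix: bool = False) -> list[str]:
    """Return the paths that lacked a final newline, fixing them if asked."""
    offenders = []
    for name in paths:
        p = Path(name)
        if missing_final_newline(p):
            offenders.append(name)
            if fix:
                fix_final_newline(p)
    return offenders
```

Running check_files over the changed files (for example, the two YAML files) before pushing would flag the same issue raised in this review.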

zeus/metric.py (outdated, resolved)
jaywonchung
Member

jaywonchung commented Oct 18, 2024

Why was push_docker.yaml modified in the first place? I see that this branch has old commits from when you added the multi-arch docker image feature, suggesting that you continued working on an old merged branch instead of creating a new branch for Prometheus integration. If push_docker.yaml was modified in master separately, this will create a merge conflict. Instead of trying to resolve this merge conflict, I would rather create a fresh new branch from master and just copy over the new/modified files to the new branch. If you decide to do that, we can just close the PR and you can open a new PR with the new branch.

#106 (review)
This is the second time you reused an old branch and I already asked you not to.

sharonsyh
Collaborator Author

I realized my newly created prometheus-integration branch was created from a branch that had already been merged. I will create a fresh branch from master, copy the new or modified files over, and open a new PR.
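The fresh-branch workflow described above can be scripted. The sketch below only prints the git commands by default (a dry run). prometheus-integration is the old branch named in the reply, while the new branch name and the file list are hypothetical placeholders.

```python
import shlex
import subprocess


def fresh_branch_commands(old_branch: str, new_branch: str, paths: list[str],
                          base: str = "origin/master") -> list[list[str]]:
    """Return git commands that copy `paths` onto a new branch cut from `base`."""
    return [
        ["git", "fetch", "origin"],
        # Start from the current tip of master, not from an already-merged branch.
        ["git", "switch", "-c", new_branch, base],
        # Copy each file as it exists on the old branch into the index and
        # working tree. This overwrites master's version of the path, so files
        # that master also changed (e.g. pyproject.toml) are safer to re-edit
        # by hand.
        ["git", "checkout", old_branch, "--", *paths],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical new branch name and file list.
    run(fresh_branch_commands("prometheus-integration",
                              "prometheus-integration-v2",
                              ["zeus/metric.py"]))
```

Once the dry run looks right, passing dry_run=False applies it; the result still needs a commit, a push of the new branch, and a new PR. The YAML files would also need to move into docker/ as requested earlier in the review.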

sharonsyh closed this Oct 18, 2024
Development: merging this pull request may close the linked issue [RFC] Integration of Prometheus Push Gateway and Energy Metrics Collection in Zeus
2 participants