Common: Check_expired_locked_rules modernization. #127 #133
base: master
I’m afraid there’s been some miscommunication. I must repeat what I wrote a few weeks ago: this probe is currently listing the expired locked rules.
Locked expired rules
…
Datasets expired with locked rules
…
This behaviour is not optional; its removal is not acceptable to ATLAS.
What I would suggest is to:
- Port the queries to SQLAlchemy, but without changing what they do
- Add logic to do the grouping and counting client-side
- Push those metrics to Prometheus
- Make any other miscellaneous changes
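The second suggestion (grouping and counting client-side while still listing every rule) could be sketched roughly as follows. This is a minimal illustration, not the probe's actual code: the row layout, the example values, and the `summarise` helper are all hypothetical, and the printed line format is only a stand-in for whatever the probe prints for Nagios.

```python
from collections import defaultdict

# Hypothetical rows standing in for the query result:
# (scope, name, rule_id, rse_expression). Not real data.
EXAMPLE_ROWS = [
    ("user.alice", "dataset_a", "rule-001", "SITE_A"),
    ("user.alice", "dataset_a", "rule-002", "SITE_A"),
    ("user.bob", "dataset_b", "rule-003", "SITE_B"),
]


def summarise(rows):
    """Print every rule and return per-RSE rule and DID counts."""
    rules_per_rse = defaultdict(int)
    dids_per_rse = defaultdict(set)
    for scope, name, rule_id, rse_expression in rows:
        # Keep the per-rule listing so the captured output stays usable.
        print(f"{rule_id} {scope}:{name} {rse_expression}")
        rules_per_rse[rse_expression] += 1
        # A set counts each DID once, however many rules point at it.
        dids_per_rse[rse_expression].add((scope, name))
    did_counts = {rse: len(dids) for rse, dids in dids_per_rse.items()}
    return dict(rules_per_rse), did_counts


rule_counts, did_counts = summarise(EXAMPLE_ROWS)
```

The returned dictionaries would then be the values handed to the Prometheus pusher, while the printed lines remain available to whoever reads the captured output.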
Thanks for the comments, those are fairly easy changes to make. I suppose the confusion stems from my lack of understanding of how ATLAS monitoring works. How are you picking up the results of these probes? If I knew that, I think I could avoid making breaking changes in the future.
ATLAS uses Nagios to run its probes. Hence, the output is captured. Most of the time we don’t consult it, but this is one of the few probes where we do (so that our operators don’t have to work out for themselves which rules to unlock).
4fb2c04 to 595b45c
595b45c to 036eeeb
Sorry, am I missing something? Don't lines 60 and 101 produce exactly the same output as lines 33 and 47 in the original?
The intent of that SQLAlchemy code was definitely to reproduce the exact query. I would assume this was verified some time ago (and one can have SQLAlchemy print the generated query). Is there something you saw?
The content of the commit has been altered since I made the initial review.
Yeah, sorry about the confusion. The original commit skipped the printing/summary dictionary step and just queried the count reported in the probe directly.
Ahh. Good.
Some of the comments to the first portion also apply to the second.
Side note: kindly allow me to resolve the conversations myself. It helps me to keep track of what has been addressed and what requires further work.
Sounds good. Let me know if there's anything else I can do to accommodate your preferred review style.
…ta model. rucio#127 Changes: - Change text-only queries to poll the data model (rucio.db.sqla.models) - Push results to a remote (see documentation of probes for descriptions). Names: locked_expired_rules.(rse), locked_expired_rules.dids.(rse)
…tements to use true() and null() options, use default dictionary as a way to collect results
7f69b96 to d6ba22c
d6ba22c to 68ccd17
models.ReplicationRule.rse_expression,
)

# Use prometheus pusher to send results to a remote service
Is that comment necessary? I feel like this and some others below are superfluous.
# Print rules for nagios monitoring
# Send to Prometheus pusher
I think they're worth keeping. We did have that large conversation at the start of this PR about how I can't go and delete print statements. Because metrics are consumed in different ways in common, it's useful to know which lines are doing that.
Changes:
- Change text-only queries to poll the data model (rucio.db.sqla.models)
- Push results to a remote (See documentation of probes for descriptions). Names: locked_expired_rules.(rse), locked_expired_rules.dids.(rse), locked_expired_rules.rules_for_dids.(rse)
- Include a check that still updates the probe even if the result is 0, so that the dashboards reading the probe always have data.
The SQLAlchemy update uses PR #46 as a basis.
cc: @ericvaandering
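One way the "still report when the result is 0" check could work is sketched below. It is an illustration under stated assumptions rather than the code in this PR: the `gauges_to_push` helper and the idea of a known list of expected RSE expressions are hypothetical; only the `locked_expired_rules.(rse)` naming comes from the description above.

```python
def gauges_to_push(counts_by_rse, expected_rses, prefix="locked_expired_rules"):
    """Build the metric name -> value mapping, defaulting to 0.

    `expected_rses` is an assumed list of RSE expressions the dashboards
    watch; any of them missing from the query result is reported as 0.
    """
    gauges = {f"{prefix}.{rse}": 0 for rse in expected_rses}
    for rse, count in counts_by_rse.items():
        gauges[f"{prefix}.{rse}"] = count
    return gauges


# SITE_B has no locked expired rules, but still gets a data point.
gauges = gauges_to_push({"SITE_A": 2}, ["SITE_A", "SITE_B"])
```

Seeding the mapping with zeros before filling in the real counts means an empty query result still produces a value for every watched name.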