Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common: Check_expired_locked_rules modernization. #127 #133

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

voetberg
Copy link
Contributor

@voetberg voetberg commented Mar 6, 2024

Changes:
- Change text-only queries to poll the data model (rucio.db.sqla.models)
- Push results to a remote (See documentation of probes for descriptions). Names: locked_expired_rules.(rse), locked_expired_rules.dids.(rse), locked_expired_rules.rules_for_dids.(rse)
- included a check that still updates the probe even if the result is 0, to ensure the dashboards reading the probe always have data.

Sqla update uses pr #46 as a basis.

cc: @ericvaandering

Copy link
Contributor

@dchristidis dchristidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m afraid there’s been some miscommunication. I must repeat what I wrote a few weeks ago: this probe is currently listing the expired locked rules.

Locked expired rules
…
Datasets expired with locked rules
…

This behaviour is not optional; its removal is not acceptable to ATLAS.

What I would suggest is to:

  1. Port the queries to SQLAlchemy, but without changing what they do
  2. Add logic to do the grouping and counting client-side
  3. Push those metrics to Prometheus
  4. Make any other miscellaneous changes

@voetberg
Copy link
Contributor Author

voetberg commented Mar 7, 2024

Thanks for the comments, those are fairly easy changes to make.

I suppose the confusion stems from my lack of understanding how atlas monitoring works - how are you picking up the results of these probes? If I knew that I think I could avoid making breaking changes in the future.

@voetberg voetberg marked this pull request as draft March 7, 2024 13:59
@dchristidis
Copy link
Contributor

ATLAS uses Nagios to run its probes. Hence, the output is captured. Most of the time we don’t consult it, but this is one of the few probes where we do (so that our operators don’t have to identify themselves which rules to unlock).

@voetberg voetberg force-pushed the modernize_common_without_prom branch 2 times, most recently from 4fb2c04 to 595b45c Compare March 7, 2024 19:47
@voetberg voetberg marked this pull request as ready for review March 7, 2024 19:51
@voetberg voetberg force-pushed the modernize_common_without_prom branch from 595b45c to 036eeeb Compare March 7, 2024 20:34
@ericvaandering
Copy link
Contributor

ericvaandering commented Mar 7, 2024

Sorry. Am I missing something? Don't lines 60 and 101 produce exactly the same output that 33 and 47 in the original do?

@ericvaandering
Copy link
Contributor

The intent of that SQLAlchemy was definitely to reproduce the exact query. I would assume this was verified back some time ago (and one can have SQLAlchemy print the generated query). Is there something you saw?

@dchristidis
Copy link
Contributor

Sorry. Am I missing something? Don't lines 60 and 101 produce exactly the same output that 33 and 47 in the original do?

The intent of that SQLAlchemy was definitely to reproduce the exact query. I would assume this was verified back some time ago (and one can have SQLAlchemy print the generated query). Is there something you saw?

The content of the commit has been altered since I made the initial review.

@voetberg
Copy link
Contributor Author

voetberg commented Mar 8, 2024

Sorry. Am I missing something? Don't lines 60 and 101 produce exactly the same output that 33 and 47 in the original do?

The intent of that SQLAlchemy was definitely to reproduce the exact query. I would assume this was verified back some time ago (and one can have SQLAlchemy print the generated query). Is there something you saw?

The content of the commit has been altered since I made the initial review.

Yeah sorry about the confusion, original commit skipped the printing/summary dictionary step and just queried the count reported in the probe directly.

@ericvaandering
Copy link
Contributor

Ahh. Good.

Copy link
Contributor

@dchristidis dchristidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the comments to the first portion also apply to the second.

common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
@dchristidis
Copy link
Contributor

Side note: kindly allow me to resolve the conversations myself. It helps me to keep track of what has been addressed and what requires further work.

@voetberg
Copy link
Contributor Author

voetberg commented Aug 5, 2024

Side note: kindly allow me to resolve the conversations myself. It helps me to keep track of what has been addressed and what requires further work.

Sounds good - let me know if there's anything else I can do to accommodate your preferred review style

…ta model. rucio#127

Changes:
        - Change text-only queries to poll the data model (rucio.db.sqla.models)
        - Push results to a remote (See documentation of probes for discriptions). Names: locked_expired_rules.(rse), locked_expired_rules.dids.(rse)
…tements to use true() and null() options, use default dictionary as way to collect results
@voetberg voetberg force-pushed the modernize_common_without_prom branch from 7f69b96 to d6ba22c Compare August 5, 2024 15:26
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
models.ReplicationRule.rse_expression,
)

# Use prometheus pusher to send results to a remote service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that comment necessary? I feel like this and some other below are superfluous.

# Print rules for nagios monitoring
# Send to Prometheus pusher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they're worth keeping, we did have that large conversation at the start of this PR that I can't go and delete print statements. Because metrics are being consumed different ways in common it's useful to know what lines are doing that.

common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
common/check_expired_locked_rules Outdated Show resolved Hide resolved
@voetberg voetberg self-assigned this Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants