Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "retention" feature allowing idle metrics to expire #204

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

phihos
Copy link

@phihos phihos commented Jul 31, 2022

As proposed in #20 I implemented a way to auto remove idle metrics at runtime without needing to restart.
Example:

<metric>
  name message_foo_counter
  type counter
  desc The total number of foo in message.
  key foo
  retention 3600 # 1h
  retention_check_interval 1800 # 30m
  <labels>
    bar ${bar}
  </labels>
</metric>

If ${bar} was baz one time but after that no records with that value were processed, then after one hour the metric
foo{bar="baz"} might be removed.
When this actually happens depends on retention_check_interval (default 60).
It causes a background thread to check every 30 minutes for expired metrics.
So worst case the metrics are removed 30 minutes after expiration.

The naming of the config keys were shamelessly stolen from inspired by grok_exporter to make this feature more familiar to people using the grok_exporter.

Additional to the implementation I had to refactor the Metrics class to directly implement instrument(record, expander) and put subclass-specific logic into value(record), set_value?(value) and set_value(value, labels).
That reduces code duplication and was necessary for not introducing further duplicates.

I also had to introduce a new data store based on the default data store of prometheus/client_ruby to allow for removal of elements.

The last thing I want to mention is that I need to use the thread helper to start cleaning expired metrics in the background. I first tried to use the timer helper but it caused a test to go into an infinite loop.

If you need more tests or need other alterations to the code please let me know. I am looking forward to your feedback 🙂

Copy link
Member

@ashie ashie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my late response.
Could you resolve conflict?

@@ -22,6 +24,17 @@ def configure(conf)
@metrics = Fluent::Plugin::Prometheus.parse_metrics_elements(conf, @registry, labels)
end

def start
super
Fluent::Plugin::Prometheus.start_retention_threads(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really need to use thread?
I'm not sure but timer_execute (run it in plugin thread) might be enough to do it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phihos any update?

@phihos
Copy link
Author

phihos commented Nov 4, 2022

Sorry I had two very busy weeks. I hope I get to it tomorrow.

@gromnsk
Copy link

gromnsk commented Dec 21, 2022

@phihos any update?

@AlbusLumos
Copy link

@phihos is this still on-going?

@dkulchinsky
Copy link

Hey @phihos 👋🏼 wondering if you still have cycles to work on this? I can try and take over though not as experienced with Ruby.

Looks like there was mainly a concern whether a dedicated thread is needed, instead of running it in the plugin thread.

@Lusitaniae
Copy link

Wonder if you had the bandwidth to pick this up again @phihos

🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants