
Grouping alarms which might be caused due to the same reason #361

Open
rhl-bthr opened this issue Mar 15, 2024 · 0 comments
rhl-bthr commented Mar 15, 2024

Is your feature request related to a problem? Please describe.
While debugging the alarms in FoundationDB's operator, I noticed that 266 out of 270 were raised because Acto was adding a new key named ACTOKEY under spec/processes, with a reasonably well-formed object as its value. Across those 266 alarms, Acto was only modifying values within that object, but every test failed for the same reason: the operator did not accept the name ACTOKEY and only accepted a predefined set of names.

Acto can attempt to group the failing test cases to give a better insight into the root cause of the problem.

Describe the solution you'd like
One way to do this is to view all the modifications as a large decision tree whose leaves indicate whether ALARM is true or false. Acto can then do a breadth-first search over the nodes of the tree and check whether all the leaves under a node are true. If they are, all the test cases under that node can be grouped together. It should be a reasonable assumption that these alarms likely share the same cause, namely the value modified at that node of the decision tree.
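
A rough sketch of the grouping idea, to make the proposal concrete. This is not Acto code: the input format (a test case ID, the path of the modified field, and an alarm flag), the names `Node`, `build_tree`, `collect`, and `group_alarms`, and the `general` key in the sample data are all hypothetical. It also assumes every test case sits at a leaf of the tree.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    cases: list = field(default_factory=list)  # (case_id, alarm) pairs ending here


def build_tree(results):
    """Build a tree keyed by the path of the field each test case modified."""
    root = Node()
    for case_id, path, alarm in results:
        node = root
        for part in path:
            node = node.children.setdefault(part, Node())
        node.cases.append((case_id, alarm))
    return root


def collect(node):
    """Return every (case_id, alarm) pair at or below this node."""
    found = list(node.cases)
    for child in node.children.values():
        found.extend(collect(child))
    return found


def group_alarms(results):
    """BFS from the root; the shallowest node whose cases all alarm forms a group."""
    groups = {}
    queue = deque([((), build_tree(results))])
    while queue:
        prefix, node = queue.popleft()
        cases = collect(node)
        if cases and all(alarm for _, alarm in cases):
            groups[prefix] = sorted(case_id for case_id, _ in cases)
            continue  # everything below is already covered by this group
        for part, child in node.children.items():
            queue.append((prefix + (part,), child))
    return groups


# Hypothetical results: (test case ID, modified path, alarm raised?)
results = [
    ("t1", ("spec", "processes", "ACTOKEY", "a"), True),
    ("t2", ("spec", "processes", "ACTOKEY", "b"), True),
    ("t3", ("spec", "replicas"), False),
    ("t4", ("spec", "version"), True),
    ("t5", ("spec", "processes", "general", "x"), False),
]
groups = group_alarms(results)
```

With this sample input, `t1` and `t2` end up grouped under `spec/processes/ACTOKEY` rather than being reported as two unrelated alarms, because `spec/processes` itself also has a passing test case below it.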

Describe alternatives you've considered
Of course, the right fix would be for the operator to either:

  1. Not accept arbitrary strings in spec/processes, and instead define all the keys it can accept, marking whichever ones it is not using as NULL, or
  2. Throw an error! Currently no error was displayed in any of the logs, which made this slightly hard to debug.

But hoping for operators to do the right thing defeats the purpose of the project ;)

Additional context
xlab-uiuc/kube-523#135
