-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add setter methods map to WMWorkload and call it for all reqArg parameters #12099
base: master
Are you sure you want to change the base?
Add setter methods map to WMWorkload and call it for all reqArg parameters #12099
Conversation
Jenkins results:
|
Jenkins results:
|
@amaltaro Please take a look at this PR, I think this one fully covers all the requirements and addresses our fears for affecting scalability due to increased database calls. |
Jenkins results:
|
|
||
# Commit the changes of the current workload object to the database: | ||
workload.saveCouchUrl(workload.specUrl()) | ||
|
||
# Commit all Global WorkQueue changes per workflow in a single go: | ||
self.gq_service.updateElementsByWorkflow(workload.name(), reqArgs, status=['Available']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this call, you will actually change the behavior of priority to get only applied to Available elements.
In addition, do you ensure now that reqArgs
will always be only the supported parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this call, you will actually change the behavior of priority to get only applied to Available elements.
Well, does it even matter? As we discussed with you few days ago - do we actually know if we continue to update any Worqueue Element's parameters at the agent once it is acquired from the GlobalWorkQueue ? If we do, then it matters also for Sitelists as well. So IIRC, during our discussion we decided to stay safe and update only available WQE and not to risk getting into some race conditions related to input data location etc. If we do not care I can remove the requirement for status=['Availiable']
. But if we care to propagate the Priority
information down to the agent for every WQE status - Aqcuired et. all.
, but we want to update site information only at Global WQ, then we simply cannot have a mechanism to do it in a single push. The two calls should be separated. So we cannot have them both:
- a single DB action for updating WQE
- different WQE status requirement based on request parameters
(an escape of this is to hardcode this in the Workqueue module logic ofupdateelementsByWorkflow
, but you know my opinion on hardcoding requirements in the code.
In addition, do you ensure now that reqArgs will always be only the supported parameters?
As long as we follow the same logic as we did while validating the allowed parameters in the first set of actions in these calls (meaning to call validate_request_update_args
before calling the put
method and hence _handleNoStatusUpdate
) then we are keeping the same logic as before - hence following the map of allowed actions from
WMCore/src/python/WMCore/ReqMgr/DataStructs/RequestStatus.py
Lines 118 to 147 in 7ddd036
ALLOWED_ACTIONS_FOR_STATUS = { | |
"new": ["RequestPriority"], | |
"assignment-approved": ["RequestPriority", "Team", "SiteWhitelist", "SiteBlacklist", | |
"AcquisitionEra", "ProcessingString", "ProcessingVersion", | |
"Dashboard", "MergedLFNBase", "TrustSitelists", | |
"UnmergedLFNBase", "MinMergeSize", "MaxMergeSize", | |
"MaxMergeEvents", "BlockCloseMaxWaitTime", | |
"BlockCloseMaxFiles", "BlockCloseMaxEvents", "BlockCloseMaxSize", | |
"SoftTimeout", "GracePeriod", | |
"TrustPUSitelists", "CustodialSites", | |
"NonCustodialSites", "Override", | |
"SubscriptionPriority"], | |
"assigned": ["RequestPriority"], | |
"staging": ["RequestPriority", "SiteWhitelist", "SiteBlacklist"], | |
"staged": ["RequestPriority"], | |
"acquired": ["RequestPriority", "SiteWhitelist", "SiteBlacklist"], | |
"running-open": ["RequestPriority", "SiteWhitelist", "SiteBlacklist"], | |
"running-closed": ["RequestPriority"], | |
"failed": [], | |
"force-complete": [], | |
"completed": [], | |
"closed-out": [], | |
"announced": [], | |
"aborted": [], | |
"aborted-completed": [], | |
"rejected": [], | |
"normal-archived": [], | |
"aborted-archived": [], | |
"rejected-archived": [], | |
} |
reqArgs
to setterMethod
map at updateWorkloadArgs
:
self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority))
self.settersMap['SiteBlacklist'] = setterTuple('SiteBlacklist', self.setSiteBlacklist, inspect.signature(self.setSiteBlacklist))
self.settersMap['SiteWhitelist'] = setterTuple('SiteWhitelist', self.setSiteWhitelist, inspect.signature(self.setSiteWhitelist))
If we want to be extra detailed and fully exhaustive on any possible reqarg
per status
setters, we may also include into this map the eventual methods for all allowed actions per assignment-approved
status.:
"assignment-approved": ["RequestPriority", "Team", "SiteWhitelist", "SiteBlacklist",
"AcquisitionEra", "ProcessingString", "ProcessingVersion",
"Dashboard", "MergedLFNBase", "TrustSitelists",
"UnmergedLFNBase", "MinMergeSize", "MaxMergeSize",
"MaxMergeEvents", "BlockCloseMaxWaitTime",
"BlockCloseMaxFiles", "BlockCloseMaxEvents", "BlockCloseMaxSize",
"SoftTimeout", "GracePeriod",
"TrustPUSitelists", "CustodialSites",
"NonCustodialSites", "Override",
"SubscriptionPriority"],
But for some of them the logic may need to change because some of those setters are parametrized by more than a single argument. While at the same time we already know that for assignment-approved
we would never call _handleNoStatusUpdate
... So that's why, to me it seems safe to proceed only with those 3 setters mapped. Of course, if you ask me - I am completely up for moving the whole logic to be implemented here in a more generic way .... for all status updates, then get rid of a big chunk of code covering custom cases ... and only make the proper calls to this generic method here from upstream modules (e.g. Request in the current case) But I do not think we will have the time during this line of development here to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to distribute this update, but before answering how we can proceed, can you please check the JobUpdater code to check:
- in which databases are the WQEs updated?
- does it use any status filter? If so, which statuses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @amaltaro,
ok, Here it is:
-
JobUpdater
at the agent uses the very same methodupdatePriority
def updatePriority(self, wf, priority):
from theWMCore.Services.WorkQueue
to update the workflow priority (just as before), here is the actual call:
self.workqueue.updatePriority(workflow, priorityCache[workflow]) -
It checks for any diff between the priority in WMBS and couch database before taking actions:
if workqueuePrio != priorityCache[workflow]:
and:
if wmbsPrio != priorityCache[workflow]: -
The couchd database to be used is taken from the agent configuration for the component (actually the regular local workqueue component, not the
JobUpdater
config section):
WMCore/src/python/WMComponent/JobUpdater/JobUpdaterPoller.py
Lines 50 to 51 in 4e0759d
self.workqueue = WorkQueue(self.config.WorkQueueManager.couchurl, self.config.WorkQueueManager.dbname)
which in this case is (taken from one agent configuration):
config.WorkQueueManager.dbname = 'workqueue'
So the inbox db is indeed configured at the WorkqueueManager
component at the agent, but as long as I do not see any actual actions during this update from the updatepriority
method, I do not think we even touch this DB while updating the WQE priority at the agents.
- I found No filters being applied for any of the WQE, but the workflows considerred are the ones returned by:
workflowsToCheck = self.workqueue.getAvailableWorkflows()
and:
WMCore/src/python/WMCore/Services/WorkQueue/WorkQueue.py
Lines 209 to 218 in 4e0759d
def getAvailableWorkflows(self): """Get the workflows that have all their elements available in the workqueue""" data = self.db.loadView('WorkQueue', 'elementsDetailByWorkflowAndStatus', {'reduce': False, 'stale': 'update_after'}) availableSet = set((x['value']['RequestName'], x['value']['Priority']) for x in data.get('rows', []) if x['key'][1] == 'Available') notAvailableSet = set((x['value']['RequestName'], x['value']['Priority']) for x in data.get('rows', []) if x['key'][1] != 'Available') return availableSet - notAvailableSet
# call the proper setter methods bellow. | ||
|
||
# populate the current instance settersMap | ||
self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you are already checking the parameter name and associating it to the actual function (plus inspecting its signature), isn't it more clear to simply call the actual method according to the parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is exactly what happens here. Well few line bellow though but still the same:
self.settersMap[reqArg].setterFunc(argValue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand it. What I am saying is that creating a map and passing a pointer to a function does not look as friendly as I classic if-elif dealing with the 3 parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating a map gives the option of following the same pattern for adding as many setter calls (workflow parameter setter) as one may wish, by providing a generic method of validating and assembling the calls and not changing the logic for the rest of the function (i.e. not forking with hundreds of elifs, and getting into the trap of code growth by number of custom usecases). Here, the addition of a new setter is just a matter of adding yet another line in the map, following the same pattern as the rest in the map i.e.:
a named tuple of: ('reqArg', 'setterFunc', 'setterSignature')
So pretty universal way I'd say. And perfectly serves the exact purpose of what is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the rationale for implementing something generic, but right now I am not aware of any other spec parameters that will have to be changed, so it's definitely not hundreds of future elifs.
In addition, I am very much adept of the KISS principle, whenever possible and fit. To me, this overly complex code (personally I don't know any of those inspect methods) can easily lead to a bug, or a buggy development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaltaro , @anpicci , @todor-ivanov
Given that Python (or any programming language) offers multiple ways to achieve the same outcome, evaluating the best approach can often be subjective, depending on factors such as experience, proficiency, or personal style. Consequently, these discussions can become matters of personal preference.
For example, choosing between a full for-loop or list comprehension is a matter of choice. While we can debate which one might be "better" based on various criteria, both serve the same functional purpose. In this instance, I feel the this particular conversation is similar, and rather than debating stylistic choices, the focus should be on whether the code achieves its intended purpose. Otherwise, we risk undermining the role of both developer and reviewer by favoring one approach over another, which I find counterproductive.
If there are concerns about correctness or functionality, I would recommend incorporating unit tests for validation instead of continuing to focus on the specific implementation style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not coding style and personal preferences, but I see it as code readability and sustainability.
Anyhow, thank you for sharing your perspective on this matter. I won't bug people anymore on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan,
This:
try:
if 'RequestPriority' in reqArgs:
self.setPriority(reqArgs['RequestPriority'])
if 'SiteBlacklist' in reqArgs:
self.setSiteBlacklist(reqArgs['SiteBlacklist'])
if 'SiteWhitelist' in reqArgs:
self.setSiteWhitelist(reqArgs['SiteWhitelist'])
except Exception as ex:
msg = f"Failure to update workload parameters. Details: {str(ex)}"
raise WMWorkloadException(msg) from None
is simply not equal to this:
for reqArg, argValue in reqArgs.items():
if not self.settersMap.get(reqArg, None):
msg = f"Unsupported or missing setter method for updating reqArg: {reqArg}."
raise WMWorkloadException(msg)
try:
self.settersMap[reqArg].setterSignature.bind(argValue)
except TypeError as ex:
msg = f"Setter's method signature does not match the method calls we currently support: Error: req{str(ex)}"
raise WMWorkloadException(msg) from None
# Now go through the reqArg again and call every setter method according to the map
for reqArg, argValue in reqArgs.items():
try:
self.settersMap[reqArg].setterFunc(argValue)
except Exception as ex:
currFrame = inspect.currentframe()
argsInfo = inspect.getargvalues(currFrame)
argVals = {arg: argsInfo.locals.get(arg) for arg in argsInfo.args}
msg = f"Failure while calling setter method {self.settersMap[reqArg].setterFunc.__name__} "
msg += f"With arguments: {argVals}"
msg += f"Full exception string: {str(ex)}"
raise WMWorkloadException(msg) from None
At the end, in those both implementations, if the call to the setter method is wrong they will both fail the call, but the later is much more descriptive. It recognizes (at least 3) different possible conditions of failure and forwards the proper message for later debugging, while the former just fails the call and masks all the relevant information for the one who is to chase eventual problems. And that was the exact reason why I went for this implementation. I know you'd always prefer the shorter version, but sometimes it hides/masks valuable information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something more, since this is supposed to be a method triggered by a user's call rather than an operation triggered by the internals of our system, the later method inspects the stack and gives you the values of all parameters in the call. Meaning it shows you the exact (eventually) user mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking about inspection and validation. Validation of the user parameters is not supposed to happen at this layer (at the setter call). Validation should be performed in the Service/Request class (which is likely calling the Validation utils).
With that said, I just noticed that we are not validating the site lists, as we usually do for workflow assignment.
So we should add that validation in here as well to protect the system and us from unneded debugging. For reference, this is how we define the site lists and the validation function:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L1147
…eters Add a method in the WorkQueue service for updating all wf elements with all parameters in a sinigle go Pylint fixes Remove WQE status filter
3ea670b
to
099da8b
Compare
Jenkins results:
|
Jenkins results:
|
hi @amaltaro I have addressed your comments - Mostly, I removed the WQE status filter, based on the investigation I did on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, please find more comments along the code.
Once we are about to converge on this development, I think it is important to test it in a Dev environment (both for central services and WMAgent).
# call the proper setter methods bellow. | ||
|
||
# populate the current instance settersMap | ||
self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the rationale for implementing something generic, but right now I am not aware of any other spec parameters that will have to be changed, so it's definitely not hundreds of future elifs.
In addition, I am very much adept of the KISS principle, whenever possible and fit. To me, this overly complex code (personally I don't know any of those inspect methods) can easily lead to a bug, or a buggy development.
|
||
# Update all WorkQueue elements with the parameters provided in a single push | ||
if elementsToUpdate: | ||
self.updateElements(*elementsToUpdate, **updateParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've seen you answering my question about these updateParams
.
Are we completely 100% sure that the only parameters that will reach this update are one and/or a combination of:
- RequestPriority
- SiteWhitelist
- SiteBlacklist
?
Of course, also considering that we only want to update these values when there is an actual update to the value.
In addition, I would like you to test this in WMAgent with a crafted workflow/scenario, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've seen you answering my question about these updateParams
Yes the answer is the second one in line, here from this comment: #12099 (comment)
In addition, I would like you to test this in WMAgent with a crafted workflow/scenario, please.
I had it tested. With a full validation campaign, not a single workflow, in my dev cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find an exact answer to the question above.
Additionally, the code you are deleting had an else statement populating a list of not handled arguments:
https://github.com/dmwm/WMCore/pull/12099/files#diff-120ee6838284a3d1c1799f511da7f147179d0a955f87d0da6fc8b58a8b66c794L440
which makes me believe that it can receive parameters other than only those 3.
About the validation, to properly validate these changes, we need to actually trigger specific scenarios and actions. The standard "campaign-style" validation is not going to expose any issues with this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov @amaltaro I don't know if you have already converged on a common ground, but here is my opinion about map vs if-elif-else approach.
Considering that I am not as experienced as both of you with the WMCore software, to me the map approach looks clearer and more understandable than the if-elif-else approach. Indeed, I can read which are the parameters of interest at the very beginning, and from these lines I can see that only one parameter among RequestPriority, SiteWhitelist, SiteBlacklist can be modified. The all process doesn't require more intellectual effort than the previous if-elif-else approach, and in addition to me sounds more robust and easier to debug (where with "debug" I am also referring to "actually fix a potential issue").
Regarding the possible concerns:
I believe I've seen you answering my question about these
updateParams
. Are we completely 100% sure that the only parameters that will reach this update are one and/or a combination of:
* RequestPriority
* SiteWhitelist
* SiteBlacklist
?
Of course, also considering that we only want to update these values when there is an actual update to the value.
Given the original code, we can assume that this is a complete list of parameters that are supposed to reach this update, provided that what we want to get with this PR doesn't require other parameters to be properly updated
Additionally, the code you are deleting had an else statement populating a list of not handled arguments:
https://github.com/dmwm/WMCore/pull/12099/files#diff-120ee6838284a3d1c1799f511da7f147179d0a955f87d0da6fc8b58a8b66c794L440
which makes me believe that it can receive parameters other than only those 3.
This occurrence should be handled by the try-except in these lines, right @todor-ivanov? @amaltaro do you have any concern that this approach could fail to prevent updating parameters other than RequestPriority, SiteWhitelist, and SiteBlacklist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anpicci I think you meant to send this reply in this thread instead https://github.com/dmwm/WMCore/pull/12099/files#r1766081573 ?
'reduce': False}) | ||
|
||
# Fetch only a list of WorkQueue element Ids && Filter them by allowed status | ||
if status: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that we are not breaking the current behavior of RequestPriority update.
However, thinking twice about this, if we want to make these updates more efficient, we could update it only for elements in Available, Negotiating and Acquired, from the list of potential statuses for WQE:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WorkQueue/DataStructs/WorkQueueElement.py#L14
If you change it, could you please also update the Issue description (because it says only Available).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Jenkins results:
|
Can one of the admins verify this patch? |
@todor-ivanov apart from my comment to @amaltaro 's review, I only suggest to resolve the conflicts on |
@vkuznet thank you for the feedback.
I want to clarify that I would like to adopt these accountability principles to every issue, meaning that this is not a special treatment outlined only for this PR. I will follow up on this during the next group meeting. Thanks! |
Fixes #12038
Status
Ready
Description
With the current PR we add the new functionality to call all needed WMWorkload setter methods based on a full
reqArgs
dictionary passed from the upper level caller and try to set all of parameters at once, rather than calling every single setter method one by one. This is achieved by creating a proper map between methods and possible request arguments. And later validating if the set of arguments passed withreqArgs
dictionary would properly match the signature of the setter method which is to be called. In the current implementation only a small set of methods is mapped to request parameters :(Mostly single argument parametrized methods, but that's ok for the time being, because those cover perfectly the functionality we need to plug in here)
With this change a path for updating all possible workqueue elements in a single go was opened. In the
WorkQueue
service an additional method was developed for fetching all possible workqueue elements and update them all with the full set of arguments provided through thereqArgs
dictionary with the cost of a single database action per WorkQuee element, rather than 3 (or more) separate calls to the database for updating every single element parameter separately. Upon updating all workqueue elements with the new parameters the WMSpec copy of the given workflow at the workqueue is also updated in a single push using the same logic from above.Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
None