Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SiteList actions to transient statuses in ReqMgr2 #12077

Conversation

todor-ivanov
Copy link
Contributor

@todor-ivanov todor-ivanov commented Aug 21, 2024

Fixes #12037

Status

Ready

Description

With the current change we allow SiteLists related actions for statuses: staging, acquired, running-open in ReqMgr2

Before making a call to central services for changing any of the request parameters, an additional step is executed to
to check which are the allowed parameter modifications for the given status and if the so provided new values fromthe rest call actually differ from the workflow parameters already defined in central couchdb.

With the current change we no longer ignore all the rest of the request arguments provided with the REST call in the cases of a change of the Request priority. See: #8457 (comment)

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

None

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 8 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15160/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov , the current changes are solely giving permissions to update those 2 attributes in the relevant workflow statuses.

However, we still have to deal with the "setter" functions, such that we can properly update the WMWorkload object.

This is the entry port for updating a workflow configuration (without any state changes):
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/Service/Request.py#L403

then it would have to call the setter functions defined in this module, e.g.:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/WMWorkload.py#L676

@todor-ivanov
Copy link
Contributor Author

thanks @amaltaro This helps.

I was still developing and on the stage of testing the current minimal change. (Faced some bumps on the road with my long forgotten central services environment though, but this was another story).

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented Aug 23, 2024

@amaltaro while reading and investigating the code surrounding this change here, I tried to understand this comment in the code:

# Yes, we completely ignore any other arguments posted by the user (web UI case)

And it seems to be related to this PR and this particular comment of yours:
#8457 (comment)

But the main reason imposing the need of this 'single workflow parameter change' seems to be hidden in this PR (according to your own words in the comment above):
https://github.com/dmwm/WMCore/pull/8419/files

It seems now is the time for this to be revisited. Do you remember what exactly were the requirements and why couldn't it be solved back then? And if those are still a factor?

@amaltaro
Copy link
Contributor

Todor, in addition to the discussion we had today - and now that I properly read the pointers you provided above - I wanted to say that there is a different payload between ReqMgr2 interaction through script vs web UI.

  • when someone is updating a parameter in the request, this person could either be providing the whole workflow dictionary; or only the dictionary/json that they want to change (e.g. RequestStatus and/or RequestPriority).
  • while when someone is updating parameters through web UI, ReqMgr2 is actually GETting that request in the UI, meaning that the parameter update gets posted together with everything else in the request.

This is probably not relevant, but just so you know what payload the server can receive.

Additionally, it would be ideal to be able to say whether a parameter actually changed or not. In other words, we do not want to update Site lists unless there is really an update to the content.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 3 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 172 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15183/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 178 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15184/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Copy link
Contributor Author

Hi @amaltaro
I think I managed to filter out the request parameters which are not changing and WE now alter only those which are either new or different from the parameters set in the workload in couchdb. Please take a look and maybe if we manage to go quickly through this PR. You may include it in the next Release Candidate as well. So far every change has been tested interactively from my dev environment with one active workflow in the system. Printouts are bellow:

  • Changing RequestStatus:
In [138]: reqArgs = {'RequestStatus': 'staging'}; reqApi._updateRequest(workload, reqArgs)
[04/Sep/2024:15:39:13]  Updating request status for tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572 to staging.
2024-09-04 15:39:13,909:INFO:_cplogging:error(): [04/Sep/2024:15:39:13]  Updating request status for tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572 to staging.
Out[138]: {'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572': 'OK'}
  • Correctly raising an exception during a change of RequestStatus due to extra parameters for a state change call:
In [142]: reqArgs = {'SiteWhitelist': ['T2_CERN_CH'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL'], 'RequestPriority': 2, 'RequestName': 'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_1
     ...: 35942_6572', 'RequestStatus': 'staging'}; reqApi._updateRequest(workload, reqArgs)
---------------------------------------------------------------------------
InvalidSpecParameterValue                 Traceback (most recent call last)
Cell In[142], line 1
----> 1 reqArgs = {'SiteWhitelist': ['T2_CERN_CH'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL'], 'RequestPriority': 2, 'RequestName': 'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572', 'RequestStatus': 'staging'}; reqApi._updateRequest(workload, reqArgs)

File /data/WMCore.venv3.cmsweb-test1/srv/WMCore/src/python/WMCore/ReqMgr/Service/Request.py:546, in Request._updateRequest(self, workload, request_args)
    544     else:
    545         msg = "There are invalid arguments with this status transition: %s" % request_args
--> 546         raise InvalidSpecParameterValue(msg)
    548 if report == 'OK':
    549     return {workload.name(): "OK"}

InvalidSpecParameterValue: InvalidSpecParameterValue a8f692bb909dff22b0cc7e8a0c7ed0c6 [HTTP 400, APP 1102, MSG "Invalid spec parameter value: There are invalid arguments with this status transition: {'SiteWhitelist': ['T2_CERN_CH'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL'], 'RequestPriority': 2, 'RequestName': 'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572', 'RequestStatus': 'staging'}", INFO None, ERR None]

  • Changing Site{Black,White}Lists and RequestPriority' for a workflow in staging`:
In [139]: reqArgs = {'SiteWhitelist': ['T2_CERN_CH','T1_US_FNAL'], 'SiteBlacklist': ['T2_CERN_CH'], 'RequestPriority': 2, 'RequestName': 'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_13
     ...: 5942_6572'}; reqApi._updateRequest(workload, reqArgs)
[04/Sep/2024:15:39:19]  Updated priority of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" to: 2
2024-09-04 15:39:19,770:INFO:_cplogging:error(): [04/Sep/2024:15:39:19]  Updated priority of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" to: 2
report: ['OK', 'OK', 'OK']
Out[139]: {'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572': 'OK'}
  • Repeating the call with the same parameters causes NO change to be attempted, because no values differ from the request configuration in central CouchDB (the previous call was successful and the changes are already present):
In [140]: reqArgs = {'SiteWhitelist': ['T2_CERN_CH','T1_US_FNAL'], 'SiteBlacklist': ['T2_CERN_CH'], 'RequestPriority': 2, 'RequestName': 'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_13
     ...: 5942_6572'}; reqApi._updateRequest(workload, reqArgs)
[04/Sep/2024:15:39:22]  Nothing to be changed at this stage
2024-09-04 15:39:22,986:INFO:_cplogging:error(): [04/Sep/2024:15:39:22]  Nothing to be changed at this stage
report: []
Out[140]: {'tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572': 'OK'}

@todor-ivanov
Copy link
Contributor Author

@amaltaro

In addition to the previous interactive tests, I patched my development cluster and checked the WEB interface. The siteLists fields are now available through the web interface:
Reqmgr-handleNoStatusUpdateWEB

But unfortunately once I try to use the WEB interface to change the allowed parameters I get an Server error and no erro message:
Reqmgr-handleNoStatusUpdateWEB-Fail

I am investigating now.

@todor-ivanov
Copy link
Contributor Author

And here is the reason for the above error. From the ReqMgr logs: [1] I am trying a fix now.

[1]

[04/Sep/2024:18:51:12]  Updating request "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" with these user-provided args: {'RequestPriority': 2, 'SiteWhitelist': 'T2_CERN_CH', 'SiteBlacklist': 'T1_US_FNAL'}
[04/Sep/2024:18:51:12]  SERVER OTHER ERROR __builtins__.KeyError 7724a6f5a74c54e57cef2bdfb9bb0afc ('RequestName')
[04/Sep/2024:18:51:12]    Traceback (most recent call last):
[04/Sep/2024:18:51:12]      File "/usr/local/lib/python3.8/site-packages/WMCore/REST/Server.py", line 749, in default
[04/Sep/2024:18:51:12]        return self._call(RESTArgs(list(args), kwargs))
[04/Sep/2024:18:51:12]      File "/usr/local/lib/python3.8/site-packages/WMCore/REST/Server.py", line 832, in _call
[04/Sep/2024:18:51:12]        obj = apiobj['call'](*safe.args, **safe.kwargs)
[04/Sep/2024:18:51:12]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 557, in put
[04/Sep/2024:18:51:12]        result = self._updateRequest(workload, request_args)
[04/Sep/2024:18:51:12]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 535, in _updateRequest
[04/Sep/2024:18:51:12]        report = self._handleNoStatusUpdate(workload, request_args, dn)
[04/Sep/2024:18:51:12]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Service/Request.py", line 413, in _handleNoStatusUpdate
[04/Sep/2024:18:51:12]        _, reqArgsDiff = validate_request_update_args(request_args, self.config, self.reqmgr_db_service, None)
[04/Sep/2024:18:51:12]      File "/usr/local/lib/python3.8/site-packages/WMCore/ReqMgr/Utils/Validation.py", line 87, in validate_request_update_args
[04/Sep/2024:18:51:12]        request_name = request_args.pop("RequestName")
[04/Sep/2024:18:51:12]    KeyError: 'RequestName'
[04/Sep/2024:18:51:12] reqmgr2-88b98c6b5-cxhkq 188.185.19.4:33658 "PUT /reqmgr2/data/request/tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572 HTTP/1.1" 500 Internal Server Error

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 180 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 14 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15187/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented Sep 4, 2024

And here (from my last commit f82b8cc) are both the cause and the fix:

   def _handleNoStatusUpdate ...
...
        # NOTE: Here we need to re-add `RequestName` since down bellow will be the second call of `validate_request_update_args`
        #       which, by itself is a destructive function, which changes `request_args` with the following line:
        #           request_name = request_args.pop("RequestName")
        #       So if we call this function more than once inside the same request instance on every next call it will raise:
        #           `KeyError: 'RequestName'`.
        #       The first time it was called was during the validation of the REST call.
        if 'RequestName' not in reqArgsDiff:
            reqArgsDiff['RequestName'] = workload.name()
        _, reqArgsDiff = validate_request_update_args(reqArgsDiff, self.config, self.reqmgr_db_service, None)

Unfortunately I noticed, if the request properties are changed through the web interface, we cannot catch the cases when there is no change of the parameter value from the call compared to what is already in couchDB, and so ReqMgr always goes through the full set of modification calls, rather than only through those relevant to the actually changed parameters. This behaviour is not observed when I use an interactive instance of Request and make direct calls to all the methods (as demonstrated above). I am currently looking for the eventual reasons.

@amaltaro
Copy link
Contributor

amaltaro commented Sep 5, 2024

@todor-ivanov I think it is fine to not touch the web UI. The site lists are usually large and the vast majority interactions are performed through scripts. Changing site lists via Web UI can be left for a future development IMO.

@todor-ivanov
Copy link
Contributor Author

todor-ivanov commented Sep 5, 2024

hi @amaltaro :

I think it is fine to not touch the web UI. The site lists are usually large and the vast majority interactions are performed through scripts

I'll need examples. BTW the WebUI works just fine - it executes correctly. And the WebUI is actually not a mandatory part, but I look at it rather like a sign that the sequence in which I call the validation routines/methods could be wrong. It would be interesting to understand what is going on before we close the issue. And those scripts, which you mention, I suspect they also use REST calls similar to the web UI, but I'll need to know the entry points they use in order to confirm things work just fine.

So far this is what I use for these interactive investigations and tests: https://github.com/todor-ivanov/WMCoreAux/blob/master/bin/WMCStandalone/init.py

@amaltaro
Copy link
Contributor

amaltaro commented Sep 5, 2024

I'll need examples.

For what concerns getting/creating/updating a request, it is always the same /reqmgr2/data/request endpoint (update is done with PUT method). Here is one example: https://github.com/dmwm/WMCore/blob/master/bin/adhoc-scripts/setrequeststatus.py#L12-L32

AFAIR, the main difference between Web UI and someone's script can be that WebUI always fetches the original document. In other words, it could be that user A makes a change to a request via script by simply providing the RequestName and SiteWhitelist key/value pairs. While user B makes a change to a request via Web UI, but the whole dictionary is sent back to the backend service, even though the user only wanted to update SiteWhitelist.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, please find some comments/notes along the code.

# The first time it was called was during the validation of the REST call.
if 'RequestName' not in reqArgsDiff:
reqArgsDiff['RequestName'] = workload.name()
_, reqArgsDiff = validate_request_update_args(reqArgsDiff, self.config, self.reqmgr_db_service, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed today, I understand this extra validation is going away.

Copy link
Contributor Author

@todor-ivanov todor-ivanov Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @amaltaro , I checked twice and it is actually all good. Upon properly patching my cluster I can see it works as expected. So my concerns about having this extra check here were at end irrelevant. Here are the reqmgr logs from my test cluster while using the web interface for two consecutive attempts to change the same set of request parameters. As can be seen clearly, on the second attempt nothing was about to be changed. And this check happens at this exact piece of code, while if I remove this section, then we experience the ill-behavior of changing all request parameters upon the call.

[06/Sep/2024:12:12:12] reqmgr2-88b98c6b5-cxhkq 127.0.0.1 "GET /reqmgr2/data/info HTTP/1.1" 200 OK [data: 296 in 662 out 50211 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]
[06/Sep/2024:12:12:17]  Updating request "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" with these user-provided args: {'RequestPriority': 3, 'SiteWhitelist': ['T2_CERN_CH', 'T1_US_FNAL'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL']}
[06/Sep/2024:12:12:17]  Updated priority of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" to: 3
[06/Sep/2024:12:12:17]  Updated SiteWhitelist of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572", with:  ['T2_CERN_CH', 'T1_US_FNAL']
[06/Sep/2024:12:12:17]  Updated SiteBlacklist of "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572", with:  ['T2_CERN_CH', 'T1_US_FNAL']
[06/Sep/2024:12:12:17] reqmgr2-88b98c6b5-cxhkq 188.185.19.4:37334 "PUT /reqmgr2/data/request/tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572 HTTP/1.1" 200 OK [data: 2278 in 108 out 446906 us ] [auth: ok "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=tivanov/CN=726807/CN=Todor Trendafilov Ivanov" "" ] [ref: "https://cmsweb-test1.cern.ch" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36" ]




[06/Sep/2024:12:12:43] reqmgr2-88b98c6b5-cxhkq 127.0.0.1 "GET /reqmgr2/data/info HTTP/1.1" 200 OK [data: 296 in 662 out 65762 us ] [auth: OK "" "" ] [ref: "" "Go-http-client/1.1" ]
[06/Sep/2024:12:12:44]  Updating request "tivanov_SC_LumiMask_Rules_SiteListsTest_v4_240830_135942_6572" with these user-provided args: {'RequestPriority': 3, 'SiteWhitelist': ['T2_CERN_CH', 'T1_US_FNAL'], 'SiteBlacklist': ['T2_CERN_CH', 'T1_US_FNAL']}
[06/Sep/2024:12:12:44]  Nothing to be changed at this stage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it and tested without ti still works .. but as I mentioned in our chat yesterday.... the redundant parameters checks won't work properly if one does not go through the whole convoluted REST stack for WMCore.

reqArgsDiff['RequestName'] = workload.name()
_, reqArgsDiff = validate_request_update_args(reqArgsDiff, self.config, self.reqmgr_db_service, None)

def _reduceReport(report):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood this function correctly, it is supposed to parse a list of dictionaries with outcome of CouchDB operations and decide whether the request was successful or not.

I don't think we need it, but if I am wrong, then I would suggest moving it to src/python/Utils and creating proper unit tests for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

validate_request_priority(reqArgsDiff)
# must update three places: GQ elements, workload_cache and workload spec
self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority'])
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be calling updateRequestProperty multiple times for each different argument, as that will generate separated HTTP requests and be source for failures and inconsistencies.

Having a single update call keeps it simple as well and does not require the reduce custom function that you implemented above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering updating it once per every parameter instead..... with a call of the form:

report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), {reqArg: reqArgsDiff[reqArg]}, dn))

Such that we go progressively with those workflow changes and update as much as we can at couch before we give up.... but anyway we can do it in a single go as well. I'll change that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and actually I am not sure if when updatgin request statistics we need to call: reqmgr_db_service.updateRequestProperty, in the previous implementation this was skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about this:

Having a single update call keeps it simple as well and does not require the reduce custom function that you implemented above

We would actually still need it, because we are aggregating reports from two different calls to couchdb:

  • reqmgr_db_service.updateRequestStats
  • reqmgr_db_service.updateRequestProperty

report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn))
workload.setSiteWhitelist(reqArgsDiff["SiteWhitelist"])
cherrypy.log('Updated SiteWhitelist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteWhitelist']))
elif reqArg == "SiteBlacklist":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though the setter functions that you are calling are correct, note that this if-elif will result in a single if block getting executed. Whereas an user can provide multiple arguments to be updated in the same HTTP call.

Copy link
Contributor Author

@todor-ivanov todor-ivanov Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alan, you are inside a for cycle here. Every check you do at this stage is always a single possible match during one iteration of the cycle. So it actually spares you extra/redundant steps, because the moment the parameter matches a check, we take the relevant action and we get out of the elif sequence to continue with the next iteration of checks for the next request parameter from the for cycle. The other possibility is to use multiple independent if statements, which would have exactly the same effect, but at the cost of iterating through all checks every time, regardless if we have found a match or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I overlooked the fact that it was inside a for loop.

I see you have separated the call to self.reqmgr_db_service, which can be either for updateRequestStats or for updateRequestProperty. Those are exclusive, so there is no need to deal with report as a list and provide the extra complexity with the reduceReport function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are exclusive, so there is no need to deal with report as a list

I did not get this comment. We do need to accumulate the reports in a list exactly because of the calls to different methods from the reqmgr_db_service. All of them use the same upstream mechanisms and functionalities from couchdb, so the set of report values from both types of calls are similar, ergo applying identical criteria for checking the reports from all of them is feasible. And the fact that we can get a request for changing multiple request parameters at once, leads to the need of those reports being accumulated in a single list and later crosschecked if any of the couchdb calls failed. So to me this is the logical and sensible approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, I followed your code once again in a clean view (without GH diff):
https://github.com/dmwm/WMCore/blob/0da38322dfb816bd00b0df079cf24d58a9060966/src/python/WMCore/ReqMgr/Service/Request.py

and I do not see multiple requests being consumed in any moment. It is always returned as a single call report! So please remove the call to reduceReport() in (currently) L455.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well ... before we start this whole process of changing everything back and forth multiple times ... the report results were accumulated while iterating through the request parameters. They have never been intended to be accumulated through iteration over multiple request - because it simply never happens. But none of this matters any more, because as I sent you in a private chat this whole for cycle here I removed in my next PR, where I make this whole set of calls to the workload and global workque a single action. Which is actually the main challenge of the next issue to fight in this line of coding. The whole thing is reduced to just two lines like this:
https://github.com/dmwm/WMCore/pull/12096/files#diff-120ee6838284a3d1c1799f511da7f147179d0a955f87d0da6fc8b58a8b66c794R428-R462

        # reqArgsNothandled = []
        # for reqArg in reqArgsDiff.copy():
        #     if reqArg == 'RequestPriority':
        #         validate_request_priority(reqArgsDiff)
        #         # must update three places: GQ elements, workload_cache and workload spec
        #         self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority'])
        #         workload.setPriority(reqArgsDiff['RequestPriority'])
        #         cherrypy.log('Updated priority of "{}" to: {}'.format(workload.name(), reqArgsDiff['RequestPriority']))
        #     elif reqArg == "SiteWhitelist":
        #         workload.setSiteWhitelist(reqArgsDiff["SiteWhitelist"])
        #         cherrypy.log('Updated SiteWhitelist of "{}", with:  {}'.format(workload.name(), reqArgsDiff['SiteWhitelist']))
        #     elif reqArg == "SiteBlacklist":
        #         workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"])
        #         cherrypy.log('Updated SiteBlacklist of "{}", with:  {}'.format(workload.name(), reqArgsDiff['SiteBlacklist']))
        #     else:
        #         reqArgsNothandled.append(reqArg)
        #         cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg)


        # TODO:  map setters to key names:
        # Creating a setter method map - directly in the workload object.

        # Update all workload parameters based on the full reqArgsDiff dictionary
        workload.updateWorkloadArgs(reqArgsDiff)

        # Commit the changes of the current workload object to the database:
        workload.saveCouchUrl(workload.specUrl())

        # Commit all Global WorkQueue changes per workflow in a single go:
        # self.gq_service.updateElementsByWorkflow(workload.name(), reqArgsDiff)

        # if reqArgsNothandled:
        #     msg = "There were unhandled arguments left for no-status update: %s" % reqArgsNothandled
        #     raise InvalidSpecParameterValue(msg)

P.S. the commented lines I left just for now to point the code evolution. they will all be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I also rearranged the code in this PR such that nothing is persisted in the Database in case of unsuported arguments sent by the user, so that we can merge this and move on.

workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"])
cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist']))
else:
reqArgsNothandled.append(reqArg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I followed this logic properly, here you are simply gathering the list of arguments received by the server but not acted on. For debugging purposes is Okay, but I would say it's unnecessary to push it upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I did not put this only for debugging purposes. It is a completely valid use case. If we get a set of request_arguments sent here including a request parameter which we do not handle, we need to know once we get out of the cycle and properly log this, and eventually signal to the end user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment below as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Please fix the typo in the message below then "Uunhandled".
Actually, we might even delete that line and keep only the final one reporting all unhandled/unexpected parameters.

reqArgsNothandled.append(reqArg)
cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg)

if reqArgsNothandled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed. In practice, we would be potentially persisting changes in the workload object and later on failing the request back to the end user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to disagree... This would be unneeded only if we decide NOT to act on finding a request parameter change which we do not support and silently accept the user's call and never giving a feedback on what we have dropped. Of course this is one way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you raise this exception, it will get mapped to a http status code >=400, which is error.
So we cannot execute some actions in the workload, and in the end report to the end user that his http request failed. It is inconsistent and misleading, at the very least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, give me your suggestion on this. How would you want to address the situation. To me it is all good. just let me know what is your preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is not to change this behavior with this change. In other words, if we were not failing an HTTP call that provided us with extra arguments, I would be in favor of keeping that same behavior for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we are failing it, that's the case. look at the current code:

else:
msg = "There are invalid arguments for no-status update: %s" % request_args
raise InvalidSpecParameterValue(msg)

The only difference is that we currently do not go and iterate on several parameters, simply because so far we already consider only tow parameters as changeable (which as you said are exclusive) and it is done on a single pass. From now on we will have to fail (raise the exception) again but we simply need to accumulate all non-handled parameters provided by the user and properly report them at the end. I was finding it already quite uncomfortable to have a misleading message with this exception and not giving any details about what is wrong with those arguments - which is redundant or which is more or missing etc .... And exactly because of this I decided ok - lets keep all non handled parameters in a single list during the cycle and report them back to the user(or some script used for the action) at the end so he is pointed to his exact mistake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, now I see it can have value in knowing which parameters from the user were dropped, so it's Okay to write a record of those in the logs.

However, this behavior is different than the behavior currently implemented in master. In current master, we only fail the HTTP call if the user didn't provide any of the expected parameters (RequestPriority or stats). While with this PR, it fails for any unsupported parameter provided by the end user (even worse, which could be after persisting changes for supported parameters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaltaro , this is addressed in my next PR. See my comment above where I point to the lines of change in my next PR. We will again warn the user for the unsupported parameters requested, but we will do it as early as the workload update in memory and we will never persist anything in the database, because it will happen not here, but rather where it need be - in the code for workload update.

src/python/WMCore/ReqMgr/Utils/Validation.py Outdated Show resolved Hide resolved
else:
reqArgsDiffKeys.append(key)

# NOTE: We need to Explicitly add RequestStatus to reqArgsDiff, since it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can assume that RequestStatus parameter can be changed at any moment against any current status.
Upon requesting a status change, that is anyways validated in a different place with allowed transition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that different place again loops back to calling this function. Which in the past was not going to ask for allowed actions per status.I did not like to have this workaround in the validation function either. But given that this logic applies not only for RequestStatus but also to all stat parameters which by themselves are not a valid user defined request parameter I had to add it here. Of course the better way would be to move it up to the ALOWED_ACTIONS_FOR_STATUS list or even better the whole logic to be moved as an addition to the get modifiable_properties function and have the ALLOWED_ACTIONS_ALL_STATUS list properly defined in the RequestStatus.py file as the rest. Honestly this is my preference if you agree with me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/python/WMCore/ReqMgr/Utils/Validation.py Show resolved Hide resolved

validate_request_priority(request_args)
if 'RequestPriority' in reqArgsDiff:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is already performed inside the function, so we can keep the previous and shorter implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you refer to the _handleNoStatusUpdate function, then no, there we check to find out which is the proper set of actions to be taken, here we validate the request priority value sent IIUC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to validate_request_priority, in the line below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pong

Copy link
Contributor Author

@todor-ivanov todor-ivanov Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we need to increase the stack with an extra function call, when we already know at this very stage that a RequestPriority validation won't be needed. Anyway .. I'll remove it and will let us call the validate_request_priority constantly

@todor-ivanov todor-ivanov force-pushed the feature_SeteWhitelist_SupportChangeInReqmgr2_fix-12307 branch from fe03f89 to a765796 Compare September 6, 2024 15:50
@todor-ivanov
Copy link
Contributor Author

@amaltaro I have addressed all your comments and the final tests were sucessful. Please take another look

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 201 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15193/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the feature_SeteWhitelist_SupportChangeInReqmgr2_fix-12307 branch from a765796 to e940a00 Compare September 6, 2024 15:54
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 201 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15194/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 201 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15195/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 201 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15196/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Copy link
Contributor Author

hi @amaltaro the unit tests failures seem to be not related to the current change. Same for the highlighted pylint notes. Could you please take another look at this code change?

@todor-ivanov
Copy link
Contributor Author

hi @amaltaro Could we finish this review here. I'd like to switch gears to #12038 and I strongly depend on this change, because the _handleNoStatusUpdate method is actually where the extra Workqueu elements check should happen.

Last time all my tests were successful, so I'd consider this code as good.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, I don't think I cover 100% of code changes in my review, but I think there is enough information/material for you to work on.

In addition, please refer to the Jenkins report for a few things that you are expected to resolve in this PR.


validate_request_priority(request_args)
if 'RequestPriority' in reqArgsDiff:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

reqArgsNothandled.append(reqArg)
cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg)

if reqArgsNothandled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is not to change this behavior with this change. In other words, if we were not failing an HTTP call that provided us with extra arguments, I would be in favor of keeping that same behavior for now.

workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"])
cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist']))
else:
reqArgsNothandled.append(reqArg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment below as well.

workload.setPriority(reqArgsDiff['RequestPriority'])
workload.saveCouchUrl(workload.specUrl())
cherrypy.log('Updated priority of "{}" to: {}'.format(workload.name(), reqArgsDiff['RequestPriority']))
elif workqueue_stat_validation(reqArgsDiff):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For workqueue stats, you are unnecessarily validating the whole user input arguments while you iterate over each of the input arguments.

As I said in one of my previous comments, this action is exclusive to the other updates. So I would remove this check from inside this loop for, move it above in this method and check/act/return before proceeding with the remaining logic of the other arguments.

In other words, I suggest something like:

if workqueue_stat_validation(request_args):
    report = self.reqmgr_db_service.updateRequestStats(workload.name(), request_args)
    cherrypy.log('Updated workqueue statistics of "{}", with:  {}'.format(workload.name(), request_args))
    return report
...
report = []
reqArgsDiff = deepcopy(request_args)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/python/WMCore/ReqMgr/Service/Request.py Show resolved Hide resolved
raise InvalidSpecParameterValue(msg)

return report
report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if you rollback your change above - about raise an exception for unsupported parameters - that you also need to change reqArgsDiff here, otherwise you would be updating unsupported properties as well.

Copy link
Contributor Author

@todor-ivanov todor-ivanov Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I do not want to roll it back. I know I am doing the correct and user friendly thing here. And I am doing it on purpose, because we safe what - tow lines of code here and now ... and then for the rest of the lifetime of the universe we give an incomplete and misleading debugging information, and who ever is reading the logs will always need background information in order to make proper sense of them and link his won action and the actual mechanism/reason why his call failed.

# must update three places: GQ elements, workload_cache and workload spec
self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority'])
workload.setPriority(reqArgsDiff['RequestPriority'])
workload.saveCouchUrl(workload.specUrl())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this call:

workload.saveCouchUrl(workload.specUrl())

I think it also needs to be executed for site list changes, such that those information get persisted in the workload object. Said that, I would suggest moving it outside of this loop, such that we only call it once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, as we also discussed it earlier during the chat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case when all the entries in the list are identical or
False in the case when any of them deviates from the expected value.
"""
if reduce(lambda x, y: x == y == expectedValue and expectedValue, reportList, expectedValue):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already expressed in my previous comments, I don't think we actually need this function. In addition, this expression looks cumbersome and it is hard to say what exactly it does. Or it could be just me.
Nonetheless, let us keep discussing/reviewing this PR before taking any action here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does a simple thing:

  • Takes a list of values (reportList) and applies a two parameterized lambda expression to each of them sequentially aggregating the previous result as an argument to the next step - IIUC this should be the standard reduce behavior: https://docs.python.org/3/library/functools.html#functools.reduce, but more or less this is the 'key' expression for understanding it: ((((1+2)+3)+4)+5) it tells it all.
  • The lambda expression on its own does the following: It simply compares if the tow arguments which it gets during the call are equal between each other and if they are equal if they match the expected value on top of that and if all of this is true returns the expected value otherwice the result of the comparission (which should be simply False
  • For initialization the sequence of steps in the reduce call an initial value is provided, which is the last argument - expectedValue, such that all comparisons are done against the expected value - even starting from the very first one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this function is agnostic to the list provided... so it is applicable in many other cases as the so provided unittests shows it. I called it reduceReport only because of the context under which I initially suggested it before we moved it here. If I need to be honest the best name for this function should be reduceList or something in that line.


# NOTE: We need to explicitly add all stat keys during validation
# They are needed for the workqueue_stat_validation() calls
ALLOWED_STAT_KEYS = ['total_jobs', 'input_lumis', 'input_events', 'input_num_files']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of proliferating these stat parameters, please create it under:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/DataStructs/RequestStatus.py
and them import it here.

It would be great if you could update this line as well:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/ReqMgr/Utils/Validation.py#L26
such that we stop using this hardcoded variable, and use it from the RequestStatus module variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, if you do the separation that I suggested in Request.py, you won't need these extra checks over here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing the ALLOWED_STAT_KEYS is good. I'll change it, but we will still need the extra line allowedKeys.extend(ALLOWED_STAT_KEYS) the other option would be to add it to the values returned from get_modifiable_properties(status) irectly but this I am afraid may change behaviour for others who depend on this. Ergo this is why I used this mechanism to additionally extend the list here. rather than in RequestSatatus

src/python/WMCore/ReqMgr/Utils/Validation.py Show resolved Hide resolved
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 200 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 30 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15205/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the feature_SeteWhitelist_SupportChangeInReqmgr2_fix-12307 branch from 938f2f4 to 125f4f6 Compare September 12, 2024 14:04
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 15 warnings
    • 202 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 31 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15206/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests added
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 15 warnings
    • 202 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 31 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15207/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the feature_SeteWhitelist_SupportChangeInReqmgr2_fix-12307 branch from 19e1e3b to 0da3832 Compare September 12, 2024 14:24
@todor-ivanov
Copy link
Contributor Author

hi @amaltaro I have addressed all of your comments. Please take another look.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 202 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 22 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15208/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 202 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 22 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15209/artifact/artifacts/PullRequestReport.html

validate_request_priority(reqArgsDiff)
# must update three places: GQ elements, workload_cache and workload spec
self.gq_service.updatePriority(workload.name(), reqArgsDiff['RequestPriority'])
workload.setPriority(reqArgsDiff['RequestPriority'])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaltaro let me ask something here about this line of code (not related to the current review process, though) ....

While working on #12038, I tried to follow the sequence of steps taken while updating a request priority and figure out if we could update both the workqueue elements and the workload priority from a single place - like the wmspec for example (i.e. the workload itself). And then I realized this step is somehow redundant, because workload priority update is a step already performed intrinsically during the call of Global Queue priority update. See:

# Update the spec, if it exists
if self.db.documentExists(wf):
wmspec = WMWorkloadHelper()
wmspec.load(self.hostWithAuth + "/%s/%s/spec" % (self.db.name, wf))
wmspec.setPriority(priority)
dummy_values = {'name': wmspec.name()}
wmspec.saveCouch(self.hostWithAuth, self.db.name, dummy_values)

Is there any particular reason why should we repeat the same step twice in a sequence .... or did we just happened to be living with this redundant step for sometime now and simply not noticing.... Or maybe I am wrong and I cannot grasp some minor detail, which makes a difference between those two calls to update the wmspec at couchDB:

  • the one coming from the workqueue service priority update and
  • the other from the request._handleNoStatutsUpdate methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great question! At some point in the last ~17 years, it was decided that workqueue would actually have its own copy of the workload spec (as you can see it uses self.db in here as well, which points to workqueue), while the "first" workload spec is created in reqmgr2 during the workflow creation in the system (I guess in reqmgr_workload_cache db). So yes, we need to keep the current logic as those are different updates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor not like!!!
Todor think many copies of objects many sources of concurrency

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov I still have some follow up to be done here before we can converge with these changes. Please find my comments along the code.

@@ -184,7 +196,10 @@ def get_modifiable_properties(status=None):
TODO: Currently gets the result from hardcoded list. change to get from configuration or db
"""
if status:
return ALLOWED_ACTIONS_FOR_STATUS.get(status, 'all_attributes')
allowedKeys = ALLOWED_ACTIONS_FOR_STATUS.get(status, 'all_attributes')
if not allowedKeys == 'all_attributes':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not if allowedKeys != 'all_attributes':?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a matter of taste. It makes no difference here. If you refer to the object equality in memory check which is done with is, this is not the case here. The not here plays a role only to negate the result of the comparison, while the comparison operator is still ==

Copy link
Contributor

@amaltaro amaltaro Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a matter of taste, but code readability. If you can read "different than", why would you write "not equal to"?


if not reqArgsDiff:
cherrypy.log("Nothing to be changed at this stage")
return reduceReport(report)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why calling reduceReport on an empty list? Won't it return False from that function back to the end user (or after the REST layer consumes the False value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, calling reduceReport on an empty list will return the expected value, because the expected value is set as a default.

In [10]: reduceReport([])
Out[10]: 'OK'

return reduceReport(report)

reqArgsNothandled = []
for reqArg in reqArgsDiff.copy():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I failed to see where RequestName is removed inside this loop. Can you please point me to the line/function doing so?

report.append(self.reqmgr_db_service.updateRequestProperty(workload.name(), reqArgsDiff, dn))
workload.setSiteWhitelist(reqArgsDiff["SiteWhitelist"])
cherrypy.log('Updated SiteWhitelist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteWhitelist']))
elif reqArg == "SiteBlacklist":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, I followed your code once again in a clean view (without GH diff):
https://github.com/dmwm/WMCore/blob/0da38322dfb816bd00b0df079cf24d58a9060966/src/python/WMCore/ReqMgr/Service/Request.py

and I do not see multiple requests being consumed in any moment. It is always returned as a single call report! So please remove the call to reduceReport() in (currently) L455.

workload.setSiteBlacklist(reqArgsDiff["SiteBlacklist"])
cherrypy.log('Updated SiteBlacklist of "{}", with: {}'.format(workload.name(), reqArgsDiff['SiteBlacklist']))
else:
reqArgsNothandled.append(reqArg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Please fix the typo in the message below then "Uunhandled".
Actually, we might even delete that line and keep only the final one reporting all unhandled/unexpected parameters.

reqArgsNothandled.append(reqArg)
cherrypy.log("Uunhandled argument for no-status update: %s" % reqArg)

if reqArgsNothandled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, now I see it can have value in knowing which parameters from the user were dropped, so it's Okay to write a record of those in the logs.

However, this behavior is different than the behavior currently implemented in master. In current master, we only fail the HTTP call if the user didn't provide any of the expected parameters (RequestPriority or stats). While with this PR, it fails for any unsupported parameter provided by the end user (even worse, which could be after persisting changes for supported parameters).

Add proper checks for allowed request properties changes && Stop reducing request_args only to RequestPriority for noStatusTransition actions

Extend allowed arguments including stat_keys and RequestStatus && Call the relevant modifiers from _handleNoStatusUpdate

Fix missing RequestName from request_args on mutipple calls of validate_request_update_args

Add proper log messages for Sitelists changes

Remove redundant validation calls && Update reqdb couch with single action && Move reduceReport to Utils.

Typo

Remove forgotten commented lines of code

Review comments

Source files pylint fixes

Review comments
Unit tests

Unit tests pylint fixes

Unit tests - remove tests for reduceReport
@todor-ivanov todor-ivanov force-pushed the feature_SeteWhitelist_SupportChangeInReqmgr2_fix-12307 branch from 09de473 to 901463d Compare September 17, 2024 07:46
@todor-ivanov
Copy link
Contributor Author

@amaltaro take another look. And lets merge this and move on.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests deleted
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 15 warnings
    • 185 comments to review
  • Pylint py3k check: failed
    • 2 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 13 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15213/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 179 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 9 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15214/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Copy link
Contributor Author

test this please

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 179 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 9 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15215/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 15 warnings
    • 179 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 9 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15216/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@todor-ivanov , I think you covered most of my comments in this PR.
As you started working in these 2 PRs in parallel and you are now having issues to track these modifications, we can follow up on anything else in the coming PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support change to SiteWhitelist/SiteBlacklist in ReqMgr2 for active workflows
3 participants