Here we go again: DIRAC environment leaking into user jobs #6277
-
Hi All, we already ran into this problem when going from v6 to v7. Unfortunately we only checked the PYTHONPATH/LD_LIBRARY_PATH in testing, as that was the issue last time around, so we didn't quite see this coming. Daniela
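For context, a check of the kind Daniela mentions can be widened beyond PYTHONPATH/LD_LIBRARY_PATH. Below is a minimal diagnostic sketch (not part of DIRAC) that a test job could run to spot leaked variables; the prefixes and name patterns are illustrative assumptions, not an exhaustive list.

```python
#!/usr/bin/env python3
"""Sketch only: list environment variables in a test job that look like they
leaked from the DIRAC/diracos installation. Patterns below are assumptions."""
import os

SUSPECT_PREFIXES = ("DIRAC", "X509_")
SUSPECT_NAMES = ("PYTHONPATH", "LD_LIBRARY_PATH", "PATH")

for name, value in sorted(os.environ.items()):
    # Flag DIRAC-looking variables, or standard paths that point into diracos.
    if name.startswith(SUSPECT_PREFIXES) or (name in SUSPECT_NAMES and "diracos" in value):
        print(f"{name}={value}")
```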
-
Hi @marianne013 Actually you are right, this is all unfinished business, maybe closed too soon: #4480. And indeed, it is even unclear what the real strategy should be! And we keep cleaning up more and more variables (sigh...): DIRACGrid/DIRACOS2#74. So I really do not have a solution right now. But isn't the best thing to do what you already advise to your users (https://www.gridpp.ac.uk/wiki/Quick_Guide_to_Dirac)?
Also, I just had a very quick look at "snoplus" on GitHub, and it seems their recommended way of doing things is through containers. That should be a good enough way to split the environment, no?
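As a rough illustration of the container approach mentioned here, the payload could be started with a cleaned environment inside an Apptainer/Singularity image. This is a sketch only; the image path and bind mounts are hypothetical examples, not what snoplus actually uses.

```python
#!/usr/bin/env python3
"""Sketch only: run a user payload inside a container so it does not inherit
the DIRAC/diracos environment. Image path and bind mounts are hypothetical."""
import subprocess

def run_payload_in_container(payload_cmd,
                             image="/cvmfs/some-vo.example/el9.sif",
                             binds=("/cvmfs", "/tmp")):
    cmd = [
        "apptainer", "exec",
        "--cleanenv",                  # do not pass the host (pilot) environment into the container
        "--bind", ",".join(binds),     # directories the payload still needs to see
        image,
    ] + list(payload_cmd)
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    rc = run_payload_in_container(["python3", "-c", "import sys; print(sys.version)"])
    print("payload exit code:", rc)
```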
-
Do you know how hard it is to explain containers to everyone, which directories to bind-mount (is that even a word?), etc.? It's all doable, but I would prefer the rather clean setup we had in v7r3 (I meant v7r2) as a starting point. Simon and I had a brief discussion about this, but if we had come up with a master plan you would have heard about it by now.
-
Just to suggest a possible solution, I suspect it'll be considered too complicated:
(Although actually, now I've written it down, it doesn't look quite as bad as I thought it might :-) That way the user will get the native version of everything, but will still be able to use the DIRAC tools (as they have a wrapper script that points to the diracos python version, so they can be run without activating the full environment). The problem with trying to fix anything at a VO level is that some of our VOs aren't as tightly integrated as some of the bigger ones: there may well be multiple groups with independent stuff; for example, I don't think our snoplus users are using containers at all, that's a different sub-group! This goes a little beyond just python: things like curl, openssl and perl are also preferentially picked up from diracos2, so you get a much newer version than you might expect. While I don't have any concrete examples, it's only a matter of time before someone tries to run something like curl and finds it has slightly different behaviour or command line switches from what they expect (the wrong python version is only the visible tip of the iceberg). Regards,
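A minimal sketch of the wrapper idea described above, assuming a diracos install prefix (the path below is hypothetical): the diracos PATH/LD_LIBRARY_PATH are applied only to the child process running the DIRAC command, so the calling shell and the payload keep the native curl/openssl/perl/python.

```python
#!/usr/bin/env python3
"""Sketch only: run a single DIRAC command under the diracos environment without
activating that environment in the calling shell. The install prefix is an assumption."""
import os
import subprocess

DIRACOS = "/opt/dirac/diracos"  # hypothetical install prefix, adjust for your site

def run_dirac_command(args):
    env = os.environ.copy()
    # Prepend diracos paths for this child process only; the parent environment is untouched.
    env["PATH"] = f"{DIRACOS}/bin:" + env.get("PATH", "")
    env["LD_LIBRARY_PATH"] = f"{DIRACOS}/lib:" + env.get("LD_LIBRARY_PATH", "")
    return subprocess.run(args, env=env, check=False).returncode

if __name__ == "__main__":
    run_dirac_command(["dirac-proxy-info"])
```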
-
I confess to being a bit lost in all this; I don't recall all the changes. But did we REALLY have a clean setup, or did it just look like it because you had a python2 anyway?
I don't think this is entirely true, as activating the environment also sets and unsets some variables. Just to make sure we all have the same thing in mind: do we (still) all agree that what we initially want to achieve is that a payload starts with the vanilla environment from the worker node? And possibly is still able to use dirac commands? This actually matches perfectly the summary done by Marko.
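To make the stated goal concrete, here is a minimal sketch (not the actual pilot or JobWrapper code): snapshot the worker node environment before any DIRAC/diracos activation, and start the payload from that snapshot rather than from the pilot's modified environment. The snapshot file name and call sites are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch only: preserve the worker node's vanilla environment for the payload."""
import json
import os
import subprocess

SNAPSHOT = "vanilla_env.json"

def save_vanilla_env():
    """Call at pilot start, before sourcing any diracos activation script."""
    with open(SNAPSHOT, "w") as f:
        json.dump(dict(os.environ), f)

def start_payload(payload_cmd):
    """Start the user payload with the saved vanilla environment, not the pilot's."""
    with open(SNAPSHOT) as f:
        vanilla = json.load(f)
    return subprocess.Popen(payload_cmd, env=vanilla)
```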
-
How do they run the job? Can we get the jobDescription.xml (via email, if not public here)? How did the error message change with the python2 pilots?
-
"... but if the VO, user, operators decide to follow such an approach, it is their responsibility to ensure the environment separation for calling dirac command and payload execution." |
-
I've got another suggested fix: