Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared namespaces #534

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

anivegesana
Copy link
Contributor

@anivegesana anivegesana commented Aug 4, 2022

I'll fix anything tomorrow night. It is late.

Ok. I think that was the only additional case I needed to fix.

This PR caches global dictionary copies when pickling functions. The caching allows for the preservation of a globals dictionary shared between multiple functions. If the global dictionary is pickled, it will be pickled as the copy in all future references to prevent duplication of the globals dictionary.

Fixes #532, fixes #466

@anivegesana
Copy link
Contributor Author

anivegesana commented Aug 5, 2022

I remember why I didn't do this earlier. It makes save_module_dict more complicated, but we need to make it simpler to do #485.

@mmckerns
Copy link
Member

mmckerns commented Aug 5, 2022

When this is ready, either mark it as ready for review, or request a review.

@anivegesana
Copy link
Contributor Author

anivegesana commented Aug 5, 2022

I still don't have permissions to do that, but this PR is now ready. I might now, I will check on desktop later.

@mmckerns @leogama Look at the comment concerning cPickle first, because I do not know if that has higher priority.

@leogama
Copy link
Contributor

leogama commented Aug 5, 2022

I still don't have permissions to do that, but this PR is now ready

This is the GitHub workflow we kind of discovered and have being using:

  1. I open a Draft Pull Request
  2. We discuss some ideia, I push some commits
  3. After a while, when I reach the point where review is necessary, I change the PR from Draft to Ready for review
  4. mmckerns receives a notification (I think) and assigns himself as a reviewer
  5. mmckerns does the review and either merges, or leaves comments and requests changes (and I'm notified)
  6. I apply the requested and other changes
  7. As a review was already done, I'm able to re-request a review for mmckerns
  8. mmckerns receives a notification
  9. GOTO (5)

(If you didn't open the PR in draft mode, you may still convert it to a draft.)

@mmckerns @leogama Look at the comment concerning cPickle first, because I do not know if that has higher priority.

I'll take a (deeper) look at this later today. Edit: a brief look at the changes made me think that:

  1. They conflict with somewhat similar changes made in Session: improvements to documentation + handle unpickleable objects #527 to save_module_dict() and a new helper method _save_module_dict(); and
  2. The cache dict may not work for cases in dump_module() (the old dump_session()) where the global dictionary saved is a potentially modified copy of the module's dictionary

@mmckerns mmckerns self-requested a review August 5, 2022 22:18
@leogama
Copy link
Contributor

leogama commented Aug 9, 2022

@anivegesana I read your changes more thoroughly and couldn't find any problem relative dump_module(). As the attributes _session and _recurse are mutually exclusive, the chances of unexpected behaviors are small. But you left some indentation tabs in test_functions.py.

About conflicts with the other PR, I'll have to solve it there.

@leogama
Copy link
Contributor

leogama commented Aug 9, 2022

@mmckerns and @anivegesana: I'm thinking of ways to reorganize and simplify the segment of save_function() shown below. For that I need to know if there is any situation where a function's __globals__ or a module's __dict__ is None. Is that possible at all?

dill/dill/_dill.py

Lines 1825 to 1871 in 7fe9bb5

globs = None
if id(obj.__globals__) in pickler.memo:
# It is possible that the globals dictionary itself is also being
# pickled directly.
globs = globs_copy = obj.__globals__
elif _recurse:
# recurse to get all globals referred to by obj
from .detect import globalvars
globs_copy = globalvars(obj, recurse=True, builtin=True)
else:
globs_copy = obj.__globals__
# If the globals is the __dict__ from the module being saved as a
# session, substitute it by the dictionary being actually saved.
if _main_modified and globs_copy is _original_main.__dict__:
globs_copy = getattr(pickler, '_main', _original_main).__dict__
globs = globs_copy
# If the globals is a module __dict__, do not save it in the pickle.
elif globs_copy is not None and obj.__module__ is not None and \
getattr(_import_module(obj.__module__, True), '__dict__', None) is globs_copy:
globs = globs_copy
if globs is None:
# Add the name of the module to the globs dictionary and prevent
# the duplication of the dictionary. Pickle the unpopulated
# globals dictionary and set the remaining items after the function
# is created to correctly handle recursion.
if _globals_cache is not None and obj.__globals__ is not None:
if id(obj.__globals__) not in _globals_cache:
globs = {'__name__': obj.__module__}
_globals_cache[id(obj.__globals__)] = globs
else:
globs = _globals_cache[id(obj.__globals__)]
else:
globs = {'__name__': obj.__module__}
if globs_copy is not None and globs is not globs_copy:
# In the case that the globals are copied, we need to ensure that
# the globals dictionary is updated when all objects in the
# dictionary are already created.
glob_ids = {id(g) for g in globs_copy.values()}
for stack_element in _postproc:
if stack_element in glob_ids:
_postproc[stack_element].append((_setitems, (globs, globs_copy)))
break
else:
postproc_list.append((_setitems, (globs, globs_copy)))

@anivegesana
Copy link
Contributor Author

Just test if globs is None. globs_copy is a list of items to update the globals dictionary cache.

@leogama
Copy link
Contributor

leogama commented Aug 12, 2022

I've opened a PR in your fork: anivegesana#2

@mmckerns mmckerns added this to the dill-0.3.7 milestone Oct 23, 2022
# we only care about session the first pass thru
pickler._first_pass = False
StockPickler.save_dict(pickler, obj)

# IMPORTANT: update the following code whenever save_dict is changed in pickle.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that the code for save_dict in all supported version of python are is the same (at least from if pickler.bin to pickler.memoize(obj)). The idea here is to insert the code with __name__ to save_dict while not yielding control to the StockPickler, correct? It seems that save_dict hasn't changed as far back as at least 3.1, so I'm not worried about the copy being made here.

@@ -0,0 +1,10 @@
# This file is used by test_shared_globals in test_functions.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible to use or modify a simple test like test_functors so that file is used as opposed to using an additional file like this one.

Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. I'll do some testing and also have to understand what concerns @leogama has with your solution aside from there's some intersection with code in some of his PRs. Then I can move this to approve/request.

EDIT: Appears to correct behavior seen in #532, however doesn't seem to affect code initiating #466. Also fails the test case presented in #536:

     from statistics import mean
     pickle_loaded = dill.dumps(mean)
     del sys.modules['statistics']
     pickle_unloaded = dill.dumps(mean)
     assert 'statistics' not in sys.modules  #FAIL

Ok, apparently there's some conceptual difference on what the correct solution is... so this appears to need some thought. I'll dig into #536 before coming back to this.

@mmckerns mmckerns modified the milestones: dill-0.3.7, dill-0.3.8 Feb 5, 2023
Reorganize and simplify if-else clauses in `save_function()`
@anivegesana
Copy link
Contributor Author

I will need to look into this again at some point since it was a while that I looked at this before and forgot what these PRs were for. If I remember correctly, @leogama's changes were supposed to update the PRs to work with the code on master.

@mmckerns mmckerns modified the milestones: dill-0.3.8, dill-0.3.9 Nov 14, 2023
@mmckerns mmckerns modified the milestones: dill-0.3.9, dill-0.4.0 Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__globals__ of function in deleted module restored inconsistently Function namespace preservation
3 participants