Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bindings] Update memory leak test #22059

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rpoyner-tri
Copy link
Contributor

@rpoyner-tri rpoyner-tri commented Oct 17, 2024

Convert the test to unittest format, rewrite the instrumentation, and track the current numbers of leaks, with the goal of reaching 0 once fixes are landed.

As leaks are fixed, the test will complain about allowing more leaks than actually occurred, prompting update or removal of the leaks allowed.


This change is Reviewable

@rpoyner-tri rpoyner-tri added the release notes: none This pull request should not be mentioned in the release notes label Oct 17, 2024
Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@jwnimmer-tri for feature review.

The results show that some tests miss the mark, and (perhaps) others may be missing. I don't know that we need to address that in this PR. YMMV.

Reviewable status: LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


bindings/pydrake/test/memory_leak_test.py line 117 at r1 (raw file):

                if key != 'self'
                and not any(isinstance(value, typ) for typ in [list, str])}

Everything from class Sentinel up above through here is quite opaque to me, and indeed has few to no comments. We will need a lot more explanatory text in the first 100 lines to hand-hold drake developers through what all is happening.


bindings/pydrake/test/memory_leak_test.py line 299 at r1 (raw file):

                f" leaks_allowed count is stale, please update the test.")

nit Lint is here (blank line).

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


bindings/pydrake/test/memory_leak_test.py line 2 at r1 (raw file):

"""Eventually this program might grow up to be an actual regression test for
memory leaks, but for now it merely serves to demonstrate such leaks.

stale


bindings/pydrake/test/memory_leak_test.py line 117 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

Everything from class Sentinel up above through here is quite opaque to me, and indeed has few to no comments. We will need a lot more explanatory text in the first 100 lines to hand-hold drake developers through what all is happening.

Done.

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


bindings/pydrake/test/memory_leak_test.py line 99 at r2 (raw file):

def _counts_for_cycle_parts(o, name):
    """Print relevant reference counts for the objects expected if `o`
    participates in a cycle created by drake::pydrake::internal::ref_cycle.

anachronism

@rpoyner-tri rpoyner-tri force-pushed the update-mem-leak-test branch 2 times, most recently from 6ba1b9b to 3ad502b Compare October 18, 2024 17:40
Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


bindings/pydrake/test/memory_leak_test.py line 62 at r3 (raw file):

def _object_generation(o) -> int | None:
    """Return the garbage collection generation of the passed object, or None

nit GSG mood.

Ditto throughout this file.

Suggestion:

Returns

bindings/pydrake/test/memory_leak_test.py line 71 at r3 (raw file):

        gen_id_list = [id(x) for x in gen_list]
        if id(o) in gen_id_list:
            return gen

BTW The way this is spelled seems odd to me. It is weird for performance reasons? Seems like this is simpler and more clear:

if any([x is o for x in gen_list]):
    return gen

Code quote:

        gen_id_list = [id(x) for x in gen_list]
        if id(o) in gen_id_list:
            return gen

bindings/pydrake/test/memory_leak_test.py line 128 at r3 (raw file):

            verbose: if True, print additional debug information.
        """
        self._verbose = verbose

I can't understand why this is a member field. It seems like it's only ever changed by hand, by editing the code, and never just for one dut, rather for everything at once?

Under that premise, a test-wide global VERBOSE = False would be much easier to read and understand even just for the if VERBOSE logic, but then would also allow us to split DutRepeater into pieces. Right now it's doing too many jobs at once, and is very difficult to follow and reason about.

Once "verbose" is make into a global, then make_sentinel no longer needs a self, and so it can be hoisted up right next to class Sentinel for cohesiveness. Its helper make_sentinels_from_locals would also move up there for the same reasons (dropping self).

After that, the dut functions don't need to be member functions, so would go back to also being little stateless free functions. Having them be member functions with self._leaks in their scope is very confusing.

After all of that, possibly it makes sense for DutRepeater to just merge into the class TestMemoryLeaks? Hard to say w/o seeing the next draft, but I'm struggling to understand what it's supposed to be encapsulating as a class. Possibly it still makes sense as a standalone thing. LMK.

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


bindings/pydrake/test/memory_leak_test.py line 71 at r3 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW The way this is spelled seems odd to me. It is weird for performance reasons? Seems like this is simpler and more clear:

if any([x is o for x in gen_list]):
    return gen

I was trying not to bump ref counts of stuff on the verge of GC, but maybe it doesn't matter. We'll try it your way.


bindings/pydrake/test/memory_leak_test.py line 128 at r3 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

I can't understand why this is a member field. It seems like it's only ever changed by hand, by editing the code, and never just for one dut, rather for everything at once?

Under that premise, a test-wide global VERBOSE = False would be much easier to read and understand even just for the if VERBOSE logic, but then would also allow us to split DutRepeater into pieces. Right now it's doing too many jobs at once, and is very difficult to follow and reason about.

Once "verbose" is make into a global, then make_sentinel no longer needs a self, and so it can be hoisted up right next to class Sentinel for cohesiveness. Its helper make_sentinels_from_locals would also move up there for the same reasons (dropping self).

After that, the dut functions don't need to be member functions, so would go back to also being little stateless free functions. Having them be member functions with self._leaks in their scope is very confusing.

After all of that, possibly it makes sense for DutRepeater to just merge into the class TestMemoryLeaks? Hard to say w/o seeing the next draft, but I'm struggling to understand what it's supposed to be encapsulating as a class. Possibly it still makes sense as a standalone thing. LMK.

Done.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: 7 unresolved discussions, needs at least two assigned reviewers


bindings/pydrake/test/memory_leak_test.py line 163 at r4 (raw file):

    if VERBOSE:
        for key, value in locals().copy().items():
            _counts_for_cycle_parts(value, key)

BTW Using value, key here comes across as a bit unusual, things are typically "key, value" the other way around. Probably the "counts for ..." should require calling by kwargs, and so have name=key, o=value or key=key, value=value here at the caller.


bindings/pydrake/test/memory_leak_test.py line 294 at r4 (raw file):

    diagram.SetRandomContext(simulator.get_mutable_context(), random)
    simulator.AdvanceTo(0.5)
    return _make_sentinels_from_locals("full_simulator", locals())

nit This string doesn't match the function name, which is "full_example". I imagine we want them to be 100% identical?


bindings/pydrake/test/memory_leak_test.py line 300 at r4 (raw file):

    """Calls dut() for count times in a row, performing a full garbage
    collection before and after each call. Tracks memory leaks of interest; the
    count of leaks is returned.  if `VERBOSE=True`, additional debug

typo s/if/If/


bindings/pydrake/test/memory_leak_test.py line 310 at r4 (raw file):

    Returns:
        int: the total number of leaks detected by examining returned

BTW When I read this docstring, I took it to be the number of leaked objects, not the number of dut() calls which leaked anything.


bindings/pydrake/test/memory_leak_test.py line 330 at r4 (raw file):

class TestMemoryLeaks(unittest.TestCase):
    def do_test(self, *, dut, count, leaks_allowed=0):
        """Run the requested `dut` (see _repeat() above) for `count`

nit GSG "Runs"


bindings/pydrake/test/memory_leak_test.py line 123 at r5 (raw file):

# TODO(rpoyner-tri): update or delete this when ref_cycle is adopted, changed,
# or abandoned.
def _counts_for_cycle_parts(o, name):

nit This needs a verb. Maybe "_print_counts...etc"? Currently "counts" seems like it's supposed to be the verb, which is somewhat confusing.

Actually, without any of the 22022 infra landed, this function seems somewhat premature to land now?


bindings/pydrake/test/memory_leak_test.py line 360 at r5 (raw file):

        # TODO(rpoyner-tri): investigate platform differences, if leaks persist
        # after fixing.
        leaks_by_platform = {

BTW This kind of thing is why I'm a little bit hesitant to have the "fail if the allowed number is too high". For my taste, grooming that number should be mostly manual, without CI failures nagging us about it. Since the goal number our work on this ticket is necessarily zero, I don't know that we need a lot of hand-holding to remind us to keep lowering it.

Our C++ limit_malloc has min_num_allocations but it is rarely used. If you do want to keep the lower bound as part of this test, maybe "leaks_required=1" is a better phrasing that equating leaks_allowed to leaks_required implicitly?

Convert the test to unittest format, rewrite the instrumentation, and
track the current numbers of leaks, with the goal of reaching 0 once
fixes are landed.

As leaks are fixed, the test will complain about allowing more leaks
than actually occurred, prompting update or removal of the leaks
allowed.
Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@sherm1 for next not-me platform review.

Reviewable status: LGTM missing from assignee sherm1(platform)

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: LGTM missing from assignee sherm1(platform)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high release notes: none This pull request should not be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants