[RBAC] Optimize compute_object_role_permissions as iterable with prefetch_related #300

Open
AlanCoding opened this issue Apr 10, 2024 · 0 comments
Labels
app:rbac performance Reduction of queries, query performance, etc. ready to work Item is ready to be worked on

Comments

@AlanCoding
Member

If you use Django Debug Toolbar, you can run the following to see how DAB RBAC performs when rebuilding the entire RoleEvaluation table:

from ansible_base.rbac.caching import compute_team_member_roles, compute_object_role_permissions

compute_object_role_permissions()

You will find that it is limited by one particular constraint: we need some simple-looking prefetches added:

diff --git a/ansible_base/rbac/caching.py b/ansible_base/rbac/caching.py
index a90bf00..24584ab 100644
--- a/ansible_base/rbac/caching.py
+++ b/ansible_base/rbac/caching.py
@@ -168,7 +168,7 @@ def compute_object_role_permissions(object_roles=None, types_prefetch=None):
     if types_prefetch is None:
         types_prefetch = TypesPrefetch.from_database(RoleDefinition)
     if object_roles is None:
-        object_roles = ObjectRole.objects.iterator()
+        object_roles = ObjectRole.objects.prefetch_related('permission_partials', 'permission_partials_uuid', 'provides_teams__has_roles')
 
     for object_role in object_roles:
         role_to_delete, role_to_add = object_role.needed_cache_updates(types_prefetch=types_prefetch)
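To illustrate why the prefetches matter, here is a minimal sketch using only the standard-library `sqlite3` module rather than Django. The `role` and `partial` tables and the helper names are hypothetical stand-ins, not the DAB RBAC schema. It contrasts one query per row (roughly what the loop does without prefetching) with one batched `IN` query (roughly what `prefetch_related` issues per relation).

```python
import sqlite3


def count_queries(conn, fn):
    """Run fn() and return (result, number of SQL statements traced)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn()
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


# Hypothetical schema: 50 roles, each with one related "partial" row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE role (id INTEGER PRIMARY KEY);"
    "CREATE TABLE partial (id INTEGER PRIMARY KEY, role_id INTEGER, codename TEXT);"
)
conn.executemany("INSERT INTO role (id) VALUES (?)", [(i,) for i in range(1, 51)])
conn.executemany(
    "INSERT INTO partial (role_id, codename) VALUES (?, ?)",
    [(i, f"perm_{i}") for i in range(1, 51)],
)
conn.commit()


def per_row():
    # One query for the roles, then one more query for every role.
    out = {}
    for (role_id,) in conn.execute("SELECT id FROM role ORDER BY id").fetchall():
        rows = conn.execute(
            "SELECT codename FROM partial WHERE role_id = ?", (role_id,)
        )
        out[role_id] = [codename for (codename,) in rows]
    return out


def batched():
    # One query for the roles, then a single IN query for all related rows.
    ids = [role_id for (role_id,) in conn.execute("SELECT id FROM role ORDER BY id")]
    out = {role_id: [] for role_id in ids}
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT role_id, codename FROM partial WHERE role_id IN ({placeholders})",
        ids,
    )
    for role_id, codename in rows:
        out[role_id].append(codename)
    return out


per_row_result, per_row_count = count_queries(conn, per_row)
batched_result, batched_count = count_queries(conn, batched)
print(per_row_count, batched_count)
```

Both functions build the same mapping; only the number of round trips differs, and that number grows with the row count in the per-row version.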

However, the change above loses .iterator(), which is probably unacceptable because this is the one big, memory-intensive table involved in these querysets.
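One option that may avoid the trade-off, assuming the project can require Django 4.1 or newer: from that release on, `QuerySet.iterator()` honors `prefetch_related()` as long as `chunk_size` is passed explicitly. The helper name `iterate_with_prefetch` below is hypothetical, and whether this fits DAB's supported Django versions has not been checked here.

```python
def iterate_with_prefetch(queryset, lookups, chunk_size=1000):
    """Stream rows in chunks while still prefetching related objects.

    Assumption: Django >= 4.1, where iterator(chunk_size=...) runs the
    prefetch lookups once per chunk. On older versions the prefetch is
    silently skipped when iterator() is used.
    """
    return queryset.prefetch_related(*lookups).iterator(chunk_size=chunk_size)


# Hypothetical usage inside compute_object_role_permissions:
#
#     object_roles = iterate_with_prefetch(
#         ObjectRole.objects,
#         ('permission_partials', 'permission_partials_uuid', 'provides_teams__has_roles'),
#     )
```

With this approach memory use is bounded by `chunk_size` rows plus their prefetched relations, so the value is a tuning knob rather than a fixed constant.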

There is a very good proposed solution at https://djangosnippets.org/snippets/1949/ but it is not a standard Django util.

import gc

def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django Queryset ordered by the primary key

    This method loads a maximum of chunksize (default: 1000) rows in its
    memory at the same time while django normally would load all rows in its
    memory. Using the iterator() method only causes it to not preload all the
    classes.

    Note that the implementation of the iterator does not support ordered query sets.
    '''
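    pk = 0  # assumes positive integer primary keys
    last_row = queryset.order_by('-pk').first()
    if last_row is None:
        return  # empty queryset: nothing to iterate
    last_pk = last_row.pk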
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()
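A variant sketch of the same idea, in case it is useful: it tracks the last seen pk instead of starting from `pk = 0` and looking up the last pk up front, so it stops as soon as a chunk comes back empty. The name `iterate_in_pk_chunks` is hypothetical. Because each chunk is a normal sliced queryset that is fully evaluated, any `prefetch_related()` already applied to the queryset should run once per chunk.

```python
def iterate_in_pk_chunks(queryset, chunk_size=1000):
    """Yield rows ordered by pk, holding at most chunk_size rows at a time.

    Assumes pk values are orderable and comparable with pk__gt. Any ordering
    already set on the queryset is replaced by ordering on pk.
    """
    queryset = queryset.order_by("pk")
    last_pk = None
    while True:
        # Fetch the next chunk strictly after the last pk seen so far.
        chunk = queryset if last_pk is None else queryset.filter(pk__gt=last_pk)
        rows = list(chunk[:chunk_size])
        if not rows:
            return  # no rows left, including the empty-queryset case
        yield from rows
        last_pk = rows[-1].pk


# Hypothetical usage:
#
#     qs = ObjectRole.objects.prefetch_related('permission_partials')
#     for object_role in iterate_in_pk_chunks(qs, chunk_size=1000):
#         ...
```

The trade-off is one extra query at the end (the empty chunk) in exchange for not needing to know the last pk in advance.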

This will give an order-of-magnitude improvement in the above scenario.

The ToS on the site provides a license for the code snippets: https://djangosnippets.org/about/tos/

That you grant any third party who sees the code you post a royalty-free, non-exclusive license to copy and distribute that code and to make and distribute derivative works based on that code.

@AlanCoding added the performance, app:rbac, Ready To Merge, and ready to work labels and removed the Ready To Merge label on Apr 10, 2024