Fix staleness bug for application export #722

Open
wants to merge 1 commit into
base: master
31 changes: 10 additions & 21 deletions backend/clubs/views.py
@@ -6700,26 +6700,25 @@ def list(self, *args, **kwargs):
"""

app_id = self.kwargs["application_pk"]
serializer = kwargs.get("serializer", self.get_serializer_class())
key = f"applicationsubmissions:{app_id}"

cached = cache.get(key)
if cached is not None:
return Response(cached)
return Response(serializer(cached, many=True).data)
else:
serializer = self.get_serializer_class()
qs = self.get_queryset()
cache.set(key, qs, 60 * 60)
Member
Is there a reason we're caching the queryset instead of the data?

Member Author
We only get the data after running the queryset through the serializer, and I wanted the cache contents themselves to be serializer-agnostic, given that export uses a different serializer. On second thought, though, get_queryset() might be evaluated lazily, so I would need to rethink this.

Member Author
Do you think .objects.all() would work, @aviupadhyayula?

Member (aviupadhyayula, Sep 10, 2024)
From what I've read, it looks like the Django caching library forces evaluation upon insertion. We can test this later by enabling query logging in settings so that each query evaluation is logged. I'm more concerned about breaking our pattern of not caching querysets anywhere.

Honestly, is there a reason to cache this view? Querying the list of applicants will be done very rarely. Some latency on invocation is preferable to leaving a couple of applicants off the UI or the downloaded CSV. Not caching also obviates this concern.
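
To illustrate the point about evaluation on insertion: Django's documentation notes that pickling a QuerySet loads all of its results into memory first, and Django's cache backends generally pickle the values they store. The standard-library sketch below imitates that behaviour with a hypothetical `LazyRows` class and a plain dict as the cache. It is not Django code, and for simplicity the snapshot is unpickled as a plain list. It also shows how a cached snapshot can miss rows added after insertion.

```python
import pickle
from typing import Callable, Dict, Iterator, List, Optional


class LazyRows:
    """Hypothetical stand-in for a lazy queryset (not Django's QuerySet)."""

    def __init__(self, loader: Callable[[], List[str]]) -> None:
        self._loader = loader
        self._result: Optional[List[str]] = None

    def _fetch(self) -> List[str]:
        # Run the loader at most once and remember the rows.
        if self._result is None:
            self._result = list(self._loader())
        return self._result

    def __iter__(self) -> Iterator[str]:
        return iter(self._fetch())

    def __reduce__(self):
        # Pickling forces evaluation; only the concrete rows are stored.
        return (list, (self._fetch(),))


database = ["alice", "bob"]  # stand-in for the submissions table
stats = {"loads": 0}


def load_submissions() -> List[str]:
    stats["loads"] += 1
    return list(database)


cache: Dict[str, bytes] = {}

rows = LazyRows(load_submissions)
loads_before_insert = stats["loads"]   # nothing has been evaluated yet

cache["applicationsubmissions:7"] = pickle.dumps(rows)  # evaluation happens here
loads_after_insert = stats["loads"]

database.append("carol")               # a new applicant arrives after caching
snapshot = pickle.loads(cache["applicationsubmissions:7"])
```

In this sketch the snapshot omits the late applicant until the key expires or is invalidated, which is the staleness risk raised in the comment above.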

Member Author
That's fair. Caching definitely isn't necessary, at least for the CSV export (I didn't know that clubs used this feature until very recently). However, the cache invalidation logic was already written for the list view, and application-related views also contribute to site-level performance. I'm not sure how strict our transaction isolation levels are, if at all, but you could imagine that repeatedly querying all applications, especially while new upsert transactions can still occur, is bad during application season.

data = serializer(qs, many=True).data
cache.set(key, data, 60 * 60)

return Response(data)

@method_decorator(cache_page(60 * 60 * 2))
@action(detail=False, methods=["get"])
def export(self, *args, **kwargs):
"""
Given some application submissions, export them to CSV.

Cached for 2 hours.
Uses potentially cached data from the list method.
---
requestBody:
content:
@@ -6741,22 +6740,12 @@ def export(self, *args, **kwargs):
type: string
---
"""
app_id = int(self.kwargs["application_pk"])
data = (
ApplicationSubmission.objects.filter(application=app_id)
.select_related("user__profile", "committee", "application__club")
.prefetch_related(
Prefetch(
"responses",
queryset=ApplicationQuestionResponse.objects.select_related(
"multiple_choice", "question"
),
),
"responses__question__committees",
"responses__question__multiple_choice",
)
)
df = pd.DataFrame(ApplicationSubmissionCSVSerializer(data, many=True).data)
data = self.list(
serializer=ApplicationSubmissionCSVSerializer, *args, **kwargs
).data

df = pd.DataFrame(data)

resp = HttpResponse(
content_type="text/csv",
headers={"Content-Disposition": "attachment;filename=submissions.csv"},