Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch upload refactored #732

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open

Batch upload refactored #732

wants to merge 24 commits into from

Conversation

aufdenkampe
Copy link
Member

@aufdenkampe aufdenkampe commented Oct 1, 2024

As described in #674 (comment), @ptomasula created this batch_upload branch by cherrypicking commits from this PR by @tpwrules:

This PR reflects that work, and will also close:

tpwrules and others added 24 commits December 28, 2022 01:33
A few places make Django ORM queries at the module level and thus run
during import of the portal code. This implies that a working connection
to a properly set-up database is required to e.g. use the Django management
command. This greatly complicates packaging, initial setup, and deployment.

This problem is fixed by simple refactorings and caches.
Creation of the SQLAlchemy model caches occurs during import of the
portal code. This implies that a working connection to a properly set-up
database is required to e.g. use the Django management command. This
greatly complicates packaging, initial setup, and deployment.

This problem is fixed by ignoring errors creating the cache and reminding
the user to check the database.
The `iterkeys()` method on a dict has been replaced by the equivalent
`keys()` method in Python 3. Update to use the correct method.
Reject duplicate names in the controlled vocabularies list. It is unknown
why such duplicates are being served, but they are as of 2022-12-27.
With release 0.17.0, organization_id is now the identifying field for site registration. I missed this correction in resolving merge conflicts.
352KB file (one column): 148s -> 1.6s

(cherry picked from commit 0f80fe5)
(cherry picked from commit 832cc38)
(cherry picked from commit 4365f6a)
(cherry picked from commit 434ab81)
(cherry picked from commit e90b108)
(cherry picked from commit a1e3eaf)
(cherry picked from commit 87a6c53)
(cherry picked from commit 1002c8d)
Request.data is of type QueryDict (https://docs.djangoproject.com/en/3.0/ref/request-response/#django.http.QueryDict) which subclasses the standard Dictionary. One notable change is that QueryDict contains a false mutability flag (true by default), making the pop commands trigger an exception. Changing to get so that this does not require mutability (e.g. removing item from the dictionary)
Remove the undefined variable from message (causing a new NameError). This will stop the exception for reraising a new exception and triggering a 500 response. Update the except generic exception to only catch SQLAlchemy errors.
Clean up items flagged by linter.
* Remove unreferenced imports
* Fix invalid type hint.
@aufdenkampe
Copy link
Member Author

@ptomasula just deployed this to staging for @SRGDamia1 to test from:

@neilh10
Copy link

neilh10 commented Oct 2, 2024

I'm assuming from the comment on #649
its available for regression checking on https://staging.monitormywatershed.org/.?

Is this a good place to make comments or where?
An FYI I got a -- Response Code -- 301 waited 3613 mS Timeout 15000
though I haven't decoded what its saying as that requires rebuilding the sw

[2024-10-02 09:22:49] pubDQTR Sending data to [ 0 ] staging.monitormywatershed.org:80
[2024-10-02 09:22:50] POST /api/data-stream/ HTTP/1.1
[2024-10-02 09:22:50] Host: staging.monitormywatershed.org
[2024-10-02 09:22:50] TOKEN: 5083cf3d-69cd-46fa-8d3a-0e3ba0b9e8fd
[2024-10-02 09:22:50] Content-Length: 409
[2024-10-02 09:22:50] Content-Type: application/json
[2024-10-02 09:22:50]
[2024-10-02 09:22:50] {"sampling_feature":"939d54e3-b92f-4e8d-bae5-1a5b4ae29193","timestamp":"2024-10-02T08:22:00-08:00","6e5516fc-cdcf-46a6-9c30-d96f3a016b0d":1,"fe4b74dd-598a-440d-b7d2-db0119584d5d":4.154,"5cbc0f3a-1b41-43cc-822f-374513de1899":15.98,"41330794-27cf-4130-b2b0-a1cffefe8ba2":-0.0672,"941ba4ed-f7d5-4c71-8356-591899ec6e1a":52.75,"d5f4c4dc-421d-436b-9989-99cb94ebc50a":18.72,"ffdc5b73-b800-4957-ab19-b25feba69dde":-1}
[2024-10-02 09:22:50]
[2024-10-02 09:22:55] -- Response Code -- 301 waited 3613 mS Timeout 15000

Copy link

@tpwrules tpwrules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see this moving forward!

Did some testing on the staging instance and it appears commit 8601a06 has broken the ability to have multiple data points per request. Because the sampling_feature key is no longer removed, the check for an unequal number of data points will fail.

I don't know why I didn't run into this during development (maybe Django versions?) but in any case I have filed another PR #734 to fix this and a couple other minor issues I noticed.

Merging that PR should update this one automatically. Please do that and then re-deploy and I can continue testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants