Fix some scrambled references #2275

Merged
merged 1 commit into from
Aug 19, 2024


Collaborator

@bemoody bemoody commented Aug 16, 2024

Project references have a field called "order", which is nominally
supposed to indicate the order in which they're displayed.

Correct use of this field hasn't been enforced (see issue #2137), and
a lot of published projects on PhysioNet (probably fair to say most
projects that have more than one published version) have references
displayed in the wrong order, because the order wasn't copied from one
version to another.

This script should retroactively fix a large number of both published
and active projects.

I tried running it on a recent database dump from PhysioNet, and it
identified a total of 36 projects that it was able to fix. I
spot-checked annotation-dataset-sdoh and mimiciv and both looked
correct.

There are about 12 published projects that will still require manual
fixing.

As I said in issue #2137, I don't intend to do any manual fixing of
active projects, but I do intend to add an integrity check.

Quite some time ago an "order" field was added to the Reference and
PublishedReference models, to make the order explicit.

However, this field was neither required to be non-null, nor required
to be unique, so nothing prevented creating and publishing a project
with an ill-defined reference order.  In particular:

(a) NewProjectVersionForm did not copy "order" from PublishedReference
    into Reference; so newly created versions would have unpredictable
    ordering.

    (This bug has been fixed by commit 6b42613.)

(b) ReferenceFormSet tried to set "order", but only succeeded in saving
    the value for references that were being created/modified; so if a
    reference was deleted and then new references added, there could
    be multiple references with the same "order" value.

    (This bug is still present.)
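
As a rough illustration of the two failure modes above, here is a
minimal sketch of a check that reports null and duplicate "order"
values.  The function name `order_problems` and the sample values are
hypothetical and are not taken from the PhysioNet code; the real
integrity check may look quite different.

```python
from collections import Counter
from typing import List, Optional, Sequence


def order_problems(orders: Sequence[Optional[int]]) -> List[str]:
    """Describe why a set of "order" values fails to define an ordering."""
    problems = []
    # Null values: the display position of these references is undefined.
    null_count = sum(1 for o in orders if o is None)
    if null_count:
        problems.append(f"{null_count} reference(s) with null order")
    # Duplicate values: two references claim the same position.
    counts = Counter(o for o in orders if o is not None)
    duplicates = sorted(o for o, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate order value(s): {duplicates}")
    return problems


# Like case (a): references copied into a new version without "order".
print(order_problems([None, None, None]))
# Like case (b): made-up values where one position has been reused.
print(order_problems([1, 3, 3]))
# A well-defined ordering reports nothing.
print(order_problems([1, 2, 3]))
```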

It is difficult, bordering on impossible - especially after a project
has been published - to guess the author's intent and fix the
resulting issues automatically.

However, a common situation is that two versions of the same project
have *identical* lists of references, and the second version has all
"order" values set to None.  That means that the author created a new
version of the existing project and *never edited* any of the
references; so it is probably safe to assume that the author intended
to keep the order exactly as it was in the previous version.

An example of this would be:

- https://web.archive.org/web/20240418010540/https://physionet.org/content/annotation-dataset-sdoh/1.0.0/
- https://web.archive.org/web/20240710153623/https://physionet.org/content/annotation-dataset-sdoh/1.0.1/
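
The rule for this first case could be sketched roughly as follows.
This is a simplified stand-in, not the migration itself: references
are modelled as plain (description, order) tuples, and the function
name `inherited_order` is hypothetical.

```python
from typing import List, Optional, Tuple

# (description, order) pairs; hypothetical stand-ins for the model rows.
Ref = Tuple[str, Optional[int]]


def inherited_order(old: List[Ref], new: List[Ref]) -> Optional[List[str]]:
    """Return the descriptions of `new` in the order inherited from `old`,
    or None if the situation is not unambiguous."""
    old_orders = [order for _, order in old]
    # The previous version must itself have a well-defined order.
    if None in old_orders or len(set(old_orders)) != len(old_orders):
        return None
    # The new version must never have been edited: every order is null.
    if any(order is not None for _, order in new):
        return None
    old_sorted = [desc for desc, _ in sorted(old, key=lambda r: r[1])]
    # Both versions must contain exactly the same descriptions.
    if sorted(old_sorted) != sorted(desc for desc, _ in new):
        return None
    return old_sorted


v1 = [("Paper A", 1), ("Paper B", 2), ("Paper C", 3)]
v2 = [("Paper C", None), ("Paper A", None), ("Paper B", None)]
print(inherited_order(v1, v2))  # ['Paper A', 'Paper B', 'Paper C']

# If any description was changed, the sketch declines to guess.
edited = [("Paper A", None), ("Paper B (revised)", None), ("Paper C", None)]
print(inherited_order(v1, edited))  # None
```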

A slightly less common situation is that the second version of the
project has all the same references as the first version, but also has
one or more additional references added to (what the author intended
to be) the end of the list.  Since those new references would have
been created via ReferenceFormSet, the new references have "order"
greater than the number of old references, while the old references
still have "order" equal to None.

Again, if the old references were never edited, it's probably
reasonable to assume that the author didn't intend to change their
order.

An example of this would be:

- https://web.archive.org/web/20240731053747/https://physionet.org/content/mimiciv/2.2/
- https://web.archive.org/web/20240806233833/https://physionet.org/content/mimiciv/3.0/

(Notice that the three new references are listed as the *first* three
items, because null is sorted after any non-null value.)
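
To make the effect of the null sorting concrete, here is a small
sketch comparing the displayed order with the presumably intended
one.  The names `displayed` and `intended` and the sample data are
hypothetical; the relative order of the null rows is arbitrary in a
real database, and the sketch simply keeps them in input order.

```python
from typing import List, Optional, Tuple

# (description, order) pairs; hypothetical stand-ins for the model rows.
Ref = Tuple[str, Optional[int]]


def displayed(refs: List[Ref]) -> List[str]:
    """Sort by "order" with nulls after every non-null value."""
    ranked = sorted(refs, key=lambda r: (r[1] is None, r[1] or 0))
    return [desc for desc, _ in ranked]


def intended(previous: List[str], refs: List[Ref]) -> Optional[List[str]]:
    """Old references in their previous order, then the added ones;
    None if the situation is not unambiguous."""
    unedited = [desc for desc, order in refs if order is None]
    added = sorted((r for r in refs if r[1] is not None), key=lambda r: r[1])
    added_orders = [order for _, order in added]
    # The null-order references must be exactly the previous list.
    if sorted(unedited) != sorted(previous):
        return None
    # Added references must be uniquely numbered past the old list.
    if len(set(added_orders)) != len(added_orders):
        return None
    if any(order <= len(previous) for order in added_orders):
        return None
    return previous + [desc for desc, _ in added]


previous = ["Old 1", "Old 2", "Old 3"]
refs = [("Old 1", None), ("Old 2", None), ("Old 3", None),
        ("New 1", 4), ("New 2", 5)]
print(displayed(refs))           # new references come first
print(intended(previous, refs))  # old references first, then new
```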

Here, we add a data migration that should automatically fix those two
cases, and other similarly unambiguous cases.  It works by iterating
over projects in publication order, meaning that it should be able to
repair references that were copied from version A to version B, and
then later copied from version B to version C.
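
The reason publication order matters can be sketched like this.  The
`Version` class and `repair_in_publication_order` function are
hypothetical simplifications (plain Python objects rather than Django
models, and only the "identical list" case), meant to show how a
repaired version B becomes a usable source for version C.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Version:
    project: str
    version: str
    published: date
    descriptions: List[str]      # reference descriptions, arbitrary order
    orders: List[Optional[int]]  # "order" values, parallel to descriptions


def well_ordered(v: Version) -> bool:
    """True if every "order" is non-null and unique."""
    return None not in v.orders and len(set(v.orders)) == len(v.orders)


def repair_in_publication_order(versions: List[Version]) -> List[str]:
    repaired = []
    latest: Dict[str, Version] = {}  # most recent version seen per project
    for v in sorted(versions, key=lambda v: v.published):
        prev = latest.get(v.project)
        if (prev is not None and v.orders
                and all(o is None for o in v.orders)
                and well_ordered(prev)
                and sorted(prev.descriptions) == sorted(v.descriptions)):
            # Inherit the ordering of the previous version.
            ranked = sorted(zip(prev.orders, prev.descriptions))
            v.descriptions = [desc for _, desc in ranked]
            v.orders = list(range(1, len(ranked) + 1))
            repaired.append(f"{v.project}-{v.version}")
        latest[v.project] = v
    return repaired


a = Version("demo", "1.0", date(2022, 1, 1), ["X", "Y"], [1, 2])
b = Version("demo", "1.1", date(2023, 1, 1), ["Y", "X"], [None, None])
c = Version("demo", "1.2", date(2024, 1, 1), ["Y", "X"], [None, None])
# Input order does not matter; versions are visited oldest first.
print(repair_in_publication_order([c, b, a]))  # ['demo-1.1', 'demo-1.2']
```

If version C were visited before version B, its predecessor would
still have null "order" values and C would be left untouched.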

One thing worth noting here is that we do not simply change the
"order" of existing Reference and PublishedReference objects.
ReferenceFormSet has never honored the "order" when it comes to
displaying references in the Project Content (or Copyedit) pages.
That formset has always ordered references by their primary key
("id").  That logic would be difficult to change, and doing so would
only make the problem worse by further obscuring the author's intent.
On the other hand, it also seems impractical to revert to
"id"-ordering in previews and published projects.

Thus, when repairing a broken project, we actually shuffle the
"descriptions" of the existing Reference objects, so that the final
"order" and "id" should result in the same actual ordering.  It's
possible that this could lead to errors if an author is trying to edit
the reference list while this migration is being deployed, but that
seems fairly unlikely and less bad than the alternatives.  It's not
really necessary to do this for PublishedReferences, but using the
same logic as for References doesn't hurt.
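
A minimal sketch of the "shuffle the descriptions" idea, assuming rows
with `id`, `description` and `order` attributes.  The `Row` class and
`reassign_descriptions` function are hypothetical, and numbering
"order" from 1 is an assumption; the actual migration works on the
Django models and may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    id: int
    description: str
    order: Optional[int]


def reassign_descriptions(rows: List[Row], intended: List[str]) -> None:
    """Move descriptions between existing rows so that sorting by "id"
    and sorting by "order" both yield the intended sequence."""
    if sorted(r.description for r in rows) != sorted(intended):
        raise ValueError("intended must be a permutation of the descriptions")
    by_id = sorted(rows, key=lambda r: r.id)
    for position, (row, description) in enumerate(zip(by_id, intended), start=1):
        row.description = description  # the row keeps its id
        row.order = position           # assumption: numbering starts at 1


rows = [Row(10, "C", None), Row(11, "A", None), Row(12, "B", None)]
reassign_descriptions(rows, ["A", "B", "C"])
print([r.description for r in sorted(rows, key=lambda r: r.id)])
print([r.description for r in sorted(rows, key=lambda r: r.order)])
```

Both print statements list the descriptions as A, B, C, so the
id-ordered formset and the "order"-ordered display agree.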
@tompollard
Member

thanks for looking at this!

@tompollard tompollard merged commit d129a65 into dev Aug 19, 2024
8 checks passed
@tompollard tompollard deleted the bm/fix-some-references branch August 19, 2024 18:20
@tompollard
Member

This is the output on production:

project.migrations.0077_fix_some_references INFO     correcting 28 references in annotation-dataset-sdoh-1.0.1 by copying from annotation-dataset-sdoh-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 7 references in asclepius-r-1.0.1 by copying from asclepius-r-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 7 references in asclepius-r-1.1.0 by copying from asclepius-r-1.0.1
project.migrations.0077_fix_some_references INFO     correcting 5 references in big-ideas-glycemic-wearable-1.1.0 by copying from big-ideas-glycemic-wearable-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 5 references in big-ideas-glycemic-wearable-1.1.1 by copying from big-ideas-glycemic-wearable-1.1.0
project.migrations.0077_fix_some_references INFO     correcting 5 references in big-ideas-glycemic-wearable-1.1.2 by copying from big-ideas-glycemic-wearable-1.1.1
project.migrations.0077_fix_some_references INFO     correcting 8 references in bionlp-workshop-2023-task-1a-1.1.0 by copying from bionlp-workshop-2023-task-1a-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 10 references in x3yb97BFFB0TRfYXS4PG-1.1 by copying from blood-gas-oximetry-1.0
project.migrations.0077_fix_some_references INFO     correcting 12 references in chexmask-cxr-segmentation-data-0.2 by copying from chexmask-cxr-segmentation-data-0.1
project.migrations.0077_fix_some_references INFO     correcting 15 references in cxr-lt-iccv-workshop-cvamd-1.1.0 by copying from cxr-lt-iccv-workshop-cvamd-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 10 references in discharge-me-1.3 by copying from discharge-me-1.2
project.migrations.0077_fix_some_references INFO     correcting 10 references in syIIx6RD9i3eiWpXjvp3-1.4 by copying from discharge-me-1.3
project.migrations.0077_fix_some_references INFO     correcting 13 references in ZlZIL8lFA810BV9BFVjq-1.0.1 by copying from eeg-eye-gaze-for-fls-tasks-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 4 references in 22pYe5Nclv0DFHbKkxYE-1.1.0 by copying from ffa-ir-medical-report-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 18 references in globem-1.1 by copying from globem-1.0
project.migrations.0077_fix_some_references INFO     correcting 21 references in grabmyo-1.1.0 by copying from grabmyo-1.0.2
project.migrations.0077_fix_some_references INFO     correcting 5 references in MRTCrjU9gNRrZsU04JgL-2.0.0 by copying from hirid-1.1.1
project.migrations.0077_fix_some_references INFO     correcting 13 references in i-care-2.0 by copying from i-care-1.0
project.migrations.0077_fix_some_references INFO     correcting 7 references in 0FPXJyEOjDM0Cmntj84W-1.0.1 by copying from in-gauge-and-en-gage-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 13 references in inspire-1.0 by copying from inspire-0.1
project.migrations.0077_fix_some_references INFO     correcting 10 references in kinecal-1.0.3 by copying from kinecal-1.0.2
project.migrations.0077_fix_some_references INFO     correcting 8 references in mimic-cxr-2.1.0 by copying from mimic-cxr-2.0.0
project.migrations.0077_fix_some_references INFO     correcting 11 references in mimic-cxr-jpg-2.1.0 by copying from mimic-cxr-jpg-2.0.0
project.migrations.0077_fix_some_references INFO     correcting 23 references in mimicel-ed-2.1.0 by copying from mimicel-ed-2.0.0
project.migrations.0077_fix_some_references INFO     correcting 6 references in mimiciv-3.0 by copying from mimiciv-2.2
project.migrations.0077_fix_some_references INFO     correcting 11 references in mimic-iv-ecg-0.3 by copying from mimic-iv-ecg-0.2
project.migrations.0077_fix_some_references INFO     correcting 9 references in e3tS56xZergoQNUekgfr-2.3 by copying from mimic-iv-note-2.2
project.migrations.0077_fix_some_references INFO     correcting 13 references in iYi9UuHIAC385OWDYQMm-1.1.0 by copying from multimodal-dental-dataset-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 9 references in openox-repo-1.0.1 by copying from openox-repo-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 5 references in patient-level-data-covid-ms-1.0.1 by copying from patient-level-data-covid-ms-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 23 references in scientisst-move-biosignals-1.0.1 by copying from scientisst-move-biosignals-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 6 references in semg-1.0.1 by copying from semg-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 6 references in OfGQ5f3bFgaSMS7iHXjN-1.0.1 by copying from siena-scalp-eeg-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 7 references in 0d47UkSH7SqGkpf3h44q-2.0.0 by copying from virtual-reality-piloting-1.0.0
project.migrations.0077_fix_some_references INFO     correcting 18 references in ojL7jcN448wf0pFh9YxT-1.1 by copying from zhejiang-ehr-critical-care-1.0
project.migrations.0077_fix_some_references INFO     correcting 18 references in y3uk3MVmULFXNviu2JJa-1.1 by copying from zhejiang-ehr-critical-care-1.0
