Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update usage of "accession" as the ID column #29

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Jul 17, 2023

Description of proposed changes

New options in Augur 22.1.0 allow usage of this column as the ID column across all subcommands that read metadata.

For this workflow in particular, the metadata file can now be used as-is. This removes the need for a modified copy of the metadata. It also allows specifying an original metadata column "strain" as the display strain field, rather than a column "strain_original" generated during the Snakemake workflow.

Related issue(s)

Testing

  • Checks pass
  • Manually compared Auspice JSON visualized in Auspice. Tip attributes are identical. (ref)

@victorlin victorlin self-assigned this Jul 17, 2023
@victorlin victorlin force-pushed the victorlin/use-custom-metadata-id-column branch from ac2b5d5 to 0ef13fa Compare July 17, 2023 23:08
@victorlin victorlin force-pushed the victorlin/use-custom-metadata-id-column branch from 0ef13fa to e012512 Compare July 17, 2023 23:34
@j23414
Copy link
Contributor

j23414 commented Jul 18, 2023

Not sure how to test this properly.

May be able to check this manually, although I'm all for a reasonable automated check!

  1. Update to the latest nextstrain version to get the --metadata-id-columns flag
nextstrain update
  1. Compare runs, compare a few files and make sure nothing breaks
git clone https://github.com/nextstrain/rsv.git original
git clone https://github.com/nextstrain/rsv.git changed

cd original
nextstrain build .

cd ../changed
git checkout victorlin/use-custom-metadata-id-column
nextstrain build .
  1. Mouse over the tips in the tree. Is strain or accession displayed as tips in the two sets of trees? Do they match?
nextstrain view auspice
nextstrain view ../original/auspice

Or compare changed against the live RSV site

@victorlin
Copy link
Member Author

Thanks @j23414! I'll give it a try.

@victorlin
Copy link
Member Author

I added CI with example data to get a quick build. Looks like accession is missing from the Auspice JSON. I'll investigate more tomorrow.

@j23414
Copy link
Contributor

j23414 commented Jul 21, 2023

Oh, thanks for catching! And I like the fixup. For the missing accession, might be able to adjust

if node["name"] in lookup:
node["name"] = lookup[node["name"]]

to something like:

 if node["name"] in lookup:
     node[metadata_id_columns] = node["name"]
     node["name"] = lookup[node["name"]] 

I haven't tried the above code so I'm not certain if adding new attributes to a node would cause downstream errors.

@victorlin
Copy link
Member Author

I think it has something to do with rule export which comes before rule final_strain_name, since accession is missing from tree.json as well.

I didn't get around to this yet and might not today, but I'll update here when I do!

@victorlin victorlin marked this pull request as draft July 21, 2023 17:38
@victorlin
Copy link
Member Author

Ok, I think I got to the root of this. Reasons:

  1. In augur export, the accession node attribute gets exported as long as it's an available node attribute.
  2. Among other sources, node attributes come from metadata.
  3. The metadata used above comes from read_metadata, which does not make the index column name accessible like the others.
  4. This PR is updating the index column to be accession, making it inaccessible as a node attribute due to all of the above.

If this is it, then a proper fix would be to update read_metadata to keep the index column as a column when setting the index.

Signing off now, I'll get to this another day.

Add a note where it should be kept.
Checking presence of a key in the config should be done with the
membership operator.
Don't set defaults when retrieving strain_id_field so "accession" is
only set on the config level.
New options in Augur 22.2.0 allow usage of this column as the ID column
across all subcommands that read metadata.

For this workflow in particular, the metadata file can now be used
as-is. This removes the need for a modified copy of the metadata. It
also allows specifying an original metadata column "strain" as the
display strain field, rather than a column "strain_original" generated
during the Snakemake workflow.
@victorlin victorlin force-pushed the victorlin/use-custom-metadata-id-column branch from 77840b2 to d123d60 Compare September 14, 2023 20:42
@victorlin victorlin changed the title Use "accession" column as ID column directly Update usage of "accession" as the ID column Sep 14, 2023
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d123d60's PR-triggered CI run results (pr29/d123d60/docker/rsv/a/F, etc.) should be comparable to those from latest master (master/4ecf498/docker/rsv/a/F, etc.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@victorlin victorlin marked this pull request as ready for review September 14, 2023 21:07
Copy link
Contributor

@j23414 j23414 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran a manual test, I can see both strain name and accession named tips. Looks good to me!

@victorlin victorlin merged commit 29b1c29 into master Sep 14, 2023
3 checks passed
@victorlin victorlin deleted the victorlin/use-custom-metadata-id-column branch September 14, 2023 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants