Feature proposal: SAM tag enum #1272

Open
msto opened this issue Mar 15, 2024 · 1 comment

Comments

@msto
Contributor

msto commented Mar 15, 2024

Hi,

I think it would be valuable to add two features to improve the use of SAM tags.

  1. An enum describing the standard SAM tags.
  2. A class decorator to enforce tag conventions when declaring locally-defined tags.

Would these features be welcomed into pysam?

I am happy to implement these but would appreciate feedback on whether this is a contribution that would be accepted into pysam, and if so, on some design considerations before starting.

Thank you!

SAM tag enum

The primary question I have regarding a SAM tag enum is whether the member names should be the actual SAM tags or something more semantically meaningful.

e.g.

from enum import Enum

class SamTag(str, Enum):
    """Standard SAM tags."""

    RG = "RG"
    """Read group."""

    RX = "RX"
    """Sequence bases of the (possibly corrected) unique molecular identifier."""

or

from enum import Enum

class SamTag(str, Enum):
    """Standard SAM tags."""

    READ_GROUP = "RG"
    """Read group."""

    UMI = "RX"
    """Sequence bases of the (possibly corrected) unique molecular identifier."""

(Note that I suggest mixing in str, or subclassing StrEnum on Python 3.11+, so the enum members can be passed directly to pysam's tagging functions, e.g. read.has_tag(SamTag.UMI).)
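
For illustration, here is a minimal sketch of why the str mix-in matters. It does not use pysam; `FakeRead` is a hypothetical stand-in whose `has_tag` only does a plain string comparison, on the assumption that a str-based API treats a str-derived enum member like any other string.

```python
from enum import Enum


class SamTag(str, Enum):
    """Subset of standard SAM tags, using semantic member names."""

    READ_GROUP = "RG"
    UMI = "RX"


class FakeRead:
    """Hypothetical stand-in for a read; NOT pysam's AlignedSegment."""

    def __init__(self, tags):
        self._tags = list(tags)  # list of (tag, value) pairs

    def has_tag(self, tag: str) -> bool:
        # Plain string comparison, as a str-based API would perform.
        return any(existing == tag for existing, _ in self._tags)


read = FakeRead([("RX", "ACGT")])

# Members are real str instances, so they compare equal to the raw tag.
print(isinstance(SamTag.UMI, str))
print(SamTag.UMI == "RX")
print(read.has_tag(SamTag.UMI))
print(read.has_tag(SamTag.READ_GROUP))
```

Without the str mix-in, `SamTag.UMI == "RX"` would be false and callers would have to pass `SamTag.UMI.value` instead.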

SAM tag decorator

To support locally-defined tags, I would propose providing an enumeration class decorator that implements the following validations:

  1. Enforce uniqueness (using enum.unique)
  2. Enforce that tags are two-character strings
  3. Optionally enforce that locally-defined tags adhere to SAM convention, namely that tags start with "X", "Y", or "Z", or contain a lowercase letter
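
The three validations above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proof-of-concept implementation: `sam_tag` and `ProjectTag` are hypothetical names, nothing here exists in pysam, and the strict check reads "lowercase" as "contains at least one lowercase letter".

```python
from enum import Enum, unique


def sam_tag(strict: bool = False):
    """Hypothetical decorator factory; not an existing pysam API."""

    def decorator(enum_cls):
        # 1. Reject duplicate values (enum.unique raises ValueError on aliases).
        unique(enum_cls)
        for member in enum_cls:
            tag = member.value
            # 2. Every tag must be a two-character string.
            if not isinstance(tag, str) or len(tag) != 2:
                raise ValueError(
                    f"{member.name}: {tag!r} is not a two-character string"
                )
            # 3. In strict mode, require the local-use convention
            #    (assumed here: starts with X/Y/Z or has a lowercase letter).
            is_local = tag[0] in "XYZ" or any(c.islower() for c in tag)
            if strict and not is_local:
                raise ValueError(
                    f"{member.name}: {tag!r} is not a locally-defined tag"
                )
        return enum_cls

    return decorator


@sam_tag(strict=True)
class ProjectTag(str, Enum):
    """Hypothetical project-specific tags that pass all three checks."""

    FOO = "XF"
    BAR = "xb"
```

Raising at class-definition time means an invalid tag is caught on import rather than when a read is first tagged.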

e.g.

from enum import Enum

# sam_tag is the proposed decorator; it does not exist in pysam yet.
@sam_tag(strict=True)
class CustomTag(str, Enum):
    """Custom SAM tags used for $project."""

    FOO = "XF"
    """Foo."""

    BAR = "XB"
    """Bar."""
@msto
Contributor Author

msto commented Apr 19, 2024

I have a proof-of-concept for this feature that I'd happily open a PR for here, if you think it would be a sensible contribution to pysam:

https://github.com/msto/sam_tags/
