in silico deletion script: hound_isd_bed #43

Open
wants to merge 3 commits into
base: main

Conversation

anyakors
Contributor

Description of your changes

Added a new script, hound_isd_bed.py, analogous to hound_ism_bed.py. The ISD script performs in silico deletions instead of in silico mutations.
Stitching is performed on the reference to avoid doubling the deleted portion of the sequence in the left and right shifts of the alternative.

New arguments: "-s", dest="del_len" (Deletion size for ISD [Default: 1])
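
As a rough illustration of what a single in silico deletion does to a sequence, here is a minimal sketch. The helper name `delete_bases` and the choice to pad the right end with `N` to preserve length are assumptions made for this example; the script's own handling of sequence length (via shifts and stitching) is not shown in this thread.

```python
def delete_bases(seq, pos, del_len=1, pad="N"):
    """Remove del_len bases starting at pos and pad the right end.

    Hypothetical helper: right-padding with `pad` is an assumption used
    only to keep the sequence length fixed for this illustration.
    """
    if del_len < 1:
        raise ValueError("del_len must be at least 1")
    if pos < 0 or pos + del_len > len(seq):
        raise ValueError("deletion window falls outside the sequence")
    # Join the sequence on either side of the deleted window.
    shortened = seq[:pos] + seq[pos + del_len:]
    # Restore the original length so the model input shape is unchanged.
    return shortened + pad * del_len


if __name__ == "__main__":
    print(delete_bases("ACGTACGT", 2, 3))
```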

Type of change

  • New feature
    • Backwards Incompatible?

(If applicable) How has this been tested?

Tested on the MPRA deletion dataset (M Kircher, Nat Comm 2019): the F9, HBG1, and LDLR gene promoters.

Collaborator

@davek44 davek44 left a comment

Overall, looks like it should work! Just a few comments to improve.

scores_h5.create_dataset("seqs", dtype="bool", shape=(num_seqs, options.mut_len, 4))
for snp_stat in options.snp_stats:
    scores_h5.create_dataset(
        snp_stat, dtype="float16", shape=(num_seqs, options.mut_len, 4, num_targets)
Collaborator

Shouldn't the 3rd axis disappear? In ISM, it represents the alternative nucleotides, but they don't exist here.
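
A minimal sketch of the suggested shapes, assuming the one-hot `seqs` dataset keeps its nucleotide axis while each score dataset drops it. The helper `isd_dataset_shapes` is hypothetical and only computes the shape tuples; it does not touch HDF5.

```python
def isd_dataset_shapes(num_seqs, mut_len, num_targets, snp_stats):
    """Map dataset name -> shape for a deletion scan (hypothetical helper)."""
    # The one-hot reference sequence still needs its 4-nucleotide axis.
    shapes = {"seqs": (num_seqs, mut_len, 4)}
    for snp_stat in snp_stats:
        # One score per deleted position and target; no alternative-allele axis.
        shapes[snp_stat] = (num_seqs, mut_len, num_targets)
    return shapes


if __name__ == "__main__":
    for name, shape in isd_dataset_shapes(2, 200, 5, ["logSUM"]).items():
        print(name, shape)
```

Each tuple could then be passed as the `shape` argument of `create_dataset`, and a score would be written with a two-index assignment such as `scores_h5[snp_stat][si, mi - mut_start]`.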

ref_preds_stitch, alt_preds, options.snp_stats, None
)
for snp_stat in options.snp_stats:
    scores_h5[snp_stat][si, mi - mut_start, 0] = ism_scores[snp_stat]
Collaborator

By dropping the 3rd axis, you can remove the ", 0" index here.

ref_preds.append(ref_preds_shift)

# for mutation positions
for mi in range(mut_start, mut_end):
Collaborator

If the deletion size is >1, I think you'd want to advance your index by the size. Otherwise, you're deleting overlapping k-mers, and I can't think of a scenario where you'd prefer that over single-nt deletions.
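
One way to step through non-overlapping deletion windows is to give `range` a stride equal to the deletion size. The helper `deletion_windows` below is a hypothetical sketch; it also returns a compact output index, on the assumption that the score arrays would be resized to one slot per window.

```python
def deletion_windows(mut_start, mut_end, del_len=1):
    """Return (output_index, start) pairs for non-overlapping deletions."""
    if del_len < 1:
        raise ValueError("del_len must be at least 1")
    windows = []
    # Advancing by del_len means no base is covered by two windows.
    for mi in range(mut_start, mut_end, del_len):
        # Assumed compact index: one output slot per window.
        windows.append(((mi - mut_start) // del_len, mi))
    return windows


if __name__ == "__main__":
    print(deletion_windows(100, 110, 3))
```

Note that the last window can extend past `mut_end` when the region length is not a multiple of `del_len`; whether to truncate or skip it is a design choice left open here.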

)
parser.add_option(
"-s",
dest="del_len",
Collaborator

Maybe "del_size", so that -s matches the first letter of "size"?
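
A minimal sketch of the renamed option using the standard-library `optparse` module. The `type="int"` setting and the help wording are assumptions for this example; only the `-s` flag, the proposed `del_size` name, and the default of 1 come from the thread.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with only the deletion-size option (illustrative)."""
    parser = OptionParser()
    parser.add_option(
        "-s",
        dest="del_size",
        default=1,
        type="int",  # assumed: parse the value as an integer
        help="Deletion size for ISD [Default: %default]",
    )
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args(["-s", "3"])
    print(options.del_size)
```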

2 participants