false negatives? #35

Open
metalichen opened this issue Feb 23, 2022 · 3 comments

Comments

@metalichen

Hey! I do have another question!

After annotating my MAGs, I saw that FeGenie didn't find any transport-related clusters in any of my MAGs, which wouldn't make sense biologically (I have, among others, several cyanobacterial MAGs, and they must get their iron somewhere, right?). If I use the --all_results flag, I get some transport genes, but I'm not sure I should use them, since you mention in a different thread that this flag can create false-positives.

I imagine something goes wrong during the clustering step? I looked into one MAG specifically. According to the output produced by --all_results, it has the three EfeUOB genes, all next to each other, but they don't show up when I run the same MAG in strict mode. Are there other genes that should be present for the cluster to be complete?

Sorry for the basic question, I'm very new to the iron metabolism world :)

I can send you the MAG I looked into, or the output files, if needed.

Thanks!

metalichen changed the title from "false negetives?" to "false negatives?" on Feb 23, 2022
@Arkadiy-Garber
Owner

Hi Gulya,

Your reasoning makes total sense. It does seem like something is going wrong with the clustering step, and I suspect that is where the problem lies. If the three EfeUOB genes are encoded next to each other, they should definitely be picked up by FeGenie (without the --all_results flag). Are you by chance running FeGenie with the --orfs or --gbk mode? And which MAG is it?

Welcome to the iron metabolism world :) it gets confusing at times, but everyone gets along and helps each other out. Plus, we have good coffee.

Arkadiy

@metalichen
Author

I was using --orfs (which, now that I think about it, would mean that FeGenie might not know that these genes are next to each other?). The MAG I looked into is private_T1916_metawrap_bin.6.

@Arkadiy-Garber
Owner

Thanks, Gulya. You are exactly right. When the --orfs flag is provided, FeGenie skips the step where it clusters genes based on where they are encoded on the genome/contig. I need to make this clear in the README, or implement a way for FeGenie to guess coordinates from the order in which ORFs are listed in the FASTA file. With the latter, though, there is potential to run into issues if the provided ORFs come from a highly fragmented assembly.
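To illustrate why ORF order matters, here is a minimal sketch of order-based clustering of hits. It is not FeGenie's code: `parse_orf_id`, `cluster_hits` and the `max_gap` threshold of 5 genes are hypothetical, and the sketch assumes Prodigal-style ORF IDs of the form `<contig>_<index>`, numbered in order along each contig. The example IDs are made up.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern: a Prodigal-style ORF ID ends in "_<index>", e.g.
# "k141_88_3" is ORF 3 on contig "k141_88". The greedy ".+" keeps every
# underscore except the last one in the contig name.
ORF_ID = re.compile(r"^(?P<contig>.+)_(?P<index>\d+)$")


def parse_orf_id(orf_id: str) -> Tuple[str, int]:
    """Split an ORF ID into (contig, ORF index along that contig)."""
    match = ORF_ID.match(orf_id)
    if match is None:
        raise ValueError(f"not a <contig>_<index> ORF ID: {orf_id}")
    return match.group("contig"), int(match.group("index"))


def cluster_hits(hit_ids: List[str], max_gap: int = 5) -> List[List[str]]:
    """Group hits on the same contig whose ORF indices differ by <= max_gap.

    max_gap is a hypothetical threshold in genes, not FeGenie's setting.
    """
    by_contig: Dict[str, List[Tuple[int, str]]] = {}
    for orf_id in hit_ids:
        contig, index = parse_orf_id(orf_id)
        by_contig.setdefault(contig, []).append((index, orf_id))

    clusters: List[List[str]] = []
    for contig in sorted(by_contig):
        hits = sorted(by_contig[contig])  # order hits along the contig
        current = [hits[0][1]]
        for (prev_index, _), (index, orf_id) in zip(hits, hits[1:]):
            if index - prev_index <= max_gap:
                current.append(orf_id)  # close enough: same cluster
            else:
                clusters.append(current)  # gap too large: start a new one
                current = [orf_id]
        clusters.append(current)
    return clusters


hits = ["k141_88_3", "k141_88_4", "k141_88_5", "k141_902_40"]
print(cluster_hits(hits))
# [['k141_88_3', 'k141_88_4', 'k141_88_5'], ['k141_902_40']]
```

With Prokka-style locus tags such as `PROKKA_00001`, which carry no contig name and are typically numbered continuously across contigs, this heuristic would treat every hit as lying on one pseudo-contig and could merge genes sitting at the ends of different contigs. That is the fragmented-assembly risk described above.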

If you provide GenBank files, along with the --gbk flag, FeGenie should be able to keep track of the relative positions of ORFs on each contig. Contigs are another possible input, but in this case FeGenie will run Prodigal and generate new gene calls.
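When Prodigal predicts genes from contigs, it writes each ORF's position into the FASTA header of its protein and nucleotide outputs (`>ID # start # end # strand # ...`), which is the positional information that a bare ORF FASTA may lack. A minimal sketch of recovering it, assuming Prodigal's default header layout and `<contig>_<n>` ORF IDs; `OrfCoords` and `parse_prodigal_header` are hypothetical names, not part of FeGenie, and the example header is made up.

```python
from typing import NamedTuple


class OrfCoords(NamedTuple):
    orf_id: str
    contig: str
    start: int
    end: int
    strand: int  # 1 = forward, -1 = reverse


def parse_prodigal_header(header: str) -> OrfCoords:
    """Read coordinates from a Prodigal protein/nucleotide FASTA header.

    Assumes Prodigal's default layout:
    >ORFID # start # end # strand # ID=...;partial=...
    with ORFID of the form "<contig>_<n>".
    """
    fields = header.lstrip(">").split(" # ")
    if len(fields) < 4:
        raise ValueError(f"unexpected Prodigal header: {header}")
    orf_id = fields[0].strip()
    contig = orf_id.rsplit("_", 1)[0]  # drop the trailing ORF number
    return OrfCoords(orf_id, contig, int(fields[1]), int(fields[2]), int(fields[3]))


header = ">k141_88_3 # 2205 # 3041 # -1 # ID=5_3;partial=00;start_type=ATG"
print(parse_prodigal_header(header))
# OrfCoords(orf_id='k141_88_3', contig='k141_88', start=2205, end=3041, strand=-1)
```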

From the MAGs that you emailed me, it seems that you annotated them with Prokka? Prokka also uses Prodigal for ORF prediction, so the gene calls should be the same, just with different names. In any case, it wouldn't be very difficult to consolidate the two sets of ORFs.
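One way to consolidate the two sets, sketched here under assumptions: since both annotations come from Prodigal gene calls, Prokka locus tags can be matched to Prodigal ORF IDs by identical protein sequence. The helper names are hypothetical. A trailing `*` stop symbol is stripped in case one file includes it, and sequences shared by more than one Prodigal ORF (e.g. duplicated genes) are skipped as ambiguous. The example sequences are made up.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def map_prokka_to_prodigal(prokka_faa: str, prodigal_faa: str) -> Dict[str, str]:
    """Map each Prokka locus tag to the Prodigal ORF ID with the same protein."""
    by_seq: Dict[str, List[str]] = {}
    for header, seq in read_fasta(prodigal_faa):
        # First whitespace-delimited token of the header is the ORF ID.
        by_seq.setdefault(seq.rstrip("*"), []).append(header.split()[0])

    mapping: Dict[str, str] = {}
    for header, seq in read_fasta(prokka_faa):
        candidates = by_seq.get(seq.rstrip("*"), [])
        if len(candidates) == 1:  # skip unmatched and ambiguous sequences
            mapping[header.split()[0]] = candidates[0]
    return mapping


prokka = ">PROKKA_00001 Fe2+ transport protein\nMKTAY\nIAKQR\n"
prodigal = ">k141_88_3 # 2 # 34 # 1 # ID=1_3\nMKTAYIAKQR*\n"
print(map_prokka_to_prodigal(prokka, prodigal))
# {'PROKKA_00001': 'k141_88_3'}
```

Once matched, the Prodigal IDs and headers carry contig and coordinate information that the Prokka locus tags lack. Any protein whose sequence differs between the two runs (for example, because of different Prodigal settings) simply stays unmatched.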

Let me know if you have any other questions, or if anything here doesn't make sense!
