Switch on intermediate small molecule instance sharing for Reactome #308

Open
dustine32 opened this issue Dec 6, 2023 · 9 comments

@dustine32
Collaborator

We've decided to try applying the sharing of input/output instances between reaction activities to Reactome GO-CAM models. This was already applied for YeastPathways models in #258. Referencing #258 (comment).

While I call this a "switch," it's going to involve editing many entityStrategy.equals(EntityStrategy.[REACTO|YeastCyc]) conditionals in the code.
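One way to keep such a "switch" manageable is to route the scattered conditionals through a single predicate. The sketch below is only an illustration: the enum constants `REACTO` and `YeastCyc` come from this issue, while `SharingPolicy` and `sharesIntermediateSmallMolecules` are hypothetical names, and the real enum may have other values.

```java
import java.util.EnumSet;
import java.util.Set;

// REACTO and YeastCyc are the constants named in this issue; the real enum
// in the codebase may define more. Everything else here is hypothetical.
enum EntityStrategy { REACTO, YeastCyc }

final class SharingPolicy {
    // Strategies for which input/output instances are shared between reactions.
    private final Set<EntityStrategy> enabled;

    SharingPolicy(Set<EntityStrategy> enabledStrategies) {
        // Defensive copy so later changes to the caller's set have no effect.
        EnumSet<EntityStrategy> copy = EnumSet.noneOf(EntityStrategy.class);
        copy.addAll(enabledStrategies);
        this.enabled = copy;
    }

    // Single place to answer the question that is otherwise spread over many
    // entityStrategy.equals(EntityStrategy.X) conditionals.
    boolean sharesIntermediateSmallMolecules(EntityStrategy strategy) {
        return enabled.contains(strategy);
    }
}
```

With this shape, the state after #258 would be `new SharingPolicy(EnumSet.of(EntityStrategy.YeastCyc))`, and the change proposed here amounts to adding `EntityStrategy.REACTO` to that set.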

This change should affect several models. An example model is R-HSA-4641262 Disassembly of the destruction complex and recruitment of AXIN to the membrane.

Tagging @thomaspd @deustp01 @ukemi @vanaukenk

@dustine32
Collaborator Author

@ukemi @deustp01 I guess my model load PR geneontology/reactome-go-cams#13 auto-closed this ticket so reopening at least for testing. These models are loaded into noctua-dev now for testing.

@dustine32 dustine32 reopened this Mar 15, 2024
@ukemi

ukemi commented Mar 20, 2024

Thanks @dustine32 ! I will have time to look at these this afternoon and we can discuss any issues on the weeds call tomorrow.

@ukemi

ukemi commented Mar 20, 2024

Models checked look ok, but since Reactome captures all of the participants they are very busy. I have tried to tease out the key players. This might be a job for primary inputs/outputs. Note also that if more than one reaction has the same output in a model, then they share an instance of that output. Not to get bogged down with the instance discussion and what instances in models represent, I think at the level of abstraction we are working at this is ok.
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-70171? (busy but good)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-9634600? (This is an artifact of assigning individual BPs to reactions???)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-75109? (busy but good)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-500753? (busy but good)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-75105? (busy but good)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-196836? (the MFs in branches are given a more generic causal relationship. I think this is ok. See downstream of R-HSA-198818. Let's look at this one. (Transporters))
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-196819? (We are missing the output of R-HSA-965067 and R-HSA-196761, R-ALL-29480, which is the input of the downstream transporter R-HSA-8875838)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-9759218? (This one is kind of puzzling and I can't figure out why. Maybe it's because the inputs and outputs are often gene products)
http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-2142789? (Busy but good)

@ukemi

ukemi commented Mar 21, 2024

Hi @dustine32. I just noticed something else. A small molecule instance shared between the output of one reaction and the input of another should only exist if the reactions are directly upstream-downstream partners. I just noticed in the glycolysis model (R-HSA-70171) that ATP (R-ALL-113592_cytosol) is represented as a single instance that is connected to reactions that are not consecutive (output of R-HSA-71850, R-HSA-71670 and input of R-HSA-70467, R-HSA-70420). This doesn't make biological sense and will affect the results of #317.
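The rule described here could be sketched as a simple adjacency check. This is a hypothetical illustration, not the converter's actual logic: `DirectPartnerRule`, `mayShareInstance`, and the `precedingEvents` map (consumer reaction ID to the IDs of the reactions asserted to be immediately upstream of it) are all assumed names.

```java
import java.util.Map;
import java.util.Set;

final class DirectPartnerRule {
    private DirectPartnerRule() {}

    // Returns true only when the producing reaction is asserted to be
    // immediately upstream of the consuming reaction. In that case the
    // producer's output and the consumer's input may be one shared instance.
    static boolean mayShareInstance(String producerId, String consumerId,
                                    Map<String, Set<String>> precedingEvents) {
        return precedingEvents
                .getOrDefault(consumerId, Set.of())
                .contains(producerId);
    }
}
```

Under this rule, two reactions that merely produce and consume the same small molecule class in the same location would keep separate instances unless one directly precedes the other.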

@dustine32
Collaborator Author

@ukemi Yes, connecting these reactions to the same ATP instance does seem to go against the (mostly) linear pathway idea. But as @thomaspd has explained to me before, these location-specific instances can be thought of as pools of molecules, not individual molecules.

However, if the ATP class is considered a currency chemical, we could add it to the small_mol_do_not_join_ids list and they would be distinct to their reactions (and likely clean up the UI a bit).
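As a rough sketch of the effect of such a list: if instances are keyed by molecule and location, a do-not-join entry can extend the key with the reaction ID so each reaction gets its own instance. The list name `small_mol_do_not_join_ids` is from this thread; `InstanceKeys`, `instanceKey`, and the assumption that the list holds IDs in the same form as `R-ALL-113592_cytosol` are hypothetical.

```java
import java.util.Set;

final class InstanceKeys {
    private InstanceKeys() {}

    // Builds the key used to look up or create a small molecule instance.
    // Molecules on the do-not-join list get a per-reaction key, so they are
    // never shared between reactions; all others are shared by molecule ID.
    static String instanceKey(String moleculeId, String reactionId,
                              Set<String> doNotJoinIds) {
        if (doNotJoinIds.contains(moleculeId)) {
            return moleculeId + "_" + reactionId;
        }
        return moleculeId;
    }
}
```

For example, with `R-ALL-113592_cytosol` on the list, the ATP output of R-HSA-71850 and the ATP input of R-HSA-70467 would resolve to two different keys instead of one shared instance.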

@dustine32
Collaborator Author

http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-196819? (We are missing the output of R-HSA-965067 and R-HSA-196761, R-ALL-29480, which is the input of the downstream transporter R-HSA-8875838)

This issue of missing outputs is a bug created by this line deleting the old transport input instance after renaming its ID:

deleteOwlEntityAndAllReferencesToIt(thing_ind);

I could solve this in two different ways:

  1. Change to not delete the instance if it's also an input/output of a non-transport reaction. This will retain the separate instance of the small molecule.
  2. Keep the code to delete the old small mol instance but add a new has_output edge between the non-transport reaction and the new "primary input" small mol instance. Example: [rxn:thiamine triphosphate phosphatase activity] -has_output-> [thiamine(1+) diphosphate(3-)] <-has_primary_input- [rxn:thiamine pyrophosphate transmembrane transporter activity]

@ukemi @deustp01 Do you have any preference? If not, number 1 is probably easier to implement.
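Option 1 could be sketched as a guard in front of the delete call. This is a minimal illustration under stated assumptions: `TransportCleanup` and `shouldDeleteOldInstance` are hypothetical names, and the two sets of reaction IDs stand in for whatever the converter actually knows about an instance's participants.

```java
import java.util.Set;

final class TransportCleanup {
    private TransportCleanup() {}

    // reactionsUsingInstance: IDs of all reactions that have the old instance
    // as an input or output. transportReactions: IDs of transport reactions.
    // The old instance is safe to delete only if every reaction using it is a
    // transport reaction; otherwise a non-transport reaction would lose it.
    static boolean shouldDeleteOldInstance(Set<String> reactionsUsingInstance,
                                           Set<String> transportReactions) {
        return transportReactions.containsAll(reactionsUsingInstance);
    }
}
```

In the R-HSA-196819 case above, the R-ALL-29480 instance is used by R-HSA-965067 and R-HSA-196761 as well as the transporter R-HSA-8875838, so the guard would return false and `deleteOwlEntityAndAllReferencesToIt(thing_ind)` would be skipped.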

@ukemi

ukemi commented Mar 22, 2024

Let's try number 1 and see how it looks. If we don't like it, we should try number 2. I think in the long run we should coordinate with the ontology group to make transporter activities have primary outputs. The only case that I can think of off the top of my head where the output of a transporter is not the same as an input is a transporter in bacteria that simultaneously phosphorylates a glucose molecule. It's kind of cool. @deustp01, what do you think?

@ukemi

ukemi commented Mar 22, 2024

@deustp01 do you think a biochemist considering glycolysis would say that the (pool) ATP output of pyruvate kinase is the input of phosphofructokinase? If this were the case, pyruvate kinase would be directly causally upstream of PFK. This seems like a dangerous oversimplification of the biology. I think a biochemist would say that PK is the last step of glycolysis, not that it is upstream of PFK. The problem here is that ATP from glycolysis is important, but so is ATP from oxidative phosphorylation. And the invention of a generic pool in a specific model doesn't represent the biology well because it doesn't take into account levels. When ATP is low, the processes that create it are positively regulated and when it is too high they are negatively regulated (except in cancer when glycolysis goes crazy). However, the pool of glucose-6-P that is made by glucokinase is the major input of G-P-isomerase in the glycolytic process. So we would want to say that glucokinase has an output of G-6-P that GPI converts to F-6-P. These relations are easy to find in Reactome because you guys assert the immediately upstream-downstream reaction partners. Also note that in our current pruning proposal, ATP will be pruned from glycolysis so it won't be an output. It will also be pruned as a currency chemical. Even a high-school biology student knows that glycolysis makes ATP and it's important. (I asked my son this exact question when he was taking AP bio).

@deustp01
Collaborator

do you think a biochemist considering glycolysis would say that the (pool) ATP output of pyruvate kinase is the input of phosphofructokinase?

Never. I have never seen glycolysis represented as a cyclical process. But, if enzymes and small molecules are moving randomly in a homogeneous solution, what prevents the process from being an adenine nucleotide cycle? Or a NAD cycle? I think there are good data saying that the dilute-aqueous-solution model is wrong - there is all sorts of local structure such as successive enzymes tethered into assembly lines (so the pools you mention are selectively accessible to enzymes), and probably features to allow ATP concentrations to vary from place to place in the cytosol that create local inhomogeneity.

And while the simplifications of current activity flow models make it hard, maybe impossible, to capture this inhomogeneity, even a process description doesn't do it consistently well. This is probably another place where, while for efficiency we want to build reasoning tools to identify primary inputs and outputs, we need to recognize that the best tools we know how to build are going to get some stuff wrong, and therefore we always need an option for expert manual review and correction.

Losing ATP entirely as a product of glycolysis? If I understand right, the current exercise is an attempt to identify small molecules that have roles of primary input and primary output, and then to use possession of these roles to infer causal order of reactions. That should not require us to suppress the existence of other inputs and outputs, I hope, so a causally ordered glycolysis activity flow could still also be seen to generate net ATP and net NADH?
