
Synonym lookup super slow? How to fix? #2367

Open · edeutsch opened this issue Sep 7, 2024 · 4 comments

Comments

@edeutsch (Collaborator) commented Sep 7, 2024

I've noticed this for a while, but am only posting now. Has anyone else noticed that synonym lookup through the ARAX GUI is super slow? Try searching for metformin or ibuprofen or anything reasonably common: my CPU fans start groaning and it takes 15+ seconds for something to appear. I assume this is either because so much data is returned, or because rendering the graph is so expensive, or something else? Does anyone have ideas on how best to solve it? Return less data? Don't render the graph unless asked? This service was great when answers came back within a second, but now it's painful to use.

ideas?

@amykglen (Member) commented Sep 8, 2024

yes, this started happening after we started using the SRI Node Normalizer's drug_chemical_conflate parameter, which made the clusters for certain drugs really big.

I definitely think it's the 'match graph' that's causing the issue (I think the acetaminophen graph has 10s of thousands of edges now) - I wonder if we could just not display the graph if it has more than some reasonable number of edges? not sure if there's an existing way to determine the number of edges without actually having to load all of them..
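
for concreteness, here's a rough sketch of the kind of threshold check I mean, assuming the cluster's match graph comes back as a knowledge_graph dict with an "edges" map - the cutoff and function name are just placeholders, and this only helps after the response has already been loaded:

# Rough sketch (placeholder names): skip rendering the Concept Graph when
# the cluster's match graph is too large to draw responsively.
MAX_RENDERABLE_EDGES = 5000  # arbitrary cutoff; would need tuning

def should_render_match_graph(cluster: dict) -> bool:
    """Return True if the cluster's match graph is small enough to display."""
    edges = cluster.get("knowledge_graph", {}).get("edges", {})
    return len(edges) <= MAX_RENDERABLE_EDGES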

@isbluis (Member) commented Sep 17, 2024

As a quick test in devLM, looking up metformin results in the following rough timings:

  • 13 seconds to receive the JSON response (>77 MB)
  • 35 seconds to render the full table, without displaying the Concept Graph
  • 55 seconds to display everything, including the Concept Graph

@edeutsch (Collaborator, Author) commented

oof, thanks. Yeah, I think we should put some effort into slimming down the response first somehow. And then maybe something on the front end.

@amykglen (Member) commented

ok, per discussion with @edeutsch and others today, I've added an optional max_synonyms parameter to the NodeSynonymizer's get_normalizer_results() (in master), which you can use like this:

synonymizer.get_normalizer_results(entities="DOID:14330", max_synonyms=2)

and which produces a truncated cluster like the one below (I haven't shown the full knowledge_graph, but it is also truncated, to just the two retained nodes and the edges that connect them):

{
  "DOID:14330": {
    "id": {
      "identifier": "MONDO:0005180",
      "name": "Parkinson disease",
      "category": "biolink:Disease",
      "SRI_normalizer_name": "Parkinson disease",
      "SRI_normalizer_category": "biolink:Disease",
      "SRI_normalizer_curie": "MONDO:0005180"
    },
    "total_synonyms": 18,
    "categories": {
      "biolink:Disease": 18
    },
    "nodes": [
      {
        "identifier": "DOID:14330",
        "category": "biolink:Disease",
        "label": "Parkinson's disease",
        "major_branch": "DiseaseOrPhenotypicFeature",
        "in_sri": true,
        "name_sri": "Parkinson's disease",
        "category_sri": "biolink:Disease",
        "in_kg2pre": true,
        "name_kg2pre": "Parkinson's disease",
        "category_kg2pre": "biolink:Disease"
      },
      {
        "identifier": "MONDO:0005180",
        "category": "biolink:Disease",
        "label": "Parkinson disease",
        "major_branch": "DiseaseOrPhenotypicFeature",
        "in_sri": true,
        "name_sri": "Parkinson disease",
        "category_sri": "biolink:Disease",
        "in_kg2pre": true,
        "name_kg2pre": "Parkinson disease",
        "category_kg2pre": "biolink:Disease"
      }
    ],
    "knowledge_graph": {
      "nodes": {
        ...

so we were thinking the UI could decide how many nodes it's reasonable to display in one cluster (e.g., 200?), then call get_normalizer_results() with that number as max_synonyms. and maybe also provide a dropdown or the like that lets a user increase max_synonyms.

note that the top-level "categories" slot shown above, which reports node counts by category, still reflects the full (untruncated) cluster, and I also added a top-level "total_synonyms" slot to make it easy to report how many nodes are in the full cluster.
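
for illustration, a minimal sketch of how a caller might combine max_synonyms with the new total_synonyms slot - only get_normalizer_results() and its max_synonyms parameter are real here; the default cutoff and wrapper function are hypothetical:

# Minimal sketch, assuming the response shape shown above; the wrapper and
# the default cutoff are placeholders, not part of the synonymizer API.
DEFAULT_MAX_SYNONYMS = 200

def lookup_cluster(synonymizer, curie, max_synonyms=DEFAULT_MAX_SYNONYMS):
    results = synonymizer.get_normalizer_results(entities=curie, max_synonyms=max_synonyms)
    cluster = results[curie]
    if cluster["total_synonyms"] > max_synonyms:
        # the cluster was truncated; the UI could surface this and offer a
        # dropdown (or similar) to re-query with a larger max_synonyms
        print(f"Showing {max_synonyms} of {cluster['total_synonyms']} synonyms for {curie}")
    return cluster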

let me know if I can do anything else!
