Utility functions for traversing the Medical Subject Headings (MeSH) ontology tree
Medical Subject Headings (MeSH) is an ontology for classifying information within the biomedical domain.
$ npm install mesh-tree --save
To use in a single process, specify the path to the MeSH LevelGraph DB. If the config object is empty, mesh-tree uses the $PATH_TO_MESH_DB environment variable as the path:
import MeshTree from 'mesh-tree';
let meshTree = new MeshTree({
  dbPath: '/path/to/mesh/db'
});
meshTree.getAllDescUIs().then(result => {
  console.log(result);
});
To use with multiple processes, pass in a multilevel client.
- server:
import level from 'level';
import multilevel from 'multilevel';
import net from 'net';

// meshTreePort is a placeholder: set it to the port the server should listen on
let db = level('/path/to/mesh/db');
net.createServer(con => {
  con.pipe(multilevel.server(db)).pipe(con);
}).listen(meshTreePort);
- client:
import MeshTree from 'mesh-tree';
import multilevel from 'multilevel';
import net from 'net';

// connect to the same meshTreePort the server listens on
let db = multilevel.client();
let con = net.connect(meshTreePort);
con.pipe(db.createRpcStream()).pipe(con);

let meshTree = new MeshTree({ level: db });
meshTree.getAllDescUIs().then(result => {
  console.log(result);
});
Returns array of all descriptor record UIs. Four types of descriptor records exist: TopicalDescriptor, GeographicalDescriptor, PublicationType, CheckTag. By default, only TopicalDescriptor and GeographicalDescriptor are included.
opts:
- format: rdf or mesh for output format
- classes: array of descriptor record types to include (default ['TopicalDescriptor', 'GeographicalDescriptor'])
Returns array of all chemical supplementary record UIs
opts:
- format: rdf or mesh for output format
Returns array of all disease (rare) supplementary record UIs
opts:
- format: rdf or mesh for output format
Returns array of all protocol (e.g., cancer-related) supplementary record UIs
opts:
- format: rdf or mesh for output format
Returns the cleaned text output of the Wikipedia page corresponding to the descriptor record UI.
opts:
- id: either RDF URL id or MeSH UI
- level: 0 for abstract only, 1 for all text
Either the abstract or the entire body of text can be extracted from Wikipedia (cleaned, without link info, references, citations, etc.) for a particular concept, based on the preferred concept term. The function follows redirects automatically. For example, in MeSH the concept Calcimycin corresponds to the Wikipedia page on A23187, which is an accepted term under the MeSH concept but not the preferred term.
Returns array of tree numbers by descriptor record unique identifier.
Example: 'D000001' returns ['D03.438.221.173']
opts:
- id: either RDF URL id or MeSH UI
- format: rdf or mesh for output format
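Many of the functions here accept either an RDF URL id or a MeSH UI, and the format option selects which form is returned. The sketch below shows how the two forms relate, assuming the http://id.nlm.nih.gov/mesh/ prefix that appears in the example output elsewhere in this README. The helper names toMeshUI and toRdfId are hypothetical and are not part of the mesh-tree API.

```javascript
// Hypothetical helpers, not part of mesh-tree: convert between the two id forms.
// Assumes RDF URL ids use the prefix shown in this README's example output.
const MESH_RDF_PREFIX = 'http://id.nlm.nih.gov/mesh/';

// Strip the RDF prefix if present; a bare MeSH UI is returned unchanged.
function toMeshUI(id) {
  return id.startsWith(MESH_RDF_PREFIX) ? id.slice(MESH_RDF_PREFIX.length) : id;
}

// Add the RDF prefix if missing; an RDF URL id is returned unchanged.
function toRdfId(id) {
  return id.startsWith(MESH_RDF_PREFIX) ? id : MESH_RDF_PREFIX + id;
}

console.log(toMeshUI('http://id.nlm.nih.gov/mesh/D000001')); // D000001
console.log(toRdfId('D000001')); // http://id.nlm.nih.gov/mesh/D000001
```

Both helpers are idempotent, so they can be applied to user input without first checking which form was supplied.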
Returns descriptor record unique identifier by tree number.
Example: 'D03.438.221.173' returns 'D000001'
opts:
- treeNum: either RDF URL id with tree number or tree number by itself
- format: rdf or mesh for output format
Returns top-level MeSH category
Example: 'D000001' returns 'Chemicals and Drugs'
opts:
- id: either RDF URL id or MeSH UI
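MeSH tree numbers begin with a letter that identifies the top-level category, which is why 'D03.438.221.173' falls under 'Chemicals and Drugs'. The following is a minimal sketch of that lookup, assuming a tree number has already been obtained for the descriptor. The helper name categoryFromTreeNumber and the deliberately partial CATEGORIES table are illustrative only and are not part of mesh-tree.

```javascript
// Illustrative, partial table of MeSH top-level categories keyed by the
// leading letter of a tree number. Not part of mesh-tree.
const CATEGORIES = {
  A: 'Anatomy',
  B: 'Organisms',
  C: 'Diseases',
  D: 'Chemicals and Drugs'
};

// Hypothetical helper: returns the category name, or null if the leading
// letter is not in the partial table above.
function categoryFromTreeNumber(treeNum) {
  const letter = treeNum.charAt(0);
  return Object.prototype.hasOwnProperty.call(CATEGORIES, letter)
    ? CATEGORIES[letter]
    : null;
}

console.log(categoryFromTreeNumber('D03.438.221.173')); // Chemicals and Drugs
```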
Returns the preferred term by descriptor record unique identifier (i.e., the preferred term of the preferred concept).
Example: 'D000001' returns 'Calcimycin'
opts:
- id: either RDF URL id or MeSH UI
Returns all terms by descriptor record unique identifier (i.e., all terms for all concepts, both preferred and not).
Example: 'D000001' returns ['A23187, Antibiotic', 'Antibiotic A23187', 'A23187', 'A 23187', 'A-23187', 'Calcimycin']
Chemical supplementary concept record UIs can also be used here:
Example: 'C025734' returns ['CH-A1-MG', 'alpha 1 microglobulin, chorionic', 'chorionic alpha 1-microglobulin', 'chorionic alpha(1)-microglobulin']
opts:
- id: either RDF URL id or MeSH UI
Returns scope note for descriptor record unique identifier (scope notes are contained in the preferred concept record).
Example: 'D000001', via concept 'M0000001', returns 'An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports CALCIUM and other divalent cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.'
opts:
- id: either RDF URL id or MeSH UI
Returns parent descriptor record UIs (returns an array, as records can exist in multiple tree branches).
Example: 'D000001' returns ['D001583']
Example: 'D005138' returns ['D006197', 'D005123']
If the id provided is a supplementary concept record (SCR) UI, returns the parent descriptor record UIs mapped from that SCR.
Example: 'C041293' returns ['D011140']
Example: 'C025735' returns ['D001286', 'D002164', 'D012602']
opts:
- id: either RDF URL id or MeSH UI
- format: rdf or mesh for output format
Returns an array of all ancestor descriptor record UIs, following all parent branches.
Example: 'D000001' returns ['D001583', 'D006574', 'D006571']
Example: 'D005138' returns ['D005123', 'D006197', 'D005145', 'D012679', 'D034582', 'D006257', 'D001829']
opts:
- id: either RDF URL id or MeSH UI
- format: rdf or mesh for output format
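Tree numbers encode the path from the top of a branch, with each dot-separated segment one level deeper, so the ancestors along a single branch can be read off by truncation. The sketch below illustrates this on tree numbers only. The helper ancestorTreeNumbers is hypothetical and not part of mesh-tree, and it does not resolve tree numbers to descriptor UIs, which would still require a lookup against the database.

```javascript
// Hypothetical helper, not part of mesh-tree: list the ancestor tree numbers
// of a single tree number, nearest ancestor first.
function ancestorTreeNumbers(treeNum) {
  const parts = treeNum.split('.');
  const ancestors = [];
  // Drop one trailing segment at a time until only the first segment remains.
  for (let i = parts.length - 1; i > 0; i--) {
    ancestors.push(parts.slice(0, i).join('.'));
  }
  return ancestors;
}

console.log(ancestorTreeNumbers('D03.438.221.173'));
// [ 'D03.438.221', 'D03.438', 'D03' ]
```

A descriptor with several tree numbers would need this applied to each one, which is why ancestors are collected across all parent branches.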
Returns child descriptor record UIs (immediate children only, not all descendants).
Example: 'D012343' returns ['D012345', 'D000926', 'D012346']
opts:
- id: either RDF URL id or MeSH UI
- format: rdf or mesh for output format
Returns sibling descriptor record UIs (across all branches a descriptor record may exist under).
Example: D015834 (Cochlear Diseases) returns D018159 (Endolymphatic Hydrops), D015837 (Vestibular Diseases), D007762 (Labyrinthitis)
opts:
- id: either RDF URL id or MeSH UI
- format: rdf or mesh for output format
Takes an array of two or more descriptor record UIs and returns the descriptor record UIs of their closest common ancestors. If a descriptor exists in more than one branch of the tree, there may be more than one common ancestor.
Example: D012345 (RNA, Transfer, Amino Acid-Specific), D000926 (Anticodon) returns D012343 (RNA, Transfer)
Example: D000233 (Adenoidectomy), D014068 (Tonsillectomy), D007828 (Laryngoscopy) returns D013517 (Otorhinolaryngologic Surgical Procedures)
Example: D011434 (Proprioception), D014785 (Vision, Ocular), D004856 (Postural Balance) returns D012677 (Sensation)
opts:
- ids: array of either RDF URL ids or MeSH UIs
- format: rdf or mesh for output format
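One way to picture the closest common ancestor is as the longest tree-number prefix that the inputs share. The sketch below assumes each descriptor has exactly one tree number, which is a simplification, since descriptors in several branches can yield several common ancestors. The helper commonAncestorTreeNumber is hypothetical, the tree numbers are made up for illustration, and this is not necessarily how mesh-tree computes its result.

```javascript
// Hypothetical helper, not part of mesh-tree: longest shared tree-number
// prefix, compared segment by segment. Returns null if nothing is shared.
function commonAncestorTreeNumber(treeNums) {
  if (treeNums.length === 0) return null;
  const split = treeNums.map(t => t.split('.'));
  const shared = [];
  for (let i = 0; i < split[0].length; i++) {
    const segment = split[0][i];
    // Stop at the first position where any tree number differs or ends.
    if (!split.every(parts => parts[i] === segment)) break;
    shared.push(segment);
  }
  return shared.length > 0 ? shared.join('.') : null;
}

// Made-up tree numbers, for illustration only.
console.log(commonAncestorTreeNumber([
  'X01.100.200.300',
  'X01.100.200.400',
  'X01.100.500'
])); // X01.100
```

Note that if one input is itself a prefix of the others, this sketch returns that input rather than its parent.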
Tests whether or not id2 is a descendant of id1 (child of >=1 depth). IDs can be either RDF URL ids or MeSH UIs.
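In terms of tree numbers, a descendant relationship corresponds to one tree number extending another by at least one segment. A minimal sketch of that check follows, assuming tree numbers are already known for both records. The helper isDescendantTreeNumber is hypothetical and operates on tree numbers, not on the ids the function above accepts.

```javascript
// Hypothetical helper, not part of mesh-tree: true if candidate lies strictly
// below ancestor on the same branch. The trailing dot ensures whole segments
// are compared, so 'D03.43' is not treated as an ancestor of 'D03.438'.
function isDescendantTreeNumber(ancestor, candidate) {
  return candidate.startsWith(ancestor + '.');
}

console.log(isDescendantTreeNumber('D03.438', 'D03.438.221.173')); // true
console.log(isDescendantTreeNumber('D03.43', 'D03.438'));          // false
console.log(isDescendantTreeNumber('D03.438', 'D03.438'));         // false
```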
Takes a flat array of descriptor record UIs and returns a nested tree structure based on parent-descendant relationships amongst all the array elements. In other words, it recreates a subtree based on the overarching MeSH ontology tree on a given list of element nodes.
For example, given D000233 (Adenoidectomy), D014068 (Tonsillectomy), D007828 (Laryngoscopy), D013517 (Otorhinolaryngologic Surgical Procedures), the following is returned:
[
  {
    "@id": "http://id.nlm.nih.gov/mesh/D013517",
    "parent": null,
    "children": [
      {
        "@id": "http://id.nlm.nih.gov/mesh/D000233",
        "parent": "http://id.nlm.nih.gov/mesh/D013517"
      },
      {
        "@id": "http://id.nlm.nih.gov/mesh/D014068",
        "parent": "http://id.nlm.nih.gov/mesh/D013517"
      },
      {
        "@id": "http://id.nlm.nih.gov/mesh/D007828",
        "parent": "http://id.nlm.nih.gov/mesh/D013517"
      }
    ]
  }
]
An example for a list containing more than one "relative top-level" element, such as D011434 (Proprioception), D014785 (Vision, Ocular), D004856 (Postural Balance), gives:
[
  {
    "@id": "http://id.nlm.nih.gov/mesh/D011434",
    "parent": null,
    "children": [
      {
        "@id": "http://id.nlm.nih.gov/mesh/D004856",
        "parent": "http://id.nlm.nih.gov/mesh/D011434"
      }
    ]
  },
  {
    "@id": "http://id.nlm.nih.gov/mesh/D014785",
    "parent": null
  }
]
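The nesting shown above can be sketched as follows: each id is attached to its nearest ancestor within the list, and ids with no ancestor in the list become roots. This is an illustration under stated assumptions, not the module's implementation. The ANCESTORS table is hand-written to cover only the first example, and the names isDescendant and buildSubtree are hypothetical stand-ins for a lookup against the MeSH database.

```javascript
const PREFIX = 'http://id.nlm.nih.gov/mesh/';

// Hand-written ancestor lists covering only the first example above;
// a real implementation would query the MeSH database instead.
const ANCESTORS = {
  D013517: [],
  D000233: ['D013517'],
  D014068: ['D013517'],
  D007828: ['D013517']
};

// True if id is a descendant of ancestorId according to the table above.
function isDescendant(ancestorId, id) {
  return (ANCESTORS[id] || []).includes(ancestorId);
}

// Hypothetical helper: nest each id under its nearest ancestor within ids.
function buildSubtree(ids) {
  const nodes = new Map(ids.map(id => [id, { '@id': PREFIX + id, parent: null }]));
  const roots = [];
  for (const id of ids) {
    // Ancestors of id that are also present in the list.
    const candidates = ids.filter(other => other !== id && isDescendant(other, id));
    // The nearest ancestor is the one that descends from every other candidate.
    const nearest = candidates.find(c =>
      candidates.every(o => o === c || isDescendant(o, c)));
    if (nearest === undefined) {
      // No ancestor in the list (or no single nearest one): treat as a root.
      roots.push(nodes.get(id));
      continue;
    }
    const parentNode = nodes.get(nearest);
    nodes.get(id).parent = parentNode['@id'];
    (parentNode.children = parentNode.children || []).push(nodes.get(id));
  }
  return roots;
}

console.log(JSON.stringify(
  buildSubtree(['D000233', 'D014068', 'D007828', 'D013517']), null, 2));
```

As in the output above, the children key is only added to nodes that have at least one child.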
Tests whether a descriptor has pharmacological actions (in other words, whether the descriptor is a drug). If so, returns an array of the descriptor UIs of its pharmacological actions; otherwise returns null.
Example: 'D000001' (Calcimycin) returns ['D000900', 'D061207'] (Anti-Bacterial Agents, Calcium Ionophores)
opts:
- id: either RDF URL id or MeSH UI
- format: rdf or mesh for output format
Maps MeSH concepts onto Schema.org classes (e.g., Drug).
opts:
- id: either RDF URL id or MeSH UI
Creates properties object from descriptor id.
Example propRequestObj:
{
  '@id': 'http://id.nlm.nih.gov/mesh/D000001',
  properties: ['name', 'description', 'synonyms', 'schemaOrgType', 'codeValue', 'codingSystem']
}
returns
{
  '@id': 'http://id.nlm.nih.gov/mesh/D000001',
  'name': 'Calcimycin',
  'description': 'An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports CALCIUM and other divalent cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.',
  'synonyms': ['A23187, Antibiotic', 'Antibiotic A23187', 'A23187', 'A 23187', 'A-23187'],
  'schemaOrgType': 'Drug',
  'codeValue': 'D000001',
  'codingSystem': 'MeSH'
}
Note that fixture data must be loaded into a test LevelGraph DB before running the tests.
$ npm run initdb-test
$ npm test
To use with the full MeSH data:
Raw data are available here in ASCII and XML formats. Registration is required: see the Memorandum of Understanding.
Run ./bin/fetch_mesh.sh. The script will prompt for the email used in registration. All relevant MeSH files are downloaded into $HOME/data/mesh/.
See HHS/meshrdf for instructions on transforming MeSH from XML to RDF N-triples using Saxon (Java). There are no issues with Saxon 9.6 Home Edition. Additional details are available at the meshrdf website. Note: as of July 2015, there is still an error in the qual2015.dtd file, where the line <!ENTITY % DescriptorRecordSet SYSTEM "desc2014.dtd"> needs to be changed to reference desc2015.dtd, assuming the most up-to-date files are used. Export the path to the Saxon JAR as $SAXON_JAR and run ./bin/mesh_xml2rdf.sh. The RDF N-triples file will be produced at $HOME/data/mesh/RDF.
We use LevelGraph (which is built on top of LevelDB) with the LevelGraph-N3 extension to store and access the MeSH ontology as RDF N-triples. Run PATH_TO_MESH_RDF=... PATH_TO_MESH_DB=... npm run initdb to stream the RDF data into the datastore (npm run initdb-quiet for quiet mode). Environment variables are as follows:
- PATH_TO_MESH_RDF: location of the N-triples file.
- PATH_TO_MESH_DB: target path of the LevelGraph database. This is then used during initialization by the mesh-tree module.