-
Hi, I've collected GO annotations for a large number of genes and propagated them up to their GO ancestors. I'm now trying to use these annotations in a general-purpose genomic functional graph (similar to the genomic knowledge graphs (KGs) that have been developed recently). The problem is that the top-level terms (e.g., at levels 0 and 1) really don't make sense for this purpose. However, I don't know at which level I should cut off the GO terms, and some papers instead cut off terms based on the number of genes associated with them. Can someone recommend papers or approaches that give details on trimming GO terms by GO level or by number of associated genes (whether before or after propagating the annotations)?
-
I would suggest using one of the GO subsets available here: http://geneontology.org/docs/download-ontology/ The 'Generic GO subset' should be broadly useful for your case, giving enough granularity to map genes to meaningful classes. Organism-specific subsets are also available.
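As a rough sketch of what mapping to a subset means in practice: for each gene, walk up from its directly annotated terms and keep only the ancestors that belong to the subset. The toy ontology, the term IDs, and the function names below are made up for illustration; in a real setup the parent relations and the subset terms would come from the GO files at the link above.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> set of direct parents.
PARENTS = {
    "T:root": set(),
    "T:metabolism": {"T:root"},
    "T:transport": {"T:root"},
    "T:lipid_metabolism": {"T:metabolism"},
    "T:fatty_acid_oxidation": {"T:lipid_metabolism"},
    "T:ion_transport": {"T:transport"},
}

# Hypothetical subset (slim): a few mid-level classes, no root term.
SLIM = {"T:lipid_metabolism", "T:transport"}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def map_to_slim(direct_annotations, parents, slim):
    """Map gene -> directly annotated terms onto gene -> slim terms."""
    mapped = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            mapped[gene] |= ancestors(term, parents) & slim
    return dict(mapped)


annotations = {
    "geneA": {"T:fatty_acid_oxidation"},
    "geneB": {"T:ion_transport", "T:lipid_metabolism"},
    "geneC": {"T:metabolism"},  # annotated only above the slim level
}

slim_annotations = map_to_slim(annotations, PARENTS, SLIM)
for gene, terms in sorted(slim_annotations.items()):
    print(gene, sorted(terms))
```

Note that `geneC` ends up with no slim term, because its only annotation sits above every class in the subset. Genes like this are the ones a subset fails to cover.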
-
Hi @jasperhyp, we haven't written this up yet. The generic subset was updated fairly recently by a GO consortium working group, with feedback from GOC members.
The subset is built around grouping terms into biological classes that are physiologically relevant, so at a level that is neither too specific nor too general. Gene numbers per class are therefore not important per se, but they can be used to check whether a class is too broad or too narrow. The major emphasis was on coverage: getting the number of genes with annotations that do not fall into the subset (across species) as low as possible. Producing classes that are not redundant or overlapping was also a consideration.
PomBase have some useful documentation on GO slims: https://www.pombase.org/browse-curation/fission-yeast-go-slimming-tips which captures the essence of the process we followed. Their paper: https://royalsocietypublishing.org/doi/10.1098/rsob.180241 also captures the major aims that we followed for the generic slim.
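To make the criteria above concrete, here is a minimal sketch of how one might check a candidate subset against them: the fraction of annotated genes that map to at least one class (coverage), the number of genes per class (to spot classes that are too broad or too narrow), and the pairwise overlap between classes. This is only an illustration, not the procedure the working group used; the input data, the function name `slim_report`, and the use of Jaccard similarity as an overlap measure are all assumptions.

```python
from itertools import combinations

# Hypothetical input: gene -> set of slim classes it maps to
# (an empty set means the gene's annotations fall outside the subset).
slim_annotations = {
    "geneA": {"class1"},
    "geneB": {"class1", "class2"},
    "geneC": set(),
    "geneD": {"class2"},
}


def slim_report(slim_annotations):
    """Summarise coverage, class sizes and pairwise class overlap."""
    genes_per_class = {}
    for gene, classes in slim_annotations.items():
        for cls in classes:
            genes_per_class.setdefault(cls, set()).add(gene)

    # Genes whose annotations do not fall into any class of the subset.
    uncovered = {g for g, classes in slim_annotations.items() if not classes}
    total = len(slim_annotations)
    coverage = 1 - len(uncovered) / total if total else 0.0

    # Class sizes: very large or very small values hint at classes
    # that are too broad or too narrow.
    sizes = {cls: len(genes) for cls, genes in genes_per_class.items()}

    # Jaccard similarity between classes: values near 1 hint at redundancy.
    overlap = {}
    for a, b in combinations(sorted(genes_per_class), 2):
        shared = genes_per_class[a] & genes_per_class[b]
        union = genes_per_class[a] | genes_per_class[b]
        overlap[(a, b)] = len(shared) / len(union)

    return coverage, uncovered, sizes, overlap


coverage, uncovered, sizes, overlap = slim_report(slim_annotations)
print(f"coverage: {coverage:.2f}")
print("uncovered genes:", sorted(uncovered))
print("genes per class:", sizes)
print("class overlap:", overlap)
```

Running the same report on a level-based or gene-count-based cutoff would give a like-for-like comparison with the generic subset on your own annotation set.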