
Clustering: What are the most common mechanisms nature uses for a particular function? #130

Open
bruffridge opened this issue Oct 5, 2022 · 2 comments

@bruffridge
Member

While we wait for mechanisms and functions to be extracted using OpenAI, we can try out clustering algorithms on AskNature functions and summaries as an approximation of mechanism.
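
As a starting point, a minimal sketch of that approach could look like the snippet below: TF-IDF over the AskNature function + summary text, k-means for the clusters, and the top terms per cluster as a rough proxy for mechanism. The file name, column names, and n_clusters=20 are placeholders, not real AskNature export fields.

```python
# Rough sketch only: cluster AskNature strategy text with TF-IDF + k-means.
# "asknature.csv", the column names, and n_clusters=20 are placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

df = pd.read_csv("asknature.csv")                      # assumed: one row per strategy
texts = (df["function"] + " " + df["summary"]).fillna("")

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(texts)

km = KMeans(n_clusters=20, random_state=0, n_init=10)
df["cluster"] = km.fit_predict(X)

# Print the top TF-IDF terms closest to each cluster centroid to eyeball
# what "mechanism" each cluster approximates.
terms = vectorizer.get_feature_names_out()
for i, center in enumerate(km.cluster_centers_):
    top = [terms[j] for j in center.argsort()[-10:][::-1]]
    print(i, ", ".join(top))
```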

@bruffridge
Member Author

Some interns developed clusters of biology papers a few years ago. I'm putting a link to the code here in case it is of use.

https://github.com/nasa-petal/PeTaL/blob/legacy/petal/cluster.py

@AI-Complete

# -*- coding: utf-8 -*-

'''
Created on Wed Jul 11 07:52:04 2018
@author: bwhiteak and cbaumler
'''

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import spacy
from spacy.matcher import Matcher
import pyLDAvis
import pyLDAvis.sklearn
import string, json, sys, pickle

ignore = pickle.load(open("petal/data/cluster/ignore/ignore.p", "rb"))

def create_df0(text):
    """
    Format and parse a long string of abstracts into a DataFrame.

    Parameters:
    - text (str): A long string containing multiple abstracts.

    Returns:
    - DataFrame: A pandas DataFrame containing parsed abstracts.
    """
    try:
        if sys.platform == 'win32':
            line_sep = '\r\n'
        else:
            line_sep = '\n'

        # Records are separated by two blank lines; the first line of each
        # record is the title and the remainder is the abstract body.
        text_list = [y.strip() for y in text.split(sep=(line_sep * 3))]
        title_list = [x.split(line_sep, 1) for x in text_list]
        if title_list[-1] == ['']:
            del title_list[-1]

        nlp = spacy.load('en_core_web_sm')
        matcher = Matcher(nlp.vocab)
        # spaCy 2.x Matcher signature: add(key, on_match, pattern).
        # Matches a token, a hyphen, and a token (e.g. "self-cleaning").
        matcher.add("hyphen", None, [{}, {"TEXT": "-"}, {}])

        for i in range(len(title_list)):
            if len(title_list[i]) > 1:
                doc = nlp(title_list[i][1])
                doc = match_n_merge(matcher, doc)
                n, v, vd, al = get_split_tokens(doc)
                title_list[i] = [title_list[i][0], n, v, vd, al]

        return pd.DataFrame(title_list, columns=['Title', 'n', 'v', 'vd', 'all'])

    except Exception as e:
        print(f"Error processing abstracts: {e}")
        return pd.DataFrame()
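
# Usage sketch (not in cluster.py itself): create_df0 expects titles and
# abstracts separated by two blank lines, with the title on the first line of
# each record. "abstracts.txt" is a hypothetical input file, and the call only
# works alongside match_n_merge / get_split_tokens from the full script.
#
#   raw = open("abstracts.txt", encoding="utf-8").read()
#   df0 = create_df0(raw)
#   print(df0[['Title', 'all']].head())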

def make_doc_dict(doc_list, assoc_topics):
    """
    Create a dictionary mapping document titles to their associated topics.

    Parameters:
    - doc_list (list): List of document titles.
    - assoc_topics (list): List of topics associated with each document.

    Returns:
    - dict: A dictionary where each key is a document title and the value is the associated topics.
    """
    try:
        # Pair each document title with its associated topics, as the docstring describes.
        return {doc: topic for doc, topic in zip(doc_list, assoc_topics)}
    except Exception as e:
        print(f"Error creating document dictionary: {e}")
        return {}

def make_topic_dict(model_list, feature_names, n_top_words):
    """
    Generates a dictionary of topics with their corresponding terms.

    Parameters:
    - model_list (list): List of model components.
    - feature_names (list): List of feature names from the model.
    - n_top_words (int): Number of top words to include for each topic.

    Returns:
    - dict: A dictionary where each key is a topic number and value is a string of top terms.
    """
    try:
        all_mat = np.vstack([model.components_ for model in model_list])
        topic_dict = {
            i: " ".join(feature_names[idx] for idx in topic.argsort()[:-n_top_words - 1:-1])
            for i, topic in enumerate(all_mat)
        }
        return topic_dict
    except Exception as e:
        print(f"Error in making topic dictionary: {e}")
        return {}
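
# Usage sketch (not from the original script): fit a small LDA model on a toy
# corpus and label its topics with make_topic_dict. The corpus, n_components=3,
# and n_top_words=5 are placeholder values for illustration only.
#
#   example_docs = ["gecko feet adhere using van der waals forces",
#                   "lotus leaves shed water via surface microstructure",
#                   "spider silk combines strength with elasticity"]
#   cv = CountVectorizer(stop_words='english')
#   counts = cv.fit_transform(example_docs)
#   lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
#   topics = make_topic_dict([lda], cv.get_feature_names_out(), 5)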

def make_subset_dict(df):
    """
    Creates a dictionary mapping index values to document titles from a DataFrame.

    Parameters:
    - df (DataFrame): The DataFrame containing document indices and titles.

    Returns:
    - dict: Dictionary with index values as keys and document titles as values.
    """
    try:
        return dict(zip(df.index.values, df['Title'].tolist()))
    except Exception as e:
        print(f"Error creating subset dictionary: {e}")
        return {}

def get_lemma(token):
    """
    Extracts the lemma of a given token, with special handling for certain cases.

    Parameters:
    - token (Token): A Spacy token object.

    Returns:
    - str: The lemma of the token.
    """
    if token.text == "species":
        return "species"  # Handling exception for the word 'species'
    return token.lemma_

[Continuation of other functions and script logic]
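
This is not the actual continuation of cluster.py, but a rough guess at how the pieces above might be tied together: build a document-term matrix from the parsed abstracts, fit LDA, label topics with make_topic_dict, and export an interactive pyLDAvis panel. The function name, n_topics=10, and the output path are made up for illustration, and the sketch relies on the imports at the top of the script.

```python
# Rough end-to-end sketch, not the remainder of cluster.py.
# `docs` is assumed to be a list of abstract strings, e.g. the 'all' column of
# the DataFrame returned by create_df0.
def cluster_sketch(docs, n_topics=10, out_html="lda.html"):
    cv = CountVectorizer(stop_words='english', max_df=0.95, min_df=2)
    dtm = cv.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(dtm)
    # Human-readable topic labels via the helper defined above.
    topics = make_topic_dict([lda], cv.get_feature_names_out(), 10)
    # Interactive topic browser (pyLDAvis < 3.4 ships the sklearn module imported above).
    panel = pyLDAvis.sklearn.prepare(lda, dtm, cv)
    pyLDAvis.save_html(panel, out_html)
    return topics
```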
