
sagnik1511/CureGraph


CureGraph

HIV Inhibition Detector using GraphNN




About

Drug discovery is an active area of research today. Since the advent of graph neural networks (GNNs), researchers have applied deep learning to molecular data: each compound is represented as a graph built from its internal structure, and a model is trained on that graph to minimize a prediction loss. This repo tackles a similar problem, where each molecule carries a binary label indicating whether or not it inhibits HIV.
The raw dataset is taken from moleculenet.org. The target label is heavily imbalanced, so I dropped some of the records to bring the classes closer to balance while still keeping a good number of records.
Finally, it is split into train.csv and test.csv.
Number of records in train dataset : 3848
Number of records in test dataset : 962
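
The balancing and splitting described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `undersample_and_split`, the 1:1 class ratio, and the random seed are hypothetical and not taken from the repository. The 80/20 split is an inference from the record counts above (962 of 4810 records is 20%).

```python
import random


def undersample_and_split(records, test_fraction=0.2, seed=42):
    """Balance binary-labelled records by undersampling, then split them.

    `records` is a list of (smiles, label) pairs with label 0 or 1.
    The 1:1 class ratio is an assumption made for illustration; the
    repository may keep a different ratio.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[1] == 1]
    negatives = [r for r in records if r[1] == 0]

    # Keep an equal number of records from each class, which drops
    # records from the majority class only.
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)

    # The first `n_test` shuffled records form the test set.
    n_test = int(len(balanced) * test_fraction)
    return balanced[n_test:], balanced[:n_test]


# Toy data: 10 active molecules and 90 inactive ones (placeholder strings).
toy = [("C" * i, 1) for i in range(1, 11)] + [("N" * i, 0) for i in range(1, 91)]
train, test = undersample_and_split(toy)
print(len(train), len(test))
```

Undersampling discards data, so it trades dataset size for class balance; class weighting in the loss would be an alternative that keeps every record.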

Features of the Project

1. Works with Graph Attention Neural Networks.
2. Identifies a molecule's expected behaviour towards HIV.
3. Offers hands-on experience with medicinal AI.
4. Integrated with MLflow to track training results.
5. Uses graph data structures somewhere other than competitive programming.

Installation & Usage

  1. Install Python on your device. Use this link.

  2. Install Anaconda on your device. Use this tutorial.

  3. Install RDKit and its components. Check rdkit.org/docs/Install.html. Installing the RDKit modules with pip did not work for me, so strictly use conda.

  4. Install PyTorch with CUDA for faster training. The versions used are listed below.

     torch==1.12.0+cu116
     torchvision==0.13.0+cu116
    
  5. Install PyG (PyTorch Geometric) to prepare graph datasets. Follow this link: pytorch-geometric.readthedocs.io/

  6. Clone the repository. Run this command in a terminal

     git clone https://github.com/sagnik1511/CureGraph.git
    
  7. Go inside the repository using cd.

     cd CureGraph
    
  8. Run the Streamlit app using this command

     streamlit run app.py
    
  9. If you want to train the model, follow the steps below

a) Update config/default.yaml as needed.

dataset:
    root: "data/"
    batch_size: 128
model:
    model_embedding_size: 64
    model_attention_heads: 2
    model_layers: 4
    model_dropout_rate: 0.2
    model_top_k_ratio: 0.5
    model_top_k_every_n: 1
    model_dense_neurons: 256
optimizer:
    name: "adam"
    lr: 0.001
    weight_decay: 0.00001
    momentum: NA  # some optimizers use momentum and some don't; use NA when the optimizer has no such parameter
training:
    loss_fn: "bce"
    max_epochs: 100
    early_stop_count: 10
    gpu_node: "0"

You can add your own configuration by mimicking this format. Just make sure the file is a YAML file and is stored inside the config directory.
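
To illustrate how the `momentum: NA` convention in the optimizer section might be handled, here is a minimal sketch. It assumes that `NA` (which a YAML parser typically reads as the plain string "NA") means "do not pass this parameter"; the helper name `optimizer_kwargs` is hypothetical, and how the repository actually interprets the value is not shown in this README.

```python
def optimizer_kwargs(optimizer_cfg):
    """Turn the `optimizer` config section into keyword arguments.

    Assumption: a value of "NA" (or a missing key) means the parameter
    should not be passed to the optimizer at all.
    """
    kwargs = {}
    for key in ("lr", "weight_decay", "momentum"):
        value = optimizer_cfg.get(key)
        if value is None or value == "NA":
            continue  # skip parameters this optimizer does not use
        kwargs[key] = float(value)
    return kwargs


# Mirrors the optimizer section of config/default.yaml shown above.
cfg = {"name": "adam", "lr": 0.001, "weight_decay": 0.00001, "momentum": "NA"}
print(optimizer_kwargs(cfg))
```

The resulting dictionary could then be unpacked into an optimizer constructor, for example `torch.optim.Adam(model.parameters(), **optimizer_kwargs(cfg))`, so that Adam never receives a `momentum` argument it does not accept.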

b) Run the training script as a module.

python -m src.train

c) Start the MLflow server and track model training jobs using this command.

mlflow ui
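
The `max_epochs` and `early_stop_count` settings in the training configuration suggest that training halts early once the model stops improving. A minimal sketch of that idea follows, assuming that `early_stop_count` is the number of consecutive epochs without a better validation loss that is tolerated. The class name `EarlyStopper` and the "lower loss is better" rule are hypothetical and not taken from the repository.

```python
class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement.

    Hypothetical helper: it assumes a lower validation loss is better.
    """

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up loss values, with a small patience to keep the example short.
stopper = EarlyStopper(patience=3)
losses = [0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stopped_at = next(
    (epoch for epoch, loss in enumerate(losses) if stopper.step(loss)), None
)
print(stopped_at)
```

With the default config, the equivalent would be `EarlyStopper(patience=10)` checked once per epoch inside a loop of at most 100 epochs.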

Module Functionalities Achieved

  • Effective Web-platform UI.
  • Training through CPU / GPU.
  • Basic Loggings & Reports.
  • MLflow Integration.
  • W&B Integration.
  • Endpoint Service.

If you get any errors while running the code, please make a PR.

Thanks for Visiting!!!

If you like the project, do ⭐

Also follow me on GitHub, Kaggle, and LinkedIn.