CureGraph

HIV Inhibition Detector using GraphNN

About

Drug discovery is an active area of research today. Since the emergence of graph neural networks (GNNs), researchers have started applying deep learning to molecular graphs: the model takes each compound's internal structure as a graph and is trained by minimizing a loss over it. This repo tackles a similar problem, where each molecule is annotated with a binary label indicating whether it inhibits HIV or not.
The raw dataset is taken from moleculenet.org. The target label is heavily imbalanced, so I dropped some of the records to bring the classes closer to balance while still keeping a good number of records.
Finally, it is split into train.csv and test.csv.
Number of records in train dataset : 3848
Number of records in test dataset : 962
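The balancing and splitting described above can be sketched as follows. This is only an illustration, not the script used in this repo: the 1:1 class ratio, the `HIV_active` column name, the seed, and the `undersample_and_split` helper are all assumptions. The record counts listed above are consistent with roughly an 80/20 split, so that fraction is used here.

```python
import random


def undersample_and_split(records, label_key="HIV_active", test_fraction=0.2, seed=42):
    """Drop majority-class rows until classes are 1:1 (assumed ratio), then split."""
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    # Keep every minority row plus an equally sized random sample of the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    n_test = int(len(balanced) * test_fraction)
    return balanced[n_test:], balanced[:n_test]


# Toy data: 100 active molecules out of 1000 (placeholder SMILES strings).
records = [{"smiles": f"mol{i}", "HIV_active": int(i % 10 == 0)} for i in range(1000)]
train, test = undersample_and_split(records)
print(len(train), len(test))
```

In practice the two returned lists would be written out as train.csv and test.csv.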

Features of the Project

1. Working with Graph Attention Neural Networks.
2. Easily identifies a molecule's expected behaviour towards HIV.
3. Getting hands-on with medicinal AI.
4. Integrated with mlflow to track training results.
5. Using graph data structures somewhere other than competitive programming.
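To make the "molecule as a graph" idea concrete, here is a small hand-written sketch: atoms become nodes and bonds become edges, stored as a two-row edge list with each bond listed in both directions (the layout PyG uses for `edge_index`). The single atomic-number feature and the `to_edge_index` helper are assumptions for illustration; the actual featurization in this repo is done from the dataset's molecules and is not shown here.

```python
def to_edge_index(bonds):
    """Turn undirected bonds into a two-row edge list with both directions."""
    sources, targets = [], []
    for a, b in bonds:
        sources += [a, b]
        targets += [b, a]
    return [sources, targets]


# Ethanol (SMILES "CCO"), heavy atoms only.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

atomic_numbers = {"C": 6, "O": 8}
node_features = [[atomic_numbers[symbol]] for symbol in atoms]  # one feature per node
edge_index = to_edge_index(bonds)

print(node_features)
print(edge_index)
```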

Installation & Usage

Install Python on your device. Use this link.
Install Anaconda on your device. Use this tutorial.
Install RDKit and its components. Check rdkit.org/docs/Install.html. Installing the RDKit modules with pip did not work for me, so strictly use conda.
Install PyTorch with CUDA for faster training. Version details are shared below.
```
 torch==1.12.0+cu116
 torchvision==0.13.0+cu116
```
Install PyG (PyTorch Geometric) to prepare graph datasets. Follow this link: pytorch-geometric.readthedocs.io/
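Once the packages are installed, a quick check like the one below can confirm that they are importable in the active environment. It is only a convenience sketch: the `check_environment` helper is hypothetical and not part of this repo, and the module list is my assumption of the import names for the tools mentioned in this README.

```python
from importlib.util import find_spec

# Import names assumed for the tools this README relies on.
REQUIRED_MODULES = ["rdkit", "torch", "torchvision", "torch_geometric", "streamlit", "mlflow"]


def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in modules}


status = check_environment()
for name, found in status.items():
    print(f"{name:16s} {'found' if found else 'MISSING'}")
```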

Clone the repository. Run this command in a terminal:

 git clone https://github.com/sagnik1511/CureGraph.git

Go inside the repository using cd.
```
 cd CureGraph
```
Run the streamlit app using this command
```
 streamlit run app.py
```
If you want to train the model, follow the procedures below

a) Update the config/default.yaml as per your need.

dataset:
    root: "data/"
    batch_size: 128
model:
    model_embedding_size: 64
    model_attention_heads: 2
    model_layers: 4
    model_dropout_rate: 0.2
    model_top_k_ratio: 0.5
    model_top_k_every_n: 1
    model_dense_neurons: 256
optimizer:
    name: "adam"
    lr: 0.001
    weight_decay: 0.00001
    momentum: NA  # some optimizers use momentum and some don't; use NA when the optimizer has no momentum parameter
training:
    loss_fn: "bce"
    max_epochs: 100
    early_stop_count: 10
    gpu_node: "0"

You can add your own configuration by mimicking this format. Just make sure the file is a YAML file and that it is stored inside the config directory.
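As an illustration of how the `momentum: NA` convention could be handled when building an optimizer, here is a minimal sketch. It assumes the YAML file has already been parsed into a dict (for example with `yaml.safe_load`, where `NA` should come out as the string `"NA"`); the `optimizer_kwargs` helper is hypothetical and the training code in src may do this differently.

```python
def optimizer_kwargs(optimizer_cfg):
    """Build keyword arguments for an optimizer, skipping momentum when it is 'NA'."""
    kwargs = {
        "lr": float(optimizer_cfg["lr"]),
        "weight_decay": float(optimizer_cfg["weight_decay"]),
    }
    momentum = optimizer_cfg.get("momentum", "NA")
    if momentum not in ("NA", None):
        kwargs["momentum"] = float(momentum)
    return kwargs


# Mirrors the optimizer section of config/default.yaml shown above.
config = {"optimizer": {"name": "adam", "lr": 0.001, "weight_decay": 0.00001, "momentum": "NA"}}
adam_kwargs = optimizer_kwargs(config["optimizer"])

# A hypothetical SGD configuration that does use momentum.
sgd_kwargs = optimizer_kwargs({"name": "sgd", "lr": 0.01, "weight_decay": 0.0, "momentum": 0.9})

print(adam_kwargs)
print(sgd_kwargs)
```

The resulting dict can then be passed to the chosen optimizer class together with the model parameters.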

b) Run the training scripts as a module.

python -m src.train
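The `max_epochs` and `early_stop_count` settings suggest that training stops early when the model stops improving. A minimal sketch of that idea is shown below, assuming `early_stop_count` means "number of consecutive epochs without a better validation loss"; the `EarlyStopper` class and the loss values are made up for illustration and are not taken from src.train.

```python
class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=3)
losses = [0.70, 0.65, 0.66, 0.67, 0.64, 0.65, 0.66, 0.68, 0.60]  # invented values
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```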

c) Start the mlflow server and track model training jobs using this command.

mlflow ui

Module Functionalities Achieved

If you get any errors while running the code, please make a PR.

Thanks for Visiting!!!

If you like the project, do ⭐

Also follow me on GitHub, Kaggle, LinkedIn
