The raw dataset is taken from moleculenet.org. The target feature is far from uniformly distributed, so I dropped some of the records to bring the classes closer to balance while still keeping a good number of records.
Finally, it is split into train.csv and test.csv.
Number of records in train dataset : 3848
Number of records in test dataset : 962
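The balancing and splitting described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual preprocessing script: the function name `undersample_and_split`, the `max_ratio` parameter, and the 1:1 target ratio are assumptions, and the `HIV_active` column name is assumed from the MoleculeNet HIV file. The two record counts above (3848 and 962) are consistent with an 80/20 split, which is why `test_fraction` defaults to 0.2.

```python
import random
from collections import defaultdict


def undersample_and_split(records, label_key="HIV_active", max_ratio=1.0,
                          test_fraction=0.2, seed=42):
    """Undersample larger classes, then split into (train, test) lists.

    max_ratio is a hypothetical knob: each class keeps at most
    max_ratio * (size of the rarest class) records.
    """
    rng = random.Random(seed)

    # Group records by their class label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    # Cap every class relative to the rarest one.
    minority = min(len(group) for group in by_label.values())
    cap = int(max_ratio * minority)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, min(len(group), cap)))

    # Shuffle, then cut off the test portion.
    rng.shuffle(balanced)
    n_test = round(len(balanced) * test_fraction)
    return balanced[n_test:], balanced[:n_test]


# Toy data: 10 active and 90 inactive molecules (SMILES values are placeholders).
toy = [{"smiles": f"C{i}", "HIV_active": int(i < 10)} for i in range(100)]
train, test = undersample_and_split(toy)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, which matters when results from different training runs are compared later.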
1. Working with Graph Attention Neural Networks.
2. Easily identifies a molecule's expected behaviour towards HIV.
3. Getting hands-on with medicinal AI.
4. Integrated with MLflow to track training results.
5. Using graph data structures somewhere other than competitive programming.
- Install Python on your device. Use this link.
- Install Anaconda on your device. Use this tutorial.
- Install RDKit and its components. Check rdkit.org/docs/Install.html. The RDKit modules could not be installed with pip, so strictly use conda.
- Install PyTorch with CUDA for faster training. Version details are shared below.
torch==1.12.0+cu116 torchvision==0.13.0+cu116
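After installing, it can help to confirm that the pinned builds are the ones actually present in the environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `check_versions` is hypothetical, and it assumes the installed wheels report the `+cu116` local version suffix in their metadata. If the versions match, `torch.cuda.is_available()` can then be used to confirm that PyTorch sees the GPU.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation step above.
EXPECTED = {"torch": "1.12.0+cu116", "torchvision": "0.13.0+cu116"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```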
- Install PyG (PyTorch Geometric) to prepare graph datasets. Follow this link: pytorch-geometric.readthedocs.io/
- Clone the repository. Run this command in a terminal:
git clone https://github.com/sagnik1511/CureGraph.git
- Go inside the repository using cd.
cd CureGraph
- Run the Streamlit app using this command:
streamlit run app.py
- If you want to train the model, follow the procedure below.
a) Update config/default.yaml as per your needs.
dataset:
  root: "data/"
  batch_size: 128
model:
  model_embedding_size: 64
  model_attention_heads: 2
  model_layers: 4
  model_dropout_rate: 0.2
  model_top_k_ratio: 0.5
  model_top_k_every_n: 1
  model_dense_neurons: 256
optimizer:
  name: "adam"
  lr: 0.001
  weight_decay: 0.00001
  momentum: NA # some optimizers use momentum and some don't; use NA when the optimizer has no such parameter
training:
  loss_fn: "bce"
  max_epochs: 100
  early_stop_count: 10
  gpu_node: "0"
You can add your own configuration by mimicking this format. Just make sure the file is a YAML file and that it is stored inside the config directory.
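One detail of this format is worth illustrating: a bare `NA` is not a YAML null, so a loader such as PyYAML's `yaml.safe_load` returns it as the string "NA". The sketch below shows one possible way to turn the `optimizer` section into keyword arguments while dropping such placeholders. The helper name `optimizer_kwargs` is hypothetical, and how the repository's own training script handles `NA` is not shown here, so treat this as an assumption rather than the project's implementation.

```python
def optimizer_kwargs(optimizer_cfg):
    """Turn the `optimizer` config section into keyword arguments.

    Entries set to the placeholder string "NA" (or to None) are dropped, so an
    optimizer without that parameter never receives it.
    """
    kwargs = {}
    for key, value in optimizer_cfg.items():
        if key == "name":
            continue  # the name selects the optimizer; it is not a kwarg
        if value is None:
            continue
        if isinstance(value, str) and value.strip().upper() == "NA":
            continue
        kwargs[key] = value
    return kwargs


# The `optimizer` section above, written out as the dict a YAML loader would return.
cfg = {"name": "adam", "lr": 0.001, "weight_decay": 0.00001, "momentum": "NA"}
print(optimizer_kwargs(cfg))
```

Filtering the placeholder in one place keeps the YAML format uniform across optimizers, at the cost of silently ignoring a parameter that was set to `NA` by mistake.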
b) Run the training script as a module.
python -m src.train
c) Start the MLflow server and track model training jobs using this command.
mlflow ui
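The `max_epochs` and `early_stop_count` settings in the training configuration suggest that training stops early once the model stops improving. A common way to implement this is a patience counter, sketched below. The class name `EarlyStopper`, the `min_delta` parameter, and the use of validation loss as the monitored value are assumptions made for illustration; the repository's training loop may differ.

```python
class EarlyStopper:
    """Stop training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # plays the role of early_stop_count
        self.min_delta = min_delta    # smallest decrease that counts as progress
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses: improvement stalls after the third epoch.
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.58, 0.61]
stopper = EarlyStopper(patience=3)
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch}, best loss {stopper.best_loss:.2f}")
        break
```

With `patience=3`, the loop above stops at epoch 6, because the sixth loss only ties the best value and therefore counts as a third epoch without improvement.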
- Effective Web-platform UI.
- Training through CPU / GPU.
- Basic Logging & Reports.
- MLflow Integration.
- W&B Integration.
- Endpoint Service.