Adding a new graph neural network from the class of MP-GCNs

1. New graph layer

Add a class MyGraphLayer in the file my_graph_layer.py in the layers/ directory. A standard template is:

import torch
import torch.nn as nn
import dgl

class MyGraphLayer(nn.Module):
    
    def __init__(self, in_dim, out_dim, dropout):
        super().__init__()

        # write your code here
        
    def forward(self, g_in, h_in, e_in):
        
        # write your code here
        # write the dgl reduce and updates function call here

        return h_out, e_out

The layers/ directory contains the layer classes for all graph networks, as well as standard layers such as the MLP used for readout.

For instance, the GCN layer class GCNLayer() is defined in the layers/gcn_layer.py file.
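
The "reduce and update" comment in the template refers to the message-passing pattern: each node collects the features of its in-neighbours (reduce) and then transforms the result (update). The sketch below is dependency-free and purely illustrative. It uses plain Python lists instead of DGL graphs and torch tensors, mean aggregation is an assumed choice, and the names mean_aggregate and update are hypothetical.

```python
from collections import defaultdict


def mean_aggregate(num_nodes, edges, h):
    """Reduce step: average the feature vectors of each node's in-neighbours.

    edges is a list of (src, dst) pairs; h[i] is the feature vector of node i.
    Nodes without incoming edges keep their own features.
    """
    inbox = defaultdict(list)
    for src, dst in edges:
        inbox[dst].append(h[src])  # message = feature vector of the source node

    h_agg = []
    for node in range(num_nodes):
        messages = inbox.get(node)
        if not messages:
            h_agg.append(list(h[node]))
            continue
        dim = len(messages[0])
        h_agg.append([sum(m[k] for m in messages) / len(messages) for k in range(dim)])
    return h_agg


def update(h_agg, weight):
    """Update step: linear map followed by ReLU, applied to every node."""
    h_out = []
    for vec in h_agg:
        row = [sum(w * x for w, x in zip(w_row, vec)) for w_row in weight]
        h_out.append([max(0.0, v) for v in row])
    return h_out


# Toy graph with edges 0 -> 1, 1 -> 2 and 0 -> 2, and 2-dimensional features.
edges = [(0, 1), (1, 2), (0, 2)]
h_in = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

h_out = update(mean_aggregate(3, edges, h_in), identity)
```

In an actual DGL layer, the reduce step would typically be expressed with g.update_all(message_func, reduce_func) and the update step with an nn.Linear module, so that everything stays batched and differentiable.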


2. New graph network

Add a class MyGraphNetwork in the file my_gcn_net.py in the nets/ directory. The loss() function of the network is also defined in the class MyGraphNetwork.

import torch
import torch.nn as nn
import dgl

from layers.my_graph_layer import MyGraphLayer

class MyGraphNetwork(nn.Module):
    
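    def __init__(self, net_params):
        super().__init__()

        # write your code here
        # net_params is the dict passed by gnn_model(); the keys below are
        # examples and must match the "net_params" entry of your config file
        in_dim = net_params['in_dim']
        out_dim = net_params['out_dim']
        dropout = net_params['dropout']
        self.layer = MyGraphLayer(in_dim, out_dim, dropout)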
        
    def forward(self, g_in, h_in, e_in):
        
        # write your code here
        # write the dgl reduce and updates function call here

        return h_out

    def loss(self, pred, label):

        # write your loss function here

        return loss

Register the new graph network class under the name MyGNN in the file load_net.py in the nets/ directory.

from nets.my_gcn_net import MyGraphNetwork

def MyGNN(net_params):
    return MyGraphNetwork(net_params)

def gnn_model(MODEL_NAME, net_params):
    models = {
        'MyGNN': MyGNN
    }
    return models[MODEL_NAME](net_params)

For the ZINC example, GCNNet() in nets/molecules_graph_regression/gcn_net.py is given the GNN name GCN in nets/molecules_graph_regression/load_net.py.
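
The name-to-class lookup above can hold several networks at once. The following is a minimal sketch of such a registry with a clearer error for unknown names. MyOtherNetwork and the key 'MyOtherGNN' are hypothetical, and the stand-in classes only store their parameters so that the example runs without torch.

```python
class MyGraphNetwork:
    """Stand-in for the real network; it only stores its parameters."""

    def __init__(self, net_params):
        self.net_params = net_params


class MyOtherNetwork(MyGraphNetwork):
    """Second hypothetical network, used to show a registry with two entries."""


# Maps the name used in the config file to the class that implements it.
MODELS = {
    "MyGNN": MyGraphNetwork,
    "MyOtherGNN": MyOtherNetwork,
}


def gnn_model(model_name, net_params):
    """Build the network registered under model_name."""
    try:
        model_cls = MODELS[model_name]
    except KeyError:
        known = ", ".join(sorted(MODELS))
        raise ValueError(f"Unknown model '{model_name}'. Available: {known}") from None
    return model_cls(net_params)


model = gnn_model("MyGNN", {"L": 4, "hidden_dim": 70})
```

Mapping names directly to classes avoids one wrapper function per network; the per-network wrapper used above is equally valid if a network needs extra setup.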


3. Define the training/testing loops of the new task

Add a file train_data_my_new_task.py in the train/ directory.

def train_epoch_sparse(model, optimizer, device, data_loader, epoch):
    model.train()

    # write your code here
    
    return train_loss, train_acc

def evaluate_network_sparse(model, device, data_loader):
    model.eval()

    # write your code here
        
    return test_loss, test_acc

For ZINC, the loops are defined in file train/train_molecules_graph_regression.py.
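
As a rough illustration of what can go in place of "write your code here", the sketch below fills in the training function. It rests on several assumptions that the template does not state: each batch is a (graphs, labels) pair, the batched DGL graph stores node and edge features under the key 'feat', and the metric is supplied through metric_fn, a hypothetical extra argument added here only to keep the sketch task-agnostic.

```python
def train_epoch_sparse(model, optimizer, device, data_loader, epoch, metric_fn=None):
    """Run one training epoch; return (mean loss, mean metric) over the batches.

    epoch is unused here and kept only to match the template's signature.
    The metric is None when no metric_fn is given.
    """
    model.train()
    total_loss = 0.0
    total_metric = 0.0
    num_batches = 0

    for batch_graphs, batch_labels in data_loader:
        # Assumption: moving the batched graph also moves its features.
        batch_graphs = batch_graphs.to(device)
        batch_labels = batch_labels.to(device)
        h = batch_graphs.ndata["feat"]  # assumed feature key
        e = batch_graphs.edata["feat"]  # assumed feature key

        optimizer.zero_grad()
        pred = model(batch_graphs, h, e)
        loss = model.loss(pred, batch_labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        if metric_fn is not None:
            total_metric += metric_fn(pred, batch_labels)
        num_batches += 1

    num_batches = max(num_batches, 1)  # avoid dividing by zero on an empty loader
    train_loss = total_loss / num_batches
    train_acc = total_metric / num_batches if metric_fn is not None else None
    return train_loss, train_acc
```

An evaluation function would follow the same structure, with model.eval(), the forward pass wrapped in torch.no_grad(), and no optimizer calls.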


4. Main code

Add a new notebook main_my_new_task.ipynb or a Python script main_my_new_task.py for the new task.

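import torch
import torch.optim as optim
from torch.utils.data import DataLoader

from nets.load_net import gnn_model
from data.data import LoadData
from train.train_data_my_new_task import train_epoch_sparse as train_epoch, evaluate_network_sparse as evaluate_network

# params and net_params are the "params" and "net_params" entries of the config file (see step 5)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

DATASET_NAME = 'MY_DATASET'
dataset = LoadData(DATASET_NAME)
trainset = dataset.train  # assumes the dataset object exposes its training split as .train

MODEL_NAME = 'MyGNN'
model = gnn_model(MODEL_NAME, net_params).to(device)

optimizer = optim.Adam(model.parameters())
train_loader = DataLoader(trainset, batch_size=128, shuffle=True, collate_fn=dataset.collate)

for epoch in range(params['epochs']):
    epoch_train_loss, epoch_train_acc = train_epoch(model, optimizer, device, train_loader, epoch)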

The Python file main_my_new_task.py can be generated by saving the notebook main_my_new_task.ipynb as a regular Python file. (We developed our new graph networks within the notebook and then converted the .ipynb to .py, but the script can also be written directly as a .py file.)

For ZINC, the main file is main_molecules_graph_regression.ipynb or main_molecules_graph_regression.py.


5. Run code

The code can be run in the notebook main_my_new_task.ipynb or from a terminal with the command

python main_my_new_task.py --dataset DATASET_NAME --gpu_id 0 --config 'configs/my_new_task_MyGNN_DATASET_NAME.json'

The training and network parameters for the dataset and the network are stored in a JSON file in the configs/ directory.

{
    "gpu": {
        "use": true,
        "id": 0
    },
    
    "model": "MyGNN",
    "dataset": "DATASET_NAME",
    
    "out_dir": "out/my_new_task/",
    
    "params": {
        "seed": 41,
        "epochs": 1000,
        "batch_size": 128,
        "init_lr": 0.001
    },
    
    "net_params": {
        "L": 4,
        "hidden_dim": 70,
        "out_dim": 70,
        "residual": true
    }
}

For ZINC, the config is molecules_graph_regression_GCN_ZINC_100k.json and the code is run with

python main_molecules_graph_regression.py --dataset ZINC --gpu_id 0 --config 'configs/molecules_graph_regression_GCN_ZINC_100k.json'
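
To show how the command-line flags and the JSON config can fit together, here is a minimal sketch of a loader for main_my_new_task.py. It assumes that --dataset and --gpu_id, when given, override the corresponding config entries; the source does not state the precedence, and load_config, parse_args and main are hypothetical names.

```python
import argparse
import json


def load_config(config_path, dataset=None, gpu_id=None):
    """Read a JSON config file and apply optional command-line overrides."""
    with open(config_path) as f:
        config = json.load(f)

    # Assumption: command-line values take precedence over the file.
    if dataset is not None:
        config["dataset"] = dataset
    if gpu_id is not None:
        config["gpu"]["id"] = gpu_id
    return config


def parse_args(argv=None):
    """Parse the three flags used in the run command."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="path to the JSON config file")
    parser.add_argument("--dataset", help="overrides the 'dataset' entry of the config")
    parser.add_argument("--gpu_id", type=int, help="overrides the 'gpu.id' entry of the config")
    return parser.parse_args(argv)


def main(argv=None):
    """Return the pieces of the config that the main script needs."""
    args = parse_args(argv)
    config = load_config(args.config, dataset=args.dataset, gpu_id=args.gpu_id)
    return config["model"], config["dataset"], config["params"], config["net_params"]
```

Calling main() with no arguments reads the flags from the command line; the returned params and net_params dicts can then be passed to the training loop and to gnn_model().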