Monitor Training with Tensorboard #105

PauliSpin · 2023-05-03T07:36:30Z

I am training the OpenChatKit-7B model with an increased number of iterations (among other changes) and want to monitor training quality with TensorBoard, but I have not managed to get it to work. I have added a SummaryWriter to the test_loop function in dist_clm_train.py:

    ...
    ...
    loss = torch.tensor(loss_list).mean()
    ppls = torch.exp(loss)
    metric = {"valid.perplexity": ppls.item(), "valid.loss": loss.item()}

    # ADDED to calculate tensorboard scalars 
    metric = {"train.perplexity": ppls.item(), "train.loss": loss.item()}
    train_log(metric, step=pipe.global_step)
    tb.add_scalar('train/perplexity', ppls.item(), tmpStep)
    tb.add_scalar('train/loss', loss.item(), tmpStep)
    tmpStep += 1
    # END of ADDED

    ...
    ...
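The snippet above uses `tb` and `tmpStep` without showing where they are created. For reference, here is a minimal standalone sketch of the logging pattern, assuming `tb` is meant to be a `torch.utils.tensorboard.SummaryWriter`. The helper name `log_eval_metrics`, the log directory `runs/openchatkit-demo`, and the loss values are hypothetical and are not part of OpenChatKit; the helper takes the step as an argument instead of keeping a separate counter.

```python
import math


def log_eval_metrics(writer, loss_list, step, prefix="valid"):
    """Log the mean loss and perplexity of one evaluation pass.

    `writer` is any object with an `add_scalar(tag, value, global_step)`
    method, for example torch.utils.tensorboard.SummaryWriter.
    """
    mean_loss = sum(loss_list) / len(loss_list)
    perplexity = math.exp(mean_loss)  # perplexity = exp(mean cross-entropy loss)
    writer.add_scalar(f"{prefix}/loss", mean_loss, step)
    writer.add_scalar(f"{prefix}/perplexity", perplexity, step)
    return {f"{prefix}.loss": mean_loss, f"{prefix}.perplexity": perplexity}


def main():
    # Imported here so the helper above can be used without TensorBoard installed.
    from torch.utils.tensorboard import SummaryWriter

    # Hypothetical log directory; create the writer once, outside the loop.
    writer = SummaryWriter(log_dir="runs/openchatkit-demo")
    # Made-up per-batch losses for two evaluation passes.
    for step, losses in enumerate([[2.3, 2.1], [1.9, 1.8]]):
        log_eval_metrics(writer, losses, step)
    writer.flush()  # make sure events are written to disk
    writer.close()
```

Calling `main()` and then running `tensorboard --logdir runs` should show the two scalars. In the real training script the step would presumably be `pipe.global_step`, so that the curves line up with the values passed to `train_log`.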

Could you advise whether this is possible and, if so, how it can be done? Any help or guidance would be much appreciated.

Many thanks,
