Transition Guide from 0.3.x to 0.4.0
Author: Justus Schock (@justusschock)
Delira 0.4.0 offers a couple of new features which are essential for future development. At the core of all changes is the unified training and prediction API: we completely rewrote the trainer into a unified class and introduced a Predictor class. Unfortunately, these changes break backward compatibility quite a bit. Let's first see how we trained and defined a model in delira 0.3.2:
Let's assume we just want to train a very simple network consisting of 3 fully connected layers in PyTorch.
Our model definition would probably look something like this:
from delira.models import AbstractPyTorchNetwork
import torch
import logging
class SimpleNet(AbstractPyTorchNetwork):
    def __init__(self, num_inputs=32, num_outputs=10, num_hidden=64):
        super().__init__()
        # build our actual network, just use some linear layers and relus here
        self.fc1 = torch.nn.Linear(num_inputs, num_hidden)
        self.fc2 = torch.nn.Linear(num_hidden, num_hidden)
        self.fc3 = torch.nn.Linear(num_hidden, num_outputs)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        # pass our tensor x through all the layers
        return self.fc3(self.relu(self.fc2(self.relu(self.fc1(x)))))
    @staticmethod
    def closure(model, data_dict, optimizers, criterions=None, metrics=None, fold=0, **kwargs):
        # initialize variables
        if criterions is None:
            criterions = {}
        if metrics is None:
            metrics = {}
        assert (optimizers and criterions) or not optimizers, \
            "Criterion dict cannot be empty if optimizers are passed"
        loss_vals = {}
        metric_vals = {}
        total_loss = 0

        # choose suitable context manager
        if optimizers:
            context_man = torch.enable_grad
        else:
            context_man = torch.no_grad

        with context_man():
            # obtain predictions from network
            inputs = data_dict.pop("data")
            preds = model(inputs)

            # calculate losses
            if data_dict:
                for key, crit_fn in criterions.items():
                    _loss_val = crit_fn(preds, *data_dict.values())
                    loss_vals[key] = _loss_val.item()
                    total_loss += _loss_val

                # calculate metrics
                with torch.no_grad():
                    for key, metric_fn in metrics.items():
                        metric_vals[key] = metric_fn(preds, *data_dict.values()).item()

        # backpropagation
        if optimizers:
            optimizers["default"].zero_grad()
            total_loss.backward()
            optimizers["default"].step()

        # log values
        for key, val in {**metric_vals, **loss_vals}.items():
            logging.info({"scalar": {"name": key, "value": val}})

        return metric_vals, loss_vals, [preds.detach()]
    @staticmethod
    def prepare_batch(batch: dict, input_device, output_device):
        return_dict = {"data": torch.from_numpy(batch.pop("data")).to(input_device).to(torch.float)}

        for key, val in batch.items():
            return_dict[key] = torch.from_numpy(val).to(output_device).to(torch.float)

        return return_dict
For simplicity, we did not log predictions, did not enable mixed precision via APEX and left out all the docstrings. We also implemented the prepare_batch method in a way that lets us use a simple L1 error for training.
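As a quick illustration of what prepare_batch produces (the batch values below are made up), the numpy arrays coming from the datamanager are converted into float tensors on the requested devices:
import numpy as np
import torch

# a fake batch as the datamanager would deliver it (numpy arrays)
batch = {"data": np.random.rand(64, 5), "label": np.random.rand(64, 10)}
prepared = SimpleNet.prepare_batch(batch, torch.device("cpu"), torch.device("cpu"))
print(prepared["data"].dtype, prepared["data"].shape)    # torch.float32 torch.Size([64, 5])
print(prepared["label"].dtype, prepared["label"].shape)  # torch.float32 torch.Size([64, 10])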
To train a model, we also need a dataset, which will provide the actual data. For this example we use an artificial dataset, which creates random arrays as input and output:
from delira.data_loading import AbstractDataset
import numpy as np
class RandomDataset(AbstractDataset):
    def __init__(self, length, num_inputs, num_outputs):
        super().__init__(None, None)
        # set attributes for length, number of inputs and number of outputs
        self._length = length
        self._num_inputs = num_inputs
        self._num_outputs = num_outputs

    def __getitem__(self, index):
        # sample random data
        input_data = np.random.rand(self._num_inputs)
        output_data = np.random.rand(self._num_outputs)
        return {"data": input_data, "label": output_data}

    def get_sample_from_index(self, index):
        return self.__getitem__(index)

    def __len__(self):
        return self._length
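Just to illustrate how this dataset behaves (the sizes below are arbitrary example values):
dset = RandomDataset(length=100, num_inputs=5, num_outputs=10)
sample = dset[0]
print(sample["data"].shape, sample["label"].shape)  # (5,) (10,)
print(len(dset))                                    # 100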
Next we set up our hyperparameters for training as well as the model kwargs:
import torch
from delira.training import Parameters
params = Parameters(fixed_params={
    "model": {
        "num_inputs": 5,
        "num_hidden": 20,
        "num_outputs": 10
    },
    "training": {
        "batch_size": 64,  # batchsize to use
        "num_epochs": 10,  # number of epochs to train
        "optimizer_cls": torch.optim.Adam,  # optimization algorithm to use
        "optimizer_params": {'lr': 1e-3},  # initialization parameters for this algorithm
        "losses": {"L1": torch.nn.L1Loss()},  # the loss function
        "lr_sched_cls": None,  # the learning rate scheduling algorithm to use
        "lr_sched_params": {},  # the corresponding initialization parameters
        "metrics": {"MSE": torch.nn.MSELoss()}  # and some evaluation metrics
    }
})
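The datamanagers below look up the batch size via params.nested_get, which (roughly speaking) retrieves a key from anywhere inside the nested parameter tree. A quick illustration, assuming the params object defined above:
print(params.nested_get("batch_size"))  # 64
print(params.nested_get("num_epochs"))  # 10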
After we created our parameters and thereby defined our actual training, we just need to create instances of our dataset for training and validation and wrap them into datamanagers:
from delira.data_loading import BaseDataManager, SequentialSampler, RandomSampler
dset_train = RandomDataset(5000, 5, 10)
dset_val = RandomDataset(500, 5, 10)
manager_train = BaseDataManager(dset_train, params.nested_get("batch_size"),
                                transforms=None, sampler_cls=RandomSampler,
                                n_process_augmentation=4)
manager_val = BaseDataManager(dset_val, params.nested_get("batch_size"),
                              transforms=None, sampler_cls=SequentialSampler,
                              n_process_augmentation=4)
For simplicity, we just omitted any transforms here. Now, we just have to create an Experiment and run it:
from delira.training import PyTorchExperiment
from delira.training.train_utils import create_optims_default_pytorch
experiment = PyTorchExperiment(params, SimpleNet,
                               name="ClassificationExample",
                               save_path="./tmp/delira_Experiments",
                               optim_builder=create_optims_default_pytorch,
                               gpu_ids=[0])
experiment.save()
model = experiment.run(manager_train, manager_val)
And that's it. Training should start now. The PyTorchExperiment internally creates a PyTorchNetworkTrainer and calls its train method. If we wanted to do the same with TensorFlow, we would have to use a TfExperiment instead, which creates and calls a TfNetworkTrainer. Sounds simple, right? Theoretically it is that simple - with just one problem:
So far, we had an AbstractTrainer specifying an interface which was implemented by both the PyTorchNetworkTrainer and the TfNetworkTrainer (and the same goes for the experiment classes). These implementations were not coupled and shared almost no code. Thus, the behavior of both trainers - although providing the same API - could differ slightly or (in the worst case) be completely different. Additionally, there was a lot of code duplication inside the framework.
To solve this, we implemented a BaseNetworkTrainer and a BaseExperiment, which contain most of the training and experiment code in a backend-agnostic way, while the backend-specific classes (PyTorchNetworkTrainer, TfNetworkTrainer, PyTorchExperiment and TfExperiment) just add the backend-specific parts like seeding, switching between training and validation mode, and backend-specific options for initialization and potential speedups.
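Conceptually, the new layout looks roughly like this (a rough sketch for illustration, not the literal delira source):
# rough conceptual sketch of the 0.4.0 class layout (not the actual delira source)
class BaseNetworkTrainer:
    """Backend-agnostic training logic, shared by all backends."""

class PyTorchNetworkTrainer(BaseNetworkTrainer):
    """Adds the PyTorch-specific parts: seeding, train/eval switching, device handling."""

class TfNetworkTrainer(BaseNetworkTrainer):
    """Adds the TensorFlow-specific parts in the same way."""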
There was another problem: if we wanted to test a model on an extra test dataset, the way to go was to partially initialize a full trainer with dummy values and then call its predict function. This is extremely inefficient (in terms of memory and computation time, as the trainer setup usually involves checking many conditions and eventually wrapping classes in other classes or changing arguments based on these conditions) and also not framework-agnostic.
To solve this, we created a new base class for the BaseExperiment: the Predictor. The Predictor provides basic functionality for prediction and metric calculation. These capabilities are extended by the training-related parts in the BaseExperiment.
While doing so, we also solved another problem: delira's data_loading package was designed with huge datasets in mind, which cannot be stored completely in RAM, and therefore provides classes such as BaseLazyDataset. When predicting from a dataset, however, we previously cached all predictions for one epoch. Although we only cached predictions for the validation/test set, which is usually much smaller than the actual training set, even these sets may cause out-of-memory errors.
To solve this, we made the Predictor return a generator when predicting from a datamanager instead of internally caching everything.
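The practical consequence is that predictions can now be consumed batch by batch instead of being accumulated in memory. A minimal sketch of the idea (the function below is a placeholder for illustration, not the actual Predictor API):
# conceptual sketch: yield predictions lazily instead of caching a whole epoch
def predict_lazily(model, batches):
    for batch in batches:           # e.g. batches produced by a datamanager
        yield model(batch["data"])  # one batch of predictions at a time

# the caller decides what (if anything) to keep in memory:
# for preds in predict_lazily(model, manager_val):
#     ...  # process or discard each batch immediately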
Now let's have a look at the new API and the changes:
With the new API we need to change a few minor things to make the training work again:
First let's have a look at our new network definition:
from delira.models import AbstractPyTorchNetwork
import torch
import logging
class SimpleNet(AbstractPyTorchNetwork):
    def __init__(self, num_inputs=32, num_outputs=10, num_hidden=64):
        super().__init__()
        # build our actual network, just use some linear layers and relus here
        self.fc1 = torch.nn.Linear(num_inputs, num_hidden)
        self.fc2 = torch.nn.Linear(num_hidden, num_hidden)
        self.fc3 = torch.nn.Linear(num_hidden, num_outputs)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        # pass our tensor x through all the layers
        out = self.fc3(self.relu(self.fc2(self.relu(self.fc1(x)))))
        # NOTE: models must now return a dict; it can contain an arbitrary number of elements and arbitrary types
        return {"pred": out}
    # NOTE: criterions were renamed to losses
    @staticmethod
    def closure(model, data_dict, optimizers, losses=None, metrics=None, fold=0, **kwargs):
        # initialize variables
        if losses is None:
            losses = {}
        if metrics is None:
            metrics = {}
        assert (optimizers and losses) or not optimizers, \
            "Loss dict cannot be empty if optimizers are passed"
        loss_vals = {}
        metric_vals = {}
        total_loss = 0

        # choose suitable context manager
        if optimizers:
            context_man = torch.enable_grad
        else:
            context_man = torch.no_grad

        with context_man():
            # obtain predictions from network
            inputs = data_dict.pop("data")
            preds = model(inputs)

            # calculate losses
            if data_dict:
                for key, crit_fn in losses.items():
                    # NOTE: to access the actual tensor, we need to index the resulting dict
                    _loss_val = crit_fn(preds["pred"], *data_dict.values())
                    loss_vals[key] = _loss_val.item()
                    total_loss += _loss_val

                # calculate metrics
                with torch.no_grad():
                    for key, metric_fn in metrics.items():
                        metric_vals[key] = metric_fn(preds["pred"], *data_dict.values()).item()

        # backpropagation
        if optimizers:
            optimizers["default"].zero_grad()
            total_loss.backward()
            optimizers["default"].step()

        # log values
        for key, val in {**metric_vals, **loss_vals}.items():
            logging.info({"scalar": {"name": key, "value": val}})

        # NOTE: closure returns only dicts now
        return metric_vals, loss_vals, {k: v.detach() for k, v in preds.items()}
    @staticmethod
    def prepare_batch(batch: dict, input_device, output_device):
        return_dict = {"data": torch.from_numpy(batch.pop("data")).to(input_device).to(torch.float)}

        for key, val in batch.items():
            return_dict[key] = torch.from_numpy(val).to(output_device).to(torch.float)

        return return_dict
So altogether, the major changes inside the model were to make everything a dict (which is needed during training and prediction) and to rename criterions to losses for conformity between backends.
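A quick, purely illustrative check of the new return type:
import torch

net = SimpleNet(num_inputs=5, num_hidden=20, num_outputs=10)
out = net(torch.rand(4, 5))
print(type(out), out["pred"].shape)  # <class 'dict'> torch.Size([4, 10])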
The new training API in delira 0.4.0 does not change the dataset API, meaning that our dataset from before should work as is (the dataset API changed from delira 0.3.1 to delira 0.3.2, as you can see here).
For training preparation, the names of our hyperparameters have changed a bit. We now have to use val_metrics and train_metrics instead of metrics: currently, the train_metrics are still calculated within the closure and thus work on the backend's tensor class, whereas the val_metrics operate on numpy arrays, since validation is no longer done inside the closure but automatically inside the trainer/predictor. The next major release (0.5.0) will probably re-unite the metrics and move the calculation of the train metrics outside the closure.
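To make the distinction concrete (an illustration, not delira code): a train metric would typically be a torch-based function that receives tensors, while a val metric receives numpy arrays:
import numpy as np
import torch
from sklearn.metrics import mean_squared_error

# train_metrics style: operates on the backend's tensors (inside the closure)
mse_torch = torch.nn.MSELoss()(torch.rand(4, 10), torch.rand(4, 10))          # torch scalar tensor
# val_metrics style: operates on numpy arrays (outside the closure)
mse_numpy = mean_squared_error(np.random.rand(4, 10), np.random.rand(4, 10))  # python float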
Our new hyperparameters are:
import torch
from sklearn.metrics import mean_squared_error
from delira.training import Parameters
params = Parameters(fixed_params={
    "model": {
        "num_inputs": 5,
        "num_hidden": 20,
        "num_outputs": 10
    },
    "training": {
        "batch_size": 64,  # batchsize to use
        "num_epochs": 10,  # number of epochs to train
        "optimizer_cls": torch.optim.Adam,  # optimization algorithm to use
        "optimizer_params": {'lr': 1e-3},  # initialization parameters for this algorithm
        "losses": {"L1": torch.nn.L1Loss()},  # the loss function
        "lr_sched_cls": None,  # the learning rate scheduling algorithm to use
        "lr_sched_params": {},  # the corresponding initialization parameters
        # NOTE: the name and the argument types of the metrics have changed!
        "val_metrics": {"MSE": mean_squared_error}  # and some evaluation metrics
    }
})
Creating the datasets and wrapping them into datamanagers remains the same as before.
Our experiment class now accepts an additional key_mapping argument, which is a dict. This dict defines the mapping of data keys inside our batch dict to the keys accepted by our model when calling it.
Since calling a PyTorch model executes its forward and our forward has one parameter x, we need to define the key mapping as key_mapping={"x": "data"}, which means that the value lying under the data key inside the batch dict will be passed as x to our model. This is necessary since the trainer automatically validates the network if a validation set is given. Due to this, the closure should not contain any data-dependent operations that are necessary for prediction and training; such operations should be moved into the preprocessing or into the model's forward instead.
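In other words, the trainer roughly does something like the following when it calls the model (a conceptual illustration, not the literal trainer code):
import torch

model = SimpleNet(num_inputs=5, num_hidden=20, num_outputs=10)
key_mapping = {"x": "data"}
batch = {"data": torch.rand(4, 5), "label": torch.rand(4, 10)}

# map batch keys to the model's forward arguments and call the model with them
model_kwargs = {arg_name: batch[batch_key] for arg_name, batch_key in key_mapping.items()}
preds = model(**model_kwargs)  # equivalent to model(x=batch["data"])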
The new training now looks like:
from delira.training import PyTorchExperiment
from delira.training.train_utils import create_optims_default_pytorch
experiment = PyTorchExperiment(params, SimpleNet,
                               name="ClassificationExample",
                               save_path="./tmp/delira_Experiments",
                               optim_builder=create_optims_default_pytorch,
                               key_mapping={"x": "data"},
                               gpu_ids=[0])
experiment.save()
model = experiment.run(manager_train, manager_val)
Hopefully the changes from delira 0.3.2 to delira 0.4.0 are now clear to you.
If there are any questions left, feel free to contact us. The best way to do so is via our slack community or by just opening an issue at this repo.