
Tensorflow within Peloton


Overview

TensorFlow (TF) within Peloton can be used to create and use deep learning models for miscellaneous learning tasks. Currently it is being used by the workload forecasting modules (brain/workload). The diagram below summarizes the process involved in setting up and running a TF model. An important point to keep in mind is that there is a multi-language TF dependency: the actual TF models need to be written in Python (because its API is highly documented and stable compared to the other language bindings), after which they are serialized to Protobuf. This serialized export can then be imported into C++ and used for training and prediction.

Steps

Packages and Installation

We need both the C and Python Tensorflow APIs:

  1. Python (3.6): pip install --upgrade tensorflow.
  2. C API: The TF C API can be installed as explained here. It is a native API described on the TF website as suitable for building bindings for other languages.
    Since Google offers prebuilt binaries for the C API, its installation is relatively lightweight and fast.

Quirks in Setup

Getting TF to work within Travis/Jenkins is a bit tricky. This is mainly because the prebuilt binaries often put constraints on the OS environment.

  1. Protobuf 3.4.0+ is needed for the newest versions of TensorFlow. It is available via Homebrew on macOS but not via apt-get on Ubuntu, so it has to be built from source there. From Ubuntu 18.04 onwards, Maarten Fonwille's PPA can be used instead.
  2. Newer versions of TF (>= 1.5.0) don't work correctly on Ubuntu 14.04, so TF 1.4.0 needs to be used on that platform.

One alternative we explored (and worth revisiting in the future) is the TF C++ API. It traditionally requires a long Bazel installation, build, and additional setup. An easier way of installing it is described here, but this still requires installing Bazel and building TensorFlow with it. The overall process is a bit tricky to get right and time-consuming.

Python Model Setup

The main DL model has to be written in Python. An example LSTM model is available at src/brain/modelgen/LSTM.py. When building a model there are three important things to follow:

  1. Named Input/Output Graph Nodes: The graph nodes which accept some sort of input (placeholders) and which return a result (prediction output/error metric) should be given appropriate names. These names are used to refer to these nodes from within C++ and to pass data to them. For example, you'll notice the data_, target_, and lossOp_ names in the LSTM code.
  2. Passing Arguments by CLI: The Python script should be runnable from the command line, with all relevant model parameters passed as arguments to the script (using a library such as argparse).
  3. Protobuf Export Code: The write_graph method should be directly reused and called in the main function as:
# Parse the CLI arguments described above
args = parser.parse_args()
# Build and initialize the model from those arguments
model = LSTM(args.nfeats, args.nencoded, args.nhid,
             args.nlayers, args.lr, args.dropout_ratio,
             args.clip_norm)
model.tf_init()
# Serialize the graph to protobuf at the requested output path
model.write_graph(' '.join(args.graph_out_path))

To give some perspective on why we are doing all this, here's a little lookahead. The way things work within C++ is as follows: first, the C++ model constructor runs the Python script to build and serialize the TensorFlow graph on the fly. Immediately after, we import the serialized model and use the C API to call its named nodes, passing inputs and reading outputs. Finally, we clean up the exported model in the C++ model destructor.

C++ Model Control

TF Session Entity

The TF Session Entity uses the TF C API for fine-grained model control, providing clients higher-level functions that abstract away the implementation details. There are 3 main classes at brain/util/tf_session_entity:

  1. TF Session Entity Input: Used to define inputs so that the TF graph can accept C++ data types. It currently works with native arrays and std::vector. Example usage is as follows:
// Inputs for backprop
std::vector<TfFloatIn *> inputs_optimize{
        // flattened C++ vectors (works even without flattening,
        // but if flattened the dimensions have to be passed separately)
        new TfFloatIn(data_batch.data(), dims, "data_"),
        new TfFloatIn(target_batch.data(), dims, "target_"),
        // single float value
        new TfFloatIn(dropout_ratio_, "dropout_ratio_")};
  2. TF Session Entity Output: Used to define outputs so that results from the TF graph can be read back as C++ data types. Example usage:
// loss output
auto output_loss = new TfFloatOut("lossOp_");
// prediction output
auto output_predict = new TfFloatOut("pred_");
  3. TF Session Entity: This class accepts the input and output types, evaluates the graph, and returns the relevant result. Example usage:
auto out = this->tf_session_entity_->Eval(inputs_loss, output_loss);
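Tying the three pieces together, a single evaluation call looks roughly like the sketch below. This is only a sketch assembled from the snippets above: the exact return type of Eval and the ownership rules for the input/output wrappers are assumptions and should be checked against brain/util/tf_session_entity.

// (sketch) Compute the loss for one batch: feed the named input nodes, fetch "lossOp_"
std::vector<TfFloatIn *> inputs_loss{
        new TfFloatIn(data_batch.data(), dims, "data_"),
        new TfFloatIn(target_batch.data(), dims, "target_"),
        new TfFloatIn(dropout_ratio_, "dropout_ratio_")};
auto output_loss = new TfFloatOut("lossOp_");
// Eval runs the graph with the given inputs and fetches the named output node
auto loss = this->tf_session_entity_->Eval(inputs_loss, output_loss);
// `loss` now holds the fetched loss value for this batch
// (assumption) the caller frees the wrapper objects once done
for (auto *in : inputs_loss) delete in;
delete output_loss;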

Building a Tensorflow Model class

The BaseTFModel C++ class can be inherited to obtain all the required functionality for automatically constructing the serialized model, using the TF session entity for training/prediction and finally destroying the model. You can refer to brain/workload/lstm.cpp for an example implementation.

Constructor

BaseTFModel needs the path to the Python script, the directory in which to generate the model, and the name of the final generated model. It resolves relative paths to absolute paths automatically (see the member variables).

  1. Calling the Python script: Simply call GenerateModel with all the arguments passed as --argname argvalue.
  2. Importing the Model: Simply call tf_session_entity_->ImportGraph(graph_path_);.
  3. Initializing the TF session: Simply call TFInit(). A rough constructor sketch putting these steps together is shown below.
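For illustration, a hypothetical derived model might wire up these three steps as follows. The BaseTFModel constructor signature, the argument-string format expected by GenerateModel, and the class and parameter names are assumptions made for this sketch; see brain/workload/lstm.cpp for the real implementation.

// Hypothetical model wrapping the LSTM exported by src/brain/modelgen/LSTM.py
class MyForecastModel : public BaseTFModel {
 public:
  MyForecastModel(int nfeats, int nencoded, int nhid)
      // (assumption) BaseTFModel takes: python script path, model directory, model name
      : BaseTFModel("src/brain/modelgen/LSTM.py", "src/brain/modelgen", "LSTM.pb") {
    // 1. Run the python script, passing every model parameter as --argname argvalue
    //    (assumption: GenerateModel accepts the arguments as a single string)
    std::ostringstream args;
    args << "--nfeats " << nfeats << " --nencoded " << nencoded << " --nhid " << nhid;
    GenerateModel(args.str());
    // 2. Import the serialized protobuf graph (graph_path_ is resolved by BaseTFModel)
    tf_session_entity_->ImportGraph(graph_path_);
    // 3. Initialize the TF session
    TFInit();
  }
};
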
Fitting and Prediction

The way this works is completely up to the client. The tf_session_entity_ is available to call upon the graph nodes for backpropagation/loss/prediction, as in the sketch below.
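As an example, one training step followed by a prediction might look like the following. The node name "optimizeOp_" (the graph's training op) is an assumption made for illustration; the other node names come from the TF Session Entity examples above.

// (sketch) Run backpropagation for one batch
std::vector<TfFloatIn *> inputs_optimize{
        new TfFloatIn(data_batch.data(), dims, "data_"),
        new TfFloatIn(target_batch.data(), dims, "target_"),
        new TfFloatIn(dropout_ratio_, "dropout_ratio_")};
// (assumption) the graph exposes its training op under the name "optimizeOp_"
auto output_optimize = new TfFloatOut("optimizeOp_");
tf_session_entity_->Eval(inputs_optimize, output_optimize);

// Prediction reuses the same pattern with the "pred_" output node
std::vector<TfFloatIn *> inputs_predict{
        new TfFloatIn(test_batch.data(), dims, "data_"),
        new TfFloatIn(1.0f, "dropout_ratio_")};  // disable dropout when predicting
auto output_predict = new TfFloatOut("pred_");
auto preds = tf_session_entity_->Eval(inputs_predict, output_predict);
// `preds` holds the fetched predictions for test_batch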
