The idea behind neural style transfer is actually not that hard to grasp if you are already familiar with convolutional neural networks. It was initially presented in the 2015 paper A Neural Algorithm of Artistic Style by Gatys et al. There are many implementations and tutorials available online; this code is based on the nice Colab notebook by Google. If you wish to try this and do not own a GPU, using Colab or another cloud service is recommended, since running on a CPU is very slow (I couldn't even wait for 1000 iterations to finish on my monstrous Core i3).
As the researchers mention in the paper, the key takeaway is that the information about the content and style of an image is disentangled, and we can separate the two by extracting specific features from the image using a CNN. The original paper uses VGG-19 trained on ImageNet for this purpose. I will give a brief high-level overview here. We consider three images: an input image, a content image, and a style image. The input image is the one we modify to obtain the result, while the content and style images provide the content and style features respectively. The problem is formulated as an optimization problem, where we aim to minimize a loss function with respect to the input image. The loss function is a linear combination of content loss and style loss (total variation loss may also be added). Each of these losses has a scalar weight associated with it, which is a hyperparameter. We update the input image using the gradient of the loss function.
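To make this concrete, here is a minimal sketch of one optimization step in TensorFlow's eager style. It assumes `input_image` is a `tf.Variable`, and the `compute_*` helpers are hypothetical stand-ins for the loss terms described below, not functions from the repo:

```python
import tensorflow as tf

# weights mirroring the default hyperparameters listed later in this post
content_weight, style_weight, tv_weight = 1e4, 1e-2, 1e2

def train_step(input_image, optimizer):
    # compute_content_loss, compute_style_loss and compute_tv_loss are
    # hypothetical helpers that evaluate the individual loss terms
    with tf.GradientTape() as tape:
        loss = (content_weight * compute_content_loss(input_image)
                + style_weight * compute_style_loss(input_image)
                + tv_weight * compute_tv_loss(input_image))
    # gradient of the total loss with respect to the image pixels
    grad = tape.gradient(loss, input_image)
    optimizer.apply_gradients([(grad, input_image)])
    return loss
```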
Content loss measures the "difference" between the input and content images. We can take the squared-error loss between their feature representations, which is simply the sum of squared element-wise differences:

$$\mathcal{L}_{content} = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^2$$

where $F^l$ and $P^l$ are the feature maps of the input and content images at layer $l$. Minimizing this loss alone would result in a reconstruction of the content image.
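In TensorFlow this is a one-liner; the function name and arguments below are mine, with `input_features` and `content_features` being the activations of the chosen VGG layer for the two images (the constant factor can be folded into the content weight):

```python
import tensorflow as tf

def content_loss(input_features, content_features):
    # sum of squared element-wise differences between the feature maps
    return tf.reduce_sum(tf.square(input_features - content_features))
```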
Unlike the content loss, the style loss captures differences that are not spatially fixed: if a certain feature appears in, say, the upper right corner of the style image, it does not have to appear in the same part of the input image. We are more interested in the correlations between features than in where they occur. This can be expressed as a distance (like the content loss) between the Gram matrices of the features:

$$\mathcal{L}_{style} = \sum_{l} w_l E_l, \qquad E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^2$$

where $w_l$ represents the weight of the $l$'th layer's loss and

$$G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}$$

is the Gram matrix of the input image's features at layer $l$ ($A^l$ is the corresponding Gram matrix for the style image). I am glossing over the notation here, for which I recommend reading the paper. In my implementation, I don't divide by $4 N_l^2 M_l^2$.
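A sketch of the Gram matrix and the per-layer style loss, again with names of my choosing; `features` is an activation tensor of shape `(height, width, channels)`:

```python
import tensorflow as tf

def gram_matrix(features):
    # flatten the spatial dimensions: (H, W, C) -> (H*W, C)
    channels = int(features.shape[-1])
    flat = tf.reshape(features, [-1, channels])
    # (C, C) matrix of channel-wise feature correlations
    return tf.matmul(flat, flat, transpose_a=True)

def layer_style_loss(input_features, style_features):
    # squared distance between Gram matrices, without the 1/(4 N^2 M^2) factor
    diff = gram_matrix(input_features) - gram_matrix(style_features)
    return tf.reduce_sum(tf.square(diff))
```

The total style loss is then the weighted sum of `layer_style_loss` over the chosen layers.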
TV loss is used to impose smoothness and suppress high-frequency artifacts in the result image. It is defined as the sum of the absolute values (or squares) of the differences between neighboring pixels in the x and y directions; the squared version is

$$\mathcal{L}_{TV} = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2 \right)$$

You may notice that this operation is similar to edge detection. Indeed, the differences give us the "derivative", or "gradient", which is also useful for finding edges; visualizing them would look much like the image filtered with a Sobel filter or something similar. I use the squared version in this implementation.
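The squared version translates almost directly into code; a sketch assuming `image` is a tensor of shape `(batch, height, width, channels)`:

```python
import tensorflow as tf

def tv_loss(image):
    # differences between horizontally and vertically adjacent pixels
    dx = image[:, :, 1:, :] - image[:, :, :-1, :]
    dy = image[:, 1:, :, :] - image[:, :-1, :, :]
    return tf.reduce_sum(tf.square(dx)) + tf.reduce_sum(tf.square(dy))
```

TensorFlow also ships `tf.image.total_variation`, which implements the absolute-value variant.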
The code has plenty of comments. I have used TensorFlow's eager mode because I wanted to try it out and the aforementioned notebook uses it too. The code is for Python 3.
Instructions are for Linux. First of all, clone the repo:
```
git clone https://github.com/iibrahimli/nst.git
```
Then, `cd` into it and install the requirements:

```
cd nst
pip3 install -r requirements.txt
```

Note that to use the GPU (which is highly recommended) with TensorFlow you need to install `tensorflow-gpu` (`pip3 install tensorflow-gpu`) as well as CUDA and cuDNN. You can find the instructions for your platform here.
After these steps are done, you can call the script `nst.py` with the option `-h` to see usage. An example run with

- content image: `examples/baku.jpg`
- style image: `examples/impasto.jpg`
- result image: `baku_impasto.jpg`

```
python3 nst.py -v -c examples/baku.jpg -s examples/impasto.jpg -r baku_impasto.jpg
```
Other parameters may be omitted, in which case the following defaults are used:
- content loss weight: 1e4
- style loss weight: 1e-2
- total variation loss weight: 1e2
- number of iterations: 1000
- initial learning rate: 5.0
- no verbose output
I would recommend experimenting with the weights and seeing what works best.
Here is a matrix of images where the first row contains the style images and the first column the content images.