Skip to content

Latest commit

 

History

History
455 lines (399 loc) · 17.2 KB

apps.md

File metadata and controls

455 lines (399 loc) · 17.2 KB

Introduction of Prediction Interface

PaddleGAN(ppgan.apps)provides prediction APIs covering multiple applications, including super resolution, video frame interpolation, colorization, makeup shifter, image animation, face parsing, etc. The integral pre-trained high-performance models enable users' flexible and efficient usage and inference.

Public Usage

Switch of CPU and GPU

By default, GPU devices with the PaddlePaddle GPU environment package installed conduct inference by using GPU. If the CPU environment package is installed, CPU is used for inference.

If manual switch of CPU and GPU is needed,you can do the following:

import paddle
paddle.set_device('cpu') #set as CPU
#paddle.set_device('gpu') #set as GPU

ppgan.apps.DeOldifyPredictor

ppgan.apps.DeOldifyPredictor(output='output', weight_path=None, render_factor=32)

Build the instance of DeOldify. DeOldify is a coloring model based on GAN. The interface supports the colorization of images or videos. The recommended video format is mp4.

Example

from ppgan.apps import DeOldifyPredictor
deoldify = DeOldifyPredictor()
deoldify.run("docs/imgs/test_old.jpeg")

Parameters

  • output (str): path of the output image, default: output. Note that the save path should be set as output/DeOldify.
  • weight_path (str): path of the model, default: None,pre-trained integral model will then be automatically downloaded.
  • artistic (bool): whether to use "artistic" model, which may produce interesting colors, but there are more glitches.
  • render_factor (int): the zoom factor during image rendering and colorization. The image will be zoomed to a square with side length of 16xrender_factor before being colorized. For example, with a default value of 32,the entered image will be resized to (16x32=512) 512x512. Normally,the smaller the render_factor,the faster the computation and the more vivid the colors. Therefore, old images with low quality usually benefits from lowering the value of rendering factor. The higher the value, the better the image quality, but the color may fade slightly.

run

run(input)

The execution interface after building the instance. Parameters

  • input (str|np.ndarray|Image.Image): the input image or video files。For images, it could be its path, np.ndarray, or PIL.Image type. For videos, it could only be the file path.

Return Value

  • tuple(pred_img(np.array), out_paht(str)): for image input, return the predicted image, PIL.Image type and the path where the image is saved.
  • tuple(frame_path(str), out_path(str)): for video input, frame_path is the save path of the images after colorizing each frame of the video, and out_path is the save path of the colorized video.

run_image

run_image(img)

The interface of image colorization. Parameters

  • img (str|np.ndarray|Image.Image): input image,it could be the path of the image, np.ndarray, or PIL.Image type.

Return Value

  • pred_img(PIL.Image): return the predicted image, PIL.Image type.

run_video

run_video(video)

The interface of video colorization. Parameters

  • Video (str): path of the input video files.

Return Value

  • tuple(frame_path(str), out_path(str)): frame_path is the save path of the images after colorizing each frame of the video, and out_path is the save path of the colorized video.

ppgan.apps.DeepRemasterPredictor

ppgan.apps.DeepRemasterPredictor(output='output', weight_path=None, colorization=False, reference_dir=None, mindim=360)

Build the instance of DeepRemasterPredictor. DeepRemaster is a GAN-based coloring and restoring model, which can provide input reference frames. Only video input is available now, and the recommended format is mp4.

Example

from ppgan.apps import DeepRemasterPredictor
deep_remaster = DeepRemasterPredictor()
deep_remaster.run("docs/imgs/test_old.jpeg")

Parameters

  • output (str): path of the output image, default: output. Note that the path should be set as output/DeepRemaster.
  • weight_path (str): path of the model, default: None,pre-trained integral model will then be automatically downloaded.
  • colorization (bool): whether to enable the coloring function, default: False, only the restoring function will be executed.
  • reference_dir(str|None): path of the reference frame when the coloring function is on, no reference frame is also allowed.
  • mindim(int): minimum side length of the resized image before prediction.

run

run(video_path)

The execution interface after building the instance. Parameters

  • video_path (str): path of the video file.

Return Value

  • tuple(str, str)): return two types of str, the former is the save path of each frame of the colorized video, the latter is the save path of the colorized video.

ppgan.apps.RealSRPredictor

ppgan.apps.RealSRPredictor(output='output', weight_path=None)

Build the instance of RealSR。RealSR, Real-World Super-Resolution via Kernel Estimation and Noise Injection, is launched by CVPR 2020 Workshops in its super resolution model based on real-world images training. The interface imposes 4x super resolution on the input image or video. The recommended video format is mp4.

*Note: the size of the input image should be less than 1000x1000pix。

Example

from ppgan.apps import RealSRPredictor
sr = RealSRPredictor()
sr.run("docs/imgs/test_sr.jpeg")

Parameters

  • output (str): path of the output image, default: output. Note that the path should be set as output/RealSR.
  • weight_path (str): path of the model, default: None,pre-trained integral model will then be automatically downloaded.
run(video_path)

The execution interface after building the instance. Parameters

  • video_path (str): path of the video file.

Return Value

  • tuple(pred_img(np.array), out_paht(str)): for image input, return the predicted image, PIL.Image type and the path where the image is saved.
  • tuple(frame_path(str), out_path(str)): for video input, frame_path is the save path of each frame of the video after super resolution, and out_path is the save path of the video after super resolution.

run_image

run_image(img)

The interface of image super resolution. Parameter

  • img (str|np.ndarray|Image.Image): input image, it could be the path of the image, np.ndarray, or PIL.Image type.

Return Value

  • pred_img(PIL.Image): return the predicted image, PIL.Image type.

run_video

run_video(video)

The interface of video super resolution. Parameter

  • Video (str): path of the video file.

Return Value

  • tuple(frame_path(str), out_path(str)): frame_path is the save path of each frame of the video after super resolution, and out_path is the save path of the video after super resolution.

ppgan.apps.EDVRPredictor

ppgan.apps.EDVRPredictor(output='output', weight_path=None)

Build the instance of RealSR. EDVR is a model designed for video super resolution. For more details, see the paper, EDVR: Video Restoration with Enhanced Deformable Convolutional Networks (https://arxiv.org/abs/1905.02716). The interface imposes 2x super resolution on the input video. The recommended video format is mp4.

*Note: The interface is only available in static graph, add the following codes to enable static graph before using it:

import paddle
paddle.enable_static() #enable static graph
paddle.disable_static() #disable static graph

Parameter

from ppgan.apps import EDVRPredictor
sr = EDVRPredictor()
# test a video file
sr.run("docs/imgs/test.mp4")

参数

  • output (str): path of the output image, default: output. Note that the path should be set as output/EDVR.
  • weight_path (str): path of the model, default: None,pre-trained integral model will then be automatically downloaded.
run(video_path)

The execution interface after building the instance. Parameter

  • video_path (str): path of the video files.

Return Value

  • tuple(str, str): the former is the save path of each frame of the video after super resolution, the latter is the save path of the video after super resolution.

ppgan.apps.DAINPredictor

ppgan.apps.DAINPredictor(output='output', weight_path=Nonetime_step=None, use_gpu=True, key_frame_thread=0remove_duplicates=False)

Build the instance of DAIN model. DAIN supports video frame interpolation, producing videos with higher frame rate. For more details, see the paper, DAIN: Depth-Aware Video Frame interpolation (https://arxiv.org/abs/1904.00830).

*Note: The interface is only available in static graph, add the following codes to enable static graph before using it:

import paddle
paddle.enable_static() #enable static graph
paddle.disable_static() #disable static graph

Example

from ppgan.apps import DAINPredictor
dain = DAINPredictor(time_step=0.5) # With no defualt value, time_step need to be manually specified
# test a video file
dain.run("docs/imgs/test.mp4")

Parameters

  • output_path (str): path of the predicted output, default: output. Note that the path should be set as output/DAIN.
  • weight_path (str): path of the model, default: None, pre-trained integral model will then be automatically downloaded.
  • time_step (float): the frame rate changes by a factor of 1./time_step, e.g. 2x frames if time_step is 0.5 and 4x frames if it is 0.25.
  • use_gpu (bool): whether to make predictions by using GPU, default: True.
  • remove_duplicates (bool): whether to remove duplicates, default: False.
run(video_path)

The execution interface after building the instance. Parameters

  • video_path (str): path of the video file.

Return Value

  • tuple(str, str): for video input, frame_path is the save path of the image after colorizing each frame of the video, and out_path is the save path of the colorized video.

ppgan.apps.FirstOrderPredictor

ppgan.apps.FirstOrderPredictor(output='output', weight_path=Noneconfig=None, relative=False, adapt_scale=Falsefind_best_frame=False, best_frame=None)

Build the instance of FirstOrder model. The model is dedicated to Image Animation, i.e., generating a video sequence so that an object in a source image is animated according to the motion of a driving video.

For more details, see paper, First Order Motion Model for Image Animation (https://arxiv.org/abs/2003.00196) .

Example

from ppgan.apps import FirstOrderPredictor
animate = FirstOrderPredictor()
# test a video file
animate.run("source.png","driving.mp4")

Parameters

  • output_path (str): path of the predicted output, default: output. Note that the path should be set as output/result.mp4.
  • weight_path (str): path of the model, default: None, pre-trained integral model will then be automatically downloaded.
  • config (dict|str|None): model configuration, it can be a dictionary type or a YML file, and the default value None is adopted. When the weight is None by default, the config also needs to adopt the default value None. otherwise, the configuration here should be consistent with the corresponding weight.
  • relative (bool): indicate whether the relative or absolute coordinates of key points in the video are used in the program, default: False.
  • adapt_scale (bool): adapt movement scale based on convex hull of key points, default: False.
  • find_best_frame (bool): whether to start generating from the frame that best matches the source image, which exclusively applies to face applications and requires libraries with face alignment.
  • best_frame (int): set the number of the starting frame, default: None, that is, starting from the first frame(counting from 1).
run(source_imagedriving_video)

The execution interface after building the instance, the predicted video is save in output/result.mp4. Parameters

  • source_image (str): input the source image。
  • driving_video (str): input the driving video, mp4 format recommended.

Return Value

None.

ppgan.apps.FaceParsePredictor

ppgan.apps.FaceParsePredictor(output_path='output')

Build the instance of the face parsing model. The model is devoted to address the task of distributing a pixel-wise label to each semantic components (e.g. hair, lips, nose, ears, etc.) in accordance with the input facial image. The task proceeds with the help of BiseNet.

For more details, see the paper, BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation (https://arxiv.org/abs/1808.00897v1).

*Note: dlib package is needed for this interface, use the following codes to install it:

pip install dlib

It may take long to install this package under Windows, please be patient.

Parameters:

  • input_image: path of the input image to be parsed
  • output_path: path of the output to be saved Example:
from ppgan.apps import FaceParsePredictor
parser = FaceParsePredictor()
parser.run('docs/imgs/face.png')

Return Value:

  • mask(numpy.ndarray): return the mask matrix of the parsed facial components, data type: numpy.ndarray.

ppgan.apps.AnimeGANPredictor

ppgan.apps.AnimeGANPredictor(output_path='output_dir',weight_path=None,use_adjust_brightness=True)

Adopt the AnimeGAN v2 to realize the animation of scenery images.

For more details, see the paper, AnimeGAN: A Novel Lightweight GAN for Photo Animation (https://link.springer.com/chapter/10.1007/978-981-15-5577-0_18). Parameters:

  • input_image: path of the input image to be parsed. Example:
from ppgan.apps import AnimeGANPredictor
predictor = AnimeGANPredictor()
predictor.run('docs/imgs/animeganv2_test.jpg')

Return Value:

  • anime_image(numpy.ndarray): return the stylized scenery image.

ppgan.apps.MiDaSPredictor

ppgan.apps.MiDaSPredictor(output=None, weight_path=None)

MiDaSv2 is a monocular depth estimation model (see https://github.com/intel-isl/MiDaS). Monocular depth estimation is a method used to compute depth from a singe RGB image.

For more details, see the paper Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer (https://arxiv.org/abs/1907.01341v3). Example

from ppgan.apps import MiDaSPredictor
# if set output, will write depth pfm and png file in output/MiDaS
model = MiDaSPredictor()
prediction = model.run()

Color display of the depth image:

import numpy as np
import PIL.Image as Image
import matplotlib as mpl
import matplotlib.cm as cm

vmax = np.percentile(prediction, 95)
normalizer = mpl.colors.Normalize(vmin=prediction.min(), vmax=vmax)
mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
colormapped_im = (mapper.to_rgba(prediction)[:, :, :3] * 255).astype(np.uint8)
im = Image.fromarray(colormapped_im)
im.save('test_disp.jpeg')

Parameters:

  • output (str): path of the output, if it is None, no pfm and png depth image will be saved.
  • weight_path (str): path of the model, default: None, pre-trained integral model will then be automatically downloaded. Return Value:
  • prediction (numpy.ndarray): return the prediction.
  • pfm_f (str): return the save path of pfm files if the output path is set.
  • png_f (str): return the save path of png files if the output path is set.

ppgan.apps.Wav2LipPredictor

ppgan.apps.Wav2LipPredictor(face=None, ausio_seq=None, outfile=None)

Build the instance for the Wav2Lip model, which is used for lip generation, i.e., achieving the synchronization of lip movements on a talking face video and the voice from an input audio.

For more details, see the paper, A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild (http://arxiv.org/abs/2008.10010).

Example

from ppgan.apps import Wav2LipPredictor
import ppgan
predictor = Wav2LipPredictor()
predictor.run('/home/aistudio/先烈.jpeg', '/home/aistudio/pp_guangquan_zhenzhu46s.mp4','wav2lip')

Parameters:

  • face (str): path of images or videos containing human face.
  • audio_seq (str): path of the input audio, any processable format in ffmpeg is supported, including .wav, .mp3, .m4a etc.
  • outfile (str): path of the output video file. Return Value

None