Given a monocular video sequence, our proposed FlashAvatar can reconstruct a high-fidelity digital avatar in minutes; the avatar can then be animated and rendered at over 300 FPS at a resolution of 512×512 on an Nvidia RTX 3090.
This code has been tested on an Nvidia RTX 3090.
Create the environment:
conda env create --file environment.yml
conda activate FlashAvatar
Install PyTorch3D:
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -c bottler nvidiacub
conda install pytorch3d -c pytorch3d
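After installation, it can be useful to confirm the key dependencies resolved correctly before training. This is a minimal sketch, not part of FlashAvatar; the package names are taken from the install steps above, and the helper name is hypothetical:

```python
# Hypothetical sanity check: list which required packages failed to install.
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Packages assumed from the environment and PyTorch3D steps above.
required = ["torch", "pytorch3d"]
print(missing_packages(required))  # an empty list means the setup succeeded
```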
The data is organized in the following form:
dataset
├── <id1_name>
│   ├── alpha    # raw alpha prediction
│   ├── imgs     # extracted video frames
│   └── parsing  # semantic segmentation
├── <id2_name>
└── ...

metrical-tracker
└── output
    ├── <id1_name>
    │   └── checkpoint
    ├── <id2_name>
    └── ...
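Since training expects each subject folder to contain all three subdirectories, a quick layout check can catch preprocessing mistakes early. This is a hedged sketch (the helper and its name are not part of FlashAvatar; only the subdirectory names come from the tree above):

```python
# Hypothetical helper: report which required subdirectories are missing
# under dataset/<id_name> before launching training.
import tempfile
from pathlib import Path

REQUIRED_SUBDIRS = ("alpha", "imgs", "parsing")  # names from the layout above

def missing_subdirs(dataset_root, id_name):
    """Return the required subdirectories missing under dataset/<id_name>."""
    subject = Path(dataset_root) / id_name
    return [d for d in REQUIRED_SUBDIRS if not (subject / d).is_dir()]

# Demo on a throwaway directory: only imgs/ exists, so the other two are flagged.
root = Path(tempfile.mkdtemp())
(root / "subject1" / "imgs").mkdir(parents=True)
print(missing_subdirs(root, "subject1"))  # → ['alpha', 'parsing']
```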
- Evaluating a pre-trained model:
python test.py --idname <id_name> --checkpoint dataset/<id_name>/log/ckpt/chkpnt.pth
- Training on your own data:
python train.py --idname <id_name>
Download the example, which includes pre-processed data and a pre-trained model, to give it a try!
@inproceedings{xiang2024flashavatar,
author = {Jun Xiang and Xuan Gao and Yudong Guo and Juyong Zhang},
title = {FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024},
}