🌅 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

arXiv · Demo Page · Zhihu

😊 Please give us a star ⭐ to support our continuous updates 😊

News

  • 2024.10.14 🔥 We release the Demo page.
  • 2024.10.18 🔥 We release the paper DAWN.
  • 2024.10.21 🔥 We publish the Chinese introduction on Zhihu (知乎).

TODO list:

  • release the inference code
  • release the pretrained 128×128 model
  • release the pretrained 256×256 model
  • in progress ...

Equipment Requirements

With our VRAM-optimized code, the maximum length of video that can be generated scales linearly with the GPU's VRAM: more VRAM allows longer videos.

  • To generate 128×128 videos, we recommend a GPU with at least 12GB of VRAM, which is enough for videos of roughly 400 frames or more.
  • To generate 256×256 videos, we recommend a GPU with at least 24GB of VRAM, which is enough for videos of roughly 200 frames or more.
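The linear VRAM-to-length relationship above can be turned into a back-of-the-envelope estimate. The sketch below is hypothetical: `estimate_max_frames` and `REFERENCE` are not part of the DAWN code, and it assumes frame count is directly proportional to VRAM, using the two recommended configurations as reference points. Because those figures are stated as lower bounds and the fixed memory cost of the model weights is ignored, treat the result as a rough guide only.

```python
# Hypothetical helper (not part of DAWN): rough estimate of the maximum number
# of frames the VRAM-optimized code can generate. ASSUMPTION: frame count is
# directly proportional to VRAM; the README only states a linear relationship,
# and any fixed overhead (e.g. model weights) is ignored here.

# Reference points from the README: resolution -> (recommended VRAM in GB, approx. frames).
REFERENCE = {
    128: (12, 400),  # 128x128: 12 GB -> roughly 400 frames or more
    256: (24, 200),  # 256x256: 24 GB -> roughly 200 frames or more
}


def estimate_max_frames(vram_gb: float, resolution: int) -> int:
    """Scale the README's reference point proportionally to another GPU size."""
    if resolution not in REFERENCE:
        raise ValueError(f"no reference point for {resolution}x{resolution}")
    ref_vram, ref_frames = REFERENCE[resolution]
    return int(ref_frames * vram_gb / ref_vram)


if __name__ == "__main__":
    for res, vram in [(128, 12), (128, 24), (256, 24), (256, 48)]:
        print(f"{res}x{res} on {vram} GB: ~{estimate_max_frames(vram, res)} frames")
```

Under this proportional assumption, doubling VRAM doubles the estimate (e.g. 128×128 on a 24GB GPU gives about 800 frames); measured numbers may differ.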

PS: Although the optimized code improves VRAM utilization, it currently sacrifices inference speed because our local attention is not yet fully optimized. We are actively working on this issue, and if you have a better solution, PRs are welcome. If you need faster inference, you can use the unoptimized code instead, at the cost of higher VRAM usage (O(n²) space complexity).
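The memory/speed trade-off described in the PS can be illustrated with a generic 1D sliding-window ("local") attention over frames. This is not DAWN's implementation, which this README does not describe; the functions `masked_full_attention` and `local_attention`, the window size, and the toy features are all hypothetical. Both functions compute the same result (each frame attends only to frames within ±`window`), but the first materializes an n×n score matrix (O(n²) memory), while the second holds only O(window) scores at a time and pays for it with a sequential per-query loop.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0,
    # so masked entries receive zero weight.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_full_attention(q: Matrix, k: Matrix, v: Matrix, window: int) -> Tuple[Matrix, int]:
    """Build the whole n x n score matrix, then mask keys outside the window.
    Returns (output, number of score entries held in memory): O(n^2) memory."""
    n, dim = len(q), len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[dot(q[i], k[j]) * scale if abs(i - j) <= window else -math.inf
               for j in range(n)] for i in range(n)]
    out = []
    for i in range(n):
        w = softmax(scores[i])
        out.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(dim)])
    return out, n * n


def local_attention(q: Matrix, k: Matrix, v: Matrix, window: int) -> Tuple[Matrix, int]:
    """Process one query at a time over keys in [i - window, i + window].
    Returns (output, peak number of score entries held at once): O(window) memory,
    but with a sequential per-query loop instead of one large matrix operation."""
    n, dim = len(q), len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    out, peak = [], 0
    for i in range(n):
        keys = range(max(0, i - window), min(n, i + window + 1))
        w = softmax([dot(q[i], k[j]) * scale for j in keys])
        peak = max(peak, len(w))
        out.append([sum(wj * v[j][c] for wj, j in zip(w, keys)) for c in range(dim)])
    return out, peak


if __name__ == "__main__":
    n, d, window = 64, 8, 4  # toy setting: 64 frames, 8-dim features, +/-4 frame window
    # Deterministic toy features standing in for per-frame queries/keys/values.
    q = [[math.sin(0.3 * i + c) for c in range(d)] for i in range(n)]
    k = [[math.cos(0.2 * i - c) for c in range(d)] for i in range(n)]
    v = [[float((i * c) % 5) for c in range(d)] for i in range(n)]
    full_out, full_mem = masked_full_attention(q, k, v, window)
    local_out, local_mem = local_attention(q, k, v, window)
    diff = max(abs(a - b) for ra, rb in zip(full_out, local_out) for a, b in zip(ra, rb))
    print(f"max |full - local| = {diff:.1e}")
    print(f"score entries held at once: full={full_mem}, local={local_mem}")  # full=4096, local=9
```

In a real GPU implementation, the memory-lean path would typically be batched into chunks rather than looped per query; how efficiently that batching is done is one plausible source of the speed gap mentioned above.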

Methodology

The overall structure of DAWN:

[Figure: overall framework of DAWN]

In Progress ...

Citing DAWN

If you wish to refer to the baseline results published here, please use the following BibTeX entry:

@misc{dawn2024,
      title={DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation}, 
      author={Hanbo Cheng and Limin Lin and Chenyu Liu and Pengcheng Xia and Pengfei Hu and Jiefeng Ma and Jun Du and Jia Pan},
      year={2024},
      eprint={2410.13726},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.13726}, 
}

Acknowledgement

Limin Lin and Hanbo Cheng contributed equally to the project.

Thank you to the authors of Diffused Heads for assisting us in reproducing their work! We also extend our gratitude to the authors of MRAA, LFDM and ACTOR for their contributions to the open-source community. Lastly, we thank our mentors and co-authors for their continuous support in our research work!