In summary, this repository includes papers and implementations of face modeling and utilization. A list of things I've used myself and found to be robust and useful. Many basics of computer vision things are also included.
- [VoxCeleb2] Chung, J. S., Nagrani, A., & Zisserman, A. (2018). Voxceleb2: Deep speaker recognition. arXiv preprint arXiv:1806.05622. Homepage. pdf. kingsj0405/video-preprocessing.
- [WFLW] Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y., & Zhou, Q. (2018). Look at boundary: A boundary-aware face alignment algorithm. CVPR. Homepage.
- [LRW] Chung, J. S., & Zisserman, A. (2017). Lip reading in the wild. ACCV. Homepage. pdf.
- [CelebVHQ] Zhu, H., Wu, W., Zhu, W., Jiang, L., Tang, S., Zhang, L., ... & Loy, C. C. (2022, November). CelebV-HQ: A large-scale video facial attributes dataset. ECCV. Project Page. Code. arXiv
- [VFHQ] Xie, L., Wang, X., Zhang, H., Dong, C., & Shan, Y. (2022). VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution. CVPR Workshop. pdf.
- [survey] Egger, B., Smith, W. A., Tewari, A., Wuhrer, S., Zollhoefer, M., Beeler, T., ... & Vetter, T. (2020). 3d morphable face models—past, present, and future. ACM Transactions on Graphics (TOG), 39(5), 1-38. arXiv
- [3DDFA_V2] Guo, J., Zhu, X., Yang, Y., Yang, F., Lei, Z., & Li, S. Z. (2020, November). Towards fast, accurate and stable 3d dense face alignment. ECCV. code. arXiv
- [dlib] http://dlib.net/face_detector.py.html. code.
- [ArcFace] Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. CVPR. pdf. code.
- [RetinaFace] Deng, J., Guo, J., Ververas, E., Kotsia, I., & Zafeiriou, S. (2020). Retinaface: Single-shot multi-level face localisation in the wild. CVPR. arXiv. code. code2.
- [dlib] http://dlib.net/face_landmark_detection.py.html. code.
- [AdaptiveWingLoss] Wang, X., Bo, L., & Fuxin, L. (2019). Adaptive wing loss for robust face alignment via heatmap regression. ICCV. code. arXiv.
- [HRNets] Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., ... & Xiao, B. (2020). Deep high-resolution representation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 43(10), 3349-3364. code. arXiv. kingsj0405/hrnet_face_landmark. kingsj0405/HRNet-Interspecies-Landmark-Detection.
- [FAN] Yang, J., Bulat, A., & Tzimiropoulos, G. (2020, April). Fan-face: a simple orthogonal improvement to deep face recognition. AAAI. code. pdf.
- [SORT] Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016, September). Simple online and realtime tracking. In 2016 IEEE international conference on image processing (ICIP) (pp. 3464-3468). IEEE. code. arXiv.
- [Cross-Identity] Jeon, S., Nam, S., Oh, S. W., & Kim, S. J. (2020). Cross-identity motion transfer for arbitrary objects through pose-attentive video reassembling. ECCV. paper.
- [LIA] Wang, Y., Yang, D., Bremond, F., & Dantcheva, A. (2022). Latent image animator: Learning to animate images via latent space navigation. ICLR. arXiv. code. project page.
- [Wav2Lip] Prajwal, K. R., Mukhopadhyay, R., Namboodiri, V. P., & Jawahar, C. V. (2020, October). A lip sync expert is all you need for speech to lip generation in the wild. ACM Multimedia. arXiv. code. project page.
- [PC-AVS] Zhou, H., Sun, Y., Wu, W., Loy, C. C., Wang, X., & Liu, Z. (2021). Pose-controllable talking face generation by implicitly modularized audio-visual representation. CVPR. arXiv. code. project page.
- [StyleSync] Guan, J., Zhang, Z., Zhou, H., Hu, T., Wang, K., He, D., ... & Wang, J. (2023). StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator. CVPR. arXiv. code.
- [MakeItTalk] Zhou, Y., Han, X., Shechtman, E., Echevarria, J., Kalogerakis, E., & Li, D. (2020). Makelttalk: speaker-aware talking-head animation. ACM Transactions On Graphics (TOG). arXiv. code.
- [AD-NeRF] Guo, Y., Chen, K., Liang, S., Liu, Y. J., Bao, H., & Zhang, J. (2021). Ad-nerf: Audio driven neural radiance fields for talking head synthesis. ICCV. arXiv. code. project page.
- [SSP-NeRF] Liu, X., Xu, Y., Wu, Q., Zhou, H., Wu, W., & Zhou, B. (2022, October). Semantic-aware implicit neural audio-driven video portrait generation. ECCV. arXiv. code. project page.
- [GeneFace] Ye, Z., Jiang, Z., Ren, Y., Liu, J., He, J., & Zhao, Z. (2023). Geneface: Generalized and high-fidelity audio-driven 3d talking face synthesis. ICLR. arXiv. code. project page.
- [SadTalker] Zhang, W., Cun, X., Wang, X., Zhang, Y., Shen, X., Guo, Y., ... & Wang, F. (2023). SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. CVPR. arXiv. code. project page.
- [IP_LAP] Zhong, W., Fang, C., Cai, Y., Wei, P., Zhao, G., Lin, L., & Li, G. (2023). Identity-Preserving Talking Face Generation with Landmark and Appearance Priors. CVPR. arXiv. code.
- [HyperIQA] Su, S., Yan, Q., Zhu, Y., Zhang, C., Ge, X., Sun, J., & Zhang, Y. (2020). Blindly assess image quality in the wild guided by a self-adaptive hyper network. CVPR. code. pdf.
- [Pareidolia Face Reenactment] Song, L., Wu, W., Fu, C., Qian, C., Loy, C. C., & He, R. (2021). Pareidolia Face Reenactment. CVPR. paper. code. project page.
- [DIFE] Yang, S., Jeon, S., Nam, S., & Kim, S. J. (2022). Dense Interspecies Face Embedding. NeurIPS. paper. code. project page
- [survey] Kingma, D. P., & Welling, M. (2019). An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4), 307-392. arXiv.
- [thesis] Kingma, D. P. (2017). Variational inference & deep learning: A new synthesis. pdf.
- [VAE] Kingma, D. P., & Welling, M. (2014). Auto-encoding variational bayes. ICLR. arXiv.
- [IAF] Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. NeurIPS. arXiv.
- [Glow] Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. NeurIPS. arXiv. code.
- [Flow++] Ho, J., Chen, X., Srinivas, A., Duan, Y., & Abbeel, P. (2019, May). Flow++: Improving flow-based generative models with variational dequantization and architecture design. In International Conference on Machine Learning (pp. 2722-2730). PMLR. code. arXiv.
- [DDPM] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. NeurIPS. arXiv. code.
- [ImprovedDDPM] Nichol, A. Q., & Dhariwal, P. (2021, July). Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (pp. 8162-8171). PMLR. code. arXiv.
- [GuidedDiffusion] Dhariwal, P., & Nichol, A. (2021). Diffusion models beat gans on image synthesis. NuerIPS. arXiv. code.
- [StyleGAN] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. CVPR. code. arXiv.
- [StyleGAN2] Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of stylegan. CVPR. code. arXiv
- [StyleGAN2-ADA] Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2020). Training generative adversarial networks with limited data. NuerIPS. code. arXiv.
- [StyleGAN3] Karras, T., Aittala, M., Laine, S., Härkönen, E., Hellsten, J., Lehtinen, J., & Aila, T. (2021). Alias-free generative adversarial networks. NeurIPS. code. arXiv.
- [MoStGAN-V] Shen, X., Li, X., & Elhoseiny, M. (2023). MoStGAN-V: Video Generation with Temporal Motion Styles. CVPR. code. project page. arXiv.