Holistic-Motion2D/Tender

Holistic-Motion2D

This is the official code release of Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space by Yuan Wang*, Zhao Wang*, Junhao Gong*, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang†, Dan Xu.

📝 Changelog

  • [2024.11.04]: 🔥 Release the whole Holistic-Motion2D dataset. Find it on the Data Page.
  • [2024.06.20]: 🔥 Release a sample dataset. Find it on the Data Page. Code is coming soon!
  • [2024.09.10]: 🔥 Release the introduction and demo representations of our work, Holistic-Motion2D.

We present, for the first time, a large-scale human motion benchmark, Holistic-Motion2D, comprising over 1M in-the-wild motion sequences, each paired with high-quality whole-body or partial pose annotations and textual descriptions.

  • Our Holistic-Motion2D dataset provides not only fine-grained and comprehensive whole-body motion annotations but also high-resolution information on the face and hands. We use multi-source datasets with images of varying resolutions to jointly train a human generative foundation model.

  • Our Holistic-Motion2D encompasses rich scenes, ranging from professional sports (playing tennis, skiing) and general daily actions (haircut, brushing teeth) to complex human-scene interactions (lying down, wall push-ups), and captures diverse environments such as indoor and outdoor landscapes and dynamic action scenes.

  • The number of videos in Holistic-Motion2D is $10\times$ larger than in Motion-X, the previously largest 3D motion dataset. Compared to Motion-X, Holistic-Motion2D features more sophisticated actions, longer motion sequences, and increased diversity.

  • Holistic-Motion2D is collected from 11 public video datasets along with two image datasets annotated with whole-body poses. Across 1M motion sequences derived from in-the-wild scenarios, Holistic-Motion2D delivers 1M 2D whole-body pose annotations, complemented by 1M semantic descriptions.

Dataset Collection and Processing

  • The data collection pipeline covers holistic 2D motion annotation and caption generation. It involves six stages: 1) gathering large-scale videos, 2) annotating 2D whole-body keypoints and confidence scores, 3) filtering high-quality motion sequences, 4) designing text prompts via a Large Language Model, 5) generating descriptive captions for sequence-level movements, and 6) performing manual inspection.
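
Stage 3 of the pipeline above filters sequences by annotation quality. The README does not state the criteria, so the following is only a minimal sketch of what such a filter could look like. The in-memory layout (frames of `(x, y, confidence)` tuples), the function names, and both thresholds are assumptions made for illustration, not the authors' settings.

```python
from statistics import mean

# Hypothetical layout: a sequence is a list of frames, and each frame is a
# list of (x, y, confidence) tuples, one per keypoint. The released pickle
# files may be organized differently.
Frame = list[tuple[float, float, float]]


def mean_confidence(sequence: list[Frame]) -> float:
    """Average keypoint confidence over all frames and keypoints."""
    return mean(c for frame in sequence for (_, _, c) in frame)


def keep_sequence(sequence: list[Frame],
                  min_frames: int = 16,
                  min_mean_conf: float = 0.5) -> bool:
    """Keep a sequence only if it is long enough and confidently annotated.

    Both thresholds are illustrative values, not those used for the dataset.
    """
    if len(sequence) < min_frames:
        return False
    return mean_confidence(sequence) >= min_mean_conf


good = [[(0.0, 0.0, 0.9), (1.0, 1.0, 0.8)] for _ in range(20)]
noisy = [[(0.0, 0.0, 0.2), (1.0, 1.0, 0.3)] for _ in range(20)]
short = good[:4]
print([keep_sequence(s) for s in (good, noisy, short)])
# → [True, False, False]
```

A real filter would probably also look at per-part confidence (face, hands) and temporal jitter; a single mean is used here only to keep the example short.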

All data can be downloaded from OpenDataLab:

| Path | Size | Files | Format | Description |
|---|---|---|---|---|
| Holistic-Motion-2D-dataset | 118.15 GB | 1,464,278 | | Main folder |
| ├ kpfiles | 118.06 GB | 400,790 | | Sequences of key-points for character motion |
| ├ ├ UCF101 | 5.03 GB | 11,391 | Pickle | Whole-body key-points for UCF101 |
| ├ ├ CAER | 153.81 MB | 3,542 | Pickle | Facial key-points for CAER |
| ├ ├ K400 | 55.54 GB | 152,798 | Pickle | Whole-body key-points for Kinetics-400 |
| ├ ├ InternVid | 44.02 GB | 85,665 | Pickle | Whole-body key-points for InternVid |
| ├ ├ K700 | 0 | 0 | Pickle | Whole-body key-points for Kinetics-700 |
| ├ ├ IDEA400 | 6.33 GB | 12,025 | Pickle | Whole-body key-points for IDEA400 |
| ├ ├ sthv2 | 900.19 MB | 106,661 | Pickle | Hand key-points for Something-Something-v2 |
| ├ ├ UBody | 3.21 GB | 5,195 | Pickle | Whole-body key-points for UBody |
| ├ ├ DFEW | 1.68 GB | 15,524 | Pickle | Facial key-points for DFEW |
| ├ texts | 101.05 MB | 1,063,488 | | Captions for character motion videos |
| ├ ├ UCF101 | 4.68 MB | 24,711 | TXT | Texts for UCF101 |
| ├ ├ CAER | 32.16 KB | 4,574 | TXT | Texts for CAER |
| ├ ├ K400 | 40.6 MB | 215,479 | TXT | Texts for Kinetics-400 |
| ├ ├ InternVid | 20.52 MB | 421,894 | TXT | Texts for InternVid |
| ├ ├ K700 | 22.75 MB | 141,611 | TXT | Texts for Kinetics-700 |
| ├ ├ IDEA400 | 2.96 MB | 12,025 | TXT | Texts for IDEA400 |
| ├ ├ sthv2 | 8.05 MB | 220,848 | TXT | Texts for Something-Something-v2 |
| ├ ├ UBody | 1.01 MB | 5,974 | TXT | Texts for UBody |
| ├ ├ DFEW | 456.1 KB | 16,372 | TXT | Texts for DFEW |
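
The file counts under `kpfiles` and `texts` differ for most subsets, so a loader probably cannot assume a one-to-one mapping. As a rough sketch, the helper below pairs keypoint files with captions by matching relative paths. The `.pkl` suffix, the mirrored directory structure, and the name `pair_samples` are all assumptions for illustration; check them against the downloaded data before relying on this.

```python
from pathlib import Path


def pair_samples(root: Path, subset: str) -> list[tuple[Path, Path]]:
    """Return (keypoint file, caption file) pairs that share a relative stem."""
    kp_dir = root / "kpfiles" / subset
    txt_dir = root / "texts" / subset
    pairs = []
    for kp_path in sorted(kp_dir.rglob("*.pkl")):
        # Assumption: the caption mirrors the keypoint path, with a .txt suffix.
        txt_path = (txt_dir / kp_path.relative_to(kp_dir)).with_suffix(".txt")
        if txt_path.is_file():
            pairs.append((kp_path, txt_path))
    return pairs


if __name__ == "__main__":
    # Prints 0 if the dataset has not been downloaded to this location.
    print(len(pair_samples(Path("Holistic-Motion-2D-dataset"), "UBody")))
```

The internal structure of each pickle file is not documented here, so inspect one file before writing a full loader. Because unpickling can execute arbitrary code, only load files obtained from the official source.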

2D text-driven whole-body motion generation model

  • Our Text-drivEN whole-boDy motion genERation model (Tender) is tailored for 2D whole-body human motion synthesis. It incorporates two novel designs to enhance the quality of generated motion: Part-aware Attention for Motion Variational Auto-Encoder (PA-VAE) and Confidence-Aware Generation (CAG).
  • In qualitative comparisons with MDM, MLD, and T2M-GPT, Tender consistently outperforms these baselines, generating more vivid and lifelike human motion sequences. Tender not only captures the nuanced dynamics of human movement but also improves the fidelity and temporal consistency of the motions.
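
Part-aware attention presupposes some grouping of whole-body keypoints into body parts. The sketch below shows one plausible grouping, assuming the 133-keypoint COCO-WholeBody ordering; whether Holistic-Motion2D uses exactly this ordering is an assumption, and the code does not reproduce PA-VAE or CAG. The names `PART_SLICES` and `split_parts` are hypothetical.

```python
# Assumption: the 133-keypoint COCO-WholeBody ordering
# (17 body, 6 foot, 68 face, 21 left-hand, 21 right-hand keypoints).
PART_SLICES = {
    "body": slice(0, 17),
    "feet": slice(17, 23),
    "face": slice(23, 91),
    "left_hand": slice(91, 112),
    "right_hand": slice(112, 133),
}


def split_parts(frame):
    """Split one frame of 133 keypoints into per-part keypoint lists."""
    if len(frame) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(frame)}")
    return {name: frame[s] for name, s in PART_SLICES.items()}


# Dummy frame: 133 keypoints as (x, y, confidence) tuples.
frame = [(float(i), float(i), 1.0) for i in range(133)]
parts = split_parts(frame)
print({name: len(kps) for name, kps in parts.items()})
# → {'body': 17, 'feet': 6, 'face': 68, 'left_hand': 21, 'right_hand': 21}
```

Grouping by part lets a model treat dense, small-scale regions such as the face and hands separately from the coarser body skeleton.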

Downstream Applications

  • Using MagicAnimate, we animate a human character by applying our generated pose sequences, resulting in exceptionally lifelike and fluid animations that demonstrate the seamless integration of our Tender model with real-time video generation tools.
  • We employ MotionBERT to lift these 2D human motions into 3D space, showcasing our model's ability to facilitate complex 3D pose estimation. The lifted 3D motions maintain a high degree of smoothness and fidelity, making them suitable for applications in virtual reality (VR) and augmented reality (AR), where immersive and accurate 3D representations are essential.

License

  • Data License Confirmation and Author Responsibility.
    • The entire Holistic-Motion2D dataset (including JSON metadata, download script, and documentation) is distributed under the CC-BY-NC-SA (Attribution-NonCommercial-ShareAlike) license to ensure its legitimate and widespread use.
    • For the sub-datasets of Holistic-Motion2D, please read the original license of each source dataset. We provide only our annotation results, subject to approval from the original institution. We confirm that Holistic-Motion2D does not contain any personally identifiable information or offensive content.
    • You can use, redistribute, and adapt it for non-commercial purposes, as long as you (a) give appropriate credit by citing our paper, (b) indicate any changes that you've made, and (c) distribute any derivative works under the same license.
  • Code License. The code for pre-processing and training our Tender model uses the MIT license. Please refer to the GitHub repository for license details.
