MediaPipe is a framework for building multimodal (e.g., video, audio, or any time-series data) applied ML pipelines. With MediaPipe, a perception pipeline can be built as a graph of modular components, including inference models (e.g., TensorFlow, TFLite) and media processing functions.
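To make the "graph of modular components" idea concrete, here is a toy sketch in Python. It is not MediaPipe's API: `Packet`, `Node` and `Graph` are hypothetical names, and the sketch assumes every node has at least one input stream and that nodes are listed in dependency order. It only illustrates the general pattern of components exchanging timestamped data over named streams.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical: a value tagged with a timestamp."""
    timestamp: int
    value: Any


@dataclass
class Node:
    """Hypothetical modular component: reads named input streams, writes one output stream."""
    name: str
    inputs: List[str]
    output: str
    fn: Callable[..., Any]


class Graph:
    """Toy dataflow graph; assumes nodes are listed in dependency order."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def run(self, input_streams: Dict[str, List[Packet]]) -> Dict[str, List[Packet]]:
        streams: Dict[str, List[Packet]] = {k: list(v) for k, v in input_streams.items()}
        for node in self.nodes:
            # Index each input stream by timestamp so that the node only
            # fires when every one of its inputs has a packet at that time.
            by_ts = [{p.timestamp: p.value for p in streams[s]} for s in node.inputs]
            common = sorted(set.intersection(*(set(d) for d in by_ts)))
            streams[node.output] = [
                Packet(ts, node.fn(*(d[ts] for d in by_ts))) for ts in common
            ]
        return streams


# Hypothetical two-stage pipeline: a "detector" turns frames into detections,
# then an "overlay" joins each frame with the detection from the same timestamp.
frames = [Packet(t, f"frame{t}") for t in range(3)]
graph = Graph([
    Node("detector", ["frames"], "detections", lambda f: f"box({f})"),
    Node("overlay", ["frames", "detections"], "annotated", lambda f, d: f"{f}+{d}"),
])
out = graph.run({"frames": frames})
print([p.value for p in out["annotated"]])
# prints ['frame0+box(frame0)', 'frame1+box(frame1)', 'frame2+box(frame2)']
```

Joining streams on timestamps is what lets a downstream component pair each frame with the result computed from that same frame. In MediaPipe itself, graphs are declared as configuration and the framework handles scheduling and synchronization; the toy above only mimics the idea by hand.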
"MediaPipe has made it extremely easy to build our 3D person pose reconstruction demo app, facilitating accelerated neural network inference on device and synchronization of our result visualization with the video capture stream. Highly recommended!" - George Papandreou, CTO, Ariel AI
- Hand Tracking
- Multi-hand Tracking
- Face Detection
- Hair Segmentation
- Object Detection
- Object Detection and Tracking
Follow these instructions to install MediaPipe.
See mobile, desktop and Google Coral examples.
Documentation is available on MediaPipe Read-the-Docs or at docs.mediapipe.dev.
Check out the Examples page for tutorials on how to use MediaPipe, and the Concepts page for basic definitions.
A web-based visualizer is hosted on viz.mediapipe.dev. Please also see the visualizer instructions.
- Discuss - General community discussion around MediaPipe
- On-Device, Real-Time Hand Tracking with MediaPipe
- MediaPipe: A Framework for Building Perception Pipelines
- AI Nextcon 2020, 12-16 Feb 2020, Seattle
- MediaPipe Madrid Meetup, 16 Dec 2019
- MediaPipe London Meetup, Google 123 Building, 12 Dec 2019
- ML Conference, Berlin, 11 Dec 2019
- MediaPipe Berlin Meetup, Google Berlin, 11 Dec 2019
- The 3rd Workshop on YouTube-8M Large-Scale Video Understanding, ICCV 2019, Seoul, Korea
- AI DevWorld 2019, 10 Oct 2019, San Jose, California
- Presentation at the Google Industry Workshop, ICIP 2019, 24 Sept 2019, Taipei, Taiwan
- Open sourced at CVPR 2019, 17-20 June 2019, Long Beach, CA
MediaPipe is currently in alpha at v0.6. We are still making breaking API changes and expect to reach a stable API by v1.0.
We welcome contributions. Please follow the contribution guidelines.
We use GitHub issues for tracking requests and bugs. Please post questions to Stack Overflow with the 'mediapipe' tag.