A collection of the latest CVPR results, including papers, code, and demo videos. Recommendations are welcome!


🌟 CVPR 2022: continuously updated with the latest papers and their open-source code!

🚗 CVPR 2022 accepted paper ID list:

🚗 Official website link:


✋ Note: Everyone is welcome to submit issues sharing CVPR 2022 papers and open-source projects, so we can improve this project together!




🎆 Join the Discussion Group | Welcome

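The CVPR 2022 paper discussion group is now open! If your paper has been accepted, add WeChat nvshenj125 with the note "CVPR + name + school/company name". Please apply in exactly this format so you can be added to the group.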

🔨 Table of Contents





Dataset

3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation

Dataset Distillation by Matching Training Trajectories

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes



ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

BEHAVE: Dataset and Method for Tracking Human Object Interactions

SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos



Optimizing Elimination Templates by Greedy Parameter Search

Searching for Network Width with Bilaterally Coupled Network

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search


Knowledge Distillation

Decoupled Knowledge Distillation

Knowledge Distillation with the Reused Teacher Classifier


Multimodal

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Conditional Prompt Learning for Vision-Language Models

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

Motron: Multimodal Probabilistic Human Motion Forecasting

StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis

Text2Pos: Text-to-Point-Cloud Cross-Modal Localization

Towards Implicit Text-Guided 3D Shape Generation

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

Versatile Multi-Modal Pre-Training for Human-Centric Perception

X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes

XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation

Robust Cross-Modal Representation Learning with Progressive Self-Distillation

Multimodal Transformer for Nursing Activity Recognition

Probabilistic Compositional Embeddings for Multimodal Image Retrieval

Are Multimodal Transformers Robust to Missing Modality?


Contrastive Learning

Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

ContrastMask: Contrastive Learning to Segment Every Thing

Fair Contrastive Learning for Facial Attribute Classification

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Rethinking Minimal Sufficient Representation in Contrastive Learning

Selective-Supervised Contrastive Learning with Noisy Labels

Unsupervised Deraining: Where Contrastive Learning Meets Self-similarity

Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

Unified Contrastive Learning in Image-Text-Label Space

Probabilistic Representations for Video Contrastive Learning


Graph Neural Networks

Lifelong Graph Learning

Long-term Visual Map Sparsification with Heterogeneous GNN

SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters


Capsule Network

HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network


Image Classification

CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification

Integrative Few-Shot Learning for Classification and Segmentation

Matching Feature Sets for Few-Shot Image Classification

Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification

Regression or Classification? Reflection on BP prediction from PPG data using Deep Neural Networks in the scope of practical applications


Object Detection

A Dual Weighting Label Assignment Scheme for Object Detection

Implicit Motion Handling for Video Camouflaged Object Detection

Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection

DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Expanding Low-Density Latent Regions for Open-Set Object Detection

Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement

LiDAR Snowfall Simulation for Robust 3D Object Detection

Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

Optimal Correction Cost for Object Detection Evaluation

Point2Seq: Detecting 3D Objects as Sequences

Point Density-Aware Voxels for LiDAR 3D Object Detection

MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

Real-time Object Detection for Streaming Perception

SIOD: Single Instance Annotated Per Category Per Image for Object Detection

SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion

Task-specific Inconsistency Alignment for Domain Adaptive Object Detection

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task

Understanding 3D Object Articulation in Internet Videos

AdaMixer: A Fast-Converging Query-Based Object Detector

Forecasting from LiDAR via Future Object Detection

Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection

Learning of Global Objective for Network Flow in Multi-Object Tracking

FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing

Omni-DETR: Omni-Supervised Object Detection with Transformers

Learning to Detect Mobile Objects from LiDAR Scans Without Labels

Multi-Granularity Alignment Domain Adaptation for Object Detection

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

R(Det)^2: Randomized Decision Routing for Object Detection

Homography Loss for Monocular 3D Object Detection

Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation

Towards Robust Adaptive Object Detection under Noisy Annotations

Towards Open-Set Object Detection and Discovery

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

HyperDet3D: Learning a Scene-conditioned 3D Object Detector


Object Tracking

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Global Tracking Transformers

MixFormer: End-to-End Tracking with Iterative Mixed Attention

Transforming Model Prediction for Tracking

TCTrack: Temporal Contexts for Aerial Tracking

Unified Transformer Tracker for Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking

Global Tracking via Ensemble of Local Trackers

MeMOT: Multi-Object Tracking with Memory

Unsupervised Learning of Accurate Siamese Tracking

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

BEHAVE: Dataset and Method for Tracking Human Object Interactions

SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos

3D Object Tracking

Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects


Trajectory Prediction

How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting

Non-Probability Sampling Network for Stochastic Human Trajectory Prediction

Remember Intentions: Retrospective-Memory-based Trajectory Prediction

Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion



Segmentation

Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation

Deep Hierarchical Semantic Segmentation

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Hyperbolic Image Segmentation

Mask Transfiner for High-Quality Instance Segmentation

Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?

Rethinking Semantic Segmentation: A Prototype View

Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation

Representation Compensation Networks for Continual Semantic Segmentation

SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation

Semantic Segmentation by Early Region Proxy

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Scribble-Supervised LiDAR Semantic Segmentation

Sparse Instance Activation for Real-Time Instance Segmentation

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation

Weakly Supervised Semantic Segmentation using Out-of-Distribution Data

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation

WildNet: Learning Domain Generalized Semantic Segmentation from the Wild

Semantic-Aware Domain Generalized Segmentation

FocalClick: Towards Practical Interactive Image Segmentation

Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation

Pin the Memory: Learning to Generalize Semantic Segmentation

Coarse-to-Fine Feature Mining for Video Semantic Segmentation

L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

Joint Forecasting of Panoptic Segmentations with Difference Attention (Oral)

Cross-Image Relational Knowledge Distillation for Semantic Segmentation


Weakly Supervised Semantic Segmentation


Medical Image Segmentation


Video Object Segmentation

Language as Queries for Referring Video Object Segmentation


Interactive Video Object Segmentation

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions


Visual Transformer

Affine Medical Image Registration with Coarse-to-Fine Vision Transformer

Automated Progressive Learning for Efficient Training of Vision Transformers

Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning

Cascade Transformers for End-to-End Person Search

EDTER: Edge Detection with Transformer

Few-Shot Object Detection with Fully Cross-Transformer

Global Tracking Transformers

GradViT: Gradient Inversion of Vision Transformers

Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Meta-attention for ViT-backed Continual Learning

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Training-free Transformer Architecture Search

Towards Practical Certifiable Patch Defense with Vision Transformer

Towards Robust Vision Transformer

Collaborative Transformers for Grounded Situation Recognition

TubeDETR: Spatio-Temporal Video Grounding with Transformers

InstaFormer: Instance-Aware Image-to-Image Translation with Transformer

Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation

Omni-DETR: Omni-Supervised Object Detection with Transformers

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow

Deformable Video Transformer

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes

Multi-View Transformer for 3D Visual Grounding

Dual-AI: Dual-path Action Interaction Learning for Group Activity Recognition

Detector-Free Weakly Supervised Group Activity Recognition

Text Spotting Transformers

PSTR: End-to-End One-Step Person Search With Transformers

Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection

Multimodal Transformer for Nursing Activity Recognition

Learning Trajectory-Aware Transformer for Video Super-Resolution

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Are Multimodal Transformers Robust to Missing Modality?

MiniViT: Compressing Vision Transformers with Weight Multiplexing

ViTOL: Vision Transformer for Weakly Supervised Object Localization


Depth Estimation

OACC-Net: Occlusion-Aware Cost Constructor for Light Field Depth Estimation

P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model


Face Recognition

AdaFace: Quality Adaptive Margin for Face Recognition


Face Detection

Privacy-preserving Online AutoML for Domain-Specific Face Detection

Robust Neonatal Face Detection in Real-world Clinical Settings


Face Anti-Spoofing

Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing

PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition

Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection


Facial Age Estimation


Facial Expression Recognition

MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis

Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin


Facial Attribute Recognition

Fair Contrastive Learning for Facial Attribute Classification

Facial Editing

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

Face Relighting with Geometrically Consistent Shadows

Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination


Face Swap

High-resolution Face Swapping via Latent Semantics Disentanglement


Human Pose Estimation

Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video

DiffPoseNet: Direct Differentiable Camera Pose Estimation

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

OSOP: A Multi-Stage One Shot Object Pose Estimation Framework

Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization

Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation

Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

Focal Length and Object Pose Estimation via Render and Compare


6D Pose Estimation

FS6D: Few-Shot 6D Pose Estimation of Novel Objects

Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization

ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework


Hand Pose Estimation (Hand Mesh Recovery)


Video Action Detection

DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

End-to-End Semi-Supervised Learning for Video Action Detection

How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

RCL: Recurrent Continuous Localization for Temporal Action Detection

SPAct: Self-supervised Privacy Preservation for Action Recognition

An Empirical Study of End-to-End Temporal Action Detection

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition


Sign Language Translation

A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation


3D Person Reconstruction

ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation

Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction

Structured Local Radiance Fields for Human Avatar Modeling


Person Re-identification

Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification

Part-based Pseudo Label Refinement for Unsupervised Person Re-identification

Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification

Implicit Sample Extension for Unsupervised Person Re-Identification

Clothes-Changing Person Re-identification with RGB Modality Only


Person Search


Crowd Counting



A Style-aware Discriminator for Controllable Image Translation

Attribute Group Editing for Reliable Few-shot Image Generation

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

Compound Domain Generalization via Meta-Knowledge Encoding

Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation

Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization

FlexIT: Towards Flexible Semantic Image Translation

GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors

GAN-Supervised Dense Visual Alignment

GIRAFFE HD: A High-Resolution 3D-aware Generative Model

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image

Modulated Contrast for Versatile Image Synthesis

Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation

RGB-Depth Fusion GAN for Indoor Depth Completion

Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation

Style Transformer for Image Inversion and Editing

Unsupervised Domain Adaptation for Nighttime Aerial Tracking

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

Marginal Contrastive Correspondence for Guided Image Generation

Style-Based Global Appearance Flow for Virtual Try-On

Arbitrary-Scale Image Synthesis

Unsupervised Image-to-Image Translation with Generative Prior

Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data

medXGAN: Visual Explanations for Medical Classifiers through a Generative Latent Space

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis


Color-Pattern Makeup Transfer


Font Generation



Fourier Document Restoration for Robust Document Dewarping and Recognition

SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization


A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution

Scene Text Detection / Recognition

Kernel Proposal Network for Arbitrary Shape Text Detection

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

Towards End-to-End Unified Scene Text Detection and Layout Analysis


Open-set Text Recognition via Character-Context Decoupling


Other (Document Image Pre-trained Models, Text VQA, Datasets, Retrieval, Applications)


Image Retrieval / Video Retrieval

Correlation Verification for Image Retrieval

Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image

Probabilistic Compositional Embeddings for Multimodal Image Retrieval


Image Animation

Thin-Plate Spline Motion Model for Image Animation


Image Matting / Video Matting


Super-Resolution

Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution

Learning Graph Regularisation for Guided Super-Resolution

Reflash Dropout in Image Super-Resolution

Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling


Image Restoration

Exploring and Evaluating Image Restoration Potential in Dynamic Scenes

Interacting Attention Graph for Single Image Two-Hand Reconstruction


Image Inpainting

Bridging Global Context Interactions for High-Fidelity Image Completion

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting

Towards An End-to-End Framework for Flow-Guided Video Inpainting


Image Denoising

AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network

Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots

CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image

Learning to Deblur using Light Field Generated and Real Defocus Images

Dancing under the stars: video denoising in starlight


Image Editing


Image Stitching

Deep Rectangling for Image Stitching: A Learning Baseline


Image Matching


Image Blending


Image Dehazing


Image Compression

ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding

Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression


Reflection Removal


Lane Detection

CLRNet: Cross Layer Refinement Network for Lane Detection

Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes

Rethinking Efficient Lane Detection via Curve Modeling

Towards Driving-Oriented Metric for Lane Detection Models


Autonomous Driving

Learning from All Vehicles


Fluid Reconstruction


Scene Reconstruction

3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow

NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction

PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo


Video Frame Interpolation

Long-term Video Frame Interpolation via Feature Propagation

TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation

Unifying Motion Deblurring and Frame Interpolation with Events

Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion

Many-to-many Splatting for Efficient Video Frame Interpolation


Video Super-Resolution

Reference-based Video Super-Resolution Using Multi-Camera Video Triplets


3D Point Cloud

ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation

AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception

Contrastive Boundary Learning for Point Cloud Segmentation

Equivariant Point Cloud Analysis via Learning Orientations for Message Passing

IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment

Learning a Structured Latent Space for Unsupervised Point Cloud Completion

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds

No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces

REGTR: End-to-end Point Cloud Correspondences with Transformers

SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration

Stratified Transformer for 3D Point Cloud Segmentation

Shape-invariant 3D Adversarial Point Clouds

WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation

Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds

Learning Local Displacements for Point Cloud Completion

3DeformRS: Certifying Spatial Deformations on Point Clouds


Label Noise


Adversarial Examples

LAS-AT: Adversarial Training with Learnable Attack Strategy



DINE: Domain Adaptation from Single and Multiple Black-box Predictors

It's About Time: Analog clock Reading in the Wild

Neural Face Identification in a 2D Wireframe Projection of a Manifold Object

Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

UKPGAN: Unsupervised KeyPoint GANeration

DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos

Generative Cooperative Learning for Unsupervised Video Anomaly Detection

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild

On Generalizing Beyond Domains in Cross-Domain Continual Learning

Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers

What Matters For Meta-Learning Vision Regression Tasks?

ChiTransformer: Towards Reliable Stereo from Cues

Dynamic Dual-Output Diffusion Models

Spatial Commonsense Graph for Object Localisation in Partial Scenes

Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack

Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity

REX: Reasoning-aware and Grounded Explanation

FLAG: Flow-based 3D Avatar Generation from Sparse Observations

Learning Distinctive Margin toward Active Domain Adaptation

Active Learning by Feature Mixing

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Forward Compatible Few-Shot Class-Incremental Learning

XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

Accelerating DETR Convergence via Semantic-Aligned Matching

ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

LAS-AT: Adversarial Training with Learnable Attack Strategy

Depth-Aware Generative Adversarial Network for Talking Head Video Generation

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

Implicit Feature Decoupling with Depthwise Quantization

Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective

Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels

Deep vanishing point detection: Geometric priors make dataset variations vanish

Non-isotropy Regularization for Proxy-based Deep Metric Learning

Integrating Language Guidance into Vision-based Deep Metric Learning

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

The Devil Is in the Details: Window-based Attention for Image Compression

Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting

Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces from 3D MRI Scans with Geometric Deep Neural Networks

Bi-directional Object-context Prioritization Learning for Saliency Ranking

Object Localization under Single Coarse Point Supervision

Neural Compression-Based Feature Learning for Video Restoration

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering

Improving the Transferability of Targeted Adversarial Examples through Object-Based Diverse Input

DATA: Domain-Aware and Task-Aware Pre-training

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning

Learning Affordance Grounding from Exocentric Images

DTA: Physical Camouflage Attacks using Differentiable Transformation Network

Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?

Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective

ViM: Out-Of-Distribution with Virtual-logit Matching

Delving into the Estimation Shift of Batch Normalization in a Network

Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light

TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing

Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data

Discovering Objects that Can Move

φ-SfT: Shape-from-Template with a Physics-Based Deformation Model

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

Mixed Differential Privacy in Computer Vision

Global Matching with Overlapping Attention for Optical Flow Estimation

DR.VIC: Decomposition and Reasoning for Video Individual Counting

DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification

Moving Window Regression: A Novel Approach to Ordinal Regression

Egocentric Prediction of Action Target in 3D

Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction

Neural Reflectance for Shape Recovery with Shadow Handling

DyRep: Bootstrapping Training with Dynamic Re-parameterization

Multidimensional Belief Quantification for Label-Efficient Meta-Learning

Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness

Unsupervised Pre-training for Temporal Action Localization Tasks

Continual Test-Time Domain Adaptation

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

NPBG++: Accelerating Neural Point-Based Graphics

Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

Probing Representation Forgetting in Supervised and Unsupervised Continual Learning

Energy-based Latent Aligner for Incremental Learning

Controllable Dynamic Multi-Task Architectures

Attributable Visual Similarity Learning

Learning Where to Learn in Cross-View Self-Supervised Learning

Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

Sketch3T: Test-Time Training for Zero-Shot SBIR

Brain-inspired Multilayer Perceptron with Spiking Neurons

Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

ARCS: Accurate Rotation and Correspondence Search

iPLAN: Interactive and Procedural Layout Planning

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation

Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

Knowledge Mining with Scene Text for Fine-Grained Recognition

Long-Tailed Recognition via Weight Balancing

HINT: Hierarchical Neuron Concept Explainer

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture

Visual Abductive Reasoning

RSCFed: Random Sampling Consensus Federated Semi-supervised Learning

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection

Causality Inspired Representation Learning for Domain Generalization

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

CHEX: CHannel EXploration for CNN Model Compression

FisherMatch: Semi-Supervised Rotation Regression via Entropy-based Filtering

EnvEdit: Environment Editing for Vision-and-Language Navigation

Exploring Frequency Adversarial Attacks for Face Forgery Detection

BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information

Learning Structured Gaussians to Approximate Deep Ensembles

Quantifying Societal Bias Amplification in Image Captioning

Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification

Self-Supervised Image Representation Learning with Geometric Set Consistency

Nested Collaborative Learning for Long-Tailed Visual Recognition

Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries

CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters

Dressing in the Wild by Watching Dance Videos

Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation

Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian

Zero-Query Transfer Attacks on Context-Aware Object Detectors

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization

Registering Explicit to Implicit: Towards High-Fidelity Garment mesh Reconstruction from Single Images

Clean Implicit 3D Structure from Noisy 2D STEM Images

Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets

Large-Scale Pre-training for Person Re-identification with Noisy Labels

Understanding 3D Object Articulation in Internet Videos

CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism

Unseen Classes at a Later Time? No Problem

Fast Light-Weight Near-Field Photometric Stereo

AdaMixer: A Fast-Converging Query-Based Object Detector

Fast, Accurate and Memory-Efficient Partial Permutation Synchronization

Balanced MSE for Imbalanced Visual Regression

Multi-Robot Active Mapping via Neural Bipartite Graph Matching

Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data

FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing

STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction

Learning Program Representations for Food Images and Cooking Recipes

AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction

Iterative Deep Homography Estimation

PSMNet: Position-aware Stereo Merging Network for Room Layout Estimation

Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images

Learning to Detect Mobile Objects from LiDAR Scans Without Labels

Proactive Image Manipulation Detection

NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models

Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain

Bringing Old Films Back to Life

Generating High Fidelity Data from Low-density Regions using Diffusion Models

Continuous Scene Representations for Embodied AI

SimVQA: Exploring Simulated Environments for Visual Question Answering

Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

AEGNN: Asynchronous Event-based Graph Neural Networks

It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher

Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond

End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps

Reflection and Rotation Symmetry Detection via Equivariant Learning

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

Personalized Image Aesthetics Assessment with Rich Attributes

Constrained Few-shot Class-incremental Learning

Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

Exploiting Explainable Metrics for Augmented SGD

Task Adaptive Parameter Sharing for Multi-Task Learning

D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions

On the Importance of Asymmetry for Siamese Representation Learning

DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow

Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression

Perception Prioritized Training of Diffusion Models

Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization

GraftNet: Towards Domain Generalized Stereo Matching with a Broad-Spectrum and Task-Oriented Feature

LASER: LAtent SpacE Rendering for 2D Visual Localization

Code: None

TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization

Investigating Top-k White-Box and Transferable Black-box Attack

Efficient Maximal Coding Rate Reduction by Variational Forms

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos

LISA: Learning Implicit Shape and Appearance of Hands

Exemplar-based Pattern Synthesis with Implicit Periodic Field Network

Degradation-agnostic Correspondence from Resolution-asymmetric Stereo

RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion

Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature

DST: Dynamic Substitute Training for Data-free Black-box Attack

Progressive Minimal Path Method with Embedded CNN

Online Convolutional Re-parameterization

SIMBAR: Single Image-Based Scene Relighting For Effective Data Augmentation For Automated Driving Vision Tasks

Rethinking Visual Geo-localization for Large-Scale Applications

IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images

SNUG: Self-Supervised Neural Dynamic Garments

Leveraging Equivariant Features for Absolute Pose Regression

MonoTrack: Shuttle trajectory reconstruction from monocular badminton video

Revisiting Near/Remote Sensing with Geospatial Attention

Temporal Alignment Networks for Long-term Video

"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping

Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring

Learning to Anticipate Future with Dynamic Context Removal

SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference

Gait Recognition in the Wild with Dense 3D Representations and A Benchmark

MixFormer: Mixing Features across Windows and Dimensions

RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

Adversarial Robustness through the Lens of Convolutional Filters

Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks

Total Variation Optimization Layers for Computer Vision

Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction

Class-Incremental Learning with Strong Pre-trained Models

AutoRF: Learning 3D Object Radiance Fields from Single View Observations

Deep Visual Geo-localization Benchmark

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

UIGR: Unified Interactive Garment Retrieval

AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis

Hierarchical Self-supervised Representation Learning for Movie Understanding

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

Multi-Scale Memory-Based Video Deblurring

Gravitationally Lensed Black Hole Emission Tomography

General Incremental Learning with Domain-aware Categorical Representations

Identifying Ambiguous Similarity Conditions via Semantic Matching

Does Robustness on ImageNet Transfer to Downstream Tasks?

Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection

CD²-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning

Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation

TorMentor: Deterministic dynamic-path, data augmentations with fractals

TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates

Single-Photon Structured Light

Pyramid Grafting Network for One-Stage High Resolution Saliency Detection

Structure-Aware Motion Transfer with Deformable Anchor Model

Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

NAN: Noise-Aware NeRFs for Burst-Denoising

Learning Pixel-Level Distinctions for Video Highlight Detection

Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention

DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation

FedCorr: Multi-Stage Federated Learning for Label Noise Correction

Adaptive Differential Filters for Fast and Communication-Efficient Federated Learning

The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization

Continual Predictive Learning from Videos

Few-shot Learning with Noisy Labels

Out-Of-Distribution Detection In Unsupervised Continual Learning

Generalizing Adversarial Explanations with Grad-CAM

Recognition of Freely Selected Keypoints on Human Limbs

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

Defensive Patches for Robust Recognition in the Physical World

COAP: Compositional Articulated Occupancy of People

What's in your hands? 3D Reconstruction of Generic Objects in Hands

GIFS: Neural Implicit Function for General Shape Representation

The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark

Semi-Supervised Training to Improve Player and Ball Detection in Soccer

Pyramidal Attention for Saliency Detection

OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data



How should the CVPR 2022 paper acceptance results be assessed?