
Duck-BilledPlatypus/CVPR2022-Papers-with-Code

 
 


CVPR 2022 Papers and Open-Source Projects (Papers with Code)

A collection of CVPR 2022 papers and their open-source projects (papers with code)!

List of CVPR 2022 accepted paper IDs: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view

Note 1: Everyone is welcome to open an issue to share CVPR 2022 papers and open-source projects!

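Note 2: For papers from previous years' top computer vision conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision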

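If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, scan the QR code to join the 【CVer Academic Exchange Group】! Let's learn from each other and improve together~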

【CVPR 2022 Papers with Open-Source Code: Contents】

Backbone

A ConvNet for the 2020s

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

MPViT: Multi-Path Vision Transformer for Dense Prediction

Mobile-Former: Bridging MobileNet and Transformer

MetaFormer is Actually What You Need for Vision

CLIP

HairCLIP: Design Your Hair by Text and Reference Image

PointCLIP: Point Cloud Understanding by CLIP

Blended Diffusion for Text-driven Editing of Natural Images

GAN

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Style Transformer for Image Inversion and Editing

NAS

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Point-NeRF: Point-based Neural Radiance Fields

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Urban Radiance Fields

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

Visual Transformer

Backbone

MPViT: Multi-Path Vision Transformer for Dense Prediction

MetaFormer is Actually What You Need for Vision

Mobile-Former: Bridging MobileNet and Transformer

Application

Language-based Video Editing via Multi-Modal Multi-Level Transformer

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Embracing Single Stride 3D Object Detector with Sparse Transformer

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

GroupViT: Semantic Segmentation Emerges from Text Supervision

Restormer: Efficient Transformer for High-Resolution Image Restoration

Splicing ViT Features for Semantic Appearance Transfer

Self-supervised Video Transformer

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Accelerating DETR Convergence via Semantic-Aligned Matching

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Style Transformer for Image Inversion and Editing

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Mask Transfiner for High-Quality Instance Segmentation

Language as Queries for Referring Video Object Segmentation

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

AdaMixer: A Fast-Converging Query-Based Object Detector

Omni-DETR: Omni-Supervised Object Detection with Transformers

Vision-Language

Conditional Prompt Learning for Vision-Language Models

Bridging Video-text Retrieval with Multiple Choice Question

Self-supervised Learning

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Crafting Better Contrastive Views for Siamese Representation Learning

HCSC: Hierarchical Contrastive Selective Coding

Data Augmentation

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

AlignMix: Improving representation by interpolating aligned features

Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Accelerating DETR Convergence via Semantic-Aligned Matching

Localization Distillation for Dense Object Detection

Focal and Global Knowledge Distillation for Detectors

A Dual Weighting Label Assignment Scheme for Object Detection

AdaMixer: A Fast-Converging Query-Based Object Detector

Omni-DETR: Omni-Supervised Object Detection with Transformers

Visual Tracking

Correlation-Aware Deep Tracking

TCTrack: Temporal Contexts for Aerial Tracking

Multi-Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking

Semantic Segmentation

Weakly Supervised Semantic Segmentation

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Semi-Supervised Semantic Segmentation

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Unsupervised Semantic Segmentation

GroupViT: Semantic Segmentation Emerges from Text Supervision

Instance Segmentation

BoxeR: Box-Attention for 2D and 3D Transformers

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Mask Transfiner for High-Quality Instance Segmentation

Self-Supervised Instance Segmentation

FreeSOLO: Learning to Segment Objects without Annotations

Video Instance Segmentation

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Few-Shot Segmentation

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

Video Understanding

Self-supervised Video Transformer

Action Recognition

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Action Detection

End-to-End Semi-Supervised Learning for Video Action Detection

Image Editing

Style Transformer for Image Inversion and Editing

Blended Diffusion for Text-driven Editing of Natural Images

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Restormer: Efficient Transformer for High-Resolution Image Restoration

Super-Resolution

Image Super-Resolution

Learning the Degradation Distribution for Blind Image Super-Resolution

Video Super-Resolution

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

3D Point Cloud

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

A Unified Query-based Paradigm for Point Cloud Understanding

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

PointCLIP: Point Cloud Understanding by CLIP

3D Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

Embracing Single Stride 3D Object Detector with Sparse Transformer

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

3D Semantic Segmentation

Scribble-Supervised LiDAR Semantic Segmentation

3D Object Tracking

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion

3D Reconstruction

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Camouflaged Object Detection

Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

Depth Estimation

Monocular Depth Estimation

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Stereo Matching

ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching

Lane Detection

Rethinking Efficient Lane Detection via Curve Modeling

Image Inpainting

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

Crowd Counting

Leveraging Self-Supervision for Cross-Domain Crowd Counting

Medical Imaging

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

Scene Graph Generation

SGTR: End-to-end Scene Graph Generation with Transformer

Referring Video Object Segmentation

Language as Queries for Referring Video Object Segmentation

Style Transfer

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

Adversarial Examples

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

Weakly Supervised Object Localization

Weakly Supervised Object Localization as Domain Adaption

Hyperspectral Image Reconstruction

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Watermarking

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Datasets

It's About Time: Analog Clock Reading in the Wild

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Kubric: A scalable dataset generator

Scribble-Supervised LiDAR Semantic Segmentation

New Tasks

Language-based Video Editing via Multi-Modal Multi-Level Transformer

It's About Time: Analog Clock Reading in the Wild

Splicing ViT Features for Semantic Appearance Transfer

Others

Kubric: A scalable dataset generator

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Balanced MSE for Imbalanced Visual Regression
