Awesome-ECCV2024-AIGC

A Collection of Papers and Codes for ECCV2024 AIGC

A collection of AIGC-related papers and code from this year's ECCV (ECCV 2024), organized as follows.

Please feel free to star, fork or PR if helpful~

Related Collections

Please credit this repository when referencing or reposting.

ECCV 2024 official website: https://eccv.ecva.net/

ECCV accepted paper list:

Full ECCV paper archive: https://www.ecva.net/papers.php

Conference dates: September 29 – October 4, 2024

Paper acceptance announcement: 2024

Contents

1. Image Generation / Image Synthesis

∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions

Accelerating Diffusion Sampling with Optimized Time Steps

Accelerating Image Generation with Sub-path Linear Approximation Model

AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Assessing Sample Quality via the Latent Space of Generative Models

AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

A Watermark-Conditioned Diffusion Model for IP Protection

Beta-Tuned Timestep Diffusion Model

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

Block-removed Knowledge-distilled Stable Diffusion

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation

ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement

ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

Data Augmentation for Saliency Prediction via Latent Diffusion

DataDream: Few-shot Guided Dataset Generation

DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

DiffiT: Diffusion Vision Transformers for Image Generation

Distilling Diffusion Models into Conditional GANs

Efficient Training with Denoised Neural Weights

Energy-Calibrated VAE with Test Time Free Lunch

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

Improving Virtual Try-On with Garment-focused Diffusion Models

Inserting Anybody in Diffusion Models via Celeb Basis

Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models

Large-scale Reinforcement Learning for Diffusion Models

Latent Guard: a Safety Framework for Text-to-image Generation

LayoutFlow: Flow Matching for Layout Generation

Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation

LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

Memory-Efficient Fine-Tuning for Quantized Diffusion Model

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Navigating Text-to-Image Generative Bias across Indic Languages

NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation

Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

One-Shot Diffusion Mimicker for Handwritten Text Generation

PartCraft: Crafting Creative Objects by Parts

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing

PosterLlama: Bridging Design Ability of Language Model to Contents-Aware Layout Generation

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

ReGround: Improving Textual and Spatial Grounding at No Cost

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion

Self-Guided Generation of Minority Samples Using Diffusion Models

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations

Timestep-Aware Correction for Quantized Diffusion Models

Towards Reliable Advertising Image Generation Using Human Feedback

Training-free Composite Scene Generation for Layout-to-Image Synthesis

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Unmasking Bias in Diffusion Model Training

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Zero-shot Text-guided Infinite Image Synthesis with LLM guidance

ZigMa: A DiT-Style Mamba-based Diffusion Model

2. Image Editing

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

COMPOSE: Comprehensive Portrait Shadow Editing

CQS: CBAM and Query-Selection Diffusion Model for text-driven Content-aware Image Style Transfer

Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation

DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation

Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning

Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods

Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

GroupDiff: Diffusion-based Group Portrait Editing

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser

InstructGIE: Towards Generalizable Image Editing

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling

MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Real-time 3D-aware Portrait Editing from a Single Image

RegionDrag: Fast Region-Based Image Editing with Diffusion Models

Robust-Wide: Robust Watermarking against Instruction-driven Image Editing

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

StableDrag: Stable Dragging for Point-based Image Editing

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

TinyBeauty: Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Tuning-Free Image Customization with Image and Text Guidance

TurboEdit: Instant text-based image editing

Watch Your Steps: Local Image and Scene Editing by Text Instructions

3. Video Generation / Video Synthesis

Animate Your Motion: Turning Still Images into Dynamic Videos

Audio-Synchronized Visual Animation

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Dyadic Interaction Modeling for Social Behavior Generation

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

Kinetic Typography Diffusion Model

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

MoVideo: Motion-Aware Video Generation with Diffusion Models

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

VEnhancer: Generative Space-Time Enhancement for Video Generation

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

4. Video Editing

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

DNI: Dilutional Noise Initialization for Diffusion Video Editing

DragAnything: Motion Control for Anything using Entity Representation

DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion

Fast Sprite Decomposition from Animated Graphics

TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

Towards Model-Agnostic Dataset Condensation by Heterogeneous Models

5. 3D Generation / 3D Synthesis

BAMM: Bidirectional Autoregressive Motion Model

Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models

CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images

Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation

DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose

DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

Expressive Whole-Body 3D Gaussian Avatar

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

GenRC: Generative 3D Room Completion from Sparse Image Collections

GVGEN: Text-to-3D Generation with Volumetric Representation

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

iHuman: Instant Animatable Digital Humans From Monocular Videos

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

Length-Aware Motion Synthesis via Latent Diffusion

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation

PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance

ParCo: Part-Coordinating Text-to-Motion Synthesis

Pyramid Diffusion for Fine 3D Large Scene Generation

Realistic Human Motion Generation with Cross-Diffusion Models

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

ScanTalk: 3D Talking Heads from Unregistered Scans

SceneTeller: Language-to-3D Scene Generation

StructLDM: Structured Latent Diffusion for 3D Human Generation

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models

VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation

6. 3D Editing

3DEgo: 3D Editing on the Go!

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

Free-Editor: Zero-shot Text-driven 3D Scene Editing

GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

LatentEditor: Text Driven Local Editing of 3D Scenes

RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

SMooDi: Stylized Motion Diffusion Model

StyleCity: Large-Scale 3D Urban Scenes Stylization

Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing

Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation

View-Consistent 3D Editing with Gaussian Splatting

Watch Your Steps: Local Image and Scene Editing by Text Instructions

7. Multi-Modal Large Language Models

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization

Adversarial Prompt Tuning for Vision-Language Models

A Large Multimodal Model Perceiving Any Aspect Ratio and High-Resolution Images

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

API: Attention Prompting on Image for Large Vision-Language Models

Bi-directional Contextual Attention for 3D Dense Captioning

BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

ControlCap: Controllable Region-level Captioning

Controllable Navigation Instruction Generation with Chain of Thought Prompting

DreamLIP: Language-Image Pre-training with Long Captions

DriveLM: Driving with Graph Visual Question Answering

Elysium: Exploring Object-level Perception in Videos via MLLM

Emergent Visual-Semantic Hierarchies in Image-Text Representations

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

Groma: Grounded Multimodal Assistant

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

InternVideo: Video Foundation Models for Multimodal Understanding

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

Learning Video Context as Interleaved Multimodal Sequences

LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

LLMGA: Multimodal Large Language Model based Generation Assistant

Long-CLIP: Unlocking the Long-Text Capability of CLIP

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Merlin: Empowering Multimodal LLMs with Foresight Minds

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

Parrot Captions Teach CLIP to Spot Text

Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

PointLLM: Empowering Large Language Models to Understand Point Clouds

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models

Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Soft Prompt Generation

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

ST-LLM: Large Language Models Are Effective Temporal Learners

Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits

TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

UMBRAE: Unified Multimodal Brain Decoding

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

8. Others

Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution

Continuously updated~