
IterDiff

DOI:
10.60864/7tnd-mx88
Citation Author(s):
Submitted by:
anony anony
Last updated:
7 February 2025 - 12:09pm
Document Type:
Supplementary Materials
 

The rise of generative models has transformed image generation and editing, enabling high-quality, user-guided outputs. Iterative face editing, essential for applications such as virtual makeup and entertainment, allows users to refine images progressively. However, this process often leads to artifact accumulation, semantic inconsistency, and quality degradation over multiple edits, and existing methods that are effective for single-step modifications struggle with sequential edits. To maintain fidelity and consistency across multiple editing sessions, we propose IterDiff, a training-free framework built on diffusion models. Its core component, Training-Free Feature Preservation (TF²P), tackles these challenges by storing and retrieving key-value (KV) pairs from the self-attention layers. We further improve efficiency and practicality with an Efficient CLIP-guided Memory Bank (ECMB). Experiments on the proposed benchmark show that IterDiff excels in prompt alignment, content consistency, and image quality, providing a robust solution for iterative facial attribute editing.
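To make the TF²P/ECMB idea concrete, the sketch below illustrates one plausible reading of the abstract: self-attention key-value pairs from earlier edits are cached together with a CLIP embedding of the edited image, and a later edit retrieves the best-matching entry by cosine similarity for injection into the corresponding attention layers. The class name, tensor shapes, and layer names are illustrative assumptions, not the paper's actual API.

```python
import torch
import torch.nn.functional as F


class KVMemoryBank:
    """Minimal sketch of a CLIP-guided key-value memory bank (assumed design).

    Stores self-attention K/V tensors from earlier edits together with a
    CLIP image embedding of the edited result, and retrieves the entry whose
    embedding is most similar to the current image.
    """

    def __init__(self, max_entries: int = 8):
        self.max_entries = max_entries
        self.entries = []  # list of (clip_embedding, {layer_name: (K, V)})

    def store(self, clip_emb: torch.Tensor, kv_by_layer: dict) -> None:
        # Keep the bank small for efficiency: drop the oldest entry when full.
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)
        self.entries.append((F.normalize(clip_emb, dim=-1), kv_by_layer))

    def retrieve(self, clip_emb: torch.Tensor) -> dict:
        # Return the stored K/V pairs whose CLIP embedding best matches
        # the current image (cosine similarity over normalized embeddings).
        if not self.entries:
            return {}
        query = F.normalize(clip_emb, dim=-1)
        sims = torch.stack([(query * emb).sum() for emb, _ in self.entries])
        return self.entries[int(sims.argmax())][1]


# Illustrative usage with dummy tensors standing in for real self-attention
# keys/values and CLIP embeddings (shapes are placeholders).
bank = KVMemoryBank()
kv_edit_1 = {"up_block.attn1": (torch.randn(1, 64, 320), torch.randn(1, 64, 320))}
bank.store(clip_emb=torch.randn(512), kv_by_layer=kv_edit_1)

# During a later edit, fetch the best-matching K/V pairs; a full pipeline
# would inject them into the corresponding self-attention layers.
retrieved = bank.retrieve(clip_emb=torch.randn(512))
```

In this reading, capping the bank size and indexing entries by CLIP similarity is what would keep retrieval cheap across long editing sessions; the actual storage, eviction, and injection strategy used by IterDiff is described in the accompanying materials.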
