IterDiff

- DOI: 10.60864/7tnd-mx88
- Document Type: Supplementary Materials
The rise of generative models has transformed image generation and editing, enabling high-quality, user-guided outputs. Iterative face editing, essential for applications such as virtual makeup and entertainment, allows users to refine images progressively. However, this process often leads to artifact accumulation, semantic inconsistency, and quality degradation over multiple edits: existing methods, while effective for single-step modifications, struggle with sequential edits. To robustly maintain fidelity and consistency across multiple editing sessions, we propose \textit{IterDiff}, a training-free framework built on diffusion models that tackles these challenges with a novel Training-Free Feature Preservation ($\text{TF}^2\text{P}$) approach, which stores and retrieves key-value (KV) pairs from self-attention layers. We further improve efficiency and practicality with an Efficient CLIP-guided Memory Bank (ECMB). Experiments on the proposed benchmark show that IterDiff excels in prompt alignment, content consistency, and image quality, providing a robust solution for iterative facial attribute editing.
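To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how self-attention KV pairs can be cached on one edit and re-injected on the next, so a new edit can still attend to features of the previously accepted result. The class and parameter names (`KVPreservingSelfAttention`, `kv_bank`, `mode`) are illustrative assumptions; the actual $\text{TF}^2\text{P}$ and ECMB components may store, select, and fuse features differently.

```python
# Sketch of KV preservation in a self-attention layer (PyTorch).
# Assumption: a standard multi-head self-attention block inside a diffusion UNet.
import torch
import torch.nn.functional as F
from torch import nn


class KVPreservingSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)
        # Hypothetical memory bank: maps a session key to cached (K, V) tensors.
        self.kv_bank: dict[str, tuple[torch.Tensor, torch.Tensor]] = {}

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        return x.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor, mode: str = "none", bank_key: str = "prev_edit"):
        q = self._split_heads(self.to_q(x))
        k = self._split_heads(self.to_k(x))
        v = self._split_heads(self.to_v(x))

        if mode == "store":
            # After an accepted edit, cache its KV pairs for later sessions.
            self.kv_bank[bank_key] = (k.detach(), v.detach())
        elif mode == "retrieve" and bank_key in self.kv_bank:
            # During a new edit, let queries also attend to the stored features.
            k_mem, v_mem = self.kv_bank[bank_key]
            k = torch.cat([k, k_mem], dim=2)
            v = torch.cat([v, v_mem], dim=2)

        out = F.scaled_dot_product_attention(q, k, v)
        b, h, n, d = out.shape
        return self.to_out(out.transpose(1, 2).reshape(b, n, h * d))
```

In an iterative-editing loop, one would run the denoising pass with `mode="store"` once an edit is accepted, and with `mode="retrieve"` while generating the next edit; a CLIP-guided memory bank could then decide which cached entries are relevant enough to retrieve, keeping the memory footprint bounded.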