Sep 2025
Abstract
Recent face-swapping methods excel under controlled conditions but often fail when presented with extreme facial poses. Diffusion-based approaches may be able to overcome these issues, but they still incur significant computational costs. This paper introduces MagicMask, a novel face-swapping framework that robustly handles a wide range of poses in real time by fusing visual and geometric information. Our method incorporates explicit, identity-adapted geometric cues into the latent feature space via a multi-head attention mechanism. It also employs an Adversarial Facial Silhouette Alignment (AFSA) loss to preserve detailed facial boundaries that are adapted to the source identity. Comprehensive experiments on multiple benchmarks demonstrate that MagicMask is competitive with state-of-the-art methods under standard conditions and significantly outperforms them in extreme pose scenarios.
Keywords
Face identity swap, face swap, pose robustness, generative adversarial network, transformer
Key Contributions
Dual visual–geometric representation for extreme pose robustness
ARIE block for stable, fine-grained identity embedding
Cross-attention structure alignment via landmarks & depth
AFSA loss for sharper silhouettes & boundary consistency
Real-time performance (~36 FPS) with SOTA accuracy
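To make the fusion of visual and geometric information more concrete, the following is a minimal NumPy sketch of multi-head cross-attention in which visual latent tokens query a set of geometric tokens. It is an illustration under stated assumptions, not the MagicMask implementation: the function name `cross_attention`, the residual connection, the token counts, the choice of 68 landmarks each described by (x, y, depth), and the randomly initialised projection matrices are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(visual, geometric, w_q, w_k, w_v, w_o, num_heads):
    """Fuse geometric cues into visual latent tokens (illustrative only).

    visual:    (n_v, d)   latent feature tokens, used as queries
    geometric: (n_g, d_g) geometric tokens, used as keys and values
    w_q, w_o:  (d, d)     query and output projections (assumed shapes)
    w_k, w_v:  (d_g, d)   key and value projections (assumed shapes)
    """
    n_v, d = visual.shape
    n_g = geometric.shape[0]
    head_dim = d // num_heads

    def split_heads(x, n):
        # (n, d) -> (num_heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(visual @ w_q, n_v)
    k = split_heads(geometric @ w_k, n_g)
    v = split_heads(geometric @ w_v, n_g)

    # Scaled dot-product attention per head: (num_heads, n_v, n_g)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)

    # Merge heads back to (n_v, d), then project.
    fused = (weights @ v).transpose(1, 0, 2).reshape(n_v, d)

    # Residual connection is an assumption of this sketch.
    return visual + fused @ w_o, weights


rng = np.random.default_rng(0)
d, d_g, heads = 64, 3, 4

visual = rng.standard_normal((16 * 16, d))    # hypothetical 16x16 latent map
geometric = rng.standard_normal((68, d_g))    # hypothetical (x, y, depth) landmarks

w_q = rng.standard_normal((d, d)) * 0.1
w_k = rng.standard_normal((d_g, d)) * 0.1
w_v = rng.standard_normal((d_g, d)) * 0.1
w_o = rng.standard_normal((d, d)) * 0.1

out, weights = cross_attention(visual, geometric, w_q, w_k, w_v, w_o, heads)
print(out.shape, weights.shape)
```

In this sketch each visual token receives a weighted mixture of the geometric tokens, so the output keeps the shape of the visual latent while carrying structural information. A trained model would learn the projection matrices rather than sample them at random.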
