Nov 2025
Abstract
Anomaly detection, or outlier detection, is the task of identifying rare instances or patterns in a dataset that deviate significantly from expected normal behaviour. Many methods have been proposed, but most assume that the training data are completely free of contamination. In practice, such data integrity is hard to guarantee. Existing anomaly detection methods typically treat the given data as a single normal class and learn features that represent it well, which makes them highly vulnerable to data contamination. This paper proposes a Normality-Calibrated Autoencoder (NCAE), which boosts anomaly detection performance on contaminated datasets without any prior information or explicit abnormal samples in the training phase. NCAE adversarially generates high-confidence normal samples from a low-entropy latent space and leverages them to identify abnormal samples in the training dataset. NCAE is trained to minimise reconstruction error on uncontaminated samples and maximise it on contaminated samples. Experimental results demonstrate that our method outperforms shallow, hybrid, and deep methods for unsupervised anomaly detection and achieves performance comparable to semi-supervised methods that use labelled anomaly samples during training.
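At a high level, the training objective pairs a standard reconstruction term for samples deemed normal with an opposing term for samples flagged as contaminated. The PyTorch sketch below illustrates this calibration under stated assumptions: the boolean contamination flag, the hinge with a fixed margin, and the name `ncae_loss` are illustrative choices, not the paper's exact formulation.

```python
import torch


def ncae_loss(x, x_hat, contaminated, margin=1.0):
    """Calibrated reconstruction objective (illustrative sketch).

    Minimises per-sample reconstruction error on samples flagged as
    normal and pushes the error of flagged-contaminated samples above
    a margin. The flag, hinge form, and `margin` are assumptions.
    """
    # Per-sample mean squared reconstruction error.
    err = ((x - x_hat) ** 2).flatten(1).mean(dim=1)
    normal = ~contaminated
    loss = err.new_zeros(())
    if normal.any():
        # Pull reconstruction error down on normal samples.
        loss = loss + err[normal].mean()
    if contaminated.any():
        # Hinge keeps the "maximise error" term bounded for stability.
        loss = loss + torch.clamp(margin - err[contaminated], min=0).mean()
    return loss
```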
Keywords
unsupervised anomaly detection, normality calibration, autoencoder, data contamination, data pollution
Key Contributions
Introduces NCAE, the first autoencoder trained to be robust to contaminated (noisy) training data.
Aligns the latent space to a clean Gaussian prior and generates “high-confidence normal” samples for calibration (see the sketch after this list).
Joint adversarial–reconstruction learning separates normal from contaminated instances without labels.
Outperforms shallow, hybrid, and deep anomaly detection methods across various contamination ratios.
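As a rough illustration of the second contribution, high-confidence normal samples can be obtained by decoding latent codes drawn close to the mode of the Gaussian prior. The decoder interface, the `scale` parameter, and the function name in this sketch are assumptions for illustration.

```python
import torch


@torch.no_grad()
def sample_confident_normals(decoder, n_samples, latent_dim, scale=0.5):
    """Decode low-entropy latent codes into "high-confidence normal" samples.

    Drawing codes with a small standard deviation (`scale` < 1) keeps
    them near the mode of the N(0, I) prior the latent space is aligned
    to, so the decoded samples should lie in the high-density normal
    region. `scale` and the decoder signature are illustrative.
    """
    z = scale * torch.randn(n_samples, latent_dim)
    return decoder(z)
```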
