Adversary's adversary can be a good friend: Revisiting labels of low-margin examples to reconcile accuracy and robustness

Abstract

Adversarial training (AT) is widely recognized as one of the most effective methods for improving the robustness of deep learning models. However, AT suffers from a fundamental trade-off between robustness and generalization, which has motivated various mitigation strategies. Among them, margin-based AT approaches employ loss reweighting, reflecting the idea that more critical examples should contribute larger gradients. Yet, these methods are limited by their exclusive focus on gradient magnitude. In this work, we identify that prior approaches overlook the role of gradient direction, and we provide both theoretical and empirical evidence to support this claim. We argue that both the magnitude and direction of gradients should be considered in adversarial training, and propose a novel label design framework, ADA-Lab (ADversary's Adversary for Label adjustment), which incorporates both aspects to refine supervision for low-margin examples. Specifically, we introduce the concept of the adversary's adversary to explicitly encode directional information aligned with gradient descent. Our theoretical analysis shows that labels designed using this concept better approximate the true label distribution, especially for low-margin examples (i.e., more important examples). Furthermore, by estimating example importance based on the distance to the decision boundary, our method adaptively controls the degree of label interpolation. Our key novelty lies in introducing direction-aware label refinement based on the adversary's adversary, a concept that explicitly leverages the gradient descent direction of adversarial inputs to correct label mismatch. This unified design integrates gradient magnitude-based importance weighting and label distribution correction, resulting in improved robustness and generalization, as demonstrated by extensive theoretical and empirical results.
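
The margin-adaptive label interpolation mentioned in the abstract can be illustrated with a minimal sketch. This is not the ADA-Lab algorithm itself, whose exact formulation is not given in this record. The linear weighting schedule, the `max_weight` parameter, and all function names are assumptions made for illustration, and `aux_probs` is a hypothetical placeholder for whatever distribution the method derives from the adversary's adversary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def margin(probs, true_idx):
    """Probability margin: p_true minus the largest other-class probability.

    A small (or negative) margin means the example sits close to, or on the
    wrong side of, the decision boundary.
    """
    best_other = max(p for i, p in enumerate(probs) if i != true_idx)
    return probs[true_idx] - best_other


def interpolation_weight(m, max_weight=0.5):
    """Assumed schedule: the weight grows linearly as the margin shrinks.

    Maps a margin in [-1, 1] to a weight in [0, max_weight].
    """
    return max_weight * (1.0 - m) / 2.0


def refine_label(true_idx, aux_probs, adv_probs, max_weight=0.5):
    """Blend the one-hot label with an auxiliary distribution.

    aux_probs is a hypothetical stand-in for a direction-aware distribution;
    adv_probs are the model's probabilities on the adversarial input and are
    used only to measure the margin.
    """
    one_hot = [1.0 if i == true_idx else 0.0 for i in range(len(adv_probs))]
    lam = interpolation_weight(margin(adv_probs, true_idx), max_weight)
    return [(1.0 - lam) * h + lam * q for h, q in zip(one_hot, aux_probs)]


# Illustrative inputs only; these numbers are made up.
aux = softmax([0.5, 1.5, 1.0])
high_margin_label = refine_label(0, aux, [0.90, 0.05, 0.05])
low_margin_label = refine_label(0, aux, [0.40, 0.35, 0.25])
```

Under these assumptions, the low-margin example receives a softer label than the high-margin one, which mirrors the idea that examples near the decision boundary get more label adjustment.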

Keywords

Adversarial robustness; Adversarial training; Noisy label learning; Machine learning
Authors
Kim, Seongmin; Jung, Yoojin; Song, Byung Cheol
DOI
10.1016/j.neucom.2026.132664
Publication date
2026-04
Type
Article
Journal
Neurocomputing
Vol. 672