Adversary's adversary can be a good friend: Revisiting labels of low-margin examples to reconcile accuracy and robustness

Kim, Seongmin; Jung, Yoojin; Song, Byung Cheol

doi:10.1016/j.neucom.2026.132664

Citations: Web of Science 0; Scopus 0

Abstract

Adversarial training (AT) is widely recognized as one of the most effective methods for improving the robustness of deep learning models. However, AT suffers from a fundamental trade-off between robustness and generalization, which has motivated various mitigation strategies. Among them, margin-based AT approaches employ loss reweighting, reflecting the idea that more critical examples should contribute larger gradients. Yet, these methods are limited by their exclusive focus on gradient magnitude. In this work, we identify that prior approaches overlook the role of gradient direction, and we provide both theoretical and empirical evidence to support this claim. We argue that both the magnitude and direction of gradients should be considered in adversarial training, and propose a novel label design framework, ADA-Lab (ADversary's Adversary for Label adjustment), which incorporates both aspects to refine supervision for low-margin examples. Specifically, we introduce the concept of the adversary's adversary to explicitly encode directional information aligned with gradient descent. Our theoretical analysis shows that labels designed using this concept better approximate the true label distribution, especially for low-margin examples (i.e., more important examples). Furthermore, by estimating example importance based on the distance to the decision boundary, our method adaptively controls the degree of label interpolation. Our key novelty lies in introducing direction-aware label refinement based on the adversary's adversary, a concept that explicitly leverages the gradient descent direction of adversarial inputs to correct label mismatch. This unified design integrates gradient magnitude-based importance weighting and label distribution correction, resulting in improved robustness and generalization, as demonstrated by extensive theoretical and empirical results.

Keywords

Adversarial robustness; Adversarial training; Noisy label learning; Machine learning

Title: Adversary's adversary can be a good friend: Revisiting labels of low-margin examples to reconcile accuracy and robustness

Authors: Kim, Seongmin; Jung, Yoojin; Song, Byung Cheol

DOI: 10.1016/j.neucom.2026.132664

Publication date: 2026-04

Type: Article

Journal: Neurocomputing

Volume: 672