Reward design in multi-agent systems using successor features and multi-information source Bayesian optimization

Park, Kyeonghyeon; Concha, David Molina; Lee, Hyun-Rok; Lee, Taesik; Lee, Chi-Guhn

doi:10.1007/s13042-025-02622-z

Citations: Web of Science 2; Scopus 1

Abstract

Coordinating self-interested agents in multi-agent systems to achieve system-level objectives presents significant challenges due to the inherent misalignment between individual and collective goals. Mechanism design offers a solution by employing a bi-level optimization framework, where a designer agent intervenes in the reward structures to incentivize desired behaviors among self-interested agents. However, a major obstacle in reward optimization lies in solving multi-agent reinforcement learning problems given a reward structure. This paper addresses this challenge by introducing a novel algorithm that leverages successor features (SFs) at both levels of the optimization. Specifically, SFs help reduce the number of design iterations at the upper level by using previously learned equilibria as biased information sources and accelerate equilibrium learning at the lower level by transferring equilibria from previously solved Markov games. This innovative approach leads to significant computational savings, making the process up to ten times faster compared to traditional methods.
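The transfer idea in the abstract rests on a standard property of successor features: when rewards are linear in features, r(s, a) = φ(s, a) · w, the value of a stored policy under a new reward is Q(s, a) = ψ(s, a) · w, so previously learned ψ tables can be re-scored without relearning. The following is a minimal single-agent sketch of that property only. The function names and toy arrays are hypothetical and are not taken from the paper, whose method operates on mean-field multi-agent equilibria.

```python
import numpy as np


def q_values(psi, w):
    """Score one stored policy under reward weights w.

    psi has shape (n_states, n_actions, n_features); the result has
    shape (n_states, n_actions) and equals psi . w per state-action pair.
    """
    return psi @ w


def gpi_action(psis, w, state):
    """Generalized policy improvement over a library of stored policies.

    For each action, take the best value any stored policy achieves
    under w, then act greedily on that upper envelope.
    """
    q = np.stack([q_values(psi, w) for psi in psis])  # (n_policies, S, A)
    return int(np.argmax(q[:, state, :].max(axis=0)))


# Toy library (hypothetical numbers): 2 policies, 1 state, 2 actions, 2 features.
psi_a = np.array([[[1.0, 0.0], [0.0, 1.0]]])
psi_b = np.array([[[0.5, 0.5], [0.2, 2.0]]])

# A new reward design is just a new weight vector; no relearning is needed.
w_new = np.array([1.0, 0.1])
action = gpi_action([psi_a, psi_b], w_new, state=0)
```

In this sketch, changing the reward design only changes `w_new`, so evaluating a candidate design costs a matrix product rather than a fresh reinforcement learning run.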

Keywords

Reward design; Multi-information source Bayesian optimization; Mean-field reinforcement learning; Transfer learning; Successor feature; Inefficiency

Publication date: 2025-04-18

Type: Article

Journal: International Journal of Machine Learning and Cybernetics

Volume: 16

Issue: 9

Pages: 6249-6270