
A Survey of Communication-Based Cooperative Multi-Agent Reinforcement Learning Algorithms

Multi-agent systems are widely used in many practical fields, including robotics, distributed control, and multiplayer games. Many complex tasks in these fields cannot be solved by predefined agent behaviors, and communication-based multi-agent reinforcement learning (MARL) is one of the effective approaches to these challenges. The field has two core research problems: 1) how to establish an effective multi-agent communication mechanism that improves the overall performance of the multi-agent system; and 2) in bandwidth-limited scenarios, how to design an efficient communication schedule that compresses the redundant information exchanged during communication. This survey first reviews the literature addressing these two core problems and highlights representative works, then discusses their application prospects in the aerospace field, and finally concludes.
Keywords: Reinforcement learning / Communication mechanism / Multi-agent system
Table 1 Summary of traditional communication methods

| Name | Year | Type | Key feature |
|---|---|---|---|
| CommNet[8] | 2016 | A | Introduces mean aggregation |
| DIAL[9] | 2016 | A | Introduces differentiable messages |
| IC3Net[10] | 2019 | B | Introduces a gating mechanism |
| TarMAC[15] | 2019 | C | Introduces an attention mechanism |
| VBC[11] | 2019 | B | Introduces an action-influence mechanism |
| DGN[16] | 2020 | C | Introduces graph convolutional networks |
| G2A[18] | 2020 | B/C | Introduces soft and hard attention mechanisms |
| TMC[12] | 2020 | B | Introduces a message-storage mechanism |
| SymbC[17] | 2020 | C | Introduces neural symbols |
| I2C[13] | 2020 | B | Introduces a causal-influence mechanism |
| MAGIC[19] | 2021 | B/C | Introduces graph attention networks |
| MAIC[20] | 2022 | B/C | Introduces representation learning |
| SMS[14] | 2022 | B | Introduces Shapley values |
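To make the mean-aggregation idea behind CommNet[8] concrete, the following is a minimal sketch of one communication round: each agent broadcasts its hidden state, receives the mean of the other agents' hidden states, and updates its own state from both. All shapes, weight matrices, and the single-round setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of CommNet-style mean aggregation.
# n_agents, hidden_dim, and the weight matrices W_h / W_c are hypothetical.
rng = np.random.default_rng(0)

n_agents, hidden_dim = 4, 8
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # transforms an agent's own hidden state
W_c = rng.normal(size=(hidden_dim, hidden_dim))  # transforms the aggregated message

h = rng.normal(size=(n_agents, hidden_dim))      # per-agent hidden states

# Each agent's incoming message is the mean of all OTHER agents' hidden states.
totals = h.sum(axis=0, keepdims=True)            # sum over all agents, shape (1, hidden_dim)
c = (totals - h) / (n_agents - 1)                # subtract self, average the rest

# One communication step: combine own state with the aggregated message.
h_next = np.tanh(h @ W_h + c @ W_c)
print(h_next.shape)  # → (4, 8)
```

Because the aggregation is a differentiable mean, gradients flow through the communication channel during training, which is what lets the communication protocol itself be learned end to end.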