
{{Expand language|1=en|time=2022-04-25T00:38:33+00:00}}
A '''decentralized partially observable Markov decision process''' ('''Dec-POMDP''')<ref>{{Cite journal|last1=Bernstein|first1=Daniel S.|last2=Givan|first2=Robert|last3=Immerman|first3=Neil|last4=Zilberstein|first4=Shlomo|date=November 2002|title=The Complexity of Decentralized Control of Markov Decision Processes|url=https://archive.org/details/sim_mathematics-of-operations-research_2002-11_27_4/page/819|journal=Math. Oper. Res.|volume=27|issue=4|pages=819–840|doi=10.1287/moor.27.4.819.297|issn=0364-765X|arxiv=1301.3836|s2cid=1195261}}</ref><ref>{{Cite book|title=A Concise Introduction to Decentralized POMDPs {{!}} SpringerLink|last1=Oliehoek|first1=Frans A.|last2=Amato|first2=Christopher|language=en-gb|doi=10.1007/978-3-319-28929-8|series=SpringerBriefs in Intelligent Systems|year=2016|isbn=978-3-319-28927-4|s2cid=3263887|url=http://www.fransoliehoek.net/docs/OliehoekAmato16book.pdf|access-date=2022-04-24|archive-date=2021-09-16|archive-url=https://web.archive.org/web/20210916223214/https://www.fransoliehoek.net/docs/OliehoekAmato16book.pdf|dead-url=no}}</ref> is a model for coordination and [[决策|decision-making]] among multiple agents. It is a [[概率模型|probabilistic model]] that can account for uncertainty in outcomes, sensors, and communication.

The model generalizes the [[马尔可夫决策过程|Markov decision process]] (MDP) and the [[部分可觀察馬可夫決策過程|partially observable Markov decision process]] (POMDP) to settings with multiple decentralized agents.<ref>{{Cite book|last1=Oliehoek|first1=Frans A.|url=https://books.google.com/books?id=FZRPDAAAQBAJ&q=Decentralized+partially+observable+Markov+decision+process|title=A Concise Introduction to Decentralized POMDPs|last2=Amato|first2=Christopher|date=2016-06-03|publisher=Springer|isbn=978-3-319-28929-8|language=en|access-date=2022-04-24|archive-date=2022-04-24|archive-url=https://web.archive.org/web/20220424091740/https://books.google.com/books?id=FZRPDAAAQBAJ&q=Decentralized%20partially%20observable%20Markov%20decision%20process|dead-url=no}}</ref>
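==Definition==
=== Formal definition ===
A Dec-POMDP is a [[多元组|7-tuple]] <math>(S,\{A_i\},T,R,\{\Omega_i\},O,\gamma)</math>, where:
* <math>S</math> is the set of states,
* <math>A_i</math> is the set of actions for agent ''i'', with <math>A=\times_i A_i</math> the set of joint actions,
* <math>T</math> is the set of conditional transition probabilities between states, <math>T(s,a,s')=P(s'\mid s,a)</math>,
* <math>R: S \times A \to \mathbb{R}</math> is the reward function,
* <math>\Omega_i</math> is the set of observations for agent ''i'', with <math>\Omega=\times_i \Omega_i</math> the set of joint observations,
* <math>O</math> is the set of conditional observation probabilities, <math>O(s',a,o)=P(o\mid s',a)</math>,
* <math>\gamma \in [0, 1]</math> is the discount factor.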
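
The seven components listed above can be written down directly as a data structure. The following is a minimal illustrative sketch in Python, not a reference implementation: the class name <code>DecPOMDP</code>, the helper <code>build_toy</code>, and the two-agent toy problem (its states, actions, observations, and the 0.8 observation accuracy) are all hypothetical and chosen only to show how <math>S</math>, <math>A_i</math>, <math>T</math>, <math>R</math>, <math>\Omega_i</math>, <math>O</math>, and <math>\gamma</math> fit together. It assumes finite sets and stores <math>T</math> and <math>O</math> as nested dictionaries.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[str, ...]
JointObs = Tuple[str, ...]


@dataclass
class DecPOMDP:
    """Hypothetical container for the 7-tuple (S, {A_i}, T, R, {Omega_i}, O, gamma)."""
    states: List[str]                                        # S
    actions: List[List[str]]                                 # A_i, one list per agent
    T: Dict[Tuple[str, JointAction], Dict[str, float]]       # T[(s, a)][s'] = P(s' | s, a)
    R: Callable[[str, JointAction], float]                   # R(s, a)
    observations: List[List[str]]                            # Omega_i, one list per agent
    O: Dict[Tuple[str, JointAction], Dict[JointObs, float]]  # O[(s', a)][o] = P(o | s', a)
    gamma: float                                             # discount factor in [0, 1]

    def joint_actions(self) -> List[JointAction]:
        # A is the Cartesian product of the per-agent action sets.
        return list(itertools.product(*self.actions))

    def joint_observations(self) -> List[JointObs]:
        # Omega is the Cartesian product of the per-agent observation sets.
        return list(itertools.product(*self.observations))

    def validate(self, tol: float = 1e-9) -> None:
        # Check gamma's range and that every row of T and O sums to 1.
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        for s in self.states:
            for a in self.joint_actions():
                for name, table in (("T", self.T), ("O", self.O)):
                    total = sum(table[(s, a)].values())
                    if abs(total - 1.0) > tol:
                        raise ValueError(f"{name}[{(s, a)}] sums to {total}")

    def step(self, s: str, a: JointAction, rng: random.Random):
        # Sample s' from T(s, a, .), then a joint observation from O(s', a, .).
        reward = self.R(s, a)
        nxt = self.T[(s, a)]
        s_next = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        obs = self.O[(s_next, a)]
        o = rng.choices(list(obs), weights=list(obs.values()))[0]
        return s_next, o, reward


def build_toy() -> DecPOMDP:
    """Made-up two-agent problem: the state flips only if both agents 'switch'."""
    states = ["left", "right"]
    actions = [["stay", "switch"], ["stay", "switch"]]
    observations = [["see-left", "see-right"], ["see-left", "see-right"]]
    flip = {"left": "right", "right": "left"}
    T, O = {}, {}
    for s in states:
        for a in itertools.product(*actions):
            target = flip[s] if a == ("switch", "switch") else s
            T[(s, a)] = {target: 1.0}
            O[(s, a)] = {}
            for o in itertools.product(*observations):
                p = 1.0
                for o_i in o:
                    # Each agent independently sees the true state with probability 0.8.
                    p *= 0.8 if o_i == "see-" + s else 0.2
                O[(s, a)][o] = p

    def R(s: str, a: JointAction) -> float:
        # Shared team reward: 1 when resting in the "right" state, else 0.
        return 1.0 if s == "right" and a == ("stay", "stay") else 0.0

    return DecPOMDP(states, actions, T, R, observations, O, gamma=0.95)


if __name__ == "__main__":
    model = build_toy()
    model.validate()
    print(model.step("left", ("switch", "switch"), random.Random(0)))
```

In this sketch each agent would see only its own component of the sampled joint observation, which is what makes the model decentralized; the single reward value is shared by the whole team.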

==References==
{{reflist}}

[[Category:马尔可夫过程]]