查看“︁蒙特卡洛树搜索”︁的源代码

{{noteTA
|T=zh-hans:蒙特卡洛树搜索; zh-hant:蒙地卡羅樹搜尋;
|1=zh-hans:蒙特卡洛; zh-hant:蒙地卡羅;
|G1=IT
|G2=Math
|G3=Games
}}

'''蒙特卡洛树搜索'''（{{lang-en|Monte Carlo tree search}}；简称：'''MCTS'''）是一种用于某些决策过程的[[启发式搜索|启发式]][[搜索算法]]，最引人注目的是在游戏中的使用。一个主要例子是[[电脑围棋]]程序<ref>{{cite web|url=http://www.mcts.ai/?q=mcts|title=MCTS.ai: Everything Monte Carlo Tree Search|accessdate=2012-02-19|archive-date=2018-11-27|archive-url=https://web.archive.org/web/20181127064909/http://mcts.ai/?q=mcts|dead-url=no}}</ref>，它也用于其他[[棋盘游戏]]、即时电子游戏以及不确定性游戏。

== 历史 ==
基于随机抽样的[[蒙特卡洛方法]]可以追溯到20世纪40年代<ref>{{cite journal |last1=Nicholas |first1=Metropolis |last2=Stanislaw |first2=Ulam |title=The monte carlo method |url=https://archive.org/details/sim_journal-of-the-american-statistical-association_1949-09_44_247/page/335 |journal=Journal of the American statistical association |date=1949 |volume=44 |page=335-341}}</ref>。布鲁斯·艾布拉姆森（Bruce Abramson）在他1987年的博士论文中探索了这一想法，称它“展示出了准确、精密、易估、有效可计算以及域独立的特性。”<ref name=Abramson>{{cite book|last=Abramson|first=Bruce|title=The Expected-Outcome Model of Two-Player Games|publisher=Technical report, Department of Computer Science, Columbia University|year=1987|url=http://academiccommons.columbia.edu/download/fedora_content/download/ac:142327/CONTENT/CUCS-315-87.pdf|accessdate=23 December 2013|archive-date=2016-10-27|archive-url=https://web.archive.org/web/20161027040145/http://academiccommons.columbia.edu/download/fedora_content/download/ac:142327/CONTENT/CUCS-315-87.pdf|dead-url=no}}</ref>他深入试验了[[井字棋]]，然后试验了[[黑白棋]]和[[国际象棋]]的机器生成的评估函数。1992年，B·布鲁格曼（B. Brügmann）首次将其应用于对弈程序<ref name="Bruegmann">{{cite book|url=http://www.ideanest.com/vegos/MonteCarloGo.pdf|title=Monte Carlo Go|last=Brügmann|first=Bernd|publisher=Technical report, Department of Physics, Syracuse University|year=1993|access-date=2016-03-15|archive-date=2021-04-14|archive-url=https://web.archive.org/web/20210414125720/http://www.ideanest.com/vegos/MonteCarloGo.pdf|dead-url=no}}</ref>，但他的想法未获得重视。2006年堪称围棋领域蒙特卡洛革命的一年<ref>{{cite book|author=Rémi Coulom|chapter=The Monte-Carlo Revolution in Go|title=Japanese-French Frontiers of Science Symposium|year=2008|url=http://remi.coulom.free.fr/JFFoS/JFFoS.pdf|access-date=2016-03-15|archive-date=2015-02-18|archive-url=https://web.archive.org/web/20150218143518/http://remi.coulom.free.fr/JFFoS/JFFoS.pdf|dead-url=no}}</ref>，雷米·库洛姆（Remi Coulom）描述了蒙特卡洛方法在[[博弈树|游戏树]]搜索的应用并命名为蒙特卡洛树搜索<ref>{{cite book|author=Rémi Coulom|chapter=Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search|pages=72–83|others=H. Jaap van den Herik, Paolo Ciancarini, H. H. L. M. Donkers (eds.)|title=Computers and Games, 5th International Conference, CG 2006, Turin, Italy, May 29–31, 2006. Revised Papers|publisher=Springer|year=2007|isbn=978-3-540-75537-1|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.6817|doi=|access-date=2016-03-15|archive-date=2021-04-14|archive-url=https://web.archive.org/web/20210414124235/https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.6817|dead-url=no}}</ref>。列文特·科奇什（Levente Kocsis）和乔鲍·塞派什瓦里（Csaba Szepesvári）开发了UCT算法<ref name="Kocsis-Szepesvari"/>，西尔万·热利（Sylvain Gelly）等人在他们的程序MoGo中实现了UCT<ref name="Gelly-et-al">{{cite book|author=Sylvain Gelly, Yizao Wang, Rémi Munos, Olivier Teytaud|title=Modification of UCT with Patterns in Monte-Carlo Go|date=November 2006|publisher=Technical report, INRIA|url=http://hal.inria.fr/docs/00/11/72/66/PDF/MoGoReport.pdf|access-date=2016-03-15|archive-date=2014-07-30|archive-url=https://web.archive.org/web/20140730050628/http://hal.inria.fr/docs/00/11/72/66/PDF/MoGoReport.pdf|dead-url=no}}</ref>。2008年，MoGo在九路围棋中达到[[段级位制|段位]]水平<ref>{{cite journal|author=Chang-Shing Lee, Mei-Hui Wang, Guillaume Chaslot, Jean-Baptiste Hoock, Arpad Rimmel, Olivier Teytaud, Shang-Rong Tsai, Shun-Chin Hsu, Tzung-Pei Hong|title=The Computational Intelligence of MoGo Revealed in Taiwan’s Computer Go Tournaments|journal=IEEE Transactions on Computational Intelligence and AI in Games|pages=73–89|volume=1|issue=1|year=2009|url=http://hal.inria.fr/docs/00/36/97/86/PDF/TCIAIG-2008-0010_Accepted_.pdf|doi=10.1109/tciaig.2009.2018703|access-date=2016-03-15|archive-date=2013-09-26|archive-url=https://web.archive.org/web/20130926031115/http://hal.inria.fr/docs/00/36/97/86/PDF/TCIAIG-2008-0010_Accepted_.pdf|dead-url=no}}</ref>，Fuego程序开始在九路围棋中战胜实力强劲的业余棋手<ref>{{cite book|author=Markus Enzenberger, Martin Mūller|title=Fuego – An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search|publisher=Technical report, University of Alberta|year=2008|url=https://www.cs.ualberta.ca/system/files/tech_report/2009/TR09-08_0.pdf|deadurl=yes|archiveurl=https://web.archive.org/web/20120529220112/https://www.cs.ualberta.ca/system/files/tech_report/2009/TR09-08_0.pdf|archivedate=2012-05-29}}</ref>。2012年1月，Zen程序在19路围棋上以3：1击败二段棋手约翰·特朗普（John Tromp）<ref>{{cite web|url=http://dcook.org/gobet/|title=The Shodan Go Bet|accessdate=2012-05-02|archive-date=2021-04-14|archive-url=https://web.archive.org/web/20210414124239/http://dcook.org/gobet/|dead-url=no}}</ref>。
[[File:Computer-go-ratings-Chinese.png|frame|center|自2007年以来KGS围棋服务器上最优秀围棋程序的评级。自2006年以来，最优秀的程序全部使用蒙特卡洛树搜索。<ref>{{cite web|url=http://senseis.xmp.net/?KGSBotRatings|title=Sensei's Library: KGSBotRatings|accessdate=2012-05-03|archive-date=2021-05-06|archive-url=https://web.archive.org/web/20210506152255/https://senseis.xmp.net/?KGSBotRatings|dead-url=no}}</ref>]]

蒙特卡洛树搜索也被用于其他[[棋盘游戏]]程序，如[[六貫棋|六贯棋]]<ref>{{cite journal|author=Broderick Arneson, Ryan Hayward, Philip Henderson|title=MoHex Wins Hex Tournament|journal=ICGA Journal|volume=32|issue=2|pages=114–116|date=June 2009|url=http://webdocs.cs.ualberta.ca/~hayward/papers/rptPamplona.pdf|access-date=2016-03-15|archive-date=2021-04-16|archive-url=https://web.archive.org/web/20210416092530/http://webdocs.cs.ualberta.ca/~hayward/papers/rptPamplona.pdf|dead-url=no}}</ref>、[[三寶棋|三宝棋]]<ref>{{cite book|author=Timo Ewalds|title=Playing and Solving Havannah|publisher=Master's thesis, University of Alberta|year=2011|url=http://havannah.ewalds.ca/static/thesis.pdf|access-date=2016-03-15|archive-date=2016-03-04|archive-url=https://web.archive.org/web/20160304051450/http://havannah.ewalds.ca/static/thesis.pdf|dead-url=no}}</ref>、[[亞馬遜棋|亚马逊棋]]<ref>{{cite book|author=Richard J. Lorentz|chapter=Amazons Discover Monte-Carlo|pages=13–24|others=H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, Mark H. M. Winands (eds.)|title=Computers and Games, 6th International Conference, CG 2008, Beijing, China, September 29 – October 1, 2008. Proceedings|publisher=Springer|year=2008|isbn=978-3-540-87607-6}}</ref>和[[印度鬥獸棋|印度斗兽棋]]<ref>{{cite book|author=Tomáš Kozelek|title=Methods of MCTS and the game Arimaa|publisher=Master's thesis, Charles University in Prague|year=2009|url=http://arimaa.com/arimaa/papers/TomasKozelekThesis/mt.pdf|access-date=2016-03-15|archive-date=2021-04-16|archive-url=https://web.archive.org/web/20210416092715/http://arimaa.com/arimaa/papers/TomasKozelekThesis/mt.pdf|dead-url=no}}</ref>；即时电子游戏，如《{{Le|吃豆小姐|Ms. Pac-Man}}》<ref>{{cite journal|author=Xiaocong Gan, Yun Bao, Zhangang Han|title=Real-Time Search Method in Nondeterministic Game – Ms. Pac-Man|pages=209–222|journal=ICGA Journal|volume=34|issue=4|date=December 2011}}</ref><ref>{{cite journal|author=Tom Pepels, Mark H. M. Winands, Marc Lanctot|title=Real-Time Monte Carlo Tree Search in Ms Pac-Man|pages=245–257|journal=IEEE Transactions on Computational Intelligence and AI in Games|volume=6|issue=3|date=September 2014|url=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6731713|doi=10.1109/tciaig.2013.2291577|access-date=2016-03-15|archive-date=2015-09-09|archive-url=https://web.archive.org/web/20150909222510/http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6731713|dead-url=no}}</ref>、《{{Le|神鬼寓言:传奇|Fable Legends}}》<ref>{{cite web|url=http://events.nucl.ai/track/decisions/#tactical-planning-and-real-time-mcts-in-fable-legends|title=Tactical Planning and Real-time MCTS in Fable Legends|accessdate=2015-08-27|archive-url=https://web.archive.org/web/20150821022336/http://events.nucl.ai/track/decisions/#tactical-planning-and-real-time-mcts-in-fable-legends|archive-date=2015-08-21|dead-url=yes}}</ref>、《[[羅馬II：全軍破敵|罗马II：全面战争]]》<ref>{{cite web|url=http://aigamedev.com/open/coverage/mcts-rome-ii/|title=Monte-Carlo Tree Search in TOTAL WAR: ROME II's Campaign AI|accessdate=2014-08-13|archive-url=https://web.archive.org/web/20170313041719/http://aigamedev.com/open/coverage/mcts-rome-ii/|archive-date=2017-03-13|dead-url=yes}}</ref>；不确定性游戏，如[[斯卡特]]<ref>{{cite book|author=Michael Buro, Jeffrey Richard Long, Timothy Furtak, Nathan R. Sturtevant|chapter=Improving State Evaluation, Inference, and Search in Trick-Based Card Games|pages=1407–1413|others=Craig Boutilier (ed.)|title=IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11–17, 2009|year=2009|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.3077|doi=|access-date=2016-03-15|archive-date=2021-04-14|archive-url=https://web.archive.org/web/20210414124404/https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.3077|dead-url=no}}</ref>、[[撲克|扑克]]<ref>{{cite journal|author=Jonathan Rubin, Ian Watson|title=Computer poker: A review|journal=Artificial Intelligence|volume=175|issue=5–6|date=April 2011|doi=10.1016/j.artint.2010.12.005|url=http://www.cs.auckland.ac.nz/~jrub001/files/CPReviewPreprintAIJ.pdf|pages=958–987|deadurl=yes|archiveurl=https://web.archive.org/web/20131001185632/http://www.cs.auckland.ac.nz/~jrub001/files/CPReviewPreprintAIJ.pdf|archivedate=2013-10-01}}</ref>、[[万智牌]]<ref>{{cite book|author=C.D. Ward, P.I. Cowling|chapter=Monte Carlo Search Applied to Card Selection in Magic: The Gathering|title=CIG'09 Proceedings of the 5th international conference on Computational Intelligence and Games|publisher=IEEE Press|year=2009|url=http://scim.brad.ac.uk/staff/pdf/picowlin/CIG2009.pdf|deadurl=yes|archiveurl=https://web.archive.org/web/20160528074031/http://scim.brad.ac.uk/staff/pdf/picowlin/CIG2009.pdf|archivedate=2016-05-28}}</ref>、[[卡坦岛]]<ref>{{cite book|author=István Szita, Guillaume Chaslot, Pieter Spronck|chapter=Monte-Carlo Tree Search in Settlers of Catan|pages=21–32|others=H. Jaap van den Herik, Pieter Spronck (eds.)|title=Advances in Computer Games, 12th International Conference, ACG 2009, Pamplona, Spain, May 11–13, 2009. Revised Papers|publisher=Springer|year=2010|isbn=978-3-642-12992-6|url=http://ticc.uvt.nl/icga/acg12/proceedings/Contribution100.pdf|access-date=2016-03-15|archive-url=https://web.archive.org/web/20160304082042/http://ticc.uvt.nl/icga/acg12/proceedings/Contribution100.pdf|archive-date=2016-03-04|dead-url=yes}}</ref>。

== 原理 ==
蒙特卡洛树搜索的每个循环包括四个步骤：<ref name="chaslot2008">{{cite journal|author=G.M.J.B. Chaslot, M.H.M. Winands, J.W.H.M. Uiterwijk, H.J. van den Herik, B. Bouzy|title=Progressive Strategies for Monte-Carlo Tree Search|journal=New Mathematics and Natural Computation|volume=4|issue=3|pages=343–359|year=2008|url=https://dke.maastrichtuniversity.nl/m.winands/documents/pMCTS.pdf|doi=10.1142/s1793005708001094|access-date=2016-03-15|archive-date=2021-04-14|archive-url=https://web.archive.org/web/20210414151134/https://dke.maastrichtuniversity.nl/m.winands/documents/pMCTS.pdf|dead-url=no}}</ref>
* 选择（Selection）：从根節点{{math|''R''}}开始，连续向下选择子節点至叶子節点{{math|''L''}}。下文將给出一种选择子節点的方法，让[[遊戲樹|游戏树]]向最优的方向扩展，这是蒙特卡洛树搜索的精要所在。
* 扩展（Expansion）：除非任意一方的输赢使得游戏在L结束，否则创建一个或多个子節点并选取其中一个節点{{math|''C''}}。 
* 仿真（Simulation）：再从節点{{math|''C''}}开始，用随机策略进行游戏，又称为playout或者rollout。
*反向传播（Backpropagation）：使用随机游戏的结果，更新从{{math|''C''}}到{{math|''R''}}的路径上的節点信息。
每一個節點的內容代表''勝利次數/遊戲次數''
[[File:MCTS (English).svg|frame|center|蒙特卡洛树搜索的步骤]]

== 探索与利用 ==
选择子结点的主要困难是：在较高平均胜率的移动后，在对深层次变型的利用和对少数模拟移动的探索，这二者中保持某种平衡。第一个在游戏中平衡利用与探索的公式被称为UCT（Upper Confidence Bounds to Trees，上限置信区间算法 ），由匈牙利国家科学院计算机与自动化研究所高级研究员列文特·科奇什与[[阿尔伯塔大学]]全职教授乔鲍·塞派什瓦里提出<ref name="Kocsis-Szepesvari">{{cite book|last=Kocsis|first=Levente|last2=Szepesvári|first2=Csaba|chapter=Bandit based Monte-Carlo Planning|others=Johannes Fürnkranz, Tobias Scheffer, Myra Spiliopoulou (eds.)|title=Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Berlin, Germany, September 18–22, 2006, Proceedings. Lecture Notes in Computer Science 4212|publisher=Springer|isbn=3-540-45375-X|url=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296|pages=282–293|year=2006|doi=|access-date=2016-03-15|archive-date=2021-05-06|archive-url=https://web.archive.org/web/20210506224815/https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296|dead-url=no}}</ref>。UCT基于奥尔（Auer）、西萨-比安奇（Cesa-Bianchi）和费舍尔（Fischer）提出的UCB1公式<ref>{{cite journal|last=Auer|first=Peter|last2=Cesa-Bianchi|first2=Nicolò|last3=Fischer|first3=Paul|title=Finite-time Analysis of the Multiarmed Bandit Problem|journal=Machine Learning|volume=47|pages=235–256|year=2002|url=http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf|doi=10.1023/a:1013689704352|access-date=2016-03-15|archive-date=2014-05-12|archive-url=https://web.archive.org/web/20140512220524/http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf|dead-url=yes}}</ref>，并首次由马库斯等人应用于多级决策模型（具体为[[马尔可夫决策过程]]）<ref>{{cite journal|last=Chang|first=Hyeong Soo|last2=Fu|first2=Michael C.|last3=Hu|first3=Jiaqiao|last4=Marcus|first4=Steven I.|title=An Adaptive Sampling Algorithm for Solving Markov Decision Processes|journal=Operations Research|volume=53|pages=126–139|year=2005|url=http://scholar.rhsmith.umd.edu/sites/default/files/mfu/files/cfhm05.pdf?m=1449834091|access-date=2016-03-15|archive-date=2021-04-20|archive-url=https://web.archive.org/web/20210420114214/https://scholar.rhsmith.umd.edu/sites/default/files/mfu/files/cfhm05.pdf?m=1449834091|dead-url=no}}</ref>。科奇什和塞派什瓦里建议选择游戏树中的每个结点移动，从而使表达式 <math>\frac{w_i}{n_i} + c\sqrt{\frac{\ln t}{n_i}}</math>具有最大值。在该式中：
* <math>w_i</math>代表第<math>i</math>次移动后取胜的次数；
* <math>n_i</math>代表第<math>i</math>次移动后仿真的次数；
* <math>c</math>为探索参数—理论上等于<math>\sqrt{2}</math>；在实际中通常可凭经验选择；
* <math>t</math>代表仿真总次数，等于所有<math>n_i</math>的和。
目前蒙特卡洛树搜索的实现大多是基于UCT的一些变形。

== 参见 ==
* [[AlphaGo]]，一个同时使用蒙特卡洛树搜索和[[深度学习]]的围棋程序。

== 参考来源 ==
{{Reflist|2}}

== 延伸阅读 ==
* {{cite journal|author=Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, Simon Colton|title=A Survey of Monte Carlo Tree Search Methods（蒙特卡洛树搜索方法综述）|journal=IEEE Transactions on Computational Intelligence and AI in Games|volume=4|issue=1|date=March 2012|url=http://www.cameronius.com/cv/mcts-survey-master.pdf|deadurl=yes|archiveurl=https://web.archive.org/web/20130309060408/http://www.cameronius.com/cv/mcts-survey-master.pdf|archivedate=2013-03-09}}

[[Category:组合博弈论]]
[[Category:蒙地卡罗方法]]
[[Category:人工智能]]
[[Category:搜尋演算法]]