查看“︁受限玻尔兹曼机”︁的源代码

[[File:Restricted Boltzmann machine.svg|thumb|包含三个可见单元和四个隐单元的受限玻兹曼机示意图（不包含偏置节点）]]
{{机器学习导航栏}}
'''受限玻尔兹曼机'''（{{lang-en|restricted Boltzmann machine}}, RBM）是一种可通过输入数据集学习概率分布的[[随机神经网络|随机]][[生成模型|生成]][[神经网络]]。RBM最初由发明者{{tsl|en|Paul Smolensky|保罗·斯模棱斯基}}于1986年命名为'''簧风琴'''（Harmonium）<ref>{{cite book
 |last          = Smolensky
 |first         = Paulgengxin
 |editor1-first = David E.
 |editor1-last  = Rumelhart
 |editor2-first = James L.
 |editor2-last  = McLelland
 |title         = Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations
 |publisher     = MIT Press
 |year          = 1986
 |pages         = 194–281
 |chapter       = Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory
 |url           = http://www-psych.stanford.edu/~jlm/papers/PDP/Volume%201/Chap6_PDP86.pdf
 |isbn          = 0-262-68053-X
 |deadurl       = yes
 |archiveurl    = https://web.archive.org/web/20130613014045/http://www-psych.stanford.edu/~jlm/papers/PDP/Volume%201/Chap6_PDP86.pdf
 |archivedate   = 2013-06-13
 |access-date   = 2014-09-16
}}</ref>，但直到[[杰弗里·辛顿]]及其合作者在2000年代中叶发明快速学习算法后，受限玻兹曼机才变得知名。受限玻兹曼机在[[降维]]<ref>{{cite journal|title=Reducing the Dimensionality of Data with Neural Networks|journal=Science|date=2006-07-28|doi=10.1126/science.1127647|url=http://science.sciencemag.org/content/313/5786/504|language=en|volume=313|issue=5786|pages=504–507|issn=0036-8075|accessdate=2018-04-02|author=G. E. Hinton, R. R. Salakhutdinov|archive-date=2018-03-20|archive-url=https://web.archive.org/web/20180320113924/http://science.sciencemag.org/content/313/5786/504|dead-url=no}}</ref>、[[统计分类|分类]]<ref>{{cite journal|title=Classification using discriminative restricted Boltzmann machines|date=2008-07-05|publisher=ACM|isbn=9781605582054|doi=10.1145/1390156.1390224|pages=536–543|url=http://dl.acm.org/citation.cfm?id=1390156.1390224|accessdate=2018-04-02|author=Hugo Larochelle, Yoshua Bengio}}</ref>、[[协同过滤]]<ref name="softCF">{{Cite conference | doi = 10.1145/1273496.1273596| title = Restricted Boltzmann machines for collaborative filtering| conference = Proceedings of the 24th international conference on Machine learning - ICML '07| pages = 791| year = 2007| last1 = Salakhutdinov | first1 = R. | last2 = Mnih | first2 = A. | last3 = Hinton | first3 = G. | isbn = 9781595937933}}</ref>、[[特征学习]]<ref name="coates2011">{{cite conference
 |last1       = Coates
 |first1      = Adam
 |last2       = Lee
 |first2      = Honglak
 |last3       = Ng
 |first3      = Andrew Y.
 |title       = An analysis of single-layer networks in unsupervised feature learning
 |conference  = International Conference on Artificial Intelligence and Statistics (AISTATS)
 |year        = 2011
 |url         = http://www.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf
 |deadurl     = yes
 |archiveurl  = https://web.archive.org/web/20130510120705/http://www.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf
 |archivedate = 2013-05-10
 |access-date = 2014-09-16
}}</ref>和[[主题建模]]<ref>Ruslan Salakhutdinov and Geoffrey Hinton (2010). [http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf Replicated softmax: an undirected topic model] {{Wayback|url=http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf |date=20120525063031 }}. ''[[Neural Information Processing Systems]]'' '''23'''.</ref>中得到了应用。根据任务的不同，受限玻兹曼机可以使用[[监督学习]]或[[非監督式學習|无监督学习]]的方法进行训练。

正如名字所提示的那样，受限玻兹曼机是一种[[玻尔兹曼机|玻兹曼机]]的变体，但限定模型必须为[[二分图]]。模型中包含对应输入参数的输入（可见）单元和对应训练结果的隐单元，图中的每条边必须连接一个可见单元和一个隐单元。（与此相对，“无限制”玻兹曼机包含隐单元间的边，使之成为[[循环神经网络]]。）这一限定使得相比一般玻兹曼机更高效的训练算法成为可能，特别是基于[[梯度下降法|梯度]]的对比分歧（contrastive divergence）算法<ref name="oncd">Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. ''Artificial Intelligence and Statistics''.</ref>。

受限玻兹曼机也可被用于[[深度学习]]网络。具体地，[[深度信念网络]]可使用多个RBM堆叠而成，并可使用[[梯度下降法]]和[[反向传播算法]]进行调优<ref>{{Cite journal 
| last1 = Hinton | first1 = G. 
| title = Deep belief networks 
| doi = 10.4249/scholarpedia.5947 
| journal = Scholarpedia 
| volume = 4 
| issue = 5 
| pages = 5947 
| year = 2009 
| pmid =  
| pmc = 
| bibcode = 2009SchpJ...4.5947H}}</ref>。

== 结构 ==
标准的受限玻尔兹曼机由二值（[[布尔代数|布尔]]/[[伯努利分布|伯努利]]）隐层和可见层单元组成。权重矩阵<math>W = (w_{i,j})</math>中的每个元素指定了隐层单元<math>h_i</math>和可见层单元<math>v_j</math>之间边的权重。此外对于每个可见层单元<math>v_i</math>有偏置<math>a_i</math>，对每个隐层单元<math>h_j</math>有偏置<math>b_j</math>。在这些定义下，一种受限玻尔兹曼机配置（即给定每个单元取值）的“能量”{{math|(''v'',''h'')}}被定义为

:<math>E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j -\sum_i \sum_j h_j w_{i,j} v_i</math>

或者用矩阵的形式表示如下：

:<math>E(v,h) = -a^{\mathrm{T}} v - b^{\mathrm{T}} h -h^{\mathrm{T}} W v</math>

这一能量函数的形式与[[霍普菲尔德神经网络]]相似。在一般的玻尔兹曼机中，隐层和可见层之间的联合概率分布由能量函数给出：<ref name="guide">Geoffrey Hinton (2010). ''[http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf A Practical Guide to Training Restricted Boltzmann Machines] {{Wayback|url=http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf |date=20140925041702 }}''. UTML TR 2010–003, University of Toronto.</ref>

:<math>P(v,h) = \frac{1}{Z} e^{-E(v,h)}</math>

其中，<math>Z</math>为[[配分函数]]，定义为在节点的所有可能取值下<math>e^{-E(v,h)}</math>的和（亦即使得概率分布和为1的[[归一化常数]]）。类似地，可见层取值的[[边缘分布]]可通过对所有隐层配置求和得到：<ref name="guide"/>

:<math>P(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}</math>

由于RBM为一个二分图，层内没有边相连，因而隐层是否激活在给定可见层节点取值的情况下是[[条件独立]]的。类似地，可见层节点的激活状态在给定隐层取值的情况下也条件独立<ref name="oncd"/>。亦即，对<math>m</math>个可见层节点和<math>n</math>个隐层节点，可见层的配置{{mvar|v}}对于隐层配置{{mvar|h}}的[[条件概率]]如下：

:<math>P(v|h) = \prod_{i=1}^m P(v_i|h)</math>.

类似地，{{mvar|h}}对于{{mvar|v}}的条件概率为

:<math>P(h|v) = \prod_{j=1}^n P(h_j|v)</math>.

其中，单个节点的激活概率为

:<math>P(h_j=1|v) = \sigma \left(b_j + \sum_{i=1}^m w_{i,j} v_i \right)\,</math>和<math>\,P(v_i=1|h) = \sigma \left(a_i + \sum_{j=1}^n w_{i,j} h_j \right)</math>

其中<math>\sigma</math>代表[[逻辑函数]]。

=== 与其他模型的关系 ===
受限玻尔兹曼机是玻尔兹曼机和[[马尔可夫网络|马尔科夫随机场]]的一种特例<ref name="cdconvergence">{{cite journal | first1 = Ilya | last1 = Sutskever | first2 = Tijmen | last2 = Tieleman | year = 2010 | title = On the convergence properties of contrastive divergence | journal = Proc. 13th Int'l Conf. on AI and Statistics (AISTATS) | url = http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf | deadurl = yes | archiveurl = https://web.archive.org/web/20150610230811/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf | archivedate = 2015-06-10 | access-date = 2014-09-16 }}</ref><ref name="RBMTutorial">Asja Fischer and Christian Igel. [http://image.diku.dk/igel/paper/TRBMAI.pdf  Training Restricted Boltzmann Machines: An Introduction] {{Wayback|url=http://image.diku.dk/igel/paper/TRBMAI.pdf |date=20150610230447 }}. Pattern Recognition 47, pp. 25-39, 2014</ref>。这些[[概率图模型]]可以对应到[[因子分析]]<ref>{{cite journal |author1=María Angélica Cueto |author2=Jason Morton |author3=Bernd Sturmfels |year=2010 |url=http://www.jasonmorton.com/morton/publications/geomBoltzofficial.pdf |title=Geometry of the restricted Boltzmann machine |journal=Algebraic Methods in Statistics and Probability |volume=516 |publisher=American Mathematical Society |arxiv=0908.4425 }}{{dead link|date=2017年12月 |bot=InternetArchiveBot |fix-attempted=yes }}</ref>。

== 训练算法 ==
受限玻尔兹曼机的训练目标是针对某一训练集<math>V</math>，最大化概率的乘积。其中，<math>V</math>被视为一矩阵，每个行向量作为一个可见单元向量<math>v</math>：

:<math>\arg\max_W \prod_{v \in V} P(v)</math>

或者，等价地，最大化<math>V</math>的[[对数概率]][[期望]]：<ref name="cdconvergence"/><ref name="RBMTutorial"/>

:<math>\arg\max_W \mathbb{E} \left[\sum_{v \in V} \log P (v)\right]</math>

训练受限玻尔兹曼机，即最优化权重矩阵<math>W</math>，最常用的算法是[[杰弗里·辛顿]]提出的对比分歧（contrastive divergence，CD）算法。这一算法最早被用于训练辛顿提出的“专家积”模型<ref>{{cite journal|url=http://www.mitpressjournals.org/doi/10.1162/089976602760128018|pages=1771–1800|title=Training Products of Experts by Minimizing Contrastive Divergence|journal=Neural Computation|volume=14|issue=8|date=2006-03-30|language=en|accessdate=2018-04-02|doi=10.1162/089976602760128018|author=Geoffrey E. Hinton|archive-date=2018-11-27|archive-url=https://web.archive.org/web/20181127114526/https://www.mitpressjournals.org/doi/10.1162/089976602760128018|dead-url=no}}</ref>。这一算法在[[梯度下降]]的过程中使用[[吉布斯采样]]完成对权重的更新，与训练前馈神经网络中利用反向传播算法类似。

基本的针对一个样本的单步对比分歧（CD-1）步骤可被总结如下：

# 取一个训练样本{{mvar|v}}，计算隐层节点的概率，在此基础上从这一概率分布中获取一个隐层节点激活向量的样本{{mvar|h}}；
# 计算{{mvar|v}}和{{mvar|h}}的[[外积]]，称为“正梯度”；
# 从{{mvar|h}}获取一个重构的可见层节点的激活向量样本{{mvar|v'}}，此后从{{mvar|v'}}再次获得一个隐层节点的激活向量样本{{mvar|h'}}；
# 计算{{mvar|v'}}和{{mvar|h'}}的外积，称为“负梯度”；
# 使用正梯度和负梯度的差以一定的学习率更新权重<math>w_{i,j}</math>：<math>\Delta w_{i,j} = \epsilon (vh^\mathsf{T} - v'h'^\mathsf{T})</math>。

偏置{{mvar|a}}和{{mvar|b}}也可以使用类似的方法更新。

== 参见 ==
* [[自編碼]]
* [[自编码器]]
* [[深度学习]]
* [[Hopfield神經網絡]]

== 参考资料 ==
{{Reflist|30em}}

== 外部链接 ==
* [http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/ Introduction to Restricted Boltzmann Machines]{{Wayback|url=http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/ |date=20121029074609 }}. Edwin Chen's blog, July 18, 2011.

[[Category:神經網路架構]]