查看“︁坐标下降法”︁的源代码

'''坐标下降法'''（{{lang-en|coordinate descent}}）是一种非梯度优化算法。算法在每次迭代中，在当前点处沿一个坐标方向进行{{link-en|一维搜索|line search}}以求得一个函数的局部极小值。在整个过程中循环使用不同的坐标方向。对于不可拆分的函数而言，算法可能无法在较小的迭代步数中求得最优解。为了加速收敛，可以采用一个适当的坐标系，例如通过[[主成分分析]]获得一个坐标间尽可能不相互关联的新坐标系（参考{{link-en|自适应坐标下降法|Adaptive coordinate descent}}）。
 
== 算法描述 ==
坐标下降法基于的思想是多变量函数<math>F(\mathbf{x})</math>可以通过每次沿一个方向优化来获取最小值。与通过梯度获取最速下降的方向不同，在坐标下降法中，优化方向从算法一开始就予以固定。例如，可以选择[[线性空间]]的一组[[基_(線性代數)|基]]<math>\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n </math>作为搜索方向。 在算法中，循环最小化各个坐标方向上的目标函数值。亦即，如果<math>\mathbf{x}^k</math>已给定，那么，<math>\mathbf{x}^{k+1}</math>的第<math>i</math>个维度为

:<math> \mathbf{x}^{k+1}_i = \underset{y\in\mathbb R}{\operatorname{arg\,min}}\; f(x^{k+1}_1,...,x^{k+1}_{i-1},y,x^k_{i+1},...,x^k_n);</math>

因而，从一个初始的猜测值<math>\mathbf{x}_0</math>以求得函数<math>F</math>的局部最优值，可以迭代获得<math>\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \dots</math>的序列。

通过在每一次迭代中采用{{link-en|一维搜索|line search}}，可以很自然地获得不等式

:<math>F(\mathbf{x}_0)\ge F(\mathbf{x}_1)\ge F(\mathbf{x}_2)\ge \cdots,</math>

可以知道，这一序列与最速下降具有类似的收敛性质。如果在某次迭代中，函数得不到优化，说明一个[[驻点]]已经达到。

这一过程可以用下图表示。

[[File:Coordinate descent.svg|center|500px]]

===例子===
对于非平滑函数，坐标下降法可能会遇到问题。下图展示了当函数等高线非平滑时，算法可能在非驻点中断执行。

[[File:Nonsmooth coordinate descent.svg|center|500px]]

== 应用 ==
坐标下降法在[[机器学习]]中有应用，例如训练线性[[支持向量机]]<ref>{{cite journal|title=A dual coordinate descent method for large-scale linear SVM|date=2008-07-05|publisher=ACM|isbn=9781605582054|doi=10.1145/1390156.1390208|pages=408–415|url=http://dl.acm.org/citation.cfm?id=1390156.1390208|accessdate=2018-04-02|author=Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, S. Sundararajan}}</ref>（可见{{link-en|LIBLINEAR|LIBLINEAR}}）以及{{link-en|非负矩阵分解|non-negative matrix factorization}}<ref>{{cite journal|title=Fast coordinate descent methods with variable selection for non-negative matrix factorization|date=2011-08-21|publisher=ACM|isbn=9781450308137|doi=10.1145/2020408.2020577|pages=1064–1072|url=http://dl.acm.org/citation.cfm?id=2020408.2020577|accessdate=2018-04-02|author=Cho-Jui Hsieh, Inderjit S. Dhillon}}</ref>。

== 参见 ==
* {{link-en|自适应坐标下降法|Adaptive coordinate descent}}
* [[共轭梯度法]]
* [[梯度下降法]]
* [[牛顿法]]
* [[最优化]]
* {{link-en|一维搜索|line search}}

== 参考 ==
{{reflist}}
<!--- After listing your sources please cite them using inline citations and place them after the information they cite. Please see http://en.wikipedia.org/wiki/Wikipedia:REFB for instructions on how to add citations. --->

*{{Citation
 | last=Bezdek
 | first=J. C.
 | last2=Hathaway
 | first2=R. J.
 | last3=Howard
 | first3=R. E.
 | last4=Wilson
 | first4=C. A.
 | last5=Windham
 | first5=M. P.
 | year=1987
 | title=Local convergence analysis of a grouped variable version of coordinate descent
 | periodical=Journal of Optimization theory and applications
 | volume=54
 | issue=3
 | pages=471–477
 | doi=10.1007/BF00940196
 | publisher=Kluwer Academic/Plenum Publishers
}}
* Bertsekas, Dimitri P. (1999). ''Nonlinear Programming, Second Edition'' Athena Scientific, Belmont, Massachusetts. ISBN 1-886529-00-0.
* {{Citation
 | last=Canutescu
 | first=AA
 | last2=Dunbrack
 | first2=RL
 | title=Cyclic coordinate descent: A robotics algorithm for protein loop closure.
 | periodical=Protein science
 | year=2003
 | volume=12
 | issue=5
 | pages=963-72
 | pmid=12717019
}}. 
*{{Citation
 | last=Luo
 | first=Zhiquan
 | last2=Tseng
 | first2=P.
 | year=1992
 | title=On the convergence of the coordinate descent method for convex differentiable minimization
 | periodical=Journal of Optimization theory and applications
 | volume=72
 | issue=1
 | pages=7–35
 | doi=10.1007/BF00939948
 | publisher=Kluwer Academic/Plenum Publishers
}}.
*{{Citation
 | last=Wu
 | first=TongTong
 | last2=Lange
 | first2=Kenneth
 | year=2008
 | title=Coordinate descent algorithms for Lasso penalized regression
 | periodical=The Annals of Applied Statistics
 | volume=2
 | issue=1
 | pages=224–244
 | doi=10.1214/07-AOAS147
 | publisher=Institute of Mathematical Statistics
}}.
*{{Citation
 | last=Richtarik
 | first=Peter
 | last2=Takac
 | first2=Martin
 | year=April 2011
 | title=Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
 | periodical=Mathematical Programming
 | doi=10.1007/s10107-012-0614-z
 | publisher=Springer
}}.
*{{Citation
 | last=Richtarik
 | first=Peter
 | last2=Takac
 | first2=Martin
 | year=December 2012
 | title=Parallel coordinate descent methods for big data optimization
 | periodical=arXiv:1212.0873
}}.

[[Category:最优化算法]]