查看“︁應用於最優化的牛頓法”︁的源代码

{{noteTA
|G1=Math
|G2=IT
}}

[[Image:Newton optimization vs grad descent.svg|right|thumb|[[最速下降法]] (綠色) 與牛頓法 (紅色) 在求最小值問題上的比較 (帶有步長). 可見牛頓法根據曲率選擇了一條「捷徑」. ]] 
<!--__NOTOC__-->

[[牛頓法]]是[[微積分學]]中, 通過[[疊代法|疊代]]以求解[[可微函數]]<math>f</math>的[[零點]]的一種算法 (即求<math>x</math>使得<math>f(x)=0</math>). 而在[[最佳化]]中, 牛頓法通常被運用於求解一個[[光滑函數|二次可微函數]]<math>f</math>的[[導數|一階導數]]<math>f^\prime</math>的零點 (即求<math>x</math>使得<math>f^\prime(x)=0</math>), 同時也是<math>f</math>的[[駐點]]. 因此從另一個角度而言，'''應用於最佳化的牛頓法'''是搜索函數<math>f(x)</math>的最小值或最大值的一種算法。


一維問題的牛頓法主要步驟如下:
取一個點<math>x_0</math>為'''初值''', 依如下公式'''疊代''':
:<math>x_{n+1}=x_n-\frac{f^\prime(x_n)}{f^{\prime\prime}(x_n)},</math>
直至滿足一定條件 (如<math>f^\prime(x_n)=0</math>或<math>x_{n+1}-x_n<\varepsilon</math>, 其中<math>\varepsilon</math>為一個給定的足夠小的常數) 後, 算法終止。


== 方法描述 ==

在一維問題中, 牛頓法將構造一個以<math>x_0</math>為首項, [[收斂]]到<math>x^*</math>的數列<math>\{x_n\}</math>, 其中<math>x^*</math>使得<math>f^\prime(x^*)=0</math>成立. 

<math>f(x)</math>在<math>x=x_n</math>處的二階[[泰勒展開式]]<math>f_T(x)</math>為:
:<math>f_T(x)=f_T(x_n+\Delta x)\approx f(x_n)+f^\prime(x_n)\Delta x+\frac{1}{2}f^{\prime\prime}(x_n)\Delta x^2.</math>
我們希望找到<math>\Delta x</math>使<math>x_n+\Delta x</math>為<math>f_T(x)</math>的一個駐點。則將上式對<math>\Delta x</math>進行求導:
:<math>0=\frac{\text{d}}{\text{d}\Delta x}(f(x_n)+f^\prime(x_n)\Delta x+\frac{1}{2}f^{\prime\prime}(x_n)\Delta x^2)=f^\prime(x_n)+f^{\prime\prime}(x_n)\Delta x.</math>
上述方程的解<math>\Delta x=-\frac{f^\prime(x_n)}{f^{\prime\prime}(x_n)}</math>滿足
:<math>x_{n+1}=x_{n}+\Delta x=x_n-\frac{f^\prime(x_n)}{f^{\prime\prime}(x_n)}</math>
收斂於<math>f_T(x)</math>的駐點<math>x^*</math>.

== 幾何意義 ==

牛頓法的幾何意義為: 在每一次疊代中，均以一個[[二次函數]]去逼近<math>f(x)</math>. 具體而言: 
在一維問題中，已知<math>x_n</math>, <math>f(x_n)</math>, <math>f^\prime(x_n)</math>及<math>f^{\prime\prime}(x_n)</math>, 設[[二次函數]]表逹式為<math>ax^2+bx+c</math>, 依下列方程求解未知數<math>a,\ b,\ c,</math>
:<math>ax_n^2+bx_n+c=f(x_n),</math>
:<math>2ax_n+b=f^\prime(x_n),</math>
:<math>2a=f^{\prime\prime}(x_n).</math>
然後[[二次函數]]<math>ax^2+bx+c</math>的[[極值]]點即為下一個疊代點, 
:<math>x_{n+1}=-\frac{b}{2a}.</math>

而在高維問題中, 上述的[[極值]]點也可以是[[鞍點]]. 值得一提的是, 如果<math>f(x)</math>恰為一個[[二次函數]], 則其[[極值]]點只需一次疊代中即可找到.


== 高維問題求解 ==

上述的一維問題的疊代法可以被推廣至多維問題. 只需將[[導數]]替換為[[梯度]] (<math>\nabla f(x)</math>), 並將二階導數的[[倒數]]替換為[[Hessian矩陣]]的[[逆矩陣]] (<math>\mathbf{H}f(x)</math>), 即: 
:<math>x_{n+1}=x_n-[\mathbf{H}f(x_n)]^{-1}\nabla f(x_n), n\ge 0.</math>
通常, 使用牛頓法時會加入一個步長變量<math>\gamma\in(0,1)</math>作微調以使每一步疊代都滿足[[Wolfe條件]], 即, 
:<math>x_{n+1}=x_n-\gamma[\mathbf{H}f(x_n)]^{-1}\nabla f(x_n).</math>
這個方法被稱為'''無約束牛頓法''', 通常用於第一步之後的疊代. 

只要牛頓法適用, 其[[收斂]]於最小值或最大值的速度均頗快於[[最速下降法]]. 事實上, 對於每一個極小值, 都存在一個[[鄰域]]<math>N</math>使得, 只要[[Hessian矩陣]]是[[可逆矩陣|可逆]]的且是一個關於<math>x\in N</math>的[[利普希茨連續|Lipschitz連續]]函數, 以<math>x_0\in N</math>為初值, 步長<math>\gamma=1</math>的牛頓法是'''[[收斂速度|二次收斂]]'''的.



求一個高維問題的[[Hessian矩陣]]的[[逆矩陣]]是一件頗費工夫的事情. 在實際應用中, 通常會用[[向量]]<math>\Delta x=x_{n+1}-x_n</math>作為[[線性方程組]]
:<math>[\mathbf{H}f(x_n)]\Delta x=-\nabla f(x_n)</math>
的解. 這個求解過程中, 透過使用各種矩陣分解方法同近似求解方法, 求解速度可以大大提升. 然而, 這些矩陣分解方法或近似求解方法的使用需要滿足一定條件; 例如, [[Cholesky分解]]同[[共軛梯度法]]只有在<math>\mathbf{H}f(x)</math>是[[正定矩陣]]時才適用. 這看似是一個限制, 但有時也能充當檢驗答案的工具; 例如, 在一個最小化問題 (<math>\text{min  }f(x)</math>) 中, 求出一個<math>x^\prime</math>使得<math>\nabla f(x^\prime)=0</math>但<math>\mathbf{H}f(x)</math>不是[[正定矩陣]], 那麽<math>(x^\prime,f(x^\prime))</math>只是<math>f(x)</math>的一個[[鞍點]]而非極小值點. 


另一方面, 一個有約束的問題的求解過程可能會遇到'''當前解'''陷入一個鞍點的情況, 這時的[[Hessian矩陣]]是[[對稱]][[正定矩陣#負定、半定及不定矩陣|不定]]的; 此時則要使用其他方法, 例如[[Cholesky分解|Cholesky分解的<math>\mathbf{LDL}^\mathbf{T}</math>變形]]或[[共軛梯度法]]等的方法, 來疊代得出<math>x_{n+1}</math>.


此外, 為規避求[[Hessian矩陣]]的繁瑣, 也存在多種[[擬牛頓法]], 通過調整[[梯度]]以求出[[Hessian矩陣]]的近似. 


如果[[Hessian矩陣]]<math>\mathbf{H}f(x)</math>接近一個[[非奇異矩陣|奇異矩陣]], 則其逆矩陣會變得數值不穩定且疊代不會收斂. 此種情形下, 前人探索出了很多成功的方法來解決問題. 目標之一是通過引入'''修正矩陣'''<math>B_n</math>使得<math>\mathbf{H}f(x_n):= \mathbf{H}f(x_n) + B_n</math>是[[正定矩陣|對稱正定]]的. 其中一種方法是將<math>\mathbf{H}f(x_n)</math>[[對角化]], 選擇<math>B_n</math>使<math>\mathbf{H}f(x_n) + B_n</math>有相同的[[特徵向量]], 但每一個<math>\mathbf{H}f(x_n)</math>的負[[特徵值]]都被替換成<math>\epsilon>0.</math>


一個應用於[[萊文貝格－馬夸特方法]] (其中用到了近似的[[Hessian矩陣]]) 的方法是引入一個帶係數的[[單位矩陣]]<math>\mu\mathbf{I}</math>, 係數在每一步疊代中調整. 對於較大的<math>\mu</math>及較小的[[Hessian矩陣]], 疊代將變得與以<math>\mu^{-1}</math>為步長的[[最速下降法]]相似, 這將使得疊代收斂變慢, 但在[[Hessian矩陣]][[正定矩陣#負定、半定及不定矩陣|不定]]或[[正定矩陣#負定、半定及不定矩陣|半定]]的情況下, 收斂更穩定.

==參閱==

*[[擬牛頓法]]
*[[最速下降法]]
*[[高斯–牛頓算法]]
*[[萊文貝格－馬夸特方法]]
*[[置信域方法]]
*[[最佳化]]
*[[Nelder–Mead方法]]

==參考文獻==
* {{cite book |last= Avriel |first= Mordecai |date= 2003 |title= Nonlinear Programming: Analysis and Methods |url= https://archive.org/details/nonlinearprogram0000avri_m5r2 |publisher= Dover Publishing |isbn= 0-486-43227-0 }}
* {{cite book |last1= Bonnans |first1= J.&nbsp;Frédéric |last2= Gilbert |first2= J.&nbsp;Charles |last3= Lemaréchal |first3= Claude |authorlink3= Claude Lemaréchal |last4= Sagastizábal |first4= Claudia&nbsp;A. |title= Numerical optimization: Theoretical and practical aspects |url= http://www.springer.com/mathematics/applications/book/978-3-540-35445-1 |edition= Second revised ed. of  translation of 1997 French <!-- |orig-year= 1997 |language= fr  |trans-title= Optimisation numérique: Aspects théoriques et pratiques --> |series= Universitext |publisher= Springer-Verlag |location= Berlin |year= 2006 |pages= xiv+490 |isbn= 3-540-35445-X |doi= 10.1007/978-3-540-35447-5 |mr= 2265882 |access-date= 2017-08-07 |archive-date= 2013-07-19 |archive-url= https://web.archive.org/web/20130719050442/http://www.springer.com/mathematics/applications/book/978-3-540-35445-1 |dead-url= no }}
* {{Cite book |last1= Fletcher |first1= Roger |title= Practical methods of optimization |url= https://archive.org/details/practicalmethods0000flet |publisher= [[John Wiley & Sons]] |location= New York |edition= 2nd |isbn= 978-0-471-91547-8 |year= 1987 }}
* {{cite book |last1= Nocedal |first1= Jorge |last2= Wright |first2= Stephen J. |date= 1999 |title= Numerical Optimization |url= https://archive.org/details/numericaloptimiz0000noce |publisher= Springer-Verlag |isbn= 0-387-98793-2 }}

==外部連結==
* {{Cite web |url= http://bl.ocks.org/dannyko/ffe9653768cb80dfc0da/ |title= Newton-Raphson visualization (1D) |first= Daniel |last= Korenblum |id= ffe9653768cb80dfc0da |date= Aug 29, 2015 |website= [[Mike Bostock|Bl.ocks]] |accessdate= 2017-08-07 |archive-date= 2014-07-14 |archive-url= https://web.archive.org/web/20140714222509/http://bl.ocks.org/dannyko/ffe9653768cb80dfc0da |dead-url= no }}

{{艾薩克·牛頓}}

[[Category:优化算法和方法]]