
A '''neural differential equation''' is a [[differential equation]] in [[machine learning]] whose right-hand side is parametrized by the weights <math>\theta</math> of an [[artificial neural network]].<ref name=":0">{{Cite conference |last=Chen |first=Ricky T. Q. |last2=Rubanova |first2=Yulia |last3=Bettencourt |first3=Jesse |last4=Duvenaud |first4=David K. |year=2018 |editor-last=Bengio |editor-first=S. |editor2-last=Wallach |editor2-first=H. |editor3-last=Larochelle |editor3-first=H. |editor4-last=Grauman |editor4-first=K. |editor5-last=Cesa-Bianchi |editor5-first=N. |editor6-last=Garnett |editor6-first=R. |title=Neural Ordinary Differential Equations |url=https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf |conference= |publisher=Curran Associates, Inc. |volume=31 |arxiv=1806.07366 |booktitle=Advances in Neural Information Processing Systems}}</ref> The '''neural [[ordinary differential equation]]''' (neural ODE) is the most common type of neural differential equation and can be written as:

:<math>\frac{\mathrm{d} \mathbf{h}(t)}{\mathrm{d} t}=f_\theta(\mathbf{h}(t), t).</math>

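In a classical neural network, the layers are indexed by the natural numbers. In a neural ODE, the layers instead form a continuum indexed by the non-negative real numbers. Specifically, the function <math>\mathbf{h}: \mathbb{R}_{\ge 0} \to \mathbb{R}^d</math> maps each index ''t'' to a vector in <math>\mathbb{R}^d</math> that represents the state of the network at that layer, where <math>d</math> denotes the dimension of the state.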

A neural ODE can be understood as a continuous-time [[control system]], and its ability to interpolate data can be explained in terms of [[controllability]].<ref>{{Cite journal |last=Ruiz-Balet |first=Domènec |last2=Zuazua |first2=Enrique |title=Neural ODE Control for Classification, Approximation, and Transport |url=https://epubs.siam.org/doi/10.1137/21M1411433 |journal=SIAM Review |language=en |date=2023 |volume=65 |issue=3 |page=735–773 |arxiv=2104.05278 |doi=10.1137/21M1411433 |issn=0036-1445}}</ref>

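== Connection with residual neural networks ==
A neural ODE can be viewed as a [[residual neural network]] with a continuum of layers rather than discrete layers.<ref name=":0" /> Applying the [[Euler method]] with a unit time step to a neural ODE yields the forward propagation formula of a residual neural network: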

:<math>\mathbf{h}_{\ell+1} = f_{\theta}(\mathbf{h}_{\ell}, \ell) + \mathbf{h}_{\ell},</math>

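where <math>\ell</math> denotes the <math>\ell</math>-th layer of the residual neural network. Whereas forward propagation in a residual neural network applies a sequence of transformations layer by layer, forward propagation in a neural ODE is carried out by solving a differential equation. Specifically, given the input <math>\mathbf{h}_{\text{in}}</math> of the neural ODE, the corresponding output <math>\mathbf{h}_{\text{out}}</math> is obtained by solving the [[initial value problem]]

:<math>\frac{\mathrm{d} \mathbf{h}(t)}{\mathrm{d} t}=f_\theta(\mathbf{h}(t), t), \quad \mathbf{h}(0)=\mathbf{h}_{\text{in}}, </math>

and taking the solution at time <math>t=T</math>, <math>\mathbf{h}(T)</math>, as the output <math>\mathbf{h}_{\text{out}}</math>.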
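
The relationship between the Euler discretization and the residual update can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the vector field <code>f_theta</code> (a single tanh layer that ignores ''t''), the weights in <code>theta</code>, and the helper names <code>odeint_euler</code> and <code>resnet_forward</code> are all illustrative assumptions.

```python
import math

def f_theta(h, t, theta):
    """Hypothetical vector field: one tanh layer with theta = (W, b).

    The time argument t is accepted for generality but unused here.
    """
    W, b = theta
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)]

def odeint_euler(f, h_in, T, n_steps, theta):
    """Integrate dh/dt = f(h, t) from t = 0 to t = T with explicit Euler."""
    dt = T / n_steps
    h = list(h_in)
    for k in range(n_steps):
        dh = f(h, k * dt, theta)
        h = [h_i + dt * dh_i for h_i, dh_i in zip(h, dh)]
    return h

def resnet_forward(f, h_in, n_layers, theta):
    """Residual update h_{l+1} = f(h_l, l) + h_l applied layer by layer."""
    h = list(h_in)
    for layer in range(n_layers):
        r = f(h, layer, theta)
        h = [r_i + h_i for r_i, h_i in zip(r, h)]
    return h

# Illustrative weights and input (assumed values, not from any source).
theta = ([[0.5, -0.3], [0.2, 0.1]], [0.0, 0.1])
h_in = [1.0, -1.0]

# Unit-step Euler over [0, 4] coincides with a 4-layer residual network.
h_res = resnet_forward(f_theta, h_in, 4, theta)
h_euler = odeint_euler(f_theta, h_in, T=4.0, n_steps=4, theta=theta)

# A finer step approximates h(T) of the continuous initial value problem.
h_fine = odeint_euler(f_theta, h_in, T=4.0, n_steps=4000, theta=theta)
```

Here <code>h_res</code> and <code>h_euler</code> agree, while <code>h_fine</code> is a closer approximation of <math>\mathbf{h}(T)</math>. Fixed-step Euler is used only to keep the sketch short; other numerical ODE solvers could be substituted.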

== Universal differential equations ==
When some physical information is known, a neural ODE can be combined with an existing first-principles model to build a physics-informed neural network model known as a universal differential equation (UDE).<ref>{{Cite arXiv |arxiv=2001.04385 |class=cs.LG |author=Christopher Rackauckas |author2=Yingbo Ma |title=Universal Differential Equations for Scientific Machine Learning |date=2024}}</ref><ref>{{Cite journal |last=Xiao |first=Tianbai |last2=Frank |first2=Martin |title=RelaxNet: A structure-preserving neural network to approximate the Boltzmann collision operator |url=https://linkinghub.elsevier.com/retrieve/pii/S0021999123004126 |journal=Journal of Computational Physics |language=en |date=2023 |volume=490 |page=112317 |arxiv=2211.08149 |bibcode=2023JCoPh.49012317X |doi=10.1016/j.jcp.2023.112317}}</ref><ref>{{Citation|last=Silvestri|first=Mattia|title=An Analysis of Universal Differential Equations for Data-Driven Discovery of Ordinary Differential Equations|date=2023|journal=Computational Science – ICCS 2023|volume=10476|pages=353–366|editor-last=Mikyška|editor-first=Jiří|url=https://link.springer.com/10.1007/978-3-031-36027-5_27|access-date=2024-08-18|place=Cham|publisher=Springer Nature Switzerland|language=en|doi=10.1007/978-3-031-36027-5_27|isbn=978-3-031-36026-8|last2=Baldo|first2=Federico|last3=Misino|first3=Eleonora|last4=Lombardi|first4=Michele|editor2-last=de Mulatier|editor2-first=Clélia|editor3-last=Paszynski|editor3-first=Maciej|editor4-last=Krzhizhanovskaya|editor4-first=Valeria V.}}</ref><ref>{{Cite arXiv |arxiv=2408.07143 |class=math.OC |author=Christoph Plate |author2=Carl Julius Martensen |title=Optimal Experimental Design for Universal Differential Equations}}</ref> For example, the UDE version of the [[Lotka–Volterra equations|Lotka–Volterra model]] can be written as:<ref>{{Cite thesis |degree=Doctor of Philosophy |title=On Neural Differential Equations |last=Patrick Kidger |publisher=University of Oxford, Mathematical Institute |date=2021 |url=https://ora.ox.ac.uk/objects/uuid:af32d844-df84-4fdc-824d-44bebc3d7aa9}}</ref>

:<math>\begin{align}
 \frac{\mathrm{d} x}{\mathrm{d} t} &= \alpha x - \beta x y + f_{\theta}(x, y), \\
 \frac{\mathrm{d} y}{\mathrm{d} t} &= - \gamma y + \delta x y + g_{\theta}(x, y),
\end{align}</math>

where <math>f_{\theta}</math> and <math>g_{\theta}</math> are correction terms parametrized by neural networks.
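
The structure of such a model can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the formulation of any cited work: the network <code>mlp</code> (one tanh hidden layer), the parameter values, the fourth-order Runge–Kutta integrator, and all helper names are hypothetical choices, and no training loop is shown.

```python
import math

def mlp(params, x, y):
    """Hypothetical correction network: 2 inputs -> tanh hidden layer -> 1 output."""
    W1, b1, w2, b2 = params
    hidden = [math.tanh(w[0] * x + w[1] * y + b) for w, b in zip(W1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def ude_rhs(state, phys, theta_f, theta_g):
    """Known Lotka-Volterra terms plus learned corrections f_theta and g_theta."""
    x, y = state
    alpha, beta, gamma, delta = phys
    dx = alpha * x - beta * x * y + mlp(theta_f, x, y)
    dy = -gamma * y + delta * x * y + mlp(theta_g, x, y)
    return (dx, dy)

def rk4_step(state, dt, phys, theta_f, theta_g):
    """One classical fourth-order Runge-Kutta step for the UDE."""
    def shift(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = ude_rhs(state, phys, theta_f, theta_g)
    k2 = ude_rhs(shift(state, k1, dt / 2), phys, theta_f, theta_g)
    k3 = ude_rhs(shift(state, k2, dt / 2), phys, theta_f, theta_g)
    k4 = ude_rhs(shift(state, k3, dt), phys, theta_f, theta_g)
    return (state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def simulate(state, dt, n_steps, phys, theta_f, theta_g):
    """Return the trajectory [(x_0, y_0), ..., (x_n, y_n)]."""
    traj = [state]
    for _ in range(n_steps):
        state = rk4_step(state, dt, phys, theta_f, theta_g)
        traj.append(state)
    return traj

# Assumed physical parameters (alpha, beta, gamma, delta).
phys = (1.5, 1.0, 3.0, 1.0)

# Zero output weights switch the correction off: classical Lotka-Volterra.
zero = ([[0.3, -0.2], [0.1, 0.4]], [0.0, 0.0], [0.0, 0.0], 0.0)
# Small nonzero output weights stand in for a trained correction.
small = ([[0.3, -0.2], [0.1, 0.4]], [0.0, 0.0], [0.05, -0.05], 0.0)

traj_plain = simulate((2.0, 1.0), 0.01, 1000, phys, zero, zero)
traj_ude = simulate((2.0, 1.0), 0.01, 1000, phys, small, small)
```

With the output weights set to zero the model reduces to the classical Lotka–Volterra equations, which makes the division of labor explicit: the mechanistic terms carry the known physics, and the networks only need to account for the remaining discrepancy.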

== See also ==
* [[Physics-informed neural networks]]

== References ==
{{Reflist}}

[[Category:Differential equations]]
[[Category:Artificial neural networks]]