{{Machine learning bar}}
A '''flow-based generative model''' is a [[生成模型|generative model]] used in [[机器学习|machine learning]] that explicitly models a [[概率分布|probability distribution]] by means of a '''normalizing flow'''.<ref>{{Cite journal |last=Tabak |first=Esteban G. |last2=Vanden-Eijnden |first2=Eric |title=Density estimation by dual ascent of the log-likelihood |url=https://projecteuclid.org/journals/communications-in-mathematical-sciences/volume-8/issue-1/Density-estimation-by-dual-ascent-of-the-log-likelihood/cms/1266935020.full |journal=Communications in Mathematical Sciences |date=2010 |volume=8 |issue=1 |page=217–233 |doi=10.4310/CMS.2010.v8.n1.a11}}</ref><ref>{{Cite journal |last=Tabak |first=Esteban G. |last2=Turner |first2=Cristina V. |title=A family of nonparametric density estimation algorithms |url=https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.21423 |journal=Communications on Pure and Applied Mathematics |date=2012 |volume=66 |issue=2 |page=145–164 |doi=10.1002/cpa.21423 |hdl=11336/8930 |s2cid=17820269 |hdl-access=free}}</ref><ref>{{Cite journal |last=Papamakarios |first=George |last2=Nalisnick |first2=Eric |last3=Jimenez Rezende |first3=Danilo |last4=Mohamed |first4=Shakir |last5=Lakshminarayanan |first5=Balaji |title=Normalizing flows for probabilistic modeling and inference |url=https://dl.acm.org/doi/abs/10.5555/3546258.3546315 |journal=Journal of Machine Learning Research |year=2021 |volume=22 |issue=1 |page=2617–2680 |arxiv=1912.02762}}</ref> A normalizing flow is a statistical method that uses the change-of-variables law for [[概率密度|probability densities]] to transform a simple distribution into a complex one.

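Modeling the [[似然函数|likelihood]] directly has several advantages. For example, the negative log-likelihood can be computed exactly and minimized as the [[损失函数|loss function]]. In addition, new samples can be generated by sampling from the initial distribution and applying the flow transformation.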

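In contrast, other generative models such as [[变分自编码器|variational autoencoders]] and [[生成对抗网络|generative adversarial networks]] do not represent the likelihood function explicitly.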

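== Method ==
[[File:Normalizing-flow.svg|thumb|Schematic of a normalizing flow]]
Let <math>z_0</math> be a random variable with distribution <math>p_0</math>. For <math>i = 1, ..., K</math>, define a sequence of random variables <math>z_i = f_i(z_{i-1})</math> obtained by transforming <math>z_0</math>. The functions <math>f_1, ..., f_K</math> must be invertible, i.e. the [[反函数|inverse function]] <math>f^{-1}_i</math> must exist, so that in particular <math>z_0 = f^{-1}_1(z_1)</math>. The final output <math>z_K</math> models the target distribution.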

The log-likelihood of <math>z_K</math> is (see the [[#对数似然的推导|derivation]] below):

:<math>\log p_K(z_K) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left|\det \frac{df_i(z_{i-1})}{dz_{i-1}}\right|</math>

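To compute the log-likelihood efficiently, the functions <math>f_1, ..., f_K</math> should be easy to invert, and the [[行列式|determinant]] of their [[雅可比矩阵|Jacobian]] should be easy to compute. In practice, these functions are usually modeled with [[深度神经网络|deep neural networks]] and trained to minimize the negative log-likelihood of data samples from the target distribution. The architectures are generally designed so that both the inverse and the Jacobian determinant require only a forward pass of the neural network. Examples include NICE,<ref name=":1">{{Cite arXiv |arxiv=1410.8516 |class=cs.LG |first=Laurent |last=Dinh |first2=David |last2=Krueger |title=NICE: Non-linear Independent Components Estimation}}</ref> RealNVP,<ref name=":2">{{Cite arXiv |arxiv=1605.08803 |class=cs.LG |first=Laurent |last=Dinh |first2=Jascha |last2=Sohl-Dickstein |title=Density estimation using Real NVP}}</ref> and Glow.<ref name="glow">{{Cite arXiv |arxiv=1807.03039 |class=stat.ML |first=Diederik P. |last=Kingma |first2=Prafulla |last2=Dhariwal |title=Glow: Generative Flow with Invertible 1x1 Convolutions}}</ref>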
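As an illustration of the log-likelihood formula above, the following minimal sketch composes two one-dimensional affine layers and evaluates <math>\log p_K(z_K)</math> by inverting the layers one at a time. The layer parameters, the helper names (<code>make_affine</code>, <code>flow_log_likelihood</code>) and the standard normal base distribution are assumptions chosen for this example, not part of any particular library.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of the assumed base distribution p_0 = N(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def make_affine(a, b):
    """Hypothetical layer f(z) = a*z + b, returned as
    (forward, inverse, log|det Jacobian of forward|)."""
    return (lambda z: a * z + b,
            lambda x: (x - b) / a,
            lambda z: math.log(abs(a)))

def flow_log_likelihood(x, layers):
    """log p_K(x) = log p_0(z_0) - sum_i log|det df_i/dz_{i-1}|."""
    z = x
    total_log_det = 0.0
    for _forward, inverse, log_abs_det in reversed(layers):
        z = inverse(z)                    # z_{i-1} = f_i^{-1}(z_i)
        total_log_det += log_abs_det(z)   # evaluated at z_{i-1}
    return standard_normal_logpdf(z) - total_log_det

# f_1(z) = 2z + 1 and f_2(z) = 1.5z - 1 compose to x = 3*z_0 + 0.5,
# so x follows N(0.5, 3^2) and the result can be checked in closed form.
layers = [make_affine(2.0, 1.0), make_affine(1.5, -1.0)]
x = 1.7
log_p_flow = flow_log_likelihood(x, layers)
log_p_exact = standard_normal_logpdf((x - 0.5) / 3.0) - math.log(3.0)
print(log_p_flow, log_p_exact)
```

Affine layers are used only because the composed density is known exactly, which makes the formula easy to verify; practical flows use the nonlinear layers described in the sections on variants.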

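=== Derivation of log-likelihood {{anchor|对数似然的推导}} ===
Consider <math>z_1</math> and <math>z_0</math>, related by <math>z_0 = f^{-1}_1(z_1)</math>. By the change-of-variables formula for [[機率密度函數|probability densities]], the distribution of <math>z_1</math> is: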

:<math>p_1(z_1) = p_0(z_0)\left|\det \frac{df_1^{-1}(z_1)}{dz_1}\right|</math>

where <math>\det \frac{df_1^{-1}(z_1)}{dz_1}</math> is the [[行列式|determinant]] of the [[雅可比矩阵|Jacobian matrix]] of <math>f^{-1}_1</math>.

By the [[反函数定理|inverse function theorem]]:

:<math>p_1(z_1) = p_0(z_0)\left|\det \left(\frac{df_1(z_0)}{dz_0}\right)^{-1}\right|</math>

Using the identity <math>\det(A^{-1}) = \det(A)^{-1}</math> (where <math>A</math> is an [[非奇异方阵|invertible matrix]]), we have:

:<math>p_1(z_1) = p_0(z_0)\left|\det \frac{df_1(z_0)}{dz_0}\right|^{-1}</math>

Taking the logarithm gives the log-likelihood:

:<math>\log p_1(z_1) = \log p_0(z_0) - \log \left|\det \frac{df_1(z_0)}{dz_0}\right|</math>

The same relation holds for every pair <math>z_i</math> and <math>z_{i-1}</math>. Applying it recursively yields:

:<math>\log p_K(z_K) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left|\det \frac{df_i(z_{i-1})}{dz_{i-1}}\right|</math>

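== Training method ==
As with some other deep learning models, the goal of training a normalizing flow is to minimize the [[KL散度|Kullback–Leibler divergence]] between the model's likelihood and the target distribution. Denoting the model's likelihood by <math>p_\theta</math> and the target distribution to be learned by <math>p^*</math>, the (forward) KL divergence is: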

:<math>D_{KL}[p^{*}(x)||p_{\theta}(x)] = -\mathbb{E}_{p^{*}(x)}[\log(p_{\theta}(x))] + \mathbb{E}_{p^{*}(x)}[\log(p^{*}(x))]</math>

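The second term on the right-hand side depends only on the target distribution (it is its negative entropy) and not on the model parameters <math>\theta</math>, so it can be ignored during optimization. The remaining term to minimize is the expected negative log-likelihood under the target distribution. Because this expectation is intractable to compute directly, it is approximated by a Monte Carlo estimate. Given a dataset <math>\{x_{i}\}_{i=1:N}</math> of samples drawn independently from the target distribution <math>p^*(x)</math>, the term can be estimated as: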

:<math>-\hat{\mathbb{E}}_{p^{*}(x)}[\log(p_{\theta}(x))] = -\frac{1}{N} \sum_{i=1}^{N} \log(p_{\theta}(x_{i})) </math>

Therefore, the learning objective

:<math>\underset{\theta}{\operatorname{arg\,min}}\ D_{KL}[p^{*}(x)||p_{\theta}(x)]</math>

can be replaced by

:<math>\underset{\theta}{\operatorname{arg\,max}}\ \sum_{i=1}^{N} \log(p_{\theta}(x_{i}))</math>

In other words, minimizing the KL divergence is equivalent to [[最大似然估计|maximizing the likelihood of the model on the observed samples]].<ref>{{Cite journal |last=Papamakarios |first=George |last2=Nalisnick |first2=Eric |last3=Rezende |first3=Danilo Jimenez |last4=Mohamed |first4=Shakir |last5=Lakshminarayanan |first5=Balaji |title=Normalizing Flows for Probabilistic Modeling and Inference |url=https://jmlr.org/papers/v22/19-1028.html |journal=Journal of Machine Learning Research |date=March 2021 |volume=22 |issue=57 |page=1–64 |arxiv=1912.02762}}</ref>

Pseudocode for training a normalizing flow is as follows:<ref>{{Cite journal |last=Kobyzev |first=Ivan |last2=Prince |first2=Simon J.D. |last3=Brubaker |first3=Marcus A. |title=Normalizing Flows: An Introduction and Review of Current Methods |url=https://ieeexplore.ieee.org/document/9089305 |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |date=November 2021 |volume=43 |issue=11 |page=3964–3979 |arxiv=1908.09257 |doi=10.1109/TPAMI.2020.2992934 |issn=1939-3539 |pmid=32396070 |s2cid=208910764}}</ref>

* Input: dataset <math>x_{1:n}</math>, normalizing flow model <math>f_\theta(\cdot), p_0 </math>
* Solve: <math>\max_\theta \sum_j \ln p_\theta(x_j)</math> by gradient ascent (equivalently, minimize the negative log-likelihood by gradient descent)
* Output: optimized parameters <math>\hat\theta</math>
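The steps above can be sketched with a deliberately tiny model: a single one-dimensional affine flow <math>x = \mu + e^{s} z</math> with a standard normal base distribution, trained by plain gradient ascent on the mean log-likelihood. The synthetic data, the learning rate, the step count and the names <code>mu</code> and <code>s</code> are assumptions made for this illustration; the gradients are written out by hand instead of using an automatic differentiation library.

```python
import math
import random

random.seed(0)
# Synthetic "target" samples (assumed for the example): N(3, 2^2).
data = [random.gauss(3.0, 2.0) for _ in range(200)]

mu, s = 0.0, 0.0           # flow x = mu + exp(s) * z, base z ~ N(0, 1)
learning_rate = 0.05

def mean_log_likelihood(mu, s, xs):
    """(1/N) sum_j ln p_theta(x_j) for the affine flow."""
    total = 0.0
    for x in xs:
        z = (x - mu) * math.exp(-s)            # inverse map f^{-1}(x)
        total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - s
    return total / len(xs)

initial_ll = mean_log_likelihood(mu, s, data)
for step in range(3000):
    grad_mu = 0.0
    grad_s = 0.0
    for x in data:
        z = (x - mu) * math.exp(-s)
        grad_mu += z * math.exp(-s)   # d ln p / d mu
        grad_s += z * z - 1.0         # d ln p / d s
    # Gradient ascent on the mean log-likelihood.
    mu += learning_rate * grad_mu / len(data)
    s += learning_rate * grad_s / len(data)
final_ll = mean_log_likelihood(mu, s, data)

print("mu =", mu, "scale =", math.exp(s))
print("mean log-likelihood:", initial_ll, "->", final_ll)
```

For this model the maximum-likelihood solution is known in closed form (the sample mean and the population standard deviation of the data), which gives a simple way to check that the loop converges.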

== Variants ==
=== Planar flow ===
The planar flow is the earliest normalizing-flow method.<ref name=":0">{{Cite arXiv |arxiv=1505.05770 |class=stat.ML |author=Danilo Jimenez Rezende |first2=Shakir |last2=Mohamed |title=Variational Inference with Normalizing Flows}}</ref> Given an activation function <math>h</math> and parameters <math>\theta = (u, w, b)</math> of appropriate dimensions, it is defined as<math display="block">x = f_\theta(z) = z + u h(\langle w, z \rangle + b)</math>In general, the inverse <math>f_\theta^{-1}</math> has no closed-form solution.

The absolute value of the Jacobian determinant is <math>|\det (I + h'(\langle w, z \rangle + b) uw^T)| = |1 + h'(\langle w, z \rangle + b) \langle u, w\rangle|</math>.

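For the flow to be invertible everywhere, the determinant must be nonzero on the whole domain. For example, <math>h = \tanh</math> with <math>\langle u, w \rangle > -1</math> satisfies this requirement.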
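The determinant identity for the planar flow can be checked numerically. The sketch below, which assumes <math>h = \tanh</math> and arbitrary illustrative two-dimensional parameters, compares the closed-form expression <math>1 + h'(\langle w, z \rangle + b)\langle u, w\rangle</math> with the determinant of the explicit 2×2 Jacobian; the function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def planar_flow(z, u, w, b):
    """x = z + u * tanh(<w, z> + b)."""
    act = math.tanh(dot(w, z) + b)
    return [zi + ui * act for zi, ui in zip(z, u)]

def planar_log_abs_det(z, u, w, b):
    """log|1 + h'(<w, z> + b) <u, w>| with h = tanh."""
    h_prime = 1.0 - math.tanh(dot(w, z) + b) ** 2
    return math.log(abs(1.0 + h_prime * dot(u, w)))

def explicit_det_2d(z, u, w, b):
    """det(I + h' u w^T), written out entry by entry for the 2D case."""
    h_prime = 1.0 - math.tanh(dot(w, z) + b) ** 2
    j00 = 1.0 + h_prime * u[0] * w[0]
    j01 = h_prime * u[0] * w[1]
    j10 = h_prime * u[1] * w[0]
    j11 = 1.0 + h_prime * u[1] * w[1]
    return j00 * j11 - j01 * j10

# Illustrative parameters with <u, w> = -0.1 > -1, so the flow is invertible.
u, w, b = [0.5, -0.3], [1.0, 2.0], 0.2
z = [0.7, -1.1]
x = planar_flow(z, u, w, b)
closed_form = planar_log_abs_det(z, u, w, b)
explicit = math.log(abs(explicit_det_2d(z, u, w, b)))
print(x, closed_form, explicit)
```

The closed form costs one inner product per sample, whereas a general determinant of an <math>n \times n</math> Jacobian would cost <math>O(n^3)</math>.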

=== Nonlinear independent components estimation (NICE) ===
Nonlinear independent components estimation (NICE) assumes that <math>x, z\in \R^{2n}</math> are even-dimensional and splits them in the middle into two halves.<ref name=":12">{{Cite arXiv |arxiv=1410.8516 |class=cs.LG |first=Laurent |last=Dinh |first2=David |last2=Krueger |title=NICE: Non-linear Independent Components Estimation}}</ref> The normalizing flow is then defined as

:<math>x = \begin{bmatrix}
	 x_1 \\ x_2
	 \end{bmatrix}= f_\theta(z) = \begin{bmatrix}
	 z_1 \\z_2
	 \end{bmatrix} + \begin{bmatrix}
	 0 \\ m_\theta(z_1)
	 \end{bmatrix}</math>

where <math>m_\theta</math> is any neural network with weights <math>\theta</math>.

The inverse <math>f_\theta^{-1}</math> is <math>z_1 = x_1, z_2 = x_2 - m_\theta(x_1)</math>, and the Jacobian determinant is 1, i.e. the flow is volume-preserving.

When <math>n=1</math>, the map can be viewed as a curvilinear shear along the <math>x_2</math> direction.
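A minimal sketch of the additive coupling layer described above follows. The function <code>m</code> is a hypothetical stand-in for the network <math>m_\theta</math>; any function works, because the inverse never needs to invert <code>m</code> itself.

```python
import math

def m(z1):
    """Hypothetical stand-in for the neural network m_theta."""
    return [math.sin(3.0 * v) + v * v for v in z1]

def nice_forward(z1, z2):
    """x1 = z1, x2 = z2 + m(z1)."""
    return list(z1), [b + t for b, t in zip(z2, m(z1))]

def nice_inverse(x1, x2):
    """z1 = x1, z2 = x2 - m(x1)."""
    return list(x1), [b - t for b, t in zip(x2, m(x1))]

z1, z2 = [0.3, -1.2], [2.0, 0.5]
x1, x2 = nice_forward(z1, z2)
r1, r2 = nice_inverse(x1, x2)
print(x1, x2)
print(r1, r2)   # recovers (z1, z2) up to floating-point rounding
```

The Jacobian of this map is block lower triangular with identity blocks on the diagonal, so its log-determinant is 0 and no extra term enters the log-likelihood.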

=== Real non-volume preserving (Real NVP) ===
The real non-volume preserving (Real NVP) model generalizes NICE and is defined as:<ref name=":22">{{Cite arXiv |arxiv=1605.08803 |class=cs.LG |first=Laurent |last=Dinh |first2=Jascha |last2=Sohl-Dickstein |title=Density estimation using Real NVP}}</ref>

:<math>x = \begin{bmatrix}
	 x_1 \\ x_2
	 \end{bmatrix}= f_\theta(z) = \begin{bmatrix}
	 z_1 \\ e^{s_\theta(z_1)} \odot z_2
	 \end{bmatrix} + \begin{bmatrix}
	 0 \\ m_\theta(z_1)
	 \end{bmatrix}</math>

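Its inverse is <math>z_1 = x_1, z_2 = e^{-s_\theta (x_1)}\odot (x_2 - m_\theta (x_1))</math>, and its Jacobian determinant is <math>\prod^n_{i=1} e^{s_\theta(z_1)_i}</math>, where <math>s_\theta(z_1)_i</math> denotes the <math>i</math>-th component of <math>s_\theta(z_1)</math>. When <math>s_\theta = 0</math>, the model reduces to NICE. Because the Real NVP map keeps the two halves of the vector <math>x</math> separate, a permutation <math>(x_1, x_2) \mapsto (x_2, x_1)</math> usually has to be added after every layer.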
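The affine coupling layer can be sketched in the same way. Here <code>s</code> and <code>m</code> are hypothetical stand-ins for the networks <math>s_\theta</math> and <math>m_\theta</math>, and the log-determinant is simply the sum of the components of <math>s_\theta(z_1)</math>.

```python
import math

def s(z1):
    """Hypothetical stand-in for the scale network s_theta."""
    return [0.5 * math.tanh(v) for v in z1]

def m(z1):
    """Hypothetical stand-in for the shift network m_theta."""
    return [v * v - 1.0 for v in z1]

def realnvp_forward(z1, z2):
    """x1 = z1, x2 = exp(s(z1)) * z2 + m(z1); also returns log|det J|."""
    scale, shift = s(z1), m(z1)
    x2 = [math.exp(a) * b + c for a, b, c in zip(scale, z2, shift)]
    return list(z1), x2, sum(scale)

def realnvp_inverse(x1, x2):
    """z1 = x1, z2 = exp(-s(x1)) * (x2 - m(x1))."""
    scale, shift = s(x1), m(x1)
    z2 = [math.exp(-a) * (b - c) for a, b, c in zip(scale, x2, shift)]
    return list(x1), z2

z1, z2 = [0.3, -1.2], [2.0, 0.5]
x1, x2, log_det = realnvp_forward(z1, z2)
r1, r2 = realnvp_inverse(x1, x2)
# Swapping the halves before the next layer lets both parts be transformed.
next_input = (x2, x1)
print(x2, log_det, r2)
```

Working with the log-determinant as a sum avoids forming the product of exponentials, which could overflow or underflow in high dimensions.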

=== Generative flow (Glow) ===
In the generative flow (Glow) model,<ref name="glow2">{{Cite arXiv |arxiv=1807.03039 |class=stat.ML |first=Diederik P. |last=Kingma |first2=Prafulla |last2=Dhariwal |title=Glow: Generative Flow with Invertible 1x1 Convolutions}}</ref> each layer consists of three parts:

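* A channel-wise affine transform<math display="block">y_{cij} = s_c(x_{cij} + b_c)</math>with Jacobian determinant <math>\prod_c s_c^{HW}</math>.
* An invertible 1x1 convolution<math display="block">z_{cij} = \sum_{c'} K_{cc'} y_{c'ij}</math>with Jacobian determinant <math>\det(K)^{HW}</math>, where <math>K</math> is any invertible matrix.
* A Real NVP part, whose Jacobian determinant is as described above.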

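The invertible 1x1 convolution generalizes the permutation used in Real NVP: instead of merely swapping the first and second halves, it mixes all channels in a general, learned way.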

=== Masked autoregressive flow (MAF) ===
The masked autoregressive flow (MAF) is based on an autoregressive model, which defines a stochastic process over <math>\R^n</math>:<ref>{{Cite journal |last=Papamakarios |first=George |last2=Pavlakou |first2=Theo |last3=Murray |first3=Iain |title=Masked Autoregressive Flow for Density Estimation |url=https://proceedings.neurips.cc/paper/2017/hash/6c1da886822c67822bcf3679d04369fa-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |date=2017 |volume=30 |arxiv=1705.07057}}</ref>

:<math>\begin{align}
		 x_1 \sim& N(\mu_1, \sigma_1^2)\\
		 x_2 \sim& N(\mu_2(x_1), \sigma_2(x_1)^2)\\
		 &\cdots \\
		 x_n \sim& N(\mu_n(x_{1:n-1}), \sigma_n(x_{1:n-1})^2)\\
\end{align}</math>

where <math>\mu_i: \R^{i-1} \to \R</math> and <math>\sigma_i: \R^{i-1} \to (0, \infty)</math> are fixed functions that define the autoregressive model.

Using the {{le|重参数化技巧|Reparameterization trick|reparameterization trick}}, the autoregressive model is generalized to a normalizing flow:

:<math>\begin{align}
		 x_1 =& \mu_1 + \sigma_1 z_1\\
		 x_2 =& \mu_2(x_1) + \sigma_2(x_1) z_2\\
		 &\cdots \\
		 x_n =& \mu_n(x_{1:n-1}) + \sigma_n(x_{1:n-1}) z_n\\
\end{align}</math>

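Setting <math>z \sim N(0, I_{n})</math> recovers the autoregressive model.

The forward mapping is slow because it is sequential, whereas the inverse mapping is fast because it can be computed in parallel.

The Jacobian matrix is lower triangular, so its determinant is <math>\sigma_1 \sigma_2(x_1)\cdots \sigma_n(x_{1:n-1})</math>.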

Swapping the roles of the two maps <math>f_\theta</math> and <math>f_\theta^{-1}</math> gives the inverse autoregressive flow (IAF). In contrast to MAF, IAF has a fast forward mapping and a slow inverse mapping.<ref>{{Cite journal |last=Kingma |first=Durk P |last2=Salimans |first2=Tim |last3=Jozefowicz |first3=Rafal |last4=Chen |first4=Xi |last5=Sutskever |first5=Ilya |last6=Welling |first6=Max |title=Improved Variational Inference with Inverse Autoregressive Flow |url=https://proceedings.neurips.cc/paper/2016/hash/ddeebdeefdb7e7e7a697e1c3e3d8ef54-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |date=2016 |volume=29 |arxiv=1606.04934}}</ref>
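The asymmetry between the two directions of MAF can be seen in a small sketch. The conditioners <code>mu</code> and <code>sigma</code> below are hypothetical closed-form functions of the prefix <math>x_{1:i-1}</math> standing in for the masked network; with an empty prefix they give <math>\mu_1 = 0</math> and <math>\sigma_1 = 1</math>.

```python
import math

def mu(prefix):
    """Hypothetical conditioner mu_i(x_{1:i-1})."""
    return 0.5 * sum(prefix)

def sigma(prefix):
    """Hypothetical conditioner sigma_i(x_{1:i-1}), always positive."""
    return math.exp(0.1 * sum(prefix))

def maf_forward(z):
    """z -> x. Sequential: x_i needs x_1, ..., x_{i-1} first."""
    x = []
    for zi in z:
        x.append(mu(x) + sigma(x) * zi)
    return x

def maf_inverse(x):
    """x -> z. Each z_i depends only on the known x, so the terms are
    independent of one another and could be computed in parallel."""
    return [(x[i] - mu(x[:i])) / sigma(x[:i]) for i in range(len(x))]

def maf_log_det(x):
    """log det of the lower-triangular Jacobian dx/dz = sum_i log sigma_i."""
    return sum(math.log(sigma(x[:i])) for i in range(len(x)))

z = [0.3, -1.2, 0.8]
x = maf_forward(z)
z_recovered = maf_inverse(x)
log_det = maf_log_det(x)
print(x, z_recovered, log_det)
```

Density evaluation uses only the inverse direction, which is why MAF suits density estimation, while sampling requires the sequential forward pass.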

=== Continuous normalizing flow (CNF) ===
Instead of constructing a flow by function composition, another approach is to formulate the flow as continuous-time dynamics, which gives the continuous normalizing flow (CNF).<ref name="ffjord">{{Cite arXiv |arxiv=1810.01367 |class=cs.LG |first=Will |last=Grathwohl |first2=Ricky T. Q. |last2=Chen |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models}}</ref><ref>{{Cite arXiv |arxiv=2210.02747 |class=cs.LG |first=Yaron |last=Lipman |first2=Ricky T. Q. |last2=Chen |title=Flow Matching for Generative Modeling |date=2022-10-01}}</ref> Let <math>z_0</math> be a latent variable with distribution <math>p(z_0)</math>. The latent variable is mapped to data space by the following flow function:

:<math>x = F(z_0) = z_T = z_0 + \int_0^T f(z_t, t) dt</math>

where <math>f</math> is an arbitrary function that can be modeled with, for example, a neural network. The inverse function is:<ref name="ffjord2">{{Cite arXiv |arxiv=1810.01367 |class=cs.LG |first=Will |last=Grathwohl |first2=Ricky T. Q. |last2=Chen |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models}}</ref>

:<math>z_0 = F^{-1}(x) = z_T + \int_T^0 f(z_t, t) dt = z_T - \int_0^T f(z_t,t) dt </math>

The log-likelihood of <math>x</math> is then:<ref name="ffjord3">{{Cite arXiv |arxiv=1810.01367 |class=cs.LG |first=Will |last=Grathwohl |first2=Ricky T. Q. |last2=Chen |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models}}</ref>

:<math>\log(p(x)) = \log(p(z_0)) - \int_0^T \text{Tr}\left[\frac{\partial f}{\partial z_t} \right] dt</math>
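The log-likelihood formula can be illustrated with a one-dimensional flow whose solution is known exactly. The sketch below assumes the linear dynamics <math>f(z, t) = a z</math>, a standard normal base distribution, and a fixed-step Euler integrator; practical implementations use adaptive ODE solvers instead, and all names and constants here are chosen for the example.

```python
import math

A = 0.5          # assumed dynamics f(z, t) = A * z
T = 1.0
STEPS = 10000    # fixed-step Euler; accuracy improves with more steps

def f(z, t):
    return A * z

def trace_df_dz(z, t):
    """In one dimension the trace of the Jacobian is just df/dz."""
    return A

def cnf_log_likelihood(x):
    """log p(x) = log p(z_0) - integral_0^T Tr[df/dz_t] dt."""
    dt = T / STEPS
    z = x
    trace_integral = 0.0
    for k in range(STEPS):
        t = T - k * dt
        trace_integral += trace_df_dz(z, t) * dt
        z -= f(z, t) * dt            # integrate backward from t = T to 0
    log_p0 = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    return log_p0 - trace_integral

x = 1.3
log_p_numeric = cnf_log_likelihood(x)
# Exact solution: x = z_0 * exp(A*T), hence x ~ N(0, exp(2*A*T)).
log_p_exact = (-0.5 * x * x * math.exp(-2.0 * A * T)
               - 0.5 * math.log(2.0 * math.pi) - A * T)
print(log_p_numeric, log_p_exact)
```

The state and the accumulated trace are integrated together along the same trajectory, which is how the density change is tracked without ever forming a determinant.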

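Because the trace depends only on the diagonal of the Jacobian <math>\partial_{z_t} f</math>, no restriction on the form of the Jacobian is needed.<ref>{{Cite arXiv |arxiv=1810.01367 |class=cs.LG |first=Will |last=Grathwohl |first2=Ricky T. Q. |last2=Chen |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models |date=2018-10-22}}</ref> This contrasts with the discrete normalizing flows above, which design the Jacobian to be upper or lower triangular so that its determinant can be computed efficiently.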

The trace can be estimated with "Hutchinson's trick":<ref name="Finlay 3154–3164">{{Cite journal |last=Finlay |first=Chris |last2=Jacobsen |first2=Joern-Henrik |last3=Nurbekyan |first3=Levon |last4=Oberman |first4=Adam |title=How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization |url=https://proceedings.mlr.press/v119/finlay20a.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |date=2020-11-21 |page=3154–3164 |arxiv=2002.02798}}</ref><ref>{{Cite journal |last=Hutchinson |first=M.F. |title=A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines |url=http://www.tandfonline.com/doi/abs/10.1080/03610918908812806 |journal=Communications in Statistics - Simulation and Computation |language=en |date=January 1989 |volume=18 |issue=3 |page=1059–1076 |doi=10.1080/03610918908812806 |issn=0361-0918}}</ref> given any matrix <math>W\in \R^{n\times n}</math> and any random vector <math>u\in \R^n</math> with <math>E[uu^T] = I</math>, we have <math>E[u^T W u] = \operatorname{tr}(W)</math>.

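Usually, the random vector <math>u</math> is sampled from the normal distribution <math>N(0, I)</math> or from the {{le|拉德马赫分布| Rademacher|Rademacher distribution}} <math>\{\pm 1\}^n</math>, both of which satisfy <math>E[uu^T] = I</math>.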
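A minimal sketch of Hutchinson's estimator follows, assuming probe vectors with independent entries equal to <math>\pm 1</math> with equal probability (so that <math>E[uu^T] = I</math>). The 3×3 matrix and the sample count are arbitrary illustrative choices.

```python
import random

def hutchinson_trace(W, num_samples, rng):
    """Monte Carlo estimate of tr(W) as the average of u^T W u."""
    n = len(W)
    total = 0.0
    for _ in range(num_samples):
        # Probe vector with i.i.d. entries +1 or -1, so E[u u^T] = I.
        u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Wu = [sum(W[i][j] * u[j] for j in range(n)) for i in range(n)]
        total += sum(u[i] * Wu[i] for i in range(n))
    return total / num_samples

rng = random.Random(0)
W = [[2.0, 1.0, 0.0],
     [0.5, -1.0, 3.0],
     [1.0, 0.0, 4.0]]
estimate = hutchinson_trace(W, 20000, rng)
exact = sum(W[i][i] for i in range(len(W)))
print(estimate, exact)
```

In a CNF the product <math>Wu</math> is obtained as a Jacobian-vector product by automatic differentiation, so the full Jacobian never has to be formed; the explicit matrix here is only for illustration.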

When <math>f</math> is implemented as a neural network, [[神经常微分方程|neural ordinary differential equation]] methods are needed.<ref>{{Cite conference |last=Chen |first=Ricky T. Q. |last2=Rubanova |first2=Yulia |last3=Bettencourt |first3=Jesse |last4=Duvenaud |first4=David K. |year=2018 |editor-last=Bengio |editor-first=S. |editor2-last=Wallach |editor2-first=H. |editor3-last=Larochelle |editor3-first=H. |editor4-last=Grauman |editor4-first=K. |editor5-last=Cesa-Bianchi |editor5-first=N. |editor6-last=Garnett |editor6-first=R. |title=Neural Ordinary Differential Equations |url=https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf |conference= |publisher=Curran Associates, Inc. |volume=31 |arxiv=1806.07366 |booktitle=Advances in Neural Information Processing Systems}}</ref> Indeed, CNF was first proposed in the same paper that introduced neural ODEs.

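CNF has two main deficiencies. First, a continuous flow must be a [[同胚|homeomorphism]], so it preserves orientation and {{le|环境同痕|Ambient isotopy|ambient isotopy}} (for example, it is impossible to flip a left hand into a right hand by continuously deforming space, and it is impossible to [[斯梅爾悖論|turn a sphere inside out]] or to undo a knot). Second, the learned flow <math>f</math> may be ill-behaved because of degeneracy (there are infinitely many possible <math>f</math> that all solve the same problem).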

By adding extra dimensions, a CNF gains enough freedom to reverse orientation and go beyond ambient isotopy (just as a polygon can be flipped over in three-dimensional space, or a knot undone in four-dimensional space), which yields the "augmented neural ODE".<ref>{{Cite journal |last=Dupont |first=Emilien |last2=Doucet |first2=Arnaud |last3=Teh |first3=Yee Whye |title=Augmented Neural ODEs |url=https://proceedings.neurips.cc/paper/2019/hash/21be9a4bd4f81549a9d1d241981cec3c-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |date=2019 |volume=32}}</ref>

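By combining the {{le|惠特尼嵌入定理|Whitney embedding theorem|Whitney embedding theorem}} for manifolds with the [[通用近似定理|universal approximation theorem]] for neural networks, it can be proved that any homeomorphism of <math>\R^n</math> can be approximated by a neural ODE operating on <math>\R^{2n+1}</math>.<ref>{{Cite arXiv |arxiv=1907.12998 |class=cs.LG |first=Han |last=Zhang |first2=Xi |last2=Gao |title=Approximation Capabilities of Neural ODEs and Invertible Residual Networks |date=2019-07-30 |language=en}}</ref>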

A regularization loss can also be introduced for the flow <math>f</math>, for example one based on [[最优传输理论|optimal transport theory]]:<ref name="Finlay 3154–31642">{{Cite journal |last=Finlay |first=Chris |last2=Jacobsen |first2=Joern-Henrik |last3=Nurbekyan |first3=Levon |last4=Oberman |first4=Adam |title=How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization |url=https://proceedings.mlr.press/v119/finlay20a.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |date=2020-11-21 |page=3154–3164 |arxiv=2002.02798}}</ref>

:<math>\lambda_{K} \int_{0}^{T}\left\|f(z_t, t)\right\|^{2} dt
+\lambda_{J} \int_{0}^{T}\left\|\nabla_z f(z_t, t)\right\|_F^{2} dt
</math>

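where <math>\lambda_K, \lambda_J >0</math> are hyperparameters. The first term penalizes oscillation of the flow field over time, and the second term penalizes oscillation of the flow field over space. Together, the two terms guide the model toward a flow that is smooth in both space and time.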

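== Downsides ==
Despite the success of normalizing flows in estimating high-dimensional probability density functions, their design still has some drawbacks. First, the latent space of a normalizing flow is not lower-dimensional than the data, so flow-based models do not support data compression by default and require a large amount of computation. It is nevertheless still possible to use them for image compression.<ref name="Lossy Image Compression with Normal">{{Cite arXiv |arxiv=2008.10486 |class=cs.CV |first=Leonhard |last=Helminger |first2=Abdelaziz |last2=Djelouah |title=Lossy Image Compression with Normalizing Flows}}</ref>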

In addition, flow-based models often perform poorly when estimating the likelihood of out-of-distribution samples (i.e. samples that were not drawn from the same distribution as the training set).<ref>{{Cite arXiv |arxiv=1810.09136v3 |class=stat.ML |first=Eric |last=Nalisnick |first2=Akihiro |last2=Matsukawa |title=Do Deep Generative Models Know What They Don't Know?}}</ref> Several hypotheses have been proposed to explain this phenomenon, including the typical set hypothesis,<ref>{{Cite arXiv |arxiv=1906.02994 |class=stat.ML |first=Eric |last=Nalisnick |first2=Akihiro |last2=Matsukawa |title=Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality}}</ref> estimation issues when training models,<ref>{{Cite journal |last=Zhang |first=Lily |last2=Goldstein |first2=Mark |last3=Ranganath |first3=Rajesh |title=Understanding Failures in Out-of-Distribution Detection with Deep Generative Models |url=https://proceedings.mlr.press/v139/zhang21g.html |journal=Proceedings of Machine Learning Research |date=2021 |volume=139 |page=12427–12436 |pmc=9295254 |pmid=35860036}}</ref> and fundamental issues caused by the entropy of the data distributions.<ref>{{Cite arXiv |arxiv=2109.10794 |class=stat.ML |first=Anthony L. |last=Caterini |first2=Gabriel |last2=Loaiza-Ganem |title=Entropic Issues in Likelihood-Based OOD Detection |date=2022}}</ref>

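One of the most interesting properties of normalizing flows is the invertibility of the [[双射|bijective]] map they learn. This property is guaranteed in theory by constraints built into the model design. The inverse map matters for the applicability of the change-of-variables theorem, for computing the Jacobian determinant, and for sampling from the model. In practice, however, invertibility can be violated because of numerical imprecision, and the inverse map can explode.<ref>{{Cite arXiv |arxiv=2006.09347 |class=cs.LG |first=Jens |last=Behrmann |first2=Paul |last2=Vicol |title=Understanding and Mitigating Exploding Inverses in Invertible Neural Networks}}</ref>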

== Applications ==
Flow-based generative models have been applied to a variety of tasks, including:

* Audio generation<ref>{{Cite arXiv |arxiv=1912.01219 |class=cs.SD |first=Wei |last=Ping |first2=Kainan |last2=Peng |title=WaveFlow: A Compact Flow-based Model for Raw Audio}}</ref>
* Image generation<ref name="glow3">{{Cite arXiv |arxiv=1807.03039 |class=stat.ML |first=Diederik P. |last=Kingma |first2=Prafulla |last2=Dhariwal |title=Glow: Generative Flow with Invertible 1x1 Convolutions}}</ref>
* Molecular graph generation<ref>{{Cite arXiv |arxiv=2001.09382 |class=cs.LG |first=Chence |last=Shi |first2=Minkai |last2=Xu |title=GraphAF: A Flow-based Autoregressive Model for Molecular Graph Generation}}</ref>
* Point-cloud modeling<ref>{{Cite arXiv |arxiv=1906.12320 |class=cs.CV |first=Guandao |last=Yang |first2=Xun |last2=Huang |title=PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows}}</ref>
* Video generation<ref>{{Cite arXiv |arxiv=1903.01434 |class=cs.CV |first=Manoj |last=Kumar |first2=Mohammad |last2=Babaeizadeh |title=VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation}}</ref>
* Lossy image compression<ref name="Lossy Image Compression with Normal2">{{Cite arXiv |arxiv=2008.10486 |class=cs.CV |first=Leonhard |last=Helminger |first2=Abdelaziz |last2=Djelouah |title=Lossy Image Compression with Normalizing Flows}}</ref>
* Anomaly detection<ref>{{Cite arXiv |arxiv=2008.12577 |class=cs.CV |first=Marco |last=Rudolph |first2=Bastian |last2=Wandt |title=Same Same But DifferNet: Semi-Supervised Defect Detection with Normalizing Flows}}</ref>

== References ==
{{reflist|30em}}

{{生成式人工智能}}

[[Category:机器学习]]
[[Category:统计模型]]
[[Category:概率模型]]
[[Category:生成式人工智能]]