
{{机器学习导航栏}}

In [[机器学习|machine learning]], a '''variational autoencoder''' ('''VAE''') is an [[人工神经网络|artificial neural network]] architecture introduced by Diederik P. Kingma and Max Welling. It belongs to the families of probabilistic [[图模式|graphical models]] and [[变分贝叶斯方法|variational Bayesian methods]].<ref>{{cite book |first1=Lucas |last1=Pinheiro Cinelli |first2=Matheus |last2=Araújo Marins |first3=Eduardo Antônio |last3=Barros da Silva |first4=Sérgio |last4=Lima Netto |display-authors=1 |title=Variational Methods for Machine Learning with Applications to Deep Networks |location= |publisher=Springer |year=2021 |pages=111–149 |chapter=Variational Autoencoder |isbn=978-3-030-70681-4 |chapter-url=https://www.google.com/books/edition/Variational_Methods_for_Machine_Learning/N5EtEAAAQBAJ?hl=en&gbpv=1&pg=PA111 |doi=10.1007/978-3-030-70679-1_5 |s2cid=240802776 }}</ref>

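Variational autoencoders are often associated with the [[自编码器|autoencoder]] model because of their architectural affinity, but they differ significantly in goal and mathematical formulation. A VAE is a probabilistic generative model in which neural networks are only one component; by function, these networks are called the encoder and the decoder. The encoder maps the input variable to a latent space that corresponds to the parameters of a variational distribution, so that multiple different samples can be drawn from the same distribution. The decoder performs the opposite mapping, from the latent space back to the input space, in order to generate data points. The encoder and decoder are usually trained jointly using the reparameterization trick, although the variance of the noise model can be learned separately.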

Models of this kind were initially designed for [[无监督学习|unsupervised learning]],<ref>{{cite arXiv |last1=Dilokthanakul |first1=Nat |last2=Mediano |first2=Pedro A. M. |last3=Garnelo |first3=Marta |last4=Lee |first4=Matthew C. H. |last5=Salimbeni |first5=Hugh |last6=Arulkumaran |first6=Kai |last7=Shanahan |first7=Murray |title=Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |date=2017-01-13 |class=cs.LG |eprint=1611.02648}}</ref><ref>{{cite book |last1=Hsu |first1=Wei-Ning |last2=Zhang |first2=Yu |last3=Glass |first3=James |title=2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |chapter=Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation |date=December 2017 |pages=16–23 |doi=10.1109/ASRU.2017.8268911 |arxiv=1707.06265 |isbn=978-1-5090-4788-8 |s2cid=22681625 |chapter-url=https://ieeexplore.ieee.org/abstract/document/8268911?casa_token=i8S9DzueB5gAAAAA:SnZUh5mfUYtRpusQLMJxN7eC_-6-qOQs9vpkEcA0Ai_ju-nJH7o1H1DN6nDFdeCY-LgGg3OVKQ |access-date=2023-02-24 |archive-date=2021-08-28 |archive-url=https://web.archive.org/web/20210828043431/https://ieeexplore.ieee.org/abstract/document/8268911?casa_token=i8S9DzueB5gAAAAA:SnZUh5mfUYtRpusQLMJxN7eC_-6-qOQs9vpkEcA0Ai_ju-nJH7o1H1DN6nDFdeCY-LgGg3OVKQ |dead-url=no }}</ref> but they have also proven effective for [[半监督学习|semi-supervised learning]]<ref>{{cite book |last1=Ehsan Abbasnejad |first1=M. |last2=Dick |first2=Anthony |last3=van den Hengel |first3=Anton |title=Infinite Variational Autoencoder for Semi-Supervised Learning |date=2017 |pages=5888–5897 |url=https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html |access-date=2023-02-24 |archive-date=2021-06-24 |archive-url=https://web.archive.org/web/20210624195246/https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html |dead-url=no }}</ref><ref>{{cite journal |last1=Xu |first1=Weidi |last2=Sun |first2=Haoze |last3=Deng |first3=Chao |last4=Tan |first4=Ying |title=Variational Autoencoder for Semi-Supervised Text Classification |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2017-02-12 |volume=31 |issue=1 |doi=10.1609/aaai.v31i1.10966 |s2cid=2060721 |url=https://ojs.aaai.org/index.php/AAAI/article/view/10966 |language=en |doi-access=free |access-date=2023-02-24 |archive-date=2021-06-16 |archive-url=https://web.archive.org/web/20210616043347/https://ojs.aaai.org/index.php/AAAI/article/view/10966 |dead-url=no }}</ref> and [[监督学习|supervised learning]].<ref>{{cite journal |last1=Kameoka |first1=Hirokazu |last2=Li |first2=Li |last3=Inoue |first3=Shota |last4=Makino |first4=Shoji |title=Supervised Determined Source Separation with Multichannel Variational Autoencoder |journal=Neural Computation |date=2019-09-01 |volume=31 |issue=9 |pages=1891–1914 |doi=10.1162/neco_a_01217 |pmid=31335290 |s2cid=198168155 |url=https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with |access-date=2023-02-24 |archive-date=2021-06-16 |archive-url=https://web.archive.org/web/20210616043338/https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with |dead-url=no }}</ref>

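==Overview of architecture and operation==
A VAE is a generative model with a prior distribution and a noise distribution. Models of this kind are usually trained with the expectation-maximization meta-algorithm. Such a scheme optimizes a lower bound on the data likelihood, which is itself usually intractable, and it requires q-distributions, also called variational posteriors. These q-distributions are normally parameterized for each individual data point in a separate optimization process. A VAE instead uses a neural network as an amortized approach that optimizes jointly across data points: the network takes the data point itself as input and outputs the parameters of the variational distribution. Because it maps from a known input space to a lower-dimensional latent space, which is an encoding process, this network is called the "encoder".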

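The decoder maps from the latent space back to the input space, for example as the mean of the noise distribution. A second neural network that maps to the variance can also be used, but it is usually omitted for simplicity. In that case, the variance can be optimized by gradient descent.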

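Two terms are commonly used when optimizing the model: the "reconstruction error" and the [[KL散度|Kullback–Leibler (KL) divergence]]. Both come from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data. For example, a standard VAE task such as ImageNet is typically assumed to have Gaussian noise, whereas a task such as binarized MNIST requires Bernoulli noise. The KL divergence in the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which can result in mode-seeking behaviour. The remaining part of the free energy expression is the "reconstruction" term, whose expectation must be approximated by sampling.<ref>{{cite arXiv|last1=Kingma |first1=Diederik |title=Autoencoding Variational Bayes |eprint=1312.6114 |date=2013|class=stat.ML }}</ref>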
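The dependence of the reconstruction term on the noise model can be illustrated with a minimal sketch. The function names below are hypothetical, and the Gaussian case assumes unit variance and drops the additive constant.

```python
import math

def gaussian_recon_log_lik(x, x_hat):
    """ln N(x | x_hat, I) up to an additive constant: -0.5 * ||x - x_hat||^2."""
    return -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))

def bernoulli_recon_log_lik(x, probs, eps=1e-12):
    """Sum of Bernoulli log-likelihoods (negative binary cross-entropy)."""
    total = 0.0
    for a, p in zip(x, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return total
```

Under the Gaussian assumption the reconstruction term reduces to a squared error, while under the Bernoulli assumption it reduces to a cross-entropy between the binary data and the decoder's output probabilities.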

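==Formulation==
[[File:VAE Basic.png|thumb|425x425px|The basic scheme of a variational autoencoder. The model receives <math>x</math> as input. The encoder compresses it into the latent space. The decoder takes as input the information sampled from the latent space and produces <math>{x'}</math>, as similar as possible to <math>x</math>.]]
From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data <math>x</math> under a chosen parameterized probability distribution <math>p_{\theta}(x) = p(x|\theta)</math>. This distribution is usually chosen to be a Gaussian <math>N(x|\mu,\sigma)</math>, parameterized by <math>\mu</math> and <math>\sigma</math>, which as a member of the exponential family is easy to work with as a noise distribution. Simple distributions are easy to maximize, but once a prior distribution over a latent variable <math>z</math> is assumed, intractable integrals can arise. Let us find <math>p_\theta(x)</math> by [[边缘分布|marginalizing]] over <math>z</math>.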
: <math>p_\theta(x) = \int_{z}p_\theta({x,z}) \, dz, </math>

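where <math>p_\theta({x,z})</math> is the [[联合分布|joint distribution]] under <math>p_\theta</math> of the observable data <math> x </math> and its latent representation (the encoding) <math> z </math>. By the [[连锁法则|chain rule]], the equation can be rewritten as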

: <math>p_\theta(x) = \int_{z}p_\theta({x| z})p_\theta(z) \, dz</math>

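In the vanilla VAE, <math>z</math> is usually taken to be a finite-dimensional vector of real numbers, and <math>p_\theta({x|z})</math> to be a [[高斯分布|Gaussian distribution]]. Then <math>p_\theta(x)</math> is a mixture of Gaussian distributions.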
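The marginalization integral can be made concrete with a one-dimensional toy model. The sketch below assumes a prior <math>z \sim N(0, 1)</math> and a hypothetical identity decoder, so that <math>x|z \sim N(z, 1)</math>; in this special case the marginal is known in closed form, <math>x \sim N(0, 2)</math>, which lets a Monte Carlo estimate be checked. With a neural-network decoder no such closed form is generally available.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_mc(x, num_samples=100_000, seed=0):
    """Monte Carlo estimate of p(x) = integral of p(x|z) p(z) dz, sampling z from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)          # z ~ p(z) = N(0, 1)
        total += normal_pdf(x, z, 1.0)   # p(x|z) = N(x | z, 1)
    return total / num_samples

def marginal_exact(x):
    """Closed-form marginal for this toy model: N(x | 0, 2)."""
    return normal_pdf(x, 0.0, 2.0)
```

Averaging <math>p_\theta(x|z)</math> over prior samples is exactly the "mixture of Gaussians" view, with one mixture component per sampled <math>z</math>.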

The relationships between the input data and its latent representation can now be defined as
* Prior <math>p_\theta(z)</math>
* Likelihood <math>p_\theta(x|z)</math>
* Posterior <math>p_\theta(z|x)</math>

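Unfortunately, computing <math>p_\theta(x)</math> is expensive and in most cases intractable. To make the calculation feasible, a further function is introduced to approximate the posterior distribution as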

:<math>q_\phi({z| x}) \approx p_\theta({z| x})</math>

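where <math>\phi</math> is the set of real values that parameterize <math>q</math>. This is sometimes called "amortized inference", since by "investing" in finding a good <math>q_\phi</math>, one can later infer <math>z</math> from <math>x</math> quickly without computing any integrals.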

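In this way, the problem becomes one of finding a good probabilistic autoencoder, in which the conditional likelihood distribution <math>p_\theta(x|z)</math> is computed by the probabilistic decoder and the approximate posterior distribution <math>q_\phi(z|x)</math> is computed by the probabilistic encoder.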

In what follows, the encoder is parameterized as <math>E_\phi</math> and the decoder as <math>D_\theta</math>.

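==Evidence lower bound (ELBO)==
As in every [[深度学习|deep learning]] problem, a differentiable loss function must be defined in order to update the network weights through [[反向传播算法|backpropagation]].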

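For a VAE, the idea is to jointly optimize the generative model parameters <math>\theta</math> and the variational parameters <math>\phi</math>, so as to reduce the reconstruction error between input and output and to make <math>q_\phi({z| x})</math> as close as possible to <math>p_\theta(z|x)</math>. [[均方误差|Mean squared error]] and [[交叉熵|cross entropy]] are commonly used as the reconstruction loss.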

As a distance loss between the two distributions, the reverse KL divergence <math>D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x}))</math> is a good choice for squeezing <math>q_\phi({z| x})</math> under <math>p_\theta(z|x)</math>.<ref name=":0">{{cite arXiv |last1=Kingma |first1=Diederik P. |last2=Welling |first2=Max |title=Auto-Encoding Variational Bayes |date=2014-05-01 |class=stat.ML |eprint=1312.6114}}</ref><ref>{{cite news |title=From Autoencoder to Beta-VAE |url=https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html |website=Lil'Log |language=en |date=2018-08-12 |accessdate=2023-02-24 |archive-date=2021-05-14 |archive-url=https://web.archive.org/web/20210514164202/https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html |dead-url=no }}</ref>

The distance loss just defined can be expanded as

: <math>\begin{align}
D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x})) &= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{q_\phi(z|x)}{p_\theta(z|x)}\right]\\
&= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{q_\phi({z| x})p_\theta(x)}{p_\theta(x, z)}\right]\\
&=\ln p_\theta(x) + \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{q_\phi({z| x})}{p_\theta(x, z)}\right]
\end{align}</math>

Now define the evidence lower bound (ELBO):<math display="block">L_{\theta,\phi}(x) := 
\mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] 
= \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x}))  </math>. That is, it maximizes the log-likelihood of the observed data while minimizing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>.
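The identity between the ELBO, the log-evidence and the KL divergence can be checked numerically in a toy model where every term has a closed form. The sketch below assumes a one-dimensional model with prior <math>z \sim N(0,1)</math> and likelihood <math>x|z \sim N(z,1)</math>, so that the evidence is <math>N(x|0,2)</math> and the exact posterior is <math>N(x/2, 1/2)</math>; the approximate posterior is taken to be <math>q = N(m, s^2)</math>. All function names are hypothetical.

```python
import math

def log_evidence(x):
    """ln p(x) for the toy model, where p(x) = N(x | 0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0

def elbo(x, m, s):
    """E_q[ln p(x|z)] + E_q[ln p(z)] + H(q) for q = N(m, s^2)."""
    expected_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s * s)
    expected_log_prior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (m * m + s * s)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s * s)
    return expected_log_lik + expected_log_prior + entropy

def kl_to_posterior(x, m, s):
    """KL(q || p(z|x)) with exact posterior N(x/2, 1/2)."""
    post_mean, post_var = x / 2.0, 0.5
    return 0.5 * (math.log(post_var / (s * s))
                  + (s * s + (m - post_mean) ** 2) / post_var - 1.0)
```

For any choice of <math>m</math> and <math>s</math>, <code>elbo(x, m, s)</code> agrees with <code>log_evidence(x) - kl_to_posterior(x, m, s)</code> up to rounding, and the bound is tight exactly when <math>q</math> equals the true posterior.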

The form given above is not convenient to maximize, so the following equivalent form is used:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac 12\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, what <math>x \sim \mathcal N(D_\theta(z), I)</math> yields. In other words, the conditional distribution of <math>x</math> given <math>z</math> is modeled as a Gaussian centered on <math>D_\theta(z)</math>. The distributions <math>q_\phi(z |x)</math> and <math>p_\theta(z)</math> are often also chosen to be Gaussians, <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, for which the formula for the KL divergence of Gaussians gives:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const </math>Here <math>N</math> is the dimension of <math>z</math>.
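As a minimal sketch of this Gaussian objective, the functions below compute the closed-form KL term for an isotropic approximate posterior and a single-sample estimate of the ELBO. The decoder is passed in as an arbitrary callable, and all names are hypothetical. Unlike the formula above, the KL term here keeps the constant <math>-N</math> rather than absorbing it into <math>Const</math>.

```python
import math

def kl_isotropic_to_standard(mu, sigma):
    """KL( N(mu, sigma^2 I) || N(0, I) ) for a scalar sigma > 0."""
    n = len(mu)
    sq_norm = sum(m * m for m in mu)
    return 0.5 * (n * sigma ** 2 + sq_norm - n - 2.0 * n * math.log(sigma))

def elbo_single_sample(x, mu, sigma, decode, eps):
    """One-sample ELBO estimate (up to an additive constant).

    decode: callable mapping a latent vector z to a reconstruction of x.
    eps:    a draw from N(0, I) with the same length as mu.
    """
    z = [m + sigma * e for m, e in zip(mu, eps)]
    x_hat = decode(z)
    recon = -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon - kl_isotropic_to_standard(mu, sigma)
```

In practice the negative of this quantity, averaged over a minibatch, would serve as the training loss.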

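==Reparameterization==
[[File:Reparameterization Trick.png|thumb|300x300px|The scheme of the reparameterization trick. The random variable <math>{\varepsilon}</math> is injected into the latent space <math>z</math> as an external input, so that the gradient can be backpropagated without involving the stochastic variable in the update.]]
The typical way to search efficiently for<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>is [[梯度下降法|gradient descent]], applied to the negative ELBO.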

It is straightforward to find<math display="block">\nabla_\theta \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]
= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \nabla_\theta  \ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]  </math>However,<math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]  </math>does not allow <math>\nabla_\phi </math> to be moved inside the expectation, because <math>\phi </math> appears in the probability distribution itself. The '''reparameterization trick''' (also known as stochastic backpropagation<ref>{{Cite journal |last1=Rezende |first1=Danilo Jimenez |last2=Mohamed |first2=Shakir |last3=Wierstra |first3=Daan |date=2014-06-18 |title=Stochastic Backpropagation and Approximate Inference in Deep Generative Models |url=https://proceedings.mlr.press/v32/rezende14.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=1278–1286 |arxiv=1401.4082 |access-date=2023-02-24 |archive-date=2023-02-24 |archive-url=https://web.archive.org/web/20230224094947/https://proceedings.mlr.press/v32/rezende14.html |dead-url=no }}</ref>) bypasses this difficulty.<ref name=":0" /><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/abstract/document/6472238?casa_token=wQPK9gUGfCsAAAAA:FS5uNYCQVJGH-bq-kVvZeTdnQ8a33C6qQ4VUyDyGLMO13QewH3wcry9_Jh-5FATvspBj8YOXfw|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|s2cid=393948|access-date=2023-02-24|archive-date=2021-06-27|archive-url=https://web.archive.org/web/20210627195543/https://ieeexplore.ieee.org/abstract/document/6472238?casa_token=wQPK9gUGfCsAAAAA:FS5uNYCQVJGH-bq-kVvZeTdnQ8a33C6qQ4VUyDyGLMO13QewH3wcry9_Jh-5FATvspBj8YOXfw|dead-url=no}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>

The most important example is when <math>z \sim q_\phi(\cdot | x)  </math> is normally distributed, as <math>\mathcal N(\mu_\phi(x), \Sigma_\phi(x))  </math>.

: [[File:Reparameterized Variational Autoencoder.png|thumb|The scheme of a variational autoencoder after the reparameterization trick|300x300px]]

This can be reparameterized by letting <math>\boldsymbol{\epsilon} \sim \mathcal{N}(0, \boldsymbol{I})</math> be a "standard [[随机数生成|random number generator]]" and constructing <math>z   </math> as <math>z = \mu_\phi(x)  + L_\phi(x)\epsilon  </math>. Here, <math>L_\phi(x)  </math> is obtained by the [[科列斯基分解|Cholesky decomposition]]:<math display="block">\Sigma_\phi(x) = L_\phi(x)L_\phi(x)^T  </math>Then we have<math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] 
= 
\mathbb {E}_{\epsilon}\left[ \nabla_\phi \ln {\frac {p_{\theta }(x, \mu_\phi(x)  + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x)  + L_\phi(x)\epsilon | x)}}\right]  </math>Sampling <math>\epsilon</math> and averaging the bracketed term therefore gives an unbiased estimator of the gradient, which allows [[随机梯度下降法|stochastic gradient descent]] to be applied.
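The mechanics of the trick can be seen in a scalar example with a known answer. The sketch below does not use the ELBO integrand; it assumes the simpler test function <math>f(z) = z^2</math> with <math>z \sim N(\mu, \sigma^2)</math>, for which <math>\mathbb E[f(z)] = \mu^2 + \sigma^2</math> and the exact gradients are <math>2\mu</math> and <math>2\sigma</math>. In one dimension the Cholesky factor is simply <math>\sigma</math>. The function name is hypothetical.

```python
import random

def reparam_gradients(mu, sigma, num_samples=200_000, seed=0):
    """Estimate d/dmu and d/dsigma of E[z^2], z ~ N(mu, sigma^2), via z = mu + sigma * eps."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)    # noise drawn independently of mu and sigma
        z = mu + sigma * eps         # reparameterized sample
        df_dz = 2.0 * z              # derivative of f(z) = z^2
        grad_mu += df_dz             # chain rule: dz/dmu = 1
        grad_sigma += df_dz * eps    # chain rule: dz/dsigma = eps
    return grad_mu / num_samples, grad_sigma / num_samples
```

Because the randomness now comes only from <math>\epsilon</math>, whose distribution does not depend on the parameters, the gradient passes through the deterministic map from <math>\epsilon</math> to <math>z</math> by the ordinary chain rule.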

Since <math>z</math> has been reparameterized, we need to find <math>q_\phi(z|x)</math>. Let <math>q_0</math> be the probability density function of <math>\epsilon</math>; then<math display="block">\ln q_\phi(z | x) = \ln q_0 (\epsilon) - \ln|\det(\partial_\epsilon z)|</math>where <math>\partial_\epsilon z</math> is the Jacobian matrix of <math>z</math> with respect to <math>\epsilon</math>. Since <math>z = \mu_\phi(x)  + L_\phi(x)\epsilon  </math>, this is<math display="block">\ln q_\phi(z | x) = -\frac 12 \|\epsilon\|^2 - \ln|\det L_\phi(x)| - \frac n2 \ln(2\pi)</math>where <math>n</math> is the dimension of <math>z</math>.
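The change-of-variables expression can be compared against the usual multivariate normal log-density. The sketch below is restricted to two dimensions so that the Cholesky factor and the matrix inverse can be written by hand; the function names are hypothetical.

```python
import math

def cholesky_2x2(sigma):
    """Lower-triangular L with L L^T = sigma, for a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def log_q_reparam(eps, chol):
    """ln q(z|x) = -0.5 ||eps||^2 - ln|det L| - (n/2) ln(2 pi), with n = 2."""
    n = len(eps)
    log_det = math.log(abs(chol[0][0] * chol[1][1]))  # det of a triangular matrix
    return -0.5 * sum(e * e for e in eps) - log_det - 0.5 * n * math.log(2.0 * math.pi)

def log_q_direct(z, mu, sigma):
    """ln N(z | mu, sigma) evaluated directly from the 2x2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dz0, dz1 = z[0] - mu[0], z[1] - mu[1]
    quad = (d * dz0 * dz0 - (b + c) * dz0 * dz1 + a * dz1 * dz1) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)
```

For any <math>\epsilon</math>, setting <math>z = \mu + L\epsilon</math> makes the two functions agree up to rounding, which is what allows the density to be evaluated from <math>\epsilon</math> alone.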

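==Variants==
Many applications and extensions of the VAE have been developed to adapt the architecture to other domains and to improve its performance.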

<math>\beta</math>-VAE is an implementation with a weighted KL divergence term, used to automatically discover and interpret factorized latent representations. With this implementation, manifold disentanglement can be enforced for <math>\beta</math> values greater than one. The architecture can discover disentangled latent factors without supervision.<ref>{{Cite journal|last1=Higgins|first1=Irina|last2=Matthey|first2=Loic|last3=Pal|first3=Arka|last4=Burgess|first4=Christopher|last5=Glorot|first5=Xavier|last6=Botvinick|first6=Matthew|last7=Mohamed|first7=Shakir|last8=Lerchner|first8=Alexander|date=2016-11-04|title=beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework|url=https://openreview.net/forum?id=Sy2fzU9gl|language=en|access-date=2023-02-24|archive-date=2021-07-20|archive-url=https://web.archive.org/web/20210720082052/https://openreview.net/forum?id=Sy2fzU9gl|dead-url=no}}</ref><ref>{{Cite arXiv|last1=Burgess|first1=Christopher P.|last2=Higgins|first2=Irina|last3=Pal|first3=Arka|last4=Matthey|first4=Loic|last5=Watters|first5=Nick|last6=Desjardins|first6=Guillaume|last7=Lerchner|first7=Alexander|date=2018-04-10|title=Understanding disentangling in &beta;-VAE|class=stat.ML|eprint=1804.03599}}</ref>
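The weighting can be written as a one-line change to the ELBO. The sketch below assumes the reconstruction term and the KL term have already been computed for a data point; the function name is hypothetical.

```python
def beta_vae_objective(recon_log_lik, kl_divergence, beta=4.0):
    """Objective to maximize: reconstruction term minus beta-weighted KL term.

    beta = 1 recovers the standard ELBO; beta > 1 penalizes the KL term more heavily.
    """
    return recon_log_lik - beta * kl_divergence
```

The default <code>beta=4.0</code> is only an illustrative value, not a recommendation from the cited works.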

The conditional VAE (CVAE) inserts label information into the latent space to force a deterministic constrained representation of the learned data.<ref>{{Cite journal|last1=Sohn|first1=Kihyuk|last2=Lee|first2=Honglak|last3=Yan|first3=Xinchen|date=2015-01-01|title=Learning Structured Output Representation using Deep Conditional Generative Models|url=https://proceedings.neurips.cc/paper/2015/file/8d55a249e6baa5c06772297520da2051-Paper.pdf|language=en|access-date=2023-02-24|archive-date=2021-07-09|archive-url=https://web.archive.org/web/20210709183454/https://proceedings.neurips.cc/paper/2015/file/8d55a249e6baa5c06772297520da2051-Paper.pdf|dead-url=no}}</ref>

Some architectures deal directly with the quality of the generated samples<ref>{{Cite arXiv|last1=Dai|first1=Bin|last2=Wipf|first2=David|date=2019-10-30|title=Diagnosing and Enhancing VAE Models|class=cs.LG|eprint=1903.05789}}</ref><ref>{{Cite arXiv|last1=Dorta|first1=Garoe|last2=Vicente|first2=Sara|last3=Agapito|first3=Lourdes|last4=Campbell|first4=Neill D. F.|last5=Simpson|first5=Ivor|date=2018-07-31|title=Training VAEs Under Structured Residuals|class=stat.ML|eprint=1804.01050}}</ref> or implement more than one latent space to further improve [[表征学习|representation learning]].<ref>{{Cite journal|last1=Tomczak|first1=Jakub|last2=Welling|first2=Max|date=2018-03-31|title=VAE with a VampPrior|url=http://proceedings.mlr.press/v84/tomczak18a.html|journal=International Conference on Artificial Intelligence and Statistics|language=en|publisher=PMLR|pages=1214–1223|arxiv=1705.07120|access-date=2023-02-24|archive-date=2021-06-24|archive-url=https://web.archive.org/web/20210624200636/http://proceedings.mlr.press/v84/tomczak18a.html|dead-url=no}}</ref><ref>{{Cite arXiv|last1=Razavi|first1=Ali|last2=Oord|first2=Aaron van den|last3=Vinyals|first3=Oriol|date=2019-06-02|title=Generating Diverse High-Fidelity Images with VQ-VAE-2|class=cs.LG|eprint=1906.00446}}</ref>

Some architectures combine VAEs with [[生成对抗网络|generative adversarial networks]] to obtain hybrid models.<ref>{{Cite journal|last1=Larsen|first1=Anders Boesen Lindbo|last2=Sønderby|first2=Søren Kaae|last3=Larochelle|first3=Hugo|last4=Winther|first4=Ole|date=2016-06-11|title=Autoencoding beyond pixels using a learned similarity metric|url=http://proceedings.mlr.press/v48/larsen16.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1558–1566|arxiv=1512.09300|access-date=2023-02-24|archive-date=2021-05-17|archive-url=https://web.archive.org/web/20210517131233/https://proceedings.mlr.press/v48/larsen16.html|dead-url=no}}</ref><ref>{{cite arXiv|last1=Bao|first1=Jianmin|last2=Chen|first2=Dong|last3=Wen|first3=Fang|last4=Li|first4=Houqiang|last5=Hua|first5=Gang|date=2017|title=CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training|pages=2745–2754|class=cs.CV|eprint=1703.10155}}</ref><ref>{{Cite journal|last1=Gao|first1=Rui|last2=Hou|first2=Xingsong|last3=Qin|first3=Jie|last4=Chen|first4=Jiaxin|last5=Liu|first5=Li|last6=Zhu|first6=Fan|last7=Zhang|first7=Zhao|last8=Shao|first8=Ling|date=2020|title=Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning|url=https://ieeexplore.ieee.org/abstract/document/8957359?casa_token=d6k1X5ClbTsAAAAA:AiOSfZQ7S3EsfIaecikiuLX8Y9-Lf5FHqTFRjL-FMQQ8bNjdW2rD0UZxA0BC4gVMO0QjF_YXkw|journal=IEEE Transactions on Image Processing|volume=29|pages=3665–3680|doi=10.1109/TIP.2020.2964429|pmid=31940538|bibcode=2020ITIP...29.3665G|s2cid=210334032|issn=1941-0042|access-date=2023-02-24|archive-date=2021-06-28|archive-url=https://web.archive.org/web/20210628151541/https://ieeexplore.ieee.org/abstract/document/8957359?casa_token=d6k1X5ClbTsAAAAA:AiOSfZQ7S3EsfIaecikiuLX8Y9-Lf5FHqTFRjL-FMQQ8bNjdW2rD0UZxA0BC4gVMO0QjF_YXkw|dead-url=no}}</ref>

==See also==
{{div col|colwidth=22em}}
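* [[自编码器|Autoencoder]]
* [[人工神经网络|Artificial neural network]]
* [[深度学习|Deep learning]]
* [[生成对抗网络|Generative adversarial network]]
* [[表征学习|Representation learning]]
* [[稀松字典学习|Sparse dictionary learning]]
* [[数据增强|Data augmentation]]
* [[反向传播算法|Backpropagation]]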
{{div col end}}

==References==
{{reflist|2}}

{{Differentiable computing}}

[[Category:图模式]]
[[Category:贝叶斯统计]]
[[Category:神經網路架構]]