{{NoteTA|G1=IT|1=zh-hans:激活函数; zh-hant:激勵函數;}}
{{机器学习导航栏}}
[[File:ResBlock.png|thumb|right|A residual block in a residual neural network. Here the residual connection skips two layers.]]
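A '''residual neural network''' ('''Residual Neural Network''', abbreviated '''ResNet''')<ref name="resnet">{{Cite conference|last1=He|first1=Kaiming|last2=Zhang|first2=Xiangyu|last3=Ren|first3=Shaoqing|last4=Sun|first4=Jian|date=10 Dec 2015|title=Deep Residual Learning for Image Recognition|arxiv=1512.03385}}</ref> is a deep learning model in which the layers learn residual functions with reference to their inputs instead of learning the desired output directly. The network adds "skip connections" that bypass some layers and perform identity mappings, and merges them with the layer outputs by addition. It behaves like a {{Le|高速神经网络|Highway network|highway network}} whose gates are opened through strongly positive bias weights.<ref name="highway2015may" /> This design makes deep learning models with tens or hundreds of layers easier to train, so that accuracy is maintained or even improved as depth increases. The identity skip connection, often called a "residual connection", is also used in the 1997 [[長短期記憶|long short-term memory]] (LSTM) model,<ref name="lstm1997" /> in [[Transformer模型|Transformer models]] (e.g., [[BERT]] and the [[基于转换器的生成式预训练模型|GPT]] family, including [[ChatGPT]]), in [[AlphaGo Zero]], in {{Le|AlphaStar|AlphaStar (software)}}, and in [[AlphaFold]].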

ResNet was developed by [[何恺明|Kaiming He]], Xiangyu Zhang, Shaoqing Ren, and Jian Sun, and it won the 2015 [[ImageNet]] Large Scale Visual Recognition Challenge.<ref name="imagenet">{{cite journal |title=ImageNet: A large-scale hierarchical image database |first1=Jia |last1=Deng |first2=Wei |last2=Dong |first3=Richard |last3=Socher |first4=Li-Jia |last4=Li |first5=Kai |last5=Li |first6=Li |last6=Fei-Fei |journal=CVPR |year=2009 |url=https://scholar.google.com/citations?view_op=view_citation&hl=en&user=rDfyQnIAAAAJ&citation_for_view=rDfyQnIAAAAJ:qjMakFHDy7sC |access-date=2024-01-27 |archive-date=2019-09-29 |archive-url=https://web.archive.org/web/20190929005334/https://scholar.google.com/citations?view_op=view_citation&hl=en&user=rDfyQnIAAAAJ&citation_for_view=rDfyQnIAAAAJ:qjMakFHDy7sC |dead-url=no }}</ref><ref name="ilsvrc2015">{{Cite web|url=https://image-net.org/challenges/LSVRC/2015/results.php|title=ILSVRC2015 Results|website=image-net.org|access-date=2024-01-27|archive-date=2023-09-29|archive-url=https://web.archive.org/web/20230929104011/https://www.image-net.org/challenges/LSVRC/2015/results.php|dead-url=no}}</ref>

== Basic principles ==

=== Background ===
In 2012, [[AlexNet]], developed for the [[ImageNet]] competition, was an 8-layer [[卷积神经网络|convolutional neural network]]. In 2014, the Visual Geometry Group (VGG) at the [[牛津大学|University of Oxford]] increased the depth to 19 layers by stacking 3x3 convolutional layers.<ref name="vggnet">{{cite arXiv  |eprint=1409.1556 |title=Very Deep Convolutional Networks for Large-Scale Image Recognition  |first1=Karen |last1=Simonyan |first2=Andrew |last2=Zisserman | year=2014|class=cs.CV }}</ref>
However, stacking more layers led to a rapid drop in training accuracy,<ref name="prelu">{{cite arXiv  |eprint=1502.01852 |last1=He |first1=Kaiming |title=Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |year=2016|class=cs.CV }}</ref> a phenomenon referred to as the "degradation" problem.<ref name="resnet" />

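In theory, if a deeper network is built simply by adding extra layers to a shallower network, the deeper network should not have a higher training loss than its shallower counterpart.<ref name="resnet" /> If the extra layers can be set to identity mappings, the deeper network represents the same function as the shallower one. This observation motivates the hypothesis that the optimizer is unable to drive the parameterized layers toward identity mappings effectively.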

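=== Residual learning ===

In a multi-layer neural network model, consider a subnetwork made up of several stacked layers. Denote the function computed by this subnetwork as <math display="inline">H(x)</math>, where <math display="inline">x</math> is the input to the subnetwork. Residual learning re-parameterizes this subnetwork so that the parameterized layers represent a residual function <math display="inline">F(x):=H(x)-x</math>. The output <math display="inline">y</math> of the subnetwork is then: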

: <math id="residual">
\begin{align} y & = F(x) + x
\end{align}</math>

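The same principle applies to the [[長短期記憶|long short-term memory]] (LSTM) cell proposed in 1997,<ref name="lstm1997"/> which computes <math display="inline"> y_{t+1} = F(x_{t}) + x_t </math>; during {{Le|随时间反向传播|Backpropagation through time|backpropagation through time}} this becomes <math display="inline">y = F(x) + x</math>.

The function <math display="inline">F(x)</math> is often implemented with matrix multiplications combined with [[激活函数|activation functions]] and normalization operations (e.g., {{Le|批量规范化|Batch normalization|batch normalization}} or layer normalization).

Such a subnetwork is called a "residual block".<ref name="resnet" /> A deep residual network is built by stacking residual blocks.

The "<math display="inline">+\ x</math>" operation in "<math display="inline">y = F(x) + x</math>" is carried out by a skip connection that performs an identity mapping, connecting the input of the residual block directly to its output. In later work this connection is often called a "residual connection".<ref name="inceptionv4">{{cite arXiv  |eprint=1602.07261 |title= Inception-v4, Inception-ResNet and the impact of residual connections on learning | first1=Christian |last1=Szegedy | first2=Sergey | last2=Ioffe | first3=Vincent  |last3=Vanhoucke | first4=Alex   |last4=Alemi |year=2016|class= cs.CV }}</ref>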
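The formula <math display="inline">y = F(x) + x</math> can be illustrated with a minimal NumPy sketch. The two-layer form of <math display="inline">F</math>, the ReLU activation, the array sizes, and the names <code>residual_branch</code> and <code>residual_block</code> are assumptions chosen for illustration, not the layers used in the original ResNet. Setting the last weight matrix to zero makes <math display="inline">F(x) = 0</math>, so the block reduces to an identity mapping.

```python
import numpy as np

def residual_branch(x, w1, w2):
    """Residual function F(x): two linear maps with a ReLU in between (assumed form)."""
    hidden = np.maximum(0.0, x @ w1)  # ReLU activation
    return hidden @ w2

def residual_block(x, w1, w2):
    """Block output y = F(x) + x; the '+ x' term is the identity skip connection."""
    return residual_branch(x, w1, w2) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))           # 4 inputs with 8 features each (assumed sizes)
w1 = rng.normal(size=(8, 8)) * 0.1
w2_zero = np.zeros((8, 8))            # forces F(x) = 0

y_identity = residual_block(x, w1, w2_zero)
print(np.allclose(y_identity, x))     # True: the block acts as an identity mapping
```

Because the skip path carries <math display="inline">x</math> unchanged, the parameterized layers only need to push <math display="inline">F</math> toward zero to represent the identity.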

=== Signal propagation ===

Introducing identity mappings helps signals propagate along both the forward and the backward paths.<ref name="resnetv2">{{cite arXiv  |eprint=1603.05027 |title=Identity Mappings in Deep Residual Networks |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian | year=2015|class=cs.CV }}</ref>

==== Forward propagation ====
If the output of the <math display="inline">\ell</math>-th residual block is the input to the <math display="inline">(\ell+1)</math>-th residual block (assuming no activation function between blocks), then:<ref name="resnetv2" />

: <math id="forward1">
\begin{align} x_{\ell+1} & = F(x_{\ell}) + x_{\ell}
\end{align}</math>

Applying this formula recursively, e.g., <math id="forward2">
\begin{align} x_{\ell+2}
= F(x_{\ell+1}) + x_{\ell+1}
= F(x_{\ell+1}) + F(x_{\ell}) + x_{\ell}
\end{align}</math>, yields:

: <math id="forward3">
\begin{align} x_{L}
& = x_{\ell} + \sum_{i=\ell}^{L-1} F(x_{i}) \\
\end{align}</math>

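Here <math display="inline">L</math> is the index of any later residual block (e.g., the last block) and <math display="inline">\ell</math> is the index of any earlier block. The formula shows that there is always a signal sent directly from a shallower block <math display="inline">\ell</math> to a deeper block <math display="inline">L</math>.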
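The unrolled forward formula can be checked numerically. The sketch below assumes a toy residual function <math display="inline">F(x) = \tanh(xW)</math> with random weights; the function form, the sizes, and the chosen indices are illustrative assumptions. It stacks six blocks and compares the output of block <math display="inline">L</math> with the input of block <math display="inline">\ell</math> plus the sum of the intermediate residual terms.

```python
import numpy as np

def F(x, w):
    """Toy residual function (assumed form): one linear map followed by tanh."""
    return np.tanh(x @ w)

rng = np.random.default_rng(1)
num_blocks = 6
weights = [rng.normal(size=(5, 5)) * 0.3 for _ in range(num_blocks)]

# xs[i] is the input to block i; xs[i + 1] = F(xs[i]) + xs[i].
xs = [rng.normal(size=(5,))]
for w in weights:
    xs.append(F(xs[-1], w) + xs[-1])

ell, L = 2, 6
# x_L = x_ell + sum_{i=ell}^{L-1} F(x_i)
unrolled = xs[ell] + sum(F(xs[i], weights[i]) for i in range(ell, L))
print(np.allclose(xs[L], unrolled))   # True
```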

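==== Backward propagation ====

The residual learning formulation also mitigates the [[梯度消失问题|vanishing gradient problem]] to some extent. However, vanishing gradients are not the root cause of the degradation problem, because they can already be addressed to a degree by normalization layers (e.g., batch normalization). Differentiating the forward propagation formula above with respect to <math display="inline">x_{\ell}</math> gives:<ref name="resnetv2" />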

: <math id="backward1">
\begin{align} \frac{\partial \mathcal{E} }{\partial x_{\ell} }
& = \frac{\partial \mathcal{E} }{\partial x_{L} }\frac{\partial x_{L} }{\partial x_{\ell} } \\
& = \frac{\partial \mathcal{E} }{\partial x_{L} } \left( 1 + \frac{\partial }{\partial x_{\ell} } \sum_{i=\ell}^{L-1} F(x_{i}) \right) \\
& = \frac{\partial \mathcal{E} }{\partial x_{L} }  + \frac{\partial \mathcal{E} }{\partial x_{L} } \frac{\partial }{\partial x_{\ell} } \sum_{i=\ell}^{L-1} F(x_{i})  \\
\end{align}</math>

Here <math display="inline"> \mathcal{E} </math> is the loss function to be minimized. The result shows that the gradient at a shallower layer, <math display="inline">\frac{\partial \mathcal{E} }{\partial x_{\ell} }</math>, always includes the directly added term <math display="inline">\frac{\partial \mathcal{E} }{\partial x_{L} }</math>. Therefore, even if the gradients of the <math display="inline"> F(x_{i}) </math> terms are small, the total gradient <math display="inline">\frac{\partial \mathcal{E} }{\partial x_{\ell} }</math> does not vanish, thanks to the added term <math display="inline">\frac{\partial \mathcal{E} }{\partial x_{L} }</math>.
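The effect of the added term can be seen in a scalar toy example. The sketch below assumes <math display="inline">F(x) = w \tanh(x)</math> with a small weight <math display="inline">w = 0.1</math> and 20 layers; all of these are illustrative choices. It computes the factor <math display="inline">\partial x_{L} / \partial x_{\ell}</math> by the chain rule for a plain stack (<math display="inline">x_{i+1} = F(x_i)</math>) and for a residual stack (<math display="inline">x_{i+1} = F(x_i) + x_i</math>).

```python
import numpy as np

def branch(x, w):
    """Scalar toy residual function F(x) = w * tanh(x) (assumed form)."""
    return w * np.tanh(x)

def branch_grad(x, w):
    """Derivative dF/dx = w * (1 - tanh(x)^2)."""
    return w * (1.0 - np.tanh(x) ** 2)

def end_to_end_gradient(x0, weights, residual):
    """Chain-rule product giving d x_L / d x_0 for a plain or a residual stack."""
    x, grad = x0, 1.0
    for w in weights:
        local = branch_grad(x, w)
        # Residual layer: d(F(x) + x)/dx = 1 + F'(x); plain layer: F'(x).
        grad *= (1.0 + local) if residual else local
        x = (branch(x, w) + x) if residual else branch(x, w)
    return grad

weights = [0.1] * 20                      # 20 layers with small weights (assumed)
plain_grad = end_to_end_gradient(0.5, weights, residual=False)
residual_grad = end_to_end_gradient(0.5, weights, residual=True)
print(plain_grad, residual_grad)
```

In this setting every plain-layer factor is at most 0.1, so the plain product is at most <math display="inline">10^{-20}</math>, while every residual-layer factor is at least 1, so the residual product cannot fall below 1.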

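== Residual blocks ==

[[File:ResBlockVariants.png|thumb|right|Two variants of convolutional residual blocks. Left: a basic block with two 3x3 convolutional layers. Right: a bottleneck block, which first reduces the dimension with a 1x1 convolutional layer, then applies a 3x3 convolutional layer, and finally restores the original dimension with another 1x1 convolutional layer.]]

=== Basic residual block ===
The basic residual block is the simplest building block studied in the original ResNet work.<ref name="resnet" /> It consists of two sequential 3x3 [[卷积神经网络|convolutional layers]] and a residual connection. The input and output dimensions of both layers are the same.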

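=== Bottleneck residual block ===
A bottleneck residual block consists of three sequential convolutional layers and a residual connection.<ref name="resnet" /> The first layer is a 1x1 convolution for dimension reduction, e.g., to 1/4 of the input dimension; the second is a 3x3 convolution; the last is another 1x1 convolution that restores the dimension. ResNet-50, ResNet-101, and ResNet-152 are all built from bottleneck blocks.<ref name="resnet" />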
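A short calculation illustrates why the 1x1 reduction helps. The sketch below counts convolution weights for a basic block and a bottleneck block that both take and return the same number of channels. The channel count of 256 and the helper name <code>conv_weights</code> are assumptions for illustration, the 1/4 reduction follows the example in the text, and biases and normalization parameters are ignored.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

channels = 256              # assumed block width, for illustration only
reduced = channels // 4     # 1/4 reduction, as in the example above

# Basic block: two 3x3 convolutions at full width.
basic = 2 * conv_weights(3, channels, channels)

# Bottleneck block: 1x1 reduce, 3x3 at reduced width, 1x1 restore.
bottleneck = (
    conv_weights(1, channels, reduced)
    + conv_weights(3, reduced, reduced)
    + conv_weights(1, reduced, channels)
)

print(basic, bottleneck)    # 1179648 69632
```

At equal input and output width, the bottleneck block in this example uses roughly 17 times fewer convolution weights than the basic block.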

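=== Pre-activation residual block ===

A pre-activation residual block<ref name="resnetv2" /> applies the activation functions (e.g., a nonlinearity and normalization) before applying the residual function <math display="inline">F</math>. Its computation can be written as: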

: <math id="preact">
\begin{align} x_{\ell+1} & = F(\phi(x_{\ell})) + x_{\ell}
\end{align}</math>

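Here <math display="inline">\phi</math> can be any nonlinear activation (e.g., [[线性整流函数|ReLU]]) or normalization operation. This design reduces the number of non-identity mappings between residual blocks and was used to train models with 200 to over 1000 layers.<ref name="resnetv2" />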
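A minimal sketch of the pre-activation formula <math display="inline">x_{\ell+1} = F(\phi(x_{\ell})) + x_{\ell}</math> follows. Here <math display="inline">\phi</math> is assumed to be a per-row normalization followed by ReLU, and <math display="inline">F</math> is assumed to be a single linear map; both are illustrative choices rather than the exact layers of the cited work.

```python
import numpy as np

def phi(x, eps=1e-5):
    """Pre-activation (assumed form): normalize each row, then apply ReLU."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return np.maximum(0.0, (x - mean) / np.sqrt(var + eps))

def pre_activation_block(x, w):
    """x_next = F(phi(x)) + x, with F a single linear map (assumed)."""
    return phi(x) @ w + x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))
weights = [rng.normal(size=(8, 8)) * 0.1 for _ in range(3)]

x = x0
for w in weights:
    x = pre_activation_block(x, w)

# No activation sits on the skip path, so x0 reaches the output unchanged
# and the output differs from x0 only by the sum of the residual branches.
print(x.shape)   # (4, 8)
```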

Since [[GPT-2]], [[Transformer模型|Transformer]] blocks have predominantly been implemented as pre-activation blocks, which the Transformer literature calls "pre-normalization".<ref name="gpt2paper">{{cite web
 |url          = https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
 |title        = Language models are unsupervised multitask learners
 |last1        = Radford
 |first1       = Alec
 |last2        = Wu
 |first2       = Jeffrey
 |last3        = Child
 |first3       = Rewon
 |last4        = Luan
 |first4       = David
 |last5        = Amodei
 |first5       = Dario
 |last6        = Sutskever
 |first6       = Ilya
 |date         = 14 February 2019
 |access-date  = 19 December 2020
 |archive-date = 6 February 2021
 |archive-url  = https://web.archive.org/web/20210206183945/https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
 |url-status   = live
}}</ref>

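=== Transformer block ===

[[File:Full_GPT_architecture.svg|thumb|right|The [[Transformer模型|Transformer]] architecture used in the original [[基于转换器的生成式预训练模型|GPT]] model consists of two kinds of residual blocks: a multi-head attention block and a feed-forward [[多层感知器|multilayer perceptron]] (MLP) block. Each block uses a residual connection, which helps signals flow between deep layers and helps gradients propagate backward, mitigating problems such as vanishing gradients when training deep models.]]

A Transformer block is composed of two residual blocks, each with its own residual connection.

The first residual block is a multi-head attention block, which performs a self-attention computation followed by a linear projection. The second is a feed-forward [[多层感知器|multilayer perceptron]] (MLP) block, which resembles an "inverse" bottleneck block: one linear projection (equivalent to a 1x1 convolution in a convolutional neural network) increases the dimension, and another linear projection reduces it.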

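A Transformer block has a depth of four linear projections. The [[GPT-3]] model has 96 Transformer blocks (in the Transformer literature, a Transformer block is often called a "Transformer layer"). The model therefore has a depth of about 400 projection layers: 96x4 layers inside the Transformer blocks, plus a few extra layers for input embedding and output prediction.

Very deep Transformer models cannot be trained successfully without residual connections.<ref name="lose_rank">{{cite arXiv  |eprint=2103.03404 |title=Attention is not all you need: pure attention loses rank doubly exponentially with depth | first1=Yihe | last1=Dong | first2=Jean-Baptiste | last2=Cordonnier | first3=Andreas | last3=Loukas | year=2021|class=cs.LG }}</ref>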
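The two residual blocks and the four linear projections can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: a single attention head, no causal mask, no biases, normalization placed before each branch, a 4x expansion in the MLP, and hypothetical parameter names such as <code>w_qkv</code> and <code>w_up</code>.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_block(x, w_qkv, w_out):
    """Residual block 1: self-attention, then a linear projection, plus the skip."""
    q, k, v = np.split(layer_norm(x) @ w_qkv, 3, axis=-1)   # projection 1
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return (scores @ v) @ w_out + x                          # projection 2 + skip

def mlp_block(x, w_up, w_down):
    """Residual block 2: expand the dimension, then reduce it, plus the skip."""
    hidden = np.maximum(0.0, layer_norm(x) @ w_up)           # projection 3
    return hidden @ w_down + x                               # projection 4 + skip

def transformer_block(x, params):
    x = attention_block(x, params["w_qkv"], params["w_out"])
    return mlp_block(x, params["w_up"], params["w_down"])

d = 16                                   # assumed model width
rng = np.random.default_rng(0)
params = {
    "w_qkv": rng.normal(size=(d, 3 * d)) * 0.02,
    "w_out": rng.normal(size=(d, d)) * 0.02,
    "w_up": rng.normal(size=(d, 4 * d)) * 0.02,
    "w_down": rng.normal(size=(4 * d, d)) * 0.02,
}
tokens = rng.normal(size=(10, d))        # 10 token vectors
out = transformer_block(tokens, params)
print(out.shape)                         # (10, 16)
print(96 * 4)                            # 384 projection layers in 96 blocks
```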

== Related work ==

In 1961, a book by [[弗兰克·罗森布拉特|Frank Rosenblatt]] described a three-layer [[多层感知器|multilayer perceptron]] (MLP) model with skip connections (see Chapter 15, p. 313<ref name="mlpbook">{{cite book
	| last = Rosenblatt
	| first = Frank
	| author-link = 
	| date = 1961
	| title = Principles of neurodynamics. perceptrons and the theory of brain mechanisms
	| url = https://safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=neurodynamics1962rosenblatt.pdf#page=327
	| location = 
	| publisher = 
	| page = 
	| isbn = 
	| access-date = 2024-01-27
	| archive-date = 2023-05-04
	| archive-url = https://web.archive.org/web/20230504010603/https://safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=neurodynamics1962rosenblatt.pdf#page=327
	| dead-url = no
	}}</ref>). The model was called a "cross-coupled system", and its skip connections were a form of cross-coupled connections.<ref name="mlpbook" />

Books published in 1994<ref name="massbook">{{cite book
	| last1 = Venables
	| first1 = W. N.
	| last2 = Ripley
	| first2 = Brian D.
	| author-link = 
	| date = 1994
	| title = Modern Applied Statistics with S-Plus
	| url = https://books.google.com/books?id=ayDvAAAAMAAJ
	| location = 
	| publisher = Springer
	| page = 
	| isbn = 9783540943501
	| access-date = 2024-01-27
	| archive-date = 2023-08-22
	| archive-url = https://web.archive.org/web/20230822220153/https://books.google.com/books?id=ayDvAAAAMAAJ
	| dead-url = no
	}}</ref> and 1996<ref name="prnnbook">{{cite book
	| last = Ripley
	| first = B. D.
	| author-link = 
	| date = 1996
	| title = Pattern Recognition and Neural Networks
	| url = https://www.cambridge.org/core/books/pattern-recognition-and-neural-networks/4E038249C9BAA06C8F4EE6F044D09C5C
	| location = 
	| publisher = Cambridge University Press
	| page = 
	| isbn = 
	| access-date = 2024-01-27
	| archive-date = 2023-12-02
	| archive-url = https://web.archive.org/web/20231202035324/https://www.cambridge.org/core/books/pattern-recognition-and-neural-networks/4E038249C9BAA06C8F4EE6F044D09C5C
	| dead-url = no
	}}</ref> presented "skip-layer" connections in feedforward MLP models. They note that MLPs in general allow more than one hidden layer and also allow "skip-layer" connections directly from input to output (p. 261<ref name="massbook" />, p. 144<ref name="prnnbook" />), which lets the non-linear units perturb a linear functional form (p. 262<ref name="massbook" />). This description already suggests that a non-linear MLP behaves like a residual function added to a linear function.

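{{Le|塞普·霍赫赖特|Sepp Hochreiter|Sepp Hochreiter}} analyzed the [[梯度消失问题|vanishing gradient problem]] in 1991 and identified it as a reason why [[深度学习|deep learning]] did not work well.<ref name="hochreiter1991">{{cite thesis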
|url=http://www.bioinf.jku.at/publications/older/3804.pdf
|degree=diploma
|first=Sepp
|last=Hochreiter
|title=Untersuchungen zu dynamischen neuronalen Netzen
|publisher=Technical University Munich, Institute of Computer Science, advisor: J. Schmidhuber
|year=1991
|access-date=2024-01-27
|archive-date=2023-03-20
|archive-url=https://web.archive.org/web/20230320225540/http://www.bioinf.jku.at/publications/older/3804.pdf
|dead-url=no
}}</ref> To address this problem, the [[長短期記憶|long short-term memory]] (LSTM) network<ref name="lstm1997">{{Cite journal |author=Sepp Hochreiter |author-link= |author2=Jürgen Schmidhuber |author2-link=Jürgen Schmidhuber |year=1997 |title=Long short-term memory |url=https://www.researchgate.net/publication/13853244 |journal=Neural Computation |volume=9 |issue=8 |pages=1735–1780 |doi=10.1162/neco.1997.9.8.1735 |pmid=9377276 |s2cid=1915014 |access-date=2024-01-27 |archive-date=2021-01-22 |archive-url=https://web.archive.org/web/20210122144703/https://www.researchgate.net/publication/13853244_Long_Short-term_Memory |dead-url=no }}</ref> introduced a skip (residual) connection with a weight of 1.0 in every LSTM cell, computing <math display="inline"> y_{t+1} = F(x_{t}) + x_t </math>. During {{Le|随时间反向传播|Backpropagation through time|backpropagation through time}}, this becomes the residual formula <math display="inline">y = F(x) + x</math> of feedforward neural networks, which makes it possible to train very deep [[循环神经网络|recurrent neural networks]] effectively. An LSTM version published in 2000<ref name="lstm2000">{{Cite journal |author=Felix A. Gers |author2=Jürgen Schmidhuber |author3=Fred Cummins |year=2000 |title=Learning to Forget: Continual Prediction with LSTM |journal=Neural Computation |volume=12 |issue=10 |pages=2451–2471 |citeseerx=10.1.1.55.5709 |doi=10.1162/089976600300015015 |pmid=11032042 |s2cid=11598600}}</ref> modulates these connections with "forget gates", whose weights are no longer fixed at 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, which addresses the vanishing gradient problem.<ref name="lstm2000"/>

In 2015, the {{Le|高速神经网络|Highway network|highway network}}<ref name="highway2015may">{{cite arXiv|last1=Srivastava|first1=Rupesh Kumar|last2=Greff|first2=Klaus|last3=Schmidhuber|first3=Jürgen|title=Highway Networks|eprint=1505.00387|date=3 May 2015|class=cs.LG}}</ref><ref name="highway2015july">{{cite arXiv|last1=Srivastava|first1=Rupesh Kumar|last2=Greff|first2=Klaus|last3=Schmidhuber|first3=Jürgen|title=Training Very Deep Networks|eprint=1507.06228|date=22 July 2015|class=cs.LG}}</ref> applied the above principle to [[前馈神经网络|feedforward neural networks]] and was reported to be "the first [[前馈神经网络|feedforward neural network]] with hundreds of layers".<ref name="highwayblog">{{cite web
 |title        = Microsoft Wins ImageNet 2015 through Highway Net (or Feedforward LSTM) without Gates
 |url          = https://people.idsia.ch/~juergen/microsoft-wins-imagenet-through-feedforward-LSTM-without-gates.html
 |last         = Schmidhuber
 |first        = Jürgen
 |date         = 2015
 |access-date  = 2024-01-27
 |archive-date = 2023-11-27
 |archive-url  = https://web.archive.org/web/20231127155507/https://people.idsia.ch/~juergen/microsoft-wins-imagenet-through-feedforward-LSTM-without-gates.html
 |dead-url     = no
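}}</ref> The original highway network paper<ref name="highway2015may" /> not only introduced the basic architecture for very deep feedforward networks, but also reported experiments with 20-, 50-, and 100-layer networks and mentioned ongoing experiments with up to 900 layers. The 50- and 100-layer networks had lower training error than their plain (non-highway) counterparts, but no lower training error than the 20-layer network (see Figure 1, on the MNIST dataset<ref name="highway2015may" />). No accuracy improvement was reported for networks deeper than 19 layers.<ref name="highway2015may" /> The ResNet work, however, provided evidence that going deeper than 20 layers is beneficial.<ref name="resnetv2" /> It argued that modulation in the skip connection can cause signals to vanish in forward and backward propagation (see Section 3<ref name="resnetv2" />). This also explains why the forget gates of the 2000 LSTM<ref name="lstm2000" /> were initialized to be open through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a highway network whose gates are kept open by strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., [[Transformer模型|Transformers]]) are predominantly identity mappings.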
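The effect of the gate bias can be illustrated with a simplified gated skip connection <math display="inline">y = F(x) + g(x) \cdot x</math>, where <math display="inline">g(x) = \sigma(x W_g + b)</math>. This form is an assumption made for illustration and is not the exact gating used in the highway network or LSTM papers; the names <code>gated_skip_block</code> and <code>w_gate</code> are hypothetical. With a strongly positive bias the gate stays close to 1 and the block behaves like a residual block; with a strongly negative bias the skip signal is almost entirely blocked.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def branch(x, w):
    """Toy residual function F(x) (assumed form)."""
    return np.tanh(x @ w)

def gated_skip_block(x, w, w_gate, gate_bias):
    """y = F(x) + g(x) * x, with a learned gate g(x) in (0, 1) on the skip path."""
    gate = sigmoid(x @ w_gate + gate_bias)
    return branch(x, w) + gate * x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 8)) * 0.1
w_gate = rng.normal(size=(8, 8)) * 0.01

residual_out = branch(x, w) + x   # ungated residual block, for comparison

# Strongly positive bias: gate ~ 1, so the output is close to F(x) + x.
open_gap = np.max(np.abs(gated_skip_block(x, w, w_gate, 10.0) - residual_out))

# Strongly negative bias: gate ~ 0, so the skip signal nearly disappears.
closed_gap = np.max(np.abs(gated_skip_block(x, w, w_gate, -10.0) - branch(x, w)))

print(open_gap < 1e-2, closed_gap < 1e-2)
```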

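DenseNets, introduced in 2016,<ref>{{Cite conference|last1=Huang|first1=Gao|last2=Liu|first2=Zhuang|last3=van der Maaten|first3=Laurens|last4=Weinberger|first4=Kilian|date=2016|title=Densely Connected Convolutional Networks|arxiv=1608.06993}}</ref> were designed as deep neural networks that aim to connect each layer to every other layer. DenseNets achieve this by using identity mappings as skip connections. Unlike ResNets, DenseNets merge the layer output with the skip connection by concatenation rather than addition.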
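The difference between the two merge operations shows up in the output shapes. The sketch below uses a toy fully connected layer with ReLU as a stand-in for a convolutional layer; the layer form and sizes are assumptions for illustration.

```python
import numpy as np

def layer(x, w):
    """Toy layer (assumed form): linear map followed by ReLU."""
    return np.maximum(0.0, x @ w)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 8)) * 0.1

# ResNet-style merge: add the skip connection, so the width stays at 8.
resnet_out = layer(x, w) + x

# DenseNet-style merge: concatenate the skip connection, so the width grows to 16.
densenet_out = np.concatenate([x, layer(x, w)], axis=-1)

print(resnet_out.shape, densenet_out.shape)   # (4, 8) (4, 16)
```

With concatenation the input features remain available unchanged to all later layers, at the cost of a feature width that grows with depth.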

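Neural networks with stochastic depth were made possible by the residual network architecture.<ref>{{Cite conference|last1=Huang|first1=Gao|last2=Sun|first2=Yu|last3=Liu|first3=Zhuang|last4=Weinberger|first4=Kilian|date=2016|title=Deep Networks with Stochastic Depth|arxiv=1603.09382}}</ref> The training procedure randomly drops a subset of layers and lets the signal propagate through the skip connections instead. The technique is also known as "DropPath" and is an effective regularization method for training large, deep models such as the {{Le|视觉Transformer|Vision transformer|Vision Transformer}} (ViT).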
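A minimal sketch of one residual block with stochastic depth is shown below. During training the residual branch is dropped with probability <code>1 - survival_prob</code>, leaving only the skip connection; at inference the branch is kept and scaled by <code>survival_prob</code>, which is one common convention (some implementations instead rescale during training). The toy branch and all names are assumptions for illustration.

```python
import numpy as np

def stochastic_depth_block(x, w, survival_prob, training, rng):
    """Residual block whose branch is randomly skipped during training."""
    branch = np.tanh(x @ w)                # toy residual function F(x)
    if training:
        keep = rng.random() < survival_prob
        # Dropped block: only the identity skip connection carries the signal.
        return x + branch if keep else x
    # Inference: keep the branch, scaled by its survival probability.
    return x + survival_prob * branch

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 8)) * 0.1

always_dropped = stochastic_depth_block(x, w, 0.0, training=True, rng=rng)
always_kept = stochastic_depth_block(x, w, 1.0, training=True, rng=rng)
at_inference = stochastic_depth_block(x, w, 0.8, training=False, rng=rng)

print(np.allclose(always_dropped, x))      # True: the signal passes via the skip
```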

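== Relation to biology ==

Although the original residual network work was not inspired by [[生物学|biology]], later research has found connections between residual networks and biology.<ref name="liao2016">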
{{cite conference
    |first1=Qianli
    |last1=Liao
    |first2=Tomaso
    |last2=Poggio
    |date=2016
    |title=Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
    |arxiv=1604.03640
}}</ref><ref name="xiao2018">
{{cite conference
    |first1=Will
    |last1=Xiao
    |first2=Honglin
    |last2=Chen
    |first3=Qianli
    |last3=Liao
    |first4=Tomaso
    |last4=Poggio
    |date=2018
    |title=Biologically-Plausible Learning Algorithms Can Scale to Large Datasets
    |arxiv=1811.03567
}}</ref>

A study published in ''[[科学 (期刊)|Science]]'' in 2023 presented the complete [[连接组|connectome]] of the [[大脑|brain]] of a [[果蠅屬|fruit fly]] [[幼体|larva]].<ref name="Winding2023">{{Cite journal |last1=Winding |first1=Michael |last2=Pedigo |first2=Benjamin |last3=Barnes |first3=Christopher |last4=Patsolic |first4=Heather |last5=Park |first5=Youngser |last6=Kazimiers |first6=Tom |last7=Fushiki |first7=Akira |last8=Andrade |first8=Ingrid |last9=Khandelwal |first9=Avinash |last10=Valdes-Aleman |first10=Javier |last11=Li |first11=Feng |last12=Randel |first12=Nadine |last13=Barsotti |first13=Elizabeth |last14=Correia |first14=Ana |last15=Fetter |first15=Richard |last16=Hartenstein |first16=Volker |last17=Priebe |first17=Carey |last18=Vogelstein |first18=Joshua |last19=Cardona |first19=Albert |last20=Zlatic |first20=Marta |date=10 Mar 2023 |title=The connectome of an insect brain |journal=Science |volume=379 |issue=6636 |pages=eadd9330 |biorxiv=10.1101/2022.11.28.516756v1 |doi=10.1126/science.add9330 |pmid=36893230|pmc=7614541 |s2cid=254070919 }}</ref> The study found skip connections that resemble those in artificial neural networks such as ResNets.

== References ==
<references />
{{Differentiable computing}}

[[Category:人工神经网络]]
[[Category:深度学习]]