
{{NoteTA|G1=Math|G2=IT}}
The '''LogSumExp''' (LSE, also called '''RealSoftMax'''<ref>{{cite web |last1=Zhang |first1=Aston |last2=Lipton |first2=Zack |last3=Li |first3=Mu |last4=Smola |first4=Alex |title=Dive into Deep Learning, Chapter 3 Exercises |url=https://www.d2l.ai/chapter_linear-networks/softmax-regression.html#exercises |website=www.d2l.ai |accessdate=27 June 2020 |ref=d2lch3ex |archive-date=2022-03-31 |archive-url=https://web.archive.org/web/20220331022728/https://www.d2l.ai/chapter_linear-networks/softmax-regression.html#exercises |dead-url=no }}</ref> or multivariable '''[[softplus]]''') [[函数|function]] is a [[平滑最大值|smooth maximum]], that is, a [[光滑函数|smooth]] [[近似|approximation]] to the [[极值|maximum]] function, used mainly in [[机器学习|machine learning]] algorithms.<ref name="F. Nielsen 2016">{{cite journal
  | first1 = Frank | last1 = Nielsen
  | first2 = Ke | last2 = Sun
  | year = 2016
  | title = Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities
  | arxiv = 1606.05850
  | doi=10.3390/e18120442
  | volume=18
  | journal=Entropy
  | issue = 12
 | page=442
  | bibcode=2016Entrp..18..442N
  | s2cid = 17259055
 | doi-access = free
 }}</ref> It is defined as the [[对数|logarithm]] of the sum of the [[指数函数|exponentials]] of its arguments:

:<math>\mathrm{LSE}(x_1, \dots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) \right).</math>

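== Properties ==
The [[定义域|domain]] of the LogSumExp function is <math>\R^n</math>, the {{link-en|实数空间|real coordinate space|real coordinate space}}, and its codomain is <math>\R</math>, the [[实数线|real line]].
It approximates the maximum <math>\max_i x_i</math>, with the following bounds: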
:<math>\max{\{x_1, \dots, x_n\}} \leq \mathrm{LSE}(x_1, \dots, x_n) \leq \max{\{x_1, \dots, x_n\}} + \log(n).</math>
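The first [[不等式|inequality]] is strict unless <math>n = 1</math>; the second holds with equality only when all arguments are equal.
(Proof: let <math>m = \max_i x_i</math>. Then <math>\exp(m) \leq \sum_{i=1}^n \exp(x_i) \leq n \exp(m)</math>. Taking the logarithm of this inequality gives the result.)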
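
The bounds above can be checked numerically. The following is a minimal sketch in Python; the helper name <code>lse</code> and the sample values are illustrative assumptions, not part of any library.

```python
import math

def lse(xs):
    """Direct LogSumExp: log of the sum of exponentials (fine for moderate inputs)."""
    return math.log(sum(math.exp(x) for x in xs))

xs = [1.0, 2.0, 3.0]
lower = max(xs)                       # max{x_1, ..., x_n}
upper = max(xs) + math.log(len(xs))   # max{x_1, ..., x_n} + log(n)
value = lse(xs)
print(lower, value, upper)

# When all arguments are equal, the upper bound is attained (up to rounding).
equal = [2.0, 2.0, 2.0]
print(lse(equal), 2.0 + math.log(3))
```

With distinct inputs the value lies strictly between the two bounds; with equal inputs it coincides with the upper bound.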

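In addition, scaling the function gives tighter bounds. Consider the function <math>\frac 1 t \mathrm{LSE}(tx)</math> with <math>t > 0</math>. Then, for <math>n > 1</math>,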
:<math> \max{\{x_1, \dots, x_n\}} < \frac 1 t \mathrm{LSE}(tx) \leq \max{\{x_1, \dots, x_n\}} + \frac{\log(n)}{t}</math>

(Proof: replacing each <math>x_i</math> in the inequality above with <math>tx_i</math> for some <math>t>0</math> gives
:<math>\max{\{tx_1, \dots, tx_n\}} < \mathrm{LSE}(tx_1, \dots, tx_n)\leq \max{\{tx_1, \dots, tx_n\}} + \log(n)</math>
Since <math>t>0</math>,
:<math>t \max{\{x_1, \dots, x_n\}} < \mathrm{LSE}(tx_1, \dots, tx_n)\leq t\max{\{x_1, \dots, x_n\}} + \log(n)</math>
Finally, dividing by <math>t</math> gives the result.)

Moreover, multiplying by a negative number instead gives an inequality for the <math> \min </math> function:
:<math> \min{\{x_1, \dots, x_n\}} - \frac{\log(n)}{t} \leq \frac 1 {-t} \mathrm{LSE}(-tx) < \min{\{x_1, \dots, x_n\}}.</math>
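
As a rough illustration of the scaled bounds, the sketch below evaluates the smooth maximum and smooth minimum for increasing <code>t</code>. The function names <code>smooth_max</code> and <code>smooth_min</code> are hypothetical, and the helper subtracts the maximum before exponentiating only so that large <code>t</code> does not overflow.

```python
import math

def lse(xs):
    """LogSumExp, shifted by the maximum so that exp() cannot overflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def smooth_max(xs, t):
    """(1/t) * LSE(t*x) for t > 0: approaches max(xs) as t grows."""
    return lse([t * x for x in xs]) / t

def smooth_min(xs, t):
    """(1/(-t)) * LSE(-t*x) for t > 0: approaches min(xs) as t grows."""
    return -lse([-t * x for x in xs]) / t

xs = [1.0, 2.0, 3.0]
for t in (1.0, 10.0, 100.0):
    gap = math.log(len(xs)) / t   # width of the bound, log(n)/t
    print(t, smooth_max(xs, t), smooth_min(xs, t), gap)
```

In floating point the strict inequalities can collapse to equality once <code>t</code> is large enough that the smaller terms round away, so a numerical check should use non-strict comparisons.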

The LogSumExp function is [[凸函数|convex]], and it is [[单调函数|strictly increasing]] in each argument everywhere in its domain.<ref name="L. El Ghaoui 2017">{{cite book|url=http://livebooklabs.com/keeppies/c5a5868ce26b8125|title=Optimization Models and Applications|first=Laurent|last=El Ghaoui|year=2017|access-date=2022-10-16|archive-date=2020-12-19|archive-url=https://web.archive.org/web/20201219222156/http://livebooklabs.com/keeppies/c5a5868ce26b8125|dead-url=no}}</ref> (It is not strictly convex, however,<ref>{{cite web|url=https://math.stackexchange.com/q/1189638 |title=convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange|work=stackexchange.com}}</ref> because adding the same constant <math>c</math> to every argument simply adds <math>c</math> to the value, so the function is affine along that direction.)

Writing <math>\mathbf{x} = (x_1, \dots, x_n)</math>, the [[偏导数|partial derivatives]] are:
:<math>\frac{\partial}{\partial x_i}{\mathrm{LSE}(\mathbf{x})} = 
\frac{\exp x_i}{\sum_j \exp {x_j}},</math>
which shows that the [[梯度|gradient]] of LogSumExp is the [[softmax函数|softmax function]].
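
The gradient identity can be sanity-checked with a finite-difference approximation. This is only a sketch: the helpers <code>lse</code>, <code>softmax</code> and <code>numerical_gradient</code> are illustrative names, and the step size <code>h</code> is an arbitrary choice.

```python
import math

def lse(xs):
    """LogSumExp, shifted by the maximum for numerical safety."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    """Softmax: exp(x_i) / sum_j exp(x_j), computed with the same shift."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def numerical_gradient(f, xs, h=1e-6):
    """Central-difference approximation of the gradient of f at xs."""
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += h
        down = list(xs)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

xs = [0.5, -1.0, 2.0]
approx = numerical_gradient(lse, xs)
exact = softmax(xs)
print(approx)
print(exact)
```

The two printed vectors should agree to several decimal places, and the softmax entries sum to 1.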

The [[凸共轭|convex conjugate]] of LogSumExp is the {{link-en|负熵|negative entropy|negative entropy}}.

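== Log-sum-exp trick for log-domain calculations ==
The LSE function is often encountered when ordinary arithmetic is carried out on a [[对数尺度|logarithmic scale]], as with [[对数概率|log probabilities]].<ref>{{Cite book|last=McElreath|first=Richard|url=http://worldcat.org/oclc/1107423386|title=Statistical Rethinking|oclc=1107423386}}</ref>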

Just as multiplication on the linear scale becomes simple addition on the log scale, addition on the linear scale becomes LSE on the log scale:

:<math>\mathrm{LSE}(\log(x_1), \dots, \log(x_n)) = \log(x_1 + \dots + x_n)</math>
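
A small sketch of this correspondence, using made-up probabilities: multiplying in the linear domain becomes a plain sum of logs, while adding in the linear domain becomes an LSE of the logs. The helper <code>lse</code> is an illustrative name.

```python
import math

def lse(xs):
    """LogSumExp, shifted by the maximum for numerical safety."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

probs = [0.1, 0.2, 0.3]                    # example values only
log_probs = [math.log(p) for p in probs]

log_product = sum(log_probs)   # log(0.1 * 0.2 * 0.3): multiplication becomes addition
log_sum = lse(log_probs)       # log(0.1 + 0.2 + 0.3): addition becomes LSE

print(log_product, math.log(0.1 * 0.2 * 0.3))
print(log_sum, math.log(0.1 + 0.2 + 0.3))
```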

A common purpose of working in the log domain is to improve accuracy and to avoid underflow and overflow when very small or very large numbers would otherwise be represented directly (in the linear domain) as limited-precision floating-point numbers.<ref>{{Cite web|title=Practical issues: Numeric stability.|url=https://cs231n.github.io/linear-classify/#softmax-classifier|url-status=no|website=CS231n Convolutional Neural Networks for Visual Recognition|access-date=2022-10-16|archive-date=2022-12-06|archive-url=https://web.archive.org/web/20221206224416/https://cs231n.github.io/linear-classify/#softmax-classifier}}</ref>

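Unfortunately, evaluating LSE directly can itself still overflow or underflow in some cases, and the following equivalent formula must be used instead (especially when the "max" approximation given above is not accurate enough). For this reason, many math libraries such as [[IT++]] provide a default LSE routine that uses this formula internally.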

:<math>\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\left( \exp(x_1-x^*)+ \cdots + \exp(x_n-x^*) \right)</math>
where <math>x^* = \max{\{x_1, \dots, x_n\}}</math>.
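
The shifted formula can be sketched in a few lines of Python. This is an illustrative implementation, not the routine of any particular library; the names <code>lse_naive</code>, <code>lse_stable</code> and <code>fails</code> are assumptions, and the handling of infinite inputs is one possible convention.

```python
import math

def lse_naive(xs):
    """Direct definition: overflows or underflows for extreme inputs."""
    return math.log(sum(math.exp(x) for x in xs))

def lse_stable(xs):
    """Shifted form: x* + log(sum(exp(x_i - x*))) with x* = max(xs).

    Assumes xs is non-empty (max() raises ValueError on an empty list).
    """
    x_star = max(xs)
    if math.isinf(x_star):
        # All inputs are -inf, or some input is +inf: the shift is undefined.
        return x_star
    # Every exponent is <= 0, and the largest term is exp(0) = 1,
    # so the sum lies in [1, n] and the log is always defined.
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

def fails(f, xs):
    """Return True if f(xs) raises an arithmetic error."""
    try:
        f(xs)
    except (OverflowError, ValueError):
        return True
    return False

large = [1000.0, 1000.0]     # exp(1000) overflows a double
small = [-1000.0, -1000.0]   # exp(-1000) underflows to 0, then log(0) fails

print(fails(lse_naive, large), lse_stable(large))
print(fails(lse_naive, small), lse_stable(small))
```

Because the largest shifted term is always exp(0) = 1, the sum inside the logarithm can neither overflow nor vanish, which is why the shifted form is preferred in practice.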

== A strictly convex log-sum-exp type function ==

LSE is convex but not strictly convex. A strictly convex log-sum-exp type function can be defined by adding an extra argument fixed at zero<ref name="F. Nielsen 2018">{{cite journal
  | first1 = Frank | last1 = Nielsen
  | first2 = Gaetan | last2 = Hadjeres
  | year = 2018
  | title = Monte Carlo Information Geometry: The dually flat case
  | arxiv = 1803.07225
  | bibcode = 2018arXiv180307225N}}</ref>:

:<math>\mathrm{LSE}_0^+(x_1, \dots, x_n) = \mathrm{LSE}(0, x_1, \dots, x_n)</math>
	
This function is a proper Bregman generator (strictly convex and [[differentiable function|differentiable]]). 
It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
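
The difference between the two functions can be seen numerically. In the sketch below (helper names are illustrative), plain LSE satisfies the midpoint convexity inequality with equality along the direction where every argument is shifted by the same amount, whereas <code>lse0_plus</code> satisfies it strictly at the same points. For a single argument, <code>lse0_plus</code> reduces to the softplus function log(1 + exp(x)).

```python
import math

def lse(xs):
    """LogSumExp, shifted by the maximum for numerical safety."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lse0_plus(xs):
    """LSE with an extra argument fixed at zero: LSE(0, x_1, ..., x_n)."""
    return lse([0.0] + list(xs))

def softplus(x):
    """log(1 + exp(x)); adequate for moderate x (exp overflows for very large x)."""
    return math.log1p(math.exp(x))

x = [0.0, 0.0]
y = [2.0, 2.0]                                   # x shifted by the same constant
mid = [(a + b) / 2 for a, b in zip(x, y)]

# Plain LSE is affine along this direction: midpoint value equals the average.
print(lse(mid), (lse(x) + lse(y)) / 2)

# With the extra zero argument, the midpoint value is strictly below the average.
print(lse0_plus(mid), (lse0_plus(x) + lse0_plus(y)) / 2)

# Single-argument case: LSE(0, x) is softplus.
print(lse0_plus([1.5]), softplus(1.5))
```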

In {{link-en|热带分析|tropical analysis|tropical analysis}}, LSE is the sum in the {{link-en|对数半环|log semiring|log semiring}}.

== See also ==
* [[對數平均|Logarithmic mean]]
* [[Log semiring]]
* [[平滑最大值|Smooth maximum]]
* [[Softmax函数|Softmax function]]

== References ==
{{Reflist}}

[[Category:对数]]