查看“︁核方法”︁的源代码

{{机器学习导航栏}}

[[机器学习]]中，'''核机'''是一类用于[[模式识别|模式分析]]的算法，最知名的成员是[[支持向量机]]（SVM）。 这些方法用线性分类器解决非线性问题。<ref>{{Cite web |title=Kernel method |url=https://www.engati.com/glossary/kernel-method |access-date=2023-04-04 |website=Engati |language=en |archive-date=2023-04-04 |archive-url=https://web.archive.org/web/20230404160939/https://www.engati.com/glossary/kernel-method |dead-url=no }}</ref>模式分析的一般任务是在数据集中发现并研究一般类型的关系（如[[聚类分析]]、[[排名]]、[[主成分分析]]、[[相关 (概率论)|相关性分析]]、[[统计分类]]）。对许多解决此类任务的算法来说，原始表示的数据必须通过用户指定的特征映射明确转换为[[特征向量]]表示；相反，核方法只需要用户指定核，即用内积计算所有数据点对的[[相似性测度|相似性函数]]。核机中的特征映射是无限维的，但根据[[表示定理]]，只需输入有限维矩阵。在不并行处理的情况下，核机的计算速度会很慢。

核方法得名于[[正定核|核函数]]的使用，使其能在高维隐特征空间中运行，而无需计算空间中数据的坐标，只需计算特征空间中所有数据对的像之间的内积。这种运算比显式计算坐标更节约，称为“'''核技巧'''”。<ref>{{Cite book|title=Pattern Recognition|last=Theodoridis|first=Sergios|publisher=Elsevier B.V.|year=2008|isbn=9780080949123|pages=203}}</ref>核函数已被引入序列数据、[[图核函数]]、文本、图像及向量处理中。

能使用核的算法有[[核感知器]]、支持向量机（SVM）、[[高斯过程]]、[[主成分分析]]（PCA）、[[典型相关]]分析、[[岭回归]]、[[谱聚类]]、[[自适应滤波器]]等等很多算法。
大多数核算法都基于[[凸优化]]或特征问题，有很好的统计学基础。通常用[[统计学习理论]]（如[[拉德马赫复杂度]]）分析其统计特性。
==动机与非正式解释==
核方法可被视为[[循例學習]]器：不是学习与输入特征相对应的固定参数集，而是“记忆”第<math>i</math>个训练样本<math>(\mathbf{x}_i, y_i)</math>并学习相应权<math>w_i</math>。预测不在训练集中的输入（未标记输入）是用未标记输入<math>\mathbf{x'}</math>与每个训练输入<math>\mathbf{x}_i</math>的[[相似性测度|相似性函数]]<math>k</math>（称作'''核'''）。例如，核化[[二分分类]]器通常计算相似度的加权和

:<math>\hat{y} = \sgn \sum_{i=1}^n w_i y_i k(\mathbf{x}_i, \mathbf{x'})</math>,
其中
* <math>\hat{y} \in \{-1, +1\}</math>是核化二分分类器对无标输入<math>\mathbf{x'}</math>的预测标签，其隐藏的真实标签<math>y</math>是我们感兴趣的；
* <math>k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}</math>是核函数，度量了任意一对输入 <math>\mathbf{x}, \mathbf{x'} \in \mathcal{X}</math>的相似性；
* 和的范围是分类器训练集中的n个有标示例<math>\{(\mathbf{x}_i, y_i)\}_{i=1}^n\ (y_i \in \{-1, +1\})</math>；
* <math>w_i \in \mathbb{R}</math>是由学习算法确定的训练样本的权重；
* [[符号函数]]<math>\sgn</math>决定预测分类结果<math>\hat{y}</math>属于正类还是负类。

核分类器的描述可追溯至1960年代[[核感知器]]的发明。<ref>{{cite journal |last1=Aizerman |first=M. A. |first2=Emmanuel M. | last2 = Braverman |first3=L. I. |last3=Rozonoer |title=Theoretical foundations of the potential function method in pattern recognition learning |url=https://archive.org/details/sim_automation-and-remote-control_1964-06_25_6/page/821 |journal=Automation and Remote Control |volume=25 |year=1964 |pages=821–837}} Cited in {{Cite conference |last1=Guyon |first1=Isabelle |first2=B. |last2=Boser |first3=Vladimir |last3=Vapnik |title=Automatic capacity tuning of very large VC-dimension classifiers |conference=Advances in neural information processing systems |year=1993 |citeseerx = 10.1.1.17.7215}}</ref>1990年代，随着[[支持向量机]]（SVM）的流行，人们发现其在[[手写识别]]等任务上的表现可与[[神经网络]]媲美。

==数学：核技巧==
[[Image:Kernel trick idea.svg|thumb|500px|SVM，核为<math>\varphi((a, b)) = (a, b, a^2 + b^2)</math>，因此<math>k(\mathbf{x}, \mathbf{y}) = \mathbf{x} \cdot \mathbf{y} + \left\| \mathbf{x} \right\| ^2  \left\| \mathbf{y} \right\| ^2  </math>。训练点被映射到三维空间，很容易找到分离超平面。]]
核技巧避免了让线性[[学习算法]]学习非线性函数或[[决策边界]]所需的显式映射。<math>\forall \mathbf{x}</math>与<math>\mathbf{x'}\in \mathcal{X}</math>（输入空间），某些函数<math>k(\mathbf{x}, \mathbf{x'})</math>可表示为另一空间<math>\mathcal{V}</math>中的[[内积]]。函数<math>k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}</math> 常被称为核或[[核函数]]。“核”在数学中用于表示加权和或积分的加权函数。
机器学习中某些问题比任意权函数<math>k</math>更具结构性。若核能写成“特征映射”<math>\varphi\colon \mathcal{X} \to \mathcal{V}</math>，满足
:<math>k(\mathbf{x}, \mathbf{x'}) = \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle_\mathcal{V}</math>
那么计算就能简化很多。关键的约束是<math> \langle \cdot, \cdot \rangle_\mathcal{V}</math>必须是适当的内积。另一方面，只要<math>\mathcal{V}</math>是[[内积空间]]，就不必明确表示出<math>\varphi</math>。另一种方法来自[[默瑟定理]]：只要空间<math>\mathcal{X}</math>匹配了合适的[[测度]]确保函数<math>k</math>满足默瑟条件，就存在隐定义的函数<math>\varphi</math>。

默瑟定理类似于线性代数中将内积与任意[[正定矩阵]]相关联的结果的推广。实际上，默瑟条件可以简化为这种更简单的情况：若<math>\forall T \subset X </math>择[[计数]]测度<math> \mu(T) = |T| </math>（计算集合<math>T</math>内部的点数），那么默瑟定理中的积分就简化为和式
:<math> \sum_{i=1}^n\sum_{j=1}^n k(\mathbf{x}_i, \mathbf{x}_j) c_i c_j \geq 0.</math>
若对于<math>\mathcal{X}</math>中的所有有限点序列<math>(\mathbf{x}_1, \dotsc, \mathbf{x}_n)</math>及所有<math>n</math>个实值系数选择<math>(c_1, \dots, c_n)</math>（参考正定核），和式都成立，那么函数<math>k</math>满足默瑟条件。

一些依赖于原空间<math>\mathcal{X}</math>中任意关系的算法在别的环境中会有线性解释：<math>\varphi</math>的范围空间。线性解释让我们对算法有了更深入的了解。此外，在计算过程中通常无需直接计算<math>\varphi</math>，支持向量机就是这样。有人认为这种运行时间上的节省是其主要优点。研究人员也用它来证明现有算法的意义和特性。

理论上讲，关于<math>\{\mathbf{x}_1, \dotsc, \mathbf{x}_n\}</math>的[[格拉姆矩阵]]<math>\mathbf{K} \in \mathbb{R}^{n \times n}</math>（有时也称为“核矩阵”<ref>{{cite journal | first1 = Thomas | last1 = Hofmann | first2 = Bernhard | last2 = Scholkopf | first3 = Alexander J. | last3 = Smola | date = 2008 | title = Kernel Methods in Machine Learning | journal = The Annals of Statistics | volume = 36 | issue = 3 | doi = 10.1214/009053607000000677 | s2cid = 88516979 |url=https://projecteuclid.org/download/pdfview_1/euclid.aos/1211819561| doi-access = free }}</ref>），其中<math>K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)</math>须是正半定（PSD）矩阵。根据经验，对于机器学习的启发式方法来说，若<math>k</math>至少近似于相似性的直观概念，那么不满足默瑟条件的函数<math>k</math>的选择可能仍有合理表现。<ref>{{cite web|url=http://www.svms.org/mercer/|title=Support Vector Machines: Mercer's Condition|first=Martin|last=Sewell|publisher=Support Vector Machines|access-date=2014-05-30|archive-date=2018-10-15|archive-url=https://web.archive.org/web/20181015031456/http://www.svms.org/mercer/|url-status=dead}}</ref>无论<math>k</math>是不是默瑟核，<math>k</math>仍可称为“核”。

若核函数<math>k</math>也是[[高斯过程]]中使用的[[协方差函数]]，那么格拉姆矩阵<math>\mathbf{K}</math>也可称为[[协方差矩阵]]。<ref>{{cite document | first1 = C. E. | last1 = Rasmussen | first2 = C. K. I. | last2 = Williams | date = 2006 | title = Gaussian Processes for Machine Learning }}</ref>

==应用==
核方法的应用十分广泛，见于[[地理统计]]<ref>{{cite journal | last1 = Honarkhah | first1 = M. | last2 = Caers | first2 = J. | date = 2010 | doi = 10.1007/s11004-010-9276-7 | title = Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling |journal=[[Mathematical Geosciences]] | volume = 42 | issue = 5 | pages = 487–517 | s2cid = 73657847 }}</ref>、[[克里金法]]、[[反距离加权]]、[[三维重建]]、[[生物信息学]]、[[化学信息学]]、[[信息抽取]]和[[手写识别]]。

==常用核函数==
*Fisher核
*图核
*核平滑器
*多项式核
*[[径向基函数核]]
*字符串函数核
*[[神经正切核]]
*[[神经网络高斯过程]]（NNGP）核

==另见==
*[[向量输出的核方法]]
*[[核密度估计]]
*[[表示定理]]
*[[相似性学习]]
*[[科孚定理]]

== 参考文献 ==
{{Reflist|30em}}

== 阅读更多 ==
* {{cite book | author-link1 = John Shawe-Taylor | first1 = J. | last1 = Shawe-Taylor | author-link2 = Nello Cristianini | first2 = N. | last2 = Cristianini | title = Kernel Methods for Pattern Analysis | publisher = Cambridge University Press | year = 2004 }}
* {{cite book | first1 = W. | last1 = Liu | first2 = J. | last2 = Principe | first3 = S. | last3 = Haykin | title = Kernel Adaptive Filtering: A Comprehensive Introduction | publisher = Wiley | year = 2010 | isbn = 9781118211212 |url=https://books.google.com/books?id=eWUwB_P5pW0C }}
* {{cite book |first1=B. |last1=Schölkopf |author-link=Bernhard Schölkopf |first2=A. J. |last2=Smola |first3=F. |last3=Bach |title=Learning with Kernels : Support Vector Machines, Regularization, Optimization, and Beyond |publisher=MIT Press |year=2018 |isbn=978-0-262-53657-8 |url=https://books.google.com/books?id=ZQxiuAEACAAJ }}

==外部链接==
* [http://www.kernel-machines.org Kernel-Machines Org] {{Wayback|url=http://www.kernel-machines.org/ |date=20061104065824 }}—community website
* [http://onlineprediction.net/?n=Main.KernelMethods onlineprediction.net Kernel Methods Article] {{Wayback|url=http://onlineprediction.net/?n=Main.KernelMethods |date=20220629012514 }}

[[Category:地理统计]]
[[Category:分类算法]]