{{NoteTA |G1=Math}}
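'''Kernel regression''' (also called locally weighted regression) is a [[無母數統計|non-parametric]] technique in statistics for estimating the [[条件期望|conditional expectation]] of a [[随机变量|random variable]]. The objective is to find a non-linear relation between a pair of random variables '''''X''''' and '''''Y'''''.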

In any non-parametric regression, the [[条件期望|conditional expectation]] of a variable <math>Y</math> given a variable <math>X</math> can be written as:

<math>\operatorname{E}(Y | X) = m(X)</math>

where <math>m</math> is an unknown function.

== Nadaraya–Watson kernel regression ==
In 1964, Nadaraya and Watson each proposed estimating <math>m</math> as a locally weighted average, using a kernel as the weighting function. <ref>{{Cite journal|title=On Estimating Regression|last=Nadaraya|first=E. A.|journal=Theory of Probability and Its Applications|issue=1|doi=10.1137/1109020|year=1964|volume=9|pages=141–2|ref=harv}}</ref> <ref>{{Cite journal|title=Smooth regression analysis|last=Watson|first=G. S.|authorlink=|journal=Sankhyā: The Indian Journal of Statistics, Series A|issue=4|year=1964|volume=26|pages=359–372|jstor=25049340|ref=harv}}</ref> <ref>{{Cite book|first=Herman J.|last=Bierens|title=The Nadaraya–Watson kernel regression function estimator|location=New York|publisher=Cambridge University Press|year=1994|isbn=0-521-41900-X|pages=212–247|chapterurl=https://books.google.com/books?id=M5QBuJVtbWQC&pg=PA212}}</ref> The Nadaraya–Watson estimator is:

<math> \widehat{m}_h(x)=\frac{\sum_{i=1}^n K_h(x-x_i) y_i}{\sum_{j=1}^nK_h(x-x_j)}
</math>

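where <math>K_h(t) = \frac{1}{h} K\left(\frac{t}{h}\right)</math> is a kernel with bandwidth <math>h</math>. Dividing by the denominator normalizes the weights <math>K_h(x-x_i)</math> so that they sum to 1.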
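The estimator can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the Gaussian kernel, the function names <code>gaussian_kernel</code> and <code>nadaraya_watson</code>, and the sample data are all assumptions made for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h):
    """Nadaraya-Watson estimate of m(x) with bandwidth h > 0."""
    # K_h(t) = K(t / h) / h. The 1/h factor cancels in the ratio,
    # but it is kept so that the code mirrors the formula.
    weights = [gaussian_kernel((x - xi) / h) / h for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed to zero; increase h")
    # Weighted average of the responses with normalized kernel weights.
    return sum(w * yi for w, yi in zip(weights, ys)) / total

# Hypothetical sample data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
estimate = nadaraya_watson(2.0, xs, ys, h=1.0)
```

Because the normalized weights are non-negative and sum to 1, the estimate always lies between the smallest and largest observed <math>y_i</math>, and as <math>h</math> grows it approaches the plain sample mean of the <math>y_i</math>.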

=== Derivation ===
<math>
\operatorname{E}(Y | X=x) = \int y f(y|x) dy = \int y \frac{f(x,y)}{f(x)} dy
</math>

Using [[核密度估计|kernel density estimation]] with a kernel '''''K''''' for the joint density <math>f(x,y)</math> and the marginal density <math>f(x)</math>,

<math>
\hat{f}(x,y) = \frac{1}{n}\sum_{i=1}^{n} K_h\left(x-x_i\right) K_h\left(y-y_i\right)
</math>,
<math>
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h\left(x-x_i\right)
</math>,

we obtain

<math>
\begin{align}
\operatorname{\hat E}(Y | X=x) &= \int \frac{y \sum_{i=1}^{n} K_h\left(x-x_i\right) K_h\left(y-y_i\right)}{\sum_{j=1}^{n} K_h\left(x-x_j\right)} dy,\\
&= \frac{\sum_{i=1}^{n} K_h\left(x-x_i\right) \int y \, K_h\left(y-y_i\right) dy}{\sum_{j=1}^{n} K_h\left(x-x_j\right)},\\
&= \frac{\sum_{i=1}^{n} K_h\left(x-x_i\right) y_i}{\sum_{j=1}^{n} K_h\left(x-x_j\right)},
\end{align}
</math>

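which is the Nadaraya–Watson estimator. The last step uses <math>\int y \, K_h(y-y_i) \, dy = y_i</math>, which holds when the kernel integrates to 1 and is symmetric about zero.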

== Priestley–Chao kernel estimator ==
<math>
\widehat{m}_{PC}(x) = h^{-1} \sum_{i=2}^n (x_i - x_{i-1}) K\left(\frac{x-x_i}{h}\right) y_i
</math>

where <math>h</math> is the bandwidth (or smoothing parameter).
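A minimal Python sketch of this formula follows. It assumes a Gaussian kernel and sorts the observations by <math>x</math> so that the spacings <math>x_i - x_{i-1}</math> are non-negative; the function names and the test grid are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def priestley_chao(x, xs, ys, h):
    """Priestley-Chao estimate of m(x) with bandwidth h > 0."""
    # Sort by x so that consecutive differences x_i - x_{i-1} are >= 0.
    pairs = sorted(zip(xs, ys))
    total = 0.0
    # The sum runs over i = 2..n: each point is paired with its predecessor.
    for (x_prev, _), (x_i, y_i) in zip(pairs, pairs[1:]):
        total += (x_i - x_prev) * gaussian_kernel((x - x_i) / h) * y_i
    return total / h

# Hypothetical equally spaced design on [0, 1] with a linear response.
grid = [i / 100 for i in range(101)]
linear = list(grid)
estimate = priestley_chao(0.5, grid, linear, h=0.05)
```

The sum acts as a Riemann approximation to <math>\int K_h(x-u)\, m(u)\, du</math>, so on a dense design away from the boundary the estimate at 0.5 for the linear response above is close to 0.5.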

== Gasser–Müller kernel estimator ==
<math>
\widehat{m}_{GM}(x) = h^{-1} \sum_{i=1}^n \left[\int_{s_{i-1}}^{s_i} K\left(\frac{x-u}{h}\right) du\right] y_i
</math>

where <math>s_i = \frac{x_{i-1} + x_i}{2}</math>.
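The following Python sketch illustrates the idea of weighting each <math>y_i</math> by the integral of the kernel over an interval around <math>x_i</math>. It rests on several assumptions that are not part of the text above: a Gaussian kernel (so the integral has a closed form through the normal distribution function), interval boundaries taken as the midpoints between each sorted <math>x_i</math> and its neighbours, and outermost intervals extended to <math>\pm\infty</math>. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gasser_muller(x, xs, ys, h):
    """Gasser-Mueller-type estimate of m(x) with a Gaussian kernel."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    # Assumed boundaries: -inf, midpoints of consecutive sorted x values, +inf.
    s = [-math.inf]
    s += [(pairs[i][0] + pairs[i + 1][0]) / 2.0 for i in range(n - 1)]
    s.append(math.inf)
    total = 0.0
    for i, (_, y_i) in enumerate(pairs):
        # For Gaussian K: (1/h) * integral_a^b K((x - u) / h) du
        #               = Phi((x - a) / h) - Phi((x - b) / h).
        weight = normal_cdf((x - s[i]) / h) - normal_cdf((x - s[i + 1]) / h)
        total += weight * y_i
    return total

# Hypothetical sample data, for illustration only.
xs = [0.0, 1.0, 2.5, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
estimate = gasser_muller(2.0, xs, ys, h=0.8)
```

With these boundary conventions the weights telescope to 1, so a constant response is reproduced exactly; other boundary choices would change the behaviour near the ends of the data.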

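== Example ==
<!-- File does not exist: [[File:Cps71_lc_mean.png|右|缩略图|250x250像素|Estimated regression function.]]; it can be obtained from the English Wikipedia -->
This example is based on Canadian cross-section wage data consisting of a random sample taken from the 1971 Canadian Census Public Use Tapes for male individuals having common education (grade 13). There are 205 observations in total.

The regression function is estimated with a second-order Gaussian kernel and plotted together with asymptotic variability bounds.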

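=== Script for the example ===
The following [[R语言|R]] commands use the <code>npreg()</code> function to deliver optimal smoothing and to create the plot described above. They can be entered at the command prompt by cut and paste. <syntaxhighlight lang="rsplus" line="1">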
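# Install the np package (only needed once)
install.packages("np")
library(np)   # nonparametric kernel smoothing methods
data(cps71)   # Canadian cross-section wage data (205 observations)
attach(cps71)

# Kernel regression of log wage on age with data-driven bandwidth selection
m <- npreg(logwage ~ age)

# Plot the fitted curve with asymptotic variability bounds
plot(m, plot.errors.method = "asymptotic",
     plot.errors.style = "band",
     ylim = c(11, 15.2))

# Overlay the raw observations
points(age, logwage, cex = 0.25)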
</syntaxhighlight>

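== Related ==
David Salsburg notes that the algorithms used in kernel regression were developed independently of, and were also used in, [[模糊控制|fuzzy systems]]: in his account, fuzzy systems and kernel density-based regression arrived at almost exactly the same computer algorithm while apparently being developed completely independently of one another. <ref>{{Cite book|last=Salsburg|first=D.|title=[[The Lady Tasting Tea|The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century]]|publisher=W.H. Freeman|year=2002|isbn=0-8050-7134-2|pages=290–91|ref=harv}}</ref>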

== Statistical implementation ==

* [[MATLAB]]: a free MATLAB toolbox with implementations of kernel regression, kernel density estimation, kernel estimation of the hazard function and many other tools is available on [https://www.math.muni.cz/english/science-and-research/developed-software/232-matlab-toolbox.html these pages] {{Wayback|url=https://www.math.muni.cz/english/science-and-research/developed-software/232-matlab-toolbox.html |date=20191014162712 }} (the toolbox is part of the book<ref name="HorKolZel">{{Cite book|last=Horová|first=I.|last2=Koláček|first2=J.|last3=Zelinka|first3=J.|title=Kernel Smoothing in MATLAB: Theory and Practice of Kernel Smoothing|url=https://archive.org/details/kernelsmoothingi0000horo|date=2012|publisher=World Scientific Publishing|location=Singapore|isbn=978-981-4405-48-5}}</ref>).
* [[Stata]]: [https://web.archive.org/web/20180519032545/https://www.stata.com/manuals/rnpregress.pdf npregress], [https://ideas.repec.org/c/boc/bocode/s372601.html kernreg2] {{Wayback|url=https://ideas.repec.org/c/boc/bocode/s372601.html |date=20200811014839 }}
* [[R语言|R]]: the function <code>npreg</code> of the ''np'' package can perform kernel regression. <ref>{{Cite web |url=https://cran.r-project.org/web/packages/np/index.html |title=''np'': Nonparametric kernel smoothing methods for mixed data types |access-date=2019-10-14 |archive-date=2020-08-17 |archive-url=https://web.archive.org/web/20200817224253/https://cran.r-project.org/web/packages/np/index.html |dead-url=no }}</ref> <ref>{{Cite book|first=John|last=Kloke|first2=Joseph W.|last2=McKean|title=Nonparametric Statistical Methods Using R|location=|publisher=CRC Press|year=2014|isbn=978-1-4398-7343-4|pages=98–106|url=https://books.google.com/books?id=b-msBAAAQBAJ&pg=PA98}}</ref>
* [[Python]]: the <code>[http://www.statsmodels.org/stable/generated/statsmodels.nonparametric.kernel_regression.KernelReg.html KernelReg]</code> class for mixed data types in the <code>[http://www.statsmodels.org/stable/nonparametric.html statsmodels.nonparametric]</code> sub-package (which also includes other kernel-density-related classes), and the package [https://github.com/jmetzen/kernel_regression kernel_regression] {{Wayback|url=https://github.com/jmetzen/kernel_regression |date=20201016111521 }} as an extension of sklearn (memory-inefficient, useful only for small data sets).
* [[GNU Octave]] mathematical program package

== See also ==

* Kernel smoother
* Local regression

== References ==
{{Reflist}}

== Further reading ==

* {{Cite book|first=Daniel J.|last=Henderson|first2=Christopher F.|last2=Parmeter|title=Applied Nonparametric Econometrics|location=|publisher=Cambridge University Press|year=2015|isbn=978-1-107-01025-3|url=https://books.google.com/books?id=hD3WBQAAQBAJ|access-date=2019-10-14|archive-date=2020-08-06|archive-url=https://web.archive.org/web/20200806231247/https://books.google.com/books?id=hD3WBQAAQBAJ|dead-url=no}}
* {{Cite book|last=Li|first=Qi|last2=Racine|first2=Jeffrey S.|title=Nonparametric Econometrics: Theory and Practice|publisher=Princeton University Press|year=2007|isbn=0-691-12161-3|url=https://books.google.com/books?id=Zsa7ofamTIUC}}
* {{Cite book|last=Pagan|first=A.|first2=A.|last2=Ullah|year=1999|title=Nonparametric Econometrics|publisher=Cambridge University Press|isbn=0-521-35564-8|url=https://books.google.com/books?id=m2soVgsuMgMC|access-date=2019-10-14|archive-date=2016-06-24|archive-url=https://web.archive.org/web/20160624094657/https://books.google.com/books?id=m2soVgsuMgMC|dead-url=no}}
* {{Cite book|last=Simonoff|first=Jeffrey S.|title=Smoothing Methods in Statistics|publisher=Springer|year=1996|isbn=0-387-94716-7|url=https://books.google.com/books?id=dgHaBwAAQBAJ}}

== External links ==

* [http://www.cs.tut.fi/~lasip Scale-adaptive kernel regression] {{Wayback|url=http://www.cs.tut.fi/~lasip |date=20201031201040 }} (with Matlab software).
* [https://web.archive.org/web/20070927062200/http://people.revoledu.com/kardi/tutorial/Regression/KernelRegression/index.html Tutorial on kernel regression using a spreadsheet] (with [[Microsoft Excel]]).
* [http://pcarvalho.com/things/kernelregressor/ Online kernel regression demonstration] {{Wayback|url=http://pcarvalho.com/things/kernelregressor/ |date=20170801101530 }}. Requires .NET 3.0 or later.
* [https://github.com/jmetzen/kernel_regression Kernel regression with automatic bandwidth selection] {{Wayback|url=https://github.com/jmetzen/kernel_regression |date=20201016111521 }} (with Python).

{{Authority control}}
[[Category:無母數迴歸]]