{{回归侧栏}}
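In [[稳健统计|robust statistics]], '''robust regression''' seeks to overcome some limitations of traditional [[回归分析|regression analysis]]. Regression analysis models the relationship between independent variables and a dependent variable. Standard types of regression, such as [[普通最小二乘法|ordinary least squares]], have favourable properties when their underlying assumptions hold, but can give misleading results otherwise (that is, they are not robust to violations of those assumptions). Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on the regression estimates.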

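For example, least-squares estimates of regression models are highly sensitive to [[异常值|outliers]]: an outlier whose error is twice that of a typical observation contributes four times as much (two squared) to the squared-error [[损失函数|loss function]], and therefore has a larger influence on the regression estimates. The [[休伯损失|Huber loss]] function is a robust alternative to the ordinary squared-error loss: it reduces the contribution of outliers to the loss and so limits their impact on the regression estimates.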
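The arithmetic above can be checked with a short sketch. The threshold <code>delta=1.0</code> and the residual values are illustrative assumptions, not values taken from the article.

```python
def squared_loss(r):
    """Squared-error loss for a residual r."""
    return r * r


def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


typical = 1.0  # assumed size of a typical residual
for multiple in (2.0, 10.0):
    outlier = multiple * typical
    sq_ratio = squared_loss(outlier) / squared_loss(typical)
    hub_ratio = huber_loss(outlier) / huber_loss(typical)
    print(multiple, sq_ratio, hub_ratio)
```

With these assumed values, a residual twice the typical size counts 4 times as much under squared error but 3 times as much under the Huber loss; at ten times the typical size the ratios are 100 and 19, because the Huber loss grows only linearly beyond the threshold.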

== Applications ==
=== Heteroscedastic errors ===
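Robust estimation should be considered when there is a strong suspicion of [[异方差|heteroscedasticity]]. A homoscedastic model assumes that the variance of the error term is constant for all values of ''x'', whereas in practice the variance often depends on ''x''. For example, the variance of expenditure is often larger for individuals with higher income than for those with lower income. Software packages usually default to a homoscedastic model, even though it may be less accurate than a heteroscedastic one. One simple approach ([[#Tofallis2008|Tofallis, 2008]]) is to apply least squares to percentage errors, which reduces the influence of large values of the dependent variable compared with ordinary least squares.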
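As a minimal sketch of the percentage-error idea, minimizing the sum of squared relative errors <math>\sum_i ((y_i - a - b x_i)/y_i)^2</math> for a straight line is the same as weighted least squares with weights <math>1/y_i^2</math>. The function names and the data below are hypothetical and chosen only for illustration; the sketch assumes a single predictor and non-zero responses.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values must not all be identical")
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a, b


def percentage_least_squares(x, y):
    """Minimise sum(((y_i - a - b*x_i) / y_i)**2); needs every y_i != 0."""
    if any(yi == 0 for yi in y):
        raise ValueError("percentage errors are undefined when y is zero")
    weights = [1.0 / (yi * yi) for yi in y]
    return weighted_line_fit(x, y, weights)


# Hypothetical data whose spread grows with x (illustrative numbers only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.3, 7.4, 11.0, 10.5]
print("OLS:          ", weighted_line_fit(x, y, [1.0] * len(x)))
print("percentage LS:", percentage_least_squares(x, y))
```

Because each observation is weighted by <math>1/y_i^2</math>, observations with large responses pull on the fit less than they do under ordinary least squares.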

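=== Outliers ===
Another common situation in which robust estimation is used is when the data contain outliers. If the outliers do not come from the same data-generating process as the rest of the data, least-squares estimation is inefficient and can be biased. Because the least-squares predictions are dragged towards the outliers, and because the variance of the estimates is inflated, the outliers can end up being masked. (In many situations, including some areas of [[地理统计|geostatistics]] and medical statistics, it is precisely the outliers that are of interest.)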

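It is sometimes claimed that least squares (or classical statistical methods in general) are robust, but this holds only in the sense that the [[第一类错误|type I error]] rate does not increase when the model is violated. In fact, when outliers are present the type I error rate tends to fall below the nominal level, while the [[第二类错误|type II error]] rate rises sharply. The reduction in the type I error rate has been called the conservatism of classical methods.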

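== History and unpopularity of robust regression ==
Although robust regression methods often outperform least squares, they are still not widely used. Several reasons may explain their unpopularity ([[#Hampel1986|Hampel et al. 1986, 2005]]). One is that several methods compete with one another and the field got off to many false starts. Another is that robust regression is much more computationally intensive than least squares; in recent years, however, this objection has become less important as computing power has increased greatly. A further reason may be that some popular statistical software packages have not implemented these methods ([[#Stromberg2004|Stromberg, 2004]]). The belief of many statisticians that classical methods are robust may be yet another reason{{Citation needed|reason=Given robust methods are readily available today, this claim seems quite dubious|date=September 2017}}.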

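Although the uptake of robust methods has been slow, modern mainstream statistics textbooks usually include a discussion of them (for example, the books by Seber & Lee and by Faraway; for an overview of how the various robust regression methods developed from one another, see Andersen's book).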

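== Methods for robust regression ==
=== Least squares alternatives ===
The simplest approach is to estimate the parameters of the regression model by [[最小一乘法|least absolute deviations]], which is less sensitive to outliers than least squares. Even then, gross outliers can still have a considerable impact on the model, which has motivated research into even more robust methods.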
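The following sketch fits a least absolute deviations line by brute force. It relies on the fact that, for a straight-line fit, some optimal line passes through at least two data points, so it is enough to try every pair. This is only practical for small data sets, and the data are made up for illustration.

```python
from itertools import combinations


def sum_abs_residuals(a, b, x, y):
    """Total absolute deviation of the line y = a + b*x from the data."""
    return sum(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))


def lad_line(x, y):
    """Least absolute deviations line, found by trying every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        cost = sum_abs_residuals(a, b, x, y)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


# Illustrative data: four points on y = x and one gross outlier.
x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 40]
a, b = lad_line(x, y)
print(a, b)
```

On this made-up example the fitted line follows the four collinear points and leaves the whole discrepancy at the outlier, whereas a least-squares line would be tilted towards it.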

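In 1964, Huber introduced [[M估计|M-estimation]], where the M stands for "maximum likelihood type". The method is robust to outliers in the response variable, but it is not resistant to outliers in the explanatory variables (leverage points). In fact, when the explanatory variables contain outliers, the method has no advantage over least squares.
In the 1980s, several alternatives to M-estimation were proposed in an attempt to overcome this lack of resistance; see the book by Rousseeuw and Leroy. [[最小截平方|Least trimmed squares]] (LTS) is a viable alternative and, as of 2007, was the preferred choice of Rousseeuw and Ryan (1997, 2008). The [[泰尔-森估算|Theil–Sen estimator]] has a lower breakdown point than LTS, but it is statistically efficient and popular. Another proposed solution is S-estimation, which finds a line (plane or hyperplane) that minimizes a robust estimate of the scale of the residuals (the source of the S in the method's name). This method is highly resistant to leverage points and robust to outliers in the response, but it is often inefficient.
[[MM估计|MM-estimation]] attempts to retain the robustness of S-estimation while gaining the efficiency of M-estimation. The method first finds a highly robust and resistant S-estimate that minimizes an M-estimate of the scale of the residuals (the first M). The estimated scale is then held fixed while an M-estimate of the parameters is located (the second M).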
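A minimal sketch of M-estimation with Huber weights, computed by iteratively reweighted least squares for a single predictor. The tuning constant 1.345, the scale estimate (median absolute residual divided by 0.6745) and the data are conventional or illustrative assumptions, not details given in the article.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values must not all be identical")
    b = (sw * sxy - sx * sy) / denom
    return (sy - b * sx) / sw, b


def huber_irls(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = median([abs(ri) for ri in r]) / 0.6745  # robust residual scale
        if s == 0.0:
            break  # at least half of the points are fitted exactly
        # Weight 1 inside the threshold, shrinking as k*s/|r| outside it.
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Illustrative data: y = 1 + 2x with one outlier in the response at x = 5.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] = 60.0
a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_hub, b_hub = huber_irls(x, y)
print("OLS:  ", a_ols, b_ols)
print("Huber:", a_hub, b_hub)
```

The outlier here is in the response, which is the case M-estimation handles well. The weights depend only on the residuals, so a point with an extreme ''x'' value that drags the line towards itself may keep a small residual and a full weight, which matches the lack of resistance to leverage points described above.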

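=== Parametric alternatives ===
Another approach to robust estimation of regression models is to replace the normal distribution with a heavy-tailed distribution. A [[T分布|''t''-distribution]] with 4–6 degrees of freedom has been reported to be a good choice in various practical situations. Bayesian robust regression, being fully parametric, relies heavily on such distributions.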

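Under the assumption of ''t''-distributed residuals, the distribution is a location-scale family; that is, <math>x \leftarrow (x-\mu)/\sigma</math>. The degrees of freedom of the ''t''-distribution is sometimes called the kurtosis parameter. Lange, Little & Taylor (1989) discuss this model in some depth from a non-Bayesian point of view; Gelman et al. (2003) give a Bayesian account.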
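To spell out the location-scale statement (the notation here is assumed for illustration rather than taken from the cited sources): if <math>z</math> follows a standard ''t''-distribution with <math>\nu</math> degrees of freedom and density <math>f_\nu</math>, then <math>y = \mu + \sigma z</math> has density

:<math>p(y \mid \mu, \sigma, \nu) = \frac{1}{\sigma} f_\nu\!\left(\frac{y-\mu}{\sigma}\right), \qquad f_\nu(z) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{z^2}{\nu}\right)^{-\frac{\nu+1}{2}}.</math>

In a regression setting one would typically take <math>\mu_i = x_i^{\mathrm{T}}\beta</math> for observation <math>i</math>. The log-density falls off only logarithmically in <math>z^2</math>, rather than quadratically as under a normal model, so a single large standardized residual changes the likelihood much less.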

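An alternative parametric approach is to assume that the residuals follow a mixture of normal distributions ([[#Daemi2019|Daemi et al. 2019]]); in particular, a '''contaminated normal distribution''', in which the majority of observations come from a specified normal distribution and a small proportion come from a normal distribution with much higher variance. That is, a residual comes from a normal distribution with variance <math>\sigma^2</math> with probability <math>1-\varepsilon</math>, where <math>\varepsilon</math> is small, and from a normal distribution with variance <math>c\sigma^2</math>, for some <math>c > 1</math>, with probability <math>\varepsilon</math>: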

:<math>e_i \sim (1 - \varepsilon) N(0, \sigma^2) + \varepsilon N(0, c\sigma^2).</math>

Typically, <math>\varepsilon < 0.1</math>. This is sometimes called the <math>\varepsilon</math>-contamination model.
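A short simulation sketch of the contaminated normal model. The parameter values (<code>eps=0.05</code>, <code>c=25.0</code>), the function names and the seed are assumptions made for illustration.

```python
import random


def sample_contaminated_normal(n, sigma=1.0, eps=0.05, c=25.0, seed=0):
    """Draw n residuals: N(0, sigma^2) w.p. 1-eps, N(0, c*sigma^2) w.p. eps."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        variance = c * sigma ** 2 if rng.random() < eps else sigma ** 2
        draws.append(rng.gauss(0.0, variance ** 0.5))
    return draws


def mixture_variance(sigma=1.0, eps=0.05, c=25.0):
    """Variance of the mixture: (1 - eps)*sigma^2 + eps*c*sigma^2."""
    return (1.0 - eps) * sigma ** 2 + eps * c * sigma ** 2


residuals = sample_contaminated_normal(10000)
empirical = sum(e * e for e in residuals) / len(residuals)
print("theoretical variance:", mixture_variance())
print("empirical variance:  ", empirical)
```

With these assumed values, 5% contamination raises the residual variance from 1 to about 2.2, which illustrates how a small fraction of high-variance observations can dominate variance-based estimates.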

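Parametric approaches have the advantage that likelihood theory provides an "off-the-shelf" approach to inference (although for models such as the <math>\varepsilon</math>-contamination model the usual regularity conditions might not apply), and that a simulation model can be built from the fit. However, such parametric models still assume that the underlying model is literally true, so they do not account for skewed residual distributions or finite observation precision.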

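=== Unit weights ===
Another robust method is the use of [[单位权回归|unit weights]] (Wainer & Thissen, 1976), which can be applied when there are multiple predictors of a single outcome. Ernest Burgess (1928) used unit weights to predict success on parole. He scored 21 positive factors as present (e.g., "no prior arrest" = 1) or absent ("prior arrest" = 0) and summed them to give a predictor score, which was shown to be a useful predictor of parole success. Samuel S. Wilks (1938) showed that nearly all sets of regression weights sum to composites that are very highly correlated with one another, including unit weights, a result referred to as [[威尔克斯定理|Wilks' theorem]] (Ree, Carretta, & Earles, 1998). Robyn Dawes (1979) examined decision making in applied settings and found that simple models with unit weights often outperformed human experts. Bobko, Roth, and Buster (2007) reviewed the literature on unit weights and concluded that decades of empirical studies show that unit weights perform similarly to ordinary regression weights on cross-validation.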
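A unit-weight score of the kind described above can be sketched as a simple count of the factors that are present. Only the "no prior arrest" factor echoes the example in the text; the other factor names are hypothetical.

```python
def unit_weight_score(factors):
    """Sum of binary indicators: each factor counts 1 if present, else 0."""
    return sum(1 for present in factors.values() if present)


# Hypothetical case record; factor names other than "no_prior_arrest"
# are invented for illustration.
case = {
    "no_prior_arrest": True,
    "steady_employment": False,
    "family_support": True,
}
score = unit_weight_score(case)
print(score)
```

No weights are estimated from data, so the score cannot be distorted by outliers in a training sample.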

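== See also ==
* [[回归分析|Regression analysis]]
* [[迭代重加权最小二乘|Iteratively reweighted least squares]]
* [[M估计量|M-estimator]]
* [[随机抽样一致|Random sample consensus]]
* [[重复中位数回归|Repeated median regression]]
* [[泰尔-森估算|Theil–Sen estimator]], a robust estimator for [[简单线性回归|simple linear regression]]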

== References ==
* {{cite journal |last1=Liu |first1=J. |last2=Cosman |first2=P. C. |last3=Rao |first3=B. D. |title=Robust Linear Regression via L0 Regularization |journal=IEEE Transactions on Signal Processing |date=2018 |volume=66 |issue=3 |pages=698–713 |doi=10.1109/TSP.2017.2771720|doi-access=free }}
* {{Cite book| last = Andersen | first = R. | title = Modern Methods for Robust Regression | publisher = Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-152| year = 2008}}
* Ben-Gal I., [http://www.eng.tau.ac.il/~bengal/outlier.pdf Outlier detection] {{Wayback|url=http://www.eng.tau.ac.il/~bengal/outlier.pdf |date=20221215100532 }}, In: Maimon O. and Rokach L. (Eds.) ''Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers'', Kluwer Academic Publishers, 2005, {{ISBN|0-387-24435-2}}.
* Bobko, P., Roth, P. L., & Buster, M. A. (2007). "The usefulness of unit weights in creating composite scores: A literature review, application to content validity, and meta-analysis". ''Organizational Research Methods'', volume 10, pages 689-709. {{doi|10.1177/1094428106294734}}
*Daemi, Atefeh, Hariprasad Kodamana, and Biao Huang. "Gaussian process modelling with Gaussian mixture likelihood." Journal of Process Control 81 (2019): 209-220. {{doi|10.1016/j.jprocont.2019.06.007}}
* {{Cite journal| last = Breiman | first = L. | title = Statistical Modeling: the Two Cultures | url = https://archive.org/details/sim_statistical-science_2001-08_16_3/page/199 | journal = Statistical Science | volume = 16 | issue = 3 | pages = 199–231 | year = 2001 | doi = 10.1214/ss/1009213725 | jstor=2676681| doi-access = free }}
* Burgess, E. W. (1928). "Factors determining success or failure on parole". In A. A. Bruce (Ed.), ''The Workings of the Indeterminate Sentence Law and Parole in Illinois'' (pp.&nbsp;205–249). Springfield, Illinois: Illinois State Parole Board. [https://books.google.com/books/about/The_Workings_of_the_Indeterminate_senten.html?id=V6xCAAAAIAAJ Google books]
* Dawes, Robyn M. (1979). "The robust beauty of improper linear models in decision making". ''American Psychologist'', volume 34, pages 571-582.  {{doi|10.1037/0003-066X.34.7.571 }}.  [http://www.cmu.edu/dietrich/sds/docs/dawes/the-robust-beauty-of-improper-linear-models-in-decision-making.pdf archived pdf] {{Wayback|url=http://www.cmu.edu/dietrich/sds/docs/dawes/the-robust-beauty-of-improper-linear-models-in-decision-making.pdf |date=20230513172624 }}
*{{Cite journal| last=Draper | first=David | journal=Statistical Science | volume=3 | year=1988 | title=Rank-Based Robust Analysis of Linear Models. I. Exposition and Review | url=https://archive.org/details/sim_statistical-science_1988-05_3_2/page/239 | pages=239–257 | doi=10.1214/ss/1177012915 | issue=2 | jstor=2245578| doi-access=free }}
*{{Cite book| last = Faraway | first = J. J. | title = Linear Models with R | publisher = Chapman & Hall/CRC | year = 2004 }}
*{{Cite journal| last=Fornalski | first = K. W. | title = Applications of the robust Bayesian regression analysis | journal = International Journal of Society Systems Science | volume = 7 | issue = 4 | pages = 314–333 | year = 2015| doi=10.1504/IJSSS.2015.073223}}
* {{Cite book| last = Gelman | first = A. |author2=J. B. Carlin |author3=H. S. Stern |author4=D. B. Rubin | title = Bayesian Data Analysis |edition=Second | publisher = Chapman & Hall/CRC | year = 2003 }}
* {{Cite book | ref = Hampel1986 | last = Hampel | first = F. R. |author2=E. M. Ronchetti |author3=P. J. Rousseeuw |author4=W. A. Stahel | title = Robust Statistics: The Approach Based on Influence Functions | publisher = Wiley | orig-year = 1986| year = 2005}}
* {{Cite journal | last = Lange | first = K. L. | author2 = R. J. A. Little | author3 = J. M. G. Taylor | title = Robust statistical modeling using the ''t''-distribution | journal = Journal of the American Statistical Association | volume = 84 | issue = 408 | pages = 881–896 | year = 1989 | doi = 10.2307/2290063 | jstor = 2290063 | url = https://escholarship.org/uc/item/27s1d3h7 | access-date = 2023-10-14 | archive-date = 2022-12-22 | archive-url = https://web.archive.org/web/20221222130302/https://escholarship.org/uc/item/27s1d3h7 | dead-url = no }}
* Lerman, G.; McCoy, M.; Tropp, J. A.; Zhang T. (2012). [http://users.cms.caltech.edu/~jtropp/papers/LMTZ12-Robust-Computation.pdf "Robust computation of linear models, or how to find a needle in a haystack"] {{Wayback|url=http://users.cms.caltech.edu/~jtropp/papers/LMTZ12-Robust-Computation.pdf |date=20130926222436 }}, {{arxiv|id=1202.4044}}.
* {{Cite book| last = Maronna | first = R. |author2=D. Martin |author3=V. Yohai | title = Robust Statistics: Theory and Methods | publisher = Wiley | year = 2006}}
*{{Cite journal| last=McKean | first=Joseph W. | journal=Statistical Science | volume=19 | year=2004 | pages=562–570 | title=Robust Analysis of Linear Models | url=https://archive.org/details/sim_statistical-science_2004-11_19_4/page/562 | doi=10.1214/088342304000000549 | issue=4 | jstor=4144426| doi-access=free }}
* {{Cite book| last = Radchenko S.G. | title = Robust methods for statistical models estimation: Monograph. (in Russian) | publisher = Kiev: РР «Sanspariel» |isbn=978-966-96574-0-4 | pages = 504 | year = 2005}}
* Ree, M. J., Carretta, T. R., & Earles, J. A. (1998). "In top-down decisions, weighting variables does not matter: A consequence of Wilk's theorem". ''Organizational Research Methods'', volume 1(4), pages 407-420. {{doi|10.1177/109442819814003}}
* {{Cite book| last = Rousseeuw | first = P. J. | author-link=Peter Rousseeuw|author2=A. M. Leroy  | title = Robust Regression and Outlier Detection | title-link = Robust Regression and Outlier Detection | publisher = Wiley | orig-year = 1986| year = 2003}}
* {{Cite book| last = Ryan | first = T. P. | title = Modern Regression Methods | publisher = Wiley | orig-year = 1997| year = 2008}}
* {{Cite book| last = Seber | first = G. A. F. |author2=A. J. Lee  | title = Linear Regression Analysis | url = https://archive.org/details/linearregression0000sebe |edition=Second | publisher = Wiley | year = 2003}}
* {{Cite journal| ref = Stromberg2004 | last = Stromberg| first = A. J. | title = Why write statistical software? The case of robust statistical methods | journal = Journal of Statistical Software | volume = 10| issue = 5| year = 2004 | doi = 10.18637/jss.v010.i05| doi-access = free }}
* {{cite book |first=T. |last=Strutz| title=Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond) |publisher=Springer Vieweg |year=2016 |isbn=978-3-658-11455-8}}
* {{cite journal | ref=Tofallis2008 | last=Tofallis | first=Chris | ssrn=1406472 | title=Least Squares Percentage Regression | journal=Journal of Modern Applied Statistical Methods | volume=7 | year=2008 | pages=526–534 | doi=10.2139/ssrn.1406472 | url=https://digitalcommons.wayne.edu/cgi/viewcontent.cgi?article=1466&context=jmasm | access-date=2023-10-14 | archive-date=2023-08-14 | archive-url=https://web.archive.org/web/20230814122753/https://digitalcommons.wayne.edu/cgi/viewcontent.cgi?article=1466&context=jmasm | dead-url=no }}
* {{Cite book| last = Venables | first = W. N. |author2=B. D. Ripley  | title = Modern Applied Statistics with S | publisher = Springer| year = 2002}}
* [[Howard Wainer|Wainer, H.]], & Thissen, D. (1976). "Three steps toward robust regression." ''Psychometrika'', volume 41(1), pages 9–34. {{doi|10.1007/BF02291695}}
* Wilks, S. S. (1938). "Weighting systems for linear functions of correlated variables when there is no dependent variable". ''Psychometrika'', volume 3, pages 23–40. {{doi|10.1007/BF02287917}}

== External links ==
*[[Wikibooks:R Programming#Linear Models|R programming wikibooks]]
*[[Brian D. Ripley|Brian Ripley's]] [https://web.archive.org/web/20121021081319/http://www.stats.ox.ac.uk/pub/StatMeth/Robust.pdf robust statistics course notes.]
*[http://www.nickfieller.staff.shef.ac.uk/sheff-only/StatModall05.pdf Nick Fieller's course notes on Statistical Modelling and Computation] {{Wayback|url=http://www.nickfieller.staff.shef.ac.uk/sheff-only/StatModall05.pdf |date=20160303212615 }} contain material on robust regression.
*[http://webmining.spd.louisville.edu/wp-content/uploads/2014/05/A-Brief-Overview-of-Robust-Statistics.pdf Olfa Nasraoui's Overview of Robust Statistics] {{Wayback|url=http://webmining.spd.louisville.edu/wp-content/uploads/2014/05/A-Brief-Overview-of-Robust-Statistics.pdf |date=20220401043220 }}
*[http://webmining.spd.louisville.edu/wp-content/uploads/2014/05/A-Brief-Overview-of-Robust-Clustering-Techniques.pdf Olfa Nasraoui's Overview of Robust Clustering] {{Wayback|url=http://webmining.spd.louisville.edu/wp-content/uploads/2014/05/A-Brief-Overview-of-Robust-Clustering-Techniques.pdf |date=20220401043241 }}
*[http://www.jstatsoft.org/v10/a05/paper Why write statistical software? The case of robust statistical methods, A. J. Stromberg] {{Wayback|url=http://www.jstatsoft.org/v10/a05/paper |date=20120207221607 }}
*[http://www.sourceforge.net/projects/l1-norm-robust-regression Free software (Fortran 95) L1-norm regression. Minimization of absolute deviations instead of least squares.]
*[https://github.com/gsubramani/robust-nonlinear-regression Free open-source python implementation for robust nonlinear regression.] {{Wayback|url=https://github.com/gsubramani/robust-nonlinear-regression |date=20210125124753 }}

{{统计学}}

[[Category:稳健回归| ]]