查看“︁分段聚合近似法”︁的源代码

'''分段聚合近似法'''（英文：{{lang|en|'''P'''iecewise '''A'''ggregate '''A'''pproximation}}，{{lang|en|'''PAA'''}}）是一种[[时间序列]]数据的[[降维]]方法，最早由{{tsl|en|Eamonn Keogh|埃蒙·基奥}}（{{lang|en|Eamonn Keogh}}）等人提出，用于建立时间序列索引<ref name="Fu 2011">{{cite journal |author1=Tak-chung Fu |title=A review on time series data mining |journal=Engineering Applications of Artificial Intelligence |date=2011 |volume=24 |issue=1 |pages=164-181 |url=https://www.sciencedirect.com/science/article/abs/pii/S0952197610001727 |language=en |accessdate=2019-06-05 |archive-date=2019-06-05 |archive-url=https://web.archive.org/web/20190605213808/https://www.sciencedirect.com/science/article/abs/pii/S0952197610001727 |dead-url=no }}</ref>。相比于[[离散傅里叶变换]]、[[离散小波变换]]、[[奇异值分解]]等降维方法，分段聚合近似法操作比较简便，适用于更多[[距离]]度量，例如加权[[欧几里得距离|欧氏距离]]。并且分段聚合近似法还适用于索引长度和查询长度不同的情况<ref name="原继东 综述">{{cite journal |author1=原继东 |author2=王志海 |title=时间序列的表示与分类算法综述 |journal=计算机科学 |date=2015 |volume=42 |issue=3 |url=http://www.jsjkx.com/CN/10.11896/j.issn.1002-137X.2015.03.001 |accessdate=2019-06-05 |archive-date=2019-06-05 |archive-url=https://web.archive.org/web/20190605221923/http://www.jsjkx.com/CN/10.11896/j.issn.1002-137X.2015.03.001 |dead-url=no }}</ref>。如今分段聚合近似法已经成为一种广泛应用的时间序列处理方法<ref name="Baker 2010">{{cite conference |author=Bakar, Azuraliza Abu |coauthors=Almahdi Mohammed Ahmed, and Abdul Razak Hamdan |title=Discretization of time series dataset using relative frequency and K-nearest neighbor approach |conference=International Conference on Advanced Data Mining and Applications |date=2010-11 |publisher=Springer |location=Berlin, Heidelberg |pages=193-201 |url=https://link.springer.com/chapter/10.1007/978-3-642-17316-5_18 |accessdate=2019-06-05 |language=en |archive-date=2019-06-05 |archive-url=https://web.archive.org/web/20190605215005/https://link.springer.com/chapter/10.1007/978-3-642-17316-5_18 |dead-url=no }}</ref><ref name="Kulahcioglu 2008">{{cite conference |author=Kulahcioglu, Burcu, Serhan Ozdemir, and Bora Kumova. |title=Application of symbolic Piecewise Aggregate Approximation (PAA) analysis to ECG signals |conference=17th IASTED International conference on applied simulation and modelling |conferenceurl=https://www.researchgate.net/publication/229011056_Application_of_Symbolic_Piecewise_Aggregate_Approximation_PAA_Analysis_to_ECG_Signals |date=2008-01 |url=https://www.researchgate.net/profile/Bora_Kumova/publication/229011056_Application_of_Symbolic_Piecewise_Aggregate_Approximation_PAA_Analysis_to_ECG_Signals/links/004635304b6ac485da000000/Application-of-Symbolic-Piecewise-Aggregate-Approximation-PAA-Analysis-to-ECG-Signals.pdf |language=en|accessdate=2019-06-05}}</ref>。

== 定义 ==
设时间序列<math>X = x_1,...,x_n</math>的长度为<math>n</math>。将其用一条长度为<math>N</math>的向量<math>\overline{X}</math>表示，<math>\overline{X}</math>第<math>i</math>个元素的定义为<ref name="李海林 2011">{{cite journal |author1=李海林 |author2=郭崇慧 |author3=杨丽彬 |title=基于分段聚合时间弯曲距离的时间序列挖掘 |journal=山东大学学报 (工学版) |date=2011 |volume=41 |issue=5 |pages=57-62 |url=http://gxbwk.njournal.sdu.edu.cn/CN/article/downloadArticleFile.do?attachType=PDF&id=1083 |accessdate=2019-06-05 |archive-date=2019-06-05 |archive-url=https://web.archive.org/web/20190605213809/http://gxbwk.njournal.sdu.edu.cn/CN/article/downloadArticleFile.do%3FattachType%3DPDF%26id%3D1083 |dead-url=no }}</ref>：
<math>\overline{X}_i = \frac{N} {n} \sum^{\frac{n} {N} i}_{j = \frac{n} {N} (i-1)+1}x_j</math>

简言之，为了将原始时间序列从<math>n</math>维降低到<math>N</math>维，需要将原始数据分割成<math>N</math>个等长的分段，在每个分段内计算均值就可以得到降维后的数据表示。

当<math>N= n</math>时，分段聚合近似法得到的向量就是原始时间序列本身，当<math>N= 1</math>时，分段聚合近似法得到的得到值就是原始时间序列的均值<ref name="Vignesh Krishnamoorthy PAA">{{cite web |author1=Vignesh Krishnamoorthy |title=Piecewise Aggregate Approximation |url=http://vigne.sh/posts/piecewise-aggregate-approx/ |website=vigne.sh |accessdate=2019-06-05 |language=en |date=2018-02 |archive-date=2020-10-21 |archive-url=https://web.archive.org/web/20201021105358/https://vigne.sh/posts/piecewise-aggregate-approx/ |dead-url=no }}</ref>。
== 索引建立方法 ==
用分段聚合近似法建立用于子序列匹配的索引的时间复杂度为<math>O(nm)</math>。因为对于大约<math>m</math>个“窗口”，每个分段都要用上述公式计算<math>N</math>次，并且上述公式要对长度为<math>\frac{n}{N}</math>的部分求和。埃蒙·基奥（Eamonn Keogh）提出了一种快速计算的方法，可以将时间复杂度降低到<math>O(Nm)</math>： 每次计算时只要从上次的结果减去上一段离开“窗口”的数据点的部分，加上新进入“窗口”的数据点的部分即可。在第<math>j</math>个“窗口”内的第<math>i</math>个值可以通过以下公式更新<ref name="Keogh 2001">{{cite journal |author1=Keogh, Eamonn |author2=Kaushik Chakrabarti |author3=Michael Pazzani |author4=Sharad Mehrotra |title=Dimensionality reduction for fast similarity search in large time series databases |journal=Knowledge and information Systems |date=2001-08 |volume=3 |issue=3 |pages=263-286 |url=https://link.springer.com/article/10.1007/PL00011669 |accessdate=2019-06-05 |language=en |archive-date=2016-05-02 |archive-url=https://web.archive.org/web/20160502145457/http://link.springer.com/article/10.1007/PL00011669 |dead-url=no }}</ref>：
<math>\overline{x}_{ji} = \overline{x}_{j- 1i} - \frac{N}{n}x_{\frac{n} {N} (i-1)+1} + \frac{N}{n}x_{\frac{n} {N} i + 1}</math>

== 应用领域 ==
作为一种[[:Category:时间序列降维方法|时间序列降维方法]]，分段聚合近似法得到了广泛的应用，是一些时间序列的低维表示方法的前期处理步骤之一<ref name="时序专利柴毅">{{cite web |authorlink1=柴毅 |authorlink2=张可 |authorlink3=毛永芳 |authorlink4=黄磊 |authorlink5=许水清 |title=时间序列数据的一种符号化表示方法 |url=https://patents.google.com/patent/CN106095787A/zh |website=Google Patent |accessdate=2019-06-04 |date=2016-11-09}}</ref>。
在使用分段聚合近似法建立索引后，为了进行各种查询，要使用某种[[距离]]<math>D_{PAA}</math>进行{{tsl|en|Similarity measure|相似度}}度量。为了避免[[偽陽性和偽陰性|假阴性]]情况的出现，所使用的距离需要满足以下特征<ref name="Guo 2010">{{cite conference |author=Chonghui Guo |coauthors=Hailin Li, Donghua Pan |title=An Improved Piecewise Aggregate Approximation Based on Statistical Features for Time Series Mining |conference=An improved piecewise aggregate approximation based on statistical features for time series mining |conferenceurl=https://link.springer.com/chapter/10.1007/978-3-642-15280-1_23 |booktitle=International conference on knowledge science, engineering and management |date=2010-09-01 |pages=234-244 |language=en |accessdate=2019-06-05 |archive-date=2018-06-17 |archive-url=https://web.archive.org/web/20180617174252/https://link.springer.com/chapter/10.1007%2F978-3-642-15280-1_23 |dead-url=no }}</ref>：
<math>D_{PAA}(\overline{X}, \overline{Y})\leq D(X, Y)</math>
如果<math>D_{PAA}</math>的定义为：
<math>D_{PAA}(\overline{X}, \overline{Y}) := \sqrt{\frac{n}{N}}\sqrt{\sum^{N}_{i=1}(\overline{x}_{i} - \overline{y}_{i})^{2}}</math>，则满足上述条件<ref name="MMM Fuad 2012">{{cite book |author1=Fuad, Muhammad Marwan Muhammad |title=Using Differential Evolution to Set Weights to Segments with Different Information Content in the Piecewise Aggregate Approximation |date=2012 |publisher=KES |pages=440-449 |url=https://books.google.com/books?hl=zh-CN&lr=&id=POW2ijBMJNwC&oi=fnd&pg=PA440&dq=piecewise+aggregate+approximation&ots=H81vRDVrHt&sig=OHEzgk1GmQMoAG6CB5Lvrjwj5UY#v=onepage&q=piecewise%20aggregate%20approximation&f=false |language=en |accessdate=2019-06-05 |archive-date=2015-10-31 |archive-url=https://web.archive.org/web/20151031194401/https://books.google.com/books?hl=zh-CN#v=onepage&q=piecewise%20aggregate%20approximation&f=false |dead-url=no }}</ref><ref name="Lin 2007">{{cite journal |author1=Jessica Lin |author2=Eamonn Keogh |author3=Li Wei |author4=Stefano Lonardi |title=Experiencing SAX: a novel symbolic representation of time series |journal=Data Mining and knowledge discovery |volume=15 |issue=2 |pages=101-144 |url=https://link.springer.com/article/10.1007/s10618-007-0064-z |accessdate=2019-06-05 |language=en |archive-date=2020-02-10 |archive-url=https://web.archive.org/web/20200210152622/https://link.springer.com/article/10.1007%2Fs10618-007-0064-z |dead-url=no }}</ref>。一些时间序列的检索方法，例如[[最近鄰居法|K-NN]]的算法中，会使用具有以上特征的距离，根据索引进行初步筛选<ref name="Keogh 2005">{{cite journal |author1=Keogh, Eamonn |author2=Chotirat Ann Ratanamahatana |title=Exact indexing of dynamic time warping |journal=Knowledge and information systems |date=2005 |volume=7 |issue=3 |pages=358-386 |url=https://link.springer.com/article/10.1007/s10115-004-0154-9 |language=en |accessdate=2019-06-04 |archive-date=2019-02-26 |archive-url=https://web.archive.org/web/20190226052223/https://link.springer.com/article/10.1007/s10115-004-0154-9 |dead-url=no }}</ref>。
[[Category:时间序列降维方法]]
== 參考資料 ==
{{Reflist|2}}

== 外部連結 ==
* [https://www.cs.ucr.edu/~eamonn/kais_2000.pdf Dimensionality reduction for fast similarity search in large time series databases（提出分段聚合近似法的原始论文）] {{Wayback|url=https://www.cs.ucr.edu/~eamonn/kais_2000.pdf |date=20190819123152 }}
* [https://jmotif.github.io/sax-vsm_site/morea/algorithm/PAA.html Piecewise Aggregate Approximation of time series] {{Wayback|url=https://jmotif.github.io/sax-vsm_site/morea/algorithm/PAA.html |date=20201109175602 }}
* [https://www.cs.ucr.edu/~eamonn/ Eammon Keogh的个人主页] {{Wayback|url=https://www.cs.ucr.edu/~eamonn/ |date=20201217085753 }}