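Correlation Analysis in Curve Registration of Time Series
Authors: Jiang Gaoxia (姜高霞), Wang Wenjian (王文剑)

Funding: National Natural Science Foundation of China (61273291, 71031006); Research Fund for Returned Overseas Scholars of Shanxi Province (2012-008); Open Fund of the Civil Aviation Information Technology Research Base of China (CAAC-ITRB-201305)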
    Abstract:

    Time series are an important class of objects in data mining. When time series are analyzed without accounting for time lags, correlation can be misjudged; correlation and time lag therefore constrain each other. Building on a study of the correlation and synchronicity of time series, this paper gives a correlation identification method and a curve registration method for pairs of series. First, the causes and characteristics of two types of correlation error are analyzed from the viewpoint of time warping. Next, bounds on the correlation coefficient at a given significance level are derived from its asymptotic distribution, and the two results are combined into a correlation identification method based on the correlation-coefficient features of time-shifted series. Finally, a curve registration model based on maximizing the correlation coefficient is proposed; its range of application is broader than that of the AISE criterion. The model uses a smoothing generalized expectation maximization (S-GEM) algorithm to solve for the time warping function. Numerical experiments on simulated and real data show that the proposed correlation identification method is more effective at recognizing spurious regression than three conventional correlation coefficients and the Granger causality test, and that S-GEM clearly outperforms the continuous monotone registration method (CMRM), self-modeling registration (SMR), and maximum likelihood registration (MLR) in most cases. The paper considers linear correlation between two series and functional curve registration; the results provide a theoretical basis for correlation identification and time alignment in regression analysis, and a reference direction for correlation analysis and curve registration of multiple series.
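    To make the idea of judging correlation from time-shifted series concrete, the following is a minimal sketch and not the paper's method. It assumes a constant integer lag rather than a general time warping function, and it assumes the large-sample approximation r ~ N(0, 1/n) under zero correlation to obtain a bound on |r|. The function names (`pearson`, `lagged_correlations`, `significance_bound`, `best_lag`) and the toy data are hypothetical, introduced only for illustration.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(x, y, max_lag):
    """Map each lag k in [-max_lag, max_lag] to corr(x[t], y[t + k])."""
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = x[: len(x) - k], y[k:]
        else:
            xs, ys = x[-k:], y[: len(y) + k]
        result[k] = pearson(xs, ys)
    return result


def significance_bound(n, alpha=0.05):
    """Two-sided bound on |r| under the null of zero correlation.

    Assumption: the large-sample approximation r ~ N(0, 1/n).
    """
    return NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n)


def best_lag(x, y, max_lag, alpha=0.05):
    """Return (lag, r, significant) for the lag with the largest |r|."""
    corr = lagged_correlations(x, y, max_lag)
    lag = max(corr, key=lambda k: abs(corr[k]))
    n = len(x) - abs(lag)  # number of overlapping samples at this lag
    return lag, corr[lag], abs(corr[lag]) > significance_bound(n, alpha)


# Toy data (hypothetical): y is x delayed by 3 steps.
x = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
y = [0.0] * 3 + x[:-3]
lag, r, significant = best_lag(x, y, max_lag=10)
```

    In this sketch a positive lag means that y trails x. Comparing the largest lagged correlation against the bound, instead of relying only on the lag-0 coefficient, is the kind of check the abstract motivates; the warping-based registration solved by S-GEM is more general and is not reproduced here.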

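Cite this article

Jiang GX, Wang WJ. Correlation analysis in curve registration of time series. Ruan Jian Xue Bao/Journal of Software, 2014,25(9):2002-2017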

History
  • Received: 2014-01-20
  • Last revised: 2014-04-22
  • Published online: 2014-09-09