This article was recommended by Prof. Gao Xinbo, Prof. Li Ming, and Prof. Li Tianrui, guest editors of the special issue on "Machine Learning Methods under Non-classical Conditions".
The objective of streaming data classification is to incrementally learn, from continuously arriving streaming data, a decision function that maps input variables to a label variable, so as to accurately classify test data that may arrive at any time. The online learning paradigm, as an incremental machine learning technique, is an effective tool for the classification of streaming data. This paper surveys, from the perspective of online learning, the state of the art of algorithms for streaming data classification. Specifically, the basic framework and the performance evaluation methodology of online learning are first introduced. Then, the latest developments of online learning algorithms for general streaming data, for alleviating the "curse of dimensionality" in high-dimensional streaming data, and for resolving the "concept drift" problem in evolving streaming data are reviewed respectively. Finally, future challenges and promising research directions for the classification of high-dimensional and evolving streaming data are discussed.
Since the beginning of the 21st century, with the rapid development of the Internet, e-commerce, mobile communication, the Internet of Things, and related technologies, the amount of data that can be collected has grown explosively. Data science driven by "big data" has emerged and attracted close attention from governments, well-known enterprises, and research institutions worldwide. Streaming data is an important source of big data [...]
(1) Unbounded data volume: data arrive continuously in the form of a stream, so storing all the data and scanning them multiple times is impractical. To handle an unbounded amount of data, a learning algorithm must work in a resource-constrained environment: the space it needs to store the learned model should be independent of the number of samples processed.
(2) High arrival rate: this characteristic requires that an algorithm be capable of real-time processing and analysis.
Streaming data classification is an important research task in streaming data mining. It aims to incrementally learn, from streaming data, a mapping function from input variables to a label variable, so as to accurately classify new samples that may arrive at any time. Traditional statistical machine learning algorithms, also known as "batch learning" algorithms, such as the decision tree algorithms ID3, C4.5, and CART, as well as SVM solvers based on sequential minimal optimization, need to scan all available data multiple times during training. Moreover, once training is finished, they cannot be incrementally updated on top of the old model when new training data arrive; a new model must be trained from scratch. They therefore struggle with the unbounded data volume and the real-time analysis demands of streaming data. In contrast, online machine learning (online learning for short), as an incremental machine learning technique, can update the model incrementally in real time. At any point during learning, the current model can be used to predict unseen samples, and once a sample has been processed it usually does not need to be stored or revisited. Online learning is therefore well suited to processing large-scale streaming data.
Besides its advantages in data stream applications, online learning also has certain advantages over statistical machine learning in terms of learning theory. Statistical machine learning assumes a given training data set whose samples are independent and identically distributed according to some unknown distribution [...]
Owing to the theoretical and practical advantages of online learning, and to the significant growth of big data applications in recent years, online learning has become a popular research direction. In particular, in 2007, Shalev-Shwartz et al. used stochastic sub-gradient descent to design an efficient online SVM solver, the Primal Estimated Sub-gradient Solver for SVM (Pegasos) [...]
This survey organizes online learning algorithms mainly according to the characteristics of the streaming data they are designed for; specifically, it reviews the state of the art of online classification algorithms for general streaming data, for high-dimensional streaming data, and for evolving streaming data, respectively. Although Hoi et al. [...]
Section 1 of this paper introduces the basic definition of online learning and its performance evaluation methodology. Sections 2 through 4 provide detailed comparative analyses of online classification algorithms for general streaming data, high-dimensional streaming data, and evolving streaming data, respectively. Section 5 discusses the remaining challenges and promising research directions for classification on high-dimensional and evolving streaming data. Finally, Section 6 concludes the paper.
Online learning provides a scalable and flexible approach to modeling a wide range of prediction problems, such as classification, regression, and ranking [...]
An online learning process can be modeled as a repeated prediction game [...]
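To make the protocol concrete, the following minimal sketch (a hypothetical `stream` iterator and `learner` interface, not any specific published algorithm) spells out the rounds of the game:

```python
# A minimal sketch of the repeated prediction game between a learner
# and the environment (hypothetical interfaces).
def online_learning(stream, learner):
    total_loss = 0.0
    for x_t, y_t in stream:
        y_hat = learner.predict(x_t)             # 1. receive an instance, predict
        total_loss += learner.loss(y_hat, y_t)   # 2. true label revealed, suffer loss
        learner.update(x_t, y_t)                 # 3. update the model incrementally
    return total_loss
```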
[Figure: The process of online learning]
In the above definition, when [...]
Expressions of common loss functions

Loss function | Expression
0-1 loss | $\ell(y, f(\mathbf{x})) = 1$ if $y \neq \mathrm{sign}(f(\mathbf{x}))$, and $0$ otherwise
Logistic loss | $\ell(y, f(\mathbf{x})) = \log\left(1 + \exp(-y f(\mathbf{x}))\right)$
Hinge loss | $\ell(y, f(\mathbf{x})) = \max\{0,\, 1 - y f(\mathbf{x})\}$
Squared hinge loss | $\ell(y, f(\mathbf{x})) = \max\{0,\, 1 - y f(\mathbf{x})\}^2$
Multiclass hinge loss | $\ell(y, f(\mathbf{x})) = \max\{0,\, 1 + \max_{r \neq y} f_r(\mathbf{x}) - f_y(\mathbf{x})\}$
Least squares loss | $\ell(y, f(\mathbf{x})) = (y - f(\mathbf{x}))^2$

(The expressions are restored in standard form for a predictor $f$, with binary labels $y \in \{-1, +1\}$ except in the multiclass case.)
When the decision domain [...]
Online learning uses regret to measure the performance of an algorithm. Let $\ell_t(\mathbf{w}_t)$ denote the loss suffered by the learner's model $\mathbf{w}_t$ in round $t$; the regret after $T$ rounds is defined as
$$\mathrm{Regret}_T = \sum_{t=1}^{T} \ell_t(\mathbf{w}_t) - \min_{\mathbf{w} \in \mathcal{W}} \sum_{t=1}^{T} \ell_t(\mathbf{w}),$$
where $\mathcal{W}$ is the decision domain and the second term is the cumulative loss of the best fixed model chosen in hindsight.
The goal of an online learning algorithm is to minimize the regret over the $T$ rounds; an algorithm is considered good if its regret grows sublinearly in $T$.
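A one-line consequence, included here to make "sublinear" precise: if $\mathrm{Regret}_T = O(\sqrt{T})$, dividing by $T$ gives
$$\frac{1}{T}\sum_{t=1}^{T}\ell_t(\mathbf{w}_t) \;\le\; \min_{\mathbf{w}\in\mathcal{W}} \frac{1}{T}\sum_{t=1}^{T}\ell_t(\mathbf{w}) + O\!\left(\frac{1}{\sqrt{T}}\right),$$
so the learner's average loss approaches that of the best fixed model as $T \to \infty$.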
General streaming data exhibit the simplest characteristics; precisely for this reason, this research area has developed rapidly and produced a wealth of algorithms. Describing all of them is impractical, so only representative methods are introduced here.
[Figure: Taxonomy of online learning algorithms for general streaming data]
Online linear learning aims to learn a linear predictor. Although simple, linear methods can deliver highly competitive results on document-type data [...]
(1) First-order methods
Online gradient descent (OGD) [...]
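As a reference point, here is a minimal OGD sketch for a linear classifier with hinge loss (the step-size schedule $\eta_t = \eta_0/\sqrt{t}$ and the identity projection are illustrative assumptions, not choices prescribed by any particular paper):

```python
import numpy as np

def ogd_hinge(stream, d, eta0=1.0, project=lambda w: w):
    """Online gradient descent with hinge loss on a linear model (sketch).

    `project` maps the iterate back onto the decision domain (identity here).
    """
    w = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):  # labels y in {-1, +1}
        if y * w.dot(x) < 1:                      # positive hinge loss:
            w = project(w + (eta0 / np.sqrt(t)) * y * x)  # step along -subgradient
        # otherwise the subgradient is zero and no update is needed
    return w
```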
The passive-aggressive algorithm (PA) [...]
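For concreteness, the standard PA update for binary classification with hinge loss $\ell_t$ has the closed form
$$\mathbf{w}_{t+1} = \mathbf{w}_t + \tau_t\, y_t\, \mathbf{x}_t,\qquad \tau_t = \frac{\ell_t(\mathbf{w}_t)}{\|\mathbf{x}_t\|^2}\ \ \text{(PA)},\qquad \tau_t = \min\!\left\{C,\ \frac{\ell_t(\mathbf{w}_t)}{\|\mathbf{x}_t\|^2}\right\}\ \ \text{(PA-I)},$$
where $C$ is the aggressiveness parameter: the model stays passive when the hinge loss is zero and otherwise moves just far enough to satisfy the margin constraint.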
The Pegasos algorithm [...]
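A per-example sketch of the Pegasos step, following the published update rule (the optional projection onto the ball of radius $1/\sqrt{\lambda}$ is omitted for brevity):

```python
import numpy as np

def pegasos(stream, d, lam=0.01):
    """Per-example Pegasos step for the SVM objective (sketch):
    eta_t = 1/(lam*t); shrink w by the regularizer gradient, then
    add the example if its margin is violated."""
    w = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):  # labels y in {-1, +1}
        eta = 1.0 / (lam * t)
        violated = y * w.dot(x) < 1               # check margin with current w_t
        w *= 1.0 - eta * lam                      # gradient of (lam/2)*||w||^2
        if violated:
            w += eta * y * x                      # hinge-loss subgradient term
    return w
```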
Unlike OGD and Pegasos, which use a predefined step-size schedule, the multiple eta gradient algorithm (MetaGrad) [...]
(2) Second-order methods
The online Newton step (ONS) [...]
Although ONS converges quickly, each learning round costs [...] time, since the algorithm maintains and updates a $d \times d$ matrix over $d$-dimensional data.
Confidence-weighted learning (CW) [...]
The CW algorithm always forces the current sample to be classified correctly. This constraint makes it highly susceptible to noisy data and thus prone to overfitting, and its probabilistic constraint formulation makes CW hard to extend to other types of learning tasks. Adaptive regularization of weights (AROW) [...]
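A minimal sketch of the AROW-style update, maintaining a Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$ over weight vectors (written from the update equations of Crammer et al. as we recall them; $r$ is the regularization hyperparameter, and a diagonal $\Sigma$ is often substituted for efficiency):

```python
import numpy as np

def arow(stream, d, r=1.0):
    """AROW-style second-order update (sketch)."""
    mu, Sigma = np.zeros(d), np.eye(d)
    for x, y in stream:                        # labels y in {-1, +1}
        margin = y * mu.dot(x)
        if margin < 1:                         # update only when hinge loss > 0
            Sx = Sigma.dot(x)
            beta = 1.0 / (x.dot(Sx) + r)       # confidence along direction x
            alpha = (1.0 - margin) * beta      # loss-scaled step size
            mu += alpha * y * Sx               # move the mean to fix the margin
            Sigma -= beta * np.outer(Sx, Sx)   # shrink variance along x
    return mu, Sigma
```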
Soft confidence-weighted learning (SCW) [...]
Online nonlinear learning aims to learn a nonlinear predictor to handle linearly inseparable problems. Nonlinear classification methods can be divided into kernel methods and non-kernel methods. Kernel methods are mainly SVM-based, whereas most non-kernel methods are based on decision trees. The different learning mechanisms of the two families give each an advantage on data with different characteristics, so research on the two currently develops in parallel.
(1) Kernel methods
Kernel methods can be divided into single-kernel and multiple-kernel methods. Single-kernel methods [...]
Budget-maintenance-based methods keep the number of support vectors bounded above by a predefined budget constant throughout learning; whenever the number of support vectors exceeds the budget, a budget-maintenance step is triggered. Common budget-maintenance strategies include removal, projection, and merging. A removal strategy picks one support vector to delete at a time; existing removal strategies include deleting the oldest support vector [...]
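As an illustration of the removal strategy, the following sketch bounds a kernel perceptron's support set by discarding the oldest support vector (the `kernel` callable and the mistake-driven update are illustrative assumptions):

```python
import numpy as np
from collections import deque

def budget_kernel_perceptron(stream, kernel, budget=100):
    """Kernel perceptron with a removal-based budget (sketch): once the
    support set exceeds `budget`, the oldest support vector is dropped."""
    sv = deque()                                   # (label, instance), oldest first

    def predict(x):
        return sum(y_i * kernel(x_i, x) for y_i, x_i in sv)

    for x, y in stream:                            # labels y in {-1, +1}
        if y * predict(x) <= 0:                    # mistake-driven update
            sv.append((y, x))                      # admit a new support vector
            if len(sv) > budget:
                sv.popleft()                       # budget maintenance: removal
    return sv
```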
Kernel-approximation-based methods [...]
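A representative instance of this idea, shown here purely as an illustration, is the random Fourier feature map, which lets a linear online learner approximate a Gaussian-kernel learner at fixed cost per example:

```python
import numpy as np

def random_fourier_features(X, D=200, gamma=1.0, seed=0):
    """Random Fourier feature map (sketch): z(x).z(x') approximates the
    Gaussian kernel exp(-gamma * ||x - x'||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```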
Coreset-representation-based methods [...]
Online multiple kernel learning [...]
(2) Other methods
Other nonlinear methods are mainly decision-tree-based algorithms [...]
The original Hoeffding tree handles only discrete attributes, and a test instance, after being routed to a leaf, is classified by majority vote. Gama et al. [...]
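For reference, the rule that gives Hoeffding trees their name: with $G$ the split heuristic (e.g., information gain), $R$ its range, and $n$ the number of examples seen at a leaf, the leaf is split on the best attribute $a_1$ once
$$G(a_1) - G(a_2) > \epsilon,\qquad \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}},$$
which guarantees, with probability at least $1-\delta$, that the attribute chosen on the stream matches the one that would be chosen after seeing infinitely many examples.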
Early research on incremental decision trees focused mainly on improving the generalization performance of Hoeffding trees or on extending the Hoeffding tree algorithm. In the past five years, Rutkowski et al. [...]
The full-information setting usually assumes that complete label information is available for all data and that the feature vectors representing the data are cheap to obtain. In many real applications, however, these assumptions do not hold, so a line of work has begun to study online classification under partial or weak information. Compared with the full-information setting, the information available to an algorithm in the partial-information setting is limited, making learning more challenging.
(1) Online multiclass classification with bandit feedback
In multiclass classification tasks with bandit feedback [...]
In 2008, Kakade et al. [...]
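Kakade et al.'s Banditron can be sketched as follows (the exploration rate $\gamma$ and the importance-weighted perceptron update are the essential ingredients; the interface is hypothetical):

```python
import numpy as np

def banditron(stream, d, k, gamma=0.05, seed=0):
    """Banditron sketch: multiclass perceptron under bandit feedback.

    The learner only observes whether the label it played was correct;
    the importance-weighted matrix U keeps the expected update unbiased."""
    rng = np.random.default_rng(seed)
    W = np.zeros((k, d))
    for x, y in stream:                        # y in {0, ..., k-1}
        y_hat = int(np.argmax(W @ x))          # greedy prediction
        p = np.full(k, gamma / k)              # mix in uniform exploration
        p[y_hat] += 1.0 - gamma
        y_play = rng.choice(k, p=p)            # label actually announced
        U = np.zeros((k, d))
        if y_play == y:                        # the only feedback available
            U[y_play] += x / p[y_play]         # importance-weighted reward
        U[y_hat] -= x
        W += U
    return W
```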
In 2009, Chen et al. [...]
In 2011, Hazan et al. [...]
In 2013, Crammer et al. [...]
In 2017, Beygelzimer et al. [...]
Theoretical analysis shows that, for this family of loss functions, the SOBA algorithm achieves a regret of roughly $\tilde O(\sqrt{T})$ [...]
(2) Online active learning with partial labels
In online spam filtering systems [...]
In 2004, Cesa-Bianchi [...]
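The flavor of this line of work can be seen in the classic margin-based query rule, sketched below (the label oracle `query_label` is a hypothetical interface; $b > 0$ trades label cost against accuracy):

```python
import numpy as np

def selective_sampling(stream, d, b=1.0, seed=0):
    """Label-efficient perceptron (sketch): query the label with probability
    b / (b + |margin|), so labels are requested mostly on uncertain rounds."""
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    for x, query_label in stream:              # query_label() returns y on demand
        margin = w.dot(x)
        if rng.random() < b / (b + abs(margin)):
            y = query_label()                  # pay the labeling cost
            if y * margin <= 0:
                w += y * x                     # perceptron update on mistakes
    return w
```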
(3) Online learning with partial attributes
In some typical streaming data applications, obtaining the attribute values of an instance is costly [...]
In 2010, Cesa-Bianchi et al. [...]
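The key device in this setting is an importance-weighted, unbiased reconstruction of the example from the few attributes actually observed; a sketch under uniform sampling of k of the d attributes:

```python
import numpy as np

def sampled_instance_estimate(x_full, k, rng):
    """Observe only k of the d attributes and return an unbiased estimate of x:
    each coordinate is kept with probability k/d and re-scaled by d/k, so
    E[x_tilde] = x. Only the k sampled coordinates of x_full are accessed."""
    d = len(x_full)
    idx = rng.choice(d, size=k, replace=False)   # the attributes we pay for
    x_tilde = np.zeros(d)
    x_tilde[idx] = x_full[idx] * (d / k)         # importance re-weighting
    return x_tilde
```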
The above work addresses regression problems; how to extend it efficiently to the loss functions commonly used in classification remains an open problem. Research has already shown, however, that unbiased gradient-estimation methods cannot easily be applied to non-smooth loss functions such as the hinge loss [...]
In many application domains, the feature vectors representing the data are of very high dimension, and non-sparse learning algorithms may then suffer from the "curse of dimensionality" and overfit. Sparse models, which drastically reduce the data dimensionality, help build reliable classification models and are a typical remedy for the curse of dimensionality [...]
In the sparse online learning problem, a linear predictor [...]
Recent work in sparse online learning

Sparsity strategy | Reference/method
[...] | [...]
[...] | FOBOS [...]
[...] | OFS [...]
Based on [...] | [...]
Duchi et al. [...]
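With the $\ell_1$ regularizer, the FOBOS backward (proximal) step has a closed form, namely coordinate-wise soft thresholding; a one-step sketch:

```python
import numpy as np

def fobos_l1_step(w, grad, eta, lam):
    """One FOBOS step with an L1 regularizer (sketch): a forward gradient step,
    then the closed-form backward (proximal) step, which zeroes every
    coordinate whose magnitude falls below eta * lam."""
    w_half = w - eta * grad
    return np.sign(w_half) * np.maximum(np.abs(w_half) - eta * lam, 0.0)
```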
The truncation that FOBOS applies directly to the intermediate solution at every iteration is too aggressive and can prevent effective updates on sparse attributes. In contrast, Langford et al. [...]
In 2010, Xiao [...]
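In Xiao's notation (as we recall it; $\bar g_t$ is the time-averaged subgradient and $\gamma > 0$ a constant), the $\ell_1$-RDA solution per round is an entry-wise truncation of the averaged gradient:
$$w_{t+1,i} = \begin{cases} 0, & \text{if } |\bar g_{t,i}| \le \lambda,\\[4pt] -\dfrac{\sqrt{t}}{\gamma}\left(\bar g_{t,i} - \lambda\,\mathrm{sign}(\bar g_{t,i})\right), & \text{otherwise}, \end{cases}$$
which is why truncating the whole gradient history, rather than the current iterate, yields more aggressive and more stable sparsity.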
Both RDA and COMID (composite objective mirror descent) explore how to exploit the regularization structure of the objective to improve model sparsity, but neither makes full use of the subgradient information from early iterations, which helps identify discriminative but rarely occurring features. Hence, Duchi et al. [...]
For both the full-matrix and diagonal-matrix algorithms, Duchi et al. prove that regret comparable to [...]
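A minimal sketch of the diagonal variant (AdaGrad-style per-coordinate step sizes decay with each coordinate's own gradient history, so rare but informative features keep large effective learning rates):

```python
import numpy as np

def adagrad_diagonal(stream, d, eta=0.1, eps=1e-8):
    """Diagonal adaptive subgradient method with hinge loss (sketch)."""
    w = np.zeros(d)
    g_sq = np.zeros(d)                         # running sum of squared gradients
    for x, y in stream:                        # labels y in {-1, +1}
        if y * w.dot(x) < 1:                   # hinge-loss subgradient g = -y*x
            g = -y * x
            g_sq += g * g
            w -= eta * g / (np.sqrt(g_sq) + eps)   # per-coordinate step size
    return w
```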
In 2014, Wang et al. [...]
Jin et al. [...]
Zhai et al. [...]
OFS and SALC are both truncation methods built on the first-order algorithm OGD and do not exploit second-order information to aid learning. Wu et al. [...]
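The hard truncation underlying such OGD-based online feature selection can be sketched in a few lines (keep the $B$ largest-magnitude weights; ties at the threshold may retain slightly more than $B$ coordinates):

```python
import numpy as np

def truncate_top_b(w, B):
    """Keep only the B largest-magnitude weights, zeroing out the rest."""
    if np.count_nonzero(w) <= B:
        return w
    thresh = np.sort(np.abs(w))[-B]            # B-th largest magnitude
    return np.where(np.abs(w) >= thresh, w, 0.0)
```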
Zhai et al. [...]
Current work on classifying evolving streaming data still concentrates on the full-information setting. In evolving streaming data, the data in the stream no longer satisfy the i.i.d. assumption: the data distribution evolves over time, a phenomenon known as "concept drift". Formally, concept drift means that the functional relationship between the input variables and the label variable changes unpredictably over time [...]
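A common formalization, following Gama et al.'s survey: concept drift occurs between time points $t_0$ and $t_1$ if
$$\exists\,\mathbf{x}:\ p_{t_0}(\mathbf{x}, y) \neq p_{t_1}(\mathbf{x}, y),$$
where a change in the posterior $p(y \mid \mathbf{x})$ is called real drift, and a change in $p(\mathbf{x})$ alone is called virtual drift.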
Existing classification algorithms for evolving streaming data can be divided into single-model and multi-model algorithms [...]
Multi-model algorithms, also called adaptive ensemble algorithms, maintain several incremental models in memory and make joint predictions. Multi-model methods can delete outdated models and create up-to-date ones, thereby forgetting old concepts and adapting to the newest concept, so they handle concept drift more flexibly. Existing algorithms fall into two classes: the first is block-based (chunk-based) ensembles [...]
A summary of online ensemble algorithms

Algorithm | Component classifier | Diversity strategy | Component evaluation | Aggregation strategy | Drift detection | Structure adjustment
OzaBag [...] | any online classifier | resampling | - | majority voting | - | ×
DWM [...] | any online classifier | periodic creation of new components | exponential decrease | weighted majority voting | × | √
ASHTBag [...] | Hoeffding trees | resampling + tree-size control | exponentially weighted moving average | weighted majority voting | × | ×
ADWINBag [...] | any online classifier | resampling | ADWIN [...] | majority voting | √ | √
LevBag [...] | any online classifier | resampling | ADWIN | majority voting | √ | √
DDD [...] | any ensemble | different resampling parameters | sequential accuracy | selective weighted majority voting | √ | ×
OAUE [...] | any online classifier | periodic creation of new components | mean squared error | weighted majority voting | × | √
EBPegasos [...] | BPegasos [...] | control of predictor-parameter magnitude | ADWIN | weighted majority voting | √ | √
In 2005, Oza [...]
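Oza's online bagging replaces bootstrap resampling with Poisson(1) example weighting, which makes bagging possible in a single pass; a sketch (the incremental `update`/`predict` interface of the base learners is assumed):

```python
import numpy as np

def oza_bagging(stream, base_learners, seed=0):
    """Online bagging (OzaBag) sketch: each base learner trains on each
    example k ~ Poisson(1) times, mimicking bootstrap resampling online."""
    rng = np.random.default_rng(seed)
    for x, y in stream:
        for learner in base_learners:
            for _ in range(rng.poisson(1.0)):
                learner.update(x, y)
    return base_learners

def predict_majority(base_learners, x):
    votes = [learner.predict(x) for learner in base_learners]
    return max(set(votes), key=votes.count)    # unweighted majority vote
```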
In 2007, Kolter et al. [...]
In 2009, Bifet et al. [...]
Bifet et al. [...]
In 2010, Bifet et al. [...]
In 2012, Minku et al. [...]
In 2014, Brzezinski et al. [...]
In 2017, Zhai et al. [...]
We have analyzed several typical online ensemble algorithms above; for a comparative analysis of block-based ensemble algorithms, readers may refer to Xu et al. [...]
The significant growth of big data application demands has driven rapid development in the online learning field, and an increasingly rich body of methods has emerged.
The degree of richness of research on streaming data classification in different settings

Streaming data characteristics | Online learning (full information) | Online learning (partial information)
General | ∆∆∆∆∆ | ∆∆∆
High-dimensional | ∆∆∆ | ∆
Evolving (concept drift) | ∆∆∆ | ∆

As can be seen from [...]
[Table: Promising research directions for classification of high-dimensional and/or evolving streaming data]
(1) Sparse online learning / online feature selection with partial attributes on high-dimensional streaming data
In this setting, for the algorithm to converge to a good sparse model, when selecting the set of attribute values to observe in each iteration it must, on the one hand, make full use of the information already available to judge which attributes are most relevant and, on the other hand, explore new, potentially important attributes. The algorithm needs a sensible trade-off between exploration and exploitation, must update the learned model effectively from the partial attribute information obtained, and must also ensure that the learned model satisfies the given sparsity constraint. So far, Wang et al. [...]
(2) Sparse online learning / online feature selection with partial labels on high-dimensional streaming data
In this setting, to build a sparse predictive model that is as accurate as possible under a limited labeling budget, the algorithm must carefully choose which samples to label, while making full use of the labeled samples already obtained to build the sparse model. Although online active learning has attracted increasing attention in recent years [...]
(3) Online single-model/ensemble learning with partial attributes on evolving streaming data
In this setting, when choosing which attributes to observe, the algorithm should select the attribute set most relevant to the current concept. The challenge is that concepts drift, so the attribute set most relevant to classification performance changes as well. There is little research on this problem so far, and the gap urgently needs to be filled.
(4) Online single-model/ensemble learning with partial labels on evolving streaming data
In this setting, when the algorithm selects samples to label, it must ensure that the chosen samples reflect the latest data distribution as closely as possible. The challenge is that concept drift changes the data distribution, requiring the algorithm to detect drift as early as possible and adjust its sample-selection strategy accordingly. Current research on active learning concentrates on stationary streaming data; only a small amount of work can handle evolving streaming data with concept drift [...]
Data in application domains such as the Internet, e-commerce, mobile communication, and the Internet of Things are often generated continuously in the form of streams. The data are large-scale and arrive at high speed, and they must be processed and analyzed in real time. Traditional batch machine learning methods can hardly meet these requirements; in contrast, online learning, whose online-update computation pattern closely matches the characteristics of streaming data, is expected to become a mainstream methodology in the streaming data field. This paper classified and compared existing online classification algorithms for streaming data along two dimensions: the characteristics of the streaming data (general, high-dimensional, evolving) and the characteristics of online learning (full information, partial information). The survey found that existing online classification methods concentrate on the full-information setting, while research in the partial-information setting is scarce, especially on high-dimensional and evolving streaming data. One reason is that in the partial-information setting the information available to an algorithm is very limited (e.g., labels of some instances are hard to obtain, and attribute values of some instances are missing), which greatly increases the difficulty of learning. Considering that the partial-information setting better matches real streaming applications, and in light of the current state of research on streaming data classification, we have identified several promising future research directions for online learning in streaming data classification.
Aggarwal CC. A survey of stream classification algorithms. In: Data Classification: Algorithms and Applications. CRC Press, 2014. 245-274.
Krempl G, Zliobaite I, Brzezinski D, Hüllermeier E, Last M, Lemaire V, Noack T, Shaker A, Sievi S, Spiliopoulou M, Stefanowski J. Open challenges for data stream mining research. SIGKDD Explorations, 2014, 16(1):1-10.
Zhai TT. Online learning algorithms for classification of streaming data[Ph.D. Thesis]. Nanjing: Nanjing University, 2018(in Chinese with English abstract).
Vapnik V. An overview of statistical learning theory. IEEE Trans. on Neural Networks, 1999, 10(5):988-999.
Shalev-Shwartz S, Singer Y. Online learning: Theory, algorithms, and applications[Ph.D. Thesis]. Jerusalem: Hebrew University, 2007.
Shalev-Shwartz S. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 2012, 4(2):107-194.
Hazan E. Introduction to online convex optimization. Foundations and Trends in Optimization, 2016, 2(3/4):157-325.
Cesa-Bianchi N, Conconi A, Gentile C. On the generalization ability of online learning algorithms. IEEE Trans. on Information Theory, 2004, 50(9):2050-2057.
Shalev-Shwartz S, Singer Y, Srebro N. Pegasos: Primal estimated sub-gradient solver for SVM. In: Proc. of the Int'l Conf. on Machine Learning (ICML 2007). 2007. 807-814.
Zhang L, Yi J, Jin R, Lin M, He X. Online kernel learning with a near optimal sparsity bound. In: Proc. of the Int'l Conf. on Machine Learning (ICML 2013). 2013. 621-629.
Daniely A, Gonen A, Shalev-Shwartz S. Strongly adaptive online learning. In: Proc. of the Int'l Conf. on Machine Learning (ICML 2015). 2015. 1405-1411.
György A, Szepesvári C. Shifting regret, mirror descent, and matrices. In: Proc. of the Int'l Conf. on Machine Learning (ICML 2016). 2016. 2943-2951.
Shamir O, Szlak L. Online learning with local permutations and delayed feedback. In: Proc. of the 34th Int'l Conf. on Machine Learning (ICML 2017). 2017. 3086-3094.
Quanrud K, Khashabi D. Online learning with adversarial delays. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2015). 2015. 1270-1278.
Luo H, Agarwal A, Cesa-Bianchi N, Langford J. Efficient second order online learning by sketching. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2016). 2016. 902-910.
Erven T, Koolen WM. MetaGrad: Multiple learning rates in online learning. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2016). 2016. 3666-3674.
Luo H, Wei CY, Zheng K. Efficient online portfolio with logarithmic regret. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2018). 2018. 8245-8255.
Gillen S, Jung C, Kearns MJ, Roth A. Online learning with an unknown fairness metric. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2018). 2018. 2605-2614.
Lu J, Hoi SCH, Wang J, Zhao P, Liu Z. Large scale online kernel learning. Journal of Machine Learning Research, 2016, 17:47:1-47:43.
Shi T, Zhu J. Online Bayesian passive-aggressive learning. Journal of Machine Learning Research, 2017, 18:33:1-33:39.
Le T, Nguyen TD, Nguyen V, Phung DQ. Approximation vector machines for large-scale online learning. Journal of Machine Learning Research, 2017, 18:111:1-111:55.
Lei Y, Shi L, Guo Z. Convergence of unregularized online learning algorithms. Journal of Machine Learning Research, 2017, 18:171:1-171:33.
Chaudhuri S, Tewari A. Online learning to rank with top-k feedback. Journal of Machine Learning Research, 2017, 18:103:1-103:50.
Wang J, Zhao P, Hoi SCH, Jin R. Online feature selection and its applications. IEEE Trans. on Knowledge and Data Engineering, 2014, 26(3):698-710.
Wang J, Wang M, Li P, Liu L, Zhao Z, Hu X, Wu X. Online feature selection with group structure analysis. IEEE Trans. on Knowledge and Data Engineering, 2015, 27(11):3029-3041.
Hao S, Lu J, Zhao P, Zhang C, Hoi SCH, Miao C. Second-order online active learning and its applications. IEEE Trans. on Knowledge and Data Engineering, 2018, 30(7):1338-1351.
http://arxiv.org/abs/1802.02871
Li ZJ, Li YX, Wang F, He GL, Kuang L. Online learning algorithms for big data analytics:A survey. Journal of Computer Research and Development, 2015, 52(8):1707-1721(in Chinese with English abstract).
Pan ZS, Tang SQ, Qiu JY, Hu GY. Survey on online learning algorithms. Journal of Data Acquisition & Processing, 2016, 31(6):1067-1082(in Chinese with English abstract).
Rosenblatt F. The perceptron:A probabilistic model for information storage and organization in the brain. Psychological Review, 1958, 65(6):386-408.
Shalev-Shwartz S, Singer Y, Srebro N, Cotter A. Pegasos:Primal estimated sub-gradient solver for SVM. Mathematical Programming, 2011, 127(1):3-30.
Zinkevich M. Online convex programming and generalized infinitesimal gradient ascent. In: Proc. of the Int'l Conf. on Machine Learning (ICML). 2003. 928-936.
Cesa-Bianchi N, Lugosi G. Prediction, Learning, and Games. New York:Cambridge University Press, 2006. 40-66.
Yuan GX, Ho CH, Lin CJ. Recent advances of large-scale linear classification. Proc. of the IEEE, 2012, 100(9):2584-2603.
Crammer K, Dekel O, Keshet J, Shalev-Shwartz S, Singer Y. Online passive-aggressive algorithms. Journal of Machine Learning Research, 2006, 7:551-585.
Hazan E, Agarwal A, Kale S. Logarithmic regret algorithms for online convex optimization. Machine Learning, 2007, 69(2/3):169-192.
Dredze M, Crammer K, Pereira F. Confidence-weighted linear classification. In: Proc. of the Int'l Conf. on Machine Learning (ICML). 2008. 264-271.
Crammer K, Dredze M, Pereira F. Exact convex confidence-weighted learning. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2008). 2008. 345-352.
Crammer K, Dredze M, Pereira F. Confidence-weighted linear classification for text categorization. Journal of Machine Learning Research, 2012, 13:1891-1926.
Crammer K, Kulesza A, Dredze M. Adaptive regularization of weight vectors. Machine Learning, 2013, 91(2):155-187.
Wang J, Zhao P, Hoi SCH. Exact soft confidence-weighted learning. In: Proc. of the Int'l Conf. on Machine Learning (ICML). 2012.
Kivinen J, Smola AJ, Williamson RC. Online learning with kernels. IEEE Trans. on Signal Processing, 2004, 52(8):2165-2176.
Wang Z, Crammer K, Vucetic S. Breaking the curse of kernelization:Budgeted stochastic gradient descent for large-scale SVM training. Journal of Machine Learning Research, 2012, 13(1):3103-3131.
Dekel O, Shalev-Shwartz S, Singer Y. The forgetron: A kernel-based perceptron on a fixed budget. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2005). 2005. 259-266.
Cavallanti G, Cesa-Bianchi N, Gentile C. Tracking the best hyperplane with a simple budget perceptron. Machine Learning, 2007, 69(2/3):143-167.
Wang Z, Vucetic S. Online passive-aggressive algorithms on a budget. In: Proc. of the 13th Int'l Conf. on Artificial Intelligence and Statistics (AISTATS 2010). 2010. 908-915.
Zhao P, Wang J, Wu P, Jin R, Hoi SCH. Fast bounded online gradient descent algorithms for scalable kernel-based online learning. In: Proc. of the Int'l Conf. on Machine Learning (ICML). Edinburgh, 2012.
Orabona F, Keshet J, Caputo B. Bounded kernel-based online learning. Journal of Machine Learning Research, 2009, 10:2643-2666.
Wang Z, Vucetic S. Twin vector machines for online learning on a budget. In: Proc. of the SIAM Int'l Conf. on Data Mining (SDM 2009). 2009. 906-917.
Jin R, Hoi SCH, Yang T. Online multiple kernel learning: Algorithms and mistake bounds. In: Proc. of the 21st Int'l Conf. on Algorithmic Learning Theory (ALT 2010). 2010. 390-404.
Hoi SCH, Jin R, Zhao P, Yang T. Online multiple kernel classification. Machine Learning, 2013, 90(2):289-316.
Diethe T, Girolami MA. Online learning with (multiple) kernels:A review. Neural Computation, 2013, 25(3):567-625.
Lu J, Hoi SCH, Sahoo D, Zhao P. Budget online multiple kernel learning. CoRR, 2015, abs/1511.04813.
Domingos P, Hulten G. Mining high-speed data streams. In: Proc. of the 6th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2000). 2000. 71-80.
Jin R, Agrawal G. Efficient decision tree construction on streaming data. In: Proc. of the 9th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2003). 2003. 571-576.
Gama J, Rocha R, Medas P. Accurate decision trees for mining high-speed data streams. In: Proc. of the 9th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2003). 2003. 523-528.
Holmes G, Kirkby R, Pfahringer B. Stress-testing hoeffding trees. In: Proc. of the 9th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005). 2005. 495-502.
Pfahringer B, Holmes G, Kirkby R. New options for Hoeffding trees. In: Proc. of the 20th Australian Joint Conf. on Artificial Intelligence. 2007. 90-99.
Hashemi S, Yang Y, Mirzamomen Z, Kangavari MR. Adapted one-versus-all decision trees for data stream classification. IEEE Trans. on Knowledge and Data Engineering, 2009, 21(5):624-637.
Rutkowski L, Pietruczuk L, Duda P, Jaworski M. Decision trees for mining data streams based on the McDiarmid's bound. IEEE Trans. on Knowledge and Data Engineering, 2013, 25(6):1272-1279.
Rutkowski L, Jaworski M, Pietruczuk L, Duda P. Decision trees for mining data streams based on the Gaussian approximation. IEEE Trans. on Knowledge and Data Engineering, 2014, 26(1):108-119.
Rutkowski L, Jaworski M, Pietruczuk L, Duda P. The CART decision tree for mining data streams. Information Sciences, 2014, 266:1-15.
Liang C, Zhang Y, Shi P, Hu Z. Learning accurate very fast decision trees from uncertain data streams. Int'l Journal of Systems Science, 2015, 46(16):3032-3050.
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R. New ensemble methods for evolving data streams. In: Proc. of the 15th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2009). 2009. 139-148.
Bifet A, Frank E, Holmes G, Pfahringer B. Accurate ensembles for data streams: Combining restricted Hoeffding trees using stacking. In: Proc. of the 2nd Asian Conf. on Machine Learning (ACML 2010). 2010. 225-240.
Pham XC, Dang MT, Dinh SV, Hoang S, Nguyen TT, Liew AWC. Learning from data stream based on random projection and Hoeffding tree classifier. In: Proc. of the Int'l Conf. on Digital Image Computing: Techniques and Applications (DICTA 2017). 2017. 1-8.
Rutkowski L, Jaworski M, Pietruczuk L, Duda P. A new method for data stream mining based on the misclassification error. IEEE Trans. on Neural Networks and Learning Systems, 2015, 26(5):1048-1059.
Jaworski M, Duda P, Rutkowski L. New splitting criteria for decision trees in stationary data streams. IEEE Trans. on Neural Networks and Learning Systems, 2018, 29(6):2516-2529.
Kakade SM, Shalev-Shwartz S, Tewari A. Efficient bandit algorithms for online multiclass prediction. In: Proc. of the Int'l Conf. on Machine Learning (ICML 2008). 2008. 440-447.
Valizadegan H, Jin R, Wang S. Learning to trade off between exploration and exploitation in multiclass bandit prediction. In: Proc. of the 17th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2011). 2011. 204-212.
Chen G, Chen G, Zhang J, Chen S, Zhang C. Beyond banditron: A conservative and efficient reduction for online multiclass prediction with bandit setting model. In: Proc. of the IEEE Int'l Conf. on Data Mining (ICDM 2009). 2009. 71-80.
Hazan E, Kale S. Newtron: An efficient bandit algorithm for online multiclass prediction. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2011). 2011. 891-899.
Crammer K, Gentile C. Multiclass classification with bandit feedback using adaptive regularization. Machine Learning, 2013, 90(3):347-383.
Beygelzimer A, Orabona F, Zhang C. Efficient online bandit multiclass learning with $\tilde O(\sqrt{T})$ regret. In: Proc. of the 34th Int'l Conf. on Machine Learning (ICML 2017). 2017. 488-497.
Allwein EL, Schapire RE, Singer Y. Reducing multiclass to binary:A unifying approach for margin classifiers. Journal of Machine Learning Research, 2000, 1:113-141.
Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002, 47(2/3):235-256.
Sculley D. Online active learning methods for fast label-efficient spam filtering. In: Proc. of the 4th Conf. on Email and Anti-Spam (CEAS 2007). 2007.
Chu W, Zinkevich M, Li L, Thomas A, Tseng BL. Unbiased online active learning in data streams. In: Proc. of the 17th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2011). 2011. 195-203.
Lughofer E, Pratama M. Online active learning in data stream regression using uncertainty sampling based on evolving generalized fuzzy models. IEEE Trans. on Fuzzy Systems, 2018, 26(1):292-309.
Cesa-Bianchi N, Gentile C, Zaniboni L. Worst-case analysis of selective sampling for linear-threshold algorithms. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2004). 2004. 241-248.
Cesa-Bianchi N, Gentile C, Zaniboni L. Worst-case analysis of selective sampling for linear classification. Journal of Machine Learning Research, 2006, 7:1205-1230.
Littlestone N. Learning quickly when irrelevant attributes abound:A new linear-threshold algorithm. Machine Learning, 1988, 2(4):285-318.
Zhao P, Hoi SCH. Cost-sensitive online active learning with application to malicious URL detection. In: Proc. of the 19th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2013). 2013. 919-927.
Lu J, Zhao P, Hoi SCH. Online passive-aggressive active learning. Machine Learning, 2016, 103(2):141-183.
Hao S, Zhao P, Lu J, Hoi SCH, Miao C, Zhang C. SOAL: Second-order online active learning. In: Proc. of 16th IEEE Int'l Conf. on Data Mining (ICDM 2016). 2016. 931-936.
Lughofer E. Online active learning:A new paradigm to improve practical useability of data stream modeling methods. Information Sciences, 2017, 415:356-376.
Cesa-Bianchi N, Shalev-Shwartz S, Shamir O. Efficient learning with partially observed attributes. Journal of Machine Learning Research, 2011, 12:2857-2878.
Zolghadr N, Bartók G, Greiner R, György A, Szepesvári C. Online learning with costly features and labels. In: Proc. of the Advances in Neural Information Processing Systems (NIPS 2013). 2013. 1241-1249.
Hazan E, Koren T. Linear regression with limited observation. In: Proc. of the 29th Int'l Conf. on Machine Learning (ICML 2012). 2012.
Cesa-Bianchi N, Shalev-Shwartz S, Shamir O. Online learning of noisy data. IEEE Trans. on Information Theory, 2011, 57(12):7907-7931.
Figueiredo MAT. Adaptive sparseness for supervised learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2003, 25(9):1150-1159.
Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003, 3:1157-1182.
Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004, 5:1205-1224.
Brown G, Pocock AC, Zhao MJ, Luján M. Conditional likelihood maximisation:A unifying framework for information theoretic feature selection. Journal of Machine Learning Research, 2012, 13:27-66.
Tan M, Tsang IW, Wang L. Towards ultrahigh dimensional feature selection for big data. Journal of Machine Learning Research, 2014, 15(1):1371-1429.
Rao NS, Nowak RD, Cox CR, Rogers TT. Classification with the sparse group lasso. IEEE Trans. on Signal Processing, 2016, 64(2):448-463.
Shalev-Shwartz S, Srebro N, Zhang T. Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM Journal on Optimization, 2010, 20(6):2807-2832.
Duchi JC, Shalev-Shwartz S, Singer Y, Chandra T. Efficient projections onto the ℓ1 ball for learning in high dimensions. In: Proc. of the Int'l Conf. on Machine Learning (ICML 2008). 2008. 272-279.
Condat L. Fast projection onto the simplex and the ℓ1 ball. Mathematical Programming, 2016, 158(1/2):575-585.
Duchi JC, Singer Y. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 2009, 10:2899-2934.
Langford J, Li L, Zhang T. Sparse online learning via truncated gradient. Journal of Machine Learning Research, 2009, 10:777-801.
Xiao L. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 2010, 11:2543-2596.
Duchi JC, Shalev-Shwartz S, Singer Y, Tewari A. Composite objective mirror descent. In: Proc. of the 23rd Conf. on Learning Theory (COLT 2010). 2010. 14-26.
Duchi JC, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011, 12:2121-2159.
Wang D, Wu P, Zhao P, Wu Y, Miao C, Hoi SCH. High-dimensional data stream classification via sparse online learning. In: Proc. of the IEEE Int'l Conf. on Data Mining (ICDM 2014). 2014. 1007-1012.
Zhai T, Koriche F, Wang H, Gao Y. Tracking sparse linear classifiers. IEEE Trans. on Neural Networks and Learning Systems, 2019, 30(7):2079-2092.[doi:10.1109/TNNLS.2018.2877433]
Wu Y, Hoi SCH, Mei T, Yu N. Large-scale online feature selection for ultra-high dimensional sparse data. ACM Trans. on Knowledge Discovery from Data, 2017, 11(4):48:1-48:22.
doi: 10.1007/978-3-030-10928-8_26.
Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A. A survey on concept drift adaptation. ACM Computing Surveys, 2014, 46(4):44:1-44:37.
Zhai T, Gao Y, Wang H, Cao L. Classification of high-dimensional evolving data streams via a resource-efficient online ensemble. Data Mining and Knowledge Discovery, 2017, 31(5):1242-1265.
Hosseini MJ, Gholipour A, Beigy H. An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams. Knowledge and Information Systems, 2016, 46(3):567-597.
Gama J, Fernandes R, Rocha R. Decision trees for mining data streams. Intelligent Data Analysis, 2006, 10(1):23-45.
Bifet A, Holmes G, Pfahringer B, Frank E. Fast perceptron decision tree learning from evolving data streams. In: Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2010. 299-310.
Bifet A, Pfahringer B, Read J, Holmes G. Efficient data stream classification via probabilistic adaptive windows. In: Proc. of the 28th Annual ACM Symp. on Applied Computing (SAC 2013). 2013. 801-806.
Hulten G, Spencer L, Domingos P. Mining time-changing data streams. In: Proc. of the 7th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2001). 2001. 97-106.
Elwell R, Polikar R. Incremental learning of concept drift in nonstationary environments. IEEE Trans. on Neural Networks, 2011, 22(10):1517-1531.
Brzezinski D, Stefanowski J. Accuracy updated ensemble for data streams with concept drift. In: Proc. of the Int'l Conf. on Hybrid Artificial Intelligence Systems. 2011. 155-163.
Brzezinski D, Stefanowski J. Reacting to different types of concept drift:The accuracy updated ensemble algorithm. IEEE Trans. on Neural Networks and Learning Systems, 2014, 25(1):81-94.
Bonab HR, Can F. GOOWE:Geometrically optimum and online-weighted ensemble classifier for evolving data streams. ACM Trans. on Knowledge Discovery from Data, 2018, 12(2):25:1-25:33.
http://www.jos.org.cn/1000-9825/4747.htm [doi: 10.13328/j.cnki.jos.004747]
Oza NC. Online bagging and boosting. In: Proc. of the IEEE Int'l Conf. on Systems, Man and Cybernetics (SMC 2005). 2005. 2340-2345.
Kolter JZ, Maloof MA. Dynamic weighted majority:An ensemble method for drifting concepts. Journal of Machine Learning Research, 2007, 8:2755-2790.
Bifet A, Gavalda R. Learning from time-changing data with adaptive windowing. In: Proc. of the 7th SIAM Int'l Conf. on Data Mining (SDM 2007). 2007. 443-448.
Bifet A, Holmes G, Pfahringer B. Leveraging bagging for evolving data streams. In: Proc. of the Joint European Conf. on Machine Learning and Knowledge Discovery in Databases. 2010. 135-150.
Minku LL, Yao X. DDD:A new ensemble approach for dealing with concept drift. IEEE Trans. on Knowledge and Data Engineering, 2012, 24(4):619-633.
Brzezinski D, Stefanowski J. Combining block-based and online methods in learning ensembles from concept drifting data streams. Information Sciences, 2014, 265:50-67.
Minku LL, White AP, Yao X. The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans. on Knowledge and Data Engineering, 2010, 22(5):730-742.
Gama J, Sebastiao R, Rodrigues PP. On evaluating stream learning algorithms. Machine Learning, 2013, 90(3):317-346.
http://www.arocmag.com/article/01-2020-01-001.html [doi: 10.19734/j.issn.1001-3695.2018.09.0510]
Cabral DRL, Barros RSM. Concept drift detection based on Fisher's exact test. Information Sciences, 2018, 442/443:220-234.
Pesaranghader A, Viktor H, Paquet E. Reservoir of diverse adaptive learners and stacking fast Hoeffding drift detection methods for evolving data streams. Machine Learning, 2018, 107(11):1711-1743.
Liu A, Lu J, Liu F, Zhang G. Accumulating regional density dissimilarity for concept drift detection in data streams. Pattern Recognition, 2018, 76:256-272.
Pan WB, Cheng G, Guo XJ, Huang SX. An adaptive classification approach based on information entropy for network traffic in presence of concept drift. Chinese Journal of Computers, 2017, 40(7):1556-1571(in Chinese with English abstract).
Foster D, Kale S, Karloff H. Online sparse linear regression. In: Proc. of the 29th Annual Conf. on Learning Theory (COLT 2016). 2016, 49: 960-970.
Zliobaite I, Bifet A, Pfahringer B, Holmes G. Active learning with drifting streaming data. IEEE Trans. on Neural Networks and Learning Systems, 2014, 25(1):27-39.
Mohamad S, Mouchaweh MS, Bouchachia A. Active learning for data streams under concept drift and concept evolution. In: Proc. of the Workshop on Large-scale Learning from Data Streams in Evolving Environments (STREAMEVOLV 2016) co-located with the 2016 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016). 2016.
Park CH, Kang Y. An active learning method for data streams with concept drift. In: Proc. of the IEEE Int'l Conf. on Big Data (BigData 2016). 2016. 746-752.