Twin Support Function Machine for Set-valued Data

Chinese Library Classification (CLC) number: TP18
Abstract:

Twin support vector machine (TSVM) can effectively handle cross-plane and XOR-type data. When applied to set-valued data, however, TSVM usually relies on summary statistics of each set-valued object, such as its mean or median. In contrast, this study proposes the twin support function machine (TSFM), which deals with set-valued data directly. Using the support functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM adopts the pinball loss function and introduces weights for the set-valued objects. Because TSFM involves optimization problems in an infinite-dimensional space, the measure is restricted to a linear combination of Dirac measures, which yields an optimization model in a finite-dimensional space. To solve this model effectively, a sampling strategy is used to transform it into quadratic programming (QP) problems, and the dual formulations of the QP problems are derived; these duals provide a theoretical basis for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued object to a hyperplane in the Banach space is defined, and the decision rule is derived from it. The study also considers the kernelization of support functions to capture nonlinear features of the data, which makes the proposed model applicable to indefinite kernels. Experimental results show that TSFM can capture the intrinsic structure of cross-plane set-valued data and achieves good classification performance in the presence of outliers or when set-valued objects contain only a few high-dimensional examples.
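The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each set-valued object is a finite set of points stored as the rows of a NumPy array, that the support function takes the usual form (the maximum inner product between a direction and the members of the set), and that the pinball loss uses the common parametrization with a slope parameter `tau`. The function names `support_function`, `pinball_loss`, and `dirac_score`, as well as the example numbers, are hypothetical and chosen only for illustration; the paper's exact formulation, weights, and QP model may differ.

```python
import numpy as np


def support_function(points, direction):
    """Support function of a finite set: the maximum of <x, direction> over its members.

    Assumption: `points` has shape (n_members, n_features).
    """
    return float(np.max(points @ direction))


def pinball_loss(u, tau=0.5):
    """Pinball loss in a common parametrization: u if u >= 0, otherwise -tau * u."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, u, -tau * u)


def dirac_score(points, directions, alphas, bias=0.0):
    """Score of a set under a measure that is a linear combination of Dirac measures.

    With the measure sum_j alphas[j] * delta_{directions[j]}, integrating the
    support function reduces to a finite weighted sum of its values at the
    sampled directions, which is what makes the model finite-dimensional.
    """
    values = np.array([support_function(points, d) for d in directions])
    return float(alphas @ values + bias)


# Illustrative data only: one set-valued object with three members in 2-D.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
directions = np.array([[1.0, 0.0], [0.0, 1.0]])  # sampled directions (assumed)
alphas = np.array([0.5, -1.0])                   # hypothetical coefficients
score = dirac_score(A, directions, alphas, bias=0.25)
```

In this sketch the support function turns a whole set into one number per sampled direction, so the coefficients `alphas` play the role that a weight vector plays in an ordinary linear classifier. In a twin formulation, two such scoring functions would be fitted, one per class.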

Cite this article

梁志贞, 闵玉寒, 丁世飞. Twin Support Function Machine for Set-valued Data. Journal of Software (软件学报),,():1-18

History
  • Received: 2024-03-17
  • Last revised: 2024-07-18
  • Published online: 2025-02-26