Abstract: Twin support vector machine (TSVM) can effectively handle data such as cross-plane or XOR data. However, when dealing with set-valued data, TSVM usually relies on statistical summaries of the set-valued objects, such as the mean or the median. Unlike TSVM, this study proposes a twin support function machine (TSFM) that deals directly with set-valued data. Using the support functions defined for set-valued objects, TSFM obtains nonparallel hyperplanes in a Banach space. To suppress outliers in set-valued data, TSFM adopts the pinball loss function and introduces weights for the set-valued objects. Because TSFM involves optimization problems in an infinite-dimensional space, the measure is taken to be a linear combination of Dirac measures, which yields an optimization model in a finite-dimensional space. To solve this model effectively, this study employs a sampling strategy that transforms it into quadratic programming (QP) problems. The dual formulations of the QP problems are derived, and they provide the theoretical foundation for determining which sampling points are support vectors. To classify set-valued data, the distance from a set-valued object to a hyperplane in a Banach space is defined, and the decision rule is derived from it. This study also considers the kernelization of support functions to capture nonlinear features of the data, which makes the proposed model applicable to indefinite kernels. Experimental results demonstrate that TSFM can capture the intrinsic structure of cross-plane set-valued data and achieves good classification performance in the presence of outliers or when set-valued objects contain only a few high-dimensional examples.
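The ingredients named above can be illustrated with a minimal sketch, assuming each set-valued object is a finite set of examples in R^d. The support function of such a set is sigma_A(w) = max over a in A of <w, a>. If the measure defining a linear functional on support functions is restricted to a linear combination of Dirac measures, sum_i c_i * delta_{w_i}, the functional reduces to the finite sum sum_i c_i * sigma_A(w_i) over sampled directions w_i. The pinball loss is written here in the common form L_tau(u) = u for u >= 0 and -tau * u otherwise. The function names, the sampled directions, and this exact parameterization are illustrative assumptions, not the paper's formulation.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def support_function(points: Sequence[Vector], direction: Vector) -> float:
    """sigma_A(w) = max_{a in A} <w, a> for a finite set A (equal to the
    support function of its convex hull)."""
    return max(dot(p, direction) for p in points)


def sampled_support_features(points: Sequence[Vector],
                             directions: Sequence[Vector]) -> list:
    """Hypothetical finite-dimensional representation of a set-valued object:
    its support function evaluated at sampled directions w_1, ..., w_m."""
    return [support_function(points, d) for d in directions]


def dirac_functional(points: Sequence[Vector],
                     directions: Sequence[Vector],
                     coeffs: Sequence[float]) -> float:
    """Integral of sigma_A against mu = sum_i c_i * delta_{w_i},
    which collapses to sum_i c_i * sigma_A(w_i)."""
    feats = sampled_support_features(points, directions)
    return sum(c * f for c, f in zip(coeffs, feats))


def pinball_loss(u: float, tau: float) -> float:
    """Pinball loss (assumed form): u if u >= 0, else -tau * u."""
    return u if u >= 0 else -tau * u


# Usage: one set-valued object with three 2-D examples,
# probed along the four axis directions.
A = [(0, 0), (1, 2), (3, 1)]
dirs = [(1, 0), (0, 1), (-1, 0), (0, -1)]
features = sampled_support_features(A, dirs)      # [3, 2, 0, 0]
value = dirac_functional(A, dirs, [1, 1, 0, 0])   # 3 + 2 = 5
```

Representing each object by its support function values at a shared set of sampled directions gives every object a feature vector of the same length regardless of how many examples it contains, which is one way the sampling strategy can make the problem amenable to finite-dimensional QP solvers.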