Abstract:Chinese chunking plays an important role in natural language processing. This paper presents a large margin method for Chinese chunking based on structural SVMs (support vector machines). First, a sequence labeling model and the formulation of the learning problem are introduced for Chinese chunking problem, and then the cutting plane algorithm is applied to efficiently approximate the optimal solution of the optimization problem.Finally, an improved F1 loss function is proposed to tackle Chinese chunking. The loss function can scale the F1loss value to the length of the sentence to adjust the margin accordingly, leading to more effective constraintinequalities. Experiments are conducted on UPENN Chinese Treebank-4 (CTB4), and the hamming loss function is compared with the improved F1 loss function. The experimental results show that the training algorithm with the improved F1 loss function can achieve higher performance than the Hamming loss function. The overall F1 score of Chinese chunking obtained with this approach is 91.61%, which is higher than the performance produced by the state-of-the-art machine learning models, such as CRFs (conditional random fields) and SVMs models.