Abstract: An improved sequence labeling model, the Mixed Skip-Chain Conditional Random Field, is presented to solve the problem of schema matching between semi-structured Web records and a relational database. The proposed model can be trained on a mixed sample set consisting of labeled samples and unlabeled relational database records, which reduces its dependence on manually labeled training data. Moreover, it provides a novel way to incorporate long-distance dependencies between different state variables. Experimental results on a large amount of real-world data collected from diverse domains show that the proposed method significantly improves schema matching performance.