Abstract: This paper addresses the problem of paraphrase collocation extraction, using the “OBJ” relation as a case study. The proposed method recasts paraphrase collocation extraction as a binary classification problem and combines multiple features based on translation, a thesaurus, polarity words, and web mining. Experimental results show that the classification-based method is effective and that every feature helps improve extraction performance. With the proposed method, more than 280 000 pairs of paraphrase collocations are extracted with a precision above 70%. Further experiments show that nearly 40% of sentences can be paraphrased using the extracted paraphrase collocations, which demonstrates that the method is useful in practice.
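
To make the binary-classification framing concrete, the following is a minimal sketch, not the method as implemented in the paper. It assumes that an “OBJ” collocation can be represented as a (verb, object) pair, and every resource in it (the translation table, synonym set, polarity lexicon, weights, and bias) is a toy, hypothetical stand-in. The web-mining feature is left out because it would require external queries, and the hand-set weights stand in for parameters that a real classifier would learn from labeled pairs.

```python
import math

# Toy, hypothetical resources for illustration only (not from the paper).
# Maps a (verb, object) collocation to placeholder translation identifiers.
TRANSLATIONS = {
    ("solve", "problem"): {"t1"},
    ("resolve", "issue"): {"t1"},
    ("raise", "problem"): {"t2"},
}
SYNONYMS = {frozenset(("solve", "resolve")), frozenset(("problem", "issue"))}
POLARITY = {"solve": 1, "resolve": 1, "raise": 0}


def same_or_synonym(a, b):
    """True if the two words are identical or listed as synonyms."""
    return a == b or frozenset((a, b)) in SYNONYMS


def extract_features(c1, c2):
    """Map a pair of (verb, object) collocations to a feature vector."""
    # Translation feature: do the two collocations share a translation?
    shared = TRANSLATIONS.get(c1, set()) & TRANSLATIONS.get(c2, set())
    translation = 1.0 if shared else 0.0
    # Thesaurus feature: fraction of aligned word slots that match.
    thesaurus = (same_or_synonym(c1[0], c2[0]) + same_or_synonym(c1[1], c2[1])) / 2
    # Polarity feature: do the two verbs carry the same polarity label?
    polarity = 1.0 if POLARITY.get(c1[0], 0) == POLARITY.get(c2[0], 0) else 0.0
    return [translation, thesaurus, polarity]


# Hand-set, hypothetical parameters; a trained model would learn these.
WEIGHTS = [2.0, 2.0, 1.0]
BIAS = -3.0


def is_paraphrase(c1, c2):
    """Binary decision: logistic score over the combined features."""
    score = BIAS + sum(w * x for w, x in zip(WEIGHTS, extract_features(c1, c2)))
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= 0.5
```

In this sketch, each candidate pair is reduced to one feature vector and receives a single yes/no label, which is what allows evidence from several independent sources to be combined in one decision.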