Abstract:Open domain new word detection is vital for Chinese natural language processing research. This paper proposes a novel detection algorithm based condition random field (CRF), which treats the new word detection problem as a classification problem. In this algorithm, the study tries to separate boundaries of new words from existing words with both the CRF method and a serial of statistical features extracted from large scale corpus. The effectiveness of three different discretization strategies are also compared including K-means, equal-frequency, and information gain. Experimental results on a large-scale Web corpus named SogouT show the effectiveness of the proposed algorithms.