Abstract: Natural multimodal human-computer dialogue requires the computer to produce intelligent responses to the user's statements. Because the knowledge base is limited and user discourse is unpredictable, a traditional human-computer dialogue system either cannot answer or produces answers inconsistent with the user's expectations when the conversation goes beyond the scope of its knowledge, which degrades the user's experience of the natural human-machine dialogue system. To solve this problem, this paper presents a method for generating an optimal sentence by integrating multimodal interaction history information with a data-oriented parsing (DOP) model. First, context-free grammar rules are extracted from large-scale syntax tree libraries. Then, drawing on the user's expressions, gestures, and other multimodal interaction history information from the dialogue process, a DOP model is used to filter the Chinese sentences generated by the context-free grammar, ultimately yielding a sentence that is grammatically and semantically sound. The method allows the computer to generate a response to the current dialogue from the interaction history information when the knowledge base offers no support, thereby improving the user's experience of the multi-channel natural human-machine interaction system. The proposed method is applied to traffic information search and a multimodal multi-topic dialogue system, and the results show that it effectively improves the naturalness of the dialogue and the user's experience.
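The pipeline outlined in the abstract (extract grammar rules from a treebank, generate candidate sentences, then filter them with a statistical model and the interaction history) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the toy English treebank, the function names, and the `boost` parameter are hypothetical; the score uses only depth-1 tree fragments (relative rule frequencies), which is a much simpler stand-in for a full DOP model; and the multimodal interaction history is reduced to a plain set of keywords.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy treebank: (label, child, ...) tuples with string leaves.
TREEBANK = [
    ("S", ("NP", "bus"), ("VP", ("V", "arrives"), ("ADV", "soon"))),
    ("S", ("NP", "bus"), ("VP", ("V", "arrives"))),
    ("S", ("NP", "train"), ("VP", ("V", "leaves"), ("ADV", "soon"))),
]

def extract_rules(tree, counts):
    """Count one context-free rule (label -> child labels/words) per internal node."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[label][rhs] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, counts)

def build_grammar(treebank):
    """Return {lhs: {rhs: relative frequency}} estimated from the treebank."""
    counts = defaultdict(Counter)
    for tree in treebank:
        extract_rules(tree, counts)
    return {
        lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
        for lhs, c in counts.items()
    }

def generate(grammar, symbol="S"):
    """Yield (words, probability) for every derivation; assumes no recursive rules."""
    if symbol not in grammar:          # not a left-hand side, so treat it as a word
        yield (symbol,), 1.0
        return
    for rhs, p in grammar[symbol].items():
        expansions = [list(generate(grammar, s)) for s in rhs]
        for parts in product(*expansions):
            words = tuple(w for ws, _ in parts for w in ws)
            prob = p
            for _, q in parts:
                prob *= q
            yield words, prob

def best_response(grammar, history, boost=3.0):
    """Pick the candidate with the highest history-weighted score.

    `history` is a set of keywords standing in for the interaction history;
    each candidate word found in it multiplies the score by `boost`
    (a hypothetical weighting, not taken from the paper).
    """
    scores = Counter()
    for words, prob in generate(grammar):
        scores[words] += prob          # merge derivations that yield the same sentence
    def weighted(item):
        words, prob = item
        return prob * boost ** sum(w in history for w in words)
    words, _ = max(scores.items(), key=weighted)
    return " ".join(words)

grammar = build_grammar(TREEBANK)
print(best_response(grammar, set()))
print(best_response(grammar, {"train", "leaves"}))
```

With an empty history the sketch falls back to the most frequent construction in the treebank; supplying history keywords shifts the choice toward sentences that mention them, which mirrors, in a very reduced form, the role the abstract assigns to interaction history.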