ZHAO Jing-Sheng
School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China; School of Computer Science and Technology, Soochow University, Suzhou 215021, China
SONG Meng-Xue
School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
GAO Xiang
School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
ZHU Qiao-Ming
School of Computer Science and Technology, Soochow University, Suzhou 215021, China
CLC number: TP391
National Natural Science Foundation of China (61773276; 61836007)
Natural language processing (NLP) is a core technology of artificial intelligence. Text representation is a fundamental and indispensable task in NLP: it affects, and may even determine, the quality and performance of NLP systems. This study discusses the basic principles of text representation, the formalization of natural language, language models, and the connotation and extension of text representation. It analyzes how text representation techniques can be classified at a macro level. The mainstream text representation techniques and methods are analyzed, generalized, and summarized, including the vector space model, topic models, graph-based models, neural network-based models, and representation learning. Event-based, semantic-based, and knowledge-based text representation techniques are also introduced. The development trends and future directions of text representation technology are then predicted and discussed. Neural network-based deep learning and representation learning on text will play an important role in NLP, and the strategy of pre-training followed by fine-tuning will gradually become the mainstream technique. The choice of text representation should be guided by analysis of the specific problem at hand, and the integration of technology and application is the driving force.
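The vector space model is listed above among the mainstream techniques. As a rough illustration only (not the authors' method), the sketch below represents each document as a TF-IDF-weighted term vector and compares documents by cosine similarity. The whitespace tokenization, the smoothed IDF variant, the toy documents, and the function names `tfidf_vectors` and `cosine` are hypothetical choices made here for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} dict (TF-IDF).

    Assumption: raw term counts for TF and a smoothed IDF,
    idf = ln((1 + n) / (1 + df)) + 1; other weighting variants exist.
    """
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + n) / (1 + df[term])) + 1
            vec[term] = count * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, tokenized by splitting on whitespace.
docs = [
    "text representation for natural language processing".split(),
    "vector space model for text".split(),
    "graph based model".split(),
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # shared terms: "for", "text"
print(cosine(vecs[0], vecs[2]))            # no shared terms, so 0.0
```

Terms such as "for" and "text" occur in two of the three documents and therefore receive a lower IDF weight than terms unique to one document; documents with no terms in common have a similarity of zero, which is one of the sparsity limitations that motivates the topic-model and neural approaches mentioned above.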
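ZHAO Jing-Sheng, SONG Meng-Xue, GAO Xiang, ZHU Qiao-Ming. Research on text representation in natural language processing. Journal of Software, 2022, 33(1): 102-128.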