Retrieval and Management Technology for Industrial-scale Massive Code

Fund Project: National Key Research and Development Program of China (2018YFB1003900)

    Abstract:

    In large IT companies such as Google and Baidu, code search is an indispensable and frequent activity in software development: it speeds up development by helping engineers learn from or reuse existing code. Over the years, many researchers have studied code search and built excellent tools. However, existing research and tools mainly target small-scale or single-language code data sets, are not driven by actual industrial requirements, and restrict the form of the user's query; a complete solution for retrieving and managing industrial-scale massive code is still lacking. This study proposes a code search engine solution and system implementation for industrial-scale massive data. Oriented to the most direct needs of users during development, it builds the index for, and retrieves from, a massive code base through an offline analysis stage and an online stage. The offline analysis acquires and analyzes code-related data and builds an index cluster. The online process transforms the user's query, ranks the search results, and generates a summary. The system is deployed on the Baidu code base, with the index built over dozens of TB-level Git code bases, and the average retrieval time is within 1 s. Since its launch at Baidu, the number of visits has gradually increased, reaching thousands of users and tens of thousands of searches per week. The system is widely praised by Baidu engineers.
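
    The offline/online split described in the abstract can be illustrated with a toy, single-process sketch. This is not the system from the paper: the function names (tokenize, build_index, search), the term-count scoring, and the "first matching line" summary are all assumptions chosen for brevity, and a real deployment would use a distributed index cluster rather than an in-memory dictionary.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: identifier-like tokens, lowercased.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenize(text):
    """Split text into lowercase identifier-like tokens."""
    return [t.lower() for t in TOKEN.findall(text)]

def build_index(files):
    """Offline step (assumed form): map each token to {path: occurrence count}."""
    index = defaultdict(dict)
    for path, source in files.items():
        for token in tokenize(source):
            index[token][path] = index[token].get(path, 0) + 1
    return index

def search(index, files, query):
    """Online step (assumed form): transform the query, rank matches, add a summary."""
    terms = tokenize(query)                      # query transformation
    scores = defaultdict(int)
    for term in terms:
        for path, count in index.get(term, {}).items():
            scores[path] += count                # naive term-count score
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    results = []
    for path in ranked:
        # Summary: the first source line that contains any query term.
        summary = next(
            (line.strip() for line in files[path].splitlines()
             if set(tokenize(line)) & set(terms)),
            "",
        )
        results.append((path, scores[path], summary))
    return results

# Made-up two-file "code base" for illustration only.
files = {
    "util/retry.py": "def retry(fn, attempts):\n    return fn()\n",
    "net/client.py": "def connect(host):\n    return retry(open_socket, 3)\n",
}
index = build_index(files)
results = search(index, files, "Retry attempts")
for path, score, summary in results:
    print(path, score, summary)
```

    Building the index ahead of time is what keeps the online path cheap: a query only touches the posting lists for its own terms, not the source files themselves, except to cut the summary line.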

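Citation

Liu Zhiwei, Xing Yongxu, Yu Hao, Li Tao, Zhang Xiaodong. Retrieval and Management Technology for Industrial-scale Massive Code. Journal of Software (软件学报), 2019, 30(5): 1498-1509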

History
  • Received: August 31, 2018
  • Revised: October 31, 2018
  • Online: May 08, 2019