Abstract: Indexing is one of the key technologies for improving the performance of database systems. In the era of big data, traditional indexes such as the B+-Tree have exposed several limitations. First, they consume too much space. For example, a B+-Tree requires O(n) extra space, which is intolerable in big data environments. Second, they require multiple indirect lookups per query. For example, each B+-Tree query must visit every node on the path from the root to a leaf, so search cost grows with the data size as the tree height increases. Since 2018, the combination of artificial intelligence and databases has given rise to a new research direction called the "learned index". Learned indexes use machine learning to capture the data distribution and query workload characteristics, and replace the indirect, multi-level search of a traditional index with a direct search based on a fitted function, thereby reducing space cost and improving query performance. This survey first systematically reviews and classifies existing work on learned indexes. It then introduces the motivation and key techniques of each learned index, and compares and analyzes the advantages and disadvantages of the various index structures. Finally, it discusses future research directions for learned indexes.
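To make the idea of "direct search based on a fitted function" concrete, the following is a minimal, hypothetical sketch rather than any specific published design. It assumes a sorted in-memory array of keys and fits a single least-squares line mapping each key to its position. The class name `LinearLearnedIndex` and its methods are illustrative only. At lookup time the model predicts a position, and a binary search restricted to the recorded worst-case error window replaces the root-to-leaf traversal of a B+-Tree. The index stores only a slope, an intercept, and an error bound instead of O(n) internal nodes.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Hypothetical toy learned index: one linear model maps key -> array position."""

    def __init__(self, keys: List[int]) -> None:
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys; bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        # The "fitted function": an O(1) computation instead of a tree traversal.
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key: int) -> Optional[int]:
        """Return a position holding `key`, or None if the key is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Any stored key lies within max_err of its predicted position.
        lo = max(0, min(n, guess - self.max_err))
        hi = max(0, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    squares = [i * i for i in range(200)]  # a non-uniform key distribution
    idx = LinearLearnedIndex(squares)
    print(idx.lookup(144), idx.lookup(145), idx.max_err)
```

The error bound is what keeps lookups exact: the better the model fits the key distribution, the smaller `max_err` and the narrower the final search. A single line fits skewed data poorly, which is why published learned indexes typically use more elaborate models, such as hierarchies or piecewise combinations of simple models.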