Abstract: Persistent memory (PM), which can supplement and potentially replace main memory, stores data at a lower cost while ensuring data persistence. However, traditional PM-oriented index structures such as B+ trees do not fully exploit the data distribution to optimize read and write performance on PM. Recent work has used the distribution awareness of learned indexes to improve read and write performance on PM while supporting index persistence. Nonetheless, existing persistent learned index designs incur additional PM accesses and perform poorly on real-world data. To address this performance degradation under real data distributions, this study proposes PLTree, a learned index built on a DRAM/PM hybrid architecture. PLTree optimizes read and write performance under real data distributions through three techniques: (1) a two-stage index construction method that eliminates last-mile search in internal nodes and reduces PM accesses; (2) model-based search for efficient queries on PM, further accelerated by metadata kept in DRAM; and (3) a log-based hierarchical overflow buffer structure tailored to PM characteristics to optimize write performance. The results show that, compared with existing persistent memory indexes (APEX, FPTree, uTree, NBTree, and DPTree), PLTree improves index construction performance by 1.9× to 34× across various datasets. In single-threaded scenarios, PLTree improves average query and insertion performance by 1.26× to 4.45× and 2.63× to 6.83×, respectively. In multi-threaded scenarios, PLTree outperforms the baselines by up to 10.2× in query performance and up to 23.7× in insertion performance.
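For readers unfamiliar with the terms "model-based search" and "last-mile search", the following is a minimal, generic sketch of how a model-based index node might locate a key: a linear model predicts the key's position, and a bounded local search corrects the prediction error. It is not PLTree's implementation. The class name `LinearModelNode`, the endpoint-interpolation model, and the `max_err` bound are all assumptions made for illustration, and everything here lives in ordinary Python lists rather than in PM.

```python
from bisect import bisect_left


class LinearModelNode:
    """Toy model-based index node (hypothetical, for illustration only).

    Assumes a non-empty collection of unique, numeric keys.
    """

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        n = len(self.keys)
        first, last = self.keys[0], self.keys[-1]
        # Fit position ~ slope * (key - base) by interpolating the endpoints.
        self.base = first
        self.slope = (n - 1) / (last - first) if last != first else 0.0
        # Record the worst-case prediction error over the stored keys so
        # that lookups can bound the correction ("last-mile") search.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Predicted slot, clamped to the valid index range.
        pos = int(self.slope * (key - self.base))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the index of key, or None if it is not stored."""
        pos = self._predict(key)
        lo = max(pos - self.max_err, 0)
        hi = min(pos + self.max_err + 1, len(self.keys))
        # Last-mile search: binary search restricted to the error window.
        i = bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

The closer the model fits the key distribution, the smaller `max_err` becomes and the fewer slots a lookup has to touch. That is why skewed real-world distributions, where a simple model fits poorly, tend to widen the search window and add memory accesses.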