Abstract: The popularization of GPS-enabled mobile devices and 5G Internet technology has led to rapid growth in trajectory data. How to efficiently store, manage, and analyze massive trajectory data has become a hot research issue. Traditional single-node trajectory indexes are limited by memory capacity, disk I/O speed, and other factors, and can no longer manage large-scale trajectory data. Spark, a distributed framework based on in-memory computing, has natural advantages in processing massive data. This study therefore proposes a distributed trajectory data indexing and query scheme based on the Spark platform. To increase the amount of data that a single node in a distributed cluster can store and to improve the efficiency of trajectory queries, a trajectory encoding technique, Z-order trajectory encoding (ZTE), is proposed. ZTE encodes the minimum adjacent subspaces covered by the minimum bounding rectangle (MBR) of a trajectory. The resulting codes can represent trajectories of different granularities together with their movement directions, and are used to determine the relationship between a trajectory and the query space. Building on this technique, the study organizes the ZTE codes of trajectories into a partial-order structure, the subspace partial-order branch (SPB), and combines it with the hash mapping table IDMap to construct a local index. This index avoids the inefficiency caused by the dead space that overlapping minimum bounding rectangles create in R-tree-like indexes, and it enables fast pruning. To support efficient retrieval of massive trajectory data, the study designs SPBSpark, a distributed trajectory index built on the SPB local index. SPBSpark consists of three main components: data partitioning, a local index, and a global index. It supports three types of queries: spatiotemporal range queries, k-nearest neighbor queries, and moving-object trajectory queries.
Finally, the study selects TrajSpark and LocationSpark, two distributed trajectory indexes that are also based on the Spark framework, as comparison systems. Comparative simulation experiments show that the space utilization of the SPBSpark index is about 15% higher than that of LocationSpark, and that SPBSpark improves query performance by a factor of 2–3 over TrajSpark and LocationSpark.
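
As an illustration of the general idea behind Z-order encoding of an MBR, the following minimal Python sketch interleaves grid coordinates into a Morton (Z-order) code and derives the smallest quadtree cell that encloses an MBR from the common prefix of its corner codes. It is not the ZTE scheme itself, whose details are not given here. The function names (`interleave`, `enclosing_cell`, `cell_contains`), the integer grid, and the single-cell simplification are all assumptions made for illustration.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Morton (Z-order) code: interleave the low `bits` bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits occupy even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits occupy odd positions
    return code


def enclosing_cell(min_xy, max_xy, bits):
    """Return (prefix, level) of the smallest quadtree cell that contains
    both MBR corners, and therefore the whole MBR.

    Hypothetical helper: level 0 is the whole space, level `bits` is one grid cell.
    """
    lo = interleave(min_xy[0], min_xy[1], bits)
    hi = interleave(max_xy[0], max_xy[1], bits)
    level = bits
    # Drop one quadrant digit (2 bits) at a time until both codes agree.
    while lo != hi:
        lo >>= 2
        hi >>= 2
        level -= 1
    return lo, level


def cell_contains(outer, inner):
    """True if quadtree cell `outer` contains cell `inner` (prefix test)."""
    outer_prefix, outer_level = outer
    inner_prefix, inner_level = inner
    if outer_level > inner_level:
        return False  # a finer cell cannot contain a coarser one
    return (inner_prefix >> (2 * (inner_level - outer_level))) == outer_prefix


if __name__ == "__main__":
    mbr_cell = enclosing_cell((2, 2), (3, 3), bits=3)
    query_cell = (0, 0)  # the whole space
    print(mbr_cell, cell_contains(query_cell, mbr_cell))
```

In this simplified form, an MBR that straddles a high-level quadrant boundary, such as one with corners (3, 3) and (4, 4) on an 8 x 8 grid, collapses to the root cell. Encoding several adjacent subspaces rather than a single enclosing cell, as the abstract describes for ZTE, is one way to avoid that loss of precision.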