Abstract: Efficient storage and indexing of big data streams are challenging problems in the database field. This paper segments the temporal data stream into continuous time windows and proposes WB-Index, a distributed master-slave index structure based on a double-layer B+ tree. The lower B+ tree indexes the stream tuples within each time window, and the upper B+ tree indexes the successive time windows. The lower B+ tree is constructed by combining batch loading with parallel sorting. The core idea of the construction method is to slice the time window and isolate the parallelizable operations from the rest of the work in the window. Sorting and data stream receiving proceed in parallel across slices, and the construction of the B+ tree skeleton (a B+ tree without values) for the time window and the merge-sorting operation are parallelized as well. These techniques speed up B+ tree construction. Because the timestamps of time windows increase monotonically, a split-less method is adopted for constructing the upper B+ tree, which avoids node splitting and memory movement overhead and improves space utilization and update efficiency. In WB-Index, data stream tuples and the index are stored separately, and the index and hotspot data are cached as much as possible to improve query efficiency. Finally, theoretical analysis and experiments both demonstrate that WB-Index supports efficient real-time data stream writing and stream data querying.
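
The split-less idea for the upper index can be illustrated with a minimal Python sketch. It is not the WB-Index implementation: the class name `AppendOnlyBPlusTree`, the `fanout` parameter, and the `floor` lookup are assumptions introduced here for illustration. The sketch relies only on the property stated above, that window timestamps arrive in increasing order, so every insert lands at the right edge of the tree and a full node is followed by a fresh right sibling instead of being split.

```python
from bisect import bisect_right


class Node:
    """A B+ tree node; leaves hold values, inner nodes hold children."""

    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []   # leaf: entry keys; inner: smallest key of each child
        self.items = []  # leaf: values; inner: child nodes


class AppendOnlyBPlusTree:
    """Hypothetical split-less B+ tree for strictly increasing keys."""

    def __init__(self, fanout=4):
        self.fanout = fanout
        # Rightmost node of every level, root first, leaf last.
        self.path = [Node(leaf=True)]

    def append(self, key, value):
        """Insert at the right edge; keys must arrive in increasing order."""
        leaf = self.path[-1]
        if leaf.keys and key <= leaf.keys[-1]:
            raise ValueError("keys must be strictly increasing")
        self._attach(len(self.path) - 1, key, value)

    def _attach(self, level, key, item):
        node = self.path[level]
        if len(node.keys) < self.fanout:
            node.keys.append(key)
            node.items.append(item)
            return
        # Node is full: open a new right sibling, no split and no key movement.
        sibling = Node(leaf=node.leaf)
        sibling.keys.append(key)
        sibling.items.append(item)
        if level == 0:
            # The full node was the root, so grow the tree by one level.
            new_root = Node(leaf=False)
            new_root.keys = [node.keys[0], key]
            new_root.items = [node, sibling]
            self.path.insert(0, new_root)
            self.path[1] = sibling
        else:
            self.path[level] = sibling
            self._attach(level - 1, key, sibling)

    def floor(self, key):
        """Return the value stored under the largest key <= `key`, or None."""
        node = self.path[0]
        while True:
            i = bisect_right(node.keys, key) - 1
            if i < 0:
                return None
            if node.leaf:
                return node.items[i]
            node = node.items[i]
```

With window start timestamps as keys, `floor(t)` returns the entry of the window that contains timestamp `t`. Because nodes are only ever closed when full, every node except those on the rightmost path stays at full occupancy, which is the space utilization benefit the abstract attributes to the split-less method.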