Lock-free Concurrent Cuckoo Filter
Author:
Affiliation:

Clc Number:

TP393

  • Article
  • | |
  • Metrics
  • |
  • Reference [59]
  • | |
  • Cited by
  • | |
  • Comments
    Abstract:

    The cuckoo filter is an efficient probabilistic data structure that can quickly determine whether an element exists in a given set. The cuckoo filter is widely used in computer networks, IoT applications, and database systems. These systems usually involve the handling of massive amounts of data and numerous concurrent requests in practice. A cuckoo filter that supports high concurrency can significantly improve system throughput and data processing capabilities, which is crucial to system performance enhancement. Therefore, a cuckoo filter that supports lock-free concurrency is designed. The filter achieves high-performance lookup, insertion, and deletion through the two-stage query, separation of path exploration and element migration, and atomic migration based on multi-word compare-and-swap. Theoretical analysis and experimental results indicate that the lock-free concurrent cuckoo filter significantly improves the concurrent performance of the most cutting-edge algorithms in current times. The lookup throughput of a lock-free concurrent cuckoo filter is on average 1.94 times that of a cuckoo filter using fine-grained locks.

    Reference
    [1] Cohen E. All-distances sketches, revisited: HIP estimators for massive graphs analysis. In: Proc. of the 33rd ACM SIGMOD-SIGACT-SIGART Symp. on Principles of Database Systems. Snowbird: ACM, 2014. 88–99. [doi: 10.1145/2594538.2594546]
    [2] Dai HP, Li M, Liu A. Finding persistent items in distributed datasets. In: Proc. of the 2018 IEEE Conf. on Computer Communications. Honolulu: IEEE, 2018. 1403–1411. [doi: 10.1109/INFOCOM.2018.8486425]
    [3] Mosharraf SIM, Adnan MA. Improving lookup and query execution performance in distributed big data systems using cuckoo filter. Journal of Big Data, 2022, 9(1): 12.
    [4] Vora AV, Hegde S. Keyword-based private searching on cloud data along with keyword association and dissociation using cuckoo filter. Int’l Journal of Information Security, 2019, 18(3): 305–319.
    [5] Zhao YK, Dai WC, Wang SR, Xi L, Wang SQ, Zhang F. A review of cuckoo filters for privacy protection and their applications. Electronics, 2023, 12(13): 2809.
    [6] Morales D, Agudo I, Lopez J. Private set intersection: A systematic literature review. Computer Science Review, 2023, 49: 100567.
    [7] Kumar M, Singh A. Probabilistic data structures in smart city: Survey, applications, challenges, and research directions. Journal of Ambient Intelligence and Smart Environments, 2022, 14(4): 229–284.
    [8] Dai HP, Li M, Liu AX, Zheng JQ, Chen GH. Finding persistent items in distributed datasets. IEEE/ACM Trans. on Networking, 2020, 28(1): 1–14.
    [9] Zhao QC, Huang C, Dai LH. VULDEFF: Vulnerability detection method based on function fingerprints and code differences. Knowledge-based Systems, 2023, 260: 110139.
    [10] Yang JS, Jia WC, Gao Z, Guo ZH, Zhou Y, Pan Z. Cuckoo-store engine: A Reed-Solomon code-based ledger storage optimization scheme for blockchain-enabled IoT. Electronics, 2023, 12(15): 3328.
    [11] Sajitha M, Kavitha D, Reddy PC. An optimized clone node detection in WSN using cuckoo filter. SN Computer Science, 2023, 4(2): 167.
    [12] The Apache software foundation. Bloom filter index. 2022. https://doris.apache.org/docs/1.2/data-table/index/bloomfilter/
    [13] Honarkhah M, Talebzadeh A. HyperLogLog in presto: A significantly faster way to handle cardinality estimation. 2018. https://engineering.fb.com/2018/12/13/data-infrastructure/hyperloglog/
    [14] Ray N, Yang FJ. How we scaled HyperLogLog: Three real-world optimizations. 2014. https://metamarkets.com/2014/engineering-hyperloglog-optimizations-for-real-world-systems/
    [15] Ghemawat S, Dean J. LevelDB. 2023. https://github.com/google/leveldb
    [16] Meta Inc. Rocksdb. 2023. https://github.com/facebook/rocksdb
    [17] Gu R, Li SM, Dai HP, Wang HC, Luo YL, Fan B, Basat RB, Wang K, Song ZY, Chen SW, Wang BN, Huang YH, Chen GH. Adaptive online cache capacity optimization via lightweight working set size estimation at scale. In: Proc. of the 2023 USENIX Annual Technical Conf. Boston: USENIX Association, 2023. 467–484.
    [18] Sutter H. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb’s Journal, 2005, 30(3): 202–210.
    [19] 吕天根, 洪日昌, 何军, 胡社教. 多模态引导的局部特征选择小样本学习方法. 软件学报, 2023, 34(5): 2068–2082. http://www.jos.org.cn/1000-9825/6771.htm
    Lü TG, Hong RC, He J, Hu SJ. Multimodal-guided local feature selection for few-shot learning. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2068–2082 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6771.htm
    [20] 刘睿诚, 张俊晨, 罗永平, 金培权. 面向非易失内存的异构索引. 软件学报, 2022, 33(3): 832–848. http://www.jos.org.cn/1000-9825/6456.htm
    Liu RC, Zhang JC, Luo YP, Jin PQ. Heterogeneous index for non-volatile memory. Ruan Jian Xue Bao/Journal of Software, 2022, 33(3): 832–848 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6456.htm
    [21] 符鹏涛, 罗来龙, 郭得科, 赵翔, 李尚森, 王怀民. 跳跃滤波: 一种面向大数据治理的动态数据摘要设计. 软件学报, 2023, 34(3): 1193–1212. http://www.jos.org.cn/1000-9825/6782.htm
    Fu PT, Luo LL, Guo DK, Zhao X, Li SS, Wang HM. Jump filter: Dynamic sketch design for big data governance. Ruan Jian Xue Bao/Journal of Software, 2023, 34(3): 1193–1212 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6782.htm
    [22] Li XZ, Andersen DG, Kaminsky M, Freedman MJ. Algorithmic improvements for fast concurrent cuckoo hashing. In: Proc. of the 9th European Conf. on Computer Systems. Amsterdam: ACM, 2014. 27. [doi: 10.1145/2592798.2592820]
    [23] Rinberg A, Spiegelman A, Bortnikov E, Hillel E, Keidar I, Rhodes L, Serviansky H. Fast concurrent data sketches. In: Proc. of the 25th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming. San Diego: ACM, 2020. 117–129.
    [24] Arpaci-Dusseau RH, Arpaci-Dusseau AC. Operating Systems: Three Easy Pieces. North Charleston: CreateSpace Independent Publishing Platform, 2018.
    [25] Abraham S, Peter BG, Greg G. Operating System Concepts. 10th ed., Hoboken: John Wiley & Sons, 2018.
    [26] The Apache Software Foundation. Atomic instructions. 2023. https://brpc.incubator.apache.org/docs/rpc-in-depth/atomic-instructions/
    [27] Bloom BH. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 1970, 13(7): 422–426.
    [28] 谢鲲, 文吉刚, 张大方, 谢高岗. 布鲁姆过滤器查询算法. 软件学报, 2009, 20(1): 96–108. http://www.jos.org.cn/1000-9825/3458.htm
    Xie K, Wen JG, Zhang DF, Xie GG. Bloom filter query algorithm. Ruan Jian Xue Bao/Journal of Software, 2009, 20(1): 96–108. http://www.jos.org.cn/1000-9825/3458.htm
    [29] Wang HC, Dai HP, Li M, Yu J, Gu R, Zheng JQ, Chen GH. Bamboo filters: Make resizing smooth. In: Proc. of the 38th IEEE Int’l Conf. on Data Engineering. Kuala Lumpur: IEEE, 2022. 979–991. [doi: 10.1109/ICDE53745.2022.00078]
    [30] Fan B, Andersen DG, Kaminsky M, Mitzenmacher MD. Cuckoo filter: Practically better than bloom. In: Proc. of the 10th ACM Int’l on Conf. on Emerging Networking Experiments and Technologies. Sydney: ACM, 2014. 75–88. [doi: 10.1145/2674005.2674994]
    [31] Voras I, Žagar M. Adapting the bloom filter to multithreaded environments. In: Proc. of the 15th IEEE Mediterranean Electrotechnical Conf. Valletta: IEEE, 2010. 1488–1493. [doi: 10.1109/MELCON.2010.5476244]
    [32] Pandey P, Conway A, Durie J, Bender MA, Farach-Colton M, Johnson R. Vector quotient filters: Overcoming the time/space trade-off in filter design. In: Proc. of the 2021 Int’l Conf. on Management of Data. Virtual Event: ACM, 2021. 1386–1399.
    [33] Bender MA, Farach-Colton M, Johnson R, Karner R, Kuszmaul BC, Medjedovic D, Montes P, Shetty P, Spillane RP, Zadok E. Don’t thrash: How to cache your hash on flash. Proc. of the VLDB Endowment, 2012, 5(11): 1627–1637.
    [34] Pandey P, Bender MA, Johnson R, Patro R. A general-purpose counting filter: Making every bit count. In: Proc. of the 2017 ACM Int’l Conf. on Management of Data. Chicago: ACM, 2017. 775–787. [doi: 10.1145/3035918.3035963]
    [35] Wang ZQ, Pavlo A, Lim H, Leis V, Zhang HC, Kaminsky M, Anderson DG. Building a BW-tree takes more than just buzz words. In: Proc. of the 2018 Int’l Conf. on Management of Data. Houston: ACM, 2018. 473–488. [doi: 10.1145/3183713.3196895]
    [36] Levandoski JJ, Lomet DB, Sengupta S. The BW-tree: A B-tree for new hardware platforms. In: Proc. of the 29th IEEE Int’l Conf. on Data Engineering. Brisbane: IEEE, 2013. 302–313. [doi: 10.1109/ICDE.2013.6544834]
    [37] Leis V, Scheibner F, Kemper A, Neumann T. The ART of practical synchronization. In: Proc. of the 12th Int’l Workshop on Data Management on New Hardware. San Francisco: ACM, 2016. 3. [doi: 10.1145/2933349.2933352]
    [38] Nguyen N, Tsigas P. Lock-free cuckoo hashing. In: Proc. of the 34th IEEE Int’l Conf. on Distributed Computing Systems. Madrid: IEEE, 2014. 627–636. [doi: 10.1109/ICDCS.2014.70]
    [39] Kelly R, Pearlmutter BA, Maguire P. Concurrent robin hood hashing. arXiv:1809.04339, 2018.
    [40] Harris TL, Fraser K, Pratt IA. A practical multi-word compare-and-swap operation. In: Proc. of the 16th Int’l Conf. on Distributed Computing. Toulouse: Springer, 2002. 265–279. [doi: 10.1007/3-540-36108-1_18]
    [41] Arbel-Raviv M, Brown T. Reuse, Don’t recycle: Transforming lock-free algorithms that throw away descriptors. arXiv:1708.01797, 2017.
    [42] Kelly R, Pearlmutter BA, Maguire P. Lock-free hopscotch hashing. In: Proc. of the 2020 Symp. on Algorithmic Principles of Computer Systems. Philadelphia: SIAM, 2020. 45–59. [doi: 10.1137/1.9781611976021.4]
    [43] Fatourou P, Kallimanis ND, Ropars T. An efficient wait-free resizable hash table. In: Proc. of the 30th on Symp. on Parallelism in Algorithms and Architectures. Vienna: ACM, 2018. 111–120. [doi: 10.1145/3210377.3210408]
    [44] Williams A. C++ Concurrency in Action. 2nd ed., New York: Manning Publications, 2017.
    [45] Prakasam E, Manoharan A. A cache efficient one hashing blocked bloom filter (OHBB) for random strings and the K-mer strings in DNA sequence. Symmetry, 2022, 14(9): 1911.
    [46] CAIDA. The CAIDA anonymized Internet traces dataset (April 2008–January 2019). 2019. https://www.caida.org/data/passive/passive_dataset.xml
    [47] Copper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In: Proc. of the 1st ACM Symp. on Cloud Computing. Indianapolis: ACM, 2010. 143–154. [doi: 10.1145/1807128.1807152]
    [48] Marcus R, Kipf A, van Renen A, Stoian M, Misra S, Kemper A, Neumann T, Kraska T. Benchmarking learned indexes. Proc. of the VLDB Endowment, 2020, 14(1): 1–13.
    [49] Powers DMW. Applications and explanations of Zipf’s law. In: Proc. of the 1998 Joint Conf. on New Methods in Language Processing and Computational Natural Language Learning. Sydney: ACM, 1998. 151–160.
    [50] 俞加平, 陈华辉, 钱江波, 董一鸿. LSM树中基于热度预测的异构布隆过滤器方案. 电子学报, 2021, 49(11): 2090–2095.
    Yu JP, Chen HH, Qian JB, Dong YH. A heterogeneous bloom filter scheme in LSM tree based on hotness prediction. Acta Electronica Sinica, 2021, 49(11): 2090–2095 (in Chinese with English abstract).
    [51] 周舟, 付文亮, 嵩天, 刘庆云. 一种基于并行Bloom Filter的高速URL查找算法. 电子学报, 2015, 43(9): 1833–1840.
    Zhou Z, Fu WL, Song T, Liu QY. Fast URL lookup using parallel bloom filters. Acta Electronica Sinica, 2015, 43(9): 1833–1840 (in Chinese with English abstract).
    [52] 李琪, 钟将, 李雪, 李青. 基于新型非易失存储器的混合内存架构的内存管理机制. 电子学报, 2019, 47(3): 664–670.
    Li Q, Zhong J, Li X, Li Q. Memory management mechanism for hybrid memory architecture based on new non-volatile memory. Acta Electronica Sinica, 2019, 47(3): 664–670 (in Chinese with English abstract).
    Related
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

王瀚橙,陈志鹏,戴海鹏,顾荣,KIM Chaewon,陈贵海.无锁并发布谷鸟过滤器.软件学报,,():1-19

Copy
Share
Article Metrics
  • Abstract:157
  • PDF: 1644
  • HTML: 0
  • Cited by: 0
History
  • Received:October 13,2023
  • Revised:February 12,2024
  • Online: June 20,2024
You are the first2035080Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063