Trie+结构函数式建模、机械化验证及其应用
作者:
作者简介:

左正康(1980-), 男, 博士, 副教授, CCF高级会员, 主要研究领域为形式化方法, 智能化软件;柯雨含(1998-), 男, 硕士生, CCF学生会员, 主要研究领域为定理证明, 形式化方法. ;黄箐(1984-), 男, 博士, 副教授, CCF专业会员, 主要研究领域为智能化软件. ;王玥坤(1997-), 女, 硕士生, 主要研究领域为定理证明, 形式化方法. ;曾志城(1999-), 男, 硕士生, 主要研究领域为定理证明, 形式化方法. ;王昌晶(1977-), 男, 博士, 教授, 博士生导师, CCF高级会员, 主要研究领域为高可信软件, 智能化软件.

通讯作者:

王昌晶, E-mail: wcj@jxnu.edu.cn

基金项目:

国家自然科学基金(62262031); 江西省自然科学基金(20232BAB202010); 江西省教育厅科技重点项目(GJJ210307, GJJ2200302); 江西省主要学科学术与技术带头人培养项目(20232BCJ22013)


Trie+ Structural Functional Modeling, Mechanized Verification and Application
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [31]
  • |
  • 相似文献 [20]
  • | | |
  • 文章评论
    摘要:

    Trie结构是一种使用搜索关键字来组织信息的搜索树, 可用于高效地存储和搜索字符串集合. Nipkow等人给出了实现Trie的Isabelle建模与验证, 然而其Trie在存储和操作时存在大量的冗余, 导致空间利用率不高, 且仅考虑英文单模式下查找. 为此, 基于索引即键值的思想提出了Trie+结构, 相较于传统的索引与键值分开存储的结构能减少50%的存储空间, 大大提高了空间利用率. 并且, 对Trie+结构的查找、插入、删除等操作给出了函数式建模及其严格的机械化验证, 保证操作的正确性和可靠性. 进一步, 提出一种匹配算法的通用验证规约, 旨在解决一系列的匹配算法正确性验证问题. 最后, 基于Trie+结构与匹配算法通用验证规约, 建模和验证了函数式中英文混合多模式匹配算法, 发现并解决了现有研究中的基于完全哈希Trie的多模式匹配算法的模式串前缀终止的Bug. 该Trie+结构以及验证规约在提高Trie结构空间利用率和验证匹配算法中, 有一定的理论和应用价值.

    Abstract:

    A Trie structure is a type of search tree that organizes information by search keywords and can be employed to efficiently store and search a collection of strings. Meanwhile, Nipkow et al. provided Isabelle modeling and verification for Trie implementation. However, there is a significant amount of redundancy in the Trie’s storage and operation, resulting in poor space utilization, and only the English single-mode lookup is considered. Therefore, based on the idea that the index is the key value, this study proposes the Trie+ structure, which can reduce storage space by 50% compared to the traditional structure of storing the index and key value separately, and greatly improve space utilization. Furthermore, the Trie+ structure’s lookup, insertion, and deletion operations are modeled as functions and rigorously mechanized to ensure their correctness and reliability. Additionally, a generalized verification protocol for matching algorithms is proposed to solve the correctness verification and problems of a series of matching algorithms. Finally, a functional Chinese-English hybrid multi-pattern matching algorithm is modeled and verified by the Trie+ structure and the matching algorithm’s universal verification protocol, and the Bug of prefix termination of pattern strings in multi-pattern matching algorithms of existing research based on the full hash Trie is discovered and solved. The proposed Trie+ structure and verification protocol have theoretical and application significance in improving the space utilization of the Trie structure and verifying the matching algorithm.

    参考文献
    [1] Fredkin E. Trie memory. Communications of the ACM, 1960, 3(9): 490–499.
    [2] Hinze R. Generalizing generalized tries. Journal of Functional Programming, 2000, 10(4): 327–351.
    [3] Ghuge S. Map and trie based compression algorithm for data transmission. In: Proc. of the 2nd Int’l Conf. on Innovative Mechanisms for Industry Applications (ICIMIA). Bangalore: IEEE, 2020. 137–141.
    [4] Diop L, Diop CT, Giacometti A, Soulet A. Trie-based output space itemset sampling. In: Proc. of the 2022 IEEE Int’l Conf. on Big Data (Big Data). Osaka: IEEE, 2022. 6–15.
    [5] Indira B, Valarmathi K, Devaraj D. A trie based IP lookup approach for high performance router/switch. In: Proc. of the 2019 IEEE Int’l Conf. on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS). Tamilnadu: IEEE, 2019. 1–6.
    [6] Song HZ, Sun WT, Zhang MZ, Lu Y. XML index algorithm based on Patricia tries. In: Proc. of the 2009 Int’l Conf. on Web Information Systems and Mining. Shanghai: IEEE, 2009. 289–293.
    [7] Gupta S, Garg ML. Binary trie coding scheme: An intelligent genetic algorithm avoiding premature convergence. Int’l Journal of Computer Mathematics, 2013, 90(5): 881–902.
    [8] Savnik I, Akulich M, Krnc M, ?krekovski R. Data structure set-trie for storing and querying sets: Theoretical and empirical analysis. PLoS ONE, 2021, 16(2): e0245122.
    [9] Tanhermhong T, Theeramunkong T, Chinnan W. A structure-shared trie compression method. In: Proc. of the 15th Pacific Asia Conf. on Language, Information and Computation. Hong Kong: City University of Hong Kong, 2001. 129–138.
    [10] Huet G. A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger. Journal of Functional Programming, 2005, 15(4): 573–614.
    [11] 王戟, 詹乃军, 冯新宇, 刘志明. 形式化方法概貌. 软件学报, 2019, 30(1): 33–61. http://www.jos.org.cn/1000-9825/5652.htm
    Wang J, Zhan NJ, Feng XY, Liu ZM. Overview of formal methods. Ruan Jian Xue Bao/Journal of Software, 2019, 30(1): 33–61. (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5652.htm
    [12] 江南, 李清安, 汪吕蒙, 张晓瞳, 何炎祥. 机械化定理证明研究综述. 软件学报, 2020, 31(1): 82–112. http://www.jos.org.cn/1000-9825/5870.htm
    Jiang N, Li QA, Wang LM, Zhang XT, He YX. Overview on mechanized theorem proving. Ruan Jian Xue Bao/Journal of Software, 2020, 31(1): 82–112 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5870.htm
    [13] Nipkow T, Wenzel M, Paulson LC. Isabelle/HOL: A Proof Assistant for Higher-order Logic. Berlin: Springer, 2002.
    [14] Lochbihler A, Nipkow T. Trie. Archive of Formal Proofs. 2015. https://www.isa-afp.org/browser_info/current/AFP/Trie/document.pdf
    [15] Nipkow T, Blanchette J, Eberl M, Gómez-Londo?o A, Lammich P, Sternagel C, Wimmer S, Zhan BH. Functional Algorithms, Verified! 2021. https://functional-algorithms-verified.org/
    [16] 孙钦东, 黄新波, 王倩. 面向中英文混合环境的多模式匹配算法. 软件学报, 2008, 19(3): 674–686. https://www.jos.org.cn/jos/article/abstract/20080318
    Sun QD, Huang XB, Wang Q. Multiple pattern matching on Chinese/English mixed texts. Ruan Jian Xue Bao/Journal of Software, 2008, 19(3): 674–686 (in Chinese with English abstract). https://www.jos.org.cn/jos/article/abstract/20080318
    [17] Connelly RH, Morris FL. A generalization of the trie data structure. Mathematical Structures in Computer Science, 1995, 5(3): 381–418.
    [18] Appel AW. Verified Functional Algorithms. Software Foundations Vol. 3. 2023. https://softwarefoundations.cis.upenn.edu/vfa-current
    [19] Appel AW, Leroy X. Efficient extensional binary tries. Journal of Automated Reasoning, 2023, 67(1): 8.
    [20] Wu S, Manber U. A fast algorithm for multi-pattern searching. Tucson, AZ: University of Arizona, 1994.
    [21] 沈洲, 王永成, 刘功申. 改进的中文字串多模式匹配算法. 情报学报, 2002, 21(1): 27–32.
    Shen Z, Wang YC, Liu GS. Improved multiple pattern algorithm for Chinese string matching. Journal of the China Society for Scientific and Technical Information, 2002, 21(1): 27–32 (in Chinese with English abstract).
    [22] Back RJ. Invariant based programming: Basic approach and teaching experiences. Formal Aspects of Computing, 2009, 21(3): 227–244.
    [23] Aho AV, Corasick MJ. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 1975, 18(6): 333–340.
    [24] Haftmann F, Bulwahn L, Nipkow T. Code generation from Isabelle/HOL theories. 2023. https://isabelle.in.tum.de/dist/Isabelle2023/doc/codegen.pdf
    [25] 赵永望. 函数式程序设计与证明. 2021. https://www.yuque.com/zhaoyongwang/fpp/
    Zhao YW. Functional programming and proof. 2021 (in Chinese). https://www.yuque.com/zhaoyongwang/fpp/
    [26] Gammie P. Verified synthesis of knowledge-based programs in finite synchronous environments. 2023. https://www.isa-afp.org/browser_info/current/AFP/KBPs/outline.pdf
    引证文献
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

左正康,柯雨含,黄箐,王玥坤,曾志城,王昌晶. Trie+结构函数式建模、机械化验证及其应用.软件学报,2024,35(9):4242-4264

复制
分享
文章指标
  • 点击次数:442
  • 下载次数: 2293
  • HTML阅读次数: 741
  • 引用次数: 0
历史
  • 收稿日期:2023-09-11
  • 最后修改日期:2023-10-30
  • 在线发布日期: 2024-01-05
  • 出版日期: 2024-09-06
文章二维码
您是第19757865位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号