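LLM-Extractor: LLM-based Constraints Extraction Method among Software Configurations

CLC number: TP311

Funding: National Key Research and Development Program of China (2023YFB4503805); Natural Science Basic Research Program of Shaanxi Province (2025JC-YBQN-851)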


Abstract:

Software configuration is a crucial component of software systems and plays an important role in making software functionality more diverse and flexible. As software systems grow more complex, the intricate constraints among configuration options have become a persistent problem for operations staff. Researchers have therefore proposed configuration constraint extraction methods that draw on different data sources and techniques to identify these constraints. However, existing methods are hard to apply across multiple programming languages, are limited in the scale they can analyze, and depend heavily on high-quality annotated data. To address these problems, this study proposes LLM-Extractor, a method based on large language models (LLMs) for extracting constraints among configurations. The method has two parts: construction of a configuration-function association graph, and configuration constraint inference based on multi-configuration association subgraphs. In the graph construction phase, LLM-Extractor uses the text understanding and analysis capabilities of LLMs to identify entities related to configurations and software functions in configuration text and to extract several types of association between them. In the constraint inference phase, LLM-Extractor searches the existing configuration-function association graph for multi-configuration association subgraphs and uses the information in each subgraph to guide the LLM in inferring constraints among configurations. This subgraph-based inference lets LLM-Extractor extract constraints that are transmitted through software function states, which fills a gap left by existing methods. The method is also insensitive to programming language and can analyze a large number of configuration options.

The method is evaluated on the configuration documentation of three open-source software systems, covering more than 1,400 configuration options. Experimental results show that LLM-Extractor significantly outperforms existing text analysis methods, improving the F1 score by at least 43.4%. Ablation experiments further show that multi-configuration association subgraphs have an important positive effect on configuration constraint inference.
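The two-stage idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not give the graph schema, the relation types, the subgraph search strategy, or the prompt wording. The configuration names, the `config:`/`function:` prefixes, the hard-coded triples (which an LLM would extract from documentation in the real method), and the helpers `build_graph`, `shared_function_subgraphs`, and `render_prompt` are all hypothetical. The sketch only searches for the simplest kind of subgraph, two configurations linked through one shared function node, and builds a prompt without calling any model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (head, relation, tail) triples. In the method described above,
# an LLM would extract such triples from configuration documentation.
TRIPLES = [
    ("config:ssl.enabled", "enables", "function:TLS transport"),
    ("config:ssl.keystore.path", "required_by", "function:TLS transport"),
    ("config:cache.size", "limits", "function:query cache"),
]


def is_config(node):
    """Return True for configuration nodes (assumed 'config:' prefix)."""
    return node.startswith("config:")


def build_graph(triples):
    """Build an undirected adjacency map that keeps each original triple."""
    graph = defaultdict(dict)
    for head, relation, tail in triples:
        graph[head][tail] = (head, relation, tail)
        graph[tail][head] = (head, relation, tail)
    return graph


def shared_function_subgraphs(graph):
    """Yield (config_a, config_b, edges) for configs that share a function node."""
    configs = sorted(node for node in graph if is_config(node))
    for a, b in combinations(configs, 2):
        for func in sorted(set(graph[a]) & set(graph[b])):
            if not is_config(func):
                yield a, b, [graph[a][func], graph[b][func]]


def render_prompt(a, b, edges):
    """Turn one subgraph into a plain-text prompt (wording is illustrative)."""
    facts = "\n".join(f"- {h} --{r}--> {t}" for h, r, t in edges)
    return (
        f"Known relations:\n{facts}\n"
        f"Question: is there a constraint between {a} and {b}?"
    )


if __name__ == "__main__":
    for a, b, edges in shared_function_subgraphs(build_graph(TRIPLES)):
        print(render_prompt(a, b, edges))
```

In this toy input, only the two `ssl.*` options share a function node, so only that pair is turned into a prompt. The unrelated `cache.size` option is never paired with them, which is the sense in which a subgraph search narrows what the model is asked to reason about.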

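Cite this article

Zhang Tianyi (张添翼), Zhou Tong (周彤), Zhang Chenxi (张晨曦), Peng Xin (彭鑫). LLM-Extractor: LLM-based Constraints Extraction Method among Software Configurations. Journal of Software (软件学报): 1-29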

History
  • Received: 2024-12-11
  • Last revised: 2025-03-17
  • Published online: 2025-11-05