ReproLink: Reproducibility-oriented Research Data Management System
CLC number: TP311

Fund Project: Beijing Nova Program (Z211100002121159); State Key Laboratory of Data Space Technology and System

    Abstract:

    The reproducibility of research results is a fundamental guarantee of the reliability of scientific research and a cornerstone of scientific and technological progress. However, the research community currently faces a serious reproducibility crisis: many results published in top journals and conferences cannot be reproduced. In data science, reproducibility is further challenged by multi-source heterogeneous research data, complex computational workflows, and complex computing environments. To address these issues, this study proposes ReproLink, a reproducibility-oriented research data management system. ReproLink builds a unified model of research data, abstracting it as research data objects that consist of three elements: an identifier, an attribute set, and a data entity. Through fine-grained modeling of the reproduction process, ReproLink provides a precise way to describe complex, multi-step reproduction workflows. By modeling code together with its runtime environment, ReproLink eliminates the effect that environment-dependent differences in code execution behavior have on reproducing results. Performance tests and case studies show that ReproLink performs well at data scales on the order of one million records and is of practical value in real-world scenarios such as paper reproduction and provenance tracing of reproduction-related data. The technical architecture of ReproLink has been integrated into Conow Software, the only integrated comprehensive management and service platform in China dedicated to research institutes, where it supports the reproduction needs of hundreds of research institutions nationwide.
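    The three-element data object and the step-level process model described in the abstract can be illustrated with a minimal sketch. This is not ReproLink's implementation; it only shows one way such a model could look under stated assumptions. All names (ResearchDataObject, Step, make_object, execution_order, trace), the content-hash identifier scheme with its "rdo:" prefix, and the representation of a runtime environment as a plain string are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from hashlib import sha256
from typing import Any


@dataclass(frozen=True)
class ResearchDataObject:
    """Hypothetical research data object: identifier + attribute set + data entity."""
    identifier: str                                           # stable identifier
    attributes: dict[str, Any] = field(default_factory=dict)  # attribute set (metadata)
    entity: bytes = b""                                       # the data entity (raw content)


def make_object(entity: bytes, **attributes: Any) -> ResearchDataObject:
    """Assumption: the identifier is derived from a hash of the entity's content."""
    digest = sha256(entity).hexdigest()[:16]
    return ResearchDataObject(f"rdo:{digest}", dict(attributes), entity)


@dataclass(frozen=True)
class Step:
    """One step of a reproduction workflow; code and environment are bound together."""
    name: str
    code: str                 # hypothetical: a script path or code object ID
    environment: str          # hypothetical: an environment/image reference
    inputs: tuple[str, ...]   # identifiers of consumed data objects
    outputs: tuple[str, ...]  # identifiers of produced data objects


def execution_order(steps: list[Step]) -> list[str]:
    """Order steps so each runs after the steps producing its inputs (raises on cycles)."""
    producer = {out: s.name for s in steps for out in s.outputs}
    # Inputs with no producing step are external and impose no ordering constraint.
    graph = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}
    return list(TopologicalSorter(graph).static_order())


def trace(steps: list[Step], target: str) -> set[str]:
    """Provenance: identifiers of every data object the target transitively derives from."""
    producer = {out: s for s in steps for out in s.outputs}
    upstream: set[str] = set()
    stack = [target]
    while stack:
        step = producer.get(stack.pop())
        if step is None:  # external input: nothing further upstream
            continue
        for oid in step.inputs:
            if oid not in upstream:
                upstream.add(oid)
                stack.append(oid)
    return upstream


if __name__ == "__main__":
    raw = make_object(b"x,y\n1,2\n", kind="dataset", source="survey")
    steps = [
        Step("clean", "clean.py", "env-a", (raw.identifier,), ("cleaned",)),
        Step("train", "train.py", "env-b", ("cleaned",), ("model",)),
        Step("eval", "eval.py", "env-b", ("model", "testset"), ("metrics",)),
    ]
    print(execution_order(steps))  # ['clean', 'train', 'eval']
    print(sorted(trace(steps, "metrics")))
```

    In this sketch, each step carries its environment as part of its own definition, so a workflow description cannot refer to code without also fixing where that code runs; pinning the environment by an immutable reference, such as a container image digest, is one common way to make that binding stable. Tracing "metrics" walks back through the producing steps and returns the model, the cleaned data, the external test set, and the raw dataset's identifier.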

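Cite this article:

Huang Xiaolong, Yang Jingru, Liu Yi, Ma Yun, Jing Xiang, Huang Gang. ReproLink: Reproducibility-oriented Research Data Management System. Journal of Software, ,(): 1-20.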

History:
  • Received: 2024-08-20
  • Last revised: 2024-10-10
  • Published online: 2025-05-22
Copyright: Institute of Software, Chinese Academy of Sciences