Abstract: The reproducibility of research results is a fundamental guarantee of scientific reliability and a cornerstone of scientific and technological progress. However, the research community currently faces a serious reproducibility crisis: many results published in top journals and conferences cannot be reproduced. In data science, reproducibility is further challenged by heterogeneous research data from multiple sources, complex computational processes, and intricate computing environments. To address these issues, this study proposes ReproLink, a reproducibility-oriented research data management system. ReproLink builds a unified model of research data that abstracts each item into a research data object consisting of three elements: an identifier, an attribute set, and a data entity. Through fine-grained modeling of the reproduction process, ReproLink provides a precise way to describe complex, multi-step reproduction workflows. By modeling code together with its runtime environment, ReproLink eliminates the uncertainty that differing environments introduce into code execution. Performance tests and case studies show that ReproLink performs well at data scales of up to one million records and has practical value in real-world scenarios such as paper reproduction and data provenance tracking. The technical architecture of ReproLink has been integrated into Conow Software, the only integrated, comprehensive management and service platform in China designed specifically for scientific research institutes, where it supports the reproducibility needs of hundreds of such institutes nationwide.
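The three-element research data object described above can be pictured as a simple record type. The Python sketch below is illustrative only: the class name `ResearchDataObject`, the helper `make_object`, and the content-derived `rdo:` identifier scheme are hypothetical assumptions, not ReproLink's actual design.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ResearchDataObject:
    """Hypothetical sketch: identifier + attribute set + data entity."""
    identifier: str                                           # persistent ID (scheme is an assumption)
    attributes: Dict[str, Any] = field(default_factory=dict)  # descriptive metadata (type, format, ...)
    entity: bytes = b""                                       # the raw data itself

    def checksum(self) -> str:
        # SHA-256 of the entity; lets a reproduced artifact be compared with the original
        return hashlib.sha256(self.entity).hexdigest()


def make_object(attributes: Dict[str, Any], entity: bytes) -> ResearchDataObject:
    # Hypothetical content-addressed identifier: "rdo:" + first 16 hex chars of SHA-256,
    # so identical data entities always receive the same identifier
    digest = hashlib.sha256(entity).hexdigest()
    return ResearchDataObject(
        identifier=f"rdo:{digest[:16]}",
        attributes=dict(attributes),
        entity=entity,
    )


if __name__ == "__main__":
    obj = make_object({"type": "dataset", "format": "csv"}, b"id,value\n1,42\n")
    print(obj.identifier, obj.attributes)
```

Deriving the identifier from the entity's hash is just one possible choice; it makes identical data resolve to the same object, which is convenient for provenance checks, but a real system might instead mint persistent identifiers such as DOIs.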