Tu Feifei, Zhou Minghui. Data Quality Problems in Software Development Activity Data. Journal of Software, 2019, 30(5): 1522-1531.
Data Quality Problems in Software Development Activity Data
Submitted: 2018-08-31; Revised: 2018-10-31
DOI: 10.13328/j.cnki.jos.005727
Keywords: data quality; data production; data collection; data use; issue tracking data; version control data
Funding: National Key Research and Development Program of China (2018YFB1004201); National Natural Science Foundation of China (61432001, 61825201)
Abstract:
Software development support tools, such as issue tracking systems (ITS) and version control systems (VCS), are widely used in both open source and commercial software development, and they produce large amounts of data, referred to as software development activity data. This data is widely used in research and development practice and serves as a basis for intelligent development. However, data quality, which strongly affects the related research and practice, has not yet received sufficient attention. To alert data users to latent data quality problems, this study draws on a literature review, interviews with data users, and the authors' own experience analyzing such data to summarize nine data quality problems. These problems span three phases: data production, data collection, and data use. Furthermore, methods are proposed to help find and solve these problems. Finding problems means building a clear understanding of the data's context and using statistical analysis and data visualization to detect latent quality problems. Solving problems means correcting specific problems with redundant data, or improving data quality by mining user behavior patterns.
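The abstract names two kinds of remedies: detecting problems by examining the data, and correcting them with redundant data. The Python sketch below illustrates one simple instance of each under stated assumptions; it is not the authors' method. The record types `Commit` and `Issue`, the functions `suspicious_timestamps` and `recover_links`, and the convention that commit messages cite issues as `#<id>` are all hypothetical. The first function is a basic range check that flags commit dates outside a plausible project window. The second treats issue references in commit messages as redundant data that can fill in commit links missing from the issue tracker.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical record types; real ITS/VCS exports carry many more fields.
@dataclass
class Commit:
    sha: str
    author_email: str
    timestamp: datetime
    message: str

@dataclass
class Issue:
    issue_id: int
    linked_commits: set[str]  # links recorded in the ITS; may be incomplete

# Assumed convention: commit messages reference issues as "#<id>".
ISSUE_REF = re.compile(r"#(\d+)")

def suspicious_timestamps(commits, project_start, now):
    """Finding: flag commits dated before the project started or in the future."""
    return [c.sha for c in commits if not (project_start <= c.timestamp <= now)]

def recover_links(commits, issues):
    """Solving: use issue IDs in commit messages (redundant data) to find
    links missing from the ITS. Returns {issue_id: set of recovered shas}."""
    by_id = {i.issue_id: i for i in issues}
    recovered = {}
    for c in commits:
        for ref in ISSUE_REF.findall(c.message):
            issue = by_id.get(int(ref))
            # Skip references to unknown issues and links the ITS already has.
            if issue is not None and c.sha not in issue.linked_commits:
                recovered.setdefault(issue.issue_id, set()).add(c.sha)
    return recovered

# Illustrative data only.
start = datetime(2015, 1, 1, tzinfo=timezone.utc)
now = datetime(2019, 1, 1, tzinfo=timezone.utc)
commits = [
    Commit("a1", "dev@example.org", datetime(2016, 3, 1, tzinfo=timezone.utc), "Fix crash, see #12"),
    Commit("b2", "dev@example.org", datetime(1970, 1, 1, tzinfo=timezone.utc), "Refactor parser"),
    Commit("c3", "ops@example.org", datetime(2017, 5, 2, tzinfo=timezone.utc), "Closes #12 and #99"),
]
issues = [Issue(12, {"a1"})]

print(suspicious_timestamps(commits, start, now))  # ['b2']
print(recover_links(commits, issues))              # {12: {'c3'}}
```

A recovered link is only a candidate: a message such as "unrelated to #12" would also match the pattern, so in practice recovered links would still need manual or heuristic validation.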