国家自然科学基金(61872339); 福建省自然科学基金(2021J01316, 2021J01320); 中央高校基本科研业务费专项资金(ZQN-1010); 厦门市自然科学基金(3502Z20227191); 上海市自然科学基金(22ZR1422200)
正则表达式在计算机科学的许多领域具有广泛应用. 然而, 由于正则表达式语法比较复杂, 并且允许使用大量元字符, 导致开发人员在定义和使用时容易出错. 测试是保证正则表达式语义正确性的实用和有效手段, 常用的方法是根据被测表达式生成一些字符串, 并检查它们是否符合预期. 现有的测试数据生成大多只关注正例串, 而研究表明, 实际开发中存在的错误大部分在于定义的语言比预期语言小, 这类错误只能通过反例串才能发现. 研究基于变异的正则表达式反例测试串生成. 首先通过变异向被测表达式中注入缺陷得到一组变异体, 然后在被测表达式所定义语言的补集中选取反例字符串揭示相应变异体所模拟的错误. 为了能够模拟复杂缺陷类型, 以及避免出现变异体特化而无法获得反例串的问题, 引入二阶变异机制. 同时采取冗余变异体消除、变异算子选择等优化技术对变异体进行约简, 从而控制最终生成的测试集规模. 实验结果表明, 与已有工具相比, 所提算法生成的反例测试串规模适中, 并且具有较强的揭示错误能力.
Regular expressions are widely used in various areas of computer science. However, due to the complex syntax and the use of a large number of meta-characters, regular expressions are quite error-prone when defined and used by developers. Testing is a practical and effective way to ensure the semantic correctness of regular expressions. The most common method is to generate a set of character strings according to the tested expression and check whether they comply with the intended language. Most of the existing test data generation focuses only on positive strings. However, empirical study shows that a majority of errors during actual development are manifested by the fact that the defined language is smaller than the intended one. In addition, such errors can only be detected by negative strings. This study investigates the generation of negative strings from regular expressions based on mutation. The study first obtains a set of mutants by injecting defects into the tested expression through mutation and then selects a negative character string in the complementary set of the language defined by the tested expression to reveal the error simulated by the corresponding mutant. In order to simulate complex defects and avoid the problem that the negative strings cannot be obtained due to the specialization of mutants, a second-order mutation mechanism is adopted. Meanwhile, optimization techniques such as redundant mutant elimination and mutation operator selection are used to reduce the mutants, so as to control the size of the finally generated test set. The experimental results show that the proposed algorithm can generate negative test strings with a moderate size and have strong error detection ability compared with the existing tools.