机器学习方法可很好地与软件测试相结合, 增强测试效果, 但少有学者将其运用于测试数据生成方面. 为进一步提高测试数据生成效率, 提出一种结合SVM (support vector machine)和XGBoost (extreme gradient boosting)的链式模型, 并基于此模型借助遗传算法实现多路径测试数据生成. 首先, 利用一定样本训练若干个用于预测路径节点状态的子模型(SVM和XGBoost), 通过子模型的预测精度值筛选最优子模型, 并根据路径节点顺序将其依次链接, 形成一个链式模型C-SVMXGBoost (chained SVM and XGBoost). 在利用遗传算法生成测试用例时, 使用训练好的链式模型代替插桩法获取测试数据覆盖路径(预测路径), 寻找预测路径与目标路径相似的路径集, 对存在相似路径集的预测路径进行插桩验证, 获取精确路径, 计算适应度值. 在交叉变异过程中引入样本集中路径层级深度较大的优秀测试用例进行重用, 生成覆盖目标路径的测试数据. 最后, 保留进化生成中产生的适应度较高的个体, 更新链式模型C-SVMXGBoost, 进一步提高测试效率. 实验表明, C-SVMXGBoost较其他各对比链式模型更适合解决路径预测问题, 可提高测试效率. 并且通过与已有经典方法相比, 所提方法在覆盖率上提高可达15%, 平均进化代数也有所降低, 在较大规模程序上其降低百分比可达65%.
Machine learning methods can be well combined with software testing to enhance test effect, but few scholars have applied it to test data generation. In order to further improve the efficiency of test data generation, a chained model combining support vector machine (SVM) and extreme gradient boosting (XGBoost) is proposed, and multi-path test data generation is realized by a genetic algorithm based on the chained model. Firstly, this study uses certain samples to train several sub-models (i.e., SVM and XGBoost) for predicting the state of path nodes, filters the optimal sub-models based on the prediction accuracy value of the sub-models, and links the optimal sub-models in sequence according to the order of the path nodes, so as to form a chained model, namely chained SVM and XGBoost (C-SVMXGBoost). When using the genetic algorithm to generate test cases, the study makes use of the chained model that is trained instead of the instrumentation method to obtain the test data coverage path (i.e., predicted path), finds the path set with the predicted path similar to the target path, performs instrumentation verification on the predicted path with similar path sets, obtains accurate paths, and calculates fitness values. In the crossover and mutation process, excellent test cases with a large path level depth in the sample set are introduced for reuse to generate test data covering the target path. Finally, individuals with higher fitness during the evolutionary generation are saved, and C-SVMXGBoost is updated, so as to further improve the test efficiency. Experiments show that C-SVMXGBoost is more suitable for solving the path prediction problem and improving the test efficiency than other chained models. Moreover, compared with the existing classical methods, the proposed method can increase the coverage rate by up to 15%. The mean evolutionary algebra is also reduced, and the reduction percentage can reach 65% on programs of large size.