Abstract: Compared with traditional software, deep learning software has a different structure. Even when a large amount of test data is used to test deep learning software, the adequacy of testing is still hard to evaluate, and many unknown defects may remain hidden. The deep forest is an emerging deep learning model that overcomes many shortcomings of deep neural networks, which, for example, require large amounts of training data, a high-performance computing platform, and many hyperparameters. However, there has been no research on testing deep forests. Based on the structural characteristics of deep forests, this study proposes a set of test coverage criteria: random forest node coverage (RFNC), random forest leaf coverage (RFLC), cascade forest class coverage (CFCC), and cascade forest output coverage (CFOC). It further proposes DeepRanger, a coverage-oriented test data generation method based on a genetic algorithm, which automatically generates new test data and effectively improves the model coverage achieved by the test data. Experiments are conducted on the MNIST data set and gcForest, an open-source deep forest project. The experimental results show that the four proposed coverage criteria can effectively evaluate the adequacy of a test data set for a deep forest model. In addition, compared with a genetic algorithm based on random selection, DeepRanger, which is guided by coverage information, can improve the test coverage of the deep forest model under test.
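The abstract names the random-forest-level criteria but does not define them formally. The sketch below is one plausible reading, written as an assumption rather than the paper's definition:

- **RFNC** is taken as the fraction of all decision-tree nodes in a forest that at least one test input visits.
- **RFLC** is taken as the fraction of leaves that at least one test input reaches.

The `Node` class, `forest_coverage` function, and toy tree are hypothetical illustrations and are not part of gcForest's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical binary decision-tree node; feature=None marks a leaf."""
    node_id: int
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def decision_path(root: Node, x: Sequence[float]) -> List[int]:
    """Return the ids of the nodes visited when x is routed through the tree."""
    path = []
    node = root
    while node is not None:
        path.append(node.node_id)
        if node.is_leaf():
            break
        node = node.left if x[node.feature] <= node.threshold else node.right
    return path


def all_nodes(root: Node) -> List[Node]:
    """Collect every node of the tree with an iterative depth-first walk."""
    stack, out = [root], []
    while stack:
        n = stack.pop()
        out.append(n)
        if not n.is_leaf():
            stack.extend([n.left, n.right])
    return out


def forest_coverage(forest: List[Node], tests: List[Sequence[float]]) -> Tuple[float, float]:
    """Assumed RFNC and RFLC: covered nodes / all nodes, covered leaves / all leaves."""
    covered_nodes, covered_leaves = set(), set()
    total_nodes = total_leaves = 0
    for t, root in enumerate(forest):
        nodes = all_nodes(root)
        total_nodes += len(nodes)
        total_leaves += sum(1 for n in nodes if n.is_leaf())
        for x in tests:
            path = decision_path(root, x)
            # Tag node ids with the tree index so ids from different trees never collide.
            covered_nodes.update((t, nid) for nid in path)
            covered_leaves.add((t, path[-1]))
    return len(covered_nodes) / total_nodes, len(covered_leaves) / total_leaves


# Toy one-tree "forest": 5 nodes, 3 leaves (ids 1, 3, 4).
tree = Node(0, feature=0, threshold=0.5,
            left=Node(1),
            right=Node(2, feature=1, threshold=0.5, left=Node(3), right=Node(4)))
tests = [[0.2, 0.9], [0.8, 0.1]]  # paths 0->1 and 0->2->3
rfnc, rflc = forest_coverage([tree], tests)
print(f"RFNC = {rfnc:.2f}, RFLC = {rflc:.2f}")  # RFNC = 0.80, RFLC = 0.67
```

Under this reading, a coverage-guided generator such as the one the abstract describes could use the increase in these ratios as a fitness signal. Leaf 4 is never reached here, so an input with both features above 0.5 would raise both metrics.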