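National Natural Science Foundation of China (61902222); Special Fund for the Taishan Scholars Project of Shandong Province (ts20190936, tsqn201909109); Excellent Young Scholars Fund of the Natural Science Foundation of Shandong Province (ZR2021YQ45); Innovation Team Project of the Youth Innovation Science and Technology Program of Shandong Higher Education Institutions (2021KJ031); Leading Talent and Outstanding Research Team Program of Shandong University of Science and Technology (2015TDJH102)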
Remaining process time prediction is valuable for preventing and intervening in abnormal business operations. Existing approaches achieve higher accuracy in predicting the remaining time by using deep learning techniques. However, most deep models have complex structures whose prediction results are difficult to explain, i.e., they suffer from the unexplainability problem. In addition, besides the key attribute, namely the activity, remaining time prediction usually selects several other attributes as input features of the prediction model based on domain knowledge. A general feature selection method is missing, which affects both prediction accuracy and model explainability. To address these two challenges, this study proposes a remaining process time prediction framework based on an explainable feature-based hierarchical (EFH) model. Specifically, a feature self-selection strategy is first proposed: through priority-based backward feature deletion and feature-importance-based forward feature selection, it obtains the attributes that have a positive impact on the prediction task and uses them as the input features of the model. Then, an EFH model architecture is proposed, which adds different features layer by layer and obtains a prediction result at each layer, thereby explaining the relationship between feature values and prediction results. The proposed approach is instantiated with the light gradient boosting machine (LightGBM) and long short-term memory (LSTM) algorithms, but the framework is general and not limited to these algorithms. Finally, the proposed approach is compared with state-of-the-art methods on eight real-life event logs. The experimental results show that the proposed approach can select effective features, improve prediction accuracy, and explain the prediction results.
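The feature self-selection strategy is only summarized here, so the following is a minimal sketch of one possible reading, not the authors' algorithm. It assumes a hypothetical callback `error` that trains and evaluates a predictor (for example LightGBM) on a feature subset and returns a validation error such as MAE, per-attribute `priority` and `importance` values that are given in advance, and an activity attribute that is never removed. Backward deletion visits attributes from lowest to highest priority and drops one whenever that does not increase the error; forward selection then re-adds the survivors in descending importance, keeping each only if it strictly lowers the error. The toy error function and its numbers are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical callback: evaluates a predictor on a feature subset, lower is better.
ErrorFn = Callable[[Sequence[str]], float]


def backward_delete(features: Sequence[str], priority: Dict[str, float],
                    error: ErrorFn, keep: Set[str]) -> List[str]:
    """Visit features from lowest to highest priority; drop one if removal does not raise the error."""
    selected = list(features)
    best = error(selected)
    for f in sorted(features, key=lambda name: priority[name]):
        if f in keep:
            continue  # the key attribute (e.g. activity) is never deleted
        trial = [g for g in selected if g != f]
        err = error(trial)
        if err <= best:
            selected, best = trial, err
    return selected


def forward_select(candidates: Sequence[str], importance: Dict[str, float],
                   error: ErrorFn, keep: Set[str]) -> List[str]:
    """Start from the kept features; add the others by descending importance if they strictly lower the error."""
    selected = [f for f in candidates if f in keep]
    best = error(selected)
    rest = [f for f in candidates if f not in keep]
    for f in sorted(rest, key=lambda name: importance[name], reverse=True):
        trial = selected + [f]
        err = error(trial)
        if err < best:
            selected, best = trial, err
    return selected


def self_select(features: Sequence[str], priority: Dict[str, float],
                importance: Dict[str, float], error: ErrorFn,
                keep: Set[str] = frozenset({"activity"})) -> List[str]:
    """Backward deletion followed by forward selection over the surviving attributes."""
    survivors = backward_delete(features, priority, error, set(keep))
    return forward_select(survivors, importance, error, set(keep))


# Toy example with made-up numbers: error = 10 + sum of per-attribute effects.
EFFECTS = {"activity": -3.0, "resource": -2.0, "cost": 0.5, "weekday": -1.0}


def toy_error(subset: Sequence[str]) -> float:
    return 10.0 + sum(EFFECTS[f] for f in subset)


if __name__ == "__main__":
    chosen = self_select(
        ["activity", "resource", "cost", "weekday"],
        priority={"activity": 4, "resource": 3, "weekday": 2, "cost": 1},
        importance={"activity": 0.9, "resource": 0.6, "weekday": 0.3, "cost": 0.1},
        error=toy_error,
    )
    print(chosen)  # ['activity', 'resource', 'weekday']
```

In the toy run, `cost` is deleted because removing it lowers the error, while `resource` and `weekday` survive both passes. In a real setting, each call to `error` would retrain a model, so the number of evaluations (roughly one per attribute per pass) dominates the cost.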
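The layer-by-layer explanation idea can be illustrated without LightGBM or LSTM. The sketch below is an illustrative stand-in, not the EFH architecture itself: each hypothetical "layer" k predicts the remaining time as the mean target of training prefixes that share the values of the first k selected features (layer 0 is the global mean), and an unseen value combination falls back to the previous layer's prediction. The difference between consecutive layers is then read as the contribution of the feature added at that layer. The group-mean tables only keep the example dependency-free; a real instantiation would use trained models at each layer, and the data below is made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
Layer = Dict[Tuple[str, ...], float]


def fit_layers(rows: Sequence[Row], targets: Sequence[float],
               features: Sequence[str]) -> List[Layer]:
    """Layer k maps the values of the first k features to the mean remaining time (layer 0: global mean)."""
    layers: List[Layer] = []
    for k in range(len(features) + 1):
        groups: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
        for row, y in zip(rows, targets):
            groups[tuple(row[f] for f in features[:k])].append(y)
        layers.append({key: mean(ys) for key, ys in groups.items()})
    return layers


def explain(layers: List[Layer], features: Sequence[str],
            row: Row) -> Tuple[List[float], Dict[str, float]]:
    """Return the prediction of every layer and the change attributed to each added feature."""
    preds: List[float] = []
    for k, table in enumerate(layers):
        key = tuple(row[f] for f in features[:k])
        # Hypothetical fallback: an unseen value combination reuses the previous layer's prediction.
        preds.append(table.get(key, preds[-1] if preds else 0.0))
    contributions = {f: preds[k + 1] - preds[k] for k, f in enumerate(features)}
    return preds, contributions


if __name__ == "__main__":
    rows = [{"activity": "A", "resource": "r1"}, {"activity": "A", "resource": "r2"},
            {"activity": "B", "resource": "r1"}, {"activity": "B", "resource": "r1"}]
    targets = [10.0, 20.0, 30.0, 50.0]  # made-up remaining times
    feats = ["activity", "resource"]
    preds, contrib = explain(fit_layers(rows, targets, feats), feats,
                             {"activity": "A", "resource": "r2"})
    print(preds, contrib)  # [27.5, 15.0, 20.0] {'activity': -12.5, 'resource': 5.0}
```

Here activity `A` moves the estimate from the global mean 27.5 down to 15.0, and resource `r2` moves it back up to 20.0, so each layer's output gives a per-feature account of the final prediction.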