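Optimized Topic Model for Clinical Pathway Mining
Authors: Xu Xiao, Jin Tao, Wang Jianmin
About the authors:

Xu Xiao (born 1990), male, from Ningguo, Anhui, Ph.D.; research interests: medical big data and data mining. Wang Jianmin (born 1968), male, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: big data and knowledge engineering, process data management and mining. Jin Tao (born 1980), male, Ph.D., assistant researcher; research interests: workflow, business process management, and medical big data.

Corresponding author:

Jin Tao, E-mail: jintao05@gmail.com

Funding:

National Natural Science Foundation of China (61325008); National Key Technology R&D Program of China (2015BAH14F02)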


    Abstract:

    In healthcare, the care process is critical to the quality of care. A clinical pathway (CP) consolidates medical knowledge and is an important means of standardizing the care process. However, most existing CPs are drawn up by experts with limited experience and data; as a result they tend to remain static and are hard to deploy and implement. In previous work, the authors proposed a topic-based CP mining algorithm that extracts historically executed pathways from clinical data and objectively reflects the care patterns actually present in that data. The algorithm first uses a topic model to aggregate the many clinical activities into a small number of topics, so that each clinical day is represented as a topic distribution and each patient's treatment log is converted into a topic sequence. A process mining method then generates a topic-based CP model from these topic sequences. However, the clustering produced by the conventional topic model, latent Dirichlet allocation (LDA), often fits the characteristics of medical data poorly, which lowers topic quality and harms the interpretability of the resulting process model. A common problem is that LDA cannot guarantee that two similar clinical days receive similar topic distributions, because it ignores the similarity that already exists between clinical days. This paper proposes an optimized topic model that incorporates an ontology-derived similarity constraint between clinical days, which effectively improves the clustering. Experiments on real data show that the proposed method discovers higher-quality topics that better match the characteristics of the medical domain, laying a foundation for topic-based CP mining.
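The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-day topic distributions are assumed to be given already (for example, from an LDA run), reducing each day to its most probable topic is a simplification chosen here, the directly-follows count stands in for whichever process mining method the paper uses, and the Jensen-Shannon penalty is only one hypothetical way to quantify how far similar days drift apart. All function names, the `patients` data, and the `similarity` weights are invented for illustration.

```python
from collections import Counter
from math import log2


def dominant_topics(day_topic_dists):
    """Map each clinical day's topic distribution to its most probable topic."""
    return [max(range(len(d)), key=d.__getitem__) for d in day_topic_dists]


def directly_follows(traces):
    """Count how often topic a is immediately followed by topic b."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def similarity_penalty(dists, similarity):
    """Sum of similarity-weighted divergences over pairs of clinical days.

    `similarity` maps a pair of day indices (i, j) to a weight in [0, 1];
    in the paper's setting such weights would come from domain knowledge,
    here they are hypothetical inputs.
    """
    return sum(w * js_divergence(dists[i], dists[j])
               for (i, j), w in similarity.items())


# Hypothetical topic distributions: one list of clinical days per patient.
patients = {
    "p1": [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]],
    "p2": [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],
}

traces = [dominant_topics(days) for days in patients.values()]
print(traces)  # [[0, 1, 2], [0, 2]]

dfg = directly_follows(traces)
print(dict(dfg))

# Two days judged similar (weight 0.9) and two judged dissimilar (weight 0.1).
days = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(similarity_penalty(days, {(0, 1): 0.9, (0, 2): 0.1}))
```

A penalty of this kind is small when days with high similarity weights have close topic distributions, which is the property the abstract says plain LDA does not guarantee. How the paper actually builds the constraint into the topic model is not stated in the abstract.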

Cite this article

Xu X, Jin T, Wang J. Optimized topic model for clinical pathway mining. Journal of Software (软件学报), 2018,29(11):3295-3305

History
  • Received: 2017-07-20
  • Last revised: 2017-09-16
  • Accepted: 2017-11-14
  • Published online: 2017-12-05