    Abstract:

    Dynamic changes in the service environment affect fault diagnosis algorithms. To reduce this impact, this paper analyzes the challenges of fault diagnosis in a dynamic environment. A multi-layer management model is presented to model the service system, a bipartite Bayesian network is chosen to model the dependency relationships, and a binary symmetric channel is chosen to model noise. To deal with the dynamic fault set caused by the fault recovery mechanism, the prior fault probability is modified based on statistics of fault persistence time. To deal with the dynamic model, an expected model is built from the times at which symptoms are observed and the original models in the current window. Simulation results show that this fault diagnosis algorithm is efficient in a dynamic Internet service environment.
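
The static part of the model named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the noisy-OR combination of fault causes, the exhaustive enumeration of fault sets, and all names and numbers (`priors`, `edges`, `flip`, the fault and symptom labels) are assumptions made for illustration. It shows only how a bipartite fault-to-symptom Bayesian network and a binary symmetric channel for observation noise combine into a posterior over fault hypotheses; the adjustment of priors by fault persistence time and the expected model over a window are not covered.

```python
from itertools import product


def symptom_prob(active_faults, causes):
    """P(true symptom present | active faults), assuming a noisy-OR.

    causes maps each parent fault to the probability that it alone
    triggers the symptom (edge weights of the bipartite network).
    """
    p_none = 1.0
    for fault, p in causes.items():
        if fault in active_faults:
            p_none *= 1.0 - p
    return 1.0 - p_none


def bsc(p_true, observed, flip):
    """Likelihood of the observed bit after a binary symmetric channel.

    The true symptom bit is flipped with probability `flip` before
    it reaches the manager (lost or spurious alarms).
    """
    p_obs_one = p_true * (1.0 - flip) + (1.0 - p_true) * flip
    return p_obs_one if observed else 1.0 - p_obs_one


def posterior(priors, edges, observations, flip):
    """Posterior over all fault sets by exhaustive enumeration.

    Exponential in the number of faults, so only suitable for toy sizes.
    """
    faults = sorted(priors)
    scores = {}
    for bits in product([0, 1], repeat=len(faults)):
        active = frozenset(f for f, b in zip(faults, bits) if b)
        score = 1.0
        for f in faults:
            score *= priors[f] if f in active else 1.0 - priors[f]
        for symptom, obs in observations.items():
            score *= bsc(symptom_prob(active, edges[symptom]), obs, flip)
        scores[active] = score
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


# Hypothetical two-fault, two-symptom system.
priors = {"db": 0.01, "web": 0.05}
edges = {"login_slow": {"db": 0.9, "web": 0.5}, "page_error": {"web": 0.8}}
observations = {"login_slow": 1, "page_error": 0}
result = posterior(priors, edges, observations, flip=0.05)
best = max(result, key=result.get)
```

With low priors and a nonzero flip probability, the empty fault set can remain the most probable hypothesis, since a single positive symptom is then more plausibly explained by channel noise than by a rare fault.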

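    Citation:

    Chu Lingwei, Zou Shihong, Cheng Shiduan, Tian Chunqi, Wang Wendong. A fault diagnosis algorithm for Internet services in a dynamic environment. Journal of Software (软件学报), 2009,20(9):2520-2530.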

History
  • Received: June 23, 2008
  • Revised: August 21, 2008