Survey on Log Research of Large Scale Software System
Author:
Affiliation:

Fund Project:

National Natural Science Foundation of China (61402496); National Program on Key Basic Research Project of China (973) (2014CB340703); Tencent Cooperation Projects in Universities of China

    Abstract:

    Standardized and sufficient logging is a necessary part of good code quality, and it plays an important role in failure diagnosis. Code quality management, however, is constrained by the high complexity of large-scale software, and reproducing and diagnosing system failures from logs remains difficult and inefficient. This paper surveys log-related work from three aspects: log characterization, log-based failure diagnosis, and log enhancement. Through a detailed study of several widely used open-source software systems, the paper presents a number of log-related observations, along with problems that existing tools have not handled well. Finally, it proposes several possible directions for future log-related work and analyzes the potential challenges.

Get Citation

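Liao Xiangke, Li Shanshan, Dong Wei, Jia Zhouyang, Liu Xiaodong, Zhou Shulin. Survey on log research of large scale software system. Journal of Software, 2016, 27(8): 1934-1947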

History
  • Received: June 17, 2015
  • Revised: October 27, 2015
  • Online: November 11, 2015