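Research on Software Misconfiguration Troubleshooting
(软件配置错误诊断与修复技术研究)
Authors: Chen Wei, Huang Xiang, Qiao Xiaoqiang, Wei Jun, Zhong Hua
Funding:

National Natural Science Foundation of China (61402453); National High Technology Research and Development Program of China (863 Program) (2013AA041301); National Key Technology R&D Program of China (2013BAH05F03, 2012BAH14B02)
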
    Abstract:

    The diversity, complexity, flexibility, and high customizability of software make it difficult to configure systems correctly, and configuration errors have become one of the key problems affecting service quality, often leading to system failures and service outages. To improve the availability and reliability of complex application systems, many researchers and institutions have worked on techniques for detecting, diagnosing, and repairing configuration errors. To survey the state of the art systematically, this paper builds an analytical framework that classifies, summarizes, and evaluates the main research work in the field from multiple perspectives across three aspects: method type, style, and applicability. Based on the results of this analysis, the paper summarizes the shortcomings of current research on software misconfiguration and outlines trends for future research, offering guidance for continued and deeper study.

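Cite this article

Chen W, Huang X, Qiao XQ, Wei J, Zhong H. Research on software misconfiguration troubleshooting. Ruan Jian Xue Bao/Journal of Software, 2015,26(6):1285-1305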

History
  • Received: 2014-05-13
  • Last revised: 2015-02-06
  • Published online: 2015-06-04