Survey on Storage and Optimization Techniques of HDFS
Author:
Affiliation:

CLC Number:

TP311

Fund Project:

National Key Research and Development Program of China (2018YFB1004401); National Natural Science Foundation of China (U1711261, 61432006, 61732014)

    Abstract:

    As an append-only, read-optimized, open-source distributed file system, HDFS (Hadoop Distributed File System) provides portability, high fault tolerance, and massive horizontal scalability. Over the past decade, HDFS has been widely used for big data storage, managing diverse data such as text, graphs, and key-value pairs. Moreover, big data systems built on or compatible with HDFS have become prevalent in many application scenarios, including complex SQL analysis, ad hoc queries, interactive analysis, key-value storage, and iterative computation. HDFS has thus become the universal underlying file system for storing massive data and supporting a wide range of analytical applications. Optimizing the storage performance and data access efficiency of HDFS is therefore of great significance. This study summarizes the principles and features of HDFS and surveys its storage and optimization techniques along three dimensions: logical file structure, hardware, and application scenarios. It also proposes that storage over heterogeneous hardware, workload-guided adaptive storage optimization, and storage optimization combined with machine learning could be the most appealing directions for future research.

Get Citation

Jin Guodong, Bian Haoqiong, Chen Yueguo, Du Xiaoyong. Survey on Storage and Optimization Techniques of HDFS. Journal of Software, 2020, 31(1): 137-161

History
  • Received: January 17, 2019
  • Revised: March 11, 2019
  • Adopted:
  • Online: August 12, 2019
  • Published: January 06, 2020