Abstract: The Hadoop Distributed File System (HDFS) is designed for storing and managing large files, whereas storing and processing a large number of small files consumes excessive NameNode memory and increases access time; the small-file problem has therefore become an important factor restricting HDFS performance. To address the massive numbers of small files in multi-modal medical data, a small-file storage method based on two-layer hash coding and HBase is proposed to optimize their storage on HDFS. When small files are merged, an extendible hash function is used to build index-file buckets, so the index file can grow dynamically as needed and file appends are supported. To read a file in O(1) time and improve lookup efficiency, the MWHC minimal perfect hash function is used to record the position of each file's index entry within the index file; a lookup therefore reads only the index entries of the corresponding bucket rather than the index information of every file. To meet the storage needs of multi-modal medical data, HBase is used to store the index information, with an identification column that distinguishes the different modalities; this simplifies the storage and management of multi-modal data and improves file reading speed. To further optimize storage performance, an LRU-based metadata prefetching mechanism is established, and the LZ4 compression algorithm is used to compress the merged files. Experiments compare file access performance and NameNode memory usage. The results show that, compared with native HDFS and the HAR, MapFile, TypeStorage, and HPF small-file storage methods, the proposed method achieves shorter file access times and improves the overall performance of HDFS when processing massive numbers of small files in multi-modal medical data.
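
The first hash layer described above corresponds to classic extendible hashing, in which a directory of bucket pointers doubles on demand so the index can grow incrementally. The following is a minimal illustrative sketch of that idea, not the paper's implementation: the names (ExtendibleIndex, Bucket, stable_hash, bucket_capacity), the (offset, length) entry format, and the use of MD5 as the hash are assumptions for illustration, and the MWHC second layer is not shown.

    import hashlib

    def stable_hash(name):
        # Assumption: any uniform hash works; MD5 is used only for a stable example.
        return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")

    class Bucket:
        def __init__(self, depth):
            self.local_depth = depth
            self.entries = {}  # file name -> (offset, length) in the merged file

    class ExtendibleIndex:
        def __init__(self, bucket_capacity=4):
            self.global_depth = 1
            self.capacity = bucket_capacity
            self.directory = [Bucket(1), Bucket(1)]

        def _bucket(self, name):
            # The low global_depth bits of the hash select the directory slot.
            return self.directory[stable_hash(name) & ((1 << self.global_depth) - 1)]

        def get(self, name):
            return self._bucket(name).entries.get(name)

        def put(self, name, entry):
            bucket = self._bucket(name)
            if name in bucket.entries or len(bucket.entries) < self.capacity:
                bucket.entries[name] = entry
                return
            self._split(bucket)
            self.put(name, entry)  # retry; the target bucket now has room

        def _split(self, bucket):
            if bucket.local_depth == self.global_depth:
                self.directory += self.directory  # double the directory
                self.global_depth += 1
            bucket.local_depth += 1
            sibling = Bucket(bucket.local_depth)
            # Slots whose newly significant bit is 1 are re-pointed to the sibling.
            for i, b in enumerate(self.directory):
                if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                    self.directory[i] = sibling
            old_entries, bucket.entries = bucket.entries, {}
            for k, v in old_entries.items():
                self._bucket(k).entries[k] = v

    idx = ExtendibleIndex()
    idx.put("ct_0001.dcm", (0, 51200))  # hypothetical DICOM file's index entry
    print(idx.get("ct_0001.dcm"))       # -> (0, 51200)

Because only the overflowing bucket is split and at most the directory array is doubled, appending a new small file touches a single bucket's index entries, which is consistent with the per-bucket reads the abstract describes.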