Abstract: The gradient boosting decision tree (GBDT) algorithm is widely used in tasks such as classification, regression, and ranking owing to its high accuracy and strong interpretability. With the explosive growth of data volume, distributed GBDT algorithms have become an important research topic. Although a series of distributed GBDT implementations exists, they perform poorly on high-dimensional and multi-classification tasks: the data-parallel strategy they adopt requires transmitting gradient histograms, and this communication overhead becomes the bottleneck in many such tasks. This study addresses this problem by seeking a parallel strategy better suited to high-dimensional and multi-classification workloads. Data-parallel and feature-parallel strategies are first compared using a cost model, which shows theoretically that feature parallelism is more suitable for these tasks. Based on this analysis, this paper proposes a feature-parallel distributed GBDT algorithm named FP-GBDT. FP-GBDT designs an efficient distributed dataset transposition method to partition the training dataset by column; during gradient histogram construction, it uses a sparsity-aware method to accelerate histogram building; and when splitting tree nodes, it develops a bitmap compression method to transmit the placement of instances, thereby reducing communication overhead. Extensive experiments compare the performance of distributed GBDT under different parallel strategies. First, the effectiveness of the optimization methods proposed in FP-GBDT is verified; then FP-GBDT is compared with XGBoost, a representative data-parallel implementation. Results on various datasets show that FP-GBDT is more efficient on high-dimensional and multi-classification tasks, achieving up to a 6x performance improvement over data-parallel implementations.
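
To illustrate the intuition behind the bitmap compression step mentioned in the abstract, the following minimal Python sketch packs each instance's left/right split decision into a single bit before transmission, rather than sending a multi-byte node identifier per instance. The function names and representation here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def encode_placement(goes_left: np.ndarray) -> bytes:
    """Pack per-instance split decisions (True = left child) into a
    bitmap: 1 bit per instance instead of, e.g., a 4-byte node id."""
    return np.packbits(goes_left.astype(np.uint8)).tobytes()

def decode_placement(bitmap: bytes, num_instances: int) -> np.ndarray:
    """Recover the boolean placement vector on a receiving worker."""
    bits = np.unpackbits(np.frombuffer(bitmap, dtype=np.uint8))
    return bits[:num_instances].astype(bool)

# Example: for 1,000,000 instances the bitmap is ~125 KB,
# versus ~4 MB if each placement were sent as an int32 id.
placement = np.random.rand(1_000_000) < 0.5
message = encode_placement(placement)
assert np.array_equal(decode_placement(message, placement.size), placement)
```

Under these assumptions, the message size shrinks by roughly 32x relative to sending one 32-bit identifier per instance, which is the kind of communication saving the abstract attributes to bitmap compression during node splitting.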