桶外排序算法的抽样分点分发策略
作者:
基金项目:

Supported by the National Natural Science Foundation of China under Grant Nos.60223004,60321002,60303005(国家自然科学基金)


The Sample-Seperators Based Distributing Scheme of the External Bucket Sort Algorithm
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献 [14]
  • |
  • 相似文献 [20]
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    计算机外排序常用二阶段多路归并算法和桶算法.后者运算开销小,效率更高.但基于关键字高位比特的子文件分发策略应用受限:关键字必须是整数;得到的子文件可能大小不一;子文件数不能任意选择.基于统计学理论,提出抽样分点分发策略克服以上问题,扩展桶排序的应用范围.讨论了抽样分点估计的收敛性,给出了不发生内存溢出的保证概率.该策略使桶排序算法在SheenkSort排序系统上得到成功应用,并最终获得2003年度PennySort世界排序比赛Indy组冠军.

    Abstract:

    Two ways to sort externally are Multi-Line Merging Sort and Bucket Sort, both with two passes. The Bucket Sort burdens the CPU less and is more efficient, while its usage is restricted heavily by the High-Bit scheme that distributes records into subfiles: the keys have to be integers; the sizes of subfiles may vary too much; the number of subfiles cannot be chosen freely. Based on statistical theory, this paper presents a sample-seperators scheme to broaden the ussage of bucket sort algorithm. A brief discussion on the convergance of sample-seperator estimation is given and the probability to avoid memory overflow is calculated. This scheme enables the bucket sort algorithm to be applied in the SheenkSort system to win the 2003 PennySort (the Indy category) competition.

    参考文献
    [1]Owen A. Bubble sort: An archaeological algorithmic analysis. In: Grissom S, Knox D, Joyce D, Dann W, eds. Proc. of the 34th SIGCSE Technical Symp. on Computer Science Education. New York: ACM Press, 2003.1-5.
    [2]Hore CAR. Quicksort. The Computer Journal, 1962,5(1):10-16.
    [3]Tang XY. Fast sorting method of separating segment. Journal of Software, 1993,4(2):53-57 (in Chinese with English abstract).
    [4]Wang XY. A new sorting method by base distribution and linking. Chinese Journal of Computers, 2002,23(7):774-777(in Chinese with English abstract).
    [5]Chen JC. Proportion extend sort. SIAM Journal on Computing, 2001,(31)1:323-330.
    [6]Lorin H. Sorting and Sort Systems. Reading: Addison-Wesley Publishing Company, 1975.
    [7]Knuth DE. The Art of Computer Programming, Vol 3: Sorting and Searching. Reading: Addison-Wesley Publishing Company,1973.
    [8]Gray J, Coates J, Nyberg C. Performance/Price sort and PennySort. Technical Report, MS-TR-98-45, Microsoft Research, 1998.
    [9]Sort Benchmark. http://research.microsoft. com/barc/SortBenchmark/
    [10]Shi Y, Zhang L, Liu P. THSORT: A single-processor parallel sorting algorithm. Journal of Software, 2003,14(2):159-165 (in Chinese with English abstract). http//www.jos.org.cn/1000-9825/15/159.htm
    [11]Wegner LM, Teuhola JI. The external heapsort. IEEE Trans. on Software Engineering, 1989,15(7):917-925.
    [12]唐向阳.分段快速排序算法.软件学报,1993,4(2):53-57.
    [13]王向阳.任意分布数据的基数分配链接排序算法.计算机学报,2000,23(7):774-777.
    [14]施遥,张力,刘鹏.THSORT:单机并行排序算法.软件学报,2003,14(2):159-165.http//www.jos.org.cn/1000-9825/15/159.htm
    网友评论
    网友评论
    分享到微博
    发 布
引用本文

杨磊,黄辉,宋涛.桶外排序算法的抽样分点分发策略.软件学报,2005,16(5):643-651

复制
分享
文章指标
  • 点击次数:3741
  • 下载次数: 5348
  • HTML阅读次数: 0
  • 引用次数: 0
历史
  • 收稿日期:2004-03-23
  • 最后修改日期:2004-06-11
文章二维码
您是第19829757位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号