Big Data Stream Computing: Technologies and Instances

doi:10.13328/j.cnki.jos.004558

Journal of Software, 2014, Vol. 25, No. 4, pp. 839-862

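Fund Project: National Natural Science Foundation of China (61170008, 61272055); National Basic Research Program of China (973 Program) (2014CB340402); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172012K12)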

Authors:

SUN Da-Wei (孙大为)
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

ZHANG Guang-Yan (张广艳)
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education (Jilin University), Changchun 130012, China

ZHENG Wei-Min (郑纬民)
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract:

Batch computing and stream computing are the two main forms of big data computing. Batch computing in big data environments has been studied and discussed fairly thoroughly, whereas building stream computing systems that deliver low latency, high throughput, and continuously reliable operation remains an urgent open problem, with comparatively few research results and little practical experience to draw on. This paper surveys the computing architecture and key issues of stream computing in big data environments. First, it briefly summarizes three application scenarios of stream computing, in business intelligence, marketing, and public service, and identifies the distinctive features that big data streams exhibit in these domains: real-time arrival, volatility, burstiness, disorder (irregularity), and infinity. Second, it describes the key technical characteristics that an ideal big data stream computing system should have in system structure, data transmission, application interfaces, and high availability. It then analyzes and compares in detail five typical open-source stream computing systems for big data environments. Finally, it discusses the technical challenges that such systems face in scalability, fault tolerance, state consistency, load balancing, and throughput.

Key words: big data computing; stream computing; stream big data; memory computing; system instance

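Cite this article

SUN Da-Wei, ZHANG Guang-Yan, ZHENG Wei-Min. Big data stream computing: Technologies and instances. Journal of Software (软件学报), 2014, 25(4): 839-862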

History

Received: 2013-09-07
Last revised: 2013-12-16
Published online: 2014-01-24
