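Power Consumption Prediction Model of General-Purpose Computing GPU with Static Program Slicing
Authors: Wang Haifeng, Chen Qingkui
Funding:

National Natural Science Foundation of China (60970012); Specialized Research Fund for the Doctoral Program of the Ministry of Education (20113120110008); Shanghai Key Science and Technology Project (09511501000, 09220502800); Shanghai First-Class Discipline Construction Project (XTKX2012)
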
    Abstract:

    With the growth of general-purpose computing on graphics processing units (GPUs), measuring and optimizing the power consumption of GPU programs has become a fundamental problem in green computing. At present, GPU energy consumption is evaluated mainly through hardware measurement, so developers cannot learn an application's energy consumption before compilation, which makes it difficult to optimize and refactor code under energy constraints. To address this problem, a static power prediction model that works on application source code is proposed. Based on the density of branch structures and on static program slicing, separate power prediction models are built for branch-sparse and branch-dense applications. A program slice is a unit of measurement whose granularity lies between the instruction and the function, and it has solid theoretical support and good feasibility for analyzing GPU applications. Two slice-level power models are built, one using nonlinear regression and one using a wavelet neural network. For a specific GPU, the nonlinear regression model is the more accurate of the two, whereas the wavelet neural network model applies to GPUs of various architectures and generalizes better. After the branch structure of an application is analyzed, branch-sparse programs are handled with a weighted power statistical model, which keeps the estimation algorithm efficient, and branch-dense programs are handled with a prediction method based on execution path probabilities, which improves accuracy. Experimental results show that the two prediction models and their algorithms can effectively estimate the power consumption of general-purpose GPU programs, with an average relative error between predicted and measured values below 6%.
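
    The two program-level estimators named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and not the authors' method: the `Slice` type, the use of slice execution time as the weight, and the combination of paths as a probability-weighted mean of per-path average power are all hypothetical. Per-slice power values would come from a slice-level model such as the nonlinear regression or wavelet neural network model, which is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Slice:
    """A program slice (hypothetical type, not taken from the paper)."""
    power_w: float  # predicted average power of the slice, in watts
    time_s: float   # estimated execution time of the slice, in seconds


def weighted_power(slices: Sequence[Slice]) -> float:
    """Branch-sparse case: time-weighted average power over all slices.

    Using execution time as the weight is an assumption of this sketch.
    """
    total_time = sum(s.time_s for s in slices)
    if total_time <= 0:
        raise ValueError("total slice time must be positive")
    return sum(s.power_w * s.time_s for s in slices) / total_time


def path_probability_power(
    paths: Sequence[tuple[float, Sequence[Slice]]],
) -> float:
    """Branch-dense case: probability-weighted mean of per-path power.

    Each entry is (path probability, slices executed on that path).
    The probabilities are assumed to be known and to sum to 1.
    """
    total_prob = sum(p for p, _ in paths)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("path probabilities must sum to 1")
    return sum(p * weighted_power(slices) for p, slices in paths)
```

    For example, under these assumptions two slices of 100 W for 1 s and 200 W for 3 s give a weighted power of 175 W. The branch-dense estimator reuses the same per-path computation, so the only extra input it needs is a probability for each execution path.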

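Cite this article

Wang Haifeng, Chen Qingkui. Power Consumption Prediction Model of General-Purpose Computing GPU with Static Program Slicing. Journal of Software (软件学报), 2013, 24(8): 1746-1760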

History
  • Received: 2012-08-03
  • Last revised: 2012-10-19
  • Published online: 2013-07-26