No-Reference Video Quality Assessment Based on 3D Convolutional Neural Network
Author: Wang Chunfeng, Su Li, Zhang Weigang, Huang Qingming
Affiliation:

Fund Project:

National Natural Science Foundation of China (61025011, 61332016, 61472389); National Basic Research Program of China (973 Program) (2015CB351802)

    Abstract:

    No-reference video quality assessment (NR-VQA) quantifies the quality of distorted videos without access to the original high-quality references. Conventional NR-VQA methods are generally designed for specific distortion types or correlate poorly with human perception. This paper introduces the 3D deep convolutional neural network (3D-CNN) into VQA and proposes a 3D-CNN based NR-VQA method that is not restricted to specific distortion types. First, the method learns spatio-temporal features from 3D video patches, which represent video content effectively. Second, the original 3D-CNN model, designed for video classification, is modified to adapt it to the VQA task. Experiments demonstrate that the proposed method is highly consistent with human perception across numerous distortion types and evaluation criteria. Compared with other state-of-the-art no-reference VQA methods, it runs much faster while delivering similar performance, and as a no-reference method it is even comparable with many state-of-the-art full-reference VQA methods, giving it better prospects for practical application.
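    To make the patch-based pipeline concrete, below is a minimal sketch, in PyTorch, of a 3D-CNN quality regressor operating on spatio-temporal video patches. The patch size (10 frames of 32×32 luminance), layer widths, and pooling strategy are illustrative assumptions rather than the exact network described in the paper; the sketch only demonstrates the two ideas stated above: learning features with 3D convolutions over video patches, and replacing a classification head with a single-score regression head whose per-patch predictions are pooled into a video-level score.

# Illustrative sketch (not the paper's exact architecture): a small 3D-CNN that
# regresses a quality score from spatio-temporal patches; per-patch scores are
# averaged to obtain a video-level prediction. Patch size and layer widths are
# assumptions chosen for readability.
import torch
import torch.nn as nn


class Patch3DQualityNet(nn.Module):
    """3D-CNN quality regressor for (C=1, T=10, H=32, W=32) luminance patches."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),   # spatio-temporal filters
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pool spatially only
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),                  # pool in time and space
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 5 * 8 * 8, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),                            # single quality score
        )

    def forward(self, x):
        return self.regressor(self.features(x)).squeeze(-1)


def predict_video_score(model, video, patch=(10, 32, 32), device="cpu"):
    """Cut a grayscale video tensor (T, H, W) into non-overlapping 3D patches
    and average the per-patch predictions into one video-level quality score."""
    t, h, w = video.shape
    pt, ph, pw = patch
    patches = []
    for ti in range(0, t - pt + 1, pt):
        for hi in range(0, h - ph + 1, ph):
            for wi in range(0, w - pw + 1, pw):
                patches.append(video[ti:ti + pt, hi:hi + ph, wi:wi + pw])
    batch = torch.stack(patches).unsqueeze(1).to(device)  # (N, 1, T, H, W)
    model.eval()
    with torch.no_grad():
        scores = model(batch)
    return scores.mean().item()


if __name__ == "__main__":
    net = Patch3DQualityNet()
    dummy_video = torch.rand(30, 128, 128)  # 30 frames of 128x128 luminance
    print(predict_video_score(net, dummy_video))

    In practice, such a regressor would be trained against subjective scores (e.g., DMOS), and its consistency with human perception would be reported with rank and linear correlations (SROCC, PLCC) between predicted and subjective scores, which is the standard evaluation protocol in VQA.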

Get Citation

Wang CF, Su L, Zhang WG, Huang QM. No-reference video quality assessment based on 3D convolutional neural network. Ruan Jian Xue Bao/Journal of Software, 2016, 27(S2): 103-112 (in Chinese).

History
  • Received: May 01, 2016
  • Revised: October 18, 2016
  • Online: January 10, 2017