Optimization of Deep Convolutional Neural Network for Large Scale Image Classification
Authors: BAI Cong, HUANG Ling, CHEN Jianan, PAN Xiang, CHEN Shengyong

Author biographies:

BAI Cong (1981-), male, born in Tai'an, Shandong, Ph.D., lecturer, CCF professional member; his main research interests are computer vision and multimedia information processing. PAN Xiang (1977-), male, Ph.D., professor and Ph.D. supervisor, CCF professional member; his main research interest is computer vision. HUANG Ling (1994-), female, B.S., CCF student member; her main research interests are computer vision and multimedia information processing. CHEN Shengyong (1973-), male, Ph.D., professor and Ph.D. supervisor, CCF senior member; his main research interest is computer vision. CHEN Jianan (1990-), male, master's student; his main research interests are computer vision and multimedia information processing.

Corresponding author:

BAI Cong, E-mail: congbai@zjut.edu.cn

Fund projects:

National Natural Science Foundation of China (61502424, U1509207, 61325019); Natural Science Foundation of Zhejiang Province, China (LY15F020028, LY15F020024, LY18F020032)



Abstract:

In image classification tasks, features at different levels need to be extracted from images to achieve higher classification accuracy, and deep learning is increasingly applied to large-scale image classification. This paper proposes a deep learning framework based on a deep convolutional neural network for large-scale image classification. Building on the classical deep convolutional neural network AlexNet, the framework optimizes and improves the network in two respects, the overall network architecture and the internal structure of the network, to further strengthen its feature representation ability. In addition, by introducing a hidden layer into the fully connected layers, the network learns image features and binary hash codes simultaneously, which gives the framework the capacity to handle large-scale image data. A series of comparative experiments on three standard databases analyzes the effects of the different optimization methods under different conditions and demonstrates the effectiveness of the proposed optimizations.
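To make the idea in the abstract concrete, the sketch below shows an AlexNet-style network with a hidden "hash" layer inserted between the fully connected layers, so that one forward pass yields both class scores and real-valued codes that can be thresholded into binary hash codes. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the layer sizes, the 48-bit code length, the sigmoid activation, and the 0.5 binarization threshold are all illustrative assumptions.

```python
# Hypothetical sketch: AlexNet-style CNN with a latent hash layer between the
# fully connected layers, learning classification and binary-like codes jointly.
# Code length (48 bits), sigmoid activation, and 0.5 threshold are assumptions.
import torch
import torch.nn as nn


class AlexNetWithHashLayer(nn.Module):
    def __init__(self, num_classes: int = 10, hash_bits: int = 48):
        super().__init__()
        # Convolutional feature extractor, roughly following the AlexNet layout.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.flatten = nn.Flatten()
        self.fc = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
        )
        # Hidden "hash" layer: sigmoid activations in (0, 1) that can later be
        # thresholded into binary codes for large-scale image data.
        self.hash_layer = nn.Sequential(nn.Linear(4096, hash_bits), nn.Sigmoid())
        # The classifier reads the hash-layer activations, so the classification
        # loss shapes the codes while the codes stay compact.
        self.classifier = nn.Linear(hash_bits, num_classes)

    def forward(self, x):
        h = self.hash_layer(self.fc(self.flatten(self.features(x))))
        return self.classifier(h), h


def binarize(hash_activations: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Turn sigmoid activations into {0, 1} codes by thresholding."""
    return (hash_activations >= threshold).to(torch.uint8)


if __name__ == "__main__":
    model = AlexNetWithHashLayer(num_classes=10, hash_bits=48)
    images = torch.randn(4, 3, 224, 224)      # a dummy mini-batch
    logits, codes_real = model(images)        # class scores + real-valued codes
    codes = binarize(codes_real)              # 48-bit binary codes per image
    print(logits.shape, codes.shape)          # torch.Size([4, 10]) torch.Size([4, 48])
```

Thresholding the sigmoid activations of such a hidden layer is one common way to obtain compact binary codes for large-scale search; the paper's actual architecture changes, hashing scheme, and training details should be taken from the full text.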

Cite this article:

Bai C, Huang L, Chen JN, Pan X, Chen SY. Optimization of deep convolutional neural network for large scale image classification. Ruan Jian Xue Bao/Journal of Software, 2018,29(4):1029-1038 (in Chinese with English abstract).

Article history:
  • Received: 2017-04-28
  • Revised: 2017-06-26
  • Published online: 2017-11-29