小样本困境下的深度学习图像识别综述
Survey on Deep Learning Image Recognition in Dilemma of Small Samples

Author biographies:

葛轶洲 (1988-), male, M.S., senior engineer. His main research interests include signal and information processing.
刘恒 (1997-), male, master's student. His main research interests include neural networks and data analysis.
王言 (1997-), male, master's student. His main research interests include data augmentation and neural networks.
徐百乐 (1989-), male, Ph.D. student. His main research interests include neural networks and incremental learning.
周青 (1973-), male, research fellow. His main research interests include communication signal processing.
申富饶 (1973-), male, Ph.D., professor, doctoral supervisor, CCF senior member. His main research interests include neural computation and robot intelligence.

Corresponding author:

刘恒, mg1937016@smail.nju.edu.cn

CLC number:

TP391

Funding:

National Natural Science Foundation of China (61876076)


Abstract:

Image recognition is a core problem in the field of image research, and solving it is of great significance to research in face recognition, autonomous driving, robotics, and many other areas. The machine learning methods based on deep neural networks that are now in wide use have already surpassed human-level performance on image recognition datasets such as bird classification, face recognition, and everyday object classification, and a growing number of industrial applications are adopting deep-neural-network-based methods to carry out image recognition tasks. However, deep learning methods depend heavily on large-scale annotated data, a shortcoming that severely limits their application to practical image recognition tasks. To address this problem, more and more researchers have begun to study how to train recognition models from only a small number of annotated image samples. To better understand image recognition with few annotated samples, this survey broadly discusses the mainstream few-shot learning approaches in the image recognition field, including methods based on data augmentation, methods based on transfer learning, and methods based on meta-learning. By walking through the pipelines and core ideas of the different algorithms, the strengths and weaknesses of existing methods for few-shot image recognition can be seen clearly. Finally, in view of the limitations of existing methods, future research directions for few-shot image recognition are pointed out.
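Among the three method families named above, data augmentation is conceptually the simplest: it enlarges the small labelled set by synthesizing additional training samples. As a purely illustrative sketch that is not taken from the survey, the following Python/NumPy snippet shows one widely used augmentation technique, mixup, which forms synthetic samples as convex combinations of image pairs and their one-hot labels; the image size, class count, and toy data are assumptions made only for this example.

    import numpy as np

    def mixup(x1, y1, x2, y2, alpha=0.2):
        """Create one synthetic sample by convexly combining two labelled images."""
        lam = np.random.beta(alpha, alpha)   # mixing coefficient drawn from Beta(alpha, alpha)
        x = lam * x1 + (1.0 - lam) * x2      # pixel-wise interpolation of the two images
        y = lam * y1 + (1.0 - lam) * y2      # the same interpolation applied to the one-hot labels
        return x, y

    # Toy usage: two random "images" (32x32 RGB) with one-hot labels over 5 classes.
    img_a, img_b = np.random.rand(32, 32, 3), np.random.rand(32, 32, 3)
    lab_a, lab_b = np.eye(5)[0], np.eye(5)[3]
    mixed_img, mixed_lab = mixup(img_a, lab_a, img_b, lab_b)
    print(mixed_img.shape, mixed_lab)

Training on such interpolated samples acts as a regularizer and partially compensates for the scarcity of labelled images, which is why augmentation-based methods form one of the families the survey covers.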

Abstract (English):

Machine learning methods have surpassed human-level performance on image recognition and some other tasks. However, recent machine learning methods, especially deep learning methods, rely heavily on large amounts of annotated data that human cognition does not seem to require. This weakness greatly limits the application of deep learning methods to practical problems. To address it, learning from only a few labelled examples has attracted more and more research interest from the community. To better understand the few-shot learning problem, this study extensively discusses several popular few-shot learning methods, including data augmentation methods, transfer learning methods, and meta-learning methods. By examining the processes and core ingredients of the different algorithms, the advantages and disadvantages of existing methods for few-shot learning can be seen clearly. Finally, promising directions for future research on few-shot learning are highlighted.
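For the meta-learning family, one representative metric-based approach is the prototypical network, which classifies a query image by comparing its embedding with per-class prototypes averaged from the few labelled support images. The sketch below is a minimal NumPy illustration of a single 5-way 1-shot episode, assuming embeddings have already been produced by some backbone; the 64-dimensional random "features" and the helper name prototypical_episode are hypothetical and only stand in for real model outputs.

    import numpy as np

    def prototypical_episode(support_feats, support_labels, query_feats, n_classes):
        """Assign each query embedding to the class of its nearest prototype."""
        prototypes = np.stack([
            support_feats[support_labels == c].mean(axis=0)  # prototype = mean of the class's support embeddings
            for c in range(n_classes)
        ])
        # Squared Euclidean distance from every query embedding to every prototype.
        dists = ((query_feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)                          # nearest prototype gives the predicted class

    # Toy 5-way 1-shot episode with 64-dimensional embeddings from a fictitious backbone.
    rng = np.random.default_rng(0)
    support = rng.normal(size=(5, 64))
    support_y = np.arange(5)
    queries = support + 0.1 * rng.normal(size=(5, 64))       # queries lie close to their class prototypes
    print(prototypical_episode(support, support_y, queries, n_classes=5))

Because classification reduces to a nearest-prototype rule in the learned embedding space, no weights need to be fitted to the novel classes at test time, which is what makes this kind of metric-based meta-learning attractive when only a handful of labelled samples are available.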

Citation:

葛轶洲, 刘恒, 王言, 徐百乐, 周青, 申富饶. 小样本困境下的深度学习图像识别综述 (Survey on deep learning image recognition in dilemma of small samples). 软件学报 (Journal of Software), 2022, 33(1): 193–210.

History:
  • Received: 2021-01-13
  • Revised: 2021-02-21
  • Published online: 2021-04-21
  • Published: 2022-01-06