Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network
Author: Liu Yuxin, Min Weiqing, Jiang Shuqiang, Rui Yong
Affiliation:

CLC Number: TP393

    Abstract:

    Recently, food image recognition has attracted increasing attention for its wide applications in healthy diet management, smart restaurants, and other scenarios. Unlike general object recognition, food recognition is a fine-grained task with high intra-class variability and high inter-class similarity; moreover, food images have neither fixed semantic patterns nor a specific spatial layout, which makes food recognition especially challenging. This study proposes a multi-scale jigsaw and reconstruction network (MJR-Net) for food recognition. MJR-Net is composed of three parts. The jigsaw and reconstruction module adopts destruction and reconstruction learning, which destroys and then reconstructs the original image, to extract local discriminative details. The feature pyramid module fuses mid-level features of different sizes to capture multi-scale local discriminative features. The channel-wise attention module models the importance of different feature channels to enhance discriminative visual patterns and suppress noisy ones. In addition, A-softmax loss and focal loss are used jointly to optimize the network by enlarging the inter-class variability and reweighting samples, respectively. MJR-Net is evaluated on three food datasets (ETH Food-101, Vireo Food-172, and ISIA Food-500) and achieves accuracies of 90.82%, 91.37%, and 64.95%, respectively. Experimental results show that MJR-Net is highly competitive with other food recognition methods and achieves state-of-the-art performance on Vireo Food-172 and ISIA Food-500. Comprehensive ablation studies and visualization analyses further verify the effectiveness of the proposed method.
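As a rough illustration of the destruction step described above, the sketch below shuffles an image patch-wise within a small neighbourhood, in the spirit of destruction and reconstruction learning: the texture inside each patch is kept intact while the global layout is scrambled, forcing the network to rely on local discriminative details. This is a minimal PyTorch sketch, not the authors' implementation; the function name jigsaw_destruct, the grid size n, and the jitter radius k are illustrative assumptions.

```python
import torch

def jigsaw_destruct(images, n=7, k=2):
    """Split each image into an n-by-n grid of patches and shuffle every patch
    within a neighbourhood of roughly k cells (region-confusion style)."""
    B, C, H, W = images.shape            # H and W must be divisible by n
    ph, pw = H // n, W // n
    # (B, C, H, W) -> (B, n*n, C, ph, pw): one row per patch, row-major order.
    patches = (images
               .reshape(B, C, n, ph, n, pw)
               .permute(0, 2, 4, 1, 3, 5)
               .reshape(B, n * n, C, ph, pw))
    # Jitter the row/column indices by at most k and argsort, so the resulting
    # permutation keeps every patch close to its original grid cell.
    idx = torch.arange(n, dtype=torch.float)
    perms = []
    for _ in range(B):
        row = torch.argsort(idx + torch.empty(n).uniform_(-k, k))
        col = torch.argsort(idx + torch.empty(n).uniform_(-k, k))
        perms.append((row[:, None] * n + col[None, :]).reshape(-1))
    perm = torch.stack(perms)                              # (B, n*n)
    shuffled = torch.stack([patches[b, perm[b]] for b in range(B)])
    # Stitch the shuffled patches back into full images.
    destroyed = (shuffled
                 .reshape(B, n, n, C, ph, pw)
                 .permute(0, 3, 1, 4, 2, 5)
                 .reshape(B, C, H, W))
    return destroyed, perm

# Example: a batch of 224x224 images cut into a 7x7 grid (224 / 7 = 32).
# destroyed, perm = jigsaw_destruct(torch.randn(4, 3, 224, 224), n=7, k=2)
```

Only the destruction half is sketched here; the reconstruction branch of the module would additionally learn to recover the original patch layout (e.g., from perm), which is omitted.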

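The channel-wise attention and the focal-loss reweighting mentioned in the abstract correspond to well-known building blocks: squeeze-and-excitation style channel attention and the focal loss. Below is a minimal sketch of plausible versions of both, assuming standard PyTorch; the class and function names and the hyper-parameters (reduction, gamma, alpha) are illustrative, not the paper's exact design, and the A-softmax term, which further imposes an angular margin on the classifier, is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: squeeze each channel to
    a single statistic, pass it through a bottleneck MLP, and rescale the
    feature map with the resulting per-channel weights."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))        # (B, C) channel importance
        return x * w.view(b, c, 1, 1)          # enhance/suppress channels


def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss: cross-entropy scaled by (1 - p_t)^gamma so that easy,
    well-classified samples contribute less and hard samples dominate."""
    log_p = F.log_softmax(logits, dim=-1)                       # (B, num_classes)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)   # (B,)
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()
```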
Get Citation

Liu YX, Min WQ, Jiang SQ, Rui Y. Food image recognition via multi-scale jigsaw and reconstruction network. Journal of Software, 2022, 33(11): 4379-4395 (in Chinese).
Article Metrics
  • Abstract: 1108
  • PDF: 2387
  • HTML: 1994
  • Cited by: 0
History
  • Received: September 23, 2020
  • Revised: January 11, 2021
  • Online: November 11, 2022
  • Published: November 06, 2022