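Text-based Person Search via Virtual Attribute Learning

Author biographies:

Wang Chengji (王成济, 1993-), male, PhD candidate; research interests include multimedia information retrieval and machine learning.
Su Jiawei (苏家威, 1993-), male, PhD candidate; research interests include medical image processing and machine learning.
Luo Zhiming (罗志明, 1989-), male, PhD, associate professor, CCF professional member; research interests include computer vision and machine learning.
Cao Donglin (曹冬林, 1977-), male, PhD, assistant professor, CCF professional member; research interests include Web information retrieval and natural language processing.
Lin Yaojin (林耀进, 1980-), male, PhD, professor; research interests include data mining and machine learning.
Li Shaozi (李绍滋, 1963-), male, PhD, professor, CCF senior member; research interests include computer vision, machine learning, and multimedia information retrieval.

Corresponding authors:

Cao Donglin, another@xmu.edu.cn; Li Shaozi, szlig@xmu.edu.cn

Funding:

National Natural Science Foundation of China (61876159, 62076210, 62076116)
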
    Abstract:

    Text-based person search aims to find, in a person image database, the images of the target person that match a given text description. It has recently attracted wide attention from both academia and industry. The task faces two challenges at once: fine-grained retrieval and the heterogeneous gap between images and texts. Some methods use supervised attribute learning to extract attribute-related features and build fine-grained associations between texts and images. Attribute annotations, however, are hard to obtain, so these methods perform poorly in practice. How to extract attribute-related features without attribute annotations and establish fine-grained cross-modal semantic associations has therefore become a key problem. To address it, this study incorporates pre-training technology and proposes a text-based person search method based on virtual attribute learning, which builds fine-grained cross-modal semantic associations between images and texts through unsupervised attribute learning. First, based on the invariance and cross-modal semantic consistency of pedestrian attributes, a semantics-guided attribute decoupling method is proposed, which uses the identity labels of pedestrians as the supervision signal to guide the model to decouple attribute-related features. Second, a feature learning module based on semantic reasoning is presented, which constructs a semantic graph from the relations between attributes. This module uses a graph model to exchange information among attributes and thereby enhances the cross-modal identification ability of the features. The proposed approach is compared with existing methods on the public text-based person search dataset CUHK-PEDES and the cross-modal retrieval dataset Flickr30k, and the experimental results verify its effectiveness.
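The idea of exchanging information among attribute features over a graph can be illustrated with a minimal sketch. This is not the authors' module: the fully connected graph, the cosine-similarity edge weights, the softmax normalisation, and the names `exchange` and `mix` are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(xs):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def exchange(features, mix=0.5):
    """One round of information exchange over a fully connected attribute graph.

    Assumption: edge weights are softmax-normalised cosine similarities.
    Each node keeps (1 - mix) of its own feature and takes `mix` of the
    weighted average of the other nodes' features.
    """
    n = len(features)
    updated = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            # A single node has no neighbours to exchange with.
            updated.append(list(features[i]))
            continue
        weights = softmax([cosine(features[i], features[j]) for j in others])
        dim = len(features[i])
        message = [
            sum(w * features[j][d] for w, j in zip(weights, others))
            for d in range(dim)
        ]
        updated.append(
            [(1 - mix) * x + mix * m for x, m in zip(features[i], message)]
        )
    return updated


# Hypothetical attribute features (e.g. "upper clothing", "bag").
attrs = [[1.0, 0.0], [0.0, 1.0]]
print(exchange(attrs, mix=0.5))
```

With two nodes, each node has exactly one neighbour, so the update is a plain blend of the two vectors. A learned model would replace the fixed similarity weights and the scalar `mix` with trainable parameters.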

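Cite this article

Wang CJ, Su JW, Luo ZM, Cao DL, Lin YJ, Li SZ. Text-based person search via virtual attribute learning. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2035-2050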

History
  • Received: 2022-04-12
  • Last revised: 2022-05-29
  • Published online: 2022-09-20
  • Publication date: 2023-05-06