Cross-Modal Vehicles Re-Identification: Methods, Challenges, and Future Developments

Journal of Huaqiao University (Natural Science) [ISSN: 1000-5013 / CN: 35-1079/N]

Volume: 46
Issue: No. 5, 2025

Pages: 481-492

Publication date: 2025-09-20

Article Info

Title: Cross-Modal Vehicles Re-Identification: Methods, Challenges, and Future Developments

Article ID: 1000-5013(2025)05-0481-12

Author(s): SU Jiajun¹; ZHAN Simin¹; ZHU Jianqing¹; CUI Xiaolin²
Affiliations: 1. College of Engineering, Huaqiao University, Quanzhou 362021, China; 2. Xiamen Municipal Public Security Bureau, Xiamen 361001, China

Keywords: vehicle re-identification; multimodal perception; cross-modal matching; “artificial intelligence+”

CLC number: TP391.41

DOI: 10.11830/ISSN.1000-5013.202508036

Document code: A

Abstract: Vehicle re-identification matches and retrieves vehicle identities across cameras with non-overlapping fields of view, based on vehicle appearance features. It is of significant research value and has broad application prospects in fields such as smart cities and intelligent transportation. However, traditional single-modal methods based on visible light degrade severely under low-visibility night scenes, headlight glare, and adverse weather, which greatly limits their applicability in complex environments. Cross-modal vehicle re-identification has emerged to address these limitations and has progressed rapidly. This paper first introduces the research background of cross-modal vehicle re-identification and then, starting from its application scenarios, divides existing studies into two categories: visible-infrared vehicle re-identification and text-image vehicle re-identification. It analyzes the strengths and weaknesses of the algorithms in each category and summarizes their performance on multiple public datasets. Finally, it summarizes the main challenges facing the field and outlines potential future directions, with the aim of tracing the evolution of cross-modal vehicle re-identification techniques and offering insights for subsequent research.

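Memo

Received: 2025-08-30
Corresponding author: ZHU Jianqing (born 1987), male, professor, Ph.D., doctoral supervisor; his research covers machine vision, pattern recognition, intelligent video analysis, and object re-identification. E-mail: jqzhu@hqu.edu.cn
Funding: Fujian Provincial Science and Technology for Policing Research Program (2024Y0064); Fujian Provincial Natural Science Foundation Outstanding Youth Research Project (2022J06023); Quanzhou High-Level Talent Innovation and Entrepreneurship Project, Fujian Province (2023C013)
Journal website: https://hdxb.hqu.edu.cn/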

Last Update: 2025-09-20
