Cross-Media Retrieval with Deep Learning and Shared Representation Space Learning


Journal of Huaqiao University (Natural Science) [ISSN:1000-5013/CN:35-1079/N]

Volume:: 39
Issue:: No. 1, 2018

Pages:: 127-132

Publication date:: 2018-01-17

Article Info

Title:: Cross-Modal Multimedia Retrieval Based Deep Learning and Shared Representation Space Learning

Article ID:: 1000-5013(2018)01-0127-06

Author(s):: ZOU Hui (邹辉); DU Jixiang (杜吉祥); ZHAI Chuanmin (翟传敏); WANG Jing (王靖); College of Computer Science and Technology, Huaqiao University, Xiamen 361021, Fujian, China

Keywords:: cross-modal; cross-media; deep learning; convolutional neural networks; shared representation space; centered correlation

CLC number:: TP391

DOI:: 10.11830/ISSN.1000-5013.201508047

Document code:: A

Abstract:: A cross-media retrieval method based on deep learning and shared representation space learning is proposed. For the image and text modalities, a convolutional neural network learns deep features of images, and a latent Dirichlet allocation model learns the topic probability distribution of documents. A probabilistic model then nonlinearly maps these two highly heterogeneous vector spaces into a shared representation space, where centered correlation measures the distance between items of different modalities. Experiments on the Wikipedia Dataset show that, for single-modality input retrieval, the method reaches a mean average precision of 38.43%, a clear improvement over other methods.
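
The last two steps of the abstract, ranking by centered correlation and scoring with average precision, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formula, so centered correlation is assumed here to be the Pearson-style correlation of two mean-centered vectors, and the function names, toy vectors, and labels are hypothetical. The vectors stand in for items already mapped into the shared representation space.

```python
import math


def centered_correlation(x, y):
    """Correlation of two equal-length vectors after subtracting each mean.

    Assumed definition (not stated in the abstract):
    (x - mean(x)) . (y - mean(y)) / (||x - mean(x)|| * ||y - mean(y)||)
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    if norm_x == 0.0 or norm_y == 0.0:
        # A constant vector has no defined correlation; treating it as
        # uncorrelated is a choice made for this sketch.
        return 0.0
    return sum(a * b for a, b in zip(cx, cy)) / (norm_x * norm_y)


def rank_by_centered_correlation(query, candidates):
    """Return candidate indices ordered from most to least correlated."""
    scores = [centered_correlation(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def average_precision(ranked_labels, query_label):
    """Average precision of one ranked list; relevant means same label."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy data: one image query and three text candidates,
# all represented in the same 3-dimensional shared space.
query = [0.7, 0.2, 0.1]
candidates = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]]
labels = ["art", "sport", "art"]

ranking = rank_by_centered_correlation(query, candidates)
ap = average_precision([labels[i] for i in ranking], "art")
```

Mean average precision, the metric reported in the abstract, is the mean of `average_precision` over all queries. A distance can be derived from the correlation, for example as one minus the correlation, though that conversion is also an assumption here.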

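Memo

Received: 2015-08-26
Corresponding author: DU Jixiang (b. 1977), male, professor, Ph.D.; research interests: pattern recognition and digital image processing. E-mail: jxdu@hqu.edu.cn.
Funding: National Natural Science Foundation of China (61673186, 61175121); Natural Science Foundation of Fujian Province (2013J06014); Huaqiao University Research Promotion Program for Young and Middle-aged Teachers (ZQN-YX108)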

Last Update: 2018-01-20