[1]贺凤,张洪博,杜吉祥,等.长短时记忆网络的自由体操视频自动描述方法[J].华侨大学学报(自然科学版),2020,41(6):808-815.[doi:10.11830/ISSN.1000-5013.201911047]
 HE Feng,ZHANG Hongbo,DU Jixiang,et al.Floor Exercise Video Automatic Description Method Using Long Short-Term Memory Network[J].Journal of Huaqiao University(Natural Science),2020,41(6):808-815.[doi:10.11830/ISSN.1000-5013.201911047]

长短时记忆网络的自由体操视频自动描述方法

《华侨大学学报(自然科学版)》[ISSN:1000-5013/CN:35-1079/N]

卷/Volume:
41
期数/Issue:
2020, No. 6
页码/Pages:
808-815
出版日期/Publication Date:
2020-11-20

文章信息/Info

Title:
Floor Exercise Video Automatic Description Method Using Long Short-Term Memory Network
文章编号/Article ID:
1000-5013(2020)06-0808-08
作者/Author(s):
贺凤 1,2,3, 张洪博 1,2,3, 杜吉祥 1,2,3, 汪冠鸿 1,2,3
HE Feng 1,2,3, ZHANG Hongbo 1,2,3, DU Jixiang 1,2,3, WANG Guanhong 1,2,3
1. College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; 2. Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China; 3. Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
关键词:
长短时记忆网络; 注意力机制; 自由体操; 自动描述
Keywords:
long short-term memory network; attention mechanism; floor exercise; automatic description
分类号/CLC Number:
TP183
DOI:
10.11830/ISSN.1000-5013.201911047
文献标志码/Document Code:
A
摘要:
提出一种长短时记忆网络的自由体操视频自动描述方法.在视频描述模型S2VT中,通过长短时记忆网络学习单词序列和视频帧序列之间的映射关系.引入注意力机制对S2VT模型进行改进,增大含有翻转方向、旋转度数、身体姿态等关键帧的权重,提高自由体操视频自动描述的准确性.建立自由体操分解动作数据集,在数据集MSVD及自建数据集上进行3种模型的对比实验,并通过计划采样方法消除训练解码器与预测解码器之间的差异.实验结果表明:文中方法可提高自由体操视频自动描述的精度.
Abstract:
An automatic description method for floor exercise videos based on a long short-term memory (LSTM) network was proposed. In the video description model S2VT, the mapping between the word sequence and the video frame sequence was learned through the LSTM network. An attention mechanism was introduced to improve the S2VT model by increasing the weights of key frames containing the turning direction, rotation degree and body posture, thereby improving the accuracy of the automatic description of floor exercise videos. A data set of decomposed floor exercise actions was built, three models were compared on the MSVD data set and the self-built data set, and the discrepancy between the decoder at training time and at prediction time was eliminated by the scheduled sampling method. The experimental results show that the proposed method can improve the accuracy of automatic description of floor exercise videos.
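The temporal attention and scheduled sampling ideas summarized in the abstract can be illustrated in a few lines. The sketch below is illustrative only, not the paper's implementation: the function names and the pure-Python vector representation are assumptions. Frame features are scored against the decoder state, softmax-normalized, and combined into a weighted context vector, so frames that align with the current decoding step (e.g. those showing the turning direction or body posture) receive larger weights.

```python
import math
import random

def temporal_attention(frame_feats, query):
    """Weight video-frame features by their relevance to the decoder state.

    frame_feats: one feature vector per sampled frame.
    query: the current decoder hidden state (same dimensionality).
    Returns (attention weights, attention-weighted context vector).
    """
    # Relevance score: dot product of each frame feature with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_feats]
    # Softmax normalization (shift by the max score for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_feats[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, frame_feats))
               for i in range(dim)]
    return weights, context

def scheduled_sampling_input(ground_truth, predicted, teacher_forcing_prob,
                             rng=random):
    """Choose the decoder's next input token during training.

    With probability teacher_forcing_prob the ground-truth token is fed;
    otherwise the model's own previous prediction is fed, narrowing the
    gap between the training-time and prediction-time decoder.
    """
    return ground_truth if rng.random() < teacher_forcing_prob else predicted
```

In scheduled sampling, teacher_forcing_prob is typically decayed toward zero over training so the decoder increasingly conditions on its own predictions, matching the prediction-time setting.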

参考文献/References:

[1] ZHANG Shijie.Key posture detection in sports videos based on deep learning[D].Beijing:Beijing University of Technology,2017.(in Chinese)
[2] MAO Jie,GU Qian.Research on badminton technical and tactical decision-making with an ant colony algorithm optimized by deep learning[J].Sport,2016(18):5-6.DOI:10.3969/j.issn.1674-151x.2016.18.003.(in Chinese)
[3] WANG Haoshu.Research on target tracking in tennis videos based on the Mean Shift algorithm[J].Modern Electronics Technique,2017,40(13):73-76.DOI:10.16652/j.issn.1004-373x.2017.13.0190.(in Chinese)
[4] YANG Bin,WANG Tongxi.Implementation of an embedded basketball video target tracking algorithm based on DSP-FPGA[J].Natural Science Journal of Xiangtan University,2018,40(6):104-108.(in Chinese)
[5] FU Yu.Win-loss prediction of football matches with neural networks[J].Technology Wind,2018(23):215.DOI:10.19392/j.cnki.1671-7341.201823214.(in Chinese)
[6] WANG Yingying.Design and development of a sports video analysis system for physical training[J].Techniques of Automation and Applications,2019(8):35.(in Chinese)
[7] MA Yuejie,FENG Shuang,WANG Yongbin.Research on football player tracking algorithms based on deep learning[J].Journal of Communication University of China(Science and Technology),2018,25(3):60-64.DOI:10.3969/j.issn.1673-1328.2017.17.046.(in Chinese)
[8] YANG Bin.Research on target detection and tracking algorithms in football match videos[J].Computer Measurement and Control,2017,25(9):266-268.DOI:10.16526/j.cnki.11-4762/tp.2017.09.068.(in Chinese)
[9] ZHU Shicheng,LI Xu.Analysis of the changing trend of the 2017-2020 women's floor exercise scoring rules and its influencing factors[J].Journal of Hubei Normal University(Philosophy and Social Science),2018,38(1):75-80.DOI:10.3969/j.issn.2096-3130.2018.01.017.(in Chinese)
[10] KOJIMA A,TAMURA T,FUKUNAGA K.Natural language description of human activities from video images based on concept hierarchy of actions[J].International Journal of Computer Vision,2002,50(2):171-184.DOI:10.1023/A:1020346032608.
[11] GUADARRAMA S,KRISHNAMOORTHY N,MALKARNENKAR G,et al.Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition[C]//Proceedings of the IEEE International Conference on Computer Vision.Sydney:IEEE Press,2013:2712-2719.DOI:10.1109/ICCV.2013.337.
[12] ROHRBACH M,QIU Wei,TITOV I,et al.Translating video content to natural language descriptions[C]//Proceedings of the IEEE International Conference on Computer Vision.Sydney:IEEE Press,2013:433-440.DOI:10.1109/ICCV.2013.61.
[13] XU Ran,XIONG Caiming,CHEN Wei,et al.Jointly modeling deep video and compositional text to bridge vision and language in a unified framework[C]//Twenty-Ninth AAAI Conference on Artificial Intelligence.Austin:AAAI Publications,2015:2346-2352.
[14] PAN Yingwei,YAO Ting,LI Houqiang,et al.Video captioning with transferred semantic attributes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE Press,2017:6504-6512.DOI:10.1109/CVPR.2017.111.
[15] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.Imagenet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems.New York:Curran Associates,2012:1097-1105.DOI:10.1145/3065386.
[16] RUSSAKOVSKY O,JIA Deng,SU Hao,et al.Imagenet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115(3):211-252.DOI:10.1007/s11263-015-0816-y.
[17] VENUGOPALAN S,XU Huijuan,DONAHUE J,et al.Translating videos to natural language using deep recurrent neural networks[EB/OL].[2019-10-15].https://arxiv.org/abs/1412.4729.
[18] SHETTY R,LAAKSONEN J.Frame-and segment-level features and candidate pool evaluation for video caption generation[C]//Proceedings of the 24th ACM International Conference on Multimedia.New York:[s.n.],2016:1073-1076.DOI:10.1145/2964284.2984062.
[19] JIN Qin,CHEN Jia,CHEN Shizhe,et al.Describing videos using multi-modal fusion[C]//Proceedings of the 24th ACM International Conference on Multimedia.New York:[s.n.],2016:1087-1091.DOI:10.1145/2964284.2984065.
[20] SUNDERMEYER M,NEY H,SCHLÜTER R.From feedforward to recurrent LSTM neural networks for language modeling[J].IEEE/ACM Transactions on Audio, Speech, and Language Processing,2015,23(3):517-529.DOI:10.1109/TASLP.2015.2400218.
[21] HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.DOI:10.1162/neco.1997.9.8.1735.
[22] GERS F A,SCHMIDHUBER J,CUMMINS F.Learning to forget: Continual prediction with LSTM[J].Neural Computation,2000,12(10):2451-2471.DOI:10.1162/089976600300015015.
[23] VENUGOPALAN S,ROHRBACH M,DONAHUE J,et al.Sequence to sequence-video to text[C]//Proceedings of the IEEE International Conference on Computer Vision.Santiago:IEEE Press,2015:4534-4542.DOI:10.1109/ICCV.2015.515.
[24] SHARIF R A,AZIZPOUR H,SULLIVAN J,et al.CNN features off-the-shelf: An astounding baseline for recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.Columbus:IEEE Press,2014:806-813.DOI:10.1109/CVPRW.2014.131.
[25] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[EB/OL].[2019-10-15].https://arxiv.org/abs/1409.1556.
[26] ZHAO Haitao,SUN Shaoyuan,JIN Bo.Sequential fault diagnosis based on lstm neural network[J].IEEE Access,2018(6):12929-12939.DOI:10.1109/ACCESS.2018.2794765.
[27] YAO Li,TORABI A,CHO K,et al.Describing videos by exploiting temporal structure[C]//Proceedings of the IEEE International Conference on Computer Vision.Santiago:IEEE Press,2015:4507-4515.DOI:10.1109/ICCV.2015.512.
[28] PAPINENI K,ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.Stroudsburg:[s.n.],2002:311-318.
[29] CHEN D L,DOLAN W B.Collecting highly parallel data for paraphrase evaluation[C]//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.Stroudsburg:[s.n.],2011:190-200.
[30] ZHANG Ying,YUAN Hejin.Human action recognition method based on 3D convolutional neural networks[J].Software Guide,2017,16(11):9-11.DOI:10.11907/rjdk.172515.(in Chinese)

相似文献/Similar Articles:

[1]方娜,余俊杰,李俊晓,等.注意力机制下的EMD-GRU短期电力负荷预测[J].华侨大学学报(自然科学版),2021,42(6):817.[doi:10.11830/ISSN.1000-5013.202008003]
 FANG Na,YU Junjie,LI Junxiao,et al.Short-Term Power Load Forecasting Under EMD-GRU Attention Mechanism[J].Journal of Huaqiao University(Natural Science),2021,42(6):817.[doi:10.11830/ISSN.1000-5013.202008003]
[2]吴雨泽,聂卓赟,周长新.注意力叠加与时序特征融合的目标检测方法[J].华侨大学学报(自然科学版),2022,43(5):650.[doi:10.11830/ISSN.1000-5013.202103034]
 WU Yuze,NIE Zhuoyun,ZHOU Changxin.Object Detection Method of Attention Superposition and Temporal Feature Fusion[J].Journal of Huaqiao University(Natural Science),2022,43(5):650.[doi:10.11830/ISSN.1000-5013.202103034]
[3]钟铭恩,谭佳威,袁彬淦,等.复杂交通环境下二轮机动车乘员头盔检测算法[J].华侨大学学报(自然科学版),2023,44(3):301.[doi:10.11830/ISSN.1000-5013.202212028]
 ZHONG Mingen,TAN Jiawei,YUAN Bingan,et al.Helmet Detection Algorithm of Two-Wheeled Motor Vehicle Occupant in Complex Traffic Environment[J].Journal of Huaqiao University(Natural Science),2023,44(3):301.[doi:10.11830/ISSN.1000-5013.202212028]
[4]方昱龙,王泽锦,王华珍,等.基于模板学习的智能侨情问句生成方法[J].华侨大学学报(自然科学版),2023,44(6):735.[doi:10.11830/ISSN.1000-5013.202304010]
 FANG Yulong,WANG Zejin,WANG Huazhen,et al.Intelligent Question Generation Method Based on Template Learning for Overseas Chinese Situation[J].Journal of Huaqiao University(Natural Science),2023,44(6):735.[doi:10.11830/ISSN.1000-5013.202304010]

备注/Memo

收稿日期/Received: 2019-11-19
通信作者/Corresponding author: ZHANG Hongbo (张洪博) (1986-), male, associate professor, Ph.D.; research interests: computer vision, pattern recognition, and image/video analysis. E-mail: hongbobest@gmail.com.
基金项目/Foundation: National Natural Science Foundation of China (61871196, 61673186); Natural Science Foundation of Fujian Province (2017J01110, 2019J01082)
更新日期/Last Update: 2020-11-20