References:
[1] XU K,BA J,KIROS R,et al.Show, attend and tell: Neural image caption generation with visual attention[C]//International Conference on Machine Learning.Lille:[s.n.],2015:2048-2057.
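[2] ZOU Hui,DU Jixiang,ZHAI Chuanmin,et al.Cross-media retrieval with deep learning and consistent representation space learning (in Chinese)[J].Journal of Huaqiao University (Natural Science),2018,39(1):127-132.DOI:10.11830/ISSN.1000-5013.201508047.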
[3] WEI Yunchao,ZHAO Yao,LU Canyi,et al.Cross-modal retrieval with CNN visual features: A new baseline[J].IEEE Transactions on Cybernetics,2017,47(2):449-460.DOI:10.1109/TCYB.2016.2519449.
[4] GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Advances in Neural Information Processing Systems.Montreal:[s.n.],2014:2672-2680.
[5] REED S,AKATA Z,YAN X,et al.Generative adversarial text to image synthesis[C]//International Conference on Machine Learning.New York:[s.n.],2016:1060-1069.
[6] NILSBACK M E,ZISSERMAN A.Automated flower classification over a large number of classes[C]//Indian Conference on Computer Vision, Graphics and Image Processing.Washington:IEEE Press,2008:722-729.DOI:10.1109/ICVGIP.2008.47.
[7] WAH C,BRANSON S,WELINDER P,et al.Caltech-UCSD birds 200[EB/OL].[2011-10-26][2018-06-15].http://www.vision.caltech.edu/visipedia/CUB-200.html.
[8] LIN Tsungyi,MAIRE M,BELONGIE S,et al.Microsoft COCO: Common objects in context[C]//European Conference on Computer Vision.Zurich:[s.n.],2014:740-755.
[9] ZHANG Han,XU Tao,LI Hongsheng,et al.StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//IEEE International Conference on Computer Vision.Venice:IEEE Press,2017:5908-5916.DOI:10.1109/ICCV.2017.629.
[10] ZHANG Han,XU Tao,LI Hongsheng,et al.StackGAN++: Realistic image synthesis with stacked generative adversarial networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017(99):1.DOI:10.1109/TPAMI.2018.2856256.
[11] DASH A,GAMBOA J C B,AHMED S,et al.TAC-GAN: Text conditioned auxiliary classifier generative adversarial network[J/OL].[2017-03-26][2018-07-10].https://arxiv.org/abs/1703.06412.
[12] NGUYEN A,CLUNE J,BENGIO Y,et al.Plug and play generative networks: Conditional iterative generation of images in latent space[C]//IEEE Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE Press,2017:3510-3520.DOI:10.1109/CVPR.2017.374.
[13] SHARMA S,SUHUBDY D,MICHALSKI V,et al.ChatPainter: Improving text to image generation using dialogue[J/OL].[2018-02-22][2018-06-12].https://arxiv.org/abs/1802.08216.
[14] REED S,AKATA Z,LEE H,et al.Learning deep representations of fine-grained visual descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE Press,2016:49-58.DOI:10.1109/CVPR.2016.13.
[15] SALIMANS T,GOODFELLOW I,ZAREMBA W,et al.Improved techniques for training GANs[C]//Advances in Neural Information Processing Systems.Barcelona:[s.n.],2016:2234-2242.
[16] DAS A,KOTTUR S,GUPTA K,et al.Visual dialog[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Honolulu:IEEE Press,2017:1080-1089.DOI:10.1109/TPAMI.2018.2828437.