CHEN Xinjing, CHEN Duansheng. Text to Image Model With Classification-Reconstruction Stack Generative Adversarial Networks[J]. Journal of Huaqiao University (Natural Science), 2019, 40(4): 549-555. [doi:10.11830/ISSN.1000-5013.201807038]

Journal of Huaqiao University (Natural Science) [ISSN:1000-5013 / CN:35-1079/N]

Volume:
Vol. 40
Issue:
No. 4, 2019
Pages:
549-555
Published:
2019-07-10

Article Info

Title:
Text to Image Model With Classification-Reconstruction Stack Generative Adversarial Networks
Article ID:
1000-5013(2019)04-0549-07
Author(s):
CHEN Xinjing, CHEN Duansheng
College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
Keywords:
text to image; stack generative adversarial networks; classification; reconstruction; cross-modal learning
CLC number:
TP391.41
DOI:
10.11830/ISSN.1000-5013.201807038
Document code:
A
Abstract:
Building on the stacked generative adversarial network, we propose a classification-reconstruction stack generative adversarial network. Stage I generates 64 px × 64 px images, and Stage II generates 256 px × 256 px images. In each text-to-image stage, we add image category information together with feature and pixel reconstruction information to assist training and generate higher-quality images. We validate the proposed model on the Oxford-102, Caltech-UCSD Birds (CUB), and Microsoft COCO (MS COCO) datasets, and evaluate the quality and diversity of the generated images with the Inception Score. The results show that the proposed model is effective: the Inception Scores on the three datasets are 3.54, 4.16, and 11.45, respectively, improvements of 10.6%, 12.4%, and 35.5% over the stack generative adversarial network.
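The abstract reports Inception Scores but does not define the metric; for reference, it is the exponential of the mean KL divergence between a classifier's conditional label distribution p(y|x) and its marginal p(y) over generated images. A minimal NumPy sketch (the function name and the toy `probs` array are illustrative, not from the paper; in practice `probs` would be softmax outputs of a pretrained Inception-v3 on generated samples):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score: exp(E_x[KL(p(y|x) || p(y))]).

    probs: (N, C) array of per-image class probabilities. Higher scores
    indicate images that are individually recognizable (peaked p(y|x))
    and collectively diverse (flat marginal p(y)).
    """
    probs = np.asarray(probs, dtype=np.float64)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal label distribution
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Uniform predictions carry no class information, so the score is 1.0;
# confident, diverse predictions push it toward the number of classes.
print(inception_score(np.full((8, 10), 0.1)))  # → 1.0
```

A score of 1 is the floor; the published implementation also averages over several splits of the sample set, which this sketch omits.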


Memo:
Received: 2018-07-22
Corresponding author: CHEN Duansheng (b. 1959), male, professor, Ph.D.; main research interests: digital image processing and pattern recognition. E-mail: dschen@hqu.edu.cn.
Funding: National Natural Science Foundation of China (61502182); Key Project of the Fujian Provincial Science and Technology Program (2015H0025)
Last Update: 2019-07-20