Video Memorability Prediction Based on Global and Local Information
Author:
Affiliation:

Fund Project:

National Natural Science Foundation of China (61772535); Beijing Natural Science Foundation (4192028); National Key Research and Development Plan, China (2016YFB1001202)

    Abstract:

    The memorability of a video is a metric describing how memorable the video is. Memorable videos carry great value, and automatically predicting the memorability of large numbers of videos has many applications, including digital content recommendation, advertisement design, and education systems. This study proposes a framework that predicts video memorability from global and local information. The framework consists of three components: global context representation, spatial layout, and local object attention. In the experiments, the global context representation and local object attention achieve remarkable results, and the spatial layout also contributes substantially to the prediction. Finally, the proposed model improves on the baseline of the MediaEval 2018 Media Memorability Prediction Task.
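    The three-component framework described above can be sketched as a simple late-fusion regressor: each component produces a feature vector per video, the vectors are fused, and a regression head outputs a memorability score. Everything below (function names, feature dimensions, the linear head) is a hypothetical illustration of that structure, not the authors' implementation.

```python
import math
import random

random.seed(0)

def rand_vec(n):
    """Stand-in for a learned feature extractor's output."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def extract_global_context(video):
    # placeholder for a video-level embedding (e.g. from a 3D CNN)
    return rand_vec(128)

def extract_spatial_layout(video):
    # placeholder for a scene-layout descriptor
    return rand_vec(64)

def extract_local_object_attention(video):
    # placeholder for attention-pooled local object features
    return rand_vec(128)

def predict_memorability(video, weights, bias=0.0):
    # late fusion: concatenate the three feature sets, apply a linear head,
    # and squash the result into [0, 1] as a memorability score
    feats = (extract_global_context(video)
             + extract_spatial_layout(video)
             + extract_local_object_attention(video))
    score = sum(f * w for f, w in zip(feats, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))

weights = [random.gauss(0.0, 0.01) for _ in range(128 + 64 + 128)]
score = predict_memorability("video.mp4", weights)
print(round(score, 3))
```

    In a trained model the three extractors would be learned networks and the fusion head fitted to annotated memorability scores; the sketch only shows how the per-component features combine into a single prediction.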

Get Citation

Wang S, Wang WY, Chen SZ, Jin Q. Video memorability prediction based on global and local information. Ruan Jian Xue Bao/Journal of Software, 2020,31(7):1969-1979 (in Chinese with English abstract).
History
  • Received: June 07, 2019
  • Revised: July 11, 2019
  • Online: January 17, 2020
  • Published: July 06, 2020
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing; Postal Code: 100190
Phone: 010-62562563; Fax: 010-62562533; Email: jos@iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063