    Abstract:

    This paper proposes a two-phase rating prediction framework that combines co-clustering with non-negative matrix factorization. First, a novel co-clustering method (BlockClust) divides the raw rating matrix into clusters that are much smaller than the original matrix. Then, a weighted non-negative matrix factorization algorithm predicts the unknown ratings. Because the co-clustering step yields low-dimensional, homogeneous sub-matrices, the method achieves higher prediction accuracy and efficiency on them. The paper also proposes three update schemes for the corresponding update scenarios in recommender systems. Finally, the proposed method is implemented alongside seven related collaborative filtering (CF) methods. The comparisons show the efficiency of the proposed method and its potential for large real-time recommender systems.
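
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: BlockClust is not reproduced here, so the row and column cluster labels are simply supplied by hand, and the second phase uses the commonly used multiplicative-update rule for weighted NMF, which may differ from the paper's exact algorithm. The function names (`weighted_nmf`, `predict_by_blocks`), the toy rating matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def weighted_nmf(R, W, rank=2, iters=200, eps=1e-9, seed=0):
    """Factor R ~ U @ V with non-negative U, V, fitting only entries where W == 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((R.shape[0], rank)) + 0.1
    V = rng.random((rank, R.shape[1])) + 0.1
    WR = W * R  # observed ratings only; unknown entries contribute nothing
    for _ in range(iters):
        # Multiplicative updates keep U and V non-negative.
        U *= (WR @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WR) / (U.T @ (W * (U @ V)) + eps)
    return U, V

def predict_by_blocks(R, W, row_labels, col_labels, rank=2, iters=200):
    """Run weighted NMF independently inside each (row cluster, column cluster) block."""
    pred = np.zeros_like(R, dtype=float)
    for r in np.unique(row_labels):
        rows = np.where(row_labels == r)[0]
        for c in np.unique(col_labels):
            cols = np.where(col_labels == c)[0]
            block = np.ix_(rows, cols)
            U, V = weighted_nmf(R[block], W[block], rank, iters)
            pred[block] = U @ V
    return pred

# Toy rating matrix (hypothetical data); 0 marks an unknown rating.
R = np.array([[5, 4, 0, 1, 1, 0],
              [4, 0, 5, 1, 0, 1],
              [5, 5, 4, 0, 1, 1],
              [1, 0, 1, 5, 4, 0],
              [0, 1, 1, 4, 0, 5],
              [1, 1, 0, 5, 5, 4]], dtype=float)
W = (R > 0).astype(float)

# Stand-in for phase 1: hand-assigned co-cluster labels, not the output of BlockClust.
row_labels = np.array([0, 0, 0, 1, 1, 1])
col_labels = np.array([0, 0, 0, 1, 1, 1])

pred = predict_by_blocks(R, W, row_labels, col_labels)
rmse = np.sqrt(np.sum(W * (R - pred) ** 2) / W.sum())
print("RMSE on observed ratings:", rmse)
```

Each block is factorized on its own, so the cost of one factorization depends on the block size rather than on the size of the full matrix, which is the efficiency argument the abstract makes.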

Get Citation

Wu H, Wang YJ, Wang Z, Wang XL, Du SZ. Two-phase co-clustering collaborative filtering algorithm. Journal of Software, 2010,21(5):1042-1054

History
  • Received: May 07, 2009
  • Revised: October 19, 2009