    Abstract:

    Copy detection has important applications in both intellectual property protection and information retrieval. Current work concentrates mainly on document copy detection. Early document copy detection focused mainly on program plagiarism detection, whereas most studies now address text copy detection. This paper gives a comprehensive survey of natural language text copy detection and introduces the development of copy detection. The approaches and features of a variety of existing text copy detection systems and prototypes are reviewed in detail. Some key detection techniques are then listed and compared with each other. Finally, future trends in text copy detection are discussed.

Citation

Bao Junpeng, Shen Junyi, Liu Xiaodong, Song Qinbao. A survey on natural language text copy detection. Journal of Software (软件学报), 2003,14(10):1753-1760

History
  • Received: September 03, 2002
  • Revised: September 03, 2002