Correlation Estimating Algorithm of XML Stream Based on Hamming Norms
Affiliation:

  • Article
  • | |
  • Metrics
  • |
  • Reference [13]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    It is of great importance to compare the correlation of different XML (extensible markup language) streams in the limited space in the Database Theory. In the study of these problems, several measures are proposed, e.g. the tree-edit distance, to show the difference of XML trees. This paper proposes a natural measure l0 employing Hamming norms, i.e. the number of distinct sub-trees between two XML trees, to estimate the correlation. Furthermore, a probabilistic estimating algorithm involving space-bounded pseudorandom generators, stable distributions and hash functions has been presented in the data stream model. Theoretical time/space complexity analysis, correctness proof and experimental simulation show that this algorithm can give a desired approximation.

    Reference
    [1] Flajolet P, Martin GN. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985,31(2):182-209. [doi: 10.1016/0022-0000(85)90041-8]
    [2] Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 1999,58(1):137-147. [doi: 10.1006/jcss.1997.1545]
    [3] Garofalakis M, Kumar A. XML stream processing using tree-edit distance embeddings. ACM Trans. on Database Systems, 2005, 30(1):279-332. [doi: 10.1145/1061318.1061326]
    [4] Polyzotis N, Garofalakis M, Ioannidis Y. Approximate XML query answers. In: Weikum G, ed. Proc. of the 2004 ACM SIGMOD Int’l Conf. on Management of Data. New York: ACM Press, 2004. 263-274.
    [5] Bar-Yossef Z, Jayram TS, Kumar R, Sivakumar D, Trevisan L. Counting distinct elements in a data stream. In: Rolim JDP, Vadhan S, eds. Proc. of the 6th Int’l Workshop on Randomization and Approximation Techniques in Computer Science. Berlin, Heidelberg: Springer-Verlag, 2002. 1-10.
    [6] Gibbons PB, Tirthapura S. Estimating simple functions on the union of data streams. In: Rosenberg A, ed. Proc. of the Symp. on Parallel Algorithms and Architectures. New York: ACM Press, 2001. 281-291.
    [7] Indyk P. Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proc. of the 41st Symp. on Foundations of Computer Science. Cambridge: IEEE Computer Society Press, 2000. 189-197. http://portal.acm.org/citation.cfm? id=795666.796606
    [8] Cormode G, Datar M, Indyk P, Muthukrishnan S. Comparing data streams using Hamming norms (how to zero in). In: Ramakrishnan R, ed. Proc. of the 28th Int’l Conf. on Very Large Data Bases. New York: VLDB Press, 2002. 335-345.
    [9] Cormode G, Muthukrishnan S. Estimating dominance norms of multiple data streams. In: Battista GD, Zwick U, eds. Proc. of the 11th European Symp. on Algorithms. Berlin, Heidelberg: Springer-Verlag, 2003. 148-160.
    [10] Zolotarev V. One-Dimensional Stable Distributions. In: Vol.65 of Translations of Mathematical Monographs. American Mathematical Society Press, 1986.
    [11] Blum M, Micali S. How to generate cryptographically strong sequences of pseudorandom bit. SIAM Journal on Computing, 1984, 13(4):850-864. [doi: 10.1137/0213053]
    [12] Yao AC. Theory and applications of trapdoor functions. In: Fischer MJ, ed. Proc. of the 23rd IEEE Symp. on Foundations of Computer Science. Cambridge: IEEE Computer Society Press, 1982. 80-91.
    [13] Nisan N. Pseudorandom generators for space-bounded computation. In: Ortiz H, ed. Proc. of the 22nd Symp. on Theory of Computation. ACM Press, 1990. 204-212.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

孙 贺,朱 洪.基于Hamming范数的XML流相关性估测算法.软件学报,2010,21(4):672-679

Copy
Share
Article Metrics
  • Abstract:4978
  • PDF: 6111
  • HTML: 0
  • Cited by: 0
History
  • Received:December 17,2007
  • Revised:July 02,2008
You are the first2044088Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063