A Robust Bootstrapping Algorithm of Speaker Models for On-Line Unsupervised Speaker Indexing

    Abstract:

    This paper proposes a robust bootstrapping framework with two components: a Multi-EigenSpace modeling technique based on regression class (RCpMES), which builds speaker models from sparse data, and a short-segment clustering step, which prevents overly short segments from influencing bootstrapping. Experiments on a real discussion archive with a total duration of 8 hours demonstrate the robustness of the proposed method: it improves speaker change detection performance and outperforms conventional bootstrapping methods, even when the average bootstrapping segment duration is less than 5 seconds.

Get Citation

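Fu ZH, Zhang YN. A robust bootstrapping algorithm of speaker models for on-line unsupervised speaker indexing. Journal of Software, 2007, 18(3): 608-616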

History
  • Received: July 28, 2006
  • Revised: November 13, 2006