MUSEMBLE: A Novel Music Retrieval System with Automatic Voice Query Transcription and Reformulation
Journal of Systems and Software (JSS)
(1) Graduate School of Information and Communication, Artificial Intelligence Laboratory, Ajou University.
(2) School of Electrical Engineering, Korea University.
(3) College of Information and Communication, Ajou University.
Abstract
Much research has been devoted to developing efficient music retrieval systems, and query-by-humming is
considered one of the most intuitive and effective query methods for music retrieval. For voice humming
to be a reliable query source, elaborate signal processing and acoustic similarity measurement schemes are necessary.
Meanwhile, there has been growing interest in query reformulation using relevance feedback with
evolutionary techniques such as genetic algorithms for multimedia information retrieval. However, these techniques
have not been exploited widely in the field of music retrieval. In this paper, we develop a novel music retrieval
system called MUSEMBLE (MUSic enEMBLE) based on two distinct features: (i) A sung or hummed query is
automatically transcribed into a sequence of pitch and duration pairs with improved accuracy for music
representation. More specifically, we developed two new techniques, WAE (Windowed Average
Energy) and Dynamic ADF (Amplitude-based Difference Function), for more accurate note segmentation and
onset/offset detection in the acoustic signal, respectively. The former improves on energy-based approaches such as AE by
defining small but coherent windows with local and global threshold values. The latter improves on
the AF (Amplitude Function), which calculates the summation of the absolute values of signal differences for
clustering the energy contour. (ii) A user query is reformulated using user relevance feedback with a genetic algorithm to
improve retrieval performance. Although this paper focuses on humming queries,
MUSEMBLE provides versatile query and browsing interfaces for various kinds of users. We carried out
extensive experiments on the prototype system to evaluate the performance of our voice query transcription and
genetic-algorithm-based relevance feedback schemes. We demonstrate that our proposed method improves
retrieval accuracy by up to 20-40% compared with other popular relevance feedback (RF) methods. We also show that both the WAE and
Dynamic ADF methods improve transcription accuracy to as much as 95%.
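
The abstract describes WAE and Dynamic ADF only in outline, so the following Python sketch is an illustration of the general ideas, not the method from the paper. It computes an average energy per window, sums the absolute sample differences per window, and marks a window as voiced when its energy exceeds both a global and a local threshold. The function names, the non-overlapping windows, the threshold ratios (global_ratio, local_ratio), the context size, and the synthetic test signal are all assumptions made for this example.

```python
import math


def windowed_average_energy(signal, window_size):
    """Mean squared amplitude of each non-overlapping window (assumed layout)."""
    energies = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        energies.append(sum(x * x for x in window) / window_size)
    return energies


def amplitude_difference(signal, window_size):
    """Sum of absolute sample-to-sample differences inside each window."""
    diffs = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        diffs.append(sum(abs(b - a) for a, b in zip(window, window[1:])))
    return diffs


def segment_notes(energies, global_ratio=0.1, local_ratio=0.5, context=3):
    """Return (onset, offset) window indices, with the offset exclusive.

    A window counts as voiced when its energy exceeds both a global
    threshold (a fraction of the peak energy) and a local threshold
    (a fraction of the mean energy of nearby windows). Both rules are
    illustrative assumptions, not the thresholds used in the paper.
    """
    if not energies:
        return []
    global_threshold = global_ratio * max(energies)
    segments = []
    onset = None
    for i, energy in enumerate(energies):
        lo = max(0, i - context)
        hi = min(len(energies), i + context + 1)
        local_threshold = local_ratio * sum(energies[lo:hi]) / (hi - lo)
        voiced = energy > global_threshold and energy > local_threshold
        if voiced and onset is None:
            onset = i
        elif not voiced and onset is not None:
            segments.append((onset, i))
            onset = None
    if onset is not None:
        segments.append((onset, len(energies)))
    return segments


# Synthetic query: two 800 Hz tones at 8 kHz sampling, separated by silence.
signal = [
    math.sin(2 * math.pi * 800 * n / 8000)
    if 100 <= n < 300 or 400 <= n < 600 else 0.0
    for n in range(700)
]
energies = windowed_average_energy(signal, window_size=50)
diffs = amplitude_difference(signal, window_size=50)
segments = segment_notes(energies)
```

Each pair in segments gives the first voiced window of a candidate note and the first silent window after it. A real transcriber would still need pitch estimation per segment to produce the pitch and duration pairs mentioned above.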
Full paper
Now available on ScienceDirect (JSS) as an Article in Press.
Last Updated: 2007-06-15
All rights reserved by Byeong-jun, Han. 2005-2009.