Improvement of Speech/Music Classification for 3GPP EVS Based on LSTM

Citations

Web of Science: 7
Scopus: 6
Abstract

Competition in smartphone speech recognition technology is intensifying as Internet of Things (IoT) devices become widespread. Robust speech recognition requires detecting speech signals in a variety of acoustic environments. Speech/music classification, which allows signal processing to be optimized according to the classification result, has been widely adopted as an essential part of electronics applications such as multi-rate audio codecs, automatic speech recognition, and multimedia document indexing. In this paper, we propose a new technique that uses long short-term memory (LSTM) to improve the robustness of the speech/music classifier in the enhanced voice service (EVS) codec, which has been adopted as a voice-over-LTE (VoLTE) speech codec. For effective speech/music classification, the feature vectors fed to the LSTM are chosen from the features of the EVS. To cope with the diversity of music data, a large-scale dataset is used for training. Experiments show that LSTM-based speech/music classification outperforms the conventional EVS speech/music classification algorithm across various conditions and types of speech/music data, especially at low signal-to-noise ratios (SNRs).
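The idea of feeding per-frame feature vectors to an LSTM and reading out a speech/music decision can be illustrated with a minimal sketch. This is not the authors' model: the feature dimension (6), hidden size (4), random weights, and the names `init_params`, `lstm_step`, and `classify` are all hypothetical, chosen only to show how a standard LSTM cell carries state from frame to frame.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(n_in, n_hidden, seed=0):
    # Hypothetical, untrained weights; a real classifier would learn these.
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                for _ in range(n_hidden)]

    params = {name: (mat(), [0.0] * n_hidden) for name in "ifog"}
    params["out"] = ([rng.uniform(-0.1, 0.1) for _ in range(n_hidden)], 0.0)
    return params


def lstm_step(x, h, c, params):
    # Standard LSTM cell: input (i), forget (f), output (o) gates
    # and candidate cell state (g), all computed from [x, h].
    z = x + h  # list concatenation
    gates = {}
    for name in "ifog":
        W, b = params[name]
        pre = [a + bias for a, bias in zip(matvec(W, z), b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    c_new = [f * cp + i * g
             for f, cp, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    h_new = [o * math.tanh(cn) for o, cn in zip(gates["o"], c_new)]
    return h_new, c_new


def classify(frames, params, n_hidden):
    # Returns one probability per frame; which class it denotes
    # (speech or music) is a labelling convention assumed here.
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    probs = []
    for x in frames:
        h, c = lstm_step(x, h, c, params)
        w, b = params["out"]
        probs.append(sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b))
    return probs


# Five dummy frames of six placeholder features each.
frames = [[0.1] * 6 for _ in range(5)]
params = init_params(n_in=6, n_hidden=4)
probs = classify(frames, params, n_hidden=4)
```

Because the hidden and cell states persist across frames, each per-frame decision can depend on earlier frames, which is the property that makes an LSTM a natural fit for frame-wise audio classification.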

Keywords

speech/music classification; Enhanced Voice Service; long short-term memory; big data
Title
Improvement of Speech/Music Classification for 3GPP EVS Based on LSTM
Authors
Kang, Sang-Ick; Lee, Sangmin
DOI
10.3390/sym10110605
Publication date
2018-11
Type
Article
Journal
Symmetry
Volume
10
Issue
11