Deep Learning Models for Electroglottography and Voice conversion to Text using Transfer Learning
Abstract
In this paper, we present a comparative study of the performance of deep learning models that convert electroglottography (EGG) and voice signals to text using transfer learning. We deployed a range of deep learning models, namely ResNet101, MobileNetV2, and GoogLeNet, for text recognition from EGG and voice signals. First, the short-time Fourier transform (STFT) is used to generate spectrograms from the time-series signals (EGG and voice). The spectrogram images are then resized to meet the input requirements of the pre-trained models (ImageNet weights). Subsequently, rigorous experiments were performed with three input scenarios: EGG, voice, and hybrid (EGG and voice). In addition, we studied the impact of healthy and pathological signals using the SVD dataset. As expected, the accuracies for healthy voice signals were significantly higher than those for pathological signals. We analyzed the performance of each model under two data combinations (healthy and mix). ResNet101 outperforms the other models in terms of generalizability, as its accuracies were significantly higher in all three scenarios. The highest accuracies of ResNet101 for the voice signal in the healthy and mix combinations are 98.10% and 88.57%, respectively.
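The preprocessing described in the abstract (STFT, spectrogram, resize for a pre-trained model) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `stft_spectrogram` and `to_model_input`, the sampling rate, the frame length, the hop size, the nearest-neighbour resizing, and the 224 x 224 target size (a common input size for ImageNet-pretrained networks) are all assumptions made for illustration, and a synthetic sine wave stands in for an EGG or voice recording.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=512, hop=128):
    """Return a log-magnitude STFT spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, then transpose so frequency is the first axis.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Convert to decibels; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10)


def to_model_input(spec, size=224):
    """Scale to [0, 1], resize to size x size, and repeat to 3 channels.

    Nearest-neighbour resizing is an assumption chosen to keep the sketch
    dependency-free; the paper does not state the interpolation method.
    """
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    rows = np.arange(size) * spec.shape[0] // size
    cols = np.arange(size) * spec.shape[1] // size
    resized = scaled[np.ix_(rows, cols)]
    # ImageNet-pretrained CNNs expect 3-channel input: (channels, height, width).
    return np.repeat(resized[np.newaxis, :, :], 3, axis=0).astype(np.float32)


# Synthetic stand-in for a one-second recording (assumed 16 kHz sampling rate).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
toy_signal = np.sin(2 * np.pi * 220.0 * t)

spec = stft_spectrogram(toy_signal)
image = to_model_input(spec)
print(spec.shape, image.shape)
```

The resulting array has the channel-first layout that ImageNet-pretrained image classifiers typically consume. In a transfer-learning setup, such an array would be passed to a pre-trained backbone whose final classification layer is replaced to match the number of target text classes.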
- Title
- Deep Learning Models for Electroglottography and Voice conversion to Text using Transfer Learning
- Title (other language)
- 전이 학습을 사용하여 전자 성문 및 음성을 텍스트로 변환하는 딥 러닝 모델
- Author
- KIM DEOKHWAN
- Conference
- 2022 Korean Institute of Next Generation Computing Spring Conference
- Venue
- Best Western Jeju Hotel
- Conference dates
- 2022-05-19 ~ 2022-05-21