
Design and Evaluation of a Cross-Lingual ML-based Automatic Speech Recognition System Fine-tuned for the Galician Language

4 pages. Published: February 16, 2023

Abstract

In recent years, Machine Learning (ML) strategies have proven useful for automating numerous classification and pattern detection tasks in diverse fields, thanks to the increase in hardware computational power. One such field is Automatic Speech Recognition (ASR), which can use ML architectures to transcribe human speech into readable text. The Word Error Rate (WER) obtained with ML strategies can become relatively low while providing quick responses, reaching accuracy levels that approach human transcription accuracy. However, one of the main drawbacks of traditional architectures is the large amount of transcribed data they require during training to reach a low WER. This kind of data is particularly hard to obtain because it depends heavily on human processing. Fortunately, a framework proposed in 2020 (wav2vec 2.0) considerably reduces the need for audio labelling by using a Convolutional Neural Network (CNN) with self-supervised training on unlabelled cross-lingual audio in multiple languages, whose results can then be fine-tuned with labelled audio of a specific language. Thus, the framework can outperform previous architectures while using much smaller transcribed audio datasets. This paper presents an ASR system based on wav2vec 2.0 that is fine-tuned for Galician, a language for which only small audio datasets are currently available. The system is evaluated on a spontaneous speech dataset of approximately 1 hour from the Galicia Parliament, showing a relatively low WER (18.61%).
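The abstract reports its result as a Word Error Rate. As a rough illustration of what that metric measures, the sketch below computes WER under the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The abstract does not say which tool or text normalisation the authors used, so the function name `word_error_rate`, the whitespace tokenisation, and the example sentences are all assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping is applied here).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion (reference word missing)
                curr[j - 1] + 1,                   # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from the paper's dataset.
reference = "o parlamento de galicia"
hypothesis = "o parlamento galicia"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2%}")
```

Under this definition, a WER of 18.61% means that roughly 19 word-level edits are needed per 100 reference words to turn the system output into the reference transcript. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.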

Keyphrases: Automatic Speech Recognition, Convolutional Neural Network, language model, machine learning

In: Alvaro Leitao and Lucía Ramos (editors). Proceedings of V XoveTIC Conference. XoveTIC 2022, vol 14, pages 152--155

Links:
BibTeX entry
@inproceedings{XoveTIC2022:Design_and_Evaluation_of,
  author    = {Iv{\'a}n Froiz-M{\'i}guez and {\'O}scar Blanco-Novoa and Paula Fraga-Lamas and Diego Fustes and Jos{\'e} Carlos Dafonte V{\'a}zquez and Javier Pereira and Tiago M. Fern{\'a}ndez-Caram{\'e}s},
  title     = {Design and Evaluation of a Cross-Lingual ML-based Automatic Speech Recognition System Fine-tuned for the Galician Language},
  booktitle = {Proceedings of V XoveTIC Conference. XoveTIC 2022},
  editor    = {Alvaro Leitao and Luc{\'i}a Ramos},
  series    = {Kalpa Publications in Computing},
  volume    = {14},
  pages     = {152--155},
  year      = {2023},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2515-1762},
  url       = {https://easychair.org/publications/paper/hdqh},
  doi       = {10.29007/1ppr}}