AN AI-BASED AUTOMATIC SPEECH RECOGNITION SYSTEM
[1. Information systems and technologies]
Author: Andrii Dumyn, postgraduate, Lviv Polytechnic National University, Lviv
The amount of audio and video content on the Internet increases daily, yet users often struggle to find audio or video content on a topic of interest when it is presented in an unfamiliar language. The growing popularity of streaming platforms such as YouTube, Netflix, and Amazon Prime Video amplifies this problem. According to statista.com [1], the most widely spoken languages in the world are English, Chinese (Mandarin), Hindi, and Spanish; on YouTube alone, 33% of videos are in English and 67% are in other languages [2]. For this reason, automated translation and voiceover systems are becoming widespread. However, when texts or audio are dubbed automatically from another language, the speaker's emotional coloring and other vocal features must be preserved. Such a system would simplify the adaptation of audio and video content for audiences in different countries and help make a large share of engaging content accessible to users.
The scientific community is actively working on problems of voice analysis and on extracting metadata from speech. In particular, the authors of [3] build a neural network model for determining a speaker's gender by voice. Regarding research on the emotionality of speech, the authors of [4] provide a brief overview of the most relevant developments in the computational processing of emotion in the voice. The main goal of [5] is to improve the speech emotion recognition rate using various feature extraction algorithms.
In general, the proposed system should consist of several modules that can be customized and extended, for example, to support additional languages or to improve the operation of individual stages. A minimal interface for such a pipeline is sketched below.
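As one way to express this modularity, the following sketch defines a common interface that every stage implements; the names PipelineModule, process, and Pipeline are illustrative assumptions rather than part of the described system.

```python
# A minimal sketch of the proposed modular architecture. The interface is
# an assumption for illustration, not the system's actual design.
from abc import ABC, abstractmethod


class PipelineModule(ABC):
    """One customizable stage of the dubbing pipeline."""

    @abstractmethod
    def process(self, data: dict) -> dict:
        """Consume the previous stage's output and enrich it."""


class Pipeline:
    """Runs the configured modules in order; modules can be swapped or
    extended, for example to support an additional language."""

    def __init__(self, modules: list[PipelineModule]):
        self.modules = modules

    def run(self, data: dict) -> dict:
        for module in self.modules:
            data = module.process(data)
        return data
```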
The first stage of the system is audio pre-processing. This module is responsible for breaking the audio into structural units based on the sound of a single voice. Based on previously prepared data, the system should determine the emotional coloring of phrases, gender, age group (child, adult, elderly), and other speech features (accent, hoarseness). This requires developing a group of appropriate classifiers whose results complement each other.

The next module converts the data prepared at the first stage into text. At this stage, the audio or video is transcribed, and a matrix of phrase durations is compiled; a matrix of the expected durations of the translated phrases is then built from it. For automatic voice generation, a set of models will be developed that takes the emotional component, age, and gender into account.

The final module merges all generated audio recordings into a single track; if necessary, volume leveling of the soundtrack can be applied. This module also adds the audio track to the video sequence when dubbing video. Simplified sketches of these stages follow.
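A simplified sketch of the pre-processing module is given below. Splitting on silence stands in for full speaker-based segmentation, and the classifier types and label sets are assumptions for illustration, not the system's actual models.

```python
# Pre-processing sketch: silence-based segmentation plus a group of
# classifiers over MFCC features. All model choices are illustrative.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def segment_audio(path: str, top_db: int = 30) -> list[tuple[np.ndarray, int]]:
    """Split a recording into voiced fragments separated by silence."""
    y, sr = librosa.load(path, sr=16000)
    intervals = librosa.effects.split(y, top_db=top_db)
    return [(y[start:end], sr) for start, end in intervals]


def extract_features(fragment: np.ndarray, sr: int) -> np.ndarray:
    """Mean MFCC vector as a compact per-fragment feature."""
    mfcc = librosa.feature.mfcc(y=fragment, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)


# One classifier per speech attribute; their results complement each other.
classifiers = {
    "emotion": RandomForestClassifier(),    # e.g. neutral / happy / angry / sad
    "gender": RandomForestClassifier(),
    "age_group": RandomForestClassifier(),  # child / adult / elderly
}
```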
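For the transcription stage, the open-source Whisper model is one possible back end (the abstract does not prescribe a specific recognizer); the file name and the length ratio below are placeholders.

```python
# Transcription sketch: build the phrase duration matrix from ASR segments
# and estimate how long the translated phrases may sound.
import whisper

model = whisper.load_model("base")
result = model.transcribe("episode.wav")  # placeholder file name

# Phrase duration matrix: (start, end, text) per recognized phrase.
duration_matrix = [
    (seg["start"], seg["end"], seg["text"]) for seg in result["segments"]
]

# Expected durations of the translated phrases, assuming a language-
# dependent length ratio (an illustrative placeholder value).
LENGTH_RATIO = 1.15
translated_durations = [
    (start, start + (end - start) * LENGTH_RATIO)
    for start, end, _ in duration_matrix
]
```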
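The final module could be sketched with pydub, which offers ready-made overlaying and normalization; the offsets and file names are placeholders.

```python
# Final-module sketch: level the volume of generated phrases and merge
# them into one soundtrack at their original time offsets.
from pydub import AudioSegment
from pydub.effects import normalize

track = AudioSegment.silent(duration=60_000)  # 60 s output canvas

# (offset_ms, path) pairs produced by the voice-generation module.
phrases = [(0, "phrase_000.wav"), (4_200, "phrase_001.wav")]

for offset_ms, path in phrases:
    phrase = normalize(AudioSegment.from_file(path))  # volume leveling
    track = track.overlay(phrase, position=offset_ms)

track.export("dubbed_track.wav", format="wav")
```

Adding the finished track to a video sequence can then be delegated to an external tool such as ffmpeg.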
The obtained results will form the basis of further research on developing a group of classifiers for determining the emotional coloring of speech, gender, age, and other features of human speech. Based on the proposed architecture, the design and development of the interconnected system are planned.
References
1. Statista Research Department (2023, March 9). The most spoken languages worldwide 2022 [Infographic]. Statista. URL: https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/ (date of access: 09.03.2023).
2. Pew Research Center (2019, July 25). Popular YouTube channels produced a vast amount of content, much of it in languages other than English. Washington, D.C. URL: https://www.pewresearch.org/internet/2019/07/25/popular-youtube-channels-produced-a-vast-amount-of-content-much-of-it-in-languages-other-than-english/ (date of access: 08.03.2023).
3. Chachadi, K., Nirmala, S. R. (2022). Voice-based gender recognition using neural network. In Information and Communication Technology for Competitive Strategies (ICTCS 2020) (pp. 741–749). Springer, Singapore. DOI: https://doi.org/10.1007/978-981-16-0739-4_70.
4. Schuller, D. M., Schuller, B. W. (2021). A Review on Five Recent and Near-Future Developments in Computational Processing of Emotion in the Human Voice. Emotion Review, 13(1), 44–50. DOI: https://doi.org/10.1177/1754073919898526.
5. Koduru, A., Valiveti, H. B., Budati, A. K. (2020). Feature extraction algorithms to improve the speech emotion recognition rate. International Journal of Speech Technology, 23, 45–55. DOI: https://doi.org/10.1007/s10772-020-09672-4.