Development of Audio-Visual Speech Recognition using Deep-Learning Technique

Chun Kit How; Ismail Mohd Khairuddin; Mohd Azraai Mohd Razman; Anwar P. P. Abdul Majeed; Wan Hasbullah Mohd Isa

doi:10.15282/mekatronika.v4i1.8625

Authors

Chun Kit How, Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.
Ismail Mohd Khairuddin, Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.
Mohd Azraai Mohd Razman, Universiti Malaysia Pahang
Anwar P. P. Abdul Majeed, Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.
Wan Hasbullah Mohd Isa, Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.

DOI:

https://doi.org/10.15282/mekatronika.v4i1.8625

Keywords:

Audio-Visual, Speech Recognition, Deep-Learning, Emotion, Spectrogram

Abstract

Deep learning is an artificial intelligence (AI) technique that simulates human learning behavior. Audio-visual speech recognition is important because it helps the listener truly understand the emotions behind the spoken words. In this thesis, two deep learning models, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN), were developed to recognize the emotion of speech in the dataset. The PyTorch framework with the torchaudio library was used. Both models were given the same training, validation, testing, and augmented datasets. Training was stopped when the training loop reached ten epochs or when the validation loss did not improve for five epochs. The highest accuracy and lowest loss of the CNN model on the training dataset were 76.50% and 0.006029 respectively, while the DNN model achieved 75.42% and 0.086643 respectively. Both models were evaluated using a confusion matrix. In conclusion, the CNN model performs better than the DNN model, but it needs improvement because the accuracy on the testing dataset is low and the loss is high.
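The stopping rule described in the abstract (at most ten epochs, or five consecutive epochs without an improvement in validation loss) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `run_with_early_stopping`, its parameter names, and the use of a plain list of validation losses in place of a real training and validation pass are all assumptions, as is the choice to count only a strict decrease as an improvement.

```python
def run_with_early_stopping(val_losses, max_epochs=10, patience=5):
    """Apply the abstract's stopping rule to per-epoch validation losses.

    `val_losses` is a hypothetical stand-in for the value a real
    validation pass would return after each training epoch.
    Returns (epochs_run, best_val_loss).
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0

    # Never run more than `max_epochs` epochs (ten in the abstract).
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            # Assumption: only a strict decrease counts as an improvement.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            # Stop after `patience` stale epochs in a row (five in the abstract).
            if epochs_without_improvement >= patience:
                break

    return epochs_run, best_loss


if __name__ == "__main__":
    # Validation loss stops improving after epoch 2, so the run halts early.
    print(run_with_early_stopping([0.9, 0.8, 0.85, 0.86, 0.87, 0.88, 0.89, 0.5]))
```

In a real PyTorch loop, the list would be replaced by a call that trains for one epoch and then evaluates on the validation set; the counter logic stays the same.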

Published

2022-06-27

Issue

Vol. 4 No. 1 (2022): January 2022

Section

Original Article

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to Cite

[1]

C. K. How, I. Mohd Khairuddin, M. A. Mohd Razman, A. P. P. Abdul Majeed, and W. H. Mohd Isa, “Development of Audio-Visual Speech Recognition using Deep-Learning Technique”, Mekatronika : J. Intell. Manuf. Mechatron., vol. 4, no. 1, pp. 88–95, Jun. 2022, doi: 10.15282/mekatronika.v4i1.8625.
