Development of Audio-Visual Speech Recognition using Deep-Learning Technique

Chun Kit How; Ismail Mohd Khairuddin; Mohd Azraai Mohd Razman; Anwar P. P. Abdul Majeed; Wan Hasbullah Mohd Isa

doi:10.15282/mekatronika.v4i1.8625

Authors

Chun Kit How Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.
Ismail Mohd Khairuddin Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.
Mohd Azraai Mohd Razman Universiti Malaysia Pahang
Anwar P. P. Abdul Majeed Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.
Wan Hasbullah Mohd Isa Faculty of Manufacturing and Mechatronics Engineering Technology, Universiti Malaysia Pahang, 26600 Pahang, Malaysia.

Keywords:

Audio-Visual, Speech Recognition, Deep-Learning, Emotion, Spectrogram

Abstract

Deep learning is an artificial intelligence (AI) technique that simulates human learning behavior. Audio-visual speech recognition is important because it helps the listener truly understand the emotions behind spoken words. In this thesis, two deep learning models, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN), were developed to recognize speech emotion from the dataset. The PyTorch framework with the torchaudio library was used. Both models were given the same training, validation, testing, and augmented datasets. Training was stopped when the training loop reached ten epochs or when the validation loss did not improve for five epochs. On the training dataset, the CNN model achieved a highest accuracy of 76.50% and a lowest loss of 0.006029, while the DNN model achieved 75.42% and 0.086643, respectively. Both models were evaluated using confusion matrices. In conclusion, the CNN model performs better than the DNN model but still needs improvement, as the accuracy on the testing dataset is low and the loss is high.
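The stopping rule described in the abstract (at most ten epochs, or stop early after five epochs without validation-loss improvement) and the confusion-matrix evaluation can be illustrated with a minimal, framework-free Python sketch. This is not the authors' code: the callback `train_one_epoch`, the helper names, and the choice to count any strict decrease in validation loss as an improvement are assumptions made for illustration. In a PyTorch setup, the callback would plausibly wrap one pass over the training data followed by one pass over the validation data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` consecutive epochs."""

    patience: int = 5  # five epochs without improvement, as stated in the abstract
    best_loss: float = float("inf")
    epochs_without_improvement: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss and return True if training should stop."""
        if val_loss < self.best_loss:  # assumption: any strict decrease counts as improvement
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def run_training(
    train_one_epoch: Callable[[int], float], max_epochs: int = 10, patience: int = 5
) -> List[float]:
    """Run epochs until max_epochs is reached or early stopping fires.

    `train_one_epoch` is a hypothetical callback that trains for one epoch
    and returns that epoch's validation loss.
    """
    stopper = EarlyStopping(patience=patience)
    history: List[float] = []
    for epoch in range(max_epochs):
        val_loss = train_one_epoch(epoch)
        history.append(val_loss)
        if stopper.step(val_loss):
            break
    return history


def confusion_matrix(y_true: List[int], y_pred: List[int], num_classes: int) -> List[List[int]]:
    """Count predictions: rows are true emotion classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_matrix(matrix: List[List[int]]) -> float:
    """Accuracy is the diagonal (correct predictions) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

For example, with per-epoch validation losses of 1.0, 0.8, 0.9, 0.85, 0.82, 0.81, 0.83, the best loss (0.8) is reached at the second epoch and none of the next five epochs beats it, so training stops after seven epochs instead of running all ten.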

Published

2022-06-27

Issue

Vol. 4 No. 1 (2022): January 2022

Section

Original Article

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to Cite

[1]

C. K. How, I. Mohd Khairuddin, M. A. Mohd Razman, A. P. P. Abdul Majeed, and W. H. Mohd Isa, “Development of Audio-Visual Speech Recognition using Deep-Learning Technique”, Mekatronika : J. Intell. Manuf. Mechatron., vol. 4, no. 1, pp. 88–95, Jun. 2022, doi: 10.15282/mekatronika.v4i1.8625.

Most read articles by the same author(s)

Branden Adems Anak Kiethson, Ismail Mohd Khairuddin, Mohd Azraai Mohd Razman, Anwar P. P. Abdul Majeed, Muhammad Amirul Abdullah, Wan Hasbullah Mohd Isa, Artificial Intelligence Approach For Fire Monitoring and Warning System Design, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 4 No. 2 (2022): July 2022
Abdul Haleem Habeeb Mohamed, Muhammad Aizzat Zakaria, Mohd Azraai Mohd Razman, Anwar P. P. Abdul Majeed, Mohamed Heerwan Peeie, Rain Classification for Autonomous Vehicle Navigation: A Support Vector Machine Approach, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 2 No. 2 (2020): July 2020
Muhamad Aliff Imran Daud, Asmarani Ahmad Puzi, Shahrul Na’im Sidek, Salmah Anim Abu Hassan, Ahmad Anwar Zainuddin, Ismail Mohd Khairuddin, Mohd Azri Abdul Mutalib, Mechanomyography in Assessing Muscle Spasticity: A Systematic Literature Review, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 6 No. 1 (2024): January 2024
Danial Haziq Rizal, Wan Hasbullah Mohd Isa, Muhammad Amirul Abdullah, Anwar P.P. Abdul Majeed, Norasmiza Mohd, Effects of Varied Planar Dimensions of IPMC on Simulated Actuation using COMSOL, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 5 No. 2 (2023): July 2023
Weng Zhen Lim, Norasmiza Mohd, Anwar P. P. Abdul Majeed, Mohd Azraai Mohd Razman, Yin Goon Koon, Screw Absence Classification on Aluminum Plate via Features Based Transfer Learning Models, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 5 No. 1 (2023): January 2023
Lim Weng Zhen, Anwar P.P Abdul Majeed, Mohd Azraai Mohd Razman, Ahmad Fakhri Ab. Nasir, The Condition Based Monitoring for Bearing Health, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 2 No. 1 (2020): January 2020
Farhan Nabil Mohd Noor, Wan Hasbullah Mohd Isa, Anwar P.P. Abdul Majeed, The Diagnosis Of Diabetic Retinopathy By Means Of Transfer Learning With Conventional Machine Learning Pipeline, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 2 No. 2 (2020): July 2020
Yee Zhing Liew, Anwar P. P. Abdul Majeed, Formulation of A Deep Learning Model for Automated Detection Via Segmentation of Lung Cancer, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 6 No. 1 (2024): January 2024
Mohd Rais Hakim Ramlee, Ismail Mohd Khairuddin, Zubaidah Zamri, Muhammad Nur Aiman Shapiee, Muhammad Amirul Abdullah, Pill Recognition via Deep Learning Approaches, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 6 No. 2 (2024): July 2024
Amiir Haamzah Mohamed Ismail, Anwar P. P. Abdul Majeed, Keras Implementation in Detecting Intracranial Hemorrhage and Multiclass Classification of Subtypes via Transfer Learning and Classifiers Selection, Mekatronika: Journal of Intelligent Manufacturing and Mechatronics: Vol. 6 No. 2 (2024): July 2024