Identification of Voice Features Using a Convolutional Neural Network (CNN) Model for Speech-to-Text (STT)
DOI:
https://doi.org/10.51454/decode.v4i3.631
Keywords:
Convolutional Neural Network, Word, Speech-to-Text, Voice
Abstract
Speech pattern identification is performed to recognize spoken words. One method that can be used for Speech-to-Text (STT) identification is the Convolutional Neural Network (CNN). This study applies a CNN to STT identification on raw speech, using 23,000 recordings from an open voice dataset on Kaggle. The first stage is duration resampling, which selects the recordings that are long enough to pass on to the next stage, frequency initialization. That stage changes the original sampling frequency of the recordings, from 16,000 Hz to 8,000 Hz. The next stage is data labeling, in which the input and output data are labeled for classification as the basis for learning. The labeled data are then split in an 8:2 ratio. In the final stage, a CNN architecture is designed to recognize the voice patterns recorded in the dataset and to identify the spoken utterances. The study aims to identify spoken voice patterns with high accuracy.
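The preprocessing stages named in the abstract (duration filtering, the change from 16,000 Hz to 8,000 Hz, labeling, and the 8:2 split) can be illustrated with a minimal Python sketch. This is not the authors' code. The one-second minimum duration, the pair-averaging downsampler, the fixed shuffle seed, and all function names are assumptions made for illustration. In practice a library resampler such as `librosa.resample` or `scipy.signal.resample_poly` would likely replace `downsample_by_two`.

```python
import random

ORIG_SR = 16000    # original sampling rate stated in the abstract (Hz)
TARGET_SR = 8000   # target sampling rate stated in the abstract (Hz)
MIN_SECONDS = 1.0  # assumed minimum duration; the abstract gives no value


def long_enough(samples, sr=ORIG_SR, min_seconds=MIN_SECONDS):
    """Return True if the recording lasts at least min_seconds."""
    return len(samples) >= int(sr * min_seconds)


def downsample_by_two(samples):
    """Halve the sampling rate by averaging adjacent sample pairs.

    Averaging acts as a crude low-pass filter; it only stands in
    for a proper resampler.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]


def encode_labels(words):
    """Map each distinct word to an integer class index."""
    classes = sorted(set(words))
    index = {word: i for i, word in enumerate(classes)}
    return [index[w] for w in words], classes


def split_8_2(items, seed=0):
    """Shuffle, then split into 80% training and 20% validation data."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.8)
    return items[:cut], items[cut:]


def prepare(recordings):
    """Run all stages on a list of (samples, word) pairs at ORIG_SR."""
    kept = [(s, w) for s, w in recordings if long_enough(s)]
    labels, classes = encode_labels([w for _, w in kept])
    data = [(downsample_by_two(s), y) for (s, _), y in zip(kept, labels)]
    train, val = split_8_2(data)
    return train, val, classes
```

Calling `prepare` on a list of `(samples, word)` pairs returns the training set, the validation set, and the sorted class names. The resulting 8,000 Hz waveforms and integer labels are the kind of input a 1D CNN classifier would consume.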
License
Copyright (c) 2024 Rodiah, Diana Tri Susetianingtias, Eka Patriya

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.