Investigation of Speech and Vocal Recognition Motion Techniques
DOI: https://doi.org/10.1229/tecempresarialjournal.v19i1.419

Keywords: Human-computer interaction, gesture recognition, motion techniques, speech recognition, voice recognition

Abstract
Recent advances in voice and speech recognition have reshaped human-computer interaction across many domains. While traditional methods rely on audio signals alone, newer approaches incorporate motion data to improve accuracy, robustness, and user experience. This paper surveys recent developments in speech and voice recognition technology, with an emphasis on the integration of motion techniques. We trace the evolution of voice recognition systems, covering their fundamental principles and applications, and explore how motion data such as gestures, facial expressions, and body movements can augment audio-based recognition. We review state-of-the-art motion-based algorithms and systems, highlighting their strengths and limitations, and survey real-world applications across sectors including virtual reality, gaming, healthcare, and the automotive industry.
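The augmentation described above is often realised as late fusion: an audio-based recogniser and a motion-based recogniser each produce class scores, which are then combined. The sketch below is a minimal illustration of that idea; the command labels, score values, and fusion weight are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative command vocabulary (an assumption, not from the paper).
COMMANDS = ["play", "pause", "stop"]

def late_fusion(audio_probs, motion_probs, audio_weight=0.7):
    """Weighted sum of two posterior distributions over the same classes,
    renormalised so the fused scores again sum to one."""
    audio = np.asarray(audio_probs, dtype=float)
    motion = np.asarray(motion_probs, dtype=float)
    fused = audio_weight * audio + (1.0 - audio_weight) * motion
    return fused / fused.sum()

# Audio alone is ambiguous between "play" and "pause"; the gesture
# channel breaks the tie in favour of "play".
audio_probs = [0.45, 0.44, 0.11]   # hypothetical audio-model posteriors
motion_probs = [0.70, 0.10, 0.20]  # hypothetical gesture-model posteriors

fused = late_fusion(audio_probs, motion_probs)
print(COMMANDS[int(np.argmax(fused))])  # prints "play"
```

Late fusion is only one design point; feature-level (early) fusion, where audio and motion features are concatenated before classification, trades modularity for the ability to model cross-modal correlations.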
Copyright (c) 2024 Tec Empresarial

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.