Continuous Bangla Speech Processing: Segmentation, Classification and Recognition
The goal of this research is to create a continuous speech recognition system for Bangla that covers word segmentation, feature extraction, word classification, and recognition. This study proposes four dynamic thresholding algorithms for segmenting continuous Bangla speech sentences into words and sub-words: (i) Algorithm-1 (based on a modified k-means algorithm), (ii) Algorithm-2 (based on the fuzzy-means algorithm), (iii) Algorithm-3 (based on a modified Otsu’s algorithm), and (iv) Algorithm-4 (based on short-time speech features). This research also introduces a new method, called the blocking black area method, for identifying the voiced portions of continuous speech during segmentation.
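As a rough illustration of dynamic thresholding for segmentation, the sketch below applies the standard Otsu threshold (not the modified variant proposed in this work) to frame-wise short-time energy and returns the sample ranges whose energy exceeds the threshold. The function names, the frame length of 160 samples, the hop of 80 samples, and the 64-bin histogram are assumptions chosen for the example, not values taken from the study.

```python
import math

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples for each (possibly overlapping) frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def otsu_threshold(values, bins=64):
    """Standard Otsu: pick the histogram split that maximises between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += i * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    return lo + (best_bin + 1) * width

def segment(signal, frame_len=160, hop=80):
    """Return (start_sample, end_sample) ranges whose frame energy exceeds the threshold."""
    energy = short_time_energy(signal, frame_len, hop)
    thr = otsu_threshold(energy)
    segments, start = [], None
    for i, e in enumerate(energy):
        if e > thr and start is None:
            start = i
        elif e <= thr and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        segments.append((start * hop, (len(energy) - 1) * hop + frame_len))
    return segments

# Synthetic check: two 1600-sample tone bursts separated by silence.
tone = [math.sin(2 * math.pi * n / 16) for n in range(1600)]
silence = [0.0] * 800
signal = silence + tone + silence + tone + silence
segments = segment(signal)
print(segments)
```

Because the threshold is recomputed from each utterance's own energy distribution, it adapts to recording level. Real speech would additionally need smoothing and minimum-duration rules that this sketch omits.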
The segmented words are divided into several classes according to the number of syllables they contain. This study provides a time-saving classification method, called syllable-based classification, for this purpose. Speech spectrogram features and short-time speech features were evaluated during feature extraction. This study proposes three types of speech features for feature generation: (i) short-time speech features, (ii) binary features, and (iii) MFCC features. Short-time speech features and binary features are both employed in speech segmentation and recognition. Various windowing functions have been used in the creation of MFCC features. For speech recognition, a comprehensive study of neural networks has been conducted, including a performance analysis of several improved and faster back-propagation (BP) algorithms (BP with momentum, variable learning rate BP, resilient BP, conjugate gradient BP, and Levenberg-Marquardt BP). The Matlab Neural Network Toolbox 9.8.0 (R2020a) was used to design, train, and simulate the feedforward neural network with the BP learning algorithm. Because the traditional BP algorithm converges slowly, this work applies these improved and faster BP methods to the speech recognition problem.
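To show why a momentum term can speed up convergence, the sketch below compares the plain gradient-descent weight update with the momentum update on a toy ill-conditioned quadratic loss. This is only the update rule in isolation: the gradient is computed analytically rather than back-propagated through a network, and the loss, learning rate, momentum coefficient, and tolerance are all assumptions made for the example.

```python
def grad(w):
    """Gradient of the toy loss 0.5 * (w[0]**2 + 50 * w[1]**2)."""
    return [w[0], 50.0 * w[1]]

def steps_to_converge(lr, momentum, tol=1e-3, max_steps=10_000):
    """Count update steps until the weight vector norm drops below tol."""
    w = [1.0, 1.0]
    v = [0.0, 0.0]  # velocity; with momentum=0 this reduces to plain gradient descent
    for step in range(1, max_steps + 1):
        g = grad(w)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
        if (w[0] ** 2 + w[1] ** 2) ** 0.5 < tol:
            return step
    return max_steps

plain_steps = steps_to_converge(lr=0.02, momentum=0.0)
momentum_steps = steps_to_converge(lr=0.02, momentum=0.9)
print(plain_steps, momentum_steps)
```

With the same learning rate, the momentum run needs fewer steps because the velocity term accumulates progress along the shallow direction of the loss, which is where the plain update is slowest.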
Continuously spoken Bangla sentences were used to validate the developed system. To test the system’s performance, 100 well-defined Bangla sentences, containing 656 words in total, were recorded from each of 5 male speakers of various ages. As a result, the speech database has 500 Bangla speech sentences comprising 3,280 speech words. The segmentation system achieved an average segmentation accuracy of 95.55 percent with Algorithm-1 (based on the modified k-means algorithm), 96.19 percent with Algorithm-2 (based on the fuzzy-means algorithm), 90.58 percent with Algorithm-3 (based on the modified Otsu’s algorithm), and 95.9 percent with Algorithm-4 (based on short-time speech features). The classification method has an average accuracy of 91.42 percent. For the segmented speech words, the recognition system achieved recognition rates of 83 percent with the resilient BP algorithm, 90 percent with the conjugate gradient BP algorithm, and 90 percent with the Levenberg-Marquardt BP algorithm.
M. M. Rahman
Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Bangladesh.
View Book:- https://stm.bookpi.org/CBSPSCR/article/view/5959