Language and speech are the most important and direct means of human communication, and they play an irreplaceable role in daily life. With the development of deep learning and the continuous advancement of artificial intelligence, people's requirements for speech recognition keep rising, which has led to a series of research and development efforts on speech recognition systems. Deep learning (DL), the machine learning approach that has attracted the most attention in recent years, has achieved remarkable results in many fields, such as speech recognition and image processing. Now that the newest technologies are widely used in every developed country, designing and deploying a Cyrillic Mongolian speech recognition system in Mongolia serves an important purpose. The Mongolian language has not only a large number of synonyms and homonyms but also complicated grammar and question structures. These factors must be taken into account in speech recognition; they make training difficult and leave recognition results less than ideal. At present, more and more acoustic models in the field of speech recognition are built from neural networks and studied in depth. Among them, the Deep Neural Network (DNN) is the mainstream acoustic model.
The purpose of this study was to establish efficient thyroid ultrasound report generation based on Cyrillic Mongolian speech recognition, with the aim of improving the state of technology in the Mongolian medical field. We propose that the Cyrillic Mongolian speech input system for thyroid ultrasound reports consist mainly of a speech input system, a Cyrillic Mongolian speech recognition system, and the thyroid ultrasound report. The system collects the doctor's voice through a microphone during the examination and converts it into digital speech; the Cyrillic Mongolian speech recognition system then converts the speech data into text, and the resulting text is output in the form of a report. The recognition system is based on a convolutional neural network: a Convolutional Neural Network (CNN) has distinctive convolution and pooling layers, which reduce the number of parameters in training, cope better with large amounts of Cyrillic Mongolian data, reduce the complexity of the model, and make it more suitable for Mongolian speech recognition. Therefore, to improve the accuracy of Cyrillic Mongolian speech recognition, a Cyrillic Mongolian speech recognition system based on a deep convolutional neural network acoustic model was designed and constructed. The study results show that:
(1) To address the forced alignment of speech required when training traditional acoustic models, an end-to-end convolutional neural network (CTC-CNN) acoustic model, built on an end-to-end structure, was proposed to optimize the likelihood between input and output sequences. The experimental results show that the error rate of the Cyrillic Mongolian speech recognition system based on the CTC-CNN acoustic model is 17.7%. Compared with the Cyrillic Mongolian speech recognition system based on the CNN acoustic model, the accuracy is improved by 1.2%.
(2) In the CTC-CNN model, the CNN is a shallow, two-layer convolutional structure, and the recognition performance of such a shallow convolutional neural network is limited. To further improve accuracy, an end-to-end deep convolutional neural network (CTC-DCNN) model was designed based on the residual block structure. The vanishing-gradient problem of the model is mitigated by optimization with the maxout function. The resulting improved end-to-end deep convolutional neural network acoustic model (CTC-DCNN optimization) was proposed to improve the accuracy and modeling ability of the network. The experimental results show that, compared with the CNN model, this model reduces the word error rate in speech recognition by 4% to 4.7%.
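The two ideas named in the results, alignment-free training and maxout, can be illustrated with a minimal sketch. It assumes that "CTC" denotes connectionist temporal classification and that the network emits one probability distribution over symbols per frame. The function names, the blank index 0, and the toy numbers are hypothetical and are not taken from the study's implementation.

```python
import math


def maxout(pieces):
    """Maxout activation: element-wise maximum over k candidate pre-activations."""
    return [max(values) for values in zip(*pieces)]


def ctc_collapse(path, blank=0):
    """Map a frame-level path to labels: merge repeated symbols, then drop blanks."""
    labels, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            labels.append(symbol)
        prev = symbol
    return labels


def ctc_likelihood(probs, labels, blank=0):
    """Sum the probability of every frame alignment that collapses to `labels`.

    probs[t][k] is the probability of symbol k at frame t (CTC forward algorithm).
    """
    # Interleave blanks around the labels: [blank, l1, blank, l2, ..., blank].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank between two different labels
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood minimized during CTC training."""
    return -math.log(ctc_likelihood(probs, labels, blank))


# Toy example: 2 frames, symbols {0: blank, 1: a label}, uniform frame probabilities.
toy_probs = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_likelihood(toy_probs, [1]))
print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))
print(maxout([[1.0, -2.0, 3.0], [0.0, 5.0, -1.0]]))
```

Because the likelihood sums over all alignments, no frame-by-frame forced alignment is needed as a training target. Maxout takes the maximum over several linear pieces, so at least one piece always passes a gradient, which is a common motivation for using it against vanishing gradients.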