Length: 2,950 English words, about 16,000 English characters; about 4,900 Chinese characters.
Source: Errattahi R, Hannani A E, Ouahmane H. Automatic Speech Recognition Errors Detection and Correction: A Review[J]. Procedia Computer Science, 2018, 128:32-37.

Automatic Speech Recognition Errors Detection and Correction: A Review

Rahhal Errattahi, Asmaa El Hannani, Hassan Ouahmane

Abstract

Even though Automatic Speech Recognition (ASR) has matured to the point of commercial applications, high error rates in some speech recognition domains remain one of the main impediments to the wide adoption of speech technology, especially for continuous large vocabulary speech recognition applications. The persistent presence of ASR errors has intensified the need to find alternative techniques to automatically detect and correct such errors. Correcting transcription errors is crucial not only to improve speech recognition accuracy, but also to avoid propagating the errors to subsequent language processing modules such as machine translation. In this paper, the basic principles of ASR evaluation are first summarized, and then the current state of ASR error detection and correction research is reviewed. We focus on emerging techniques using the word error rate metric.

Keywords: Automatic Speech Recognition; ASR Error Detection; ASR Error Correction; ASR evaluation

1. Introduction

Automatic Speech Recognition (ASR) systems aim at converting a speech signal into a sequence of words, either for text-based communication purposes or for device
control. The purpose of evaluating ASR systems is to simulate human judgement of system performance in order to measure their usefulness and assess the remaining difficulties, especially when comparing systems. The standard metric of ASR evaluation is the Word Error Rate, which is defined as the proportion of word errors to words processed.

ASR has matured to the point of commercial applications by providing transcription with an acceptable level of performance, which allows integration into many applications. In general, ASR systems are effective when the conditions are well controlled. Nevertheless, they are too dependent on the task being performed and the results are far from ideal, especially for Large Vocabulary Continuous Speech Recognition (LVCSR) applications. The latter is still one of the most challenging tasks in the field, due to a number of factors, including poor articulation, variable speaking rate and a high degree of acoustic variability caused by noise, side-speech, accents, sloppy pronunciation, hesitation, repetition, interruptions and channel mismatch and/or distortions. To deal with all these problems, a plethora of algorithms and technologies has been proposed by the scientific community over the last decade for all steps of LVCSR: pre-processing, feature extraction, acoustic modeling, language modeling, decoding and result post-processing. Nevertheless, LVCSR systems are not yet robust, with error rates of up to 50% under certain conditions [21],[8].

The persistent presence of ASR errors motivates the attempt to find alternative techniques to assist users in correcting the transcription errors or to totally automate the correction process.

[...] evaluation procedure. In other words, the reference and recognised words are matched in order to decide which words have been deleted or inserted, and which reference-recognised string pairs have been aligned to each other, which may result in a hit or a substitution.

This is normally done by using the Viterbi Edit Distance [17] to efficiently select the alignment of the reference and recognised word sequences for which the weighted error score is minimized. The Edit Distance usually
assigns identical weights (1 for the Levenshtein distance) to all three operations: insertion, substitution and deletion. Yet uniform weights can make the choice of the best alignment path ambiguous when several different paths have the same score.

To avoid this problem, Morris et al. [12] suggest using different weights, such that substitution is favoured over insertion and deletion. In general, it is recommended to set WI = WD and WS < WI + WD, where WI, WS and WD are respectively the weights of insertion, substitution and deletion.

2.3. ASR Evaluation Metrics

According to McCowan et al. [11], an ideal ASR evaluation metric should be: (i) Direct: it measures the ASR component independently of the ASR application; (ii) Objective: the measure should be calculated in an automated manner; (iii) Interpretable: the absolute value of the measure must give an idea of the performance; and (iv) Modular: the evaluation measure should be general enough to allow thorough application-dependent analysis.

Word Error Rate (WER) is the most popular metric for ASR evaluation. It measures the percentage
of incorrect words (Substitutions (S), Insertions (I), Deletions (D)) relative to the total number of words processed. It is defined as

$$\mathrm{WER} = \frac{S + D + I}{N_1} = \frac{S + D + I}{H + S + D} \tag{1}$$

where I = total number of insertions, D = total number of deletions, S = total number of substitutions, H = total number of hits, and $N_1$ = total number of input words.

Despite being the most commonly used metric, WER has many shortcomings [10]. First of all, WER is not a true percentage because it has no upper bound, so it does not tell you how good a system is, but only that one is better than another.
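To make the alignment step and the WER definition concrete, the following is a minimal sketch in Python, not the implementation used by the authors or by any particular scoring tool. The function names `align_counts` and `wer`, and the weight values in the second call, are assumptions chosen for illustration; the weights only need to satisfy WI = WD and WS < WI + WD. The sketch assumes a non-empty reference.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str],
                 w_sub: int = 1, w_ins: int = 1,
                 w_del: int = 1) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for one
    minimum-cost alignment of the reference and recognised words."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum weighted cost of aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + w_del   # only deletions
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + w_ins   # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if ref[i - 1] == hyp[j - 1] else w_sub
            cost[i][j] = min(cost[i - 1][j - 1] + step,   # hit or substitution
                             cost[i - 1][j] + w_del,      # deletion
                             cost[i][j - 1] + w_ins)      # insertion

    # Backtrace from the bottom-right cell, counting each operation.
    hits = subs = dels = inss = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            match = ref[i - 1] == hyp[j - 1]
            step = 0 if match else w_sub
            if cost[i][j] == cost[i - 1][j - 1] + step:
                if match:
                    hits += 1
                else:
                    subs += 1
                i -= 1
                j -= 1
                continue
        if j == 0 or (i > 0 and cost[i][j] == cost[i - 1][j] + w_del):
            dels += 1
            i -= 1
        else:
            inss += 1
            j -= 1
    return hits, subs, dels, inss


def wer(ref: List[str], hyp: List[str], **weights: int) -> float:
    """WER = (S + D + I) / (H + S + D); assumes a non-empty reference."""
    h, s, d, i = align_counts(ref, hyp, **weights)
    return (s + d + i) / (h + s + d)


reference = "the cat sat on the mat".split()
recognised = "the cat sit on mat".split()
print(align_counts(reference, recognised))                      # (4, 1, 1, 0)
print(align_counts(reference, recognised, w_sub=4, w_ins=3, w_del=3))

# A one-word reference with three spurious insertions: WER = 3 / 1.
print(wer(["yes"], ["oh", "yes", "please", "do"]))              # 3.0
```

The last call shows why WER has no upper bound: insertions add to the numerator but not to the denominator, so a hypothesis much longer than the reference pushes the ratio above 1.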
Moreover, WER is not D/I symmetric, so in noisy conditions WER can exceed 100%, because it gives far more weight to insertions than to deletions.

WER is still effective for speech recognition where errors can be corrected by typing, such as dictation. However, for almost any other type of speech recognition system, where the goal is more than transcription, it is necessary to look for an alternative, or additional, evaluation framework.

Many researchers have proposed alternative measures to address the evident limitations of WER. In [12], Morris et al. introduced two information-theoretic measures of word information communicated. The first one, named Relative Information Lost (RIL), is based on Mutual Information (I, or MI) [7], which measures the statistical dependence between the input words X and output words Y, and is calculated using the Shannon Entropy [...]