Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 128 (2018) 32–37

International Conference on Natural Language and Speech Processing, ICNLSP 2015

Automatic Speech Recognition Errors Detection and Correction: A Review

Rahhal Errattahi a,*, Asmaa El Hannani a, Hassan Ouahmane a

a Laboratory of Information Technologies, National School of Applied Sciences, University of Chouaib Doukkali, El Jadida, Morocco

Abstract

Even though Automatic Speech Recognition (ASR) has matured to the point of commercial applications, the high error rate in some speech recognition domains remains one of the main factors impeding the wide adoption of speech technology, especially for continuous large vocabulary speech recognition applications. The persistent presence of ASR errors has intensified the need for alternative techniques to automatically detect and correct such errors. Correcting transcription errors is crucial not only to improve speech recognition accuracy, but also to avoid propagating the errors to subsequent language processing modules such as machine translation. In this paper, the basic principles of ASR evaluation are first summarized, and then the state of current ASR error detection and correction research is reviewed. We focus on emerging techniques using the word error rate metric.

Keywords: Automatic Speech Recognition; ASR Error Detection; ASR Error Correction; ASR Evaluation

* Corresponding author. Tel.: +212-523-344-822; fax: +212-523-394-915. E-mail address: errattahi.r@ucd.ac.ma

1877-0509 © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and peer-review under responsibility of the scientific committee of the International Conference on Natural Language and Speech Processing. DOI: 10.1016/j.procs.2018.03.005

1. Introduction

Automatic Speech Recognition (ASR) systems aim at converting a speech signal into a sequence of words, either for text-based communication purposes or for device control. The purpose of evaluating ASR systems is to simulate human judgement of the performance of the systems, in order to measure their usefulness and assess the remaining difficulties, especially when comparing systems. The standard metric of ASR evaluation is the Word Error Rate, which is defined as the proportion of word errors to words processed. ASR has matured to the point of commercial applications by providing transcription with an acceptable level of performance, which allows integration into many applications. In general, ASR systems are effective when the conditions are well controlled. Nevertheless, they are too dependent on the task being performed and the results are far from ideal, especially for Large Vocabulary Continuous Speech Recognition (LVCSR) applications. The latter is still one of the most challenging tasks in the field, due to a number of factors, including poor articulation, variable
A key practical issue in the calculation of ASR evaluation metrics is finding the word alignment between the reference and the automatic transcription, which constitutes the first step in the evaluation procedure. In other words, the reference and recognised words are matched in order to decide which words have been deleted or inserted, and which reference-recognised word pairs have been aligned to each other, which may result in a hit or a substitution. This is normally done by using the Viterbi Edit Distance [17] to efficiently select the alignment of the reference and recognised word sequences for which the weighted error score is minimized. The Edit Distance usually assigns identical weights (1 for the Levenshtein distance) to all three operations: insertion, substitution and deletion. Yet, uniform weights may leave a doubt about which alignment path to choose when several different paths have the same score. To avoid this problem, Morris et al. [12] suggest using different weights, such that a substitution is favoured over an insertion plus a deletion. In general, it is recommended to set WI = WD and WS < WI + WD, where WI, WS and WD are respectively the weights of insertion, substitution, and deletion.
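To make the alignment step concrete, the following is a minimal sketch of a weighted edit-distance alignment between a reference and a recognised word sequence. It is our own illustration, not code from the paper: the weights w_ins = w_del = 3 and w_sub = 4 are illustrative values chosen to satisfy the recommendation above (WI = WD and WS < WI + WD), and the function name align is ours.

# Minimal weighted edit-distance alignment between a reference and a
# recognised (hypothesis) word sequence.
def align(ref, hyp, w_ins=3, w_del=3, w_sub=4):
    """Return (hits, substitutions, deletions, insertions) for the
    minimum-cost alignment of two word sequences."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * w_del
    for j in range(1, m + 1):
        cost[0][j] = j * w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else w_sub)
            cost[i][j] = min(diag,
                             cost[i - 1][j] + w_del,   # reference word deleted
                             cost[i][j - 1] + w_ins)   # hypothesis word inserted
    # Backtrace an optimal path and count hits (H), substitutions (S),
    # deletions (D) and insertions (I).
    h = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else w_sub):
            if ref[i - 1] == hyp[j - 1]:
                h += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + w_del:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return h, s, d, ins

# Example: "on" is deleted and "quickly" is inserted.
ref = "the cat sat on the mat".split()
hyp = "the cat sat the mat quickly".split()
print(align(ref, hyp))   # (5, 0, 1, 1): 5 hits, 0 substitutions, 1 deletion, 1 insertion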
2.3. ASR Evaluation Metrics

According to McCowan et al. [11], an ideal ASR evaluation metric should be: (i) Direct: it measures the ASR component independently of the ASR application; (ii) Objective: the measure can be calculated in an automated manner; (iii) Interpretable: the absolute value of the measure gives an idea of the performance; and (iv) Modular: the evaluation measure is general enough to allow thorough application-dependent analysis.
Word Error Rate (WER) is the most popular metric for ASR evaluation. It measures the percentage of incorrect words (substitutions (S), insertions (I) and deletions (D)) with respect to the total number of words processed. It is defined as

WER = \frac{S + D + I}{N_1} = \frac{S + D + I}{H + S + D}    (1)

where I is the total number of insertions, D the total number of deletions, S the total number of substitutions, H the total number of hits, and N_1 the total number of input (reference) words.
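As a worked illustration of Eq. (1), the short sketch below (our own, not from the paper) computes WER from the counts H, S, D and I produced by an alignment such as the one above; the function name word_error_rate is illustrative.

# WER = (S + D + I) / N1, with N1 = H + S + D the number of reference words.
def word_error_rate(h, s, d, i):
    n1 = h + s + d
    if n1 == 0:
        raise ValueError("empty reference")
    return (s + d + i) / n1

# With the counts from the alignment example above (H=5, S=0, D=1, I=1):
# word_error_rate(5, 0, 1, 1) = (0 + 1 + 1) / (5 + 0 + 1) ≈ 0.33, i.e. about 33% WER.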
Despite being the most commonly used metric, WER has several shortcomings [10]. First of all, WER is not a true percentage because it has no upper bound, so it does not tell you how good a system is, but only that one system is better than another. Moreover, WER is not symmetric with respect to deletions and insertions: it gives far more weight to insertions than to deletions, so in noisy conditions WER can exceed 100%.
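As a constructed illustration (not an example from the paper): if a two-word reference is recognised as five entirely wrong words (H = 0, S = 2, D = 0, I = 3), then WER = (2 + 0 + 3)/2 = 250%, even though the hypothesis cannot be "worse than completely wrong" in any intuitive sense.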
WER is still effective for speech recognition tasks where errors can be corrected by typing, such as dictation. However, for almost any other type of speech recognition system, where the goal is more than transcription, it is necessary to look for an alternative, or additional, evaluation framework.
Many researchers have proposed alternative measures to overcome the evident limitations of WER. In [12], Morris et al. introduced two information-theoretic measures of the word information communicated. The first one, named Relative Information Lost (RIL), is based on Mutual Information (MI) [7], which measures the statistical dependence between the input words X and the output words Y, and is calculated using the Shannon entropy H as follows:

RIL = \frac{H(Y|X)}{H(Y)}    (2)

with

H(Y) = -\sum_{i=1}^{n} P(y_i) \log P(y_i)    (3)

and

H(Y|X) = -\sum_{i,j} P(x_i, y_j) \log P(y_j \mid x_i)    (4)

Nevertheless, the RIL is still too far from an ad
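To make the entropy-based definition of Eqs. (2)–(4) concrete, the sketch below (our own illustration, not code from the paper) estimates H(Y), H(Y|X) and RIL from joint counts of aligned reference words X and recognised words Y. The function name ril, the base-2 logarithm, and the estimation of P(x, y) from aligned word pairs are all assumptions of this sketch.

import math
from collections import Counter

def ril(pairs):
    """Relative Information Lost, RIL = H(Y|X) / H(Y), estimated from a
    list of aligned (reference_word, recognised_word) pairs."""
    joint = Counter(pairs)                       # counts of (x, y) pairs
    n = sum(joint.values())
    p_xy = {xy: c / n for xy, c in joint.items()}
    p_x, p_y = Counter(), Counter()
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    # H(Y) = -sum_i P(y_i) log P(y_i)                     -- Eq. (3)
    h_y = -sum(p * math.log2(p) for p in p_y.values())
    # H(Y|X) = -sum_{i,j} P(x_i, y_j) log P(y_j | x_i)    -- conditional entropy used in Eq. (2)
    h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())
    return h_y_given_x / h_y if h_y > 0 else 0.0

# A perfect recogniser loses no information: RIL = 0.
print(ril([("cat", "cat"), ("dog", "dog"), ("cat", "cat"), ("dog", "dog")]))   # 0.0
# Confusing "dog" with "fog" half of the time pushes RIL towards 1.
print(ril([("cat", "cat"), ("dog", "dog"), ("dog", "fog"), ("cat", "cat")]))   # ~0.33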