
Speech Recognition

In computer technology, speech recognition refers to the recognition of human speech by a computer for the performance of speaker-initiated, computer-generated functions (e.g., transcribing speech to text; data entry; operating electronic and mechanical devices; automated processing of telephone calls). It is a main element of so-called natural language processing through computer speech technology. Speech derives from sounds created by the human articulatory system, including the lungs, vocal cords, and tongue. Through exposure to variations in speech patterns during infancy and childhood, a person learns to recognize the same words and phrases even when different people pronounce them differently, for example with differences in pitch, tone, emphasis, or intonation; the cognitive ability of the brain makes this remarkable human capability possible. As of this writing (2008), speech recognition technology can reproduce that capability only to a limited degree, but it is already useful in many ways.

The Challenge of Speech Recognition

Writing systems are ancient, going back as far as the Sumerians of 6,000 years ago. The phonograph, which allowed the analog recording and playback of speech, dates to 1877. Speech recognition, however, had to await the development of the computer, owing to multifarious problems with the recognition of speech.

First, speech is not simply spoken text, in the same way that Miles Davis playing So What can hardly be captured by a note-for-note rendition as sheet music. What humans understand as discrete words, phrases, or sentences with clear boundaries is actually delivered as a continuous stream of sounds: Iwenttothestoreyesterday, rather than I went to the store yesterday. Words can also blend together, with Whaddayawa? representing What do you want?

Second, there is no one-to-one correlation between sounds and letters. In English there are slightly more than five vowel letters: a, e, i, o, u, and sometimes y and w. There are more than twenty different vowel sounds, though, and the exact count can vary with the accent of the speaker. The reverse problem also occurs, where more than one letter can represent a given sound: the letter c can have the same sound as the letter k, as in cake, or as the letter s, as in citrus.

In addition, people who speak the same language do not use the same sounds; that is, languages vary in their phonology, or patterns of sound organization. There are different accents: the word "water" could be pronounced watter, wadder, woader, wattah, and so on. Each person also has a distinctive pitch, with men typically having the lowest pitch and women and children a higher one (though there is wide variation and overlap within each group). Pronunciation is further colored by adjacent sounds, by the speaker's speed, and by the speaker's health; consider how pronunciation changes when a person has a cold.

Lastly, not all sounds consist of meaningful speech. Ordinary speech is filled with interjections that have no meaning in themselves but serve to break up discourse and convey subtle information about the speaker's feelings or intentions: oh, like, you know, well. There are also sounds that are part of speech but are not considered words: er, um, uh. Coughing, sneezing, laughing, sobbing, and even hiccupping can be part of what is spoken. And the environment adds its own noise, which makes speech recognition difficult even apart from all of the above.

[Figure: waveform of "I went to the store yesterday"]

[Figure: spectrogram of "I went to the store yesterday"]

History of Speech Recognition

Despite the manifold difficulties, speech recognition has been attempted for almost as long as there have been digital computers. As early as 1952, researchers at Bell Labs had developed an Automatic Digit Recognizer, nicknamed "Audrey". Audrey attained an accuracy of 97 to 99 percent, provided the speaker was male, paused 350 milliseconds between words, limited his vocabulary to the digits from one to nine plus "oh", and provided the machine could be adjusted to his speech profile. If the recognizer could not be adjusted, accuracy fell as low as 60 percent.

Audrey worked by recognizing phonemes, individual sounds considered distinct from each other. The phonemes were correlated with reference models of phonemes that were generated by training the recognizer. Over the next two decades, researchers spent large amounts of time and money trying to improve on this concept, with little success. Computer hardware improved by leaps and bounds, speech synthesis improved steadily, and Noam Chomsky's idea of generative grammar suggested that language could be analyzed programmatically. None of this, however, seemed to improve speech recognition.

The generative grammar work of Chomsky and Halle also led mainstream linguistics to abandon the concept of the phoneme, in favor of breaking the sound patterns of language down into smaller, more discrete features.

In 1969, John R. Pierce wrote a forthright letter to the Journal of the Acoustical Society of America, where much of the research on speech recognition was published. Pierce was one of the pioneers of satellite communications and an executive vice president at Bell Labs, which was a leader in speech recognition research. Pierce said everyone involved was wasting time and money:

It would be too simple to say that work in speech recognition is carried out simply because one can get money for it... The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon. One doesn't attract thoughtlessly given dollars by means of schemes for cutting the cost of soap by 10%. To sell suckers, one uses deceit and offers glamor.

Pierce's 1969 letter marked the end of official speech recognition research at Bell Labs for nearly a decade. The defense research agency ARPA, however, chose to persevere. In 1971 it sponsored a research initiative to develop a speech recognizer that could handle at least 1,000 words and understand connected speech, that is, speech without clear pauses between the words.

The recognizer was allowed to assume a low-background-noise environment, and it did not need to work in real time.

By 1976, three contractors had developed six systems. The most successful, developed by Carnegie Mellon University, was called Harpy. Harpy was slow: a four-second sentence took more than five minutes to process. It also still required speakers to "train" it by speaking sentences to build up a reference model. Nonetheless, it did recognize a thousand-word vocabulary, and it did support connected speech.

Research continued along several paths, but Harpy was the model for future success. It used hidden Markov models and statistical modeling to extract meaning from speech. In essence, speech was broken up into overlapping small chunks of sound, and probabilistic models inferred the most likely words or parts of words in each chunk; the same model was then applied again to the aggregate of the overlapping chunks. The procedure is computationally intensive, but it has proven to be the most successful approach.

Throughout the 1970s and 1980s research continued. By the 1980s most researchers were using hidden Markov models, which are behind all contemporary speech recognizers. In the latter part of the 1980s and in the 1990s, DARPA (the renamed ARPA) funded several initiatives. The first was similar to the earlier challenge, a one-thousand-word vocabulary, but this time a more rigorous performance standard was required; it produced systems that lowered the word error rate from roughly 10 percent. The remaining initiatives concentrated on improving algorithms and on computational efficiency.

In 2001, Microsoft released a speech recognition system that worked with Office XP. It neatly encapsulated how far the technology had come in fifty years, and what its limitations still were. The system had to be trained to a specific user's voice, using the provided works of great authors, such as Edgar Allan Poe's Fall of the House of Usher and Bill Gates' The Way Forward. Even after training, the system was fragile enough that a warning was provided: "If you change the room in which you use Microsoft Speech Recognition, accuracy will drop, so please restart the microphone." On the other hand, the system did work in real time, and it did recognize connected speech.

Speech Recognition Today

Technology

Current speech recognition technologies work by mathematically analyzing, through resonance and spectrum analysis, the sound waves formed by our voices. A computer system first records the sound waves spoken into a microphone through an analog-to-digital converter. The analog, or continuous, sound wave that we produce when we say a word is sliced into small time fragments, and these fragments are then measured by their amplitude level, the amplitude being the level of compression of the air released from the speaker's mouth. To measure the amplitude levels and convert the sound wave into digital format, current speech recognition research generally relies on the Nyquist-Shannon theorem.

Nyquist-Shannon Theorem

The Nyquist-Shannon theorem, developed in 1928, shows that a given analog frequency is most accurately recreated by sampling at a digital frequency that is twice the original analog frequency. Nyquist proved that this is so because an audible frequency must be sampled once for compression and once for rarefaction. For example, a 20 kHz audio signal can be accurately represented as a digital sample at 44.1 kHz.
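A minimal sketch of this effect, assuming NumPy is available (the 6 kHz tone and the 8 kHz rate are invented for the example and are not from the article): a tone sampled at more than twice its frequency is recovered intact, while one sampled below that rate aliases to a different frequency.

```python
import numpy as np

def dominant_frequency(tone_hz, sample_rate_hz, duration_s=1.0):
    """Digitize a pure tone at the given sample rate and return the
    strongest frequency present in the resulting digital signal."""
    t = np.arange(0, duration_s, 1.0 / sample_rate_hz)  # sampling instants
    samples = np.sin(2 * np.pi * tone_hz * t)           # the sampled sine wave
    spectrum = np.abs(np.fft.rfft(samples))             # magnitude spectrum
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]

# Sampled at 44.1 kHz, well above twice 6 kHz, the tone survives intact.
print(dominant_frequency(6000, 44100))  # ~6000 Hz
# Sampled at only 8 kHz, the same tone aliases down to about 2000 Hz.
print(dominant_frequency(6000, 8000))   # ~2000 Hz
```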

How It Works

Speech recognition programs commonly use statistical models to account for variations in dialect, accent, background noise, and pronunciation. These models have progressed to such an extent that, in a quiet environment, accuracy of over 90% can be achieved. While every company has its own proprietary technology for processing spoken input, there are four common themes in how speech is recognized.

1. Template-based: This model uses a database of speech patterns built into the program. After voice input is received, the system recognizes it by matching it against the database, using dynamic programming algorithms. The downfall of this type of speech recognition is that the model is not flexible enough to understand voice patterns unlike those in the database.

2. Knowledge-based: Knowledge-based speech recognition analyzes spectrograms of the speech to gather data and create rules that return values corresponding to the commands or words the user said. It does not make use of linguistic or phonetic knowledge about speech.

3. Stochastic: Stochastic speech recognition is the most common today. Stochastic methods of voice analysis use probability models to capture the uncertainty of the spoken input. The most popular probability model is the HMM (hidden Markov model); its decision rule is shown below (a toy numeric sketch follows this list):

W* = argmax_W p(W) p(Yt | W)

where Yt is the observed acoustic data, p(W) is the a priori probability of a particular word string, p(Yt | W) is the probability of the observed acoustic data given the acoustic models, and W is the hypothesized word string. In analyzing spoken input, the HMM has proven successful because the algorithm takes into account a language model, an acoustic model of how humans speak, and a lexicon of known words.

4. Connectionist: In connectionist speech recognition, knowledge about the spoken input is gained by analyzing the input and storing it in a variety of ways, from simple multi-layer perceptrons to time-delay neural nets to recurrent neural nets.
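To make the decision rule in theme 3 concrete, here is a toy sketch (an illustration only; the word strings and probability values are invented, and a real recognizer would compute p(Yt|W) from hidden Markov models over overlapping slices of sound). It scores a few hypothesized word strings by combining the language-model prior p(W) with the acoustic likelihood p(Yt|W) and returns the argmax.

```python
import math

# Hypothetical language-model priors p(W) for candidate word strings.
prior = {
    "recognize speech": 0.60,
    "wreck a nice beach": 0.05,
    "recognize peach": 0.35,
}

# Hypothetical acoustic likelihoods p(Yt | W): how well each hypothesis
# explains the observed acoustic data Yt.
likelihood = {
    "recognize speech": 0.30,
    "wreck a nice beach": 0.90,
    "recognize peach": 0.10,
}

def best_hypothesis(prior, likelihood):
    """Return W* = argmax over W of p(W) * p(Yt|W), computed in log space."""
    return max(prior, key=lambda w: math.log(prior[w]) + math.log(likelihood[w]))

# The strong prior outweighs the better acoustic fit of "wreck a nice beach".
print(best_hypothesis(prior, likelihood))  # recognize speech (0.60 * 0.30 = 0.18)
```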

As stated above, programs that use stochastic models to analyze spoken language are the most common today and have proven to be the most successful.

Recognizing Commands

The most important goal of current speech recognition software is to recognize commands, which increases the functionality of speech software. Software such as Microsoft Sync, built into many new vehicles, supposedly allows users access to all of the car's electronic accessories, hands-free. The software is adaptive: it asks the user a series of questions and uses the pronunciation of commonly used words to derive speech constants. These constants then become one factor in the speech recognition algorithms, allowing better recognition later on. Technology critics judge that the technology has come a long way since the 1990s but will not replace manual controls any time soon.

Dictation

Second to command recognition is dictation. Today's market sees value in dictation software, as discussed below, for transcribing medical records and student papers, and as a more productive way to get one's thoughts down in writing. Many companies also see value in dictation for translation, whereby users could have their words translated for written letters, or translated so that they could be spoken back to another party in that party's native language. Software of this kind is already in production in today's market.

Errors in Interpreting the Spoken Word

As speech recognition programs process your spoken words, their success rate rests on their ability to minimize errors. The scale on which they can do this is called the Single Word Error Rate (SWER) and the Command Success Rate (CSR). A single word error is, simply put, the misrecognition of one word in a spoken sentence. While SWERs occur in command recognition systems, they are most common in dictation software. Command success rate is determined by accurate interpretation of the command: a command statement may not be interpreted with complete accuracy, yet the recognition system can use mathematical models to infer the command the user intended.
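A short sketch of word-level error counting (illustrative; the article gives no formula, so this uses the standard word-level edit distance, which counts substitutions, insertions, and deletions against a reference transcript):

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Count word-level errors (substitutions, insertions, deletions)
    between a reference sentence and a recognizer's output."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)]

# One misheard word in a five-word sentence: a single word error.
print(word_errors("i went to the store", "i went to the shore"))  # 1
```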

25、;</p><p><b>  主要的語(yǔ)音技術(shù)公司</b></p><p>  隨著語(yǔ)音技術(shù)產(chǎn)業(yè)的發(fā)展,更多的公司帶著他們新的產(chǎn)品和理念進(jìn)入這一領(lǐng)域。下面是一些語(yǔ)音識(shí)別技術(shù)領(lǐng)域領(lǐng)軍公司名單(并非全部)NICE Systems(NASDAQ:NICE and Tel Aviv:Nice),該公司成立于1986年,總部設(shè)在以色列,它專長(zhǎng)于數(shù)字記錄和歸檔技術(shù)。他們?cè)?007

26、年收入5.23億美元。欲了解更多信息,請(qǐng)?jiān)L問(wèn)http://www.nice.com</p><p>  Verint系統(tǒng)公司(OTC:VRNT),總部設(shè)在紐約的梅爾維爾,創(chuàng)立于1994年把自己定位為“勞動(dòng)力優(yōu)化智能解決方案,IP視頻,通訊截取和公共安全設(shè)備的領(lǐng)先供應(yīng)商。詳細(xì)信息,請(qǐng)?jiān)L問(wèn)http://verint.com</p><p>  Nuance公司(納斯達(dá)克股票代碼:NUAN)總部

27、設(shè)在伯靈頓,開發(fā)商業(yè)和客戶服務(wù)使用語(yǔ)音和圖像技術(shù)。欲了解更多信息,請(qǐng)?jiān)L問(wèn)http://www.nuance.com</p><p>  Vlingo,總部設(shè)在劍橋,開發(fā)與無(wú)線/移動(dòng)技術(shù)對(duì)接的語(yǔ)音識(shí)別技術(shù)。 Vlingo最近與雅虎聯(lián)手合作,為雅虎的移動(dòng)搜索服務(wù)—一鍵通功能提供語(yǔ)音識(shí)別技術(shù)。欲了解更多信息,請(qǐng)?jiān)L問(wèn)http://vlingo.com</p><p>  在語(yǔ)音技術(shù)領(lǐng)域的其他主要公

Other major companies involved in speech technologies include Unisys, ChaCha, SpeechCycle, Sensory, Microsoft's Tellme, Klausner Technologies, and many more.

Patent Infringement Lawsuits

Given the highly competitive nature of both the business and the technology, it is not surprising that there have been numerous patent infringement lawsuits among speech companies. Each element involved in developing a speech recognition device can be claimed as a separate technology and patented as such. Using a technology that has been patented by another company or individual, even if it was developed independently, exposes a company to monetary damages and often results in injunctions that, fairly or not, bar future use of the technology. The politics and business of the speech industry are tightly bound up with the development of the technology itself, so the political and legal obstacles that could impede the industry's further growth must be recognized. Many such lawsuits are currently on file, and many have been brought to court.

The Future of Speech Recognition

Future Trends & Applications

The Medical Industry

For years the medical industry has been touting electronic medical records (EMR). Unfortunately, the industry has been slow to adopt EMRs, and some companies are betting that the reason is data entry: there are not enough people to enter the multitude of patient data into electronic format, and so the paper record prevails. A company called Nuance (also featured elsewhere here, and the developer of the software called Dragon Dictate) is betting that it can find a market selling its speech recognition software to doctors who would rather dictate patient information than enter it by hand.

31、p><p><b>  軍事</b></p><p>  國(guó)防工業(yè)研究語(yǔ)音識(shí)別軟件試圖將其應(yīng)用復(fù)雜化而非更有效率和親切。為了使駕駛員更快速、方便地進(jìn)入需要的數(shù)據(jù)庫(kù),語(yǔ)音識(shí)別技術(shù)是目前正在飛機(jī)駕駛員座位下面的顯示器上進(jìn)行試驗(yàn)。</p><p>  軍方指揮中心同樣正在嘗試?yán)谜Z(yǔ)音識(shí)別技術(shù)在危急關(guān)頭用快速和簡(jiǎn)易的方式進(jìn)入他們掌握的大量資料庫(kù)。另外,軍方

32、也為了照顧病員涉足EMR。軍方宣布,正在努力利用語(yǔ)音識(shí)別軟件把數(shù)據(jù)轉(zhuǎn)換成為病人的記錄。</p><p>  摘自:http://en.citizendium.org/wiki/Speech_Recognition</p><p><b>  附:英文原文</b></p><p>  Speech Recognition</p>&

