Second-Generation Sequencing: Experiments and Sequencing Principles

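Library Construction and Sequencing Principles of Second-Generation Sequencing
He Youyu (何有裕), yyhe@sibs.ac.cn, yyhe@biosino.com.cn
Shanghai Center for Bioinformation Technology; Shanghai Zhongxin Biotechnology Co., Ltd.; Suzhou Zhongxin Biotechnology Co., Ltd.

Contents
Introduction to sample processing and sequencing principles
Roche 454
Illumina Solexa
Raw data quality control

TruSeq RNA and DNA Sample Preparation

Cluster Generation Overview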

Cluster Generation: Template Hybridization
Approximately 1000-6000 molecules per cluster.
[Figure labels: OH, diol, 1st cycle denaturation]

Cluster Generation: Bridge PCR

Template preparation: bridge PCR
Adaptor ligation, surface attachment, bridge amplification, denaturation. (Trends in Genet 24:133 (2008))

Sequencing by Synthesis Overview
Cycle 1: add sequencing reagents, incorporate the first base, detect the signal, then cleave the terminator and dye.
Cycle 2-n: add sequencing reagents and repeat.

Cyclic reversible termination
All four labeled reversible terminators are added per cycle.
Remove unincorporated bases and detect the signal.
Remove the terminating group and the fluorescent dye. (Trends in Genet 24:133 (2008))
[Figure labels: terminating group, fluorophore cleavage] (Nat Rev Genet 11:31 (2010))

Base calling

Flowcell layout on GAII
A flow cell contains 8 lanes (Lane 1, Lane 2, ..., Lane 8).
Each lane contains 2 columns (Column 1, Column 2).
Each column contains 60 tiles.
Each tile is imaged 4 times per cycle.

Primary Data Analysis by Firecrest and Bustard in RTA/OLB
TIFF image file -> Firecrest -> intensity file -> Bustard -> sequence file

Cluster Generation: Sequencing Primer Hybridization (processing steps for single-read sequencing)
[Figure labels: OH, diol]

Sequence multiple samples in the same lanes
[Figure labels: DNA insert, Read 1, Index Read]

[Figure labels: Read 2, DNA insert, Index, Index SP, Rd2 SP, Rd1 SP]
Multiplexing: multiple samples in the same lanes

Advantages of paired-end sequencing

Mate-pair library construction and sequencing (Molecular Ecology Resources (2011))

Template preparation: emulsion PCR (Trends in Genet 24:133 (2008))

Pyrosequencing
A single dNTP type flows per cycle.
Released inorganic pyrophosphate (PPi) drives a series of reactions that produce visible light.
Remove unincorporated nucleotides. (Trends in Genet 24:133 (2008))

Base calling
Homopolymer error
[Figure labels: GV6330, 20]

Flexible multi-sample tagging technology

454 and Solexa sequencing modes

Semiconductor sequencing
Detect released H+ as a voltage change: fast.
Common microchip design standards: low-cost manufacturing.
Sequencing volume is increasing.

FASTA sequence format

A FASTQ file records each sequence in 4 lines:
Line 1 begins with the @ character, followed by the sequence identifier and description.
Line 2 is the sequence itself.
Line 3 begins with the + character; the rest may be empty or repeat line 1.
Line 4 encodes the quality values for the sequence in line 2 and must have the same length as line 2.

Example (as extracted, the sequence line is shorter than the quality line, so part of the sequence was probably lost):

    @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG
    CGACAATTTTTTTTGATATTAATAAAGATAGAACTTTCTTCCTATGAGTTTTCTCTC
    +
    CCCFFDFFHHHHGJJGHIIJGIIJJJJIIJJHJJJJJIJJIIIGIIIJGGIHJDIJIGAHEHFFGHGHE

Illumina sequence identifiers
Identifier format before Casava 1.8: @HWI-EAS364_0004:4:1:995:9044#0/1
Identifier format in Casava 1.8: @HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG

Sequence quality
Note: before pipeline version 1.3, Solexa quality was computed from the odds of an error rather than its probability: Q_solexa = -10 * log10(p / (1 - p)), where p is the estimated base-call error probability. The Phred quality is Q = -10 * log10(p).

Quality encodings by platform (ASCII landmarks on the chart: 33 '!', 59 ';', 64 '@', 73 'I', 104 'h'):
S - Sanger: Phred+33, raw reads typically (0, 40)
X - Solexa: Solexa+64, raw reads typically (-5, 40)
I - Illumina 1.3+: Phred+64, raw reads typically (0, 40)
J - Illumina 1.5+: Phred+64, raw reads typically (3, 40), with 0 = unused, 1 = unused, 2 = Read Segment Quality Control Indicator
L - Illumina 1.8+: Phred+33, raw reads typically (0, 41)

Q values and their corresponding ASCII codes
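The four-line FASTQ layout, the Casava 1.8 identifier, and the offset-based quality encodings described above can be tied together in a short sketch. This is a minimal illustration, not the pipeline used in the slides: the function names (`read_fastq`, `parse_casava18_header`, `decode_quality`, `solexa_to_phred`) are hypothetical, the header field names follow the published Casava 1.8 convention rather than anything stated in the deck, and the 4-base record is made up (only its header is taken from the example above).

```python
import io
import math
from typing import Dict, Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line 1 without the leading '@'
    sequence: str  # line 2
    quality: str   # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a strict 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def parse_casava18_header(header: str) -> Dict[str, str]:
    """Split a Casava 1.8 identifier (field names are an assumption)."""
    left, right = header.split(" ", 1)
    names_left = ["instrument", "run", "flowcell", "lane", "tile", "x", "y"]
    names_right = ["read", "filtered", "control", "index"]
    fields = dict(zip(names_left, left.split(":")))
    fields.update(zip(names_right, right.split(":")))
    return fields


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to scores (offset 33 or 64)."""
    return [ord(ch) - offset for ch in quality]


def solexa_to_phred(q_solexa: float) -> float:
    """Convert an odds-based Solexa score to a Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# Made-up 4-base record; only the header comes from the slide example.
EXAMPLE = (
    "@HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG\n"
    "ACGT\n"
    "+\n"
    "CCFJ\n"
)

record = next(read_fastq(io.StringIO(EXAMPLE)))
print(parse_casava18_header(record.header)["index"])
print(decode_quality(record.quality))
```

Choosing the wrong offset shifts every score by 31, so in practice the encoding is usually confirmed from the observed character range before decoding.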

454 raw data: images, SFF format, FASTA format (with qual)

>HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
ACGTGTTCTGAGCCATATTGCGGTACTGGAAGGTGCGCCTGCACTGTCTGAGCACTGGTCACTGCTCGATACCAATGAAGCCTTATTTGATGAGGCGCGCACCACGCAGGCGGCGACTATTATCTTCTCGTTTGATCCAGAATAACCAAATCGAAAACGCTGGCAAGGCACACAGGGGATA

>HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
40 40 40 40 40 40 40 39 37 38 36 34 24 23 19 19 19 24 20 19 18 18 26 26 18 18 19 18 20 20 20 25 25 26 19 20 20 22 22 22 25 28 26 24 22 22 22 25 24 28 28 28 29 29 28 30 30 30 26 26
26 27 27 27 31 31 30 28 28 28 30 30 30 30 26 21 21 20 20 26 27 28 24 25 20 20 20 20 19 19 19 27 28 28 30 30 31 30 28 28 30 31 31 32 32 31 31 30 30 30 31 27 24 24 22 20 20 20 22 26
26 22 22 23 16 16 16 19 22 16 13 13 13 16 22 23 23 23 26 26 24 24 26 13 13 11 11 12 12 19 22 18 18 11 11 13 13 18 24 24 24 24 26 26 26 27 29 29 31 33 32 31 31 27 27 27 29 29 28 26
22

Length distribution of 454 raw data (the same after quality control)

Some basic concepts in sequencing
Yield: the amount of data produced by the sequencer.
Reads: the sequenced fragments.
Read length and quality.
Coverage fold: the number of times a nucleotide is represented.
Depth: the average coverage fold.
Coverage rate: the ratio of the sequenced region to the whole genome.
Homopolymer: a run of identical bases, e.g. AAAAA.
(Key Lab of Systems Biology, SIBS, Chinese Academy of Sciences)

Typical deep-sequencing data processing workflow

Sequence quality assessment
FastQC: a quality control tool for high-throughput sequence data
Implemented in Java
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Function: [figure]

QC pipeline

Quality-control filtering of raw data
Sequence level: short sequences; adaptor/primer; polyA or polyT regions; overall low-complexity sequence (Dust); contamination or unwanted sequences; Ns (low-quality ends).
Quality level: low-quality bases or regions.
Goal: everything retained is high-quality data that actually enters the bioinformatics analysis.

Clean reads
Remove reads that contain adapter sequence.
When the N content of either read in a pair exceeds 10% of that read's length, remove the read pair.
When the number of low-quality bases (quality below 5) in either read of a pair exceeds 50% of that read's length, remove the read pair.

Criteria for unqualified bases in reads: count every N in the read, and count every base whose quality score is below 20. Reads are removed when unqualified bases make up more than 10% of the read length (i.e., 1…
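The clean-read criteria above (adapter present, more than 10% N, more than 50% of bases below quality 5, applied to both reads of a pair) can be sketched as a small filter. This is an illustration under assumptions, not the authors' tool: the function names are hypothetical, qualities are assumed to be Phred+33 encoded, and adapter detection is reduced to an exact substring match, which real trimmers do not rely on.

```python
from typing import Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def fraction_n(sequence: str) -> float:
    """Fraction of bases called as N; empty reads count as all-N."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def fraction_low_quality(quality: str, min_quality: int = 5,
                         offset: int = 33) -> float:
    """Fraction of bases whose decoded score is below min_quality."""
    if not quality:
        return 1.0
    low = sum(1 for ch in quality if ord(ch) - offset < min_quality)
    return low / len(quality)


def read_passes(read: Read, adapter: str,
                max_n_fraction: float = 0.10,
                min_quality: int = 5,
                max_low_fraction: float = 0.50,
                offset: int = 33) -> bool:
    """Apply the three clean-read criteria to a single read."""
    sequence, quality = read
    if adapter and adapter in sequence:  # simplification: exact match only
        return False
    if fraction_n(sequence) > max_n_fraction:
        return False
    if fraction_low_quality(quality, min_quality, offset) > max_low_fraction:
        return False
    return True


def pair_passes(read1: Read, read2: Read, adapter: str, **thresholds) -> bool:
    """A pair is kept only if both of its reads pass."""
    return (read_passes(read1, adapter, **thresholds)
            and read_passes(read2, adapter, **thresholds))


# Illustrative placeholder adapter and made-up reads.
ADAPTER = "AGATCGGAAGAGC"
good = ("ACGTACGTAC", "IIIIIIIIII")
too_many_n = ("ACNNACGTAC", "IIIIIIIIII")
print(pair_passes(good, good, ADAPTER))
print(pair_passes(good, too_many_n, ADAPTER))
```

The second rule set on the slide (N or quality below 20 counted together, 10% cutoff) would need a combined counter rather than these separate thresholds.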
