Basic Principles and Analytical Methods of Human Population Genetics
CAS-MPG Partner Institute for Computational Biology
Graduate course, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences: Human Population Genetics
Xu Shuhua, Jin Li

Lecture 3: Phylogenetic Tree Construction Methods and Their Applications

Outline
- Concepts and terminology of phylogenetic trees
- Types of phylogenetic trees
- Common methods for constructing phylogenetic trees
- Methods for testing phylogenetic trees
- Applications of phylogenetic trees
- Which method is most appropriate in which situation?
- Common software for constructing phylogenetic trees
- Exercises

Concepts and terminology of phylogenetic trees

The purpose of a phylogenetic tree is to illustrate how a group of objects (usually genes or organisms) are related to one another.

Phylogeny (phylo = tribe + genesis)
[Figure: phylogeny of orangutan, gorilla, chimpanzee and human. From the Tree of Life website, University of Arizona.]

Phylogenetic trees are about visualising evolutionary relationships: they diagram the evolutionary relationships between the taxa.

The phylogeny can be written as nested parentheses: ((A,(B,C)),(D,E)). This says that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.

Clades
Evolutionary trees depict clades. A clade is a group of organisms that includes an ancestor and all descendants of that ancestor. You can think of a clade as a branch on the tree of life.

Terminology (source: Li, Molecular Evolution)
- External nodes: the things under comparison; operational taxonomic units (OTUs).
- Internal nodes: ancestral units; hypothetical; the goal is to group current-day units.
- Root: the common ancestor of all OTUs under study. The path from the root to a node defines an evolutionary path.
- Unrooted: specifies relationships but not the evolutionary path. If there is an outgroup (an external reason to believe that a certain OTU branched off first), the tree can be rooted.
- Topology: the branching pattern of a tree.
- Branch length: the amount of difference that occurred along a branch.

[Figure: common phylogenetic tree terminology. Labels: ancestral node or root of the tree; internal nodes or divergence points (represent hypothetical ancestors of the taxa); branches or lineages; terminal nodes A, B, C, D, E (represent the taxa, i.e. genes, populations, species, etc., used to infer the phylogeny).]

Terminology: homologue, orthologue, paralogue
Homologs are commonly defined as orthologs, paralogs, or xenologs.
- Orthologs are homologs resulting from speciation. They are genes that stem from a common ancestor. Orthologs often have similar functions. Example: SPO11 (Baudat et al., Mol Cell 2000).
- Paralogs are homologs resulting from gene duplication. They are genes derived from a common ancestral locus that was duplicated within the genome of an organism. Paralogs tend to have different functions. Example: CLB1/CLB2 (Brachat et al., Genome Biology 2003).
- Xenologs are homologs resulting from the horizontal transfer of a gene between two organisms.
  The function of xenologs can be variable. Example: VDE (Okuda et al., Yeast 2003).

Character-based methods can tease apart types of similarity and theoretically find the true evolutionary tree. Similarity reflects relationship only if certain conditions are met (if the distances are 'ultrametric').

Types of Similarity
Observed similarity between two entities can be due to:
- Evolutionary relationship:
  - Shared ancestral characters ('plesiomorphies')
  - Shared derived characters ('synapomorphies')
- Homoplasy (independent evolution of the same character):
  - Convergent events (in either related or unrelated entities)
  - Parallel events (in related entities)
  - Reversals (in related entities)
[Figure: nucleotide character states (C, G, T) on example trees.]

Homology and Homoplasy
- Homology: identity due to shared ancestry (evolutionary signal).
- Homoplasy: identity despite separate ancestry (evolutionary noise).
[Figure: morphological example with the characters "no hair" and "no wings".]

[Figure: orthologs and paralogs in a gene tree. Source: Erik L.L. Sonnhammer, "Orthology, paralogy and proposed classification for paralog subtypes", TRENDS in Genetics Vol. 18 No. 12, December 2002, http://tig.trends.com]
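Every pair of parentheses in the nested notation ((A,(B,C)),(D,E)) closes exactly one clade, so the clades of a tree can be read off mechanically. The following is a minimal Python sketch. It assumes a bare Newick-style string that contains only taxon names, commas and parentheses, with no branch lengths, labels or trailing semicolon. `parse_newick` and `clades` are hypothetical helper names, not a library API.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name, or a tuple of child subtrees


def parse_newick(text: str) -> Tree:
    """Parse a minimal Newick-style string such as '((A,(B,C)),(D,E))'.

    Assumption: only names, commas and parentheses (no branch lengths,
    internal labels, whitespace or terminating ';').
    """
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip '('
            children = [parse()]
            while text[pos] == ",":
                pos += 1  # skip ','
                children.append(parse())
            pos += 1  # skip ')'
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]  # a taxon name

    return parse()


def clades(tree: Tree) -> List[frozenset]:
    """Return the leaf set of every clade (internal node), children first."""
    result: List[frozenset] = []

    def walk(node: Tree) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        result.append(leaves)
        return leaves

    walk(tree)
    return result


tree = parse_newick("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(sorted(clade))
```

For ((A,(B,C)),(D,E)) it prints ['B', 'C'], ['A', 'B', 'C'], ['D', 'E'] and ['A', 'B', 'C', 'D', 'E']. These are the clades named in the reading of that tree: B with C, then A with B and C, then D with E.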
The Molecular Clock
For a given protein, the rate of sequence evolution is approximately constant across lineages (Zuckerkandl and Pauling, 1965). This would allow speciation and duplication events to be dated accurately based on molecular data. Local and approximate molecular clocks are more reasonable.

Relative Rate Test
Tests whether sets of sequences are evolving at equal rates (local molecular clock hypothesis), e.g. RRTree (Robinson-Rechavi), http://pbil.univ-lyon1.fr/software/rrtree.html

Types of phylogenetic trees

Trees
A tree is a diagram consisting of branches and nodes.
- Species tree (how are my species related?): contains only one representative from each species; all nodes indicate speciation events.
- Gene tree (how are my genes related?): normally contains a number of genes from a single species; nodes relate either to speciation or to gene duplication events.

Gene tree, species tree
We often assume that gene trees give us species trees.
[Figure: a gene tree (genes a, b, c) and a species tree (species A, B, D).]

Gene tree vs. species tree
The two events, mutation and speciation, are not expected to occur at the same time. A gene tree therefore does not necessarily represent the species tree.

Three types of trees: cladogram, phylogram, ultrametric tree
[Figure: Taxon A to Taxon D drawn three ways. Panel labels include branch lengths 1, 1, 1, 6, 3, 5 ("genetic change") and "no meaning".]
All three show the same evolutionary relationships, or branching orders, between the taxa.

Tree Properties
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.

Cladograms vs Phylograms
[Figure: Bacteria 1 to 3 and Eukaryotes 1 to 4 drawn as a cladogram and as a phylogram.]
- Cladograms show branching order; branch lengths are meaningless.
- Phylograms show branch order and branch lengths.

Phenetics
Phenetics, when first introduced (Michener and Sokal, 1957), challenged the prevailing view that classifications should be based on comparisons between a limited number of characters that taxonomists believed to be important for one reason or another. Pheneticists argued that classifications should encompass as many variable characters as possible, these characters being scored numerically and analyzed by rigorous mathematical methods.

Cladistics
Cladistics (Hennig, 1966) also emphasizes the need for large datasets, but it differs from phenetics in that it does not give equal weight to all characters. The argument is that, in order to infer the branching order in a phylogeny, it is necessary to distinguish the characters that provide a good indication of evolutionary relationships from other characters that might be misleading. This might appear to take us back to the pre-phenetic approach, but cladistics is much less subjective. Rather than making assumptions about which characters are 'important', cladistics demands that the evolutionary relevance of individual characters be defined. In particular, errors in the branching pattern within a phylogeny are minimized by recognizing two types of anomalous data.

Why Cladistics? Convergent evolution and derived character states
[Figure: examples of convergent evolution and of a derived character state.]

Phenetics versus Cladistics
Phenetics is the study of relationships among a group of organisms on the basis of the degree of similarity between them, be that similarity molecular, phenotypic, or anatomical. A tree-like network expressing phenetic relationships is called a phenogram.

Cladistics can be defined as the study of the pathways of evolution. In other words, cladists are interested in questions such as:
- how many branches there are among a group of organisms;
- which branch connects to which other branch;
- what the branching sequence is.
A tree-like network that expresses such ancestor-descendant relationships is called a cladogram. Thus, a cladogram refers to the topology of a rooted phylogenetic tree.

While a phenogram may serve as an indicator of cladistic relationships, it is not necessarily identical to the cladogram. If there is a linear relationship between the time of divergence and the degree of genetic (or morphological) divergence, the two types of trees may become identical to each other.

Cladistics and Phenetics
- Cladistics: trees are drawn based on the conserved characters.
- Phenetics: trees are based on some measure of distance between the leaves.
Molecular phylogenies are inferred from molecular (usually sequence) data, either cladistically (e.g. gene order) or phenetically.

The maximum parsimony method is a typical representative of the cladistic approach, whereas the UPGMA method is a typical phenetic method. The other methods, however, cannot be classified easily according to these criteria.

Unrooted vs Rooted tree
[Figure: an unrooted tree of archaea and eukaryotes, and the same tree rooted by a bacterial outgroup, in which the archaea and the eukaryotes each form a monophyletic group.]

Rooting the Tree
In an unrooted tree the direction of evolution is unknown. The root is the hypothesized ancestor of the sequences in the tree. The root can be placed either on a branch or at a node.

Inferring evolutionary relationships between the taxa requires rooting the tree. To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: unrooted tree of taxa A, B, C, D and the rooted tree obtained from the first root position.]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Now, try it again with the root at another position:
[Figure: the same unrooted tree with the root on a different branch, and the resulting rooted tree.]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
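The string exercise moves the root onto different branches of one and the same unrooted tree. An unrooted bifurcating tree with n taxa has 2n - 3 branches, so a four-taxon tree has five possible root positions.

The Python sketch below makes two assumptions for illustration. First, the unrooted tree pairs A with B at a hypothetical internal node x, and C with D at a hypothetical node y. Second, the tree is stored as an adjacency list. The sketch prints the rooted tree obtained by placing the root on each branch in turn.

```python
from typing import Dict, List, Tuple

# Assumed unrooted tree: A and B meet at hypothetical node x, C and D at y.
UNROOTED: Dict[str, List[str]] = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"], "y": ["C", "D", "x"],
}


def branches(adj: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Every undirected branch exactly once, as a sorted pair of node names."""
    return sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})


def subtree(adj: Dict[str, List[str]], node: str, parent: str) -> str:
    """Nested-parentheses string for `node`, hanging away from `parent`."""
    children = [c for c in adj[node] if c != parent]
    if not children:
        return node  # a leaf (taxon)
    return "(" + ",".join(subtree(adj, c, node) for c in children) + ")"


def rootings(adj: Dict[str, List[str]]) -> List[str]:
    """Place the root on each branch in turn and return the rooted trees."""
    return ["(" + subtree(adj, u, v) + "," + subtree(adj, v, u) + ")"
            for u, v in branches(adj)]


for rooted in rootings(UNROOTED):
    print(rooted)
```

Rooting on the internal branch x-y gives ((A,B),(C,D)). There, A is closest to B, as in the second string example. Rooting on A's own branch gives (A,(B,(C,D))). There, A is equally related to B, C and D, as in the first.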
An unrooted four-taxon tree can theoretically be rooted in five different places to produce five different rooted trees.
[Figure: unrooted tree 1 of taxa A, B, C, D, and rooted tree 1a, one of its rootings.]
These five rooted trees show five different evolutionary relationships among the taxa. By contrast, redrawing a rooted tree with its branches rotated (e.g. with the taxa listed as B, D, A, C) changes nothing: all such rearrangements show the same evolutionary relationships between the taxa.

There are two major ways to root trees:
- By outgroup: uses taxa (the "outgroup") that are known to fall outside the group of interest (the "ingroup"). This requires some prior knowledge about the relationships among the taxa. The outgroup can be either species (e.g. birds to root a mammalian tree) or paralogs from an earlier gene duplication (e.g. α-globins to root β-globins).
- By midpoint or distance: roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths.
[Figure: midpoint-rooting example, a tree of taxa A, B, C, D with branch lengths 10, 2, 3, 5 and 2.]
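Midpoint rooting can be computed directly from branch lengths. First find the pair of taxa with the longest path between them, then walk half that distance from one end.

The Python sketch below rests on an assumed reading of the figure's branch lengths:
- A (10) and B (2) join at internal node x;
- C (2) and D (5) join at internal node y;
- the internal branch x-y has length 3.

The node names x and y and this exact topology are assumptions for illustration only.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Assumed reading of the figure: unrooted tree ((A:10,B:2):3,(C:2,D:5)).
EDGES: Dict[Tuple[str, str], float] = {
    ("A", "x"): 10, ("B", "x"): 2, ("x", "y"): 3, ("C", "y"): 2, ("D", "y"): 5,
}
LEAVES = ["A", "B", "C", "D"]


def neighbours() -> Dict[str, List[Tuple[str, float]]]:
    """Adjacency list with branch lengths, built from EDGES."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), length in EDGES.items():
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    return adj


def path(adj, start: str, goal: str) -> List[Tuple[str, float]]:
    """Nodes on the unique tree path start -> goal, with cumulative distance."""
    stack: List[Tuple[str, Optional[str], List[Tuple[str, float]]]] = [
        (start, None, [(start, 0.0)])]
    while stack:
        node, parent, trail = stack.pop()
        if node == goal:
            return trail
        for nxt, length in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, trail + [(nxt, trail[-1][1] + length)]))
    raise ValueError("nodes are not connected")


def midpoint_root(adj):
    """Return (leaf1, leaf2, distance, branch holding the root, offset on it)."""
    pairs = [(path(adj, a, b), a, b) for a, b in itertools.combinations(LEAVES, 2)]
    trail, a, b = max(pairs, key=lambda p: p[0][-1][1])  # most distant pair
    total = trail[-1][1]
    half = total / 2
    for (u, du), (v, dv) in zip(trail, trail[1:]):
        if du <= half <= dv:
            return a, b, total, (u, v), half - du
    raise AssertionError("midpoint not found")


print(midpoint_root(neighbours()))
```

With these assumed lengths the most distant pair is A and D, at 10 + 3 + 5 = 18. The root therefore falls on A's branch, 9 units from A and 1 unit from x.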
Midpoint rooting assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree-building methods. Example: d(A,D) = 10 + 3 + 5 = 18; midpoint = 18 / 2 = 9.

Rooting and Tree Interpretation

How Many Trees? (assuming bifurcation only)

Common methods for constructing phylogenetic trees

Basic methods for constructing phylogenetic trees:
- Maximum parsimony (MP)
- Distance methods
- Maximum likelihood (ML)

Maximum Parsimony
- Check each topology.
- Count the minimum number of changes required to explain the data.
- Choose the tree with the smallest number of changes.

Example: the three unrooted topologies for the sequences ACT, GTT, GTA and ACA, with internal-node states as drawn on the slides:
- (ACT, GTT) vs (GTA, ACA), internal nodes GTT and GTA: 2 + 1 + 2 changes, MP score = 5.
- (ACT, GTA) vs (GTT, ACA), internal nodes ACT and ACA: 3 + 1 + 3 = 7 changes as drawn. This assignment is not optimal. Placing GTT at both internal nodes needs only 6 changes, so the MP score of this topology is 6.
- (ACT, ACA) vs (GTT, GTA), internal nodes ACA and GTA: 1 + 2 + 1 changes, MP score = 4. This is the optimal MP tree.

Maximum Parsimony: Limitations
With more than a few sequences, an exhaustive search becomes computationally intractable (the problem is NP-hard). The reason is that the number of possible bifurcating trees for n taxa grows very rapidly:

$$N_{\text{rooted}}(n) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}, \qquad N_{\text{unrooted}}(n) = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

Number of possible trees (Felsenstein 1978):

  Species   Rooted trees     Unrooted trees
  2         1                1
  3         3                1
  4         15               3
  5         105              15
  10        3.44 x 10^7      2.03 x 10^6
  15        2.13 x 10^14     7.91 x 10^12
  20        8.20 x 10^21     2.21 x 10^20

Long Branch Attraction
In a set of sequences evolving at different rates, the rapidly evolving sequences have been observed to be drawn together.
[Figure: long branch attraction in an NJ tree based on CNVs.]

Distance Methods
Distance methods are normally fast and simple, e.g. UPGMA, Neighbour Joining, Minimum Evolution.
[Figure: distance method criteria.]

UPGMA
[Figures: UPGMA shown visually; a worked UPGMA example; UPGMA weaknesses.]

Neighbor Joining (NJ)
Start off with a star tree and pull out pairs one at a time.
[Figure: a star tree of taxa 1 to 8 being resolved by NJ; followed by slides on the NJ algorithm and NJ performance.]

Minimum Evolution
The total length of all branches in the tree should be a minimum. Neighbour joining is an approximation to minimum evolution. It has been shown that the minimum evolution tree is expected to be the true tree, provided that branch lengths are corrected for multiple hits.

Maximum Likelihood
The maximum likelihood method is a character-based method that is statistically well founded. It often has lower variance than other methods (i.e. it is frequently the estimation method least affected by sampling error). It also tends to be robust to many violations of the assumptions in the evolutionary model. Even with very short sequences, maximum likelihood tends to outperform alternative methods such as parsimony or distance methods. Different tree topologies are evaluated. An important disadvantage is that it is very CPU-intensive and thus time-consuming, which makes it impractical for large datasets.

[Figures: phylogeny flowchart; differences between methods.]

Comparison of Methods
- Neighbour Joining (NJ) is very fast but depends on accurate estimates of distance. This is more difficult with very divergent data.
- Parsimony suffers from long branch attraction. This may be a particular problem for very divergent data.
- NJ can also suffer from long branch attraction.
- Parsimony is also computationally intensive.
- Codon usage bias can be a problem for MP and NJ.
- Maximum likelihood is the most reliable, but it depends on the choice of model and is very slow.
- Methods may be combined.

Methods for testing phylogenetic trees

Bootstrapping: how dependent is the tree on the dataset?
1. Randomly choose n objects (e.g. alignment columns) from your dataset of n, with replacement.
2. Rebuild the tree based on the resampled data.
3. Repeat 1,000 to 10,000 times.
4. Count how often the same children are joined.

Jackknifing: how dependent is the tree on the dataset?
1. Randomly choose k objects from your dataset of n (k < n), without replacement.
2. Rebuild the tree based on the subset of the data.
3. Repeat 1,000 to 10,000 times.
4. Count how often the same children are joined.

How confident am I that my tree is correct?

Assessing Reliability: Bootstrap
- Bootstrapping is a very valuable and widely used technique (some journals demand it).
- Bootstrap proportions (BPs) give an idea of how likely a given branch would be to remain unaffected if additional data, with the same distribution, became available.
- BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals.
- There is no agreement about what constitutes a 'good' bootstrap value (> 70%, > 80%, > 85%?).
- Some theoretical work indicates that BPs can be a conservative estimate of confidence intervals.
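The bootstrap and jackknife recipes above can be sketched for the smallest interesting case: four sequences, which have only three unrooted topologies. The Python sketch below is illustrative only, and each replicate does three things:
- resample alignment columns, with replacement for the bootstrap or without replacement for the jackknife;
- rebuild the quartet by picking the split that minimizes the sum of the two within-pair Hamming distances, a simple four-point criterion standing in for a real tree-building method;
- check whether the split found on the full data reappears.

The toy sequences and function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Split = Tuple[Tuple[str, str], Tuple[str, str]]  # e.g. ((A, B), (C, D))


def hamming(a: str, b: str, cols: List[int]) -> int:
    """Number of differing sites among the chosen columns."""
    return sum(a[i] != b[i] for i in cols)


def best_split(seqs: Dict[str, str], cols: List[int]) -> Split:
    """Stand-in tree builder: quartet split ij|kl with smallest d(i,j) + d(k,l)."""
    i, j, k, l = sorted(seqs)
    candidates: List[Split] = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]

    def score(split: Split) -> int:
        (p, q), (r, s) = split
        return hamming(seqs[p], seqs[q], cols) + hamming(seqs[r], seqs[s], cols)

    return min(candidates, key=score)  # ties go to the first candidate


def support(seqs: Dict[str, str], replicates: int = 1000,
            jackknife_k: int = 0, seed: int = 1) -> float:
    """Fraction of replicates that recover the split found on the full data.

    jackknife_k == 0: bootstrap (n columns drawn with replacement).
    jackknife_k > 0:  jackknife (k columns drawn without replacement).
    """
    rng = random.Random(seed)
    n = len(next(iter(seqs.values())))
    reference = best_split(seqs, list(range(n)))
    hits = 0
    for _ in range(replicates):
        if jackknife_k:
            cols = rng.sample(range(n), jackknife_k)
        else:
            cols = [rng.randrange(n) for _ in range(n)]
        hits += best_split(seqs, cols) == reference
    return hits / replicates


toy = {"A": "A" * 20, "B": "A" * 19 + "C", "C": "G" * 20, "D": "G" * 19 + "T"}
print(support(toy), support(toy, jackknife_k=10))
```

In the toy data every column groups A with B and C with D, so both calls print 1.0. On real alignments the proportions are usually lower. As the slide stresses, they are bootstrap proportions, not confidence levels.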
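The maximum parsimony example earlier in this lecture can be checked with Fitch's algorithm. This is a standard way to count the minimum number of changes on a fixed topology; the slides do not name the counting algorithm, so using Fitch's is an assumption. The Python sketch below scores the three unrooted topologies of ACT, GTT, GTA and ACA. Each unrooted tree is written as a rooted nested tuple, which does not change the Fitch count.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf sequence, or (left, right)


def fitch_site(tree: Tree, site: int) -> Tuple[Set[str], int]:
    """Fitch's algorithm at one site: (candidate state set, change count)."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_set, left_cost = fitch_site(tree[0], site)
    right_set, right_cost = fitch_site(tree[1], site)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def parsimony_score(tree: Tree, length: int) -> int:
    """Minimum number of changes summed over all sites."""
    return sum(fitch_site(tree, s)[1] for s in range(length))


topologies = {
    "(ACT,GTT)|(GTA,ACA)": (("ACT", "GTT"), ("GTA", "ACA")),
    "(ACT,GTA)|(GTT,ACA)": (("ACT", "GTA"), ("GTT", "ACA")),
    "(ACT,ACA)|(GTT,GTA)": (("ACT", "ACA"), ("GTT", "GTA")),
}
for name, tree in topologies.items():
    print(name, parsimony_score(tree, 3))
```

It prints 5, 6 and 4, so the third topology is the most parsimonious. The middle topology scores 6, not the 7 implied by the internal-node states (ACT, ACA) drawn on the slide.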
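The table of possible trees under "Maximum Parsimony: Limitations" follows from the two formulas for bifurcating trees. The short Python check below uses exact integer arithmetic to reproduce the table. The unrooted formula is only valid for n >= 3.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """(2n-3)! / (2^(n-2) (n-2)!) rooted bifurcating trees, n >= 2 taxa."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """(2n-5)! / (2^(n-3) (n-3)!) unrooted bifurcating trees, n >= 3 taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10, 15, 20):
    print(n, rooted_trees(n), unrooted_trees(n))
```

Ten taxa already give 34,459,425 rooted and 2,027,025 unrooted trees. This explosion is why exhaustive parsimony search is limited to small datasets.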
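The neighbour-joining slides ("start off with a star tree; pull out pairs") can be made concrete with the Saitou-Nei Q-criterion. The Python sketch below is a minimal textbook version and not the implementation of any particular package. It assumes the distance matrix is given as a dict of dicts. It returns each join with the two branch lengths, plus the final branch. The new node names u1, u2, ... are hypothetical labels.

```python
import itertools
from typing import Dict, List, Tuple

Dist = Dict[str, Dict[str, float]]


def neighbor_joining(dist: Dist) -> Tuple[List[Tuple[str, str, str, float, float]],
                                          Tuple[str, str, float]]:
    """Return (joins, last_branch); each join is (new_node, i, j, len_i, len_j)."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    joins = []
    counter = 0
    while len(d) > 2:
        r = len(d)
        totals = {a: sum(d[a].values()) for a in d}  # R_i
        # Q(i, j) = (r - 2) d(i, j) - R_i - R_j; join the pair with minimal Q.
        i, j = min(itertools.combinations(sorted(d), 2),
                   key=lambda p: (r - 2) * d[p[0]][p[1]] - totals[p[0]] - totals[p[1]])
        li = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (r - 2))
        lj = d[i][j] - li
        counter += 1
        u = f"u{counter}"  # hypothetical name for the new internal node
        # d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2 for every remaining k.
        new_row = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in d if k not in (i, j)}
        for k in (i, j):
            del d[k]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
        for k, value in new_row.items():
            d[k][u] = value
        d[u] = new_row
        joins.append((u, i, j, li, lj))
    a, b = sorted(d)
    return joins, (a, b, d[a][b])


# Additive distances from the tree ((A:2,B:3):4,(C:1,D:5)).
DIST: Dist = {
    "A": {"B": 5, "C": 7, "D": 11}, "B": {"A": 5, "C": 8, "D": 12},
    "C": {"A": 7, "B": 8, "D": 6}, "D": {"A": 11, "B": 12, "C": 6},
}
print(neighbor_joining(DIST))
```

On these additive distances NJ recovers the generating tree exactly. It first joins A and B with lengths 2 and 3, then C and D with lengths 1 and 5, and leaves an internal branch of length 4.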
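Two points from the slides can be illustrated on a single pair of sequences:
- minimum evolution needs branch lengths corrected for multiple hits;
- maximum likelihood chooses the parameter value that makes the data most probable.

The Python sketch below assumes the Jukes-Cantor (JC69) model, which the slides do not specify, and uses made-up sequences. It computes the JC69-corrected distance. Separately, it maximizes the JC69 likelihood of the branch length with a simple ternary search. For two sequences under JC69 the two estimates coincide.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites (equal-length, gap-free sequences assumed)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor correction for multiple hits: d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(a, b)
    return -0.75 * math.log(1 - 4 * p / 3)  # undefined once p >= 0.75


def jc69_log_likelihood(a: str, b: str, t: float) -> float:
    """log L of two aligned sequences separated by branch length t under JC69."""
    diff = sum(x != y for x, y in zip(a, b))
    same = len(a) - diff
    e = math.exp(-4 * t / 3)
    # Per site: pi_x * P(x -> x | t) or pi_x * P(x -> y | t), with pi_x = 1/4.
    return (same * math.log(0.25 * (0.25 + 0.75 * e))
            + diff * math.log(0.25 * (0.25 - 0.25 * e)))


def ml_branch_length(a: str, b: str, lo: float = 1e-9, hi: float = 10.0) -> float:
    """Maximise the unimodal log-likelihood in t by ternary search."""
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if jc69_log_likelihood(a, b, m1) < jc69_log_likelihood(a, b, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGTTCGTACGAACGTACCT"
print(p_distance(seq1, seq2), jc69_distance(seq1, seq2), ml_branch_length(seq1, seq2))
```

The two sequences differ at 3 of 20 sites, so the uncorrected p-distance is 0.15. Both the corrected distance and the ML branch length come out at about 0.167, showing how the correction lengthens the estimate to allow for unseen multiple hits.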