
Human Population Genetics: Basic Principles and Analytical Methods
CAS-MPG Partner Institute for Computational Biology
Graduate course "Human Population Genetics", Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
Xu Shuhua, Jin Li

Lecture 3: Methods for Constructing Phylogenetic Trees and Their Applications

Outline
- Concept of a phylogenetic tree and related terminology
- Types of phylogenetic trees
- Common methods for constructing phylogenetic trees
- Methods for testing phylogenetic trees
- Applications of phylogenetic trees
- Which method is most appropriate in which situation?
- Common software for constructing phylogenetic trees
- Exercises

Concept of a phylogenetic tree and related terminology

The purpose of a phylogenetic tree is to illustrate how a group of objects (usually genes or organisms) are related to one another.

Phylogeny (phylo = tribe + genesis)
[Figure: phylogeny of Orangutan, Gorilla, Chimpanzee and Human. From the Tree of Life Website, University of Arizona.]

Phylogenetic trees are about visualising evolutionary relationships
Phylogenetic trees diagram the evolutionary relationships between the taxa.
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses.
These say that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.

Clades
Evolutionary trees depict clades. A clade is a group of organisms that includes an ancestor and all descendants of that ancestor. You can think of a clade as a branch on the tree of life.

Molecular Evolution - Li

Terminology
- External nodes: things under comparison; operational taxonomic units (OTUs)
- Internal nodes: ancestral units; hypothetical; the goal is to group current-day units
- Root: common ancestor of all OTUs under study. The path from the root to a node defines an evolutionary path
- Unrooted: specifies relationships but not the evolutionary path
  - If there is an outgroup (an external reason to believe a certain OTU branched off first), the tree can be rooted
- Topology: branching pattern of a tree
- Branch length: amount of difference that occurred along a branch

Common Phylogenetic Tree Terminology
[Figure: rooted tree with terminal nodes A, B, C, D, E, labelled as follows.]
- Ancestral node or ROOT of the tree
- Internal nodes or divergence points (represent hypothetical ancestors of the taxa)
- Branches or lineages
- Terminal nodes: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
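The nested-parentheses notation used earlier for ((A,(B,C)),(D,E)) can be read mechanically. The following is a minimal sketch, assuming a string with leaf names only (no branch lengths or internal labels); the function names `parse_newick` and `leaf_sets` are illustrative and do not come from the slides. It turns the string into nested tuples and lists the clade (the set of terminal nodes) below each internal node.

```python
def parse_newick(text):
    """Parse a parenthesised tree string (leaf names only) into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]            # a leaf name

    tree = node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree


def leaf_sets(tree, out):
    """Append the leaf set of every internal node to `out`; return this node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


tree = parse_newick("((A,(B,C)),(D,E))")
clades = []
leaf_sets(tree, clades)
for clade in clades:
    print(sorted(clade))
```

Each printed set is one clade of the example tree: B and C group together, A joins them, and D and E form the sister clade.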

Terminology: Homologue, Orthologue, Paralogue

Homologs are commonly defined as orthologs, paralogs, or xenologs.

Orthologs are homologs resulting from speciation. They are genes that stem from a common ancestor. Orthologs often have similar functions. Example: SPO11 (Baudat et al., Mol Cell 2000).

Paralogs are homologs resulting from gene duplication. They are genes derived from a common ancestral locus that was duplicated within the genome of an organism. Paralogs tend to have different functions. Example: CLB1/CLB2 (Brachat et al., Genome Biology 2003).

Xenologs are homologs resulting from the horizontal transfer of a gene between two organisms.

The function of xenologs can be variable. Example: VDE (Okuda et al., Yeast 2003).

Character-based methods can tease apart types of similarity and theoretically find the true evolutionary tree. Similarity = relationship only if certain conditions are met (if the distances are 'ultrametric').

Types of Similarity
Observed similarity between two entities can be due to:
- Evolutionary relationship:
  - Shared ancestral characters ('plesiomorphies')
  - Shared derived characters ('synapomorphies')
- Homoplasy (independent evolution of the same character):
  - Convergent events (in either related or unrelated entities)
  - Parallel events (in related entities)
  - Reversals (in related entities)
[Figure: example trees with the nucleotide states C, G and T at the tips.]

Homology and Homoplasy
- Homology: identity due to shared ancestry (evolutionary signal)
- Homoplasy: identity despite separate ancestry (evolutionary noise)
[Figure labels: no hair, no wings.]

[Figure: tree marking orthologs and paralogs. Source: Erik L.L. Sonnhammer, "Orthology, paralogy and proposed classification for paralog subtypes", TRENDS in Genetics Vol. 18 No. 12, December 2002, http://tig.trends.com. © 2002 Elsevier Science Ltd. All rights reserved.]

The Molecular Clock
- For a given protein, the rate of sequence evolution is approximately constant across lineages (Zuckerkandl and Pauling, 1965).
- This would allow speciation and duplication events to be dated accurately based on molecular data.
- Local and approximate molecular clocks are more reasonable.

Relative Rate Test
Test whether sets of sequences are evolving at equal rates (local molecular clock hypothesis), e.g. RRTree, Robinson-Rechavi, http://pbil.univ-lyon1.fr/software/rrtree.html

Types of phylogenetic trees

Trees
A tree is a diagram consisting of branches and nodes.
- Species tree (how are my species related?): contains only one representative from each species; all nodes indicate speciation events.
- Gene tree (how are my genes related?): normally contains a number of genes from a single species; nodes relate either to speciation or to gene duplication events.

Gene tree, species tree
We often assume that gene trees give us species trees.
[Figure: gene tree with tips a, b, c drawn against a species tree with tips A, B, D.]

Gene tree - Species tree
The two events, mutation and speciation, are not expected to occur at the same time, so a gene tree does not necessarily represent the species tree.

[Figure: three drawings of a tree of Taxon A, Taxon B, Taxon C and Taxon D. One has branch lengths 1, 1, 1, 6, 3, 5 on an axis labelled "genetic change"; another has an axis labelled "no meaning".]

Three types of trees
Cladogram, phylogram, ultrametric tree.
All show the same evolutionary relationships, or branching orders, between the taxa.

Tree Properties
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
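The two properties named above can be checked directly on a matrix of pairwise distances. The sketch below uses the standard three-point condition for ultrametric distances (in every triple of taxa the two largest distances are equal) and the four-point condition for additive distances (of the three ways to pair up four taxa, the two largest sums are equal). The two example matrices are made up for illustration and are not taken from the slides.

```python
from itertools import combinations


def make_matrix(pairs):
    """Build a symmetric distance lookup from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: the two largest distances in each triple are equal."""
    for a, b, c in combinations(sorted(d), 3):
        x, y, z = sorted([d[a][b], d[a][c], d[b][c]])
        if abs(z - y) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: the two largest pair-sums in each quartet are equal."""
    for a, b, c, e in combinations(sorted(d), 4):
        sums = sorted([d[a][b] + d[c][e], d[a][c] + d[b][e], d[a][e] + d[b][c]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical clock-like tree: A and B split at depth 1, C and D at depth 2, root at depth 3.
clock_like = make_matrix({("A", "B"): 2, ("C", "D"): 4,
                          ("A", "C"): 6, ("A", "D"): 6,
                          ("B", "C"): 6, ("B", "D"): 6})

# Hypothetical tree with unequal rates: path lengths fit a tree, but tips are not equidistant from any root.
unequal_rates = make_matrix({("A", "B"): 12, ("A", "C"): 15, ("A", "D"): 18,
                             ("B", "C"): 7, ("B", "D"): 10, ("C", "D"): 7})

print(is_ultrametric(clock_like), is_additive(clock_like))
print(is_ultrametric(unequal_rates), is_additive(unequal_rates))
```

An ultrametric matrix is always additive, but not the other way round, which is why the second example passes only the additive check.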

Cladograms vs Phylograms
- Phylograms show branch order and branch lengths.
- Cladograms show branching order; branch lengths are meaningless.
[Figure: two drawings of a tree of Bacterium 1-3 and Eukaryote 1-4.]

Phenetics
Phenetics, when first introduced (Michener and Sokal, 1957), challenged the prevailing view that classifications should be based on comparisons between a limited number of characters that taxonomists believed to be important for one reason or another. Pheneticists argued that classifications should encompass as many variable characters as possible, with these characters scored numerically and analyzed by rigorous mathematical methods.

Cladistics
Cladistics (Hennig, 1966) also emphasizes the need for large datasets but differs from phenetics in that it does not give equal weight to all characters. The argument is that, in order to infer the branching order in a phylogeny, it is necessary to distinguish those characters that provide a good indication of evolutionary relationships from other characters that might be misleading. This might appear to take us back to the pre-phenetic approach, but cladistics is much less subjective: rather than making assumptions about which characters are 'important', cladistics demands that the evolutionary relevance of individual characters be defined. In particular, errors in the branching pattern within a phylogeny are minimized by recognizing two types of anomalous data.

Why Cladistics? Convergent evolution and derived character states
[Figure panels: Convergent evolution; Derived character state.]

Phenetics versus Cladistics
Phenetics is the study of relationships among a group of organisms on the basis of the degree of similarity between them, be that similarity molecular, phenotypic, or anatomical. A tree-like network expressing phenetic relationships is called a phenogram.

Cladistics can be defined as the study of the pathways of evolution. In other words, cladists are interested in such questions as: how many branches there are among a group of organisms; which branch connects to which other branch; and what is the branching sequence. A tree-like network that expresses such ancestor-descendant relationships is called a cladogram. Thus, a cladogram refers to the topology of a rooted phylogenetic tree.

While a phenogram may serve as an indicator of cladistic relationships, it is not necessarily identical to the cladogram. If there is a linear relationship between the time of divergence and the degree of genetic (or morphological) divergence, the two types of trees may become identical to each other.

Cladistics and Phenetics
- Cladistics: trees are drawn based on the conserved characters.
- Phenetics: trees are based on some measure of distance between the leaves.
- Molecular phylogenies are inferred from molecular (usually sequence) data and are either cladistic (e.g. gene order) or phenetic.

The maximum parsimony method is a typical representative of the cladistic approach, whereas the UPGMA method is a typical phenetic method. The other methods, however, cannot be classified easily according to the above criteria.

[Figure: a tree of archaea and eukaryotes shown unrooted and rooted by a bacteria outgroup, with the root and the monophyletic groups marked.]

Unrooted vs Rooted tree

Rooting the Tree
In an unrooted tree the direction of evolution is unknown. The root is the hypothesized ancestor of the sequences in the tree. The root can be placed either on a branch or at a node.

Inferring evolutionary relationships between the taxa requires rooting the tree.
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: unrooted tree of A, B, C, D with a root position marked, and the resulting rooted tree.]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Now, try it again with the root at another position.
[Figure: the same unrooted tree with the root at another position, and the resulting rooted tree.]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees.
[Figure: the unrooted tree 1 with taxa A, B, C, D and its five rootings.]
These trees show five different evolutionary relationships among the taxa!

[Figure: rooted tree 1a redrawn with its branches rearranged.]
All of these rearrangements show the same evolutionary relationships between the taxa.

There are two major ways to root trees:
- By outgroup: uses taxa (the "outgroup") that are known to fall outside of the group of interest (the "ingroup"). Requires some prior knowledge about the relationships among the taxa. The outgroup can either be species (e.g., birds to root a mammalian tree) or previous gene duplicates (e.g., α-globins to root β-globins).
- By midpoint or distance: roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some of the distance-based tree building methods.
[Figure: tree of A, B, C, D with branch lengths 10, 2, 3, 5, 2 and an outgroup.]
d(A,D) = 10 + 3 + 5 = 18; midpoint = 18 / 2 = 9.
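The midpoint calculation above can be reproduced in a few lines. This is a minimal sketch under one stated assumption: the figure is not recoverable, so the tree is taken to be A and B attached to an internal node `x` (branch lengths 10 and 2), C and D attached to an internal node `y` (lengths 2 and 5), and `x` joined to `y` by a branch of length 3, which is consistent with d(A,D) = 10 + 3 + 5. The node names `x` and `y` and all function names are illustrative.

```python
def build_adjacency(edges):
    """Turn (node, node, length) triples into a symmetric adjacency map."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def path_between(adj, start, goal):
    """Return the nodes on the unique path between two nodes of a tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def midpoint_root(adj, leaves):
    """Find the most distant pair of leaves and where the midpoint of their path falls.

    Returns (distance, branch, offset): the root lies on `branch`, `offset`
    units away from the first node of that branch.
    """
    best = None
    for i, a in enumerate(leaves):
        for b in leaves[i + 1:]:
            path = path_between(adj, a, b)
            dist = sum(adj[u][v] for u, v in zip(path, path[1:]))
            if best is None or dist > best[0]:
                best = (dist, path)
    dist, path = best
    half = dist / 2
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return dist, (u, v), half - walked
        walked += adj[u][v]


# Assumed layout of the example tree (see the note above).
adj = build_adjacency([("A", "x", 10), ("B", "x", 2), ("x", "y", 3),
                       ("C", "y", 2), ("D", "y", 5)])
distance, branch, offset = midpoint_root(adj, ["A", "B", "C", "D"])
print(distance, branch, offset)
```

Under the assumed layout the most distant pair is A and D, and the root falls on the branch leading to A, 9 units from A and 1 unit from the internal node.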

Rooting and Tree Interpretation

How Many Trees? (assuming bifurcation only)

Common methods for constructing phylogenetic trees

Basic methods for constructing phylogenetic trees
- Maximum parsimony (MP)
- Distance methods
- Maximum likelihood (ML)

Maximum Parsimony
- Check each topology.
- Count the minimum number of changes required to explain the data.
- Choose the tree with the smallest number of changes.

[Figure: candidate trees for the sequences ACT, ACA, GTT and GTA, drawn with ancestral sequences at the internal nodes and the number of changes on the branches. One tree has changes 1, 2, 2 and MP score = 5; another has changes 3, 1, 3 and MP score = 7.]
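The counting step of the procedure above can be sketched with Fitch's algorithm, a standard way to get the minimum number of changes on a fixed binary topology. The slides do not name an algorithm, so its use here is an assumption, as are the function names. The example scores the three possible groupings of the four sequences from the figure (ACT, ACA, GTT, GTA) and picks the smallest.

```python
def fitch_site(tree, site):
    """Return (possible ancestral states, changes needed) for one alignment column."""
    if isinstance(tree, str):                 # a leaf is its own sequence
        return {tree[site]}, 0
    left, right = tree
    left_states, left_changes = fitch_site(left, site)
    right_states, right_changes = fitch_site(right, site)
    shared = left_states & right_states
    if shared:                                # children agree: no extra change
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, length):
    """Sum the per-site minimum change counts over all columns."""
    return sum(fitch_site(tree, site)[1] for site in range(length))


# The three ways of pairing up four sequences, written as nested tuples.
TOPOLOGIES = [
    (("ACT", "ACA"), ("GTT", "GTA")),
    (("ACT", "GTT"), ("ACA", "GTA")),
    (("ACT", "GTA"), ("ACA", "GTT")),
]

scores = {tree: parsimony_score(tree, 3) for tree in TOPOLOGIES}
best_tree = min(scores, key=scores.get)
for tree, score in scores.items():
    print(tree, score)
print("most parsimonious:", best_tree)
```

Exhaustive scoring like this is only feasible for a handful of sequences, because the number of topologies grows very quickly with the number of taxa.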

[Figure: a further candidate tree for the same sequences, pairing ACT with ACA and GTT with GTA, has changes 1, 2, 1 and MP score = 4. This is the optimal MP tree.]

Maximum Parsimony: Limitations
Even with only a few sequences, checking every topology becomes computationally intractable ("NP-hard"). For n species (the unrooted formula applies for n ≥ 3):

$$N_{\mathrm{rooted}}(n) = \frac{(2n-3)!}{2^{\,n-2}\,(n-2)!}, \qquad N_{\mathrm{unrooted}}(n) = \frac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$$

Number of possible trees (Felsenstein 1978)

# of species    # rooted trees    # unrooted trees
2               1                 1
3               3                 1
4               15                3
5               105               15
10              3.44 x 10^7       2.03 x 10^6
15              2.13 x 10^14      7.91 x 10^12
20              8.20 x 10^21      2.21 x 10^20

Maximum Parsimony: Limitations
Long Branch Attraction: in a set of sequences evolving at different rates, the rapidly evolving sequences have been observed to be drawn together.
[Figure: Long Branches Attraction; NJ tree based on CNVs.]

Distance Methods
Distance Method Criteria
Distance methods are normally fast and simple, e.g. UPGMA, Neighbour Joining, Minimum Evolution.

UPGMA
UPGMA: Visually
UPGMA: example
UPGMA weaknesses

Neighbor Joining (NJ)
Start off with a star tree; pull out pairs at a time.
[Figure: star tree with tips 1 to 8.]
NJ Algorithm
NJ Performance

Minimum Evolution
- The total length of all branches in the tree should be a minimum.
- Neighbour joining is an approximation to minimum evolution.
- It has been shown that the minimum evolution tree is expected to be the true tree, provided branch lengths are corrected for multiple hits.

Maximum Likelihood
The maximum likelihood method is a character-based method that is statistically well founded. It often has lower variance than other methods (i.e. it is frequently the estimation method least affected by sampling error) and tends to be robust to many violations of the assumptions in the evolutionary model. Even with very short sequences, maximum likelihood tends to outperform alternative methods such as parsimony or distance methods. Different tree topologies are evaluated. An important disadvantage is that it is very CPU intensive and thus time consuming, which makes it impractical for large datasets.

Phylogeny Flowchart

Difference in Methods

Comparison of methods
- Neighbour Joining (NJ) is very fast but depends on accurate estimates of distance. This is more difficult with very divergent data.
- Parsimony suffers from Long Branch Attraction. This may be a particular problem for very divergent data.
- NJ can suffer from Long Branch Attraction.
- Parsimony is also computationally intensive.
- Codon usage bias can be a problem for MP and NJ.
- Maximum Likelihood is the most reliable but depends on the choice of model and is very slow.
- Methods may be combined.

Methods for testing phylogenetic trees

Bootstrapping: how dependent is the tree on the dataset?
1. Randomly choose n objects from your dataset of n, with replacement.
2. Rebuild the tree based on the resampled data.
3. Repeat 1,000 - 10,000 times.
4. How often are the same children joined?

Jackknifing: how dependent is the tree on the dataset?
1. Randomly choose k objects from your dataset of n, without replacement.
2. Rebuild the tree based on the subset of the data.
3. Repeat 1,000 - 10,000 times.
4. How often are the same children joined?

How confident am I that my tree is correct?

Assessing Reliability: Bootstrap
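The four bootstrap steps listed above can be sketched end to end on a toy alignment. Several things here are assumptions rather than content from the slides: the "objects" being resampled are taken to be alignment columns, the tree is rebuilt with a small UPGMA clustering on p-distances (any tree-building method could be substituted), and the four sequences are invented for illustration. Support for a clade is the fraction of replicates in which the same group of taxa is joined.

```python
import random
from itertools import combinations


def p_distance(s, t):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)


def upgma_clades(seqs):
    """Cluster by UPGMA (average linkage) and return every clade formed by a merge."""
    def average_distance(c1, c2):
        total = sum(p_distance(seqs[a], seqs[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = {frozenset([name]) for name in seqs}
    clades = set()
    while len(clusters) > 1:
        ordered = sorted(clusters, key=sorted)          # deterministic tie-breaking
        c1, c2 = min(combinations(ordered, 2), key=lambda pair: average_distance(*pair))
        clusters -= {c1, c2}
        clusters.add(c1 | c2)
        clades.add(c1 | c2)
    return clades


def bootstrap_support(seqs, n_reps=200, seed=0):
    """Resample columns with replacement and count how often each clade reappears."""
    rng = random.Random(seed)
    names = list(seqs)
    length = len(seqs[names[0]])
    reference = upgma_clades(seqs)
    counts = {clade: 0 for clade in reference}
    for _ in range(n_reps):
        columns = [rng.randrange(length) for _ in range(length)]   # step 1
        resampled = {n: "".join(seqs[n][c] for c in columns) for n in names}
        found = upgma_clades(resampled)                            # step 2
        for clade in reference:                                    # step 4
            if clade in found:
                counts[clade] += 1
    return {clade: counts[clade] / n_reps for clade in reference}


# Invented toy alignment: A and B are near-identical, as are C and D.
toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "GGGGGGGGGG", "D": "GGGGGGGGGC"}
support = bootstrap_support(toy)
for clade, value in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), value)
```

A jackknife version would differ only in step 1: draw k distinct columns without replacement, for example with `rng.sample(range(length), k)`.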

- Bootstrapping is a very valuable and widely used technique (it is demanded by some journals).
- BPs give an idea of how likely a given branch would be to be unaffected if additional data, with the same distribution, became available.
- BPs are not the same as confidence intervals. There is no simple mapping between bootstrap values and confidence intervals. There is no agreement about what constitutes a 'good' bootstrap value (> 70%, > 80%, > 85%?).
- Some theoretical work indicates that BPs can be a conservative estimate of confidence inter…
