Written by Randy Linder
Published by pefav
Simultaneous estimation of alignments and trees
Tandy Warnow
The University of Texas at Austin
(joint work with Randy Linder, Kevin Liu, Serita Nelesen, and Sindhu Raghavan)

DNA Sequence Evolution
-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT, TGGACTT
-1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT
today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT

FN: false negative (missing edge)
FP: false positive (incorrect edge)
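
The FN and FP definitions can be turned into rates by comparing the internal edges (bipartitions) of the true tree with those of the estimated tree. The following is a minimal sketch under assumptions that are not in the slides: each tree is given as a list of bipartitions, each bipartition is written as the set of taxa on one side of the edge, and the five-taxon trees are made up purely for illustration. The function names are hypothetical.

```python
def normalize(split, taxa):
    """Return a canonical side of a bipartition: the side holding the smallest taxon."""
    side = frozenset(split)
    other = frozenset(taxa) - side
    return side if min(taxa) in side else other


def error_rates(true_splits, est_splits, taxa):
    """Return (FN rate, FP rate) between a true and an estimated tree.

    FN: edges of the true tree missing from the estimate.
    FP: edges of the estimate that are not in the true tree.
    """
    true_set = {normalize(s, taxa) for s in true_splits}
    est_set = {normalize(s, taxa) for s in est_splits}
    fn = len(true_set - est_set) / len(true_set) if true_set else 0.0
    fp = len(est_set - true_set) / len(est_set) if est_set else 0.0
    return fn, fp


# Hypothetical example trees (not the trees drawn on the slide).
taxa = ["A", "B", "C", "D", "E"]
true_tree = [{"A", "B"}, {"D", "E"}]  # AB|CDE and ABC|DE
est_tree = [{"A", "B"}, {"C", "D"}]   # AB|CDE and CD|ABE

fn_rate, fp_rate = error_rates(true_tree, est_tree, taxa)
print(fn_rate, fp_rate)
```

In this made-up example one of the two true edges is missing and one of the two estimated edges is incorrect, so both rates are 0.5.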
50% error rate

Deletion Mutation
…ACGGTGCAGTTACCA…
…ACCAGTCACCA…
Indels (insertions and deletions) also occur!

Input: unaligned sequences
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Phase 1: Multiple Sequence Alignment
Unaligned input              Alignment
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA

Phase 2: Construct tree
Unaligned input              Alignment
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA
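
Phase 2 takes the Phase 1 alignment as its input, so it can help to confirm that the alignment is well formed before building a tree from it. The sketch below checks two properties of the example above: every aligned row has the same length, and removing the gap characters from each row gives back the unaligned input. The function names are hypothetical, and the tree-construction step itself is not shown because the slides do not say which method is used.

```python
unaligned = {
    "S1": "AGGCTATCACCTGACCTCCA",
    "S2": "TAGCTATCACGACCGC",
    "S3": "TAGCTGACCGC",
    "S4": "TCACGACCGACA",
}

aligned = {
    "S1": "-AGGCTATCACCTGACCTCCA",
    "S2": "TAG-CTATCAC--GACCGC--",
    "S3": "TAG-CT-------GACCGC--",
    "S4": "-------TCAC--GACCGACA",
}


def is_valid_alignment(unaligned, aligned, gap="-"):
    """Check that all rows have equal length and degap to the input sequences."""
    same_length = len({len(row) for row in aligned.values()}) == 1
    degaps_to_input = all(
        aligned[name].replace(gap, "") == seq for name, seq in unaligned.items()
    )
    return same_length and degaps_to_input


def columns(aligned):
    """Return the alignment as a list of columns (one tuple of characters per site)."""
    return list(zip(*aligned.values()))


print(is_valid_alignment(unaligned, aligned))
print(len(columns(aligned)))
```

The columns are what a tree-building method would actually consume, since each column is a hypothesis that its characters descend from the same ancestral site.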
[Figure: tree with leaves S1, S2, S3, S4]

DNA sequence evolution
Simulation using ROSE: 100-taxon model trees; models 1-4 have “long gaps” and models 5-8 have “short gaps”; site substitution is HKY+Gamma.

Simultaneous estimation?
• Statistical methods (e.g., AliFritz and BAli-Phy) cannot be applied to datasets with more than ~20 sequences.
• POY attempts to solve the NP-hard “minimum treelength” problem, and can be applied to larger datasets.

POY vs. Clustal
• Ogden and Rosenberg did a simulation study showing that POY 3.0 alignments (using simple gap penalties) were less accurate than Clustal alignments on over 99% of the datasets they generated.
• Simple gap penalties are of the form gapcost(L) = cL for some constant c, where L is the length of the gap.
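
To make the simple gap penalty concrete, here is a minimal sketch that finds the gap runs in one aligned row and charges gapcost(L) = cL for each. The affine function is an assumption added only for contrast (one common convention among several) and does not come from the slides. The function names, the default costs, and the choice to charge terminal gaps like internal ones are likewise illustrative.

```python
import re


def simple_gap_cost(length, c=1.0):
    """Simple (linear) gap penalty from the slide: gapcost(L) = c * L."""
    return c * length


def affine_gap_cost(length, open_cost=2.0, extend_cost=0.5):
    """Hypothetical affine penalty for contrast: a fixed opening cost plus a per-site cost."""
    return open_cost + extend_cost * length if length > 0 else 0.0


def gap_lengths(row, gap="-"):
    """Return the lengths of maximal runs of gap characters in an aligned row."""
    return [len(m.group()) for m in re.finditer(re.escape(gap) + "+", row)]


row = "TAG-CT-------GACCGC--"  # aligned S3 from the example alignment
lengths = gap_lengths(row)

print(lengths)
print(sum(simple_gap_cost(L, c=1.0) for L in lengths))
print(sum(affine_gap_cost(L) for L in lengths))
```

Under the simple penalty the total depends only on the number of gap characters, so one gap of length 10 costs the same as ten gaps of length 1. A penalty with an opening cost tells these two cases apart.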