
Simultaneous estimation of
alignments and trees
Tandy Warnow
The University of Texas at Austin
(joint work with Randy Linder, Kevin Liu,
Serita Nelesen, and Sindhu Raghavan)

DNA Sequence Evolution
-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT TGGACTT
-1 mil yrs: AGGGGGCAT TAGCCCT AGCACTT
today: AGGGCAT TAGCCCA TAGACTT AGCACAA AGCGCTT
FN: false negative (missing edge)
FP: false positive (incorrect edge)
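FN and FP rates are usually computed from the bipartitions (splits) that each tree's internal edges induce on the leaf set: the FN rate is the fraction of true-tree splits missing from the estimated tree, and the FP rate is the fraction of estimated-tree splits that are not in the true tree. The sketch below assumes both trees share the same leaves and are written as nested Python tuples; the five-leaf trees and the helper names `leaf_set`, `splits`, and `error_rates` are hypothetical illustrations, not the trees drawn on the slide.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a leaf label or a tuple of subtrees; the rooting is ignored.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return all leaf labels below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions induced by internal edges.

    Each split is stored as the side that excludes a fixed reference leaf,
    so the same unrooted edge always gets the same key.
    """
    leaves = leaf_set(tree)
    ref = min(leaves)
    n = len(leaves)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        side = leaves - cluster if ref in cluster else cluster
        if 2 <= len(side) <= n - 2:  # skip leaf edges and the whole-tree cluster
            found.add(side)
        return cluster

    walk(tree)
    return found


def error_rates(true_tree: Tree, est_tree: Tree) -> Tuple[float, float]:
    """Return (FN rate, FP rate) of est_tree relative to true_tree."""
    assert leaf_set(true_tree) == leaf_set(est_tree), "trees must share leaves"
    true_splits, est_splits = splits(true_tree), splits(est_tree)
    fn = len(true_splits - est_splits) / len(true_splits)
    fp = len(est_splits - true_splits) / len(est_splits)
    return fn, fp


# Hypothetical five-leaf trees that differ in one internal edge.
true_tree = (("a", "b"), ("c", ("d", "e")))
est_tree = (("a", "c"), ("b", ("d", "e")))
print(error_rates(true_tree, est_tree))  # (0.5, 0.5)
```

For binary trees on the same n leaves, both trees have n - 3 internal edges, so the FN and FP rates are always equal; here each tree has two internal edges, and one missing plus one incorrect edge gives 50%.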
[Figure: tree comparison with one edge marked FN and one marked FP; 50% error rate]

Deletion Mutation
…ACGGTGCAGTTACCA…
…ACCAGTCACCA…
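One event history consistent with these two sequences and with the Deletion and Mutation labels is a deletion of the four bases GGTG followed by a single T→C substitution. The sketch below replays those two events and prints the pairwise alignment they imply. The positions are 0-based and were worked out by hand for this illustration; the helpers `delete` and `substitute` are hypothetical, and other histories that produce the same descendant are possible.

```python
def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based index `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]


ancestor = "ACGGTGCAGTTACCA"
descendant = "ACCAGTCACCA"

after_deletion = delete(ancestor, 2, 4)              # drops "GGTG": ACCAGTTACCA
after_mutation = substitute(after_deletion, 6, "C")  # T -> C: ACCAGTCACCA
assert after_mutation == descendant

# The implied pairwise alignment: the deleted bases become a gap of length 4.
aligned_descendant = descendant[:2] + "-" * 4 + descendant[2:]
print(ancestor)            # ACGGTGCAGTTACCA
print(aligned_descendant)  # AC----CAGTCACCA
```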
Indels (insertions and deletions) also occur!

Input: unaligned sequences
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Phase 1: Multiple Sequence Alignment
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA

Phase 2: Construct tree
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA
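An alignment like the one above must satisfy two basic properties before a tree is built from it: every aligned row has the same length, and deleting the gap characters from a row gives back the original input sequence. The sketch below checks both properties on this example; the helper name `check_alignment` is hypothetical.

```python
unaligned = {
    "S1": "AGGCTATCACCTGACCTCCA",
    "S2": "TAGCTATCACGACCGC",
    "S3": "TAGCTGACCGC",
    "S4": "TCACGACCGACA",
}
aligned = {
    "S1": "-AGGCTATCACCTGACCTCCA",
    "S2": "TAG-CTATCAC--GACCGC--",
    "S3": "TAG-CT-------GACCGC--",
    "S4": "-------TCAC--GACCGACA",
}


def check_alignment(unaligned: dict, aligned: dict, gap: str = "-") -> int:
    """Return the number of columns; raise ValueError if `aligned` is not a valid MSA of `unaligned`."""
    if aligned.keys() != unaligned.keys():
        raise ValueError("sequence names differ")
    lengths = {len(row) for row in aligned.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    for name, row in aligned.items():
        if row.replace(gap, "") != unaligned[name]:
            raise ValueError(f"{name}: removing gaps does not recover the input")
    return lengths.pop()


print(check_alignment(unaligned, aligned))  # 21
```

The check only confirms that the alignment is well formed; it says nothing about whether the chosen columns reflect true homology.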
[Figure: tree on leaves S1, S2, S3, S4]

DNA sequence evolution
Simulation using ROSE: 100-taxon model trees; models 1-4 have “long gaps” and models 5-8 have “short gaps”; site substitution is HKY+Gamma.

Simultaneous estimation?
• Statistical methods (e.g., AliFritz and BAli-Phy) cannot be applied to datasets with more than ~20 sequences.
• POY attempts to solve the NP-hard “minimum treelength” problem, and can be applied to larger datasets.

POY vs. Clustal
• Ogden and Rosenberg performed a simulation study showing that POY 3.0 alignments (using simple gap penalties) were less accurate than Clustal alignments on over 99% of the datasets they generated.
• Simple gap penalties are of the form gapcost(L) = cL for some constant c, where L is the gap length.
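To make the treelength score concrete: for a tree whose nodes all carry sequences, the length is the sum, over its edges, of the minimum cost of transforming one endpoint's sequence into the other's. With a simple gap penalty gapcost(L) = cL, a gap of length L costs exactly as much as L single-base gaps, so the per-edge cost reduces to an edit distance in which each inserted or deleted base costs c. The sketch below assumes a substitution cost of 1 and c = 1 (both hypothetical choices) and scores only the two edges of the DNA Sequence Evolution figure that are unambiguous: the -3 mil yrs sequence AAGACTT and its two -2 mil yrs descendants. It evaluates a fixed, labeled tree; the NP-hard problem POY attacks is searching over trees and internal sequences to minimize this score. The function and node names are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_cost(a: str, b: str, c: float = 1.0, sub: float = 1.0) -> float:
    """Minimum cost to turn `a` into `b` with linear gap cost gapcost(L) = c * L.

    Because the gap cost is linear, each inserted or deleted base can be
    charged c independently, so a standard edit-distance DP suffices.
    """
    prev = [j * c for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * c] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            match = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub)
            cur[j] = min(match, prev[j] + c, cur[j - 1] + c)
        prev = cur
    return prev[-1]


def treelength(edges: List[Tuple[str, str]], labels: Dict[str, str], c: float = 1.0) -> float:
    """Sum of edit costs over the edges of a tree whose nodes all carry sequences."""
    return sum(edit_cost(labels[u], labels[v], c) for u, v in edges)


# Node names are hypothetical; the sequences come from the figure.
labels = {"root": "AAGACTT", "left": "AAGGCCT", "right": "TGGACTT"}
edges = [("root", "left"), ("root", "right")]
print(treelength(edges, labels))  # 4.0
```

With the same settings, `edit_cost("ACGGTGCAGTTACCA", "ACCAGTCACCA")` is 5.0 for the Deletion Mutation pair: the gap of length 4 costs 4 and the substitution costs 1.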
