Simultaneous estimation of alignments and trees
Tandy Warnow
The University of Texas at Austin
(joint work with Randy Linder, Kevin Liu, Serita Nelesen, and Sindhu Raghavan)

DNA Sequence Evolution
-3 mil yrs: AAGACTT
-2 mil yrs: AAGGCCT TGGACTT
-1 mil yrs: AGGGGGCAT TAGCCCT AGCACTT
today: AGGGCAT TAGCCCA TAGACTT AGCACAA AGCGCTT

FN: false negative (missing edge)
FP: false positive (incorrect edge)
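As a rough illustration of how these two error types can be counted, the sketch below compares the nontrivial bipartitions (internal edges) of a true tree and an estimated tree. The nested-tuple tree encoding, the leaf names A to E, the two example trees, and the function names are all assumptions made for this example; they are not taken from the slides.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the nontrivial bipartitions (internal edges) of a tree.

    Each bipartition is stored as the side that does NOT contain the
    alphabetically smallest leaf, so the same edge always has one encoding.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            side = leaves(child)
            # Skip trivial splits (a single leaf versus everything else).
            if 1 < len(side) < len(all_leaves) - 1:
                if anchor in side:
                    side = all_leaves - side
                splits.add(side)
            visit(child)

    visit(tree)
    return splits


def fn_fp_rates(true_tree, estimated_tree):
    """FN rate: true edges missing from the estimate.
    FP rate: estimated edges that are not in the true tree."""
    true_splits = bipartitions(true_tree)
    est_splits = bipartitions(estimated_tree)
    fn = len(true_splits - est_splits)
    fp = len(est_splits - true_splits)
    fn_rate = fn / len(true_splits) if true_splits else 0.0
    fp_rate = fp / len(est_splits) if est_splits else 0.0
    return fn_rate, fp_rate


# Hypothetical five-leaf trees, not the trees drawn on the slide.
true_tree = (("A", "B"), "C", ("D", "E"))
estimated_tree = (("A", "C"), "B", ("D", "E"))
fn_rate, fp_rate = fn_fp_rates(true_tree, estimated_tree)
```

In this hypothetical pair, one of the two true internal edges is missing and one of the two estimated internal edges is wrong, so both rates come out at one half.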
50% error rate

Deletion Mutation
…ACGGTGCAGTTACCA…
…ACCAGTCACCA…
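One way to read the two fragments above is as a single deletion followed by a single substitution. The sketch below checks that reading; the specific events (deleting "GGTG" and changing one T to C) are an inference from the two strings, not something the slide states, and other event histories could produce the same result. The helper names are hypothetical.

```python
def delete(seq, start, length):
    """Remove `length` characters beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def substitute(seq, pos, base):
    """Replace the character at 0-based index `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]


ancestor = "ACGGTGCAGTTACCA"

# Assumed deletion: drop the four bases "GGTG" at positions 2..5.
after_deletion = delete(ancestor, 2, 4)

# Assumed substitution: T -> C at position 6 of the shortened sequence.
descendant = substitute(after_deletion, 6, "C")
```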
Indels (insertions and deletions) also occur!

Input: unaligned sequences
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Phase 1: Multiple Sequence Alignment
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA

Phase 2: Construct tree
S1 = AGGCTATCACCTGACCTCCA    S1 = -AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC        S2 = TAG-CTATCAC--GACCGC--
S3 = TAGCTGACCGC             S3 = TAG-CT-------GACCGC--
S4 = TCACGACCGACA            S4 = -------TCAC--GACCGACA
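To make the second phase concrete, here is a minimal sketch that turns the alignment above into a tree on four sequences. It uses pairwise p-distances (ignoring columns where either sequence has a gap) and the four-point method, which is a stand-in chosen for brevity and is not claimed to be the tree method used in the talk. The function names are hypothetical, and on sequences this short the result should not be read as the tree drawn on the slide.

```python
from itertools import combinations

# Alignment copied from the slide.
alignment = {
    "S1": "-AGGCTATCACCTGACCTCCA",
    "S2": "TAG-CTATCAC--GACCGC--",
    "S3": "TAG-CT-------GACCGC--",
    "S4": "-------TCAC--GACCGACA",
}


def p_distance(a, b):
    """Fraction of mismatches among columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def four_point_topology(aln):
    """Pick the quartet split whose two within-pair distances sum to the minimum."""
    names = sorted(aln)
    if len(names) != 4:
        raise ValueError("the four-point method needs exactly four sequences")
    dist = {
        frozenset(pair): p_distance(aln[pair[0]], aln[pair[1]])
        for pair in combinations(names, 2)
    }
    a, b, c, d = names
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(
        candidates,
        key=lambda split: dist[frozenset(split[0])] + dist[frozenset(split[1])],
    )


topology = four_point_topology(alignment)
```

Because the distances are computed from the alignment, a different alignment of the same four sequences can yield a different tree, which is the dependence the two-phase pipeline is exposed to.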
[Figure: tree with leaves S1, S2, S3, S4]

DNA sequence evolution
Simulation using ROSE: 100-taxon model trees; models 1-4 have “long gaps” and
models 5-8 have “short gaps”; site substitution is HKY+Gamma.

Simultaneous estimation?
• Statistical methods (e.g., ALIFRITZ and
BAli-Phy) cannot be applied to datasets
with more than about 20 sequences.
• POY attempts to solve the NP-hard
“minimum treelength” problem, and can
be applied to larger datasets.

POY vs. Clustal
• Ogden and Rosenberg did a simulation study
showing POY 3.0 alignments (using simple
gap penalties) were less accurate than
Clustal alignments on over 99% of the
datasets they generated.
• Simple gap penalties are of the form
gapcost(L) = cL for some constant c, where L is the gap length.
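A small sketch of what a simple gap penalty means in practice: the cost of a gap grows in direct proportion to its length, so the total gap cost of an aligned row depends only on how many gap characters it contains, not on how they are grouped. The affine cost function is included purely as a contrast and is an assumption of this example, not something stated in the excerpt. The function names and the values of c, open_cost, and extend_cost are hypothetical, and terminal gaps are charged like internal ones here, which some tools do not do.

```python
from itertools import groupby


def gap_lengths(aligned_row):
    """Lengths of the maximal runs of '-' in one aligned sequence."""
    return [len(list(run)) for ch, run in groupby(aligned_row) if ch == "-"]


def simple_gap_cost(length, c):
    """gapcost(L) = c * L, the 'simple' form described above."""
    return c * length


def affine_gap_cost(length, open_cost, extend_cost):
    """Contrast case (assumption): a fixed opening cost plus a per-position cost."""
    return open_cost + extend_cost * length


# Aligned row for S3 from the alignment slide.
row = "TAG-CT-------GACCGC--"
lengths = gap_lengths(row)

total_simple = sum(simple_gap_cost(n, c=1.0) for n in lengths)
total_affine = sum(affine_gap_cost(n, open_cost=3.0, extend_cost=1.0) for n in lengths)
```

Under the simple form, splitting one long gap into several short ones leaves the total unchanged; under the affine form in this sketch, each additional gap adds another opening cost.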
