IEEE ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

icon

13

pages

icon

English

icon

Documents

Lire un extrait
Lire un extrait

Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris
icon

13

pages

icon

English

icon

Documents

Lire un extrait
Lire un extrait

Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 1 Fixed-Parameter Tractability of the Maximum Agreement Supertree Problem Sylvain Guillemot and Vincent Berry Abstract— Given a set L of labels and a collection of rooted trees whose leaves are bijectively labelled by some elements of L, the Maximum Agreement Supertree problem (SMAST) is as follows: find a tree T on a largest label set L? ? L that homeomorphically contains every input tree restricted to L?. The problem has phylogenetic applications to infer supertrees and perform tree congruence analyses. In this paper, we focus on the parameterized complexity of this NP-hard problem, considering different combinations of parameters as well as particular cases. We show that SMAST on k rooted binary trees on a label set of size n can be solved in O((8n)k) time, which is an improvement with respect to the previously known O(n3k2 ) time algorithm. In this case, we also give an O((2k)pkn2) time algorithm, where p is an upper bound on the number of leaves of L missing in a SMAST solution. This shows that SMAST can be solved efficiently when the input trees are mostly congruent. Then for the particular case where any triple of leaves is contained in at least one input tree, we give O(4pn3) and O(3.12p + n4) time algorithms, obtaining the first fixed-parameter tractable algorithms on a single parameter for this problem.

  • smast can

  • can only

  • when such

  • such rogue

  • smast

  • solving smast

  • input trees

  • agreement supertree


Voir icon arrow

Publié par

Nombre de lectures

13

Langue

English

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Fixed-Parameter Tractability of the Maximum Agreement Supertree Problem Sylvain Guillemot and Vincent Berry
1
Abstract — Given a set L of labels and a collection of rooted trees, but sometimes result from a combination of more sourc e trees whose leaves are bijectively labelled by some elements trees which do not conict when considered pairwise. of L , the Maximum Agreement Supertree problem (S MAST ) is The most used supertree methods focus on clades, e.g., the as follows: nd a tree T on a largest label set L L that homeomorphically contains every input tree restricted to L . The wmeeltlh-okdno[w3n],[M4]atarinxdiRtsepvraersieannttast.ioTnhiswiisthaPpraorsbilmemonywh(enMeRvePr) problem has phylogenetic applications to infer supertrees and perform tree congruence analyses. the input trees contain some “rogue” taxa, i.e., labels whose In this paper, we focus on the parameterized complexity of position greatly differs from one input tree to another. Suc h rogue this NP -hard problem, considering different combinations of taxa can result from horizontal gene transfer (HGT) events [ 5], parameters as well as particular cases. We show that S MAST a phenomenon that commonly arises in bacteria, plants, and t o on k rooted binary trees on a label set of size n can be solved a lesser extent among vertebrates. The presence of rogue tax a in O ((8 n ) k ) time, which is an improvement with respect to the gpirveeviaonuslyknown 2 O )( t n 3 k 2 ) timealgorithm.Insthaisncuapspe,erwbeoaulnsod acnadnihnednucceehtarveemaenndoonu-snecghliagnigbelseiinmtphaectcilandtehesestuopferatrneienopbuttatireneed, O ((2 k ) p kn ime algorithm, where p i on the number of leaves of L missing in a S MAST solution. This (bSy M c A la S d T e)-bmaseethdodm[et6h],od[s7.],T[8h]ehaMsaxbiemenumspeAcigreceamlleyntdeSsiugpneertdreteo shows that S MAST can be solved efciently when the input trees are mostly congruent. Then for the particular case where any deal with rogue taxa: it infers a supertree from a set of sourc e triple of leaves is contained in at least one input tree, we give trees by removing some labels, i.e., taxa, on the position of which O (4 p n 3 ) and O (3 12 p + n 4 ) time algorithms, obtaining the rst the source tree a re xed-parameter tractable algorithms on a single parameter for s dis g e. More precisely, given a collecti on T of this problem. We also obtain intractability results for several s ro u o p t e e r d tre tr e efeosrw T itihslaabtreelset T akoennainsuabsceotm L m o n s L etsuc L h,tahnat a e g a re c e h m tr e e n e t combinations of parameters, thus indicating that it is unlikely that xed-parameter tractable algorithms can be found in these of T restricted to L is included in T . The computational problem particular cases. called S MAST , or sometimes M ASP [6], consists of nding an Index Terms — Phylogenetics, maximum agreement supertree, agreement supertree containing the maximum number of label s parameterized complexity, algorithms, reductions, rooted triples. froTmhe L .S MAST method, an extension of the i greement max mum a subtree method (M AST ), specically allows the input trees to I. I NTRODUCTION have different, and usually overlapping, label sets. With t his to re lac A. Motivation. praecxtiibcialiltya,Sl M ic A at S i T onisswwheellr-eadinapptuetdtreeshpaveenoMn-i A d S e T ntiicnaslelvaebreall pp Supertree construction consists of building trees on a larg e sets. The rst such application is tree congruence analysis. Before set of labels from smaller trees covering parts of the label building a supertree for a set of source trees, it is essentia l to set. This task is applied in bioinformatics where trees repr esent certify that the source trees are not telling completely dif ferent phylogenies, but also in other elds such as databases [1] and data stories on the evolution of studied taxa: if a set of source mining [2]. In phylogenetics, the tree nodes represent sequ ences trees is not congruent enough, then no supertree can accurat ely or organisms ( taxa ), and the labels are bijectively associated with represent the set. This can be problematic when subsequent the leaves of the trees, representing current organisms, wh ile analyses have to be conducted from the supertree, e.g., meas uring internal nodes represent hypothetical ancestors. Rooted t rees are the inuence of geographical or climate factors on speciation usually described by their set of clades : a clade is the set of labels events. Several studies have recently proposed procedures to present under a same internal node. Clades represent relate d sets assess the congruence of a set of source trees by randomizati on of organisms such as species, orders, families, etc. The goa l of tests performed on M AST scores obtained for these trees: the supertree methods is to infer a tree that complies as closely as source trees become more congruent as the number of leaves possible with the topological information of the source tre es. The contained in their maximum agreement subtree increases [9] , [10], task is relatively easy when the input trees fully agree on th e [11]. The study of [9] focuses on the case where the considere d relative positions of the labels. In this case, it is possibl e to nd, source trees have different label sets and no taxon is common in polynomial time, a supertree that contains any input tree as to all trees. They propose to divide the congruence analysis into an induced subtree [1], hence fully respecting the topologi cal M AST computations on pairs of trees. The obtained values are information present in the data. However, practical input t rees then normalized and summarized by an average value, for whic h usually conict with respect to the relative positioning of some a p -value is computed by similar M AST computations on random labels. These incompatibilities sometimes affect only two input trees. Here, replacing M AST with S MAST copes with the fact Manuscript submitted July 13, 2007 tahnaatlyisniputcatrneebsehpaevrfeordimffeedrednitrelactbleyl:stehtes,wahnodltehsuesttohfetcreoensgcrauenncbee S. Guillemot and V. Berry are with the LIRMM, CNRS - University s Montpellier 2. Emails: { vberry,sguillem } @lirmm.fr considered at once, instead of resorting to separate analys es on
Voir icon more
Alternate Text