7
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Découvre YouScribe en t'inscrivant gratuitement
Découvre YouScribe en t'inscrivant gratuitement
7
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
TAG Parsing as Model Enumeration
Ralph Debusmann Denys Duchier Marco Kuhlmann Stefan Thater
Programming Systems Lab Équipe Calligramme Programming Systems Lab Computational Linguistics
Saarland University LORIA – UMR 7503 Saarland University Saarland University
Postfach 15 11 50 Campus Scientifique, B. P. 239 Postfach 15 11 50 Postfach 15 11 50
66041 Saarbrücken 54506 Vandœuvre lès Nancy CEDEX 66041 Saarbrücken 66041 Saarbrücken
Germany France Germany Germany
rade@ps.uni-sb.de duchier@loria.fr kuhlmann@ps.uni-sb.de stth@coli.uni-sb.de
Abstract lem: given a description (a logical formula) of the input
string, enumerate all and only those well-ordered deriva-
This paper introduces well-ordered derivation tion trees that are licensed by . Based on earlier work by
trees and makes use of this concept in a novel Koller and Striegnitz (2002), we show that the solutions
axiomatization of the TAG parsing problem as to this problem can in turn be characterised as the solu-
a constraint satisfaction problem. Contrary to tions of a constraint-satisfaction problem (CSP) on finite
prior approaches, our axiomatization focuses set integer variables, which can be solved by state-of-the-
on the derivation trees rather than the derived art constraint technology.
trees. Well-ordered derivation trees are our pri-
Our approach offers at least two interesting perspec-
mary models, whereas the derived trees serve
tives. First, it enables the encoding of LTAG grammars as
solely to determine word order.
certain dependency grammars, thereby illuminating the
exact relation between the two formalisms. Second, the
1 Introduction formulation of the LTAG parsing problem as a CSP opens
up a large quantity of existing data to evaluate the con-
Tree Adjoining Grammar (TAG) relates strings with two
straint-based approach to parsing more thoroughly than
kinds of structures: derivation trees and corresponding
what could be done before.
derived trees. Derivation trees are more informative than
their corresponding derived trees in the sense that the
Plan of the paper. In §2, we show how the relation be-
derived trees can be reconstructed from them. However,
tween TAG derivation trees and elementary trees can be
derivation trees are usually interpreted as unordered trees;
formulated as the relation between models and logical
they then cannot be used to formulate the TAG parsing descriptions, and introduce the notion of a TAG satisfia-
problem directly, as they do not encode word order infor- bility problem. In §3, we extend satisfiability problems
mation. to parsing problems; we formalize the notion of well-or-
This paper suggests to interpret derivation trees as dered derivation trees as the structures under investiga-
ordered trees. It introduces the notion of well-ordered tion in these problems, and show how their solutions can
derivation trees: A derivation tree is called well-ordered be obtained by solving a CSP. We illustrate this approach
if its nodes stand in the same precedence relation as
by means of an example in §4. §5 discusses the two per-
the anchors in the corresponding derived tree. Because
spectives of our approach mentioned above. Finally, §6
TAG can generate non-context-free languages, well-or-
concludes the paper and presents ideas for future work.
dered derivation trees can be non-projective, i.e., they can
contain “crossing” edges. The main contribution of this
2 TAG Satisfiability Problem
paper is an axiomatization of the exact form of non-pro-
jectivity licensed by TAG operations. It thereby provides a There are two major and constrasting formal perspectives
novel model-theoretic interpretation of the LTAG parsing on parsing: proof-theoretic and model-theoretic. The for-
problem. mer emphasizes the construction of logical derivations,
The axiomatization of well-ordered derivation trees is while the latter more directly states how models sat-
put into practice in a description-based approach to TAG isfy descriptions. The model-theoretic perspective (Cor-
parsing, in which the parsing problem of strongly lexical- nell and Rogers, 1998) applied to a description-based
1ized TAGs is interpreted as a model enumeration prob- specification of parsing problems is often more readily
1 amenable to constraint-based processing. This is the viewA TAG is called strongly lexicalized, if each of its elementary
trees contains exactly one anchor. which we adopt for the remainder of this paper.
ffAs a first step in our description-based approach to TAG The notion of TAG satisfiability problem as outlined
parsing, we formulate the TAG satisfiability problem: above is implicit in (Koller and Striegnitz, 2002), who
formulate the surface realization problem of natural lan-Given a multiset of elementary trees, can they be com-
guage generation as the configuration problem of (un-bined using operations of adjunction and substitution to
ordered) dependency trees. A natural question is whetherform a complete derived tree?
this treatment can be extended to parsing problems.
To formally express TAG satisfiability problems, we in-
Given our formalization of TAG satisfiability problems,
troduce the following description language:
parsing problems cannot be expressed directly, as the
models under consideration—derivation trees—are un-′::= w : | ∧ , (1)
ordered trees. In order to express word order, a more natu-
ral class of models are derived trees, as these encode wordwhere w is taken from a set of tree variables, and from
order information in a direct way. However, the problema set of TAG elementary trees of some grammar. We call
in using derived trees is that the formalization of the sat-w : a tree literal. We say that is normal if every tree
isfaction relation becomes non-trivial, as the adjunctionvariable in appears precisely in one tree literal.
operation now requires a more complicated interpretationIt is well-known that the satisfiability problem is equiv-
of elementary trees—not as atomic entities, but as groupsalent to the existence of a derivation tree, hence the idea
of nodes that may get separated by material being “gluedto use derivation trees as models of normal descrip-
in between” by adjunction. If not conditioned carefully,tions. In order to make this more precise, we need some
this might lead to a formalism that is more expressivedefinitions:
than TAG (Muskens, 2001; Rogers, 2003).Let be the set of finite paths, i.e. the finite sequences
We suggest to solve the problem by considering deriva-of positive integers. For simplicity in the following, we
tion trees as being ordered. In the next section, we willshall identify a node of a tree with the path ∈ that
introduce the notion of well-ordered derivation trees,leads to it starting from the root of said tree.
which are possibly non-projective, ordered derivationA derivation tree is a tree(V,E) formed from vertices
trees whose precedence relation agrees with the prece-V and labeled edges E⊆ V×V× . A model of a normal
dence relation in the corresponding derived tree. This al-description w : ∧...∧w : is a derivation tree where1 1 n n
2 lows for an extension of the description language from (1)V ={w ,...,w } and such that the following conditions1 n
with precedence literals, which can be interpreted onare satisfied, where we write w − → w for an edge1 2
(well-ordered) derivation trees in a straightforward way.labeled with path and representing either an adjunction
or a substitution of at node of :2 1
3 Well-ordered Derivation Trees
• If w is the root of the tree, then must be an initialk k
Our ambition is to tackle the parsing problem wheretree.
word-order is part of the input specification. To this end,
• For each w , its outgoing edges must all be labeledi we formulate the TAG parsing problem analogously to our
with distinct paths, and for each substitution (resp. earlier definition of the TAG satisfiability problem:
adjunction) node in there must be exactly (resp.i
Given an ordered multiset of elementary trees, can they3at most) one -labeled edge.
be combined using operations of adjunction and substitu-
tion to form a complete derived tree where the respective• For each edge w − → w , if is a substitution1 2
anchors are realized in the same order as stipulated by(resp. adjunction) node in , then must be an ini-1 2
the input specification?tial (resp. auxiliary) tree.
To formally express the parsing problem, we extendIn order to model lexical ambiguity, the description
′our description language with precedence literals w≺ w :language can be extended with a limited form of disjunc-
tion, using for example the following extended language: ′ ′::= w : | w≺ w | ∧ (2)
k k ′::= w :{ ,..., } | ∧ ,k 1 nk ′ ′where w≺ w means that w’s anchor must precede w ’s.
For the same reasons as before, the approach that we willwhere the set is to be interpreted disjunctively.
develop for this language will trivially extend to one with
2Thus setting the interpretation of tree variables to be the iden- lexical ambiguity.
tity substantially simplifies the presentation.