7
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Découvre YouScribe en t'inscrivant gratuitement
Découvre YouScribe en t'inscrivant gratuitement
7
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Extensible Dependency Grammar: A New Methodology
Ralph Debusmann Denys Duchier Geert-Jan M. Kruijff
´Programming Systems Lab Computational LinguisticsEquipe Calligramme
Saarland University Saarland UniversityLORIA – UMR 7503
Postfach 15 11 50 Postfach 15 11 50Campus Scientifique, B. P. 239
66041 Saarbru¨cken 66041 Saarbru¨cken54506 Vandœuvre le`s Nancy CEDEX
Germany GermanyFrance
rade@ps.uni-sb.de gj@coli.uni-sb.deduchier@loria.fr
Abstract • number of dimensions: two in TDG (ID and
LP), arbitrary many in XDG
This paper introduces the new grammar formalism
of Extensible Dependency Grammar (XDG), and • set of principles: fixed in TDG, extensible
emphasizes the benefits of its methodology of ex- principle library in XDG
plaining complex phenomena by interaction of sim-
The structure of this paper is as follows: In§2, weple principles on multiple dimensions of linguis-
introduce XDG formally, and also the XDG solvertic description. This has the potential to increase
used for parsing and generation. In §3, we intro-modularity with respect to linguistic description and
duce a number of XDG principles informally, be-grammar engineering, and to facilitate concurrent
fore making use of them in an idealized example
processing and the treatment of ambiguity.
grammar in §4. In §5 we argue why XDG has the
potential to be an improvement over multi-stratal1 Introduction
and mono-stratal approaches, before we conclude in
We introduce the new grammar formalism of Exten- §6.
sible Dependency Grammar (XDG). In XDG, com-
plex phenomena arise out of the interaction of sim- 2 Extensible Dependency Grammar
ple principles on multiple dimensions of linguis- In this section, we introduce XDG formally and
tic description. In this paper, we point out how mention briefly the constraint-based XDG solver for
this novel methodology positions XDG in between parsing and generation.
multi-stratal approaches like LFG and MTT, and
2.1 Formalizationmono-stratal ones like HPSG, attempting to com-
bine their benefits and avoid their problems. Formally, an XDG grammar is built up of dimen-
It is the division of linguistic analyses into dif- sions, a lexicon and principles, and characterizes a
ferent dimensions which makes XDG multi-stratal. set of well-formed analyses.
On the other, XDG is mono-stratal in that its princi- A dimension is a tuple D=(Lab,Fea,Val,Pri) of
ples interact to constrain all dimensions simultane- a set Lab of edge labels, a set Fea of features, a set
ously. XDG combines the benefits of these two po- Val of feature values, and a set of one-dimensional
sitions, and attempts to circumvent their problems. principles Pri. A lexicon for the dimension D is a
From multi-stratal approaches, XDG adopts a high set Lex ⊆ Fea→ Val of total feature assignments
degree of modularity, both with respect to linguis- called lexical entries. An analysis on dimension
tic description as well as for grammar engineering. D is a triple (V,E,F) of a set V of nodes, a set
This also facilitates the statement of cross-linguistic E ⊆ V×V× Lab of directed labeled edges, and an
generalizations. XDG avoids the problem of placing assignment F : V →(Fea→ Val) of lexical entries
too high a burden on the interfaces, and allows in- to nodes. V and E form a graph. We write Ana forD
teractions between all and not only adjacent dimen- the set of all possible analyses on dimension D. The
sions. From mono-stratal approaches, XDG adopts principles characterize subsets of Ana . We assumeD
a high degree of integration, facilitating concurrent that the elements of Pri are finite representations of
processing and the treatment of ambiguity. At the such subsets.
nsame time, XDG does not lose its modularity. An XDG grammar((Lab ,Fea ,Val ,Pri) ,Pri,i i i i i=1
XDG is a descendant of Topological Depen- Lex) consists of n dimensions, multi-dimensional
dency Grammar (TDG) (Duchier and Debusmann, principles Pri, and a lexicon Lex. An XDG analysis
n2001), pushing the underlying methodology further (V,E ,F) is an element of Ana = Ana ×···×i i 1i=1
by generalizing it in two aspects: Ana where all dimensions share the same set ofnnodes V . We call a dimension of a grammar gram- duce some of the most important one-dimensional
mar dimension. and multi-dimensional principles.
Multi-dimensional principles specify subsets of
3.1 Tree principleAna, i.e. of tuples of analyses for the individual di-
mensions. The lexicon Lex⊆ Lex ×···×Lex con-1 n tree(i) The analysis on dimension i must be a tree.
strains all dimensions at once, thereby synchroniz-
The tree principle is non-lexicalized and
ing them. An XDG analysis is licensed by Lex iff
parametrized by the dimension i.
(F (v),...,F (v))∈ Lex for every node v∈ V .1 n
In order to compute analyses for a given input, 3.2 Dag principle
we employ a set of input constraints (Inp), which
dag(i) The analysis on dimension i must be a di-again specify a subset of Ana. XDG solving then
rected acyclic graph.amounts to finding elements of Ana that are licensed
The dag principle is non-lexicalized andby Lex, and consistent with Inp and Pri. The input
parametrized by the dimension i.constraints determine whether XDG solving is to be
used for parsing or generation. For parsing, they
3.3 Valency principlespecify a sequence of words, and for generation, a
multiset of semantic literals. valency(i,in ,out) All nodes on dimension i musti i
satisfy their in and out specifications.
2.2 Solver
The valency principle is lexicalized and serves
XDG solving has a natural reading as a constraint to lexically describe dependency graphs. It is
satisfaction problem (CSP) on finite sets of integers, parametrized by the dimension i, the in specification
where well-formed analyses correspond to the solu- in and the out specification out . For each node, ini i i
tions of the CSP (Duchier, 2003). We have imple- stipulates the licensed incoming edges, and out thei
mented an XDG solver using the Mozart-Oz pro- licensed outgoing edges.
gramming system (Mozart Consortium, 2004).
In the example grammar lexicon part in Figure 1
XDG solving operates on all dimensions concur-
below, the in specification is in and out isID ID
rently. This means that the solver can infer informa- the out specification on the ID dimension. For the
tion about one dimension from information on an-
common noun Roman, the in specification licenses
other, if there is either a multi-dimensional principle zero or one incoming edges labeled subj (subj?),
linking the two dimensions, or by the synchroniza- and zero or one incoming edges labeled obj (obj?).
tion induced by the lexical entries. For instance, not The out specification requires precisely one outgo-
only can syntactic information trigger inferences in ing edge labeled det (det!).
syntax, but also vice versa.
Because XDG allows us to write grammars with 3.4 Government principle
completely free word order, XDG solving is an
government(i,cases ,govern ) All edges in dimen-i iNP-complete problem (Koller and Striegnitz, 2002).
sion i must satisfy the government specification of
This means that the worst-case complexity of the
the mother.
solver is exponential. The average-case complexity
The government principle is lexicalized. Its pur-of many smaller-scale grammars that we have ex-
pose is to constrain the case feature of a depen-perimented with seems polynomial, but it remains
1dent. It is parametrized by the dimension i, theto be seen whether we can scale this up to large-
cases specification cases and the government spec-iscale grammars.
ification govern. cases assigns to each word a set of
possible cases, and govern a mapping from labels to3 Principles
sets of cases.
The well-formedness conditions of XDG analy-
In Figure 1, the cases specification for the deter-
ses are stipulated by principles. Principles are
miner den is {acc} (i.e. it can only be accusative).
parametrizable, e.g. by the dimensions on which
By its government specification, the finite verb ver-
they are applied, or by lexical features. They can
sucht requires its subject to exhibit nominative case
be lexicalized or non-lexicalized, and can be one-
(subj →{nom}).
dimensional or multi-dimensional. Principles are
taken from an extensible principle library. So far,
1We restrict ourselves to the case feature only for simplicity.
the set of possible principles is unrestricted, and to In a fully-fledged grammar, the government principle would be
find restrictions for them is a topic for future re- used to constrain also other morphological aspects like number,
search. In the following two subsections, we intro- person and gender.3.5 Agreement principle 3.9 Linking principle
linking(i, j,link ) All edges on dimension i mustagreement(i,cases ,agree ) All edges in dimensioni i, ji
satisfy the linking specification of the mother.i must satisfy the agreement specification of the
The linking principle is lexicalized and two-mother.
dimensional. It is parametrized by the two dimen-The agreement principle is lexicalized. Its pur-
sions i and j, and by the link