13
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Découvre YouScribe en t'inscrivant gratuitement
Découvre YouScribe en t'inscrivant gratuitement
13
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Parsing as AbstractInterpretation
of Grammar Semantics
Patrick Cousot Radhia Cousot
D´ epartement d’informatique Laboratoire d’informatique
´ ´Ecole normale sup´erieure CNRS & Ecole polytechnique
45, rue d’Ulm 91128 Palaiseau cedex
75230 Paris cedex 05, France France
cousot@ens.fr rcousot@lix.polytechnique.fr
http://www.di.ens.fr/~cousot http://lix.polytechnique.fr/~rcousot
Received 15 January 2001, accepted 16 August 2001
Abstract
Earley’s parsing algorithm is shown to be an abstract interpretation
of a refinement of the derivation semantics of context-free grammars.
1Introduction
Abstract interpretation is a theory of the approximation of the mathematical
structures involved in the formalization of the semantics of computer systems
[6]. It offers a unifying point of view on static program analysis [4] (including
data flow analysis [6, 8]and typing [2]) of specification and programming lan-
guages, model-checking [8], etc. Following this synthetic point of view, we show
that Earley’s parsing algorithm [9] can be formally designed by abstract inter-
pretation of a refinement of thederivation semantics of context-free grammars.
2Context-freeGrammars, Derivations, Gener-
ated Language and Parsing
The set of finite words on an alphabet A is denoted A.Thisincludes the
empty word .A language on the alphabetA is a subset ofA .A context-free
grammar G is a quadruple N ,T ,P,A,where:
• X,Y,...∈N is the finite set of nonterminals;
• the distinguished nonterminal A∈N is the axiom;
• a, b,...∈T,such thatT∩N =∅,is the finite set of terminals;
∆
•V =(N∪T )\{A} is the vocabulary;
• α, β,...∈V is the set of finite words on the vocabularyV;
1•P⊆N×V is the finite set of productions, X,α ∈P being written
G
X −→ α.
Observe thattheaxiom A cannot appear on the righthand side α of productions
X, α.Thisrestriction can be easily bypassed by introducing a new axiom A
Gsuch that A −→ A.
G
The semantics of a grammarG can be defined as the derivation relation =⇒
which is the least relation such that a nonterminal derives to the righthand side
of any of its productions, as specified by the following axiom schema (X∈N,
α∈V ):
G G
X =⇒ α, whenever X −→ α (1)
and a word derives to another word by replacement of a nonterminal by any one
of its derivations, as specified by the following inference rule schema:
G G
X =⇒ αY γ, Y =⇒ β ,X,Y ∈N,α,β,γ∈V . (2)
G
X =⇒ αβγ
G
The leftmost derivation =⇒ is defined in the same way but for the nonter-
minal replacement which is restricted to the leftmost nonterminal:
G G
X =⇒ α, whenever X −→ α (3)
G G
X =⇒ αY γ, Y =⇒ β ,X,Y ∈N,α∈T ,β,γ∈V (4)
G
X =⇒ αβγ
G
Similarly, the leftmost derivation from the axiom =⇒ is the restrictionA,
G
of the leftmost derivation =⇒ to nonterminals deriving from the grammar ax-
iom:
G G
A =⇒ α, whenever A −→ α (5)A,
G
X =⇒ αY γA, G ,X,Y ∈N,α∈T ,γ∈V ,Y −→ β (6)
G
Y =⇒ βA,
G G
X =⇒ αY γ, Y =⇒ βA, A, ,X,Y ∈N,α∈T ,β,γ∈V (7)
G
X =⇒ αβγA,
The language L generated by a grammar G is the set of terminal wordsG
deriving from the axiom A:
∆ GL = {α∈T | A =⇒ α} . (8)G
Equivalently, the language generated by a grammar can be defined using the
leftmost derivation [12, Theorem 4.1.1]:
GL = {α∈T | A =⇒ α} . (9)G
Equivalently, we can also use the leftmost derivation from the axiom:
2Lemma 1
GL = {α∈T | A =⇒ α} . (10)G A,
G G
Proof Obviously if wehaveproved X =⇒ α we can prove X =⇒ α usingA,
(3) for either (5) or (6) and (4) for (7).
G Reciprocally, we prove that if X = A or ∃η∈T ,ξ∈V : A =⇒ ηXξ and
G G
X =⇒ δ then X =⇒ δ. A,
G
The proof is on the length of the proof of X =⇒ δ by the formalsystem(3)
–(4).
G G
• If we have proved X =⇒ δ by (5) then X −→ δ and there are two
subcases:
G
– If X = A then X =⇒ δ follows from (5);A,
G – Otherwise, there exist η∈T and ξ∈V such that A =⇒ ηXξ.So,
G G
by induction, we can prove that A =⇒ ηXξ whence X =⇒ δ byA, A,
(6).
G • Otherwise, we have proved X =⇒ δ by (4) so we have δ = αβγ, α∈T
G G
andwemadesubproofs for X =⇒ αY γ and Y =⇒ β.Therearenow
two subcases:
G G
– If X = A then by induction X =⇒ αY γ that is A =⇒ αY γ withA, A,
Gα∈T so that again by induction Y =⇒ β.By(7),weconcludeA,
G G
that X =⇒ αβγ that is X =⇒ δ;A, A,
G – Otherwise, there exist η∈T and ξ∈V such that A =⇒ ηXξ.By
G G X =⇒ αY γ and (4), it follows that A =⇒ ηαY γξ with ηα∈T .
Hence we can apply the induction hypothesis and therefore prove
G G
that Y =⇒ β.By(7),weconclude that X =⇒ αβγ,whenceA, A,
G
X =⇒ δ.A,
G G
We conclude that A =⇒ α if and only if A =⇒ α so that (9) implies (10). A,
Parsing of a given terminal word ω∈T for a given grammarG consists in
deciding whether this word ω belongs to the language generated by the grammar
G: ω∈L .G
3Fixpoint Semantics of Formal Systems
It is well-known that formal systems specify a least fixpoint [1, 7]. The axioms
∆
and rule schemata of a formal system are interpreted as rule instances Φ =
Pi
i∈ ∆ on a given universe U where for all i∈ ∆, P⊆U is the premise
ci
(which is the empty set∅ for axiom instances) and c ∈U is the conclusion of thei
3Pi
rule instance .Thesubset of the universeU specified by the formal system
ci
⊆∆
Φisdefined as its semantics Φ=lfp F where the consequence operator:Φ
∆
F (X) = {c | i∈ ∆∧ P ⊆ X} (11)Φ i i
is the set of valid consequences of the hypothesis X.Theconsequence operator
⊆
F on ℘(U)is ⊆-monotonic so that the least fixpoint lfp F does exist [13].Φ Φ
The fixpoint semantics is equivalent to the more traditional one based on formal
proofs [1].
For example the formal system (5) – (7) defines the leftmost derivation from
the grammar axiom as:
⊆G G=⇒ =lfp D , (12)A, A,
∆ GG
D (R) = { A, α | A −→ α}A,
G∪{ Y,β | X,αY γ ∈ R∧ α∈T ∧ Y −→ β}
∪{ X, αβγ | X, αY γ ∈ R∧ α∈T ∧ Y,β ∈ R} .
4Earley’sParsing Algorithm
4.1 Earley’s Items
Given a terminal word ω∈T , ω = ω ...ω , n≥0(which is when n =0),1 n
Earley’s parsing algorithm [9, 11] involves Earley’s items which are quintuples
written:
X → α·β,i,j
G
where X −→ αβ is a production of the given grammarG and 0≤ i≤ j≤ n.A
G
valid Earley’s item is an assertion or judgement stating that α =⇒ ω ...ωi+1 j
G
(that is α =⇒ when i = j). Valid Earley’s items are derived left to right
E
and top-down starting from the grammar axiom. The setI of valid Earley’sG,ω
items for the gammarG andinput word ω is specified by the formal system (13)
–(16) below.
4.2 Rule-Based Specification of Earley’s Parsing Algorithm
The initialization axioms are instances of the following schema (for all produc-
G
tions A −→ γ of the grammar axiom A):
A→ ·γ, 0, 0 . (13)
The derivation rules are instances of the following schema (for all productions
G G
X −→ αY β and Y −→ γ of the grammarG and 0≤ i≤ j≤ n):
X → α·Yβ,i,j
. (14)
Y → ·γ,j,j
G G
The reduction rule schema is (for all productions X −→ αY β and Y −→ γ of
the grammarG and 0≤ k≤ i≤ j≤ n):
X → α·Yβ,k,i, Y → γ·,i,j
. (15)
X → αY ·β,k,j
4
G
The advance rule schema