Parsing as Abstract Interpretation of Grammar Semantics

icon

13

pages

icon

English

icon

Documents

Écrit par

Publié par

Lire un extrait
Lire un extrait

Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris
icon

13

pages

icon

English

icon

Documents

Lire un extrait
Lire un extrait

Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus

Niveau: Supérieur
Parsing as Abstract Interpretation of Grammar Semantics Patrick Cousot Departement d'informatique Ecole normale superieure 45, rue d'Ulm 75230 Paris cedex 05, France Radhia Cousot Laboratoire d'informatique CNRS & Ecole polytechnique 91128 Palaiseau cedex France Received 15 January 2001, accepted 16 August 2001 Abstract Earley's parsing algorithm is shown to be an abstract interpretation of a refinement of the derivation semantics of context-free grammars. 1 Introduction Abstract interpretation is a theory of the approximation of the mathematical structures involved in the formalization of the semantics of computer systems [6]. It o?ers a unifying point of view on static program analysis [4] (including data flow analysis [6, 8] and typing [2]) of specification and programming lan- guages, model-checking [8], etc. Following this synthetic point of view, we show that Earley's parsing algorithm [9] can be formally designed by abstract inter- pretation of a refinement of the derivation semantics of context-free grammars. 2 Context-free Grammars, Derivations, Gener- ated Language and Parsing The set of finite words on an alphabet A is denoted A. This includes the empty word . A language on the alphabet A is a subset of A. A context-free grammar G is a quadruple ?N , T , P , A?, where: • X, Y, .

  • earley's parsing

  • valid earley's

  • g?? ??

  • a?

  • context

  • grammar axiom

  • words deriving

  • free grammars

  • maps ? ?

  • ?x ?


Voir icon arrow

Publié par

Nombre de lectures

11

Langue

English

Parsing as AbstractInterpretation
of Grammar Semantics
Patrick Cousot Radhia Cousot
D´ epartement d’informatique Laboratoire d’informatique
´ ´Ecole normale sup´erieure CNRS & Ecole polytechnique
45, rue d’Ulm 91128 Palaiseau cedex
75230 Paris cedex 05, France France
cousot@ens.fr rcousot@lix.polytechnique.fr
http://www.di.ens.fr/~cousot http://lix.polytechnique.fr/~rcousot
Received 15 January 2001, accepted 16 August 2001
Abstract
Earley’s parsing algorithm is shown to be an abstract interpretation
of a refinement of the derivation semantics of context-free grammars.
1Introduction
Abstract interpretation is a theory of the approximation of the mathematical
structures involved in the formalization of the semantics of computer systems
[6]. It offers a unifying point of view on static program analysis [4] (including
data flow analysis [6, 8]and typing [2]) of specification and programming lan-
guages, model-checking [8], etc. Following this synthetic point of view, we show
that Earley’s parsing algorithm [9] can be formally designed by abstract inter-
pretation of a refinement of thederivation semantics of context-free grammars.
2Context-freeGrammars, Derivations, Gener-
ated Language and Parsing
The set of finite words on an alphabet A is denoted A.Thisincludes the
empty word .A language on the alphabetA is a subset ofA .A context-free
grammar G is a quadruple N ,T ,P,A,where:
• X,Y,...∈N is the finite set of nonterminals;
• the distinguished nonterminal A∈N is the axiom;
• a, b,...∈T,such thatT∩N =∅,is the finite set of terminals;

•V =(N∪T )\{A} is the vocabulary;
• α, β,...∈V is the set of finite words on the vocabularyV;
1•P⊆N×V is the finite set of productions, X,α ∈P being written
G
X −→ α.
Observe thattheaxiom A cannot appear on the righthand side α of productions
X, α.Thisrestriction can be easily bypassed by introducing a new axiom A
Gsuch that A −→ A.
G
The semantics of a grammarG can be defined as the derivation relation =⇒
which is the least relation such that a nonterminal derives to the righthand side
of any of its productions, as specified by the following axiom schema (X∈N,
α∈V ):
G G
X =⇒ α, whenever X −→ α (1)
and a word derives to another word by replacement of a nonterminal by any one
of its derivations, as specified by the following inference rule schema:
G G
X =⇒ αY γ, Y =⇒ β ,X,Y ∈N,α,β,γ∈V . (2)
G
X =⇒ αβγ
G
The leftmost derivation =⇒ is defined in the same way but for the nonter-
minal replacement which is restricted to the leftmost nonterminal:
G G
X =⇒ α, whenever X −→ α (3)
G G
X =⇒ αY γ, Y =⇒ β ,X,Y ∈N,α∈T ,β,γ∈V (4)
G
X =⇒ αβγ
G
Similarly, the leftmost derivation from the axiom =⇒ is the restrictionA,
G
of the leftmost derivation =⇒ to nonterminals deriving from the grammar ax-
iom:
G G
A =⇒ α, whenever A −→ α (5)A,
G
X =⇒ αY γA, G ,X,Y ∈N,α∈T ,γ∈V ,Y −→ β (6)
G
Y =⇒ βA,
G G
X =⇒ αY γ, Y =⇒ βA, A, ,X,Y ∈N,α∈T ,β,γ∈V (7)
G
X =⇒ αβγA,
The language L generated by a grammar G is the set of terminal wordsG
deriving from the axiom A:
∆ GL = {α∈T | A =⇒ α} . (8)G
Equivalently, the language generated by a grammar can be defined using the
leftmost derivation [12, Theorem 4.1.1]:
GL = {α∈T | A =⇒ α} . (9)G
Equivalently, we can also use the leftmost derivation from the axiom:
2Lemma 1
GL = {α∈T | A =⇒ α} . (10)G A,
G G
Proof Obviously if wehaveproved X =⇒ α we can prove X =⇒ α usingA,
(3) for either (5) or (6) and (4) for (7).
G Reciprocally, we prove that if X = A or ∃η∈T ,ξ∈V : A =⇒ ηXξ and
G G
X =⇒ δ then X =⇒ δ. A,
G
The proof is on the length of the proof of X =⇒ δ by the formalsystem(3)
–(4).
G G
• If we have proved X =⇒ δ by (5) then X −→ δ and there are two
subcases:
G
– If X = A then X =⇒ δ follows from (5);A,
G – Otherwise, there exist η∈T and ξ∈V such that A =⇒ ηXξ.So,
G G
by induction, we can prove that A =⇒ ηXξ whence X =⇒ δ byA, A,
(6).
G • Otherwise, we have proved X =⇒ δ by (4) so we have δ = αβγ, α∈T
G G
andwemadesubproofs for X =⇒ αY γ and Y =⇒ β.Therearenow
two subcases:
G G
– If X = A then by induction X =⇒ αY γ that is A =⇒ αY γ withA, A,
Gα∈T so that again by induction Y =⇒ β.By(7),weconcludeA,
G G
that X =⇒ αβγ that is X =⇒ δ;A, A,
G – Otherwise, there exist η∈T and ξ∈V such that A =⇒ ηXξ.By
G G X =⇒ αY γ and (4), it follows that A =⇒ ηαY γξ with ηα∈T .
Hence we can apply the induction hypothesis and therefore prove
G G
that Y =⇒ β.By(7),weconclude that X =⇒ αβγ,whenceA, A,
G
X =⇒ δ.A,
G G
We conclude that A =⇒ α if and only if A =⇒ α so that (9) implies (10). A,
Parsing of a given terminal word ω∈T for a given grammarG consists in
deciding whether this word ω belongs to the language generated by the grammar
G: ω∈L .G
3Fixpoint Semantics of Formal Systems
It is well-known that formal systems specify a least fixpoint [1, 7]. The axioms

and rule schemata of a formal system are interpreted as rule instances Φ =

Pi
i∈ ∆ on a given universe U where for all i∈ ∆, P⊆U is the premise
ci
(which is the empty set∅ for axiom instances) and c ∈U is the conclusion of thei
3Pi
rule instance .Thesubset of the universeU specified by the formal system
ci
⊆∆
Φisdefined as its semantics Φ=lfp F where the consequence operator:Φ

F (X) = {c | i∈ ∆∧ P ⊆ X} (11)Φ i i
is the set of valid consequences of the hypothesis X.Theconsequence operator

F on ℘(U)is ⊆-monotonic so that the least fixpoint lfp F does exist [13].Φ Φ
The fixpoint semantics is equivalent to the more traditional one based on formal
proofs [1].
For example the formal system (5) – (7) defines the leftmost derivation from
the grammar axiom as:
⊆G G=⇒ =lfp D , (12)A, A,
∆ GG
D (R) = { A, α | A −→ α}A,
G∪{ Y,β | X,αY γ ∈ R∧ α∈T ∧ Y −→ β}
∪{ X, αβγ | X, αY γ ∈ R∧ α∈T ∧ Y,β ∈ R} .
4Earley’sParsing Algorithm
4.1 Earley’s Items
Given a terminal word ω∈T , ω = ω ...ω , n≥0(which is when n =0),1 n
Earley’s parsing algorithm [9, 11] involves Earley’s items which are quintuples
written:
X → α·β,i,j
G
where X −→ αβ is a production of the given grammarG and 0≤ i≤ j≤ n.A
G
valid Earley’s item is an assertion or judgement stating that α =⇒ ω ...ωi+1 j
G
(that is α =⇒ when i = j). Valid Earley’s items are derived left to right
E
and top-down starting from the grammar axiom. The setI of valid Earley’sG,ω
items for the gammarG andinput word ω is specified by the formal system (13)
–(16) below.
4.2 Rule-Based Specification of Earley’s Parsing Algorithm
The initialization axioms are instances of the following schema (for all produc-
G
tions A −→ γ of the grammar axiom A):
A→ ·γ, 0, 0 . (13)
The derivation rules are instances of the following schema (for all productions
G G
X −→ αY β and Y −→ γ of the grammarG and 0≤ i≤ j≤ n):
X → α·Yβ,i,j
. (14)
Y → ·γ,j,j
G G
The reduction rule schema is (for all productions X −→ αY β and Y −→ γ of
the grammarG and 0≤ k≤ i≤ j≤ n):
X → α·Yβ,k,i, Y → γ·,i,j
. (15)
X → αY ·β,k,j
4
G
The advance rule schema

Voir icon more
Alternate Text