195
pages
English
Documents
2010
Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres
195
pages
English
Documents
2010
Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres
Publié par
Publié le
01 janvier 2010
Nombre de lectures
50
Langue
English
Poids de l'ouvrage
1 Mo
Publié par
Publié le
01 janvier 2010
Langue
English
Poids de l'ouvrage
1 Mo
XML Functional Dependencies
based on Tree Homomorphisms
Doctoral Thesis
Presented in partial fulfilment of the requirements for the degree of
Doctor rerum naturalium (Dr. rer. nat.)
at the
Faculty of Mathematics/Informatics and Mechanical Engineering,
Clausthal University of Technology
Diem-Thu Trinh
June 2009Supervisor: Prof. Dr. Sven Hartmann
Co-Supervisor: Prof. Dr. Sebastian Link (Victoria University of Wellington)
Date of the Oral Examination: July 3, 2009
Chairman of the Board of Examiners: Prof. Dr. Ju¨rgen DixAbstract
Functional dependencies form an important class of integrity constraints which
hasbeenwellstudiedinthecontextofrelationaldatabases. Recently, theeXten-
sible Markup Language (XML) has emerged as a popular and flexible standard
for modelling and managing data in a variety of domains. Although several at-
tempts have been made to generalise the concept of functional dependencies to
the context of XML, the search for an interesting yet practical formalisation of
XML functional dependencies still poses a challenge.
In this dissertation I propose and investigate a new class of XML functional
dependencies with properties (called pXFDs) defined on the basis of tree homo-
morphism. My first contribution is to demonstrate that one can reason about
pXFDs efficiently. Through establishing a semantic equivalent between pXFD
implication and logical consequence of propositional Horn clauses, I show that
the problemofpXFD implicationcanbedecided intime linearinthetotalnum-
ber of essential v-properties in a set of pXFDs. My second contribution is to
identify and prove a sound and complete axiomatisation for pXFDs. This yields
an alternative approach for deciding the implication problem for pXFDs.
Based on these findings, I investigate the application of pXFDs in classifying
well-designed XML schemas. The absence of redundancy is used as a suitable
indicator of “good” design. For that I generalise two notions of redundancy
(i.e., fact and value redundancy) from the relational data model and identify
two corresponding normal forms for characterising them. I also show that the
proposed normal forms can be checked efficiently: it is sufficient to examine
pXFDs in a cover set rather than the set of all implied pXFDs.
Finally, Iconsider theproblem ofpXFDacquisition anddiscovery. To beableto
benefit from pXFDs, it is imperative to have a goodspecification of pXFDs that
describe the semantics of the data to be stored in an XML database. Discovery
and acquisition are two complementary tasks which help in determining such
a specification. The logical characterisation of pXFD implication allows me to
develop a sample-based approach to support pXFD acquisition. For pXFD dis-
covery, ontheotherhand,Iadaptamethodusingdifferencesetsandhypergraph
transversals in the context of XML.
My investigations also demonstrate that pXFDs exhibit certain characteristics
which are reminiscent of functional dependencies in the relational data model.
iiiivContents
Abstract iii
List of Figures vii
List of Tables ix
List of Algorithms xi
Symbols and Notations xiii
1 Introduction 1
1.1 XML Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.1 XML Schema Tree and XML Data Tree . . . . . . . . . . . . . . 4
1.1.2 Subgraph Terminology . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 The Role of Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 XML Functional Dependencies with Properties . . . . . . . . . . . . . . . 9
1.3.1 Examples Illustrating Expressiveness of pXFDs . . . . . . . . . . 10
1.3.2 Implication and Derivation of pXFDs . . . . . . . . . . . . . . . . 12
1.3.3 Observations About Property-Equality . . . . . . . . . . . . . . . 12
1.3.4 Canonical pXFDs . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2 Semantic Equivalence Theorem for pXFDs 25
2.1 Relating Canonical pXFD Implication to Horn Clauses . . . . . . . . . . 27
2.1.1 Horn Clause Encoding . . . . . . . . . . . . . . . . . . . . . . . . 28
2.1.2 Semantic Equivalence of Canonical pXFDs to Logic . . . . . . . . 29
2.1.3 Construction of Two-v-pre-image Data Tree . . . . . . . . . . . . 35
2.2 Application of the Semantic Equivalence . . . . . . . . . . . . . . . . . . 47
2.2.1 Deciding the Implication Problem for pXFDs . . . . . . . . . . . 47
2.2.2 Sound and Complete Axiomatisation . . . . . . . . . . . . . . . . 50
2.2.3 Assistance with Constraints Acquisition . . . . . . . . . . . . . . 65
2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
vvi CONTENTS
3 Dependency Discovery 71
3.1 Transversal Approach for pXFD Discovery . . . . . . . . . . . . . . . . . 74
3.1.1 Canonical pXFD-Cover for Representing Satisfied pXFDs . . . . . 74
3.1.2 Minimal Transversals for Finding Canonical pXFD-Cover . . . . . 81
3.2 Transversal Approach: In Details . . . . . . . . . . . . . . . . . . . . . . 91
3.2.1 Finding Minimal Transversals . . . . . . . . . . . . . . . . . . . . 91
3.2.2 Finding Difference Sets from Dual Agree Sets . . . . . . . . . . . 100
3.2.3 Finding Candidate RHSs . . . . . . . . . . . . . . . . . . . . . . . 103
3.3 Finding Agree Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.3.1 Finding⊑-maximal Essential v-Ancestors in Agree Sets . . . . . . 124
3.3.2 Finding⊑-maximal Essential v-Subgraphs in Agree Sets . . . . . 126
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4 Redundancy and Normal Forms 151
4.1 Fact Redundancy and FR-pXNF . . . . . . . . . . . . . . . . . . . . . . 152
4.2 Value Redundancy and VR-pXNF . . . . . . . . . . . . . . . . . . . . . . 157
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5 Conclusion and Future Work 165
Bibliography 169
Acknowledgements 177
Index 178List of Figures
1.1 XML document about entries into photography competitions. . . . . . . 2
′1.2 XML data tree T corresponding to XML document in Figure 1.1. . . 3photo
′1.3 XML data tree T with dance school information. . . . . . . . . . . . 4dance
1.4 XMLschematreeT andsomeofitssubgraphs: walkboyandsubgraphdance
{boy,girl}. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
′1.5 XML tree for the pre-image treei ofv in T , and its projection to2 class dance
v -subgraph{boy,girl} . . . . . . . . . . . . . . . . . . . . . . . . . . 7class
1.6 Counter-example construction when X,Y are not v-reconcilable. . . . . . 15
1.7 Dance school schema tree showing its four v -units . . . . . . . . . . . 19class
2.1 Fagin’s Semantic and Syntactic Proof of the semantic equivalence between
FDs and propositional logic. . . . . . . . . . . . . . . . . . . . . . . . . . 26
.W W W W2.2 Pre-image trees p ,p such that p | = p | for some essential v-W W1 2 1 2.W WsubgraphW butp | =p | for everyv-subgraphX properly containedX X1 2
inW. If|W| is even then|W| = 2l andm =l−1, otherwise|W| = 2m−1
and m =l. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 XML schema tree T containing three v-units: U =ABC, U =DE,abstr 1 2
U =F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
.W W W W1 1 1 12.4 Pre-imagetreesp ,p suchthatp | =p | ifandonlyifv-subgraphX X1 2 1 2
X is properly contained in W =AB. . . . . . . . . . . . . . . . . . . . . 411
W W U1 2 12.5 Pre-images p ,p merged to formp . . . . . . . . . . . . . . . . . . . 432 2 2
U U U1 2 32.6 Pre-images p ,p ,p merged to form p . Then p ,p merged to form2 1 22 2 2
′T -compatible data tree T . . . . . . . . . . . . . . . . . . . . . . . 46abstr abstr
2.7 Constraint acquisition by iterative inspection of candidates. . . . . . . . . 66
2.8 Flowchart of the propositional tableaux method for pXFDs acquisition. . 68
′3.1 T -compatible XML data tree T . . . . . . . . . . . . . . . . . 83purchase purchase
3.2 XML schema tree T . . . . . . . . . . . . . . . . . . . . . . . . . . 83purchase
3.3 Minimal transversals computation for D (v ) . . . . . . . . . 99′ PurchaseT
purchase
3.4 Relationship between essential v-ancestors and candidate v-ancestor RHSs. 106
3.5 Hasse diagramfor the poset (candRHS ′(v)| −{∅},⊑)from Example 3.67.122T U
′3.6 Relational representation of XML data tree T . . . . . . . . . . . . 140Purchase
3.7 Summary of pXFD Discovery Process. . . . . . . . . . . . . . . . . . . . 148
vii
6viiiList of Tables
2.1 System F of inference rules for (canonical) pXFDs . . . . . . . . . 50canonical
2.2 System F of inference rules for pXFDs . . . . . . . . . . . . . . . . 61general
2.3 α-rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.4 β-rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
′3.1 Summary of main transversal approach for discovering pXFDs in T . 89purchase
ixx