XyView: Universal Relations Revisited

Dan Vodislav, CNAM/CEDRIC, Paris, France, vodislav@cnam.fr
Sophie Cluet, INRIA, Rocquencourt, France, Sophie.Cluet@inria.fr
Grégory Corona, Xyleme, Paris, France, Gregory.Corona@xyleme.com
Imen Sebei, CNAM/CEDRIC, Paris, France, imen.sebei@cnam.fr

Abstract

We present XyView, a practical solution for fast development of user-oriented (web forms) and machine-oriented (web services) applications over a repository of heterogeneous schema-free XML documents. XyView provides the means to view such a repository as an array that can be queried using a QBE-like interface or through simple selection/projection queries. It extends the concept of universal relations in mainly two ways: (i) the input is not a relational schema but a potentially large set of XML data guides; (ii) the view is not defined explicitly by a unique query but implicitly by various mappings so as to avoid data loss and duplicates generated by joins. Developed on top of the Xyleme content management system, XyView can easily be adapted to any system supporting XQuery.

Keywords: XML views, heterogeneous data integration, application development tools, universal relation

1 Introduction

For decades, companies have produced digital data such as notes, contracts, emails, progress reports, minutes, etc. This data constitutes a mine of useful information that is largely unexploited. The advent of XML provides the opportunity to change that. Many enterprises are now considering storing their in-house data in XML repositories so as to be able to query them in a significant way, i.e., with tools more sophisticated than full-text search engines. In this paper, we address the problem of querying such repositories. More precisely, we are interested in developing, easily and quickly, simple query APIs (web services) or user interfaces (web forms) over these repositories.

An important characteristic of the applications we are considering is that they deal with legacy data that has mostly been produced by human beings using standard text editors. As a result, the data is (i) poorly typed (well-formed rather than valid XML) and (ii) highly heterogeneous (although documents have strong semantic connections). These features are particularly challenging: they call for sophisticated tools to ease the application programmer's task, while at the same time disabling most existing approaches.

The solution we propose borrows from the universal relation paradigm of the seventies [18]: XyView provides the means to easily view a set of heterogeneous XML documents as a single array that can be queried through simple selections and projections. Obviously, the context being XML, the array contains XML subtrees and is built using XQuery. But the fundamental differences between universal relations and our approach are the following:

• The array is not defined by one query but by a specification of how a simple selection-projection user query is to be translated into an XQuery.

  This difference is important. The problem with universal relations is that, unless the database schema has particularly nice properties, which is rarely the case, projection operations generate many duplicates that are not always easy to remove. This is due to the join operations entering the definition of the universal relation. Alternatively, the join operations can also be the cause of missing information. This is usually solved by introducing outer joins, but at the cost of having to deal with null values.

  Note that these problems of data loss and duplicates may occur any time a view is defined as a structured query (SQL or XQuery).

  Our approach is not to define the view as a query but rather as a virtual set of queries that are generated on the fly to fit the user's current requirements. In this way, we avoid incomplete or verbose answers.

• To deal with the complexity of the input data, we define views in two steps. The first step deals with data heterogeneity and maps semantically connected heterogeneous documents into a target structure. At run time, this step generates unions. The second step corresponds to a standard view definition where data is aggregated. At run time, this leads to joins.

  Somewhat similar to a general wrapper-mediator architecture, our view model adds an intermediary level that (i) strongly structures the view by separating unions from joins, and (ii) provides homogeneous XML typing for the universal relation elements.
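
The separation between unions (first step) and joins (second step) can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the XyView implementation: the record layouts, the toy data and the function names (`map_national`, `map_international`, `goals`, `goals_with_bio`) are all hypothetical, and plain dictionaries stand in for XML subtrees.

```python
# Hypothetical toy data: two heterogeneous sources describing goals,
# plus a third source describing players. None of this is XyView's API.
national = [{"Scorer": "Zidane", "Date": "2004-05-22"}]
international = [{"player": "Zidane", "day": "2004-03-15"},
                 {"player": "Raul", "day": "2004-03-15"}]
encyclopedia = [{"Name": "Zidane", "Biography": "..."}]


def map_national(rec):
    """Step 1 mapping: bring a national record to the target structure."""
    return {"scorer": rec["Scorer"], "date": rec["Date"]}


def map_international(rec):
    """Step 1 mapping: bring an international record to the target structure."""
    return {"scorer": rec["player"], "date": rec["day"]}


def goals():
    """Step 1 at run time: a union of the mapped sources."""
    return ([map_national(r) for r in national]
            + [map_international(r) for r in international])


def goals_with_bio():
    """Step 2 at run time: a join between homogeneous collections."""
    bios = {p["Name"]: p["Biography"] for p in encyclopedia}
    return [dict(g, biography=bios[g["scorer"]])
            for g in goals() if g["scorer"] in bios]
```

In this sketch the join is only evaluated when a query actually needs biographies; a query about goals alone would call `goals()` and never lose the scorer who has no encyclopedia entry.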

We implemented XyView as a set of tools on top of the Xyleme [19] XML repository, but it can easily be adapted to any system supporting XQuery. The XyView tools cover the view definition process but also the generation of web form applications and web services. Although its expressive power is limited, as will be explained in this paper, XyView has proved its worth with several industrial applications.

The rest of the paper is organized as follows. The next section presents an example application scenario that illustrates the problem we are addressing. Section 3 describes the XyView model. Section 4 explores the expressiveness and some more subtle features of the model, then Section 5 describes the XyView system that is built on top of an XML repository. The final sections present the related work and explore some future work.

2 Example Application Scenario and Motivation

The example that we present here is a drastic simplification of a real-life application. A sports news company handles several types of news wires. The wires are well-formed XML documents, with no global schema, that have been extracted from text files. These files have been edited by various local correspondents over the years, according to the company's (mostly verbal) editing recommendations. The wires have different structures, depending on the sport and the kind of information they contain.

For lack of space and for ease of understanding, we show here only two such wires about football (soccer), in a simplified form. The first kind reports results from national leagues (e.g., Document 1), and the second reports results from international inter-country games (e.g., Document 2). The news company wants to build an application that queries, through simple web forms, the various football result wires and a sports encyclopedia with detailed information about football players (Document 3).

The application manipulates documents whose structures are similar, but not necessarily identical, to Documents 1, 2 and 3. Notably, other documents may contain more or less information. These three kinds of documents are stored in a single XML content management system, in collections whose respective identifiers are NationalURI, InternationalURI and EncyclopediaURI.

<!-- Document 1: National league result -->
<GameResult>
  <WireHeading>...</WireHeading>
  <Description>Real Madrid 1 - Valencia 0</Description>
  <Date>2004-05-22</Date>
  <Team>
    <Name>Real Madrid</Name>
    <Scored>1</Scored>
    <Scorer>
      <Name>Zidane</Name>
      <Goals>1</Goals>
    </Scorer>
  </Team>
  <Team>
    <Name>Valencia</Name>
    <Scored>0</Scored>
  </Team>
</GameResult>

<!-- Document 2: Inter-country game -->
<Result Date="2004-03-15">
  <Summary>France 1 - Spain 1</Summary>
  <Scorers>
    <Scorer Goals="1">
      <Name>Zidane</Name>
      <Country>France</Country>
    </Scorer>
    <Scorer Goals="1">
      <Name>Raul</Name>
      <Country>Spain</Country>
    </Scorer>
  </Scorers>
</Result>

<!-- Document 3: Sports encyclopedia -->
<Encyclopedia>
  <Football>
    <Player>
      <Name>Zidane</Name>
      <Biography>...</Biography>
    </Player>
    ...
  </Football>
  ...
</Encyclopedia>

The application queries, such as those in Figure 1, may concern football results (Q1), player biographies (Q2), or both (Q3).

These apparently simple queries are in fact rather hard to program in XQuery, as illustrated by Figures 2, 3 and 4 (issues regarding the typing of results are discussed in Section 4; we assume here that queries return simple strings). First, one must find which documents are needed among the various document types in the system, and what their underlying structures are (documents may be schema-free). Then, one must understand which XML elements (and their access paths) are involved in each query, and how to combine them to produce the result (e.g., Q1 simply involves a union while Q3 involves two joins and a union). And last, but not least, the application queries must be correctly expressed in XQuery.

Q1: "Games in which Zidane scored more than once"
Q2: "The biography of Zidane"
Q3: "Biographies of scorers from games on 2004-09-08"

Figure 1: Sample queries
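
To give a feel for why even Q1 requires knowledge of every document structure, here is a minimal Python sketch (not the paper's XQuery, and not XyView code) that evaluates it by hand over Documents 1 and 2 with the standard `xml.etree.ElementTree` module. The function name `games_scored_by`, its `min_goals` parameter, and the choice to return the game's description string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified copies of Documents 1 and 2 (the WireHeading element is omitted).
NATIONAL = """<GameResult>
  <Description>Real Madrid 1 - Valencia 0</Description>
  <Date>2004-05-22</Date>
  <Team><Name>Real Madrid</Name><Scored>1</Scored>
    <Scorer><Name>Zidane</Name><Goals>1</Goals></Scorer>
  </Team>
  <Team><Name>Valencia</Name><Scored>0</Scored></Team>
</GameResult>"""

INTERNATIONAL = """<Result Date="2004-03-15">
  <Summary>France 1 - Spain 1</Summary>
  <Scorers>
    <Scorer Goals="1"><Name>Zidane</Name><Country>France</Country></Scorer>
    <Scorer Goals="1"><Name>Raul</Name><Country>Spain</Country></Scorer>
  </Scorers>
</Result>"""


def games_scored_by(player, min_goals, national_docs, international_docs):
    """Union of two structure-specific branches, one per document type."""
    results = []
    # Branch 1: national wires keep the goal count in a <Goals> child element.
    for text in national_docs:
        root = ET.fromstring(text)
        if any(s.findtext("Name") == player
               and int(s.findtext("Goals", default="0")) >= min_goals
               for s in root.findall("./Team/Scorer")):
            results.append(root.findtext("Description"))
    # Branch 2: international wires keep it in a Goals attribute.
    for text in international_docs:
        root = ET.fromstring(text)
        if any(s.findtext("Name") == player
               and int(s.get("Goals", "0")) >= min_goals
               for s in root.findall("./Scorers/Scorer")):
            results.append(root.findtext("Summary"))
    return results
```

Calling `games_scored_by("Zidane", 2, [NATIONAL], [INTERNATIONAL])` corresponds to Q1. Each additional document structure would need its own branch with its own access paths, which is the burden the text describes.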

Programmers of graphical user interfaces are not database experts. They are usually more comfortable with Java, servlets, stylesheets, etc. than with database schemas or XQuery. Yet, they have to program (and modify) many queries to satisfy their customers' needs. Our objective with XyView is to optimize their productivity by allowing them to view the database as something as simple as a query form consisting of fields that can be used to filter or extract data.

Note that this is an old idea: universal relations in the seventies addressed the same problem. The database was viewed as a single relation queried using simple selections and projections.
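
As a reminder of how a universal relation behaves, the following Python sketch builds one as a join over two toy tables and queries it with a selection-projection helper. The tables, the game identifiers and the helper name `select_project` are invented for illustration; the sketch only aims to reproduce the two classical side effects of defining the view as a single join query, namely duplicates and missing information.

```python
# Hypothetical toy tables: games and the goals scored in them.
games = [("g1", "2004-03-15"), ("g2", "2004-05-22"), ("g3", "2004-06-01")]
goals = [("g1", "Zidane"), ("g1", "Raul"), ("g2", "Zidane")]

# The universal relation, defined once and for all as an (inner) join.
universal = [(gid, date, scorer)
             for gid, date in games
             for g, scorer in goals
             if g == gid]


def select_project(rows, predicate, columns):
    """Keep the rows satisfying the predicate, then keep the given columns."""
    return [tuple(row[c] for c in columns) for row in rows if predicate(row)]


# Projection on the date column only: the date of g1 appears twice
# (one row per scorer), and g3, which has no scorer, is lost altogether.
dates = select_project(universal, lambda row: True, [1])

# Selection on the scorer, projection on the game identifier.
zidane_games = select_project(universal, lambda row: row[2] == "Zidane", [0])
```

An outer join would bring g3 back, but only by introducing a null scorer that every query would then have to handle.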

Yet, there is a crucial difference between XyView and universal re