XyView: Universal Relations Revisited



Dan Vodislav, CNAM/CEDRIC, Paris, France, vodislav@cnam.fr
Sophie Cluet, INRIA, Rocquencourt, France, Sophie.Cluet@inria.fr
Grégory Corona, Xyleme, Paris, France, Gregory.Corona@xyleme.com
Imen Sebei, CNAM/CEDRIC, Paris, France, imen.sebei@cnam.fr

Abstract

We present XyView, a practical solution for fast development of user- (web forms) and machine-oriented applications (web services) over a repository of heterogeneous schema-free XML documents. XyView provides the means to view such a repository as an array that can be queried using a QBE-like interface or through simple selection/projection queries. It extends the concept of universal relations in mainly two ways: (i) the input is not a relational schema but a potentially large set of XML data guides; (ii) the view is not defined explicitly by a unique query but implicitly by various mappings so as to avoid data loss and duplicates generated by joins. Developed on top of the Xyleme content management system, XyView can easily be adapted to any system supporting XQuery.

Keywords: XML views, heterogeneous data integration, application development tools, universal relation

1 Introduction

For decades, companies have produced digital data such as notes, contracts, emails, progress reports, minutes, etc. This data constitutes a mine of useful information that is largely unexploited. The advent of XML provides the opportunity to change that. Many enterprises are now considering storing their home data in XML repositories so as to be able to query them in a significant way, i.e., with tools more sophisticated than full text search engines. In this paper, we are addressing the problem of querying such repositories. More precisely, we are interested in developing, easily and quickly, simple query APIs (web services) or user interfaces (web forms) over these repositories.
An important characteristic of the applications we are considering is that they deal with legacy data that have been mostly produced by human beings using standard text editors. As a result, the data is (i) poorly typed (well-formed rather than valid XML) and (ii) highly heterogeneous (although documents have strong semantic connections). These features are particularly challenging since they call for sophisticated tools to ease the application programmer's task while at the same time disabling most existing approaches.

The solution we propose borrows from the universal relation paradigm of the seventies [18]: XyView provides the means to easily view a set of heterogeneous XML documents as a single array that can be queried through simple selections and projections. Obviously, the context being XML, the array contains XML subtrees and is built using XQuery. But the fundamental differences between universal relations and our approach are the following:

  • The array is not defined by one query but by a specification of how a simple selection-projection user query is to be translated into an XQuery.
    This difference is important. The problem with universal relations is that, unless the database schema has particularly nice properties, which is rarely the case, projection operations generate many duplicates that are not always easy to remove. This is due to the join operations entering the definition of the universal relation. Alternatively, the join operations can also be the cause of missing information. This is usually solved by introducing outer-joins but at the cost of having to deal with null values.
    Note that these problems of data loss and duplicates may occur any time a view is defined as a structured query (SQL or XQuery).
    Our approach is not to define the view as a query but rather as a virtual set of queries that are generated on the fly to fit the user's current requirements. In this way, we avoid incomplete or verbose answers.

  • To deal with the complexity of the input data, we define views in two steps. The first deals with data heterogeneity and somehow maps semantically connected heterogeneous documents into a target structure. At run time, this step generates unions. The second step corresponds to a standard view definition where data is aggregated. At run time, this leads to joins.
    Somewhat similar to a general wrapper-mediator architecture, our view model adds an intermediary level that (i) strongly structures the view by separating unions from joins, and (ii) provides homogeneous XML typing for the universal relation elements.
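
The duplicate and data-loss problems described in the first point above can be reproduced on a tiny relational example. The Python sketch below is illustrative only: the two tables, their rows and the helper functions are assumptions made for this example, not material from the paper and not part of XyView.

```python
# Hypothetical tables: names and rows are illustrative assumptions.
players = [
    {"player": "Zidane", "team": "Real Madrid"},
    {"player": "Raul", "team": "Real Madrid"},
    {"player": "Casillas", "team": "Real Madrid"},  # has no goal row
]
goals = [
    {"player": "Zidane", "date": "2004-05-22"},
    {"player": "Zidane", "date": "2004-03-15"},
    {"player": "Raul", "date": "2004-03-15"},
]


def inner_join(left, right, key):
    """View defined by a join: one output row per matching pair."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def project(rows, fields):
    """Projection without duplicate elimination (bag semantics)."""
    return [tuple(row[f] for f in fields) for row in rows]


view = inner_join(players, goals, "player")

# Projecting the joined view on "player" repeats Zidane once per goal row.
players_in_view = project(view, ["player"])

# Casillas has no matching goal row, so the inner join drops him entirely.
missing = {p["player"] for p in players} - {name for (name,) in players_in_view}
```

In this sketch the projection returns Zidane twice, and Casillas disappears from the view. An outer join would keep him, but only by introducing null values for the goal fields.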
We implemented XyView as a set of tools on top of the Xyleme [19] XML repository, but it can easily be adapted to any system supporting XQuery. The XyView tools cover the view definition process but also generation of web form applications and web services. Although its expressive power is limited as will be explained in this paper, XyView has proved its worth with several industrial applications.

The rest of the paper is organized as follows. The next section presents an example application scenario that illustrates the problem we are addressing. Section 3 describes the XyView model. Section 4 explores the expressiveness and some more subtle features of the model, then Section 5 describes the XyView system that is built on top of an XML repository. The final sections present the related work and explore some future work.

2 Example Application Scenario and Motivation

The example that we present here is a drastic simplification of a real-life application. A sports news company handles several types of news wires. The wires are well-formed XML documents, with no global schema, that have been extracted from text files. These files have been edited by various local correspondents over the years, according to the company's (mostly verbal) editing recommendations. The wires have different structures, depending on the sport and the kind of information they contain.

For lack of space and ease of understanding, we show here only two such wires about football (soccer) and in a simplified form. The first considers results from national leagues (e.g., Document 1), and the second results from international inter-country games (e.g., Document 2). The news company wants to build an application that queries through simple web forms the various football results wires and a sports encyclopedia with detailed information about football players (Document 3).

The application manipulates documents whose structures are similar, but not necessarily identical, to Documents 1, 2 and 3. Notably, other documents may have more or less information. These three kinds of documents are stored in a single XML content management system in collections whose respective identifiers are NationalURI, InternationalURI and EncyclopediaURI.

<!-- Document 1: National league result -->
<GameResult>
  <WireHeading>...</WireHeading>
  <Description>Real Madrid 1 - Valencia 0</Description>
  <Date>2004-05-22</Date>
  <Team>
    <Name>Real Madrid</Name>
    <Scored>1</Scored>
    <Scorer><Name>Zidane</Name>
      <Goals>1</Goals>
    </Scorer>
  </Team>
  <Team>
    <Name>Valencia</Name>
    <Scored>0</Scored>
  </Team>
</GameResult>

<!-- Document 2: Inter-country game -->
<Result Date="2004-03-15">
  <Summary>France 1 - Spain 1</Summary>
  <Scorers>
    <Scorer Goals="1">
      <Name>Zidane</Name>
      <Country>France</Country>
    </Scorer>
    <Scorer Goals="1">
      <Name>Raul</Name>
      <Country>Spain</Country>
    </Scorer>
  </Scorers>
</Result>

<!-- Document 3: Sports encyclopedia -->
<Encyclopedia>
  <Football>
    <Player><Name>Zidane</Name>
      <Biography>...</Biography>
    </Player>
    ...
  </Football>
  ...
</Encyclopedia>
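
The two result wires carry the same kind of information under different access paths: the date and the number of goals are child elements in Document 1 but attributes in Document 2. The following sketch shows one way to bring both into a common shape. It assumes Python's standard `xml.etree.ElementTree` module, and the flat row layout (`game`, `date`, `scorer`, `goals`) and the function names are hypothetical choices for this illustration, not the XyView view schema.

```python
import xml.etree.ElementTree as ET

# Simplified copies of Documents 1 and 2 (the elided WireHeading is omitted).
DOC1 = """<GameResult>
  <Description>Real Madrid 1 - Valencia 0</Description>
  <Date>2004-05-22</Date>
  <Team>
    <Name>Real Madrid</Name>
    <Scored>1</Scored>
    <Scorer><Name>Zidane</Name><Goals>1</Goals></Scorer>
  </Team>
  <Team><Name>Valencia</Name><Scored>0</Scored></Team>
</GameResult>"""

DOC2 = """<Result Date="2004-03-15">
  <Summary>France 1 - Spain 1</Summary>
  <Scorers>
    <Scorer Goals="1"><Name>Zidane</Name><Country>France</Country></Scorer>
    <Scorer Goals="1"><Name>Raul</Name><Country>Spain</Country></Scorer>
  </Scorers>
</Result>"""


def national_rows(root):
    """Document 1 access paths: date and goals are child elements."""
    game = root.findtext("Description")
    date = root.findtext("Date")
    return [
        {"game": game, "date": date,
         "scorer": s.findtext("Name"), "goals": int(s.findtext("Goals"))}
        for s in root.findall("Team/Scorer")
    ]


def international_rows(root):
    """Document 2 access paths: date and goals are attributes."""
    game = root.findtext("Summary")
    date = root.get("Date")
    return [
        {"game": game, "date": date,
         "scorer": s.findtext("Name"), "goals": int(s.get("Goals"))}
        for s in root.findall("Scorers/Scorer")
    ]


# The union of both mappings gives one homogeneous list of scorer rows.
rows = national_rows(ET.fromstring(DOC1)) + international_rows(ET.fromstring(DOC2))
```

Each document type needs its own mapping function, which is the kind of per-structure knowledge an application programmer would otherwise have to encode in every query.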
The application queries, as those in Figure 1, may concern football results (Q1), player biographies (Q2), or both (Q3).

These apparently simple queries are in fact rather hard to program in XQuery as illustrated by Figures 2, 3 and 4 (issues regarding the typing of results are discussed in Section 4, we assume here that queries return simple strings). First, one must find what documents are needed among the various document types in the system, and what are their underlying structures (documents may be schema-free). Then, one must understand what are the XML elements (and their access paths) involved in each query, and how to combine them to produce the result (e.g. Q1 simply involves a union while Q3 involves two joins and a union). And last, but not least, the application queries must be correctly expressed in XQuery.

Q1: “Games in which Zidane scored more than once”
Q2: “The biography of Zidane”
Q3: “Biographies of scorers from games on 2004-09-08”
Figure 1: Sample queries

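
To make the shape of Q1 and Q3 concrete, the sketch below expresses them in Python over already-extracted rows rather than in XQuery. It is a stand-in for illustration, not the queries of Figures 2 to 4: the row layout, the sample rows, the placeholder biography text and the function names `q1` and `q3` are all assumptions.

```python
# Hypothetical, already-extracted rows for the three collections.
national = [{"date": "2004-05-22", "scorer": "Zidane", "goals": 1}]
international = [
    {"date": "2004-03-15", "scorer": "Zidane", "goals": 1},
    {"date": "2004-03-15", "scorer": "Raul", "goals": 1},
]
encyclopedia = [{"name": "Zidane", "biography": "(biography elided)"}]


def q1(player, min_goals):
    """Q1-style query: union of both wire types, then a selection."""
    return [r for r in national + international
            if r["scorer"] == player and r["goals"] > min_goals]


def q3(date):
    """Q3-style query: join each wire type with the encyclopedia, then union."""
    def join(wires):
        return [p["biography"] for r in wires for p in encyclopedia
                if r["date"] == date and r["scorer"] == p["name"]]
    return join(national) + join(international)
```

With these sample rows, `q1("Zidane", 1)` is empty because no row records more than one goal, and `q3("2004-03-15")` returns only one biography because Raul has no encyclopedia entry in the sample data.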
Programmers of graphical user interfaces are not database experts. They are usually more comfortable with Java, servlets, stylesheets, etc. than with database schemas or XQuery. Yet, they have to program (and modify) many queries to satisfy their customers' needs. Our objective with XyView is to optimize their productivity by allowing them to view the database as something as simple as a query form consisting of fields that can be used to filter or extract data.
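
The query-form idea can be pictured as a single function that takes fields to filter on and fields to extract. The sketch below is a minimal illustration over an in-memory table; the function name `select_project`, the field names and the rows are hypothetical, and equality is the only filter supported.

```python
def select_project(table, where, fields):
    """Keep rows matching every equality filter, then keep only `fields`."""
    return [
        {f: row[f] for f in fields}
        for row in table
        if all(row.get(k) == v for k, v in where.items())
    ]


# Hypothetical single table standing in for the whole repository.
table = [
    {"game": "Real Madrid 1 - Valencia 0", "date": "2004-05-22", "scorer": "Zidane"},
    {"game": "France 1 - Spain 1", "date": "2004-03-15", "scorer": "Zidane"},
    {"game": "France 1 - Spain 1", "date": "2004-03-15", "scorer": "Raul"},
]

# Form input: filter on the scorer field, extract the game field.
result = select_project(table, {"scorer": "Zidane"}, ["game"])
```

A form built this way needs no knowledge of document structures or join conditions from its programmer, only the list of available fields.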

Note that this is an old idea: universal relations in the seventies addressed the same problem. The database was viewed as a single relation queried using simple selections and projections.

Yet, there is a crucial difference between XyView and universal re
