The Fitness Benchmark for Data Warehousing

icon

19

pages

icon

English

icon

Documents

Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres

icon

19

pages

icon

English

icon

Documents

Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres

1
The Fitness Benchmark for Data Warehousing
Michael Goshey
Department of Computer Science and Engineering
University of Minnesota
Abstract
This paper challenges the notion that speed of query execution, which dominates the OLTP database bench
marking landscape, is the appropriate critical metric for benchmarking data warehouses. The heart of the work
is the introduction of a new fitness benchmark for testing and rating a warehouse’s ability to provide decision
support rather than testing its ability to quickly respond to queries. The proposed benchmark contains three specific
metrics: utilization (thorough use of existing data structures), coverage (non reliance on OLTP source systems) and
granularity (robust support of required aggregations). The metrics were chosen for their desirability, impactfulness,
portability, simplicity and ease of improvement. The paper defines the metrics in detail, identifies the metadata
used in evaluating each of the measures, proposes and defines a weighting scheme for combining them into the
overallbenchmarkanddemonstratesthecomputationofthebenchmarkagainstawell documented,publiclyavailable
reference set of decision support data. The paper concludes that using the results of the computed benchmark to
direct specific change can lead to measurable improvements in the ability of a warehouse to provide decision support.
I. INTRODUCTION
HE practice of benchmarking database management systems is dominated by approaches that emphasize
query ...
Voir icon arrow

Publié par

Nombre de lectures

60

Langue

English

TheFitnessBenchmarkforDataWarehousingMichaelGosheyDepartmentofComputerScienceandEngineeringUniversityofMinnesotaAbstractThispaperchallengesthenotionthatspeedofqueryexecution,whichdominatestheOLTPdatabasebench-markinglandscape,istheappropriatecriticalmetricforbenchmarkingdatawarehouses.Theheartoftheworkistheintroductionofanewfitnessbenchmarkfortestingandratingawarehouse’sabilitytoprovidedecisionsupportratherthantestingitsabilitytoquicklyrespondtoqueries.Theproposedbenchmarkcontainsthreespecificmetrics:utilization(thoroughuseofexistingdatastructures),coverage(non-relianceonOLTPsourcesystems)andgranularity(robustsupportofrequiredaggregations).Themetricswerechosenfortheirdesirability,impactfulness,portability,simplicityandeaseofimprovement.Thepaperdefinesthemetricsindetail,identifiesthemetadatausedinevaluatingeachofthemeasures,proposesanddefinesaweightingschemeforcombiningthemintotheoverallbenchmarkanddemonstratesthecomputationofthebenchmarkagainstawell-documented,publiclyavailablereferencesetofdecisionsupportdata.Thepaperconcludesthatusingtheresultsofthecomputedbenchmarktodirectspecificchangecanleadtomeasurableimprovementsintheabilityofawarehousetoprovidedecisionsupport.I.INTRODUCTION1HEpracticeofbenchmarkingdatabasemanagementsystemsisdominatedbyapproachesthatemphasizeTqueryexecutiontimeasthecriticalmetric[14].Generalpurposeon-linetransactionprocessing(OLTP)systemsrequireimmediatequeryresponsetofacilitatee-commerceandotherreal-timeornearreal-timeapplications.Insuchdomainsitismeaningfultolearnthatonedatabasesystemprocessesmoretransactionsthananotherduringagivenperiodoftime.Howeveradatawarehouseisdesignedwithamorenarrowobjective:facilitatingdecision-makingbyoptimizingdataforanalyticalandreportingpurposes[11][3].Datawarehousesarecommonlyreferredtoas’decisionsupport’systemstoreflectthisfocusedemphasis[23].Whileperformance-basedbenchmarksmayalsobeofinterestinevaluatingdecisionsupportsystems,thispaper1challengesthenotionthatspeedofqueryexecutionistheappropriatecriticalmetricinbenchmarkingdatawarehouses.Theultimatemeasureofadatawarehouse’ssuccessshouldnotbeitsqueryprocessingspeedbutrathertheextenttowhichitsupportsdecisionmaking.Thekeybenchmarkingquestionforadecisionsupportsystemisnot’howfastdoesitreturnqueries?’butrather’howwelldoesitsupportdecisions?’1PreparedaspartofaprojectfortheFall,2006UniversityofMinnesotacomputersciencegraduatecourseOverviewofDatabaseResearch
2Thechallengecanbeframedinthefollowingterms:givenanexistingdatabaseandrelatedschemaandaworkloadofuser-submittedqueriesagainstthatdataandschema,produceanempiricalmeasurementthatratestherelationshipbetweenthedatabaseanditsqueries.Themeasurementshouldreflectbestpractices,berelatedtorelativelyhigh-impactconcerns,beapplicabletomajordatabasemanagementsystems,bebasedoneasytograspconceptsandbeeasilyremediedwhendiagnosedasproblematic.Tonarrowthescopetoareasonableconsiderationset,themeasureshouldbespecificallylimitedtodecisionsupport(datawarehouse)databases.Theimportanceofthechallengeliesisthefactthatdatawarehousesareamajorareaofdatabasestodayandtheirrelativeimportancecontinuestogrow[18].Thechallengeisdifficultinsofarasdesigningameasureorbenchmarkcanbeahighlysubjectiveexercise.Thereforeitisbeneficialtothinkintermsofaproxysetofspecificmetricsthatunambiguouslyevaluatetheabilityofadatawarehousetosupportdecisionmaking.Inrecognizingthatnostandardevaluationmetriccurrentlyexiststhispaperdefinessuchaproxycalledthefitnessbenchmark.Whenthedesignofadatawarehouseiswell-suitedtotheparticularsetofqueriesittypicallyhandlesitcanbethoughtofashavingabetterfitnessscorethanawarehousethatisless-suitedtosupportingitsownqueries.Thispaperarticulatesageneralpurposeframeworkofmetadata-basedmetricsthatevaluatethequeriespresentedtoawarehouseagainsttheactualdataandfeaturesofthewarehousetoderiveanempiricalscorerepresentingthedatawarehouse’sfitness.Thepaperdescribesthebenchmark’scomponentmeasures,identifiesthemetadatausedinevaluatingthemeasuresanddemonstratesthroughspecificexamplesbasedonpublisheddecisionsupportdatahowthefitnessbenchmarkcanbeusedtoconductfocusedevaluationofadatawarehouseandrateitsabilitytosupportthedecisionmakingquerieswithwhichitistypicallypresented.Ideallyafitnessevaluationcouldprecedeanyperformance-basedbenchmarkinginordertomoreefficientlydirecttheperformancerelatedoptimizations.A.RelatedWorkBenchmarking:Benchmarkingofrelationaldatabasesistheestablishedpracticeofdefiningandapplyingconsistentperformance-basedstandardsofmeasurement[19]toabroadrangeof(typicallytransactional)databasemanagementsystems.Well-knownmeasuressuchastheTPCBenchmark[14]havebecomeindustrystandardsforthecomparativeevaluationofOLTPsystems.DecisionSupportandDataWarehousing:Datawarehousesarebuiltfromdimensional(denormalized),aggre-gateddata[12][5][11]andaredesignedfortheexpressedpurposeofsupportingfact-baseddecisionmaking[3][23][8].SimilartoworkinvolvingOLTPsystems,aprimaryfocusofthedatawarehouseliteraturehasbeentoproposeoptimizationsaimedatreducingqueryexecutiontimeinordertoimproveusefulnessforendusersresponsibleforsubmittingdecisionsupportqueries.Suchproposedoptimizationshaveincludednovelclusteringschemes[10],betterapproachestocachemanagement[17],optimizationstrategiesforthemaintenanceofmaterializedviews[9]
3[2],improvementstoindexing[7]andqueryapproximationtechniques[1].Somepapershaveproposedbenchmarksdesignedspecificallytoaddressesuniquedatamodelconsiderationsrelatedtodecisionsupportsystems[16][5][4]thoughthesehavealmostexclusivelyfocusedonqueryresponsetimeasthekeyassessmentmetric.DataQuality:Dataqualityhasbeenanotherareaofrelatedworkinthedatabaseliterature.Papersinthisfieldfocusonthebroadtopicofensuringaccuracyandrobustnessofdataanddonotexplicitlydealwiththeeffectivenessofdecisionsupportsystems[22][13].Theorthogonalityofdataqualitytothesubjectofthecurrentpaperisevidentwhenoneimaginesadatawarehousecomprisedofthehighestqualitydatathatishowever,ineffectiveatsupportingdecisionmakingduetolackofrelevancetousers’datarequirementsorthroughtheabsenceofcriticalaggregations.UseofMetadatainDataWarehousing:Theuseofmetadataindatawarehousingiswell-documentedandconsideredanindustrybestpractice[12][11].Metadataisusedforevaluatingthecurrentstateofdata,trackinghistoricalchangesovertime,adherencetoestablishedrulesandtheanalysisofsubmittedqueries.Somepapersproposespecificschemesforleveragingmetadataintransitioningolder(presumablyless-interesting)warehousedatatoless-granularaggregationsforthepurposeofreducingoverallstoragefootprint[20][6].Papersinthisveinfocusondataageasthekeydriverofchangeratherthanusingmetadatathatreflectsactualusage.Ingeneraltheexistingliteraturestopsshortofproposingtheuseofmetadataasaninputtothecreationofdecisionsupportbenchmarks.B.ContributionsofThisPaperThecurrentpaperrefutesthedefactoassumptionthatexecutionspeedshouldbethefocalpointinbenchmarkingdatawarehousesandproposestheinstitutionofanovelbenchmarkingapproachbasedonthenotionoffitness.Datawarehousefitnessisdefinedhereafterastheempiricalmeasurementthatunambiguouslyevaluatestheabilityofadatawarehousetosupportthedecision-makingquerieswithwhichitispresented.Threespecificmetricsembodyingdesirablecharacteristicsofdatawarehouseswerechosenasrepresentativeofthissupport.Thatistosayawarehousethatscoreswellonallthreecomponentmetricswouldbeconsideredbycurrentindustrybestpracticestobehighlysupportiveofitsparticulardecisionmakingqueries.Allthreemetricsinvolveharvestingmetadatafromthedecisionsupportsystemitselfaswellasthequerieswithwhichitispresented.Thegapanalysisofthesetwotypesofmetadataacrossthethreemeasuresisweightedandexpressedintheoverallbenchmarkwhichcanbeusedhereaftertoconsistentlydescribeadatawarehouse’sfitness.Thethreecomponentmetricsareproposed,definedanddiscussedindetailinthepaperandincludeutilization(thethoroughuseofexistingdatawarehousestructures),coverage(thenon-relianceonOLTPsourcesystemsatquerytime)andgranularity(therobustsupportofrepeatedlyrequestedaggregations).Thisnewapproachtodatawarehousebenchmarkingcanbeused(apriori)incombinationwithtraditionalperformance-basedbenchmarkstonarrowlyfocusperformanceoptimizationsto
suitablyfitdatawarehouses.C.OutlineofRemainingSections4Theremainderofthepaperisorganizedasfollows:SectionIIreviewskeydecisionsupportconceptsthathavebearingonthepaper,SectionIIIdefinesthefitnessapproachanddescribesindetailitsthreecomponentmetricsandweightingscheme,SectionIVappliesthefitnessbenchmarktoawell-knownreferencedatasetandSectionVpresentsconclusionsandconsidersideasforfuturework.II.DATAWAREHOUSING:BASICCONCEPTSThefollowingsectionprovidesabriefoverviewofrelevantdatawarehousinganddecisionsupportconceptsofwhichsomebasicunderstandingisrequiredinordertograspthepresentationofthebenchmarkandthediscussionofitscomponentmetrics.A.DecisionSupportSystemsMuchoftheworkinbenchmarkingdatabaseshasrelatedspecificallytoOLTPdatabasesmodeledtosupportspeedyandefficienthandlingofinsert,updateanddeletetransactionssuchasthosefoundinairlinereservationsystemsande-commercewebsites.Attheotherendofthespectrumarethesystemsthatarethesubjectofthiswork:decisionsupportoron-lineanalyticalprocessing(OLAP)systemswhichareoptimizedspecificallyfordataretrieval,reporting,aggregationandadhocqueryandanalysis.B.DimensionalModelingOLTPsystemsgenerallyfeaturehighlynormalizeddatathatminimizesredundantstorageandsupportstheothergoalsofOLTPsystems.Incontrast,thedimensionalmodelsprominentinOLAPtolerateredundancyinexchangeforbetterresponsivenesstoanticipatedreportingandanalysisqueries.Adimensionalmodelconsistsofde-normalizedor’flattened’entitydimensionssurroundingtablesofnumericmeasuresorfacts.Thestructureresemblesastarwithafacttableatitscenterjoinedviaforeignkeystoeachofitsrelateddimensiontables.Thisstructureminimizesthenumberofjoinsnecessaryforuserstoperformanceanalysisonthenumericmeasures,allowingthemtouseanyattributecolumninanydimensionasafilteringconditionorasthebasisforaggregation.This’starschema’isthepredominantconstructindatawarehousingandaswillbeseenlaterisdirectlyrelatedtotheutilizationmetricinthefitnessbenchmark.
C.DataWarehousesandDataMarts5Whiledatawarehousingtypicallyreferstoacomprehensivesetofdecisionsupporttechnologiesadatawarehousemorenarrowlyreferstothestoreofdataitselfandimpliesdimensional,aggregatedandhistoricaldataratherthanthenormalized,transactionalandrealtimedataonewouldexpectinanOLTPsystem.Thetermdatamarthasbeenusedinavarietyofcontextsandmeaningshoweverthemeaningmostgermanetothispaperisthatofasinglestarschemastructurethatitselfisonecomponentofadatawarehouse.Inthisdefinitionadatawarehousecontainsanumberofdatamarts.Mosttypicallyeachmarthasitsownfacttable,howeverthemartsoftensharecommondimensionsthatrepresentfrequentlyusedentitiessuchasdateororganization.D.MetadataDatawarehousesutilize’dataaboutdata’(metadata)inacoupleofkeywaysthatarerelevanttothispaper.Mostmajorrelationaldbmsproductsincludesystemcatalogsthatprovideaccesstorichmetadataconcerningthesizeandstateofobjectsandstructuresinthesystem.Likewisemostofthesesamesystemsprovidetracingandmonitoringfacilitiesenablingactivitymonitoringtoanswerquestionssuchas’whoarethemost/leastactiveusers?’aswellasthecaptureofinformationaboutincomingqueriesinordertoanswerquestionssuchas’whatarethemost/leastpopularqueries?’III.THEFITNESSAPPROACHThefirststeptowardnewlyestablishingthefitnessbenchmarkistodefinetheconstituentmetricsofwhichitiscomprised.Anysuchselectionofmeasuresisinherentlysubjectiveasdifferentaspectsofadatawarehousemaybemoreorlessimportanttodifferentorganizationsorunderdifferentconditions.Inadditiontothemetricsthatwereultimatelyselectedanumberofothercandidatemetricswereconsidered,includingtheusefulnessoftemporalupdates,theefficiencyofcleansingandconformingprocesses,thevalueofvariousindicesandtheextenttowhichaggregationsareactuallyused.However,themetricsthatmadethefinallistaredatawarehousepropertiesthatparticularlyreflectthefollowingimportantcriteria:desirability:acceptedasabestpracticeimpactfulness:relatedtorelativelyhigh-impactconcernsportability:applicabletomajordatabasemanagementsystemssimplicity:basedoneasytounderstandconceptsimprovability:easilyfixedwhendiagnosedasproblematic
A.Utilization6Utilizationisthefirstcomponentmetricandisbasedonthenotionofrequestedstructures,definedasthoseelementsofthewarehouse(tables,materializedviews,etc.)thataredirectlyreferencedinthequeriesthatuserssubmit.Theutilizationmetricmeasurestheratioofthesizeofrequestedstructurestothesizeofallstructuresinthewarehouse.Ahighutilizationscoreindicatesthatahighpercentageofthedatainthewarehouseisbeingused,implyingthatthewarehouseisefficientlysized.Theutilizationmetricisrepresentedbythefollowingequation.teLh=numberofbytesinrequestedstructures,w=totalnumberofallbytes,x=utilization.Thenh=xw)1(Figure1illustratesthesituation.Thedataineachwarehouse(w)fallsintotwobroadcategories:bytesinrequestedstructures(h)andbytesinunrequestedstructures(h0).Theworseandbetterdiagramscanbeviewedastwodifferentwarehousesrespondingtothesamesetofrequestsorasthebeforeandafterversionsofawarehouseresizedtobetterfittherequestsitreceives.Theobjectiveoftheutilizationmetricistominimizeh0,thatis,minimizetheamountofdatabeingstoredinthewarehousebutneverrequestedbyuserstosupporttheirdecisionmaking.TheFig.1.Utilization.importanceofthismetricisfairlyobvious:storingdatainawarehouseisexpensiveandthereforeshouldbelimitedtoonlystoringwhatisrequired.Thisisbecausedatastoredinawarehouseisnotsimplyanoff-linereplicaofsourcesystemdatabuthasratherundergoneanumberofcostlytransformations(de-normalization,aggregation,cleansing,conformance).Giventheenormityoftoday’sdatawarehouses[18]applyingsuchefforttodatathatisneverusedishighlyundesirable.Inevaluatingmetadatatocomputetheutilizationscore,querymetadata(specificallytheFROMclauses)iscomparedagainstsystemcatalogmetadata(listsofavailableusertables,etc.)toidentifygapsandassesstheappropriatenessofthesizeanddesignofthewarehouse.Sincefacttablesareverylarge(frequentlytensofmillions
7ofrowsormoreinmodernwarehouses)theyareobviouscandidatesforremovalwhencompletelyunused.Yetgiventheirdeceivinglysmallsize,unuseddimensiontablescanalsohaveasignificant-albeitsubtle-impactondatawarehouses.Todemonstratethiscounter-intuitiveresult,referenceFigure2whichshowsasampledailysalesdatamartforaretailshoestore.Thenatureofthedimensionalmodelissuchthatifthedataisperfectlydense(nosparseness)thenumberofrowsinthefacttableatthecenterofastarschemaissimplytheproductofthenumberofrowsineachofthesurroundingdimensions.TheSalesFacttableinthediagramisassumedtobeperfectlydenseandistherefore14.6millionrows(730x200x10x10).Bycomparisonthe10rowDimSalesRepdimensionappearstobesoinconsequentialthatevenifanalysisindicatesitisneverusedonewouldquestionthevalueofbotheringtoremoveit.Howeverduetoitsmultiplicativeimpactonthefacttable,removingDimSalesRepreducesSalesFactto1.46millionrows(730x200x10),10%ofitsoriginalsize!Inotherwords,retainingadimensionsuchasDimSalesRepinawarehousewhenitisnotbeingusedhasasubstantialimpactonrelatedfacttables,whichwelabelaswidowed,thatis,substantiallyunderusedduetothenon-useofarelateddimension.Theutilizationmetriccandirectlyidentifyunusedfactanddimensiontablesaswellasindirectlyidentifywidowedfacttables.B.CoverageFig.2.Sampledatamartforaretailshoestorewithtwoyearsofhistory.Thesecondmetricthatmakesupthefitnessbenchmarkiscalledcoverageandisbasedonthenotionofcontainedqueries,definedasthesubsetofqueriessubmittedtothedecisionsupportsystemthatdonotrequireanyaccesstoorreferenceofsource(transactional)systems.Thesequeriescanbecompletelyfulfilledsolelyfromexistingwarehousedata.Leti=numberofcontainedqueries,q=totalnumberofallqueries,y=coverage.Theni=yq)2(
8ExamplesofuncontainedqueriesmightincludefactsaggregatedtoalowerlevelofgranularitythanisrepresentedinthewarehouseorWHEREandGROUPBYclausesthattargetattributesthatwereleftoutofthewarehouse.InassessingthecoveragemetrictheSQLforeachdecisionsupportqueryiscapturedforcomparisonagainstthelistofdatawarehouseobjectstodeterminetheextentofnon-coveragetobeminimized.AsindicatedinFigure3,ahigher(better)coveragescoreisattainedbyhavingahigherpercentageofcontainedqueries(i)andthereforeminimizingreferencestosourcesystemsviauncontainedqueries(i0).ItisevidentfromtheFig.3.Coverageunchangingsizeofq(totalnumberofqueriessubmitted)inthetwowarehouses(w)thattheobjectiveisnottoforceuserstolivewithoutcertaindecisionsupportqueriesbutrathertoexpandthewarehousedatatoincludethedatabeingreferencedinsourcesystemstoachieveahigherpercentageofcontainedqueries.Coverageisarguablythemostimportantofthethreecomponentfitnessmetricsforseveralreasons.Transactionaldatasourcestypicallyhaverealtimeobligationsthatmustbemet.Systemswithsuchrequirementsshouldbeisolatedfromunpredictableandfrequentlyresource-intensiverequestssubmittedbydatawarehouseusers.InadditiontoperformanceriskstotheOLTPsystemsthemselvesdatawarehouseusersmayexperiencelongdelayswhensubmittinguncontainedqueriessincethehighlynormalizedtransactionaldataisnotoptimizedforreportingandon-the-flyaggregation.Anotherpitfalltousersisthatthedatainthewarehousehastypicallypassedanumberofcleansing,conformingandotherrule-basedprocessinghurdlesthatmaynotbeinplaceinthesourcesystemsthemselvesandcanthereforeleadtoinaccuratecomparisonsofdataindifferentstatesofadherencetorules.C.GranularityThefinalcomponentmetricinthefitnesswarehouseisgranularitywhichincorporatestheconceptofdistinctgroups,definedhereasthestrictsubsetofuniqueGROUPBYfieldsthatappearinoneormorequeries.ThismeasureusesquerymetadatatoobtainacountofeachuniquefieldappearinginatleastoneGROUPBYstatementaswellasatotalcountofallfieldsappearinginallGROUPBYstatements.Letj=numberofdistinctgroups,g=totalnumberofallgroups,z=granularity.Then
j=zg93()Forexampleif10totalqueriesusingaggregateoperatorsaresubmittedtothewarehouseandallteninclude’CalendarMonthName’asaGROUPBYfield,’CalendarMonthName’iscountedasonedistinctgroup(j)andas10totalgroups(g).Thisreflectsthefactthatthissingleaggregation(at’CalendarMonthName’)willberepeated10times,andthateachsubsequentquerywillincurthesameperformanceoverheadassociatedwiththeaggregation.Thetwowarehouses(w)picturedinFigure4reflectthesamesetofsubmittedqueries(q).ThediagramontheleftFig.4.Granularityportraysalesserfractionofdistinctgroups(j)toallgroups(g)whereastherightdiagramshowsanincreaseinthismeasure.Thescenarioillustratedisthatascertaingroupsareidentifiedashighlyredundant,theperformanceimpactofqueriesthatusethesegroupscanbemitigatedbyaddingnewaggregatefacttablestothewarehouse.Thisisportrayedinthetwodiagramsbythechangeintheshapeofw.Addressinggroupredundancyinthiswaywillalsoreducethetotalnumberofgroupsbecausefeweraggregatequerieswillbesubmittedwhenmoredataisstoredattherequiredgrain.Thisisreflectedinthereductioninthesizeofg(totalnumberofgroups)fromtheleftdiagramtotheright.Themetricsarenotusedtoimpactuserbehavior(whichcouldbeindicativeofdecreaseddecisionsupport)butrathertodriveadaptationinthewarehousetomakethewarehousemoresuitabletothequeriesusersaresubmitting.D.TheFitnessBenchmarkHavingidentifiedthethreeconstituentmetricsthenextstepistocombinethemintotheproposedfitnessbenchmark.Letf=fitness,anda,b,c=coefficientsofutilization,coverageandgranularityrespectively,where0<coefficient<1,anda+b+c=1
hTneanditfollowsthatf=ax+by+cz01f01)4(Theoutstandingissueincompletingthebenchmarkdesignistodeterminethevalueofeachofthethreecoefficients.Identifyingtheoptimalcombinationofthethreeweightsisnotatalltheobjectiveofthispaper.Infactthismayprovetobeunrealisticastherelativeimportanceofutilization,coverageandgranularitywillalmostcertainlyvarybasedonthespecificsofthesituationinwhichthewarehouseisdeployed.Custodiansofawarehousewithextremelycomplicatedandpoorlyperformingsourcesystemsmayweighcoveragerelativelyhighgiventhesubstantialpenaltyimpliedbyaccessingsourcesystemsinsuchascenario.Warehousesinwhichintensive,resource-heavydataprocessingstepsarerequiredtoadddatatothewarehousemayhaveheavierweightassignedtotheutilizationmetricthanthosewherethisislessofanissue.Giventhatthekeyobjectiveofthepaperistodesignanddemonstrateaworkingbenchmarkthatcanbecalibratedtorespondtodifferingreal-worldconditions,letusproposethefollowingweightingscheme:a=0.3,b=0.5,c=0.2Thisschemeassignsimportancetothemetricsindescendingorderofcoverage,utilizationandgranularity.Thisimpliesthatthedatawarehouseproblemofreferencingsourcesystemsismoresignificantthanthatofstoringtoomuchdata,whichisinturnmoresignificantthannotachievingperfectdatagranularity.Thisalsosuggeststhatcoverageismorethantwiceascriticalasgranularity.Althoughtheseassumptionsaresomewhatarbitrarytheyarenotovertlycounterintuitiveandallowustoproceedtothenextstepofdemonstratinganapplicationofthefinalizedbenchmark,namelyf=0.3x+0.5y+0.2zIV.DEMONSTRATIONANDRESULTS()5Demonstratinganapplicationofthefitnessbenchmarkrequiresarobustsetofdecisionsupportdataalongwithmetadatafromboththesystemcatalogandtheassociatedsetofuser-submittedqueries.ThegenericdataselectedforthisexerciseisthefreelyavailableAdventureWorksreferencesetfromMicrosoft R[15]whichwascreatedasatooltodemonstrateaspectsoftheirdatabasesystem(Figure5).Thestatisticsconcerninguser-submittedquerieswerecontrivedforthesimulationtohighlightthecontributionsmadebyeachofthecomponentmetricstowardtheoverallbenchmarkscore.Bydesignthebenchmarkdoesnotmeasurequeryperformanceandthereforehardware
11Fig.5.SystemdiagramshowingtheAdventureWorksreferencedataandassociatedmetadatausedindemonstratingthefitnessbenchmark.andsoftwaredetailsarelargelyirrelevantbutforcompletenesssaketheenvironmentinwhichthedatawasaccessedwasthestandard(server)editionofMicrosoftSQLServer2005runninginaMSWindows RXP(sp2)testbedona2.8GHzdual-corex86desktopmachinewith2GBofSDRam.A.ReferenceDataSetFig.6.OverviewofthevemaindatamartsintheAdventureWorksDWdatawarehouse.TheintegratedAdventureWorksreferencesetcontainsalarge,normalizedtransactionaldatabasecalledAdven-tureWorksaswellasadimensionallymodeleddatawarehousecalledAdventureWorksDW.Theschemasarewelldefinedandreflectthemock-businesscaseuponwhichthefictitiousdataisbased,makingthedatasetanattractivechoiceforreal-worldscenariosimulationcomparedtootherreferencesetssuchastheTPC-Hdata[21]thataremoregearedtoperformance-basedsimulation.Thebusinesscaseindicatesthatthedatabelongstothefictitious
Voir icon more
Alternate Text