Automatic regression benchmark system

Nicolas Després
Technical Report no. 0513, June 2005
revision 921

Regression benchmarking is a part of regression testing that aims to detect performance regressions automatically during application development. The goal is to detect the smallest change in performance as soon as possible, which requires precise measurement of small parts of the program. Although many benchmark systems already exist, none is fully automated and/or adapted to a wide range of applications. Yet automation is a crucial requirement for detecting regressions as soon as possible. This paper covers generalities about performance measurement, then gives the requirements of such a system, and finally proposes a model.

Evaluating the performance regressions of an application is an integral part of the regression testing phase. The goal is to express the performance differences between two versions as precisely as possible. By precision, we mean both the relevance of the time estimator used and the granularity of the parts evaluated.
Although many benchmarking systems already exist, few are fully automated and/or adapted to different kinds of applications. Yet automating this phase is crucial for detecting performance losses as early as possible.
This presentation first lays out the prerequisites for setting up such a system, then its architecture and its application to the Transformers project. Next, we compare different techniques for estimating execution time. Finally, we discuss the possibilities of adapting such an environment to other kinds of applications.

Keywords: automatic, regression benchmark, performance analysis, visualization, database, data collection

Laboratoire de Recherche et Développement de l'Epita
14-16, rue Voltaire, F-94276 Le Kremlin-Bicêtre cedex, France
Tel. +33 1 53 14 59 47, Fax +33 1 53 14 59 22
nicolas.despres@lrde.epita.fr
http://www.lrde.epita.fr/
Copying this document

Copyright © 2005 LRDE. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just "Copying this document", no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is provided in the file COPYING.DOC.
Contents

1 Introduction
2 Requirements
    2.1 Overview
    2.2 Automatic data acquisition
    2.3 Unified results format
    2.4 Results repository
    2.5 Results analysis/visualization interface
    2.6 Platform-dependency
3 Design
    3.1 Overview of the regression benchmark system
    3.2 The benchmark suite
    3.3 The populate script
    3.4 The collect script
    3.5 The web interface
    3.6 The database
        3.6.1 Tables description
        3.6.2 Table alteration versus numerous records
    3.7 Typical use case scenario
    3.8 The unified results format
        3.8.1 Featured materials
        3.8.2 Chosen language
4 Conclusion
5 Bibliography
Chapter 1
Introduction

This technical report presents a regression benchmark system. Such a system aims to prevent the performance regressions that may be introduced during the development of a project.

Between two major versions of a program, maintainers may continuously introduce many small performance losses while they develop it. They often fail to detect these small regressions, either because they do not run their benchmark suite for every patch they apply¹ or because their performance measurements are not accurate enough. The sum of all these small losses may amount to a significant performance regression. By the time maintainers notice a significant drop in the efficiency of their program, hundreds of patches have already been committed. They are then unable to find when the regression happened, especially if it is the sum of several small regressions. This problem can be avoided if the benchmark suite is run continuously while the program is modified.

A development team frequently applies tens of patches per day (sometimes more) to its project. Running the benchmark suite manually after every patch is therefore very cumbersome, especially if the suite takes a long time to run². Moreover, the amount of benchmark results may grow quickly, since an average project often has around 500 revisions. A regression benchmark system must therefore be fully automatic so that it does not overload the whole development cycle.

This report describes the regression benchmark system under development in our laboratory, which aims to solve the problem introduced above. It is still a draft, but the specification of the main modules, in terms of requirements and design, is defined. First, the requirements of the system are detailed. Then, the architecture of the whole project is described.

¹ Numerous projects do not even have a benchmark suite.
² This often happens because the test suite must be run before the benchmark suite, and both suites may grow larger and larger as the program grows.
Chapter 2
Requirements

This chapter specifies the requirements of an automatic regression benchmark system. We first give a short overview of all of them, then detail each one individually.

2.1 Overview

The regression benchmark framework must fulfill the following requirements [Courson et al. (2000); Kalibera (2004); Kalibera et al. (2004)]:

  • It must perform the measurements and collect their results automatically.
  • It must provide a unified result format.
  • It must manage a result repository.
  • It must feature a user-friendly interface for result analysis and/or visualization.
  • It must be platform-independent.

All these requirements are detailed in the following sections.

2.2 Automatic data acquisition

As mentioned in the introduction, we aim to measure performance for every revision of a project, in order to detect performance regressions as soon as possible. Because performance measurements may take a long time, and because it is common to commit changes more than ten times per day, acquiring performance data for a given project is very time-consuming and thus cannot be done manually. It is crucial that the entire benchmark process be performed automatically: from installing and running the program and the benchmark environment to adding the results to the repository.

2.3 Unified results format

A benchmark consists of a comparison between two measurements. The comparison may be made against a competing program or against an older version of the same program. To make such a comparison, the results must be stored in the same format, which avoids the need for a converter. Moreover, we want to be able to benchmark subparts of a benchmark, so we need a result format that supports nested structures. However, writing a complex parser and a complex pretty-printer is out of the scope of our project. We therefore need a format that is easy to read and write from a script.
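As an illustration only, a nested yet script-friendly format could be as simple as one "name: value" pair per line, indented according to the nesting level. The sketch below assumes such a layout; the `bench_node` type, the `write_results` function and the layout itself are hypothetical and are not the format chosen in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node of a nested benchmark result (not taken from the report). */
struct bench_node {
    const char *name;                  /* benchmark or sub-benchmark name */
    double score;                      /* measured value */
    const struct bench_node *children; /* sub-benchmarks, or NULL if none */
    size_t child_count;                /* number of entries in children */
};

/* Write one node and, recursively, its sub-benchmarks to `out`, using two
 * spaces of indentation per nesting level.
 * Returns the number of lines written. */
int write_results(FILE *out, const struct bench_node *node, int depth)
{
    assert(out != NULL && node != NULL && depth >= 0);
    int lines = 1;
    fprintf(out, "%*s%s: %g\n", depth * 2, "", node->name, node->score);
    for (size_t i = 0; i < node->child_count; ++i)
        lines += write_results(out, &node->children[i], depth + 1);
    return lines;
}
```

A line-oriented layout of this kind can be produced with a single formatted print and read back by splitting on the colon, which keeps both the writer and the reader within reach of a short script.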
2.4 Results repository

The amount of collected data may grow quickly because we want to perform measurements for every revision. We therefore need a robust storage system (e.g., not a regular file). Moreover, we need to be able to search and group benchmarks when we analyze the data. These analyses must be fast enough and must scale.

2.5 Results analysis/visualization interface

The result analysis/visualization interface must provide an easy, user-friendly way to generate graphs from the collected data. The most important graph in a regression benchmark system is the one that plots performance against the revision number of the program. We also need to compare different programs, typically our program and its competitors.

2.6 Platform-dependency

The end user may wish to see how the performance of their program differs from one architecture to another, so the benchmark environment must be able to run on different architectures. This constraint applies especially to the benchmark suite written by the project authors. Most of the time, the project under test is compatible with the same architectures as its benchmark suite. The task of the regression benchmark system is only to run the benchmark suite and to collect the results.
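To make the performance-versus-revision analysis of section 2.5 more concrete, here is a minimal sketch of how accumulated small losses could be flagged. It assumes that lower values are better (durations) and compares each revision with the best value seen so far rather than with its direct predecessor; the `measure` type, the `first_regression` function and the threshold policy are assumptions made for this example, not part of the system described here.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement for one revision (hypothetical layout). */
struct measure {
    int revision;   /* revision number of the program */
    double seconds; /* measured duration; lower is better */
};

/* Return the index of the first measurement that is slower than the best
 * earlier measurement by more than `threshold` (0.05 means 5 %), or -1 if
 * there is none. `m` must be sorted by increasing revision number. */
int first_regression(const struct measure *m, size_t n, double threshold)
{
    assert(m != NULL || n == 0);
    assert(threshold >= 0.0);
    if (n == 0)
        return -1;
    double best = m[0].seconds;
    for (size_t i = 1; i < n; ++i) {
        if (best > 0.0 && (m[i].seconds - best) / best > threshold)
            return (int)i;
        if (m[i].seconds < best)
            best = m[i].seconds; /* a faster revision becomes the baseline */
    }
    return -1;
}
```

Comparing against the best earlier value means that several losses of 2 or 3 % in a row are eventually reported, even though no single step exceeds the threshold.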
Chapter 3
Design

In this chapter, we present the design chosen for the regression benchmark system. First, we give an overview of the whole system. Second, the structure of the database is detailed. Then, we describe the typical use case scenario. Next, we detail our proposal for a unified result format. Finally, we briefly justify the tool set we have chosen to implement it.

3.1 Overview of the regression benchmark system

The system is composed of the components listed below:

  • A benchmark suite.
  • A populate script.
  • A collect script.
  • A web interface.
  • A database.

The relationships between the components are shown in Figure 3.1¹.

Figure 3.1: Regression benchmark system overview

¹ CRUD stands for Create, Read, Update and Delete, the sequence of actions usually performed on a database.

3.2 The benchmark suite

The benchmark suite is the part of the system that actually performs the measurements. In Figure 3.1, this part of the framework is called the bencher. Most existing projects implement their benchmark suite by writing a test suite dedicated to stressing the performance of the program rather than checking the correct behavior of its features.

Developers are generally interested in measuring the amount of time and/or memory their program needs. These two values are easy to obtain by means of a reusable external program (such as time or valgrind [Nethercote (2004)]). These tools are convenient because they are not intrusive in the program code. In other words, to be relevant they need meaningful test suites rather than code instrumentation.

Many projects also need more specific information. For instance, our project Olena [Duret-Lutz (2000)], an image processing library, can count the number of times an algorithm accesses a pixel of an image. This information is very useful for optimizing an algorithm. Unlike time and memory usage, computing such a value requires instrumenting the library code. This example illustrates that it is very hard, with common languages, to develop a generic tool that can help compute any measure one may need. That is why our regression benchmark system does not feature a generic tool to ease the writing of benchmark measurements. Nevertheless, this point is tackled in [Kalibera et al. (2004)].

However, as discussed in the previous chapter, we want to unify the format used by the benchmark suite to print its results. This format is not only a matter of layout: it guarantees that the necessary information is present. This information is the description of every benchmark of the project and its results. The description must be unambiguous, so that inserting the results into the database needs no human interaction during the entire process. The material provided to help developers print the right information in the right format is described in section 3.8, which is dedicated to the unified results format. This material is provided as a library, and it is generated from the information contained in the database.

3.3 The populate script

The goal of the populate script is to take the benchmark results (written in the unified results format) as input and insert them into the database. This script is called either by the collect script (see section 3.4) or by a human operator.

The benchmark suite of a given project may be run without our project installed on the machine. That is why the module that commits the information to our database is embedded in the populate script rather than in the benchmark suite. Moreover, if an unexpectedly ambiguous benchmark description cannot be committed to the database for sanity reasons, the populate script keeps track of the result until a human intervenes. Thus, the automatic process is not interrupted and no data is lost.

3.4 The collect script

The goal of the collect script is to periodically run the benchmark suite for every revision of every project registered in the database, on every configuration mentioned. It thus fills the database and ensures that no revision's measures are missing.
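The bookkeeping behind the collect script can be pictured as a search for (revision, configuration) pairs that have no stored result yet. The following sketch is only an assumption about how this could be done for a single project: the `run_key` type, the `is_stored` and `missing_runs` functions, and the numbering of configurations from 0 are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Identifies one benchmark run of a project (hypothetical layout). */
struct run_key {
    int revision; /* revision of the project */
    int config;   /* index of the configuration, starting at 0 */
};

/* Tell whether a result is already stored for (revision, config). */
bool is_stored(const struct run_key *stored, size_t n, int revision, int config)
{
    for (size_t i = 0; i < n; ++i)
        if (stored[i].revision == revision && stored[i].config == config)
            return true;
    return false;
}

/* List every (revision, config) pair in [first_rev, last_rev] x
 * [0, config_count) that has no stored result. At most `cap` pairs are
 * written to `todo` (which may be NULL); the total number of missing
 * pairs is returned, so it may exceed `cap`. */
size_t missing_runs(const struct run_key *stored, size_t n,
                    int first_rev, int last_rev, int config_count,
                    struct run_key *todo, size_t cap)
{
    assert(first_rev <= last_rev && config_count >= 0);
    size_t missing = 0;
    for (int rev = first_rev; rev <= last_rev; ++rev) {
        for (int cfg = 0; cfg < config_count; ++cfg) {
            if (is_stored(stored, n, rev, cfg))
                continue;
            if (todo != NULL && missing < cap)
                todo[missing] = (struct run_key){ rev, cfg };
            ++missing;
        }
    }
    return missing;
}
```

A periodic job could then run the benchmark suite for each pair returned in `todo` and hand the output to the populate script. The linear lookup is kept for brevity; a real repository would presumably answer this question with a database query.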
3.5 The web interface

The web interface allows the user to draw graphs and charts based on the measures stored in the database. We chose a web-based application rather than an X Window one for portability reasons.

3.6 The database

The database is the cornerstone of the system. It is designed to avoid duplicated fields and to support every benchmark type. Figure 3.2 shows the relationships between the tables of the database.

Figure 3.2: The results database description

3.6.1 Tables description

The central table, called benchs, stores every single benchmark. In this database, a benchmark represents one measure of one feature of one project. A measure is qualified by a type and a scale. The measurement type tells us, for instance, whether the benchmark measures the memory usage or the duration of the program; the type therefore has a unit, such as bytes or seconds. The measurement scale encodes the program input size used by the benchmark, which allows us to perform scalability benchmarks. A scale is also qualified by a unit.

The benchs table has the following fields: an id, a name, a project id, a type id, a scale id, etc. Because a benchmark may not be available from the beginning (first revision) of a project to its end, there are two more fields called start_revision and stop_revision. They indicate the revision interval within which the benchmark may be performed.

The executions table stores the result of a benchmark collected for every valid revision and every available configuration. The benchmark is referenced in this table by its id in the benchs table. The executions table allows us to easily check for performance regressions of a given benchmark of a project.
3.6.2 Table alteration versus numerous records

We designed this database so that no table has to be altered while the system is running. We also took care not to duplicate data. As a result, the table relationships may seem complex, but this does not matter since they are maintained by the system and not by the users. Currently, we prefer to have a table with many records rather than to create new tables on the fly.

3.7 Typical use case scenario

The typical use case scenario is shown in Figure 3.3 and detailed here:

Figure 3.3: A typical use case scenario

1. Register a new benchmark in the database via the web interface. This includes adding any necessary new types, scales or units.
2. Ask the system to regenerate the benchmark configuration file. Basically, this file provides material to help the developer print the results in the expected format.
3. Write the code needed for the new benchmark in the project's benchmark suite.
4. Run the benchmark suite again and redirect the results to the populate script. This stage is optional since one may wait for the periodic benchmark execution.
5. Finally, once the population process is finished, you can view the results as charts using the web interface.
3.8 The unified results format

This format aims to represent every type of benchmark result. It is called unified because every project may use it as the output format of its benchmark suite.

3.8.1 Featured materials

The developers of a benchmark suite should not need to know the unified format that we provide. First, printing it properly may be very cumbersome for them. Second, if we change it, they would have to adapt their benchmark suite. Third, they do not have to know all the information we require to describe a benchmark and its result without ambiguity. We therefore provide a library that contains all the needed information registered in the database for a given project. Basically, the benchmark names, types and units are available. The library interface then mainly features two functions with the following prototypes:

    void begin_benchmark(const char *name, const char *type, const char *unit);
    void end_benchmark(double score);

For instance, the developers of the benchmark suite may use these functions this way:

    #include "benchmark.h"

    static double do_the_benchmark(void)
    {
        double score = 0.0;
        /* compute the value of the score variable ... */
        return score;
    }

    int main(void)
    {
        double score;

        begin_project(MY_PROJECT);
        begin_benchmark(MY_BENCH_FOO, MY_TYPE_BAR, MY_INPUT_BAZ);
        score = do_the_benchmark();
        end_benchmark(score);
        end_project();
        return 0;
    }

Figure 3.4: An example of a benchmark code

The begin_benchmark function prints the description of the benchmark that is about to be run. The description is the id of the benchmark in the benchs table of the database. This id is computed as the hash of the concatenation of the benchmark name, type and input strings. Since type names and input names are kept unique in the database, as are benchmark names within a given project, there are no ambiguities. After calling begin_benchmark, we actually compute our benchmark and then pass the score as the argument to the end_benchmark call. The begin_project function call indicates that the benchmarks written until