Automatic regression benchmark system



Nicolas Desprès
Technical Report no. 0513, June 2005
revision 921

Abstract

Regression benchmarking is the part of regression testing that aims at automatically detecting performance regressions during application development. The goal is to detect the smallest change in performance as soon as possible. Achieving that requires precise measurement of small parts of the program. Although many benchmark systems already exist, none is fully automated and/or adapted to a wide range of applications. Yet automation is a crucial requirement for detecting regressions as soon as possible. This paper covers generalities about performance measurement, then gives the requirements of such a system, and finally proposes a model.

Résumé (translated from the French)

Evaluating the performance regressions of an application is an integral part of the regression testing phase. The goal is to express the performance differences between two versions as precisely as possible. By precision, we mean both the relevance of the time estimator used and the granularity of the parts evaluated.

Although many benchmarking systems already exist, few are fully automated and/or adapted to different kinds of applications. Yet automating this phase is crucial in order to detect performance losses as early as possible.

This talk first presents the prerequisites for setting up such a system, then its architecture and its application to the Transformers project. Next, we compare different techniques for estimating execution time. Finally, we discuss how such an environment could be adapted to other kinds of applications.

Keywords
automatic, regression benchmark, performance analysis, visualization, database, data collection

Laboratoire de Recherche et Développement de l'Epita
14-16, rue Voltaire – F-94276 Le Kremlin-Bicêtre cedex – France
Tel. +33 1 53 14 59 47 – Fax. +33 1 53 14 59 22
nicolas.despres@lrde.epita.fr – http://www.lrde.epita.fr/

Copying this document

Copyright © 2005 LRDE.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just "Copying this document", no Front-Cover Texts, and no Back-Cover Texts.

A copy of the license is provided in the file COPYING.DOC.

Contents

1 Introduction
2 Requirements
  2.1 Overview
  2.2 Automatic data acquisition
  2.3 Unified results format
  2.4 Results repository
  2.5 Results analysis/visualization interface
  2.6 Platform-dependency
3 Design
  3.1 Overview of the regression benchmark system
  3.2 The benchmark suite
  3.3 The populate script
  3.4 The collect script
  3.5 The web interface
  3.6 The database
    3.6.1 Tables description
    3.6.2 Table alteration versus numerous records
  3.7 Typical use case scenario
  3.8 The unified results format
    3.8.1 Featured materials
    3.8.2 Chosen language
4 Conclusion
5 Bibliography

Chapter 1
Introduction

This technical report presents a regression benchmark system. Such a system aims at preventing the performance regressions that may be introduced during the development of a project.

Between two major versions of a program, maintainers may continuously introduce many small performance losses while developing it. They often fail to detect these small regressions, either because they do not run their benchmark suite for every patch they apply [1] or because their performance measurements are not accurate enough. The sum of all these small losses may result in a significant performance regression. By the time maintainers detect an important loss of efficiency, hundreds of patches have already been committed. They are then unable to find when the regression happened, especially if it is the sum of several small regressions. This problem can be avoided if the benchmark suite is run continuously while the program is modified.

A development team frequently applies tens of patches per day (sometimes more) to its project. It is therefore very cumbersome to run the benchmark suite manually after every patch, especially if it takes a long time to run [2]. Moreover, the amount of benchmark results may grow quickly, since the number of revisions of an average project is often around 500. Thus, a regression benchmark system must be fully automatic so as not to overload the whole development cycle.

This report describes the regression benchmark system under development in our laboratory, which aims at solving the problem introduced above. It is still a draft, but the specification of the main modules, in terms of requirements and design, is defined. First, the requirements of the system are detailed. Then, the architecture of the whole project is described.

[1] Numerous projects do not even have a benchmark suite.
[2] This often happens, since the test suite must be run before the benchmark suite and both suites may grow larger and larger as the program grows.

Chapter 2
Requirements

This chapter specifies the requirements of an automatic regression benchmark system. We first give a short overview of all of them, then detail each one individually.

2.1 Overview

The regression benchmark framework must fulfill the following requirements [Courson et al. (2000); Kalibera (2004); Kalibera et al. (2004)]:

• It must perform and collect the measurement results automatically.
• It must provide a unified result format.
• It must manage a result repository.
• It must feature a user-friendly interface for result analysis and/or visualization.
• It must be platform-independent.

All these requirements are detailed in the following sections.

2.2 Automatic data acquisition

As mentioned in the introduction, we aim at measuring performance for every revision of a project, in order to detect performance regressions as soon as possible. Because performance measurements may take a long time, and because it is common to commit changes more than ten times per day, acquiring performance data for a given project is very time-consuming and thus cannot be done manually. It is crucial that the entire benchmark process be performed automatically: from installing and running the program and the benchmark environment to adding the results to the repository.

2.3 Unified results format

A benchmark consists in comparing two measurements. The comparison may be made against a competing program or against an older version of the same program. To allow such a comparison, the results must be stored in the same format, which avoids the need for a converter. Moreover, we want to be able to benchmark subparts of a benchmark, so we need a result format that supports nested structures. However, writing a complex parser and a complex pretty-printer is out of the scope of our project. Thus, we need a format that is easy to read and write from a script.
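To make the nesting requirement of section 2.3 concrete, here is a minimal sketch in C. It assumes a purely hypothetical text layout (one "name: score" line per result, indented two spaces per nesting level); the report does not fix its format at this point, and the result structure and the format_result function are illustrative names, not part of the system.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical in-memory form of one result; not the report's actual format. */
struct result {
    const char *name;              /* benchmark or sub-benchmark name */
    double score;                  /* measured value */
    const struct result *children; /* nested sub-benchmarks, may be NULL */
    size_t child_count;
};

/* Write one result and all its children into buf, one "name: score" line
   each, indented two spaces per nesting level.
   Returns the number of characters written, or -1 if buf is too small. */
int format_result(const struct result *r, int depth, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%*s%s: %.3f\n",
                     depth * 2, "", r->name, r->score);
    if (n < 0 || (size_t)n >= size)
        return -1;

    int total = n;
    for (size_t i = 0; i < r->child_count; i++) {
        int m = format_result(&r->children[i], depth + 1,
                              buf + total, size - (size_t)total);
        if (m < 0)
            return -1;
        total += m;
    }
    return total;
}
```

A line-oriented layout like this one can be produced with a single printf-style call and read back by counting leading spaces, which is the kind of script-friendly property the section asks for.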

2.4 Results repository

The amount of collected data may grow quickly because we want to perform measurements for every revision. So we need a robust storage system (e.g. not a regular file). Moreover, we need to be able to search and group benchmarks together when we analyze the data. These analyses must be fast enough and must scale.

2.5 Results analysis/visualization interface

The result analysis/visualization interface must provide an easy and user-friendly way to generate graphs based on the collected data. The most important graph in a regression benchmark system is the one showing how performance evolves with the revision number of the program. We also need to compare different programs: typically our program and its competitors.

2.6 Platform-dependency

The end user may wish to see how the performance of their program differs from one architecture to another. So the benchmark environment must be able to run on different architectures. This constraint applies especially to the benchmark suite written by the project authors. Most of the time, when the project under test is compatible with an architecture, its benchmark suite is compatible with it too. The task of the regression benchmark system is only to run the benchmark suite and to collect the results.
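The performance-versus-revision graph of section 2.5 is also the data on which a regression can be spotted programmatically. The following C sketch is only an illustration under stated assumptions: the sample structure, the first_regression function and the relative tolerance are hypothetical, and the report does not prescribe any particular detection rule.

```c
#include <stddef.h>

/* One point of the evolution graph: a time measured at a given revision. */
struct sample {
    unsigned revision;
    double seconds;   /* lower is better in this sketch */
};

/* Return the index of the first sample whose time exceeds the previous
   sample's time by more than `tolerance` (0.05 means 5 %), or -1 if no
   such sample exists. Samples are assumed to be sorted by revision. */
long first_regression(const struct sample *s, size_t count, double tolerance)
{
    for (size_t i = 1; i < count; i++) {
        if (s[i].seconds > s[i - 1].seconds * (1.0 + tolerance))
            return (long)i;
    }
    return -1;
}
```

Comparing each revision only with its predecessor misses slow drifts made of many sub-threshold losses, which is exactly the case the introduction warns about; comparing against a fixed baseline revision as well would be a natural extension.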

Chapter 3
Design

In this chapter, we present the design chosen to develop the regression benchmark system. First, we give an overview of the whole system. Second, the structure of the database is detailed. Then, we describe the typical use case scenario. Next, we detail our suggestion for a unified result format. Finally, we briefly justify the tool set we have chosen to implement it.

3.1 Overview of the regression benchmark system

The system is composed of the components listed below:

• A benchmark suite.
• A populate script.
• A collect script.
• A web interface.
• A database.

The relationships between the components are shown in figure 3.1 [1].

Figure 3.1: Regression benchmark system overview

[1] CRUD stands for Create, Read, Update and Delete, the sequence of actions usually performed on a database.

3.2 The benchmark suite

The benchmark suite is the part of the system that actually performs the measurements. In figure 3.1, we call this part of the framework the bencher. Most existing projects implement their benchmark suite by writing a test suite dedicated to emphasizing the performance of the program rather than the correct behavior of its features.

Developers are generally interested in measuring the amount of time and/or memory their program needs. These two values are easily computed by means of a reusable external program (such as time or valgrind [Nethercote (2004)]). These tools are convenient because they are not intrusive in the program code. In other words, to be relevant they need meaningful test suites rather than code instrumentation.

Many projects also need more specific information. For instance, our project Olena [Duret-Lutz (2000)], an image processing library, can compute the number of times an algorithm accesses a pixel of an image. This information is very useful when optimizing an algorithm. Contrary to time and memory usage, computing such a value requires instrumenting the library code. This example illustrates that it is very hard, with common languages, to develop a generic tool that can help compute any measure one may need. That is the reason why our regression benchmark system does not feature a generic tool to ease the writing of benchmark measurements. Nevertheless, this point is tackled in [Kalibera et al. (2004)].

However, as discussed in the previous chapter, we want to unify the format used by the benchmark suite to print its results. This format is not only a matter of layout: it ensures that the necessary information is present. This information is the description of every benchmark of the project and their results. The description must be unambiguous, so that inserting the results into the database needs no human interaction during the entire process. The material provided to help developers print the right information in the right format is described in section 3.8, dedicated to the unified results format. This material is provided as a library, and it is generated from the information contained in the database.

3.3 The populate script

The goal of the populate script is to take the benchmark results (written using the unified results format) as input and insert them into the database. This script is called either by the collect script (see section 3.4) or by a human operator.

The benchmark suite of a given project may be run without our project installed on the machine. That is why the module that commits the information into our database is embedded in the populate script instead of the benchmark suite. Moreover, when an unexpectedly ambiguous benchmark description cannot be committed into the database for sanity reasons, the populate script keeps track of the result until a human intervenes. Thus, the automatic process is not interrupted and no data is lost.

3.4 The collect script

The goal of the collect script is to periodically run the benchmark suite for every revision of every project registered in the database and on every configuration mentioned. Thus, it fills the database and ensures that no revision's measurements are missing.
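One way to picture the bookkeeping behind section 3.4 is a function that lists the revisions still lacking a measurement. The C sketch below is an assumption-laden illustration: the function name missing_revisions, its parameters and the idea of passing the already-measured revisions as a sorted array are all hypothetical, since the report does not describe how the collect script decides what to run.

```c
#include <stddef.h>

/* Store in `missing` the revisions of [first, last] that do not appear in
   `measured` (sorted ascending, no duplicates). At most `capacity` entries
   are stored; the return value is the total number of missing revisions,
   which may be larger than `capacity`. */
size_t missing_revisions(const unsigned *measured, size_t measured_count,
                         unsigned first, unsigned last,
                         unsigned *missing, size_t capacity)
{
    size_t found = 0;
    size_t j = 0;

    if (first > last)
        return 0;

    for (unsigned rev = first; ; rev++) {
        /* Skip measured revisions that lie before the current one. */
        while (j < measured_count && measured[j] < rev)
            j++;

        int have = (j < measured_count && measured[j] == rev);
        if (!have) {
            if (found < capacity)
                missing[found] = rev;
            found++;
        }

        if (rev == last)   /* explicit exit avoids unsigned wrap-around */
            break;
    }
    return found;
}
```

A periodic job could call such a function once per project and configuration, then run the benchmark suite on each returned revision and hand the output to the populate script.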

3.5 The web interface

The web interface allows the user to draw graphs and charts based on the measures stored in the database. We have chosen a web-based application instead of an X Window one for portability reasons.

3.6 The database

The database is the cornerstone of the system. It is designed to avoid duplicated fields and to support every benchmark type. Figure 3.2 represents the relationships between the tables of the database.

Figure 3.2: The results database description

3.6.1 Tables description

The central table, called benchs, stores every single benchmark. In this database, a benchmark represents one measure of one feature of one project. A measure is qualified by a type and a scale. The measurement type tells us, for instance, whether the benchmark measures the memory usage or the duration of the program. Thus, the type has a unit such as bytes or seconds. The measurement scale encodes the program input size used by the benchmark. This allows us to perform scalability benchmarks. A scale is also qualified by a unit.

The benchs table has the following fields: an id, a name, a project id, a type id, a scale id, etc. Because a benchmark may not be available from the beginning (first revision) of a project to its end, there are two more fields called start_revision and stop_revision. They indicate the revision interval within which the benchmark may be performed.

The executions table stores the result of a benchmark collected for every valid revision and every available configuration. The benchmark is referenced in this table by means of its id in the benchs table. The executions table allows us to easily check the performance regressions for a given benchmark of a project.
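The two tables of section 3.6.1 can be mirrored as C structures to make the role of start_revision and stop_revision explicit. This is a sketch under assumptions: the field names come from the text where it gives them, but the C types, the fields of the execution structure, the bench_applies helper and the convention that a stop_revision of 0 means "still active" are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical row of the benchs table. */
struct bench {
    unsigned id;
    const char *name;
    unsigned project_id;
    unsigned type_id;        /* what is measured, e.g. time or memory */
    unsigned scale_id;       /* input size used by the benchmark */
    unsigned start_revision; /* first revision where the benchmark exists */
    unsigned stop_revision;  /* last such revision; 0 = still active (assumed) */
};

/* Hypothetical row of the executions table: one result per benchmark,
   revision and configuration. */
struct execution {
    unsigned bench_id;         /* refers to bench.id */
    unsigned revision;
    unsigned configuration_id;
    double result;
};

/* True if the benchmark may be performed at the given revision. */
bool bench_applies(const struct bench *b, unsigned revision)
{
    if (revision < b->start_revision)
        return false;
    return b->stop_revision == 0 || revision <= b->stop_revision;
}
```

Keeping the revision interval on the benchmark row, rather than on each execution, lets a scheduler skip revisions where the benchmark did not exist without storing placeholder results.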

3.6.2 Table alteration versus numerous records

We have designed this database to avoid having to alter a table while the system is running. We have also taken care not to duplicate data. As a result, the table relationships may seem complex, but this does not matter since they are maintained by the system and not by the users. Currently, we prefer to have a table with many records rather than creating new tables on the fly.

3.7 Typical use case scenario

The typical use case scenario is shown in figure 3.3 and is detailed here:

1. Register a new benchmark in the database via the web interface. This includes adding any necessary new types, scales or units.
2. Ask the system to regenerate the benchmark configuration file. Basically, this file features materials to help the developer print the results using the expected format.
3. Write the code needed for the new benchmark in the project's benchmark suite.
4. Run the benchmark suite again and redirect the results to the populate script. This stage is optional since people may wait for the periodic benchmark execution.
5. Finally, once the population process is finished, you can view the results as charts using the web interface.

Figure 3.3: A typical use case scenario

3.8 The unified results format

This format aims at representing every type of benchmark result. It is called unified because every project may use it as the output format of its benchmark suite.

3.8.1 Featured materials

The developers of the benchmark suite should not need to know the unified format that we provide. First, it may be very cumbersome for them to print it properly. Second, if we change it, they would have to adapt their benchmark suite. Third, they do not have to know all the information we require to describe a benchmark and its result without ambiguity. So we provide a library that contains all the needed information registered in the database for a given project. Basically, the benchmark's name, type and unit are available. The library interface then mainly features two functions with the following prototypes:

    void begin_benchmark(const char* name, const char* type, const char* unit);
    void end_benchmark(double score);

For instance, the developers of the benchmark suite may use these functions this way:

    #include "benchmark.h"

    static double do_the_benchmark(void)
    {
      double score = 0.0;
      /* compute the value of the score variable... */
      return score;
    }

    int main(void)
    {
      double score;

      begin_project(MY_PROJECT);
      /* announce which benchmark is about to run */
      begin_benchmark(MY_BENCH_FOO, MY_TYPE_BAR, MY_INPUT_BAZ);
      score = do_the_benchmark();
      /* report its result */
      end_benchmark(score);
      end_project();
      return 0;
    }

Figure 3.4: An example of a benchmark code

The begin_benchmark function prints the description of the benchmark that is about to be run. The description is the id of the benchmark in the benchs table of the database. This id is computed as the hash of the concatenation of the benchmark name, type and input strings. Since the type name and input name are kept unique in the database, as is the benchmark name for a given project, there are no ambiguities. After calling begin_benchmark, we actually compute our benchmark and then pass the score as argument to the end_benchmark call. The begin_project function call indicates that the benchmarks written until
