Benchmarking SAT Solvers for Bounded Model Checking Emmanuel Zarpas IBM Haifa Research Laboratory, zarpas@il.ibm.com Abstract. Modern SAT solvers are highly dependent on heuristics. Therefore, benchmarking is of prime importance in evaluating the per- formances of different solvers. However, relevant benchmarking is not necessarily straightforward. We present our experiments using the IBM CNF Benchmark on several SAT solvers. Using the results, we attempt to define guidelines for a relevant benchmarking methodology, using SAT solvers for real life BMC applications. 1 Introduction Over the past decade, formal verification via model checking has evolved from a theoretical concept to a production-level technique. It is being actively used in chip design projects across the industry, where formal verification engineers can now tackle the verification of large industrial hardware designs. Bounded Model Checking (BMC) [1] has recently enabled the verification of problems that used to be beyond the reach of formal verification. However, BMC performance relies heavily on the performance of the underlying SAT solver. We conducted several experiments using the IBM CNF benchmark [12, 11] to evaluate SAT tools. Often, we found discrepancies between our results and the experimental results found in the literature. For example, our results are not necessarily in line with the results of the SAT03 contest [4]. The main goal of this paper is to outline a methodology for benchmarking SAT ...
344E.Zarpas0000100010010110.11101001000100001.010.0zchaff I runtime08070605040302100niB(a)(b)Fig.1.Histogram(b)givesthedistributionofthedecimallogarithmofzChaffI/zChaffIIspeedupwith10000sectime-out2.4ComparisonwithSATContestResultsInordertounderstandwhetherornotourresultsareconsistentwiththoseoftheSAT03contest6(fordetailsresultssee[5]),wehadalookattheresultsofthefirstandsecondstagesofthecompetition.Firststage.Lookingattheresults7fortheindustrialcategory[5],wenotethefollowing:–Series13rule8isover-represented(Cf.Table4).Besides,sinceallsolverstime-outonmostoftheCNFsfromthisseries,itisnotverymeaningful.–Exceptforseries13ruleand11rule,theseriesfromtheIBMCNFbench-markareeasyforzChaff,BerkMin561,andSiegev1.–Resultsforrule07arenotconsistentwithourownexperiences.Thisdis-crepancyisprobablycausedbythe“Lisasyndrome”[4]:CNFswereshuffledforSAT03andsolverperformancecandifferdramaticallybetweenashuffledCNFandtheoriginal.TheSAT03contestresultsforBerkMin561,Forklift,Siegev1,andzChaffon(shuffled)seriesfromtheIBMCNFBenchmarkaresummarizedinTable4.Ifresultsfromseries07ruleand13rulearediscarded,zChaffshowsbetterresultsthanBerkMin561.678InthefirststageoftheSAT04competition,zChaffIandzChaffIIhaveverycloseresultsandthesolversthatoutperformedthem(Forklift,Oepir)arenotavailableforexperimentation.TheSiegeversionusedfortheSAT03contestisSiegev1.ThenamesoftheSAT03seriesappearsinitalicsinordertodifferentiatethemfromtheseriesweusedinourexperiments.FortheexactcompositionoftheSAT03series,see[5].