EMSQMS 2010: Papers with Abstracts

Papers
Abstract. This invited talk will look at logic solvers through the application lens of constructing and processing a theory library of mechanized mathematics. In fact, constructing and processing theories are two distinct applications, and each will be considered in turn. Construction is carried out by formalizing a mathematical theory using an interactive theorem prover, and logic solvers can remove much of the drudgery by automating common reasoning tasks. At the theory library level, logic solvers can provide assistance with theory engineering tasks such as compressing theories, managing dependencies, and constructing new theories from reusable theory components.
Abstract. This paper explores the predictability of SAT and SMT solvers in response to different kinds of changes to benchmarks. We consider both semantics-preserving and possibly semantics-modifying transformations, and provide preliminary data about solver predictability. We also propose carrying learned theory lemmas over from an original run to runs on similar benchmarks, and show the benefits of this idea as a heuristic for improving the predictability of SMT solvers.
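The abstract does not spell out which benchmark transformations are considered; as a hedged illustration of the semantics-preserving kind, the sketch below scrambles a DIMACS CNF benchmark by permuting variable names and shuffling clause order, which leaves satisfiability unchanged. The function name and behavior are assumptions for illustration, not the paper's transformations.

```python
import random

def scramble_dimacs(lines, seed=0):
    """Semantics-preserving scramble of a DIMACS CNF benchmark:
    permute variable names and shuffle clause order.
    (Illustrative only; not the transformations studied in the paper.)"""
    rng = random.Random(seed)
    header = None
    clauses = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # skip comments and blank lines
        if line.startswith("p cnf"):
            header = line
            continue
        clauses.append([int(x) for x in line.split()[:-1]])  # drop trailing 0

    num_vars = int(header.split()[2])
    # Random permutation of variable indices 1..num_vars.
    perm = list(range(1, num_vars + 1))
    rng.shuffle(perm)
    rename = {v: perm[v - 1] for v in range(1, num_vars + 1)}

    # Rename literals (preserving polarity) and shuffle clause order.
    scrambled = [[(1 if lit > 0 else -1) * rename[abs(lit)] for lit in cl]
                 for cl in clauses]
    rng.shuffle(scrambled)

    out = [f"p cnf {num_vars} {len(scrambled)}"]
    out += [" ".join(map(str, cl)) + " 0" for cl in scrambled]
    return out
```

A predictability experiment in this spirit would run the same solver on the original and on several scrambled copies and compare the spread of running times.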
Abstract. In this paper we report on QBFEVAL'10, the seventh in a series of events established with the aim of assessing the advancements in reasoning about quantified Boolean formulas (QBFs). The paper discusses the results obtained and the evaluation setup, from the criteria used to select QBF instances down to the hardware infrastructure. We also discuss the current state of the art in light of past challenges and envision future research directions motivated by the results of QBFEVAL'10.
Abstract. Evaluating improvements to modern SAT solvers and comparing two arbitrary solvers are challenging and important tasks. The relative performance of two solvers is usually assessed by running them on a set of SAT instances and comparing the number of solved instances and their running times in a straightforward manner. In this paper we point to shortcomings of this approach and advocate more reliable, statistically founded methodologies that could discriminate better between good and bad ideas. We present one such methodology and illustrate its application.
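The abstract does not name its particular methodology; one common statistically founded approach to paired runtime comparison is a Wilcoxon signed-rank test, sketched below. The solver runtimes are invented for illustration, and a real methodology must also decide how to treat timeouts.

```python
from scipy.stats import wilcoxon

# Hypothetical paired runtimes (seconds) of two solvers on the same instances.
# Values are invented; timeouts would need separate handling (e.g., censoring).
solver_a = [12.3, 450.1, 3.2, 88.0, 610.5, 7.7, 151.2, 29.9]
solver_b = [10.1, 480.6, 2.9, 95.4, 570.0, 8.3, 140.8, 31.5]

# Null hypothesis: the per-instance runtime differences are symmetric about 0.
stat, p_value = wilcoxon(solver_a, solver_b)
print(f"Wilcoxon statistic = {stat:.1f}, p-value = {p_value:.3f}")
# A small p-value indicates a systematic runtime difference between the solvers,
# which is more informative than simply counting solved instances.
```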
Abstract. (extended abstract submitted as paper 8)
Abstract. The SMT-Exec service is a benchmark repository, execution service, and competition infrastructure for the SMT community. Besides running the yearly competition and providing real-time and archival results and analysis, SMT-Exec permits "private experiments" to be run year-round by researchers all over the world. These experiments work just like the yearly competition and run on the same computing cluster, but may be parameterized by users to run on benchmark and solver subsets of interest with a configurable timeout. Private solvers may be uploaded and tested against each other or against archival versions of competition solvers, and solvers and experiment results may be "published" so that they are publicly viewable. This talk describes SMT-Exec design highlights, challenges, growing pains, and current development plans for future versions of SMT-Exec.
Abstract. In order to compare the quality of proofs, it is necessary to measure artifacts of the proofs, and evaluate the measurements to determine differences between the proofs. This paper discounts the approach of ranking measurements of proof artifacts, and takes the position that different proofs are good proofs. The position is based on proofs in the TSTP solution library, which are generated by Automated Theorem Proving (ATP) systems applied to first-order logic problems in the TPTP problem library.
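The paper's actual proof measures are not reproduced here; as a rough illustration of measuring proof artifacts, the sketch below counts annotated formulae and inference steps in a TSTP derivation. The chosen counts and the lightweight regular-expression parsing are assumptions for illustration only.

```python
import re

def count_proof_artifacts(tstp_text):
    """Count simple artifacts of a TSTP derivation: annotated formulae and
    formulae justified by an inference record.
    (Illustrative measures only, not the metrics discussed in the paper.)"""
    # Annotated formulae look like: cnf(name, role, formula, source).
    formulae = re.findall(r'^\s*(?:cnf|fof|tff|thf)\(', tstp_text, re.MULTILINE)
    inferences = re.findall(r'inference\(', tstp_text)
    return {
        "formulae": len(formulae),
        "inference_steps": len(inferences),
    }
```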