Sloth - An SMT Solver for String Constraints

About

Sloth is a decision procedure for several relevant fragments of string constraints, including the straight-line fragment and acyclic formulas. In contrast to most other string solvers, Sloth is able to decide satisfiability of constraints combining concatenation, regular expressions, and transduction, all of which are essential in applications. Sloth uses succinct alternating finite-state automata (AFAs) as concise symbolic representations of string constraints, and uses model checking algorithms like IC3 for solving emptiness of the AFA. More details on the algorithm can be found in our POPL'18 paper; a technical report is available here.

Requirements

It is easiest to work with one of the binary releases of sloth. The following packages are needed in order to run Sloth on your system:

OpenJDK >= 1.7
One of the following model checkers:
- abc >= 1.0.1 (also requires aiger >= 1.9.4)
- nuXmv >= 1.1.1

Sloth is heavily based on the Princess theorem prover, which is included in the binary release packages.

Installation as Binary

In order to compile a binary release of Sloth, you have to do following steps:

Download one of these packages and install them as directed:
- nuXmv - https://es-static.fbk.eu/tools/nuxmv/
- abc - https://bitbucket.org/alanmi/abc/ and aiger - http://fmv.jku.at/aiger/
Copy the files aigtoaig (aiger), abc and nuXmv into /usr/bin/
Download one of the binary releases, unpack it, and call ./sloth
- Alternatively, you can also invoke sloth using the command java -jar target/scala-2.11/sloth-assembly-1.0.jar

Download and Installation from Sources

In order to compile Sloth from the sources, in addition to one of the model checkers (nuXvm or abc) you need the Scala build system, which is included in common Linux distributions:

sbt >= 1.0

The follow the steps:

Clone the repository to the local repository
- $ git clone https://github.com/uuverifiers/sloth.git
Run sbt assembly
Call ./sloth to invoke the tool

Usage

./sloth [options] [file] Sloth options:

+model	print models/satisfying assignments
+splitOpt	enable split optimization. This optimization sometimes reduces the number of considered cases significantly
-modelChecker=abc\|nuxmv\|simple	choose the model checker, default is nuxmv. Option "simple" enable our model checker that is used on formulae without splitting (recommended uses with +splitOpt).
+minimalSuccessors	enable an optimisation that reduces the set of considered AFA states (use for abc, nuxmv)
+assert	enable runtime assertions in Sloth

Input language

Sloth processes constraints in the SMT-LIB 2.6 format, extended with a theory of strings. Sloth is mostly compatible with input language from CVC4 (http://cvc4.cs.stanford.edu/wiki/Strings) and supports a large subset of the CVC4 string operations; a summary of supported operations is given below. For examples, we recommend to have a look at the tests directory.

	SMTLIB language
String Sort	`String`
String literals	`"abcdef"`
String equality	`(= s1 s2)`
Concatenation	`(str.++ s1 ... sn)`
Single replacement	`(str.replace s t1 t2)`
Replacement	`(str.replaceall s t1 t2)`
Membership in regular expression	`(str.in.re s r)`
String to regular expression	`(str.to.re s)`
Regular expression concatenation	`(re.++ r1 ... rn)`
Regular expression union	`(re.union r1 ... rn)`
Regular expression intersection	`(re.inter r1 ... rn)`
Regular expression Kleene star	`(re.* r)`
Regular expression plus	`(re.+ r)`
Regular expression character range	`(re.range s t)`

String literals

At the moment Sloth only supports 256 character (8-bit) Extended ASCII alphabets. String literals are composed from printable characters (value between 0x20 and 0x7e) in double quotes. Escape sequences are supported too. They are interpreted in the usual way, as shown in the following table.

\0 ... \9	represents ASCII character 0 … 9, respectively
\a, \b, \e, \f, \n,\r, \t, \v	represents its corresponding ASCII character
\ooo	encodes a single ASCII character where ooo consists of exactly three digits in the octal encoding of the character (from 0 to 377)
\xNN	encodes a single ASCII character, where NN is two-digit hexadecimal constant

The backslash character (\) is silently ignored when it is followed by a sequence of characters not recognized as an escape sequence. Non-printable/extended ASCII character is printed in the exadecimal format \xNN, except for the character denoted by the escape sequences \a, \b, \e, \f, \n, \r, \t, \v, which are printed using those escape sequences.

Most Important Operations on Strings

To define a string variable/constant:
(declare-fun x () String)
or
(declare-const x String)

String Concatenation:
(str.++ s1 ... sn)
where s1, ..., sn are string terms. String concatenation takes at least 2 arguments.

String Single Replacement:
(str.replace s t1 t2)
where s, t1 and t2 are string terms, t1 is non-empty. This function searches t1 in s, and returns the string in which the first occurrence of t1 is replaced by the string t2.

String Replacement:
(str.replaceall s t1 t2)
where s, t1 and t2 are string terms, t1 is non-empty. This function searches t1 the s, and returns a new string where all occurrences of t1 are replaced by string t2.

Symbolic Regular Expression

Membership Constraint:
(str.in.re s r)
where s is a string term and r is a regular expression.

String to Regular Expression Conversion:
(str.to.re s)
where s is a string term. The statement turns a regular expression that only contains a string s.

Regular Expression Concatenation:
(re.++ r_1 ... r_n)
where r_1, ..., r_n are regular expressions.

Regular Expression Alternation:
(re.union r_1 ... r_n)
where r_1, ..., r_n are regular expressions.

Regular Expression Intersection:
(re.inter r_1 ... r_n)
where r_1, ..., r_n are regular expressions.

Regular Expression Kleene-Star:
(re.* r)
where r is a regular expression.

Regular Expression Kleene-Cross:
(re.+ r)
where r is a regular expression.

Regular Expression Option:
(re.opt r)
where r is a regular expression.

Regular Expression Range:
(re.range s t)
where s, t are single characters in double quotes, e.g. "a", "b". It returns a regular expression that contains any character between s and t.

The Empty Regular Expression (recognising the empty set):
re.nostr

The Regular Expression that contains all characters:
re.allchar

Transducers

Sloth can also handle string transformations expressed as transducers. We represent transducers as mutually recursive functions with multiple arguments in SMT-LIB. In general, the characteristic function of the language accepted by an transducer can be written as a set of mutually (primitive) recursive functions over strings; the equations defining the functions directly correspond to the transitions of the transducer.

For instance, considering strings over 8-bit bit-vectors, the following recursive function describes the to-uppercase transducer. The transducer definition makes use of the additional operations seq-head to extract the first character of a (non-empty) string, and seq-tail to compute the tail:

(define-fun-rec toUpper ((x String) (y String)) Bool (or (and (= x "") (= y "")) (and (= (seq-head y) (ite (and (bvule (_ bv97 8) (seq-head x)) ; 'a' (bvule (seq-head x) (_ bv122 8))) ; 'z' (bvsub (seq-head x) (_ bv32 8)) ; 'a' -> 'A', etc. (seq-head x))) (toUpper (seq-tail x) (seq-tail y)))))

This function can then be used to formulate string constraints, for instance:

(declare-fun x0 () String) (declare-fun x1 () String) (declare-fun x2 () String) (declare-fun x3 () String) (declare-fun s0 () String) ; source variable ; (assert (toUpper x2 x3)) (assert (= x0 (str.++ "he" s0 "o"))) (assert (= x2 (str.++ x0 x1))) ; (check-sat) (get-model)

It is also possible to work with non-length-preserving transducers, i.e., transducers that map a string to a new string of potentially different length. For instance, the following transducer extracts the string in between the 0th and 1st occurrence of '=':

(define-funs-rec ((extract1st ((x String) (y String)) Bool) (extract1st_2 ((x String) (y String)) Bool) (extract1st_3 ((x String) (y String)) Bool)) ( ; ; extract1st (or (and (= x "") (= y "")) (and (not (= (seq-head x) (_ bv61 8))) ; not '=' (extract1st (seq-tail x) y)) (and (= (seq-head x) (_ bv61 8)) ; '=' (extract1st_2 (seq-tail x) y))) ; ; extract1st_2 (or (and (= x "") (= y "")) (and (= (seq-head x) (seq-head y)) (not (= (seq-head x) (_ bv61 8))) ; not '=' (extract1st_2 (seq-tail x) (seq-tail y))) (and (= (seq-head x) (_ bv61 8)) ; '=' (extract1st_3 (seq-tail x) y))) ; ; extract1st_3 (or (and (= x "") (= y "")) (extract1st_3 (seq-tail x) y)) ; ))

Contact

If you have any questions, do not hesitate to contact the tool/method authors:

Lukas Holik <holik@fit.vutbr.cz>
Petr Janku <ijanku@fit.vutbr.cz>
Anthony W. Lin <anthony.w.to@gmail.com>
Philipp Ruemmer <philipp.ruemmer@it.uu.se>
Tomas Vojnar <vojnar@fit.vutbr.cz>

Acknowledgements

This work is supported by the AQUAS: Aggregated Quality Assurance for Systems (737475), the Czech Science Foundation (projects 14-11384S, 16-24707Y, 17-12465S), the internal BUT FIT project FIT-S-17-4014, and the IT4IXS: IT4Innovations Excellence in Science (LQ1602).