Chipmunk: A Tool for Matching of Regular Expressions

  1. News
  2. About
  3. Source Code
  4. Conference Papers
  5. Benchmarks
  6. Contact
  7. Related Papers
  8. Acknowledgement

News

November, 2020:The paper about Chipmunk has been accepted to appear at OOPSLA'20.


About

The tool allows efficient matching regular expressions (regexes) with bounded repetition using deterministic automata with registers - counting-sets automata (CsA). The CsA can hold sets of bounded integers and can be manipulated by a limited portfolio of constant-time operations. Our experimental results confirm that deterministic CsAs produced from practical regexes with repetition are indeed vastly smaller than the corresponding DFAs. This prototype matcher based on CsA simulation handles practical regexes with repetition regardless of sizes of counter bounds. It easily copes with regexes with repetition where state-of-the-art matchers struggle.

This is a common research with Microsoft Research (Margus Veanes).



Source Code

https://pajda.fit.vutbr.cz/ituronova/countingautomata-matcher


Conference Papers

1.
HOLIK, L.; LENGAL, O.; VOJNAR, T.; VEANES, M.; TURONOVA, L. Regex Matching with Counting-Set Automata. Proceedings of the ACM on Programming Languages, 2020, vol. 4, no. 11, p. 1-30. ISSN: 2475-1421.


Benchmarks

The regexes used for this experiment were selected:

1.
from the database of over 500,000 real-world regexes coming from an Internet-wide analysis of regexes collected from over 190,000 software projects;
2.
from databases of regexes used by network intrusion detection systems (NIDSes), in particular, Snort, Bro, Sagan, and, moreover, the academic papers;
3.
the RegExLib database of regexes; and
4.
industrial regexes originally used for security purposes.


The original set of files can be retrieved from repository here.

Contact

If you have further questions, do not hesitate to contact authors:

[1]
HOLIK, L.; LENGAL, O.; TURONOVA, L.; VOJNAR, T.; SAARIKIVI, O.; VEANES, M. Succinct Determinisation of Counting Automata via Sphere Construction. In In Proc. of 17th Asian Symposium on Programming Languages and Systems - APLAS'19. Lecture Notes in Computer Science. Berlin Heidelberg: Springer Verlag, 2019. p. 468-489. ISSN: 0302-9743.

Acknowledgements

This work is supported by ERC CZ project LL1908 and FIT BUT internal project FIT-S-20-6427.