Parslet and its friends

Parslet is a small parser framework for Ruby. It uses the PEG formalism for its grammars, and it’s not alone in doing so. I have been asked to compare parslet to Treetop and Citrus for raw parsing speed, with complicated grammars.

I’ve generated Treetop and Citrus versions of rkh’s ANSI Smalltalk grammar. As a byproduct, parslet now exposes visitors for the grammar and can generate these two formats as strings.
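
For readers who haven’t used a visitor on a grammar tree: each atom hands itself to a visitor, and the visitor decides what to produce for it, in this case a Treetop rule. The sketch below is self-contained and uses its own tiny, hypothetical atom classes (`Str`, `Seq`, `Alt`, `Rep`) and a hypothetical `TreetopEmitter`; it is not parslet’s actual visitor API or exporter, only an illustration of the idea. It assumes Ruby’s `String#inspect` quoting is close enough to Treetop’s literal syntax for simple strings.

```ruby
# Hypothetical grammar atoms (not parslet's classes); each hands itself to a visitor.
Str = Struct.new(:text) do
  def accept(visitor)
    visitor.visit_str(text)
  end
end

Seq = Struct.new(:parts) do
  def accept(visitor)
    visitor.visit_sequence(parts)
  end
end

Alt = Struct.new(:alternatives) do
  def accept(visitor)
    visitor.visit_alternative(alternatives)
  end
end

Rep = Struct.new(:atom, :min) do
  def accept(visitor)
    visitor.visit_repetition(atom, min)
  end
end

# Hypothetical visitor that renders the atoms above as Treetop expression syntax.
class TreetopEmitter
  def rule(name, atom)
    "rule #{name}\n  #{atom.accept(self)}\nend"
  end

  def visit_str(text)
    text.inspect # double-quoted Ruby literal, assumed valid as a Treetop terminal
  end

  def visit_sequence(parts)
    "(#{parts.map { |p| p.accept(self) }.join(' ')})"
  end

  def visit_alternative(alternatives)
    "(#{alternatives.map { |a| a.accept(self) }.join(' / ')})"
  end

  def visit_repetition(atom, min)
    # Only the two repetition forms PEG writes as suffixes are handled here.
    suffix = { 0 => '*', 1 => '+' }.fetch(min) { raise ArgumentError, "min=#{min} not supported" }
    "#{atom.accept(self)}#{suffix}"
  end
end

snack = Seq.new([
  Str.new('chunky'),
  Alt.new([Str.new('bacon'), Str.new('ham')]),
  Rep.new(Str.new('!'), 0)
])
puts TreetopEmitter.new.rule('snack', snack)
```

Emitting Citrus instead would be a second visitor class walking the same atoms; the grammar tree itself stays unchanged, which is the main appeal of the pattern.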

Benchmarks have been run using this grammar with varying input sizes. Input size matters a lot here, since the grammar nests very deeply; in some cases, speed will not vary linearly with the size of the input.
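
The post doesn’t show the benchmark harness. As a hedged sketch only, timing one parse per input of growing size could look like this. `time_by_size` is a hypothetical helper, and the block passed to it (a regex scan) is a placeholder standing in for a real parser. Since a deeply nesting grammar may not scale linearly, real measurements should use genuine programs of increasing size; repeating one snippet is done here only to keep the example self-contained.

```ruby
# Runs the block once per input and returns [input size, elapsed seconds] pairs.
def time_by_size(inputs)
  inputs.map do |input|
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield input
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    [input.size, elapsed]
  end
end

# Hypothetical inputs: a 12-character snippet repeated to get growing sizes.
snippet = "x := 3 + 4. "
inputs = [10, 100, 1000].map { |n| snippet * n }

# Placeholder "parser"; swap in the real parse call being measured.
results = time_by_size(inputs) { |src| src.scan(/\d+/) }
results.each { |size, secs| puts format('%6d chars  %.4fs', size, secs) }
```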

I’ve included two versions of parslet: the current one (1.0) and 1.1.0, which is about to be released (within days, in fact). The difference between the two is about one month of performance optimization.

Treetop and Citrus enter the competition with their latest released gems (1.4.9 and 2.3.4, respectively). I am putting the code that generates these grammars into the next release of parslet, so these benchmarks can be rerun whenever either of them changes.

Here are the raw numbers:

Input size	parslet-1.0	parslet-1.1	treetop	citrus
185 chars	23.644s	0.291s	0.644s	2.503s
361 chars	14m28.970s	0.335s	0.405s	1m16.155s
671 chars	19m2.713s	0.390s	0.421s	1m26.287s
1671 chars	ages	0.581s	0.477s	1m34.930s
9796 chars	ages	2.281s	0.933s	5m8.059s
22186 chars	ages	5.521s	1.630s	7m35.694s

These figures were measured on my rather weak MacBook Air (1.6 GHz, the old one) using Ruby 1.9.2-p136. The exact figures don’t matter much; what we’re after here is the big-O picture:

As you can see, treetop and parslet more or less fight it out along the X axis. Here’s another graph that shows that region in more detail:
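
The crossover can also be read straight off the table. The sketch below only redoes arithmetic on the parslet-1.1 and treetop columns (no new measurements): the per-size time ratio, plus a two-point estimate of the exponent k in t ≈ c·n^k over the two largest inputs. Two points are a crude basis for a big-O claim, and any fixed per-run cost is folded into these numbers; `growth_exponent` is a hypothetical helper name.

```ruby
# parslet-1.1 and treetop timings copied from the table above, in seconds.
TIMINGS = [
  # [input chars, parslet-1.1, treetop]
  [185,   0.291, 0.644],
  [361,   0.335, 0.405],
  [671,   0.390, 0.421],
  [1671,  0.581, 0.477],
  [9796,  2.281, 0.933],
  [22186, 5.521, 1.630]
].freeze

# A ratio above 1.0 means parslet-1.1 was slower than treetop at that size.
TIMINGS.each do |chars, parslet, treetop|
  puts format('%6d chars  parslet/treetop = %.2f', chars, parslet / treetop)
end

# Two-point estimate of k in t = c * n**k between sizes n1 and n2.
def growth_exponent(n1, t1, n2, t2)
  Math.log(t2 / t1) / Math.log(n2.to_f / n1)
end

(n1, p1, tt1), (n2, p2, tt2) = TIMINGS.last(2)
puts format('parslet-1.1 k = %.2f', growth_exponent(n1, p1, n2, p2))
puts format('treetop     k = %.2f', growth_exponent(n1, tt1, n2, tt2))
```

With these numbers the ratio crosses 1.0 between 671 and 1671 characters, and over the last two sizes parslet’s exponent comes out slightly above 1 while treetop’s stays well below it.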

Conclusions

There is a big difference between versions 1.0 and 1.1 of parslet. In fact, I think it is now big enough to warrant a release. I’ll continue to work on performance. The numbers indicate that parslet loses out to treetop on big inputs because it allocates a lot of small objects. I might find ways around that.
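
The post doesn’t say how the allocation hot spot was found. On a modern Ruby (2.2 or later, not the 1.9.2 used for the benchmarks), one cheap way to check a hypothesis like this is to diff `GC.stat(:total_allocated_objects)` around a piece of work. The helper name `allocations` and the string-splitting workload are placeholders; in practice the block would be a parse call.

```ruby
# Counts objects allocated while the block runs (GC.stat key exists in Ruby 2.2+).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Placeholder workload: lots of small, short-lived strings.
source = "chunky bacon " * 1_000
count = allocations { source.split(' ').map(&:upcase) }
puts "allocated #{count} objects"
```

Running the same measurement against two parser versions on the same input gives a rough, allocation-only comparison that is less noisy than wall-clock time.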

One of the ideas I’m tossing around is to optimize not only the way work is done, but also the amount of work itself. Imagine a tiny part of a grammar that is not too unlikely to occur in practice:


  str('chunky') >> str('bacon')

Quite obviously, a walk over the grammar could simplify this to


  str('chunkybacon')

This pays off on several levels. There is now just one parslet to test against, which means less code to execute. Input is read in bigger chunks, which should also help performance. And finally, an optimized grammar might even produce better error messages. Did I mention that you can now visit the grammar tree inside parslet by requiring parslet/atoms/visitor? So things are ready for this step; it’s just that I want to get the other changes out to the public, so to speak, before starting something new.
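
A grammar walk of that kind can be sketched in a few lines. The code below uses its own hypothetical atom structs (`Str`, `Seq`, `Alt`) rather than parslet’s classes, and a hypothetical `merge_strings` pass: it rebuilds the tree bottom-up, flattens nested sequences (sequencing is associative) and fuses runs of adjacent string atoms. Against a real parslet grammar such a pass would also have to respect named captures (`.as(:name)`), which this sketch does not model.

```ruby
# Hypothetical grammar atoms, standing in for a real grammar tree.
Str = Struct.new(:text)
Seq = Struct.new(:parts)
Alt = Struct.new(:alternatives)

# Returns an equivalent tree in which adjacent string atoms are fused,
# e.g. Seq[Str("chunky"), Str("bacon")] becomes Str("chunkybacon").
def merge_strings(atom)
  case atom
  when Str
    atom
  when Alt
    Alt.new(atom.alternatives.map { |a| merge_strings(a) })
  when Seq
    # Optimize children first, then splice nested sequences into this one.
    parts = atom.parts.flat_map do |part|
      merged = merge_strings(part)
      merged.is_a?(Seq) ? merged.parts : [merged]
    end
    # Fuse each run of adjacent Str atoms into a single Str.
    fused = parts.each_with_object([]) do |part, acc|
      if part.is_a?(Str) && acc.last.is_a?(Str)
        acc[-1] = Str.new(acc.last.text + part.text)
      else
        acc << part
      end
    end
    fused.size == 1 ? fused.first : Seq.new(fused)
  else
    raise ArgumentError, "unknown atom: #{atom.inspect}"
  end
end

p merge_strings(Seq.new([Str.new('chunky'), Str.new('bacon')]))
```

An alternative blocks fusion across it, since either branch may match; only strings that are directly adjacent in a sequence are merged.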

Parslet vs…

Comparing parslet against treetop or citrus on performance alone will always miss the point. Parsing speed certainly matters. But being able to make progress on your parser project without hitting roadblocks early matters just as much. I’ve gotten good feedback in that respect; it seems that what works for me also works for others. I am also happy that people on IRC (#parslet on freenode) have been helping each other out, (re)creating the friendly atmosphere I associate with Ruby.

I think parslet needs to make further progress in several directions. One of them will certainly be execution speed, and treetop’s code generation does seem to gain it a lot of ground there. But keep in mind that treetop also does less work: parslet keeps all those detailed error messages around for you to peruse. That’s got to count, right?

These are exciting times

And I am happy to be part of it. If you haven’t seen parslet’s quiet beauty yet, head to the project home page and check it out. If you have, please upgrade to 1.1: it contains bug fixes and a speedy core. Doing so will improve both your experience and your users’!