Parslet and its friends
Parslet is a small parser framework for Ruby. It uses the PEG formalism for its grammars, and its not alone to do so. I have been asked to compare parslet to Treetop and Citrus. For raw parsing speed. With complicated grammars.
I’ve generated treetop and citrus versions from rkhs Ansi Smalltalk grammar. As a byproduct, parslet now exposes visitors for the grammar and can generate these two formats as string.
Benchmarks have been run using the above grammar with varying input sizes. This matters a lot for this grammar, since it nests very deep. Speed is not going to vary linearly with size of input in some cases.
I’ve included two versions of parslet, the current one and the very soon to be released (these days in fact) 1.1.0. The difference between the two is about one month of performance optimization.
Treetop and citrus enter the competition with the latest released gems. (1.4.9 and 2.3.4 respectively) I am putting the code that generates these grammars into the next release of parslet, so these benchmarks can be rerun once either one changes.
Here are the pure numbers:
Input size | parslet-1.0 | parslet-1.1 | treetop | citrus |
185 chars | 23.644s | 0.291s | 0.644s | 2.503s |
361 chars | 14m28.970s | 0.335s | 0.405s | 1m16.155s |
671 chars | 19m2.713s | 0.390s | 0.421s | 1m26.287s |
1671 chars | ages | 0.581s | 0.477s | 1m34.930s |
9796 chars | ages | 2.281s | 0.933s | 5m8.059s |
22186 chars | ages | 5.521s | 1.630s | 7m35.694s |
These figures have been measured on my weakly MacBook Air (1.6 GHz, the old one) using Ruby 1.9.2 p136. The exact figures don’t matter much, we’re after a graphic of big O here:
As you can see, treetop and parslet kind of fight it out along the X axis. Here’s another graphic that displays the relevant details of that:
Conclusions
There is a big difference between the 1.0 and 1.1 version of parslet. In fact, I think it is now big enough to warrant a release. I’ll continue to do performance optimization. The numbers indicate that parslet looses out versus treetop for big inputs because it allocates a lot of small objects. I might find ways around that.
One of the ideas I’m tossing around is to optimize not only the way work is being done, but the amount of work itself. If you imagine a tiny part of a grammar that is still not too unlikely to occur in reality:
str('chunky') >> str('bacon')
It is quite obvious that this could be simplified by a grammar walk to
str('chunkybacon')
This impacts on several levels. There is now just one parslet to test against,
which means less code to execute. Input is read in bigger chunks; this will
also benefit performance. And finally, error messages might even be improved
by an optimized version of the grammar. Did I mention that you can now visit
the grammar tree inside parslet by requiring
parslet/atoms/visitor
? So things are ready for this step, its
just that I want to flush the other changes to the public, so to speak, before
starting something new.
Parslet vs…
Comparing parslet against treetop or citrus based on performance alone will
always miss the point. Certainly parsing speed matters. But being able to
progress with your parser project and not hit roadblocks early matters just as
much. I’ve gotten good feedback in that respect; it seems that what works for
me also worked for others. Also, I am happy that people on irc
(#parslet
on freenode) have been helping each other out,
(re)creating the friendly atmosphere I associate with Ruby.
I think parslet needs to make further progress in several directions. One direction will certainly be execution speed. And treetops code generation really does a lot, at least so it seems. But realize, treetop does less work as well; parslet keeps all these detailed error messages around for you to peruse. That’s got to count, right?
These are exciting times
And I am happy to be part of it. If you haven’t seen parslets quiet beauty yet, you should head to the project home page and check it out. If you have, please upgrade to 1.1 – it contains bug fixes and a speedy core. Doing so will improve both yours and your users experience!