My First Useful Perl6 Grammar

Created 2015-10-18 / Edited 2015-10-18

Tags: Perl6

In my recent talks about perl6, one thing that I keep neglecting is Grammars. Part of that is because I hadn't used them yet!

I was putting together a talk on some clojure code, and was getting mad at my slide software not syntax-highlighting with magical rainbow parenthesis. Clojure being a LISP-like, it lacks some visual clues for structure and is WAY easier to glance at (in my opinion) when parenthesis are highlighted. I have vim doing this for me... but no dice in my slides.

So! Slice up slides, run them through vim, stitch them back together, and presto!

Complete code is at https://github.com/awwaiid/pinpoint-code-color.

I use Pinpoint as my go-to slide software. Simple and easy both to develop a deck and to present it. Each slide is a header and a body done in some Pango markup (more or less simplified HTML). Here is a 3 slide deck:

#!/usr/bin/env pinpoint

# Defaults for all slides
[blackbluefade.png]
[font=Sans 90]
[duration=50.00]
[fill]
[shading-opacity=0]

--

Some other slide content.

-- [font=monospace 99] [code=sh]

\#!/bin/sh

\# A lovely cgi application

echo "Content-type: text/html"
echo
echo "Hello, world! Query:"
echo $QUERY_STRING

-- [font=monospace 99] [code=python]

\# Python WSGI

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield 'Hello, world!'

We'll do this in two parts.

=== Part 1: The Grammar ==

#!/usr/bin/env perl6

grammar Pinpoint::Grammar {
token TOP {


+
}

token header { .*? <?before ^^ "--" > }

token slide {


}

token slide-header { ^^ "--" (\s* )* $$ \n }

token slide-setting { '' .+? '' }

token slide-content { .? <?before ^^ "--" > || . $ }
};


Using 'grammar' is kinda like some funky cross between a regex and a class. That is -- everything is declared like a named regex, but the result is effectively a class that has a ".parse" method we can use. We're giving the whole grammar a name, and then declaring the file structure in the body.

'TOP' is the start of the parse, which will be an entire file for me. There is one special slide that is the header-slide, and then after that some content slides. Though each bit has a name, you do a lot of things just like other regexes -- so "<slide>+" means one-or-more slides.

After that it just gets broken down further and further. Probably the trickiest bit is the <code><?before ^^ "--" ></code> pattern, which we see a few times. This says that if we see a "--" on the start of it's own line, then we've gone too far. Slide content has the "||" to say if we NEVER find the "--" then we should instead continue to the end of the file.

=== Part 2: The Processor ==
Now that we have a handy dandy grammar, we'll just slurp in whatever file/stdin we are given, parse, and the traverse the parse tree. There is a way to do this with Action objects (another class with methods for each of these tokens), but I didn't do that.

<code class="perl6">
my $match = Pinpoint::Grammar.parse(slurp());

print $match<header>;

for $match<slide>.flat -> $slide {
  my $h = ~$slide<slide-header>;
  $h ~~ s/\s*\[code\=(.*?)\]//;
  print $h;

  if ~$slide<slide-header> ~~ /\[code\=(.*?)\]/ {
    my $cmd = "code-htmlize $0";
    my $filter = shell $cmd, :in, :out;
    $filter.in.say(~$slide<slide-content>);
    $filter.in.close;
    say $filter.out.slurp-rest;
  } else {
    print $slide<slide-content>;
  }
}

To make this easy, I print out the bits of the parse tree as I go. the '$h' is a chopped up version of a slide-header, removing my new and magical code=foo slide setting. Now that I'm looking at this, I see that I didn't actually traverse into the slide-setting list for each slide... I guess that would be better than the search/replace I'm doing here.

A specific thing to note is that "" is forcing string-context here, so "$slide" is taking the object for the slide-header node we are on and dumps it out as a string of the content. This makes sense, though I initially found the "~" usage a bit ugly.

Once we get down to a slide that has a "code=foo" style setting in the header, instead of printing the slide content like we do otherwise we'll run it through an external shell command. The "code-htmlize" command is some other script I wrote that takes in code and a language and outputs an HTML-colorized version using vim. Here we do the equivalent of Open2, getting a filehandle for the input and output of the external command, printing to it's input, reading from its output.

That's it! A pretty readable thing that parses and traverses my slide software file format, filtering the results as it goes.