Better Scala syntax highlighting for Hakyll

I chose Hakyll over other static site generators for this blog in part because Hakyll is built on Pandoc, John MacFarlane’s fantastic document conversion library. Unfortunately Pandoc’s syntax highlighting for Scala is incredibly bad:

import scala.language.experimental.macros
import scala.reflect.macros.Context

object TupleExample {
  def fill[A](arity: Int)(a: A): Product = macro fill_impl[A]
  def fill_impl[A](c: Context)(arity: c.Expr[Int])(a: c.Expr[A]) = {
    import c.universe._

    arity.tree match {
      case Literal(Constant(n: Int)) if n < 23 => c.Expr(
        Apply(Select(Ident("Tuple" + n.toString), "apply"), List.fill(n)(a.tree))
      )
      // Not going to worry about not getting what we expect.
    }
  }
}

Note in particular that the line defining fill is entirely gray—at the very least I’d expect the types to be highlighted somehow, and macro to be treated as a keyword. I also have no idea why you’d want the first piece of the package path in an import statement to be a different color from the rest of the path.

Pandoc uses the Kate editor’s syntax files, and Kate’s default syntax file for Scala is derived from one that’s included in the Scala distribution. I got no responses to a query about better Kate syntax files for Scala, so we’ll have to make our own.

We’ll start by grabbing a copy of Kate’s version of the file, which is probably somewhere like this on your system:

/usr/share/apps/katepart/syntax/scala.xml

The first thing you notice when you look at this file is that it starts off with several thousand class names:

<highlighting>
  <list name="scala2">
    <item> Actor </item>
    <item> ActorProxy </item>
    <item> ActorTask </item>
    ...

My guess is that these are there so that Kate can indicate when a class comes from the standard Scala or Java libraries (although then I don’t know why, for example, Fluid would be in the Scala section). I don’t particularly want my syntax highlighting to worry about that kind of thing, and Pandoc ignores the contexts that use these lists anyway, since they only add extra styling on top of the default style and Pandoc apparently doesn’t pass that along.

So we just scrap them. A little further down there’s a similar list of Java’s names for primitives (the lowercase int, boolean, and so on). That can go, too. Now our syntax file is 3,351 lines lighter, and it’s easier to see how to add our macro keyword: it’s just another item in the keyword list.
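One nice property of putting macro in a keyword list rather than writing a regular expression for it is that, as I understand Kate’s keyword rule, a list item only matches a whole word, so the macros in scala.language.experimental.macros won’t light up. Here’s a rough Scala model of that behavior. The KeywordListSketch object, its keywordsIn method, and the short keyword set are all made up for illustration; the real list in scala.xml is much longer.

```scala
// Hypothetical model of a Kate keyword list: a word gets the Keyword attribute
// only if the *whole* word is in the list. The set below is a tiny made-up
// subset, not the actual contents of scala.xml.
object KeywordListSketch {
  val keywords: Set[String] =
    Set("def", "object", "import", "match", "case", "if", "macro")

  // Splits the line on runs of non-word characters (spaces, dots, brackets,
  // colons, ...) and returns, in order, the words found in the keyword list.
  def keywordsIn(line: String): List[String] =
    line.split("""\W+""").toList.filter(keywords.contains)
}
```

For the fill signature this picks out def and macro. For the import line it only picks out import, because macros is a different word from macro. (This model ignores the separate handling of import statements.)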

Kate’s Highlight Definition XML format is a little more verbose than, say, Vim’s syntax language, but it’s pretty straightforward. To handle our import problem, for example, we just add a line like this to the Normal context:

<RegExpr attribute="Keyword" context="Imports" String="\b(package|import)\b" />

And then a new context like this:

<context attribute="Normal Text" lineEndContext="#pop" name="Imports">
  <DetectChar attribute="String" context="#pop" char=";"/>
</context>

And that’s all: after the package or import keyword, the rest of the statement now gets normal text formatting up until a semicolon or the end of the line.
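To make the push and pop behavior concrete, here’s a small Scala sketch that models just these two rules for a single line. This isn’t how Kate or highlighting-kate actually process the file. The ImportsContextSketch name, the highlight method, and the (attribute, text) output format are invented for illustration, and every other rule in the Normal context is ignored.

```scala
import scala.util.matching.Regex

// Hypothetical model of the package/import rule above: the Normal-context
// RegExpr pushes "Imports", and Imports pops on ';' or at the end of the line.
// Every other rule from scala.xml is ignored.
object ImportsContextSketch {
  private val keyword: Regex = """\b(package|import)\b""".r

  // Returns (attribute, text) spans for a single line.
  def highlight(line: String): List[(String, String)] = {
    val spans = keyword.findFirstMatchIn(line) match {
      case None => List("Normal Text" -> line)
      case Some(m) =>
        val rest = line.substring(m.end)
        val semi = rest.indexOf(';')
        // Inside Imports the whole path is plain Normal Text. A ';' gets the
        // String attribute and pops back to Normal; with no ';', the
        // lineEndContext="#pop" ends the context at the end of the line.
        val tail =
          if (semi < 0) List("Normal Text" -> rest)
          else List(
            "Normal Text" -> rest.substring(0, semi),
            "String" -> ";",
            "Normal Text" -> rest.substring(semi + 1) // back in Normal
          )
        ("Normal Text" -> line.substring(0, m.start)) :: ("Keyword" -> m.matched) :: tail
    }
    spans.filter(_._2.nonEmpty) // drop empty spans
  }
}
```

For import scala.reflect.macros.Context this yields a Keyword span for import followed by one Normal Text span covering the entire dotted path, so the first segment no longer gets its own color. The \b anchors also keep an identifier like importer from triggering the rule.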

To highlight types, we start by adding a couple of lines to our Normal context again:

<DetectChar attribute="Symbol" context="TypeParamTop" char="["/>
<RegExpr attribute="Symbol" context="TypeAnnot" String="[^:]?:\s"/>

And three new contexts to go with them:

<context attribute="Data Type" lineEndContext="#stay" name="TypeParamTop">
  <DetectChar attribute="Data Type" context="TypeParam" char="["/>
  <DetectChar attribute="Symbol" context="#pop" char="]"/>
</context>
<context attribute="Data Type" lineEndContext="#stay" name="TypeParam">
  <DetectChar attribute="Data Type" context="TypeParam" char="["/>
  <DetectChar attribute="Data Type" context="#pop" char="]"/>
</context>
<context attribute="Data Type" lineEndContext="#stay" name="TypeAnnot">
  <DetectChar attribute="Data Type" context="TypeParam" char="["/>
  <RegExpr attribute="Data Type" context="#stay" String="\w+"/>
  <RegExpr attribute="Data Type" context="#stay" String="\s=>\s"/>
  <RegExpr attribute="Normal Text" context="#pop" String="[\)|\s]"/>
</context>
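The interplay between these contexts is easiest to see by stepping through a line. The Scala sketch below is a hypothetical, heavily simplified model (the TypeContextSketch name and dataTypeRuns method are invented here). It applies only the two new Normal rules and the three type contexts and tries rules in the order they appear in the XML. It also assumes, as I understand Kate’s engine to work, that a character no rule matches gets the current context’s own attribute. The output is the list of stretches of the line that would come out as Data Type.

```scala
import scala.util.matching.Regex

// Hypothetical, simplified model of the type rules above. Only the two new
// Normal rules and the TypeParamTop / TypeParam / TypeAnnot contexts are
// modeled; every other rule in scala.xml is ignored. A character that no rule
// matches gets the context's own attribute (assumed to mirror Kate).
object TypeContextSketch {
  private val annotStart: Regex = """[^:]?:\s""".r // Normal: enter TypeAnnot
  private val word: Regex = """\w+""".r            // TypeAnnot: a type name
  private val arrow: Regex = """\s=>\s""".r        // TypeAnnot: function arrow
  // TypeAnnot exit. Inside a character class "|" is a literal pipe, so this
  // matches ')', '|' or whitespace.
  private val annotEnd: Regex = """[\)|\s]""".r

  /** Returns the maximal stretches of `line` that get the Data Type attribute. */
  def dataTypeRuns(line: String): List[String] = {
    val isType = Array.fill(line.length)(false)
    var stack = List("Normal") // head is the current context
    var i = 0

    // Give the next `len` characters an attribute and advance past them.
    def mark(len: Int, asType: Boolean): Unit = {
      for (j <- i until i + len) isType(j) = asType
      i += len
    }
    // Length of a match of `r` anchored at the current position, if any.
    def at(r: Regex): Option[Int] = r.findPrefixOf(line.substring(i)).map(_.length)

    while (i < line.length) {
      val c = line.charAt(i)
      stack.head match {
        case "Normal" =>
          if (c == '[') { mark(1, asType = false); stack = "TypeParamTop" :: stack }
          else {
            at(annotStart) match {
              case Some(n) => mark(n, asType = false); stack = "TypeAnnot" :: stack
              case None    => mark(1, asType = false)
            }
          }
        case "TypeParamTop" | "TypeParam" =>
          if (c == '[') { mark(1, asType = true); stack = "TypeParam" :: stack }
          else if (c == ']') {
            // TypeParamTop closes with Symbol, a nested TypeParam with Data Type.
            val closesNested = stack.head == "TypeParam"
            mark(1, asType = closesNested)
            stack = stack.tail
          } else mark(1, asType = true)
        case "TypeAnnot" =>
          if (c == '[') { mark(1, asType = true); stack = "TypeParam" :: stack }
          else {
            at(word).orElse(at(arrow)) match {
              case Some(n) => mark(n, asType = true)
              case None if at(annotEnd).isDefined =>
                mark(1, asType = false) // Normal Text, then #pop
                stack = stack.tail
              case None => mark(1, asType = true)
            }
          }
      }
    }

    // Collect contiguous runs of Data Type characters.
    val runs = List.newBuilder[String]
    var start = -1
    for (j <- 0 to line.length) {
      val inType = j < line.length && isType(j)
      if (inType && start < 0) start = j
      else if (!inType && start >= 0) { runs += line.substring(start, j); start = -1 }
    }
    runs.result()
  }
}
```

On the fill signature this gives A, Int, A, Product, and A. On val m: Map[String, List[Int]] = ??? the whole nested type comes out as a single run, because the TypeParam context keeps the annotation open across the comma and space until the matching ] is seen.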

Now for the unpleasant part: getting Pandoc to see this syntax file. Pandoc uses a library called highlighting-kate (also developed by MacFarlane) that actually compiles Kate syntax files to Haskell code. This means that we’re going to have to rebuild both the highlighting-kate and pandoc packages.

We start by unpacking the source for highlighting-kate (note that these instructions are based in part on a post by John Baker that explains how to add syntax highlighting for J to Pandoc):

cabal unpack highlighting-kate-0.5.3.8

Note that I’m using hakyll-4.2.2.0, which depends on pandoc-1.11.1, which depends on highlighting-kate-0.5.3.8. Your versions may vary.

The next step is to copy your new scala.xml file into the xml subdirectory of the highlighting-kate-0.5.3.8 directory you just unpacked. Then you build the thing, generate the Haskell code from the syntax files, and rebuild:

cd highlighting-kate-0.5.3.8/
cabal configure
cabal build
runhaskell ParseSyntaxFiles.hs xml/
cabal build
cabal copy

And next for pandoc itself:

cabal unpack pandoc-1.11.1
cd pandoc-1.11.1/
cabal configure
cabal build
cabal copy

Now you just rebuild your Hakyll project, and bingo:

import scala.language.experimental.macros
import scala.reflect.macros.Context

object TupleExample {
  def fill[A](arity: Int)(a: A): Product = macro fill_impl[A]
  def fill_impl[A](c: Context)(arity: c.Expr[Int])(a: c.Expr[A]) = {
    import c.universe._

    arity.tree match {
      case Literal(Constant(n: Int)) if n < 23 => c.Expr(
        Apply(Select(Ident("Tuple" + n.toString), "apply"), List.fill(n)(a.tree))
      )
      // Not going to worry about not getting what we expect.
    }
  }
}

It’s not perfect, but it is vastly better, and I don’t really feel like spending any more time this evening fussing with a syntax highlighter for an editor I don’t even use. You can watch this space for future improvements.