metaplasmus

  • Best new (to me) fiction of 2016

    24 December 2016
    fiction | books

    At least as far as my reading went, 2016 wasn’t a great year for fiction, but I did have more time for reading fiction than I’ve had since I was a grad student and that was my job. This isn’t the usual kind of thing I do with this blog, but here are thirty or so one-or-two-sentence reviews of some of the novels that I read for the first time this year (sorted roughly in decreasing order of the importance they had for me).

  • A note about LambdaConf

    24 March 2016
    scala | lambdaconf

    Yesterday morning John De Goes published a blog post explaining why he and the other LambdaConf organizers decided not to uninvite an outspoken defender of slavery and lots of other vile stuff from their conference.

    If you think that keeping Yarvin on the program was the right choice, then this post isn’t for you. I think you’re wrong, and it’s pretty likely I also think your reasons are bad, predictable, and boring. John’s blog post is at least not predictable or boring, even though there are plenty of problems with the process he describes, and even though it’s really hard to read his anxiety about the power of “social media”, etc. as anything but more Moldbuggian fretting over lost privilege.

    What this post is about is how good last year’s LambdaConf was. It’s the only LambdaConf I’ve been to, and it will probably be the only one I ever go to now, but it was also possibly the best tech conference I’ve ever attended. They seemed to get so many things right and to have thought carefully about so many things: the balance of the program, the clarity about the code of conduct, the on-site child care, the emphasis on not organizing every extracurricular event around alcohol, and so on. I’m not sure I’ve ever seen another conference note the availability of all-gender restrooms on their website. At least from my perspective, none of these efforts felt perfunctory; I got the impression that the organizers cared deeply about creating a welcoming environment.

    This was especially important to me at last year’s conference because in May 2015 the Scala community was even more of a disaster than usual. Scalaz had recently withdrawn from Typelevel, the Scalaz leadership had been rearranged, and I’d been in some fairly unpleasant arguments with a couple of the LambdaConf speakers and more than a couple of the other attendees. I was nervous about what this would mean for the tone of the conference, but somehow it was a non-issue—I personally saw nothing but civility and a lot more good will than I expected, and I believe the LambdaConf organizers deserve at least part of the credit for that.

    This is why I find yesterday’s decision so frustrating—I know people do shitty, inconsistent, exclusionary things all the time, especially in the tech industry, but this is like watching a friend do something particularly shitty, inconsistent, and exclusionary.

  • JSON numbers in circe 0.3.0

    14 February 2016
    scala | circe | json | argonaut

    I’m publishing this article as a blog post (rather than as part of the circe documentation) because it’s intended to be a discursive overview of the representation of JSON numbers in circe 0.3.0, not an up-to-date description of the implementation in whatever the current circe version happens to be when you’re reading this. For information about JSON numbers in circe versions after 0.3.0, please see the project documentation (in particular the changelogs and API docs).

  • Configuring generic derivation

    14 January 2016
    scala | shapeless | circe | json | macros

    This post is a kind of sequel to my previous article on type classes and generic derivation, although unfortunately there’s a lot of intermediate content that should go between there and here that I haven’t written yet. This post introduces a new feature in circe that I’m pretty excited about, though, so I’m not going to worry about skipping over that stuff for now.

  • Yet another iteratee library

    8 January 2016
    scala | haskell | cats | iteratees | scalaz

    I’ll start with the story of how I got saved, since it’s kind of relevant. Back when I was an English Ph.D. student, I worked on a number of projects that involved natural language processing, which meant doing a lot of counting trigrams or whatever in tens of thousands of text files in giant messy directory trees. I was working primarily in Ruby at the time, after years of Java, and at least back in 2008 it was a pain in the ass to do this kind of thing in either Ruby or Java. You really want a library that provides the following features:

    1. Resource management: you don’t want to have to worry about running out of file handles.
    2. Streaming: you shouldn’t ever have to have all of the data in memory at once.
    3. Fusion: two successive mapping operations shouldn’t need to traverse the data twice.
    4. Graceful error recovery: these tasks are all off-line, but you still don’t want to have to restart a computation that’s been running for ten minutes just because the formatting in one file is wrong.
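    None of the first three properties strictly requires iteratees. As a rough sketch using only the Scala 2.13 standard library (not iteratees, and not code from this post), here is a trigram counter that pulls lines from a `scala.io.Source` on demand, fuses its transformations into a single pass over an `Iterator`, and closes the source through `scala.util.Using` whether or not counting succeeds. `TrigramSketch` and `countTrigrams` are hypothetical names.

    ```scala
    import scala.io.Source
    import scala.util.Using

    object TrigramSketch {
      // Hypothetical helper: counts word trigrams in one lazy pass over `source`.
      // Using.resource closes the source when the body finishes or throws.
      def countTrigrams(source: Source): Map[List[String], Int] =
        Using.resource(source) { src =>
          src.getLines()                                    // lines are pulled on demand
            .flatMap(_.split("\\s+").iterator.filter(_.nonEmpty))
            .map(_.toLowerCase)                             // fused with the surrounding steps
            .sliding(3)
            .withPartial(false)                             // drop a group shorter than 3
            .foldLeft(Map.empty[List[String], Int]) { (counts, window) =>
              val trigram = window.toList
              counts.updated(trigram, counts.getOrElse(trigram, 0) + 1)
            }
        }
    }
    ```

    The sketch doesn’t address the fourth point; per-file failures could be isolated with something like `scala.util.Try` around each call.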

    Maybe there was such a library for Ruby or Java back then, but if there was I didn’t know about it. I did have some experience with Haskell, though, and at some point in 2010 I heard about iteratees, and they were exactly what I’d always wanted. I didn’t really understand how they worked at first, but with iteratee (and later John Millikin’s enumerator) I was able to write code that did what I wanted and didn’t make me think about stuff I didn’t want to think about. I started picking Haskell instead of Ruby for new projects, and that’s how I accepted statically-typed functional programming into my life.

  • Error accumulating decoders in circe

    17 December 2015
    scala | json | circe

    Suppose we’re writing a service that accepts JSON requests and returns some kind of response. If there’s a problem with a request—it’s not even valid JSON, it doesn’t match the schema we expect, etc.—we want to return an error, and of course it’d be nice if these errors were actually useful to the caller.

    Unfortunately “useful” in this context can mean lots of different things, and the differences will usually depend at least in part on how involved a human was in creating the request. In the case of validation errors (i.e. we successfully received some JSON, but it’s not a shape we understand), if there’s no human in sight, we generally only need to say something like “hey, we’re not even speaking the same language, you should probably go try somewhere else”. A detailed breakdown of all the reasons we don’t understand the request is unlikely to be useful, so we might as well fail as fast as possible and save resources.

    If on the other hand a human was responsible for the content of the request, it’s possible that the caller will be able to make use of detailed information about all the problems with that content. Suppose for example that the JSON comes from a web form or spreadsheet that for whatever reason needs to be at least partially validated on the server side. In this case we probably don’t want to fail fast—we want to accumulate all of the errors and send them back together, so that the human can correct them in a single pass.
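    The difference between the two policies can be seen without any JSON library at all. The following sketch is plain Scala 2.13, not circe’s decoder API: it models a decoded request as a hypothetical `Map[String, String]` and runs the same hypothetical checks either fail-fast or accumulating.

    ```scala
    object ErrorPolicies {
      // Hypothetical stand-in for a decoded request: field name -> raw value.
      type Request = Map[String, String]

      // Hypothetical checks; each returns Some(message) when the request fails it.
      val checks: List[Request => Option[String]] = List(
        (r: Request) =>
          if (r.get("name").exists(_.nonEmpty)) None else Some("name: missing or empty"),
        (r: Request) =>
          if (r.get("email").exists(_.contains("@"))) None else Some("email: missing '@'"),
        (r: Request) =>
          if (r.get("age").flatMap(_.toIntOption).exists(_ >= 0)) None
          else Some("age: not a non-negative integer")
      )

      // Fail fast: checks run lazily and stop at the first failure.
      def failFast(r: Request): Either[String, Request] =
        checks.iterator.map(check => check(r)).collectFirst { case Some(err) => err }.toLeft(r)

      // Accumulate: run every check and report all failures together.
      def accumulate(r: Request): Either[List[String], Request] =
        checks.flatMap(check => check(r)) match {
          case Nil  => Right(r)
          case errs => Left(errs)
        }
    }
    ```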

  • Type classes and generic derivation

    8 November 2015
    scala | shapeless

    Yesterday I wrote a Stack Overflow answer about using Shapeless for generic derivation of type class instances, and this morning I started putting together some new documentation for circe’s generic derivation, and after a few paragraphs I decided that it might make sense to write a blog post that could serve as a bridge between the two—between simple examples like the one in my answer (which doesn’t really go into motivation, etc.) and the kinds of things we’re doing in circe. I’ll start more or less from scratch, assuming only some basic familiarity with Scala syntax.
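    As a minimal sketch of the starting point (standard Scala, no Shapeless, and not circe’s `Encoder`): a type class, a couple of base instances, and an instance for pairs that is derived from the instances for its components. The `Encode` type class and the `Point` case class are hypothetical. The `Point` instance converts to a tuple by hand; Shapeless’s `Generic` is what lets a library automate a similar conversion (to an HList rather than a tuple) for arbitrary case classes.

    ```scala
    object TypeClassSketch {
      // A type class: evidence that values of A can be rendered as text.
      trait Encode[A] {
        def encode(a: A): String
      }

      object Encode {
        // Summoner: Encode[A] looks up the implicit instance for A.
        def apply[A](implicit instance: Encode[A]): Encode[A] = instance

        implicit val intEncode: Encode[Int] = (a: Int) => a.toString
        implicit val stringEncode: Encode[String] = (a: String) => "\"" + a + "\""

        // Derived instance: a pair is encodable whenever both components are.
        implicit def pairEncode[A, B](implicit ea: Encode[A], eb: Encode[B]): Encode[(A, B)] =
          (p: (A, B)) => s"[${ea.encode(p._1)}, ${eb.encode(p._2)}]"
      }

      final case class Point(x: Int, label: String)

      object Point {
        // Hand-written conversion to a tuple, reusing the derived pair instance.
        implicit val pointEncode: Encode[Point] =
          (p: Point) => Encode[(Int, String)].encode((p.x, p.label))
      }
    }
    ```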

  • Goodbye, Twitter

    13 October 2015
    scala | twitter | personal

    Today is my last day at Twitter (well, kind of—my email access was shut off this morning, so I’ve been spending the day reading a novel).

    I’m not sure how I feel about this yet. After some very recent changes in the Twitter Open Source team, I’d been vaguely thinking about leaving the company anyway, and since I’m not a proud person (at least I don’t think I am), I have no complaints about that happening involuntarily if it means a severance package. We’ll see whether Jack and I agree about the “generous” part.

    If I tried to recruit you to work at Twitter recently, my apologies—I had no idea this was coming. I still think it’s a great place to work, even though (like most of the rest of the world) I’m not sure how much confidence I have in its leadership or product direction. There are lots of incredibly smart and generous people at Twitter. In fact in my experience pretty much everyone there at least three or four steps from the top of the org chart fits that description. You should probably work there if you get the chance.

    I was told literally nothing about Twitter’s plans for the Twitter Open Source team, which for the past couple of weeks has consisted of two people (counting me). If you’re using Twitter open source software, you’re probably okay, especially if you’re using a project like Finagle that’s owned by a team that cares a lot about open source (and the Core System Libraries team does). Because the Open Source team has never been larger than three people, individual engineering teams have always been primarily responsible for maintaining their own open source projects, and I’m assuming that’s not going to change.

    I’ll miss pretending to be Finagle on Twitter, but I know I’m leaving @finagle in good hands.

    If you’re reading this, you probably have some idea about the kind of work I like to do. If you’ve got that kind of work, get in touch (preferably on Twitter). I’m planning to take off at least a couple of months to read, work on open source projects (including Finch!), and maybe learn some more Rust or Idris or something, but I’m keeping my eyes open.

    If I worked with you at Twitter, or if you attended my Twitter University courses, thanks—it was a great seventeen months.

  • Roll-your-own Scala

    11 July 2015
    scala | scalaz | cats

    I’ve always really liked this passage from On the Genealogy of Morals:

    [T]here is a world of difference between the reason for something coming into existence in the first place and the ultimate use to which it is put, its actual application and integration into a system of goals… anything which exists, once it has come into being, can be reinterpreted in the service of new intentions, repossessed, repeatedly modified to a new use by a power superior to it.

    A couple of months ago at LambdaConf I had a few conversations with different people about why we like (or at least put up with) Scala when there are so many better languages out there. Most of the answers were the usual ones: the JVM, the ecosystem, the job market, the fact that you don’t have to deal with Cabal, etc.

    For me it’s a little more complicated than that. I like Scala in part because it’s a mess. It’s not a “fully” dependently typed language, but you can get pretty close with singleton types and path dependent types. It provides higher-kinded types, but you have to work around lots of bugs and gaps and underspecified behaviors to do anything very interesting with them. And so on—it’s a mix of really good ideas and a few really bad ideas and you can put them together in ways that the language designers didn’t anticipate and probably don’t care about at all.
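    To make the “pretty close” claim a little more concrete, here is a small sketch (not from the post) of a dependent method type built on a path-dependent type member: the static return type of `lookup` depends on which value is passed in. `Key`, `Port`, `Host`, and `lookup` are hypothetical names.

    ```scala
    object DependentSketch {
      // Each key carries its own value type as a type member.
      trait Key {
        type Value
        def default: Value
      }

      object Port extends Key {
        type Value = Int
        val default: Int = 8080
      }

      object Host extends Key {
        type Value = String
        val default: String = "localhost"
      }

      // Dependent method type: the result type is the argument's own Value.
      def lookup(k: Key): k.Value = k.default
    }
    ```

    An assignment like `val port: Int = lookup(Port)` typechecks because the result type is `Port.Value`, which the compiler knows to be `Int`.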

    The rest of this blog post will be a long story about one example of this kind of thing involving Scalaz’s UnapplyProduct.

  • Deriving incomplete type class instances

    21 June 2015
    scala | finch | shapeless | scalaz

    Suppose we’ve got a simple representation of a user:

    case class User(id: Long, name: String, email: String)

    Now suppose we’re writing a web service where we allow clients to post some JSON to a resource to create a new user. We get to pick the id, not the client, so we might accept something like this:

    {
      "name": "Foo McBar",
      "email": "foo@mcbar.com"
    }

    If we’re using a type class-based JSON library like Argonaut, we’ll probably have written a codec instance for User (or we may be using a library like argonaut-shapeless that derives instances for our case classes automatically).

    The problem is that our User codec won’t work on JSON like the example above (since it’s missing the id field).
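    One way to frame the problem, sketched here without Argonaut or Shapeless: what the request body can give us is not a `User` but a function from the missing `id` to a `User`. In this sketch a hypothetical `Map[String, String]` stands in for the decoded JSON and `decodeWithoutId` is a hypothetical helper; a real codec would work on the library’s JSON type instead.

    ```scala
    object IncompleteSketch {
      final case class User(id: Long, name: String, email: String)

      // Decode every field except id; the caller supplies the id afterwards.
      def decodeWithoutId(fields: Map[String, String]): Option[Long => User] =
        for {
          name  <- fields.get("name")
          email <- fields.get("email")
        } yield { (id: Long) => User(id, name, email) }
    }
    ```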

  • Foldable considered confusing

    4 July 2014
    scala | haskell

    Tomasz Nurkiewicz recently published an article arguing that the fold on Option (new in Scala 2.10) is unreadable and inconsistent and should be avoided. I personally disagree about the unreadability part and the should be avoided part, which isn’t too surprising, since I generally disagree with Tomasz. I have a lot of respect for him, though, and I can actually get on board with a good chunk of what he says in the article. I can understand why you might want to avoid fold in some projects, and I agree that the way the standard library provides folds is inconsistent and frustrating—I still get a little angry when I think about the fact that Try doesn’t have a fold even though it was introduced in the same version of the language that gave us the fold on Option.

    So this post isn’t about why Tomasz is wrong about readability, etc.—it’s about how much I hate the name Foldable.
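    For reference, the `fold` on `Option` under discussion takes the value for the `None` case first and the function for the `Some` case second. A small sketch (the object and method names are hypothetical) comparing it with the equivalent `map`/`getOrElse` chain:

    ```scala
    object OptionFoldSketch {
      // fold(ifEmpty)(f): ifEmpty for None, f applied to the contents for Some.
      def describe(o: Option[Int]): String =
        o.fold("nothing")(n => s"got $n")

      // The same logic written with map and getOrElse.
      def describeWithMap(o: Option[Int]): String =
        o.map(n => s"got $n").getOrElse("nothing")
    }
    ```

    One practical wrinkle in Scala 2: because the result type is fixed by the first argument list, `Some(1).fold(Nil)(List(_))` doesn’t compile (the result type is inferred as `Nil.type`), so you’d write `fold(List.empty[Int])(List(_))` instead.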

  • Partitioning by constructor

    14 June 2014
    scala | shapeless | macros

    It’s not unusual in Scala to want to take a collection with items of some algebraic data type and partition its elements by their constructors. In this Stack Overflow question, for example, we’re given a type for fruits:

    sealed trait Fruit
    
    case class Apple(id: Int, sweetness: Int) extends Fruit
    case class Pear(id: Int, color: String) extends Fruit

    The goal is to be able to take a collection of fruits and split it into two collections: one of apples and one of pears.

    def partitionFruits(fruits: List[Fruit]): (List[Apple], List[Pear]) = ???

    It’s pretty easy to use collect to solve this problem for particular cases. It’s a little trickier when we start thinking about what a more generic version of such a method would look like: we want to take a collection of items of some arbitrary algebraic data type and return an n-tuple whose elements are collections of items of each of that ADT’s constructors (and let’s require them to be typed as specifically as possible, since this is Scala). It’s not too hard to imagine how you could write a macro that would perform this operation, but it’d be messy and would probably end up feeling kind of ad-hoc (at least without a lot of additional work and thinking).
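    For the specific `Fruit` type above, the `collect` version really is short. A minimal sketch using only the standard library (no Shapeless), with one traversal per constructor; the wrapping object name is hypothetical:

    ```scala
    object FruitSketch {
      sealed trait Fruit
      final case class Apple(id: Int, sweetness: Int) extends Fruit
      final case class Pear(id: Int, color: String) extends Fruit

      // Specific to Fruit: one collect per constructor, each keeping the precise type.
      def partitionFruits(fruits: List[Fruit]): (List[Apple], List[Pear]) =
        (fruits.collect { case a: Apple => a }, fruits.collect { case p: Pear => p })
    }
    ```

    Every new constructor means another `collect` and a wider tuple type, which is exactly the part a generic version has to automate.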

    Fortunately we’ve got Shapeless 2.0, where Miles Sabin and co. have written the macros for us so we can keep our hands clean.

  • Explicit defaults in Scala

    17 October 2013
    scala | macros

    This post is another entry in my series on stuff you should never do with macros in Scala, but that you could do with macros in Scala, if you really wanted to, and if you’d picked up a bottle of Macallan on the way home from work and were willing to waste half an evening doing something ridiculously useless.

    It’s specifically a response to this Stack Overflow question, which asks if it’s possible to specify explicitly that you want to use the default value of a constructor parameter in Scala.

    So suppose we have a class like this:

    class Foo(val x: String, val y: Int = 13, val z: Symbol = 'zzz)

    The goal is to allow the following syntax:

    val useDefaultZ = true
    
    new Foo(x = "whatever", y = 1, z = if (useDefaultZ) default else 'whatever)

    This is possible with macros, and it’s not nearly as easy as you might think.

  • Ordering case classes

    13 October 2013
    scala | shapeless | macros

    Scala provides lexicographic Ordering instances for TupleN types when all of the element types have Ordering instances. It doesn’t provide the same instances for case classes, however—probably just because lexicographic order isn’t what you want for case classes as often as it is for tuples.

    Sometimes you actually do want a lexicographic ordering for your case classes, though, and Scala unfortunately doesn’t provide any nice boilerplate-free way to create such orderings. This post will provide a quick sketch of two approaches to filling this gap: one using macros, and one using Shapeless 2.0’s new Generic machinery and the TypeClass type class.

    First, a case class to use as a running example, along with some instances:

    case class Foo(x: Int, y: String)
    
    val a = Foo(9, "x")
    val b = Foo(1, "z")
    val c = Foo(9, "w")
    
    val foos = List(a, b, c)

    Let’s quickly confirm that there’s no Ordering[Foo] already sitting around:

    scala> foos.sorted
    <console>:14: error: No implicit Ordering defined for Foo.
                  foos.sorted
                       ^

    Yep, we’re going to have to take care of this ourselves.
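    For comparison, the hand-written baseline: the standard library’s `Ordering.by` can reuse the lexicographic tuple ordering, at the cost of listing the fields by hand for every case class (the per-class boilerplate that both approaches aim to remove). A sketch using the running example, wrapped in a hypothetical `OrderingSketch` object:

    ```scala
    object OrderingSketch {
      case class Foo(x: Int, y: String)

      object Foo {
        // Lexicographic order on (x, y), delegating to the built-in Tuple2 ordering.
        implicit val fooOrdering: Ordering[Foo] = Ordering.by((f: Foo) => (f.x, f.y))
      }

      val a = Foo(9, "x")
      val b = Foo(1, "z")
      val c = Foo(9, "w")

      val foos = List(a, b, c)
    }
    ```

    With this instance in scope, `foos.sorted` yields `List(b, c, a)`: `b` comes first because 1 < 9, and `c` precedes `a` because "w" < "x".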

  • Natural vampires

    3 October 2013
    scala | macros

    This Stack Overflow question is interesting—it asks whether we can use Scala macros to create a value class for positive integers where the positiveness is checked at compile-time, and where it’s not possible to create an invalid instance.

    I’m pretty sure it’s not. My first thought was to turn PosInt into a sealed universal trait with a private value class implementation in the PosInt companion object, but inheriting from a universal trait forces us to give up most (all?) of the advantages of value classes in this case, and of course it’s not actually possible to make the value class private, anyway.

    So I don’t have an answer, but I do have a pretty neat trick involving vampire methods that gives us some of the benefits of value classes.

  • Quasiquotes for multiple parameter lists

    6 September 2013
    scala | macros

    Quasiquotation is an old idea (Miles Sabin notes the term in a 1937 Quine paper, for example) that’s now available in Scala (thanks to the efforts of Eugene Burmako and Den Shabalin), where it allows us to avoid the nightmarishly complicated and verbose code that’s required to construct abstract syntax trees manually in our macros.

    Quasiquotes are a little like reification, but much more flexible about what kinds of things can be “spliced” into the tree, and where they can be spliced. For example, we couldn’t use reify in the following code, because there’s no way to splice in the name of the type member:

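    import scala.language.experimental.macros
    import scala.reflect.macros.Context // the Scala 2.10 macro context
    
    // `constructor` is part of the additional utility code (not shown).
    def foo(name: String): Any = macro foo_impl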
    def foo_impl(c: Context)(name: c.Expr[String]) = {
      import c.universe._
    
      val memberName = name.tree match {
        case Literal(Constant(lit: String)) => newTypeName(lit)
        case _ => c.abort(c.enclosingPosition, "I need a literal!")
      }
    
      val anon = newTypeName(c.fresh)
    
      c.Expr(Block(
        ClassDef(
          Modifiers(Flag.FINAL), anon, Nil, Template(
            Nil, emptyValDef, List(
              constructor(c.universe),
              TypeDef(Modifiers(), memberName, Nil, TypeTree(typeOf[Int]))
            )
          )
        ),
        Apply(Select(New(Ident(anon)), nme.CONSTRUCTOR), Nil)
      ))
    }

    This is an unreadable mess, and it’s not even a complete example—it depends on some additional utility code.

  • Potemkin val-age

    2 September 2013
    scala | macros

    My attempt to sneak the terms vampire and zombie into the Scala vernacular seems to be succeeding, so here’s a new one:

    Potemkin definitions: definitions in a macro-constructed structural type that are intended only to make an expression passed as an argument to another macro method typecheck before that macro rewrites it.

    I came up with the trick to support this horrible abuse of Scala syntax:

    case class Car(var speed: Int, var color: String) { ... }
    object Car { ... }
    
    import Car.syntax._
    
    val car = new Car(0, "blue")
    
    car set {
      color = "red"
      speed = 10000
    }

    Here color and speed are definitions in a structural type that have the same signatures as the fields on the case class, but they don’t actually do anything—if we call them we get a NotImplementedError. They only exist to allow the block expression we’re passing to set to typecheck before the macro implementation replaces them with something useful.

  • Feeding our vampires

    31 August 2013
    scala | macros

    I’ve written several times about vampire methods, which are macro methods inside a macro-defined type, where the macro method’s implementation is provided in an annotation. Normally when we define a type in a def macro, it looks like a structural type to the outside world, and calling methods on a structural type involves reflective access in Scala. Vampire methods allow us to avoid that ugly bit of runtime reflection.

    This trick (which was first discovered by Eugene Burmako) is useful because it makes it a little more practical to use def macros to approximate type providers, for example. It’s also just really clever.

    For methods with no parameters, the execution of the trick is pretty straightforward. It’s a little more complicated when we do have parameters, as Eric Torreborre observes in a question here, since in that case the annotation will need to contain a function instead of just a simple constant of some kind.

  • The most horrible code I've ever written

    30 August 2013
    scala | macros

    When macros first showed up in Scala as an experimental language feature last year, many Scala developers responded with skepticism or distaste. They argued that macros were a distraction from work on more urgent problems with the language, that they would lead to even more complex and reader-unfriendly code, etc.

    After a year and a half I think these arguments have less weight, as macros have proven extremely useful in a wide range of applications: string interpolation, serialization, type-level programming with singletons, numeric literals and faster loops, typed channels for actors, and so on. They’ve let me make many parts of my own code faster and safer in surprising ways.

    This post is not about a useful application of macros. It’s inspired by a couple of questions on Stack Overflow today, and is an example of exactly the kind of thing macros should not ever be used for. But it’s Friday evening and I’m drinking beer in the office and I think this trick is pretty clever, so here we go.

  • Lists are even easier

    27 August 2013
    haskell | scala

    This is a quick follow-up to my post last night about stream processing with iteratees. It’s worth pointing out that we can accomplish much the same thing even more concisely using Haskell’s lists:

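    import Data.Char (isSpace)
    import Data.List (mapAccumL)
    import Data.List.Split (splitWhen) -- from the split package
    
    -- A location is a (page, line) pair.
    nextPage, nextLine :: (Int, Int) -> (Int, Int)
    nextPage (page, _)    = (page + 1, 0)
    nextLine (page, line) = (page, line + 1)
    
    -- Pair each character with the location reached after reading it.
    locator :: String -> [(Char, (Int, Int))]
    locator = snd . mapAccumL go (0, 0)
      where
        go loc c = let nextLoc = advance c loc in (nextLoc, (c, nextLoc))
        advance '\f' = nextPage
        advance '\n' = nextLine
        advance _    = id
    
    -- Split on whitespace, drop empty groups, and unzip each word into
    -- its characters and their locations.
    tokenizer :: [(Char, (Int, Int))] -> [(String, [(Int, Int)])]
    tokenizer = map unzip . filter (not . null) . splitWhen (isSpace . fst)
    
    poem :: String
    poem =
      "It is the same! - For, be it joy or sorrow,\n" ++
      "The path of its departure still is free:\f" ++
      "Man's yesterday may ne'er be like his morrow;\n" ++
      "Nought may endure but Mutability.\n"
    
    main :: IO ()
    main = mapM_ print $ tokenizer . locator $ poem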

    (We could make this even more concise by using the standard library’s words in our definition of tokenizer, but I’m trying to stick to the same basic structure as the iteratee version.)

  • Iteratees are easy

    26 August 2013
    haskell | scala | iteratees

    This blog post is a short response to my MITH colleague Jim Smith, who several weeks ago published a blog post about a stream processing language that he’s developing. His post walks through an example of how this language could allow you to take a stream of characters, add some location metadata to each, and then group them into words, while still holding onto the location metadata about the characters that make up the words.

    The process he describes sounds a little like the functionality that iteratees provide, so I decided I’d take a quick stab at writing up an iteratee implementation of his example in Haskell. I’m using John Millikin’s enumerator package, since that’s the iteratee library that I’m most comfortable with.

  • Vampire methods for structural types

    12 July 2013
    scala | macros

    I wish I could take credit for what I’m about to show you, because it’s easily the cleverest thing I’ve seen all week, but it’s Eugene Burmako’s trick and I’ve only simplified his demonstration a bit and adapted it to work in Scala 2.10.

  • Fake type providers, part 2

    11 July 2013
    scala | macros

    I like writing code. I also like not writing code, especially when I’m writing code. Type providers are a particularly nice way not to write code. They let you take some kind of schema (for a relational database, RDF vocabulary, etc.) and turn it directly into binding classes at compile time—with no worrying about managing generated code, etc.

    I’ve wanted type providers in Scala for a long time (heck, I wanted type providers ten years ago when I was a Java programmer who had no idea what a type provider was but was dissatisfied with code generation for this kind of task). The type macros currently available in Macro Paradise will provide the real thing, but they’re still (at least) months and months away from a stable Scala release.

    In the meantime, you can get surprisingly good fake type providers with the def macros in Scala 2.10. In a previous post I outlined one set of macro-based approaches to the problem, with the most concise version involving structural types and therefore (unfortunately) reflective access. In this post I’ll go into a bit more detail about the code involved, and will look at just how bad reflective access actually is performance-wise by comparing the structural-type approach to two alternatives: plain old code generation and macro-supported compile-time dynamic types.

  • Singleton types for literals in Scala

    28 June 2013
    scala | macros

    It’s sometimes useful in Scala to have a type with a single value. These are called singleton types, and they show up most easily in the context of Scala’s objects. For example, if we have the following definition:

    object foo {
      def whatever = 13
    }

    We can refer to a type foo.type that is the singleton type for foo—i.e., the type that contains nothing except foo. We can use this type to write a function that won’t compile with any non-foo argument:

    def fooIdentity(x: foo.type) = x

    For example:

    scala> fooIdentity(foo)
    res1: foo.type = foo$@5da19724
    
    scala> fooIdentity("foo")
    <console>:14: error: type mismatch;
     found   : String("foo")
     required: foo.type
                  fooIdentity("foo")
                              ^

    Note that this error message doesn’t just tell us that we provided a String when we needed a foo—it lists the type as String("foo"). This is because string literals—like all other literals in Scala (except function literals)—are also singletons in the sense that their most specific type is a singleton type.
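    A side note that postdates this post: since Scala 2.13 (SIP-23), literal singleton types can be written directly in source, so a type like the `String("foo")` in the error above can also appear in signatures. A minimal sketch; `LiteralTypeSketch` and `fooOnly` are hypothetical names:

    ```scala
    object LiteralTypeSketch {
      // Only the literal "foo" conforms to the parameter type "foo".
      def fooOnly(x: "foo"): "foo" = x

      // valueOf recovers the single inhabitant of a singleton type.
      val fortyTwo: Int = valueOf[42]
    }
    ```

    Calling `fooOnly("bar")` is rejected at compile time with a type mismatch.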

  • Macro methods and subtypes

    21 June 2013
    scala | macros

    Suppose we want to define an HListable trait in Scala that will add a members method returning an HList of member values to any case class that extends it. This would let us write the following, for example:

    scala> case class User(first: String, last: String, age: Int) extends HListable
    defined class User
    
    scala> val foo = User("Foo", "McBar", 25)
    foo: User = User(Foo,McBar,25)
    
    scala> foo.members == "Foo" :: "McBar" :: 25 :: HNil
    res0: Boolean = true

    So we try the following, which looks reasonable at a glance:

  • Macro-supported DSLs for schema bindings

    19 June 2013
    scala | macros | rdf | dh

    We’ve recently started using the W3C’s banana-rdf library at MITH, and it’s allowing us to make a lot of our code for working with RDF graphs both simpler and less tightly coupled to a specific RDF store. It’s a young library, but also very clever and well-designed, and it does an excellent job of exploiting advanced features of the Scala language to make its users’ lives easier. Alexandre Bertails and his collaborators deserve a lot of credit for what they’ve accomplished in just a little over a year.

    One of the least pleasant aspects of working with any RDF library is writing bindings for particular vocabularies. For example, if we wanted to use the Open Archives Initiative’s Object Reuse and Exchange vocabulary in our banana-rdf application, we’d need to write something like the following:

  • Learning Shapeless

    9 June 2013
    shapeless | scala

    I began writing this post as an answer to this Stack Overflow question about learning how to use Shapeless, but it ended up a little long for a Stack Overflow answer, so I’m posting it here instead.

    When I first started teaching myself type-level programming in the context of Shapeless last year, I spent a lot of time working through simple problems with heterogeneous lists of type-level natural numbers. One fairly straightforward (but still non-trivial) example of such a problem is the second user story of this Coding Dojo kata, which is also outlined by Paul Snively in this email to the Shapeless mailing list.

    The goal is to determine whether a list of numbers is the appropriate length (nine) and has a valid checksum, which is calculated by taking the sum of each element multiplied by its distance (plus one) from the end of the list, modulo eleven.
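    Before encoding the check at the type level, it can help to have the value-level version in front of you. A minimal sketch, assuming (as in the kata) that a checksum is valid when the weighted sum is divisible by eleven; `ChecksumSketch` and `isValid` are hypothetical names:

    ```scala
    object ChecksumSketch {
      // Value-level version of the check: the element at distance i from the end
      // (0-based) is weighted by i + 1, and the weighted sum must be divisible by 11.
      def isValid(digits: List[Int]): Boolean =
        digits.length == 9 && {
          val weighted = digits.reverse.zipWithIndex.map { case (d, i) => d * (i + 1) }
          weighted.sum % 11 == 0
        }
    }
    ```

    For example, 3 4 5 8 8 2 8 6 5 weights to 3*9 + 4*8 + 5*7 + 8*6 + 8*5 + 2*4 + 8*3 + 6*2 + 5*1 = 231 = 11 * 21, so it passes.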

  • Lots of little trees, part 2

    7 June 2013
    xml | scala | java

    I just noticed that the Lawrence Berkeley National Laboratory’s Nux library provides streaming XQuery functionality that makes it very easy to do the kind of XML processing that I described in this post last week.

    Using Scala, for example, we can start with some imports:

    import nu.xom.{ Builder, Element, Nodes }
    import nux.xom.xquery.{ StreamingPathFilter, StreamingTransform, XQueryUtil }

    Next we write the “transformer” that we want to apply to every record element:

    val processor = new StreamingTransform {
      def transform(record: Element) = {
        val id = XQueryUtil.xquery(record, "IndexCatalogueID").get(0)
        val placeResults = XQueryUtil.xquery(record, "//Place")
        val places = (0 until placeResults.size) map placeResults.get
    
        println(id.getValue + " " + places.map(_.getValue).mkString(", "))
        new Nodes()
      }
    }

    We’re not really transforming anything here, of course—just performing a side effect as we iterate through the records. We could just as easily be adding some representation of the record to a mutable collection, sending a message to an actor, etc.

    Continue reading »
  • Applicative validation syntax

    5 June 2013
    scala | scalaz | shapeless

    People say that Validation is Scalaz’s gateway drug, which might be accurate if you ignore the suggestion that there’s anything even remotely fun about validation. In my book, making sure that your program doesn’t choke on bad input is always a chore.

    Applicative validation is at least a step in the right direction—it makes it easier to write less code, introduce fewer bugs, and draw clearer lines between data models and validation logic. Suppose for example that we have the following case class in Scala:

    case class Foo(a: Int, b: Char, c: String)

    Suppose also that we have a form with three fields that we want to use to create instances of Foo. We receive input from this form as strings, and we want to be sure that these strings have certain properties.
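    Scalaz's Validation handles this with an applicative instance that accumulates failures. To show the idea without a Scalaz dependency, here is a minimal hand-rolled sketch of the same pattern. It is a stand-in, not Scalaz's API. The three field rules (an integer, a single character, a non-empty string) are assumptions made up for illustration, and names like `Validated`, `map3`, and `validateFoo` are hypothetical.

    ```scala
    // Hand-rolled applicative validation sketch (a stand-in, not Scalaz's Validation).
    object FooValidation {
      case class Foo(a: Int, b: Char, c: String)

      // Either a successfully validated value or a list of error messages.
      sealed trait Validated[+A]
      final case class Valid[+A](value: A) extends Validated[A]
      final case class Invalid(errors: List[String]) extends Validated[Nothing]

      // Combine three independent results, collecting *every* error instead of
      // stopping at the first one. This is what makes the combination applicative.
      def map3[A, B, C, D](va: Validated[A], vb: Validated[B], vc: Validated[C])(
          f: (A, B, C) => D): Validated[D] =
        (va, vb, vc) match {
          case (Valid(a), Valid(b), Valid(c)) => Valid(f(a, b, c))
          case _ =>
            Invalid(List(va, vb, vc).flatMap {
              case Invalid(es) => es
              case Valid(_)    => Nil
            })
        }

      // Hypothetical field rules, assumed for illustration only.
      def readInt(s: String): Validated[Int] =
        scala.util.Try(s.trim.toInt).toOption match {
          case Some(n) => Valid(n)
          case None    => Invalid(List(s"a: '$s' is not an integer"))
        }

      def readChar(s: String): Validated[Char] =
        if (s.length == 1) Valid(s.head)
        else Invalid(List(s"b: '$s' is not a single character"))

      def readString(s: String): Validated[String] =
        if (s.nonEmpty) Valid(s) else Invalid(List("c: must not be empty"))

      // The case class stays free of validation logic; the rules live here.
      def validateFoo(a: String, b: String, c: String): Validated[Foo] =
        map3(readInt(a), readChar(b), readString(c))(Foo(_, _, _))
    }
    ```

    Here `validateFoo("42", "x", "hello")` gives `Valid(Foo(42, 'x', "hello"))`, while `validateFoo("nope", "xy", "")` gives an `Invalid` listing all three problems. A for-comprehension over `Either` would stop at the first failure instead, which is the practical argument for the applicative version.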

    Continue reading »
  • Lots of little trees

    29 May 2013
    xml | haskell | conduits | dh

    In my field (computational humanities), people like to distribute databases as enormous XML files. These are often very flat trees, with the root element containing hundreds of thousands (or millions) of record elements, and they can easily be too big to be parsed into memory as a DOM (Document Object Model) or DOM-like structure.

    This is exactly the kind of problem that streaming XML parsers are designed to solve. There are two dominant approaches to parsing XML streams: push-based models (like SAX, the “Simple API for XML”), and pull-based models (like StAX, or—shudder—scala.xml.pull). Both of these approaches save memory by producing streams of events (BeginElement, Comment, etc.) instead of reconstructing a tree-based representation of the file in memory. (Such a representation can be 5-10 times the size of the file on disk, which quickly becomes a problem when you have four gigs of memory and your XML files are approaching a gigabyte in size.)
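    To make the event-stream idea concrete, here is a minimal push-based sketch that uses the JDK's built-in SAX parser. It counts record elements without ever building a tree. The element name `record` and the names `RecordCounter` and `countRecords` are assumptions for illustration.

    ```scala
    import java.io.ByteArrayInputStream
    import java.nio.charset.StandardCharsets
    import javax.xml.parsers.SAXParserFactory
    import org.xml.sax.Attributes
    import org.xml.sax.helpers.DefaultHandler

    object SaxRecordCount {
      // Push-based: the parser calls this handler's methods as events occur.
      final class RecordCounter(recordName: String) extends DefaultHandler {
        var count = 0

        // With a non-namespace-aware parser, qName holds the element name.
        override def startElement(uri: String, localName: String, qName: String,
                                  attributes: Attributes): Unit =
          if (qName == recordName) count += 1
      }

      def countRecords(xml: String, recordName: String): Int = {
        val parser = SAXParserFactory.newInstance().newSAXParser()
        val handler = new RecordCounter(recordName)
        // parse drives the whole document through our callbacks, then returns.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler)
        handler.count
      }
    }
    ```

    Only the counter lives in memory; each element is reported and forgotten.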

    Push-based APIs like SAX are inherently imperative: we register callbacks with the parser that specify how to handle events, and then it calls them as it parses the XML file. With a pull parser, on the other hand, the programmer sees the events as an iterator or lazy collection that he or she is responsible for iterating through. Newer frameworks that support streaming XML processing tend to provide pull-based APIs, and many developers find pull parsing more intuitive than SAX (or at least slightly less miserable).
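    For contrast, here is a minimal pull-based sketch using the JDK's StAX API (`javax.xml.stream`). Our own loop asks the reader for each event and collects each record's ID into a buffer. The element names (`record`, `IndexCatalogueID`) and the helper names are assumptions for illustration.

    ```scala
    import java.io.StringReader
    import javax.xml.stream.{ XMLInputFactory, XMLStreamConstants }
    import scala.collection.mutable.ListBuffer

    object StaxRecords {
      // Pull-based: we decide when to advance, one event at a time.
      def recordIds(xml: String, recordName: String, idName: String): List[String] = {
        val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
        val ids = ListBuffer.empty[String]
        var inRecord = false
        try {
          while (reader.hasNext) {
            val event = reader.next()
            if (event == XMLStreamConstants.START_ELEMENT) {
              val name = reader.getLocalName
              if (name == recordName) inRecord = true
              // getElementText reads a text-only element and stops on its end tag.
              else if (inRecord && name == idName) ids += reader.getElementText
            } else if (event == XMLStreamConstants.END_ELEMENT) {
              if (reader.getLocalName == recordName) inRecord = false
            }
          }
        } finally reader.close()
        ids.toList
      }
    }
    ```

    The only state is a flag and the buffer of IDs, so memory use follows the size of the output rather than the size of the tree.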

    Continue reading »
  • Better Scala syntax highlighting for Hakyll

    28 March 2013
    scala | hakyll | pandoc

    I chose Hakyll over other static site generators for this blog in part because Hakyll is built on Pandoc, John MacFarlane’s fantastic document conversion library. Unfortunately Pandoc’s syntax highlighting for Scala is incredibly bad:

    import scala.language.experimental.macros
    import scala.reflect.macros.Context
    
    object TupleExample {
      def fill[A](arity: Int)(a: A): Product = macro fill_impl[A]
      def fill_impl[A](c: Context)(arity: c.Expr[Int])(a: c.Expr[A]) = {
        import c.universe._
    
        arity.tree match {
          case Literal(Constant(n: Int)) if n < 23 => c.Expr(
            Apply(Select(Ident("Tuple" + n.toString), "apply"), List.fill(n)(a.tree))
          )
          // Not going to worry about not getting what we expect.
        }
      }
    }

    Note in particular that the line defining fill is entirely gray—at the very least I’d expect the types to be highlighted somehow, and macro to be treated as a keyword. I also have no idea why you’d want the first piece of the package path in an import statement to be a different color from the rest of the path.

    Pandoc uses the Kate editor’s syntax files, and Kate’s default syntax file for Scala is derived from one that’s included in the Scala distribution. I got no responses to a query about better Kate syntax files for Scala, so we’ll have to make our own.

    Continue reading »

Copyright © 2013 Travis Brown Powered by Hakyll