Yet another iteratee library

I'll start with the story of how I got saved, since it's kind of relevant. Back when I was an English Ph.D. student, I worked on a number of projects that involved natural language processing, which meant doing a lot of counting trigrams or whatever in tens of thousands of text files in giant messy directory trees. I was working primarily in Ruby at the time, after years of Java, and at least back in 2008 it was a pain in the ass to do this kind of thing in either Ruby or Java. You really want a library that provides the following features:

  1. Resource management: you don't want to have to worry about running out of file handles.
  2. Streaming: you shouldn't ever have to have all of the data in memory at once.
  3. Fusion: two successive mapping operations shouldn't need to traverse the data twice.
  4. Graceful error recovery: these tasks are all off-line, but you still don't want to have to restart a computation that's been running for ten minutes just because the formatting in one file is wrong.

Maybe there was such a library for Ruby or Java back then, but if there was I didn't know about it. I did have some experience with Haskell, though, and at some point in 2010 I heard about iteratees, and they were exactly what I'd always wanted. I didn't really understand how they worked at first, but with iteratee (and later John Millikin's enumerator) I was able to write code that did what I wanted and didn't make me think about stuff I didn't want to think about. I started picking Haskell instead of Ruby for new projects, and that's how I accepted statically-typed functional programming into my life.
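To make the four requirements above a little more concrete, here's a rough sketch of what they look like with nothing but the Scala 2.13 standard library. This is not an iteratee library and not the code I actually wrote back then. The names (`TrigramSketch`, `countFile`, `countTree`) are made up for illustration, and it takes the shortcut of only counting trigrams within a single line.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.jdk.CollectionConverters._
import scala.util.{Failure, Success, Try, Using}

object TrigramSketch {
  type Counts = Map[Seq[String], Int]

  // Count word trigrams in one file (per line only, as a simplification).
  def countFile(path: Path): Try[Counts] =
    // Resource management: Using closes the file handle even on failure.
    Using(Source.fromFile(path.toFile, "UTF-8")) { source =>
      source.getLines()                                 // streaming: one line at a time
        .map(_.toLowerCase)                             // Iterator.map is lazy, so these two
        .map(_.split("\\s+").filter(_.nonEmpty).toSeq)  // maps run in a single traversal
        .flatMap(_.sliding(3).filter(_.size == 3))
        .foldLeft(Map.empty[Seq[String], Int]) { (acc, trigram) =>
          acc.updated(trigram, acc.getOrElse(trigram, 0) + 1)
        }
    }

  private def merge(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

  // Walk a directory tree. Error recovery: a file that fails to read or decode
  // is recorded and skipped instead of aborting the whole run.
  def countTree(root: Path): Try[(Counts, List[Path])] =
    Using(Files.walk(root)) { paths =>
      paths.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .foldLeft((Map.empty[Seq[String], Int], List.empty[Path])) {
          case ((total, failed), path) =>
            countFile(path) match {
              case Success(counts) => (merge(total, counts), failed)
              case Failure(_)      => (total, path :: failed)
            }
        }
    }
}
```

Calling `TrigramSketch.countTree(somePath)` gives back the merged counts together with the list of files that couldn't be processed. What this sketch doesn't give you is the composability of iteratees: the resource handling, traversal, and error policy are all welded together in one function.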


Roll-your-own Scala

I've always really liked this passage from On the Genealogy of Morals:

[T]here is a world of difference between the reason for something coming into existence in the first place and the ultimate use to which it is put, its actual application and integration into a system of goals… anything which exists, once it has come into being, can be reinterpreted in the service of new intentions, repossessed, repeatedly modified to a new use by a power superior to it.

A couple of months ago at LambdaConf I had a few conversations with different people about why we like (or at least put up with) Scala when there are so many better languages out there. Most of the answers were the usual ones: the JVM, the ecosystem, the job market, the fact that you don't have to deal with Cabal, etc.

For me it's a little more complicated than that. I like Scala in part because it's a mess. It's not a "fully" dependently typed language, but you can get pretty close with singleton types and path-dependent types. It provides higher-kinded types, but you have to work around lots of bugs and gaps and underspecified behaviors to do anything very interesting with them. And so on. It's a mix of really good ideas and a few really bad ideas, and you can put them together in ways that the language designers didn't anticipate and probably don't care about at all.

The rest of this blog post will be a long story about one example of this kind of thing involving Scalaz's UnapplyProduct.


Deriving incomplete type class instances

Suppose we've got a simple representation of a user:

case class User(id: Long, name: String, email: String)

Now suppose we're writing a web service where we allow clients to post some JSON to a resource to create a new user. We get to pick the id, not the client, so we might accept something like this:

{
  "name": "Foo McBar",
  "email": "foo@mcbar.com"
}

If we're using a type class-based JSON library like Argonaut, we'll probably have written a codec instance for User (or we may be using a library like argonaut-shapeless that derives instances for our case classes automatically).

The problem is that our User codec won't work on JSON like the example above (since it's missing the id field).
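Here's a minimal sketch of the problem and of one possible way around it, using a toy decoder type class rather than Argonaut's real API. Everything here apart from `User` is hypothetical: `Fields` stands in for a parsed JSON object, `Decoder` is not Argonaut's `DecodeJson`, and the instances are written by hand rather than derived. It assumes Scala 2.13.

```scala
object PartialDecoderSketch {
  case class User(id: Long, name: String, email: String)

  // Hypothetical stand-in for a JSON object: field names mapped to raw values.
  type Fields = Map[String, String]

  // A toy decoder type class (an assumption, not a real library's API).
  trait Decoder[A] {
    def decode(fields: Fields): Either[String, A]
  }

  private def field(fields: Fields, key: String): Either[String, String] =
    fields.get(key).toRight(s"missing field: $key")

  // The "complete" instance: it needs every field, including id.
  val userDecoder: Decoder[User] = fields =>
    for {
      rawId <- field(fields, "id")
      id    <- rawId.toLongOption.toRight(s"id is not a number: $rawId")
      name  <- field(fields, "name")
      email <- field(fields, "email")
    } yield User(id, name, email)

  // An "incomplete" instance: it decodes everything except the id and
  // returns a function that is still waiting for one.
  val newUserDecoder: Decoder[Long => User] = fields =>
    for {
      name  <- field(fields, "name")
      email <- field(fields, "email")
    } yield (id: Long) => User(id, name, email)
}
```

With the input above, `userDecoder` fails with `missing field: id`, while `newUserDecoder` succeeds and hands back a `Long => User` that the service can apply to whatever id it picks.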


Applicative validation syntax

People say that Validation is Scalaz's gateway drug, which might be accurate if you ignore the suggestion that there's anything even remotely fun about validation. In my book, making sure that your program doesn't choke on bad input is always a chore.

Applicative validation is at least a step in the right direction—it makes it easier to write less code, introduce fewer bugs, and draw clearer lines between data models and validation logic. Suppose for example that we have the following case class in Scala:

case class Foo(a: Int, b: Char, c: String)

Suppose also that we have a form with three fields that we want to use to create instances of Foo. We receive input from this form as strings, and we want to be sure that these strings have certain properties.
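As a rough sketch of the idea, here's error-accumulating validation rolled by hand on top of `Either`, without Scalaz. The `Validated` alias and `map3` are made up for this example and are not Scalaz's `Validation` API, and the three rules (a non-negative integer, exactly one character, a non-empty string) are placeholder assumptions standing in for whatever properties we actually care about. It assumes Scala 2.13.

```scala
object ValidationSketch {
  case class Foo(a: Int, b: Char, c: String)

  // Either a list of error messages or a valid value.
  type Validated[A] = Either[List[String], A]

  // Applicative combination: unlike a for-comprehension over Either, which
  // stops at the first Left, this collects the errors from all three inputs.
  def map3[A, B, C, D](va: Validated[A], vb: Validated[B], vc: Validated[C])(
      f: (A, B, C) => D
  ): Validated[D] =
    (va, vb, vc) match {
      case (Right(a), Right(b), Right(c)) => Right(f(a, b, c))
      case _ => Left(List(va, vb, vc).collect { case Left(errors) => errors }.flatten)
    }

  // Placeholder rules, one per form field.
  def parseA(s: String): Validated[Int] =
    s.toIntOption.filter(_ >= 0).toRight(List(s"a must be a non-negative integer: '$s'"))

  def parseB(s: String): Validated[Char] =
    if (s.length == 1) Right(s.head)
    else Left(List(s"b must be exactly one character: '$s'"))

  def parseC(s: String): Validated[String] =
    if (s.trim.nonEmpty) Right(s)
    else Left(List("c must not be empty"))

  // The data model (Foo) stays separate from the validation logic.
  def parseFoo(a: String, b: String, c: String): Validated[Foo] =
    map3(parseA(a), parseB(b), parseC(c))(Foo.apply)
}
```

`ValidationSketch.parseFoo("1", "x", "hi")` yields `Right(Foo(1, 'x', "hi"))`, while `parseFoo("-1", "xy", "")` yields a `Left` containing all three error messages rather than just the first.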
