A statement about my Scala open source work

This post is about open source burnout, but it's not about adopters feeling entitled to my time, or disagreements between maintainers, or anything like that. My personal experience of working on open source Scala projects for the past decade has been almost entirely positive, with only one or two exceptions that I can think of off the top of my head, out of tens of thousands of interactions.

Instead this post is about my unwillingness to continue contributing my time to a community whose leadership is characterized by unreflective privilege and petty vindictiveness (Martin Odersky), grifty amoral opportunism (John De Goes), cowardice and inaction (Typelevel), and betrayal of trust (the Scala Center).

Continue reading

Response to John De Goes

This document (originally published on GitHub) contains additional supporting material for the claims in this post (published 1 September 2019), in response to a cease and desist letter from John A. De Goes that I received on 6 July 2020.

De Goes is the CEO of a Scala consultancy named Ziverge, the founder of a project named "ZIO" (not to be confused with the racist slur), and the organizer of LambdaConf, Functional Scala, and other events.

In the letter De Goes's lawyer claims that the post "specifically targets our client with the goal to publicly vilify our client" and threatens to sue me for defamation in a German court.

Please see also this post for more information about De Goes deleting the FCoP repository on GitHub shortly before sending the cease and desist letter.

Continue reading

The Fantasyland Code of Professionalism

The Fantasyland Code of Professionalism (FCoP) is a code of conduct developed by the Fantasyland Institute of Learning, an organization that was founded by John A. De Goes and is responsible for LambdaConf, a functional programming conference.

Many other people have written about the shortcomings of the FCoP as a code of conduct, including Christie Koehler, who calls it "beyond mediocre" and "downright dangerous", and Matthew Garrett (in an article titled "The Fantasyland Code of Professionalism is an abuser's fantasy").

The purpose of the post you're reading now isn't exactly to critique the FCoP, though, but to preserve some of the discussion surrounding it, since De Goes has recently deleted the FCoP GitHub repository and several other FCoP-related documents, in a move that seems related to the fact that he's currently threatening to sue me for defamation.

One of the claims in De Goes's cease and desist letter is that the following statement (published here) is false:

The FCoP was developed specifically in response to the 2016 LambdaConf controversy, and it's clearly designed to protect white supremacists like Yarvin.

De Goes's lawyer writes:

The FCoP is a code of conduct for professional communities that our client has created. The FCoP is clearly not designed to protect white supremacists.

I've provided evidence in another document that it's reasonable to describe Curtis Yarvin as a white supremacist, and that many other people besides me have done this, including journalists, prominent software developers (for example Erica Baker in this Inc. article), and one of his former business partners.

Continue reading

Best new (to me) fiction of 2019

I might try to fill in micro-reviews at some point, but right now this is just a list, in approximately descending order of how important the book was to me.

Continue reading

Implicit scope and Cats

Update: the experiment described in this post is now available in the Cats 2.2.0 pre-releases, starting with 2.2.0-M1, which was published in March 2020.

This post is an attempt to provide some additional context and justification for an experiment that I've been working on as a proposal for a future version of Cats (probably 3.0). The argument is that by moving Cats's type class instances for standard library types into implicit scope, we can provide a better user experience along a couple of dimensions (fewer imports to think about, faster compile times), while also making the library better aligned with future changes in the language and compiler.

Continue reading

Scala and the visitor pattern

Scala provides a handful of language features that are designed to make it easy for users to define and work with algebraic data types (ADTs). You write some case classes that extend a sealed trait, you write some functions that pattern match on those case classes, and you're done: you have a nice linked list or rose tree or whatever.

Sometimes you can't use case classes to implement your variants, though, or you don't want to put your case classes in your public API, and in these situations pattern matching is typically much less useful. This blog post is about the visitor pattern, which is an alternative to pattern matching that provides many of its benefits, and about the use of visitors we're planning for Circe 1.0.
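To make the contrast with pattern matching concrete, here is a minimal sketch of a visitor-based ADT. The names `Shape`, `ShapeVisitor`, and `area` are hypothetical and chosen only for illustration; this is not Circe's API, just one common way of encoding the pattern in plain Scala.

```scala
// Hypothetical example: Shape, ShapeVisitor, and area are illustrative names only.
// The visitor has one method per variant, like one case per constructor.
trait ShapeVisitor[A] {
  def onCircle(radius: Double): A
  def onRect(width: Double, height: Double): A
}

// The concrete variants are never exposed; callers only see `accept`.
sealed abstract class Shape {
  def accept[A](visitor: ShapeVisitor[A]): A
}

object Shape {
  def circle(radius: Double): Shape = new Shape {
    def accept[A](visitor: ShapeVisitor[A]): A = visitor.onCircle(radius)
  }

  def rect(width: Double, height: Double): Shape = new Shape {
    def accept[A](visitor: ShapeVisitor[A]): A = visitor.onRect(width, height)
  }
}

object VisitorDemo {
  // A visitor plays the role of a total pattern match over the variants.
  val area: ShapeVisitor[Double] = new ShapeVisitor[Double] {
    def onCircle(radius: Double): Double = math.Pi * radius * radius
    def onRect(width: Double, height: Double): Double = width * height
  }
}
```

Because every visitor must implement every method, the compiler still checks that all variants are handled, even though no case classes appear in the public API.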

Continue reading

Why Parallel

I've written a couple of blog posts about how the Parallel type class has changed in Cats 2.0, but those posts don't really say much about why someone using Cats should care about Parallel in the first place. The name suggests that it has something to do with running computations at the same time, and while that's one of the things you can do with it (via the instance for IO in cats-effect, for example), it has a much, much wider range of applications. This post will focus on a real-world use case for Parallel that at a glance might not seem to have much in common with running things in parallel: accumulating errors while validating form inputs.

Continue reading

Cats 2.0 migration guide

Typelevel has just published Cats 2.0.0, and while the core modules are guaranteed to be binary compatible with 1.x, there are some changes that break source compatibility. Most of these changes are unlikely to affect users, but a few will, and the goal of this post is to point out which those are and what you can do about them.

Note that while some of the stuff below is pretty intense, it's unlikely to apply to you. In fact if you're not using Parallel, there's like a 99% chance you can close this tab right now and go change your Cats version and everything will be fine. There are also always people in the Cats Gitter channel who are happy to help. In any case please don't be intimidated and put off updating to 2.0.0—the community is healthier if adopters invest in staying up to date.

Continue reading

Fuck yeah type erasure

(Apologies for the title—after a lot of time on Twitter this week I've been feeling nostalgic for things like Tumblr c. 2010.)

This post is an attempt to answer a question Baccata64 asked on Reddit yesterday afternoon:

how does the Parallel change not break bincompat ? Is it that type parameters and type members are encoded the same way at the bytecode level ?

The context is that Cats 2.0.0-RC2 includes a recent change where the Parallel and NonEmptyParallel type classes were changed from having two type parameters each:

trait NonEmptyParallel[M[_], F[_]] {
  // ...
}

…to one, with the parallel context (the F parameter) changed to a type member:

trait NonEmptyParallel[M[_]] {
  type F[_]
  // ...
}

This post will give some background about the context and motivation for this change, and then will try to answer Baccata64's question.

Continue reading

John De Goes and the FP community

Update (25 July 2020): John De Goes has hired a lawyer to send me a cease and desist letter demanding that I delete this post. See my responses here and here for more details.

This post is a collection of links about John De Goes that show some clear patterns of behavior:

  • De Goes defending white supremacists and misogynists.
  • De Goes attacking critics and accusing them (especially women) of lying.
  • De Goes engaging in targeted harassment, either directly (@druconfessions) or indirectly (e.g. via ClarkHat, a LambdaConf sponsor).

Continue reading

Best new (to me) fiction of 2018

I had a lot of fun doing this thing a couple of years ago, so I thought I'd go ahead and give up the pretense of this being a tech blog and do it again. I don't drink now so it wasn't as fun as last time, but if I convince one person to read Suttree or The Little Friend (or to skip Lincoln in the Bardo) it'll have been worth it.

Continue reading

Best new (to me) fiction of 2016

At least as far as my reading went, 2016 wasn't a great year for fiction, but I did have more time for reading fiction than I've had since I was a grad student and that was my job. This isn't the usual kind of thing I do with this blog, but here are thirty or so one-or-two-sentence reviews of some of the novels that I read for the first time this year (sorted roughly in decreasing order of importance they had for me).

Continue reading

A note about LambdaConf

Yesterday morning John De Goes published a blog post explaining why he and the other LambdaConf organizers decided not to uninvite an outspoken defender of slavery and lots of other vile stuff from their conference.

If you think that keeping Yarvin on the program was the right choice, then this post isn't for you. I think you're wrong, and it's pretty likely I also think your reasons are bad, predictable, and boring. John's blog post is at least not predictable or boring, even though there are plenty of problems with the process he describes, and even though it's really hard to read his anxiety about the power of "social media", etc. as anything but more Moldbuggian fretting over lost privilege.

What this post is about is how good last year's LambdaConf was. It's the only LambdaConf I've been to, and will probably be the only one I'll ever go to, now, but it was also possibly the best tech conference I've ever been to. They seemed to get so many things right and to have thought carefully about so many things: the balance of the program, the clarity about the code of conduct, the on-site child care, the emphasis on not organizing every extracurricular event around alcohol, and so on. I'm not sure I've ever seen another conference note the availability of all-gender restrooms on their website. At least from my perspective, none of these efforts felt perfunctory—I got the impression that the organizers really cared deeply about creating a welcoming environment.

This was especially important to me at last year's conference because in May 2015 the Scala community was even more of a disaster than usual. Scalaz had recently withdrawn from Typelevel, the Scalaz leadership had been rearranged, and I'd been in some fairly unpleasant arguments with a couple of the LambdaConf speakers and more than a couple of the other attendees. I was nervous about what this would mean for the tone of the conference, but somehow it was a non-issue—I personally saw nothing but civility and a lot more good will than I expected, and I believe the LambdaConf organizers deserve at least part of the credit for that.

This is why I find yesterday's decision so frustrating—I know people do shitty, inconsistent, exclusionary things all the time, especially in the tech industry, but this is like watching a friend do something particularly shitty, inconsistent, and exclusionary.

Continue reading

JSON numbers in circe 0.3.0

I'm publishing this article as a blog post (rather than as part of the circe documentation) because it's intended to be a discursive overview of the representation of JSON numbers in circe 0.3.0, not as an up-to-date description of the implementation in whatever the current circe version happens to be when you're reading this. For information about JSON numbers in circe versions after 0.3.0, please see the project documentation (in particular the changelogs and API docs).

Continue reading

Configuring generic derivation

This post is a kind of sequel to my previous article on type classes and generic derivation, although unfortunately there's a lot of intermediate content that should go between there and here that I haven't written yet. This post introduces a new feature in circe that I'm pretty excited about, though, so I'm not going to worry about skipping over that stuff for now.

Continue reading

Yet another iteratee library

I'll start with the story of how I got saved, since it's kind of relevant. Back when I was an English Ph.D. student, I worked on a number of projects that involved natural language processing, which meant doing a lot of counting trigrams or whatever in tens of thousands of text files in giant messy directory trees. I was working primarily in Ruby at the time, after years of Java, and at least back in 2008 it was a pain in the ass to do this kind of thing in either Ruby or Java. You really want a library that provides the following features:

  1. Resource management: you don't want to have to worry about running out of file handles.
  2. Streaming: you shouldn't ever have to have all of the data in memory at once.
  3. Fusion: two successive mapping operations shouldn't need to traverse the data twice.
  4. Graceful error recovery: these tasks are all off-line, but you still don't want to have to restart a computation that's been running for ten minutes just because the formatting in one file is wrong.

Maybe there was such a library for Ruby or Java back then, but if there was I didn't know about it. I did have some experience with Haskell, though, and at some point in 2010 I heard about iteratees, and they were exactly what I'd always wanted. I didn't really understand how they worked at first, but with iteratee (and later John Millikin's enumerator) I was able to write code that did what I wanted and didn't make me think about stuff I didn't want to think about. I started picking Haskell instead of Ruby for new projects, and that's how I accepted statically-typed functional programming into my life.

Continue reading

Error accumulating decoders in circe

Suppose we're writing a service that accepts JSON requests and returns some kind of response. If there's a problem with a request—it's not even valid JSON, it doesn't match the schema we expect, etc.—we want to return an error, and of course it'd be nice if these errors were actually useful to the caller.

Unfortunately "useful" in this context can mean lots of different things, and the differences will usually depend at least in part on how involved a human was in creating the request. In the case of validation errors—i.e. we successfully received some JSON, but it's not a shape we understand—then if there's no human in sight, we generally only need to say something like "hey, we're not even speaking the same language, you should probably go try somewhere else". A detailed breakdown of all the reasons we don't understand the request is unlikely to be useful, so we might as well fail as fast as possible and save resources.

If on the other hand a human was responsible for the content of the request, it's possible that the caller will be able to make use of detailed information about all the problems with that content. Suppose for example that the JSON comes from a web form or spreadsheet that for whatever reason needs to be at least partially validated on the server side. In this case we probably don't want to fail fast—we want to accumulate all of the errors and send them back together, so that the human can correct them in a single pass.
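The difference between the two strategies can be sketched in plain Scala, without any JSON library. Everything here (`Signup`, `Validation`, and the validator names) is hypothetical and is not circe's API; the sketch assumes Scala 2.12 or later, where `Either` is right-biased.

```scala
// Hypothetical form model and validators, for illustration only (not circe's API).
final case class Signup(name: String, age: Int)

object Validation {
  type Result[A] = Either[List[String], A]

  def validateName(raw: String): Result[String] =
    if (raw.trim.nonEmpty) Right(raw.trim)
    else Left(List("name must not be empty"))

  def validateAge(raw: String): Result[Int] =
    scala.util.Try(raw.trim.toInt).toOption
      .filter(_ >= 0)
      .toRight(List(s"age must be a non-negative integer: $raw"))

  // Fail fast: the for-comprehension stops at the first Left.
  def failFast(name: String, age: String): Result[Signup] =
    for {
      n <- validateName(name)
      a <- validateAge(age)
    } yield Signup(n, a)

  private def errors[A](result: Result[A]): List[String] = result match {
    case Left(es) => es
    case Right(_) => Nil
  }

  // Accumulate: run every validator and collect all of the errors.
  def accumulate(name: String, age: String): Result[Signup] =
    (validateName(name), validateAge(age)) match {
      case (Right(n), Right(a)) => Right(Signup(n, a))
      case (n, a)               => Left(errors(n) ++ errors(a))
    }
}
```

With an empty name and a malformed age, `failFast` reports only the name problem, while `accumulate` reports both, which is what a human correcting a form in a single pass would want.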

Continue reading

Type classes and generic derivation

Yesterday I wrote a Stack Overflow answer about using Shapeless for generic derivation of type class instances, and this morning I started putting together some new documentation for circe's generic derivation, and after a few paragraphs I decided that it might make sense to write a blog post that could serve as a bridge between the two—between simple examples like the one in my answer (which doesn't really go into motivation, etc.) and the kinds of things we're doing in circe. I'll start more or less from scratch, assuming only some basic familiarity with Scala syntax.

Continue reading

Goodbye, Twitter

Today is my last day at Twitter (well, kind of—my email access was shut off this morning, so I've been spending the day reading a novel).

I'm not sure how I feel about this yet. After some very recent changes in the Twitter Open Source team, I'd been vaguely thinking about leaving the company anyway, and since I'm not a proud person (at least I don't think I am), I have no complaints about that happening involuntarily if it means a severance package. We'll see whether Jack and I agree about the "generous" part.

If I tried to recruit you to work at Twitter recently, my apologies—I had no idea this was coming. I still think it's a great place to work, even though (like most of the rest of the world) I'm not sure how much confidence I have in its leadership or product direction. There are lots of incredibly smart and generous people at Twitter. In fact in my experience pretty much everyone there at least three or four steps from the top of the org chart fits that description. You should probably work there if you get the chance.

I was told literally nothing about Twitter's plans for the Twitter Open Source team, which for the past couple of weeks has consisted of two people (counting me). If you're using Twitter open source software, you're probably okay, especially if you're using a project like Finagle that's owned by a team that cares a lot about open source (and the Core System Libraries team does). Because the Open Source team has never been larger than three people, individual engineering teams have always been primarily responsible for maintaining their own open source projects, and I'm assuming that's not going to change.

I'll miss pretending to be Finagle on Twitter, but I know I'm leaving @finagle in good hands.

If you're reading this, you probably have some idea about the kind of work I like to do. If you've got that kind of work, get in touch (preferably on Twitter). I'm planning to take off at least a couple of months to read, work on open source projects (including Finch!), and maybe learn some more Rust or Idris or something, but I'm keeping my eyes open.

If I worked with you at Twitter, or if you attended my Twitter University courses, thanks—it was a great seventeen months.

Continue reading

Roll-your-own Scala

I've always really liked this passage from On the Genealogy of Morals:

[T]here is a world of difference between the reason for something coming into existence in the first place and the ultimate use to which it is put, its actual application and integration into a system of goals… anything which exists, once it has come into being, can be reinterpreted in the service of new intentions, repossessed, repeatedly modified to a new use by a power superior to it.

A couple of months ago at LambdaConf I had a few conversations with different people about why we like (or at least put up with) Scala when there are so many better languages out there. Most of the answers were the usual ones: the JVM, the ecosystem, the job market, the fact that you don't have to deal with Cabal, etc.

For me it's a little more complicated than that. I like Scala in part because it's a mess. It's not a "fully" dependently typed language, but you can get pretty close with singleton types and path dependent types. It provides higher-kinded types, but you have to work around lots of bugs and gaps and underspecified behaviors to do anything very interesting with them. And so on—it's a mix of really good ideas and a few really bad ideas and you can put them together in ways that the language designers didn't anticipate and probably don't care about at all.

The rest of this blog post will be a long story about one example of this kind of thing involving Scalaz's UnapplyProduct.

Continue reading

Deriving incomplete type class instances

Suppose we've got a simple representation of a user:

case class User(id: Long, name: String, email: String)

Now suppose we're writing a web service where we allow clients to post some JSON to a resource to create a new user. We get to pick the id, not the client, so we might accept something like this:

{
  "name": "Foo McBar",
  "email": "foo@mcbar.com"
}

If we're using a type class-based JSON library like Argonaut, we'll probably have written a codec instance for User (or we may be using a library like argonaut-shapeless that derives instances for our case classes automatically).

The problem is that our User codec won't work on JSON like the example above (since it's missing the id field).

Continue reading

Foldable considered confusing

Tomasz Nurkiewicz recently published an article arguing that the fold on Option (new in Scala 2.10) is unreadable and inconsistent and should be avoided. I personally disagree about the unreadability part and the should be avoided part, which isn't too surprising, since I generally disagree with Tomasz. I have a lot of respect for him, though, and I can actually get on board with a good chunk of what he says in the article. I can understand why you might want to avoid fold in some projects, and I agree that the way the standard library provides folds is inconsistent and frustrating—I still get a little angry when I think about the fact that Try doesn't have a fold even though it was introduced in the same version of the language that gave us the fold on Option.
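For readers who haven't used it, here is a small sketch of the `fold` in question next to the more familiar `map` and `getOrElse` spelling. The object and method names are hypothetical; only `Option.fold`, `map`, and `getOrElse` come from the standard library.

```scala
// Hypothetical helper names; the two methods are intended to be equivalent.
object OptionFoldDemo {
  // fold takes the value for the empty case first, then the function for the non-empty case.
  def describe(age: Option[Int]): String =
    age.fold("unknown")(a => s"$a years old")

  // The same logic written with map and getOrElse.
  def describeWithMap(age: Option[Int]): String =
    age.map(a => s"$a years old").getOrElse("unknown")
}
```

Whether the first version reads better than the second is exactly the kind of thing the readability argument is about.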

So this post isn't about why Tomasz is wrong about readability, etc.—it's about how much I hate the name Foldable.

Continue reading

Partitioning by constructor

It's not unusual in Scala to want to take a collection with items of some algebraic data type and partition its elements by their constructors. In this Stack Overflow question, for example, we're given a type for fruits:

sealed trait Fruit

case class Apple(id: Int, sweetness: Int) extends Fruit
case class Pear(id: Int, color: String) extends Fruit

The goal is to be able to take a collection of fruits and split it into two collections: one of apples and one of pears.

def partitionFruits(fruits: List[Fruit]): (List[Apple], List[Pear]) = ???

It's pretty easy to use collect to solve this problem for particular cases. It's a little trickier when we start thinking about what a more generic version of such a method would look like: we want to take a collection of items of some arbitrary algebraic data type and return an n-tuple whose elements are collections of items of each of that ADT's constructors (and let's require them to be typed as specifically as possible, since this is Scala). It's not too hard to imagine how you could write a macro that would perform this operation, but it'd be messy and would probably end up feeling kind of ad-hoc (at least without a lot of additional work and thinking).

Fortunately we've got Shapeless 2.0, where Miles Sabin and co. have written the macros for us so we can keep our hands clean.
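For the particular case of Fruit, the collect-based version mentioned above might look something like this minimal sketch. The wrapper object `FruitPartition` is a hypothetical name, and the sketch traverses the list once per constructor, which is one of the things a generic solution could improve on.

```scala
sealed trait Fruit

case class Apple(id: Int, sweetness: Int) extends Fruit
case class Pear(id: Int, color: String) extends Fruit

// FruitPartition is a hypothetical wrapper object used only for this sketch.
object FruitPartition {
  // Each collect keeps the elements built with one constructor,
  // and the result is typed as precisely as possible.
  def partitionFruits(fruits: List[Fruit]): (List[Apple], List[Pear]) =
    (
      fruits.collect { case apple: Apple => apple },
      fruits.collect { case pear: Pear => pear }
    )
}
```

This works, but it has to be rewritten by hand for every ADT and every change to its constructors.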

Continue reading

Explicit defaults in Scala

This post is another entry in my series on stuff you should never do with macros in Scala, but that you could do with macros in Scala, if you really wanted to, and if you'd picked up a bottle of Macallan on the way home from work and were willing to waste half an evening doing something ridiculously useless.

It's specifically a response to this Stack Overflow question, which asks if it's possible to specify explicitly that you want to use the default value of a constructor parameter in Scala.

So suppose we have a class like this:

class Foo(val x: String, val y: Int = 13, val z: Symbol = 'zzz)

The goal is to allow the following syntax:

val useDefaultZ = true

new Foo(x = "whatever", y = 1, z = if (useDefaultZ) default else 'whatever)

This is possible with macros, and it's not nearly as easy as you might think.

Continue reading

Ordering case classes

Scala provides lexicographic Ordering instances for TupleN types when all of the element types have Ordering instances. It doesn't provide the same instances for case classes, however—probably just because lexicographic order isn't what you want for case classes as often as it is for tuples.

Sometimes you actually do want a lexicographic ordering for your case classes, though, and Scala unfortunately doesn't provide any nice boilerplate-free way to create such orderings. This post will provide a quick sketch of two approaches to filling this gap: one using macros, and one using Shapeless 2.0's new Generic machinery and the TypeClass type class.

First for a case class to use as a running example, along with some instances:

case class Foo(x: Int, y: String)

val a = Foo(9, "x")
val b = Foo(1, "z")
val c = Foo(9, "w")

val foos = List(a, b, c)

Let's quickly confirm that there's no Ordering[Foo] already sitting around:

scala> foos.sorted
<console>:14: error: No implicit Ordering defined for Foo.

Yep, we're going to have to take care of this ourselves.
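For comparison, the hand-written version that the two approaches aim to replace can be sketched by delegating to the tuple ordering. The object name `FooOrdering` is hypothetical; `Ordering.by` and the `Tuple2` ordering are from the standard library.

```scala
case class Foo(x: Int, y: String)

// FooOrdering is a hypothetical name for this sketch.
object FooOrdering {
  // Lexicographic: compare by x first, then by y, via the Ordering for (Int, String).
  implicit val fooOrdering: Ordering[Foo] =
    Ordering.by((foo: Foo) => (foo.x, foo.y))
}
```

With `import FooOrdering._` in scope, `List(Foo(9, "x"), Foo(1, "z"), Foo(9, "w")).sorted` evaluates to `List(Foo(1, "z"), Foo(9, "w"), Foo(9, "x"))`. The catch is that the field list has to be repeated by hand for every case class, which is the boilerplate in question.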

Continue reading

Natural vampires

This Stack Overflow question is interesting—it asks whether we can use Scala macros to create a value class for positive integers where the positiveness is checked at compile-time, and where it's not possible to create an invalid instance.

I'm pretty sure it's not. My first thought was to turn PosInt into a sealed universal trait with a private value class implementation in the PosInt companion object, but inheriting from a universal trait forces us to give up most (all?) of the advantages of value classes in this case, and of course it's not actually possible to make the value class private, anyway.

So I don't have an answer, but I do have a pretty neat trick involving vampire methods that gives us some of the benefits of value classes.

Continue reading

Quasiquotes for multiple parameter lists

Quasiquotation is an old idea (Miles Sabin notes the term in a 1937 Quine paper, for example) that's now available in Scala (thanks to the efforts of Eugene Burmako and Den Shabalin), where it allows us to avoid the nightmarishly complicated and verbose code that's required to construct abstract syntax trees manually in our macros.

Quasiquotes are a little like reification, but much more flexible about what kinds of things can be "spliced" into the tree, and where they can be spliced. For example, we couldn't use reify in the following code, because there's no way to splice in the name of the type member:

def foo(name: String): Any = macro foo_impl
def foo_impl(c: Context)(name: c.Expr[String]) = {
  import c.universe._

  val memberName = name.tree match {
    case Literal(Constant(lit: String)) => newTypeName(lit)
    case _ => c.abort(c.enclosingPosition, "I need a literal!")
  }

  val anon = newTypeName(c.fresh)

  c.Expr(Block(
    ClassDef(
      Modifiers(Flag.FINAL), anon, Nil, Template(
        Nil, emptyValDef, List(
          constructor(c.universe),
          TypeDef(Modifiers(), memberName, Nil, TypeTree(typeOf[Int]))
        )
      )
    ),
    Apply(Select(New(Ident(anon)), nme.CONSTRUCTOR), Nil)
  ))
}

This is an unreadable mess, and it's not even a complete example—it depends on some additional utility code.

Continue reading

Potemkin val-age

My attempt to sneak the terms vampire and zombie into the Scala vernacular seems to be succeeding, so here's a new one:

Potemkin definitions: definitions in a macro-constructed structural type that are intended only to make an expression passed as an argument to another macro method typecheck before that macro rewrites it.

I came up with the trick to support this horrible abuse of Scala syntax:

case class Car(var speed: Int, var color: String) { ... }
object Car { ... }

import Car.syntax._

val car = new Car(0, "blue")

car set {
  color = "red"
  speed = 10000
}

Here color and speed are definitions in a structural type that have the same signatures as the fields on the case class, but they don't actually do anything—if we call them we get a NotImplementedError. They only exist to allow the block expression we're passing to set to typecheck before the macro implementation replaces them with something useful.

Continue reading

Feeding our vampires

I've written several times about vampire methods, which are macro methods inside a macro-defined type, where the macro method's implementation is provided in an annotation. Normally when we define a type in a def macro, it looks like a structural type to the outside world, and calling methods on a structural type involves reflective access in Scala. Vampire methods allow us to avoid that ugly bit of runtime reflection.

This trick (which was first discovered by Eugene Burmako) is useful because it makes it a little more practical to use def macros to approximate type providers, for example. It's also just really clever.

For methods with no parameters, the execution of the trick is pretty straightforward. It's a little more complicated when we do have parameters, as Eric Torreborre observes in a question here, since in that case the annotation will need to contain a function instead of just a simple constant of some kind.

Continue reading

The most horrible code I've ever written

When macros first showed up in Scala as an experimental language feature last year, many Scala developers responded with skepticism or distaste. They argued that macros were a distraction from work on more urgent problems with the language, that they would lead to even more complex and reader-unfriendly code, etc.

After a year and a half I think these arguments have less weight, as macros have proven extremely useful in a wide range of applications: string interpolation, serialization, type-level programming with singletons, numeric literals and faster loops, typed channels for actors, and so on. They've let me make many parts of my own code faster and safer in surprising ways.

This post is not about a useful application of macros. It's inspired by a couple of questions on Stack Overflow today, and is an example of exactly the kind of thing macros should not ever be used for. But it's Friday evening and I'm drinking beer in the office and I think this trick is pretty clever, so here we go.

Continue reading

Lists are even easier

This is a quick follow-up to my post last night about stream processing with iteratees. It's worth pointing out that we can accomplish much the same thing even more concisely using Haskell's lists:

import Data.Char (isSpace)
import Data.List (mapAccumL)
import Data.List.Split (splitWhen)

nextPage (page, _)    = (page + 1, 0)
nextLine (page, line) = (page, line + 1)

locator = snd . mapAccumL go (0, 0)
  where
    go loc c = let nextLoc = advance c loc in (nextLoc, (c, nextLoc))
    advance '\f' = nextPage
    advance '\n' = nextLine
    advance _    = id

tokenizer = map unzip . filter (not . null) . splitWhen (isSpace . fst)

poem =
  "It is the same! - For, be it joy or sorrow,\n" ++
  "The path of its departure still is free:\f" ++
  "Man's yesterday may ne'er be like his morrow;\n" ++
  "Nought may endure but Mutability.\n"

main = mapM_ print $ tokenizer . locator $ poem

(We could make this even more concise by using the standard library's words in our definition of tokenizer, but I'm trying to stick to the same basic structure as the iteratee version.)

Continue reading

Iteratees are easy

This blog post is a short response to my MITH colleague Jim Smith, who several weeks ago published a blog post about a stream processing language that he's developing. His post walks through an example of how this language could allow you to take a stream of characters, add some location metadata to each, and then group them into words, while still holding onto the location metadata about the characters that make up the words.

The process he describes sounds a little like the functionality that iteratees provide, so I decided I'd take a quick stab at writing up an iteratee implementation of his example in Haskell. I'm using John Millikin's enumerator package, since that's the iteratee library that I'm most comfortable with.

Continue reading

Vampire methods for structural types

I wish I could take credit for what I'm about to show you, because it's easily the cleverest thing I've seen all week, but it's Eugene Burmako's trick and I've only simplified his demonstration a bit and adapted it to work in Scala 2.10.

Continue reading

Fake type providers, part 2

I like writing code. I also like not writing code, especially when I'm writing code. Type providers are a particularly nice way not to write code. They let you take some kind of schema (for a relational database, RDF vocabulary, etc.) and turn it directly into binding classes at compile time—with no worrying about managing generated code, etc.

I've wanted type providers in Scala for a long time (heck, I wanted type providers ten years ago when I was a Java programmer who had no idea what a type provider was but was dissatisfied with code generation for this kind of task). The type macros currently available in Macro Paradise will provide the real thing, but they're still (at least) months and months away from a stable Scala release.

In the meantime, you can get surprisingly good fake type providers with the def macros in Scala 2.10. In a previous post I outlined one set of macro-based approaches to the problem, with the most concise version involving structural types and therefore (unfortunately) reflective access. In this post I'll go into a bit more detail about the code involved, and will look at just how bad reflective access actually is performance-wise by comparing the structural-type approach to two alternatives: plain old code generation and macro-supported compile-time dynamic types.

Continue reading

Singleton types for literals in Scala

It's sometimes useful in Scala to have a type with a single value. These are called singleton types, and they show up most easily in the context of Scala's objects. For example, if we have the following definition:

object foo {
  def whatever = 13
}

We can refer to a type foo.type that is the singleton type for foo—i.e., the type that contains nothing except foo. We can use this type to write a function that won't compile with any non-foo argument:

def fooIdentity(x: foo.type) = x

For example:

scala> fooIdentity(foo)
res1: foo.type = foo$@5da19724

scala> fooIdentity("foo")
<console>:14: error: type mismatch;
 found   : String("foo")
 required: foo.type

Note that this error message doesn't just tell us that we provided a String when we needed a foo—it lists the type as String("foo"). This is because string literals—like all other literals in Scala (except function literals)—are also singletons in the sense that their most specific type is a singleton type.

Continue reading

Macro methods and subtypes

Suppose we want to define an HListable trait in Scala that will add a members method returning an HList of member values to any case class that extends it. This would let us write the following, for example:

scala> case class User(first: String, last: String, age: Int) extends HListable
defined class User

scala> val foo = User("Foo", "McBar", 25)
foo: User = User(Foo,McBar,25)

scala> foo.members == "Foo" :: "McBar" :: 25 :: HNil
res0: Boolean = true

So we try the following, which looks reasonable at a glance:

Continue reading

Macro-supported DSLs for schema bindings

We've recently started using the W3C's banana-rdf library at MITH, and it's allowing us to make a lot of our code for working with RDF graphs both simpler and less tightly coupled to a specific RDF store. It's a young library, but also very clever and well-designed, and it does an excellent job of exploiting advanced features of the Scala language to make its users' lives easier. Alexandre Bertails and his collaborators deserve a lot of credit for what they've accomplished in just a little over a year.

One of the least pleasant aspects of working with any RDF library is writing bindings for particular vocabularies. For example, if we wanted to use the Open Archives Initiative's Object Reuse and Exchange vocabulary in our banana-rdf application, we'd need to write something like the following:

Continue reading

Learning Shapeless

I began writing this post as an answer to this Stack Overflow question about learning how to use Shapeless, but it ended up a little long for a Stack Overflow answer, so I'm posting it here instead.

When I first started teaching myself type-level programming in the context of Shapeless last year, I spent a lot of time working through simple problems with heterogeneous lists of type-level natural numbers. One fairly straightforward (but still non-trivial) example of such a problem is the second user story of this Coding Dojo kata, which is also outlined by Paul Snively in this email to the Shapeless mailing list.

The goal is to determine whether a list of numbers is the appropriate length (nine) and has a valid checksum, which is calculated by taking the sum of each element multiplied by its distance (plus one) from the end of the list, modulo eleven.
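
As a point of reference before moving to the type level, here is a minimal value-level sketch of the same check in plain Scala. It is not the Shapeless solution. The names `ChecksumSketch`, `checksum`, and `isValid` are made up for illustration, and treating "valid" as "the weighted sum is zero modulo eleven" is an assumption based on the kata's description.

```scala
object ChecksumSketch {
  // Weight each element by its distance (plus one) from the end of the list,
  // sum the products, and reduce modulo eleven.
  def checksum(digits: List[Int]): Int =
    digits.reverse.zipWithIndex.map { case (d, i) => d * (i + 1) }.sum % 11

  // Assumption: a list is valid when it has exactly nine elements
  // and its checksum is zero.
  def isValid(digits: List[Int]): Boolean =
    digits.length == 9 && checksum(digits) == 0
}
```

For example, `ChecksumSketch.isValid(List(3, 4, 5, 8, 8, 2, 8, 6, 5))` holds because the weighted sum is 231, which is divisible by eleven.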

Continue reading

Lots of little trees, part 2

I just noticed that the Lawrence Berkeley National Laboratory's Nux library provides streaming XQuery functionality that makes it very easy to do the kind of XML processing that I described in this post last week.

Using Scala, for example, we can start with some imports:

import nu.xom.{ Builder, Element, Nodes }
import nux.xom.xquery.{ StreamingPathFilter, StreamingTransform, XQueryUtil }

Next we write the "transformer" that we want to apply to every record element:

val processor = new StreamingTransform {
  def transform(record: Element) = {
    val id = XQueryUtil.xquery(record, "IndexCatalogueID").get(0)
    val placeResults = XQueryUtil.xquery(record, "//Place")
    val places = (0 until placeResults.size) map placeResults.get

    println(id.getValue + " " + places.map(_.getValue).mkString(", "))
    new Nodes()
  }
}

We're not really transforming anything here, of course—just performing a side effect as we iterate through the records. We could just as easily be adding some representation of the record to a mutable collection, sending a message to an actor, etc.

Continue reading

Applicative validation syntax

People say that Validation is Scalaz's gateway drug, which might be accurate if you ignore the suggestion that there's anything even remotely fun about validation. In my book, making sure that your program doesn't choke on bad input is always a chore.

Applicative validation is at least a step in the right direction—it makes it easier to write less code, introduce fewer bugs, and draw clearer lines between data models and validation logic. Suppose for example that we have the following case class in Scala:

case class Foo(a: Int, b: Char, c: String)

Suppose also that we have a form with three fields that we want to use to create instances of Foo. We receive input from this form as strings, and we want to be sure that these strings have certain properties.
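
To make the idea concrete without depending on Scalaz, here is a rough sketch of error-accumulating validation using only the standard library's `Either`. Everything in it is an assumption made for illustration: the `Result` alias, the `map3` helper, and the specific properties being checked (an integer, a single character, a non-empty string) are not taken from Scalaz or from the form described above.

```scala
object ValidationSketch {
  case class Foo(a: Int, b: Char, c: String)

  // Hypothetical result type: either a list of error messages or a value.
  type Result[A] = Either[List[String], A]

  def parseInt(s: String): Result[Int] =
    try Right(s.trim.toInt)
    catch { case _: NumberFormatException => Left(List(s"'$s' is not an integer")) }

  def parseChar(s: String): Result[Char] =
    if (s.length == 1) Right(s.head)
    else Left(List(s"'$s' is not a single character"))

  def nonEmpty(s: String): Result[String] =
    if (s.nonEmpty) Right(s) else Left(List("string must not be empty"))

  // Combine three independent results. Unlike a for-comprehension over Either,
  // this collects every error instead of stopping at the first one.
  def map3[A, B, C, D](ra: Result[A], rb: Result[B], rc: Result[C])(f: (A, B, C) => D): Result[D] =
    (ra, rb, rc) match {
      case (Right(a), Right(b), Right(c)) => Right(f(a, b, c))
      case _ => Left(List(ra, rb, rc).collect { case Left(errors) => errors }.flatten)
    }

  def validate(a: String, b: String, c: String): Result[Foo] =
    map3(parseInt(a), parseChar(b), nonEmpty(c))(Foo(_, _, _))
}
```

The point of the sketch is the separation it allows: `Foo` knows nothing about validation, each field check stands alone, and the combining step is the only place that decides how errors are gathered.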

Continue reading

Lots of little trees

In my field (computational humanities), people like to distribute databases as enormous XML files. These are often very flat trees, with the root element containing hundreds of thousands (or millions) of record elements, and they can easily be too big to be parsed into memory as a DOM (Document Object Model) or DOM-like structure.

This is exactly the kind of problem that streaming XML parsers are designed to solve. There are two dominant approaches to parsing XML streams: push-based models (like SAX, the "Simple API for XML"), and pull-based models (like StAX, or, shudder, scala.xml.pull). Both of these approaches save memory by producing streams of events (BeginElement, Comment, etc.) instead of reconstructing a tree-based representation of the file in memory. (Such a representation can be 5-10 times the size of the file on disk, which quickly becomes a problem when you have four gigs of memory and your XML files are approaching a gigabyte in size.)

Push-based APIs like SAX are inherently imperative: we register callbacks with the parser that specify how to handle events, and then it calls them as it parses the XML file. With a pull parser, on the other hand, the programmer sees the events as an iterator or lazy collection that he or she is responsible for iterating through. Newer frameworks that support streaming XML processing tend to provide pull-based APIs, and many developers find pull parsing more intuitive than SAX (or at least slightly less miserable).
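
As a small illustration of the pull style, the following sketch uses the JDK's StAX API (`javax.xml.stream`) from Scala to count elements without building a tree. The object name `PullSketch`, the method `countRecords`, and the element name `record` are assumptions chosen for the example.

```scala
import java.io.StringReader
import javax.xml.stream.{ XMLInputFactory, XMLStreamConstants }

object PullSketch {
  // Count <record> elements by pulling events one at a time.
  // Only the current event is held in memory, never the whole document.
  def countRecords(xml: String): Int = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    var count = 0
    try {
      while (reader.hasNext) {
        // next() advances the parser and returns the type of the new event.
        if (reader.next() == XMLStreamConstants.START_ELEMENT && reader.getLocalName == "record")
          count += 1
      }
    } finally reader.close()
    count
  }
}
```

Here the calling code drives the loop and decides when to ask for the next event, which is the inversion of control that distinguishes a pull parser from SAX-style callbacks.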

Continue reading

Better Scala syntax highlighting for Hakyll

I chose Hakyll over other static site generators for this blog in part because Hakyll is built on Pandoc, John MacFarlane's fantastic document conversion library. Unfortunately Pandoc's syntax highlighting for Scala is incredibly bad:

Continue reading