Macro-supported DSLs for schema bindings
We've recently started using the W3C's banana-rdf library at MITH, and it's allowing us to make a lot of our code for working with RDF graphs both simpler and less tightly coupled to a specific RDF store. It's a young library, but also very clever and well designed, and it does an excellent job of exploiting advanced features of the Scala language to make its users' lives easier. Alexandre Bertails and his collaborators deserve a lot of credit for what they've accomplished in just a little over a year.
One of the least pleasant aspects of working with any RDF library is writing bindings for particular vocabularies. For example, if we wanted to use the Open Archives Initiative's Object Reuse and Exchange vocabulary in our banana-rdf application, we'd need to write something like the following:
class OREPrefix[Rdf <: RDF](ops: RDFOps[Rdf])
    extends PrefixBuilder("ore", "http://www.openarchives.org/ore/terms/")(ops) {
  import ops._

  val aggregates = apply("aggregates")
  val isAggregatedBy = apply("isAggregatedBy")
  val describes = apply("describes")
  val isDescribedBy = apply("isDescribedBy")
  val lineage = apply("lineage")
  val proxyFor = apply("proxyFor")
  val proxyIn = apply("proxyIn")
  val similarTo = apply("similarTo")

  val Aggregation = apply("Aggregation")
  val AggregatedResource = apply("AggregatedResource")
  val Proxy = apply("Proxy")
  val ResourceMap = apply("ResourceMap")
}
This isn't as bad as writing a Jena vocabulary, for example, but it's still verbose and error-prone. A relatively simple application can require dozens of these vocabulary bindings, some of which may define hundreds of classes and properties. The vocabularies may also be under development and subject to change, in which case the binding code must be kept in sync with a schema or some other kind of documentation.
Since this is Scala, I decided to take a stab at using macros to make the process of writing these things a little easier. It's not too hard to use this trick to write a macro that will parse a given RDF schema at compile time and generate an anonymous subclass of Prefix with the appropriate members. The usage looks like this:
import org.w3.banana._, sesame._, edu.umd.mith.banana._

trait MyPrefixes[Rdf <: RDF] extends Prefixes[Rdf] {
  val ore = createFromSchema[Rdf](
    "http://www.openarchives.org/ore/terms/",
    "/ore-terms.rdf"
  )
}
And then:
scala> val prefixes = new MyPrefixes[Sesame] { def ops = RDFOps[Sesame] }
prefixes: MyPrefixes[org.w3.banana.sesame.Sesame] = $anon$1@12928368
scala> prefixes.ore.similarTo
res0: org.w3.banana.sesame.Sesame#URI = http://www.openarchives.org/ore/terms/similarTo
No need to type out all the class and property names twice: we just drop the schema in our resource folder, point to it when we call our macro method, and we're ready to go. Note that we also don't have to provide the prefix name; it will be set automatically based on the name of the value ("ore" in this case).
This is simple and easy, but it's not perfect. We don't have a "real" type for our ORE Prefix, just the structural type of the ore instance. This can be inconvenient in certain situations, and it also means that writing something like ore.similarTo involves a reflective call. It's still entirely type-safe (we'd get a compile-time error if we accidentally wrote ore.similarToo, for example), but it does inflict a small performance penalty, and if we don't want lots of warnings during compilation we either have to set a compiler option to enable reflective calls (-language:reflectiveCalls) or import scala.language.reflectiveCalls all over the place.
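To make the structural-type issue concrete, here is a minimal sketch that does not use banana-rdf at all. It assumes Scala 2 (where an anonymous class with new members gets a structural type), and the object name StructuralTypeDemo is made up for illustration.

```scala
// Scala 2 only: illustrates structural-type member access, not banana-rdf itself.
// Without this import (or -language:reflectiveCalls) the compiler emits a feature warning.
import scala.language.reflectiveCalls

object StructuralTypeDemo {
  // An anonymous class that adds a member. Its inferred type is the
  // structural type AnyRef { val similarTo: String }, not a named class.
  val ore = new {
    val similarTo: String = "http://www.openarchives.org/ore/terms/similarTo"
  }

  def main(args: Array[String]): Unit = {
    // Checked by the compiler, but dispatched through reflection at runtime.
    println(ore.similarTo)
    // ore.similarToo  // would not compile: similarToo is not a member
  }
}
```

The access is still statically checked; the cost is only in how the call is dispatched and in the feature import that has to accompany it.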
Someday we'll all live in Macro Paradise and type macros will make this kind of problem completely irrelevant. In the meantime, I've been experimenting with a couple of other approaches that avoid the problem while still minimizing verbosity. We have to write out all of our class and property names, but only once:
trait OREPrefix[Rdf <: RDF] extends Prefix[Rdf] {
  val prefixIri = "http://www.openarchives.org/ore/terms/"
  val
    aggregates, isAggregatedBy,
    describes, isDescribedBy,
    lineage, proxyFor,
    proxyIn, similarTo,
    Aggregation, AggregatedResource,
    Proxy, ResourceMap: Rdf#URI
}
Next we call a macro method that will create an instance of this trait with the appropriate implementations of these values:
trait MyPrefixes[Rdf <: RDF] extends Prefixes[Rdf] {
  val ore = create[OREPrefix]
}
And now we can run the REPL code above without any reflective calls.
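As a rough picture of why the reflective calls go away, the following is a hand-written approximation of what such a macro could expand to. It is only a sketch under assumptions: String stands in for Rdf#URI, the names SimplePrefix, OREPrefixSketch and NominalTypeDemo are hypothetical, and the code the real macro generates may look quite different.

```scala
// Hypothetical hand-expansion; String stands in for Rdf#URI.
trait SimplePrefix {
  def prefixIri: String
}

trait OREPrefixSketch extends SimplePrefix {
  val prefixIri = "http://www.openarchives.org/ore/terms/"
  // Abstract members: each name is written exactly once, here.
  val aggregates, similarTo, ResourceMap: String
}

object NominalTypeDemo {
  // Roughly what a create-style macro could generate: every abstract val is
  // implemented by appending its own name to the prefix IRI.
  val ore: OREPrefixSketch = new OREPrefixSketch {
    val aggregates = prefixIri + "aggregates"
    val similarTo = prefixIri + "similarTo"
    val ResourceMap = prefixIri + "ResourceMap"
  }

  def main(args: Array[String]): Unit =
    // An ordinary member access on a named type: no reflection involved.
    println(ore.similarTo)
}
```

Because ore is typed as the named trait, the members are part of a nominal type and the compiler can emit plain method calls.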
If we want a little more compile-time protection against typing mistakes, we can point to a schema and the macro will confirm that all of our class and property names are legitimate:
trait MyPrefixes[Rdf <: RDF] extends Prefixes[Rdf] {
  val ore = createWithSchema[OREPrefix]("/ore-terms.rdf")
}
Suppose for example that this had been our trait:
trait OREPrefix[Rdf <: RDF] extends Prefix[Rdf] {
  val prefixIri = "http://www.openarchives.org/ore/terms/"
  val these, arent, really, properties: Rdf#URI
}
We'd get a nice little error when we try to compile MyPrefixes:
<console>:22: error: The following is not a valid property name: these
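The check itself amounts to comparing the declared member names against the terms found in the schema. Here is a minimal runtime sketch of that comparison; the macro would do the equivalent at compile time. The names SchemaCheckSketch and invalidNames are invented for this example, and the term set is a hand-copied subset rather than something read from the schema file.

```scala
object SchemaCheckSketch {
  // Returns the declared names that do not appear among the schema's terms,
  // in the order they were declared.
  def invalidNames(declared: Seq[String], schemaTerms: Set[String]): Seq[String] =
    declared.filterNot(schemaTerms.contains)

  def main(args: Array[String]): Unit = {
    // A hand-copied subset of ORE term names; a real check would parse the schema.
    val oreTerms = Set("aggregates", "isAggregatedBy", "describes", "similarTo")
    val declared = Seq("these", "arent", "really", "properties")

    // Report the first offending name, in the style of the compiler error above.
    invalidNames(declared, oreTerms).headOption.foreach { name =>
      println(s"The following is not a valid property name: $name")
    }
  }
}
```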
This is currently all experimental work, and I'm not sure whether any of it will make its way into projects at MITH, but I've published the code on GitHub in case anyone's interested in taking a look at how it's implemented.