home

Lots of little trees, part 2

I just noticed that the Lawrence Berkeley National Laboratory's Nux library provides streaming XQuery functionality that makes it very easy to do the kind of XML processing that I described in this post last week.

Using Scala, for example, we can start with some imports:

import nu.xom.{ Builder, Element, Nodes }
import nux.xom.xquery.{ StreamingPathFilter, StreamingTransform, XQueryUtil }

Next we write the "transformer" that we want to apply to every record element:

val processor = new StreamingTransform {
  def transform(record: Element) = {
    val id = XQueryUtil.xquery(record, "IndexCatalogueID").get(0)
    val placeResults = XQueryUtil.xquery(record, "//Place")
    val places = (0 until placeResults.size) map placeResults.get

    println(id.getValue + " " + places.map(_.getValue).mkString(", "))
    new Nodes()
  }
}

We're not really transforming anything here, of course—just performing a side effect as we iterate through the records. We could just as easily be adding some representation of the record to a mutable collection, sending a message to an actor, etc.

Continue reading