class: center, middle #
circe-algebra
(decoding after cursors)
Travis Brown
@travisbrown
--- layout: true --- .left-column[ ## Intro ] .right-column[ ### What is Circe? Fast, functional JSON encoding and decoding for Scala ] --- .left-column[ ## Intro ] .right-column[ ### What is Circe? Fast, functional JSON encoding and decoding for Scala * Built on [Cats](https://github.com/typelevel/cats) * Integrated with [Typelevel](https://github.com/typelevel) ecosystem * Three nines of purity * Pretty cool community (almost 100 contributors) * Competitive with [spray-json](https://github.com/spray/spray-json) in terms of performance ] --- .left-column[ ## Intro ] .right-column[ ### What is Circe? Fast, functional JSON encoding and decoding for Scala * Built on [Cats](https://github.com/typelevel/cats) * Integrated with [Typelevel](https://github.com/typelevel) ecosystem * Three nines of purity * Pretty cool community (almost 100 contributors) * Competitive with [spray-json](https://github.com/spray/spray-json) in terms of performance ### Who's using Circe? The BBC, Chartboost, Drivetribe, The Guardian, SoundCloud, Spotify, Stripe, Twilio, Zalando, Zendesk, ... ] --- .left-column[ ## Intro ] .right-column[ ### History of Circe * January 2015: First Cats commit * August 2015: Circe adapted from Argonaut 6.1 * February 2016: Circe 0.3 released * August 2017: 250k monthly Circe downloads * December 2017: Cats 1.0 * January (?) 2018: Circe 1.0 ] --- .left-column[ ## Intro ] .right-column[ ### History of Circe * January 2015: First Cats commit * August 2015: Circe adapted from Argonaut 6.1 * February 2016: Circe 0.3 released * August 2017: 250k monthly Circe downloads * December 2017: Cats 1.0 * January (?) 2018: Circe 1.0 ### Motivations for the fork * Better fit with the design goals of [Finch](https://github.com/finagle/finch) * Help drive Cats development and adoption * Playground for [library design experimentation](https://github.com/circe/circe/blob/master/DESIGN.md) ] --- .left-column[ ## Intro ] .right-column[ ### Differences between Circe and Argonaut * Optional [error accumulation](https://meta.plasm.us/posts/2015/12/17/error-accumulating-decoders-in-circe/) * More advanced [generic derivation](https://meta.plasm.us/posts/2016/01/14/configuring-generic-derivation/) * More, bigger [numbers](https://meta.plasm.us/posts/2016/02/14/json-numbers-in-circe-0.3.0/) * [Performance](https://github.com/circe/circe-benchmarks) (2-3x) ] --- .left-column[ ## Intro ] .right-column[ ### Things that still aren't that different Decoding is fundamentally built on _cursors_ ] --- .left-column[ ## Intro ] .right-column[ ### Things that still aren't that different Decoding is fundamentally built on _cursors_ [Paul Chiusano](https://pchiusano.github.io/) in a 2014 [blog post](https://pchiusano.github.io/2014-08-12/zippers-not-useful.html): > The general issue here is that neither parsing nor serialization really require the concept of a functional cursor _at all_. _Can it_ be done? Yes, of course. And there are even some nice libraries that happen to use zippers for these tasks. But _should_ it be done? Probably not. ] --- .left-column[ ## Intro ] .right-column[ ### Moving from cursors to circe-algebra More contexts: * Fail-fast vs. error accumulation ] --- .left-column[ ## Intro ] .right-column[ ### Moving from cursors to circe-algebra More contexts: * Fail-fast vs. error accumulation *
History vs. no history
*
YOLO (exception-throwing à la spray-json)
] --- .left-column[ ## Intro ] .right-column[ ### Moving from cursors to circe-algebra More contexts: * Fail-fast vs. error accumulation *
History vs. no history
*
YOLO (exception-throwing à la spray-json)
More formats: * `io.circe.Json` * YAML (via circe-yaml) ] --- .left-column[ ## Intro ] .right-column[ ### Moving from cursors to circe-algebra More contexts: * Fail-fast vs. error accumulation *
History vs. no history
*
YOLO (exception-throwing à la spray-json)
More formats: * `io.circe.Json`,
`scalajson.ast.JValue`, `spray.json.JsValue`
*
YAML (without conversions)
*
BSON, Avro, etc.
] --- .left-column[ ## Intro ] .right-column[ ### Moving from cursors to circe-algebra More contexts: * Fail-fast vs. error accumulation *
History vs. no history
*
YOLO (exception-throwing à la spray-json)
More formats: * `io.circe.Json`,
`scalajson.ast.JValue`, `spray.json.JsValue`
*
YAML (without conversions)
*
BSON, Avro, etc.
More other stuff: * Elegance (no unneeded operations) * Performance (operations can be inspected, rearranged) ] --- .left-column[ ## Intro ] .right-column[ ### Potentially interesting background resources https://github.com/travisbrown/circe-algebra https://github.com/circe/circe/issues/228 https://pchiusano.github.io/2014-08-12/zippers-not-useful.html ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### What do we mean by "cursor" (or "zipper")? * Data structure + pointer * Purely functional * Supports navigation and "modification" ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### Cursors in this context * `Cursor`: JSON AST + pointer * `HCursor`: `Cursor` + history * `ACursor`: `HCursor` that may have failed ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### Cursors in this context ```scala import io.circe._, io.circe.literal._ val doc = json"""{ "name": "things", "values": [[ "foo", 100 ]] }""" val c = doc.hcursor val nameC = c.downField("name") val badC = c.downField("nonsense") val firstValueC = c.downField("values").downN(0) val firstValue = firstValueC.as[(String, Int)] ``` ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### Cursors are really powerful ```scala import io.circe._, io.circe.literal._, io.circe.syntax._ val doc = json"""{"foo":[{"bar":1,"qux":true},0.01],"baz":[true]}""" def edited = doc.hcursor .downField("foo") .downN(0) .downField("bar") .set(List(1, 2, 3).asJson) .top ``` ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### Cursors for decoding ```scala import io.circe._, io.circe.jawn.decode val doc = """{"id":"abc","names":{"screen":"Foo McBar"},"age":25}""" case class User(id: String, screenName: String, age: Option[Int]) implicit val decodeUser: Decoder[User] = Decoder.instance { c => for { i <- c.get[String]("id") n <- c.downField("names").get[String]("screen") a <- c.get[Option[Int]]("age") } yield User(i, n, a) } val user = decode[User](doc) ``` ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### Cursors for decoding General pattern: * Move cursor * Read into context * Move cursor * Read into context * Compose results in context ] --- .left-column[ ## Intro ## Cursors ] .right-column[ ### Cursors for decoding General pattern: * Move cursor * Read into context * Move cursor * Read into context * Compose results in context Types: * Navigation: `(H|A)Cursor => ACursor` * Decoding: `(H|A)Cursor => Result[A]` ] --- .left-column[ ## Intro ## Cursors ## Algebra ] .right-column[ ### With an algebra of operations Both navigation and reading are "operations": ```scala sealed abstract class Op[A] case object ReadNull extends Op[Unit] case object ReadBoolean extends Op[Boolean] case object ReadNumber extends Op[BigDecimal] case object ReadString extends Op[String] // ... case class DownField(key: String) extends Op[Unit] case class DownAt(index: Int) extends Op[Unit] ``` ] --- .left-column[ ## Intro ## Cursors ## Algebra ] .right-column[ ### With an algebra of operations Both navigation and reading are "operations": ```scala sealed abstract class Op[A] case object ReadNull extends Op[Unit] case object ReadBoolean extends Op[Boolean] case object ReadNumber extends Op[BigDecimal] case object ReadString extends Op[String] // ... case class DownField(key: String) extends Op[Unit] case class DownAt(index: Int) extends Op[Unit] ``` The new part: * Interpretation: `Interpreter[I, F] ≈ Op[A] => I => F[A]` ] --- .left-column[ ## Intro ## Cursors ## Algebra ] .right-column[ ### Cursors vs. our algebra of operation The cursor approach: * `Decoder[A] ≈ HCursor => Either[DecodingFailure, A]` The algebra approach: * `Decoder[A] ≈ Op[A]` * `Interpreter[I, F] ≈ Op[A] => I => F[A]` ] --- .left-column[ ## Intro ## Cursors ## Algebra ] .right-column[ ### Cursors vs. our algebra of operation The cursor approach: * `Decoder[A] ≈ HCursor => Either[DecodingFailure, A]` The algebra approach: * `Decoder[A] ≈ Op[A]` * `Interpreter[I, F] ≈ Op[A] => I => F[A]` We've split up the work of decoding into two parts: * Operations (a simple data structure) * Interpreters (decoding engines) ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ] .right-column[ ### Design goals * Should be ~90% source-compatible with Circe 0.9 * Should be _at least_ as fast as Circe 0.9 in all benchmarks * It shouldn't be possible to write nonsense decoders * `m` formats and `n` contexts shouldn't involve `m * n` code ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ] .right-column[ ### Design goals * Should be ~90% source-compatible with Circe 0.9 * Should be _at least_ as fast as Circe 0.9 in all benchmarks * It shouldn't be possible to write nonsense decoders * `m` formats and `n` contexts shouldn't involve `m * n` code Explicitly not a design goal: * Interpreters don't need to be easy to write ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ] .right-column[ ### Design goals * Should be ~90% source-compatible with Circe 0.9 * Should be _at least_ as fast as Circe 0.9 in all benchmarks * It shouldn't be possible to write nonsense decoders * `m` formats and `n` contexts shouldn't involve `m * n` code Explicitly not a design goal: * Interpreters don't need to be easy to write Rationale: * We _may_ write a few dozen interpreters for Circe, ever * We can easily test the hell out of these * Optimizing for simple implementations seems kind of silly ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Free * It's simple * It's extensible * It provides some nice guardrails ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Free * It's simple * It's extensible * It provides some nice guardrails Problems: * We don't actually care about simplicity at this level * There's a ton of overhead * It puts constraints on us we don't necessarily want ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Finally tagless * It's the cool new thing Problems: * I don't think it fits our use case ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Adding composition operators to our algebra ```scala case object ReadNull extends Op[Unit] case object ReadBoolean extends Op[Boolean] // ... case class DownField(key: String) extends Op[Unit] case class DownAt(index: Int) extends Op[Unit] // ... case class Pure[A](value: A) extends Op[A] case class Bind[A, B](opA: Op[A], f: A => Op[B]) extends Op[B] ``` ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Adding composition operators to our algebra ```scala case object ReadNull extends Op[Unit] case object ReadBoolean extends Op[Boolean] // ... case class DownField(key: String) extends Op[Unit] case class DownAt(index: Int) extends Op[Unit] // ... case class Pure[A](value: A) extends Op[A] case class Bind[A, B](opA: Op[A], f: A => Op[B]) extends Op[B] ``` Advantages: * It's fast ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Adding composition operators to our algebra ```scala case object ReadNull extends Op[Unit] case object ReadBoolean extends Op[Boolean] // ... case class DownField(key: String) extends Op[Unit] case class DownAt(index: Int) extends Op[Unit] // ... case class Pure[A](value: A) extends Op[A] case class Bind[A, B](opA: Op[A], f: A => Op[B]) extends Op[B] ``` Advantages: * It's fast Disadvantages: * It's extra stuff our interpreters have to worry about * It means we can't (as conveniently) compose algebras ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Throughput ```markdown Benchmark Mode Score Error Units decodeAccumulating avgt 32279.290 ± 4567.515 ns/op decodeAccumulatingWithHistory avgt 38935.287 ± 9136.040 ns/op decodeAccumulatingWithCursors avgt 46498.210 ± 8795.706 ns/op decodeFailFast avgt 34671.843 ± 1805.075 ns/op decodeFailFastWithHistory avgt 42670.889 ± 10314.980 ns/op decodeFailFastWithCursors avgt 53211.451 ± 10669.691 ns/op decodeFree avgt 857278.165 ± 135684.433 ns/op decodeSimple avgt 290936.679 ± 92996.094 ns/op ``` ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Allocation rates ```markdown Benchmark Mode Score Error Units decodeAccumulating avgt 83168.013 ± 0.005 B/op decodeAccumulatingWithHistory avgt 83170.519 ± 21.556 B/op decodeAccumulatingWithCursors avgt 123101.484 ± 81.025 B/op decodeFailFast avgt 83169.587 ± 13.533 B/op decodeFailFastWithHistory avgt 83154.850 ± 24.407 B/op decodeFailFastWithCursors avgt 123085.868 ± 86.616 B/op decodeFree avgt 2001547.110 ± 43.804 B/op decodeSimple avgt 727288.167 ± 0.414 B/op ``` ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Example: two versions with cursors ```scala import io.circe.{ Decoder => DecoderC }, io.circe.literal._ val doc = json"""{"id":"abc","names":{"screen":"Foo McBar"},"age":25}""" case class User(id: String, screenName: String, age: Option[Int]) val decodeUserMonadic: DecoderC[User] = DecoderC.instance { c => for { i <- c.get[String]("id") n <- c.downField("names").get[String]("screen") a <- c.get[Option[Int]]("age") } yield User(i, n, a) } val decodeUserApplicative: DecoderC[User] = DecoderC.instance { c => ( c.get[String]("id"), c.downField("names").get[String]("screen"), c.get[Option[Int]]("age") ).mapN(User(_, _, _)) } ``` ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Example: cursors and algebra ```scala val decodeUserCursor: DecoderC[User] = DecoderC.instance { c => ( c.get[String]("id"), c.downField("names").get[String]("screen"), c.get[Option[Int]]("age") ).mapN(User(_, _, _)) } import io.circe.algebra.{ Decoder => DecoderA, ops } val decodeUserAlgebra: DecoderA[User] = DecoderA.instance( ( ops.get[String]("id"), ops.downField("names").get[String]("screen"), ops.get[Option[Int]]("age") ).mapN(User(_, _, _)) ) ``` ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ] .right-column[ ### Other topics / complications Optimizations: * Redundant operations (e.g. `ReadLong`, `ReadMap`) * Mutability in interpreters Semantics: * Non-JSON operations (e.g. `ReadTimestamp`) * Bracketing * Optional decoding * Representation of errors more generally ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ## Future ] .right-column[ ### Release plans * Circe 0.9.0 this week * Circe 0.10.0 + circe-algebra 0.1.0 by January 1 * Circe 1.0.0 by the end of January ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ## Future ] .right-column[ ### Release plans * Circe 0.9.0 this week * Circe 0.10.0 + circe-algebra 0.1.0 by January 1 * Circe 1.0.0 by the end of January Open questions: * Should cursors stay in core? * Should the AST stay in core? * How many non-JSON operations? ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ## Future ] .right-column[ ### Release plans * Circe 0.9.0 this week * Circe 0.10.0 + circe-algebra 0.1.0 by January 1 * Circe 1.0.0 by the end of January Open questions: * Should cursors stay in core? * Should the AST stay in core? * How many non-JSON operations? Will need help with: * Interpreters: YAML, BSON by 1.0.0 * Scalafix rewrite rules ] --- .left-column[ ## Intro ## Cursors ## Algebra ## Goals ## Approach ## Future ] .right-column[ ### Thanks! https://github.com/travisbrown/circe-algebra ]