circe-algebra

class: center, middle

# <span>circe-algebra</span><br/><span class="title-subtitle">(decoding after cursors)</span>

<h2 class="title-org">Travis Brown<br/><span class="title-org"><a href="https://twitter.com/travisbrown">@travisbrown</a></span>
</h2>

---

layout: true

<div class="footer">
  <div class="footer-title">
    <h1><span class="sas">circe-algebra</span> | <a href="https://twitter.com/travisbrown">@travisbrown</a></h1>
  </div>
</div>

---

.left-column[
  ## Intro
]
.right-column[

### What is Circe?

Fast, functional JSON encoding and decoding for Scala
]
---

.left-column[
  ## Intro
]
.right-column[

### What is Circe?

Fast, functional JSON encoding and decoding for Scala

* Built on [Cats](https://github.com/typelevel/cats)
* Integrated with [Typelevel](https://github.com/typelevel) ecosystem
* Three nines of purity
* Pretty cool community (almost 100 contributors)
* Competitive with [spray-json](https://github.com/spray/spray-json) in terms of performance
]

---
.left-column[
  ## Intro
]
.right-column[

### What is Circe?

Fast, functional JSON encoding and decoding for Scala

### Who's using Circe?

The BBC, Chartboost, Drivetribe, The Guardian, SoundCloud, Spotify, Stripe, Twilio, Zalando, Zendesk, ...
]

---

.left-column[
  ## Intro
]
.right-column[

### History of Circe

* January 2015: First Cats commit
* August 2015: Circe adapted from Argonaut 6.1
* February 2016: Circe 0.3 released
* August 2017: 250k monthly Circe downloads
* December 2017: Cats 1.0
* January (?) 2018: Circe 1.0
]

---

.left-column[
  ## Intro
]
.right-column[

### History of Circe

### Motivations for the fork

* Better fit with the design goals of [Finch](https://github.com/finagle/finch)
* Help drive Cats development and adoption
* Playground for [library design experimentation](https://github.com/circe/circe/blob/master/DESIGN.md)
]

---

.left-column[
  ## Intro
]
.right-column[

### Differences between Circe and Argonaut

* Optional [error accumulation](https://meta.plasm.us/posts/2015/12/17/error-accumulating-decoders-in-circe/)
* More advanced [generic derivation](https://meta.plasm.us/posts/2016/01/14/configuring-generic-derivation/)
* More, bigger [numbers](https://meta.plasm.us/posts/2016/02/14/json-numbers-in-circe-0.3.0/)
* [Performance](https://github.com/circe/circe-benchmarks) (2-3x)
]

---

.left-column[
  ## Intro
]
.right-column[

### Things that still aren't that different

Decoding is fundamentally built on _cursors_
]
---

.left-column[
  ## Intro
]
.right-column[

### Things that still aren't that different

Decoding is fundamentally built on _cursors_

[Paul Chiusano](https://pchiusano.github.io/) in a 2014 [blog post](https://pchiusano.github.io/2014-08-12/zippers-not-useful.html):

> The general issue here is that neither parsing nor serialization really require the concept of a functional cursor _at all_. _Can it_ be done? Yes, of course. And there are even some nice libraries that happen to use zippers for these tasks. But _should_ it be done? Probably not.
]
---

.left-column[
  ## Intro
]
.right-column[

### Moving from cursors to circe-algebra

More contexts:

* Fail-fast vs. error accumulation

]
---

.left-column[
  ## Intro
]
.right-column[

### Moving from cursors to circe-algebra

More contexts:

* Fail-fast vs. error accumulation
* <span class="title-org">History vs. no history</span>
* <span class="title-org">YOLO (exception-throwing à la spray-json)</span>

]
---

.left-column[
  ## Intro
]
.right-column[

### Moving from cursors to circe-algebra

More contexts:

* Fail-fast vs. error accumulation
* <span class="title-org">History vs. no history</span>
* <span class="title-org">YOLO (exception-throwing à la spray-json)</span>

More formats:

* `io.circe.Json`
* YAML (via circe-yaml)

]
---

.left-column[
  ## Intro
]
.right-column[

### Moving from cursors to circe-algebra

More contexts:

* Fail-fast vs. error accumulation
* <span class="title-org">History vs. no history</span>
* <span class="title-org">YOLO (exception-throwing à la spray-json)</span>

More formats:

* `io.circe.Json`, <span class="title-org">`scalajson.ast.JValue`, `spray.json.JsValue`</span>
* <span class="title-org">YAML (without conversions)</span>
* <span class="title-org">BSON, Avro, etc.</span>

]
---

.left-column[
  ## Intro
]
.right-column[

### Moving from cursors to circe-algebra

More contexts:

* Fail-fast vs. error accumulation
* <span class="title-org">History vs. no history</span>
* <span class="title-org">YOLO (exception-throwing à la spray-json)</span>

More formats:

More other stuff:

* Elegance (no unneeded operations)
* Performance (operations can be inspected, rearranged)

]
---

.left-column[
  ## Intro
]
.right-column[

### Potentially interesting background resources

https://github.com/travisbrown/circe-algebra
https://github.com/circe/circe/issues/228
https://pchiusano.github.io/2014-08-12/zippers-not-useful.html

]

---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### What do we mean by "cursor" (or "zipper")?

* Data structure + pointer
* Purely functional
* Supports navigation and "modification"

]
---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### Cursors in this context

* `Cursor`: JSON AST + pointer
* `HCursor`: `Cursor` + history
* `ACursor`: `HCursor` that may have failed

]
---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### Cursors in this context

```scala
import io.circe._, io.circe.literal._

val doc =
  json"""{ "name": "things", "values": [[ "foo", 100 ]] }"""

val c = doc.hcursor

val nameC = c.downField("name")
val badC = c.downField("nonsense")
val firstValueC = c.downField("values").downN(0)
val firstValue = firstValueC.as[(String, Int)]
```

]
---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### Cursors are really powerful

```scala
import io.circe._, io.circe.literal._, io.circe.syntax._

val doc =
  json"""{"foo":[{"bar":1,"qux":true},0.01],"baz":[true]}"""

def edited =
  doc.hcursor
   .downField("foo")
   .downN(0)
   .downField("bar")
   .set(List(1, 2, 3).asJson)
   .top
```
]

---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### Cursors for decoding

```scala
import io.circe._, io.circe.jawn.decode

val doc =
  """{"id":"abc","names":{"screen":"Foo McBar"},"age":25}"""

case class User(id: String, screenName: String, age: Option[Int])

implicit val decodeUser: Decoder[User] = Decoder.instance { c =>
  for {
    i <- c.get[String]("id")
    n <- c.downField("names").get[String]("screen")
    a <- c.get[Option[Int]]("age")       
  } yield User(i, n, a)
}

val user = decode[User](doc)
```
]

---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### Cursors for decoding

General pattern:

* Move cursor
* Read into context
* Move cursor
* Read into context
* Compose results in context
]

---

.left-column[
  ## Intro
  ## Cursors
]
.right-column[

### Cursors for decoding

General pattern:

* Move cursor
* Read into context
* Move cursor
* Read into context
* Compose results in context

Types:

* Navigation: `(H|A)Cursor => ACursor`
* Decoding: `(H|A)Cursor => Result[A]`
]

---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
]
.right-column[

### With an algebra of operations

Both navigation and reading are "operations":

```scala
sealed abstract class Op[A]

case object ReadNull extends Op[Unit]
case object ReadBoolean extends Op[Boolean]
case object ReadNumber extends Op[BigDecimal]
case object ReadString extends Op[String]
// ...
case class DownField(key: String) extends Op[Unit]
case class DownAt(index: Int) extends Op[Unit]
```

]

---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
]
.right-column[

### With an algebra of operations

Both navigation and reading are "operations":

```scala
sealed abstract class Op[A]

The new part:

* Interpretation: `Interpreter[I, F] ≈ Op[A] => I => F[A]`

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
]
.right-column[

### Cursors vs. our algebra of operation

The cursor approach:

* `Decoder[A] ≈ HCursor => Either[DecodingFailure, A]`

The algebra approach:

* `Decoder[A] ≈ Op[A]`
* `Interpreter[I, F] ≈ Op[A] => I => F[A]`

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
]
.right-column[

### Cursors vs. our algebra of operation

The cursor approach:

* `Decoder[A] ≈ HCursor => Either[DecodingFailure, A]`

The algebra approach:

* `Decoder[A] ≈ Op[A]`
* `Interpreter[I, F] ≈ Op[A] => I => F[A]`

We've split up the work of decoding into two parts:

* Operations (a simple data structure)
* Interpreters (decoding engines)

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
]
.right-column[

### Design goals

* Should be ~90% source-compatible with Circe 0.9
* Should be _at least_ as fast as Circe 0.9 in all benchmarks
* It shouldn't be possible to write nonsense decoders
* `m` formats and `n` contexts shouldn't involve `m * n` code

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
]
.right-column[

### Design goals

Explicitly not a design goal:

* Interpreters don't need to be easy to write

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
]
.right-column[

### Design goals

Explicitly not a design goal:

* Interpreters don't need to be easy to write

Rationale:

* We _may_ write a few dozen interpreters for Circe, ever
* We can easily test the hell out of these
* Optimizing for simple implementations seems kind of silly
]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Free

* It's simple
* It's extensible
* It provides some nice guardrails

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Free

* It's simple
* It's extensible
* It provides some nice guardrails

Problems:

* We don't actually care about simplicity at this level
* There's a ton of overhead
* It puts constraints on us we don't necessarily want

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Finally tagless

* It's the cool new thing

Problems:

* I don't think it fits our use case

]

---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Adding composition operators to our algebra

```scala
case object ReadNull extends Op[Unit]
case object ReadBoolean extends Op[Boolean]
// ...
case class DownField(key: String) extends Op[Unit]
case class DownAt(index: Int) extends Op[Unit]
// ...
case class Pure[A](value: A) extends Op[A]
case class Bind[A, B](opA: Op[A], f: A => Op[B]) extends Op[B]
```

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Adding composition operators to our algebra

Advantages:

* It's fast

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Adding composition operators to our algebra

Advantages:

* It's fast

Disadvantages:

* It's extra stuff our interpreters have to worry about
* It means we can't (as conveniently) compose algebras

]

---

.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Throughput

```markdown
Benchmark                      Mode      Score        Error  Units

decodeAccumulating             avgt  32279.290 ±   4567.515  ns/op
decodeAccumulatingWithHistory  avgt  38935.287 ±   9136.040  ns/op
decodeAccumulatingWithCursors  avgt  46498.210 ±   8795.706  ns/op

decodeFailFast                 avgt  34671.843 ±   1805.075  ns/op
decodeFailFastWithHistory      avgt  42670.889 ±  10314.980  ns/op
decodeFailFastWithCursors      avgt  53211.451 ±  10669.691  ns/op

decodeFree                     avgt 857278.165 ± 135684.433  ns/op
decodeSimple                   avgt 290936.679 ±  92996.094  ns/op
```
]
---

.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Allocation rates

```markdown
Benchmark                      Mode       Score    Error   Units

decodeAccumulating             avgt   83168.013 ±  0.005    B/op
decodeAccumulatingWithHistory  avgt   83170.519 ± 21.556    B/op
decodeAccumulatingWithCursors  avgt  123101.484 ± 81.025    B/op

decodeFailFast                 avgt   83169.587 ± 13.533    B/op
decodeFailFastWithHistory      avgt   83154.850 ± 24.407    B/op
decodeFailFastWithCursors      avgt  123085.868 ± 86.616    B/op

decodeFree                     avgt 2001547.110 ± 43.804    B/op
decodeSimple                   avgt  727288.167 ±  0.414    B/op
```
]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Example: two versions with cursors

```scala
import io.circe.{ Decoder => DecoderC }, io.circe.literal._

val doc =
  json"""{"id":"abc","names":{"screen":"Foo McBar"},"age":25}"""

case class User(id: String, screenName: String, age: Option[Int])

val decodeUserMonadic: DecoderC[User] =
  DecoderC.instance { c =>
    for {
      i <- c.get[String]("id")
      n <- c.downField("names").get[String]("screen")
      a <- c.get[Option[Int]]("age")       
    } yield User(i, n, a)
  }

val decodeUserApplicative: DecoderC[User] =
  DecoderC.instance { c =>
    (
      c.get[String]("id"),
      c.downField("names").get[String]("screen"),
      c.get[Option[Int]]("age")       
    ).mapN(User(_, _, _))
  }
```
]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Example: cursors and algebra

```scala
val decodeUserCursor: DecoderC[User] =
  DecoderC.instance { c =>
    (
      c.get[String]("id"),
      c.downField("names").get[String]("screen"),
      c.get[Option[Int]]("age")       
    ).mapN(User(_, _, _))
  }

import io.circe.algebra.{ Decoder => DecoderA, ops }

val decodeUserAlgebra: DecoderA[User] =
  DecoderA.instance(
    (
      ops.get[String]("id"),
      ops.downField("names").get[String]("screen"),
      ops.get[Option[Int]]("age")
    ).mapN(User(_, _, _))
  )
```
]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
]
.right-column[

### Other topics / complications

Optimizations:

* Redundant operations (e.g. `ReadLong`, `ReadMap`)
* Mutability in interpreters

Semantics:

* Non-JSON operations (e.g. `ReadTimestamp`)
* Bracketing
* Optional decoding
* Representation of errors more generally

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
  ## Future
]
.right-column[

### Release plans

* Circe 0.9.0 this week
* Circe 0.10.0 + circe-algebra 0.1.0 by January 1
* Circe 1.0.0 by the end of January

]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
  ## Future
]
.right-column[

### Release plans

* Circe 0.9.0 this week
* Circe 0.10.0 + circe-algebra 0.1.0 by January 1
* Circe 1.0.0 by the end of January

Open questions:

* Should cursors stay in core?
* Should the AST stay in core?
* How many non-JSON operations?
]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
  ## Future
]
.right-column[

### Release plans

* Circe 0.9.0 this week
* Circe 0.10.0 + circe-algebra 0.1.0 by January 1
* Circe 1.0.0 by the end of January

Open questions:

* Should cursors stay in core?
* Should the AST stay in core?
* How many non-JSON operations?

Will need help with:

* Interpreters: YAML, BSON by 1.0.0
* Scalafix rewrite rules
]
---
.left-column[
  ## Intro
  ## Cursors
  ## Algebra
  ## Goals
  ## Approach
  ## Future
]
.right-column[

### Thanks!

https://github.com/travisbrown/circe-algebra
]