# JSON numbers in circe 0.3.0

I’m publishing this article as a blog post (rather than as part of the circe documentation) because it’s intended to be a discursive overview of the representation of JSON numbers in circe 0.3.0, not as an up-to-date description of the implementation in whatever the current circe version happens to be when you’re reading this. For information about JSON numbers in circe versions after 0.3.0, please see the project documentation (in particular the changelogs and API docs).

## The syntax of JSON numbers

JSON numbers have a fairly straightforward grammar—you’ve got an optional sign, an integral part, an optional decimal part, and an optional exponent (with 10 as the base). There are some details about where e.g. `+`

and `0`

are allowed, but these are mostly uninteresting. Given a representation (for example a 4-tuple of strings), writing a complete, correct, and reasonably efficient parser for JSON number expressions is unlikely to take an experienced programmer more than half an hour or so. In short: this part is boring and well-defined by the spec.

## Representing JSON numbers

Whether we’re writing a JSON library or an application that processes JSON, we’ll generally want some kind of normalized form for JSON numbers that collapses distinctions between some pairs of valid JSON number strings. For example, it’s unlikely that we’ll want to preserve a distinction between these two expressions:

```
1e100
1E100
```

Or these:

```
100.0
100.00000000
```

Or maybe even these:

```
100
1e2
```

Other cases are more difficult. What about numbers with and without decimal parts that are made up entirely of zeros?

```
100
100.00000000
```

Or signed zeros?

```
0.0
-0.0
```

Or numbers that have the same double-precision floating-point representation?

```
0.00...imagine a few hundred more zeros here...001
0.0
```

In circe only the last two cases are distinguished (assuming the JSON parser supports these distinctions; Jawn does, and Scala.js supports signed zeros but does lose precision). The following rules summarize the distinctions that circe supports:

- Precision is never lost (assuming the JSON parser doesn’t lose it)
- If there’s an exponent, the case of the
`e`

is irrelevant. - More generally, whether the same number is written with an exponent or not is irrelevant.
- Negative and positive zero are different.

The general principle is that if there are reasonable use cases for making a distinction, circe should support it. Signed zeros are useful for some numerical applications, so we preserve the sign. If someone wants to make a case for distinguishing `100.0`

and `100`

, that could potentially happen in a future version. Distinguishing `1e2`

and `1E2`

is probably a nonstarter.

## Implementing the JSON number representation

The last section says that circe *never* loses precision, but clearly there have to be some limits on the size of the numbers we can represent. According to the JSON grammar, JSON numbers can be arbitrarily large—the grammar would happily accept a number with trillions of digits in the exponent. RFC 7159 is a little more grounded:

This specification allows implementations to set limits on the range and precision of numbers accepted.

circe follows Argonaut in aiming to make this limit really, really high—something more or less like “does the expression fit in memory?”. In both circe and Argonaut (6.1) we can do something like the following:

```
import scalaz._, Scalaz._, argonaut.Parse
val \/-(x) = Parse.parse(s"""1e${ "9" * 1000 }""")
val \/-(y) = Parse.parse(s"""10e${ "9" * 999 }8""")
x.nospaces // will be s"""1e${ "9" * 1000 }"""
x === y // will be true
```

In Argonaut (and circe before 0.3.0) this is accomplished by representing large numbers as a pair of a `BigDecimal`

and a `BigInt`

exponent, with the `BigDecimal`

either being zero or having a single decimal digit to the right of the decimal point.

This works great for equality, but unfortunately that’s all Argonaut uses this representation for. If we try to do anything with these large number values except print them or compare them, they start to break in different ways:

```
scala> x.number.map(_.toBigDecimal)
java.lang.NumberFormatException
at java.math.BigDecimal.parseExp(BigDecimal.java:638)
...
scala> x.number.map(_.toLong)
java.lang.NumberFormatException
at java.math.BigDecimal.parseExp(BigDecimal.java:638)
...
```

(On the current Argonaut head `toDouble`

fails similarly, but this is a regression from 6.1, where it returns positive infinity.)

Runtime exceptions are one thing, but it gets worse: user input can actually cause a thread to hang pretty much forever:

`Parse.parse("1e2147483647").map(_.number.map(_.toBigInt))`

This attempts to create a `BigInt`

with `Int.MaxValue`

digits, which takes… a very, very long time.

circe 0.3.0 tries to make this situation a little less horrible by introducing a new big number type, which I’ve named `BiggerDecimal`

. This type is a lot like `java.math.BigDecimal`

except that the scale is a `BigInteger`

instead of an `int`

, and the unscaled value is constrained to have no trailing zeros (for the sake of making equality easy to determine). It also provides a much more limited set of operations than `BigDecimal`

, and (most importantly) the operations it does provide are guaranteed not to have godawful resource requirements.

All of this means that we can write the following in circe:

```
import cats.data.Xor, io.circe.jawn.parse
val Xor.Right(Some(x)) = parse(s"""1e${ "9" * 1000 }""").map(_.asNumber)
val Xor.Right(Some(y)) = parse("1e2147483647").map(_.asNumber)
```

And then:

```
scala> x.toBigDecimal
res0: Option[BigDecimal] = None
scala> x.toLong
res1: Option[Long] = None
scala> x.toDouble
res2: Double = Infinity
scala> y.toBigDecimal
res3: Option[BigDecimal] = Some(1E+2147483647)
scala> y.toBigInt
res4: Option[BigInt] = None
```

The conversions of `x`

to `BigDecimal`

and `y`

to `BigInt`

fail (immediately and safely) because they are determined to be too expensive. You can still round-trip these values back to JSON, compare them for equality, ask whether they’re whole, etc.—you just can’t convert them to these types.

The conversion of `x`

to `Long`

fails for a simpler reason—its value is outside the range of the long integer type. In accordance with the horrible nature of `Double`

, `x.toDouble`

is more lossy than the other operations—it returns the nearest `Double`

value or one of the infinities if the value is out of range.

## Representing JSON numbers: practical considerations

We could simply represent JSON numbers as `BiggerDecimal`

values, but circe uses a slightly more complex representation for practical reasons. If we’re constructing a JSON number value from a `Long`

or `Double`

, for example, we might as well make it possible to avoid converting those numbers to `BiggerDecimal`

values. The following is a simplified version of circe’s JSON number ADT (note that these constructors are not part of the public API):

```
case class JsonDecimal(value: String) extends JsonNumber
case class JsonBiggerDecimal(value: BiggerDecimal) extends JsonNumber
case class JsonBigDecimal(value: BigDecimal) extends JsonNumber
case class JsonLong(value: Long) extends JsonNumber
case class JsonDouble(value: Double) extends JsonNumber
```

The `JsonDecimal`

constructor is provided for cases where our parser hands us a string that has already been validated as a JSON number and we want to parse it into a `BiggerDecimal`

lazily. The final three constructors are provided solely for the sake of efficiency.

## Decoding

One of the goals of circe is to make it possible for users to avoid ever interacting with types like `Json`

or `JsonNumber`

. This means that everything above is (ideally) just a bunch of implementation details. Typically users will work with JSON numbers by asking for them to be decoded into meaningful types that have nothing to do with JSON. circe 0.3.0 introduces a few changes in this respect.

The most important of these changes is a more clear-cut distinction between three groups of numeric types. The first group is the arbitrary-precision types: `BigInt`

and `BigDecimal`

. circe’s decoders for these types will succeed under two conditions:

- an exact representation is possible (e.g. non-whole JSON number will never be decoded into
`BigInt`

values); and - coming up with an exact representation wouldn’t consume too many resources.

The second group is the integral types: `Byte`

, `Short`

, `Int`

, and `Long`

. In Argonaut (and in circe before 0.3.0), the decoders for these types would happily truncate JSON number values:

```
scala> argonaut.Parse.decodeOption[Int]("0.99999999")
res0: Option[Int] = Some(0)
scala> argonaut.Parse.decodeOption[Short]("32768")
res1: Option[Short] = Some(32767)
```

This is no longer the case in circe—these decoders now succeed only if an exact representation is possible.

```
scala> io.circe.jawn.decode[Int]("0.99999999")
res0: cats.data.Xor[io.circe.Error,Int] = Left(io.circe.DecodingFailure: Int)
scala> io.circe.jawn.decode[Short]("32768")
res1: cats.data.Xor[io.circe.Error,Short] = Left(io.circe.DecodingFailure: Short)
```

The third group is the floating-point types: `Float`

and `Double`

. These behave the same as they always have—they will always succeed for any JSON number we can parse, but they do this by truncating and losing precision:

```
scala> io.circe.jawn.decode[Double](s"""1e${ "9" * 1000 }""")
res2: cats.data.Xor[io.circe.Error,Double] = Right(Infinity)
scala> io.circe.jawn.decode[Double]("1e-10000000")
res3: cats.data.Xor[io.circe.Error,Double] = Right(0.0)
```

If you want to recover the old behavior for e.g. `Int`

, it’s of course possible to decode your JSON into a `Double`

and then truncate or round as desired.

## Conclusion

All of the above adds some complexity to circe’s number implementation, but I think the result is a much more consistent and reliable API. This is brand new work (I merged the `BiggerDecimal`

stuff yesterday morning and released circe 0.3.0 last night), so there may still be bugs (although the test coverage is pretty good), and I’m sure there are aspects of the implementation that could be improved. Any feedback—as a GitHub issue, in chat on Gitter, complaints directed at me on Twitter, etc.—would be greatly appreciated.