# JSON numbers in circe 0.3.0

I'm publishing this article as a blog post (rather than as part of the circe documentation) because it's intended as a discursive overview of the representation of JSON numbers in circe 0.3.0, not as an up-to-date description of the implementation in whatever the current circe version happens to be when you're reading this. For information about JSON numbers in circe versions after 0.3.0, please see the project documentation (in particular the changelogs and API docs).

## The syntax of JSON numbers

JSON numbers have a fairly straightforward grammar: you've got an optional sign, an
integral part, an optional decimal part, and an optional exponent (with 10 as the base). There are
some details about where e.g. `+` and `0` are allowed, but these are mostly uninteresting. Given a
representation (for example a 4-tuple of strings), writing a complete, correct, and reasonably
efficient parser for JSON number expressions is unlikely to take an experienced programmer more than
half an hour or so. In short: this part is boring and well-defined by the spec.
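As a rough illustration of the 4-tuple idea, here is a minimal sketch using only the Scala standard library. The `JsonNumberSyntax` object and its `parse` method are hypothetical names invented for this example and are not part of circe or Jawn; the regular expression is my reading of the RFC 7159 number grammar.

```scala
// Hypothetical helper (not part of circe): splits a JSON number into
// (sign, integral part, decimal part, exponent), all as strings.
object JsonNumberSyntax {
  // -? (0 | [1-9][0-9]*) (\.[0-9]+)? ([eE][+-]?[0-9]+)?
  private val JsonNumberPattern =
    """(-?)(0|[1-9][0-9]*)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?""".r

  def parse(input: String): Option[(String, String, String, String)] =
    input match {
      // A Regex pattern match only succeeds if the whole input matches.
      case JsonNumberPattern(sign, integral, decimal, exponent) =>
        // Optional groups that did not participate come back as null.
        Some((sign, integral, Option(decimal).getOrElse(""), Option(exponent).getOrElse("")))
      case _ => None
    }
}
```

With this sketch, `JsonNumberSyntax.parse("-1.5e+10")` yields the four parts `"-"`, `"1"`, `"5"`, and `"+10"`, while inputs such as `"+1"` or `"01"` are rejected because a leading `+` and leading zeros are not allowed in the integral part.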

## Representing JSON numbers

Whether we're writing a JSON library or an application that processes JSON, we'll generally want some kind of normalized form for JSON numbers that collapses distinctions between some pairs of valid JSON number strings. For example, it's unlikely that we'll want to preserve a distinction between these two expressions:

```
1e100
1E100
```

Or these:

```
100.0
100.00000000
```

Or maybe even these:

```
100
1e2
```

Other cases are more difficult. What about numbers with and without decimal parts that are made up entirely of zeros?

```
100
100.00000000
```

Or signed zeros?

```
0.0
-0.0
```

Or numbers that have the same double-precision floating-point representation?

```
0.00...imagine a few hundred more zeros here...001
0.0
```

In circe only the last two cases are distinguished (assuming the JSON parser supports these distinctions; Jawn does, and Scala.js supports signed zeros but does lose precision). The following rules summarize the distinctions that circe supports:

- Precision is never lost (assuming the JSON parser doesn't lose it)
- If there's an exponent, the case of the `e` is irrelevant.
- More generally, whether the same number is written with an exponent or not is irrelevant.
- Negative and positive zero are different.

The general principle is that if there are reasonable use cases for making a distinction, circe
should support it. Signed zeros are useful for some numerical applications, so we preserve the sign.
If someone wants to make a case for distinguishing `100.0` and `100`, that could potentially happen
in a future version. Distinguishing `1e2` and `1E2` is probably a nonstarter.
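One way to picture these rules is as a normalization step that maps every JSON number string to a canonical value, so that equality on the canonical values gives exactly the distinctions listed above. The following is only a sketch under my own assumptions: `Normalized` is a hypothetical type invented for this example, not circe's representation, and it stores the value as `unscaled * 10^(-scale)` with trailing zeros stripped from `unscaled`.

```scala
import java.math.BigInteger

// Hypothetical normalized form (not circe's actual type).
// value = unscaled * 10^(-scale); zero keeps its sign in a separate flag.
final case class Normalized(negativeZero: Boolean, unscaled: BigInteger, scale: BigInteger)

object Normalized {
  private val JsonNumberPattern =
    """(-?)(0|[1-9][0-9]*)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?""".r

  def parse(input: String): Option[Normalized] = input match {
    case JsonNumberPattern(sign, integral, decimal, exponent) =>
      val fraction = Option(decimal).getOrElse("")
      val exp = Option(exponent)
        .map(e => new BigInteger(e.stripPrefix("+")))
        .getOrElse(BigInteger.ZERO)
      val digits = integral + fraction
      // Drop trailing zeros so that 100, 100.0, and 1e2 share one form.
      val trimmed = digits.reverse.dropWhile(_ == '0').reverse
      if (trimmed.isEmpty) {
        // All digits are zero: only the sign is kept.
        Some(Normalized(sign == "-", BigInteger.ZERO, BigInteger.ZERO))
      } else {
        val removed = digits.length - trimmed.length
        val scale = BigInteger.valueOf((fraction.length - removed).toLong).subtract(exp)
        Some(Normalized(false, new BigInteger(sign + trimmed), scale))
      }
    case _ => None
  }
}
```

Under this sketch `1e100` and `1E100`, `100` and `1e2`, and `100` and `100.00000000` all normalize to the same value, while `0.0` and `-0.0` stay distinct, as does a tiny nonzero number and `0.0`.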

## Implementing the JSON number representation

The last section says that circe *never* loses precision, but clearly there have to be some limits
on the size of the numbers we can represent. According to the JSON grammar, JSON numbers can be
arbitrarily large: the grammar would happily accept a number with trillions of digits in the
exponent. RFC 7159 is a little more grounded:

> This specification allows implementations to set limits on the range and precision of numbers accepted.

circe follows Argonaut in aiming to make this limit really, really high: something more or less like "does the expression fit in memory?". In both circe and Argonaut (6.1) we can do something like the following:

```
import scalaz._, Scalaz._, argonaut.Parse
val \/-(x) = Parse.parse(s"""1e${ "9" * 1000 }""")
val \/-(y) = Parse.parse(s"""10e${ "9" * 999 }8""")
x.nospaces // will be s"""1e${ "9" * 1000 }"""
x === y // will be true
```

In Argonaut (and circe before 0.3.0) this is accomplished by representing large numbers as a pair of
a `BigDecimal` and a `BigInt` exponent, with the `BigDecimal` either being zero or having a single
decimal digit to the right of the decimal point.

This works great for equality, but unfortunately that's all Argonaut uses this representation for. If we try to do anything with these large number values except print them or compare them, they start to break in different ways:

```
scala> x.number.map(_.toBigDecimal)
java.lang.NumberFormatException
at java.math.BigDecimal.parseExp(BigDecimal.java:638)
...
scala> x.number.map(_.toLong)
java.lang.NumberFormatException
at java.math.BigDecimal.parseExp(BigDecimal.java:638)
...
```

(On the current Argonaut head `toDouble` fails similarly, but this is a regression from 6.1, where
it returns positive infinity.)

Runtime exceptions are one thing, but it gets worse: user input can actually cause a thread to hang pretty much forever:

```
Parse.parse("1e2147483647").map(_.number.map(_.toBigInt))
```

This attempts to create a `BigInt` with `Int.MaxValue` digits, which takes… a very, very long time.

circe 0.3.0 tries to make this situation a little less horrible by introducing a new big number
type, which I've named `BiggerDecimal`. This type is a lot like `java.math.BigDecimal` except that
the scale is a `BigInteger` instead of an `int`, and the unscaled value is constrained to have no
trailing zeros (for the sake of making equality easy to determine). It also provides a much more
limited set of operations than `BigDecimal`, and (most importantly) the operations it does provide
are guaranteed not to have godawful resource requirements.
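To make the "no godawful resource requirements" idea concrete, here is a minimal sketch of guarded conversions on an unscaled value paired with a `BigInteger` scale. `BigScaled` and its `MaxDigits` cutoff are hypothetical names and numbers chosen for this example; this is not circe's `BiggerDecimal` implementation, and the actual limits circe applies may differ.

```scala
import java.math.{ BigDecimal, BigInteger }

// Hypothetical stand-in for the idea behind BiggerDecimal (not circe's code).
// value = unscaled * 10^(-scale); unscaled is assumed to have no trailing zeros.
final case class BigScaled(unscaled: BigInteger, scale: BigInteger) {
  // Cheap whenever the scale fits in an int, since java.math.BigDecimal
  // stores the scale directly instead of expanding digits.
  def toBigDecimal: Option[BigDecimal] =
    if (scale.compareTo(BigScaled.MinInt) >= 0 && scale.compareTo(BigScaled.MaxInt) <= 0)
      Some(new BigDecimal(unscaled, scale.intValue()))
    else None

  // Refuses up front instead of materializing an enormous integer.
  def toBigInteger: Option[BigInteger] =
    if (unscaled.signum() == 0) Some(BigInteger.ZERO)
    else if (scale.signum() > 0) None // non-whole value
    else if (scale.negate().compareTo(BigScaled.MaxDigits) > 0) None // too expensive
    else Some(unscaled.multiply(BigInteger.TEN.pow(scale.negate().intValue())))
}

object BigScaled {
  private val MaxInt = BigInteger.valueOf(Int.MaxValue.toLong)
  private val MinInt = BigInteger.valueOf(Int.MinValue.toLong)
  // Arbitrary cutoff assumed for this sketch.
  private val MaxDigits = BigInteger.valueOf(1L << 18)
}
```

The key design point is that every check compares scales, which is cheap, before any digits are expanded, so a hostile input like `1e2147483647` is rejected without doing the expensive work.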

All of this means that we can write the following in circe:

```
import cats.data.Xor, io.circe.jawn.parse
val Xor.Right(Some(x)) = parse(s"""1e${ "9" * 1000 }""").map(_.asNumber)
val Xor.Right(Some(y)) = parse("1e2147483647").map(_.asNumber)
```

And then:

```
scala> x.toBigDecimal
res0: Option[BigDecimal] = None
scala> x.toLong
res1: Option[Long] = None
scala> x.toDouble
res2: Double = Infinity
scala> y.toBigDecimal
res3: Option[BigDecimal] = Some(1E+2147483647)
scala> y.toBigInt
res4: Option[BigInt] = None
```

The conversions of `x` to `BigDecimal` and `y` to `BigInt` fail (immediately and safely) because
they are determined to be too expensive. You can still round-trip these values back to JSON, compare
them for equality, ask whether they're whole, etc.; you just can't convert them to these types.

The conversion of `x` to `Long` fails for a simpler reason: its value is outside the range of the
long integer type. In accordance with the horrible nature of `Double`, `x.toDouble` is more lossy
than the other operations: it returns the nearest `Double` value or one of the infinities if the
value is out of range.

## Representing JSON numbers: practical considerations

We could simply represent JSON numbers as `BiggerDecimal` values, but circe uses a slightly more
complex representation for practical reasons. If we're constructing a JSON number value from a
`Long` or `Double`, for example, we might as well make it possible to avoid converting those numbers
to `BiggerDecimal` values. The following is a simplified version of circe's JSON number ADT (note
that these constructors are not part of the public API):

```
case class JsonDecimal(value: String) extends JsonNumber
case class JsonBiggerDecimal(value: BiggerDecimal) extends JsonNumber
case class JsonBigDecimal(value: BigDecimal) extends JsonNumber
case class JsonLong(value: Long) extends JsonNumber
case class JsonDouble(value: Double) extends JsonNumber
```

The `JsonDecimal` constructor is provided for cases where our parser hands us a string that has
already been validated as a JSON number and we want to parse it into a `BiggerDecimal` lazily. The
final three constructors are provided solely for the sake of efficiency.

## Decoding

One of the goals of circe is to make it possible for users to avoid ever interacting with types like
`Json` or `JsonNumber`. This means that everything above is (ideally) just a bunch of implementation
details. Typically users will work with JSON numbers by asking for them to be decoded into
meaningful types that have nothing to do with JSON. circe 0.3.0 introduces a few changes in this
respect.

The most important of these changes is a more clear-cut distinction between three groups of numeric
types. The first group is the arbitrary-precision types: `BigInt` and `BigDecimal`. circe's decoders
for these types will succeed under two conditions:

- an exact representation is possible (e.g. non-whole JSON numbers will never be decoded into
  `BigInt` values); and
- coming up with an exact representation wouldn't consume too many resources.

The second group is the integral types: `Byte`, `Short`, `Int`, and `Long`. In Argonaut (and in
circe before 0.3.0), the decoders for these types would happily truncate JSON number values:

```
scala> argonaut.Parse.decodeOption[Int]("0.99999999")
res0: Option[Int] = Some(0)
scala> argonaut.Parse.decodeOption[Short]("32768")
res1: Option[Short] = Some(32767)
```

This is no longer the case in circe: these decoders now succeed only if an exact representation is possible.

```
scala> io.circe.jawn.decode[Int]("0.99999999")
res0: cats.data.Xor[io.circe.Error,Int] = Left(io.circe.DecodingFailure: Int)
scala> io.circe.jawn.decode[Short]("32768")
res1: cats.data.Xor[io.circe.Error,Short] = Left(io.circe.DecodingFailure: Short)
```
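The "exact representation or failure" rule can be sketched with the JDK's own exact conversions. The `IntegralConversions` object below is a hypothetical helper written for this example, not circe's decoder code; it relies on `java.math.BigDecimal#intValueExact` and `shortValueExact`, which throw `ArithmeticException` when the value has a nonzero fractional part or is out of range.

```scala
import scala.util.Try

// Hypothetical helpers (not circe's decoders): succeed only when the
// number is whole and fits in the target type.
object IntegralConversions {
  def toIntExact(input: String): Option[Int] =
    Try(new java.math.BigDecimal(input).intValueExact()).toOption

  def toShortExact(input: String): Option[Short] =
    Try(new java.math.BigDecimal(input).shortValueExact()).toOption
}
```

Here `toIntExact("0.99999999")` and `toShortExact("32768")` both return `None`, mirroring the two failures above, while `toIntExact("1e2")` succeeds because the value is a whole number even though it is written with an exponent.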

The third group is the floating-point types: `Float` and `Double`. These behave the same as they
always have: they will always succeed for any JSON number we can parse, but they do this by
truncating and losing precision:

```
scala> io.circe.jawn.decode[Double](s"""1e${ "9" * 1000 }""")
res2: cats.data.Xor[io.circe.Error,Double] = Right(Infinity)
scala> io.circe.jawn.decode[Double]("1e-10000000")
res3: cats.data.Xor[io.circe.Error,Double] = Right(0.0)
```

If you want to recover the old behavior for e.g. `Int`, it's of course possible to decode your JSON
into a `Double` and then truncate or round as desired.
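As a small illustration of that last step, assuming you already have the decoded `Double` in hand, plain Scala conversions are enough. The `TruncationExample` object and its value names are made up for this sketch.

```scala
// Hypothetical example values; `decoded` stands in for the result of
// decoding a JSON number into a Double.
object TruncationExample {
  val decoded: Double = 0.99999999

  // Drops the fractional part (rounds toward zero).
  val truncated: Int = decoded.toInt

  // Rounds to the nearest whole number instead.
  val rounded: Int = math.round(decoded).toInt
}
```

Note that the two choices give different answers for this input (0 versus 1), which is part of why making the lossy step explicit in user code is arguably preferable to having a decoder pick one silently.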

## Conclusion

All of the above adds some complexity to circe's number implementation, but I think the result is a
much more consistent and reliable API. This is brand new work (I merged the `BiggerDecimal` stuff
yesterday morning and released circe 0.3.0 last night), so there may still be bugs (although the
test coverage is pretty good), and I'm sure there are aspects of the implementation that could be
improved. Any feedback (as a GitHub issue, in chat on Gitter, complaints directed at me on Twitter,
etc.) would be greatly appreciated.