I’m publishing this article as a blog post (rather than as part of the circe documentation) because it’s intended to be a discursive overview of the representation of JSON numbers in circe 0.3.0, not an up-to-date description of the implementation in whatever the current circe version happens to be when you’re reading this. For information about JSON numbers in circe versions after 0.3.0, please see the project documentation (in particular the changelogs and API docs).
The syntax of JSON numbers
JSON numbers have a fairly straightforward grammar: you’ve got an optional sign, an integral part, an optional decimal part, and an optional exponent (with 10 as the base). There are some details about where characters such as 0 are allowed, but these are mostly uninteresting. Given a representation (for example a 4-tuple of strings), writing a complete, correct, and reasonably efficient parser for JSON number expressions is unlikely to take an experienced programmer more than half an hour or so. In short: this part is boring and well-defined by the spec.
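As a rough illustration of the 4-tuple idea, here is a minimal sketch that uses a regular expression rather than a hand-written parser. The names `JsonNumberPattern` and `splitJsonNumber` are hypothetical and are not part of circe; absent parts are represented as empty strings.

```scala
// Hypothetical helper, not part of circe: splits a JSON number into
// (sign, integral part, decimal part, exponent), using "" for absent parts.
val JsonNumberPattern =
  """(-?)(0|[1-9][0-9]*)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?""".r

def splitJsonNumber(input: String): Option[(String, String, String, String)] =
  input match {
    case JsonNumberPattern(sign, integral, decimal, exponent) =>
      // Optional groups that did not take part in the match come back as null.
      Some((sign, integral, Option(decimal).getOrElse(""), Option(exponent).getOrElse("")))
    case _ =>
      // Anything else (leading zeros, a bare decimal point, a leading plus sign, ...)
      // is not a valid JSON number.
      None
  }
```

A regular expression keeps the sketch short; a production parser would more likely walk the characters by hand to avoid the matching overhead.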
Representing JSON numbers
Whether we’re writing a JSON library or an application that processes JSON, we’ll generally want some kind of normalized form for JSON numbers that collapses distinctions between some pairs of valid JSON number strings. For example, it’s unlikely that we’ll want to preserve a distinction between these two expressions:
Or maybe even these:
Other cases are more difficult. What about numbers with and without decimal parts that are made up entirely of zeros?
Or signed zeros?
Or numbers that have the same double-precision floating-point representation?
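To make these cases concrete, the following sketch compares a few pairs using only the standard library. The literals are illustrative assumptions chosen for this sketch, and nothing here is circe code.

```scala
import java.math.BigDecimal

// Case of the exponent marker, and exponent versus plain notation:
// numerically these are the same value.
val sameIgnoringCase = new BigDecimal("1e2").compareTo(new BigDecimal("1E2")) == 0
val sameIgnoringExponent = new BigDecimal("1e2").compareTo(new BigDecimal("100")) == 0

// BigDecimal.equals also compares the scale, so it tells 1.0 and 1 apart
// until trailing zeros are stripped.
val equalsSeesScale = new BigDecimal("1.0").equals(new BigDecimal("1"))
val equalAfterStripping =
  new BigDecimal("1.0").stripTrailingZeros.equals(new BigDecimal("1").stripTrailingZeros)

// Signed zeros compare equal with ==, but are still observably different.
val negativeZero = "-0".toDouble
val zerosCompareEqual = 0.0 == negativeZero
val zerosBehaveDifferently = 1.0 / negativeZero == Double.NegativeInfinity

// Two different decimal strings that round to the same Double.
val sameDouble = "0.1".toDouble == "0.1000000000000000055511151231257827".toDouble
```

The point of the sketch is only that each of these distinctions is a choice: the standard library itself answers differently depending on which comparison is used.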
In circe only the last two cases are distinguished (assuming the JSON parser supports these distinctions; Jawn does, and Scala.js supports signed zeros but does lose precision). The following rules summarize the distinctions that circe supports:
- Precision is never lost (assuming the JSON parser doesn’t lose it)
- If there’s an exponent, the case of the e is irrelevant.
- More generally, whether the same number is written with an exponent or not is irrelevant.
- Negative and positive zero are different.
The general principle is that if there are reasonable use cases for making a distinction, circe should support it. Signed zeros are useful for some numerical applications, so we preserve the sign. If someone wants to make a case for distinguishing 100 from another spelling of the same value, that could potentially happen in a future version. Distinguishing 1E2 from the same expression with a lowercase exponent marker is probably a nonstarter.
Implementing the JSON number representation
The last section says that circe never loses precision, but clearly there have to be some limits on the size of the numbers we can represent. According to the JSON grammar, JSON numbers can be arbitrarily large—the grammar would happily accept a number with trillions of digits in the exponent. RFC 7159 is a little more grounded:
This specification allows implementations to set limits on the range and precision of numbers accepted.
circe follows Argonaut in aiming to make this limit really, really high—something more or less like “does the expression fit in memory?”. In both circe and Argonaut (6.1) we can do something like the following:
In Argonaut (and circe before 0.3.0) this is accomplished by representing large numbers as a pair of a BigDecimal and a BigInt exponent, with the BigDecimal either being zero or having a single decimal digit to the right of the decimal point.
This works great for equality, but unfortunately that’s all Argonaut uses this representation for. If we try to do anything with these large number values except print them or compare them, they start to break in different ways:
(On the current Argonaut head toDouble fails similarly, but this is a regression from 6.1, where it returns positive infinity.)
Runtime exceptions are one thing, but it gets worse: user input can actually cause a thread to hang pretty much forever:
This attempts to create a value with Int.MaxValue digits, which takes… a very, very long time.
circe 0.3.0 tries to make this situation a little less horrible by introducing a new big number type, which I’ve named BiggerDecimal. This type is a lot like java.math.BigDecimal except that the scale is a BigInteger instead of an int, and the unscaled value is constrained to have no trailing zeros (for the sake of making equality easy to determine). It also provides a much more limited set of operations than BigDecimal, and (most importantly) the operations it does provide are guaranteed not to have godawful resource requirements.
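The idea can be sketched in a few lines. The following is a hypothetical illustration, not circe’s actual BiggerDecimal: the name `SketchDecimal`, the two bounds in the companion object, and the decision to ignore signed zero are all assumptions made for this sketch. The value represented is the unscaled value times ten to the power of the negated scale.

```scala
import java.math.{BigDecimal, BigInteger}

// Hypothetical sketch, not circe's BiggerDecimal. Signed zero is ignored for brevity.
final class SketchDecimal private (val unscaled: BigInteger, val scale: BigInteger) {
  // With no trailing zeros in the unscaled value, "whole" is just a sign check.
  def isWhole: Boolean = scale.signum <= 0

  // Cheap whenever the scale fits in an Int, because nothing is expanded.
  def toBigDecimal: Option[BigDecimal] =
    if (scale.bitLength < 32) Some(new BigDecimal(unscaled, scale.intValue)) else None

  // Refuses (immediately) when expanding the value would need too many zeros.
  def toBigInteger: Option[BigInteger] =
    if (isWhole && scale.negate.compareTo(SketchDecimal.MaxExpandedZeros) <= 0)
      Some(unscaled.multiply(BigInteger.TEN.pow(scale.negate.intValue)))
    else None

  // A nonzero value with more than 18 trailing zeros cannot fit in a Long.
  def toLong: Option[Long] =
    if (isWhole && scale.negate.compareTo(SketchDecimal.MaxLongZeros) <= 0)
      Some(unscaled.multiply(BigInteger.TEN.pow(scale.negate.intValue)))
        .filter(_.bitLength < 64)
        .map(_.longValue)
    else None

  // Normalization makes structural equality coincide with numeric equality.
  override def equals(that: Any): Boolean = that match {
    case other: SketchDecimal => unscaled == other.unscaled && scale == other.scale
    case _ => false
  }
  override def hashCode: Int = 31 * unscaled.hashCode + scale.hashCode
}

object SketchDecimal {
  // Arbitrary bounds chosen for this sketch.
  val MaxExpandedZeros: BigInteger = BigInteger.valueOf(262144L)
  val MaxLongZeros: BigInteger = BigInteger.valueOf(18L)

  // Strips trailing zeros from the unscaled value, adjusting the scale to match.
  def apply(unscaled: BigInteger, scale: BigInteger): SketchDecimal =
    if (unscaled.signum == 0) new SketchDecimal(BigInteger.ZERO, BigInteger.ZERO)
    else {
      var u = unscaled
      var s = scale
      var qr = u.divideAndRemainder(BigInteger.TEN)
      while (qr(1).signum == 0) {
        u = qr(0)
        s = s.subtract(BigInteger.ONE)
        qr = u.divideAndRemainder(BigInteger.TEN)
      }
      new SketchDecimal(u, s)
    }
}
```

With this sketch, `SketchDecimal(BigInteger.valueOf(100), BigInteger.ZERO)` and `SketchDecimal(BigInteger.ONE, BigInteger.valueOf(-2))` compare equal, and a value with a scale of minus one trillion is still cheap to build and compare even though its conversions return `None`.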
All of this means that we can write the following in circe:
The conversions to arbitrary-precision types such as BigInt fail (immediately and safely) because they are determined to be too expensive. You can still round-trip these values back to JSON, compare them for equality, ask whether they’re whole, and so on; you just can’t convert them to these types.
The conversion to Long fails for a simpler reason: the value is outside the range of the long integer type. In accordance with the horrible nature of Double, x.toDouble is more lossy than the other operations: it returns the nearest Double value, or one of the infinities if the value is out of range.
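The standard library draws the same line between exact and lossy conversions, which may help as a point of comparison. This sketch uses java.math.BigDecimal only; it is an analogue of the behavior described above, not circe’s implementation.

```scala
import java.math.BigDecimal
import scala.util.Try

// Out of Double range: doubleValue saturates to an infinity instead of failing.
val hugeAsDouble: Double = new BigDecimal("1e400").doubleValue
val hugeNegativeAsDouble: Double = new BigDecimal("-1e400").doubleValue

// One past Long.MaxValue: the exact conversion fails rather than wrapping around.
val onePastLongMax: Try[Long] =
  Try(new BigDecimal("9223372036854775808").longValueExact)
```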
Representing JSON numbers: practical considerations
We could simply represent JSON numbers as BiggerDecimal values, but circe uses a slightly more complex representation for practical reasons. If we’re constructing a JSON number value from a Double, for example, we might as well make it possible to avoid converting those numbers to BiggerDecimal values. The following is a simplified version of circe’s JSON number ADT (note that these constructors are not part of the public API):
The JsonDecimal constructor is provided for cases where our parser hands us a string that has already been validated as a JSON number and we want to parse it into a BiggerDecimal lazily. The final three constructors are provided solely for the sake of efficiency.
One of the goals of circe is to make it possible for users to avoid ever interacting with types like JsonNumber. This means that everything above is (ideally) just a bunch of implementation details. Typically users will work with JSON numbers by asking for them to be decoded into meaningful types that have nothing to do with JSON. circe 0.3.0 introduces a few changes in this respect.
The most important of these changes is a more clear-cut distinction between three groups of numeric types. The first group is the arbitrary-precision types, such as BigDecimal. circe’s decoders for these types will succeed under two conditions:
- an exact representation is possible (e.g. a non-whole JSON number will never be decoded into an integral type such as BigInt), and
- coming up with an exact representation wouldn’t consume too many resources.
The second group is the integral types, such as Long. In Argonaut (and in circe before 0.3.0), the decoders for these types would happily truncate JSON number values.
This is no longer the case in circe: these decoders now succeed only if an exact representation is possible.
The third group is the floating-point types, such as Double. These behave the same as they always have: they will always succeed for any JSON number we can parse, but they do this by truncating and losing precision.
If you want to recover the old behavior for e.g. Int, it’s of course possible to decode your JSON into a Double and then truncate or round as desired.
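The difference between the exact and the lossy routes can be sketched with the standard library alone. The function names here are hypothetical and none of this is circe’s decoder API; the inputs are assumed to be valid JSON number strings of modest size.

```scala
import scala.util.Try

// Exact: succeeds only when the number is whole and fits in an Int.
def decodeIntExact(input: String): Option[Int] =
  Try(new java.math.BigDecimal(input).intValueExact).toOption

// Lossy: go through Double first, then truncate or round explicitly.
def decodeIntTruncating(input: String): Int = input.toDouble.toInt
def decodeIntRounding(input: String): Int = math.round(input.toDouble).toInt
```

With these definitions, `decodeIntExact("1.0")` succeeds while `decodeIntExact("1.9")` does not, and the caller who really wants a lossy result has to choose between truncation and rounding explicitly.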
All of the above adds some complexity to circe’s number implementation, but I think the result is a much more consistent and reliable API. This is brand new work (I merged the BiggerDecimal stuff yesterday morning and released circe 0.3.0 last night), so there may still be bugs (although the test coverage is pretty good), and I’m sure there are aspects of the implementation that could be improved. Any feedback (as a GitHub issue, in chat on Gitter, complaints directed at me on Twitter, etc.) would be greatly appreciated.