JSON parsing fails for integer values greater than BiggestInt.max #15413

ErikSchierboom · 2020-09-26T16:20:49Z

The JSON parsing functions (like parseJson and parseFile) fail when trying to parse an integer whose value exceeds the maximum that BiggestInt can hold, even if that value could be represented by the BiggestUInt type.

Example

import json

let jsonWithBigInteger = "9223372036854775808"

echo parseJson(jsonWithBigInteger)

See https://play.nim-lang.org/#ix=2yON

Current Output

/usercode/in.nim(5)      in
/playground/nim/lib/pure/json.nim(931) parseJson
/playground/nim/lib/pure/json.nim(847) parseJson
/playground/nim/lib/pure/json.nim(783) parseJson
/playground/nim/lib/pure/strutils.nim(1094) parseBiggestInt
/playground/nim/lib/pure/parseutils.nim(443) parseBiggestInt
/playground/nim/lib/pure/parseutils.nim(423) rawParseInt
/playground/nim/lib/pure/parseutils.nim(397) integerOutOfRangeError
Error: unhandled exception: Parsed integer outside of valid range [ValueError]

Expected Output

9223372036854775808

Possible Solution

In json.nim integers are represented using the BiggestInt type (see https://github.com/nim-lang/Nim/blob/devel/lib/pure/json.nim#L184 and https://github.com/nim-lang/Nim/blob/devel/lib/pure/json.nim#L795), which is signed and thus cannot contain any values that are greater than the maximum value for BiggestInt. A solution would be to detect if BiggestInt can hold the integer value and if not, use BiggestUInt.

Additional Information

This issue is blocking my project Exercism in which I have to parse the following JSON: https://github.com/exercism/problem-specifications/blob/master/exercises/grains/canonical-data.json#L66

$ nim -v
Nim Compiler Version 1.2.6 [MacOSX: amd64]
Compiled at 2020-09-21
Copyright (c) 2006-2020 by Andreas Rumpf

active boot switches: -d:release


metagn · 2020-09-26T17:08:34Z

A solution would be to detect if BiggestInt can hold the integer value and if not, use BiggestUInt.

This would need another JsonNodeKind and branch of JsonNode specifically for uints, which would have no other use outside this specific case (integers between [2^63, 2^64)). A simple alternative is parsing to a float, which is technically what JS does. A JBigNum branch that stores a raw string of the integer is also too specialized, since it is only for integers that are too big and not, for example, floats that have more than 18 decimal digits.

disruptek · 2020-09-26T17:54:10Z

I’ve had this problem for ages. As far as I can tell, JSON doesn’t define a range for integers and unfortunately, we cannot complete a parse successfully when our native type cannot hold the JSON integer value.

The right solution fixes our stdlib, I think, but someone who cares about our hooks system should weigh in with a sympathetic design.

What do our other 3rd-party JSON libraries do?

Araq · 2020-09-27T03:59:21Z

We should parse to float for these cases. Or maybe pretend it's a string literal. UInt64 solves nothing.

juancarlospaco · 2020-09-27T09:20:48Z

Should be doable by fixing #14696 (comment)

jackhftang · 2020-10-04T14:32:53Z

It will be very tricky for callers to handle the case where a JInt is normally expected, but a value above some threshold becomes a JFloat and silently loses precision. IMHO, throwing an error makes sense.
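The precision concern can be made concrete with a minimal sketch. It uses only plain Nim arithmetic and does not touch std/json; the variable names are illustrative assumptions.

```nim
import std/math

# float64 carries a 53-bit significand, so integers above 2^53 start to collide.
let a = 9007199254740992'i64   # 2^53
let b = 9007199254740993'i64   # 2^53 + 1
doAssert float64(a) == float64(b)          # two distinct integers, one float

# int64.high is 2^63 - 1; converting it rounds up to 2^63.
let big = high(int64)
doAssert float64(big) == pow(2.0, 63.0)
echo "a float fallback would change the value of large integers"
```

A caller that receives such a float has no way to tell that the original digits were different.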

qbradley · 2020-10-04T15:59:19Z

If the library were generic on the integer type, then the user could choose an appropriate type for their use case, including a bigint from nimble. Is it possible to make it generic with a default of BiggestInt without breaking compatibility?

timotheecour · 2020-10-04T18:43:45Z

D's std.json (see https://dlang.org/phobos/std_json.html) handles large uints correctly; Nim can handle them correctly too.

import std.stdio;
import std.json;
void main(string[]args){
  string s = "9223372036854775808";
  JSONValue j = parseJSON(s);
  assert(j.toString == "9223372036854775808");
  assert(j.type == JSONType.uinteger);
  assert(j.get!ulong == 9223372036854775808);
}

(note that https://github.com/dlang-community/std_data_json or https://vibed.org/api/vibe.data.json/ is more popular)

juancarlospaco · 2020-10-04T19:03:58Z

@timotheecour What's JSONType.uinteger, basically a Big Int?
I think that silently losing information by converting to floats is not a good workaround; we need Big Int in the stdlib.

timotheecour · 2020-10-04T19:26:14Z

What's JSONType.uinteger, basically a Big Int?

no, it's an enum member encoding the parsed type; see code here: https://github.com/dlang/phobos/blob/master/std/json.d

here is how these map to internal D types:

struct JSONValue
{
    import std.exception : enforce;

    union Store
    {
        string                          str;
        long                            integer;
        ulong                           uinteger;
        double                          floating;
        JSONValue[string]               object;
        JSONValue[]                     array;
    }
    private Store store;

and here's the parsing logic:

                    if (isNegative)
                    {
                        value.store.integer = parse!long(data);
                        value.type_tag = JSONType.integer;
                    }
                    else
                    {
                        // only set the correct union member to not confuse CTFE
                        ulong u = parse!ulong(data);
                        if (u & (1UL << 63))
                        {
                            value.store.uinteger = u;
                            value.type_tag = JSONType.uinteger;
                        }
                        else
                        {
                            value.store.integer = u;
                            value.type_tag = JSONType.integer;
                        }
                    }

which is hopefully self-explanatory: it tries to fit the value into an integer (mapping to D's long, or Nim's int64) unless it's positive and doesn't fit into an integer, in which case it maps to uinteger (D's ulong, Nim's uint64)

I think that silently losing information by converting to floats is not a good workaround; we need Big Int in the stdlib.

float is a bad workaround here; and Big Int is not needed to solve this problem. We can adapt the above D solution which (hopefully) results in no breaking change; only parsed integers that don't fit in int would result in being encoded as uint (instead of raising ValueError as they do currently)
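As a rough illustration of how the D logic above might be adapted, here is a minimal Nim sketch. `NumKind`, `Num`, and `parseInteger` are hypothetical names made up for this example; they are not part of std/json and this is not the code from any PR.

```nim
import std/strutils

type
  NumKind = enum        # hypothetical, not std/json's JsonNodeKind
    nkInt, nkUInt
  Num = object
    case kind: NumKind
    of nkInt: i: int64
    of nkUInt: u: uint64

proc parseInteger(s: string): Num =
  ## Mirrors the D branch structure: negatives go to int64; non-negatives are
  ## parsed as uint64 first and narrowed to int64 when they fit.
  if s.startsWith('-'):
    Num(kind: nkInt, i: parseBiggestInt(s))
  else:
    let u = parseBiggestUInt(s)
    if u > uint64(high(int64)):
      Num(kind: nkUInt, u: u)
    else:
      Num(kind: nkInt, i: int64(u))

doAssert parseInteger("9223372036854775807").kind == nkInt
doAssert parseInteger("9223372036854775808").kind == nkUInt
```

Values beyond the uint64 range still raise ValueError from `parseBiggestUInt`, which matches the "raise on overflow" behaviour described in the proposal.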

disruptek · 2020-10-04T19:30:20Z

How does that help? Sounds like a terrible product to use.

We need BigInt and I think it’s pretty clear why; the JSON spec demands it. Anything else is just going to require fixing later.

krux02 · 2020-10-09T12:19:12Z

JSON numbers are not BigInt; they are arbitrary-precision decimal numbers. The problem is the same for string and BigInt: both require an additional tiny allocation and a pointer indirection, making JSON parsing even slower than it already is. So my recommendation is to not write to the decimal field unless absolutely necessary.

We should parse to float for these cases. Or maybe pretend it's a string literal. UInt64 solves nothing

WTF, float? Convert the number to a lossy type and then back to uint. What could ever go wrong with that one? String would work, I agree, but an additional uint64 field would also fix the problem.

timotheecour · 2020-10-09T20:18:57Z

I have to agree with @krux02 here; JSON numbers are not BigInt, so using BigInt wouldn't solve the general problem of numbers and, most importantly, would incur overhead in the common case. Furthermore, there is no ETA on std/bigints, see #14696 (essentially, either it's a slow Nim implementation/wrapper, or it's a fast Nim wrapper around GMP, which has LGPL licensing concerns, see #14696 (comment))

my proposal remains:

use uint64 exactly as done in D (for uint64 numbers too large to fit in int64), see #15413 (comment)
for now, users who need numbers outside the uint64/int64 range (ditto for large floats) must encode them as strings; otherwise an overflow exception is raised
eventually, once std/bigints materializes, the overflow exception would no longer be thrown for numbers that don't fit in uint64/int64
ditto with std/bigfloats (which could also be enabled by a GMP wrapper btw, since GMP implements both)

benefits:

can be implemented quickly, without waiting on std/bigints, std/bigfloats
no bigint overhead forced upon every number including the 99.99..% common case (in practice) of numbers that fit in int64/uint64
no breaking change nor performance breaking change once std/bigints becomes available

disruptek · 2020-10-10T16:58:17Z

I don't know why @krux02 brought up the number type; this issue is about the integer type.

I'm against changing json twice -- once to accommodate one extra bit, and again when we have a proper solution that meets the spec. If you can provide the API that won't necessitate an additional change later, then I'd support this change.

I'm curious why you think that we'll have bigint overhead for all integers. It seems to me that we'll know how many digits the integer has, so most smaller integers should be quite easy to accommodate natively without any bigint machinery.
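The digit-count idea can be sketched as a conservative fast path. `fitsInt64ByLength` is a hypothetical helper written for this example, not an existing stdlib proc.

```nim
import std/strutils

proc fitsInt64ByLength(digits: string): bool =
  ## int64.high (9223372036854775807) has 19 digits, so any run of at most
  ## 18 decimal digits is guaranteed to fit without attempting a parse.
  digits.len > 0 and digits.len <= 18 and digits.allCharsInSet({'0'..'9'})

doAssert fitsInt64ByLength("123456789012345678")        # 18 digits: native path
doAssert not fitsInt64ByLength("9223372036854775808")   # 19 digits: undecided here
```

A `false` result only means the length check cannot decide; 19-digit inputs still need a real range check before any wider representation is chosen.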

timotheecour · 2020-10-10T19:55:48Z

uint64 in json are way more common than BigInt, for the simple reason that uint64 is a native type in most environments, so it makes sense to support it efficiently without incurring BigInt overhead, which can be large.

I don't see anything wrong with adding JsonNodeKind.JUInt for the range int64.high+1..uint64.high and then, once std/bigints is added, adding JsonNodeKind.JBigInt for integers > uint64.high or < int64.low.

Note: other libraries I found that support (at least) uint64 in json do as I suggest (more research welcome; by no means complete):

dlang does have a std.bigint module yet they opted for the JUInt approach I described, and for good reasons.
ditto with rust json: https://github.com/serde-rs/json/blob/master/src/number.rs

I'm against changing json twice

I don't follow this. No code will break and the mapping from a Number in any given range to a JsonNodeKind would not change as JUInt and JBigInt are added. Plus, to match the spec you'd anyway need to add a JBigFloat kind, whether you have BigInt or not.

PR

=> #15545


disruptek · 2020-10-10T22:11:43Z

uint64 in json are way more common than BigInt, for the simple reason that uint64 is a native type in most environments, so it makes sense to support it efficiently without incurring BigInt overhead, which can be large.

I'm with you so far. 👍

I don't see anything wrong with adding JsonNodeKind.JUInt for the range int64.high+1..uint64.high and then, once std/bigints is added, adding JsonNodeKind.JBigInt for integers > uint64.high or < int64.low.

I know. But that doesn't mean there's nothing wrong with it, right? It just means you don't see it.

Once you start changing the semantics, you ruin the point of JSON. If you're using it in any kind of way beyond that which it is specified for, then basically you are doing it wrong. The whole idea is that we agree on what we know and don't know about the data.

I'm against changing json twice

I don't follow this. No code will break and the mapping from a Number in any given range to a JsonNodeKind would not change as JUInt and JBigInt are added. Plus, to match the spec you'd anyway need to add a JBigFloat kind, whether you have BigInt or not.

Not my problem. I didn't invent JSON nor did I write our implementation of it. I'm just trying to explain that you're going down the wrong path if you decide to attempt to ascribe more precision to types that are intentionally vague. This isn't adding abstraction, it's removing it.

Now, how are you not going to break my code when you add a new JsonNodeKind? I have exhaustive case statements for this enum.

krux02 · 2020-10-13T16:22:52Z

Now, how are you not going to break my code when you add a new JsonNodeKind? I have exhaustive case statements for this enum.

If the new enum value only occurs when you encounter values that were impossible to use before, then I don't think it would break your code. As timothee suggested, the uint type should only be used for integer values that cannot be represented with int64.

I don't know why @krux02 brought up the number type; this issue is about the integer type.

Well, to be precise, JSON only has one type for numbers, and these are arbitrary-precision decimal numbers (I call it JNumber for now), because that is exactly what you can represent in ASCII. Nim's native types are fixed-precision integer and floating-point types with up to 64 bits of precision. On top of that, Nim's floating-point numbers (IEEE 754) also allow NaN and ±∞. This means that neither is JNumber a subset of any of the built-in Nim types, nor is float32/float64 a subset of JNumber, and the implementation of JSON now has to deal with this mess.

One thing we do know is that almost all numbers that end up in JSON are serialized from some native number type, not handwritten nor produced by some arbitrary-precision library. So it totally makes sense to special-case all these built-in types and support them as well and as fast as possible. So even if the arbitrary-precision number type were already supported, it would still make sense to add this UInt special case for performance reasons.

disruptek · 2020-10-13T20:41:49Z

First, maybe you can explain what I'm missing in the following exhaustive case statement:

case kind
of JInt: discard
of JFloat: discard
of JString: discard
of JBool: discard
of JObject: discard
of JArray: discard
of JNull: discard

This will result in a compile-time error if you add a new value to JsonNodeKind.

I agree that the implementation is poor. Again, if you want my support, show me how the change does not break existing code. Alternatively, show how the change won't itself need to be changed in some breaking way in the future.

Honestly, I would sooner support a solution that is 100x slower, supports only one JNumber type, and does so for every conceivable JSON encoding. If you're going to break me, break me for the last time.
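The breakage described here can be illustrated with a small self-contained sketch. `Kind` is a hypothetical stand-in for JsonNodeKind, with `kUInt` playing the role of a newly added member.

```nim
type
  Kind = enum            # hypothetical stand-in for JsonNodeKind
    kInt, kFloat, kString, kUInt   # kUInt represents a member added later

proc describe(k: Kind): string =
  # Without the `else` branch, this case statement stops compiling as soon as
  # kUInt is added, because Nim requires case statements over enums to be
  # exhaustive.
  case k
  of kInt: "int"
  of kFloat: "float"
  of kString: "string"
  else: "unknown kind"

doAssert describe(kUInt) == "unknown kind"
```

Code written with an `else` branch keeps compiling when the enum grows, at the cost of no longer getting a compile-time reminder to handle the new member.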

krux02 · 2020-10-13T20:51:57Z

Yes, sorry, you are right. For a moment I forgot that in Nim case statements fail to compile when they are not fully covered.

disruptek · 2020-10-13T20:54:27Z

What about JUndefined? It lets us change the spec just once. You can subsequently parse JUndefined with whatever assumption you wish to bring to bear upon the source.

Araq · 2020-10-14T14:41:47Z

JavaScript doesn't have 64-bit unsigned numbers at all; this discussion is silly. If we look at the spec (https://www.json.org/json-en.html), it seems clear to me that any number must be accepted, including numbers much larger than high(uint64). I propose we add a new JSON kind, JRawNumber, and parse it into a string value. Then clients can use a custom parseNumber proc and their favorite bignum library (or restrict it to uint64).

disruptek · 2020-10-14T14:51:42Z

Sure, but this is JSON, not JavaScript.

If we break with JUndefined then we may as well also consolidate into a single JNumber kind also.

timotheecour · 2020-10-14T21:46:43Z

I like JRawNumber if done in addition to JUInt; uint64 is a native type on most platforms and shouldn't incur a performance cost

JUInt shall still be introduced (see fix #15413: support JSON in int64.high+1..uint64.high as JUint, and beyond as JNumber #15545) and ensure uint64 doesn't incur overhead, since it's arguably much more common than bigint/bigfloat
JRawNumber is also added as a fallback to represent all valid json numbers not already covered by the other number kinds (JInt, JUint, JFloat)
JRawNumber is represented as a string, and the JSON parser only checks that the string represents a valid JSON number; no decoding happens at parse time
proc getRawNumberStr(j: JsonNode): string is added to the json get APIs (no decoding happens here)
users can parse the output of getRawNumberStr outside of std/json using a separate library for bigint/bigfloat

this gives best of both worlds:

performance is not sacrificed for the common case (uint64 can be common for some applications, e.g. uuids)
json spec is followed
json isn't bloated by an added dependency on some bigint/bigfloat library
only breaks the case statement once (JUInt + JRawNumber can be added in 1 PR or at least within 1 nim point release)
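The "validate but do not decode" step for a raw number could look roughly like the sketch below. It follows the number grammar from json.org; `skipDigits` and `isJsonNumber` are hypothetical helpers, not procs from std/json.

```nim
proc skipDigits(s: string, i: var int): int =
  ## Advances `i` past a run of ASCII digits and returns how many were consumed.
  let start = i
  while i < s.len and s[i] in {'0'..'9'}: inc i
  i - start

proc isJsonNumber(s: string): bool =
  ## Checks the JSON number grammar without converting to any numeric type.
  var i = 0
  if i < s.len and s[i] == '-': inc i
  # Integer part: a single 0, or a run of digits that does not start with 0.
  if i < s.len and s[i] == '0': inc i
  elif skipDigits(s, i) == 0: return false
  # Optional fraction: '.' followed by at least one digit.
  if i < s.len and s[i] == '.':
    inc i
    if skipDigits(s, i) == 0: return false
  # Optional exponent: e/E, optional sign, at least one digit.
  if i < s.len and s[i] in {'e', 'E'}:
    inc i
    if i < s.len and s[i] in {'+', '-'}: inc i
    if skipDigits(s, i) == 0: return false
  i == s.len

doAssert isJsonNumber("123456789012345676712345678")
doAssert not isJsonNumber("012")
```

Because nothing is converted, the check costs one pass over the text regardless of how many digits the number has.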

disruptek · 2020-10-14T23:13:46Z

Not a fan, but I'm thinking I'll be writing my own JSON parser, so don't let me stand in the way of "progress".

krux02 · 2020-10-14T23:51:49Z

@disruptek well, if you do that, you should probably start here: https://github.com/Araq/packedjson

dom96 · 2020-10-18T11:16:15Z

Am I understanding correctly that your plan is to break code by introducing JRawNumber? Can we come up with a transition period for this at least?

Araq · 2020-10-19T12:02:07Z

In the meantime I found a better solution: A parse flag so that "unparsable" numbers are mapped to JString.

jackhftang · 2020-10-19T14:21:50Z

but then you cannot distinguish

{ "n": 12345678901234567890 }

or

{ "n": "12345678901234567890" }

one way is to map to a new type JUnknown instead of JString, but I don't think it is a good solution.


timotheecour · 2020-10-19T23:51:08Z

I've updated #15545 as described in #15413 (comment)

nim> parseJson(""" { "a1": 123456789012345676712345678} """)["a1"].kind
JNumber == type JsonNodeKind

nim> parseJson(""" { "a1": 123456789012345676712} """)["a1"].kind
JUInt == type JsonNodeKind

nim> parseJson(""" { "a1": 123456789012345676} """)["a1"].kind
JInt == type JsonNodeKind

EDIT: and see #15646 for my proposed approach to mitigate impact of enum changes.


krux02 · 2020-10-21T15:27:29Z

but then you cannot distinguish

Yes, true, but it's not really worth worrying about. "JString" is bad, but floating-point numbers already need to fall back to a string anyway if they want to serialize infinity somehow. So I think "numbers might be in a string literal" is acceptable.
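For readers on a Nim release that includes the fix that closed this issue, the JString-based behaviour can be observed roughly as follows. The `rawIntegers` parameter name is taken from the std/json documentation of later Nim releases and does not appear in this thread, so treat it as an assumption if your version differs.

```nim
import std/json

# Assumes a Nim release that contains the fix for this issue.
let node = parseJson("""{"n": 12345678901234567890}""")["n"]
doAssert node.kind == JString                  # out-of-range integer kept as text
doAssert node.getStr == "12345678901234567890"

# rawIntegers = true keeps every integer literal as an unparsed string.
let raw = parseJson("""{"n": 42}""", rawIntegers = true)["n"]
doAssert raw.kind == JString
doAssert raw.getStr == "42"
```

As noted earlier in the thread, `kind` alone does not tell such a node apart from a quoted string, so callers that care need to decide for themselves how to treat JString values that look numeric.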


Araq added Standard Library Low Priority Feature labels Sep 27, 2020

timotheecour added a commit to timotheecour/Nim that referenced this issue Oct 10, 2020

fix nim-lang#15413: Json now supports numbers in int64.high+1 .. uint…

c75470c

…64.high

timotheecour mentioned this issue Oct 10, 2020

fix #15413: support JSON in int64.high+1..uint64.high as JUint, and beyond as JNumber #15545

Closed

This was referenced Oct 13, 2020

sync: Does not produce a tests.toml file for grains exercism/configlet#39

Closed

Reopening the Problem Specifications repo exercism/problem-specifications#1674

Closed

timotheecour added a commit to timotheecour/Nim that referenced this issue Oct 19, 2020

fix nim-lang#15413: Json now supports numbers in int64.high+1 .. uint…

081c607

…64.high

timotheecour added a commit to timotheecour/Nim that referenced this issue Oct 20, 2020

fix nim-lang#15413: Json now supports numbers in int64.high+1 .. uint…

9bd38a4

…64.high

timotheecour added a commit to timotheecour/Nim that referenced this issue Oct 20, 2020

fix nim-lang#15413: Json now supports numbers in int64.high+1 .. uint…

360a6c1

…64.high

timotheecour added a commit to timotheecour/Nim that referenced this issue Oct 21, 2020

fix nim-lang#15413: Json now supports numbers in int64.high+1 .. uint…

4b25f3c

…64.high

timotheecour mentioned this issue Oct 25, 2020

new {.allowMissingCases.} pragma to allow growing enums #15646

Closed

Araq added a commit that referenced this issue Oct 29, 2020

fixes #15413

c1f58d3

Araq closed this as completed in 87a60c1 Oct 29, 2020

timotheecour added a commit to timotheecour/Nim that referenced this issue Oct 29, 2020

fix nim-lang#15413 JsonNode now supports numbers outside of int64 ran…

a2ec7ee

…ge via special encoded JString

narimiran pushed a commit that referenced this issue Nov 5, 2020

fixes #15413 (#15768)

645200a

* fixes #15413 * better hide it properly * see if this makes our list of important packages happy (cherry picked from commit 87a60c1)

c-blake mentioned this issue Nov 20, 2020

Speed up parsejson 3.25x (with a gcc-10.2 PGO build) on number heavy input #16055

Closed

PMunch pushed a commit to PMunch/Nim that referenced this issue Jan 6, 2021

fixes nim-lang#15413 (nim-lang#15768)

39f8bcd

* fixes nim-lang#15413 * better hide it properly * see if this makes our list of important packages happy

mildred pushed a commit to mildred/Nim that referenced this issue Jan 11, 2021

fixes nim-lang#15413 (nim-lang#15768)

e8f1f2d

* fixes nim-lang#15413 * better hide it properly * see if this makes our list of important packages happy

irdassis pushed a commit to irdassis/Nim that referenced this issue Mar 16, 2021

fixes nim-lang#15413 (nim-lang#15768)

5ca5207

* fixes nim-lang#15413 * better hide it properly * see if this makes our list of important packages happy

ardek66 pushed a commit to ardek66/Nim that referenced this issue Mar 26, 2021

fixes nim-lang#15413 (nim-lang#15768)

afeea05

* fixes nim-lang#15413 * better hide it properly * see if this makes our list of important packages happy

This was referenced Mar 23, 2022

feat: Added new PasswordStrengthScore method status-im/status-go#2599

Merged

feat: Added new function to call to new status-go GetPasswordStrengthScore function status-im/nim-status-go#11

Merged
