Speed up parsejson 3.25x (with a gcc-10.2 PGO build) on number heavy input #16055

…input. Also add a couple of convenience iterators and update `changelog.md`.

The basic idea here is to process the ASCII just once, in a combined validation/classification/parse, rather than validating/classifying a number in `parsejson.parseNumber`, then doing the same work again in `parseFloat`/`parseInt`, and finally doing it a third time in the C library `strtod`/etc. The last is especially slow on some libcs/common CPUs due to `long double`/80-bit extended precision arithmetic on each digit. In any event, the new fully functional `parseNumber` is not actually much more code than the old classifying-only `parseNumber`.

Because we aim to do this work just once and the output of the work is an int64|float, this PR adds those exported fields to `JsonParser`. (It also documents two existing but undocumented fields.)

One simple optimization done here is pow10. It uses a small lookup table for the power-of-10 multiplier. The full 2048*8B = 16 KiB table is bigger than is useful. In truth, most numbers in any given run will likely be of similar orders of magnitude, meaning the cache cost would not be heavy, but it is probably best not to rely on that likelihood. So, fall back to a fast integer-exponentiation algorithm when out of range.

The new `parseNumber` itself is more or less a straight-line parse of scientific notation where the '.' can be anywhere in the number. To do the right power-of-10 scaling later, we need to bookkeep a little more than the old `parseNumber`, as noted in side comments in the code.

Note that calling code that always wants a `float`, even if the textual representation looks like an integer, can do something like `if p.tok == tkInt: float(p.i) else: p.f` or similar, depending on context.
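The small-table-plus-fallback idea behind pow10 can be sketched roughly as follows. This is only an illustration, not the code in this PR: the table bound `smallMax`, the helper `makeTable`, and the exact shape of `pow10` are assumptions made for the example.

```nim
const smallMax = 22   # hypothetical bound; 10^0 .. 10^22 are exact in a 64-bit float

func makeTable(): array[smallMax + 1, float] =
  ## Fill the table by repeated multiplication.
  var p = 1.0
  for i in 0 .. smallMax:
    result[i] = p
    p *= 10.0

const powersOf10 = makeTable()   # evaluated at compile time

func pow10(e: int): float =
  ## 10.0^e: table lookup for small |e|, square-and-multiply otherwise.
  let n = abs(e)
  var r = 1.0
  if n <= smallMax:
    r = powersOf10[n]
  else:
    var base = 10.0
    var k = n
    while true:
      if (k and 1) == 1:
        r *= base          # this bit of the exponent is set
      k = k shr 1
      if k == 0:
        break
      base *= base         # 10^1, 10^2, 10^4, 10^8, ...
  if e < 0: 1.0 / r else: r
```

The fallback needs only about log2(|e|) multiplications, so even out-of-table exponents stay cheap while the table itself occupies well under a cache line's worth of pages.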
Note that we form the final float as integer * powerOf10, that powers of 10 have some intrinsic binary representational error, and that some alternative approaches (on just some CPUs) use 80-bit floats. It is therefore possible for the number to mismatch those other approaches in the least significant mantissa bit (or for the ASCII->binary->ASCII round-trip to not be the identity). On those CPUs only, better matching results might be obtained from an `emit` using `long double` if desired (also with a `long double` table for powers of 10 and the powers-of-10 calculation). This strikes me as unlikely to be truly needed long-term, though.
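To make the integer * powerOf10 construction concrete, here is a minimal single-pass sketch. It is not the `parseNumber` from this PR: `NumKind`, `scale10`, and `parseNumberOnce` are hypothetical names, the power of 10 is computed by a naive loop instead of a table, and mantissas longer than 18 digits are simply rejected rather than handled.

```nim
type
  NumKind = enum
    nkError, nkInt, nkFloat   # hypothetical stand-ins for the parser's token kinds

func scale10(n: int): float =
  ## Naive 10.0^n for n >= 0; a stand-in for a table-backed pow10.
  result = 1.0
  for _ in 1 .. n:
    result *= 10.0

proc parseNumberOnce(s: string): tuple[kind: NumKind, i: int64, f: float] =
  ## One pass over `s`: validate, classify, and accumulate together.
  ## `result` starts zeroed, so a bare `return` reports nkError.
  var pos = 0
  let neg = s.len > 0 and s[0] == '-'
  if neg: inc pos
  var mant: uint64 = 0   # all mantissa digits, with the '.' ignored
  var nDigits = 0        # mantissa digits seen in total
  var fracDigits = 0     # mantissa digits seen after the '.'
  var sawDot = false
  while pos < s.len and s[pos] in {'0'..'9', '.'}:
    if s[pos] == '.':
      if sawDot: return              # a second '.' is invalid
      sawDot = true
    else:
      mant = mant * 10 + uint64(ord(s[pos]) - ord('0'))
      inc nDigits
      if sawDot: inc fracDigits
    inc pos
  if nDigits == 0 or nDigits > 18:   # sketch only: 18 digits always fit in int64
    return
  var exp10 = 0
  var sawExp = false
  if pos < s.len and s[pos] in {'e', 'E'}:
    sawExp = true
    inc pos
    var expNeg = false
    if pos < s.len and s[pos] in {'+', '-'}:
      expNeg = s[pos] == '-'
      inc pos
    var expDigits = 0
    while pos < s.len and s[pos] in {'0'..'9'}:
      if exp10 < 10_000:             # saturate so huge exponents cannot overflow `int`
        exp10 = exp10 * 10 + (ord(s[pos]) - ord('0'))
      inc expDigits
      inc pos
    if expDigits == 0: return
    if expNeg: exp10 = -exp10
  if pos != s.len: return            # trailing garbage
  if sawDot or sawExp:
    let e = exp10 - fracDigits       # net power-of-10 scaling
    var v = float(mant)
    if e >= 0:
      v *= scale10(e)
    else:
      v /= scale10(-e)
    result.kind = nkFloat
    result.f = if neg: -v else: v
  else:
    let v = int64(mant)
    result.kind = nkInt
    result.i = if neg: -v else: v
```

Because the '.' only bumps `fracDigits`, inputs such as `.5` and `5.` are accepted here, which is looser than strict JSON. The single rounding step in the final multiply or divide is where a last-bit difference from an extended-precision approach could arise.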

slice assignment. Within 1.03x of non-copy mode in a PGO build (1.10x if the caller must save the last valid string). One might think that this argues for ditching `strFloats` & `strIntegers` entirely and just always copying. In some sense, it does. However, the non-copy mode is a useful common case here, as shown in `jsonTokens(string)` example code. So, it is a bit of a judgement call whether supporting that last 10% of perf in a useful common case matters.

…yMem`.

(Someone should make `copyMem` just work everywhere, IMO.)

Also, extract mess into a template.

…ields.

…ions.

float has only 15.95 decimal digits).

this is the only remaining problem. Before merging we should make overflow detection more precise than just a digit count for both `tkInt` and `tkFloat`.
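For the `tkInt` side, one way to detect overflow exactly instead of by digit count is to check each digit against the remaining headroom before accumulating it. The following is a sketch under assumptions, not a proposal for the final code: `fitsInt64` is a hypothetical helper, and it assumes `digits` has already been validated to contain only '0'..'9'. The `tkFloat` case is not covered.

```nim
func fitsInt64(digits: string; neg: bool; value: var int64): bool =
  ## Accumulate `digits` exactly; return false if the result leaves int64 range.
  const limit = uint64(high(int64))              # 9223372036854775807
  let maxMag = if neg: limit + 1 else: limit     # low(int64) has one more unit of magnitude
  var mag: uint64 = 0
  for c in digits:
    let d = uint64(ord(c) - ord('0'))
    # mag*10 + d <= maxMag  <=>  mag <= (maxMag - d) div 10
    if mag > (maxMag - d) div 10:
      return false
    mag = mag * 10 + d
  if neg:
    # int64(mag) would be out of range for exactly 2^63, so handle it separately.
    value = if mag == limit + 1: low(int64) else: -int64(mag)
  else:
    value = int64(mag)
  true
```

A pure digit-count test cannot separate `9223372036854775807` from `9223372036854775808`, since both have 19 digits; the per-digit check above costs one comparison and one division by a constant per digit.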

nim-lang#16055 (comment)

Commits on Nov 19, 2020

Commits on Nov 20, 2020

Commits on Nov 23, 2020