Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up parsejson 3.25x (with a gcc-10.2 PGO build) on number heavy input #16055

Closed
wants to merge 16 commits into from

Commits on Nov 19, 2020

  1. Speed up parsejson 3.25x (with a gcc-10.2 PGO build) on number heavy …

    …input.
    
    Also add a couple convenience iterators and update `changelo.md`.
    
    The basic idea here is to process the ASCII just once in a combination
    validation/classification/parse rather than validating/classifying a
    number in `parsejson.parseNumber`, then doing the same work again in
    `parseFloat`/`parseInt` and finally doing it a 3rd time in the C library
    `strtod`/etc.  The last is especially slow on some libc's/common CPUs
    due to `long double`/80-bit extended precision arithmetic on each digit.
    In any event, the new fully functional `parseNumber` is not actually much
    more code than the old classifying-only `parseNumber`.
    
    Because we aim to do this work just once and the output of the work is
    an int64|float, this PR adds those exported fields to `JsonParser`.  (It
    also documents two existing but undocumented fields.)
    
    One simple optimization done here is pow10.  It uses a small lookup
    table for the power of 10 multiplier.  The full 2048*8B = 16 KiB one is
    bigger than is useful.  In truth most numbers in any given run will
    likely be of similar orders of magnitude, meaning the cache cost would
    not be heavy, but probably best to not rely only likelihood.  So fall
    back to a fast integer-exponentiation algorithm when out of range.
    
    The new `parseNumber` itself is more or less a straight-line parse of
    scientific notation where the '.' can be anywhere in the number.  To do
    the right power-of-10 scaling later, we need to bookkeep a little more
    than the old `parseNumber` as noted in side comments in the code.
    
    Note that calling code that always wants a `float` even if the textual
    representation looks like an integer can do something like `if p.tok ==
    tkInt: float(p.i) else: p.f` or similar, depending on context.
    
    Note that since we form the final float by integer * powerOf10 and since
    powers of 10 have some intrinsic binary representational error and since
    some alternative approaches (on just some CPUs) use 80-bit floats, it is
    possible for the number to mismatch those other approaches in the least
    significant mantissa bit (or for the ASCII->binary->ASCII round-trip to
    not be the identity).  On those CPUs only, better matching results can
    maybe be gotten from an `emit` using `long double` if desired (also with
    a long double table for powers of 10 and the powers of 10 calculation).
    This strikes me as unlikely to be truly needed long-term, though.
    c-blake committed Nov 19, 2020
    Configuration menu
    Copy the full SHA
    0e24c5e View commit details
    Browse the repository at this point in the history

Commits on Nov 20, 2020

  1. Make the default mode quite a bit faster with setLen+copyMem instead of

    slice assignment.  Within 1.03x of non-copy mode in a PGO build (1.10x
    if the caller must save the last valid string).  One might think that
    this argues for ditching `strFloats` & `strIntegers` entirely and just
    always copying.  In some sense, it does.
    
    However, the non-copy mode is a useful common case here, as shown in
    `jsonTokens(string)` example code.  So, it is a bit of a judgement call
    whether supporting that last 10% of perf in a useful common case matters.
    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    6cb3375 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    46d5f21 View commit details
    Browse the repository at this point in the history
  3. Gack. I guess one needs defined(nimscript) not nimvm.

    (Someone should make `copyMem` just work everywhere, IMO.)
    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    ad4707c View commit details
    Browse the repository at this point in the history
  4. Ok. This jsOrVmBlock works for streams so it should work here.

    Also, extract mess into a template.
    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    b802eba View commit details
    Browse the repository at this point in the history
  5. Fix empty container problem.

    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    0e66b44 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    05a7350 View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    136fe92 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    7fd9287 View commit details
    Browse the repository at this point in the history
  9. Configuration menu
    Copy the full SHA
    5b98112 View commit details
    Browse the repository at this point in the history
  10. Fix integer overflow errors. 18 digits is still plenty (IEEE 64-bit

    float has only 15.95 decimal digits).
    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    61f39cd View commit details
    Browse the repository at this point in the history
  11. Oops..fix comment, too.

    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    822d91c View commit details
    Browse the repository at this point in the history
  12. This may pass all tests, but it is preliminary to see if, in fact,

    this is the only remaining problem.  Before merging we should make
    overflow detection more precise than just a digit count for both
    `tkInt` and `tkFloat`.
    c-blake committed Nov 20, 2020
    Configuration menu
    Copy the full SHA
    88d9845 View commit details
    Browse the repository at this point in the history

Commits on Nov 23, 2020

  1. Configuration menu
    Copy the full SHA
    7a076c2 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    24192b5 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    e8c234f View commit details
    Browse the repository at this point in the history