
Update benchmark sampling method and statistics #282

Merged 1 commit into amazon-ion:master on Aug 22, 2023

Conversation

@popematt (Contributor) commented on Aug 1, 2023:

Issue #, if available:

While not necessarily a fix, this does make significant progress on #265

Description of changes:

  • Updates/fixes the sampling methodology so that it properly runs samples of 1-2 seconds each.
  • Adds a SampleDist class that helps compute statistics for a sample distribution.
  • Drops the "rate" metrics and adds operations-per-second metrics.
  • Drops the percentile stats and adds error and standard deviation.
  • Refactors the report generator to make it less spaghetti-like.
  • Updates the table output format so that it can be copy-pasted directly into GitHub as a Markdown table.
  • Updates the compare command so that its fields argument uses the same field names and syntax as the other commands.
  • Drops support for Python 3.7 (which is EOL) because the statistics.NormalDist class requires Python 3.8 or later.

The perf testing workflow is expected to fail as I have not updated that. It will come in the next PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

# print(_x, end=",")
# print("")

print(tabulate(report, tablefmt='pipe', headers='keys', floatfmt='.2f'))
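The tabulate call above uses tablefmt='pipe', which emits a GitHub-flavored Markdown table. A stdlib-only sketch of roughly what that format produces (illustrative; tabulate's real output also pads and aligns columns):

```python
def pipe_table(rows, headers):
    # Render rows as a GitHub-flavored Markdown (pipe) table,
    # approximating tabulate(..., tablefmt='pipe').
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("|" + "|".join("---" for _ in headers) + "|")
    for row in rows:
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)

print(pipe_table([("json", 1619.62), ("ujson", 698.16)],
                 ["name", "time_mean(ns)"]))
```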
@popematt (Contributor, author) commented:

Using the new table format, you can copy/paste directly from the terminal into, e.g., a GitHub comment, and this is what you will see:

| name | time_mean(ns) | time_error(ns) | ops/s_mean | ops/s_error |
|---|---|---|---|---|
| (cbor,dumps,cat.cbor) | 621.56 | 6.53 | 1609325.62 | 16351.13 |
| (cbor2,dumps,cat.cbor) | 2559.77 | 20.53 | 390726.48 | 3086.54 |
| (json,dumps,cat.json) | 1619.62 | 6.14 | 617453.26 | 2341.06 |
| (ujson,dumps,cat.json) | 698.16 | 3.01 | 1432404.24 | 6223.26 |
| (ion_text,dumps,cat.ion) | 9420.41 | 84.96 | 106175.15 | 938.70 |
| (ion_binary,dumps,cat.ion) | 10463.80 | 59.24 | 95575.76 | 541.19 |
| Ion Binary, pure Python impl | 263795.71 | 1908.30 | 3791.34 | 27.22 |
| Ion Text, pure Python impl | 107178.88 | 1369.13 | 9334.16 | 115.81 |
| (self_describing_protobuf,dumps,cat.sd_protobuf_data) | 3974.29 | 55.22 | 251742.58 | 3371.29 |
| (protobuf,dumps,cat.protobuf_data) | 285.56 | 2.89 | 3502871.19 | 34420.39 |

Note that this suffers from false precision (just like JMH reports do). The number formatting needs to be fixed to show a specific number of significant figures, possibly dependent on the error value for the results.
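One common convention for avoiding false precision (a sketch of the idea, not what this PR implements) is to round each value so its precision matches the significant figures of its error:

```python
import math

def round_to_error(value, error, sig=2):
    # Round `value` to the decimal place implied by keeping `sig`
    # significant figures of `error`, a common convention when
    # reporting a measurement alongside its uncertainty.
    if error == 0:
        return value
    digits = sig - 1 - math.floor(math.log10(abs(error)))
    return round(value, digits)

print(round_to_error(621.56, 6.53))  # -> 621.6
```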

@tgregg (Contributor) left a comment:

Nice. The autorange feature was a good find.
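The "autorange" referred to here is presumably timeit.Timer.autorange from the standard library (Python 3.6+), which repeatedly increases the loop count until a single sample takes at least 0.2 seconds, so each measurement is long enough to be meaningful:

```python
import timeit

timer = timeit.Timer("sum(range(100))")
# autorange() grows the loop count (1, 2, 5, 10, ...) until the total
# time reaches at least 0.2 seconds, then returns
# (number_of_loops, elapsed_seconds).
number, elapsed = timer.autorange()
print(number >= 1 and elapsed >= 0.2)  # -> True
```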

@popematt popematt merged commit c73cf2a into amazon-ion:master Aug 22, 2023
8 of 12 checks passed