
feat: Long-term scheduled transactions State API #16144

Open · wants to merge 8 commits into develop

Conversation

@iwsimon (Contributor) commented Oct 24, 2024

Description:

  • Define the schema for a block stream-friendly state organization (sketched below).
  • Add migrations from the existing state.
  • Update readable/writable stores to use the new states.
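A minimal, self-contained sketch of the reorganization and migration described above, using plain JDK maps as stand-ins for the real state types (ReadableKVState, ProtoLong, ProtoBytes, ScheduleList, ScheduleIdList); the class and field names here are hypothetical and not from this PR:

```java
// Hypothetical stand-ins: plain JDK maps instead of the real PBJ/state types.
// This only illustrates the shape of the change, not the actual schema code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScheduleStateMigrationSketch {

    // Simplified, hypothetical schedule entity.
    record Schedule(long scheduleId, long expirationSecond, String equalityKey) {}

    // Old layout: whole schedules grouped per expiration second and per equality key.
    static final Map<Long, List<Schedule>> schedulesByExpirationSecond = new HashMap<>();
    static final Map<String, List<Schedule>> schedulesByEquality = new HashMap<>();

    // New layout: only IDs per expiration second, and a single schedule per equality key.
    static final Map<Long, List<Long>> scheduleIdsByExpirationSecond = new HashMap<>();
    static final Map<String, Schedule> scheduleByEquality = new HashMap<>();

    // One-time migration from the old shape to the new shape.
    static void migrate() {
        schedulesByExpirationSecond.forEach((second, schedules) -> {
            final List<Long> ids = new ArrayList<>();
            schedules.forEach(s -> ids.add(s.scheduleId()));
            scheduleIdsByExpirationSecond.put(second, ids);
        });
        // Assumes at most one live schedule per equality key; collision handling
        // is exactly what the review discussion below is about.
        schedulesByEquality.forEach((key, schedules) ->
                scheduleByEquality.put(key, schedules.get(schedules.size() - 1)));
    }
}
```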

Related issue(s):

Fixes #16059
Fixes #16068
Fixes #16069

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

@iwsimon iwsimon added this to the v0.56 milestone Oct 24, 2024
@iwsimon iwsimon self-assigned this Oct 24, 2024
@iwsimon iwsimon requested review from a team and jsync-swirlds as code owners October 24, 2024 14:56
@iwsimon iwsimon requested a review from kimbor October 24, 2024 14:56

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 91.35802% with 7 lines in your changes missing coverage. Please review.

Project coverage is 58.14%. Comparing base (f5d3e52) to head (5c06cec).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ra/node/app/blocks/impl/KVStateChangeListener.java | 0.00% | 3 Missing ⚠️ |
| ...ice/schedule/impl/schemas/V0560ScheduleSchema.java | 95.34% | 0 Missing and 2 partials ⚠️ |
| ...rvice/schedule/impl/WritableScheduleStoreImpl.java | 93.75% | 0 Missing and 1 partial ⚠️ |
| .../schedule/impl/handlers/ScheduleCreateHandler.java | 66.66% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             develop   #16144   +/-   ##
==========================================
  Coverage      58.14%   58.14%           
- Complexity     19848    19854    +6     
==========================================
  Files           2730     2731    +1     
  Lines         100084   100138   +54     
  Branches       10337    10339    +2     
==========================================
+ Hits           58189    58223   +34     
- Misses         38273    38291   +18     
- Partials        3622     3624    +2     
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...om/hedera/node/app/blocks/impl/BlockImplUtils.java | 79.24% <100.00%> (+0.39%) ⬆️ |
| ...rvice/schedule/impl/ReadableScheduleStoreImpl.java | 100.00% <100.00%> (ø) |
| ...app/service/schedule/impl/ScheduleServiceImpl.java | 100.00% <100.00%> (ø) |
| ...pp/service/schedule/impl/ScheduleStoreUtility.java | 64.28% <100.00%> (-32.78%) ⬇️ |
| ...rvice/schedule/impl/WritableScheduleStoreImpl.java | 98.33% <93.75%> (-1.67%) ⬇️ |
| .../schedule/impl/handlers/ScheduleCreateHandler.java | 66.17% <66.66%> (-0.50%) ⬇️ |
| ...ice/schedule/impl/schemas/V0560ScheduleSchema.java | 95.34% <95.34%> (ø) |
| ...ra/node/app/blocks/impl/KVStateChangeListener.java | 23.85% <0.00%> (-0.68%) ⬇️ |

... and 8 files with indirect coverage changes



codacy-production bot commented Oct 24, 2024

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| +0.00% (target: -1.00%) | 96.30% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (f5d3e52) | 99901 | 61674 | 61.74% |
| Head commit (5c06cec) | 99955 (+54) | 61710 (+36) | 61.74% (+0.00%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#16144) | 81 | 78 | 96.30% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%




github-actions bot commented Oct 24, 2024

Node: HAPI Test (Restart) Results

9 files  1 error  8 suites   7m 34s ⏱️
7 tests 7 ✅ 0 💤 0 ❌
8 runs  8 ✅ 0 💤 0 ❌

For more details on these parsing errors, see this check.

Results for commit 7ea084b.

♻️ This comment has been updated with latest results.

@jsync-swirlds (Member) left a comment


I don't think the change from a list of candidate duplicate schedules to a single candidate duplicate schedule is correct.
The partial "hash" used is not robust, and collisions are quite likely, so we should still have a list (perhaps of ID values instead of whole schedules) keyed to the hash value, and check all of them when a new schedule might be a duplicate.

Edit: The hash function used is better than I thought, so this is incorrect, but see below for an even more effective alternative that avoids needing a hash function entirely.

```diff
-    private final ReadableKVState<ProtoLong, ScheduleList> schedulesByExpirationSecond;
-    private final ReadableKVState<ProtoBytes, ScheduleList> schedulesByStringHash;
+    private final ReadableKVState<ProtoLong, ScheduleIdList> scheduleIdsByExpirationSecond;
+    private final ReadableKVState<ProtoBytes, Schedule> schedulesByStringHash;
```
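For contrast, here is a minimal sketch of the collision-tolerant layout the reviewer describes above: the equality hash keys a list of schedule IDs, and every candidate is re-checked against the fields the hash was derived from. All types, names, and the compared field set are hypothetical stand-ins, not the actual Hedera state API:

```java
// Illustrative only: plain JDK types stand in for the real state API.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class CollisionTolerantEqualitySketch {

    // Hypothetical, simplified entity.
    record Schedule(long scheduleId, long payerId, String memo, long expirationSecond) {}

    final Map<String, List<Long>> scheduleIdsByEqualityHash = new HashMap<>();
    final Map<Long, Schedule> schedulesById = new HashMap<>();

    // Returns the ID of a true duplicate, or null if every candidate was only a hash collision.
    Long findDuplicate(final Schedule candidate, final String equalityHash) {
        for (final Long id : scheduleIdsByEqualityHash.getOrDefault(equalityHash, List.of())) {
            final Schedule existing = schedulesById.get(id);
            if (existing != null && isSameByEqualityFields(existing, candidate)) {
                return id;
            }
        }
        return null; // hash matched, but no candidate was an actual duplicate
    }

    // Compare the same fields the hash is derived from (field set is illustrative).
    static boolean isSameByEqualityFields(final Schedule a, final Schedule b) {
        return a.payerId() == b.payerId()
                && Objects.equals(a.memo(), b.memo())
                && a.expirationSecond() == b.expirationSecond();
    }

    void put(final Schedule schedule, final String equalityHash) {
        schedulesById.put(schedule.scheduleId(), schedule);
        scheduleIdsByEqualityHash
                .computeIfAbsent(equalityHash, k -> new ArrayList<>())
                .add(schedule.scheduleId());
    }
}
```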
Member:

String hash values can and will collide (it's only a partial hash, not comprehensive). Changing this to a single schedule will miss duplications. That's why we stored the whole schedule originally; storing a list of ID values might be acceptable as an alternative to storing the whole schedule.

Contributor (author):

The reason is explained below.

```diff
-    private @Nullable Schedule maybeDuplicate(
-            @NonNull final Schedule schedule, @Nullable final List<Schedule> duplicates) {
-        if (duplicates == null) {
+    private @Nullable Schedule maybeDuplicate(@NonNull final Schedule schedule, @Nullable final Schedule duplicate) {
```
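A sketch of what the single-candidate check above might look like: the one schedule stored under the equality key is re-compared on the same attributes that feed the equality hash. The field set and names are hypothetical, not the real Schedule fields or handler code:

```java
// Hypothetical sketch only: the real maybeDuplicate() compares whatever
// attributes ScheduleStoreUtility.calculateBytesHash() hashes; the record
// and field set below are illustrative stand-ins.
import java.util.Objects;

class MaybeDuplicateSketch {

    record Schedule(long payerId, String adminKey, String memo, long expirationSecond) {}

    // Returns the stored candidate if it truly duplicates the new schedule, else null.
    static Schedule maybeDuplicate(final Schedule schedule, final Schedule candidate) {
        if (candidate == null) {
            return null; // nothing stored under this equality key
        }
        final boolean same = schedule.payerId() == candidate.payerId()
                && Objects.equals(schedule.adminKey(), candidate.adminKey())
                && Objects.equals(schedule.memo(), candidate.memo())
                && schedule.expirationSecond() == candidate.expirationSecond();
        return same ? candidate : null;
    }
}
```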
@jsync-swirlds (Member), Oct 24, 2024:

This should still handle hash collisions; the hash implementation is not robust enough to guarantee no collisions, so it is entirely likely that multiple schedules will match the hash and need to be checked for duplication.

Edit: This is incorrect; the hash function used was better than I understood, but we still have a better option that avoids relying on probabilistic factors at all.

Contributor (author):

The reason is explained below.

```diff
@@ -56,11 +56,11 @@ public interface ReadableScheduleStore {
      * @return a {@code List<Schedule>} of entries that have the same hash as the provided schedule
      */
     @Nullable
-    List<Schedule> getByEquality(@NonNull Schedule scheduleToMatch);
+    Schedule getByEquality(@NonNull Schedule scheduleToMatch);
```
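Caller-side, the revised contract means handlers deal with a single nullable candidate rather than iterating a list. A tiny illustrative sketch; the store interface here is a hypothetical stand-in, not the real ReadableScheduleStore:

```java
// Illustrative only: shows the caller-side shape of the single-result contract.
class GetByEqualitySketch {

    record Schedule(long scheduleId) {}

    interface EqualityStore {
        // At most one candidate keyed by the schedule's equality bytes; null if none.
        Schedule getByEquality(Schedule scheduleToMatch);
    }

    static boolean hasCandidate(final EqualityStore store, final Schedule newSchedule) {
        final Schedule candidate = store.getByEquality(newSchedule);
        return candidate != null; // previously: iterate a List<Schedule> of candidates
    }
}
```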
Member:

This should still be a list (even if it's changed to just be ID values).
The "duplication" hash of the schedule is not enough to know if it's a duplicate, and multiple "candidate" duplicate schedules may have the same "duplication" hash value.

Contributor (author):

The reason is explained below.

@iwsimon (Contributor, author) commented Oct 25, 2024

> I don't think the change from a list of candidate duplicate schedules to a single candidate duplicate schedule is correct. The partial "hash" used is not robust, and collisions are quite likely, so we should still have a list (perhaps of ID values instead of whole schedules) keyed to the hash value, and check all of them when a new schedule might be a duplicate.

Thanks for reviewing this PR @jsync-swirlds. After @tinker-michaelj's PR #16053, the schedule service code has been updated.
scheduleByEqualityMutable is only updated in WritableScheduleStoreImpl.put(), which is called from ScheduleCreateHandler.handle() and ScheduleSignHandler.handle(). In ScheduleSignHandler we handle the same Schedule. In ScheduleCreateHandler we use maybeDuplicate() to double-check the same attributes that ScheduleStoreUtility.calculateBytesHash() hashes; if the schedule is a duplicate it throws a HandleException, otherwise it is put into scheduleByEqualityMutable. You will never have more than one Schedule with the same ProtoBytes key in scheduleByEqualityMutable (a rough sketch of this flow follows).
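A rough sketch of the create-path ordering described above, showing why at most one Schedule can end up under a given equality key. Class, exception, and status names are illustrative stand-ins, not the real handler or store code:

```java
// Illustrative only: mirrors the check-then-put ordering described above,
// not the actual WritableScheduleStoreImpl/ScheduleCreateHandler code.
import java.util.HashMap;
import java.util.Map;

class CreatePathInvariantSketch {

    record Schedule(long scheduleId, String equalityKey) {}

    static class HandleException extends RuntimeException {
        HandleException(final String status) { super(status); }
    }

    final Map<String, Schedule> scheduleByEqualityMutable = new HashMap<>();

    // The equality key is checked before put(), so a second schedule with the
    // same key is rejected instead of stored. (The real handler additionally
    // re-checks the hashed attributes via maybeDuplicate() before rejecting.)
    void handleCreate(final Schedule newSchedule) {
        final Schedule candidate = scheduleByEqualityMutable.get(newSchedule.equalityKey());
        if (candidate != null) {
            throw new HandleException("IDENTICAL_SCHEDULE_ALREADY_CREATED"); // status name illustrative
        }
        scheduleByEqualityMutable.put(newSchedule.equalityKey(), newSchedule);
    }
}
```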

@iwsimon iwsimon closed this Oct 25, 2024
@iwsimon iwsimon reopened this Oct 25, 2024
@jsync-swirlds (Member) commented Oct 25, 2024

> > I don't think the change from a list of candidate duplicate schedules to a single candidate duplicate schedule is correct. The partial "hash" used is not robust, and collisions are quite likely, so we should still have a list (perhaps of ID values instead of whole schedules) keyed to the hash value, and check all of them when a new schedule might be a duplicate.
>
> Thanks for reviewing this PR @jsync-swirlds. After @tinker-michaelj's PR #16053, the schedule service code has been updated. scheduleByEqualityMutable is only updated in WritableScheduleStoreImpl.put(), which is called from ScheduleCreateHandler.handle() and ScheduleSignHandler.handle(). In ScheduleSignHandler we handle the same Schedule. In ScheduleCreateHandler we use maybeDuplicate() to double-check the same attributes that ScheduleStoreUtility.calculateBytesHash() hashes; if the schedule is a duplicate it throws a HandleException, otherwise it is put into scheduleByEqualityMutable. You will never have more than one Schedule with the same ProtoBytes key in scheduleByEqualityMutable.

After multiple discussions, and a discussion in Slack about how degenerate inputs are handled by various hash functions, a better approach presents itself (thanks to a suggestion from Leemon).

We can take the three fields we compare, and just concatenate those values together for the ProtoBytes key, then make ScheduleID the value. That eliminates the secondary comparison (if the key matches, the values are identical), and the scheduleID in the value is exactly what we need to add in the Block Stream/Record Receipt for the "duplicate schedule" response. Overall we should end up saving quite a bit of space on average, and even worst-case we'll be saving more than 50 bytes for each map entry.
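A minimal sketch of that suggested layout: the compared fields are concatenated (length-prefixed) into the bytes key and only the schedule ID is stored as the value, so a key match alone proves duplication. The field set, encoding, and names are hypothetical stand-ins for the real PBJ types (ProtoBytes, ScheduleID) and the actual compared fields:

```java
// Hypothetical stand-ins only: which fields are compared, and how they are
// encoded, is up to the real implementation; this just shows the shape.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class ConcatenatedEqualityKeySketch {

    record Schedule(long scheduleId, long payerId, String memo, byte[] scheduledTxnBytes) {}

    // Key: concatenated field bytes (Base64-wrapped so HashMap equality works here);
    // value: just the schedule ID, which is exactly what the receipt needs.
    final Map<String, Long> scheduleIdByEquality = new HashMap<>();

    static String equalityKey(final Schedule s) {
        final byte[] memo = s.memo().getBytes(StandardCharsets.UTF_8);
        final ByteBuffer buf = ByteBuffer.allocate(
                Long.BYTES + Integer.BYTES + memo.length + s.scheduledTxnBytes().length);
        buf.putLong(s.payerId());         // field 1 (fixed width)
        buf.putInt(memo.length);          // length-prefix the variable-width field
        buf.put(memo);                    // field 2
        buf.put(s.scheduledTxnBytes());   // field 3
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // A key hit is a true duplicate; no secondary comparison is needed.
    Long duplicateIdOf(final Schedule candidate) {
        return scheduleIdByEquality.get(equalityKey(candidate));
    }

    void put(final Schedule schedule) {
        scheduleIdByEquality.put(equalityKey(schedule), schedule.scheduleId());
    }
}
```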
