[Access] Add registerDB pruning module #6397

Open
wants to merge 55 commits into
base: master

UlyanaAndrukhiv
Contributor

Closes #6068

In this PR:

  • Implemented a pruner module for registerDB that removes register data which is no longer needed from the DB, freeing up disk space.
  • Integrated pruner into Access and Observer nodes.
  • Added metrics.
  • Added functional tests.

codecov-commenter commented Aug 26, 2024

Codecov Report

Attention: Patch coverage is 46.78363% with 182 lines in your changes missing coverage. Please review.

Project coverage is 42.65%. Comparing base (899e12e) to head (4961567).

Files with missing lines Patch % Lines
cmd/observer/node_builder/observer_builder.go 0.00% 66 Missing ⚠️
cmd/access/node_builder/access_node_builder.go 0.00% 63 Missing ⚠️
storage/pebble/registers_pruner.go 72.58% 25 Missing and 9 partials ⚠️
storage/pebble/db_pruner.go 76.59% 8 Missing and 3 partials ⚠️
storage/pebble/operation/registers.go 0.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6397      +/-   ##
==========================================
+ Coverage   41.20%   42.65%   +1.44%     
==========================================
  Files        2052     1651     -401     
  Lines      182191   149190   -33001     
==========================================
- Hits        75075    63639   -11436     
+ Misses     100824    80169   -20655     
+ Partials     6292     5382     -910     
Flag Coverage Δ
unittests 42.65% <46.78%> (+1.44%) ⬆️

node.Logger,
builder.RegisterDB,
pstorage.WithPrunerMetrics(builder.RegisterDBPrunerMetrics),
//pstorage.WithPruneThreshold(builder.registerDBPruneThreshold),
Contributor Author

WithPruneThreshold is temporarily commented out and will be re-enabled once PR #6345 is merged.

Contributor

@peterargue peterargue left a comment

nice work! I haven't finished reviewing everything, but here are my comments so far.

@Guitarheroua Guitarheroua marked this pull request as ready for review October 9, 2024 10:15
storage/pebble/registers_pruner.go
Comment on lines 232 to 233
// Keep the first entry found for this registerID that is <= pruneHeight
keepKeyFound = true
Member

⚠️ We need to make sure we only prune a register if there is exactly one update to it within the height range [this register's height + 1, last pruned height].

This pruning logic also depends on the key iteration direction. Do we iterate a register in increasing height order or decreasing order? We should make this assumption explicit.

Collaborator

This change is needed to implement the requirement from issue #6068:

For each register prefix, find the first key whose height is less than or equal to the pruning height. This is the earliest entry to keep.

For example, here’s how we currently iterate if pruneHeight is 99989:

  • [0x01/key/owner1/99990] [keep, > 99989]
  • [0x01/key/owner1/99988] [first key to keep < 99989]
  • [0x01/key/owner1/85000] [remove]
  • ...
  • [0x01/key/owner2/99989] [first key to keep == 99989]
  • [0x01/key/owner2/99988] [remove]
  • ...
  • [0x01/key/owner3/99988] [first key to keep < 99989]
  • [0x01/key/owner3/98001] [remove]

I simplified the logic a bit, renamed some variables, and added comments in this commit for more clarity.

Or maybe I misunderstood your concerns about the logic?
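
The iteration in the example above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the `entry` type and the `keysToPrune` function are hypothetical, and the sketch assumes that entries are grouped by register and ordered from highest to lowest height within each register, as in the listing above.

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for a register key:
// a register ID plus the height at which the register was updated.
type entry struct {
	RegisterID string
	Height     uint64
}

// keysToPrune returns the entries that would be removed for pruneHeight.
// Assumption: entries are grouped by register ID and, within each register,
// sorted from highest to lowest height.
func keysToPrune(entries []entry, pruneHeight uint64) []entry {
	var pruned []entry
	currentID := ""
	keepKeyFound := false
	for i, e := range entries {
		// Reset the state whenever a new register starts.
		if i == 0 || e.RegisterID != currentID {
			currentID = e.RegisterID
			keepKeyFound = false
		}
		if e.Height > pruneHeight {
			continue // above the prune height: always kept
		}
		if !keepKeyFound {
			keepKeyFound = true // first entry <= pruneHeight: kept
			continue
		}
		pruned = append(pruned, e) // older entries for this register: removed
	}
	return pruned
}

func main() {
	entries := []entry{
		{"owner1", 99990}, {"owner1", 99988}, {"owner1", 85000},
		{"owner2", 99989}, {"owner2", 99988},
		{"owner3", 99988}, {"owner3", 98001},
	}
	for _, e := range keysToPrune(entries, 99989) {
		fmt.Printf("remove %s/%d\n", e.RegisterID, e.Height)
	}
}
```

If the keys were iterated in increasing height order instead, the "first entry found" rule would keep the oldest entry rather than the newest one at or below the prune height, which is why the iteration direction needs to be stated explicitly.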

module/metrics.go
Comment on lines 930 to 933
NumberOfRowsPruned(rows uint64)

// ElementVisited records the element that were visited during the pruning operation.
ElementVisited()
Member

I'm thinking about how we monitor the pruning.

We are probably interested in:

  1. The last pruned height. It's a useful metric because it tells us that script queries below this height would fail.
  2. Whether it is pruning right now and, if so, the progress percentage (e.g. 5%, 50%).

We are probably not interested in:

  1. How many registers are actually pruned. If we are interested, we can estimate it by looking at how often the last pruning progress changes, since pruning is done by batch delete and each batch has the same size.

NumberOfRowsPruned tells us it's pruning, but it doesn't tell us the progress.
ElementVisited also tells us it's pruning, but the actual number is not very meaningful.

I think we could consider measuring just one metric: LatestPrunedHeightWithProgressPercentage. It could be just a uint64 value.

So if the metric shows 8923910015, it means pruning is in progress, the last pruned height is 89239100, and the progress is 15% in the current pruning iteration. Once the current pruning iteration is completed, the metric will become 8924910000, which means we pruned from 89239100 to 89249100 (100%).
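
As a rough sketch of the encoding described above: the value is the last pruned height multiplied by 100, plus a two-digit progress percentage. The helper names `encodeProgress` and `decodeProgress` are hypothetical. Clamping the percentage to 99 is also an assumption made here, since two digits cannot hold 100 and a finished iteration is reported as the new height followed by 00.

```go
package main

import "fmt"

// encodeProgress packs the last pruned height and the progress of the
// current pruning iteration into a single value: height*100 + percent.
// Hypothetical helper; percent is clamped to 99 (an assumption) so that it
// never spills into the height digits.
func encodeProgress(lastPrunedHeight, percent uint64) uint64 {
	if percent > 99 {
		percent = 99
	}
	return lastPrunedHeight*100 + percent
}

// decodeProgress splits the combined value back into height and percent.
func decodeProgress(v uint64) (height, percent uint64) {
	return v / 100, v % 100
}

func main() {
	v := encodeProgress(89239100, 15)
	h, p := decodeProgress(v)
	fmt.Println(v, h, p)
}
```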

Now the question is, how do we know the 15% pruning progress?

We can estimate that by checking the first few hex chars of the register ID key. Since we iterate over all keys in a certain order, and assuming the keys are evenly distributed, the first few hex chars basically divide all registers into buckets, and we can calculate the percentage from that.
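
One way to sketch this estimate is shown below. It is only an illustration under several assumptions: the function name `estimateProgress` is hypothetical, the input is the part of the key that follows any fixed prefix bytes, keys are visited in ascending byte order, and the first two bytes are roughly uniformly distributed.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// estimateProgress maps the first two bytes of the key currently being
// visited to a percentage in [0, 99]. Hypothetical helper: it assumes keys
// are iterated in ascending byte order and are evenly distributed, so the
// leading bytes act as 65536 equally sized buckets.
func estimateProgress(keyAfterPrefix []byte) uint64 {
	if len(keyAfterPrefix) < 2 {
		return 0 // not enough bytes to pick a bucket
	}
	bucket := binary.BigEndian.Uint16(keyAfterPrefix[:2])
	return uint64(bucket) * 100 / 65536
}

func main() {
	// A key whose leading bytes are 0x80 0x00 sits halfway through the key space.
	fmt.Println(estimateProgress([]byte{0x80, 0x00, 0x12}))
}
```

If the real keys are skewed (for example, a few owners holding most registers), the estimate will be off, so it is best treated as a coarse indicator.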

With only a single metric, we could reduce the impact on the key iteration.

for k := range data {
keys = append(keys, k)
}
sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
Member

It should not matter in which order we store the keys in pebble. Pebble is supposed to store keys in a sorted order for iteration.

And we also don't require the caller to sort registers before storing them.

Suggested change
sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

Collaborator

We should sort the keys, as this test data is retrieved from a map where the key is the height, and the entries are out of order. Before storing them in the DB one by one using Registers::Store, they should be sorted from lowest to highest height.

Member

Right, Store must be called in order, from lower heights to higher heights.
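
For reference, the ordering requirement can be sketched like this. It is a simplified illustration: `sortedHeights` and the map's value type are hypothetical, and the actual store call is replaced by a print statement. The point is that Go map iteration order is unspecified, so the heights have to be sorted before entries are stored height by height.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedHeights returns the heights used as map keys in increasing order.
// Hypothetical helper; the value type stands in for the test's register data.
func sortedHeights(data map[uint64][]string) []uint64 {
	heights := make([]uint64, 0, len(data))
	for h := range data {
		heights = append(heights, h)
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	return heights
}

func main() {
	data := map[uint64][]string{30: {"c"}, 10: {"a"}, 20: {"b"}}
	for _, h := range sortedHeights(data) {
		// A real test would store the entries for height h here, one height at a time.
		fmt.Println(h, data[h])
	}
}
```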

@Guitarheroua Guitarheroua requested review from Guitarheroua and zhangchiqing and removed request for Guitarheroua November 4, 2024 17:02
Successfully merging this pull request may close these issues.

[Access] Add registerDB pruning module
5 participants