Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Access] Add registerDB pruning module #6068

Open
5 of 6 tasks
Tracked by #6065
peterargue opened this issue Jun 11, 2024 · 8 comments · May be fixed by #6397
Open
5 of 6 tasks
Tracked by #6065

[Access] Add registerDB pruning module #6068

peterargue opened this issue Jun 11, 2024 · 8 comments · May be fixed by #6397
Assignees

Comments

@peterargue
Copy link
Contributor

peterargue commented Jun 11, 2024

Problem Definition

This is the second step to enabling pruning on the registers db, and depends on #6066.

Now that db queries will only include responses for unpruned heights, we can layer on the pruning module. This module will ensure that unneeded pruned data is removed from the db, freeing up disk space.

Proposed Solution

Configuration:

  • pruneThreshold ⇒ number of blocks below the latestHeight should be kept
  • pruneInterval ⇒ number of pruned blocks in the db, above which pruning should be triggered

Steps:

  1. Periodically scan through all rows in the db

    • the scan should be triggered when pruneHeight - firstHeight > pruneInterval
      • Where pruneHeight = latestHeight - pruneThreshold is the lowest height to keep
    • heights are stored in the database in 1’s complement, so entries are in descending height order
    • when the pruner starts, update the firstHeight entry in the db to be the prune height. This ensures that any entries below will not be used on subsequent startups, avoiding consistency problems if the node were to crash or restart mid pruning cycle
  2. For each register prefix, find the first key who’s height is less than or equal to the prune height. This is the earliest entry to keep

  3. Delete all rows with the same key prefix with lower heights

    💡 Note: we will want to throttle how quickly the pruner cycles through keys to manage resource consumption.

    For example
    If latestHeight is 99999 and pruneThreshold is 10, then we will remove entries for all heights less than 99989
    [0x01/key/path/1/99999] [keep, ≥ 99989]
    [0x01/key/path/1/99990] [keep, ≥ 99989]
    [0x01/key/path/1/99988] [keep, first row < 99989]
    [0x01/key/path/1/85000] [remove]
    [0x01/key/path/2/99989] [keep, ≥ 99989]
    [0x01/key/path/2/99988] [remove]
    [0x01/key/path/3/99988] [keep, first row < 99989]
    [0x01/key/path/3/98001] [remove]
    [0x02/key/path/0/99900] [keep, first row < 99989]

Considerations

  • We will need to investigate how pebble handles garbage collection and tune configuration appropriately to strike a balance between reclaiming space, and CPU/IO used for rewriting files
  • We will similarly need to find a balance between speed/frequency of pruning and CPU/IO to reduce the impact on system performance caused by pruning. It’s OK for it to take a long time as long as it can process faster than entries expire. Otherwise, we’ll need some additional optimization which may trade off some memory for faster pruning.
  • Since pruning will run relatively slowly, there will be new entries written to the db during the process. The logic will need to handle this to ensure that it does complete in a reasonable amount of time.

Definition of Done

  1. There is a new CLI flag for pruneInterval and pruneThrottleDelay which sets a delay the pruner should use while iterating the db to reduce load on the overall system.
  2. The pruning module runs when enabled
  3. There are metrics covering the following at a minimum
    • pruning runtime
    • number of rows pruned
    • number of blocks pruned
    • number of elements visited
  4. There are unit and integration tests covering the module and expected behavior

Tasks

@Guitarheroua
Copy link
Collaborator

Guitarheroua commented Aug 14, 2024

I fixed the example a bit for clarity:

[0x01/key/path/1/99999] [keep, > 99989]
[0x01/key/path/1/99990] [keep, > 99989]
[0x01/key/path/1/99988] [keep, first row < 99989]
[0x01/key/path/1/85000] [remove]
...
[0x01/key/path/2/99989] [keep, first row == 99989]
[0x01/key/path/2/99988] [remove]
...
[0x01/key/path/3/99988] [keep, first row < 99989]
[0x01/key/path/3/98001] [remove]
...
[0x02/key/path/0/99900] [keep, first row < 99989]

@UlyanaAndrukhiv
Copy link
Contributor

UlyanaAndrukhiv commented Aug 14, 2024

The raw implementation steps

New CLI flags

The Access and Observer nodes should have new command arguments:
--registerdb-pruning-enabled - to turn the feature on or off.
--registerdb-prune-interval - to set a number of pruned blocks in the db, above which pruning should be triggered
--registerdb-prune-throttle-delay- to set a delay the pruner which should use while iterating the db to reduce load on the overall system.

RegisterDBPruner module

The module should be initialized with a set of configuration atomic options and metrics for tracking the performance and results of pruning operations. Also it should accept the RegisterIndex interface to get latestHeight and *pebble.DB instance for accessing and modifying the storage.

type Registers struct {
db *pebble.DB
firstHeight uint64
latestHeight *atomic.Uint64
}

API

The module should be a component.Component. The loop function will the main worker for the RegisterDBPruner. It is responsible for periodically checking if pruning is necessary and performing pruning operations at regular pruneThrottleDelay interval.

WithOptions: Configures the RegisterDBPruner component with the following options:

  • WithPruneThreshold(threshold uint64): Sets the pruneThreshold parameter for the pruner.
  • WithPruneInterval(interval uint64): Sets the pruneInterval parameter for the pruner.
  • WithPruneThrottleDelay(delay time.Duration): Sets the pruneThrottleDelay parameter for the pruner.

RegisterDBPruner exposes the following methods:

  • pruneUpToHeight(): Performs the actual pruning operation to remove old and unnecessary data from the storage.

Height Update: It first updates the firstHeight in the Registers storage to the new pruneHeight. This ensures that any entries below this height are not considered in future operations, maintaining consistency.

Row Deletion: The method iterates over the database, identifying the first key who’s height is less than or equal to the pruneHeight and deleting rows with the same key prefix with lower heights, ensuring that all outdated data is removed efficiently.

Batch Operations: Deletions are batched to minimize I/O operations and improve performance. The method commits the batch after processing all relevant rows.

  • checkPrune(ctx context.Context): Determines if pruning should be performed. It calculates whether the difference between the latest height and the first height exceeds the pruneInterval. If it does, it triggers the pruning operation.

    pruneHeight - firstHeight > pruneInterval, 
    where pruneHeight = latestHeight - pruneThreshold
    

@Guitarheroua
Copy link
Collaborator

Hey @peterargue! Do we understand correctly that pruneThrottleDelay is the time interval for pruning to prevent this operation from being performed too often?

@Guitarheroua
Copy link
Collaborator

@peterargue Also, could you please check the implementation steps, that @UlyanaAndrukhiv provided for this issue above? Thanks in advance!

@peterargue
Copy link
Contributor Author

peterargue commented Aug 15, 2024

Hey @peterargue! Do we understand correctly that pruneThrottleDelay is the time interval for pruning to prevent this operation from being performed too often?

pruneThrottleDelay is actually a small pause between pruning each register or batch of registers. The idea is to prevent the pruner from using up too much CPU/memory. I'm thinking a default value in the low milliseconds should be fine.

The implementation would probably look like collecting some amount of deletes into a batch, committing the batch, then pausing. So we would need to tune pruneThrottleDelay and the batch size.

@peterargue
Copy link
Contributor Author

peterargue commented Aug 15, 2024

New CLI flags

The Access and Observer nodes should have new command arguments: --registerdb-pruning-enabled - to turn the feature on or off. --registerdb-prune-interval - to set a number of pruned blocks in the db, above which pruning should be triggered --registerdb-prune-throttle-delay- to set a delay the pruner which should use while iterating the db to reduce load on the overall system.

Let's not make --registerdb-prune-interval configurable, and instead use a const.
but let's make --registerdb-prune-threshold configurable, which sets the number of blocks to keep.

then pruneInterval is pruneThreshold plus some amount (say 10%)

The module should be initialized with a set of configuration atomic options

The pruner is single threaded, so we probably don't need to use atomics. If there turn out to be cases when we need to expose values to outside processes, we can use atomic.

performing pruning operations at regular pruneThrottleDelay interval.

pruning should happen on some regular interval (10m, 1h, etc). pruneThrottleDelay will control a small pause between batches of registers inspected and pruned. The goal being to reduce CPU and IO load on the system.

  • WithPruneInterval(interval uint64): Sets the pruneInterval parameter for the pruner.

if prune interval is a const, we won't need an option for it.

  • pruneUpToHeight(): Performs the actual pruning operation to remove old and unnecessary data from the storage.

this method makes sense, but we don't need to expose it. it should be called by the loop.

Height Update: It first updates the firstHeight in the Registers storage to the new pruneHeight. This ensures that any entries below this height are not considered in future operations, maintaining consistency.

it should update firstHeight in the database. That may be done via the Registers storage, or the pruner could use the db operation directly. I'm currently thinking the latter is better since I don't think we would want to expose the ability to update the height to anyone.

Row Deletion: The method iterates over the database, identifying the first key who’s height is less than or equal to the pruneHeight and deleting rows with the same key prefix with lower heights, ensuring that all outdated data is removed efficiently.

yes

Batch Operations: Deletions are batched to minimize I/O operations and improve performance. The method commits the batch after processing all relevant rows.

yes, and we will need to tune the batch size along with pruneThrottleDelay to fine a good balance between resource utilization and speed of each pruning cycle.

  • checkPrune(ctx context.Context): Determines if pruning should be performed. It calculates whether the difference between the latest height and the first height exceeds the pruneInterval. If it does, it triggers the pruning operation.

yes

@peterargue
Copy link
Contributor Author

How about this breakdown?

  1. Define APIs (functions and args)
    • Pruner module
    • Register storage
  2. Pruner module
    • setup
    • component interface
    • loop logic
  3. Prune logic
    • implement pruneUpToHeight
  4. Registers Remove method
    • implement method that delete the data and updates metadata
  5. Integrate into access/observer nodes + config
    • add to node bootstrap
    • add config
  6. Implement integration and functional tests
    • Could potentially parallelize since much of the work will be sorting out mocks and the test scaffolding
  7. Add metrics
  8. Localnet testing

Steps 2-4 would all include writing comprehensive unit tests.

I think 1 must be done first, but could be a collaboration.
2-5 could be done in parallel, and 6-8 could also.

@Guitarheroua
Copy link
Collaborator

How about this breakdown?

...

I think 1 must be done first, but could be a collaboration. 2-5 could be done in parallel, and 6-8 could also.

I think 1-2, should be done first by one person, but other steps should work, as you describe. Will try it out, and parallelize the work as described. Thank you for your advice!

@UlyanaAndrukhiv UlyanaAndrukhiv linked a pull request Aug 26, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants