[Access] Add support for pebbleDB to execution data tracker/pruner #6277

UlyanaAndrukhiv · 2024-07-29T20:16:41Z

Closes: #6260

Context

In this pull request:

Split the ExecutionDataTracker DB code out behind a common ExecutionDataTracker interface.
Refactored the badger implementation.
Added a pebble version of the storage object.
Added functional and integration tests for the pebble version of execution data pruning.


codecov-commenter · 2024-07-29T20:25:44Z

Codecov Report

Attention: Patch coverage is 39.39394% with 340 lines in your changes missing coverage. Please review.

Project coverage is 41.40%. Comparing base (9653906) to head (3362d0b).

Files with missing lines	Patch %	Lines
storage/badger/execution_data_tracker.go	65.43%	35 Missing and 21 partials ⚠️
storage/pebble/execution_data_tracker.go	63.30%	31 Missing and 20 partials ⚠️
storage/pebble/operation/execution_data_tracker.go	0.00%	30 Missing ⚠️
storage/badger/operation/execution_data_tracker.go	0.00%	27 Missing ⚠️
storage/pebble/operation/common.go	36.58%	25 Missing and 1 partial ⚠️
storage/mock/track_blobs_fn.go	0.00%	21 Missing ⚠️
cmd/access/node_builder/access_node_builder.go	0.00%	16 Missing ⚠️
cmd/observer/node_builder/observer_builder.go	0.00%	16 Missing ⚠️
storage/execution_data_tracker.go	37.50%	13 Missing and 2 partials ⚠️
storage/mock/prune_callback.go	0.00%	15 Missing ⚠️
... and 11 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6277      +/-   ##
==========================================
- Coverage   41.42%   41.40%   -0.03%     
==========================================
  Files        2024     2032       +8     
  Lines      144439   144742     +303     
==========================================
+ Hits        59839    59928      +89     
- Misses      78403    78624     +221     
+ Partials     6197     6190       -7

Flag	Coverage Δ
unittests	`41.40% <39.39%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown.


Guitarheroua

Looks good!

peterargue · 2024-08-16T20:38:11Z

@zhangchiqing since you've been working closely with pebble/badger lately, could you help review this one? It refactors the execution data pruner to support both badger and pebble.


zhangchiqing · 2024-09-13T21:04:40Z

Great work! @UlyanaAndrukhiv

Since we're adding both Pebble and Badger implementations for the execution data tracker, this PR has become quite extensive. I suggest we start with the Badger solution, focusing on the implementation of the tracker and the pruning logic.

This PR also references the Pebble implementation I created in a separate pull request. However, that PR is still under review, and the Pebble implementation will very likely need refactoring. The patterns we rely on here might become outdated, so it's better to postpone the Pebble implementation for now.

We could first implement the pruner in Badger, refactoring it to use Badger batch updates instead of transactions. Since Pebble doesn't support transactions, using Badger batch updates could make it easier for us to switch to the Pebble implementation later.

Design

We might need to revisit the design of the Tracker and Pruner. They were implemented a long time ago and would face several challenges if we switched to Badger batch updates, so we should take a step back and reconsider the design first.

For example:
For each height, we index a list of CIDs as execution data. CIDs are nested: a CID can have multiple children, and a child CID might have multiple different parent CIDs. When we prune a height, we can't simply remove all the CIDs (and their child CIDs) indexed at that height, because some of them might still be referenced by CIDs indexed at a higher height. This is challenging because it requires extra information to determine whether a CID or a child CID is prunable.

We tried to address this by keeping track of the highest indexed height for each CID (RetrieveTrackerLatestHeight / UpsertTrackerLatestHeight), but this introduces complexity, and I'm unsure if it's concurrency-safe without database transactions. Is it possible to eliminate the extra index to simplify things? If we keep the LatestHeight index for each CID, we need to be cautious about dirty writes that might corrupt data. For instance, while we're pruning a CID, we might also be concurrently indexing a new height with a certain CID referring to the deleted CID, which could corrupt the newly indexed data. We probably don't want to solve this problem by blocking indexing with a lock during pruning, because pruning might take a long time.
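A minimal in-memory sketch of the highest-indexed-height approach and of the dirty write described above. `Cid`, `tracker`, `track`, `prunable`, and `deleteBlobs` are hypothetical names, not the flow-go types, and the concurrent interleaving is replayed sequentially so the outcome is deterministic.

```go
package main

import "fmt"

// Cid stands in for cid.Cid so the sketch has no external dependencies.
type Cid string

// tracker is a hypothetical in-memory model of the two indexes: CIDs indexed
// per height, and the highest height each CID was indexed at (the role of
// UpsertTrackerLatestHeight).
type tracker struct {
	byHeight     map[uint64][]Cid
	latestHeight map[Cid]uint64
	blobs        map[Cid]bool // blobs currently present in the blobstore
}

func newTracker() *tracker {
	return &tracker{
		byHeight:     map[uint64][]Cid{},
		latestHeight: map[Cid]uint64{},
		blobs:        map[Cid]bool{},
	}
}

// track indexes cids at height h and raises each CID's latest height.
func (t *tracker) track(h uint64, cids ...Cid) {
	for _, c := range cids {
		t.byHeight[h] = append(t.byHeight[h], c)
		if h > t.latestHeight[c] {
			t.latestHeight[c] = h
		}
		t.blobs[c] = true
	}
}

// prunable returns the CIDs indexed at h that are not indexed at any higher height.
func (t *tracker) prunable(h uint64) []Cid {
	var out []Cid
	for _, c := range t.byHeight[h] {
		if t.latestHeight[c] <= h {
			out = append(out, c)
		}
	}
	return out
}

// deleteBlobs removes the given blobs, their latest-height entries,
// and the index for height h.
func (t *tracker) deleteBlobs(h uint64, cids []Cid) {
	for _, c := range cids {
		delete(t.blobs, c)
		delete(t.latestHeight, c)
	}
	delete(t.byHeight, h)
}

func main() {
	// Sequential case: "shared" is also indexed at height 2, so pruning
	// height 1 keeps it.
	seq := newTracker()
	seq.track(1, "a", "shared")
	seq.track(2, "shared")
	fmt.Println(seq.prunable(1)) // [a]

	// Interleaved case: check-then-delete with no transaction around it.
	race := newTracker()
	race.track(5, "x")
	stale := race.prunable(5)  // pruner decides "x" is prunable
	race.track(6, "x")         // indexer re-references "x" at height 6
	race.deleteBlobs(5, stale) // pruner acts on its stale decision
	fmt.Println(race.byHeight[6], race.blobs["x"]) // [x] false
}
```

In the second case height 6 ends up referencing a blob that no longer exists, which is exactly the corruption a lock or a transaction would otherwise have to prevent.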

Therefore, I think we need to address these challenges before proceeding with the implementation.

zhangchiqing

Just realized I forgot to submit my review; my comments were still pending.

zhangchiqing · 2024-08-29T15:53:26Z

cmd/access/node_builder/access_node_builder.go

-						return builder.ExecutionDataBlobstore.DeleteBlob(context.TODO(), c)
-					}),
-				)
+				if executionDataDBMode == execution_data.ExecutionDataDBModeBadger {


Can we implement a CheckExistingExecutionDataDBMode(executionDataDBMode, trackerDir) function, or something similar, to check that the existing data in the folder is consistent with the DB mode?

This would prevent accidentally opening existing badger data as pebble, which might corrupt the database.

zhangchiqing · 2024-08-29T16:18:27Z

storage/badger/execution_data_tracker.go

+//
+// No errors are expected during normal operation.
+func (s *ExecutionDataTracker) trackBlobs(blockHeight uint64, cids ...cid.Cid) error {
+	cidsPerBatch := s.batchItemLimit(storage.CidsPerBatch, 2, storage.BlobRecordKeyLength+storage.LatestHeightKeyLength+8)


It seems this never changes; could it be calculated once during initialization?
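A sketch of computing the limit once in the constructor and storing it on the struct. `batchItemLimit`, `maxBatchCount`, `maxBatchSize`, and `executionDataTracker` are hypothetical: the real helper's formula and the actual badger limits may differ.

```go
package main

import "fmt"

// Hypothetical engine limits; a real implementation would derive them from
// the database options rather than hard-code them.
const (
	maxBatchCount = 1000      // max operations per batch (assumed)
	maxBatchSize  = 64 * 1024 // max bytes per batch (assumed)
)

// batchItemLimit is a hypothetical helper: how many items fit in one batch
// when each item needs opsPerItem writes totalling bytesPerItem bytes,
// capped at desired and never below 1.
func batchItemLimit(desired, opsPerItem, bytesPerItem int) int {
	limit := desired
	if byCount := maxBatchCount / opsPerItem; byCount < limit {
		limit = byCount
	}
	if bySize := maxBatchSize / bytesPerItem; bySize < limit {
		limit = bySize
	}
	if limit < 1 {
		limit = 1
	}
	return limit
}

type executionDataTracker struct {
	cidsPerBatch int // fixed for the tracker's lifetime, so computed once
}

func newExecutionDataTracker(desired, opsPerCid, bytesPerCid int) *executionDataTracker {
	return &executionDataTracker{cidsPerBatch: batchItemLimit(desired, opsPerCid, bytesPerCid)}
}

// chunk splits cids into groups of at most cidsPerBatch, one group per batch.
func (s *executionDataTracker) chunk(cids []string) [][]string {
	var batches [][]string
	for len(cids) > 0 {
		n := s.cidsPerBatch
		if len(cids) < n {
			n = len(cids)
		}
		batches = append(batches, cids[:n])
		cids = cids[n:]
	}
	return batches
}

func main() {
	s := newExecutionDataTracker(4096, 2, 100)
	fmt.Println(s.cidsPerBatch) // 500: the operation-count limit wins
}
```

Each `trackBlobs` call then only reads `s.cidsPerBatch` instead of recomputing it.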

zhangchiqing · 2024-08-29T17:47:27Z

storage/badger/execution_data_tracker.go

+				break
+			}
+
+			dInfo := &storage.DeleteInfo{


Better not to create this until it's actually needed.

zhangchiqing · 2024-08-29T17:48:47Z

storage/badger/execution_data_tracker.go

+		return err
+	}
+
+	if err := s.db.View(func(txn *badger.Txn) error {


I think View opens a read-only transaction, but pruning is a write operation.

zhangchiqing · 2024-08-29T17:57:01Z

storage/badger/execution_data_tracker.go

+// - c: The CID of the blob to be tracked.
+//
+// No errors are expected during normal operation.
+func (s *ExecutionDataTracker) trackBlob(tx *badger.Txn, blockHeight uint64, c cid.Cid) error {


Can we implement this with batch updates instead of a badger transaction? That would make it easier to switch to the pebble implementation, which also uses batch updates.

zhangchiqing · 2024-08-29T18:09:51Z

storage/badger/execution_data_tracker.go

+
+		// iterate over blob records, calling pruneCallback for any CIDs that should be pruned
+		// and cleaning up the corresponding tracker records
+		for it.Seek(blobRecordPrefix); it.ValidForPrefix(blobRecordPrefix); it.Next() {


Can we reuse badger's traverse function to implement this?

It's better to abstract the low-level database operations; that would make it easier to switch to pebble.
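A sketch of the suggested abstraction: a storage-agnostic `Traverse` with the prune scan written on top of it. `Iterator`, `IterFunc`, `sortedMem`, and the key layout are hypothetical and do not reproduce the signature of the existing badger `traverse` helper. Because heights are encoded big-endian after the prefix, records come back in height order, and the scan can stop at the first record above the prune height.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// IterFunc is called for each key/value under a prefix; returning false stops iteration.
type IterFunc func(key, value []byte) (bool, error)

// Iterator is a hypothetical storage-agnostic traversal interface; badger- and
// pebble-backed types would each implement it with their own iterators.
type Iterator interface {
	Traverse(prefix []byte, fn IterFunc) error
}

// sortedMem is a toy backend that visits keys in byte order, as badger and
// pebble iterators do.
type sortedMem map[string][]byte

func (m sortedMem) Traverse(prefix []byte, fn IterFunc) error {
	keys := make([]string, 0, len(m))
	for k := range m {
		if bytes.HasPrefix([]byte(k), prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		cont, err := fn([]byte(k), m[k])
		if err != nil {
			return err
		}
		if !cont {
			return nil
		}
	}
	return nil
}

const prefixBlobRecord byte = 0x01 // hypothetical

// blobRecordKey encodes prefix | big-endian height | CID.
func blobRecordKey(height uint64, c string) []byte {
	k := make([]byte, 9+len(c))
	k[0] = prefixBlobRecord
	binary.BigEndian.PutUint64(k[1:9], height)
	copy(k[9:], c)
	return k
}

// collectPrunable returns the CIDs of blob records at or below pruneHeight,
// stopping at the first record above it.
func collectPrunable(it Iterator, pruneHeight uint64) ([]string, error) {
	var cids []string
	err := it.Traverse([]byte{prefixBlobRecord}, func(key, _ []byte) (bool, error) {
		if binary.BigEndian.Uint64(key[1:9]) > pruneHeight {
			return false, nil
		}
		cids = append(cids, string(key[9:]))
		return true, nil
	})
	return cids, err
}

func main() {
	store := sortedMem{}
	for h, c := range map[uint64]string{1: "a", 2: "b", 3: "c"} {
		store[string(blobRecordKey(h, c))] = nil
	}
	cids, err := collectPrunable(store, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(cids) // [a b]
}
```

The latest-height check is omitted to keep the focus on iteration; a real pruner would still need to skip CIDs that are referenced at higher heights.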

zhangchiqing · 2024-08-29T18:14:46Z

storage/badger/execution_data_tracker.go

+		return nil
+	}
+
+	err = operation.UpsertTrackerLatestHeight(c, blockHeight)(tx)


Why do we need to keep track of the latest height of each CID?

UlyanaAndrukhiv added 6 commits July 25, 2024 16:01

Refactored structure of execution data tracker

0b6499a

Refactored badger version of execution data tracker

5893168

Added pebble operations for execution data tracker, added basic pebble execution data tracker impl, refactored badger impl

6f24602

Added functional test for pebble execution data tracker impl, updated test for badger impl

cb61531

Updated AN and ON builders

e2630cd

Added comments

fd14160

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

d42f1e6

Guitarheroua requested a review from peterargue July 30, 2024 08:58

UlyanaAndrukhiv added 17 commits July 30, 2024 17:52

Moved interface from tracker to storage

4d675d7

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

97897ad

Updated pebble execution data tracker

0f3dd84

Added comment for ffBytes, fixed check

437d0a7

Merge branch 'UlyanaAndrukhiv/6017-pebble-as-execution-datastore-db' of github.com:The-K-R-O-K/flow-go into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

5f7a25e

Added integration test for pebble version of execution data pruning, updated integration tests to avoid code duplication

d9218c9

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

83fc78a

Updated integration test

63573d6

Merge branch 'UlyanaAndrukhiv/6017-pebble-for-tracker-updates' of github.com:The-K-R-O-K/flow-go into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

6f266d8

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

383e8eb

Updated naming

edadc78

Merge branch 'UlyanaAndrukhiv/6017-pebble-for-tracker-updates' of github.com:The-K-R-O-K/flow-go into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

04d6e20

Generated mocks

68a0223

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

ff6b215

Updated naming for operations on storages

a77db84

Added documentation

9a2e3be

Updated order of arguments in NewExecutionDataTracker

4132ead

UlyanaAndrukhiv marked this pull request as ready for review August 6, 2024 08:38

UlyanaAndrukhiv requested review from ramtinms and janezpodhostnik as code owners August 6, 2024 08:38

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

d6e7a25

Updated badger version of batchDelete

d6f8b9a

Guitarheroua reviewed Aug 13, 2024 (six single-comment reviews, all since marked outdated and resolved):

storage/badger/execution_data_tracker.go (4 comments)

storage/pebble/execution_data_tracker.go (2 comments)

UlyanaAndrukhiv added 8 commits August 13, 2024 13:13

Updated pebble version of execution data tracker by using batch according to comments

34e04ee

Fixed godoc, updated logger values

3b8509d

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

6cb0105

Updated trackBlob according to comment

b8181ab

Separated read operation from write batch operations

640de21

Moved common logic for maximum number of items in a single batch for badger db

65e1614

Updated naming

feb984d

Moved common logic for tracker initialization

b12dc9b

UlyanaAndrukhiv requested review from Guitarheroua and peterargue August 13, 2024 14:59

UlyanaAndrukhiv added 4 commits August 14, 2024 18:09

Added functional tests for common pebble operations

95ac8ae

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

cb2924a

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

59837c9

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

43ce701

Guitarheroua approved these changes Aug 16, 2024

peterargue requested a review from zhangchiqing August 16, 2024 20:37

UlyanaAndrukhiv added 4 commits August 26, 2024 11:25

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

0e39976

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

93aa09f

Merged with master

fba4e0b

Merge branch 'master' into UlyanaAndrukhiv/6017-pebble-for-tracker-updates

3362d0b

zhangchiqing reviewed Oct 19, 2024