
Add parallel processing mode #1

Open · wants to merge 1 commit into develop from feature/add_parallel_processing_mode

Conversation

@cfrankb commented Jan 20, 2023

No description provided.

@maoueh left a comment


So, I don't like the accumulation of the output; it feels highly problematic to have no interactivity until the command stops. Libraries combining a WaitGroup with a bounded pool of concurrent workers already exist, for example https://github.com/remeh/sizedwaitgroup, so I don't think we need to re-implement this.

Finally, I have some concerns with ordering. The current behavior has strict ordering: you see the range scanning happen in the order of the defined range. Now the ordering is no longer specified and anything could happen.

So, I want to see:

  • Output interactivity (at least at the level of a merged blocks bundle)
  • Respected ordering of jobs
  • No overly high memory usage

For concurrency and ordered output, we have a library, dhammer, that contains such a structure and could be used here; see the example at https://github.com/streamingfast/dhammer/blob/develop/example_nailer_test.go (I just converted it to use Go generics, so it requires Go 1.18+).

I'm fine with another concurrency solution; there is no need to use dhammer.Nailer, as long as we keep the ordering.
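
As a rough illustration of the requested shape (a minimal sketch, assuming nothing about the dhammer API or the PR's code; all names below are illustrative):

```go
// Minimal sketch: at most maxWorkers jobs run concurrently, but results are
// emitted strictly in submission order, so output stays interactive at the
// granularity of one job.
package main

import (
	"fmt"
	"sync"
)

func processOrdered[J, R any](jobs []J, maxWorkers int, work func(J) R, emit func(R)) {
	results := make([]chan R, len(jobs))
	for i := range results {
		results[i] = make(chan R, 1)
	}

	sem := make(chan struct{}, maxWorkers) // bounds the number of concurrent workers
	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		go func(i int, job J) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			results[i] <- work(job)
		}(i, job)
	}

	// Consume in submission order: the consumer only ever blocks on the next
	// expected job. Results that finish early wait in their one-slot channels;
	// since each result is a small summary (not full blocks), that buffering
	// stays cheap.
	for i := range jobs {
		emit(<-results[i])
	}
	wg.Wait()
}

func main() {
	ranges := []string{"0-999", "1000-1999", "2000-2999", "3000-3999"}
	processOrdered(ranges, 2,
		func(r string) string { return "checked range " + r },
		func(line string) { fmt.Println(line) },
	)
}
```

The key point is that workers may finish in any order, but the consumer only waits on the next expected job, so the printed order matches the defined ranges while the work itself runs in parallel.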

check_blocks.go — 3 review comments (outdated, resolved)
@cfrankb (author) commented Jan 27, 2023

@maoueh With the latest commit, I think I have resolved all the outstanding objections except for the output ordering. I am looking at the dhammer example, which has only one output per thread, and it is not readily clear how a scenario with unlimited outputs per thread can be reordered without buffering. Please advise.

@maoueh commented Jan 30, 2023

@cfrankb Yeah, you are right that buffering is still required; we cannot maintain ordering without it. So I suggest we do the following.

We disable parallel jobs when print details is set to print full blocks, since parallelism makes no sense for that. The CLI PR would then refuse to run in parallel mode when the user requests printing full blocks (actually, I think we should remove the capability to print full block(s) from the check command altogether, it doesn't belong there).

This means the actual output written to stdout is limited in size, and we can correctly implement the parallel and ordered feature.
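
To make the buffering idea concrete, here is an illustrative sketch (not the actual check_blocks.go implementation; checkBundle and the sizes are hypothetical): each job writes its bounded output through an io.Writer backed by a small per-job buffer, and buffers are flushed to stdout in job order as soon as the next expected job finishes.

```go
// Illustrative sketch only: because full-block printing is excluded in
// parallel mode, each job's output is a handful of lines and can safely be
// buffered in memory until its turn comes to be printed.
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// checkBundle stands in for the per-bundle checking logic; it writes a
// bounded amount of output for its block range.
func checkBundle(start, stop uint64, out io.Writer) {
	fmt.Fprintf(out, "bundle %d - %d: OK\n", start, stop)
}

func main() {
	const bundles = 5
	buffers := make([]bytes.Buffer, bundles)
	done := make([]chan struct{}, bundles)
	for i := range done {
		done[i] = make(chan struct{})
	}

	for i := 0; i < bundles; i++ {
		go func(i int) {
			defer close(done[i])
			checkBundle(uint64(i*100), uint64(i*100+99), &buffers[i])
		}(i)
	}

	// Flush buffers strictly in job order; we only block on the next
	// expected job, so earlier results appear as soon as they and all
	// their predecessors are done.
	for i := 0; i < bundles; i++ {
		<-done[i]
		os.Stdout.Write(buffers[i].Bytes())
	}
}
```

A real implementation would also bound the number of concurrent jobs, as in the earlier sketch.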

@cfrankb force-pushed the feature/add_parallel_processing_mode branch from 8eba728 to e6a77f3 on February 1, 2023 19:03
@cfrankb force-pushed the feature/add_parallel_processing_mode branch from e6a77f3 to a74e623 on February 1, 2023 19:06
@cfrankb (author) commented Feb 1, 2023

@maoueh I addressed the ordering in the latest commit. Please advise on additional tweaks needed.

@sduchesneau left a comment

I tried the current code in firehose-ethereum and the experience was pretty bad.

  1. There is a race condition where it deadlocks on almost every run. It blocks on tools/check_blocks.go:206, while another thread is on sf-tools/check_blocks.go:99. I tried to pinpoint the logic error that causes it, but the logic behind the messageLogger using isDone and currentJobID seems a bit convoluted... hard to read for me, anyway.

  2. When I use the -s flag (summary), I'm getting responses in random order:

Block #477 (7c53fc332289c2d92d046d99836585aaba9246c086b43db04b2df50d2702d689) 0 transactions, 0 calls
Block #5525 (b3cb6b547678c4ebe73fb24a05f838797b19b105f5d2418e64d4e41ada3d395e) 0 transactions, 0 calls
Block #478 (4d8f9fb289ae974f04320f97a15c6da73c6b67b1a95cd2ae6bbca88cfff56a17) 0 transactions, 0 calls
Block #3536 (18981f39fe2ab08422d73da8ea3399471dffbedecf2d85fd445ada6b61845096) 0 transactions, 0 calls

I know that 'PrintFull' was discussed as something that shouldn't be possible in parallel, but I feel that 'PrintStats', which produces at most one line per block, is not too costly to buffer for reordering.

  3. When integrating, I found that the public CheckMergedBlocks method cannot be used directly (it requires passing a channel of logInfo, which is not publicly exposed... and I have to admit that the caller, firehose-ethereum, wouldn't know what to provide there...)
  • Could you keep that method working as-is? Maybe it could default to parallel checking with decent values if the PrintMode allows it, and the parallel parameters could be passed only as an option? (A sketch of this option-based approach follows below.)
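
A hypothetical sketch of that option-based approach (the option name WithParallel, the trimmed-down signature, and the defaults are assumptions for illustration, not the actual sf-tools API):

```go
// Hypothetical sketch: keep CheckMergedBlocks callable as before and make
// the parallel parameters opt-in through variadic options. The real function
// takes more parameters (block range, printer, print mode, ...), elided here.
package sftools

import "context"

type checkOptions struct {
	batchSize int
	workers   int
}

// CheckOption configures optional behavior of CheckMergedBlocks.
type CheckOption func(*checkOptions)

// WithParallel enables batched parallel checking with the given settings.
func WithParallel(batchSize, workers int) CheckOption {
	return func(o *checkOptions) {
		o.batchSize = batchSize
		o.workers = workers
	}
}

// CheckMergedBlocks keeps a stable signature for callers such as
// firehose-ethereum; without options it behaves exactly like today.
func CheckMergedBlocks(ctx context.Context, storeURL string, fileBlockSize uint32, opts ...CheckOption) error {
	options := checkOptions{batchSize: 1, workers: 1} // sequential default
	for _, opt := range opts {
		opt(&options)
	}
	// ... dispatch to the sequential or parallel implementation based on
	// options, only when the print mode allows it.
	return nil
}
```

Under this sketch, existing callers keep compiling unchanged because the options are variadic, while the CLI could pass WithParallel(200, 10) only when parallel mode is requested and the print mode allows it.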

How to reproduce in firehose-ethereum

  1. add this line: replace github.com/streamingfast/sf-tools => /path/to/your/sf-tools to your go.mod
  2. tweak tools/check.go to use the following, with batches of 1000 blocks and 10 workers:
-       return sftools.CheckMergedBlocks(cmd.Context(), zlog, storeURL, fileBlockSize, blockRange, blockPrinter, printDetails)
+       return sftools.CheckMergedBlocksBatch(cmd.Context(), zlog, storeURL, fileBlockSize, blockRange, blockPrinter, printDetails, 200, 10)
  3. Run go mod tidy to get the dependencies
  4. Run go install -v ./cmd/fireeth && fireeth tools check merged-blocks /path/to/my/blocks -r 0:10000

It should get stuck somewhere after the first few jobs. (In my case, the merged blocks are remote, which may affect the bug that looks like a race condition.)
