Support nested iteration #295
I signed the CLA; a re-run is needed.
2984e50
to
8f16bb6
Compare
Thank you for tackling this 🙇 The API looks great! Once CI runs you'll hit failures against Rails edge. You may want to get ahead of that by borrowing and extending #294.
Thank you for this contribution! It's been on my wishlist for a long time, so it's great to finally see it become reality ❤️
Would you mind rebasing on the latest |
8f16bb6
to
2ab488a
Compare
Rebased. |
There are some CI failures 🤔
Is this test flaky? Or did one of the gem bumps break it? |
@Mangara I identified the problem. I do not know how I overlooked it, but this iterator does not work correctly when it is created from an existing cursor to resume iteration. Each cursor in the chain is advanced independently. For example, if the cursor was Additionally, the test was too weak to catch it when running in isolation.
job-iteration/lib/job-iteration/iteration.rb Line 185 in b1f818e
Upd: We need to start from the current cursor in enumerators (not the next), e.g. - job-iteration/lib/job-iteration/csv_enumerator.rb Lines 60 to 62 in b1f818e
But then, the next time, the same record will be processed once again. Is that a problem? In ActiveJob/Sidekiq each job can already be processed multiple times because of errors etc., so people are already encouraged to write idempotent code in jobs.
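To make the two resume strategies discussed above concrete, here is a minimal sketch using plain arrays. The helpers `resume_after` and `resume_at` are hypothetical names invented for this illustration and are not part of job-iteration; the cursor is assumed to be the index of the last item that was yielded.

```ruby
# Hypothetical helpers for illustration only; not job-iteration API.

# Resume AFTER the stored cursor: the item at the cursor is not yielded again.
def resume_after(items, cursor)
  start = cursor.nil? ? 0 : cursor + 1
  items.each_with_index.drop(start)
end

# Resume AT the stored cursor: the item at the cursor is yielded a second time,
# so the job body has to be idempotent.
def resume_at(items, cursor)
  start = cursor.nil? ? 0 : cursor
  items.each_with_index.drop(start)
end

items = ["a", "b", "c"]

after_cursor = resume_after(items, 1).map { |value, _index| value }
at_cursor = resume_at(items, 1).map { |value, _index| value }

puts after_cursor.inspect
puts at_cursor.inspect
```

With a stored cursor of 1, the first strategy continues with "c" only, while the second processes "b" again before "c". That repeat is the trade-off the comment above is asking about.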
    comments_count.times.map { |n| { content: "#{product.name} comment ##{n}", product_id: product.id } }
  end.flatten

  Comment.insert_all!(comments)
end

def truncate_fixtures
I think you want to change the line below to ActiveRecord::Base.connection.truncate(Product.table_name, Comment.table_name) to clean up this new model as well.
Is it different than how it is today? That doesn't seem a problem to me in that case. I had a bit of trouble parsing this and the test cases, so I played with these to make sure I understand it correctly (and please correct me if I'm still wrong):
test "array cursor can be used to resume" do
  enumerator_builder = EnumeratorBuilder.new(mock)
  integers = [0, 1, 2]
  enum = enumerator_builder.array(integers, cursor: 0)
  assert_equal [1, 2], enum.map { |value, _cursor| value }
end
test "nested array cursor can be used to resume" do
  enumerator_builder = EnumeratorBuilder.new(mock)
  integers = [0, 1, 2]
  strings = [["0a", "0b", "0c"], ["1a", "1b", "1c"], ["2a", "2b", "2c"]]
  enum = enumerator_builder.nested(
    [
      ->(cursor) { enumerator_builder.array(integers, cursor: cursor) },
      ->(integer, cursor) { enumerator_builder.array(strings[integer], cursor: cursor) },
    ],
    cursor: [0, 0]
  )
  assert_equal ["1b", "1c", "2a", "2b", "2c"], enum.map { |value, _cursor| value }
end
The second test is correctly specified, but currently fails with:
|
Yes. Currently ( The first test case has Second test case: since we are starting from Not sure if it is clearer now. Feel free to ask, and I will try to rephrase and improve my explanation.
enum.each do |item, cursor_value|
  if index == @cursor.size - 1
    yield item, current_cursor + [cursor_value]
I've been stepping through the debugger a few times and what stood out to me is cursor = @cursor[index]: the resumed cursor value stays 'stuck' for subsequent iterations, but when not resuming it is initialized as an array of nils.
In my nested array example this means skipping over "2a" but also "3a" if you extend the pattern as such:
integers = [0, 1, 2, 3]
strings = [["0a", "0b", "0c"], ["1a", "1b", "1c"], ["2a", "2b", "2c"], ["3a", "3b", "3c"]]
This seems to do the trick unless I'm missing something (it's getting late here)?
yield item, current_cursor + [cursor_value]
@cursor[index] = nil
No need to do this inside each; it would work under cursor = @cursor[index] as well.
This will fix the mentioned issue.
But for a cursor like [0, 1], it will start from [1, something], because when the iteration is resumed, it starts from the cursor + 1. So 0b and 0c will be incorrectly skipped.
I would say that this PR is correctly implemented, but works incorrectly, because as I mentioned previously, we need to resume from the same cursor value each time (without +1).
A separate PR with changes like fatkodima/sidekiq-iteration@5d3fe18 needs to be made for this PR to start working and stop skipping values.
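One reading of the concern above, sketched with plain arrays: if the cursor yielded for each item is built from the index at every level (as in `current_cursor + [cursor_value]`) and each level then resumes after its own cursor value, the rest of the current group is lost. `array_enum` and `nested` are hypothetical stand-ins written for this sketch and are not the PR's code.

```ruby
GROUPS = [["0a", "0b", "0c"], ["1a", "1b", "1c"]].freeze

# Hypothetical stand-in for an array enumerator with exclusive resume:
# starts after `cursor`, or from the beginning when `cursor` is nil.
def array_enum(items, cursor)
  items.each_with_index.drop(cursor.nil? ? 0 : cursor + 1)
end

# Collects [item, cursor] pairs, where the cursor is the index at each level.
def nested(groups, cursor)
  outer_cursor, inner_cursor = cursor
  pairs = []
  array_enum(groups, outer_cursor).each do |group, outer_index|
    array_enum(group, inner_cursor).each do |item, inner_index|
      pairs << [item, [outer_index, inner_index]]
    end
    inner_cursor = nil # the stored inner cursor applies to the first group only
  end
  pairs
end

first_run = nested(GROUPS, [nil, nil])
interrupted_item, saved_cursor = first_run[1] # pretend the job stops after "0b"
resumed = nested(GROUPS, saved_cursor).map(&:first)

puts "saved cursor after #{interrupted_item}: #{saved_cursor.inspect}"
puts "resumed with: #{resumed.inspect}"
```

In this sketch the saved cursor is [0, 1], so the outer level resumes after group 0 and "0c" is never processed. Under these assumptions, either the stored outer value has to lag one behind, or the enumerators have to resume at the cursor rather than after it.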
I guess what I am not grasping is why the cursor logic throughout this library needs to change to make nesting work. AFAIK it's nil to start at the beginning of a collection, and a non-nil cursor to resume after. With the generic way nested enumerators are set up I don't see why this should change? The individual enumerators shouldn't need to care if it's nested or not, the nested enumerator should feed the correct value to them and handle the nesting level (index).
To whiteboard the possible cursor values for ["0a", "0b", "0c"] and the item processed in each_iteration:
- cursor: nil, item: "0a"
- cursor: 0, item: "0b"
- cursor: 1, item: "0c"
and nested with [["0a", "0b", "0c"], ["1a", "1b", "1c"], ["2a", "2b", "2c"]]:
- cursor: [nil, nil], item: "0a"
- cursor: [nil, 0], item: "0b"
- cursor: [nil, 1], item: "0c"
- cursor: [0, nil], item: "1a"
- cursor: [0, 0], item: "1b"
- cursor: [0, 1], item: "1c"
- cursor: [1, nil], item: "2a"
etc. It seems to me resetting the right cursor index to nil at the correct time should make the above possible and make it work without changes to the rest of the library.
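The whiteboard table above can be checked with a small plain-Ruby sketch. It assumes each level resumes after its cursor value and that the inner cursor is reset to nil once the first resumed group is done. `array_enum` and `nested_items` are hypothetical names for this illustration, not library methods.

```ruby
# Hypothetical stand-in for an array enumerator: yields [item, index] pairs,
# starting after `cursor` (nil means start from the beginning).
def array_enum(items, cursor)
  items.each_with_index.drop(cursor.nil? ? 0 : cursor + 1)
end

# Returns the items that would be processed when resuming from `cursor`.
def nested_items(groups, cursor)
  outer_cursor, inner_cursor = cursor
  items = []
  array_enum(groups, outer_cursor).each do |group, _outer_index|
    array_enum(group, inner_cursor).each { |item, _inner_index| items << item }
    inner_cursor = nil # reset so later groups start from their beginning
  end
  items
end

groups = [["0a", "0b", "0c"], ["1a", "1b", "1c"], ["2a", "2b", "2c"]]

# The table from the comment: cursor => next item to be processed.
whiteboard = {
  [nil, nil] => "0a",
  [nil, 0] => "0b",
  [nil, 1] => "0c",
  [0, nil] => "1a",
  [0, 0] => "1b",
  [0, 1] => "1c",
  [1, nil] => "2a",
}

whiteboard.each do |cursor, expected_item|
  actual_item = nested_items(groups, cursor).first
  puts "#{cursor.inspect} -> #{actual_item} (expected #{expected_item})"
end
```

Note that in this scheme the outer value is the index of the last fully completed group, not the index of the group currently being processed, which differs from a cursor built directly from the per-level indexes.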
Having debugged some more, I initially thought it would also be possible to refactor out current_cursor in the iterate method, but it is needed to serialize the cursor, and inspecting that I am beginning to see what you described :) Inside JobIteration::Iteration#iterate_with_enumerator, for value object_from_enumerator="1a", index=[1, 0], and that would be serialized as cursor_position. A more integration-style test asserting that pausing and resuming a nested job will cover all the records would be nice. Will play with this a bit more!
Either way - thanks for taking the time to work on this and answering my questions 🙇‍♂️
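As a rough idea of what such a pause-and-resume coverage check could look like, here is a simulation with plain arrays rather than a real job. It assumes the serialized cursor is the per-level index of the last processed item (so "1a" maps to [1, 0]) and that enumerators resume at the cursor, reprocessing that item. All helper names are hypothetical.

```ruby
# Hypothetical stand-in for an array enumerator with inclusive resume:
# starts AT `cursor`, or from the beginning when `cursor` is nil.
def array_enum(items, cursor)
  items.each_with_index.drop(cursor || 0)
end

# Collects [item, cursor] pairs, where the cursor is the index at each level.
def nested(groups, cursor)
  outer_cursor, inner_cursor = cursor || [nil, nil]
  pairs = []
  array_enum(groups, outer_cursor).each do |group, outer_index|
    array_enum(group, inner_cursor).each do |item, inner_index|
      pairs << [item, [outer_index, inner_index]]
    end
    inner_cursor = nil # only the first resumed group uses the stored inner cursor
  end
  pairs
end

groups = [["0a", "0b", "0c"], ["1a", "1b", "1c"], ["2a", "2b", "2c"]]
all_items = groups.flatten
full_run = nested(groups, nil)

# Pause after every possible item, resume from the saved cursor,
# and check that nothing is missed.
full_run.each_with_index do |(_item, cursor), position|
  processed = full_run.first(position + 1).map(&:first)
  resumed = nested(groups, cursor).map(&:first)
  covered = (processed + resumed).uniq
  raise "missed records when pausing at #{cursor.inspect}" unless covered == all_items
end

puts "every pause point covers all #{all_items.size} records"
```

Because the item at the saved cursor is processed twice in this scheme, the check deduplicates before comparing. A real integration test would instead interrupt and re-enqueue an actual job.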
I'll try to get a PR up in the next few days.
Looking forward to it 👍
Fixes #63.
This is a generalized version of #100, as suggested in #63 (comment).
Example usage:
cc @bdewater