
unsubscribe_after forgotten on reconnection #1302

Open
scottlamb opened this issue Aug 20, 2024 · 3 comments
Labels
defect Suspected defect such as a bug or regression



scottlamb commented Aug 20, 2024

Observed behavior

In the following sequence:

  1. subscribe
  2. unsubscribe_after n
  3. reconnect
  4. reach n messages
  5. additional messages published on topic

...in step 3, async-nats re-subscribes on the new connection without re-sending an UNSUB {sid} {max} (where max = prev_max - delivered).

...and in step 4, async-nats appears to remove the subscription info from ConnectionHandler::subscriptions here, but does not actually send an UNSUB message either.

...so in step 5, the server still sends messages on that sid for this topic, even though they go nowhere client-side. This is a waste of bandwidth that could add up in some use cases.
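The wasted delivery is easiest to see from the server's side. Below is a minimal, illustrative model of the server's per-connection subscription state (my own sketch, not real nats-server code): if the client replays only SUB on reconnect, the server treats the subscription as unlimited and keeps delivering past the old limit, whereas replaying UNSUB restores the cap.

```rust
use std::collections::HashMap;

/// Illustrative model of one connection's subscriptions on the server.
/// sid -> remaining message budget (None = unlimited).
#[derive(Default)]
struct ServerConn {
    subs: HashMap<u64, Option<u64>>,
}

impl ServerConn {
    fn sub(&mut self, sid: u64) {
        // A plain SUB has no limit.
        self.subs.insert(sid, None);
    }

    fn unsub(&mut self, sid: u64, max: u64) {
        if max == 0 {
            self.subs.remove(&sid);
        } else {
            self.subs.insert(sid, Some(max));
        }
    }

    /// Returns true if a published message is forwarded on this sid.
    fn deliver(&mut self, sid: u64) -> bool {
        let exhausted = match self.subs.get_mut(&sid) {
            None => return false,
            Some(None) => false,
            Some(Some(n)) => {
                *n -= 1;
                *n == 0
            }
        };
        if exhausted {
            self.subs.remove(&sid);
        }
        true
    }
}
```

With the UNSUB replayed, delivery stops at the limit; with only SUB replayed (the reported behavior), delivery never stops.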

Expected behavior

In step 3, I think it should send the UNSUB {sid} {max} immediately to avoid this.
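A minimal sketch of what that re-send could look like, assuming the client tracks a per-subscription max and a delivered count (the Sub struct and resubscribe_lines helper here are hypothetical names, not the actual async-nats internals):

```rust
/// Simplified per-subscription state the client might track.
struct Sub {
    sid: u64,
    subject: String,
    max: Option<u64>, // set by unsubscribe_after
    delivered: u64,   // messages delivered so far
}

/// Wire lines to replay for one subscription on reconnect.
fn resubscribe_lines(sub: &Sub) -> Vec<String> {
    let mut lines = vec![format!("SUB {} {}", sub.subject, sub.sid)];
    if let Some(max) = sub.max {
        // The server's delivery count restarts at zero on the new
        // connection, so replay UNSUB with the remaining budget.
        let remaining = max.saturating_sub(sub.delivered);
        lines.push(format!("UNSUB {} {}", sub.sid, remaining));
    }
    lines
}
```

For example, a subscription with max = 5 that has already delivered 2 messages would replay `SUB <subject> <sid>` followed by `UNSUB <sid> 3`.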

Server and client version

Rust async nats client, current head (053944d)

Host environment

any

Steps to reproduce

I hacked the force_reconnect test to observe this:

  • call unsub_after pre-reconnect
  • exceed this limit post-reconnect
  • just to ensure the test doesn't quit before the spurious messages are received, make a new subscription on a different topic, publish something on it, and wait for it to go through.
  • add an else if branch here to actually see the spurious messages

I'd be happy to put together a fix. The debatable point is how to automatically test that the fix is correct. I believe it's normal to get messages on unknown sids for a bit after enqueuing an UNSUB, because there's a delay before the server actually handles it, and it could have enqueued another message on the topic in the meantime. So this isn't worth raising an error over in general, or likely even enqueueing something on events_tx. But we can arrange the sequencing in a test so that these messages shouldn't happen if the client is behaving properly. Maybe there are connection stats somewhere we can add a counter to for this, and the test can check those after the fact? I'm open to ideas.

@scottlamb scottlamb added the defect Suspected defect such as a bug or regression label Aug 20, 2024
@Jarema Jarema self-assigned this Aug 21, 2024

Jarema commented Aug 21, 2024

Hey!

Thanks for the detailed description.
What we could do here is detect that a given subscription should be unsubscribed after a given number of messages, and send UNSUB immediately after SUB on reconnect.

That way, there is a high chance that both protocol messages will arrive at the server at the same time.
Of course some messages can be lost in the process if the connection was abruptly shut down, but that is ok.

To test it, I would check whether simply calling force_reconnect is enough to get what we need.
The point of the test is to check that the total number of messages received is correct.

@scottlamb

> To test it, I would check whether simply calling force_reconnect is enough to get what we need. The point of the test is to check that the total number of messages received is correct.

So the Subscription produces the correct results, as it's dropped from the ConnectionHandler::subscriptions map. The problem is wasted bandwidth/CPU behind the scenes. My question is how can we write a test that fails now and succeeds post-fix? Having a counter that clients can see was my first idea, and I'm open to others.
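For illustration, the counter idea could look roughly like this: a stats handle shared between the connection handler and the test (ConnectionStats, unknown_sid_messages, and on_message are hypothetical names for this sketch, not the actual async-nats code).

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stats handle, shared between the connection handler
/// and whoever wants to inspect it (e.g. a test).
#[derive(Default)]
struct ConnectionStats {
    /// Messages received for a sid the client no longer tracks.
    unknown_sid_messages: AtomicU64,
}

struct ConnectionHandler {
    stats: Arc<ConnectionStats>,
    known_sids: HashSet<u64>,
}

impl ConnectionHandler {
    fn on_message(&self, sid: u64) {
        if !self.known_sids.contains(&sid) {
            // Not an error: UNSUB is asynchronous, so a few stragglers
            // are normal. But a test that sequences things carefully can
            // still assert this counter stays at zero after the fix.
            self.stats
                .unknown_sid_messages
                .fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

A test would hold a clone of the Arc, drive the reconnect sequence, and check the counter afterwards.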


Jarema commented Aug 26, 2024

I would enqueue a fresh UNSUB immediately after SUB on reconnect.
This should be safe and not trigger more messages than expected, as both should arrive at the server at the same time (flushed together).

For testing, I have not found a good way so far.
And I'm not a fan of exposing internal counters just to enable a test case.
