
unsubscribe_after forgotten on reconnection #1302

Open
scottlamb opened this issue Aug 20, 2024 · 3 comments
Labels
defect Suspected defect such as a bug or regression



scottlamb commented Aug 20, 2024

Observed behavior

In the following sequence:

  1. subscribe
  2. unsubscribe_after n
  3. reconnect
  4. reach n messages
  5. additional messages published on topic

...in step 3, async-nats re-subscribes on the new connection without re-sending an UNSUB {sid} {max} (where max = prev_max - delivered).

...and in step 4, async-nats appears to remove the subscription info from ConnectionHandler::subscriptions here, but does not actually send an UNSUB message either.

...so in step 5, the server still sends messages on that sid for this topic, even though they go nowhere client-side. This is a waste of bandwidth that could add up in some use cases.
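The wasted delivery is easiest to see from the server's side. Below is a minimal, illustrative model of the server's per-connection subscription state (my own sketch, not real nats-server code): if the client replays only SUB on reconnect, the server treats the subscription as unlimited and keeps delivering past the old limit, whereas replaying UNSUB restores the cap.

```rust
use std::collections::HashMap;

/// Illustrative model of one connection's subscriptions on the server.
/// sid -> remaining message budget (None = unlimited).
#[derive(Default)]
struct ServerConn {
    subs: HashMap<u64, Option<u64>>,
}

impl ServerConn {
    fn sub(&mut self, sid: u64) {
        // A plain SUB has no limit.
        self.subs.insert(sid, None);
    }

    fn unsub(&mut self, sid: u64, max: u64) {
        if max == 0 {
            self.subs.remove(&sid);
        } else {
            self.subs.insert(sid, Some(max));
        }
    }

    /// Returns true if a published message is forwarded on this sid.
    fn deliver(&mut self, sid: u64) -> bool {
        let exhausted = match self.subs.get_mut(&sid) {
            None => return false,
            Some(None) => false,
            Some(Some(n)) => {
                *n -= 1;
                *n == 0
            }
        };
        if exhausted {
            self.subs.remove(&sid);
        }
        true
    }
}
```

With the UNSUB replayed, delivery stops at the limit; with only SUB replayed (the reported behavior), delivery never stops.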

Expected behavior

In step 3, I think it should send the UNSUB {sid} {max} immediately to avoid this.
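A minimal sketch of what that re-send could look like, assuming the client tracks a per-subscription max and a delivered count (the Sub struct and resubscribe_lines helper here are hypothetical names, not the actual async-nats internals):

```rust
/// Simplified per-subscription state the client might track.
struct Sub {
    sid: u64,
    subject: String,
    max: Option<u64>, // set by unsubscribe_after
    delivered: u64,   // messages delivered so far
}

/// Wire lines to replay for one subscription on reconnect.
fn resubscribe_lines(sub: &Sub) -> Vec<String> {
    let mut lines = vec![format!("SUB {} {}", sub.subject, sub.sid)];
    if let Some(max) = sub.max {
        // The server's delivery count restarts at zero on the new
        // connection, so replay UNSUB with the remaining budget.
        let remaining = max.saturating_sub(sub.delivered);
        lines.push(format!("UNSUB {} {}", sub.sid, remaining));
    }
    lines
}
```

For example, a subscription with max = 5 that has already delivered 2 messages would replay `SUB <subject> <sid>` followed by `UNSUB <sid> 3`.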

Server and client version

Rust async nats client, current head (053944d)

Host environment

any

Steps to reproduce

I hacked the force_reconnect test to observe this:

  • call unsub_after pre-reconnect
  • exceed this limit post-reconnect
  • just to ensure the test doesn't quit before the spurious messages are received, make a new subscription on a different topic, publish something on it, and wait for it to go through.
  • add an else if branch here to actually see the spurious messages

I'd be happy to put together a fix. The debatable point is how to automatically test that the fix is correct. I believe it's normal to get messages on unknown sids for a bit after enqueuing an UNSUB, because there's a delay before the server actually handles it, and it could have enqueued another message on the topic in the meantime. So this isn't worth raising an error over in general, or likely even enqueueing something on events_tx. But we can arrange the sequencing in a test so that these messages shouldn't happen if the client is behaving properly. Maybe there are connection stats somewhere we can add a counter to for this, and the test can check those after the fact? I'm open to ideas.

@scottlamb scottlamb added the defect Suspected defect such as a bug or regression label Aug 20, 2024
@Jarema Jarema self-assigned this Aug 21, 2024

Jarema commented Aug 21, 2024

Hey!

Thanks for the detailed description.
What we could do here is detect that a given subscription should be unsubscribed after a given number of messages, and send UNSUB immediately after SUB on reconnect.

That way, there is a high chance that both protocol messages will arrive at the server at the same time.
Of course some messages can be lost in the process if the connection was abruptly shut down, but that is ok.

To test it, I would check whether simply calling force_reconnect is enough to get what we need.
The point of the test is to check that the total number of messages received is correct.

@scottlamb

> To test it, I would check whether simply calling force_reconnect is enough to get what we need. The point of the test is to check that the total number of messages received is correct.

So the Subscription produces the correct results, as it's dropped from the ConnectionHandler::subscriptions map. The problem is wasted bandwidth/CPU behind the scenes. My question is how can we write a test that fails now and succeeds post-fix? Having a counter that clients can see was my first idea, and I'm open to others.
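For illustration, the counter idea could look roughly like this: a stats handle shared between the connection handler and the test (ConnectionStats, unknown_sid_messages, and on_message are hypothetical names for this sketch, not the actual async-nats code).

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stats handle, shared between the connection handler
/// and whoever wants to inspect it (e.g. a test).
#[derive(Default)]
struct ConnectionStats {
    /// Messages received for a sid the client no longer tracks.
    unknown_sid_messages: AtomicU64,
}

struct ConnectionHandler {
    stats: Arc<ConnectionStats>,
    known_sids: HashSet<u64>,
}

impl ConnectionHandler {
    fn on_message(&self, sid: u64) {
        if !self.known_sids.contains(&sid) {
            // Not an error: UNSUB is asynchronous, so a few stragglers
            // are normal. But a test that sequences things carefully can
            // still assert this counter stays at zero after the fix.
            self.stats
                .unknown_sid_messages
                .fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

A test would hold a clone of the Arc, drive the reconnect sequence, and check the counter afterwards.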


Jarema commented Aug 26, 2024

I would enqueue a fresh UNSUB immediately after SUB on reconnect.
This should be safe and not trigger more messages than expected, as both should arrive at the server at the same time (flushed together).

For testing, I have not found a good way so far.
And I'm not a fan of exposing internal counters just to enable a test case.
