fix: synchronize cloud data client operations properly #204

Merged
merged 4 commits into from
May 30, 2024

saranyailla
Member

saranyailla commented May 21, 2024

Issue #, if available:

Description of changes:
Run MQTT callbacks in a separate thread to avoid a deadlock that occurs when the Shadow manager component enters the RUNNING state (according to Greengrass) before the MQTT client connection has been successfully created.

The MQTT connect future is completed with the client only after the first onConnect callbacks have been triggered. The Shadow manager onConnect callback needs the client to be fully formed (that is, the connect future completed with the MQTT client) before it can subscribe with it. As a result, the subscriptions triggered from the callback time out waiting for the client.
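The circular wait can be reproduced in a few lines. This is a minimal sketch, not the Shadow manager or Greengrass code: FakeMqttClient and DeadlockDemo are hypothetical names, and the callback uses a bounded get() so the demo times out instead of hanging. It assumes only the ordering stated above, that callbacks run synchronously before connectFuture is completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an MQTT client wrapper (not a Greengrass API).
class FakeMqttClient {
    final CompletableFuture<FakeMqttClient> connectFuture = new CompletableFuture<>();
    final List<Runnable> onConnectCallbacks = new ArrayList<>();

    // Ordering from the PR description: callbacks run first, on the
    // connecting thread, and only then is connectFuture completed.
    void connect() {
        for (Runnable callback : onConnectCallbacks) {
            callback.run();
        }
        connectFuture.complete(this);
    }
}

class DeadlockDemo {
    // Returns true when the callback gives up waiting for the client.
    static boolean callbackTimesOut() {
        FakeMqttClient client = new FakeMqttClient();
        AtomicBoolean timedOut = new AtomicBoolean(false);
        client.onConnectCallbacks.add(() -> {
            try {
                // The callback needs the fully formed client to subscribe, but
                // connectFuture cannot complete until this callback returns.
                client.connectFuture.get(200, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut.set(true);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });
        client.connect();
        return timedOut.get();
    }
}
```

Without the timeout, the callback would wait forever on a future that only its own caller can complete.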

During Shadow manager (SM) startup, startSyncingShadows calls updateSubscriptions on the cloudDataClient. That submits a task to the executor service pool, which runs the private synchronized updateSubscriptions method on the cloudDataClient. This task runs indefinitely because the MQTT subscribe operation never succeeds.
Meanwhile, the MQTT callback thread is blocked at updateSubscriptions in startSyncingShadows, because that method is also synchronized on the cloudDataClient instance, and two synchronized methods cannot run concurrently on the same instance.
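The monitor contention between the two updateSubscriptions paths can be sketched as follows. MonitorDemoClient and MonitorDemo are hypothetical stand-ins for cloudDataClient and its caller; the retry loop simulates a subscribe that never succeeds. Sleeping inside a synchronized method keeps the instance monitor, so the second synchronized call stays BLOCKED until the loop exits.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for cloudDataClient; method bodies are simplified.
class MonitorDemoClient {
    private volatile boolean subscribeSucceeds = false;

    // Background task: holds the instance monitor while subscribe keeps failing.
    synchronized void retryUpdateSubscriptions(CountDownLatch lockHeld) throws InterruptedException {
        lockHeld.countDown();
        while (!subscribeSucceeds) {
            TimeUnit.MILLISECONDS.sleep(10); // sleeping does not release the monitor
        }
    }

    // Callback path: another synchronized method on the same instance.
    synchronized void updateSubscriptionsFromCallback() {
        // Unreachable while retryUpdateSubscriptions holds the monitor.
    }

    // Not synchronized, so it can still be called to end the retry loop.
    void letSubscribeSucceed() {
        subscribeSucceeds = true;
    }
}

class MonitorDemo {
    // Returns the callback thread's state while the retry loop holds the monitor.
    static Thread.State callbackThreadState() throws InterruptedException {
        MonitorDemoClient client = new MonitorDemoClient();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread retry = new Thread(() -> {
            try {
                client.retryUpdateSubscriptions(lockHeld);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        retry.start();
        lockHeld.await(); // the retry thread now owns the monitor

        Thread callback = new Thread(client::updateSubscriptionsFromCallback);
        callback.start();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        while (callback.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            TimeUnit.MILLISECONDS.sleep(10);
        }
        Thread.State state = callback.getState();

        client.letSubscribeSucceed(); // unblock both threads so the demo exits
        retry.join();
        callback.join();
        return state;
    }
}
```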

Why is this change necessary:
More info:
When the MQTT client is created for the first time, the one-time onConnect callbacks run before the connectFuture is completed with the client. The connectFuture is completed only after these callbacks finish.

However, when the Shadow manager component enters the RUNNING state before the MQTT client connection has been successfully created for the first time, the onConnectionResumed callback is triggered when the MQTT client is first created. This callback subscribes to topics using the MQTT client. To subscribe, it needs the connectFuture to be completed, but the connectFuture cannot complete until the callback returns, resulting in a deadlock.

The fix is to run the callback in a separate thread, so that the connectFuture can be completed without being blocked by the callback.
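A minimal sketch of the fix as described, assuming callbacks are handed to an ExecutorService instead of running on the connecting thread. AsyncCallbackClient and AsyncFixDemo are hypothetical names; the merged change (later retitled to synchronizing cloud data client operations) may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical client that dispatches onConnect callbacks to an executor.
class AsyncCallbackClient {
    final CompletableFuture<AsyncCallbackClient> connectFuture = new CompletableFuture<>();
    final List<Runnable> onConnectCallbacks = new ArrayList<>();
    private final ExecutorService callbackExecutor;

    AsyncCallbackClient(ExecutorService callbackExecutor) {
        this.callbackExecutor = callbackExecutor;
    }

    // Callbacks are submitted, not run inline, so the connecting thread
    // reaches connectFuture.complete() without waiting on them.
    void connect() {
        for (Runnable callback : onConnectCallbacks) {
            callbackExecutor.submit(callback);
        }
        connectFuture.complete(this);
    }
}

class AsyncFixDemo {
    // Returns true when the callback obtains the fully formed client.
    static boolean callbackGetsClient() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            AsyncCallbackClient client = new AsyncCallbackClient(executor);
            CompletableFuture<Boolean> subscribed = new CompletableFuture<>();
            client.onConnectCallbacks.add(() -> {
                try {
                    // Same wait as in the deadlock case, but connectFuture
                    // can now complete on the connecting thread meanwhile.
                    AsyncCallbackClient ready = client.connectFuture.get(2, TimeUnit.SECONDS);
                    subscribed.complete(ready == client);
                } catch (Exception e) {
                    subscribed.completeExceptionally(e);
                }
            });
            client.connect();
            return subscribed.get(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The trade-off, raised in review, is an extra thread hop for every callback.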

How was this change tested:

Any additional information or context required to review the change:

Checklist:

  • Updated the README if applicable
  • Updated or added new unit tests
  • Updated or added new integration tests
  • If your code makes a remote network call, it was tested with a proxy

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jcosentino11
Member

Can we accomplish this without adding more threads? In startSyncingShadows, we can check whether the client is connected in a non-blocking way with mqttClient.getMqttOnline().get(). For stopSyncingShadows, maybe we can find a way to avoid calling waitForSyncEnd() in this case.
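The suggested non-blocking check could look roughly like this. It assumes getMqttOnline() returns a java.util.concurrent.atomic.AtomicBoolean, which is why get() does not block; OnlineAwareMqttClient and SyncStarter are hypothetical, and the counter stands in for the real updateSubscriptions call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in: assumes getMqttOnline() exposes an AtomicBoolean.
class OnlineAwareMqttClient {
    private final AtomicBoolean mqttOnline = new AtomicBoolean(false);

    AtomicBoolean getMqttOnline() {
        return mqttOnline;
    }
}

class SyncStarter {
    private final OnlineAwareMqttClient mqttClient;
    int subscriptionAttempts = 0;

    SyncStarter(OnlineAwareMqttClient mqttClient) {
        this.mqttClient = mqttClient;
    }

    // Returns immediately when offline instead of starting a subscribe that
    // would hold the client's monitor; a later connection event can retry.
    boolean startSyncingShadows() {
        if (!mqttClient.getMqttOnline().get()) {
            return false;
        }
        subscriptionAttempts++; // placeholder for updateSubscriptions()
        return true;
    }
}
```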

github-actions bot commented May 21, 2024

Unit Tests Coverage Report

| File | Coverage | Lines | Branches |
| --- | --- | --- | --- |
| All files | 83% | 88% | 78% |

Minimum allowed coverage is 65%

Generated by 🐒 cobertura-action against e5f94b4

github-actions bot commented May 21, 2024

Integration Tests Coverage Report

| File | Coverage | Lines | Branches |
| --- | --- | --- | --- |
| All files | 72% | 76% | 69% |

Minimum allowed coverage is 45%

Generated by 🐒 cobertura-action against e5f94b4

saranyailla force-pushed the unblock-mqtt branch 2 times, most recently from 36e0338 to 0c3db80 on May 24, 2024 03:21
saranyailla force-pushed the unblock-mqtt branch 4 times, most recently from 7f67acf to da7df14 on May 30, 2024 05:19
saranyailla changed the title from "fix: run mqtt callbacks in a separate thread" to "fix: properly synchronize cloud data client operations" on May 30, 2024
saranyailla changed the title from "fix: properly synchronize cloud data client operations" to "fix: synchronize cloud data client operations properly" on May 30, 2024
Member

jcosentino11 left a comment

lgtm, just some small cleanup suggestions

saranyailla force-pushed the unblock-mqtt branch 2 times, most recently from 5e60507 to 45b3a6d on May 30, 2024 22:23
@@ -290,6 +290,7 @@ void GIVEN_100_synced_shadows_WHEN_unsubscribeForAllShadowsTopics_THEN_unsubscri
}

cloudDataClient.unsubscribeForAllShadowsTopics();
TimeUnit.MILLISECONDS.sleep(5000);
Member

what's this for?

Member Author

With the current changes, cloudDataClient.unsubscribeForAllShadowsTopics() runs in a separate thread, so the test has to give it time to finish.

Member

Yeah, ideally we'd have a way of waiting via a latch or something.
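One way to replace the fixed five-second sleep is a CountDownLatch that the asynchronous work counts down when it finishes. This sketch assumes the code under test can signal completion; AsyncUnsubscriber and the latch parameter on its unsubscribeForAllShadowsTopics method are hypothetical, not the real CloudDataClient signature.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client whose unsubscribe runs asynchronously.
class AsyncUnsubscriber {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    volatile int subscribedTopics;

    AsyncUnsubscriber(int subscribedTopics) {
        this.subscribedTopics = subscribedTopics;
    }

    // The caller passes a latch that is counted down once the work finishes.
    void unsubscribeForAllShadowsTopics(CountDownLatch done) {
        executor.submit(() -> {
            subscribedTopics = 0;
            done.countDown();
        });
    }

    void shutdown() {
        executor.shutdownNow();
    }
}

class LatchWaitDemo {
    // Returns the topic count observed after the asynchronous unsubscribe completes.
    static int topicsAfterUnsubscribe() throws InterruptedException {
        AsyncUnsubscriber client = new AsyncUnsubscriber(100);
        CountDownLatch done = new CountDownLatch(1);
        client.unsubscribeForAllShadowsTopics(done);
        try {
            // Wait only as long as needed, bounded, instead of a fixed sleep.
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("unsubscribe did not finish in time");
            }
            return client.subscribedTopics;
        } finally {
            client.shutdown();
        }
    }
}
```

The await timeout keeps a broken implementation from hanging the test, while a passing run returns as soon as the work is done.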

saranyailla merged commit 132bf74 into main on May 30, 2024
5 of 6 checks passed
saranyailla deleted the unblock-mqtt branch on May 30, 2024 23:12
Labels
None yet
Projects
None yet
4 participants