Optimize heartbeat timeout judgement if worker already succeeded. #1289

@BalaBalaYi BalaBalaYi (Collaborator) commented Oct 11, 2024

What changes were proposed in this pull request?

  1. Implement a 'succeeded' report.
  2. Add a 'succeeded' flag to the Node object.
  3. Skip nodes flagged as succeeded in the 'noheartbeat' judgement.
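
The three changes above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `Node` and `JobManager` classes, their fields, and the method names `report_succeeded` and `find_heartbeat_timeout_nodes` are hypothetical stand-ins, and the 300-second default timeout is an assumed value.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for the master's per-node record."""

    node_id: int
    heartbeat_time: float = 0.0  # last heartbeat timestamp; 0 means none received yet
    succeeded: bool = False  # set once the agent reports a successful exit


class JobManager:
    """Hypothetical manager that tracks heartbeats and success reports."""

    def __init__(self, heartbeat_timeout: float = 300.0) -> None:
        # Assumed default; the real timeout value is not stated in the PR.
        self.heartbeat_timeout = heartbeat_timeout
        self.nodes: Dict[int, Node] = {}

    def add_node(self, node_id: int) -> None:
        self.nodes[node_id] = Node(node_id=node_id)

    def report_heartbeat(self, node_id: int, now: Optional[float] = None) -> None:
        self.nodes[node_id].heartbeat_time = time.time() if now is None else now

    def report_succeeded(self, node_id: int) -> None:
        # Steps 1 and 2: the agent reports success and the node is flagged.
        self.nodes[node_id].succeeded = True

    def find_heartbeat_timeout_nodes(self, now: Optional[float] = None) -> List[int]:
        now = time.time() if now is None else now
        timed_out = []
        for node in self.nodes.values():
            # Step 3: a node that already succeeded stops sending heartbeats
            # by design, so it must not be treated as timed out.
            if node.succeeded:
                continue
            if node.heartbeat_time > 0 and now - node.heartbeat_time > self.heartbeat_timeout:
                timed_out.append(node.node_id)
        return timed_out
```

In this sketch, a node that reported success before going silent is left out of the timeout list, while a silent node that never reported success is still flagged.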

Why are the changes needed?

A heartbeat timeout should not be triggered if the agent has already exited successfully.

Does this PR introduce any user-facing change?

Specify whether this pull request introduces any changes that users will directly interact with or notice.

How was this patch tested?

Unit tests and a training job.

@BalaBalaYi BalaBalaYi added the enhancement (New feature or request) label Oct 11, 2024
@BalaBalaYi BalaBalaYi changed the title Optimize heartbeat timeout judgement if worker already succeeded. [WIP] Optimize heartbeat timeout judgement if worker already succeeded. Oct 11, 2024
codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 96.15385% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.34%. Comparing base (cfbc24b) to head (8f95c81).
Report is 1 commits behind head on master.

Files with missing lines                    Patch %   Lines
dlrover/python/tests/test_job_manager.py    90.47%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1289      +/-   ##
==========================================
+ Coverage   80.30%   80.34%   +0.04%     
==========================================
  Files         222      222              
  Lines       20429    20481      +52     
==========================================
+ Hits        16405    16456      +51     
- Misses       4024     4025       +1     

@BalaBalaYi BalaBalaYi changed the title [WIP] Optimize heartbeat timeout judgement if worker already succeeded. Optimize heartbeat timeout judgement if worker already succeeded. Oct 14, 2024
@BalaBalaYi BalaBalaYi merged commit 0ef290a into intelligent-machine-learning:master Oct 14, 2024
13 checks passed