DAOS-16702 rebuild: restart rebuild for a massive failure case #15406

Open
wants to merge 1 commit into
base: release/2.6

Commits on Nov 5, 2024

  1. DAOS-16702 rebuild: restart rebuild for a massive failure case

    In a special massive-failure case:
    1. Some engines go down and trigger a rebuild.
    2. One engine that is participating in the rebuild also goes down before
       the rebuild finishes. The number of failures now exceeds the pool RF,
       so the pool map is not changed.
    3. That engine is restarted by the administrator.
    
    In that case the rebuild task on the restarted engine should be recovered.
    To keep it simple for now, just abort and retry the global rebuild task.
    The issue does not occur with the typical recovery approach of restarting
    the whole system, including the PS leader.
    
    Another backport commit -
    947c76d DAOS-16175 container: fix a case for cont_iv_hdl_fetch (#15395)
    
    Skip-nlt: true
    
    Signed-off-by: Xuezhao Liu <[email protected]>
    liuxuezhao committed Nov 5, 2024
    9643482
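
The decision described in the commit message can be sketched as a small model. This is only an illustration of the "abort and retry the global rebuild task" idea under stated assumptions, not the DAOS implementation: `RebuildTask`, `Action`, `on_engine_restart`, `retry`, and the `generation` counter are hypothetical names invented for this sketch, and the real change is made in the DAOS rebuild code in C.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    NONE = "none"                      # nothing to do for this restart
    ABORT_AND_RETRY = "abort_retry"    # abort the global rebuild task, then retry it


@dataclass
class RebuildTask:
    """Hypothetical model of one global rebuild task tracked by the PS leader."""
    participants: Set[int]   # engine ranks taking part in the rebuild
    done: bool = False       # True once the global rebuild has finished
    generation: int = 0      # assumption: bumped on every retry


def on_engine_restart(task: Optional[RebuildTask], rank: int) -> Action:
    """Decide what the leader does when engine `rank` is restarted.

    Assumption: the pool map did not change when the engine went down
    (failures exceeded the pool RF), so no new rebuild was scheduled.
    """
    if task is None or task.done:
        return Action.NONE             # no rebuild in flight
    if rank not in task.participants:
        return Action.NONE             # the engine was not part of this rebuild
    # The engine's local rebuild task would have to be recovered; the
    # simplification is to abort and retry the whole global task instead.
    return Action.ABORT_AND_RETRY


def retry(task: RebuildTask) -> RebuildTask:
    """Return a fresh task with the same participants and a new generation."""
    return RebuildTask(participants=set(task.participants),
                       generation=task.generation + 1)
```

For example, with a rebuild in flight across ranks 1, 2 and 3, a restart of rank 2 yields `Action.ABORT_AND_RETRY`, while a restart of a rank outside the rebuild, or any restart after the rebuild is done, yields `Action.NONE`.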