Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds migrations to restore currupted staking ledgers in Polkadot and Kusama #447

Conversation

gpestana
Copy link
Contributor

@gpestana gpestana commented Aug 26, 2024

Note: for more details on the corrupted ledgers issue and recovery steps check https://hackmd.io/m_h9DRutSZaUqCwM9tqZ3g?view.

This PR adds a migration in Polkadot and Kusama runtimes to recover the current corrupted ledgers in Polkadot and Kusama. A migration consists of:

  1. Call into pallet_staking::Pallet::<T>::restore_ledger for each of the "whitelisted" stashes as Root origin.
  2. Performs a check that ensures the restored ledger's stake does not overflow the current stash's free balance. If that's the case, force unstake the ledger. This check is currently missing in polkadot-sdk/pallet-staking (PR with patch here).

The reason to restore the corrupted ledgers as migrations implemented in the fellowship runtimes is twofold:

  1. The call to pallet_staking::Pallet::<T>::restore_ledger and check + force_unstake must be done atomically (thus a ledger can't be safely restored with a set of two distinct extrinsic calls, so it's not possible to use referenda to this fx).
  2. To speed up the whole process and avoid having to wait for 1. merge and releases of Patches Call::Staking.restore_ledger to ensure a restored ledger has enough free balance to cover staking locks paritytech/polkadot-sdk#5066 and 2. referenda to call into Call::restore_ledger for both Polkadot and Kusama.

Alternatively, we could add a new temporary pallet directly in the fellowship runtime which would expose an extrinsic to restore the ledgers and perform the extra missing check. See this PR as an example.


  • on-runtime-upgrade tests against Polkadot and Kusama
  • staking try-state checks passing after all migrations.

@bkchr
Copy link
Contributor

bkchr commented Aug 27, 2024

This could just be an on chain referendum that calls the functions?

@gpestana
Copy link
Contributor Author

This could just be an on chain referendum that calls the functions?

We only can tell if a ledger needs force unstake at the time of recovery. We need to do the restore, check and potentially unstake atomically. In the time between the referenda are open and executed, new stashes can get in the situation that need to be force unstaked.

@kianenigma
Copy link
Contributor

@bkchr'ss comment still holds: the flaw you are pointing out, the ledgers changing between the time of proposal and enactment, also applies to the hardcoded list of ledgers here.

From my understanding in previous talks, the goal of bringing this code here was that the original pallet_staking::Pallet::<T>::restore_ledger had some flaws, and then I suggested you to copy-paste the code here in a "temporary" pallet.

@gpestana
Copy link
Contributor Author

gpestana commented Aug 27, 2024

@bkchr'ss comment still holds: the flaw you are pointing out, the ledgers changing between the time of proposal and enactment, also applies to the hardcoded list of ledgers here.

So there's two things at play here:

  1. The hardcoded list of accounts are fixed and should not change (since we fixed the runtime code so that there's no new corrupted ledgers).
  2. From the set of hardcoded (and corrupted ledgers), some may need to be unstaked. At this point, we know that one account needs to be unstaked in Polkadot, but it may change in the future (if the stash changes its free balance so that free_balance < restored stake of the ledger).

@kianenigma
Copy link
Contributor

I see, you are right!

I got a bit confused because I didn't look at the code and just checked the comments here.

Then I would rephrase the asnwer to @bkchr as: The logic here is a bit more than just what is available in a pallet-staking Call, therefore it has be a migration, or a temp pallet etc.

Indeed a migration seems cleaner than my original suggestion of a temp pallet, good choice!

@bkchr
Copy link
Contributor

bkchr commented Aug 27, 2024

Then I would rephrase the asnwer to @bkchr as: The logic here is a bit more than just what is available in a pallet-staking Call, therefore it has be a migration, or a temp pallet etc.

Bkchr says that he didn't know that the calls need to happen atomically. Then it makes sense to have them as a migration.

@mrshiposha
Copy link
Contributor

Hi @gpestana.

Yesterday, we tried to stake DOT via a multisig account using batchAll([staking.bond, staking.nominate]).
We got staking.BadState. I thought it was some issue with providers/consumers, but they seem OK on our account (providers = 1).

Do you think this issue is related to the one this PR fixes, or is it a separate one?

Note: our account isn't part of the list from this PR.

@gpestana
Copy link
Contributor Author

gpestana commented Sep 5, 2024

@mrshiposha a staking.BadState error is related to this PR yes. Do you want to reach out to me on Matrix so we can check the state of your account together? My handle is @gpestana:parity.io

Copy link
Contributor

@kianenigma kianenigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except adding the migration sturcts to the migration tuple + reporting on a dry-run thereof.

@gpestana
Copy link
Contributor Author

gpestana commented Sep 9, 2024

Just for the record, the issue that @mrshiposha has raised is not related to corrupted ledgers. The BadState error was raised due to trying to bond with a unfunded account as stash.

related to: paritytech/polkadot-sdk#5627

relay/kusama/src/lib.rs Outdated Show resolved Hide resolved
relay/kusama/src/lib.rs Show resolved Hide resolved
relay/kusama/src/lib.rs Outdated Show resolved Hide resolved
@Juanma0x
Copy link

Juanma0x commented Oct 2, 2024

What else is needed for this PR to be merged? Are more approvals required, or does it depend on other developments?

@gpestana
Copy link
Contributor Author

gpestana commented Oct 2, 2024

What else is needed for this PR to be merged? Are more approvals required, or does it depend on other developments?

We need more approvals from high enough fellows to merge.

@ggwpez ggwpez mentioned this pull request Oct 2, 2024
@ggwpez ggwpez self-requested a review October 2, 2024 13:18
relay/kusama/src/lib.rs Outdated Show resolved Hide resolved
relay/kusama/src/lib.rs Outdated Show resolved Hide resolved
relay/kusama/src/lib.rs Outdated Show resolved Hide resolved
@gpestana
Copy link
Contributor Author

gpestana commented Oct 2, 2024

On runtime upgrade runs with a recent snapshot:

Polkadot

[2024-10-02T14:36:21Z INFO  try_runtime_core::common::misc_logging] ------------------------------------------------------------------


[2024-10-02T14:36:21Z INFO  try_runtime_core::common::misc_logging] 🔬 Running TryRuntime_on_runtime_upgrade with checks: PreAndPost


[2024-10-02T14:36:21Z INFO  try_runtime_core::common::misc_logging] ------------------------------------------------------------------


[2024-10-02T14:36:41Z INFO  runtime::polkadot] try-runtime::on_runtime_upgrade polkadot.

(...)

[2024-10-02T14:36:41Z INFO  runtime::polkadot] migrations::corrupted_ledgers: ledger of 5e510306a89f40e5520ae46adcc7a4a1bbacf27c86c163b0691bbbd7b5ef9c10 (5ECNRY7q...) restored (with force unstake).
[2024-10-02T14:36:41Z INFO  runtime::polkadot] migrations::corrupted_ledgers: ledger of a6379e16c5dab15e384c71024e3c6667356a5487127c291e61eed3d8d6b335dd (5FpeKyF3...) restored.
[2024-10-02T14:36:41Z INFO  runtime::polkadot] migrations::corrupted_ledgers: ledger of 6c3e8acb9225c2a6d22539e2c268c8721b016be1558b4aad4bed220dfbf01fea (5EWdcCGJ...) restored.
[2024-10-02T14:36:41Z INFO  runtime::polkadot] migrations::corrupted_ledgers: ledger of 4458ad5f0c082da64610607beb9d3164a77f1ef7964b7871c1de182ea7213783 (5DcKTQ71...) restored.
[2024-10-02T14:36:41Z INFO  runtime::polkadot] migrations::corrupted_ledgers: done. success: 4, error: 0

Kusama

[2024-10-02T14:39:54Z INFO  try_runtime_core::common::misc_logging] ------------------------------------------------------------------


[2024-10-02T14:39:54Z INFO  try_runtime_core::common::misc_logging] 🔬 Running TryRuntime_on_runtime_upgrade with checks: PreAndPost


[2024-10-02T14:39:54Z INFO  try_runtime_core::common::misc_logging] ------------------------------------------------------------------


[2024-10-02T14:40:19Z INFO  staging_kusama_runtime] try-runtime::on_runtime_upgrade kusama.

(...)

[2024-10-02T14:40:19Z INFO  runtime::kusama] migrations::corrupted_ledgers: ledger of 52559f2c7324385aade778eca4d7837c7492d92ee79b66d6b416373066869d2e (5DvfDdum...) restored.
[2024-10-02T14:40:19Z INFO  runtime::kusama] migrations::corrupted_ledgers: ledger of 31162f413661f3f5351169299728ab6139725696ac6f98db9665e8b76d73d300 (5DB4nzJ4...) restored.
[2024-10-02T14:40:19Z INFO  runtime::kusama] migrations::corrupted_ledgers: ledger of 3a8012a52ec2715e711b1811f87684fe6646fc97a276043da7e75cd6a6516d29 (5DPQgPws...) restored.
[2024-10-02T14:40:19Z INFO  runtime::kusama] migrations::corrupted_ledgers: done. success: 3, error: 0

@gpestana
Copy link
Contributor Author

gpestana commented Oct 2, 2024

/merge

@fellowship-merge-bot fellowship-merge-bot bot merged commit 065c332 into polkadot-fellows:main Oct 2, 2024
48 checks passed
@fellowship-merge-bot
Copy link
Contributor

Enabled auto-merge in Pull Request

Available commands
  • /merge: Enables auto-merge for Pull Request
  • /merge cancel: Cancels auto-merge for Pull Request
  • /merge help: Shows this menu

For more information see the documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants