From 942c7c66bf3a9232b57110261989eee2dc959eb1 Mon Sep 17 00:00:00 2001
From: Daniel
Date: Fri, 17 Mar 2023 18:18:05 +0100
Subject: [PATCH 1/4] docs: add sketch for restarting vega from remote snapshot

---
 .../how-to/restart-from-remote-snapshot.md | 108 ++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 docs/node-operators/how-to/restart-from-remote-snapshot.md

diff --git a/docs/node-operators/how-to/restart-from-remote-snapshot.md b/docs/node-operators/how-to/restart-from-remote-snapshot.md
new file mode 100644
index 000000000..fe4888bc6
--- /dev/null
+++ b/docs/node-operators/how-to/restart-from-remote-snapshot.md
@@ -0,0 +1,108 @@
+# Restart vega from remote snapshot
+
+### Let's assume the following things:
+
+You have configured the vega core & data-node
+You can access one of the data nodes to collect its network history.
+You can get trusted block details (I will describe how to do it later)
+
+
+### 1. Stop the network
+
+```shell
+systemctl stop vega;
+systemctl stop datanode
+
+# or vegavisor
+systemctl stop vegavisor
+```
+
+### 2. Update bootstrap peers in the data node config.
+
+You have to do it to be able to fetch the data node snapshots.
+
+- `NetworkHistory.Enabled` - update this value to `true`
+- `NetworkHistory.Store.BootstrapPeers` - provide at least one valid bootstrap peer
+
+You have to ask someone who is already running data-node to get BootstrapPeers.
+
+### 3. Collect trusted block info from someone's data node
+
+To do it, visit the `https:///api/v2/snapshots` page and get the following information about one block:
+- `blockHeight`
+- `blockHash`
+
+We recommend getting one of the newest blocks but not the last one. It can be any existing block on one of your validators mentioned in the tendermint config. It does not determine the block height you want to load.
+
+### 4. Update tendermint config
+Update the `statesync` section in the <>/config/config.toml
+
+```toml
+[statesync]
+enable = true
+trust_height = <>
+trust_hash = "<>"
+```
+
+### 5. Enable network sync from network history
+
+Update the </config/data-node/config.toml file:
+
+```toml
+AutoInitialiseFromNetworkHistory = true
+
+
+[NetworkHistory]
+ Enabled = true
+```
+
+### 6. Check the vega core config
+
+Check if you have `StartHeight` set to `-1` in <>/config/node/config.toml
+
+```toml
+[Snapshot]
+ StartHeight = -1
+
+[Broker.Socket]
+ DialTimeout = "4h" # increase this value. Otherwise, it may fail.
+```
+
+### 7. Call unsafe reset all
+
+```shell
+vega unsafe_reset_all --home <>
+vega tm unsafe-reset-all --home <>
+rm -r <>/state/data-node/*
+```
+
+### 8. Start your node
+
+```shell
+systemctl start vega
+systemctl start datanode
+
+# or visor
+systemctl start vegavisor
+```
+
+### 9. Revert config
+
+When your node started correctly you may want to revert changes you made to
+
+- vegavisor config
+- vega config
+- tendermint config
+- data node config
+
+We recommend to do it to avoid starting from snapshot next time. Usually when your node is not too far behind, it is faster to reply missing block than load everything from network.
+
+### Update visor timeout
+
+Only when you are running vegavisor. Modify the <>/config.toml
+
+Visor tries to connect every 2 sec. Just put some big value like `7200` - (7200*2 sec = 4h)
+
+```toml
+maxNumberOfFirstConnectionRetries = 7200
+```

From f566753707ae3f7ebab893e705812c19541775a7 Mon Sep 17 00:00:00 2001
From: Daniel
Date: Fri, 24 Mar 2023 17:23:49 +0100
Subject: [PATCH 2/4] feat: add missing information for rpc_servers in restart from remote snapshot docs

---
 docs/node-operators/how-to/restart-from-remote-snapshot.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/node-operators/how-to/restart-from-remote-snapshot.md b/docs/node-operators/how-to/restart-from-remote-snapshot.md
index fe4888bc6..a25afb181 100644
--- a/docs/node-operators/how-to/restart-from-remote-snapshot.md
+++ b/docs/node-operators/how-to/restart-from-remote-snapshot.md
@@ -42,8 +42,13 @@ Update the `statesync` section in the <>/config/config.toml
 enable = true
 trust_height = <>
 trust_hash = "<>"
+rpc_servers = "<>"
 ```
 
+:::note
+You have to ask someone for tendermint RPC server. The best server is the one you collected trusted block height and hash. But it can be any server which has trusted block.
+:::
+
 ### 5. Enable network sync from network history
 
 Update the </config/data-node/config.toml file:

From d19cd1db64c8eb25778dcbfe3cd9b01ef5d4a325 Mon Sep 17 00:00:00 2001
From: Daniel
Date: Wed, 29 Mar 2023 14:51:24 +0200
Subject: [PATCH 3/4] docs: add timeout for network history init info

---
 docs/node-operators/how-to/restart-from-remote-snapshot.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/node-operators/how-to/restart-from-remote-snapshot.md b/docs/node-operators/how-to/restart-from-remote-snapshot.md
index a25afb181..7a834543b 100644
--- a/docs/node-operators/how-to/restart-from-remote-snapshot.md
+++ b/docs/node-operators/how-to/restart-from-remote-snapshot.md
@@ -49,7 +49,7 @@ rpc_servers = "<>"
 You have to ask someone for tendermint RPC server. The best server is the one you collected trusted block height and hash. But it can be any server which has trusted block.
 :::
 
-### 5. Enable network sync from network history
+### 5. Enable network sync from network history and update network history init timeout
 
 Update the </config/data-node/config.toml file:
 
@@ -59,6 +59,9 @@ AutoInitialiseFromNetworkHistory = true
 
 [NetworkHistory]
  Enabled = true
+...
+[NetworkHistory.Initialise]
+ TimeOut = "4h"
 ```
 
 ### 6. Check the vega core config

From 7dbe51ab72f44f4167d2e39b355a21409d47ec62 Mon Sep 17 00:00:00 2001
From: Daniel
Date: Fri, 28 Apr 2023 12:56:02 +0100
Subject: [PATCH 4/4] feat: add the database recreation step

---
 .../how-to/restart-from-remote-snapshot.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/docs/node-operators/how-to/restart-from-remote-snapshot.md b/docs/node-operators/how-to/restart-from-remote-snapshot.md
index 7a834543b..bea7056e3 100644
--- a/docs/node-operators/how-to/restart-from-remote-snapshot.md
+++ b/docs/node-operators/how-to/restart-from-remote-snapshot.md
@@ -6,7 +6,6 @@ You have configured the vega core & data-node
 You can access one of the data nodes to collect its network history.
 You can get trusted block details (I will describe how to do it later)
 
-
 ### 1. Stop the network
 
 ```shell
@@ -47,7 +46,7 @@ rpc_servers = "<>"
 
 :::note
 You have to ask someone for tendermint RPC server. The best server is the one you collected trusted block height and hash. But it can be any server which has trusted block.
-:::
+:::github.com/
 
 ### 5. Enable network sync from network history and update network history init timeout
 
@@ -78,12 +77,21 @@ Check if you have `StartHeight` set to `-1` in <>/config/node/config.
 
 ### 7. Call unsafe reset all
 
+Reset the chain data state:
+
 ```shell
 vega unsafe_reset_all --home <>
 vega tm unsafe-reset-all --home <>
 rm -r <>/state/data-node/*
 ```
 
+Recreate the PostgreSQL database:
+
+```sql
+DROP database <>;
+CREATE DATABASE <> WITH owner=<>;
+```
+
 ### 8. Start your node
 
 ```shell