docs: Deployment guide rewrite #853

Merged 1 commit on Aug 22, 2023


@carver (Collaborator) commented Aug 21, 2023

Fleshed out a lot of instructions, after deploying last week

Fixes #846

To-Do

  • Clean up commit history

@njgheorghita (Collaborator) left a comment

🚢 🦩 🌊


## Communicate
### Update docker images
Docker images are how Ansible moves the binaries to the nodes. Update the docker tags with:

Collaborator

It might be worth leaving a short note here about what tags are used for what. eg.

Our CI pushes the `latest` & `latest-bridge` images to the Docker registry for every PR that's merged into master. Our network uses the `testnet` & `bridge` tags to build from. So, to deploy, we need to pull the most recent `latest` & `latest-bridge` images, retag them, and then push the newly tagged images to the registry.

Ok, that might be a little wordy. But, I'm just thinking it might be helpful to explain why we do these steps so people can have a reference in case something unexpected happens.
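
For reference, the pull / retag / push flow described above could be sketched roughly as follows. This is only a dry-run illustration: the repository name `example/trin` is a placeholder, and the mapping of `latest` to `testnet` and `latest-bridge` to `bridge` is an assumption based on the order the tags are listed in, not something the guide confirms. The script prints the commands rather than running them.

```python
import shlex

# Hypothetical repository name; substitute the real image repository.
IMAGE_REPO = "example/trin"

# Assumed mapping from the CI-built tags to the tags the network deploys from.
TAG_MAP = {"latest": "testnet", "latest-bridge": "bridge"}


def retag_commands(repo=IMAGE_REPO, tag_map=TAG_MAP):
    """Return the docker commands that pull, retag, and push each image."""
    commands = []
    for source_tag, deploy_tag in tag_map.items():
        source = f"{repo}:{source_tag}"
        target = f"{repo}:{deploy_tag}"
        commands.append(["docker", "pull", source])
        commands.append(["docker", "image", "tag", source, target])
        commands.append(["docker", "push", target])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in retag_commands():
        print(shlex.join(command))
```

Keeping the mapping in one dictionary makes it easy to see at a glance which CI tag feeds which deployment tag if something unexpected happens mid-deploy.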

Notify in Discord chat about the new release being complete, and the network nodes being updated.
### Run ansible
- Check monitoring tools to understand network health, and compare against post-deployment, e.g.:
- [Glados](http://glados.ethportal.net/content/)

Collaborator

I think adding the trin-bench site here (https://trin-bench.ethdevops.io/login) to evaluate pre/post deploy network health is also valuable.

Collaborator Author

👍🏻 I'll put a link straight to the trin metrics dashboard. As follow-up work, it would be cool to write up some hints about what to look for in Grafana, and link to that here.

You might see this during a deployment:
> fatal: [trin-ams3-18]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 178.128.253.26 port 22: Connection timed out", "unreachable": true}

Retry once more. If it times out again, ask `@paulj` to reboot the machine.
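
As a rough sketch of the retry-once rule above, the snippet below scans captured playbook output for `UNREACHABLE!` lines shaped like the one quoted and reports what to do next. The function names and the idea of saving the output to a string are assumptions for illustration; nothing here is part of the actual deployment tooling.

```python
import re

# Matches lines like: fatal: [trin-ams3-18]: UNREACHABLE! => {...}
UNREACHABLE_RE = re.compile(r"fatal: \[([^\]]+)\]: UNREACHABLE!")


def unreachable_hosts(ansible_output):
    """Return the unique hosts reported as UNREACHABLE, in order of appearance."""
    hosts = []
    for host in UNREACHABLE_RE.findall(ansible_output):
        if host not in hosts:
            hosts.append(host)
    return hosts


def next_step(first_run_output, retry_output=None):
    """Apply the retry-once rule to captured playbook output (hypothetical helper)."""
    if not unreachable_hosts(first_run_output):
        return "done"
    if retry_output is None:
        # First failure: the guide says to retry once more.
        return "retry"
    still_down = unreachable_hosts(retry_output)
    if still_down:
        # Second timeout: time to ask for a reboot of these machines.
        return "ask for reboot: " + ", ".join(still_down)
    return "done"
```

Called with only the first run's output it answers "retry" or "done"; called with both runs, it names the hosts that still timed out.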

Collaborator

Ooooh, nice! Did this actually get resolved? I'm quite familiar with that machine always failing, and I typically just let it fail since it was the only one that failed, so I'm glad to see that it was resolved!

Collaborator Author

Well, "resolved" as in he rebooted it, and the next deploy succeeded. But, maybe it will fail again at the next deploy? 🤷🏻‍♂️

@carver carver merged commit 1933704 into ethereum:master Aug 22, 2023
4 checks passed
@carver carver deleted the deployment-guide branch August 22, 2023 16:56
Successfully merging this pull request may close these issues.

docs: move deployment guide into mdbook