Spark, JupyterLab, and other Data Science tooling via Docker Swarm
- Docker
- docker-compose
- Existing Docker registry for storing images
- Existing caching layer for .deb packages
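The .deb caching layer is assumed here to be something like apt-cacher-ng (an assumption, not something this repo mandates); if so, hosts and build containers can be pointed at it with a one-line apt proxy file:

```
# <cache-host> is a placeholder; 3142 is apt-cacher-ng's default port
echo 'Acquire::http::Proxy "http://<cache-host>:3142";' | sudo tee /etc/apt/apt.conf.d/01proxy
```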
Edit the `.env` file.
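The actual keys are whatever the repo's `.env` defines; purely as an illustrative sketch (every variable name below is hypothetical, not taken from this repo), it is the place for things like the registry address and component versions:

```
# Hypothetical .env sketch -- the real keys are defined in the repo's .env
REGISTRY=192.168.1.226:5000        # local image registry (see the daemon.json step below)
SPARK_VERSION=3.3.1                # placeholder version
LIVY_VERSION=0.7.1-incubating      # placeholder version
```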
https://docs.docker.com/engine/swarm/swarm-mode/
docker swarm init --advertise-addr=192.168.1.113 --listen-addr=0.0.0.0
Set up all the worker nodes using the join token and command provided by `docker swarm init`.
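On each worker, the join command takes the following shape; the token is a placeholder, use the one printed by `docker swarm init` on the manager (2377 is the default swarm management port):

docker swarm join --token <worker-join-token> 192.168.1.113:2377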
Nodes:
- 192.168.1.113 - Asus-Blue (master)
- 192.168.1.145 - Windows WSL2 (worker)
- 192.168.1.105 - Alienware (worker)
- 192.168.1.124 - Laptop (worker)
https://docs.docker.com/engine/swarm/manage-nodes/
Check the status of the swarm cluster
docker node ls
https://docs.docker.com/engine/swarm/stack-deploy/
Use existing Docker image registry
Add the following to your Docker daemon's daemon.json file:
"insecure-registries": ["192.168.1.226:5000"]
docker service ls
Download Spark and copy it to `sparkmaster`, `sparkworker`, and `jupyterlab`.
Download Livy and copy it to `sparkmaster`.
Download the Spark NLP .jar and copy it to `sparkmaster`, `sparkworker`, and `jupyterlab`.
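Versions and directory layout below are assumptions (Spark 3.3.1 with Hadoop 3, Livy 0.7.1-incubating, Spark NLP 4.2.4, and `sparkmaster/`, `sparkworker/`, `jupyterlab/` as the image build contexts); substitute whatever the Dockerfiles actually expect:

```
# Placeholder versions and paths -- adjust to match the repo
wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
cp spark-3.3.1-bin-hadoop3.tgz sparkmaster/ sparkworker/ jupyterlab/

wget https://archive.apache.org/dist/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip
cp apache-livy-0.7.1-incubating-bin.zip sparkmaster/

wget https://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.12/4.2.4/spark-nlp_2.12-4.2.4.jar
cp spark-nlp_2.12-4.2.4.jar sparkmaster/ sparkworker/ jupyterlab/
```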
Build the images, push them to the local registry, then deploy:
./build-deploy.sh
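The script's real contents live in the repo; under the assumptions that the compose file is docker-compose.yml, the stack is named spark, and the images are tagged for the 192.168.1.226:5000 registry, its flow is roughly:

```
#!/usr/bin/env bash
set -euo pipefail

# Build the images defined in the compose file
docker-compose build

# Push them to the local registry so every swarm node can pull them
docker-compose push

# Deploy (or update) the stack on the swarm
docker stack deploy -c docker-compose.yml spark
```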
Check the status of the stack
docker stack ls
docker stack services spark
Get the full details
docker stack ps spark --no-trunc
Open up the Spark web UI
Open up the JupyterLab web UI
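Ports are assumptions based on the usual defaults (8080 for the Spark master UI, 8888 for JupyterLab); check `docker stack services spark` for the ports your compose file actually publishes:

```
http://192.168.1.113:8080   # Spark master web UI (assumed port)
http://192.168.1.113:8888   # JupyterLab (assumed port)
```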
Look at workers as they execute
Look at the submitter
Look at Livy
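One way to follow all three is to tail the service logs; the service names below are guesses of the form <stack>_<service>, so confirm the real names with `docker stack services spark`:

```
docker service logs -f spark_sparkworker   # workers executing tasks (assumed name)
docker service logs -f spark_jupyterlab    # the submitter / notebook server (assumed name)
docker service logs -f spark_livy          # Livy (assumed name)
```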
Bring everything down
docker stack rm spark
docker swarm leave --force
Pick up newer upstream versions by re-running ./build-deploy.sh
- nload - for live network usage
- htop - for live CPU and RAM usage
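On Debian/Ubuntu nodes (an assumption consistent with the .deb caching layer above), both can be installed with apt:

sudo apt-get install -y nload htop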