docker-spark-datascience

Spark, JupyterLab, and other data science tooling, deployed via Docker Swarm

Requirements

Docker
docker-compose
Existing Docker registry for storing images
Existing caching layer for .deb packages

Setup

Edit the .env file.

Initialize swarm mode on the manager node (see https://docs.docker.com/engine/swarm/swarm-mode/):

docker swarm init --advertise-addr=192.168.1.113 --listen-addr=0.0.0.0

Join every other node to the swarm using the token and join command that docker swarm init prints.

Nodes:

192.168.1.113 - Asus-Blue (manager)
192.168.1.145 - Windows WSL2 (worker)
192.168.1.105 - Alienware (worker)
192.168.1.124 - Laptop (worker)
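
As a rough sketch of the join step, the manager can print its worker token with docker swarm join-token, and each worker then runs docker swarm join against the manager address. The helper name join_command is hypothetical, and port 2377 is assumed to be Docker's default swarm management port, left unchanged.

```shell
#!/bin/sh
# Manager address taken from the node list above.
MANAGER_ADDR="192.168.1.113"
# 2377 is Docker's default swarm management port (assumed unchanged here).
SWARM_PORT="2377"

# join_command is a hypothetical helper: given a worker token, it prints
# the command to run on each worker node.
join_command() {
    printf 'docker swarm join --token %s %s:%s\n' "$1" "$MANAGER_ADDR" "$SWARM_PORT"
}

# On the manager, the worker token can be printed with:
#   docker swarm join-token -q worker
# and turned into the command for the workers with:
#   join_command "$(docker swarm join-token -q worker)"
```

The printed command is then run once on each of the three worker nodes.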

Check the status of the swarm cluster (see https://docs.docker.com/engine/swarm/manage-nodes/):

docker node ls

The stack is deployed from an existing Docker image registry (see https://docs.docker.com/engine/swarm/stack-deploy/).

Add the following to Docker's daemon.json file, then restart the Docker daemon:

"insecure-registries": ["192.168.1.226:5000"]

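For a node that has no daemon.json yet, a minimal complete file might look like the sketch below. The helper write_daemon_json is hypothetical and overwrites its target, so on a node with existing settings the key should be merged by hand instead. The path /etc/docker/daemon.json is the usual location on Linux, and the restart command assumes a systemd host.

```shell
#!/bin/sh
# write_daemon_json is a hypothetical helper that writes a minimal
# daemon.json containing only the insecure-registries setting.
# It overwrites the target file, so only use it where no settings exist yet.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
  "insecure-registries": ["192.168.1.226:5000"]
}
EOF
}

# Example (usual Linux path, requires root), followed by a daemon restart:
#   write_daemon_json /etc/docker/daemon.json
#   sudo systemctl restart docker
```
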
docker service ls

Download Spark and copy it into the sparkmaster, sparkworker, and jupyterlab directories.

Download Livy and copy it into the sparkmaster directory.

Download the Spark NLP .jar and copy it into the sparkmaster, sparkworker, and jupyterlab directories.

Build the images, push them to the local registry, then deploy the stack:

./build-deploy.sh

Check the status of the stack

docker stack ls
docker stack services spark

Get the full details

docker stack ps spark --no-trunc
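
To spot services that are still starting, the replica counts can be filtered with a few lines of shell. This sketch assumes the name and replica columns are printed with --format '{{.Name}} {{.Replicas}}'; the helper not_ready is hypothetical.

```shell
#!/bin/sh
# not_ready is a hypothetical helper: it reads "name current/desired" lines
# on stdin and prints the names whose replica counts do not match yet.
not_ready() {
    while read -r name replicas _rest; do
        current="${replicas%%/*}"   # part before the slash
        desired="${replicas##*/}"   # part after the slash
        [ "$current" = "$desired" ] || printf '%s\n' "$name"
    done
}

# Against the live stack:
#   docker stack services spark --format '{{.Name}} {{.Replicas}}' | not_ready
```

An empty result means every service has reached its desired replica count.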

Open the Spark web UI

http://localhost:8090

Open the JupyterLab web UI

http://localhost:8888

Look at the worker web UI as jobs execute

http://localhost:8081

Look at the submitter's Spark application UI (only available while an application is running)

http://localhost:4040

Look at Livy

http://localhost:8998
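
The web interfaces listed above can be probed in one pass once the stack is up. This is a sketch that assumes curl is installed and that the ports are published on localhost as listed; ui_urls and check_uis are hypothetical helper names.

```shell
#!/bin/sh
# ui_urls prints the web UI addresses listed in this README, one per line.
ui_urls() {
    for port in 8090 8888 8081 4040 8998; do
        printf 'http://localhost:%s\n' "$port"
    done
}

# check_uis reports which UIs answer within 5 seconds.
# Port 4040 only responds while a Spark application is running.
check_uis() {
    ui_urls | while read -r url; do
        if curl -s -o /dev/null --max-time 5 "$url"; then
            printf 'up    %s\n' "$url"
        else
            printf 'down  %s\n' "$url"
        fi
    done
}

# Run the probe with:
#   check_uis
```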

Teardown

Remove the stack, then have each node leave the swarm (run the leave command on every node):

docker stack rm spark
docker swarm leave --force

Updates

Download newer upstream versions by running ./build-deploy.sh

Debugging

nload - for live network usage
htop - for live CPU and RAM usage
