Skip to content

Latest commit

 

History

History
13 lines (9 loc) · 1.25 KB

README.md

File metadata and controls

13 lines (9 loc) · 1.25 KB

jupyter-ds

Docker Setup for Interactive Data Science; The Image contains Anaconda, Jupyter, Java8, PixieDust, Spark, Scala 2.11.8, Dataframe Profiling; with example notebook combining all together.

Anaconda Version 4.0.0; all supported packages https://docs.continuum.io/anaconda/packages/pkg-docs

The image extending the pixiedust docker image https://github.com/markwatsonatx/Dockerfiles/blob/master/pixiedust-python35-1.0.5/Dockerfile. More information on the project and usage: https://ibm-watson-data-lab.github.io/pixiedust/use.html

The Dataframe profiling package source: https://github.com/julioasotodv/spark-df-profiling

Building the image: docker build -t ${YOUR_TAG} . Running the image: docker run -p 8888:8888 -v ${PATH_TO_YOUR_DATA_DIR}:/usr/notebooks/shared -d ${YOUR_TAG}.

  • ${PATH_TO_YOUR_DATA_DIR} will be mounted on /usr/notebooks/shared in the container where u can easily access your data from spark
  • The jupyter UI will be available on localhost:8888. The default directory will contain PixieDust Tutorial examples, and a data directory with a sample notebook using PixieDust with the Dataframe profiling package. The shared directory shall contain all the user files which are mounted from ${PATH_TO_YOUR_DATA_DIR}