ts-flint is a collection of modules related to time series analysis for PySpark.
You can build flint by running this in the top level of this repo:
python setup.py install
# -or-
pip install .
You can also install directly from gitlab clone:
make dist
This will create a jar under target/scala-2.11/flint-assembly-{VERSION}-SNAPSHOT.jar
You can use ts-flint with PySpark by:
pyspark --jars /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar --py-files /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar
or
>>> import os
>>> import ts.flint
>>> ts.flint.__file__[len(os.getcwd()):]
'/ts/flint/__init__.py'
You can also run ts-flint from within a jupyter notebook. First, create a virtualenv or conda environment containing pandas and jupyter.
conda create -n flint python=3.5 pandas notebook
source activate flint
py.test
Then visit http://localhost:8080.
Make sure pyspark is in your PATH. Then, from the flint project dir, start pyspark with the following options:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='$(hostname)' --NotebookApp.port=8888"
pyspark --master=local --jars /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar --py-files /path/to/flint-assembly-{VERSION}-SNAPSHOT.jar
The Flint python bindings are documented at https://ts-flint.readthedocs.io/en/latest