This section describes how to get the Sparv backend up and running on your own machine. Begin by downloading the latest version from the GitHub repository.
These instructions assume a UNIX-like environment. For more information on the configuration variables, see the developer's guide.
The Sparv backend is configured to run together with the Sparv catapult, which is also available on GitHub (see below). You will need the following:
- Version 3 of the Sparv corpus pipeline (see the technical report for installation instructions)
- The Sparv catapult
- Python 3.4 or newer
- A WSGI server (optional but recommended)
- GCC for compiling the catapult C extension
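The sketch below checks the locally detectable prerequisites before you start. It is a hypothetical helper and not part of Sparv. It verifies the Python version and looks for `gcc` and `gunicorn` on `PATH`. It does not check the Sparv pipeline or the catapult themselves.

```python
import shutil
import sys

# Minimum Python version from the prerequisites list.
MIN_PYTHON = (3, 4)


def check_prerequisites():
    """Return (name, found, required, detail) tuples for the prerequisites
    that can be detected locally. Hypothetical helper: it does not check
    the Sparv pipeline or the catapult themselves."""
    checks = []
    current = tuple(sys.version_info[:2])
    checks.append(("Python >= %d.%d" % MIN_PYTHON,
                   current >= MIN_PYTHON, True,
                   "running %d.%d" % current))
    # GCC compiles the catapult C extension.
    gcc = shutil.which("gcc")
    checks.append(("gcc", gcc is not None, True, gcc or "not on PATH"))
    # A WSGI server such as gunicorn is optional but recommended. It may be
    # installed only inside the backend virtual environment, so "not found"
    # here is not treated as a failure.
    gunicorn = shutil.which("gunicorn")
    checks.append(("gunicorn", gunicorn is not None, False,
                   gunicorn or "not on PATH"))
    return checks


if __name__ == "__main__":
    missing = []
    for name, found, required, detail in check_prerequisites():
        status = "ok" if found else ("MISSING" if required else "not found")
        print("%-15s %-9s %s" % (name, status, detail))
        if required and not found:
            missing.append(name)
    print("All required tools found." if not missing
          else "Missing: " + ", ".join(missing))
```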
Once the prerequisites are in place, install the backend as follows:
- Set up a Python virtual environment for the backend and install the requirements from `backend/html/app/requirements.txt`.
- Set the backend configuration variables in `backend/html/app/config.py`.
- When running the backend with gunicorn (recommended), set the gunicorn configuration in `backend/html/app/gunicorn_config.py`.
- Set up the catapult, for instance in `backend/data/catapult`.
- Set up a Python virtual environment for the catapult and install the requirements from `catapult/requirements.txt`.
- Set the catapult configuration variables in `catapult/config.sh`.
- From within the `catapult` directory, run `make` to build `catalaunch`.
- Set up the Sparv pipeline, for instance in `backend/data/pipeline`, and build the pipeline models. You do not need to set up a virtual environment for the pipeline; instead, set `VENV_PATH` to the catapult virtual environment.
- Start the catapult by running `./start-server.sh` from within the `catapult` directory.
- Run the script `index.py` in the `backend` directory. You can either run it directly with the Python interpreter or (recommended) start a WSGI server using gunicorn: `html/app/venv/bin/gunicorn -c html/app/gunicorn_config.py index`.
- Set up the cron jobs listed in `catapult/cronjobs` for the automatic maintenance of Sparv.
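The two virtual-environment steps can be scripted with the standard `venv` module and `pip`. The sketch below is a minimal example, not a Sparv tool, and makes two assumptions:

- The backend environment lives in `backend/html/app/venv`, which matches the gunicorn command run from the `backend` directory.
- The catapult is placed at the example location `backend/data/catapult`. Its `venv` directory name is hypothetical.

By default the script only prints the commands. Calling `run(root)` executes them.

```python
import os
import subprocess
import sys

# (virtual environment, requirements file) pairs relative to the repository
# root. The backend venv path matches the gunicorn command (html/app/venv
# inside backend/); the catapult location is the example location
# backend/data/catapult, and its "venv" directory name is hypothetical.
ENVIRONMENTS = [
    ("backend/html/app/venv", "backend/html/app/requirements.txt"),
    ("backend/data/catapult/venv", "backend/data/catapult/requirements.txt"),
]


def setup_commands(root, python=sys.executable):
    """Build the commands that create each venv and install its requirements."""
    commands = []
    for venv_dir, requirements in ENVIRONMENTS:
        venv_path = os.path.join(root, venv_dir)
        pip = os.path.join(venv_path, "bin", "pip")  # UNIX-like venv layout
        commands.append([python, "-m", "venv", venv_path])
        commands.append([pip, "install", "-r", os.path.join(root, requirements)])
    return commands


def run(root):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in setup_commands(root):
        subprocess.check_call(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in setup_commands(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(" ".join(cmd))
```

The commands are passed as argument lists rather than shell strings, so paths containing spaces need no quoting.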
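A gunicorn configuration file is a plain Python module whose top-level variables name gunicorn settings. The fragment below only illustrates what `backend/html/app/gunicorn_config.py` might contain. `bind`, `workers`, `timeout`, `loglevel`, `accesslog` and `errorlog` are standard gunicorn settings, but every value is a hypothetical choice, not the configuration shipped with Sparv.

```python
# Read by gunicorn when started with "-c html/app/gunicorn_config.py".
# All values below are hypothetical examples.
import multiprocessing

bind = "127.0.0.1:8051"  # address and port gunicorn listens on
workers = multiprocessing.cpu_count() * 2 + 1  # gunicorn's suggested rule of thumb
timeout = 300            # seconds before a silent worker is killed and restarted
loglevel = "info"
accesslog = "-"          # "-" sends the access log to stdout
errorlog = "-"           # "-" sends the error log to stderr
```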
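The gunicorn command passes only the module name `index`. With no `:variable` suffix, gunicorn by default looks for a WSGI callable named `application` in that module. The sketch below is a hypothetical stand-in for `index.py`; Sparv's real module defines the actual web service. It shows how one file can serve both routes: gunicorn imports `application`, while `serve()` runs the same callable on the standard-library development server. The host and port are assumptions.

```python
"""Hypothetical stand-in for index.py, showing only the WSGI entry point."""
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """WSGI callable; gunicorn looks up this name when given the module 'index'."""
    body = b"Sparv backend is running\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="localhost", port=8051):
    """Serve the app with the standard-library development server.

    This corresponds to running the script directly with the Python
    interpreter; host and port are hypothetical. Blocks until interrupted.
    """
    server = make_server(host, port, application)
    try:
        server.serve_forever()
    finally:
        server.server_close()
```

For example, `python3 -c "import index; index.serve()"` from the directory containing the file starts the development server. The `wsgiref` server handles one request at a time and is intended for development only, which is one reason gunicorn, with its separate worker processes, is the recommended option.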