
Credentials Setup

Martin Olveyra edited this page Apr 13, 2023 · 6 revisions

Previous Chapter: Introduction


The shub-workflow library depends on the scrapinghub Python bindings library. To operate on ScrapyCloud (SC), the client from the scrapinghub library needs SC credentials, either passed explicitly to its constructor or provided through the environment variable SH_APIKEY. shub-workflow uses the second option, so you need to set this environment variable somewhere in your project. There are several ways to do so. The recommended approach is to add a setting holding the ScrapyCloud API key in the Zyte dashboard settings and export it as an environment variable from the project settings.py file, e.g. (assuming the Zyte project setting is named SC_APIKEY; adjust to your case):

import os
(...)
from shub_workflow.utils import kumo_settings

(...)

settings_from_kumo = kumo_settings()
if "SC_APIKEY" in settings_from_kumo:
    os.environ["SH_APIKEY"] = settings_from_kumo["SC_APIKEY"]

This approach avoids hard-coding the SC API key directly in settings.py. Of course, if you need to run shub-workflow scripts in your local environment, you will not have access to the kumo settings, so you need to set the SH_APIKEY environment variable locally.
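The two cases above (project setting available on ScrapyCloud, or SH_APIKEY already exported locally) can be combined in one small helper. The following is only a minimal sketch: the function name export_apikey and its parameters are hypothetical and not part of shub-workflow, and it assumes the project settings behave like a dictionary, as in the snippet above.

```python
import os
from typing import Mapping, MutableMapping


def export_apikey(
    project_settings: Mapping[str, str],
    environ: MutableMapping[str, str] = os.environ,
    setting_name: str = "SC_APIKEY",  # assumed name of the Zyte project setting
) -> str:
    """Hypothetical helper: make sure SH_APIKEY ends up in the environment."""
    # On ScrapyCloud: copy the project setting into the environment.
    if setting_name in project_settings:
        environ["SH_APIKEY"] = project_settings[setting_name]
    # Locally: rely on a value the developer has already exported.
    apikey = environ.get("SH_APIKEY")
    if not apikey:
        raise RuntimeError(
            "SH_APIKEY is not set: define it in the project settings "
            "or export it in your local environment."
        )
    return apikey
```

In settings.py this could be called as export_apikey(kumo_settings()). Raising an error when no key is found is a design choice of this sketch, meant to surface a missing credential early; drop that check if your setup provides credentials some other way.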

There are even safer alternatives that make this credential harder to see. For example, if your project is based on a custom Docker image, and assuming you have the SH_APIKEY environment variable set locally, you can add the following lines to the Dockerfile:

ARG SHUB_APIKEY
ENV SH_APIKEY=$SHUB_APIKEY

and, at deploy time with shub:

$ shub image upload <target> -b SHUB_APIKEY="$SH_APIKEY"

The -b option of shub passes build arguments to docker build. In this case, the build argument SHUB_APIKEY is set to the value of the local SH_APIKEY environment variable. The ARG directive in the Dockerfile declares SHUB_APIKEY as an accepted build argument. The ENV line then sets the environment variable SH_APIKEY to the value of the passed argument. The final effect is that SH_APIKEY is only visible within the Docker container once it is instantiated. This method ensures that only the developers involved in the project need to know the project API key.

For extra safety you may rely on a Bitbucket Pipelines build. The shub image upload line above is then specified in the bitbucket-pipelines.yml file, and the SH_APIKEY environment variable is set up via the Bitbucket repository settings -> repository variables. In this case, only the repository admin needs to have access to the API key.

Perhaps a more practical alternative, one that does not require a custom Docker image, would be to include the Dockerfile lines above in the scrapinghub-scrapy-stack Dockerfile and make the '-b' option available to the 'shub deploy' command. At the time of writing, however, this feature is not available.


Next Chapter: Crawl Managers