CKAN is a powerful data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is a key component used by the PDND project.
PDND stands for "Piattaforma Digitale Nazionale Dati" (the Italian Digital Data Platform), previously known as Data & Analytics Framework (DAF).
You can find more information about the PDND on the official Digital Transformation Team website.
The tools used in this repository are:
- CKAN version 2.6.7 with the extensions listed at the end of this document.
- Solr version 6.2, packaged for CKAN and with some customizations. Solr code is available here.
- PostgreSQL version 10.1, modified for CKAN. The container is available here. The image is tagged `latest`.
- Redis version 5.0.5. Redis is automatically pulled in as a dependency from its official Docker repository.
- Datapusher commit 0.0.15 (coming soon)
In this repository, CKAN and its related tools are redistributed as a set of Docker containers interacting with each other.
The `Dockerfile` and the `docker-compose.yml` files are in the root of this repository.
NOTE: the `docker-compose.yml` file sets several environment variables that can be used to adapt and customize many platform features; read more in the "Environment variables" section below.
If you want a CKAN instance up and running, follow these steps.
- Create and enter an empty folder: `mkdir ckan-it && cd ckan-it/` (or use the name you prefer)
- Download the `docker-compose.yml` from here
- Pull and run all containers: `docker-compose up -d`
After a while you can open the CKAN home page at http://localhost:5000 and log in with the provided credentials.
You can follow the log stream by running `docker-compose logs -f` (then Ctrl+C to exit).
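Since the containers need some time to start, a small helper can poll the portal instead of refreshing the browser by hand. The following is a minimal sketch, not part of this repository: the function name `wait_for_ckan`, the 2-second polling interval and the default timeout are assumptions, and it only checks that the URL answers with a successful HTTP status.

```shell
#!/bin/sh
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_ckan [url] [timeout_seconds]
# (hypothetical helper; defaults are assumptions for a local test setup)
wait_for_ckan() {
    url="${1:-http://localhost:5000}"
    timeout="${2:-120}"
    elapsed=0
    # curl -f treats HTTP 4xx/5xx as failures; -s silences the progress meter.
    until curl -fs -o /dev/null "$url"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "CKAN did not answer at $url within ${timeout}s" >&2
            return 1
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    echo "CKAN is answering at $url"
}
```

For example, after `docker-compose up -d`, running `wait_for_ckan http://localhost:5000 300` would block until the home page responds or five minutes have passed.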
The following default credentials can be used to access the portal (you should change them after the first login).
Username: ckanadmin
Password: ckanpassword
If you only want to run a CKAN instance and use it to manage and publish your own data, you can stop here. In a production environment you can install and set up a reverse proxy with HTTPS support in front of CKAN.
WARNING: all data are stored in Docker named volumes! In a production environment you should mount these volumes on local folders, updating the docker-compose configuration accordingly.
To bring down the test environment and remove the containers, use `docker-compose down`.
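Because the data live in named volumes, it can be useful to archive a volume before experimenting with the test environment. The sketch below uses a common Docker pattern (a throwaway container that mounts the volume and runs `tar`); the function name `backup_volume` is hypothetical, the `alpine` image is an assumption, and the actual volume names are not listed in this document, so look them up with `docker volume ls`.

```shell
#!/bin/sh
# Archive the contents of a Docker named volume into <dest_dir>/<volume>.tar.gz.
# Usage: backup_volume <volume_name> <dest_dir>
# (hypothetical helper, not shipped with this repository)
backup_volume() {
    volume="$1"
    # Bind mounts need an absolute host path, so resolve the destination first.
    dest_dir=$(cd "$2" && pwd) || return 1
    # The volume is mounted read-only at /data and archived into /backup,
    # which is the destination folder on the host.
    docker run --rm \
        -v "${volume}:/data:ro" \
        -v "${dest_dir}:/backup" \
        alpine tar czf "/backup/${volume}.tar.gz" -C /data .
}
```

A call such as `backup_volume my_volume ./backups` (where `my_volume` is a placeholder and `./backups` already exists) would produce `./backups/my_volume.tar.gz`.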
If you want to build local images instead of pulling them from Docker Hub, e.g. for testing purposes, you need some extra steps.
- Clone this repo: `git clone https://github.com/italia/ckan-it.git` (if you want to clone the repo into a folder other than `ckan-it/`, add the name you want after the previous command, e.g. `git clone https://github.com/italia/ckan-it.git my_custom_folder`)
- Enter the created folder: `cd ckan-it/` (or the name you chose in the previous step, e.g. `cd my_custom_folder/`)
- Change working branch if needed: `git checkout branch-name`
- Initialize submodules: `git submodule update --init --recursive`
- Build images: `docker-compose -f docker-compose.yml -f docker-compose.build.yml build`
- Run all containers using built images: `docker-compose up -d` (if you want to check logs, run `docker-compose logs -f`)
If you want to import data from external sources, follow these additional steps.
WARNING: note that if the `CKAN_HARVEST` variable in docker-compose is not set to "true", no organizations and sources are initially loaded, so you must use the GUI to manually add new organizations and sources of your choice before the next steps.
- Browse to http://localhost:5000/harvest to check all available sources
- Identify the name of the CKAN container and run the following command: `docker exec -it pdnd-ckan /ckan-harvest.sh`
You can see logs during the harvesting import with the following command: `docker-compose logs -f`.
You can find more logs in the `/var/log/ckan` folder inside the container.
Schedule a CRON job on the host machine to run the `/ckan-harvest.sh` script at the root of the file system of the CKAN container.
How to do this really depends on how you run the containers. When running containers with docker-compose, for instance, we did this by getting the container id and using `docker exec` to run a command inside the container, as follows:
`docker exec -it pdnd-ckan /ckan-harvest.sh >> /var/log/periodic-harvest.out 2>&1`
So you can schedule a periodic run of the above command, e.g. every hour, with CRON on the host machine.
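The hourly schedule mentioned above could be installed as sketched here. This is only an illustration under a few assumptions: the helper names `harvest_cron_line` and `install_harvest_cron` are hypothetical, the entry goes into the crontab of the current user (who must be allowed to run `docker` and write the log file), and the `-it` flags are dropped because cron provides no terminal and `docker exec -t` fails without one.

```shell
#!/bin/sh
# Container name and log path come from the text above; adjust as needed.
CONTAINER="pdnd-ckan"
LOG_FILE="/var/log/periodic-harvest.out"

# Print the crontab entry: minute 0 of every hour, output appended to the log.
harvest_cron_line() {
    printf '0 * * * * docker exec %s /ckan-harvest.sh >> %s 2>&1\n' \
        "$CONTAINER" "$LOG_FILE"
}

# Append the entry to the current user's crontab unless it is already present.
install_harvest_cron() {
    line=$(harvest_cron_line)
    current=$(crontab -l 2>/dev/null || true)
    if printf '%s\n' "$current" | grep -qxF -- "$line"; then
        echo "cron entry already installed"
    else
        {
            # Keep existing entries, if any, then add the new one.
            if [ -n "$current" ]; then printf '%s\n' "$current"; fi
            printf '%s\n' "$line"
        } | crontab -
    fi
}
```

Calling `install_harvest_cron` once is enough; running it again leaves the crontab unchanged. `crontab -l` shows the installed entry.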
The italia/ckan-it-harvesters repository contains all sources harvested by the national catalog of the PDND. If you want to clone it in your environment you must follow some additional steps:
- Check if the `data/init/harvesters` folder exists; if not, add it by running `git submodule add https://github.com/italia/ckan-it-harvesters data/init/harvesters`
- Add the `CKAN_HARVEST="true"` environment variable to the ckan service in `docker-compose.yml` (e.g. see `docker-compose.harvest.yml`)
- Run containers: `docker-compose up -d`
- Wait for organizations and harvest sources to load, then run `docker exec -it pdnd-ckan /ckan-harvest.sh`
- Follow the previous section to set up periodic harvesting
The following environment variables are mandatory and must be set in order to deploy CKAN. The `docker-compose.yml` file in this repository applies some example values, to be used for demos and local tests.
- CKAN_DEBUG (format: {"true"|"false"}) - Whether to enable debug log messages. It should always be false in production environments.
- CKAN_HARVEST (format: {"true"|"false"}) - Whether to enable the built-in harvesters. It should be false if you want to build your own catalog.
- CKAN_SITE_URL - The base URL of your CKAN deployment.
- CKAN_ADMIN_EMAIL - The email address of the local admin user.
- CKAN_ADMIN_USERNAME - The user name of the local admin user.
- CKAN_ADMIN_PASSWORD - The password of the local admin user.
- CKAN_DB_HOST - The host name of the CKAN PostgreSQL database.
- CKAN_DB_PORT - The port of the CKAN PostgreSQL database.
- CKAN_DB_USER - The user name of the CKAN PostgreSQL database.
- PGPASSWORD - The password of the CKAN PostgreSQL database.
- CKAN_SQLALCHEMY_URL (format: {postgresql://{CKAN_DB_USER}:{PGPASSWORD}@{CKAN_DB_HOST}:{CKAN_DB_PORT}/}) - The connection string to your PostgreSQL database.
- CKAN_REDIS_HOST - The host name of your Redis service.
- CKAN_REDIS_PORT - The port of your Redis service.
- CKAN_REDIS_URL (format: redis://{CKAN_REDIS_HOST}:{CKAN_REDIS_PORT}) - The full address of the Redis service.
- CKAN_SOLR_HOST - The host name of the Solr service.
- CKAN_SOLR_PORT - The port of the Solr service.
- CKAN_SOLR_URL (format: http://{CKAN_SOLR_HOST}:{CKAN_SOLR_PORT}/solr/ckan) - The full URL of the Solr service.
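Three of the variables above are composed from the others, so it may help to see them derived in one place. The snippet below is a sketch with made-up values for a local demo, not the defaults of this repository. `CKAN_DB_NAME` is a hypothetical variable, since the database name is not documented in the list above, and the Redis URL is assumed to follow the usual `redis://host:port` shape.

```shell
#!/bin/sh
# Example component values (assumptions for a local demo).
CKAN_DB_HOST="db"
CKAN_DB_PORT="5432"
CKAN_DB_USER="ckan"
PGPASSWORD="change-me"
CKAN_DB_NAME="ckan"   # hypothetical: not one of the documented variables
CKAN_REDIS_HOST="redis"
CKAN_REDIS_PORT="6379"
CKAN_SOLR_HOST="solr"
CKAN_SOLR_PORT="8983"

# Derive the three connection URLs from their components.
CKAN_SQLALCHEMY_URL="postgresql://${CKAN_DB_USER}:${PGPASSWORD}@${CKAN_DB_HOST}:${CKAN_DB_PORT}/${CKAN_DB_NAME}"
CKAN_REDIS_URL="redis://${CKAN_REDIS_HOST}:${CKAN_REDIS_PORT}"
CKAN_SOLR_URL="http://${CKAN_SOLR_HOST}:${CKAN_SOLR_PORT}/solr/ckan"

export CKAN_SQLALCHEMY_URL CKAN_REDIS_URL CKAN_SOLR_URL
printf '%s\n' "$CKAN_SQLALCHEMY_URL" "$CKAN_REDIS_URL" "$CKAN_SOLR_URL"
```

Note that a password containing characters such as `@`, `:` or `/` has to be percent-encoded before it is placed in the PostgreSQL connection string.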
CKAN is distributed with the following extensions and their plugins:
- stats
- view
  - text_view
  - image_view
  - recline_view
- datastore
- spatial (commit c5c8451)
  - spatial_metadata
  - spatial_query
- harvest (tag v1.1.4)
  - ckan_harvester
- multilang (commit fa8da32)
  - multilang_harvester
- dcat (tag v0.0.9)
  - dcat_rdf_harvester
  - dcat_json_harvester
  - dcat_json_interface
- dcatapit (commit 48f352b)
  - dcatapit_pkg
  - dcatapit_org
  - dcatapit_config
  - dcatapit_harvester
  - dcatapit_csw_harvester
  - dcatapit_harvest_list
  - dcatapit_subcatalog_facets
Contributions are welcome. Feel free to open issues and submit a pull request at any time!
This repository is very specific to the PDND project, but it can be used as an example. Meanwhile, the community is working on a generic, redistributable version.