Europeana Sitemap

Generates and publishes a record and entity sitemap for www.europeana.eu

The record sitemap is generated by connecting to a Mongo server and listing all records that meet a minimum content tier and metadata tier. The entity sitemap uses the search functionality of Entity-API to retrieve all entities used on the Europeana website.

For both, the generated sitemap consists of:

  • multiple sitemap files containing record or entity URLs (45,000 per file for records, 20,000 per file for entities)
  • a sitemap index file listing all the sitemap files
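The two-part layout above can be sketched in Java as follows. This is a minimal illustration and not the project's actual code: the class `SitemapChunker`, its method names, and the `example.org` URLs are all hypothetical. Only the XML element names and namespace come from the public sitemap protocol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a URL list into sitemap files and build an index.
public class SitemapChunker {

    // Namespace defined by the sitemap protocol (sitemaps.org).
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Splits urls into consecutive chunks of at most maxPerFile entries. */
    public static List<List<String>> chunk(List<String> urls, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += maxPerFile) {
            int end = Math.min(i + maxPerFile, urls.size());
            chunks.add(new ArrayList<>(urls.subList(i, end)));
        }
        return chunks;
    }

    /** Escapes the five characters that are special in XML text. */
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Wraps each location in <item><loc>...</loc></item> inside a <root> element.
    private static String wrap(String root, String item, List<String> locs) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append('<').append(root).append(" xmlns=\"").append(NS).append("\">\n");
        for (String loc : locs) {
            sb.append("  <").append(item).append("><loc>").append(escape(loc))
              .append("</loc></").append(item).append(">\n");
        }
        return sb.append("</").append(root).append(">\n").toString();
    }

    /** Builds one sitemap file listing page URLs. */
    public static String urlSet(List<String> pageUrls) {
        return wrap("urlset", "url", pageUrls);
    }

    /** Builds the sitemap index listing the URLs of the sitemap files. */
    public static String index(List<String> sitemapFileUrls) {
        return wrap("sitemapindex", "sitemap", sitemapFileUrls);
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            urls.add("https://example.org/item/" + i); // placeholder URLs
        }
        List<String> fileUrls = new ArrayList<>();
        List<List<String>> chunks = chunk(urls, 2);    // 2 per file, for demonstration only
        for (int i = 0; i < chunks.size(); i++) {
            fileUrls.add("https://example.org/sitemap-" + (i + 1) + ".xml");
            System.out.println(urlSet(chunks.get(i)));
        }
        System.out.println(index(fileUrls));
    }
}
```

In a real run the chunk size would be the per-file limit mentioned above rather than 2.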

To make sure there is always a sitemap available, we use blue/green versions of the sitemap files and keep track of which one is 'active'. At the start of the update process, all files of the inactive blue/green version are deleted. Then the new sitemap files are created and the active version is switched from blue to green or vice versa.
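The blue/green update described above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the class `BlueGreenSitemapStore`, the `blue-`/`green-` file name prefixes, and the use of a map instead of real file storage are not taken from the project.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of blue/green sitemap storage.
public class BlueGreenSitemapStore {

    public enum Version {
        BLUE, GREEN;

        Version other() {
            return this == BLUE ? GREEN : BLUE;
        }
    }

    private final Map<String, String> files = new TreeMap<>();
    private Version active = Version.BLUE;

    public Version active() {
        return active;
    }

    // Assumed naming scheme: "<version>-<baseName>", e.g. "green-index.xml".
    private static String fileName(Version version, String baseName) {
        return version.name().toLowerCase(Locale.ROOT) + "-" + baseName;
    }

    /** Replaces the inactive version with newFiles, then makes it active. */
    public void update(Map<String, String> newFiles) {
        Version inactive = active.other();
        String prefix = fileName(inactive, "");

        // Step 1: delete all files of the inactive version.
        files.keySet().removeIf(name -> name.startsWith(prefix));

        // Step 2: write the new files under the inactive version.
        // Readers still see the old active version while this runs.
        for (Map.Entry<String, String> e : newFiles.entrySet()) {
            files.put(fileName(inactive, e.getKey()), e.getValue());
        }

        // Step 3: switch the active version.
        active = inactive;
    }

    /** Reads a file from the active version, or returns null if it is absent. */
    public String read(String baseName) {
        return files.get(fileName(active, baseName));
    }

    public int storedFileCount() {
        return files.size();
    }
}
```

Because the previously active files are only deleted at the start of the next update, a complete sitemap stays readable throughout the process.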

For more information about sitemaps in general, see https://support.google.com/webmasters/answer/183668?hl=en

Build

mvn clean install (add -DskipTests to skip the unit tests during the build)

Run locally

You can run the application directly in your IDE (select 'Run' on the SitemapApplication class). For debugging purposes you can use the following URLs:

  • /files shows a list of stored files

  • /file?name=x shows the contents of the stored file with the name x

  • /record/index.xml and /entity/index.xml show the contents of the sitemap index files

Note that you can only run /record/update or /entity/update manually if you configure and provide an administrator API key, e.g. /record/update?wskey=<enter_adminkey_here>

Deployment

  1. Generate a Docker image using the project's Dockerfile

  2. Configure the application by generating a sitemap.user.properties file and placing this in the k8s folder. After deployment this file will override the settings specified in the sitemap.properties file located in the src/main/resources folder. The .gitignore file makes sure the .user.properties file is never committed.

  3. Configure the deployment by setting the proper environment variables specified in the configuration template files in the k8s folder

  4. Deploy to Kubernetes infrastructure

  5. To run a sitemap update, deploy the same image but add either record or entity as a command-line argument. This can also be deployed as a Kubernetes cron job.