Generates and publishes a record and entity sitemap for www.europeana.eu
The record sitemap is generated by connecting to a Mongo server and listing all records that meet a minimum content tier and metadata tier. The entity sitemap uses the search functionality of Entity-API to retrieve all entities used on the Europeana website.
For both, the generated sitemap consists of:
- multiple sitemap files containing record or entity URLs (45,000 per file for records, 20,000 per file for entities)
- a sitemap index file listing all the sitemap files
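The split into sitemap files plus an index can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name, method names and placeholder URLs are assumptions, and the sketch assumes URLs contain no characters that need XML escaping.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
class SitemapSplitSketch {

    /** Splits the URLs into consecutive chunks of at most maxPerFile entries. */
    static List<List<String>> partition(List<String> urls, int maxPerFile) {
        List<List<String>> files = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxPerFile) {
            int end = Math.min(start + maxPerFile, urls.size());
            files.add(new ArrayList<>(urls.subList(start, end)));
        }
        return files;
    }

    /** Builds a sitemap index with one <sitemap> entry per sitemap file URL. */
    static String buildIndex(List<String> sitemapFileUrls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : sitemapFileUrls) {
            // Assumption: url needs no XML escaping (no '&', '<', etc.)
            sb.append("  <sitemap><loc>").append(url).append("</loc></sitemap>\n");
        }
        sb.append("</sitemapindex>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            urls.add("https://example.org/item/" + i); // placeholder URLs
        }
        // 45,000 is the per-file limit mentioned above for records
        List<List<String>> files = partition(urls, 45_000);
        System.out.println(files.size() + " sitemap files");
    }
}
```

Keeping the per-file limit as a parameter lets the same logic serve both the record and the entity sitemap.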
To make sure there is always a sitemap available, we use blue/green versions of the sitemap files and keep track of which one is 'active'. At the start of the update process, all files of the inactive version are deleted. Then the new sitemap files are created and the active version is switched from blue to green or vice versa.
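The blue/green switch described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: an in-memory map stands in for the real file storage, and the class name, file naming scheme (sitemap-blue-1.xml, sitemap-green-1.xml) and method names are hypothetical rather than taken from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; storage and naming are assumptions. */
class BlueGreenSketch {

    enum Version {
        BLUE, GREEN;

        Version other() {
            return this == BLUE ? GREEN : BLUE;
        }
    }

    // file name -> content; an in-memory stand-in for the real storage
    private final Map<String, String> storage = new TreeMap<>();
    private Version active = Version.BLUE;

    private static String prefix(Version v) {
        return "sitemap-" + v.name().toLowerCase() + "-";
    }

    Version getActive() {
        return active;
    }

    /** Names of the files that belong to the currently active version. */
    List<String> activeFiles() {
        List<String> names = new ArrayList<>();
        for (String name : storage.keySet()) {
            if (name.startsWith(prefix(active))) {
                names.add(name);
            }
        }
        return names;
    }

    /** Replaces the inactive version with new files, then makes it active. */
    void update(List<String> newFileContents) {
        Version inactive = active.other();
        String inactivePrefix = prefix(inactive);

        // 1. Delete all files left over from the inactive version
        storage.keySet().removeIf(name -> name.startsWith(inactivePrefix));

        // 2. Write the new files under the inactive version's names
        for (int i = 0; i < newFileContents.size(); i++) {
            storage.put(inactivePrefix + (i + 1) + ".xml", newFileContents.get(i));
        }

        // 3. Switch only after every file has been written, so readers
        //    always see a complete set of sitemap files
        active = inactive;
    }
}
```

Because the active files are never touched during an update, a failed or interrupted update leaves the previously published sitemap intact.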
For more information about sitemaps in general see also https://support.google.com/webmasters/answer/183668?hl=en
mvn clean install
(add -DskipTests to skip the unit tests during the build)
You can run the application directly in your IDE (select 'Run' on the SitemapApplication class). For debugging purposes you can use the following URLs:
- /files shows a list of stored files
- /file?name=x shows the contents of the stored file with the name x
- /record/index.xml and /entity/index.xml show the contents of the sitemap index files
Note that you can only run /record/update or /entity/update manually if you configure and provide an administrator apikey, e.g. /record/update?wskey=<enter_adminkey_here>
- Generate a Docker image using the project's Dockerfile
- Configure the application by generating a sitemap.user.properties file and placing this in the k8s folder. After deployment this file will override the settings specified in the sitemap.properties file located in the src/main/resources folder. The .gitignore file makes sure the .user.properties file is never committed.
- Configure the deployment by setting the proper environment variables specified in the configuration template files in the k8s folder
- Deploy to Kubernetes infrastructure
- To run a sitemap update, deploy the same image but add either record or entity on the command-line. This can also be deployed as a Kubernetes cron job
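
The way sitemap.user.properties takes precedence over sitemap.properties can be illustrated with plain java.util.Properties. This is only a sketch of the precedence rule; how the application itself loads these files is not shown here and may differ, and the property keys used below are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative sketch only; keys and loading mechanism are assumptions. */
class PropertyOverrideSketch {

    /** Returns properties where user values win and defaults fill the gaps. */
    static Properties load(String defaultsText, String userText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Passing 'defaults' to the constructor means getProperty()
            // falls back to it whenever a key is missing in the user file
            Properties effective = new Properties(defaults);
            effective.load(new StringReader(userText));
            return effective;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of sitemap.properties and
        // sitemap.user.properties; the keys are hypothetical
        String defaultsText = "example.setting.a=default-a\nexample.setting.b=default-b\n";
        String userText = "example.setting.b=user-b\n";

        Properties p = load(defaultsText, userText);
        System.out.println(p.getProperty("example.setting.a"));
        System.out.println(p.getProperty("example.setting.b"));
    }
}
```

Only the keys present in the user file are overridden; every other setting keeps its default value.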