This repository is for experimental scripts to align books between HathiTrust, Internet Archive, Google Books, etc.
By "alignment", I mean that for a given volume in one repository, I want to try to find any matching volumes in the other repositories.
Ultimately, I want to be able to mash in a HT/IA/GB/etc. URL or other identifier and get a list of potential matches elsewhere on the web.
Requirements:
- make
- curl
- Ruby
The default make target should download and run everything.
WARNING: this currently produces about 4.3GB of output.
The book-aligner.rb script uses bulk metadata downloads from HathiTrust and the Internet Archive to find the complete set of identifiers that have any matching OCLC/LCCN/ISSN/ISBN identifier (~41M matches). These results are then filtered down to those that have a matching volume number or publication year.
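The two-stage idea (match on any shared identifier, then filter by volume number or year) can be sketched roughly as follows. This is an illustration only, not the actual book-aligner.rb code: the `VolumeRecord` struct, the `"type:value"` identifier strings, and all method names are assumptions made for the example, and the real bulk metadata files have their own layouts.

```ruby
# Hypothetical record shape, assumed for this sketch only.
VolumeRecord = Struct.new(:id, :identifiers, :volume, :year, keyword_init: true)

# Index records by every identifier string they carry (e.g. "oclc:111").
def index_by_identifier(records)
  index = Hash.new { |hash, key| hash[key] = [] }
  records.each do |record|
    record.identifiers.each { |identifier| index[identifier] << record }
  end
  index
end

# Stage 1: every (HT, IA) pair sharing at least one OCLC/LCCN/ISSN/ISBN identifier.
def candidate_pairs(ht_records, ia_records)
  ia_index = index_by_identifier(ia_records)
  ht_records.flat_map do |ht|
    ht.identifiers
      .flat_map { |identifier| ia_index.fetch(identifier, []) }
      .uniq
      .map { |ia| [ht, ia] }
  end
end

# Stage 2: keep a pair only if the volume number or the publication year agrees.
# A missing (nil) value on the HT side never counts as agreement.
def volume_or_year_match?(ht, ia)
  same_volume = !ht.volume.nil? && ht.volume == ia.volume
  same_year = !ht.year.nil? && ht.year == ia.year
  same_volume || same_year
end

def align(ht_records, ia_records)
  candidate_pairs(ht_records, ia_records)
    .select { |ht, ia| volume_or_year_match?(ht, ia) }
end
```

Building a hash index over one side avoids comparing every HT record against every IA record, which matters at the scale of tens of millions of candidate matches.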
Because there's no freely-available bulk metadata download for Google Books, we'll have to rely on the 1.1M associations we get for free from Internet Archive metadata.
The second component of this project is a GitHub Pages HTML frontend which includes a small JavaScript library that queries book-aligner.rb matches loaded into Fusion Tables. The code for this is in js/book-aligner.coffee.
Some examples of what I want for "matching volumes":
- Neue Jahrbücher für Philologie und Paedagogik, bd. 135. Note that the IA metadata does not record the volume number.
- Opvscvla academica collecta et animadversionibvs locvpletata, vol. 1. Note that if you search Google Books for the OCLC number 9772746 associated with this volume in IA/HT, it only returns vol. 5.
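
One way to reproduce an OCLC lookup like the one in the second example programmatically is to build a query against the public Google Books API volumes endpoint with its `oclc:` search keyword. This is a sketch under that assumption: the helper name `google_books_oclc_url` is hypothetical, the search described above may have been done through the Google Books web interface instead, and the snippet only constructs the URL without making a request.

```ruby
require 'uri'

# Build a Google Books API search URL for a given OCLC number.
# Hypothetical helper for illustration; it does not perform the HTTP request.
def google_books_oclc_url(oclc_number)
  query = URI.encode_www_form(q: "oclc:#{oclc_number}")
  "https://www.googleapis.com/books/v1/volumes?#{query}"
end

puts google_books_oclc_url('9772746')
```

The resulting URL can be fetched with curl or Net::HTTP to inspect which volumes Google Books associates with that OCLC number.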