Custom ArchivesSpace EAD Importer for MOMA EADs
This is an ArchivesSpace plugin and can be installed by following the ArchivesSpace plugin installation directions.
The plugin adds a new importer with the id "moma_ead_xml" to the application. The importer is a subclass of the standard EAD importer that ships with ArchivesSpace 1.0.9.
The custom importer does the following:
- Assign a level of 'file' to any component missing a level attribute.
- Use the 'eadid' tag to populate the id_0 field.
- Strip out 'unitdate' tags appearing in 'unittitle' tags when setting the resource or component title.
- Strip out 'lb' tags when creating extent records from 'physdesc' tags. Simplify the 'physdesc' parsing logic, since notes generated from 'physdesc' tags are not required.
- Set the 'indicator_1' attribute of 'container' records to 'BLANK' when it is not present, so that the records are valid.
- Default 'extent_type' to 'linear_feet' when missing so that records import.
- Default component titles to 'Untitled' when missing so that records import.
- Ignore empty 'corpname' tags.
- Ignore notes that have empty content so that records import.
- Use date labels from the source XML when present, rather than always using 'creation'.
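A minimal sketch of several of the rules above, written as standalone Ruby over plain hashes rather than ArchivesSpace records. The method names (`strip_unitdate`, `apply_component_defaults`) and the hash keys are hypothetical; the actual importer applies these rules inside the ArchivesSpace EAD converter. The cleanup of trailing separators after removing a date and the 'creation' fallback for date labels are assumptions.

```ruby
# Hypothetical sketch: plain Ruby hashes stand in for ArchivesSpace records.
# None of these method names come from ArchivesSpace itself.

# Drop <unitdate> elements from a <unittitle> fragment and return the text.
def strip_unitdate(unittitle_xml)
  text = unittitle_xml.gsub(%r{<unitdate\b[^>]*>.*?</unitdate>}m, '') # dates nested in the title
  text = text.gsub(%r{</?unittitle\b[^>]*>}, '')                      # the wrapping tags
  text = text.gsub(/\s+/, ' ').strip                                  # collapse whitespace
  text.sub(/[,;:\s]+\z/, '')                                          # assumption: drop dangling separators
end

# Fill in the default values described above so that records validate.
def apply_component_defaults(component)
  c = component.dup
  c[:level] ||= 'file'
  c[:title] = 'Untitled' if c[:title].to_s.strip.empty?
  # Missing extent types default to linear feet.
  c[:extents] = (c[:extents] || []).map { |e| { extent_type: 'linear_feet' }.merge(e.compact) }
  # Containers without an indicator get the placeholder 'BLANK'.
  c[:containers] = (c[:containers] || []).map { |k| { indicator_1: 'BLANK' }.merge(k.compact) }
  # Notes with no content are skipped entirely.
  c[:notes] = (c[:notes] || []).reject { |n| n[:content].to_s.strip.empty? }
  # Corporate names with no text are ignored.
  c[:corpnames] = (c[:corpnames] || []).reject { |n| n.to_s.strip.empty? }
  # Keep a date label from the source; fall back to 'creation' (assumption).
  c[:dates] = (c[:dates] || []).map { |d| { label: 'creation' }.merge(d.compact) }
  c
end
```

For example, `strip_unitdate('<unittitle>Correspondence, <unitdate>1950-1955</unitdate></unittitle>')` yields `"Correspondence"`. Merging each record over a hash of defaults means a value present in the source always wins.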
These customizations are specific to version 1.0.9 of ArchivesSpace and may not work with later versions.
This package also contains a stand-alone script that replaces HTML named character entities (such as &eacute;) with the numeric character references that XML parsers expect. To run the script:
./scripts/replace_entities.rb {directory_containing_eads} {blank_directory}
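The script's internals are not described here. The following is a minimal Ruby sketch of the general approach, assuming a regex pass over each file. The entity table, the `replace_entities` and `process_directory` names, and the restriction to `.xml` files are assumptions for illustration, not the contents of scripts/replace_entities.rb.

```ruby
# Hypothetical sketch of an entity-replacement pass; the entity table below is
# a small illustrative subset, not the list used by scripts/replace_entities.rb.
HTML_ENTITIES = {
  'nbsp' => 160, 'copy' => 169, 'eacute' => 233, 'uuml' => 252,
  'ndash' => 8211, 'mdash' => 8212, 'rsquo' => 8217,
  'ldquo' => 8220, 'rdquo' => 8221
}.freeze

# Entities XML already defines; these must be left as they are.
XML_PREDEFINED = %w[amp lt gt quot apos].freeze

# Replace known HTML named entities with numeric character references.
def replace_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do
    match = Regexp.last_match
    name = match[1]
    if XML_PREDEFINED.include?(name) || !HTML_ENTITIES.key?(name)
      match[0] # leave predefined and unknown entities untouched
    else
      "&##{HTML_ENTITIES[name]};"
    end
  end
end

# Mirror the script's usage: read EADs from one directory, write to another.
# Assumption: only files ending in .xml are processed.
def process_directory(src_dir, dest_dir)
  Dir.glob(File.join(src_dir, '*.xml')).each do |path|
    converted = replace_entities(File.read(path, encoding: 'UTF-8'))
    File.write(File.join(dest_dir, File.basename(path)), converted)
  end
end

process_directory(ARGV[0], ARGV[1]) if $PROGRAM_NAME == __FILE__ && ARGV.length == 2
```

Leaving unknown entities unchanged means the XML parser will still report them on import, which flags any entity missing from the table rather than silently dropping it.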