- Report of media handling procedures related to objects that are related to a group procedure
- For a given set of objects, change object hierarchy relationships to nonhierarchical relationships
- Export objectNumber, objectProductionDate, objectName, and namedCollection for all collectionobjects
- Publish all media handling procedure records to "All"
- Is an authority term used in a field in deleted object records?
- Copy value from old, single-valued client-custom field to new, multi-valued community profile field
- Set object publishTo value to public browser for conditionally selected set of objects
- Indexing errors related to
annotationdate
field
Note
|
This client’s profile has a descriptionlevel field in the object record.
|
Object selection criteria:
-
descriptionlevel
=item
orgroup of items
-
responsibledepartment
=archaeology
,archaeology-nagpra
, orethnographic
-
objectnumber
begins withA
First, we find all existing hierarchical relationships where the selected objects are children of other objects.
WITH depts AS (
SELECT id, STRING_AGG(item, '|') AS departments
FROM collectionobjects_common_responsibledepartments
GROUP BY id
),
children AS (
SELECT DISTINCT cc.id, cc.objectnumber, hier.name AS csid, cohc.descriptionlevel, depts.departments AS dept
FROM collectionobjects_common AS cc
INNER JOIN collectionobjects_ohc AS cohc ON
cc.id = cohc.id AND
(cohc.descriptionlevel = 'urn:cspace:cspace.client.org:vocabularies:name(descriptionlevel):item:name(item)''item'''
OR
cohc.descriptionlevel = 'urn:cspace:cspace.client.org:vocabularies:name(descriptionlevel):item:name(groupofitems1553529920283)''group of items'''
)
INNER JOIN collectionobjects_common_responsibledepartments AS rd ON
cc.id = rd.id AND
( rd.item = 'archaeology'
OR
rd.item = 'archaeology-nagpra'
OR
rd.item = 'ethnographic'
)
INNER JOIN misc ON misc.id = cc.id AND misc.lifecyclestate != 'deleted'
INNER JOIN hierarchy AS hier ON cc.id = hier.id
LEFT JOIN depts ON cc.id = depts.id
WHERE
cc.objectnumber LIKE 'A %'
)
SELECT children.objectnumber AS childobjnum,
REGEXP_REPLACE(children.descriptionlevel, $$.*'(.*)'$$, '\1', 'g') AS childdesclevel,
children.dept AS childdept,
rel.subjectcsid AS childcsid,
rel.relationshiptype,
cc.objectnumber AS parentobjnum,
REGEXP_REPLACE(cohc.descriptionlevel, $$.*'(.*)'$$, '\1', 'g') AS parentdesclevel,
depts.departments AS parentdept,
rel.objectcsid AS parentcsid
FROM relations_common rel
INNER JOIN children ON (
children.csid = rel.subjectcsid
AND
rel.relationshiptype = 'hasBroader'
)
INNER JOIN misc ON misc.id = rel.id AND misc.lifecyclestate != 'deleted'
INNER JOIN hierarchy hier ON rel.objectcsid = hier.name
INNER JOIN collectionobjects_common cc ON hier.id = cc.id
LEFT OUTER JOIN collectionobjects_ohc cohc ON cc.id = cohc.id
LEFT JOIN depts ON cc.id = depts.id
Then we find all existing non-hierarchical relationships where the selected objects are in a relationship with another object:
WITH depts AS (
SELECT id, STRING_AGG(item, '|') AS departments
FROM collectionobjects_common_responsibledepartments
GROUP BY id
),
children AS (
SELECT DISTINCT cc.id, cc.objectnumber, hier.name AS csid, cohc.descriptionlevel, depts.departments AS dept
FROM collectionobjects_common AS cc
INNER JOIN collectionobjects_ohc AS cohc ON
cc.id = cohc.id AND
(cohc.descriptionlevel = 'urn:cspace:cspace.client.org:vocabularies:name(descriptionlevel):item:name(item)''item'''
OR
cohc.descriptionlevel = 'urn:cspace:cspace.client.org:vocabularies:name(descriptionlevel):item:name(groupofitems1553529920283)''group of items'''
)
INNER JOIN collectionobjects_common_responsibledepartments AS rd ON
cc.id = rd.id AND
( rd.item = 'archaeology'
OR
rd.item = 'archaeology-nagpra'
OR
rd.item = 'ethnographic'
)
INNER JOIN misc ON misc.id = cc.id AND misc.lifecyclestate != 'deleted'
INNER JOIN hierarchy AS hier ON cc.id = hier.id
LEFT JOIN depts ON cc.id = depts.id
WHERE
cc.objectnumber LIKE 'A %'
)
SELECT children.objectnumber AS childobjnum,
REGEXP_REPLACE(children.descriptionlevel, $$.*'(.*)'$$, '\1', 'g') AS childdesclevel,
children.dept AS childdept,
rel.subjectcsid AS childcsid,
rel.relationshiptype,
cc.objectnumber AS parentobjnum,
REGEXP_REPLACE(cohc.descriptionlevel, $$.*'(.*)'$$, '\1', 'g') AS parentdesclevel,
depts.departments AS parentdept,
rel.objectcsid AS parentcsid
FROM relations_common rel
INNER JOIN children ON (
children.csid = rel.subjectcsid
AND
rel.relationshiptype = 'affects'
)
INNER JOIN misc ON misc.id = rel.id AND misc.lifecyclestate != 'deleted'
INNER JOIN hierarchy hier ON rel.objectcsid = hier.name
INNER JOIN collectionobjects_common cc ON hier.id = cc.id
LEFT OUTER JOIN collectionobjects_ohc cohc ON cc.id = cohc.id
LEFT JOIN depts ON cc.id = depts.id
We don’t want to needlessly recreate nonhierarchical relations that already exist, so we throw the results of our queries in separate sheets of an Excel workbook for some quick and dirty formula fun.
-
create a
relid
for each row by concatenatingparentobjnum
andchildobjnum
-
use
XMATCH
to mark rows inexistinghier
that are not inexistingnonhier
, comparingrelid
-
remove the marked rows
We use the remaining rows to populate the nonhierarchicalrelationship CSV template, to import our new nonhierarchical relationships into CS.
We use the final report from the ingest process to populate the objecthierarchy CSV template with data only from successfully created non-hierarchical relationships, so we can ingest the hierarchical relationships for delete.
We can use a similar process (via wrangling in Excel) to identify pairs of objects for which there are both hierarchical and non-hierarchical relationships, and use that to populate an objecthierarchy CSV template for deletes.
The queries can be run again after the work has been done to identify stragglers and issues.
Export objectNumber, objectProductionDate, objectName, and namedCollection for all collectionobjects
-
if there are multiple values in the objectProductionDate, objectName, or namedCollection fields, take only the value that appears first in the CS record
-
return the dateDisplayDate element for the date field
-
return the first termDisplayName element from the Work authority linked to in namedCollection
Warning
|
Somewhat dangerous because it assumes unique objectNumber values across objects. This is not enforced by CS, though I knew it was the case in my data set. It also does not remove any items marked as deleted. |
WITH objects AS (
SELECT
hier.id AS objid,
common.objectnumber AS objnum
from collectionobjects_common common
inner join hierarchy hier on hier.id = common.id
inner join misc on misc.id = hier.id AND misc.lifecyclestate != 'deleted'
inner join collectionspace_core core on core.id = misc.id
), prod_date_objs AS (
SELECT DISTINCT ON (objects.objnum) objects.objnum, sdg.datedisplaydate AS date
FROM hierarchy hier
INNER JOIN structureddategroup sdg ON hier.id = sdg.id
INNER JOIN objects ON hier.parentid = objects.objid
WHERE hier.name = 'collectionobjects_common:objectProductionDateGroupList'
ORDER BY objects.objnum ASC, hier.pos ASC
), named_coll AS (
SELECT objects.objnum, wt.termdisplayname AS coll
FROM objects
INNER JOIN collectionobjects_common_namedcollections nc ON nc.id = objects.objid
INNER JOIN works_common wc ON nc.item = wc.refname AND nc.pos = 0
INNER JOIN hierarchy hier ON hier.parentid = wc.id AND hier.pos = 0
INNER JOIN worktermgroup wt ON hier.id = wt.id
), obj_names AS (
SELECT DISTINCT ON (objects.objnum) objects.objnum, ong.objectname AS objname
FROM hierarchy hier
INNER JOIN objectnamegroup ong ON hier.id = ong.id
INNER JOIN objects ON hier.parentid = objects.objid
WHERE hier.name = 'collectionobjects_common:objectNameList'
ORDER BY objects.objnum ASC, hier.pos ASC
)
SELECT objects.objnum, pdo.date, nc.coll, ong.objname
FROM objects
LEFT OUTER JOIN prod_date_objs pdo ON pdo.objnum = objects.objnum
LEFT OUTER JOIN named_coll nc ON nc.objnum = objects.objnum
LEFT OUTER JOIN obj_names ong ON ong.objnum = objects.objnum
-
This only handles adding a publishTo value to fields that have nothing in that field.
SELECT COUNT(mh.id)
FROM media_common mh
LEFT JOIN media_common_publishtolist pt ON mh.id = pt.id
INNER JOIN misc ON misc.id = mh.id AND misc.lifecyclestate != 'deleted'
WHERE pt.item = E'urn:cspace:{INSTANCEDOMAIN}:vocabularies:name(publishto):item:name(all)\'All\''
AND mh.identificationnumber IS NOT NULL
SELECT COUNT(mh.id)
FROM media_common mh
LEFT JOIN media_common_publishtolist pt ON mh.id = pt.id
INNER JOIN misc ON misc.id = mh.id AND misc.lifecyclestate != 'deleted'
WHERE pt.id IS NULL
AND mh.identificationnumber IS NOT NULL
WITH not_published AS (
SELECT mh.id
FROM media_common mh
LEFT JOIN media_common_publishtolist pt ON mh.id = pt.id
INNER JOIN misc ON misc.id = mh.id AND misc.lifecyclestate != 'deleted'
WHERE pt.id IS NULL
AND mh.identificationnumber IS NOT NULL
), to_insert AS (
SELECT not_published.id,
CAST(0 AS integer) AS pos,
CAST(E'urn:cspace:{INSTANCEDOMAIN}:vocabularies:name(publishto):item:name(all)\'All\'' AS text) AS item
FROM not_published
)
INSERT INTO media_common_publishtolist
select * from to_insert
Client could not delete an Archeological Culture Concept authority record. They were getting an error about the term being used in records, but the "Used In" listing in the term record and searching the CS application showed that no records were using the term.
We wanted to verify that there were hidden "deleted" object records lurking in the system, preventing term deletion.
This was made easier by the fact that only one field in the object record would have used terms from this particular authority.
select oppg.id, h.parentid, m.lifecyclepolicy, m.lifecyclestate from objectproductionpeoplegroup oppg
inner join hierarchy h
on h.id = oppg.id
inner join misc m
on h.parentid = m.id
where oppg.objectproductionpeople = 'urn:cspace:{INSTANCEDOMAIN}:conceptauthorities:name(archculture):item:name(Prehistoric1558556422893)''Pre-Contact **DELETE - DO NOT USE **'''
-
implemented a single value namedCollection field, populated by Work authority, for a client
-
later added a multi-value namedCollections field, also populated by Work authority, to community profiles
-
to get the client closer to community profile and fix confusion about apparent duplicate fields in search/import/export forms, we needed to migrate their data from the old field to the new field
SELECT COUNT(coo.id) FROM public.collectionobjects_ohc coo
WHERE coo.namedcollection IS NOT NULL
SELECT COUNT(coc.id) FROM public.collectionobjects_common_namedcollections coc
WHERE coc.item IS NOT NULL
SELECT COUNT(coo.id) FROM public.collectionobjects_ohc coo
LEFT JOIN
public.collectionobjects_common_namedcollections con
ON con.id = coo.id
WHERE
con.id IS NULL
AND coo.namedcollection IS NOT NULL
INSERT INTO public.collectionobjects_common_namedcollections (id, pos, item)
SELECT coo.id, 0, coo.namedcollection FROM public.collectionobjects_ohc coo
LEFT JOIN
public.collectionobjects_common_namedcollections con
ON con.id = coo.id
WHERE
con.id IS NULL
AND coo.namedcollection IS NOT NULL
Only update objects where:
-
object does not have a 'deleted' workflow state
-
objectNumber starts with A, H, or N, and
-
inventoryStatus is NOT deaccessioned, transferred, missing, or stolen, and
-
publishTo value (is set, but is not NAGPRA or Do not publish) OR (publishTo value is not set)
Tip
|
"publishTo value is set" means there is at least one row in collectionobjects_common_publishtolist with id matching to collectionobjects_common id . There can be rows for objects in collectionobjects_common_publishtolist with null item field. These should be updated with the new value.
|
This project needed two separate actions: update existing publishTo values meeting criteria, and inserting new publishTo values into records meeting other criteria but having no publishTo values
UPDATE collectionobjects_common_publishtolist
SET item = 'urn:cspace:core.collectionspace.org:vocabularies:name(publishto):item:name(cspacepub)''CollectionSpace Public Browser'''
WHERE id IN (
SELECT coc_pub.id
FROM collectionobjects_common AS coc
LEFT JOIN collectionobjects_common_inventorystatuslist AS coc_inv ON (coc.id = coc_inv.id)
LEFT JOIN collectionobjects_common_publishtolist AS coc_pub ON (coc.id = coc_pub.id)
INNER JOIN misc ON misc.id = coc.id AND misc.lifecyclestate != 'deleted'
WHERE
(coc.objectnumber LIKE 'A%' OR coc.objectnumber LIKE 'H%' OR coc.objectnumber LIKE 'N%')
AND
(
(coc_inv.item NOT LIKE '%deaccessioned%' OR coc_inv.item NOT LIKE '%transferred%'OR coc_inv.item NOT LIKE '%missing%' OR coc_inv.item NOT LIKE '%stolen%')
OR
coc_inv.item IS NULL)
AND
coc_pub.id IS NOT NULL
AND
(
coc_pub.item IS NULL
OR
(coc_pub.item NOT LIKE '%NAGPRA%' AND coc_pub.item NOT LIKE '%Do not publish%')
)
);
WITH not_published AS (
SELECT DISTINCT coc.id FROM collectionobjects_common AS coc
LEFT JOIN collectionobjects_common_inventorystatuslist AS coc_inv ON (coc.id = coc_inv.id)
LEFT JOIN collectionobjects_common_publishtolist AS coc_pub ON (coc.id = coc_pub.id)
INNER JOIN misc ON misc.id = coc.id AND misc.lifecyclestate != 'deleted'
WHERE (coc.objectnumber LIKE 'A%' OR coc.objectnumber LIKE 'H%' OR coc.objectnumber LIKE 'N%')
AND
(
(coc_inv.item NOT LIKE '%deaccessioned%' OR coc_inv.item NOT LIKE '%transferred%'OR coc_inv.item NOT LIKE '%missing%' OR coc_inv.item NOT LIKE '%stolen%')
OR coc_inv.item IS NULL)
AND coc_pub.id IS NULL
), to_insert AS (
SELECT not_published.id,
CAST(0 AS integer) AS pos,
CAST('urn:cspace:core.collectionspace.org:vocabularies:name(publishto):item:name(cspacepub)''CollectionSpace Public Browser''' AS text) AS item
FROM not_published
)
INSERT INTO collectionobjects_common_publishtolist
SELECT * from to_insert
SELECT COUNT(coc.id) FROM collectionobjects_common AS coc
LEFT JOIN collectionobjects_common_publishtolist AS coc_pub ON (coc.id = coc_pub.id)
INNER JOIN misc ON misc.id = coc.id AND misc.lifecyclestate != 'deleted'
WHERE coc_pub.item LIKE '%Public Browser%'
Upon re-indexing, logs showed errors like:
org.nuxeo.ecm.core.api.NuxeoException: Could not select: SELECT "id", "annotationauthor", "annotationdate", "annotationtype", "annotationnote" FROM "annotationgroup" WHERE "id" IN (?)
Caused by: org.postgresql.util.PSQLException: Bad value for type timestamp/date/time: {1}
Caused by: java.lang.NumberFormatException: Expected time to be colon-separated, got '/'
annotationgroup
where annotationdate
value contains /
select * from annotationgroup
where annotationdate LIKE '%/%'
For the client, generate a report of object numbers, previous invalid annotationdate
values, and what you changed them to. The following will give you the first two pieces of that.
select cc.objectnumber, ag.annotationdate from annotationgroup ag
inner join hierarchy h on ag.id = h.id
inner join hierarchy hh on h.parentid = hh.id
inner join collectionobjects_common cc on hh.id = cc.id
where ag.annotationdate LIKE '%/%'
You need a list of the object record CSIDs where this will be fixed, so that only those can be re-indexed.
annotationgroup
entries with annotationdate
values containing /
SELECT HH.NAME AS CSID,
AG.ID AS GROUPID
FROM ANNOTATIONGROUP AG
LEFT JOIN HIERARCHY H ON AG.ID = H.ID
LEFT JOIN HIERARCHY HH ON H.PARENTID = HH.ID
WHERE ANNOTATIONDATE LIKE '%/%';