Querying and SPARQL update

This repository covers the content of the above titled workshop at IslandoraCon 2017.

Outline

Introduction

Fedora (version 3) provided a built-in semantic store to hold information about Fedora resources. This is commonly referred to as the Resource Index.

RDF, Triplestores and SPARQL, oh my!

The RELS-EXT datastream on a Fedora resource contains an RDF description of the relationships between that resource and others. It can also contain other information about that resource.

Those statements are also added to the Resource Index, which in a standard Islandora and Fedora installation is Mulgara.

Mulgara is a triplestore, also called an RDF store.

http://www.linkeddatatools.com/introducing-rdf

For most types of data, some elements are treated as more important than others. For example, in a relational database you might store your data in various tables and link them by primary key, or in an XML document you might use the hierarchy to infer importance. But in a triplestore there is no inherent importance or hierarchy: anything can be linked to anything else.

So why is it called a triplestore? Because it stores triples.

What's a triple?

A triple is the atomic unit of data in the semantic web; just like a row was the atomic unit of data in the old world, the triple is the row of the new world. -- https://data-gov.tw.rpi.edu/wiki/Triples

A triple is a unit of data that has the form SUBJECT PREDICATE OBJECT.

The Subject is what the statement is about, the Predicate is the property or relationship being stated, and the Object is the value.

The Flintstones example by OntoText

The full graph from the above example could be

Subject Predicate Object
:Fred :hasAge 25
:Fred :hasSpouse :Wilma
:Fred :livesIn :Bedrock
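To make the shape concrete, here is a minimal Python sketch (a toy, not a real triplestore) that stores the three statements above as (subject, predicate, object) tuples, with a graph being simply a set of them. The helper name `objects` is made up for illustration.

```python
# A triple is a (subject, predicate, object) tuple; a graph is a set of triples.
graph = {
    (":Fred", ":hasAge", 25),
    (":Fred", ":hasSpouse", ":Wilma"),
    (":Fred", ":livesIn", ":Bedrock"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(objects(graph, ":Fred", ":livesIn"))  # {':Bedrock'}

# Because a graph is a set, stating the same fact twice adds nothing.
graph.add((":Fred", ":hasAge", 25))
print(len(graph))  # 3
```

Nothing in a tuple makes Fred more important than Bedrock; any term can just as well be the subject of another triple.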

For example, a photo of Louis Riel from the UM DigitalCollections site has the following RELS-EXT:

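<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:islandora="http://islandora.ca/ontology/relsext#">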
  <rdf:Description rdf:about="info:fedora/uofm:9333">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/uofm:riel"/>
    <fedora-model:hasModel rdf:resource="info:fedora/uofm:highResImage"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:compoundCModel"/>
    <islandora:isManageableByUser>xxxx</islandora:isManageableByUser>
    <islandora:isManageableByRole>yyyy</islandora:isManageableByRole>
  </rdf:Description>
</rdf:RDF>

So your Resource Index will contain the following triples:

Subject Predicate Object
info:fedora/uofm:9333 fedora:isMemberOfCollection info:fedora/uofm:riel
info:fedora/uofm:9333 fedora-model:hasModel info:fedora/islandora:compoundCModel
info:fedora/uofm:9333 fedora-model:hasModel info:fedora/uofm:highResImage
info:fedora/uofm:9333 islandora:isManageableByUser xxxx
info:fedora/uofm:9333 islandora:isManageableByRole yyyy
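The step from RELS-EXT to these rows can be sketched with Python's standard `xml.etree.ElementTree`. This is a simplified reader that only understands the flat rdf:Description shape shown above, not a general RDF/XML parser (Fedora and Mulgara do the real work). It assumes the usual Islandora namespace declarations on rdf:RDF, and the `rels_ext_triples` and `shorten` helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Namespace URIs mapped back to the short prefixes used in the table above.
PREFIXES = {
    "info:fedora/fedora-system:def/relations-external#": "fedora",
    "info:fedora/fedora-system:def/model#": "fedora-model",
    "http://islandora.ca/ontology/relsext#": "islandora",
}

RELS_EXT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
    xmlns:fedora-model="info:fedora/fedora-system:def/model#"
    xmlns:islandora="http://islandora.ca/ontology/relsext#">
  <rdf:Description rdf:about="info:fedora/uofm:9333">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/uofm:riel"/>
    <fedora-model:hasModel rdf:resource="info:fedora/uofm:highResImage"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:compoundCModel"/>
    <islandora:isManageableByUser>xxxx</islandora:isManageableByUser>
    <islandora:isManageableByRole>yyyy</islandora:isManageableByRole>
  </rdf:Description>
</rdf:RDF>"""


def shorten(tag):
    """Turn ElementTree's '{namespace}local' tag into 'prefix:local'."""
    ns, local = tag[1:].split("}")
    prefix = PREFIXES.get(ns)
    return f"{prefix}:{local}" if prefix else ns + local


def rels_ext_triples(xml_text):
    """Flatten each rdf:Description into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # rdf:resource means the object is another resource; otherwise use the text literal.
            obj = child.get(f"{{{RDF_NS}}}resource", child.text)
            triples.append((subject, shorten(child.tag), obj))
    return triples


for triple in rels_ext_triples(RELS_EXT):
    print(*triple)
```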

What's a graph?

An RDF graph is a set of RDF triples. -- https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph

For our purposes, a graph is the FROM <#ri> we see in all our Sparql queries. Fedora uses this default graph to store its information.

You can have multiple graphs in a triplestore, and you can query them independently of each other or across graphs. But that is a lesson for another day.


RDF and shared ontologies

The benefit of RDF is that one can share their ontology with the world or borrow from (and build upon) another's work.

You can go it alone and create your own ontology; that is perfectly allowed. For our Flintstones we might have this dataset (Flintstones v1).

Our custom ontology defines the relationships between people and between people and places.

But other people will have data with these same kinds of relationships, and you can benefit from sharing a vocabulary with them. For instance, the FOAF (Friend Of A Friend) ontology "... is a project devoted to linking people and information using the Web."

Our Bedrock family members are all "people", and FOAF has a class for that:

The Person class represents people. Something is a Person if it is a person. We don't nitpic about whether they're alive, dead, real, or imaginary.

We can extend our data by defining each entry as a foaf:Person. This adds more context to our data (Flintstones v2).

Not bad, but we have used our own vocabulary to define the hasSpouse and hasChild relationships. I'm sure someone else somewhere has defined those relationships.

The relationship ontology already defines them, so we don't have to reinvent the wheel and can use its http://purl.org/vocab/relationship/spouseOf and http://purl.org/vocab/relationship/parentOf predicates.

Our data still provides the same information, but now uses some shared predicates. I also threw Mr. Slate in, as he doesn't have any children (Flintstones v3).

Sparql

The W3C spec for Sparql 1.1 is split into several separate specifications. We'll concentrate on Query here and Update below.

The four forms of Sparql query are SELECT, CONSTRUCT, ASK and DESCRIBE.

I'm using this large Flintstones dataset for the examples below.


Predicates and Prefixes

One of the things to remember when writing any Sparql queries is to always define your prefixes.

We've become spoiled because Fedora has a set of prefixes that are recognized by default. These are:

Prefix Namespace
test http://example.org/terms#
fedora-rels-ext info:fedora/fedora-system:def/relations-external#
fedora info:fedora/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
fedora-model info:fedora/fedora-system:def/model#
fedora-view info:fedora/fedora-system:def/view#
mulgara http://mulgara.org/mulgara#
dc http://purl.org/dc/elements/1.1/
xml-schema http://www.w3.org/2001/XMLSchema#

So many of our Islandora queries appear in code like this.

The internally registered prefixes let you skip declaring them in your query. However, because this only works with Mulgara, you run into trouble if you switch to a different triplestore.

You can avoid that problem by declaring every prefix you use in your query, so the above example becomes:

  PREFIX fedora-rels-ext: <info:fedora/fedora-system:def/relations-external#>
  PREFIX fedora-model: <info:fedora/fedora-system:def/model#>
  PREFIX fedora-view: <info:fedora/fedora-system:def/view#>
  PREFIX islandora-rels-ext: <http://islandora.ca/ontology/relsext#>
  SELECT ?pid ?page ?label ?width ?height
  FROM <#ri>
  WHERE {
    ?pid fedora-rels-ext:isMemberOf <info:fedora/{$object->id}> ;
         fedora-model:label ?label ;
         islandora-rels-ext:isSequenceNumber ?page ;
         fedora-model:state fedora-model:Active .
    OPTIONAL {
      ?pid fedora-view:disseminates ?dss .
      ?dss fedora-view:disseminationType <info:fedora/*/JP2> ;
           islandora-rels-ext:width ?width ;
           islandora-rels-ext:height ?height .
    }
  }
  ORDER BY ?page

Now this query no longer relies on Mulgara's built-in prefixes, so it should work on any triplestore that supports Sparql.
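Declaring prefixes works because a prefixed name is only shorthand: the parser swaps the prefix for its namespace IRI. The toy Python sketch below shows that expansion; the regular expression is a simplification of the real Sparql grammar, and the helper names are illustrative rather than taken from any triplestore. It also shows why an undeclared prefix such as fedora-view: becomes an error once you can't rely on Mulgara's built-in list.

```python
import re

QUERY = """
PREFIX fedora-model: <info:fedora/fedora-system:def/model#>
PREFIX islandora-rels-ext: <http://islandora.ca/ontology/relsext#>
SELECT ?pid WHERE { ?pid fedora-model:state fedora-model:Active }
"""

# Simplified match for "PREFIX name: <iri>" (keywords are case-insensitive in Sparql).
PREFIX_RE = re.compile(r"PREFIX\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>", re.IGNORECASE)


def declared_prefixes(query):
    """Return {prefix: namespace IRI} for every PREFIX declaration in the query."""
    return {name: iri for name, iri in PREFIX_RE.findall(query)}


def expand(prefixed_name, prefixes):
    """Expand 'prefix:local' to a full IRI, failing loudly on an undeclared prefix."""
    prefix, local = prefixed_name.split(":", 1)
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r}")
    return prefixes[prefix] + local


prefixes = declared_prefixes(QUERY)
print(expand("fedora-model:Active", prefixes))
# info:fedora/fedora-system:def/model#Active

try:
    expand("fedora-view:disseminates", prefixes)  # never declared in QUERY
except KeyError as err:
    print(err)
```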


Select

This is the form most people are comfortable with.

You bind values to variables and return those variables.

An example (using the dataset above) would be to find all of Pebbles' children:

PREFIX ff: <test:flintstones#> 
PREFIX vocab: <http://purl.org/vocab/relationship/>

SELECT ?kids WHERE {
   ff:Pebbles vocab:parentOf ?kids .
}

This will return:

?kids
<test:flintstones#Chip>
<test:flintstones#Roxy>



We know that the vocab:parentOf predicate defines a parent-to-child relationship: the parent is ALWAYS the subject and the child is ALWAYS the object. That makes it easy to reverse the query and find a child's parents using the same data.

PREFIX ff: <test:flintstones#> 
PREFIX vocab: <http://purl.org/vocab/relationship/>

SELECT ?parents WHERE {
   ?parents vocab:parentOf ff:Chip .
}

This returns:

?parents
<test:flintstones#Bamm-Bamm>
<test:flintstones#Pebbles>



What about following the relationships further?

PREFIX ff: <test:flintstones#> 
PREFIX vocab: <http://purl.org/vocab/relationship/>

SELECT ?grandChild WHERE {
   ff:Fred vocab:parentOf ?kid .
   ?kid vocab:parentOf ?grandChild .
}

So here we:

  1. found all the children of Fred and assigned them to a variable (?kid).
  2. used the values in the variable ?kid in another match to find all children of those people.

This lets us follow two generations, finding grandchildren from their grandparents. The result is:

?grandChild
<test:flintstones#Chip>
<test:flintstones#Roxy>
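The two-step matching above is a join on the shared ?kid variable. Here is a minimal Python sketch of that idea, assuming a hand-typed subset of the Flintstones data reconstructed from the results shown in this README. The `extend` and `select` helpers are hypothetical and evaluate the patterns strictly left to right; a real query engine may reorder them.

```python
PARENT_OF = "vocab:parentOf"

# Hand-typed subset of the Flintstones data, reconstructed from the results above.
graph = {
    ("ff:Ed", PARENT_OF, "ff:Fred"), ("ff:Edna", PARENT_OF, "ff:Fred"),
    ("ff:Pearl", PARENT_OF, "ff:Wilma"), ("ff:Ricky", PARENT_OF, "ff:Wilma"),
    ("ff:Fred", PARENT_OF, "ff:Pebbles"), ("ff:Wilma", PARENT_OF, "ff:Pebbles"),
    ("ff:Barney", PARENT_OF, "ff:Bamm-Bamm"), ("ff:Betty", PARENT_OF, "ff:Bamm-Bamm"),
    ("ff:Pebbles", PARENT_OF, "ff:Chip"), ("ff:Pebbles", PARENT_OF, "ff:Roxy"),
    ("ff:Bamm-Bamm", PARENT_OF, "ff:Chip"), ("ff:Bamm-Bamm", PARENT_OF, "ff:Roxy"),
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def extend(binding, pattern, triple):
    """Return a copy of binding extended so pattern matches triple, or None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:  # variable already bound to another value
                return None
        elif p != t:  # constant term does not match
            return None
    return new


def select(patterns, graph):
    """Match patterns left to right; shared variables must agree (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for old in bindings:
            for triple in graph:
                new = extend(old, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


# SELECT ?grandChild WHERE { ff:Fred vocab:parentOf ?kid . ?kid vocab:parentOf ?grandChild }
rows = select([("ff:Fred", PARENT_OF, "?kid"), ("?kid", PARENT_OF, "?grandChild")], graph)
print(sorted(row["?grandChild"] for row in rows))  # ['ff:Chip', 'ff:Roxy']

# Reversed direction: SELECT ?parents WHERE { ?parents vocab:parentOf ff:Chip }
print(sorted(row["?parents"] for row in select([("?parents", PARENT_OF, "ff:Chip")], graph)))
# ['ff:Bamm-Bamm', 'ff:Pebbles']
```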

Construct

Construct returns a single RDF graph, built by filling in a template with the variable values from each solution of the WHERE clause.

For instance, what if, instead of just finding out who Fred's grandchildren are, we want to create an explicit grandparent relationship?

PREFIX ff: <test:flintstones#> 
PREFIX vocab: <http://purl.org/vocab/relationship/>

CONSTRUCT {
  ?grandparent ff:grandparentOf ?grandchild
} WHERE {
  ?grandparent vocab:parentOf ?kid .
  ?kid vocab:parentOf ?grandchild .
}

So here we build our own graph of grandparentOf statements, one for each person who has a child who has a child.

subject predicate object
<test:flintstones#Barney> <test:flintstones#grandparentOf> <test:flintstones#Chip>
<test:flintstones#Betty> <test:flintstones#grandparentOf> <test:flintstones#Chip>
<test:flintstones#Barney> <test:flintstones#grandparentOf> <test:flintstones#Roxy>
<test:flintstones#Betty> <test:flintstones#grandparentOf> <test:flintstones#Roxy>
<test:flintstones#Ed> <test:flintstones#grandparentOf> <test:flintstones#Pebbles>
<test:flintstones#Edna> <test:flintstones#grandparentOf> <test:flintstones#Pebbles>
<test:flintstones#Fred> <test:flintstones#grandparentOf> <test:flintstones#Chip>
<test:flintstones#Wilma> <test:flintstones#grandparentOf> <test:flintstones#Chip>
<test:flintstones#Fred> <test:flintstones#grandparentOf> <test:flintstones#Roxy>
<test:flintstones#Wilma> <test:flintstones#grandparentOf> <test:flintstones#Roxy>
<test:flintstones#Pearl> <test:flintstones#grandparentOf> <test:flintstones#Pebbles>
<test:flintstones#Ricky> <test:flintstones#grandparentOf> <test:flintstones#Pebbles>

So here we have entries for Fred, Wilma, Barney & Betty -> Chip & Roxy. We also have entries from Ed & Edna (Fred's parents) and Ricky & Pearl (Wilma's parents) to their grandchild Pebbles.

Notice that Mr. Slate does not appear, as he doesn't match our WHERE clause.
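CONSTRUCT is the same matching step followed by template substitution. The sketch below repeats the toy matcher so it runs on its own, again over a subset of the data reconstructed from the results in this README. The `construct` helper is hypothetical; like the spec, it skips template triples that would contain an unbound variable and merges duplicates by set union.

```python
PARENT_OF = "vocab:parentOf"

# Hand-typed subset of the Flintstones data, reconstructed from the results above.
graph = {
    ("ff:Ed", PARENT_OF, "ff:Fred"), ("ff:Edna", PARENT_OF, "ff:Fred"),
    ("ff:Pearl", PARENT_OF, "ff:Wilma"), ("ff:Ricky", PARENT_OF, "ff:Wilma"),
    ("ff:Fred", PARENT_OF, "ff:Pebbles"), ("ff:Wilma", PARENT_OF, "ff:Pebbles"),
    ("ff:Barney", PARENT_OF, "ff:Bamm-Bamm"), ("ff:Betty", PARENT_OF, "ff:Bamm-Bamm"),
    ("ff:Pebbles", PARENT_OF, "ff:Chip"), ("ff:Pebbles", PARENT_OF, "ff:Roxy"),
    ("ff:Bamm-Bamm", PARENT_OF, "ff:Chip"), ("ff:Bamm-Bamm", PARENT_OF, "ff:Roxy"),
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def extend(binding, pattern, triple):
    """Return a copy of binding extended so pattern matches triple, or None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def select(patterns, graph):
    """Match patterns left to right, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for triple in graph
                    if (new := extend(old, pattern, triple)) is not None]
    return bindings


def construct(template, patterns, graph):
    """Fill the template in once per solution and union the resulting triples."""
    result = set()
    for binding in select(patterns, graph):
        for pattern in template:
            triple = tuple(binding.get(term, term) for term in pattern)
            if not any(is_var(term) for term in triple):  # skip unbound variables
                result.add(triple)
    return result


grandparents = construct(
    template=[("?grandparent", "ff:grandparentOf", "?grandchild")],
    patterns=[("?grandparent", PARENT_OF, "?kid"), ("?kid", PARENT_OF, "?grandchild")],
    graph=graph,
)
print(len(grandparents))  # 12
for triple in sorted(grandparents):
    print(*triple)
```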


Ask

An Ask query tests whether or not a query pattern has a solution. It returns "true" or "false". The benefit is that the triplestore can stop as soon as the pattern matches once, instead of waiting for it to resolve all possible matches.

So to find out if anyone is between the ages of 20 and 22 (inclusive), we can do:

PREFIX ff: <test:flintstones#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ASK {
  ?person ff:hasAge ?age .
  FILTER(?age >= "20"^^xsd:integer && ?age <= "22"^^xsd:integer)
}
false



OK, how about 40 and 42? (Hint: Betty was 41.)

PREFIX ff: <test:flintstones#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ASK {
  ?person ff:hasAge ?age .
  FILTER(?age >= "40"^^xsd:integer && ?age <= "42"^^xsd:integer)
}

true

This doesn't tell us anything about the matching records, except that they exist. In exchange it can be more efficient, since the triplestore is free to stop at the first match; it only has to keep searching while nothing has matched yet.
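An ASK behaves much like Python's `any()` over the candidate matches: it can stop at the first hit. In this sketch the data holds only the three ages mentioned in this README (Betty 41, Barney and Wilma 44), not the full dataset, and `ask_age_between` is a hypothetical helper.

```python
AGES = {
    ("ff:Betty", "ff:hasAge", 41),
    ("ff:Barney", "ff:hasAge", 44),
    ("ff:Wilma", "ff:hasAge", 44),
}


def ask_age_between(graph, low, high):
    """Like ASK { ?person ff:hasAge ?age . FILTER(?age >= low && ?age <= high) }."""
    # any() stops pulling from the generator as soon as one triple matches.
    return any(p == "ff:hasAge" and low <= o <= high for _, p, o in graph)


print(ask_age_between(AGES, 20, 22))  # False
print(ask_age_between(AGES, 40, 42))  # True
```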


Describe

Describe is useful for seeing information about the resources returned by your query.

A simple example is

PREFIX ff: <test:flintstones#>
DESCRIBE ff:Wilma

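In this triplestore, that returns all triples that have Wilma as the subject or object. (The Sparql spec leaves the exact contents of a DESCRIBE result up to the triplestore.)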

subject predicate object
<test:flintstones#Fred> <http://purl.org/vocab/relationship/spouseOf> <test:flintstones#Wilma>
<test:flintstones#Pearl> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Wilma>
<test:flintstones#Ricky> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Wilma>
<test:flintstones#Wilma> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Pebbles>
<test:flintstones#Wilma> <http://purl.org/vocab/relationship/spouseOf> <test:flintstones#Fred>
<test:flintstones#Wilma> <test:flintstones#hasAge> 44
<test:flintstones#Wilma> <test:flintstones#livesIn> <test:flintstones#Bedrock>
<test:flintstones#Wilma> rdf:type foaf:Person
<test:flintstones#Wilma> foaf:familyName Slaghoople
<test:flintstones#Wilma> foaf:givenName Pebbles
<test:flintstones#Wilma> foaf:givenName Wilma
<test:flintstones#Wilma> foaf:name Wilma Flintstone

Another example is if we do a query to find the "people" who are 44 years old.

PREFIX ff: <test:flintstones#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?x
WHERE {
  ?x ff:hasAge "44"^^xsd:integer
}

We would get back

?x
<test:flintstones#Barney>
<test:flintstones#Wilma>



But if we changed SELECT to DESCRIBE...

PREFIX ff: <test:flintstones#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DESCRIBE ?x
WHERE {
  ?x ff:hasAge "44"^^xsd:integer
}

This returns all triples that refer to the results of our query.

subject predicate object
<test:flintstones#Betty> <http://purl.org/vocab/relationship/spouseOf> <test:flintstones#Barney>
<test:flintstones#Fred> <http://purl.org/vocab/relationship/spouseOf> <test:flintstones#Wilma>
<test:flintstones#Pearl> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Wilma>
<test:flintstones#Ricky> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Wilma>
<test:flintstones#Barney> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Bamm-Bamm>
<test:flintstones#Barney> <http://purl.org/vocab/relationship/spouseOf> <test:flintstones#Betty>
<test:flintstones#Barney> <test:flintstones#hasAge> 44
<test:flintstones#Barney> <test:flintstones#livesIn> <test:flintstones#Bedrock>
<test:flintstones#Barney> rdf:type foaf:Person
<test:flintstones#Barney> foaf:familyName Rubble
<test:flintstones#Barney> foaf:givenName Bernard
<test:flintstones#Barney> foaf:name Barney Rubble
<test:flintstones#Wilma> <http://purl.org/vocab/relationship/parentOf> <test:flintstones#Pebbles>
<test:flintstones#Wilma> <http://purl.org/vocab/relationship/spouseOf> <test:flintstones#Fred>
<test:flintstones#Wilma> <test:flintstones#hasAge> 44
<test:flintstones#Wilma> <test:flintstones#livesIn> <test:flintstones#Bedrock>
<test:flintstones#Wilma> rdf:type foaf:Person
<test:flintstones#Wilma> foaf:familyName Slaghoople
<test:flintstones#Wilma> foaf:givenName Pebbles
<test:flintstones#Wilma> foaf:givenName Wilma
<test:flintstones#Wilma> foaf:name Wilma Flintstone

So our original query returned Wilma and Barney, and this second query returned every triple that has <test:flintstones#Wilma> or <test:flintstones#Barney> in it anywhere.
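Since each triplestore decides what DESCRIBE returns, the sketch below only mimics what this triplestore returned: run the WHERE clause, then gather every triple with a matching resource as its subject or object. The data is a small hand-copied subset of the tables above plus Betty's age from the ASK hint, and the helper names are hypothetical.

```python
REL = "http://purl.org/vocab/relationship/"

graph = {
    ("ff:Fred", REL + "spouseOf", "ff:Wilma"),
    ("ff:Wilma", REL + "spouseOf", "ff:Fred"),
    ("ff:Wilma", "ff:hasAge", 44),
    ("ff:Barney", "ff:hasAge", 44),
    ("ff:Betty", REL + "spouseOf", "ff:Barney"),
    ("ff:Barney", REL + "spouseOf", "ff:Betty"),
    ("ff:Betty", "ff:hasAge", 41),
    ("ff:Pebbles", REL + "parentOf", "ff:Chip"),
}


def describe(graph, resources):
    """Every triple whose subject or object is one of the given resources."""
    return {t for t in graph if t[0] in resources or t[2] in resources}


def describe_where(graph, predicate, value):
    """DESCRIBE ?x WHERE { ?x predicate value }"""
    matches = {s for s, p, o in graph if p == predicate and o == value}  # the SELECT part
    return describe(graph, matches)


result = describe_where(graph, "ff:hasAge", 44)
print(len(result))  # 6
for triple in sorted(result, key=str):
    print(*triple)
```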

In an Islandora 7.x-1.x context you can find all the information about an object by performing

DESCRIBE <info:fedora/PID>

Where PID is the PID of the object you want to see all the information for.

Note: DESCRIBE queries don't seem to be supported by Mulgara through Fedora.

Sparql-Update

Sparql-Update (W3C spec) is the syntax for changing RDF data: you use it to add, delete or update triples in a triplestore. It is also how you update information directly in Fedora (version 4 and later).

Generally this is done with an HTTP PATCH request.
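As a sketch of what such a request looks like, the Python below builds (but does not send) a PATCH using only the standard library. The resource URL is a hypothetical placeholder for your own install; `application/sparql-update` is the content type Fedora 4 expects for a Sparql-Update body, and in Fedora `<>` refers to the resource being patched.

```python
import urllib.request

# Hypothetical Fedora 4 resource URL; adjust host, port and path for your install.
RESOURCE_URL = "http://localhost:8080/rest/example-object"

UPDATE = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT { <> dc:title "A new book" . }
WHERE { }"""


def build_patch(url, sparql_update):
    """Build (but do not send) an HTTP PATCH carrying a Sparql-Update body."""
    return urllib.request.Request(
        url,
        data=sparql_update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="PATCH",
    )


request = build_patch(RESOURCE_URL, UPDATE)
print(request.get_method(), request.full_url)
# To actually send it against a running Fedora: urllib.request.urlopen(request)
```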

We'll stick to single graph operations for now.

Adding data (INSERT)

To add triples you use the keyword INSERT. Inserts generally come in two types: literal and informative.

A literal INSERT is one where you provide every part of each triple, with no variables. You also add the keyword DATA after INSERT.

For example:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA
{ 
   <http://example/book1> dc:title "A new book" .
}

This will insert one triple into your triplestore. You can also insert several at once:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA
{ 
   <http://example/book1> dc:title "A new book" ;
                          dc:creator "Jane Smith" .
}

This will insert two triples.
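In set terms, INSERT DATA is a union with fully specified triples. The toy sketch below (the `insert_data` helper is hypothetical) shows two consequences: a variable is rejected, and inserting a triple that is already present changes nothing, because an RDF graph is a set.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def insert_data(graph, triples):
    """INSERT DATA: add fully specified (ground) triples to the graph."""
    for triple in triples:
        if any(is_var(term) for term in triple):
            raise ValueError(f"INSERT DATA cannot contain variables: {triple}")
    graph.update(triples)


graph = set()
book = "<http://example/book1>"
insert_data(graph, [(book, "dc:title", "A new book")])
insert_data(graph, [(book, "dc:title", "A new book")])  # same triple again: no change
print(len(graph))  # 1

try:
    insert_data(graph, [("?book", "dc:title", "A new book")])
except ValueError as err:
    print(err)
```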

The other form is informative, which is akin to a SQL INSERT ... SELECT statement.

For example, to add an example:nationality of "Canadian" to all of Margaret Atwood's novels, you would write:

PREFIX example: <http://whatever.org/smashing/predicates/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT {
  ?book example:nationality "Canadian" .
} 
WHERE {
  ?book dc:creator <https://www.loc.gov/item/n79102766/margaret-atwood/>
}

Here you query for every ?book whose author is Margaret Atwood (or rather, her authority record), and use those results to decide which new triples to add.
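The informative form can be sketched as "evaluate WHERE, then fill in the template once per solution". This toy version handles a single WHERE pattern only; the book IRIs are hypothetical sample data, dc:creator is the Dublin Core 1.1 element for an author, and `insert_where` is an illustrative helper. It also shows that a WHERE with zero solutions inserts nothing.

```python
# Hypothetical sample data.
ATWOOD = "<https://www.loc.gov/item/n79102766/margaret-atwood/>"
graph = {
    ("<http://example/book1>", "dc:creator", ATWOOD),
    ("<http://example/book2>", "dc:creator", ATWOOD),
    ("<http://example/book3>", "dc:creator", "<http://example/someoneElse>"),
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple):
    """Variable bindings that make one triple pattern equal triple, or None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def insert_where(graph, template, pattern):
    """INSERT { template } WHERE { pattern }, for one pattern and one template triple."""
    # 1. The WHERE part runs first, over the graph as it is now.
    solutions = [b for b in (match(pattern, t) for t in graph) if b is not None]
    # 2. Each solution fills in the template; the new triples are then added.
    graph.update(tuple(b.get(term, term) for term in template) for b in solutions)
    return len(solutions)


count = insert_where(graph, ("?book", "example:nationality", "Canadian"),
                     ("?book", "dc:creator", ATWOOD))
print(count)  # 2

# No Gaiman books: the WHERE part has 0 solutions, so nothing is inserted.
before = set(graph)
print(insert_where(graph, ("?book", "example:something", "Wonderful"),
                   ("?book", "dc:creator", "Gaiman, Neil")))  # 0
print(graph == before)  # True
```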

Deleting data (DELETE)

Delete works much the same as INSERT; it also has a literal and an informative syntax.

For example:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
DELETE DATA {
   <http://example/book7> dc:title "Bad data" .
}

This will only delete the triple that exactly matches this one.

With the informative form, we can delete every triple about the history books:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
DELETE {
  ?book ?p ?v
}
WHERE {
  ?book dc:subject "history" .
  ?book ?p ?v
}
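Here the WHERE clause has two patterns joined on ?book, and the second one (?book ?p ?v) picks up every triple about each matching book. A toy sketch of that, with hypothetical sample books and helper names, where every solution is computed before anything is removed:

```python
# Hypothetical sample data.
graph = {
    ("<http://example/book1>", "dc:subject", "history"),
    ("<http://example/book1>", "dc:title", "A History Book"),
    ("<http://example/book2>", "dc:subject", "fiction"),
    ("<http://example/book2>", "dc:title", "A Novel"),
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def extend(binding, pattern, triple):
    """Return a copy of binding extended so pattern matches triple, or None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def select(patterns, graph):
    """Match patterns left to right, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for triple in graph
                    if (new := extend(old, pattern, triple)) is not None]
    return bindings


def delete_where(graph, template, patterns):
    """DELETE { template } WHERE { patterns }."""
    solutions = select(patterns, graph)  # WHERE is fully evaluated first
    to_delete = {tuple(b.get(term, term) for term in pattern)
                 for b in solutions for pattern in template}
    graph.difference_update(to_delete)


delete_where(
    graph,
    template=[("?book", "?p", "?v")],
    patterns=[("?book", "dc:subject", "history"), ("?book", "?p", "?v")],
)
for triple in sorted(graph):
    print(*triple)
```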

Note: When using INSERT {} WHERE {} or DELETE {} WHERE {}, the WHERE {} part is executed first, and its results determine what gets inserted or deleted. If it returns 0 results, nothing happens.

For example, suppose I have no books by Neil Gaiman in my triplestore and try to execute:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX example: <http://whatever.org/smashing/predicates/>
INSERT {
  ?book example:something "Wonderful" .
}
WHERE {
  ?book dc:creator "Gaiman, Neil" .
}

Nothing happens, because the WHERE clause matches 0 records.

Replacing values

Pretend you have a bad title, like "The Dark Twower", and you want to fix it.

subject predicate object
<http://example.org/book1> dc:title The Dark Twower

If you do:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT {
  ?book dc:title "The Dark Tower" .
}
WHERE {
  ?book dc:title "The Dark Twower" .
}

You end up with two dc:title values for the book.

subject predicate object
<http://example.org/book1> dc:title The Dark Twower
<http://example.org/book1> dc:title The Dark Tower

INSERT does not automatically replace anything, because a resource can have as many titles as you'd like.

To replace this incorrect value, you need to delete the exact wrong triple (or all values) and then insert your new one.

For example:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
DELETE {
  ?book dc:title "The Dark Twower" .
}
INSERT {
  ?book dc:title "The Dark Tower" .
}
WHERE {
  ?book dc:title "The Dark Twower" .
}

This executes as follows:

  1. It evaluates the WHERE clause, like a SELECT: find all books with a dc:title of "The Dark Twower".
  2. Using the list of books from step 1, delete their dc:title triples with the value "The Dark Twower".
  3. Lastly, again using the list of books from step 1, insert a new dc:title triple with the value "The Dark Tower".
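The three steps can be sketched as a toy Python model (not how any particular triplestore implements it): the WHERE solutions are computed once, every deletion is applied, and only then are the insertions applied, which matches the order the Sparql 1.1 Update spec defines. The helper names are hypothetical and only a single WHERE pattern is supported.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple):
    """Bindings that make a single triple pattern equal triple, or None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def fill(template, binding):
    """Substitute bound variables into each template triple."""
    return {tuple(binding.get(term, term) for term in pattern) for pattern in template}


def modify(graph, delete_template, insert_template, where):
    """DELETE { ... } INSERT { ... } WHERE { where } for a single WHERE pattern."""
    # Step 1: evaluate the WHERE pattern once, against the graph as it is now.
    solutions = [b for b in (match(where, t) for t in graph) if b is not None]
    # Step 2: apply every deletion.
    for b in solutions:
        graph.difference_update(fill(delete_template, b))
    # Step 3: only then apply every insertion.
    for b in solutions:
        graph.update(fill(insert_template, b))


BOOK = "<http://example.org/book1>"
WRONG, RIGHT = "The Dark Twower", "The Dark Tower"

# INSERT ... WHERE on its own keeps the typo and adds a second title.
both = {(BOOK, "dc:title", WRONG)}
modify(both, [], [("?book", "dc:title", RIGHT)], ("?book", "dc:title", WRONG))
print(len(both))  # 2

# DELETE + INSERT ... WHERE replaces the bad value.
fixed = {(BOOK, "dc:title", WRONG)}
modify(fixed, [("?book", "dc:title", WRONG)], [("?book", "dc:title", RIGHT)],
       ("?book", "dc:title", WRONG))
print(fixed)  # {('<http://example.org/book1>', 'dc:title', 'The Dark Tower')}
```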