It is a bit embarrassing when I teach SPARQL to someone with a background in SQL. Once they figure out how it works, they start to appreciate it as a powerful language for representing just what information you want from a graph. Then they start asking about some of the commonly used query operations from SQL - things like aggregators and grouping operations like COUNT, SUM, MAX, MIN and AVG - and I have to admit that SPARQL, as currently specified, does not provide them.
The SPARQL specification at least has some guidance for how to do negation in SPARQL, though this idiom is a bit more difficult than simply being able to have NOT as a keyword.
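For readers who have not seen the negation idiom, here is a minimal sketch of it: match the thing you want to rule out inside an OPTIONAL, then keep only the rows where that match failed. The FOAF properties (foaf:name, foaf:mbox) are chosen purely for illustration and are not part of the family data used later in this post.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# People who have a name but NO recorded mailbox.
SELECT ?name
WHERE {
  ?x foaf:name ?name .
  OPTIONAL { ?x foaf:mbox ?mbox . }   # try to find a mailbox; leaves ?mbox unbound if none exists
  FILTER (!bound(?mbox))              # keep only the rows where the OPTIONAL found nothing
}
```

In SQL terms this plays the role of a NOT EXISTS subquery, spelled out as "optionally match, then test for unbound".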
As part of a recent exercise, I wondered whether it was possible to use a similar trick to define MAX and MIN in SPARQL. I was surprised to find that it is possible.
Suppose we have data on the members of a prominent family, where we have the year of birth represented in triples of the form
:person1 :birth-year 1888 .
:person2 :birth-year 1890 .
:person3 :birth-year 1915 .
etc.
How would we find the name (rdfs:label) of the oldest known member of the family? Here is a solution just using current SPARQL:
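PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/family#>   # placeholder namespace; substitute your own

SELECT DISTINCT ?label ?by
WHERE {
  # Any family member with a known birth year
  ?kennedy a :Person .
  ?kennedy rdfs:label ?label .
  ?kennedy :birth-year ?by .
  # Look for someone born strictly earlier
  OPTIONAL {
    ?older a :Person .
    ?older :birth-year ?oby .
    FILTER (?oby < ?by)
  }
  # Keep ?kennedy only if no such person exists
  FILTER (!bound(?older))
}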
How does this work? The first three triples match any member of the family for which a birth year is known.
The pattern inside the OPTIONAL clause also matches a family member, and gets their birth year. We use the variable name ?older for this person because of the FILTER clause; we retain only the bindings of ?older that have an earlier birth year than ?kennedy.
Now here's how we get a max out of this (the maximum age, which is the minimum birth year): what happens if we can't find anyone with an earlier birth year? Then all matches to the pattern inside the OPTIONAL braces will be filtered out, and no bindings will remain for ?older.
Back outside the OPTIONAL, we filter based on the binding of ?older; if ?older is not bound, then we didn't find anyone with an earlier birth year.
Who is the person for whom nobody else has an earlier birth year? That's the oldest, of course.
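Flipping the comparison should give the opposite extreme. The following is a sketch of that mirrored query, which looks for the youngest family member (the latest birth year). The namespace bound to the empty prefix is a placeholder assumed for illustration, and the variable names ?younger and ?yby are likewise my own.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/family#>   # placeholder namespace, assumed for illustration

SELECT DISTINCT ?label ?by
WHERE {
  # Any family member with a known birth year
  ?kennedy a :Person .
  ?kennedy rdfs:label ?label .
  ?kennedy :birth-year ?by .
  # Look for someone born strictly later
  OPTIONAL {
    ?younger a :Person .
    ?younger :birth-year ?yby .
    FILTER (?yby > ?by)
  }
  # Keep ?kennedy only if nobody was born later
  FILTER (!bound(?younger))
}
```

The only change from the query for the oldest member is the direction of the comparison inside the OPTIONAL; the variable was renamed just to keep the intent readable.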
Left as an exercise for the reader:
- Suppose we also have :death-year represented in the same way as :birth-year, but there is no triple in case the person is still living. How do we modify this query to find the oldest living family member? Or the youngest dead kennedy?
- What happens if the oldest (youngest) is not unique? What would you expect to happen? What does this query do?