This is a blog entry I promised some folks a few months back - I'm finally getting around to it.
It is another in my series of "What can you do with an RDF data set that someone gives you?" In this case, someone asked me if there was a way to get rid of the instance data in an RDF data set, leaving just the 'ontology' (where by 'ontology' in this context we mean all the schema stuff, e.g., the RDFS).
Now, when someone wants to create a set of ontologies and data, I usually recommend that they develop these things separately. Create your schema in one file, your data in another. You can merge them together easily enough with just a couple of owl:imports. For example, you can have the data file import the appropriate schema information (a typical pattern when re-using a schema, e.g., SKOS). Alternatively, you can build a file that imports both your schema and your data. This can be useful if you want to manage different data files with different schema information.
But what if someone gives you a file that hasn't been factored like this? What can you do about it?
One option, if you are a TopBraid Composer user, is to just remove all the instances. Holger gave an example of this in a very particular context in his blog. That method will work for other ontologies as well, not just the Semantic XML example that he gives.
Another way to do this is to use SPARQL. With SPARQL, you can select the instance information, and separate it out. There are a number of ways to approach this, depending a bit on the details of the RDF file you are working with.
Let's suppose someone has sent you an OWL file, with a bunch of owl:Classes, and some connections between them. With SPARQL you can select for all the classes easily enough:
SELECT ?class WHERE {?class a owl:Class}
Now the members of those classes can be found pretty easily too:
SELECT ?member
WHERE {?class a owl:Class.
?member a ?class}
So - how do we get rid of all the information about these members? Easy - match all the triples about them:
CONSTRUCT {?member ?p ?o}
WHERE {?class a owl:Class .
?member a ?class .
?member ?p ?o .}
This gives us all the triples about the instances (including the type triples!). You can save these in a file on their own. Using TopBraid Composer, you can do this in the SPARQL tab by pulling down the context menu option, "Export results to file ...". If you are coding in Java, you can use the Jena API to create a new OntModel with these triples in it.
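If it helps to see what that CONSTRUCT pattern is matching, here is a minimal sketch in plain Python. It is not how TopBraid Composer or Jena do it; it just models triples as (subject, predicate, object) tuples of strings, and the `ex:` names and the `instance_triples` function are made up for illustration.

```python
# Triples are modelled as plain (subject, predicate, object) tuples of strings.
# Illustration only: a real project would use an RDF library and a SPARQL engine.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"


def instance_triples(graph):
    """Mimic the CONSTRUCT query: all triples whose subject is a member of an owl:Class."""
    # ?class a owl:Class
    classes = {s for (s, p, o) in graph if p == RDF_TYPE and o == OWL_CLASS}
    # ?member a ?class
    members = {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}
    # ?member ?p ?o
    return {t for t in graph if t[0] in members}


# Hypothetical sample data: one class, one property, one instance.
graph = {
    ("ex:Person", RDF_TYPE, OWL_CLASS),
    ("ex:name", RDF_TYPE, "owl:DatatypeProperty"),
    ("ex:name", "rdfs:domain", "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
}

instances = instance_triples(graph)
for triple in sorted(instances):
    print(triple)
```

Only the two triples about ex:alice come back, including her type triple; the class and property definitions are not matched.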
But how do we get just the schema? Well, if we use the ARQ SPARQL extensions (which are pretty sure to get into the recommendation soon), this is easy; just delete these triples:
DELETE {?member ?p ?o}
WHERE {?class a owl:Class .
?member a ?class .
?member ?p ?o .}
The stuff that's left is the schema. In Composer, you can just File>Save As, and put this where you like.
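To make the effect of the DELETE concrete, here is a small plain-Python sketch of the same idea: match the instance triples, then take them out of the graph. Triples are just tuples of strings, and the sample data and the `split_schema_and_instances` name are assumptions for illustration, not part of any real API.

```python
# Illustration only: triples as (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"


def split_schema_and_instances(graph):
    """Return (schema, instances), where schema is what the DELETE would leave behind."""
    classes = {s for (s, p, o) in graph if p == RDF_TYPE and o == OWL_CLASS}
    members = {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}
    instances = {t for t in graph if t[0] in members}
    # Deleting the matched triples is a set difference on the graph.
    schema = graph - instances
    return schema, instances


# Hypothetical sample data: two classes and one instance.
graph = {
    ("ex:Animal", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:rex", "rdfs:label", "Rex"),
}

schema, instances = split_schema_and_instances(graph)
print("schema:", sorted(schema))
print("instances:", sorted(instances))
```

The two sets never overlap and together they make up the original graph, so nothing is lost by saving them to separate files.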
OWL's insistence that members of classes not be classes themselves (or properties) does a lot of work for us here - we know that we didn't 'accidentally' get any classes or properties in with our instance data. The situation gets a bit more complex if you can't count on this separation, but the principle is the same - select the triples that make up the part of the file you are interested in, and save it separately.
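For the case where you can't count on that separation, one possible guard (a sketch, not the only way to do it) is to skip any member that is itself declared as a class or property. The plain-Python version below models triples as tuples of strings; the list in `SCHEMA_TYPES`, the `pure_instance_triples` name and the sample data are all assumptions you would adjust for your own file.

```python
# Illustration only: triples as (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"

# Assumed list of types that mark a resource as schema rather than instance data.
SCHEMA_TYPES = {
    "owl:Class",
    "rdfs:Class",
    "rdf:Property",
    "owl:ObjectProperty",
    "owl:DatatypeProperty",
}


def pure_instance_triples(graph):
    """Triples about members of classes, skipping members that are classes or properties."""
    classes = {s for (s, p, o) in graph if p == RDF_TYPE and o == OWL_CLASS}
    schema_terms = {s for (s, p, o) in graph if p == RDF_TYPE and o in SCHEMA_TYPES}
    members = {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}
    # The guard: anything declared as a class or property stays with the schema.
    members -= schema_terms
    return {t for t in graph if t[0] in members}


# Hypothetical sample data in which ex:Eagle is both a class and a member of ex:Species.
graph = {
    ("ex:Species", RDF_TYPE, OWL_CLASS),
    ("ex:Eagle", RDF_TYPE, OWL_CLASS),
    ("ex:Eagle", RDF_TYPE, "ex:Species"),
    ("ex:harry", RDF_TYPE, "ex:Eagle"),
}

instances = pure_instance_triples(graph)
schema = graph - instances
print("instances:", sorted(instances))
print("schema:", sorted(schema))
```

Without the guard, ex:Eagle would be matched as a member of ex:Species and its class declaration would be pulled out along with the instance data.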