Lots of mention of Semantic Web in Data.gov ConOps. I'll read it in detail on the plane . . .
Posted at 12:55 PM | Permalink | Comments (0)
I felt that my own major session, a tutorial on building semantic web applications for government data, was very successful in this regard. I was originally disappointed with the registration, but discovered on the day that most people signed up for 'tutorials' in general, then attended whatever they liked. The room made it to SRO before the first coffee break, and many of the people there were from government agencies or contractors, as desired (many were from other places, but I'm not going to complain about that!).
Probably the biggest measure of success for this goal was the exposure in the Government Computer News. Senior Technology Editor Joab Jackson seemed to like my elevator pitch about the Semantic Web (though it only works for veerrryyy slloooww elevators) enough to repeat it in one of his articles about the event. In another article, Joab told me something I didn't know - that the report we generated in the tutorial is actually interesting to government IT managers, and would be somewhat labor intensive to produce without linked open government data.
GCN's list of 13 resources seems like an intentional flouting of superstition, since one could easily come up with many more. I am flattered that one of my own pages made it onto the list; many of the omissions are available there, including US Gov XML and OEgov.
All in all (thanks to a great extent to Mr. Jackson's efforts), I think we managed to achieve some exposure for semantic web technology for government information managers.
Posted at 03:06 PM in Government | Permalink | Comments (0)
Back on August 1, Ralph Hodgson declared Data Independence Day, to celebrate the opening of oegov, a website that collects and organizes ontologies and data sets about government. Along with recent developments in open data in the US government, this creates an opportunity to mash up government data in a way that has not been possible before.
We're celebrating next week at ISWC with a tutorial on building semantic web applications for government. The tutorial will show attendees how to use semantic web standards to create their own data mashup applications. A lot of the features of the semantic web come into play - distributed vocabularies (using SKOS, of course), linked open data, RSS, etc. The idea is that each attendee will walk away from the workshop with their own app that they created from data now available from the government.
Controlled vocabularies play a big role in this - bigger than you might have thought possible. After all, if two people use a common controlled vocabulary well, they can share data. But if they use it badly, well, then data quality issues dominate. Fortunately, there are some controlled vocabularies being used in the government in a pretty consistent way. They are published in convenient forms on OEGov, where they can be used as terminology hubs for mashing up information.
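To make the "terminology hub" idea concrete, here is a minimal Python sketch of a mashup that joins two data sets on a shared controlled-vocabulary term. Everything in it is an assumption made for illustration: the records, the field names, the numbers and the `vocab:` term codes are invented, and are not taken from OEGov or from any agency data set.

```python
# Hypothetical records from two sources. Both tag their rows with a term
# from the same (invented) controlled vocabulary, even though their own
# labels for the agencies differ.
SPENDING = [
    {"agency_label": "Dept. of the Interior", "term": "vocab:DOI", "amount": 120},
    {"agency_label": "EPA", "term": "vocab:EPA", "amount": 75},
]
STAFFING = [
    {"office": "Interior", "term": "vocab:DOI", "employees": 30},
    {"office": "Environmental Protection Agency", "term": "vocab:EPA", "employees": 12},
]


def mash_up(left, right, key="term"):
    """Join two lists of records on a shared controlled-vocabulary term."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


combined = mash_up(SPENDING, STAFFING)
for row in combined:
    print(row["term"], row["amount"], row["employees"])
```

The join only works because both sources use the term consistently; if one of them had tagged its rows with free-text labels instead, the lookup in `index` would simply miss.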
The workshop is part of the International Semantic Web Conference 2009, to be held near Washington, DC from 25-29 October (the workshop itself will be held on Oct 26 in the afternoon, and you can register without attending the whole conference!). The conference this year has a special focus on government data and applications, and should be a great event for anyone interested in openness of government data.
Posted at 09:08 PM in Government, SKOS | Permalink | Comments (0)
FOAF was one of the first Semantic Web projects, and is still trotted out as an example on a regular basis. The FOAF model itself has been criticized a number of times (I don't feel like googling all the examples), but there are some things about FOAF that are very interesting in today's world.
One could criticize FOAF for having invented social networking in the late nineties, then having missed the whole Web 2.0 boat, only to have the limelight taken by MySpace, LinkedIn, LiveJournal, and nowadays by Facebook. Indeed, in terms of bringing social networking awareness to the masses, this criticism would be true. But if you have a look at some of the founding assumptions behind FOAF, you'll find that the project was eerily prescient - foreseeing problems with social networking that took years to come to light once social networks became commonplace.
A simple example is a bit of drama that happened on the social networking site LiveJournal a couple of years ago. LiveJournal was sold to a Russian firm, with the risk that all the servers, with all those back journals, would migrate outside the United States. Many American users (who for the most part had been ignoring the vast number of Russian-speaking users) suddenly became aware of the fact that their precious journal data might drop out of the control of copyright laws that they understood. A panic ensued, and LiveJournal dump programs became quite the "meme".
A more recent example was the change of the terms of use for Facebook. Suddenly, Facebook reserved the right to use your photos in its advertising. Okay, they probably don't want that photo of the time you passed out in Vegas and your 'friends' stripped you to your underwear and drew faces on your chest with shaving cream, but you never know. The outcry amongst FB users caused Facebook to rescind this policy. But the same issue came up again - who owns the data that you put on social networking servers?
FOAF understood this issue over a decade ago, when they envisioned a distributed social network, where servers owned/operated by different agents could participate in the same social network: a sort of decentralized, distributed version of Facebook, where you kept your own ownership, access control, backups, etc. Or you could hire someone to do it for you, if you preferred. But you had the option.
This is a key idea behind the Social Web - not just social networking on the web, but making the network part of the web itself. How can this work? The Semantic Web plays a big role in the solution - or so many of us believe. Come to the Social Web Camp in Santa Clara on November 2 and find out what the W3C and others are doing to make this come true.
Posted at 05:51 PM | Permalink | Comments (4) | TrackBack (0)
But now that we have these vocabularies, how can we view or edit them? One way is to use TopBraid Composer. Since Composer is a native RDF system, importing, viewing, editing and saving SKOS files is second nature. As an example, I have downloaded one of my favorite vocabularies, the AGROVOC vocabulary from the United Nations Food and Agriculture Organization. AGROVOC appears on the W3C SKOS Implementation Page as a SKOS (RDF) file [0]. The screenshot below shows this file displayed in the Free Edition of TopBraid Composer:
Since AGROVOC is a multi-lingual thesaurus, I can usefully set the language of Composer to something other than my native tongue; in this case, I have chosen French. In the upper left, we see the broader/narrower tree, in particular the part about Ruminants, with current focus on Dairy Cattle. In the center form, we see the details of this term: Its preferred expression in several languages (this is part of the AGROVOC data), its situation in terms of broader/narrower terms, and even the related term, Milk ("Lait" in French). In the upper right, we see the SKOS relationship hierarchy; we are currently focusing on skos:broader for our view. In the bottom, we see a SPARQL query, rather fancifully determining the connection between Cattle and Foxes. Notice that like many professional thesauri, AGROVOC uses numeric codes for its terms; creating such a SPARQL query could be quite hard work if you had to cross-reference all these numbers. But in Composer, you can use the display name to help you out. This query was written by copy-and-pasting terms from the bookmarks window ("Basket") into the SPARQL tab. We see the terms printed with readable names (in French) in the Basket; they show up in the SPARQL editor as URIs, processable by the SPARQL engine.
In the Maestro edition of TopBraid Composer, you can even see the relationships graphically; below you see the results of that SPARQL query displayed as a graph, showing all the steps from Cattle to Foxes (now in English) in the AGROVOC vocabulary.
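The query itself is only visible in the screenshot, but the idea of "finding the connection" between two terms can be sketched outside SPARQL as a breadth-first search over skos:broader links and their inverses. This is a rough Python illustration under stated assumptions, not the query used above: the concept identifiers and the hierarchy are invented, and do not reflect AGROVOC's real numeric codes or structure.

```python
from collections import deque

# Invented skos:broader links (child -> parent). These are NOT the real
# AGROVOC codes or hierarchy; they only illustrate the shape of the data.
BROADER = {
    "c_cattle": "c_ruminants",
    "c_ruminants": "c_mammals",
    "c_foxes": "c_canids",
    "c_canids": "c_mammals",
    "c_mammals": "c_animals",
}


def connection(start, goal, broader=BROADER):
    """Breadth-first search along broader/narrower links.

    Returns the shortest list of concepts from start to goal, or None.
    """
    neighbours = {}
    for child, parent in broader.items():
        neighbours.setdefault(child, set()).add(parent)   # skos:broader
        neighbours.setdefault(parent, set()).add(child)   # skos:narrower
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = connection("c_cattle", "c_foxes")
print(" -> ".join(path))
```

With opaque codes like these, the readable-name support described above is exactly what keeps such a query manageable.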
We are finding SKOS to be an invaluable asset in vocabulary management applications. It covers the basics that are expected of any vocabulary representation (including multilingualism) with a very simple meta-model. The meta-model itself makes modest use of OWL (transitive, symmetric, inverse, and one functional property), but there is no need for someone who is editing or viewing a vocabulary to have any familiarity with OWL at all. The ability to distribute vocabularies over the web, and to connect them together (using the SKOS matching vocabulary) addresses a wide variety of real-world vocabulary management needs, which are not met by any other standard. I'll be giving a tutorial at KMWorld on the use of SKOS in vocabulary management on November 16.
[0] Last time I checked, the link on the W3C page to AGROVOC was broken. I downloaded the example file a few weeks ago, and still have it. I don't know if the link is temporarily broken, or if the file has moved, or if there is another reason why the link is currently not available.
Posted at 07:38 PM in SKOS | Permalink | Comments (3) | TrackBack (0)
Written yesterday at the EA Conference held in Washington, DC.
This morning's session on data.gov was really nothing short of inspiring. There has been a sea change in how government data is made public. As little as a year ago, even government RSS feeds were presented in such a way as to be barely re-usable, as if their agencies were providing open data under protest, and doing as much as possible to keep their data secret.
Contrast that with the accomplishments of data.gov today: tens of thousands of data sources, RSS feeds that really expose data, and application contests to do interesting things with public data.
I asked the data.gov panel at the Government EA conference this week in the Ronald Reagan building what had changed. This seems like a difference of work culture in the agencies. What was the cause of that?
I got insightful answers from all the panelists. I don't want to put words into their mouths, so I won't attribute any particular answer to any of them, but the panelists were Sonny Bhagowalia (DOI), Jerry Johnston (EPA), Marion Royal (GSA) and Martha Dorris (GSA).
There are a few forces that are coming together to cause this change. First, there are people in the agencies who have always believed in open data, and wanted to share it, but have not had a charter to do so. They have effectively done it in their spare time, just waiting for a chance.
The efforts that they have managed to make have been oriented toward very specific tasks; they made data available in a way that they thought some particular consumer wanted it. This would allow them to justify the effort of publishing the data. But data presented for a single consumer doesn't feel like 'open' data to the rest of us; it can even feel as if the data is being kept intentionally secret. Early feedback (early? As recently as June the whole effort was called a "significant failure" on this point alone) to data.gov told the providers that there is an audience for 'raw' open data. So they have started to do both.
Another force is hard times. This country is in the midst of a number of crises, and the government is involved to a great extent in the problems and any solutions. Government data is more important than ever. And the agencies need to harness the ingenuity of the masses to work through it, adding another incentive.
This situation is like a powder keg ready to go off. We have people in the agencies who want to share data, who want to stimulate the clever folks at MIT or Stanford or in their garages to solve problems using government data, and who want to get around requirements for particular audiences for their published data. To this mix, you add a spark: in January, President Obama signed the memorandum about Transparency and Open Government.
Critics might cry that this is too little, too late. But the gains that data.gov has made in the past few months show a real change in attitude; a far cry from what we had before.
Posted at 11:56 AM | Permalink | Comments (2) | TrackBack (0)
This is a blog entry I promised some folks a few months back - I'm finally getting around to it.
It is another in my series of "What can you do with an RDF data set that someone gives you?" In this case, someone asked me if there was a way to get rid of the instance data in an RDF data set, leaving just the 'ontology' (where by 'ontology' in this context we mean all the schema stuff, e.g., the RDFS).
Now, I usually recommend when someone wants to create a set of ontologies and data, that they develop these things separately. Create your schema in one file, your data in another. You can merge them together easily enough with just a couple of owl:imports. For example, you can have the data file import the appropriate schema information (a typical pattern when re-using a schema, e.g., SKOS). Alternatively, you can build a file that imports both your schema and your data. This can be useful if you want to manage different data files with different schema information.
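As a rough illustration of what "merging with owl:imports" amounts to, here is a small Python sketch that treats each file as a set of triples and follows import statements transitively. The file names (`ex:schema`, `ex:data`), the triples, and the `imports_closure` helper are all hypothetical; a real RDF toolkit does this for you, and this sketch simplifies by treating any owl:imports triple in a file as an import.

```python
OWL_IMPORTS = "owl:imports"

# Hypothetical files, each modeled as a set of (subject, predicate, object)
# triples. The data file imports the schema file.
FILES = {
    "ex:schema": {("ex:Dog", "rdf:type", "owl:Class")},
    "ex:data": {
        ("ex:data", OWL_IMPORTS, "ex:schema"),
        ("ex:Fido", "rdf:type", "ex:Dog"),
    },
}


def imports_closure(uri, files=FILES, seen=None):
    """Union of a file's triples with those of everything it imports."""
    seen = set() if seen is None else seen
    if uri in seen:          # guard against circular imports
        return set()
    seen.add(uri)
    triples = set(files.get(uri, ()))
    for s, p, o in list(triples):
        if p == OWL_IMPORTS:
            triples |= imports_closure(o, files, seen)
    return triples


merged = imports_closure("ex:data")
print(len(merged), "triples after following imports")
```

Keeping the two files separate means the schema can be reused by a different data file just by adding another import.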
But what if someone gives you a file that hasn't been factored like this? What can you do about it?
One option, if you are a TopBraid Composer user, is to just remove all the instances. Holger gave an example of this in a very particular context in his blog. That method will work for other ontologies as well, not just the Semantic XML example that he gives.
Another way to do this is to use SPARQL. With SPARQL, you can select the instance information, and separate it out. There are a number of ways to approach this, depending a bit on the details of the RDF file you are working with.
Let's suppose someone has sent you an OWL file, with a bunch of owl:Classes, and some connections between them. With SPARQL you can select for all the classes easily enough:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?class WHERE { ?class a owl:Class }
Now the members of that class can be found pretty easily too:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?member
WHERE { ?class a owl:Class .
        ?member a ?class }
So - how do we get rid of all the information about these members? Easy - match all the triples about them:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?member ?p ?o }
WHERE { ?class a owl:Class .
        ?member a ?class .
        ?member ?p ?o . }
This gives us all the triples about the instances (including the type triples!). You can save these in a file on their own. Using TopBraid Composer, you can do this in the SPARQL tab by pulling down the context menu option, "Export results to file ...". If you are coding in Java, you can use the Jena API to create a new OntModel with these triples in it.
But how do we get just the schema? Well, if we use the ARQ SPARQL extensions (which are pretty sure to get into the recommendation soon), this is easy; just delete these triples:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
DELETE { ?member ?p ?o }
WHERE { ?class a owl:Class .
        ?member a ?class .
        ?member ?p ?o . }
The stuff that's left is the schema. In Composer, you can just File>Save As, and put this where you like.
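For readers who think more easily in code than in graph patterns, the same split can be sketched in plain Python over a set of triples. This is only an illustration of the logic of the CONSTRUCT and DELETE queries, not a replacement for a SPARQL engine; the `ex:` names and the sample triples are invented.

```python
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"

# Hypothetical unfactored file: schema and instance triples mixed together.
TRIPLES = {
    ("ex:Dog", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", "rdfs:label", "Dog"),
    ("ex:Fido", RDF_TYPE, "ex:Dog"),
    ("ex:Fido", "ex:name", "Fido"),
}


def split_schema_and_instances(triples):
    """Return (schema_part, instance_part) for a set of triples."""
    # ?class a owl:Class
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    # ?member a ?class
    members = {s for s, p, o in triples if p == RDF_TYPE and o in classes}
    # ?member ?p ?o  (what the CONSTRUCT query returns)
    instance_part = {t for t in triples if t[0] in members}
    # what is left once the DELETE has run
    schema_part = triples - instance_part
    return schema_part, instance_part


schema, instances = split_schema_and_instances(TRIPLES)
print(sorted(schema))
print(sorted(instances))
```

Note that triples which merely point at an instance from somewhere else would stay in the schema part; the pattern only matches triples whose subject is a member.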
OWL's insistence that members of classes not be classes themselves (or properties) does a lot of work for us here - we know that we didn't 'accidentally' get any classes or properties in with our instance data. The situation gets a bit more complex if you can't count on this separation, but the principle is the same - select the triples that make up the part of the file you are interested in, and save it separately.
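One possible way to handle the more complex case, sketched in Python, is to refuse to treat anything as an instance if it is itself typed as a class or property. This is an assumption about how one might approach it, not a method described above; the list of schema types and the sample data (where `ex:Dog` is both a class and a member of `ex:Species`) are invented for illustration.

```python
RDF_TYPE = "rdf:type"
# Types that mark a resource as "schema stuff" (an illustrative list).
SCHEMA_TYPES = {
    "owl:Class", "rdfs:Class", "rdf:Property",
    "owl:ObjectProperty", "owl:DatatypeProperty",
}

# Hypothetical data in which ex:Dog is a class AND a member of ex:Species.
TRIPLES = {
    ("ex:Species", RDF_TYPE, "owl:Class"),
    ("ex:Dog", RDF_TYPE, "owl:Class"),
    ("ex:Dog", RDF_TYPE, "ex:Species"),
    ("ex:Fido", RDF_TYPE, "ex:Dog"),
}


def split(triples):
    """Return (schema_part, instance_part), keeping classes out of the instances."""
    schema_things = {s for s, p, o in triples
                     if p == RDF_TYPE and o in SCHEMA_TYPES}
    classes = {s for s, p, o in triples
               if p == RDF_TYPE and o == "owl:Class"}
    members = {s for s, p, o in triples
               if p == RDF_TYPE and o in classes}
    pure_instances = members - schema_things   # drop anything that is also schema
    instance_part = {t for t in triples if t[0] in pure_instances}
    return triples - instance_part, instance_part


schema, instances = split(TRIPLES)
print(sorted(instances))
```

Without the `members - schema_things` step, `ex:Dog` would be swept up as an instance of `ex:Species` and its class declaration would be deleted along with the data.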
Posted at 04:46 PM in SPARQL | Permalink | Comments (2) | TrackBack (0)
Those of us who have been doing Knowledge Representation for decades judge a modeling tool on its power: How many whiz-bang shortcuts for complex OWL restrictions or mass editing of similar items or refactoring does it have? But when we try to get Modeling to the Masses, or at least Modeling in the Enterprise, we find that it isn't the power tools that they are interested in. Enterprise knowledge workers prefer pretty simple model editing tools, but they insist on strict control over version governance.
What exactly is version governance? Often the people who want it aren't quite sure, but they know it when it isn't there. Someone makes a change to a part of a model on someone else's turf. Someone wants to try out a long-transaction 'better idea' to see how it works - but we want to be able to toss it later on if we don't like it. Or we find something wrong in a category - who changed it? When? What was the model like when they changed it?
Some of this stuff comes for free when you use a version control system like SVN or CVS. But these solutions, which are great for managing versions of Java code, aren't intuitive to a team that is organizing, say, a vocabulary project. They want something a bit finer grained (who changed this term?) and with a bit of process ("I can propose a change, but only John can approve it").
That's why the biggest part of TopQuadrant's Enterprise Vocabulary Management System (EVMS) is a system for collaborating on model changes. You don't just use the EVMS to change a vocabulary; you use it to build a sandbox in which you make your changes. The changes then enter a (configurable) workflow, where, if they get approved, they are committed to the shared version. If not, well, then they aren't.
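The propose-then-approve flow can be sketched in a few lines of Python. This is a toy model under loud assumptions: the `Change` and `Vocabulary` classes, the status names and the single-approver rule are all invented here, and say nothing about how the EVMS is actually implemented. A proposed change stands in for the sandbox: it is recorded, but does not touch the shared labels until someone with approval rights accepts it.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    term: str
    new_label: str
    author: str
    status: str = "proposed"   # proposed -> approved or rejected


@dataclass
class Vocabulary:
    labels: dict               # term -> label in the shared version
    approvers: set             # people allowed to approve changes
    log: list = field(default_factory=list)

    def propose(self, term, new_label, author):
        """Record a change without touching the shared version."""
        change = Change(term, new_label, author)
        self.log.append(change)
        return change

    def review(self, change, reviewer, approve):
        """Approve (and commit) or reject a proposed change."""
        if reviewer not in self.approvers:
            raise PermissionError(f"{reviewer} may not approve changes")
        if change.status != "proposed":
            raise ValueError("change has already been reviewed")
        change.status = "approved" if approve else "rejected"
        if approve:
            self.labels[change.term] = change.new_label


vocab = Vocabulary(labels={"t1": "Mouse"}, approvers={"john"})
change = vocab.propose("t1", "Laboratory Mouse", author="dean")
vocab.review(change, reviewer="john", approve=True)
print(vocab.labels["t1"], change.status)
```

Because every proposal lands in `log` whether or not it is approved, the same structure also answers the "who changed it, and when?" questions.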
Now, that's pretty cool. After all, it lets teams collaborate on their vocabulary management, lets them manage territory on a term-by-term basis, and even provides a process for moving the changes along. But the thing that I find most cool about this is that it was all built using the TopBraid Ensemble assembly platform.
You see, I never got the hang of coding Java, and I'm not really a programmer. But I like making systems do what I want them to do, so I am a big scripter. The entire EVMS collaboration control system is written as a TopBraid Ensemble application.
What does this mean? It means a lot of things, but here is what it meant for this project. I was talking to a colleague about how to display the changes that had been made to a vocabulary. He said, "to my mind, I want to be able to click on a term, and see all the people who have changed it, and why!" Well, all that information is modeled in the system - it is just a matter of querying it out with SPARQL.
In the figure, we see the final step of this. We are looking at a fragment of the NCI Thesaurus regarding Organisms. The change log shows a rather silly argument over what we should call lab mice by two of the taxonomists. Every change was made through the EVMS, so we can track back the whole story about each term. Adding this to the system was as easy as writing a SPARQL query and wiring it up to the display components (a grid in the upper-right and a form in the lower-right) so that the changes relevant to a chosen term would be shown.
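The "click on a term, see who changed it and why" lookup boils down to a filter over the change records. Here is a minimal Python sketch of that lookup; the record fields (`term`, `who`, `why`, `seq`) and the sample entries are invented for illustration, and the real system does this with a SPARQL query over its own change model.

```python
# Hypothetical change-log records; the field names are invented.
CHANGE_LOG = [
    {"term": "ex:LabMouse", "who": "alice", "why": "prefer common name", "seq": 1},
    {"term": "ex:Rat", "who": "bob", "why": "fix typo", "seq": 2},
    {"term": "ex:LabMouse", "who": "bob", "why": "prefer scientific name", "seq": 3},
]


def history(term, log=CHANGE_LOG):
    """Who changed a term, and why, oldest change first."""
    rows = [r for r in log if r["term"] == term]
    return [(r["who"], r["why"]) for r in sorted(rows, key=lambda r: r["seq"])]


for who, why in history("ex:LabMouse"):
    print(who, "-", why)
```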
Posted at 07:53 PM | Permalink | Comments (2)
Dean Allemang and Jim Hendler: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL
Latest book by Dean Allemang and Jim Hendler. Aimed at people who wonder what the Semantic Web is all about, but who don't want to have to learn logic. It is still a technical book - you'll learn what the standards are and how they work, but from the point of view that someone who is technical in some other field (say, biology or astronomy) could use.