Bio-IT Expo - then and now
This year's BIO-IT World Expo displayed a marked change from the event two years ago, from the point of view of Semantic Technology. In 2003, it was difficult to find a pharmaceutical company that had even heard of the semantic web, and difficult to make a value case for such a new technology. In 2005, Tim Berners-Lee delivered a keynote talk in which he outlined the value of the semantic web for life sciences.
Two years ago, conventional wisdom was that the life sciences industry wasn't ready for advanced data management, and that practitioners in the industry were attached to the spreadsheet macros they had developed for managing their own data. Today, large-scale integration is the word of the day.
What changed in the meantime? In 2003, BIO-IT World Expo celebrated the completion of the genome project, ushering in a new era of information-based innovation. In order to be competitive, pharmaceutical companies needed to have as much information as possible effectively at their disposal. Gone were the days when any human could reasonably be expected to keep in their head all the information about even a small corner of what was known about the biological processes behind diseases.
Another change since 2003 is the 2004 ratification of OWL and RDF as W3C Recommendations, which stabilized the semantic technology industry. Since that time, semantic technology vendors have in turn stabilized their offerings, making the value proposition for the application of these technologies clearer.
Finally, as recently as Spring 2005, Oracle announced the inclusion of RDF support in their Network Data Model. This was the first time one of the major players announced official support for the Semantic Web. In fact, the Oracle Life Sciences User Group held a workshop on the Semantic Web in conjunction with the BIO-IT Expo. One often thinks of big companies like Oracle as being slow on the uptake of new technology, waiting for the little companies to work it out, and only getting in on the act when it is mature. Well, the folks at Oracle who developed this capability don't seem to have read that memo. They seem to have done a great job with this. As a long-time RDB skeptic, I never thought I would be this eager to get my hands on a copy of an Oracle product.
The workshop featured an outline of the Semantic Web given by Susie Stephens, followed by a detailed description of the RDF support built into Oracle 10g. The presentation made by Souripraya Das was impressive. In addition to an RDF triple store, 10g includes a basic rule engine and an integrated query language that allows federated queries over triple stores and relational tables together.
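To make the idea of querying triples and relational tables together a little more concrete, here is a minimal sketch. It does not use Oracle or its query language at all; it assumes an in-memory SQLite database, a hypothetical three-column triples table, and a hypothetical assays table with made-up rows. The point is only to show the shape of a query that joins graph data with ordinary relational data in one statement.

```python
import sqlite3

# Stand-in for a database holding both a triple store and a relational table.
# Table names, column names, and all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    CREATE TABLE assays (gene_id TEXT, compound TEXT, score REAL);
""")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("gene:G1", "ex:involvedIn", "process:P1"),
    ("gene:G2", "ex:involvedIn", "process:P2"),
    ("gene:G3", "ex:involvedIn", "process:P1"),
])
conn.executemany("INSERT INTO assays VALUES (?, ?, ?)", [
    ("gene:G1", "cmpd-001", 12.5),
    ("gene:G2", "cmpd-002", 80.0),
    ("gene:G3", "cmpd-003", 3.2),
])


def compounds_for_process(process):
    """Join a triple pattern against the relational assay table."""
    return conn.execute(
        """
        SELECT a.compound, a.score
        FROM triples t
        JOIN assays a ON a.gene_id = t.subject
        WHERE t.predicate = 'ex:involvedIn' AND t.object = ?
        ORDER BY a.score
        """,
        (process,),
    ).fetchall()


if __name__ == "__main__":
    for compound, score in compounds_for_process("process:P1"):
        print(compound, score)
```

The triple pattern supplies the biological relationship and the relational table supplies the measurements; neither side has to be exported into the other's format first.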
The workshop continued with a demonstration from a team made up of Siderean Software Inc. and Joanne Luciano of Predictive Medicine, Inc. The demonstration used about a dozen public domain life sciences databases and ontologies (e.g., the Gene Ontology, UniProt, and KEGG). The team had converted these databases into RDF and loaded them all into Oracle 10g. Siderean Seamark was then used to provide a faceted browser, allowing users to explore the information in the unified databases. As with many Seamark deployments, this one was finished in very short order: less than 2 weeks from inception to demonstration.
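For readers who haven't seen this pattern, the following is a rough sketch of the two steps described above: turning tabular records into triples, and then computing facet counts over the merged result. It says nothing about how Seamark or the demo team actually did it. The function names, the `ex:` prefix, and the sample records are all hypothetical.

```python
from collections import Counter


def rows_to_triples(rows, id_field, base="ex:"):
    """Turn each record into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        subject = base + row[id_field]
        for field, value in row.items():
            if field != id_field:
                triples.add((subject, base + field, value))
    return triples


def select(triples, predicate, value):
    """Subjects that have the given value for the given predicate."""
    return {s for s, p, o in triples if p == predicate and o == value}


def facet_counts(triples, predicate, subjects=None):
    """Count the values of one facet, optionally within a selection."""
    return Counter(
        o for s, p, o in triples
        if p == predicate and (subjects is None or s in subjects)
    )


# Two invented "databases" that describe overlapping entities.
source_a = [
    {"id": "P1", "organism": "human", "pathway": "glycolysis"},
    {"id": "P2", "organism": "mouse", "pathway": "glycolysis"},
]
source_b = [
    {"id": "P1", "function": "kinase"},
    {"id": "P3", "organism": "human", "function": "kinase"},
]

# Because both sources use the same identifiers, a set union merges them.
store = rows_to_triples(source_a, "id") | rows_to_triples(source_b, "id")

if __name__ == "__main__":
    humans = select(store, "ex:organism", "human")
    print(facet_counts(store, "ex:organism"))
    print(facet_counts(store, "ex:function", humans))
```

Once everything is in triple form, merging sources is just a union, and each click in a faceted browser amounts to narrowing the set of subjects and recounting the remaining facets.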
A recurring topic in many of the discussions around the Oracle Network Data Model (ONDM) had to do with the boundary between the utility of RDF and OWL. Dr. Luciano conjectured that plain RDF was most useful in the early stages of a data integration project, when the team is exploring the data and determining how it is structured. Once the structure of the data has been determined, a more logical modeling style (like RDFS and OWL) becomes more appropriate. Fortunately, the Oracle 10g ONDM design allows for a smooth transition from RDF queries to OWL inferencing, without having to re-organize the data store. This solution, which is similar to the one used in the IBM project SNOBASE and in Intellidimension's product RDF Gateway, uses a general-purpose rule engine to do inferencing over the graph data. Support for RDFS and OWL is achieved by writing rules for these inferences in the language of the general-purpose rule engine. This solution also allows for a smooth transition to custom rulesets. The current implementation of the rule engine in 10g has some performance issues that prevent it from handling more than the most basic OWL inferencing, but the Oracle technical team is considering how to integrate a more efficient rule engine into the product.
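The idea of getting RDFS support "for free" by writing rules for a general-purpose engine can be illustrated with a toy example. What follows is a naive forward-chaining engine over a set of triples, with two of the standard RDFS entailment patterns (subclass transitivity and type inheritance) written as rules. This is my own illustration, not Oracle's rule language or implementation. The function names and sample data are hypothetical, and the matching strategy is the simplest one that works rather than an efficient one.

```python
def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return new bindings or None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable: bind or check
            if new.setdefault(term, value) != value:
                return None
        elif term != value:               # constant: must match exactly
            return None
    return new


def solve(patterns, triples, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from solve(patterns[1:], triples, extended)


def infer(triples, rules):
    """Apply the rules repeatedly until nothing new is derived."""
    closure = set(triples)
    while True:
        new = set()
        for body, head in rules:
            for b in solve(body, closure):
                derived = tuple(b.get(term, term) for term in head)
                if derived not in closure:
                    new.add(derived)
        if not new:
            return closure
        closure |= new


# RDFS semantics expressed as data for the generic engine: (body, head).
RDFS_RULES = [
    ([("?c", "rdfs:subClassOf", "?d"), ("?d", "rdfs:subClassOf", "?e")],
     ("?c", "rdfs:subClassOf", "?e")),
    ([("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
     ("?x", "rdf:type", "?d")),
]

data = {
    ("ex:Kinase", "rdfs:subClassOf", "ex:Enzyme"),
    ("ex:Enzyme", "rdfs:subClassOf", "ex:Protein"),
    ("ex:p1", "rdf:type", "ex:Kinase"),
}

if __name__ == "__main__":
    for triple in sorted(infer(data, RDFS_RULES) - data):
        print(triple)
```

Since the rules are just data handed to the engine, moving to a custom ruleset means appending more (body, head) pairs to the list; the stored triples themselves do not need to be reorganized.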
Integrating the rule engine into the database in this way has a lot of advantages, in contrast to retro-fitting another rule engine at arm's length. First, it allows other applications (like Seamark) to benefit from the inference engine, moving programmatic complexity from the query into the model. Second, it allows the rule engine to take advantage of optimization opportunities with respect to the data store. Finally, having the rule engine installed in the database allows the query language to integrate the triple store with relational tables in a smooth way.
The entry of Oracle into the Semantic Web space has already made a big splash, and rightly so. This isn't a big-name player passing off some veneer over an old product as something new; this is a genuine new capability, done well.
