Random Thoughts on The Semantic Web

Posted on January 23, 2011 at 8:39 pm.

The idea of the Semantic Web has been simmering for some time now. It is the (next) great idea of Sir Tim Berners-Lee: to move beyond just linking documents to giving them a structure that allows links to be discovered programmatically, and to create along with it another great wave of innovation. It has been simmering largely because there was no single compelling, immediate benefit to be gained from creating semantic content, even though semantic content takes significantly more effort to create than “regular” content. To many people, the whole notion was too far-fetched, a neat idea that was completely impractical to implement and so a waste of time to pursue.

Recently, though, the cynicism which the Semantic Web encountered early on seems to be dying down. This is partly because the Semantic Web proponents don’t talk as much about the Grand Vision, and partly because there have been some successes and inroads. The most notable of these have been with Linked Data (it is easier to impart semantic structure to data than to content) and with various simple semantic structures (RDFa, microformats, FOAF, and GoodRelations, to name a few).

Some of the success in both areas has been driven by the fact that much of today’s content is generated through web applications like content management systems, which have created data fields to capture additional information besides just “content.” Though the purpose of such structure was originally to achieve consistent formatting for websites and aid in search engine optimization, once in place it became very easy to have it generate semantic mark-up as well, even if there wasn’t any particular use defined for it. The other reason is that a number of tireless champions have been working diligently for so long now that they have managed to create interest, change, and real content; small works over long periods eventually begin to create noticeable effects. Because of these forces the Semantic Web is becoming a reality, even if it isn’t appearing as a great fireworks event, or in its most elegant and pristine form.

A few more specific examples might help illustrate why people are interested in the Semantic Web. Before I left Harvard I met with some of the people developing a platform they called the Scientific Collaboration Framework. This project was building a system so that scientific communities of interest could create a collaborative workspace that made it easy to share research, papers, ideas, data and other things. Even though scientific communities are relatively small, it is very easy to miss research which is similar or complementary to your own when it is not occurring within the confines of your particular research discipline. The framework sought to promote communities around topics instead of disciplines, with a considerable amount of effort spent making it easy for researchers to add content that could be automatically tagged and linked to semantic terms to aid in the discovery of new collaboration opportunities. One might call it social networking for data: by posting your own research, you could discover other people working on similar problems, which could unearth both data useful for your own investigations and other researchers with whom you could start new collaborations.
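
To make the tagging idea a little more concrete, here is a minimal sketch in Python of how shared semantic terms could surface possible collaborators. It is not how the Scientific Collaboration Framework actually works (I don't know its internals); the researcher names, the term identifiers, and the shared_interests function are all invented for illustration.

```python
from itertools import combinations

# Hypothetical researchers and the concept tags attached to their posts.
# Every name and term identifier here is made up for the example.
tagged_work = {
    "alice": {"term:amyloid-beta", "term:mouse-model", "term:microglia"},
    "bob": {"term:microglia", "term:rna-seq"},
    "carol": {"term:protein-folding", "term:amyloid-beta", "term:microglia"},
    "dave": {"term:galaxy-formation"},
}


def shared_interests(tags_by_person, min_shared=1):
    """Return (person_a, person_b, shared_terms) for every pair with overlap."""
    matches = []
    for a, b in combinations(sorted(tags_by_person), 2):
        shared = tags_by_person[a] & tags_by_person[b]
        if len(shared) >= min_shared:
            matches.append((a, b, sorted(shared)))
    # Put the pairs with the most terms in common first.
    matches.sort(key=lambda m: len(m[2]), reverse=True)
    return matches


for a, b, shared in shared_interests(tagged_work):
    print(f"{a} and {b} share: {', '.join(shared)}")
```

The point is only that once everyone's work is tagged against the same set of terms, finding the overlap is a simple set intersection, no matter which discipline each person nominally belongs to.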

More recently I’ve been following the development of GoodRelations, a semantic approach for commerce (very broadly defined) which provides a standard way to describe companies, products, and offers. Once so described, the data becomes considerably easier to use in any number of ways, such as submitting it to product search engines. In general, creating this type of semantically structured data should help companies achieve greater visibility in the marketplace by making it easier for consumers to find information when they search. But there is also the larger prospect that instead of relying on users to do just the right search, it could become possible for a system to actively put the right product, at the right price, at just the right time, in front of the right consumer who is looking to buy, which some might call the holy grail of marketing.
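
As a rough illustration of what "so described" might look like, the sketch below writes one company, one offer, and one price as plain subject-predicate-object triples in Python, then runs a simple price search over them. The term names (BusinessEntity, Offering, offers, hasPriceSpecification and so on) are given as I recall them from the GoodRelations vocabulary and should be checked against the specification; the shop namespace, the company, the product, and the offers_under helper are all made up.

```python
GR = "http://purl.org/goodrelations/v1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.com/shop#"  # hypothetical shop namespace

# One company, one offer, one price, as (subject, predicate, object) triples.
triples = [
    (EX + "company", RDF_TYPE, GR + "BusinessEntity"),
    (EX + "company", GR + "legalName", "Example Cameras Ltd."),
    (EX + "company", GR + "offers", EX + "offer1"),
    (EX + "offer1", RDF_TYPE, GR + "Offering"),
    (EX + "offer1", RDFS_LABEL, "Compact digital camera"),
    (EX + "offer1", GR + "hasPriceSpecification", EX + "price1"),
    (EX + "price1", RDF_TYPE, GR + "UnitPriceSpecification"),
    (EX + "price1", GR + "hasCurrency", "USD"),
    (EX + "price1", GR + "hasCurrencyValue", 199.0),
]


def objects(triples, subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def offers_under(triples, max_price, currency="USD"):
    """Return (seller, product label, price) for offers at or below max_price."""
    results = []
    for seller, predicate, offer in triples:
        if predicate != GR + "offers":
            continue
        for price in objects(triples, offer, GR + "hasPriceSpecification"):
            currencies = objects(triples, price, GR + "hasCurrency")
            values = objects(triples, price, GR + "hasCurrencyValue")
            if currency in currencies and any(v <= max_price for v in values):
                results.append((
                    objects(triples, seller, GR + "legalName")[0],
                    objects(triples, offer, RDFS_LABEL)[0],
                    min(values),
                ))
    return results


print(offers_under(triples, 250.0))
```

A real site would publish this as RDFa or RDF/XML rather than Python tuples, but the shape of the data is the same, and that shared shape is what would let a product search engine read any shop's offers without custom code for each one.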

The general promise of the Semantic Web, then, is to make it much easier to find and use interesting connections between objects (people, data, products, anything really) where historically it would have been extremely difficult to identify the connection. This can be any number of things, from the examples above to things like job seekers and employers, apartments and renters, a restaurant you didn’t know about, an article relevant to the blog post you’re writing, or the long-lost cousin who just moved around the corner from your best friend from college. So instead of relying on random chance to create connections and surface useful information, the connections can be discovered instantly when needed, or even before we know that there might be an interesting connection to be made.

Regardless of how it evolves, I think there will be very fascinating things appearing which are built upon Semantic Web technologies, some of which will be very useful but not obvious (like Google’s Rich Snippets, which add Yelp reviews to your search results) and others which will be considerably more ambitious (like Siri, the personal assistant, which was acquired by Apple). As more data and content are generated, tagged, or transformed to include semantic elements, the promised innovation will come, and we will find it ever easier to discover information, and to have it discover us as well.