Ricardo Baeza-Yates, Peter Mika and Hugo Zaragoza from Yahoo! Research wrote a really insightful and down-to-earth article called “Search, Web 2.0, and the semantic web”. It is a response to all of the buzz around those topics, and if anyone can give a straight answer on them, it’s these guys.
The authors note that the semantic web has already succeeded in “…a variety of well defined expert domains, where the power of semantics has been successfully exploited in interconnecting heterogeneous data sources and finding new insights through reasoning”.
There are a good number of forward-thinking resources, Twine and Freebase for example, that are being used and have been successful. I think that a lot of the web community is quick to put down the semantic web without really having investigated the current developments. The semantic web is definitely in its infancy, so let’s not pronounce it dead at birth.
They go on to explain why search is so fundamental to the web: as most of us know, there wouldn’t be a working web without search engines. Search engines retrieve relevant results based on the content itself and also on query logs, links and other patterns. The authors also explain that the semantic web is all about making data better suited to machine processing.
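To make the idea of combining those signals a bit more concrete, here is a toy sketch in Python. It is not how any real search engine ranks pages and it is not taken from the article: the `Page` fields, the `score` function and the weights are all made up for illustration, assuming a simple weighted sum of term matches, inbound links and past clicks from a query log.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A hypothetical page record; every field is invented for this example."""
    url: str
    text: str
    inbound_links: int  # how many other pages link to this one
    past_clicks: int    # clicks recorded for this query in an imaginary query log


def score(page, query, w_text=1.0, w_links=0.5, w_clicks=0.5):
    """Weighted sum of three signals. The weights are arbitrary assumptions."""
    terms = query.lower().split()
    words = page.text.lower().split()
    # Content signal: how often the query terms appear in the page text.
    text_match = sum(words.count(term) for term in terms)
    return w_text * text_match + w_links * page.inbound_links + w_clicks * page.past_clicks


pages = [
    Page("http://example.org/stuffed", "semantic web semantic web semantic web", 0, 0),
    Page("http://example.org/intro", "an introduction to the semantic web", 8, 5),
    Page("http://example.org/recipes", "cooking recipes for the weekend", 3, 0),
]

# Sort best-first for the query "semantic web".
ranked = sorted(pages, key=lambda p: score(p, "semantic web"), reverse=True)
for page in ranked:
    print(page.url, score(page, "semantic web"))
```

With these made-up numbers, the page that simply repeats the query terms loses out to the one that other pages link to and that users have clicked on before, which is the point of looking beyond the content alone.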
As they say, web 2.0 has made it really easy for users to generate a whole lot more data, so the amount available is staggering. Search engines need to exploit web 2.0 improvements: “…this would let search engines move away from document retrieval to directly addressing the anomalous state of knowledge that triggered the user’s action.”
They talk about how the semantic web changes search, for example with crawlers capturing data directly from RDF-like formats. This means that search engines can present the data itself to the user. They believe more site owners will make their data available in this format: “We believe that ultimately authors will be motivated to adopt these technologies to make it easier for users to find and use their resources.”
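For anyone wondering what “capturing data from RDF-like formats” might look like in practice, here is a minimal sketch in Python. It assumes a very simplified N-Triples-style syntax (one `<subject> <predicate> object .` statement per line) and uses a hand-written regular expression rather than a proper RDF library. The sample data, the `example.org` URLs and the helper names `parse_triples` and `describe` are all hypothetical.

```python
import re

# Hypothetical sample of RDF-like data; the URLs and values are made up.
SAMPLE = """
<http://example.org/film/1> <http://example.org/prop/title> "Blade Runner" .
<http://example.org/film/1> <http://example.org/prop/year> "1982" .
<http://example.org/film/1> <http://example.org/prop/director> <http://example.org/person/7> .
"""

# One statement per line: <subject> <predicate>, then <uri> or "literal", then a dot.
# This is a simplification and does not cover the full N-Triples grammar.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.$')


def parse_triples(text):
    """Return (subject, predicate, object) tuples from simplified triple text."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot read
        subject, predicate, uri_object, literal = match.groups()
        triples.append((subject, predicate, uri_object if uri_object is not None else literal))
    return triples


def describe(triples, subject):
    """Collect one subject's properties, keyed by the last path segment of each predicate."""
    return {p.rsplit("/", 1)[-1]: o for s, p, o in triples if s == subject}


triples = parse_triples(SAMPLE)
film = describe(triples, "http://example.org/film/1")
print(film)
```

The appeal for a search engine is that the crawler ends up with named fields (a title, a year, a director) instead of a blob of page text, so it has something structured to show the user. A real crawler would rely on a full RDF parser and handle many more cases than this sketch does.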
One question we often come across is how to get people motivated to tag things up. Blog owners are already doing it, and so are Del.icio.us users, for example. The authors reckon that, given the right incentives, users will be willing to do this more and more, which is important in the making of the semantic web.
“If there’s any sense in talking about the Web that will come after Web 2.0, we believe that it will integrate and use all the experience that can be captured from both traditional content (Web 1.0) and user-generated content and meta data (Web 2.0), as well as data about this content’s usage.” Please stop saying there’s no such thing as web 3.0. Call it what you will, there is an “after web 2.0”.
Reasons they give for the semantic web being slow in development:
1 – “realizing the Semantic Web would require solving extraordinarily difficult problems in the areas of knowledge representation, natural language understanding, and semantics—problems that have been recognized and studied for years in IR and other fields, with partial success.”
2 – “The Web is for all intents and purposes infinite, with wide use of dynamic page generation. At the same time, users are becoming more impatient.” – this means it also has to be super fast.
3 – There’s a divide in the research arena:
“IR research is strongly driven by a problem, whereas Semantic Web research is driven by a solution. Metaphorically speaking, Semantic Web researchers are like the hobbyist toolsmith who has the idea for the perfect tool and presents compromises on the design. However, IR, as a customer, is interested in buying a hammer that might not be perfect but can drive nails quickly and precisely.”
I love that particular quote and that analogy; it’s exactly right.
They go on to explain how the notion of “relevance” is changing as well, meaning that we’re no longer looking for the same things as we were 10 years ago. Nowadays we also look at user intent and behavioural variables, which weren’t included before. As they point out, as the technology advances and our need for new, better tools grows, IR and the semantic web will meet.
Why should you care?
The semantic web is not dead; it is quite healthy and developing. Eventually it will be a strong feature in information retrieval and also on the web at large. It’s important to keep on top of developments and start finding out more about how it can be integrated into websites and how it can benefit them. If, as the authors say, it’s going to help the search engines find your site more easily, I reckon every SEO will eventually be frantically optimising for it!