The big news this week has been Google announcing its use of semantics to enhance the performance of its search engine. This will not come as a surprise to computer scientists working in the language field (IR, NLP, etc.). There are also already quite a few semantic search engines around, Cognition for example. I think we had been waiting for Google to take this step for a while, and now that it has, it’s really interesting. Obviously this does not replace the keyword approach; it is an enhancement.
What the announcement means:
This announcement has led to questions about the difference between the semantic web and semantic search. The announcement does not relate to the semantic web in any shape or form. Google is not announcing that it is adding support for RDFa, OWL, microformats or anything else to allow for structured browsing. The improvement means that by looking at relationships between the words in queries (and in documents, I imagine) Google can find a better spread of relevant results.
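To make the idea of using word relationships a little more concrete, here is a minimal Python sketch. It is not Google's method, which has not been described in any detail; it only contrasts plain keyword matching with matching after the query has been expanded with related terms. The `RELATED` table, the `DOCUMENTS` collection and all function names are made up for illustration.

```python
# Hypothetical hand-written relatedness table. A real system would derive
# this from a lexical resource or from corpus statistics.
RELATED = {
    "car": {"automobile", "vehicle"},
    "cheap": {"affordable", "budget"},
}

# Hypothetical toy document collection.
DOCUMENTS = {
    "doc1": "cheap car rental in town",
    "doc2": "affordable automobile hire",
    "doc3": "history of the bicycle",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace into a set of terms."""
    return set(text.lower().split())


def keyword_search(query, documents):
    """Return ids of documents sharing at least one literal term with the query."""
    terms = tokenize(query)
    return sorted(d for d, text in documents.items() if terms & tokenize(text))


def expand(query, related):
    """Add every term listed as related to a query term."""
    terms = tokenize(query)
    expanded = set(terms)
    for term in terms:
        expanded |= related.get(term, set())
    return expanded


def expanded_search(query, documents, related):
    """Same matching as keyword_search, but on the expanded term set."""
    terms = expand(query, related)
    return sorted(d for d, text in documents.items() if terms & tokenize(text))


print(keyword_search("cheap car", DOCUMENTS))
print(expanded_search("cheap car", DOCUMENTS, RELATED))
```

In this toy setup the keyword search only finds the document that literally contains "cheap" or "car", while the expanded search also picks up the one that talks about an "affordable automobile". That is the "better spread of relevant results" in miniature.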
4 useful definitions:
- Concept: an abstract or general idea inferred or derived from specific instances
- Data: a collection of facts from which conclusions may be drawn
- Information: knowledge acquired through study or experience or instruction
- Semantics: the study of language meaning
Putting it all together:
Semantics identifies concepts, which allow for the extraction of information from data. If you are looking for the meaning of documents or queries, concepts need to be captured.
The semantic web/search:
I have covered this at length on this blog; take a look at the semantic web section, particularly “What is semantic search”. In that post you will find the difference between semantic search and semantic web explained in an easy to digest way.
Instead of repeating myself I will list a number of tools that I have been using for quite a while to find semantically related concepts using keywords as a starting point. I think these might give a bit more insight into what is involved and what kind of thing is output as a result.
- WordNet::Similarity (Perl module that implements a variety of semantic similarity and relatedness measures based on information found in WordNet)
- MSR (You can find how semantically related words are using Google, Wikipedia and many others – or all of them at once)
- SenseRelate (uses measures of semantic similarity and relatedness to perform word sense disambiguation)
- UMLS::Similarity::path (Perl module for computing semantic similarity of concepts in the UMLS by simple edge counting)
- SenseBot (Search engine but it will display a number of semantically related terms to your query)
- SenseLearner (A Tool for All-Words Word Sense Disambiguation)
- GWSD (Unsupervised Graph-based Word Sense Disambiguation)
- FrameNet (Visualise relationships)
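Several of the tools above measure similarity by counting edges between concepts in a hierarchy. As a rough illustration of what that involves, here is a small Python sketch over a made-up taxonomy. The `HYPERNYM` table and the function names are hypothetical, and the scoring (one divided by the number of nodes on the shortest path) is one common way to turn an edge count into a score, not necessarily the exact formula any of the listed modules uses.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its parent (hypernym).
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def build_graph(parent_of):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in parent_of.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_edges(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Score in (0, 1]: 1 / (edges + 1); 0.0 when no path exists."""
    edges = shortest_path_edges(graph, a, b)
    if edges is None:
        return 0.0
    return 1.0 / (edges + 1)


GRAPH = build_graph(HYPERNYM)
for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "sparrow")]:
    print(pair, round(path_similarity(GRAPH, *pair), 3))
```

With this taxonomy "dog" and "wolf" are two edges apart (through "canine"), so they score higher than "dog" and "cat", which in turn score higher than "dog" and "sparrow". The real tools work on much larger hierarchies such as WordNet or the UMLS, and several of them offer more refined measures than plain edge counting.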
Semantic Networks: Visualizations of Knowledge (Roger Hartley and John Barnden)
Keyphrase Extraction using Semantic Networks Structure Analysis (Chong Huang, Yonghong Tian, Zhi Zhou, Charles X. Ling, Tiejun Huang)
Semantic Search (Guha, McCool, Miller)
Here is a very interesting presentation by Jon Atle Gulla. It’s 2 years old but, as you will notice, still current.