Semantic net
The big news this week has been Google announcing its use of semantics to enhance the performance of its search engine. This will not come as a surprise to computer scientists working in the language field (IR, NLP, etc.). There are also already quite a few semantic search engines around, Cognition for example. I think we have been waiting for Google to take this step for a while, and now that it has, it’s really interesting. This obviously does not replace the keyword approach; it enhances it.
For more info about this, see Greg Sterling’s post and David Harry’s post on the topic. The two are quite different, so they complement each other well.
What the announcement means:
This announcement has led to questions about the difference between the semantic web and semantic search. The announcement does not relate to the semantic web in any shape or form: Google is not announcing that it is adding support for RDFa, OWL, microformats or anything else that allows for structured browsing. Their improvement means that by looking at relationships between the words in queries (and, I imagine, in documents) they can return a better spread of relevant results.
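Google has not said how it does this, so the following is only a toy illustration of the general idea of using word relationships to widen a query. It assumes a small hand-made table of related terms; `RELATED`, `expand_query` and `score` are hypothetical names, and a real system would learn such relationships from data rather than hard-code them.

```python
# Hypothetical table of related words with made-up strengths (0..1).
# A real system would derive these relationships from data.
RELATED = {
    "car": {"automobile": 0.9, "vehicle": 0.7},
    "cheap": {"inexpensive": 0.9, "budget": 0.6},
}


def expand_query(terms, related=RELATED, min_weight=0.5):
    """Return a dict mapping each query term and its related terms to a weight."""
    expanded = {}
    for term in terms:
        expanded[term] = 1.0  # the user's own keywords keep full weight
        for other, weight in related.get(term, {}).items():
            if weight >= min_weight:
                # keep the strongest weight if a word is reached more than once
                expanded[other] = max(expanded.get(other, 0.0), weight)
    return expanded


def score(document, weighted_terms):
    """Sum the weights of the (expanded) query terms that occur in the document."""
    words = set(document.lower().split())
    return sum(w for t, w in weighted_terms.items() if t in words)


docs = ["cheap car hire", "inexpensive automobile rental", "budget vehicle deals"]
query = expand_query(["cheap", "car"])
for d in sorted(docs, key=lambda d: score(d, query), reverse=True):
    print(round(score(d, query), 2), d)
```

With plain keyword matching only the first document would match at all; with the expanded query the other two still rank, just below the exact match.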
Four useful definitions:
- Concept: an abstract or general idea inferred or derived from specific instances
- Data: a collection of facts from which conclusions may be drawn
- Information: knowledge acquired through study or experience or instruction
- Semantics: the study of language meaning
Putting it all together:
Semantics identifies concepts, which in turn allow information to be extracted from data. If you are looking for the meaning of documents or queries, the concepts need to be captured.
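A minimal sketch of what capturing concepts can mean in practice, assuming a tiny hypothetical lexicon (`LEXICON`) that maps surface words to made-up concept identifiers. Two sentences that share no words at all still share concepts once the words are mapped.

```python
# Hypothetical lexicon: surface word -> made-up concept identifier.
LEXICON = {
    "car": "C_VEHICLE", "automobile": "C_VEHICLE", "vehicle": "C_VEHICLE",
    "physician": "C_DOCTOR", "doctor": "C_DOCTOR",
}


def concepts(text):
    """Map each known word in the text to its concept; unknown words are ignored."""
    return {LEXICON[w] for w in text.lower().split() if w in LEXICON}


def shared_concepts(a, b):
    """Concepts that two texts have in common, regardless of the words used."""
    return concepts(a) & concepts(b)


print(shared_concepts("my doctor drives a car", "the physician sold her automobile"))
```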
The semantic web/search:
I have covered this at length on this blog; take a look at the semantic web section, particularly “What is semantic search”. In that post you will find the difference between semantic search and the semantic web explained in an easy-to-digest way.
Instead of repeating myself, I will list a number of tools that I have been using for quite a while to find semantically related concepts, using keywords as a starting point. These should give a bit more insight into what is involved and what kind of output you can expect.
The tools:
- WordNet::Similarity (Perl module that implements a variety of semantic similarity and relatedness measures based on information found in WordNet)
- MSR (find how semantically related words are using Google, Wikipedia and many other sources, or all of them at once)
- SenseRelate (uses measures of semantic similarity and relatedness to perform word sense disambiguation)
- UMLS::Similarity::path (Perl module for computing the semantic similarity of concepts in the UMLS by simple edge counting)
- SenseBot (a search engine that also displays a number of terms semantically related to your query)
- SenseLearner (a tool for all-words word sense disambiguation)
- GWSD (unsupervised graph-based word sense disambiguation)
- FrameNet (visualise relationships)
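To give a feel for what tools such as WordNet::Similarity and UMLS::Similarity::path compute, here is a rough edge-counting sketch in Python. It is not either module's code: the is-a taxonomy (`PARENT`) is a made-up toy, and the score 1 / (edges + 1) is one common form of a path measure, under which identical concepts score 1 and the score falls as the shortest path between two concepts grows.

```python
from collections import deque

# Hypothetical toy is-a taxonomy, stored as child -> parent.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}


def build_adjacency(edges):
    """Turn child -> parent links into an undirected adjacency map."""
    adj = {}
    for child, parent in edges.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


ADJ = build_adjacency(PARENT)


def shortest_path_edges(a, b, adj=ADJ):
    """Breadth-first search: number of edges between a and b, or None if unconnected."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Edge-counting similarity: 1 / (shortest path length in edges + 1)."""
    d = shortest_path_edges(a, b)
    return None if d is None else 1.0 / (d + 1)


for pair in [("dog", "wolf"), ("dog", "cat"), ("dog", "sparrow")]:
    print(pair, round(path_similarity(*pair), 3))
```

Dog and wolf meet two edges apart under "canine", so they score higher than dog and cat (which only meet at "mammal") or dog and sparrow (which only meet at "animal").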
Reading:
Semantic Networks: Visualizations of Knowledge (Roger Hartley and John Barnden)
Keyphrase Extraction using Semantic Networks Structure Analysis (Chong Huang, Yonghong Tian, Zhi Zhou, Charles X. Ling, Tiejun Huang)
Semantic Search (Guha, McCool, Miller)
Here is a very interesting presentation by Jon Atle Gulla. It is two years old, but you will notice it is still current.


Two popular unsupervised techniques for doing this are clustering and dimensionality reduction, for instance latent Dirichlet allocation (LDA) and singular value decomposition (SVD). The latter is used for latent semantic indexing/analysis, whereas the former will even relate senses of words, not just words as a whole.
We have working code to do this with examples over language data in LingPipe:
http://alias-i.com/lingpipe/demos/tutorial/cluster/read-me.html
http://alias-i.com/lingpipe/demos/tutorial/svd/read-me.html
Not to mention word-sense disambiguation tutorials:
http://alias-i.com/lingpipe/demos/tutorial/wordSense/read-me.html
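The point above about SVD underpinning latent semantic indexing can be illustrated in a few lines. This is a minimal pure-Python sketch, not LingPipe's implementation: it assumes a made-up five-term, four-document count matrix and approximates a rank-2 truncated SVD by power iteration with deflation on A·Aᵀ, which is only sensible for toy sizes. "car" and "automobile" never appear in the same document, yet they end up pointing the same way in the reduced space because both co-occur with "engine".

```python
import math
import random


def gram(a):
    """A·Aᵀ for a list-of-rows matrix (here: terms x terms)."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def top_left_singular(a, k, iters=500, seed=0):
    """Approximate the k largest singular values of `a` and their left singular
    vectors by power iteration on A·Aᵀ with deflation (toy sizes only)."""
    rng = random.Random(seed)
    g = gram(a)
    n = len(g)
    result = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = mat_vec(g, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eig = sum(x * y for x, y in zip(v, mat_vec(g, v)))  # Rayleigh quotient
        result.append((math.sqrt(max(eig, 0.0)), v))  # singular value = sqrt(eigenvalue)
        # Deflation: subtract the component just found so the next pass finds the next one.
        g = [[g[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    return result


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return 0.0 if nu == 0 or nv == 0 else sum(x * y for x, y in zip(u, v)) / (nu * nv)


terms = ["car", "automobile", "engine", "flower", "petal"]
# Rows are terms, columns are four tiny made-up documents (raw counts).
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
]

triplets = top_left_singular(A, k=2)
# Term i in the latent space: (sigma_1 * u_1[i], ..., sigma_k * u_k[i])
latent = {t: [s * u[i] for s, u in triplets] for i, t in enumerate(terms)}

print("raw cosine(car, automobile):", cosine(A[0], A[1]))
print("LSI cosine(car, automobile):", round(cosine(latent["car"], latent["automobile"]), 3))
print("LSI cosine(car, flower):", round(cosine(latent["car"], latent["flower"]), 3))
```

In practice a numerical library's SVD would replace the hand-rolled power iteration; it is spelled out here only to keep the sketch dependency-free.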
Hey Bob, sorry! Of course, LingPipe! I use it so much that it seemed too obvious to me, so I didn’t include it. Apologies!
I’ve been using SenseBot for this instead of a ‘typical tool’; I guess it’s high time I tried something more flexible and reliable, hopefully SenseRelate. Good comparisons. I was unaware that most of these tools existed.
I’ve long been searching for a free semantic keyword extractor that can extract at least 50 to 100 LSI keywords. Yes, you can find a couple of free LSI tools online, but they often produce very few results.
The only one we use is SenseBot; we haven’t even heard of the other tools.
Semantics is the study of meaning. Man, it will take ages to understand all word meanings!