Word sense disambiguation (WSD) belongs to the field of computational linguistics. It’s the research area dedicated to finding ways for machines to understand the meaning of words; more precisely, it’s about determining which sense of a word is being used in a particular context. This matters a great deal, because without it search engines, machine translation systems, dialogue systems, speech systems and many others struggle to function properly (or at all).
How hard is it?
It’s described as an “AI-complete” problem, which means that solving it is at least as hard as the most difficult problems in artificial intelligence. Researchers began looking into it in 1949 in the field of machine translation, and it is so hard for machines to deal with that it still hasn’t been fully solved. The SENSEVAL evaluation exercises bring researchers together periodically: systems are evaluated, and findings are shared and discussed.
WSD also draws on a lot of the work done in cognitive linguistics, psychology, linguistics and artificial intelligence. When we approach language problems like this, some people are surprised to hear that the philosophy of language is also researched and discussed.
We don’t even know enough about how humans process world knowledge and link it to grammar, performing all kinds of linguistic manipulations to communicate and understand language. Some believe we learn it from a young age, others that it is built into us, and there is always the possibility that we don’t use such complex methods at all and simply abstract everything. There is a nice overview of this debate by Steven Pinker, a very well known linguist, and you can find a list of linguists categorised by their particular theory on Wikipedia.
Another issue is that words can have more than one meaning (polysemy), or can be used metaphorically or as an extension of another word (metonymy).
Language is highly ambiguous (an expression can carry more than one meaning), and the senses used by different people vary too. Because of this variance it is not always possible for a machine to give a very precise answer; in fact, humans themselves have been shown to be only about 90% accurate at the task.
Examples of ambiguity in language:
- “Drunk gets 10 years in violin case”
- “The lady hit the man with an umbrella”
- “Green” and “Green” (e.g. the colour versus the surname)
- “Chair” and “chair” (e.g. the piece of furniture versus the person chairing a meeting)
Knowledge-based approach: Dictionaries and thesauri (like WordNet and the Collins dictionary) can be used to try and narrow down the possibilities. Using these you can find similarities between words and their definitions, and you can also find which semantic network a word belongs to. No manually annotated corpus is needed; the text is used “raw”.
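To make this concrete, here is a minimal sketch of what such a machine-readable dictionary gives you, using NLTK’s WordNet interface (NLTK is simply a convenient choice here, not something the approach itself prescribes):

```python
# Looking up a word in WordNet via NLTK. Every sense of "chair" comes back
# with a definition and a position in WordNet's semantic network
# (its hypernyms, i.e. more general concepts).
import nltk
nltk.download("wordnet", quiet=True)   # fetch the WordNet data if needed
from nltk.corpus import wordnet as wn

for synset in wn.synsets("chair"):
    print(synset.name(), "-", synset.definition())
    print("   more general concepts:", [h.name() for h in synset.hypernyms()])
```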
The Lesk algorithm uses these resources (a dictionary or thesaurus) to find overlaps between the definitions of the words appearing together, which means they can be disambiguated in context. All of the possible definitions are retrieved for the words, the definition overlaps for all of the possible sense combinations are computed, and the combination with the highest overlap indicates the correct senses. The simplified version compares the definitions of a single target word against the surrounding context words themselves rather than against all their definitions, which reduces the search space.
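A toy version of the simplified variant might look like the sketch below (again using NLTK’s WordNet glosses; real implementations also bring in example sentences, stemming and proper stop-word handling):

```python
# Bare-bones "simplified Lesk": score each WordNet sense of the target word
# by how many (non-stop) words its gloss shares with the context, and return
# the sense with the biggest overlap.
from nltk.corpus import wordnet as wn

STOP = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "at",
        "that", "into", "for", "is", "are"}

def simplified_lesk(target, context_sentence):
    context = set(context_sentence.lower().split()) - STOP
    best_sense, best_overlap = None, -1
    for synset in wn.synsets(target):
        gloss = set(synset.definition().lower().split()) - STOP
        overlap = len(gloss & context)        # gloss/context word overlap
        if overlap > best_overlap:
            best_sense, best_overlap = synset, overlap
    return best_sense

sense = simplified_lesk("bank", "the bank accepts deposits and offers lending services")
print(sense, "-", sense.definition())   # should pick the financial-institution sense
```

NLTK also ships a ready-made implementation (nltk.wsd.lesk) that does essentially this.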
Deep approach: A vast amount of world knowledge is fed to the machine, which then tries to derive meaning from that body of knowledge. It’s not very practical because of all the data that needs to be gathered and processed first, and it demands very sophisticated artificial intelligence techniques that we have not yet perfected (or, in some cases, invented).
Shallow approach: This method bypasses the whole idea of getting the machine to understand the text. Instead it uses natural language processing techniques like n-grams (word groupings), frequency counts, conditional probabilities and other techniques which essentially attach extra information to the words provided. A number of texts are prepared by a human who tags everything up correctly, and this information is then given to the machine, which identifies patterns and uses them to assign senses. This is achieved with machine learning techniques such as decision trees or naive Bayes classifiers.
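As a rough illustration, the sketch below trains a naive Bayes classifier on a handful of invented sense-tagged contexts for “bass”; a real system would learn from a large hand-annotated corpus:

```python
# A toy "shallow" disambiguator: naive Bayes over bag-of-words context
# features. The sense-tagged sentences are made up purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

contexts = [
    "he caught a huge bass while fishing on the lake",
    "the bass were biting all afternoon on the river",
    "she turned up the bass on the speakers",
    "the bass line in that song is fantastic",
]
senses = ["fish", "fish", "music", "music"]      # hypothetical sense labels

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)

print(model.predict(["went fishing for bass on the lake"]))   # expected: ['fish']
print(model.predict(["turned up the bass in the song"]))      # expected: ['music']
```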
“Bootstrapping” is a method where the machine is given a small amount of tagged-up data and then a large amount of raw data. Classifiers trained on the tagged seed data label the raw data, and the most confident of those labels are fed back in, so the classifiers gradually improve on the original classification by finding patterns.
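The sketch below shows the self-training flavour of this idea, with made-up seed and raw data and an arbitrary confidence threshold; it is only meant to illustrate the loop, not any particular published bootstrapping algorithm:

```python
# Bootstrapping sketch: train on a tiny sense-tagged seed, label the raw
# (untagged) contexts, pull the most confident predictions back into the
# training set, and retrain. Deliberately simplified.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

seed_texts  = ["caught a bass while fishing on the lake",
               "turned up the bass on the speakers"]
seed_labels = ["fish", "music"]

raw_texts = ["bass fishing trip on the lake next weekend",
             "the speakers could not handle the bass",
             "fried bass for dinner by the lake"]

texts, labels = list(seed_texts), list(seed_labels)
for _ in range(3):                                   # a few self-training rounds
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    preds = model.predict(raw_texts)
    confs = model.predict_proba(raw_texts).max(axis=1)
    confident = [(t, y) for t, y, p in zip(raw_texts, preds, confs) if p > 0.7]
    texts  = list(seed_texts)  + [t for t, _ in confident]   # rebuild training set
    labels = list(seed_labels) + [y for _, y in confident]

print(list(zip(texts, labels)))                      # seed plus self-labelled examples
```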
The problem with methods requiring an initial tagged-up corpus is that such corpora are not readily available. They are expensive to create and hugely time consuming, and the effort has to be repeated for every language, which is really not efficient. The method does work very well, but to work across the board, in every context, you would need millions of tagged-up words.
(Part-of-speech tagging is different to WSD because it doesn’t tag words with senses but rather with grammatical classes.)
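For comparison, the snippet below shows the two kinds of label side by side, using NLTK (and assuming its tagger models and WordNet data have been downloaded):

```python
# Part-of-speech tags are grammatical classes; word senses are dictionary
# meanings.
import nltk
from nltk.wsd import lesk

tokens = "She sat down on the old wooden chair".split()
print(nltk.pos_tag(tokens))      # grammatical classes, e.g. ('chair', 'NN')
print(lesk(tokens, "chair"))     # a WordNet synset, i.e. one sense of "chair"
```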
WSD on the web:
The data on the web ranges from websites to journals to blogs and many other types of document structure, which makes the corpus of the web an “unstructured” one. Traditionally search engines use lexico-syntactic analysis, which is not deep enough to actually determine meaning in context; it can’t deal sufficiently well with the range of ambiguity in language.
In search engines, for example, precision is reduced because queries are typically sparse (they don’t contain enough information to disambiguate the terms).
Why should you care?
When website copy is written for SEO purposes, it helps to understand how search engines figure out what you are writing about, based on the words, word groups and distinctions between them. It also helps to understand how keywords and keyword phrases could be interpreted to discover query intent. Historical user queries help narrow this down: finding out what the possible historical queries could be in relation to a single query is far more thorough than just looking at search volumes.
For computer scientists the uses of such technology are valuable in many ways. It is possible that the techniques being developed for words could be adapted to other areas of science, not only computing.
Some further resources:
Learning extraction patterns using WordNet (Mark Stevenson and Mark A. Greenwood, Sheffield University)
WordNet::Similarity (Ted Pedersen)
Word sense ambiguation: clustering related senses (William Dolan, Microsoft Research)
Word Sense Disambiguation and Information Retrieval (Mark Sanderson, University of Glasgow)
Meaningful clustering of senses helps boost word sense disambiguation performance (Roberto Navigli, University of Rome)
I don’t believe in word senses (Adam Kilgarriff, University of Brighton)
Using Wikipedia for Automatic Word Sense Disambiguation (Rada Mihalcea, University of North Texas)