I wasn’t going to blog about the newly released search engine Cuil, because there’s so so so much talk about it all over the place, and the resources already available will give you good insight and an awful lot of info, which I don’t need to repeat here. However, it would be a bit rude of me not to acknowledge it and give my couple of pence about it.
In short, Cuil is a new search engine launched by Anna Patterson, Russell Power and Louis Monier. Ex-Google employees are, imho, top-class engineers and scientists, which means they have the skills to make a good engine.
People have tested it and report bad results: images being associated with the wrong sites, porn being shown in safe mode, a mix-up of links internal to the engine, robots.txt not being respected, well-known important websites being omitted, etc. Basically, not a good start. I’ve also noticed they don’t make great use of stemming, but seeing as they are using a form of contextual search, that might make sense, depending on what they’re trying to do.
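For anyone who hasn’t met stemming before, here is a minimal sketch of the idea in Python. It uses a deliberately naive suffix stripper of my own (not the Porter algorithm, and certainly not anything Cuil is known to use), and the function names and example strings are made up purely for illustration.

```python
# Naive suffix-stripping stemmer: an illustration only, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Lower-case a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches_exact(query, document):
    """True if every query term appears verbatim (case-insensitively) in the document."""
    return set(query.lower().split()) <= set(document.lower().split())


def matches_with_stemming(query, document):
    """True if every stemmed query term appears among the stemmed document terms."""
    query_stems = {naive_stem(term) for term in query.split()}
    document_stems = {naive_stem(term) for term in document.split()}
    return query_stems <= document_stems


if __name__ == "__main__":
    query = "automotive tests"
    document = "Automotive testing services"
    print(matches_exact(query, document))
    print(matches_with_stemming(query, document))
```

With stemming, “tests” and “testing” both reduce to “test”, so the query matches a page that an exact keyword comparison would miss.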
Contextual search: a method that searches through the text anywhere in a page, rather than only in pre-defined fields. It’s a similarity measure: context, inter-relationships and coherence are measured and analysed in order to give good information on the page. Google uses very statistical methods, and to be honest most IR is based on those, but contextual search is more language oriented.
A good example of this concept is given by Miley Watts and Anthony Coats:
“For example, a user searching for components with both “Automotive” and “North America” contexts should receive results that include business components tagged with “Automotive Engines” and “Detroit” (but which do not have “Automotive” nor “North America” as direct contexts). Equally, if the contexts “United States of America”, “USA”, and “America” all refer to the same country, then a search for any one of those contexts should return results from all three equivalent contexts.”
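To make the quoted behaviour concrete, here is a rough sketch in Python of how such context matching could work. Everything in it is an assumption of mine for illustration: the `BROADER` and `CANONICAL` tables, the function names, and in particular the intermediate hop from “Detroit” to “United States of America” to “North America” are not taken from the quoted authors or from Cuil.

```python
# Hypothetical context data, loosely based on the quoted example.
# Maps a narrower context to the broader context directly above it.
BROADER = {
    "Automotive Engines": "Automotive",
    "Detroit": "United States of America",  # assumed intermediate hop
    "United States of America": "North America",
}

# Maps equivalent names onto one canonical context.
CANONICAL = {
    "USA": "United States of America",
    "America": "United States of America",
}


def canonical(context):
    """Return the canonical name for a context (itself if it has no alias)."""
    return CANONICAL.get(context, context)


def ancestors(context):
    """Return the canonical context plus every broader context above it."""
    seen = set()
    current = canonical(context)
    while current is not None and current not in seen:
        seen.add(current)
        parent = BROADER.get(current)
        current = canonical(parent) if parent is not None else None
    return seen


def component_matches(component_tags, query_contexts):
    """True if every query context is covered by some tag or one of its ancestors."""
    covered = set()
    for tag in component_tags:
        covered |= ancestors(tag)
    return all(canonical(query) in covered for query in query_contexts)


if __name__ == "__main__":
    tags = {"Automotive Engines", "Detroit"}
    print(component_matches(tags, ["Automotive", "North America"]))
    print(component_matches({"USA"}, ["America"]))
```

The point is that a component tagged only with “Automotive Engines” and “Detroit” still satisfies a query for “Automotive” and “North America”, and that “USA” and “America” are treated as the same context, which is the behaviour the quote describes.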
Let’s try this in Cuil:
I typed in “Automotive” and “North america” and got results about:
- Automotive designLine
- Automotive modules
- Unusual automotive solutions
- Automotive accessories (ipod)
- Automotive testing
- Linux automotive
- BMW (north america)
- AERA engine builders association (north america)
- Automotive testing (x2 results)
- buy/sell cars (canada)
The “USA” and “America” example doesn’t work either.
This doesn’t live up to the academic example, does it? It doesn’t look very contextual at all. Some of those results have nothing to do with North America. Some do, like BMW, which I think is probably an OK result, as is AERA. “Buy/sell cars” is totally irrelevant.
Something is clearly wrong. Everyone else is right: the results aren’t really relevant, and there’s not really any evidence of good contextual search either. Yes, it’s not a simple keyword search or a text-free or text-based search, but where is the contextual stuff?
Why? There are good people working on this with exceptional skills and expertise. I cannot understand what has gone so wrong.
But personally I like the interface and the right-hand side categories, and think it looks fine.
This SIGIR paper will show you the use of contextual analysis for email, which is interesting.