This is a list of my top 10 freely available papers on the topic of information retrieval. You will notice that they are rather old, but the techniques described and the findings are not always dated. Those that are dated are important nonetheless because they provide a good foundation to under[...]
Archive for the ‘Information retrieval’ Category
Dr. Searcher and Mr. Browser
A really interesting method, partly for combating search result manipulation, is described in “Dr. Searcher and Mr. Browser: A unified hyperlink-click graph” by Poblete (Uni. Pompeu Fabra), Castillo and Gionis (Yahoo). They worked on making a unified graph representation of the web includ[...]
User goals and tailoring click models
An awful lot of attention is being given to users and click data at the moment (despite the fact that analysing query logs dates back 10 years). “Tailoring Click Models to User Goals” by Guo, Li and Faloutsos (Carnegie Mellon University) is from the 2009 workshop on Web Search[...]
Information Extraction is not Information Retrieval
Here we will mostly cover what information extraction (IE) is, because it isn’t given nearly as much attention as information retrieval (IR). The differences are highlighted, but for more in-depth information on IR check Manning’s online book. I’ve provided an IR glossary and also a [...]
How does a search engine know what words mean?
Word sense disambiguation (WSD) belongs to the field of computational linguistics. It’s the research area dedicated to finding ways for machines to understand the meaning of words. More precisely, it’s about determining the word sense of a particular word in a context. This [...]
User intent in real time
“Determining User’s Interest in Real Time” by Singh, Murthy and Gonsalves is a pretty interesting paper, presented at WWW 08. It is one for the SEOs out there in particular. There are a lot of techniques that they use to determine how popular a term o[...]
Search engines and long queries
Since we talked about long-tail queries earlier in the week, I was inspired to look a little more at query analysis and how search engines could deal with the troublesome long-tail ones. You know how every time you see a pair of shoes you really like, it’s always the same designer? Well [...]
Long-tail is rubbish!
I got to the Hitwise report through a link from Dave to a post on Search Engine Land written by Matt McGee (a convoluted journey, I know). I liked the post because I have tons of papers and things to read each day, and the concise writeup pleased me greatly. The report (isn’t that long [...]

