Google announced this week that it would start personalising search results, even for users who are not signed into a Google service at the time. The announcement has prompted a steady stream of blog posts and quite a lot of comments, many of which show a gap in understanding of how personalisation actually works. Personalisation is not a new concept; it has been researched for quite a few years, and there is plenty of literature on the subject. Unfortunately I don't have time to sum it all up in a post, but I do have time to give you this list of papers for your perusal. They are all recent and all freely available, and the aim is to help you get a better grasp of what personalisation really is and how it works. I've read a few comments along the lines of “If you happen to be interested in bananas one day, then after that all your results will be about bananas.” Well, that's just nuts, isn't it?
The following papers cover all sorts of aspects of personalisation. I selected them because, read together, they give a general overview of what is currently possible and what researchers are focusing on. If you are an ACM or IEEE member you will have access to some pretty awesome papers not listed here, but I wanted to make this list accessible to all:
- System and methods for personalized search, information filtering and for generating recommendations utilizing statistical latent class models by Thomas Hofmann and Jan Puzicha (Recommind, Inc., Berkeley)
- Enhancing Collaborative Web Search with Personalization: Groupization, Smart Splitting, and Group Hit-Highlighting by Meredith Ringel Morris, Jaime Teevan, Steve Bush (Microsoft Research, Redmond, WA, USA)
- Task aware search personalization by Julia Luxenburger, Shady Elbassuoni & Gerhard Weikum (Max-Planck Institute of Informatics)
- Personalized Concept-Based Clustering of Search Engine Queries by Kenneth Wai-Ting Leung, Wilfred Ng, and Dik Lun Lee (Hong Kong University of Science and Technology)
- Discovering and Using Groups to Improve Personalized Search by Jaime Teevan, Meredith Ringel Morris and Steve Bush (Microsoft Research, Redmond)
- Cluster Based Personalized Search by Hyun Chul Lee and Allan Borodin (University of Toronto)
- Personalized Web Search by Gossiping with Unknown Social Acquaintances by Marin Bertier, Davide Frey, Rachid Guerraoui, Anne-Marie Kermarrec and Vincent Leroy (INRIA)
- Rank Optimization of Personalized Search by Lin Li, Zhenglu Yang, and Masaru Kitsuregawa (University of Tokyo)
- To Personalize or Not to Personalize: Modeling Queries with Variation in User Intent by Jaime Teevan, Susan T. Dumais & Daniel J. Liebling (Microsoft Research)
- Learning user interests for a session-based personalized search by Mariam Daoud, Lynda Tamine-Lechani & Mohand Boughanem (Institut de Recherche en Informatique de Toulouse)
There is a really fantastic paper by Jaime Teevan, Susan T. Dumais and Eric Horvitz (Microsoft Research) called “Potential for Personalization”. From the abstract:
“To better understand the variation in what people using the same query are searching for, we examine explicit relevance judgments and implicit indicators of user interest. We develop analytical techniques to summarize the amount of variation across individuals (potential for personalization curves), and compare different data mining techniques for generating these measures. Through analysis of the explicit relevance judgments made by different individuals for the same queries, we find that significant variation exists not only when people have very different underlying information needs (e.g., “computer-human-interaction” vs. “ch’i”), but also when they appear to have very similar needs (e.g., “key papers in human-computer interaction” vs. “important papers in human-computer interaction”). While explicit relevance judgments are impractical for a search engine to collect, search engines do typically have a large amount of implicit data generated through users’ interactions with their service. By exploring several of these sources of implicit data, we find it is possible to use them to approximate the variation in explicit relevance judgments and improve the user experience. Implicit measures that are behavior-based (e.g., related to the similarity of a result to previously visited URLs) appear to hold potential for capturing relevance, while measures that are content-based (e.g., related to the similarity of a result to other electronic content the individual has viewed) appear to hold the potential for capturing variation across individuals.”
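The “potential for personalization curve” in that abstract is easier to grasp with numbers. Below is a minimal Python sketch of my own reading of the idea, not the authors' code: it assumes graded relevance judgments from several people for the same query and uses nDCG as the measure (the paper's exact gains and measures may differ). For each group size k it finds the best single ranking a search engine could show the whole group and averages how well that ranking serves each member. The function names (`dcg`, `best_group_ndcg`, `potential_curve`) and the toy judgments are hypothetical, made up for illustration.

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical data format: each user is a dict mapping a result ID to a graded
# relevance judgment (0 = not relevant, 1 = relevant, 2 = highly relevant) for the
# SAME query. Assumes every user rates at least one result above 0.


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (ranks start at 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def best_group_ndcg(group, docs):
    """Average nDCG of the single best shared ranking for everyone in `group`."""
    # Each user's ideal DCG over the same result set, used to normalise their gains.
    ideals = [dcg(sorted((u.get(d, 0) for d in docs), reverse=True)) for u in group]

    # Average nDCG is linear in the per-user normalised gains, so sorting results by
    # their summed normalised gain gives the optimal shared ranking
    # (rearrangement inequality: the discounts decrease with rank).
    def weight(doc):
        return sum(u.get(doc, 0) / ideal for u, ideal in zip(group, ideals))

    ranking = sorted(docs, key=weight, reverse=True)
    return mean(
        dcg([u.get(d, 0) for d in ranking]) / ideal for u, ideal in zip(group, ideals)
    )


def potential_curve(users):
    """Map group size k -> mean best-achievable nDCG over all groups of size k."""
    docs = sorted(set().union(*users))
    return {
        k: mean(best_group_ndcg(list(group), docs) for group in combinations(users, k))
        for k in range(1, len(users) + 1)
    }


if __name__ == "__main__":
    # Made-up judgments from three people who typed the same query.
    users = [
        {"a": 2, "b": 1, "c": 0},
        {"a": 0, "b": 2, "c": 1},
        {"a": 2, "b": 0, "c": 2},
    ]
    for k, score in potential_curve(users).items():
        print(f"group size {k}: best shared nDCG = {score:.3f}")
```

At group size 1 every person gets a perfect ranking (nDCG 1.0). As the group grows, a single ranking has to compromise between people who want different results, so the curve drops; the gap between 1.0 and the curve is the room a personalised ranking has to improve on one ranking for everybody. If everyone judged the results identically, the curve would stay flat at 1.0 and there would be nothing to gain from personalising that query.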
In fact, anything by Susan Dumais is likely to be very enlightening as far as personalization goes! I particularly recommend her slides from UMAP on personalization and Bing, in which she describes the process of creating and supporting personalized search and the challenges involved.
I know there are a lot of strong feelings about personalization, but I recommend you read the writing on the van!