I’ve been catching up on all my reading these last couple of weeks and I particularly liked “From X-Rays to Silly Putty via Uranus: Serendipity and its Role in Web Search”. It’s by Paul André from Southampton University together with Jaime Teevan and Susan Dumais from Microsoft Research.
Louis Pasteur once famously said: “In the fields of observation chance favors only the prepared mind.” Serendipity is the act of discovering something really interesting whilst looking for something else entirely: a completely unintended but fortunate discovery. X-rays, Silly Putty and Uranus were all chance discoveries. This paper investigates this effect in relation to the web and asks: does improving search engine results through personalisation diminish chance encounters with highly prized information?
“By its nature, serendipity is hard to study. In this paper rather than trying to induce or identify serendipity, we conducted a study to explore the potential for serendipitous encounters. Specifically we examine:
1. Whether there is the potential for serendipitous encounters during Web search; and
2. Whether the ability to better target the user’s interests through personalization reduces this potential.”
They mention Amanda Spink’s work, where she found that “partially-relevant search results, identified as containing multiple concepts, [or] on target but too narrow, play an important role in a user’s information seeking process and problem definition.” It is, however, very difficult to induce, observe, identify or study serendipity in web search. What we’re really trying to understand here is whether partially relevant results are also valuable. Results in the SERPs that are not highly relevant to your query but still interest you could be considered valid anyway, seeing as they are of use to you.
The researchers studied web search engine query logs and carried out a controlled study where people marked results as interesting and relevant. Participants were asked to rate on a scale how relevant and interesting 25 of the 50 results presented were to them. The queries were not imposed; participants queried according to their own information needs. The results came from Live Search and were displayed in random order. The researchers used the logs to measure the popularity of the queries, the number of results for each query and their diversity. They also looked at user interactions with the search results. All 36 participants were Microsoft staff, and a total of 92 queries were analysed.
They measured “click entropy”, which is basically a measure of how consistently people selected the same results for a query. “Low click entropy” means that people selected the same results in response to a query and “high click entropy” means that everyone selected different ones. They also asked the people taking part in the study to install a toolbar that could look at desktop content and the like, so the researchers could assess how close the results were to each person’s information habits. There was therefore a content-based similarity measure and a behaviour-based similarity measure.
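To make click entropy a bit more concrete, here is a minimal Python sketch. It assumes the usual Shannon-entropy formulation (the sum over clicked results of p × log2(1/p), where p is the share of clicks a result received for the query); the post above doesn’t spell out the paper’s exact formula, so treat this as an illustration rather than the authors’ implementation. The function name `click_entropy` and the example URLs are made up for the example.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy, in bits, of the clicks recorded for a single query.

    clicked_urls: one entry per click, naming the result that was clicked.
    Assumed formulation: sum of p * log2(1 / p) over the distinct results.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks recorded, so nothing to measure
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical click logs for two queries (not data from the paper).
same_result_every_time = ["example.com/a"] * 8
spread_over_four_results = [
    "example.com/a", "example.com/b", "example.com/c", "example.com/d",
] * 2

low = click_entropy(same_result_every_time)     # everyone clicked one result
high = click_entropy(spread_over_four_results)  # clicks split evenly four ways
print(low, high)
```

With these made-up logs, the first query scores zero (everybody agreed on one result) and the second scores two bits (clicks split evenly across four results), which matches the low/high distinction described above.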
21% of results were considered serendipitous. Query length, number of results returned, popularity and other measures like these were not found to be factors, but click entropy was: it correlated positively with the number of interesting (and potentially serendipitous) results. People tend to click on things not because they couldn’t find what they were looking for but because they were interested. The authors deduced that queries with high click entropy therefore have a higher potential for serendipity. Both the “interestingness” and “relevance” scores showed that personalization could be an important factor in finding serendipitous results for users.
It hadn’t actually occurred to me that partially relevant search results could be of value. In IR the standard measures are “precision” and “recall”, meaning that we strive to make results as relevant as possible. This study suggests that there is room for “serendipitous results”, and it makes a lot of sense. There are sites such as Reddit where we (I at least) never go to find anything in particular but rather to stumble over something interesting. Obviously StumbleUpon is based on this idea as well. Could we possibly be looking at a new way to measure the efficiency of a web search engine? How about business websites: could they benefit from popping up in seemingly irrelevant SERPs? Either way, it’s a really interesting idea and I look forward to reading more.