“Determining User’s Interest in Real Time” by Singh, Murthy and Gonsalves is a pretty interesting paper. It was presented at WWW 2008.
This one is of particular interest to the SEOs out there. They use a lot of techniques to determine how popular a term or keyphrase is, so that they can target the ones that bring the most quality traffic to a site. These methods may involve checking the Google Keyword Tool, Wordtracker, sometimes even AdWords, and many other data sources available to them. This paper, however, proposes something much more interesting and much more valuable: it determines user interest in real time, rather than over the last month or the last day.
As the authors point out, user interest depends on many factors that can’t be captured using past data. They use the current query session to determine interest in real time, showing that search engines could adapt their results based on data gathered during the session itself. They tested on short queries and found that it worked quite well.
Most of us have figured out that search engines relying on personalisation techniques, history, and past clicks may not be very representative of the current aim of the search. Our aims change, even when we’re looking for something similar.
Their search session begins at the user’s first query and ends when he or she stops searching. The results the user chooses to click on give information about the intent of the search. They suggest that if the user goes to the 2nd page of results, search engines could refine it to include only results that relate to the user’s behaviour on page 1. This means the search engine can adapt to the user’s intent during the session, which makes searching far more dynamic.
The tool used for this experiment:
“This tool is a metasearch engine which monitors and records the users behaviour and performs few possible optimizations. For each new query request, the log contains the query, query session id, time of the request, ip address. For each click on the results, the log contains the url, rank of the url in the list of the results, time of the click, query, session id, ip address. It also maintains a cache containing detail information of the results and the clicks for every query session. It allows the tool to process the request in real time”.
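To make the quoted log format concrete, here is a minimal sketch of the two record types it describes. The class and field names are my own, not the authors’ schema:

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """Logged for each new query request (per the quoted description)."""
    query: str
    session_id: str
    timestamp: float
    ip_address: str


@dataclass
class ClickRecord:
    """Logged for each click on a result."""
    url: str
    rank: int          # position of the url in the result list
    timestamp: float
    query: str
    session_id: str
    ip_address: str
```

Keeping a per-session cache of these records is what lets the tool process each request in real time, as the quote notes.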
They also observed that “query session with very short and very long queries are more likely to be expanded compared to the query sessions of query lengths 2 to 3 words”.
Their system predicts a list of relevant results using the first page of results, which is then compared with the results the user actually clicks on the next (expanded) page.
The method for determining user intent:
They start from the set of results the user has already encountered; each result is represented as the set of terms found in its snippet.
Then they divide the set of initial results into two groups: the results the user actually clicked on, and those that were not clicked.
Next they determine which terms occur frequently in each group and which do not, and they weight each term accordingly, predicting which terms are likely to have a high frequency among relevant results and which are not.

High weights mean those terms are relevant; low weights mean they are not.
They compare the list predicted from the 1st page with the user’s actual clicks on the expanded page. If the user clicks on results the system predicted, it has worked.
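The steps above can be sketched in a few lines. This is only an illustration of the idea, assuming a simple clicked-minus-unclicked frequency weight rather than the paper’s exact weighting formula:

```python
from collections import Counter


def term_weights(clicked_snippets, unclicked_snippets):
    """Weight each term by how much more often it appears in clicked
    snippets than in unclicked ones (a simple stand-in weight, not the
    authors' exact scheme)."""
    clicked = Counter(t for s in clicked_snippets for t in set(s.lower().split()))
    unclicked = Counter(t for s in unclicked_snippets for t in set(s.lower().split()))
    terms = set(clicked) | set(unclicked)
    return {t: clicked[t] - unclicked[t] for t in terms}


def predict(snippets, weights):
    """Score each candidate result by the summed weights of its snippet
    terms; return candidates sorted most-relevant first."""
    def score(s):
        return sum(weights.get(t, 0) for t in set(s.lower().split()))
    return sorted(snippets, key=score, reverse=True)


def fraction_of_clicks_predicted(predicted, actually_clicked, k):
    """How many of the user's actual clicks fall in the top-k predictions?"""
    top = set(predicted[:k])
    return len(top & set(actually_clicked)) / max(len(actually_clicked), 1)
```

For example, if a user clicked snippets about Python programming and skipped snippets about cooking, `predict` would rank a new programming-related snippet above a cooking-related one, and `fraction_of_clicks_predicted` measures how well the predicted list matched the clicks on the expanded page.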
For long queries, almost all the results on the next page are covered by the predicted list. This is because long queries span a vast number of topics, which makes them less interesting as a measure. Short queries, however, are a better test of accuracy: even though fewer of them were covered on the 2nd page of results, the predictions for them were far more accurate.
Why should you care?
This offers SEOs a new way of assessing user intent, and of seeing which results actually interest users. It allows quality traffic to come through by measuring user interest in the chosen queries more accurately in the first place. It also poses a new problem: if search engines did use this method, the results would be highly personalised. However, because the method does not rely on data completely unavailable to the SEO, experimenting with a well-defined prediction method of your own would give insight into which terms are most relevant to the website you are working on.
Computing people will also appreciate this method, because it gives a little more insight into how results could be made more accurate, and not only for web search engines.