Ranking algorithms are changing a great deal at the moment. Earlier algorithms such as HITS and PageRank looked at how websites and pages were connected and used whatever information they could gather from those links as variables. Nowadays we’re seeing research papers come out focusing on how the semantic web, tags and social media can contribute to improved ranking. The reason is quite simple: there’s a whole lot more data to work with. Personalisation is being tested by search engines, and we know that the semantic web has received a lot of attention.
I’ll go into more detail to explain the ins and outs of the technology in a simple way, so don’t worry about weird jargon.
I think the paper called “Ranking in Folksonomy Systems: Can context help?” by Abel, Henze and Krause from Leibniz University Hannover is a nice introduction. They explore graph-based ranking algorithms. They “enhance these algorithms by exploiting the context of tag assignments, and evaluate the results on the GroupMe! dataset.” They show that this kind of approach can indeed improve web search ranking algorithms. They also look at how these algorithms help with ranking result lists in taxonomies. They combine the grouping and tagging paradigms to get additional context information, which is embedded into the tag assignments that go on to form the folksonomy. All the ranking algorithms rely on this.
A few definitions:
Folksonomy: collaboratively generated, open-ended labels that categorize online content such as Web pages. It is used in information retrieval.
Taxonomy: the practice of classifying things, or the classification system that comes out of it.
Graph-based ranking algorithm: Kleinberg’s HITS algorithm and Google’s PageRank are graph-based. It’s a way of determining the importance of a vertex within a graph by considering global information that is recursively computed from the whole graph, rather than only local vertex-specific information.
GroupMe!: Here users annotate information that can be interpreted in a certain group context. This is where the semantic information comes from for this experiment.
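To make the graph-based idea a bit more concrete, here is a minimal sketch of PageRank-style power iteration in Python. It isn't taken from the paper: the function name pagerank, the toy_web link graph and the damping value of 0.85 are my own assumptions, picked purely for illustration. The point is only to show how a page's score is recomputed again and again from the scores of the pages linking to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}.

    Illustrative sketch only; damping and iterations are assumed values.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with its "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this node's score on, split evenly over its links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its score evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by everyone else.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

In this toy graph page "c" ends up on top because every other page links to it, while "d" comes last because nothing links to it. That is the "global information" part: a page's score depends on the whole link structure, not just on the page itself.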
Here is a short description of the algos used:
GRank: “exploits the context gained by grouping resources and which improves search for resources”. It also ranks relevant resources according to a keyword query. Like FolkRank it also computes topic-sensitive rankings.
FolkRank: It adapts Personalized PageRank for folksonomies and computes a ranking score for all types of entities in the folksonomy (users, tags and resources). This algorithm exploits the structure of folksonomies to improve search results. It also computes topic-sensitive rankings. You can try it out at BibSonomy. More reading here.
GFolkRank: This is an adaptation of the two above. It “interprets groups as artificial, unique tags”.
GFolkRank+: In addition to the above, this is capable of propagating tags, which have been assigned to a group, to its resources.
SocialPageRank: It ranks resources and tags respectively. It computes static, global rankings independent of the context. It determines the popularity of websites according to the popularity of users and tags. You can get it here.
SocialSimRank: It ranks resources and tags respectively. It computes static, global rankings independent of the context. It determines the similarity between tags or query terms based on the tag assignments.
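If you want a feel for how the FolkRank family works, here is a rough Python sketch under my own assumptions. It is not the authors' implementation. It treats tag assignments as (user, tag, resource) triples, turns each group into an artificial tag in the spirit of GFolkRank, spreads weight over the resulting graph in a Personalized PageRank style, and subtracts a uniform baseline run to get a topic-sensitive score. The "group:" prefix, the damping of 0.7, the boost of 10, the uniform baseline and the toy data are all hypothetical choices for illustration.

```python
from collections import defaultdict


def groups_as_tags(tag_assignments, group_assignments):
    """Add one artificial tag per group (assumed naming: 'group:<name>')."""
    extra = [(user, "group:" + group, resource)
             for user, group, resource in group_assignments]
    return list(tag_assignments) + extra


def build_graph(tag_assignments):
    """Undirected weighted graph over user, tag and resource nodes."""
    graph = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in tag_assignments:
        u, t, r = ("user", user), ("tag", tag), ("res", resource)
        # Each assignment links user-tag, tag-resource and user-resource.
        for a, b in ((u, t), (t, r), (u, r)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def spread(graph, preference, damping=0.7, iterations=100):
    """Iterate w <- damping * (weight from neighbours) + (1 - damping) * p."""
    total = sum(preference.values())
    p = {node: preference.get(node, 0.0) / total for node in graph}
    w = dict(p)
    for _ in range(iterations):
        new_w = {node: (1.0 - damping) * p[node] for node in graph}
        for node, neighbours in graph.items():
            out = sum(neighbours.values())
            for other, weight in neighbours.items():
                new_w[other] += damping * w[node] * weight / out
        w = new_w
    return w


def folkrank_style(graph, query_tag, boost=10.0):
    """Topic-sensitive score: biased run minus a uniform baseline run.

    Assumes query_tag already exists as a tag in the graph.
    """
    uniform = {node: 1.0 for node in graph}
    biased = dict(uniform)
    biased[("tag", query_tag)] += boost  # extra preference for the query tag
    base = spread(graph, uniform)
    topical = spread(graph, biased)
    return {node: topical[node] - base[node] for node in graph}


# Hypothetical data: r4 has no tags, but alice put it in a group.
tags = [("alice", "jazz", "r1"), ("bob", "jazz", "r2"), ("bob", "rock", "r3")]
groups = [("alice", "jazz-festival", "r4")]
graph = build_graph(groups_as_tags(tags, groups))
scores = folkrank_style(graph, "jazz")
```

Without the group step, r4 never enters the graph at all because nobody tagged it. With it, r4 gets a score through alice and the artificial group tag, which is roughly the kind of extra context the group-aware variants are after.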
Results – which ones performed best:
GFolkRank did better than FolkRank and SocialPageRank as far as the overlapping similarity goes.
They then tested them on untagged resources (because a lot of the time not everything is tagged up) and FolkRank was outstanding, doing better by far than SocialSimRank.
GFolkRank did better because it ranked the first relevant tag consistently.
FolkRank gains the highest precision at rank 3.
There is further evaluation and comparison in this paper.
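In case "precision at rank 3" is new to you: it is simply the share of the top 3 results that are actually relevant. Here is a small Python sketch of that measure. The function name precision_at_k and the example data are my own, not from the paper, and the paper's exact evaluation setup may differ.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Dividing by k (not len(top_k)) penalises result lists shorter than k.
    return hits / k


# Hypothetical result list and relevance judgements.
ranked = ["r2", "r7", "r1", "r9"]
relevant = {"r1", "r2"}
p_at_3 = precision_at_k(ranked, relevant, 3)
```

Here two of the top three results are relevant, so the precision at rank 3 is two thirds.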
Why should you care?
New ranking algorithms are showing up all the time, and they revolve around improving search engine results. Much research is looking at using semantic web features such as folksonomies, taxonomies and all the tags available. This illustrates how important it is to start taking note of this stuff and looking at how your website or blog can feed these algorithms with what they need. It’ll give you an edge when something concrete and commercial descends on us.