Two notable computer scientists, Krishna Bharat and George Mihaila, filed a patent describing a “Method for ranking hypertext search results by analysis of hyperlinks from expert documents and keyword scope”. The patent was filed in 1999 and published on 18 March 2008.
In short: “A computer-implemented method and system for determining search results for a search query for hypertext documents. The hypertext documents are reviewed to determine expert documents. When a query is received, the expert documents are ranked in accordance with the query. Then the target documents of the ranked expert documents are ranked to determine the search result set.”
An expert document is a document that is about a certain topic and has links to many “non-affiliated” documents on that topic.
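To make the definition concrete, here is a minimal Python sketch of an expert test. The details are assumptions, not taken from the text above: two pages are treated as affiliated when the rightmost non-generic token of their hostnames matches (so `www.ibm.com` and `www.ibm.co.uk` count as one organisation), `GENERIC_TOKENS` is a small hypothetical sample, and `min_targets=5` is an illustrative threshold. The names `host_key`, `affiliated` and `is_expert` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately short list of generic hostname tokens;
# a real system would need a much fuller list of TLDs and prefixes.
GENERIC_TOKENS = {"www", "com", "org", "net", "edu", "gov", "co", "uk", "ac"}


def host_key(url):
    """Return the rightmost non-generic token of the URL's hostname (assumed test)."""
    host = urlparse(url).hostname or ""
    tokens = [t for t in host.split(".") if t and t not in GENERIC_TOKENS]
    return tokens[-1] if tokens else host


def affiliated(url_a, url_b):
    """Treat two pages as affiliated when their host keys match (simplified)."""
    return host_key(url_a) == host_key(url_b)


def is_expert(page_url, out_links, min_targets=5):
    # Keep only targets that are not affiliated with the page itself, then group
    # them by host key so several pages from one organisation count only once.
    outside = {host_key(u) for u in out_links if not affiliated(page_url, u)}
    return len(outside) >= min_targets


page = "https://www.mysite.com/links"
links = [
    "https://www.alpha.org/", "https://www.beta.com/", "https://www.gamma.net/",
    "https://www.delta.org/", "https://www.epsilon.com/", "https://shop.mysite.com/",
]
print(is_expert(page, links))  # True: five non-affiliated hosts
```

The link to `shop.mysite.com` shares the page's own host key, so it does not count towards the threshold; only the five outside organisations do.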
It’s very difficult to assess how authoritative a page is: analysing its content alone is not enough. Human editors have been used in the past, but that method is far too slow. Collecting usage information has also been looked into, but you’d need huge amounts of it for it to be accurate. The method described in the patent proposes an expert lookup followed by target ranking.
A summary of the process:
- A list of expert documents is created in pre-processing, and these documents are indexed in a special inverted index called an “expert reverse index.”
- When a query is received, the “expert reverse index” is used to find the expert documents matching the query. The best expert pages are found and ranked according to match information.
- In the target-ranking step, the outgoing links on the top expert pages are analysed. By combining relevant outgoing links from many experts on the query topic, the best pages can be found: “This is the basis of the high relevance that the described embodiment of the invention delivers.”
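The first two steps, building the expert reverse index and looking up experts for a query, can be sketched as follows. This is a minimal illustration, not the patent's index format: each expert page is reduced to a hypothetical block of key text (for example its title and headings), the “expert reverse index” is modelled as a plain term-to-experts mapping, and “match information” is simplified to the number of distinct query terms an expert matches. `build_expert_reverse_index` and `rank_experts` are hypothetical names.

```python
from collections import defaultdict


def build_expert_reverse_index(experts):
    """Map each term to the ids of the expert pages whose key text contains it.

    `experts` maps a hypothetical expert id to that page's key text.
    """
    index = defaultdict(set)
    for expert_id, text in experts.items():
        for term in text.lower().split():
            index[term].add(expert_id)
    return index


def rank_experts(index, query, top_k=10):
    """Score experts by how many distinct query terms they match (simplified)."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for expert_id in index.get(term, ()):
            scores[expert_id] += 1
    # Highest score first; ties are broken by id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


experts = {
    "e1": "Space agencies of the world",
    "e2": "World cuisine links",
    "e3": "Space exploration resources",
}
index = build_expert_reverse_index(experts)
print(rank_experts(index, "space agencies"))  # [('e1', 2), ('e3', 1)]
```

Because the index is built once in pre-processing, a query only touches the posting sets for its own terms rather than scanning every expert page.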
In other words, the method determines which hypertext documents are experts, ranks the expert documents according to the query, ranks the target documents pointed to by the ranked expert documents, and then bases the results on the ranked target documents.
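The last step, ranking the targets of the ranked experts, might look like this in its simplest form. This is not the patent's actual scoring formula. The assumptions are that a target's score is the sum of the scores of the distinct experts linking to it, and that a hypothetical `min_experts` threshold drops pages endorsed by only one expert, echoing the idea of combining links from many experts. `rank_targets` is a hypothetical name.

```python
from collections import defaultdict


def rank_targets(ranked_experts, expert_links, min_experts=2):
    """Combine the scores of ranked experts into scores for the pages they link to.

    ranked_experts: list of (expert_id, score) pairs from the expert lookup.
    expert_links: maps each expert id to the target URLs it links to.
    min_experts: hypothetical threshold of distinct experts a target needs.
    """
    totals = defaultdict(float)
    voters = defaultdict(set)
    for expert_id, score in ranked_experts:
        # set() ensures an expert linking twice to one target is counted once.
        for target in set(expert_links.get(expert_id, ())):
            totals[target] += score
            voters[target].add(expert_id)
    results = [(t, s) for t, s in totals.items() if len(voters[t]) >= min_experts]
    return sorted(results, key=lambda item: (-item[1], item[0]))


ranked = [("e1", 2), ("e3", 1), ("e4", 1)]
links = {
    "e1": ["https://www.nasa.gov/", "https://www.esa.int/"],
    "e3": ["https://www.nasa.gov/", "https://www.jaxa.jp/"],
    "e4": ["https://www.esa.int/", "https://www.nasa.gov/"],
}
print(rank_targets(ranked, links))
# [('https://www.nasa.gov/', 4.0), ('https://www.esa.int/', 3.0)]
```

The JAXA page is linked by only one expert, so with the default threshold it is left out of the results.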
The algorithm, known as Hilltop, deals with expert documents and was acquired by Google in 2003. It’s useful for identifying strong cross-linking and for determining how one site is related to another. More information can be found in the Poster Proceedings of WWW9, pages 72-73, 2000.