Google picked up Amrit Gruber, who is doing an internship with them. He’s pretty valuable because of his PhD research in statistical text analysis (the same field LSI comes from). His method uses Hidden Topic Markov Models (HTMM), and a working version was released in 2007. In this post Google m[...]
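As a rough illustration of what sets HTMM apart from bag-of-words topic models, here is a minimal Python sketch of its generative story as I understand it: each sentence keeps the previous sentence's topic unless a transition happens (with probability epsilon), in which case a fresh topic is drawn from the document's topic mixture, and every word in a sentence comes from that sentence's topic. The function name, parameters and toy vocabulary are hypothetical, and the sketch only samples documents; it does not do the inference needed to learn the model from text.

```python
import random

def sample_htmm_document(theta, topic_words, n_sentences, words_per_sentence, epsilon, rng):
    """Sample one document following an HTMM-style generative process (hypothetical helper).

    theta: per-document topic probabilities (should sum to 1).
    topic_words: list of per-topic word distributions, each a dict word -> probability.
    epsilon: probability of a topic transition at each sentence boundary.
    Returns (list of sentences as word lists, list of sentence topics).
    """
    topics = list(range(len(theta)))
    doc, sentence_topics = [], []
    z = None
    for n in range(n_sentences):
        # The first sentence always draws a topic; later ones redraw with probability epsilon.
        # A redraw comes from theta, so it may land on the same topic again.
        if n == 0 or rng.random() < epsilon:
            z = rng.choices(topics, weights=theta)[0]
        sentence_topics.append(z)
        vocab = list(topic_words[z])
        weights = [topic_words[z][w] for w in vocab]
        # All words in the sentence share topic z (the Markov "topic per sentence" idea).
        doc.append(rng.choices(vocab, weights=weights, k=words_per_sentence))
    return doc, sentence_topics

# Toy usage with two made-up topics.
rng = random.Random(0)
theta = [0.5, 0.5]
topic_words = [{"search": 0.5, "index": 0.5}, {"link": 0.5, "spam": 0.5}]
doc, zs = sample_htmm_document(theta, topic_words, 5, 4, 0.2, rng)
for sentence, z in zip(doc, zs):
    print(z, sentence)
```

Setting epsilon to 0 makes the whole document a single topic; setting it to 1 redraws a topic at every sentence, which is close to an LDA-style model with sentence-level topic assignments.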
LSA/LSI source code & tools
I’m often asked by students, researchers in other areas and sometimes SEO people where they can find LSI/LSA source code and tools. My favourite beginner’s tutorial on LSI is by Genevieve Gorrell from Sheffield University. The term LSA is mostly used in computer science these days but it doesn[...]
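For anyone who wants to see the core idea before picking up a toolkit: LSA boils down to a truncated singular value decomposition of a term-document matrix. The sketch below uses NumPy's `np.linalg.svd`; the function names and the toy matrix are made up for illustration, and a real setup would normally weight the counts first (tf-idf or log-entropy) and use a sparse solver for large collections.

```python
import numpy as np

def lsa(term_doc, k):
    """Project terms and documents into a k-dimensional latent space via truncated SVD.

    term_doc: (n_terms, n_docs) array of counts or weights.
    Returns (term_vectors, doc_vectors), both scaled by the kept singular values.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # NumPy returns singular values in descending order, so the first k are the largest.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    term_vectors = U_k * s_k      # one row per term
    doc_vectors = Vt_k.T * s_k    # one row per document
    return term_vectors, doc_vectors

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

terms = ["car", "automobile", "engine", "flower", "petal"]
# Columns are four toy documents; "car" and "automobile" never share a document.
X = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)

term_vecs, doc_vecs = lsa(X, k=2)
print(cosine(term_vecs[0], term_vecs[1]))  # car vs automobile, ≈ 1.0
print(cosine(doc_vecs[0], doc_vecs[2]))    # car doc vs flower doc, ≈ 0.0
```

The point of the toy example is the first line of output: "car" and "automobile" never co-occur, but because both co-occur with "engine" they collapse onto the same latent direction once only two dimensions are kept.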
Tips for blog writing? Really?
After writing commercial blogs and the like, I really like that my blog doesn’t sell anything or try to be anything more than what it is: a place for information about IR-related topics that relate to SEO work, although not all SEO peeps will see it that way. I follow no guidelines, I don[...]
Advances in IE for the Web
This article was published in Communications of the ACM and was written by Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. It’s freely available and you can read the whole issue here. Google usually gives you way too many documents when you’re searching for a very simple[...]
LSI – No more!
With the help of two very cool Tweeters, @dpn and @Mendicott, I found some interesting facts about LSI and SEO. For a simple idea of what LSI/A is, please read the Wikipedia entry on it. The original paper is here. LSI was patented in 1988 by Scott Deerwester (doing humanitarian work n[...]
The importance of Datamining
Data mining is also called knowledge discovery in databases (KDD). It is the extraction of useful patterns and relationships from data sources such as databases, texts and the web. However, it has nothing to do with SQL, OLAP, data warehousing or any of that kind of thing. It uses [...]
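As one concrete example of "extraction of useful patterns", here is a toy Python sketch of support counting for item pairs, the kind of step that association-rule miners such as Apriori perform far more efficiently. The session data and the function name are hypothetical; this brute-force version enumerates every pair in every transaction instead of pruning candidates level by level.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions containing both) >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # Deduplicate and sort so each unordered pair is counted at most once per transaction.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Made-up search sessions: each list is the set of topics one user looked at.
sessions = [
    ["seo", "links", "pagerank"],
    ["seo", "links"],
    ["lsi", "seo"],
    ["links", "pagerank"],
]
print(frequent_pairs(sessions, min_support=0.5))
# {('links', 'pagerank'): 0.5, ('links', 'seo'): 0.5}
```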
The impact of SEO on the online advertising market
This 2006 paper by Bo Xing and Zhangxi Lin of Texas Tech University discusses the impact of SEO on the online advertising market. The study takes an analytical approach and draws on a number of good resources, but at times it takes a simplistic view of the SEO effort. SEOs are considered to be of “para[...]
Hot topics in comp sci vs SEO
To see if there was a correlation between hot topics in SEO and hot topics in IR, I’ve listed the top 10 in each, in no particular order. I may have forgotten some in SEO because that space is not as well organized as the comp sci one. SEO popular topics: - How to get more [...]
10 free papers: semantic relatedness of words
There’s been a lot of buzz recently about keywords and their semantic relatedness, so I thought I’d volunteer 10 good papers, freely available via CiteSeer, to widen or extend the conversation. The list is by no means exhaustive. Non-computer scientists, don’t be afraid of the[...]
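Much of the work in this area builds on the distributional idea that related words turn up in similar contexts. Here is a minimal Python sketch of that idea: count the words appearing within a small window around each word, then compare two words by the cosine of their context vectors. The window size, function names and toy queries are assumptions for illustration and are not taken from any particular paper on the list.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][tokens[j]] += 1
    return vectors

def relatedness(vectors, a, b):
    """Cosine similarity of the context vectors of words a and b (0.0 if either is unseen)."""
    va, vb = vectors.get(a), vectors.get(b)
    if not va or not vb:
        return 0.0
    dot = sum(count * vb[word] for word, count in va.items() if word in vb)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

queries = [
    "cheap car insurance quote".split(),
    "cheap auto insurance quote".split(),
    "fresh flower delivery today".split(),
]
vecs = cooccurrence_vectors(queries)
print(relatedness(vecs, "car", "auto"))    # ≈ 1.0: identical contexts
print(relatedness(vecs, "car", "flower"))  # 0.0: no shared context words
```

Raw counts are the simplest choice; weighting the context counts (for example with pointwise mutual information) usually works better on real data, because it stops very frequent context words from dominating the cosine.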
CredibleRank
Today I thought I’d share “Countering Web Spam with Credibility-Based Link Analysis” by James Caverlee (Texas A&M University) and Ling Liu (Georgia Institute of Technology), presented at PODC’07. PageRank, TrustRank and HITS all couple link credibility and page quality, which isn[...]
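To make that coupling concrete, here is a minimal Python power-iteration sketch. With a uniform teleport vector it computes PageRank; putting all teleport mass on trusted seed pages gives TrustRank-style propagation. The optional `credibility` weights, which damp how much of a page's vote flows along its links, are my own hypothetical illustration of scoring link credibility separately from page quality. This is not the CredibleRank formula from the paper, and the function name and graph are made up.

```python
def link_rank(links, teleport, damping=0.85, credibility=None, iters=100):
    """Power-iteration link analysis over a dict page -> list of pages it links to.

    teleport: dict page -> probability summing to 1 (uniform gives PageRank;
              mass only on trusted seeds gives TrustRank-style scores).
    credibility: optional dict page -> weight in [0, 1] scaling how much of a page's
                 vote flows along its links (hypothetical, for illustration only).
    """
    pages = set(links) | {q for outs in links.values() for q in outs} | set(teleport)
    score = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iters):
        new = {p: 0.0 for p in pages}
        leaked = 0.0  # damped mass not passed along links; redistributed via teleport
        for p in pages:
            outs = links.get(p, [])
            c = 1.0 if credibility is None else credibility.get(p, 1.0)
            if outs:
                share = damping * c * score[p] / len(outs)
                for q in outs:
                    new[q] += share
                leaked += damping * (1.0 - c) * score[p]
            else:
                # Dangling page: all of its damped mass goes back through teleport.
                leaked += damping * score[p]
        for p in pages:
            new[p] += ((1.0 - damping) + leaked) * teleport.get(p, 0.0)
        score = new
    return score

web = {"a": ["b"], "b": ["a"], "spam": ["a"]}
uniform = {p: 1 / 3 for p in web}
pr = link_rank(web, uniform)                                   # plain PageRank
tr = link_rank(web, {"a": 1.0})                                # trust seeded at "a"
cr = link_rank(web, uniform, credibility={"spam": 0.0})        # spam's links carry no weight
print(tr["spam"])          # 0.0: no trusted page links to the spam page
print(pr["a"] > cr["a"])   # True: "a" loses the boost from the spam page's link
```

Routing the damped mass of dangling or low-credibility pages back through the teleport vector keeps the scores summing to 1, so the three variants remain directly comparable.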

