A very useful and eye-opening paper crossed my desk called “Nullification test collections for web spam and SEO” by Jones, Ramesh, Hawking and Craswell from Canberra University in Australia. They want to encourage the compilation of a large corpus for adversarial IR research. CMU are bu[...]
Posts Tagged ‘web spam’
Clickstream spam detected
Clickstream analysis is a basic metric used to determine how much traffic comes to a site, and some analysts also use it to gauge the quality of that traffic. More research is being done into clickstream analysis because the data is littered with noise, has a very high dimensionali[...]
SEO = Adversarial IR
SEO is more often than not classified as an “Adversarial information retrieval” technique in the computing world. I say this because AIRWeb, for example, considers “malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some it[...]
CredibleRank
I thought I’d share “Countering Web Spam with Credibility-Based Link Analysis” by James Caverlee (Texas A&M University) and Ling Liu (Georgia Institute of Technology) at PODC’07 today. PageRank, TrustRank and HITS all couple link credibility and page quality, which isn[...]
Corpus for nasty web spam
Researchers who study web spam are limited by the lack of available corpora. One that gets used quite often is “WEBSPAM-UK2007”, released by Yahoo. There’s also the 2006 version. It’s really useful but as they say, it was generated to aid the researchers so it[...]

