Over-optimised sites nullified

A very useful and eye-opening paper crossed my desk: “Nullification test collections for web spam and SEO” by Jones, Ramesh, Hawking and Craswell from Canberra University in Australia.

They want to encourage the compilation of a large corpus for adversarial IR research. CMU are building one right now called web09-bst. The authors think that it needs to be improved, though. Their method is about nullifying sites, that is, preventing them from harming the search results, rather than removing them from the index.

I have always been acutely aware of the issue of good, informative sites that are not optimised being steamrolled by ones that are.

This is not to say that all optimised sites are spam due to over-optimisation (especially compared to non-optimised sites), but that they affect rankings and may not always be the best results.

The bad techniques we all know about, such as link spam, keyword stuffing and so on, are classed as web spam. SEO is classified as positive where the practice involves streamlining pages, but negative when it involves over-optimisation. It is not easy to make that distinction, though.

They mention the Stanford WebBase Project, which conducted monthly crawls in 2008/2009 ranging from 61 million to 81 million pages. Web09-bst has a 25 terabyte dataset of about 1 billion web pages crawled in November 2008. Both contain spam.

They discuss the performance of PageRank, Robust PageRank, TrustRank and Anti-TrustRank. They also discuss the use of standard IR metrics such as MAP, NDCG and infAP.
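For readers who have not met these metrics, here is a minimal sketch of how average precision (the "AP" in MAP) and NDCG score a ranked list. It is my own illustration, not code from the paper: the function names are made up, the relevance judgments are invented, and the average precision here assumes every relevant page for the query appears somewhere in the list being scored.

```python
import math


def average_precision(relevance):
    """Average precision for one query.

    `relevance` is a list of 0/1 judgments in ranked order.
    Assumption: all relevant pages for the query are in the list.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(queries):
    """MAP: the mean of average precision over several queries."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


# Hypothetical judgments: the same three pages, ranked two ways.
clean_ranking = [1, 1, 0]    # both relevant pages on top
spammed_ranking = [0, 1, 1]  # an irrelevant page pushed to rank 1

print(average_precision(clean_ranking), average_precision(spammed_ranking))
print(ndcg(clean_ranking), ndcg(spammed_ranking))
```

Both metrics drop when an irrelevant page takes the top spot, which is why they are a natural way to measure how much spam or excessive SEO hurts a ranking.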

Here are some snippets from the paper. It’s freely available, so you can benefit from it with minimum involvement from me:

“To motivate the idea of nullification as opposed to removal, and to demonstrate that not all content that complicates ranking is also spam”

“…achieving good search results requires the nullification of the thousands of template-driven links and their anchor text.”

“…research into nullifying the negative effect of spam or excessive search engine optimisation (SEO) on the ranking of non-spam pages is not well supported…”

“We introduce the term nullification which we see as preventing problem pages from negatively affecting search results.”

“Research oriented toward measuring the adverse effect of spam and excessive SEO on search engine users cannot be conducted in the absence of sets of realistic queries and corresponding judgments. When selecting queries for evaluation of spam nullification, it is important to select queries of high interest to spammers”.

This last comment also points towards using the highly popular and competitive search terms that SEOs go after. While I am in total agreement that over-optimisation is a serious problem for rankings, I am also of the opinion, as the authors are, that sensible SEO which improves pages for the user as well as the engines is beneficial. The sites that have not got on board need to, and this is simply a natural development of life on the web.

2 Comments

  1.

    That’s really interesting, CJ. Isn’t that where trust-based networks really show their worth? If an SEO’d site can drown a non-SEO site in search, but the non-SEO site is the one every real person talks about, then that provides a compensating advantage for the “real” site, right?

  2. CJ

    Definitely, trust is a very important factor indeed, but it’s not that easy to measure. There are issues with gathering user data, for obvious reasons, but the logs are telling. Capturing browsing data is very valuable, and I think using the visibility of the site across the social media landscape is very important as well. Lots of work still to be done in ranking!


© 2009-2013 Science for SEO All Rights Reserved
