*Before you read, a clarification – I’m aware that Sphinn is working on a newer version, and also that they use editors, mods and user interaction for spam fighting – this post is about those techniques and their limitations, and also introduces some new ideas*
Sphinn has been swamped with spam recently. I’ve seen a lot of it myself, and it’s been reported by other users, including Zigojacko. What’s up?
Although Sphinn is small in comparison to Digg, it uses the same kind of system: people submit stories and other members vote on them. The posts with the most votes go to the “Hot topics” page, which is also the content you’ll get in your feed. And it suffers from the same problem: people post advertising for their products and services instead of information-rich resources that can be shared with the community. It’s also a drain on resources. All in all, a nasty thing that needs to be dealt with.
Ways that this problem can be solved include:
- Having a human spam editor
- Getting users to flag spam
- Captcha (not effective for human submissions)
- Relevance rank
- and finally…personalisation.
Having a human spam editor isn’t ideal in a very dynamic environment like Sphinn. It works for Wikipedia, but Wikipedia moves at a much slower pace. Captcha is only useful for deterring bots (although some bots can break captcha now). Moderation uses human resources and is time-consuming, as Tamar and Danny at Sphinn make clear in the Zigojacko thread. Moderators should not have to clear out the spam anyway. That leaves…
Digg already announced at the Web 2.0 Expo that they were working on a personalised front page. This means that, yes, you might still get spam on your front page, but it’s not really going to be worthwhile for the spammers, seeing as their audience suddenly becomes very small. You get to moderate your own “front page”, and in this sense, I guess something like Twine is worth a look (I really like Twine, by the way).
There is a way for spammers to use this to their advantage, though: social network monitoring, to detect where their interest group is and then target it in some way, like with paid-for ads. It is still tricky, though.
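To make the personalisation idea concrete, here is a minimal sketch of what a personalised front page could look like. Everything here is my own illustration, not Sphinn’s or Digg’s actual implementation: I assume stories carry topic tags, and the user’s interest profile is simply inferred from the tags of stories they previously voted for.

```python
from collections import Counter


def personalised_front_page(stories, user_votes, top_n=5):
    """Rank stories by overlap with the user's inferred interests.

    `stories` is a list of (title, tags) tuples; `user_votes` is a
    list of tag lists from stories the user previously voted for.
    """
    # Build an interest profile from the user's voting history.
    interests = Counter(tag for tags in user_votes for tag in tags)

    def score(story):
        _, tags = story
        return sum(interests.get(tag, 0) for tag in tags)

    # Stories with zero overlap (e.g. off-topic spam blasted at
    # everyone) sink to the bottom of this user's "front page".
    return sorted(stories, key=score, reverse=True)[:top_n]
```

The point of the sketch is the economics, not the scoring: a blanket spam submission matches almost nobody’s profile, so its effective audience shrinks to near zero.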
Most people will be aware of relevance ranking: it’s basically ordering results by relevance, but first you have to decide what’s relevant.
On Sphinn, new submissions appear in the “What’s New” section as they come in, which I like. Sometimes stuff I’m interested in doesn’t get many votes, and it would be buried before I could come across it (not “find” it – I’m never looking for anything specific on Sphinn, I’m browsing). This section is easily spammed, though to be honest it’s not as bad here as I’ve seen it elsewhere.
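One simple way to keep fresh-but-unvoted stories from being buried is a time-decayed relevance score, where votes count for less as a story ages. This is just a sketch of the general idea (the half-life and the +1 smoothing are my own arbitrary choices, not anything Sphinn uses):

```python
import time


def freshness_score(votes, submitted_at, now=None, half_life_hours=12.0):
    """Time-decayed relevance score for a submission.

    Votes lose half their weight every `half_life_hours`, so a fresh
    story with few votes can outrank an old story with many.
    """
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - submitted_at) / 3600.0)
    # +1 so brand-new, zero-vote stories still get a nonzero score.
    return (votes + 1) * 0.5 ** (age_hours / half_life_hours)
```

With a 12-hour half-life, a two-vote story submitted just now scores 3.0, while a ten-vote story from two days ago scores under 0.7, so the browsing reader actually gets to see the new stuff.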
There has to be a filter as stories come in to minimise spam at this level. One way to do it would be to use a topic detection algorithm and train it on a clean, already existing Sphinn corpus. The system can learn patterns from the training data that help it label a submission as “Sphinn” or “Foe”. The patterns will be numerous! A cool by-product is a way to visualise the community.
This type of method needs to be flexible, though; otherwise, if you used an unconventional title or unusual words, your submission would be chucked out. The more you train it, the better it gets, and I would call Sphinn a closed environment, which makes the problem easier to deal with: there are only so many categories, so it’s not as difficult as tracking spam in a global engine. On top of that, you could take user interaction into account to solidify the method.
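For the curious, here is what a tiny “Sphinn or Foe” classifier could look like. I’ve used Naive Bayes with Laplace smoothing as a stand-in for whatever topic detection algorithm would actually be chosen; the labels and training pairs are invented for illustration. The smoothing is exactly the flexibility mentioned above: unseen words nudge the score rather than getting a submission chucked out.

```python
import math
from collections import Counter


def train(examples):
    """Train a tiny Naive Bayes text classifier.

    `examples` is a list of (text, label) pairs, with labels
    "Sphinn" (legitimate) or "Foe" (spam).
    """
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(model, text):
    """Return the most likely label for `text` under the model."""
    word_counts, label_counts = model
    vocab = set()
    for counts in word_counts.values():
        vocab |= set(counts)
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior plus Laplace-smoothed log likelihoods, so words
        # never seen in training don't zero out a whole submission.
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)
```

Trained on a handful of genuine Sphinn submissions and known spam, this would flag an incoming title like “buy cheap pills now” as “Foe” while letting “seo link building guide” through, and retraining on newly moderated submissions is how it keeps getting better.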
Or failing all of that, we could beg: “Spammers, please please stop peeing in the beer”.