Identifying the Influential Bloggers in a Community
This paper was presented at WSDM 08 by Agarwal et al. from NEC laboratories. “Identifying the Influential Bloggers in a Community” can be read at the ACM.
They look at a very important area of research: how we deal with the huge amount of data generated by bloggers, and how we rank these blog posts.
Here is a short summary of the main points:
How active a blogger is says little about how influential s/he is. Very active bloggers can be influential, and just as easily not. The influential ones, however, are very important because they can help companies develop new business ideas and identify key concerns, trends and competing products. Bloggers can become product advocates and, basically, they are market movers. The blogging on the recent US electoral campaign shows that bloggers can also have influence over social and political issues.
The researchers say that 64% of companies have identified the importance of the blogosphere for their business. Instead of trawling through endless posts in the relevant community, the best entry points are the most influential posts.
Technorati reports a 100% increase in the size of the Blogosphere every six months. This is huge and means that methods need to be developed to deal with this enormous amount of data.
You can’t (as we’ve seen before) use PageRank or whatever other method search engines apply to the Web for the Blogosphere, because blogs are sparsely linked and the Random Surfer model just doesn’t work on such a graph. Web pages can gain authority over time, but this is not necessarily true of blogs. As they say, the influence of a blog post, and of a blogger, actually decreases over time, because newer, even more sparsely linked posts keep coming into existence.
They say that there is research going on into ranking by topic similarity, but this is still very much on the drawing board right now. They also say that you could use traffic information, the number of comments and similar statistics; however, you’d be leaving out all of those inactive bloggers who are nonetheless influential.
They identify 4 groups of bloggers:
“active and influential, active and non-influential, inactive and influential, and inactive and non-influential”. They create an influence score based on whether the blogger has any influential posts.
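To make the four groups concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Blogger` class, both thresholds and the choice to score a blogger by his or her single best post are assumptions made purely for illustration, and the per-post scores are taken as given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blogger:
    name: str
    post_scores: List[float]  # hypothetical influence score for each recent post


def blogger_influence(blogger: Blogger) -> float:
    # Assumption: a blogger is as influential as his or her best post.
    return max(blogger.post_scores, default=0.0)


def classify(blogger: Blogger,
             activity_threshold: int = 30,
             influence_threshold: float = 0.5) -> str:
    # Both thresholds are made-up values, not taken from the paper.
    active = len(blogger.post_scores) >= activity_threshold
    influential = blogger_influence(blogger) >= influence_threshold
    activity_label = "active" if active else "inactive"
    influence_label = "influential" if influential else "non-influential"
    return f"{activity_label} and {influence_label}"
```

With these made-up thresholds, a blogger with a single post scoring 0.9 comes out as "inactive and influential", which is exactly the kind of blogger that a pure activity count would miss.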
A blog post is influential in the following circumstances (obviously you could probably add quite a few more):
- Recognition – An influential blog post is recognized by many.
- Activity Generation – An influential blog post generates activity (comments, follow-up discussions…)
- Novelty – Novel ideas exert more influence (lots of outlinks suggest that the post is not novel)
- Eloquence – Blog post length is positively correlated with the number of comments, which suggests that longer blog posts attract more of people’s attention.
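The properties listed above can be combined into a single per-post score. The following Python sketch is only one way to do it and should not be read as the formula from the paper: the function name, the weights, the `length_scale` constant and the use of raw counts are all assumptions. Inlinks stand in for recognition, comments for activity generation, outlinks count against novelty, and post length scales the result.

```python
def post_influence(inlinks: int,
                   comments: int,
                   outlinks: int,
                   length_words: int,
                   w_in: float = 1.0,
                   w_comments: float = 1.0,
                   w_out: float = 1.0,
                   length_scale: float = 500.0) -> float:
    """Toy influence score for a single blog post (all weights are assumed)."""
    # Longer posts get a larger multiplier (post length as a proxy for attention).
    length_weight = 1.0 + length_words / length_scale
    # Inlinks and comments add to the score; outlinks subtract from it.
    raw = w_in * inlinks + w_comments * comments - w_out * outlinks
    return length_weight * raw
```

For example, `post_influence(10, 5, 3, 500)` gives a length weight of 2.0 and a raw score of 12.0, so the result is 24.0. In practice the weights would have to be tuned for the community being studied.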
Active & influential:
“‘Erica Sadun’ submitted 152 posts in the last 30 days, among which 9 of them are influential, attracting a large number of readers evidenced by 75 comments and 80 citations”.
Inactive but influential:
“‘Dan Lurie’ published only 16 posts (much fewer than 152 posts comparing with ‘Erica Sadun’, an active influential blogger) in the last 30 days”.
This is a very good example of a paper addressing the issues we’re encountering in blog post retrieval, categorisation and so on. It is a very important area of research and, imho, needs to receive a lot more attention and, dare I say, budget.